Shakti LLM Series – Post 2: Built, Not Borrowed

How We Created the Shakti LLMs from Scratch

In a world flooded with fine-tuned forks and renamed checkpoints, Shakti was never meant to be “just another model.” We didn’t clone someone else’s weights or slap a new label on an open-source base.

We built Shakti from the ground up — architecturally, algorithmically, and ethically — to meet the real-world demands of sovereign, enterprise-grade GenAI.

From custom tokenizers and attention mechanisms to our own training stack, quantization pipeline, and evaluation framework, every layer of Shakti was engineered with intent: to deliver fast, safe, and adaptable AI across domains like healthcare, finance, legal, and document intelligence.

One question keeps coming up in conversations with AI leaders, investors, and developers:

“Is Shakti just a fine-tuned version of LLaMA, Mistral, or Falcon?”

The answer? Absolutely not.

This post breaks down exactly how we built the Shakti models — and more importantly, why taking the harder path matters for enterprises, governments, and global GenAI ecosystems.

Why Not Just Fine-Tune?

Fine-tuning an open-source LLM gets you to a demo quickly. But it also means:

You’re limited by someone else’s architecture
You inherit biases, inefficiencies, and hallucination patterns
You can’t fully control quantization, deployment, or alignment
You’re locked out of innovation at the core

That’s why we built the Shakti family of Small Language Models (SLMs) from first principles — models optimized for performance, alignment, sovereignty, and real-world readiness.

What We Built Ourselves — Not Borrowed

We didn’t fork someone else’s model and rename it.
We didn’t just sprinkle in Indian languages and call it “local.”
We trained from scratch, aligned for purpose, and optimized for enterprise use.

What Makes Shakti Different?

1. Edge-First, Cloud-Optional

Shakti models, from 100M to 2.5B parameters, are architected for low-power environments, thanks to:

VGQA (Variable Grouped Query Attention)
Sliding window inference
Quantization-readiness

You can run Shakti on-device, on sovereign infrastructure, or scale it in your cloud.
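To make the attention ideas above concrete, here is a minimal sketch, not Shakti's actual implementation. It shows a causal sliding-window mask and the standard grouped-query mapping from query heads to shared key/value heads. How the "variable" part of VGQA assigns groups is not described in this post, so the sketch assumes a fixed group size; all function names and parameters are hypothetical.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Attention is causal (j <= i) and limited to the `window` most recent positions.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def kv_head_for_query_head(q_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    """Map a query head to the key/value head it shares (fixed-size groups assumed)."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size


def kv_cache_entries(seq_len: int, window: int, num_kv_heads: int, head_dim: int) -> int:
    """Count cached K and V values per layer when only `window` positions are kept."""
    return 2 * min(seq_len, window) * num_kv_heads * head_dim


if __name__ == "__main__":
    for row in sliding_window_mask(seq_len=5, window=3):
        print("".join("x" if allowed else "." for allowed in row))
    print(kv_head_for_query_head(q_head=5, num_q_heads=8, num_kv_heads=2))
    print(kv_cache_entries(seq_len=4096, window=512, num_kv_heads=2, head_dim=64))
```

Both techniques shrink the key/value cache: fewer KV heads cut it by the group size, and the window caps it regardless of sequence length, which is why the combination suits memory-constrained devices.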

2. Multilingual by Design

Our tokenizer and training data were designed from the start to support Indian and global languages as a core capability, not an afterthought.

3. Hallucination-Aware Training

We didn’t bolt on safety at the end. We used HALUMON — our in-house evaluation framework — during training and alignment to:

Detect hallucinations
Score factuality, relevance, and risk
Tune prompts and preference datasets accordingly

Explore the Shakti Model Family — Built from Scratch, Tuned for Reality

🔹 Shakti 100M: An ultra-compact edge AI assistant, ideal for voice commands, on-device summarization, and lightweight support bots.
🔹 Shakti 250M: Designed for multilingual SME automation — includes domain data from finance, healthcare, and retail.
🔹 Shakti 500M: A versatile conversational model tuned with DPO — perfect for chatbots, IVR, and customer service agents.
🔹 Shakti 1B VLM: Our entry-level Vision-Language Model for document parsing, OCR tasks, and form understanding.
🔹 Shakti 2.5B (Text-Only): A flagship model that excels at medical, finance, and legal tasks, with deeper reasoning and summarization.
🔹 Shakti 4B VLM: A full-scale multimodal model for document intelligence, chart/Q&A workflows, and large-scale enterprise document reasoning.
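For readers unfamiliar with DPO (Direct Preference Optimization), mentioned above for Shakti 500M, the sketch below computes the standard published DPO loss for a single preference pair. It is an illustration of the general objective, not Shakti's training code; the function name, argument names, and the `beta` default are assumptions.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; controls how far the policy may drift from the reference
) -> float:
    """DPO loss for one (chosen, rejected) pair of response log-probabilities."""
    # How much more (or less) likely each response is under the policy than the reference.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # loss = -log(sigmoid(logits)), written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


if __name__ == "__main__":
    # The policy favors the chosen response more than the reference does: lower loss.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
    # The policy favors the rejected response: higher loss.
    print(dpo_loss(-3.0, -1.0, -2.0, -2.0))
```

The loss falls as the policy raises the chosen response's likelihood relative to the rejected one, measured against a frozen reference model, so no separate reward model is needed.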

Dive into the Research — What It Teaches About Shakti

Each paper is a deep reflection of our commitment to building, not borrowing:

📘 Shakti 2.5B: LLM for Edge AI: Explains the core model architecture (VGQA, SwiGLU, RoPE) and shows how a 2.5B-parameter model achieves benchmark-level reasoning with a quantization-ready design.
📘 Shakti-VLM: Document Intelligence Models: Walks through our 1B and 4B VLM builds, covering multimodal attention, 2D positional embeddings, OCR integration, and real-world use cases like legal and compliance reading.
📘 Fine-Tuning Shakti 100M–500M for Edge AI: Covers our edge-first models (100M, 250M, and 500M), highlighting efficient architectures, quantization (INT4/INT5/INT8), domain-tuned datasets, and performance benchmarks in healthcare, finance, and legal tasks. It makes the case that small models can beat larger ones in real-world scenarios.
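As a rough illustration of what the INT4 and INT8 quantization mentioned above involves, here is a minimal sketch of symmetric per-tensor quantization. This is the simplest textbook scheme, assumed here for illustration only; the papers describe Shakti's actual pipeline, and the function names below are hypothetical.

```python
def quantize_symmetric(weights: list[float], bits: int = 8) -> tuple[list[int], float]:
    """Quantize floats to signed integers of width `bits` using one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8, 7 for INT4
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clip to the representable range.
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from integers and the shared scale."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.42, -1.0, 0.07, 0.0, 0.9]
    for bits in (8, 4):
        q, scale = quantize_symmetric(weights, bits)
        restored = dequantize(q, scale)
        worst = max(abs(w - r) for w, r in zip(weights, restored))
        print(f"INT{bits}: values={q} worst-case error={worst:.4f}")
```

Fewer bits mean a coarser grid and larger rounding error, which is why a quantization-ready design matters: the model needs to tolerate that error without losing task accuracy.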

Live Demo (Interactive Inference)

The Shakti 2.5B model is available to try live on Hugging Face.

Run natural language queries and see how it performs across QA, summarization, and domain-specific tasks, all in real time.

Coming Up in Post 3…

We’ve answered what Shakti is made of — and what it’s not.

In the next post, we’ll go deep into:

How we trained Shakti
Our dataset strategy, including multilingual and domain-specific pipelines
How SFT, DPO, RLHF, and HALUMON all come together to align Shakti to real-world enterprise needs

Because building a model from scratch is just the start. Making it aligned, safe, and domain-ready is what matters.