Shakti LLM Series – Post 2: Built, Not Borrowed
How We Created the Shakti LLMs from Scratch
In a world flooded with fine-tuned forks and renamed checkpoints, Shakti was never meant to be “just another model.” We didn’t clone someone else’s weights or slap a new label on an open-source base.
We built Shakti from the ground up — architecturally, algorithmically, and ethically — to meet the real-world demands of sovereign, enterprise-grade GenAI.
From custom tokenizers and attention mechanisms to our own training stack, quantization pipeline, and evaluation framework, every layer of Shakti was engineered with one intent: to deliver fast, safe, and adaptable AI across domains like healthcare, finance, legal, and document intelligence.
One question keeps coming up in conversations with AI leaders, investors, and developers:
“Is Shakti just a fine-tuned version of LLaMA, Mistral, or Falcon?”
The answer? Absolutely not.
This post breaks down exactly how we built the Shakti models — and more importantly, why taking the harder path matters for enterprises, governments, and global GenAI ecosystems.
Why Not Just Fine-Tune?
Fine-tuning open-source LLMs gives you speed to demo. But it also means:
- You’re limited by someone else’s architecture
- You inherit biases, inefficiencies, and hallucination patterns
- You can’t fully control quantization, deployment, or alignment
- You’re locked out of innovation at the core
That’s why we built the Shakti family of Small Language Models (SLMs) from first principles — models optimized for performance, alignment, sovereignty, and real-world readiness.
What We Built Ourselves — Not Borrowed
- We didn’t fork someone else’s model and rename it.
- We didn’t just sprinkle in Indian languages and call it “local.”
- We trained from scratch, aligned for purpose, and optimized for enterprise use.
What Makes Shakti Different?
1. Edge-First, Cloud-Optional
The Shakti text models, from 100M to 2.5B parameters, are architected for low-power environments, thanks to:
- VGQA (Variable Grouped Query Attention)
- Sliding window inference
- Quantization-readiness
You can run Shakti on-device, on sovereign infrastructure, or scale it in your cloud.
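To make the memory argument concrete, here is a minimal sketch of why grouped query attention and a sliding window help on small devices. It is not Shakti's implementation: it shows plain grouped query attention (the "variable" part of VGQA is not modeled), and every function name and number below is a made-up assumption for illustration.

```python
def kv_head_for_query_head(query_head: int, n_query_heads: int, n_kv_heads: int) -> int:
    """Map a query head to the key/value head it shares (plain grouped query attention)."""
    if n_query_heads % n_kv_heads != 0:
        raise ValueError("n_query_heads must be a multiple of n_kv_heads")
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size


def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when position i may attend to position j.

    Causal (j <= i) and limited to the most recent `window` positions.
    """
    return [[(j <= i) and (i - j < window) for j in range(seq_len)]
            for i in range(seq_len)]


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   cached_tokens: int, bytes_per_value: int = 2) -> int:
    """Size of the key/value cache; the leading 2 counts keys and values."""
    return 2 * n_layers * n_kv_heads * head_dim * cached_tokens * bytes_per_value


# Hypothetical configuration, chosen only to make the arithmetic easy to follow.
full = kv_cache_bytes(n_layers=16, n_kv_heads=32, head_dim=64, cached_tokens=4096)
small = kv_cache_bytes(n_layers=16, n_kv_heads=8, head_dim=64, cached_tokens=1024)
print(full // small)  # 16: 4x from sharing KV heads, 4x from the shorter window
```

Sharing key/value heads shrinks the cache per token, and the window caps how many tokens are cached at all, so the two savings multiply.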
2. Multilingual by Design
Our tokenizer and training data were designed from the start to support Indian and global languages as a core capability, not as an afterthought.
3. Hallucination-Aware Training
We didn’t bolt on safety at the end. We used HALUMON — our in-house evaluation framework — during training and alignment to:
- Detect hallucinations
- Score factuality, relevance, and risk
- Tune prompts and preference datasets accordingly
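As a rough illustration of how scores like these could feed a preference dataset, the sketch below ranks candidate responses and emits a chosen/rejected pair. HALUMON's real interface, score ranges, and weighting are not described in this post, so the `ScoredResponse` class, the equal-weight `quality` formula, and the `min_margin` threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScoredResponse:
    text: str
    factuality: float  # assumed range 0..1, higher is better
    relevance: float   # assumed range 0..1, higher is better
    risk: float        # assumed range 0..1, higher is worse


def quality(response: ScoredResponse) -> float:
    """Equal-weight average; risk is inverted so higher is always better."""
    return (response.factuality + response.relevance + (1.0 - response.risk)) / 3.0


def build_preference_pair(prompt: str, responses: list[ScoredResponse],
                          min_margin: float = 0.2) -> Optional[dict]:
    """Return a chosen/rejected pair, or None if the candidates are too close."""
    if len(responses) < 2:
        return None
    ranked = sorted(responses, key=quality, reverse=True)
    best, worst = ranked[0], ranked[-1]
    if quality(best) - quality(worst) < min_margin:
        return None  # a near-tie is a weak training signal, so skip it
    return {"prompt": prompt, "chosen": best.text, "rejected": worst.text}
```

Skipping near-ties is a design choice in this sketch: pairs with a clear quality gap give preference training a cleaner signal than pairs that differ by noise.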
Explore the Shakti Model Family — Built from Scratch, Tuned for Reality
- 🔹 Shakti 100M: An ultra-compact edge AI assistant, ideal for voice commands, on-device summarization, and lightweight support bots.
- 🔹 Shakti 250M: Designed for multilingual SME automation — includes domain data from finance, healthcare, and retail.
- 🔹 Shakti 500M: A versatile conversational model tuned with DPO — perfect for chatbots, IVR, and customer service agents.
- 🔹 Shakti 1B VLM: Our entry-level Vision-Language Model for document parsing, OCR tasks, and form understanding.
- 🔹 Shakti 2.5B (Text-Only): Our flagship text model, which excels at medical, finance, and legal tasks that call for deeper reasoning and summarization.
- 🔹 Shakti 4B VLM: A full-scale multimodal model for document intelligence, chart/Q&A workflows, and large-scale enterprise document reasoning.
Dive into the Research — What It Teaches About Shakti
Each paper reflects our commitment to building, not borrowing:
- 📘 Shakti 2.5B: LLM for Edge AI: Explains the core architecture (VGQA, SwiGLU, RoPE) and shows how a 2.5B-parameter model pairs competitive reasoning performance on benchmarks with a quantization-ready design.
- 📘 Shakti-VLM: Document Intelligence Models: Walks through our 1B and 4B VLM builds, covering multimodal attention, 2D positional embeddings, OCR integration, and real-world use cases like legal and compliance reading.
- 📘 Fine-Tuning Shakti 100M–500M for Edge AI: Covers our edge-first models (100M, 250M, and 500M), highlighting efficient architectures, quantization (INT4/INT5/INT8), domain-tuned datasets, and performance benchmarks in healthcare, finance, and legal tasks. It makes the case that small, domain-tuned models can outperform larger ones in real-world scenarios.
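For readers new to the INT4/INT5/INT8 formats mentioned above, here is a minimal sketch of symmetric per-tensor quantization at a chosen bit width. It is a textbook scheme picked for illustration; the papers may use a different method (per-channel or grouped scales, for example), and the function names are our own assumptions.

```python
def quantize_symmetric(weights: list[float], bits: int) -> tuple[list[int], float]:
    """Map floats to signed integer codes in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for INT4, 15 for INT5, 127 for INT8
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: list[int], scale: float) -> list[float]:
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]


def max_abs_error(weights: list[float], bits: int) -> float:
    """Largest round-trip error; bounded by half the scale."""
    codes, scale = quantize_symmetric(weights, bits)
    restored = dequantize(codes, scale)
    return max(abs(w - r) for w, r in zip(weights, restored))


# Toy weights, not taken from any Shakti checkpoint.
toy_weights = [-1.0, -0.4, 0.0, 0.25, 0.9]
for bits in (4, 5, 8):
    print(bits, max_abs_error(toy_weights, bits))
```

Fewer bits mean a coarser grid and a larger worst-case rounding error, which is the trade-off against memory and bandwidth savings on edge hardware.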
Live Demo (Interactive Inference)
Try the Shakti 2.5B model live on Hugging Face.
Run natural language queries and see how it performs across QA, summarization, and domain-specific tasks, all in real time.
Coming Up in Post 3…
We’ve answered what Shakti is made of — and what it’s not.
In the next post, we’ll go deep into:
- How we trained Shakti
- Our dataset strategy, including multilingual and domain-specific pipelines
- How SFT, DPO, RLHF, and HALUMON all come together to align Shakti to real-world enterprise needs
Because building a model from scratch is just the start. Making it aligned, safe, and domain-ready is what matters.
