Models we work with
Open-weights base models we have shipped fine-tunes on, with benchmark gains we have delivered
Llama 3 / 3.1 / 3.2 (8B, 70B)
Our default for English-only commercial deployments. Llama 3.1 8B is the workhorse: strong instruction-following, a 128k-token context window, and excellent community tooling (vLLM, Unsloth, Axolotl, llama.cpp). Typical gains on Llama 3 8B fine-tunes: +27 points on a SaaS support-quality eval, +34 points on a banking-compliance Q&A set, +41 points on contract-clause extraction. Llama 3.2 1B and 3B are excellent for on-device inference after 4-bit quantization.
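To make the on-device claim concrete, here is a rough back-of-envelope sketch of how much memory the weights alone need at different precisions. The parameter counts are nominal, rounded figures we assume for illustration, and the helper name `weight_memory_gb` is ours, not part of any library.

```python
# Back-of-envelope weight memory for a model at a given precision.
# Parameter counts are nominal (rounded) and used for illustration only.

def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

# Assumed nominal sizes; real checkpoints differ slightly.
NOMINAL_PARAMS = {
    "Llama 3.2 1B": 1e9,
    "Llama 3.2 3B": 3e9,
    "Llama 3.1 8B": 8e9,
}

for name, params in NOMINAL_PARAMS.items():
    fp16 = weight_memory_gb(params, 16)
    int4 = weight_memory_gb(params, 4)
    print(f"{name}: {fp16:.1f} GB at 16-bit, {int4:.1f} GB at 4-bit")
```

Under these assumptions a 3B model drops from about 6 GB to about 1.5 GB of weights at 4-bit. Real footprints are somewhat higher, because quantization scales, the KV cache and activations are not counted here.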
Mistral 7B / Mistral Nemo / Mixtral 8x7B
Mistral 7B v0.3 is slightly weaker than Llama 3 on raw MMLU, but it has a more permissive Apache 2.0 licence, better European-language coverage, and an excellent function-calling baseline. Mixtral 8x7B (MoE) gives GPT-3.5-quality reasoning on a 2× A100 budget. Typical gains: +18 points on French-language legal QA, +29 points on a multi-step finance reasoning task. Mistral Nemo (12B) is our 2026 pick for European clients needing OSI-approved licensing.
Phi-3 / Phi-4
Microsoft's Phi series wins when you need a small model (3B to 14B) with disproportionately strong reasoning. Phi-3.5 Mini (3.8B) consistently beats Llama 3 8B on math and code despite being roughly half the size. Use it for edge deployment and cost-sensitive serving. Typical gains: +22 points on a Pune startup's SQL-generation eval after 4 hours of QLoRA.
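A short QLoRA run can be enough partly because only a small adapter is trained while the quantized base stays frozen. The sketch below counts the trainable parameters LoRA adds. The layer dimensions, rank and choice of adapted matrices are hypothetical values picked for illustration, not the configuration of any specific model or of the project mentioned above.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) weight matrix.

    LoRA learns A (rank x d_in) and B (d_out x rank), so the count is
    rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)

# Hypothetical architecture and adapter settings, for illustration only.
HIDDEN = 3072            # assumed hidden size
LAYERS = 32              # assumed number of transformer layers
ADAPTED_PER_LAYER = 4    # assume four square projections adapted per layer
RANK = 16                # assumed LoRA rank
BASE_PARAMS = 3.8e9      # nominal size of the frozen base model

per_matrix = lora_params(HIDDEN, HIDDEN, RANK)
total = per_matrix * ADAPTED_PER_LAYER * LAYERS

print(f"trainable adapter params: {total:,}")
print(f"share of base model: {total / BASE_PARAMS:.2%}")
```

With these assumed numbers the adapter has about 12.6M trainable parameters, roughly 0.3% of a 3.8B base. Raising the rank or adapting more matrices scales that count linearly.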
Qwen 2.5 / Qwen 2.5-Coder
Alibaba's Qwen 2.5 is our top pick for coding (after fine-tuning, Qwen 2.5-Coder 32B scores closer to Claude 3.5 Sonnet than to GPT-4o-mini on HumanEval), and it has the best Chinese, Japanese and Korean coverage of any open-weights model. For polyglot enterprises serving APAC and India simultaneously, Qwen 2.5 is the path of least resistance.
Indic models — Sarvam-1, Krutrim, Airavata
For Hindi, Gujarati, Tamil, Marathi, Bengali, Kannada and Telugu we start from Indic-native bases. Sarvam-1 (2B) by Sarvam AI is our default: Apache 2.0, 4T-token training, strong Indic representation, and fast on consumer GPUs. Krutrim Spectre v2 (7B) from Ola is heavier but stronger on conversational Hindi. Airavata 7B (AI4Bharat) is a Llama 2 7B base with continued pre-training on 22 Indic languages, which makes it a great fit for code-switching. We've shipped Indic fine-tunes for an Ahmedabad fintech (Gujarati support, +38 points) and a Chennai EdTech (Tamil math tutor, +44 points on GSM8K-Tamil).