Small Language Models: The Enterprise Opportunity You're Ignoring
Enterprise AI conversations focus on frontier models – GPT-4, Claude 3.5, Gemini Ultra. But for many business applications, smaller models deliver better outcomes at lower cost. Here’s the case for thinking smaller.
What Are Small Language Models?
Small language models (SLMs) are AI models with fewer parameters than frontier models. Where GPT-4 is rumoured to have hundreds of billions of parameters, SLMs might have 7 billion, 13 billion, or 70 billion.
Examples include:
- Meta’s Llama 3 8B and 70B
- Mistral 7B
- Microsoft’s Phi-3
- Google’s Gemma
- Various fine-tuned derivatives
These models are smaller but not incapable. For many tasks, they perform comparably to frontier models.
The Cost Reality
Frontier model inference is expensive. GPT-4 API calls add up quickly at scale. Enterprise applications processing thousands of requests daily generate significant ongoing costs.
Small models change these economics:
API cost comparison:
| Model | Relative Cost (input tokens) |
|---|---|
| GPT-4 | 1.0x (baseline) |
| GPT-3.5-turbo | 0.03x |
| Claude 3 Haiku | 0.008x |
| Self-hosted Llama 70B | ~0.02x |
| Self-hosted Llama 8B | ~0.003x |
For high-volume applications, this cost difference compounds dramatically. A million API calls that cost $30,000 with GPT-4 might cost $300 with smaller models.
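That compounding is easy to verify with arithmetic. Here is a minimal cost estimator using the relative multipliers from the table above; the baseline price per million input tokens is an assumption for illustration, not a quoted rate, so substitute your provider's actual pricing:

```python
# Illustrative cost estimator. The baseline price and the self-hosted
# multipliers are assumptions for illustration, not quoted rates.

BASELINE_PER_M_TOKENS = 30.0  # assumed GPT-4 input price, USD per 1M tokens

RELATIVE_COST = {
    "gpt-4": 1.0,
    "gpt-3.5-turbo": 0.03,
    "claude-3-haiku": 0.008,
    "self-hosted-llama-70b": 0.02,   # rough amortised estimate
    "self-hosted-llama-8b": 0.003,   # rough amortised estimate
}

def monthly_cost(model: str, requests: int, tokens_per_request: int) -> float:
    """Estimated monthly input-token cost in USD for a given model."""
    total_tokens = requests * tokens_per_request
    rate_per_m = BASELINE_PER_M_TOKENS * RELATIVE_COST[model]
    return total_tokens / 1_000_000 * rate_per_m

# 1M requests/month at 1,000 input tokens each
for model in RELATIVE_COST:
    print(f"{model:>24}: ${monthly_cost(model, 1_000_000, 1_000):,.2f}")
```

At a million 1,000-token requests per month, the assumed baseline yields $30,000 for GPT-4 against roughly $90 for a self-hosted 8B model, consistent with the order-of-magnitude gap described above.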
The Performance Reality
The standard assumption: bigger models are better. But research and real-world deployment show this is task-dependent.
Where small models match or exceed frontier models:
- Classification tasks (sentiment, intent, category)
- Structured extraction (names, dates, amounts from text)
- Simple Q&A against provided context
- Text transformation (summarisation, formatting, translation)
- Routing and triage decisions
Where frontier models remain superior:
- Complex reasoning chains
- Novel problem-solving
- Nuanced creative writing
- Multi-step planning
- Tasks requiring broad knowledge
Most enterprise AI use cases fall into the first category. You’re classifying tickets, extracting data from documents, answering questions from knowledge bases, summarising content. These don’t require frontier model capability.
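To make the first category concrete, here is a sketch of ticket classification as it might be framed for a small model. The label set and helper names are illustrative assumptions rather than any specific library's API; the pattern worth noting is constraining the model to a fixed label set and parsing its raw reply defensively:

```python
# Sketch: intent classification with a small model.
# The label set and helpers are illustrative assumptions.

LABELS = ["billing", "technical_support", "cancellation", "other"]

def build_prompt(ticket_text: str) -> str:
    """Constrain the model to a fixed label set -- narrow classification
    like this is where small models typically match frontier quality."""
    return (
        "Classify the support ticket into exactly one of these categories: "
        + ", ".join(LABELS)
        + ".\nRespond with the category name only.\n\nTicket: "
        + ticket_text
    )

def parse_label(model_output: str) -> str:
    """Map the raw completion back onto the label set, defaulting to 'other'."""
    cleaned = model_output.strip().lower()
    for label in LABELS:
        if label in cleaned:
            return label
    return "other"

# A small model's raw reply might look like " Billing\n"
print(parse_label(" Billing\n"))  # billing
```

The prompt string would be sent to whichever small model you deploy; the build/parse logic stays the same regardless of the serving approach.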
The Speed Advantage
Smaller models are faster. Significantly faster.
Latency matters for user-facing applications. A 3-second response feels slow. A 200-millisecond response feels instant.
For batch processing, speed translates directly to throughput. Processing a million documents overnight is feasible with small models; it’s expensive and slow with frontier models.
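The throughput difference is simple arithmetic. Assuming, purely for illustration, the latencies mentioned above (roughly 200 ms per request for a small model versus 3 s for a frontier model) and a fixed number of concurrent workers:

```python
def overnight_capacity(latency_s: float, workers: int, hours: float = 8.0) -> int:
    """Documents processable in an overnight batch window, assuming each
    worker handles one request at a time at the given per-request latency."""
    return int(hours * 3600 / latency_s * workers)

# Illustrative latencies: ~200 ms vs ~3 s per document, 16 workers
small = overnight_capacity(0.2, workers=16)     # 2,304,000 documents
frontier = overnight_capacity(3.0, workers=16)  # 153,600 documents
print(small, frontier)
```

Under these assumed numbers, a million documents overnight is comfortable for the small model and out of reach for the frontier model at the same concurrency.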
The Control Advantage
Self-hosted small models offer control that cloud APIs don’t:
Data sovereignty: Your data never leaves your infrastructure. For regulated industries and sensitive data, this matters.
Customisation: Fine-tune models on your specific domain, terminology, and task requirements. This often improves performance beyond generic frontier models.
Availability: No dependency on external provider uptime. Your AI works when your infrastructure works.
Cost predictability: Fixed infrastructure costs rather than usage-based pricing that can spike unexpectedly.
Implementation Approaches
There are several approaches to deploying small models:
Cloud-hosted SLM APIs
Several providers offer small-model APIs (AWS Bedrock, Google Vertex AI, Azure AI) with similar integration patterns to frontier models but lower costs.
Best for: Organisations wanting cost reduction without operational complexity.
Self-hosted cloud deployment
Running open-source models on cloud GPU instances you control.
Best for: Organisations needing data sovereignty and customisation with cloud operational model.
On-premises deployment
Running models on your own hardware for maximum control.
Best for: Highly regulated industries with strict data requirements.
Edge deployment
Running compact models on local devices for real-time, offline-capable applications.
Best for: Manufacturing, field service, or situations with connectivity constraints.
When to Choose Small Models
Consider small models when:
Use case is well-defined. Small models excel at specific tasks. If you know exactly what you need, small models can be optimised for it.
Volume is high. Cost advantages compound with scale. High-volume applications see dramatic savings.
Latency matters. Real-time applications benefit from speed improvements.
Data sensitivity is high. Self-hosted small models provide data sovereignty that cloud APIs don’t.
Customisation is valuable. Fine-tuning on your domain can beat generic frontier models.
When to Choose Frontier Models
Keep frontier models for:
Complex, variable tasks. When you can’t predict what the AI needs to do, frontier models’ broader capability matters.
Development and exploration. Early development benefits from flexible, capable models before optimising for production.
Quality-critical applications. When the quality difference matters more than cost.
Integrated experiences. When you want native multimodal, tool use, or other advanced features.
A Practical Strategy
A sensible enterprise approach combines both:
Start with frontier models during exploration and development. Understand what’s possible without constraint.
Identify high-volume use cases where small models could work. These become optimisation targets.
Test small models on those use cases. Benchmark quality against frontier model performance.
Deploy small models where quality is sufficient. Use cost savings to fund other AI initiatives.
Keep frontier models for tasks that genuinely require their capability.
This isn’t either/or. It’s matching the right model to the right task.
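That matching step can be made explicit in code. Here is a minimal router that sends well-defined, high-volume task types to a small model and everything else to a frontier model; the task taxonomy and model identifiers are illustrative assumptions, not a prescribed scheme:

```python
# Sketch of task-based model routing. Task types and model identifiers
# are illustrative assumptions.

SMALL_MODEL = "llama-3-8b"   # self-hosted: cheap, fast
FRONTIER_MODEL = "gpt-4"     # capable, expensive

# Well-defined tasks where benchmarking showed the small model is sufficient
SMALL_MODEL_TASKS = {
    "classification",
    "extraction",
    "contextual_qa",
    "summarisation",
    "routing",
}

def choose_model(task_type: str) -> str:
    """Route well-understood, high-volume tasks to the small model;
    fall back to the frontier model for everything else."""
    if task_type in SMALL_MODEL_TASKS:
        return SMALL_MODEL
    return FRONTIER_MODEL

print(choose_model("classification"))       # llama-3-8b
print(choose_model("multi_step_planning"))  # gpt-4
```

In practice the allow-list would be populated from the benchmarking step above: a task type only earns a place in `SMALL_MODEL_TASKS` once the small model's quality has been measured as sufficient.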
The Emerging Landscape
The small model landscape is evolving rapidly:
Quality improving: Each generation of small models narrows the gap with frontier models.
Tooling maturing: Deployment, fine-tuning, and management tools are increasingly enterprise-ready.
Specialisation increasing: Domain-specific small models (legal, medical, financial) are emerging.
Cost declining: GPU pricing and efficiency improvements reduce self-hosting costs.
The opportunity for small models in enterprise AI is growing, not shrinking.
Final Thought
Enterprise AI strategy shouldn’t default to frontier models for everything. That’s expensive and often unnecessary.
Small models offer a compelling alternative for many enterprise use cases – lower cost, faster speed, and better control. The enterprises that master this optimisation will have sustainable AI economics.
Thinking bigger doesn’t always mean thinking better. Sometimes smaller is the smarter choice.