The AI arms race is pushing big tech to deploy massive resources to keep their power-hungry large language models up and running – and potentially speed up their response times. Microsoft, for instance, is reportedly planning a $100 billion splurge on a supercomputer – set to go online in 2028 – to back its AI ambitions.
But on the flip side, small language models are gathering steam too. And they’re quickly proving that prompts don’t always need gigawatts of processing power behind them – some of these models can run on devices as personal as phones while performing just as well on specific tasks. Here’s everything you need to know about SLMs.
What are small language models?
Small language models are all about challenging the notion that bigger is always better in natural language processing. Where models like GPT-4 or Gemini Advanced boast hundreds of billions of parameters (the variables a model learns during training), SLMs range from ‘only’ a few million to a few billion parameters.
Still, they are proving to be highly effective in specialised tasks and resource-constrained environments. With advancements in training techniques, architecture, and optimisation strategies, SLMs are closing the performance gap with LLMs, making them an increasingly attractive option for a wide range of applications.
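To get a feel for just how lightweight these models can be, here is a minimal sketch using the Hugging Face transformers library to run a roughly 82-million-parameter model on an ordinary laptop CPU. The model name is purely illustrative; any similarly sized checkpoint works the same way.

```python
# Minimal sketch: running a small (~82M-parameter) language model locally
# with the Hugging Face `transformers` library. distilgpt2 is chosen only
# as an illustration of scale; it is small enough for a laptop CPU.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")

result = generator("Small language models are", max_new_tokens=30)
print(result[0]["generated_text"])
```

Once the weights are cached, nothing here needs a GPU or even a network connection – which is exactly the point.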
What are small language models used for?
The versatility of SLMs is one of their most compelling features. These models are finding applications in diverse domains, from sentiment analysis and text summarisation to question-answering and code generation. Their compact size and efficient computation make them well-suited for deployment on edge devices, in mobile applications, and in other resource-constrained environments.
For instance, Google’s Gemini Nano is a compact powerhouse featured on the latest Google Pixel phones, where it suggests replies when texting and summarises recordings – all on the device itself, without even an internet connection. Microsoft’s Orca-2-7B and Orca-2-13B are other examples of SLMs.
Of course, since small language models are relatively new and still an active area of research, you may not see many real-world applications just yet. But the promise is there. Organisations stand to benefit in particular: because these smaller models can be deployed on-premises, sensitive information stays securely within an organisation’s own infrastructure, reducing the risk of data breaches and addressing compliance concerns.
How are SLMs different from LLMs?
While LLMs are trained on vast amounts of general data, SLMs excel in specialisation. Through a process called fine-tuning, these models can be tailored to specific domains or tasks, achieving high accuracy and performance in narrow contexts. This targeted training approach allows SLMs to be highly efficient, requiring significantly less computational power and energy than their larger counterparts.
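As a rough illustration of what fine-tuning looks like in practice, the sketch below adapts a small model to a domain-specific text file using the Hugging Face Trainer API. The model choice, the file name "domain.txt", and the hyperparameters are placeholders, not a recommended recipe.

```python
# Hedged sketch: fine-tuning a small causal language model on in-domain text
# with Hugging Face `transformers` and `datasets`. Everything below is
# illustrative; swap in your own model, data, and hyperparameters.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from datasets import load_dataset

model_name = "distilgpt2"                      # illustrative small model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token      # GPT-2-family models lack a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# "domain.txt" is a hypothetical file of in-domain text, one example per line.
dataset = load_dataset("text", data_files={"train": "domain.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="slm-finetuned",
                           num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=dataset,
    # Pads each batch and builds causal-LM labels from the input tokens.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```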
Another difference lies in the inference speed and latency of SLMs. Their compact size enables faster processing times, making them more responsive and suitable for real-time applications, such as virtual assistants and chatbots.
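Latency is also easy to check for yourself. The toy harness below (model choice again illustrative) simply times a single short generation; the numbers depend entirely on your hardware, so treat it as a starting point rather than a benchmark.

```python
# Toy latency check: time one short generation with a small model.
import time
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")

start = time.perf_counter()
generator("Summarise: the meeting has moved to Friday at 3pm.",
          max_new_tokens=20)
print(f"One-shot latency: {time.perf_counter() - start:.2f} s")
```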
Furthermore, the development and deployment of SLMs are often more cost-effective than LLMs, which require substantial computational resources and financial investment. This accessibility factor makes SLMs an attractive option for smaller organisations and research groups with limited budgets.
What are some of the most popular SLMs?
The landscape of small language models is rapidly evolving, with numerous research groups and companies contributing to their development. Here are some of the most notable examples:
1. Llama 2: Developed by Meta AI, Llama 2 is a collection of pretrained and fine-tuned generative text models ranging from 7 billion to 70 billion parameters. The models have gained significant traction in the open-source community for their impressive performance across various natural language understanding tasks.
2. Mistral and Mixtral: Mistral AI’s offerings, such as Mistral-7B and the mixture-of-experts model Mixtral 8x7B, have demonstrated competitive performance compared to larger models like GPT-3.5.
3. Microsoft’s Phi and Orca: Phi-2 and Orca-2 are known for their strong reasoning capabilities and adaptability to domain-specific tasks through fine-tuning.
4. Alpaca 7B: Developed by researchers at Stanford, Alpaca 7B is a model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. In preliminary evaluations, it has shown behaviour qualitatively similar to OpenAI’s text-davinci-003, which is based on GPT-3.
5. StableLM: Stability AI’s StableLM series includes models as small as 3 billion parameters.
Looking ahead
As research and development in this area continue to advance, the future of small language models looks promising. Advanced techniques such as distillation, transfer learning, and innovative training strategies are expected to further enhance the capabilities of these models, potentially closing the performance gap with LLMs in various tasks.
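Distillation, for example, trains a small ‘student’ model to mimic a larger ‘teacher’. The sketch below shows the standard distillation loss in PyTorch; the temperature and weighting values are typical defaults used for illustration, not figures from any particular paper.

```python
# Illustrative sketch of a knowledge-distillation loss: the student learns
# from the teacher's softened output distribution as well as the true labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: match the teacher's temperature-softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean") * (temperature ** 2)
    # Hard targets: standard cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage: random logits for a batch of 4 examples over a 10-token vocabulary.
student = torch.randn(4, 10)
teacher = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
print(distillation_loss(student, teacher, labels))
```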