Join us in returning to NYC on June 5th to collaborate with executive leaders in exploring comprehensive methods for auditing AI models regarding bias, performance, and ethical compliance across diverse organizations. Find out how you can attend here.
When Google DeepMind unveiled Gemma last February, it released two open-source models containing 2 billion and 7 billion parameters, respectively. At this year’s Google I/O developer conference, the company is introducing its Gemma 2 series, with the first member being a much larger lightweight model with 27 billion parameters. However, it won’t be available immediately—it’s scheduled to arrive in June.
“So this 27B size, we’ve picked it intentionally,” Josh Woodward, Google’s vice president of Google Labs, explained last week at a reporter’s roundtable. “It’s optimized to run on Nvidia’s next-gen GPUs or a single TPU host in Vertex AI. So that’s what makes it easy to use. And we’re already seeing some great quality. It’s outperforming models two times bigger than it already.”
Gemma is Google’s family of lightweight models for developers looking to incorporate AI into their apps and devices without requiring extensive memory or processing power consumption, making it suitable for use on resource-constrained devices like smartphones, IoT devices, and personal computers. Since its launch earlier this year, Google has added several variants, including one for code completion (CodeGemma), another to improve memory efficiency (RecurrentGemma), and lastly and more recently, a model for vision-language (PaliGemma).
Now, with 27 billion parameters, Gemma 2 promises to offer more accurate results and performance while also handling more complex assignments than its two predecessors. Having a larger dataset from which to train enables the AI to provide higher-quality responses in less time.
While Woodward claims Gemma 2 is designed to run on a single TPU, he means TPUv5e, Google’s latest-generation computer chip, which was released last August. In other words, using Gemma 2 requires a single, specialized AI chip to handle computations, reducing latency and handling tasks like image recognition and natural language processing. The fewer resources needed, the more developers save to reinvest in their app.
Gemma 2’s debut takes place in the shadow of OpenAI’s unveiling of GPT-4o, its multimodal LLM and is billed as a “significant upgrade” to what users currently experience, especially for those who are using the free version of ChatGPT.