Google releases PaliGemma, its first Gemma vision-language multimodal open model

Google has developed a new vision-language multimodal model under its Gemma umbrella of lightweight open models. Named PaliGemma, it is designed to address image captioning, visual question answering and image retrieval. It joins other Gemma variants, CodeGemma and RecurrentGemma, and is available starting today for developers to use within their projects.

Google announced PaliGemma at its developer conference. PaliGemma stands out as the sole model in the Gemma family designed to translate visual information into written language. It’s also a small language model (SLM). This distinction means it operates efficiently without requiring extensive memory or processing power, making it suitable for use on resource-constrained devices like smartphones, IoT devices, and personal computers.

Developers may be drawn to the model because it opens up new possibilities for their applications. PaliGemma could help app users generate content, offer richer search capabilities, or help visually impaired people better understand the world around them. AI is usually delivered through the cloud by one or more large language models (LLMs). To reduce latency (the time between receiving an input and generating a response), developers may opt for SLMs instead. They may also turn to these models for devices where internet connectivity may be unreliable.

Web and mobile apps are perhaps the most conventional use cases for PaliGemma, but the model could feasibly be incorporated into wearables, such as sunglasses that would compete against the Ray-Ban Meta Smart Glasses, or into devices similar to the Rabbit r1 or Humane AI Pin. And let’s not forget about the robots that operate within our homes and offices. Because Gemma is built from the same research and technology behind Google Gemini, developers could be more comfortable adopting it in their work.

The release of PaliGemma isn’t the only announcement Google is making today around Gemma. The company has also revealed its largest version of Gemma, containing 27 billion parameters.
