What Google Gemini AI updates mean for software developers

Updates to Google's Gemini foundation model, along with new AI Studio and Vertex AI features due next month, aim to support advanced application workflows more efficiently than existing versions.

The updated Google Gemini 1.5 Pro large language model (LLM), available in preview in 200 countries in various Google consumer and developer services, is expected to become generally available in June. It will support up to 1 million tokens in its context window, according to company officials during keynote presentations Tuesday.

At its initial introduction in February, Gemini 1.5 Pro supported 128,000 tokens in practice, with 1 million tokens available as an experimental feature. Context window size refers to the amount of data — text, images, audio or video — an LLM can reason about at a time. Users of the Google AI Studio and Vertex AI developer tools can also sign up this week for a waitlist to preview support for up to 2 million tokens, planned for later this year. One million tokens translates to about an hour of video, 11 hours of audio, 30,000 lines of code or 750,000 words.
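The conversions above imply a rough budget of about 1.33 tokens per English word. The following sketch turns those article-cited ratios into back-of-the-envelope arithmetic; the ratios are approximations for illustration, not exact tokenizer output.

```python
# Rough token-budget math based on the conversions cited for a 1M-token
# window: ~750,000 words per 1 million tokens. Approximate only; a real
# tokenizer's counts will vary by language and content.

WINDOW_TOKENS = 1_000_000
TOKENS_PER_WORD = WINDOW_TOKENS / 750_000  # ≈ 1.33 tokens per word

def words_that_fit(window_tokens: int = WINDOW_TOKENS) -> int:
    """Approximate how many English words fit in a context window."""
    return round(window_tokens / TOKENS_PER_WORD)

def tokens_for_words(word_count: int) -> int:
    """Approximate token cost of a text, given its word count."""
    return int(word_count * TOKENS_PER_WORD)

# War and Peace runs roughly 587,000 words, comfortably inside the window.
print(words_that_fit())           # 750000
print(tokens_for_words(587_000))  # ~782,666 tokens, fits with room to spare
```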

“The larger token window is like the working memory of the AI, and it’s one of the frontiers in terms of how helpful AI can be for advanced, highly contextual tasks,” said David Strauss, co-founder and CTO at WebOps service provider Pantheon, who has used Google’s Vertex AI machine learning platform for production and experimental projects. “It shifts more and more tasks to ones AI can accomplish on a whim, rather than with full training or even fine-tuning.”

1 million tokens — so what?

The major LLM providers have been in an arms race to expand multiple attributes of their models for the last year, and especially over the last few months, said Ian Beaver, chief scientist at Verint Systems, a contact center-as-a-service provider in Melville, N.Y. He cited examples such as Anthropic’s Claude 3 Opus launch two months ago, which surpassed OpenAI’s GPT-4 in LLM benchmarks. In April, Meta boasted higher benchmark performance for Llama 3 against the preview version of Gemini 1.5 Pro. Just yesterday, OpenAI announced GPT-4o, an update to ChatGPT that supports text, audio and image input and posted higher benchmarks than both Llama 3 and Gemini 1.5 Pro.

All of these models have made big leaps in input token limits as well, Beaver said: GPT-4 has grown from 16,000 to 128,000 tokens; Claude has grown from 100,000 to 200,000; and Gemini has grown from 32,000 to 1 million.

Larger context windows might help for some applications, such as video prompts and generation. Still, Beaver said he isn’t sure how widely useful 1 million tokens will be.

The fact that you can now comfortably send in the entire text of War and Peace may be useful for generating reviews on large novels, but it remains to be seen how effective these models are.

Ian Beaver, chief scientist, Verint

“The fact that you can now comfortably send in the entire text of War and Peace may be useful for generating reviews on large novels, but it remains to be seen how effective these models are at maintaining long-distance dependencies in context data over that large a search space,” he said. “In our experience, once you surpass a few hundred thousand tokens, it is generally unhelpful for the quality of the model response to include more, since there is typically some selection pipeline going on before the LLM, such as a database query or a search.”

Bigger isn’t necessarily better, wrote Torsten Volk, an analyst at Enterprise Management Associates, in a blog post last month.

“While the impressive 1 million token context window of Google’s Gemini 1.5 Pro offers a theoretical advantage in handling extensive data, the practical effectiveness of a language model like GPT-4 often surpasses it due to more sophisticated mechanisms … [that] efficiently manage smaller context windows by focusing computational resources on the most relevant information, thus optimizing performance,” Volk wrote in the post.

Google AI Studio, Vertex AI updates

Meanwhile, updates this week to the Google Gemini API and services such as Google AI Studio and Vertex AI added new features specifically for developers. One, context caching, might be more effective than large context windows alone, according to Volk. The feature, pitched by Google as a way to make model prompting more efficient by not resending large data sets repeatedly, can also assist with recurring queries on large document sets.
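The idea behind context caching can be sketched locally: prepare a large document once, get back a handle, and run follow-up queries against the handle instead of resending the full text. This is an illustrative toy, not the Gemini API; the class and method names here are hypothetical.

```python
import hashlib

# Toy sketch of the context-caching idea: hash a large document once and
# reuse the cached handle on follow-up queries instead of resending the
# full text. A real service would store tokenized context server-side and
# run the model in query(); here we just echo for illustration.

class ContextCache:
    def __init__(self):
        self._store = {}

    def put(self, document: str) -> str:
        """Cache a large document and return a short handle for reuse."""
        handle = hashlib.sha256(document.encode()).hexdigest()[:16]
        self._store.setdefault(handle, document)
        return handle

    def query(self, handle: str, question: str) -> str:
        """Answer a follow-up question against the cached context."""
        context = self._store[handle]
        return f"answer to {question!r} over {len(context)} cached chars"

cache = ContextCache()
h = cache.put("full text of a large document " * 1000)
print(cache.query(h, "summarize chapter 1"))
print(cache.query(h, "list the key characters"))  # document not resent
```

The savings come from the second and later queries: only the short handle and the new question travel with each request.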

“Totally coincidentally, OpenAI said that GPT-4o also now has context caching across conversations,” Volk said in an online interview this week, referring to OpenAI’s news event the day before Google I/O.

Another Google Gemini developer update unveiled this week is parallel function calling, which means the model can invoke more than one function at once.
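In practice, parallel function calling means a single model turn can request several independent tool calls, which the client then executes concurrently. The sketch below simulates that flow with plain Python; the hard-coded `model_calls` list stands in for real model output, and the tool names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of client-side handling for parallel function calls: the model
# returns several tool-call requests in one turn, and the client runs them
# concurrently and collects all results. The "model output" below is a
# hard-coded stand-in, not a real API response.

def get_weather(city: str) -> str:
    return f"weather in {city}: sunny"

def get_time(city: str) -> str:
    return f"time in {city}: 12:00"

TOOLS = {"get_weather": get_weather, "get_time": get_time}

# One model turn requesting two independent function calls at once.
model_calls = [
    {"name": "get_weather", "args": {"city": "Paris"}},
    {"name": "get_time", "args": {"city": "Paris"}},
]

def run_parallel(calls):
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(TOOLS[c["name"]], **c["args"]) for c in calls]
        return [f.result() for f in futures]

print(run_parallel(model_calls))
```

With sequential, single-call models, the same request would take two full round trips through the model; here both calls resolve in one turn.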

This will feed into an emerging trend toward deploying AI agents that carry out multi-step workflows; Google’s Vertex AI last month added an Agent Builder tool, while Atlassian added support for AI agents, or virtual teammates, with its Atlassian Rovo product.

Jaclyn Konzelmann, director of product management for Google’s Gemini API, presents on Gemini 1.5 Pro’s 1 million-token context window at Google I/O.

Gemini 1.5 Flash, Gemma add cost flexibility

A new version of Gemini rolled out this week, Gemini 1.5 Flash, uses a technique called distillation to impart the data analysis capabilities of the larger Pro model to a lighter-weight, lower-cost LLM optimized to give faster answers than the larger version.

With Flash, Google added new pay-as-you-go pricing for AI Studio and Vertex AI. Gemini 1.5 Flash is priced at $0.35 per million tokens for prompts of up to 128,000 tokens and $0.70 per million tokens for larger prompts. By comparison, Gemini 1.5 Pro costs $3.50 per million tokens for prompts of up to 128,000 tokens and $7.00 per million tokens for larger prompts. In general, early adopters of hosted LLM services have said controlling cloud costs has been one of their biggest challenges so far.
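The tiered prices quoted above reduce to simple arithmetic. This sketch assumes, per the article's description, that the whole prompt is billed at a single rate determined by whether it crosses the 128,000-token tier break; input pricing only, and rates may change.

```python
# Cost comparison using the tiered input prices quoted above (USD per
# 1 million tokens): Flash $0.35 / $0.70 and Pro $3.50 / $7.00, with the
# tier break at 128,000-token prompts. Assumes the whole prompt is billed
# at one rate; actual billing details may differ.

TIER_BREAK = 128_000
PRICES = {  # (rate up to the tier break, rate above it), per 1M tokens
    "gemini-1.5-flash": (0.35, 0.70),
    "gemini-1.5-pro": (3.50, 7.00),
}

def prompt_cost(model: str, prompt_tokens: int) -> float:
    low, high = PRICES[model]
    rate = low if prompt_tokens <= TIER_BREAK else high
    return prompt_tokens / 1_000_000 * rate

# A 100,000-token prompt: Flash is 10x cheaper than Pro at the same tier.
print(round(prompt_cost("gemini-1.5-flash", 100_000), 4))  # 0.035
print(round(prompt_cost("gemini-1.5-pro", 100_000), 4))    # 0.35
```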

“We haven’t done anything with substantial enough scale on Vertex to have [cost] become a focus, but I will say that a lot of the Vertex products seem to have real utility billing,” Strauss said. “I like that because it means we can potentially provide it by default in an isolated way for customers and only pay for actual usage.”

In the rapidly growing open source AI world, two fresh permutations of Google’s Gemma will significantly increase the size of the open source LLM with the 27 billion-parameter Gemma 2 and add a fine-tuned model for vision-language tasks with PaliGemma — the first vision-language open model from Google.

As with performance benchmarks and token input limits, all the major model providers have launched cheaper and faster versions of their flagship models, according to Verint’s Beaver.

“What used to require the largest, most expensive model to perform can now be done by a cost-effective smaller model,” he said. “The AI arms race is also rapidly dropping the cost of entry for high-performing LLMs. It is only getting cheaper to deploy applications using generative AI.”

Multi-modal support for a broader array of models will also lower the cost to produce various types of media content, Beaver predicted.

Trust, safety and quality remain top AI concerns

The natively multi-modal Gemini models can process various forms of data, such as images and videos along with text, and produce multi-format outputs, but they don’t yet function that way in production-ready form.

Google is working on a fresh version of Imagen “rebuilt from the ground up,” according to a keynote presentation by Douglas Eck, senior research director at Google, following a recent controversy that forced Google to suspend the image generation tool in February. Imagen 3 is now available for trial on Google’s ImageFX AI test kitchen and coming soon to Vertex AI. Truly multi-modal functions will be more widely available later this year, I/O keynote speakers said.

Multiple keynote speakers also emphasized work Google is doing on trust and safety for AI to prevent further controversial results, including red-teaming model updates, consulting with a panel of human experts from multiple fields of study, and a watermarking tool called SynthID.

However, slow adoption so far for generative AI tools such as Ansible Lightspeed indicates that enterprises have yet to dive into production use in earnest. Strauss said early Vertex projects had mixed results, although he attributed that in part to not having integrated data sets properly.

“We’ve used Vertex AI in prototypes for recommendations to tag written content and for the Vertex AI search system,” he said. “The former is active in production, and we saw middling results from the latter but needed to put more effort into the integration to truly test it.”

Beth Pariseau, senior news writer for TechTarget Editorial, is an award-winning veteran of IT journalism covering DevOps. Have a tip? Email her or reach out @PariseauTT.