【AI前沿】Gemini 3.5 Flash might be fast enough for gen AI to make sense
Gemini’s efficiency playGemini 3.5 Flash might be fast enough for gen AI to make senseGoogle says its more efficient Gemini 3.5 Flash is the key to your agentic AI future.Ryan Whitwam–May 19, 2026 2:11 pm|72Credit:
Aurich LawsonCredit:
Aurich LawsonText
settingsStory textSizeSmallStandardLargeWidth*StandardWideLinksStandardOrange* Subscribers onlyLearn moreMinimize to navAt last year’s I/O event, Google was still talking about the2.5 branchof Gemini, and what a difference a year makes. We’ve gone through the3.0and3.1families since then, and now it’s on to version 3.5.Gemini 3.5 Flashis rolling out across a wide range of Google products starting today, and Google again claims this model is even better than its last-gen Pro model.That has been a trend with Google’s tick-tock model updates over the past year, but the team says this release is special. Gemini 3.5 Flash supposedly offers frontier-level intelligence while also being efficient enough that it may finally make complex agentic tasks worth doing at scale. Tulsee Doshi, senior director of product management for Gemini, explains that the innovations of Gemini 3.5 Flash are woven through multiple Google products, and this is just the start.Credit:
GoogleCredit:
GoogleIt’s no secret that generative AI is currently a money pit, and all the major AI players are trying to find paths to greater efficiency. The problem is magnified when you start building agentic experiences that are supposed to run for longer to complete complex tasks. Gemini 3.5 Flash may be a big step toward making that viable. The new model can output nearly 300 tokens per second, but its benchmark scores are similar to larger frontier models (like 3.1 Pro) that build outputs at a quarter of that speed.Google now says that the companies using the most AI tokens could save a billion dollars per year by shifting to the more efficient Gemini 3.5 Flash. API pricing for the new model is significantly lower than the Pro model it apes. Gemini 3.5 Flash clocks in at $1.50 per 1M input tokens and $9 per 1M output tokens. The 3.1 Pro model starts at $2 and $12, respectively, and it’s higher if you use more than 200k tokens.According to Doshi, the team made numerous improvements in pre-training with Gemini 3.5 Flash, but insights gleaned from how devs use Gemini models are really paying off.“With post-training, we’re really starting to unlock some of the value of the feedback we’re getting from users, for example, from Antigravity,” said Doshi. “That’s really what you’re seeing play out in terms of the code performance and the tool use performance. And then, the hope is that you’ll continue to see the step change where 3.5 Pro will be better, and the next Flash meets Pro performance with that series.”Google is focused on code generation with the new model, which is a core agentic angle for AI. Both Terminal Bench and SWE-Bench Pro tests show substantial improvements—3.5 Flash clobbers older Flash models and shows a small but measurable improvement versus Gemini 3.1 Pro. Its scores are in the same neighborhood as OpenAI’s much larger and more expensive GPT 5.5.A major barrier in agentic workflows is how generative models can use interfaces designed for humans. It’s not an easy problem to solve, Doshi said. “Certain things like UI control are expensive to do because the model has to search the page, it has to know where to click, it has to act through multiple steps. I think Flash is able to do that well because of that combination of quality and cost.”Google’s AI evaluations demonstrate these improvements, too. Among Google’s current collection of benchmarks is OSWorld-Verified, which tests how models handle general tasks in real computing environments. It’s similar to the coding improvements. Gemini 3.5 Flash substantially outperforms older Flash models and is even a bit faster than Gemini 3.1 Pro. It’s essentially tied with GPT 5.5.Google’s new Flash model is, again, a little better than the last-gen Pro.Credit:
GoogleGoogle’s new Flash model is, again, a little better than the last-gen Pro.Credit:
GoogleGemini 3.5 Flash has been rolled out internally at Google, and Doshi noted that it’s having a big impact. “We have a set of internal metrics we’ve been evaluating that measures how Googlers code, so looking at our own code bases and how well the models perform on that,” Doshi said. “And you can see a massive, massive jump between where 3.1 Pro was and where 3.5 Flash is.”Google unveiled the Antigravity IDE last year, and it’s being upgraded to version 2.0 with support for Gemini 3.5 Flash. This update will support multiple parallel workflows—essentially sub-agents spawned by Gemini 3.5 Flash. Again, Google says this is only possible because the new model is so efficient at spitting out tokens.In addition to Antigravity, Gemini 3.5