Google I/O Recap

Google I/O announced Omni, Antigravity 2.0, Gemini 3.5 Flash, Spark, AI eyewear, and AI embedded in Search and Google applications.

May 23, 2026

Figure 1. Gemini Omni is a multi-modal video/audio/text input-output model that, as the “Nano Banana for video,” can restyle any video on command into another format. Omni took a real-life video of a woman playing guitar and turned it into a stylized output. MTV-style videos will be popular.

Note for readers - Google I/O AI announcements were too big to cover in our AI Weekly, so we’ve made a separate article. Our AI Weekly will follow soon.

Google I/O’s Goodies

Google released Gemini 3 Pro in November 2025, six months ago, and for about a week, it was the world’s best AI model. Since then, Anthropic released multiple Claude Opus versions, up to 4.7, and previewed Mythos, while OpenAI released four GPT versions, making GPT-5.5 and Opus 4.7 the current frontier AI models. In the same time period, Google delivered only Gemini 3.1 Pro preview.

Many of us hoped Google would deliver a new world-beating frontier AI model. We did not get one.

Instead, we got a promise that Gemini 3.5 Pro is coming this summer, plus two important new models: Gemini Omni, a multi-modal world generation model with ground-breaking video editing abilities; and Gemini 3.5 Flash, Google’s fastest and highest-performing Flash model yet.

Google also advanced its agentic AI platforms, introducing a revamped Antigravity 2.0 and a new AI agent for consumers, Gemini Spark. For consumers, Google also announced AI applications such as Docs Live and Ask YouTube, and AI-enabled hardware in the form of audio glasses. We’ll dig into all of them below.

Gemini Omni Takes On the World

Google introduced Gemini Omni, a native multimodal model that combines core intelligence with media generation and can handle any input to any output across audio, video, image, and speech. The result is the “Nano Banana for video,” a model capable of editing and generating video content directly from natural language prompts. Some of the key features that make Omni a new paradigm for video generation:

The model features advanced character consistency for marketing and education videos.
It anchors outputs in structured world knowledge to create contextually accurate media.
Commands on a video input can restyle whole scenes, change backgrounds, add elements, change angles, and more.
Omni can combine image, text, video, or audio inputs into a single cohesive output of a video with an audio track. Google says, “While only voice references will be supported for audio to start, we’ll roll out other types of audio inputs soon.”
While Google has guardrails against deepfake abuses of others, Omni allows you to “create videos with your own voice by using Avatars, so you can generate videos that look and sound like you.”

Like its Nano Banana predecessor, the Omni multimodal world model is a new class of AI model that will challenge existing diffusion-based video generation models and expand what creative users can do with AI. Access to Gemini Omni requires a paid subscription.

Figure 2. Gemini Omni is a multi-modal video/audio/text input-output model that can express grounded ideas through videos. Here it generated a mini-explainer on DNA.

Gemini 3.5 Flash

Google’s headline for Gemini 3.5 is frontier intelligence with action, serving AI models for agentic AI tasks. Google presents Gemini 3.5 Flash as an efficient frontier model capable of tackling long-horizon tasks at high speed and reasonable cost.

Gemini 3.5 Flash beats Gemini 3.1 Pro and Claude Sonnet 4.6 on a range of benchmarks, scoring, for example, 55.1% on SWE-Bench Pro and 1656 on GDP-val. It also scores well on complex financial decision-making (Finance Agent V2) and other domain-specific tasks.

Gemini 3.5 Flash displays excellent speed and accuracy during single-shot prompts and smaller, short-cycle coding tasks, and could be useful for agentic AI applications, but it trails top-tier frontier models like Opus 4.7 in multi-step agentic tasks and long-horizon programming applications.

Figure 3. Benchmarks for Gemini 3.5 Flash versus leading AI models. Gemini 3.5 Flash is better than Gemini 3.1 Pro on many benchmarks.

Google shared demos of Gemini 3.5 Flash quickly producing results, touting its high token-per-second output speed (over twice the speed of its predecessors) and low latency. However, metrics also show (and my personal experience confirms) that this model is verbose and a token hog when reasoning.

The Gemini 3.5 Flash API costs $1.50 / $9.00 per million input / output tokens, three times the cost of the prior Flash version. Combined with the model’s verbosity, this price increase means Flash is no longer the cost-performance king. It now costs as much as Qwen’s latest Qwen 3.7 Max, leaving the lower-price tier more open to Chinese AI models, such as Qwen 3.6 Plus.
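To make the compounding effect concrete, here is a rough back-of-the-envelope sketch. The new prices are the ones quoted above; the prior Flash prices are simply derived from the "three times" figure. The workload size and the assumption that the new model emits twice as many output tokens are hypothetical, chosen only for illustration.

```python
# Prices in USD per million tokens, as quoted in the article.
NEW_INPUT_PRICE = 1.50
NEW_OUTPUT_PRICE = 9.00

# Assumption: prior Flash prices derived from the "three times the cost" claim.
OLD_INPUT_PRICE = NEW_INPUT_PRICE / 3
OLD_OUTPUT_PRICE = NEW_OUTPUT_PRICE / 3


def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Return the USD cost of one request at per-million-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
old_cost = request_cost(10_000, 2_000, OLD_INPUT_PRICE, OLD_OUTPUT_PRICE)

# Same workload at the new prices, with the same output length.
same_length_cost = request_cost(10_000, 2_000, NEW_INPUT_PRICE, NEW_OUTPUT_PRICE)

# Hypothetical verbosity penalty: the new model emits twice as many output tokens.
verbose_cost = request_cost(10_000, 4_000, NEW_INPUT_PRICE, NEW_OUTPUT_PRICE)

print(f"old price:             ${old_cost:.3f}")
print(f"new price:             ${same_length_cost:.3f} ({same_length_cost / old_cost:.1f}x)")
print(f"new price, 2x output:  ${verbose_cost:.3f} ({verbose_cost / old_cost:.1f}x)")
```

Under these assumed numbers, the price change alone triples the bill, and doubling the output tokens pushes the same request to more than four times its old cost. Actual ratios depend on your own input/output mix.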

Google Bets on Agents with Antigravity 2.0

Google’s Antigravity 2.0 is an abrupt shift from the original Antigravity interface, an Integrated Development Environment (IDE) based on Visual Studio Code with a multi-panel layout, code editor, terminals, and an extension/plugin ecosystem.

Antigravity 2.0 strips away these classic panels, plugins, and interfaces in favor of an agent-only workspace. Users interact via a prompt interface and instruct an internal agent to manage the shell, process commands, or check system states.

The system is actually four components backed by an agentic harness with a CLI. By moving away from a window-dependent IDE structure to a standardized CLI and cloud-hosted agent backend, Google can port this uniform agentic execution engine across a wide variety of development interfaces and surfaces.

The Underlying Engine - Antigravity CLI (AGY): The backend engine powering the new Antigravity is exposed through a bare-bones interface, the new Antigravity Command Line Interface. This Claude Code-like CLI completely replaces the legacy Gemini CLI and can be run as a stand-alone tool with the command “agy.”

Figure 4. Antigravity CLI has a Claude Code-like interface.

The Agentic Harness - Antigravity 2.0 standalone app: Antigravity 2.0 is a stand-alone desktop application (on macOS, Linux, and Windows) that allows developers to manage multiple active agents and projects simultaneously. Each project can run distinct agent threads and independent asynchronous conversations in parallel without crossing files, eliminating the need to maintain multiple terminal windows.

While it has a steep learning curve, the interface provides many valuable features, some of them echoing features in Codex or Claude Cowork:

Built-in cron-style automation via asynchronous task management, allowing users to run scheduled agent tasks as specified command scripts at timed intervals.
Dynamic subagents: The main agent can dynamically choose to define and invoke subagents to complete focused subtasks, keeping the main agent’s context window clear and allowing for parallelism.
JSON hooks, allowing users to intercept events to control Antigravity behavior.
Browser capability via /browser, which uses remote debugging to spin up and control instances of external applications like Google Chrome.
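The subagent idea in the list above can be sketched in a few lines. This is a generic illustration of the pattern, not Antigravity's actual API: the function names `run_subagent` and `main_agent` are hypothetical, and the `asyncio.sleep(0)` call is a stand-in for real model and tool calls. The point is that each subagent keeps its working notes to itself and hands back only a short summary, so the main agent's context grows by one line per subtask while the subtasks run concurrently.

```python
import asyncio


async def run_subagent(name: str, task: str) -> str:
    """Hypothetical subagent: does focused work and returns only a short summary."""
    # Private working context; it never reaches the main agent.
    scratch = [f"{name} step {i} on {task}" for i in range(3)]
    await asyncio.sleep(0)  # stand-in for model and tool calls
    return f"{name}: finished '{task}' in {len(scratch)} steps"


async def main_agent(subtasks: list[str]) -> list[str]:
    """Hypothetical main agent: fans out subtasks and keeps only the summaries."""
    # gather() runs the subagents concurrently and preserves input order.
    summaries = await asyncio.gather(
        *(run_subagent(f"sub{i}", task) for i, task in enumerate(subtasks))
    )
    return list(summaries)


# The main agent's context now holds one line per subtask.
context = asyncio.run(main_agent(["write tests", "update docs"]))
for line in context:
    print(line)
```

A real harness would also need to handle subagent failures, timeouts, and shared file access, which this sketch leaves out.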

For workflows requiring external integration, the ecosystem utilizes modular agent skills. Antigravity 2.0 is really a first version of this new architecture, lacking interface refinement and some features. New features will be rolled out over time.

Underlying Engine for developers - Antigravity SDK: For developers wanting to roll their own agentic AI applications, Google provides the engine underlying Antigravity 2.0 as a Python SDK.

You can download Antigravity 2.0, the Antigravity CLI, and the original Antigravity IDE, which is still available.

Spark

Google unveiled Gemini Spark, a new cloud-based AI agent platform designed to automate consumer productivity tasks and recurring workflows directly on Google’s servers. It operates 24/7 in the cloud to manage email inboxes, organize documents, update spreadsheets, and follow up autonomously with external contacts. Gemini Spark runs on the Antigravity harness and features support for recurring tasks and new skills.

Gemini Spark was not fully released; it will roll out to trusted testers this week with a Beta planned for next week. Google promises a “packed roadmap of features” scheduled for release this summer, including upcoming Model Context Protocol (MCP) support to connect with third-party software tools.

AI Utilities for Consumers

While there were AI goodies for business power users and consumers alike, many Google I/O announcements emphasized consumer utility, with useful AI features and applications integrated across Google’s existing product ecosystem. Some of the useful utilities:

Calling it a new era for AI Search, Google has pushed further into turning the search engine into an AI answer engine. The search box now is a portal for using generative AI, with conversations in AI mode for Search, AI Search agents, and even agentic coding from search.

Ask Maps allows you to ask map-related questions in natural language and have Maps give you personalized answers about places.

Ask YouTube is a Gemini AI-assisted chat interface in YouTube that lets you surface answers and relevant YouTube videos on a specific topic. You can try it here.

Daily Brief is an agent that gives you a personalized morning digest that’s designed to be your first stop every day.

Docs Live is a useful utility that helps you create documents on the fly by giving voice commands to the AI. Google also announced conversational voice in Gmail to search your inbox more easily.

Google announced that audio-based AI glasses are launching this autumn. Developed in partnership with Samsung, Gentle Monster, and Warby Parker, they have Android XR and Gemini integration and support spoken assistance for calls, music, navigation, and hands-free app commands.

Google shared how its Running Guide agent helps vision-impaired athletes run without human guides.

Google announced that OpenAI and other companies will incorporate Google’s SynthID watermarking technology into their product lines to identify AI-generated imagery.

Google I/O Hits and Misses

Google’s most novel AI release, and possibly the most profound one to come out of Google I/O, was the Gemini Omni model. World models like Omni could displace prior generations of AI video generation models, in the same way Nano Banana has disrupted image generation.

The Omni model not only enables new video generation capabilities, but it also advances us towards a multimodal form of Artificial General Intelligence (AGI). Google DeepMind argues that advanced world generators like Gemini Omni are crucial for achieving AGI, since AI must accurately simulate real-world physics and sense in all modalities to understand the world.

OpenAI and Anthropic, on the other hand, are scaling text-based reasoning models as their line of sight to AGI. We started by noting that some of us hoped for a new frontier AI model. We didn’t get one. Google isn’t losing by any means, but neither OpenAI nor Anthropic is threatened by Google’s release of Gemini 3.5 Flash, a solid AI model but not a reason to get off GPT-5.5 or Opus 4.7.

The Antigravity 2.0 platform is the right architecture going into a fully agentic future. Paired with Gemini 3.5 Flash, it’s an effective platform for coding and many other tasks. However, the release itself confused existing users, because it replaced an IDE interface with an agent-only workspace.

Both the Antigravity 2.0 and Gemini Spark agentic platforms share similar features with Codex and Claude Cowork. AI competition and the fast pace of development are driving design convergence, where all these applications copy each other and begin to look the same.

AI companies are all facing the same issue: The explosion in usage and demand is making AI infrastructure a bottleneck. Google may be calculating that better margins come from optimizing its Flash model and raising prices to match its better performance. Meanwhile, Google is inserting AI throughout its ecosystem, including Search, Gmail, YouTube, and Google Docs. AI usage will only continue to grow.

