
In the last two years, the basic unit of generative AI development has been "completion."
You send a text prompt to a model, it sends text back, and the transaction ends. If you want to continue the conversation, you have to send the entire history back to the model. This stateless architecture, embodied by Google’s legacy generateContent endpoint, was perfect for simple chatbots. But as developers move toward autonomous agents that use tools, maintain complex state, and reason over long horizons, that stateless model becomes a distinct bottleneck.
Last week, Google DeepMind finally addressed this infrastructure gap with the public beta launch of the Interactions API (/interactions).
While OpenAI began this transition back in March 2025 with its Responses API, Google’s entry marks its own push to modernize its stack. The Interactions API is not just a state management tool; it is a unified interface designed to treat LLMs less like text generators and more like remote operating systems.
The ‘Remote Compute’ Model
The core innovation of the Interactions API is the introduction of server-side state as a default behavior.
Previously, a developer building a complex agent had to manually manage a growing JSON list of "user" and "model" turns, re-sending megabytes of history with every request. With the new API, developers simply pass a previous_interaction_id. Google’s infrastructure retains the conversation history, tool outputs, and thinking traces on its end.
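A minimal sketch of the difference is shown below, using plain HTTP from Python. The /interactions path and the previous_interaction_id field come from Google’s announcement; the exact base URL, payload shape, and response field names are illustrative assumptions, not the documented schema.

```python
import os
import requests

API_KEY = os.environ["GEMINI_API_KEY"]
# Assumed base URL for illustration only; check the beta docs for the real one.
BASE = "https://generativelanguage.googleapis.com/v1beta/interactions"

# First turn: send the prompt and let Google store the state server-side.
first = requests.post(
    BASE,
    params={"key": API_KEY},
    json={
        "model": "gemini-3-pro-preview",
        "input": "Summarize the latest MCP spec changes.",
    },
).json()

# Follow-up turn: no history re-upload, just a pointer to the stored interaction.
followup = requests.post(
    BASE,
    params={"key": API_KEY},
    json={
        "model": "gemini-3-pro-preview",
        "input": "Now turn that into a three-bullet executive summary.",
        "previous_interaction_id": first["id"],  # "id" field assumed for the sketch
    },
).json()
```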
"Models become systems and over time, may even become agents themselves," written by Ali Çevik and Philipp Schmid of DeepMind, an official company blog post in the new paradigm. "Trying to force these capabilities to generateContent would have resulted in an overly complex and fragile API."
This switch enables Background Execution, a critical feature for agentic workloads. Complex workflows—such as browsing the web for an hour to synthesize a report—often trigger HTTP timeouts in standard APIs. The Interactions API lets developers launch the agent with background=true, disconnect, and poll for results later. This effectively turns the API into a work queue for intelligence.
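In practice that pattern might look like the sketch below: kick off a long-running task with the background flag, then poll by interaction ID. Only the /interactions path, background=true, and the model name come from the announcement; the GET-by-ID polling route and the "status" field are assumptions for illustration.

```python
import time
import requests

def run_long_task(prompt: str, api_key: str) -> dict:
    """Start a background interaction, disconnect, and poll for the result."""
    base = "https://generativelanguage.googleapis.com/v1beta/interactions"  # assumed base URL
    job = requests.post(
        base,
        params={"key": api_key},
        json={
            "model": "deep-research-pro-preview-12-2025",
            "input": prompt,
            "background": True,  # return immediately instead of holding the connection open
        },
    ).json()

    # Poll every 30 seconds; the status values below are illustrative guesses.
    while True:
        state = requests.get(f"{base}/{job['id']}", params={"key": api_key}).json()
        if state.get("status") in ("completed", "failed"):
            return state
        time.sleep(30)
```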
Native ‘Deep Research’ and MCP Support
Google used this new infrastructure to deliver its first built-in agent: Gemini Deep Research.
Accessible through the same /interactions endpoint, this agent can execute "long-horizon research tasks." Unlike a standard model that predicts the next token based on your prompt, the Deep Research agent executes a loop of searching, reading, and synthesis.
Most importantly, Google is also embracing the open ecosystem by adding native support for the Model Context Protocol (MCP). This allows Gemini models to directly call external tools hosted on remote servers—such as a weather service or database—without the developer having to write custom glue code to parse the tool calls.
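Conceptually, wiring in a remote MCP server could look like the sketch below. The field names under "tools" (the "mcp_server" type and "url" key) are hypothetical placeholders standing in for whatever schema the beta actually uses; only the idea of pointing the model at a remote MCP endpoint comes from the announcement.

```python
import requests

# Hypothetical sketch: let the model call a remote MCP server directly instead
# of writing glue code to parse and dispatch tool calls yourself.
payload = {
    "model": "gemini-3-pro-preview",
    "input": "What's the weather in Rotterdam tomorrow?",
    "tools": [
        {
            "type": "mcp_server",                      # assumed discriminator, not documented
            "url": "https://weather.example.com/mcp",  # your remote MCP endpoint
        }
    ],
}
resp = requests.post(
    "https://generativelanguage.googleapis.com/v1beta/interactions",  # assumed base URL
    params={"key": "YOUR_API_KEY"},
    json=payload,
).json()
```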
The Landscape: Google Joins OpenAI in the ‘Stateful’ Era
Google is arguably playing catch-up, but with a different philosophical twist. OpenAI moved away from statelessness nine months ago with the launch of its Responses API in March 2025.
While both giants solve the problem of increasing context, their solutions differ in transparency:
OpenAI (The Compression Approach): OpenAI’s Responses API introduces Compaction—a feature that shrinks conversation history by replacing tool outputs and reasoning chains with opaque "encrypted compaction objects." It prioritizes token efficiency but creates a "black box" where the model’s previous reasoning is hidden from the developer.
Google (The Hosted Method): Google’s Interactions API keeps the stored history visible and composable. The data model allows developers to "debug, manipulate, stream and reason over interleaved messages." It prioritizes inspectability over compression.
Supported Models and Availability
The Interactions API is currently in Public Beta (documentation here) and is immediately available through Google AI Studio. It supports the full spectrum of the latest generation Google models, ensuring that developers can match the right model size to their specific agent task:
- Gemini 3.0: Gemini 3 Pro Preview.
- Gemini 2.5: Flash, Flash-Lite, and Pro.
- Agent: Deep Research (deep-research-pro-preview-12-2025).
Commercially, the API integrates with Google’s existing pricing structure: you pay standard fees for input and output tokens based on the model you choose. However, the value proposition shifts with the new data retention policies. Because this API is stateful, Google needs to store your interaction history to enable features like implicit caching and context retrieval.
Access to this storage is determined by your tier. Free Tier developers are limited to a 1-day retention policy, which is suitable for ephemeral testing but not sufficient for long-term agent memory.
Paid Tier developers get a 55-day retention policy. This extended window is not just for auditing; it effectively lowers your total cost of ownership by maximizing cache hits. By keeping history "hot" on the server for nearly two months, you avoid paying to reprocess large context windows for recurring users, making the Paid Tier more efficient for production-grade agents.
Note: Since this is a Beta release, Google advises that features may be subject to change.
‘You are interacting with a system’
Sam Witteveen, a Google Developer Expert in Machine Learning and CEO of Red Dragon AI, sees this release as a necessary evolution of the developer stack.
"If we go back in history… the whole idea is simple text-in, text-out," Witteveen stated in a technical breakdown in YouTube release. "But now… you’re interacting with a system. A system that can use multiple models, create multiple loops of calls, use tools, and perform code execution on the backend."
Witteveen highlighted an immediate economic benefit of this architecture: Implicit Caching. Since the history of the conversation lives on Google’s servers, developers do not have to pay for re-uploading the same context over and over again. "You don’t have to pay much for the tokens you call," he explained.
However, the release was not without friction. Witteveen criticized the current implementation of the Deep Research agent’s citation system. While the agent provides sources, the URLs returned are often wrapped in internal Google/Vertex AI redirection links rather than raw, usable URLs.
"My biggest complaint is that … these URLs, when I save them and try to use them in a different session, they don’t work," Witteveen warned. "If I want to create a report for someone with citations, I want them to be able to click on URLs from a PDF file… Having something like medium.com as a citation (without a direct link) is not very good."
What This Means for Your Team
For Lead AI Engineers focused on rapid model deployment and fine-tuning, this release offers a direct architectural solution to the ongoing "timeout" problem: Background Execution.
Instead of building complex asynchronous handlers or managing separate job queues for long-running reasoning tasks, you can offload this complexity directly to Google. However, this convenience introduces a strategic trade-off.
While the new Deep Research agent allows for the rapid deployment of sophisticated research capabilities, it acts as a "black box" compared to custom-built LangChain or LangGraph flows. Engineers should prototype a "slow thinking" feature using the background=true parameter to evaluate whether the speed of deployment outweighs the loss of fine-grained control over the research loop.
Senior engineers in charge of AI orchestration and budgets will find that moving to server-side state through previous_interaction_id unlocks Implicit Caching, a big win for cost and latency metrics.
By referring to the history stored on Google’s servers, you automatically avoid the token costs associated with re-uploading large context windows, directly addressing budget constraints while maintaining high performance.
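A back-of-the-envelope sketch of why that matters follows; the per-token price, context size, and cache discount below are placeholder numbers for illustration, not Google’s actual rates.

```python
# Rough illustration of the caching math -- all numbers are hypothetical.
price_per_m_input_tokens = 1.25   # USD per million input tokens (placeholder)
context_tokens = 200_000          # accumulated history re-sent per request
follow_up_turns = 50              # turns in a long agent session

# Stateless: the full history is re-uploaded (and re-billed) on every turn.
stateless_cost = follow_up_turns * context_tokens / 1e6 * price_per_m_input_tokens

# Stateful with implicit caching: history stays server-side; assume cached
# tokens are billed at a steep discount (placeholder 90% reduction).
cached_cost = stateless_cost * 0.10

print(f"stateless: ${stateless_cost:.2f}  vs  cached: ${cached_cost:.2f}")
# stateless: $12.50  vs  cached: $1.25
```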
The challenge here lies in the supply chain: the inclusion of Remote MCP (Model Context Protocol) support means your agents connect directly to external services, which requires you to strictly validate that these remote endpoints are secure and authenticated. It’s time to audit your current token cost of resending conversation history; if it’s high, prioritizing a migration to the stateful Interactions API can yield significant savings.
For Senior Data Engineers, the Interactions API offers a more robust data model than raw text logs. The structured schema allows complex histories to be debugged and reasoned over, improving the overall Data Integrity of your pipelines. However, you should remain cautious about Data Quality, especially the citation issue raised by Sam Witteveen.
The Deep Research agent currently returns "wrapped" URLs that may expire or break, rather than raw source links. If your pipelines rely on scraping or archiving these sources, you need to add a cleanup step that resolves them to their underlying URLs. You should also test the structured output capabilities (response_format) to see if they can replace brittle regex parsing in your current ETL pipelines.
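One mitigation is a post-processing pass that follows each redirect while the link is still valid and stores the final destination. The sketch below assumes the wrapped citations behave as ordinary HTTP redirects, which may not hold for every link type.

```python
import requests

def resolve_citation(wrapped_url: str, timeout: float = 10.0) -> str:
    """Follow a redirect-wrapped citation link and return the final URL.

    Assumes the redirection link resolves via standard HTTP redirects and is
    still fresh; if resolution fails, the wrapped URL is returned unchanged
    so nothing is silently dropped from the archive.
    """
    try:
        resp = requests.get(wrapped_url, allow_redirects=True, timeout=timeout)
        return resp.url
    except requests.RequestException:
        return wrapped_url
```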
Finally, for IT Security Directors, the shift of state to Google’s centralized servers presents a paradox. It improves security by keeping API keys and conversation history off client devices, but it introduces a new data residency risk. The critical check here is Google’s data retention policy: while the Free Tier keeps data for just one day, the Paid Tier keeps interaction history for 55 days.
This stands in contrast to OpenAI’s "Zero Data Retention" (ZDR) options for business customers. You must ensure that storing sensitive conversation history for roughly two months complies with your internal governance. If it violates your policy, you can configure calls with store=false, although doing so disables the stateful features, and the cost benefits, that make this new API so valuable.






