Takeaways from Google Cloud Next 25
Google Cloud Next 2025 at the Mandalay Bay Convention Center
I attended the annual Google Cloud Next conference on April 9-10 in Las Vegas. There were ~40,000 attendees, and the Mandalay Bay Convention Center was packed.
This year’s theme was “A New Way to Cloud”. The new way, of course, refers to AI. How does AI give Google Cloud a competitive edge? Let’s find out.
New AI Products
In the keynote on the first day, Thomas Kurian first showcased the multimodal GenAI capabilities in Vertex AI Studio: Imagen for image generation, Chirp 3 for voice generation, Lyria for music generation, and Veo 2 for video generation. For example, from a picture of one section of the Vegas Strip, Veo 2 can generate a panoramic video of the dynamic scenery of the entire strip and its surroundings. The 85-year-old movie The Wizard of Oz was remade with Veo to fill the giant wraparound display at the Las Vegas Sphere, which covers 160K square feet at 16K resolution.
The Vertex AI platform allows developers to access and customize 200+ models from Model Garden, as well as train, tune, and deploy custom models. These models can connect to databases in Google Cloud and 3P SaaS offerings, and can ground their responses with Google Search, Maps, and 3P data sources.
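As a rough illustration, here is a minimal sketch of grounding a Gemini call with Google Search through the google-genai SDK on Vertex AI; the project ID and model ID below are placeholder assumptions, not anything specific from the keynote:

```python
# A minimal sketch of grounding a Gemini call with Google Search on Vertex AI,
# using the google-genai SDK. Project and model IDs are placeholders.
from google import genai
from google.genai import types

client = genai.Client(vertexai=True, project="my-project", location="us-central1")

response = client.models.generate_content(
    model="gemini-2.0-flash",  # any Model Garden model ID you have access to
    contents="What were the big announcements at Google Cloud Next 2025?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],  # grounding tool
    ),
)
print(response.text)
```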
A variety of tools for AI agents were announced:
Agent Development Kit (ADK) - an open-source framework for building and deploying agents with workflows and multi-agent systems. It integrates closely with Gemini and the Google ecosystem while also supporting other popular LLMs and tools (e.g. LangChain, CrewAI), and it supports MCP (Model Context Protocol) so agents can access diverse data sources (see the sketch after this list).
Agent2Agent (A2A) Protocol - an open standard that addresses agent interoperability by letting agents built on different frameworks and by different vendors communicate and collaborate with each other; the community has over 50 partners now.
Agent Engine - a fully managed cloud service for running, managing, and scaling agents in production, with monitoring, quality evaluation, short- and long-term memory, and security.
Agentspace - brings enterprise search and agents together in a portal where employees can discover knowledge, find and share agents via Agent Gallery, use agents to automate tasks, create agents with the no-code Agent Designer, and deploy Google’s pre-built agents such as Deep Research and Idea Generation.
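To make ADK concrete, here is a minimal sketch following its quickstart pattern; the agent name, instruction, and get_weather tool are hypothetical examples of mine, not anything Google announced:

```python
# A minimal ADK agent sketch (pip install google-adk). The weather tool is a
# hypothetical stand-in; ADK wraps plain Python functions as callable tools.
from google.adk.agents import Agent

def get_weather(city: str) -> dict:
    """Return a canned weather report for a city (placeholder logic)."""
    return {"city": city, "forecast": "sunny", "temp_f": 75}

root_agent = Agent(
    name="weather_agent",
    model="gemini-2.0-flash",  # close Gemini integration out of the box
    description="Answers questions about current weather in a city.",
    instruction="Use the get_weather tool to answer weather questions.",
    tools=[get_weather],
)
```

In principle, the same agent object can then be handed to Agent Engine for managed deployment and scaling.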
Thomas Kurian highlighted five key types of agents that Google is building:
Customer Agent - for sales and customer support
Creative Agent - for media, entertainment, and digital marketing
Data Agent - for processing and managing data, automating data analysis, correlating data across disparate systems, analyzing data with natural language, and synthesizing consumer feedback across channels
Coding Agent - for automating software development tasks across the SDLC: design, build, test, deploy, run, scale, monitor, log, and diagnose and resolve issues with code fixes.
Security Agent - for automating a variety of security operations, including threat analysis, cyber defense, detection and response, red-teaming, alert triaging, and malware analysis, contributing to Google Unified Security.
The Developer Keynote on the second day went into more depth and showed various demos of all the above technologies and products. Here are links to the main keynote and the developer keynote, which are well worth watching.
I attended a number of breakout sessions, which were very educational on the latest product offerings. I also walked around the exhibit hall and talked with many vendors to understand the overall ecosystem.
AI Customization and Evaluation
There are four ways to customize a pre-trained LLM: in-context learning (prompting), RAG, supervised fine-tuning (smaller datasets, adjusting a subset of model parameters), and full fine-tuning (larger datasets, adjusting all parameters). The tuning goals include performing specific tasks better, changing tone/style, and reducing cost and latency. Tuning datasets should match the expected production data, and quality matters more than quantity: the right data coverage and distribution for the target tasks and contexts, along with fluency, coherence, factuality, and complexity. For dataset size, we can start small (~500 examples) and then scale up. For hyperparameters (e.g. epochs, adapter size, learning rate multiplier), we can start with the defaults. We can use checkpoints to save tuning progress, and use evals to compare checkpoints.
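As a sketch of what this looks like in practice, here is roughly how a supervised fine-tuning job can be launched with the Vertex AI SDK; the project, bucket paths, and base model ID are placeholder assumptions:

```python
# A minimal sketch of launching supervised fine-tuning with the Vertex AI SDK.
# Project, bucket paths, and the base model ID are placeholders.
import vertexai
from vertexai.tuning import sft

vertexai.init(project="my-project", location="us-central1")

tuning_job = sft.train(
    source_model="gemini-2.0-flash-001",            # pre-trained base model
    train_dataset="gs://my-bucket/train.jsonl",     # start small (~500 examples)
    validation_dataset="gs://my-bucket/val.jsonl",  # should match production data
    # Hyperparameters: start with the defaults, then adjust based on evals.
    epochs=3,
    adapter_size=4,
    learning_rate_multiplier=1.0,
)
print(tuning_job.resource_name)  # checkpoints from this job can be compared via evals
```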
Google’s GenAI Batch Evaluation Service helps developers run evals efficiently in batch with customized eval criteria and metrics. The service uses AutoRater, an LLM judge, to score outputs against the defined criteria, including task completion, efficiency, safety, etc. Developers can evaluate the quality of the AutoRater itself by providing human ratings, and tune it by changing its prompt and settings. The AutoRater can also use a rubric-based evaluation approach (with well-defined criteria and performance levels) for a more structured, in-depth, and transparent assessment of the model/app. Finally, for AI agents, we should evaluate not only their final output but also the path they take: reasoning, planning, tool use, API calls, responsible AI/safety, etc.
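For flavor, here is a minimal sketch of a batch eval using the Vertex AI evaluation SDK; the dataset rows are invented, and the metrics shown are examples from the SDK’s prebuilt LLM-judged set:

```python
# A minimal sketch of a batch eval with Vertex AI's evaluation SDK.
# The dataset rows and chosen metrics are illustrative.
import pandas as pd
import vertexai
from vertexai.evaluation import EvalTask, MetricPromptTemplateExamples

vertexai.init(project="my-project", location="us-central1")

# Each row pairs a prompt with the model/app response to be judged.
eval_dataset = pd.DataFrame({
    "prompt": ["Summarize our Q1 customer feedback in two sentences."],
    "response": ["Customers liked the new UI but reported slow load times."],
})

eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=[
        # Prebuilt LLM-judged metrics; custom criteria and rubrics are also possible.
        MetricPromptTemplateExamples.Pointwise.SAFETY,
        MetricPromptTemplateExamples.Pointwise.INSTRUCTION_FOLLOWING,
    ],
)
result = eval_task.evaluate()
print(result.summary_metrics)
```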
AI-Ready Databases
Google Cloud has a comprehensive portfolio of database offerings: in-memory (Redis, Valkey, Memcached), relational (Oracle, Postgres, MySQL, SQL Server), key-value (Bigtable), document (Firestore), and analytics (BigQuery). For a cloud provider, offering choice is necessary, since customers are used to picking different databases for different needs. However, this “one size does not fit all” approach also has its drawbacks: data fragmentation, operational inefficiency, and data governance challenges. I find Google’s Spanner database particularly interesting. It takes the approach of what I would call “one database to rule them all”. It scales effectively without limit (no manual sharding or replication to manage), requires zero maintenance, is inherently multi-model (relational, key-value, document, graph, text search), and is AI-ready. There are three AI-ready features: a natural language interface, vector embeddings and vector search, and an AI query engine (e.g. embedding vector search and cross-attention re-ranking with an LLM inside SQL queries). Today Spanner handles 6B requests per second with <5ms read/write latency and holds 17EB of storage. Spanner also works seamlessly with BigQuery (Google’s serverless data warehouse and analytics platform): BigQuery can query data stored in Spanner directly without ETL, and Spanner supports real-time analytics on the latest transactional data.
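To give a flavor of the vector search feature, here is a minimal sketch using the Spanner Python client; the instance, table, columns, and toy embedding are hypothetical, and KNN ranking via the COSINE_DISTANCE SQL function is just one of the supported paths:

```python
# A minimal sketch of KNN vector search in Spanner from Python.
# Instance/database/table names and the toy embedding are hypothetical.
from google.cloud import spanner

client = spanner.Client(project="my-project")
database = client.instance("my-instance").database("my-database")

# In practice this comes from an embedding model; a tiny 3-dim toy vector here.
query_embedding = [0.12, -0.05, 0.33]

sql = """
SELECT id, content,
       COSINE_DISTANCE(embedding, @query_embedding) AS distance
FROM documents
ORDER BY distance
LIMIT 10
"""

with database.snapshot() as snapshot:
    rows = snapshot.execute_sql(
        sql,
        params={"query_embedding": query_embedding},
        param_types={
            "query_embedding": spanner.param_types.Array(spanner.param_types.FLOAT64)
        },
    )
    for row in rows:
        print(row)
```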
Chips and Infrastructure
Google announced the 7th generation of TPU, Ironwood. According to the staff member I spoke to at the TPU booth, Gemini 2.5 and all of Google’s internal ML training are done on TPUs now. Google still purchases a lot of GPUs from Nvidia simply because Google Cloud customers demand them, primarily due to the stickiness of the CUDA code they have already written.
What surprised me was the newly released Axion, Google’s first general-purpose server processor based on the ARM architecture. AWS has offered Graviton, its own ARM-based CPU for EC2, for a few years already. What’s interesting is that ARM-based processors offer both better price-performance and better energy efficiency than x86: according to Google, Axion is 65% more price-performant and 60% more energy efficient. I attended a customer testimonial session where both Spotify and Databricks said they are in the process of moving all their internal workloads to ARM. I was also told by a Googler that Axion is used in ML and HPC (high-performance computing) workloads as well.
I enjoyed an early morning session with Urs Hölzle and Parthasarathy Ranganathan, both Google Fellows; Urs was Google’s employee #8 and its first VP of engineering. He is also a co-author of the book “The Datacenter as a Computer”, with a new edition coming out soon. They discussed the future of datacenter infrastructure for AI, and here are a few interesting takeaways:
Model parameters have scaled 10X per year. However, both compute and data scaling are flattening. Google is spending $75B on capex this year; obviously it can’t 10X that next year. Model training is far too inefficient today. The biggest opportunity is to innovate on model architecture, so that we can train models much more quickly and with far less compute and data; human brains don’t need anywhere near that much data or training to learn.
They advocate much more energy-efficient compute infrastructure, which requires a cross-stack, cross-discipline system design approach: from chips, to systems (liquid cooling, optical switching networks, HW/SW co-design to manage power and thermal fluctuations), to platforms (much greater performance and energy efficiency with AutoML and model exploration).
Since last year, there has been concern about forecasts of significant increases in electricity demand from datacenters in the coming years. Urs offered a contrarian view that there is enough power for AI, as the US grid runs at only 42% utilization. He thinks AI applications in various industries (e.g. buildings, transport, industrials, and materials) will create opportunities for significant energy savings. So despite more energy going into powering AI factories, the net effect on society could be net-zero or even negative emissions across the value chains.
AI will transform all industries. As an example, they argued that RL will significantly accelerate chip design by having an RL agent play the ASIC chip-layout game. AI has solved chess (~10^123 states) and Go (~10^360 states); chip placement has a state space on the order of 10^9000.
Parting Thoughts
Google Cloud has already become a major business, with a ~$50B revenue run rate and growing at 30%+. It provides a tremendous platform for Google to deliver its full-stack AI capabilities and wide-ranging AI platforms and applications to all companies, from enterprises to startups, and these AI offerings and the breadth and depth of its enterprise platforms (e.g. data, security, developer tooling) mutually reinforce each other.
Finally, I’ll share an amusing anecdote that testifies to the coming of age of the public cloud. A person at the booth of Workday, one of the leading SaaS vendors with an ~$8B revenue run rate, told me that Workday will shift its entire customer base (11,000+) to the public clouds (AWS, Google, etc.) by 2028, and new customers can only buy Workday running on these public clouds. The advantage is that customers can use their already-committed cloud budgets on these public clouds to pay for Workday subscriptions. Workday still maintains its direct sales motion, but the billing relationship will now be indirect, through the clouds. Workday (founded in 2005), like most of the major early SaaS vendors, built and operates its own cloud infrastructure; it is finally conceding that it makes more economic and technical sense to outsource that completely to the public clouds. I wonder if the other major SaaS vendors are planning similar moves?