A plain-language, end-to-end guide to Google's AI for web developers — the Gemini model family, AI Studio, the Gemini API, Vertex AI, Firebase Genkit, MediaPipe, Gemini Nano in Chrome, Google Antigravity, NotebookLM, and the agent protocols that tie them together. What each one is for, when to use which, and the exact commands to type.
Built to be more comprehensive and more factual than its sibling sites at codex.wholetech.com and claude.wholetech.com.
Gemini is one model family with many front doors. The fastest is Google AI Studio in your browser — sign in, paste a prompt, get a key. The Gemini API (@google/genai) is what you wire into a Node, Next.js, or Python app. Vertex AI is the same models with enterprise plumbing (VPC-SC, regional residency, IAM). Google Antigravity is Google's standalone agent IDE. The Gemini app at gemini.google.com is the consumer surface. Most developers start in AI Studio and graduate to the API within an hour.
Browser playground for every Gemini model. Free tier with generous limits. Tune prompts, attach images/video/audio, compare Flash vs. Pro side-by-side, mint an API key, and export the call as Python, Node, curl, or Swift. The canonical sandbox.
One npm i @google/genai and a 5-line script later you're streaming Gemini 2.5 Pro from Node. The unified SDK replaces the older @google/generative-ai package and works against both AI Studio keys and Vertex.
Same Gemini models, billed to a GCP project, with IAM, VPC Service Controls, regional residency, customer-managed encryption, batch prediction, Model Garden, and a managed agent runtime (Agent Engine). What enterprise security teams want to see.
Google's agent-first development environment. A standalone desktop app that runs multi-step coding agents across an editor, a browser, and a terminal, and surfaces "Artifacts" (plans, screenshots, browser recordings) so you can verify what the agent did instead of trusting it.
The consumer Gemini app (web, Android, iOS) with Deep Research, Gems, Canvas, and Veo. Gemini Code Assist brings Gemini into VS Code, IntelliJ, and the rest of the JetBrains family with chat, completions, and an agentic mode.
"Gemini" names three things at once, and untangling them up front saves a lot of confusion later. (1) A family of models: multimodal foundation models from Google DeepMind, currently in their 2.x generation, that come in Pro, Flash, Flash-Lite, and Nano sizes. (2) A consumer product at gemini.google.com, Google's answer to ChatGPT. (3) A developer platform exposed through Google AI Studio, the Gemini API, and Vertex AI. This site is mostly about (3), with enough of (1) and (2) to keep you oriented.
Gemini Pro for hard problems, Gemini Flash for everyday work, Flash-Lite for cost-sensitive workloads, and Gemini Nano for on-device. The hosted models are natively multimodal (text, images, audio, video, PDFs) with a 1M-token context window on the 2.5 generation; Nano runs on-device with a much smaller window.
AI Studio keys are the fast lane: free tier, generous limits, and pay-as-you-go billing once you outgrow it. Vertex AI is the enterprise lane: GCP project, IAM, VPC-SC, regional endpoints, and an SLA. The model weights and capabilities are the same; only the surrounding controls and billing differ.
Google ships a lot of AI surface area. For a web developer, the useful split isn't "consumer vs. enterprise" or "old vs. new" — it's where the inference runs and who calls it. Four categories below. Pick the row that matches the problem; pick the product inside it that matches the size.
Stateless calls from your backend. Pick a model size, send a prompt (with optional images/audio/video/PDF), get a response. Stream tokens, call tools, ground on Google Search, or run structured output with a JSON schema.
Highest-quality long-context reasoning. 1M-token window. Native thinking ("Deep Think" variant for hard problems). Best for code, math, multi-step planning, agent orchestration.
The right default. Faster and cheaper than Pro; near-Pro quality on most tasks. Supports thinking budgets — dial reasoning effort up or down per request.
The cheapest token in the family. Classification, extraction, routing, summaries-of-summaries. Great as the first stage in a multi-model pipeline.
Conversational image generation and editing. Multi-turn refinement, character consistency, style transfer. Lives inside the same SDK as the chat models.
Google's open-weight family. Self-host on your own hardware, fine-tune freely, run on Ollama or vLLM. Multimodal in the 4B+ sizes; same tokenizer family as Gemini.
Not a single request but a job. You hand Google a goal (and optional tools, browser, code interpreter) and a managed runtime drives the loop, persists state, and reports back. Usually simpler to operate than hand-rolling the loop on the stateless API for any task that takes more than one turn or needs to survive a redeploy.
Deploy an agent built with the Agent Development Kit (ADK), LangChain, LangGraph, or LlamaIndex onto a managed runtime. Sessions, memory, tracing, autoscaling, IAM. The production target for serious agents.
Google's async coding agent. Connect a GitHub repo, hand it an issue, walk away. Plans, edits, runs tests in a cloud VM, opens a PR. Best for backlog grooming and "I wonder if this is doable" tasks.
An agent that drives a browser on your behalf — clicks, types, fills forms, completes multi-tab workflows. Currently behind Labs/early access; the API will surface as browser-use tools inside the Gemini SDK and ADK.
Submit up to thousands of requests as a single job, get results back within 24h at half the per-token price. The right surface for nightly summarization, bulk enrichment, eval runs.
Pitched as a 24/7 autonomous operations runtime — agents that wake on schedule or event, run a task, persist state, and report. Treat as forward-looking until a stable SDK surface lands; build the same pattern today on Agent Engine + Cloud Scheduler.
For latency, privacy, or cost reasons, you want the model to run where the user is. Chrome ships Gemini Nano behind a set of built-in Web APIs. MediaPipe gives you the same idea for arbitrary tasks — image classification, hand tracking, on-device LLM inference — across web, Android, and iOS.
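Chrome ships a small Gemini model with the browser. Built-in JS APIs cover Summarizer, Translator, and Language Detector (stable at the time of writing), Writer, Rewriter, and Proofreader (origin trial), and a Prompt API for free-form use. No API key, no network call: inference runs against the on-device model. Check developer.chrome.com/docs/ai/built-in for the current status of each API.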
Google's on-device ML SDK. Drop-in JS packages for image classification, object detection, face/hand/pose landmarking, text classification, audio classification, and an LLM Inference task that runs Gemma or Phi locally via WebGPU/WebAssembly.
Pixel 8 Pro+ and a growing list of Android devices ship Gemini Nano in AICore. App developers get summarization, smart reply, and a Prompt API via the AI Edge SDK with no model download in the app binary.
Call Gemini from web and mobile client code without exposing an API key. App Check verifies the request is from your real app; Firebase rewrites it as a Vertex AI call on the server side. The right pattern for shipping Gemini in a public web app.
An "agent" is a loop: model → tool call → result → model. Google ships an opinionated runtime (the Agent Development Kit, ADK), an interoperability protocol (A2A), a hosted runtime (Agent Engine), and a desktop development environment (Antigravity). They compose.
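The loop described above can be sketched in a few lines. This is a minimal illustration, not ADK's or Genkit's actual runtime: the `Model` and `Tool` types, `runAgent`, and the stub model are all hypothetical names invented for this example, and a real agent would call Gemini where the stub sits.

```typescript
// A model turn either asks for a tool call or returns a final answer.
type ModelTurn =
  | { kind: "tool_call"; name: string; args: Record<string, unknown> }
  | { kind: "final"; text: string };

type Message = { role: "user" | "model" | "tool"; content: string };
type Model = (history: Message[]) => Promise<ModelTurn>;
type Tool = (args: Record<string, unknown>) => Promise<string>;

// model → tool call → result → model, until the model answers or we hit maxSteps.
async function runAgent(
  model: Model,
  tools: Map<string, Tool>,
  goal: string,
  maxSteps = 5
): Promise<string> {
  const history: Message[] = [{ role: "user", content: goal }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = await model(history);
    if (turn.kind === "final") return turn.text;
    const tool = tools.get(turn.name);
    const result = tool
      ? await tool(turn.args)
      : `error: unknown tool "${turn.name}"`;
    history.push({ role: "model", content: `call ${turn.name}` });
    history.push({ role: "tool", content: result });
  }
  throw new Error(`agent did not finish within ${maxSteps} steps`);
}

// Stub model (hypothetical): asks for the weather once, then answers.
const stubModel: Model = async (history) => {
  const last = history[history.length - 1];
  return last.role === "tool"
    ? { kind: "final", text: `It is ${last.content} in Austin.` }
    : { kind: "tool_call", name: "get_weather", args: { city: "Austin" } };
};

const stubTools = new Map<string, Tool>([
  ["get_weather", async () => "31 degrees and sunny"],
]);
```

The step cap is the important design choice: without it, a model that keeps requesting tools never terminates. Frameworks add sessions, memory, and tracing around this same core.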
Standalone desktop app for building, running, and verifying agents. Editor + browser + terminal, with first-class "Artifacts" — plans, screenshots, browser recordings — so you can audit what the agent did. Multi-agent workspaces, agent-to-agent handoff.
Open-source framework for defining tools, sessions, memory, planners, and multi-agent orchestration. Model-agnostic (Gemini, Claude, GPT, open weights). Deploys cleanly onto Vertex Agent Engine.
The TypeScript/Go answer to ADK. Flow-based authoring, built-in tracing, dotprompt files, evals, a local dev UI, and one-command deploys to Cloud Functions or Cloud Run. The cleanest path from a Next.js API route to a production agent.
Open protocol for agents from different vendors to discover each other's capabilities and exchange tasks. Pairs with MCP (which standardizes tool-calling) to make multi-vendor agent meshes possible.
An emerging open spec, championed by Google and partners, for agents to discover catalogs, negotiate purchases, and complete payments with signed mandates from the user. The name "Universal Commerce Protocol (UCP)" also appears in this space; verify whether that is a separate effort or the same one rebranded.
Almost every cost or latency complaint about Gemini traces back to picking the wrong size. Pro for genuinely hard problems; Flash for everything else; Flash-Lite for high-volume routing and extraction; Nano for on-device. Below: the table you actually need.
| Model | Context | Best for | Modalities | Notes |
|---|---|---|---|---|
| gemini-2.5-pro | 1M in · 64K out | Hard reasoning, code, long-document analysis, agent planning | text · image · audio · video · PDF | Native thinking; "Deep Think" mode for the toughest problems. |
| gemini-2.5-flash | 1M in · 64K out | Default for production. Chat, summarization, routine tool use. | text · image · audio · video · PDF | Configurable thinking budget: trade latency for quality per request. |
| gemini-2.5-flash-lite | 1M in · 64K out | Routing, classification, extraction, cheap first stage. | text · image · audio · video · PDF | Cheapest in the family. Pair with Flash/Pro as a second stage. |
| gemini-2.5-flash-image | - | Conversational image generation & editing ("Nano Banana"). | text + image → image | Multi-turn refinement; preserves identity across edits. |
| gemini-2.0-flash-live | streaming | Realtime voice + video. Project Astra-style interactions. | audio in/out · video in | WebSocket-based Live API. Sub-second time-to-first-byte. |
| gemini-nano | 32K typical | On-device. Chrome Built-in AI, Android AICore. | text (image on newer revs) | No API call. Used by the Prompt API, Summarizer, Writer, Rewriter. |
| imagen-4 | - | High-fidelity text-to-image (when Nano Banana isn't enough). | text → image | Stronger at typography and photorealism than the Flash Image model. |
| veo-3 | - | Text- and image-to-video, with native audio. | text/image → video+audio | Up to 8s clips in the API; longer on the Flow product. |
| lyria-2 | - | Text-to-music for soundtracks, jingles, generative audio. | text → audio | Available via the Gemini API and on the MusicFX surface. |
| gemma-3 | up to 128K | Open-weight self-hosted Gemini sibling. Ollama, vLLM, llama.cpp. | text · image (4B+) | Sizes: 1B, 4B, 12B, 27B. Same tokenizer family as Gemini. |
Pin a versioned model ID in production (for example gemini-2.5-flash-001), and check ai.google.dev/gemini-api/docs/models before adopting anything time-sensitive.
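The sizing advice in the table can be captured as a small routing function. This is a sketch under assumptions: the `Task` flags and `pickModel` are hypothetical names for this example, and the model IDs are the ones listed in the table, which may change.

```typescript
// Hypothetical description of a workload; set only the flags that apply.
type Task = {
  onDevice?: boolean;      // must run in the browser or on the phone
  hardReasoning?: boolean; // multi-step code, math, planning
  highVolume?: boolean;    // routing, classification, extraction at scale
};

// Pick the smallest model that fits; Flash is the default.
function pickModel(task: Task): string {
  if (task.onDevice) return "gemini-nano";
  if (task.hardReasoning) return "gemini-2.5-pro";
  if (task.highVolume) return "gemini-2.5-flash-lite";
  return "gemini-2.5-flash";
}
```

Keeping the choice in one function means a model ID change touches one place rather than every call site.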
The unified Google GenAI SDK (@google/genai for Node, google-genai for Python) is the SDK you want. It supersedes the older @google/generative-ai package and speaks to both the Gemini Developer API (AI Studio keys) and Vertex AI through the same surface.
```bash
# 1. install
$ npm install @google/genai

# 2. get a key from aistudio.google.com → "Get API key"
$ export GEMINI_API_KEY="AIza..."

# 3. five-line program
$ node --input-type=module -e '
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({});
const r = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "Write a haiku about TypeScript."
});
console.log(r.text);
'
```
The four things you'll reach for most often, in one place:
```ts
import { GoogleGenAI, Type } from "@google/genai";

const ai = new GoogleGenAI({}); // reads GEMINI_API_KEY from the environment

// 1. STREAMING: token-by-token, drop into a Next.js route handler
const stream = await ai.models.generateContentStream({
  model: "gemini-2.5-flash",
  contents: "Explain rate limiting in one paragraph."
});
// chunk.text can be undefined on chunks that carry no text
for await (const chunk of stream) process.stdout.write(chunk.text ?? "");

// 2. TOOL USE: let the model call your functions
const r = await ai.models.generateContent({
  model: "gemini-2.5-pro",
  contents: "What's the weather in Austin right now?",
  config: {
    tools: [{
      functionDeclarations: [{
        name: "get_weather",
        description: "Current weather for a city",
        parameters: {
          type: Type.OBJECT,
          properties: { city: { type: Type.STRING } },
          required: ["city"]
        }
      }]
    }]
  }
});
// r.functionCalls lists the calls the model wants you to run

// 3. STRUCTURED OUTPUT: JSON that matches a schema, no parsing hacks
const typed = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "List 3 capital cities with population.",
  config: {
    responseMimeType: "application/json",
    responseSchema: {
      type: Type.ARRAY,
      items: {
        type: Type.OBJECT,
        properties: {
          city: { type: Type.STRING },
          pop: { type: Type.INTEGER }
        }
      }
    }
  }
});

// 4. MULTIMODAL: feed a PDF, an image, audio, a YouTube URL
const file = await ai.files.upload({ file: "./contract.pdf" });
const summary = await ai.models.generateContent({
  model: "gemini-2.5-pro",
  contents: [
    { fileData: { fileUri: file.uri, mimeType: file.mimeType } },
    "Summarize the indemnity clauses."
  ]
});
```
| Capability | What it does | Why it matters |
|---|---|---|
| `ai.models.generateContent` | Single entry point for chat, completion, multimodal, structured, tools. | No more juggling four method names. |
| `ai.live.connect` | Bidirectional WebSocket for voice/video (Project Astra-style). | Sub-second time-to-first-byte for realtime agents. |
| `ai.files.upload` | 48-hour file store for PDFs, audio, video. | Cuts request size; lets you reference one upload across turns. |
| `ai.batches.create` | Submit thousands of requests; pay 50% less. | The right path for any async/offline workload. |
| `ai.caches.create` | Explicit context caching; pay ~25% of token rate on cached input. | Long system prompts & document grounding get dramatically cheaper. |
| `config.thinkingConfig` | Per-call thinking budget: 0 disables thinking, >0 sets a max. | Trade latency for quality without switching models. |
| `config.tools: googleSearch` | Built-in grounding on Google Search results. | Citations & freshness without standing up a RAG stack. |
| `config.tools: urlContext` | Fetch & reason over arbitrary URLs at request time. | One-liner web reading without writing a scraper. |
| `config.tools: codeExecution` | Sandboxed Python execution inside the response loop. | Math, data analysis, code-correctness checks in one round-trip. |
Still on @google/generative-ai? The old package is in maintenance mode. The new SDK reorganizes calls under ai.models.*, ai.files.*, ai.caches.*, etc. The model IDs and response shapes are compatible enough that the migration is usually 30 minutes of search-and-replace. Do it before you build anything new.
To point the same code at Vertex, pass vertexai: true, project, and location to the SDK constructor.

Vertex AI is the same Gemini models with the controls your security team will ask about: IAM, audit logging, VPC Service Controls, CMEK, regional residency, and an SLA. If you'd answer "yes" to "does anyone need to approve this AI vendor?", you probably want Vertex.
```ts
import { GoogleGenAI } from "@google/genai";

// Same SDK, two-flag switch from AI Studio → Vertex
const ai = new GoogleGenAI({
  vertexai: true,
  project: "my-gcp-project",
  location: "us-central1" // or "global" for the multi-region router
});

// auth is ADC: `gcloud auth application-default login` locally,
// the metadata server in Cloud Run/Functions, or a service-account JSON.
```
Agent Engine is Vertex's managed runtime for agent frameworks. You ship an ADK, LangGraph, LangChain, LlamaIndex, or CrewAI agent; Agent Engine handles sessions, memory, tracing, autoscaling, and gives you a stable HTTPS endpoint. The mental model is "Cloud Run for agents, with sessions baked in" — no need to stand up Redis for chat history, a queue for long jobs, or your own OTel pipeline for traces.
Genkit is the framework most web developers want and don't know to ask for. TypeScript-first (Go and Python in beta), flow-based authoring, a local dev UI for inspecting traces, dotprompt files for prompt-as-code, evals, and one-command deploys to Cloud Functions or Cloud Run. It composes with Gemini, Vertex, Claude, OpenAI, Ollama, and any model behind an OpenAI-shaped API.
```bash
# scaffold a new Genkit project (Node)
$ npm init -y && npm install genkit @genkit-ai/google-genai

# the CLI ships separately as genkit-cli; tsx runs the TypeScript entry point
$ npm install -D genkit-cli tsx typescript

# start the local dev UI: inspect every flow run, every trace
$ npx genkit start -- npx tsx --watch src/index.ts
```
```ts
import { genkit, z } from "genkit";
import { googleAI } from "@genkit-ai/google-genai";

const ai = genkit({
  plugins: [googleAI()],
  model: googleAI.model("gemini-2.5-flash")
});

export const summarize = ai.defineFlow({
  name: "summarize",
  inputSchema: z.object({ url: z.string().url() }),
  outputSchema: z.object({ bullets: z.array(z.string()) })
}, async ({ url }) => {
  const page = await fetch(url).then(r => r.text());
  const { output } = await ai.generate({
    prompt: `Summarize this page in 5 bullets:\n\n${page}`,
    output: { schema: z.object({ bullets: z.array(z.string()) }) }
  });
  return output!;
});
```
Antigravity is Google's standalone agent-development desktop app — Mac, Windows, Linux. It's not a VS Code extension; it's a separate workspace where the unit of work is an agent task, not a file open in a tab. The agent gets a planner, an editor, a managed browser, and a terminal; you get "Artifacts" (a plan, a recording of the browser session, a diff, screenshots of every important step) so you can verify what happened instead of trusting the chat log.
Chrome ships Gemini Nano with the browser. There is no API key, no network call, and no per-token cost — inference runs on the user's machine against a model the browser downloads and shares across sites. The JS APIs are designed for a web developer's mental model: await Summarizer.create(), then summarizer.summarize(text).
| API | Use it for | Status |
|---|---|---|
| `Summarizer` | Article summaries, meeting notes, TL;DRs. | Stable |
| `Writer` | Generate net-new text from a prompt & context. | Origin trial |
| `Rewriter` | Tone shift, length change, simplify, formalize. | Origin trial |
| `Translator` | On-device translation across major languages. | Stable |
| `LanguageDetector` | Detect the language of arbitrary text. | Stable |
| `LanguageModel` (Prompt API) | General-purpose chat against Nano. Free-form prompts. | Origin trial / EPP |
| `Proofreader` | Grammar & style suggestions over a span of text. | Origin trial |
```js
// 1. check availability: the model may need to download on first call
const status = await Summarizer.availability();
// "unavailable" | "downloadable" | "downloading" | "available"

// 2. create a session with the parameters you want
const s = await Summarizer.create({
  type: "key-points",
  format: "markdown",
  length: "short",
  monitor(m) {
    m.addEventListener("downloadprogress", e => {
      console.log(`${(e.loaded * 100).toFixed(0)}%`);
    });
  }
});

// 3. use it
const summary = await s.summarize(longArticleText);

// 4. stream if you want incremental output
const stream = s.summarizeStreaming(longArticleText);
for await (const chunk of stream) ui.append(chunk); // `ui` stands in for your own output element
```
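The four availability states map naturally onto three code paths. The sketch below is one way to structure that decision; `planFor`, `detectStatus`, and the `Plan` labels are hypothetical names for this example, and the feature check assumes the `Summarizer` global is simply absent in browsers that do not ship the API.

```typescript
type Availability = "unavailable" | "downloadable" | "downloading" | "available";
type Plan = "on-device" | "download-then-on-device" | "server-fallback";

// Decide which code path to take for a given availability state.
function planFor(status: Availability): Plan {
  switch (status) {
    case "available":
      return "on-device";
    case "downloadable":
    case "downloading":
      return "download-then-on-device"; // show progress UI while waiting
    case "unavailable":
      return "server-fallback"; // call your own backend instead
  }
}

// Feature-detect first: where the global is missing, treat it as unavailable.
async function detectStatus(): Promise<Availability> {
  const api = (globalThis as unknown as {
    Summarizer?: { availability(): Promise<Availability> };
  }).Summarizer;
  return api ? api.availability() : "unavailable";
}
```

In a page you would call `planFor(await detectStatus())` before creating a session, so unsupported browsers go straight to the server route.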
- Handle availability() === "unavailable": design a server-side fallback.
- Listen for downloadprogress and show a UI.

MediaPipe is Google's cross-platform on-device ML SDK. For a web developer it's a set of @mediapipe/tasks-* npm packages: import, point at a model file, call .detect(). Runs on WebGPU when available and WebAssembly everywhere else.
| Domain | Tasks | Package |
|---|---|---|
| Vision | Object detection, image classification, image segmentation, face landmarker, hand landmarker, pose landmarker, gesture recognizer, image embedder, interactive segmentation | @mediapipe/tasks-vision |
| Text | Text classification, text embedder, language detector | @mediapipe/tasks-text |
| Audio | Audio classification, audio embedder | @mediapipe/tasks-audio |
| GenAI | LLM Inference (Gemma, Phi, Falcon), Image Generation | @mediapipe/tasks-genai |
```js
import { FilesetResolver, LlmInference } from "@mediapipe/tasks-genai";

// load the WebAssembly runtime for the GenAI tasks
const genai = await FilesetResolver.forGenAiTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm"
);

// point at a model file you host yourself
const llm = await LlmInference.createFromOptions(genai, {
  baseOptions: { modelAssetPath: "/models/gemma-3-1b-it.task" },
  maxTokens: 1024,
  topK: 40,
  temperature: 0.8
});

const reply = await llm.generateResponse("why is the sky blue?");
```
Gemini Spark has been described as a 24/7 autonomous operations runtime announced at I/O 2026. As of this writing the public, stable surface area for "always-on, schedule-driven Gemini agents" is the combination of Vertex Agent Engine + Cloud Scheduler + Cloud Tasks, with the ADK or Genkit providing the agent loop. If Spark ships a packaged SKU that wraps this, the pattern below is the one it will encapsulate.
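The wake, run, persist, report cycle can be sketched independently of any Google service. Everything here is an assumption for illustration: `StateStore`, `memoryStore`, and `onWake` are hypothetical names, the store is in-memory where production code would use a database, and the task is a plain function where a real deployment would call an agent.

```typescript
type State = { lastRunAt: string | null; runs: number };
type StateStore = { load(): State; save(next: State): void };
type Report = { ranAt: string; run: number; summary: string };

// In-memory stand-in for a durable store (Firestore, SQL, and so on).
function memoryStore(): StateStore {
  let state: State = { lastRunAt: null, runs: 0 };
  return {
    load: () => state,
    save: (next) => { state = next; },
  };
}

// One scheduled wake-up: load state, run the task, persist, report.
function onWake(
  store: StateStore,
  task: (previous: State) => string,
  now: Date
): Report {
  const previous = store.load();
  const summary = task(previous);
  const ranAt = now.toISOString();
  store.save({ lastRunAt: ranAt, runs: previous.runs + 1 });
  return { ranAt, run: previous.runs + 1, summary };
}
```

A scheduler would invoke `onWake` from an HTTP handler on each tick. Because state lives in the store rather than in process memory, the loop survives restarts and redeploys.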
How to optimize your web apps for autonomous AI consumers — schemas, API structures, and making e-commerce platforms readable for agents that buy without rendering your CSS.
For 25 years the web optimized for one consumer: a human reading a screen. Page layouts, ad units, paywalls, dark patterns, infinite scroll — all of it was tuned for an eyeball moving across pixels. That consumer is no longer alone. A second consumer is now showing up to your origin: an autonomous agent acting on behalf of a person who never loaded your page. It doesn't render your CSS. It doesn't see your hero image. It doesn't accept your cookie banner. It reads structured data, calls JSON endpoints, follows links your sitemap promised it could, and abandons your domain the moment it can't find what it needs.
The web apps that win the agentic decade will be the ones that noticed the second consumer and shipped for it. The rest will keep getting traffic that bounces in milliseconds.
An agent should not have to scrape your HTML to know your name, your prices, your inventory, your operating hours, your return policy. Schema.org JSON-LD, well-formed feeds, and a stable JSON API say "this is what I am" in 200 lines instead of guessing.
Humans land on your homepage. Agents land on whatever you advertised in /.well-known/, robots.txt, your sitemap, and your llms.txt. If those files don't exist or contradict each other, you've handed the agent a coin flip.
Reading is the easy half. Buying, booking, reserving, submitting — the action the user actually asked for — requires a predictable, authenticated, agent-friendly API. If the only way to check out is to click through three JS-heavy modals, you're invisible to the buyer.
Schema.org JSON-LD is the lowest-cost, highest-leverage thing you can ship. Every page should describe itself. For an e-commerce product page, the contract is roughly:
```html
<script type="application/ld+json">
{
  "@context": "https://schema.org/",
  "@type": "Product",
  "sku": "AX-104-BLK",
  "name": "Field Notebook, A5, black",
  "image": ["https://cdn.example.com/ax104/main.jpg"],
  "description": "96-page lay-flat field notebook...",
  "brand": { "@type": "Brand", "name": "Acme Stationery" },
  "offers": {
    "@type": "Offer",
    "url": "https://example.com/p/ax-104-blk",
    "priceCurrency": "USD",
    "price": "18.00",
    "priceValidUntil": "2026-12-31",
    "availability": "https://schema.org/InStock",
    "itemCondition": "https://schema.org/NewCondition",
    "shippingDetails": {
      "@type": "OfferShippingDetails",
      "shippingRate": { "@type": "MonetaryAmount", "value": "4.95", "currency": "USD" }
    },
    "hasMerchantReturnPolicy": {
      "@type": "MerchantReturnPolicy",
      "returnPolicyCategory": "https://schema.org/MerchantReturnFiniteReturnWindow",
      "merchantReturnDays": 30
    }
  },
  "aggregateRating": { "@type": "AggregateRating", "ratingValue": "4.7", "reviewCount": "312" }
}
</script>
```
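An offer block like the one above can be sanity-checked in a build step before it ships. The sketch below encodes the conventions this guide recommends (string price, a priceValidUntil that has not passed, enum availability); `checkOffer` and its messages are hypothetical, and this is not a full Schema.org validator.

```typescript
type Offer = {
  price?: unknown;
  priceCurrency?: unknown;
  priceValidUntil?: unknown;
  availability?: unknown;
};

// The availability values this guide names; Schema.org defines more.
const AVAILABILITY = new Set([
  "https://schema.org/InStock",
  "https://schema.org/OutOfStock",
  "https://schema.org/LimitedAvailability",
  "https://schema.org/PreOrder",
]);

// Returns a list of problems; an empty list means the offer passes.
function checkOffer(offer: Offer, today: Date): string[] {
  const problems: string[] = [];
  if (typeof offer.price !== "string" || !/^\d+(\.\d{2})?$/.test(offer.price)) {
    problems.push('price should be a string such as "18.00"');
  }
  if (typeof offer.priceCurrency !== "string" || !/^[A-Z]{3}$/.test(offer.priceCurrency)) {
    problems.push("priceCurrency should be a three-letter currency code");
  }
  const until =
    typeof offer.priceValidUntil === "string" ? Date.parse(offer.priceValidUntil) : NaN;
  if (Number.isNaN(until) || until < today.getTime()) {
    problems.push("priceValidUntil is missing or in the past");
  }
  if (typeof offer.availability !== "string" || !AVAILABILITY.has(offer.availability)) {
    problems.push("availability should be a Schema.org enum URL");
  }
  return problems;
}
```

Passing `today` in as a parameter, instead of reading the clock inside the function, keeps the check deterministic in CI.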
Five rules for product schema that agents actually use:
- price is a Text. Numbers lose trailing zeros and currency information.
- priceValidUntil is the freshness signal. An agent that sees a stale date will discount your price as unreliable.
- Use the Schema.org availability enums (InStock, OutOfStock, LimitedAvailability, PreOrder). Free-form text is invisible.

If your website is an e-commerce front, your API is your agent surface. Any action a human can take with a click should be possible with an authenticated JSON request. Treat your API as a first-class product, not as a leftover.
| UI action | Agent equivalent | Shape |
|---|---|---|
| Browse a category | List products in a category | GET /api/v1/products?category=notebooks&limit=50&cursor=… |
| View a product | Fetch full product detail | GET /api/v1/products/ax-104-blk |
| Check stock | Authoritative inventory snapshot | GET /api/v1/products/ax-104-blk/availability |
| Add to cart | Create an order intent | POST /api/v1/order-intents |
| Get a quote | Price a basket with shipping & tax | POST /api/v1/quotes → totals, breakdown, expires_at |
| Check out | Confirm with a signed payment mandate | POST /api/v1/orders with attached mandate |
| Track | Order status & tracking | GET /api/v1/orders/{id} |
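One way to make the write endpoints in the table safe for agents to retry is an idempotency key plus a structured error body. The sketch below models `POST /api/v1/order-intents` as a plain function with in-memory maps; the status codes, the `oi_` ID format, and the stock data are assumptions for illustration, not part of any published spec.

```typescript
type ApiError = { error: { code: string; message: string; retryable: boolean } };
type OrderIntent = { id: string; sku: string; quantity: number };
type Result = { status: number; body: OrderIntent | ApiError };

const seen = new Map<string, Result>(); // idempotency key → first response
const stock = new Map<string, number>([["ax-104-blk", 3]]); // sample inventory
let nextId = 1;

// Build a machine-readable error instead of free text.
function fail(status: number, code: string, message: string, retryable: boolean): Result {
  return { status, body: { error: { code, message, retryable } } };
}

function createOrderIntent(
  idempotencyKey: string | undefined,
  sku: string,
  quantity: number
): Result {
  if (!idempotencyKey) {
    return fail(400, "missing_idempotency_key", "Idempotency-Key header is required", false);
  }
  // A retry with the same key replays the first response and changes nothing.
  const replay = seen.get(idempotencyKey);
  if (replay) return replay;

  const available = stock.get(sku) ?? 0;
  const result: Result =
    available < quantity
      ? fail(409, "out_of_stock", `only ${available} left for ${sku}`, false)
      : { status: 201, body: { id: `oi_${nextId++}`, sku, quantity } };
  if (result.status === 201) stock.set(sku, available - quantity);
  seen.set(idempotencyKey, result);
  return result;
}
```

An agent that times out and retries with the same key gets the original response back, so inventory is only reserved once.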
Six properties make an API agent-friendly:
- Treat Idempotency-Key as required, not optional.
- /v1/ doesn't change shape; new fields are additive; breaking changes go to /v2/.
- Structured errors: { "error": { "code": "out_of_stock", "message": "…", "retryable": false } }. Free-text errors are unparseable.
- expires_at on prices, shipping rates, and tax. Agents that can't trust freshness will price defensively.

The OAuth flows you built for humans don't work for agents. A consent screen with a "Continue" button stops them cold. Three patterns work:
The agent has to find your machine-readable surface before it can use it. Four files do most of the work:
| File | Purpose | Required keys |
|---|---|---|
| `/robots.txt` | Crawl & agent allow/deny | `User-agent`, `Allow`, `Disallow`, optional `Crawl-delay` |
| `/sitemap.xml` | The canonical URL list | `<loc>`, `<lastmod>`; optional `<changefreq>` |
| `/llms.txt` | A prose contract for LLM consumers: short description, primary URLs, "what we'd like you to know" | plain Markdown; one H1, link sections |
| `/.well-known/agent.json` | Capabilities the site exposes to agents (search endpoint, catalog endpoint, checkout endpoint, auth mode) | JSON; convention emerging via A2A and related specs |
Ship all four. They take an afternoon. The agent that finds them rewards you with traffic that converts; the agent that doesn't, doesn't return.
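Of the four files, /llms.txt is the easiest to generate from data you already have. The sketch below renders the shape described in the table (one H1, a short summary, link sections); `renderLlmsTxt` and the sample URLs are hypothetical, and the exact layout follows the informal llms.txt convention, not a ratified standard.

```typescript
type Link = { title: string; url: string; note?: string };
type Section = { heading: string; links: Link[] };

// Render an llms.txt body: H1, blockquote summary, then H2 link sections.
function renderLlmsTxt(site: string, summary: string, sections: Section[]): string {
  const lines: string[] = [`# ${site}`, "", `> ${summary}`];
  for (const section of sections) {
    lines.push("", `## ${section.heading}`, "");
    for (const link of section.links) {
      const note = link.note ? `: ${link.note}` : "";
      lines.push(`- [${link.title}](${link.url})${note}`);
    }
  }
  return lines.join("\n") + "\n";
}

const sampleLlmsTxt = renderLlmsTxt(
  "Acme Stationery",
  "Notebooks and pens, with a JSON API for catalog and checkout.",
  [
    {
      heading: "API",
      links: [
        { title: "Products", url: "https://example.com/api/v1/products", note: "cursor-paginated JSON" },
      ],
    },
  ]
);
```

Generating the file from the same route table that serves the API keeps it from drifting out of sync with the sitemap and robots.txt.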
This is the hard one — and the one with the highest payoff in e-commerce. A traditional human checkout has too many failure modes for an agent: CAPTCHAs, hidden fees revealed late, JavaScript-only address validators, payment redirects, 3-D Secure prompts. The agent-friendly equivalent is a small, predictable API:
expires_at.

Equally important: detect the agent and skip the friction.
- Cache-Control. Stale prices behind aggressive caching produce angry users when the agent's basket re-prices at checkout.
- /llms.txt. One H1, a paragraph describing the site, a list of canonical URLs, a short "do/don't" for agents.
- robots.txt. Distinguish search crawlers, AI training crawlers, and shopping/agent crawlers. Decide each, deliberately.
- Publish /openapi.yaml and link it from /.well-known/agent.json.

The reference stack here is a Next.js/Tailwind frontend, a LAMP/Node.js backend, and integration with the unified Gemini SDK, Firebase Genkit, and an agent-commerce protocol. Below is the shape that holds up under both human and agent traffic without forking the codebase.
/api/v1/products/*, in the recommendation flow's tool schema, and in the agent's catalog response, all from the same source of truth.
Three ways to pay for Gemini, and they mix. The AI Studio free tier is generous for prototyping and indie projects. The Gemini API paid tier bills per token through the Cloud Billing account linked to your AI Studio project. Vertex AI bills against a Google Cloud project with the rest of your cloud spend. Consumer plans (Google AI Pro, Google AI Ultra) are separate: they cover the Gemini app, NotebookLM, Veo, and higher quotas in AI Studio, not API token costs.
| Surface | Pricing model | Free tier | Best for |
|---|---|---|---|
| AI Studio (free) | Free with daily/minute limits | Yes — generous | Prototyping, learning, hackathons |
| Gemini API (paid) | Per million input/output tokens; cached input ~25%; batch 50% off | — | Production apps using AI Studio keys |
| Vertex AI | Same per-token model; billed to GCP project | $300 GCP credit for new accounts | Anything that needs IAM, residency, SLA |
| Provisioned Throughput | Reserved capacity, fixed monthly | — | Predictable, latency-sensitive workloads |
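The per-token model in the table reduces to simple arithmetic. The sketch below applies the two multipliers quoted above (cached input at roughly 25%, batch at 50% off); the rates passed in are placeholders, not Google's prices, and it assumes the two discounts stack, which you should confirm on the pricing page.

```typescript
// USD per 1M tokens. Placeholder numbers; look up current rates before use.
type Rates = { inputPerM: number; outputPerM: number };
type Usage = {
  inputTokens: number;       // total input tokens, cached tokens included
  cachedInputTokens: number; // the portion served from a context cache
  outputTokens: number;
  batch: boolean;            // submitted through the batch API?
};

const CACHED_INPUT_FACTOR = 0.25; // "cached input ~25%" from the table
const BATCH_FACTOR = 0.5;         // "batch 50% off" from the table

function estimateUsd(rates: Rates, usage: Usage): number {
  const freshInput = usage.inputTokens - usage.cachedInputTokens;
  const billableInput = freshInput + usage.cachedInputTokens * CACHED_INPUT_FACTOR;
  const inputCost = (billableInput * rates.inputPerM) / 1_000_000;
  const outputCost = (usage.outputTokens * rates.outputPerM) / 1_000_000;
  const total = inputCost + outputCost;
  return usage.batch ? total * BATCH_FACTOR : total;
}
```

With placeholder rates of 1 and 4 USD per million tokens, 1M input tokens (400K of them cached) plus 250K output tokens come to about 1.70 USD, or about 0.85 USD as a batch job.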
| Model | Relative cost | What you're paying for |
|---|---|---|
| gemini-2.5-pro | Frontier-tier | Hardest reasoning, longest context. |
| gemini-2.5-flash | ~5-10× cheaper than Pro | Near-Pro quality, much faster. |
| gemini-2.5-flash-lite | ~3-5× cheaper than Flash | Routing, extraction, simple Q&A. |
| gemini-2.5-flash-image | Image-token billing | Image generation/editing. |
| gemma-3 (self-hosted) | Your hardware | No per-token cost; you pay for the GPU/CPU. |
| Plan | Price (USD) | What it unlocks |
|---|---|---|
| Free | $0 | Gemini 2.5 Flash in the app, limited Deep Research, basic image gen. |
| Google AI Pro | ~$20/mo (often bundled with Google One) | Gemini 2.5 Pro in the app, more Deep Research, Veo for video, NotebookLM Plus, higher AI Studio limits. |
| Google AI Ultra | ~$250/mo | Highest quotas, "Deep Think" reasoning, more Veo, early access to new models. |
| Workspace AI | Bundled with Workspace plans | "Help me write"/"Help me organize" across Docs, Sheets, Gmail, Meet. |
- Set thinkingConfig: { thinkingBudget: 0 } on Flash for routine tasks to save a lot of tokens; raise it for hard problems.
- Cache long, repeated context with ai.caches.
- Cap maxOutputTokens. Most apps over-allocate by 4×. The token you don't generate is free.

Products you'll bump into often, with one paragraph on what they are and when to reach for them. Use the strip below as an index.
A research workspace where every answer is grounded in sources you upload (PDFs, Google Docs, Drive, YouTube, websites). Generates "Audio Overviews" (podcast-style summaries with two synthetic hosts), mind maps, study guides, and timelines. Excellent for digesting long-form material and as a personal "ask my notes" interface. Available free; NotebookLM Plus raises quotas via Google AI Pro/Ultra.
Google DeepMind's prototype universal assistant — realtime multimodal, with video + audio in and audio out. The streaming primitives that power Astra ship publicly through the Gemini Live API (ai.live.connect) and the upcoming voice/video features in the Gemini app. Reach for the Live API when you want sub-second latency and a true conversation, not a turn-based chat.
A research preview of an agent that drives a web browser on your behalf — clicks, types, fills forms, completes workflows across tabs. Browser-use capabilities are flowing into the Gemini SDK as a hosted browser tool and into the ADK as built-in tools. Currently behind Labs/early access for end users; developers should track the SDK release notes.
Google's async coding agent at jules.google.com. Connect a GitHub repo, file an issue, hand it off. Jules clones the repo into a cloud VM, plans, edits, runs tests, and opens a PR. Best for backlog grooming, dependency upgrades, "I wonder if this is doable," and any change small enough to verify by reading the diff. The async-PR equivalent of OpenAI's Codex Cloud or Anthropic's Claude Code in headless mode.
Gemini in your IDE. VS Code, IntelliJ, PyCharm, GoLand, the rest of JetBrains, plus Android Studio and Cloud Workstations. Chat panel, ghost-text completions, "explain this", "fix this", repo-wide context. The agentic mode lets it plan a change across files, propose a diff, and run terminal commands behind an approval gate. Free for individuals at modest limits; Standard and Enterprise tiers add private code awareness, audit logging, and IAM.
Google's high-fidelity text-to-image model. Stronger than the Gemini Flash Image model on photorealism and on typography (rendering legible text inside images). Available via Vertex AI and the Gemini API. Use Flash Image for conversational editing, Imagen for one-shot generation that has to look like a photograph.
Text- and image-to-video, with native audio (dialog, sound effects, ambient). Up to 8-second clips from the API; longer compositions via the Flow product. Cinematic camera controls, character consistency across shots, and a reference-image input for style and subject. The video equivalent of the jump Imagen made.
Text-to-music: instrumental pieces, vocal tracks, soundtracks, jingles. Available via the Gemini API and the consumer-facing MusicFX surface. Best for short-form generative audio (under a minute); longer compositions need stitching.
The filmmaking workspace built on Veo and Imagen. Scene-by-scene composition, camera control, frame-to-video extension, asset library. The product Google built so professional creators stop fighting raw prompts and start storyboarding.
Custom Gemini personas — a name, an instruction set, optional file knowledge, optional tool access. The Gemini-app equivalent of OpenAI's GPTs. Free for everyone (with Pro/Ultra raising limits); excellent for canned workflows ("rewrite-in-our-voice", "code reviewer", "weekly digest").
A multi-step research mode in the Gemini app — Gemini drafts a plan, browses dozens to hundreds of sources, drafts a brief with citations, and (optionally) generates an Audio Overview. Available on the free tier with limits, expanded on Pro/Ultra. Useful for "do a literature review on X" tasks; not a replacement for a domain expert.
Gemini in Docs, Sheets, Gmail, Slides, Meet, Drive, Chat. "Help me write", "Help me organize", in-Meet note-taking and translation, in-Drive cross-document search. Included with Workspace Business and Enterprise plans (the standalone "Gemini for Workspace" add-on was rolled into the base SKUs).
Prompt-to-UI: describe a screen and get a clean Figma-shaped layout you can export to Figma or React/HTML. Useful for spinning up wireframes and starting Tailwind components without hand-laying them out.
An in-browser, agentic app development workspace from the Firebase team (the evolution of Project IDX). Spin up a project from a prompt, get an AI-built scaffold (web, mobile, Genkit backend), preview live, deploy to Firebase Hosting / Cloud Run. The "from idea to deployed app in a session" surface.
An open protocol for connecting LLMs to tools and data sources, originally championed by Anthropic and now broadly adopted across the industry — Gemini, the ADK, Genkit, Antigravity, Code Assist, and Claude all speak it. Run an MCP server in front of your database, your filesystem, your ticketing system, your CMS; mount that one server into every agent that needs it. The closest thing the industry has to "USB for AI tools."
Google's AI portfolio moves week to week; this page is current as of the date in the hero and was built against publicly documented surfaces. The items below have either announcement-only status, fast-moving APIs, or both — confirm against Google's own docs before quoting in a roadmap or proposal.
| Item | Status | Where to verify |
|---|---|---|
| Gemini Spark (24/7 autonomous SKU) | Not confirmed as a packaged product as of writing | I/O 2026 recap, cloud.google.com/blog |
| Antigravity 2.0 feature set | Treat specific 2.0 features as forward-looking | antigravity.google.com |
| Universal Commerce Protocol (UCP) | Name not yet confirmed; the public open spec for agent payments is AP2 (Agent Payments Protocol) | A2A docs, github.com/google/A2A |
| Exact model IDs (-001, -002, -latest) | Stable shape, churning suffixes | ai.google.dev/gemini-api/docs/models |
| Per-token prices in USD | Shape stable, exact cents change | ai.google.dev/pricing |
| Consumer plan names & prices | Renamed periodically (was "Gemini Advanced", now "Google AI Pro/Ultra") | gemini.google.com |
| Chrome built-in AI API surface | Stable APIs listed; Prompt API still in origin trial | developer.chrome.com/docs/ai/built-in |
| Firebase Genkit version & plugin names | 1.x stable; check the README before pinning | genkit.dev |