01 — Quick Start

Pick where you want to start

Gemini is one model family with many front doors. The fastest is Google AI Studio in your browser: sign in, paste a prompt, get a key. The Gemini API (@google/genai for Node, google-genai for Python) is what you wire into a Node, Next.js, or Python app. Vertex AI is the same models with enterprise plumbing (VPC-SC, regional residency, IAM). Google Antigravity is Google's standalone agent IDE. The Gemini app at gemini.google.com is the consumer surface. Most developers start in AI Studio and graduate to the API within an hour.


1. Google AI Studio

aistudio.google.com · 0 min setup

Browser playground for every Gemini model. Free tier with generous limits. Tune prompts, attach images/video/audio, compare Flash vs. Pro side-by-side, mint an API key, and export the call as Python, Node, curl, or Swift. The canonical sandbox.

Open aistudio.google.com →


2. Gemini API

@google/genai · ~5 min setup

One npm i @google/genai and a five-line script later you're calling Gemini 2.5 Flash from Node. The unified SDK replaces the older @google/generative-ai package and works against both AI Studio keys and Vertex.

See the SDK setup →


3. Vertex AI

Google Cloud · enterprise

Same Gemini models, billed to a GCP project, with IAM, VPC Service Controls, regional residency, customer-managed encryption, batch prediction, Model Garden, and a managed agent runtime (Agent Engine). What enterprise security teams want to see.

See Vertex setup →


4. Google Antigravity

agent IDE · macOS · Windows · Linux

Google's agent-first development environment. A standalone desktop app that runs multi-step coding agents across an editor, a browser, and a terminal — surfaces "Artifacts" (plans, screenshots, browser recordings) so you can verify what the agent did instead of trusting it.

See Antigravity setup →


5. Gemini app & Code Assist

gemini.google.com · IDE plugin

The consumer Gemini app (web, Android, iOS) with Deep Research, Gems, Canvas, and Veo. Gemini Code Assist brings Gemini into VS Code, IntelliJ, and the rest of the JetBrains family with chat, completions, and an agentic mode.

See the app & IDE →

Not sure which to pick? Open AI Studio first — you'll have a working prompt in under a minute and a usable API key in under three. Drop into the API when you want it inside your app, jump to Vertex when a security review asks for IAM and residency, and install Antigravity when you want an agent to actually do the work rather than describe it.

02 — What it actually is

One family, every surface

"Gemini" names three things at once, and untangling them up front saves a lot of confusion later. (1) A family of models — multimodal foundation models from Google DeepMind, currently in their 2.x generation, that come in Pro, Flash, Flash-Lite, and Nano sizes. (2) A consumer product at gemini.google.com — Google's answer to ChatGPT. (3) A developer platform exposed through Google AI Studio, the Gemini API, and Vertex AI. This site is mostly about (3), with enough of (1) and (2) to keep you oriented.

single API, four sizes

The model family

Gemini Pro for hard problems, Gemini Flash for everyday work, Flash-Lite for cost-sensitive workloads, and Gemini Nano for on-device. All natively multimodal — text, images, audio, video, PDFs — with a 1M-token context window on the 2.5 generation and 2M on Pro Deep Think.

multimodal · 1M context · tool use

two routes to the same model

AI Studio vs. Vertex AI

AI Studio keys are the fast lane: free tier, generous limits, billed directly by Google with no GCP project. Vertex AI is the enterprise lane: GCP project, IAM, VPC-SC, regional endpoints, and an SLA. The model weights and capabilities are the same; only the surrounding controls and billing differ.

same models · different controls

The mental model: One model family. Two billing/control surfaces (AI Studio, Vertex). Many SDKs and integrations on top. Pick the surface that matches your governance needs, then pick the smallest model that gets the job done.

03 — Dashboard

The Google AI stack, sorted by web-dev utility

Google ships a lot of AI surface area. For a web developer, the useful split isn't "consumer vs. enterprise" or "old vs. new" — it's where the inference runs and who calls it. Four categories below. Pick the row that matches the problem; pick the product inside it that matches the size.

01

API & Model Orchestration

request → response, in your server

Stateless calls from your backend. Pick a model size, send a prompt (with optional images/audio/video/PDF), get a response. Stream tokens, call tools, ground on Google Search, or run structured output with a JSON schema.

Frontier reasoning

Gemini 2.5 Pro

gemini-2.5-pro

Highest-quality long-context reasoning. 1M-token window. Native thinking ("Deep Think" variant for hard problems). Best for code, math, multi-step planning, agent orchestration.

1M context · multimodal · tools

Everyday workhorse

Gemini 2.5 Flash

gemini-2.5-flash

The right default. Faster and cheaper than Pro; near-Pro quality on most tasks. Supports thinking budgets — dial reasoning effort up or down per request.

1M context · fast · tools

Cost-sensitive

Gemini 2.5 Flash-Lite

gemini-2.5-flash-lite

The cheapest token in the family. Classification, extraction, routing, summaries-of-summaries. Great as the first stage in a multi-model pipeline.

cheap · fast

Native image gen

Gemini 2.5 Flash Image

gemini-2.5-flash-image · "Nano Banana"

Conversational image generation and editing. Multi-turn refinement, character consistency, style transfer. Lives inside the same SDK as the chat models.

image gen · editing

Open weights

Gemma 3

gemma-3-{1b,4b,12b,27b}

Google's open-weight family. Self-host on your own hardware, fine-tune freely, run on Ollama or vLLM. Multimodal in the 4B+ sizes; same tokenizer family as Gemini.

open weights · self-host

02

Persistent Cloud Workers

long-running agents, hosted by Google

Not a single request — a job. You hand Google a goal (and optional tools, browser, code interpreter) and a managed runtime drives the loop, persists state, and reports back. Cheaper to operate than the stateless API for any task that takes more than one turn or needs to survive a redeploy.

Managed agent runtime

Vertex AI Agent Engine

aiplatform · reasoning engine

Deploy an agent built with the Agent Development Kit (ADK), LangChain, LangGraph, or LlamaIndex onto a managed runtime. Sessions, memory, tracing, autoscaling, IAM. The production target for serious agents.

ADK · sessions · memory

Coding agent

Jules

jules.google.com

Google's async coding agent. Connect a GitHub repo, hand it an issue, walk away. Plans, edits, runs tests in a cloud VM, opens a PR. Best for backlog grooming and "I wonder if this is doable" tasks.

GitHub · async · PR-shaped

Browser agent (research)

Project Mariner

labs.google · research preview

An agent that drives a browser on your behalf — clicks, types, fills forms, completes multi-tab workflows. Currently behind Labs/early access; the API will surface as browser-use tools inside the Gemini SDK and ADK.

browser · preview

Batch inference

Gemini Batch API

batches.create · 50% off

Submit up to thousands of requests as a single job, get results back within 24h at half the per-token price. The right surface for nightly summarization, bulk enrichment, eval runs.

async · 50% discount

Always-on autonomous (announced)

Gemini Spark ▲ verify

spark · 24/7 ops

Pitched as a 24/7 autonomous operations runtime — agents that wake on schedule or event, run a task, persist state, and report. Treat as forward-looking until a stable SDK surface lands; build the same pattern today on Agent Engine + Cloud Scheduler.

scheduled · autonomous

03

Edge & Browser Deployment

inference on the user's device, no API call

For latency, privacy, or cost reasons, you want the model to run where the user is. Chrome ships Gemini Nano behind a set of built-in Web APIs. MediaPipe gives you the same idea for arbitrary tasks — image classification, hand tracking, on-device LLM inference — across web, Android, and iOS.

In-browser LLM

Gemini Nano in Chrome

built-in AI globals (Summarizer, LanguageModel) · Prompt API

Chrome ships a small Gemini model with the browser. Stable JS APIs for Summarizer, Writer, Rewriter, Translator, Language Detector, and a Prompt API for free-form use. No API key, no network call — runs against the on-device model.

web API · on-device · free

Cross-platform ML

MediaPipe

@mediapipe/tasks-* packages

Google's on-device ML SDK. Drop-in JS packages for image classification, object detection, face/hand/pose landmarking, text classification, audio classification, and an LLM Inference task that runs Gemma or Phi locally via WebGPU/WebAssembly.

WebGPU · WASM · Android/iOS

Mobile on-device

Gemini Nano on Android

AI Edge SDK · AICore

Pixel 8 Pro+ and a growing list of Android devices ship Gemini Nano in AICore. App developers get summarization, smart reply, and a Prompt API via the AI Edge SDK with no model download in the app binary.

AICore · no model bundle

CDN-style edge

Firebase AI Logic

firebase/ai · client-safe

Call Gemini from web and mobile client code without exposing an API key. App Check verifies the request is from your real app; Firebase rewrites it as a Vertex AI call on the server side. The right pattern for shipping Gemini in a public web app.

client-safe · App Check

04

Agent Environments

where you build & run agents that act

An "agent" is a loop: model → tool call → result → model. Google ships an opinionated runtime (the Agent Development Kit, ADK), an interoperability protocol (A2A), a hosted runtime (Agent Engine), and a desktop development environment (Antigravity). They compose.
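The loop described above can be sketched in a few lines. This is a minimal illustration, not the ADK or Agent Engine API: the `Model`, `Tool`, and `Message` types, the scripted model, and the `add` tool are all hypothetical, and the loop is synchronous for brevity (a real model call would be awaited).

```typescript
// Hypothetical shapes for illustration; not the types of any particular SDK.
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelTurn =
  | { kind: "tool_call"; call: ToolCall }
  | { kind: "final"; text: string };
type Message = { role: "user" | "model" | "tool"; content: string };

type Model = (history: Message[]) => ModelTurn;
type Tool = (args: Record<string, unknown>) => string;

// Runs model -> tool call -> result -> model until the model answers
// or the step cap is reached.
function runAgent(
  model: Model,
  tools: Record<string, Tool>,
  goal: string,
  maxSteps = 5,
): string {
  const history: Message[] = [{ role: "user", content: goal }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = model(history);
    if (turn.kind === "final") return turn.text;

    const tool: Tool | undefined = tools[turn.call.name];
    // Unknown tools are reported back to the model instead of crashing the loop.
    const result = tool
      ? tool(turn.call.args)
      : `error: unknown tool "${turn.call.name}"`;
    history.push({ role: "model", content: JSON.stringify(turn.call) });
    history.push({ role: "tool", content: result });
  }
  throw new Error(`agent did not finish within ${maxSteps} steps`);
}

// A scripted stand-in for a real model: asks for one tool call, then answers.
const scriptedModel: Model = (history) => {
  const last = history[history.length - 1];
  return last?.role === "tool"
    ? { kind: "final", text: `The answer is ${last.content}.` }
    : { kind: "tool_call", call: { name: "add", args: { a: 2, b: 3 } } };
};

const tools: Record<string, Tool> = {
  add: (args) => String(Number(args["a"]) + Number(args["b"])),
};

const answer = runAgent(scriptedModel, tools, "What is 2 + 3?");
// answer === "The answer is 5."
```

The step cap is the important design choice: without it, a model that keeps requesting tools never returns control to the caller.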

Agent IDE

Google Antigravity

antigravity.google.com

Standalone desktop app for building, running, and verifying agents. Editor + browser + terminal, with first-class "Artifacts" — plans, screenshots, browser recordings — so you can audit what the agent did. Multi-agent workspaces, agent-to-agent handoff.

desktop · verifiable · multi-agent

Agent framework

Agent Development Kit (ADK)

google.adk · Python & Java

Open-source framework for defining tools, sessions, memory, planners, and multi-agent orchestration. Model-agnostic (Gemini, Claude, GPT, open weights). Deploys cleanly onto Vertex Agent Engine.

open source · multi-model

Web-dev native

Firebase Genkit

genkit · Node & Go

The TypeScript/Go answer to ADK. Flow-based authoring, built-in tracing, dotprompt files, evals, a local dev UI, and one-command deploys to Cloud Functions or Cloud Run. The cleanest path from a Next.js API route to a production agent.

TypeScript · tracing · dev UI

Interop protocol

A2A (Agent-to-Agent)

a2a-protocol.org

Open protocol for agents from different vendors to discover each other's capabilities and exchange tasks. Pairs with MCP (which standardizes tool-calling) to make multi-vendor agent meshes possible.

protocol · multi-vendor

Commerce for agents (emerging)

Agent Payments Protocol ▲ verify

AP2 · open spec

An emerging open spec, championed by Google and partners, for agents to discover catalogs, negotiate purchases, and complete payments with signed mandates from the user. A "Universal Commerce Protocol (UCP)" has also been mentioned in this space; verify whether that's a separate effort or the same one rebranded.

payments · open spec

04 — The model family

Pick the right size

Almost every cost or latency complaint about Gemini traces back to picking the wrong size. Pro for genuinely hard problems; Flash for everything else; Flash-Lite for high-volume routing and extraction; Nano for on-device. Below: the table you actually need.

Model	Context	Best for	Modalities	Notes
`gemini-2.5-pro`	1M in · 64K out	Hard reasoning, code, long-document analysis, agent planning	text · image · audio · video · PDF	Native thinking; "Deep Think" mode for the toughest problems.
`gemini-2.5-flash`	1M in · 64K out	Default for production. Chat, summarization, routine tool use.	text · image · audio · video · PDF	Configurable thinking budget — trade latency for quality per request.
`gemini-2.5-flash-lite`	1M in · 64K out	Routing, classification, extraction, cheap first stage.	text · image · audio · video · PDF	Cheapest in the family. Pair with Flash/Pro as a second stage.
`gemini-2.5-flash-image`	—	Conversational image generation & editing ("Nano Banana").	text + image → image	Multi-turn refinement; preserves identity across edits.
`gemini-2.0-flash-live`	streaming	Realtime voice + video. Project Astra-style interactions.	audio in/out · video in	WebSocket-based Live API. Sub-second time-to-first-byte.
`gemini-nano`	32K typical	On-device. Chrome Built-in AI, Android AICore.	text (image on newer revs)	No API call. Used by the Prompt API, Summarizer, Writer, Rewriter.
`imagen-4`	—	High-fidelity text-to-image (when Nano Banana isn't enough).	text → image	Stronger at typography and photorealism than the Flash Image model.
`veo-3`	—	Text- and image-to-video, with native audio.	text/image → video+audio	Up to 8s clips in the API; longer on the Flow product.
`lyria-2`	—	Text-to-music for soundtracks, jingles, generative audio.	text → audio	Available via the Gemini API and on the MusicFX surface.
`gemma-3`	up to 128K	Open-weight self-hosted Gemini sibling. Ollama, vLLM, llama.cpp.	text · image (4B+)	Sizes: 1B, 4B, 12B, 27B. Same tokenizer family as Gemini.

Model IDs drift. Google revises model IDs faster than any documentation site can keep up — minor releases, "-latest" aliases, and date-suffixed variants come and go. Pin a specific version in production (e.g. gemini-2.5-flash-001), and check ai.google.dev/gemini-api/docs/models before adopting anything time-sensitive.
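The sizing rule in this section can be expressed as a small router. This is a sketch under assumptions: the task categories and the `escalate` flag are hypothetical names chosen for illustration, and the model IDs are copied from the table above (pin exact versions in production).

```typescript
// Model IDs as listed in the table above; keep them in one place so they are easy to re-pin.
const MODELS = {
  pro: "gemini-2.5-pro",
  flash: "gemini-2.5-flash",
  flashLite: "gemini-2.5-flash-lite",
} as const;

type Task = {
  kind: "classify" | "extract" | "route" | "chat" | "summarize" | "code" | "plan";
  // Hypothetical flag: set by the caller when a cheaper model already failed.
  escalate?: boolean;
};

function pickModel(task: Task): string {
  if (task.escalate) return MODELS.pro;
  switch (task.kind) {
    case "classify":
    case "extract":
    case "route":
      return MODELS.flashLite; // high-volume, cheap first stage
    case "code":
    case "plan":
      return MODELS.pro; // genuinely hard problems
    default:
      return MODELS.flash; // everything else
  }
}
```

Centralizing the IDs also means a model revision touches one constant instead of every call site.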

05 — The Gemini API

From zero to streaming in five lines

The unified Google GenAI SDK (@google/genai for Node, google-genai for Python) is the SDK you want. It supersedes the older @google/generative-ai package and speaks to both the Gemini Developer API (AI Studio keys) and Vertex AI through the same surface.

Install & first call (Node)

terminal

# 1. install
$ npm install @google/genai

# 2. get a key from aistudio.google.com → "Get API key"
$ export GEMINI_API_KEY="AIza..."

# 3. five-line program
$ node --input-type=module -e '
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({});
const r = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "Write a haiku about TypeScript."
});
console.log(r.text);
'

Streaming, tools, structured output, multimodal

The four things you'll reach for most often, in one place:

examples.ts

import { GoogleGenAI, Type } from "@google/genai";
const ai = new GoogleGenAI({});

// 1. STREAMING — token-by-token, drop into a Next.js route handler
const stream = await ai.models.generateContentStream({
  model: "gemini-2.5-flash",
  contents: "Explain rate limiting in one paragraph."
});
// chunk.text can be undefined (e.g. for non-text parts), so fall back to ""
for await (const chunk of stream) process.stdout.write(chunk.text ?? "");

// 2. TOOL USE — let the model call your functions
const r = await ai.models.generateContent({
  model: "gemini-2.5-pro",
  contents: "What's the weather in Austin right now?",
  config: {
    tools: [{ functionDeclarations: [{
      name: "get_weather",
      description: "Current weather for a city",
      parameters: { type: Type.OBJECT, properties: {
        city: { type: Type.STRING } }, required: ["city"] }
    }]}]
  }
});

// 3. STRUCTURED OUTPUT — JSON that matches a schema, no parsing hacks
const typed = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "List 3 capital cities with population.",
  config: {
    responseMimeType: "application/json",
    responseSchema: { type: Type.ARRAY, items: { type: Type.OBJECT,
      properties: { city: {type: Type.STRING}, pop: {type: Type.INTEGER}}}}
  }
});

// 4. MULTIMODAL — feed a PDF, an image, audio, a YouTube URL
const file = await ai.files.upload({ file: "./contract.pdf" });
const summary = await ai.models.generateContent({
  model: "gemini-2.5-pro",
  contents: [
    { fileData: { fileUri: file.uri, mimeType: file.mimeType }},
    "Summarize the indemnity clauses."
  ]
});
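Example 2 above stops once the model has been offered the tool. When the model decides to use it, the response carries a function call (name plus arguments) rather than text, and your code is expected to run the function and send the result back. The sketch below shows that dispatch step on its own. The `FunctionCall` and `functionResponse` shapes mirror the Gemini API's part format as far as we know; the handler table, the `dispatch` helper, and the canned weather data are hypothetical.

```typescript
// Shapes modeled on the API's functionCall / functionResponse parts.
type FunctionCall = { name?: string; args?: Record<string, unknown> };
type FunctionResponsePart = {
  functionResponse: { name: string; response: Record<string, unknown> };
};

type Handler = (args: Record<string, unknown>) => Record<string, unknown>;

// Hypothetical handler table: one entry per declared function.
const handlers: Record<string, Handler> = {
  // Stand-in for a real weather lookup.
  get_weather: (args) => ({ city: String(args["city"]), tempF: 72, sky: "clear" }),
};

// Runs the requested function and wraps the result as a part the model can read.
function dispatch(call: FunctionCall): FunctionResponsePart {
  const name = call.name ?? "";
  const handler: Handler | undefined = handlers[name];
  const response = handler
    ? handler(call.args ?? {})
    : { error: `no handler registered for "${name}"` };
  return { functionResponse: { name, response } };
}

const part = dispatch({ name: "get_weather", args: { city: "Austin" } });
```

In a real exchange, the returned part would be appended to the conversation and sent in a follow-up `generateContent` call so the model can phrase the final answer.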

What's actually new in the unified SDK

Capability	What it does	Why it matters
`ai.models.generateContent`	Single entry point for chat, completion, multimodal, structured, tools.	No more juggling four method names.
`ai.live.connect`	Bidirectional WebSocket for voice/video (Project Astra-style).	Sub-second time-to-first-byte for realtime agents.
`ai.files.upload`	48-hour file store for PDFs, audio, video.	Cuts request size; lets you reference one upload across turns.
`ai.batches.create`	Submit thousands of requests; pay 50% less.	The right path for any async/offline workload.
`ai.caches.create`	Explicit context caching; pay ~25% of token rate on cached input.	Long system prompts & document grounding get dramatically cheaper.
`config.thinkingConfig`	Per-call thinking budget — 0 disables thinking, >0 sets a max.	Trade latency for quality without switching models.
`config.tools: googleSearch`	Built-in grounding on Google Search results.	Citations & freshness without standing up a RAG stack.
`config.tools: urlContext`	Fetch & reason over arbitrary URLs at request time.	One-liner web reading without writing a scraper.
`config.tools: codeExecution`	Sandboxed Python execution inside the response loop.	Math, data analysis, code-correctness checks in one round-trip.
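To make the two discounts in the table concrete, here is a small arithmetic sketch. The prices are hypothetical round numbers, not real list prices, and the 25% and 50% factors are simply the figures quoted in the table. Whether the two discounts can be combined is not stated here, so the sketch keeps them separate.

```typescript
// Prices are hypothetical round numbers in USD per 1M tokens, not real list prices.
type Pricing = { inputPerMTok: number; outputPerMTok: number };
type Usage = { inputTokens: number; cachedInputTokens: number; outputTokens: number };

const CACHED_INPUT_RATE = 0.25; // "~25% of token rate" from the table above
const BATCH_DISCOUNT = 0.5; // "pay 50% less" from the table above

// Interactive call with explicit caching: cached input is billed at the reduced rate.
function interactiveCost(p: Pricing, u: Usage): number {
  const freshInput = u.inputTokens - u.cachedInputTokens;
  const billedInput = freshInput + u.cachedInputTokens * CACHED_INPUT_RATE;
  return (billedInput * p.inputPerMTok + u.outputTokens * p.outputPerMTok) / 1000000;
}

// Batch job: full token counts at half price. Caching is deliberately ignored here.
function batchCost(p: Pricing, u: Usage): number {
  const full = u.inputTokens * p.inputPerMTok + u.outputTokens * p.outputPerMTok;
  return (full / 1000000) * BATCH_DISCOUNT;
}

const pricing: Pricing = { inputPerMTok: 1.0, outputPerMTok: 4.0 };
const usage: Usage = { inputTokens: 1000000, cachedInputTokens: 800000, outputTokens: 100000 };

const withCache = interactiveCost(pricing, usage); // 0.8
const asBatch = batchCost(pricing, usage); // 0.7
```

With these made-up numbers the same request costs 1.40 at full price, 0.80 with 80% of the input cached, and 0.70 as a batch job.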

Migrating from @google/generative-ai? The old package is in maintenance mode. The new SDK reorganizes calls under ai.models.*, ai.files.*, ai.caches.*, etc. The model IDs and response shapes are compatible enough that the migration is usually 30 minutes of search-and-replace. Do it before you build anything new.

Where keys come from

AI Studio key

Free tier with daily limits. Best for prototyping and indie projects. Gotcha: never expose in client code — Gemini API does not accept App Check, so AI Studio keys belong only on the server.

Vertex auth

Application Default Credentials, a service account, or Workload Identity. Pass vertexai: true, project, location to the SDK constructor.

Firebase AI Logic

The supported way to call Gemini from web/mobile client code. Uses App Check to verify the request and routes through your Firebase project to Vertex.

06 — Vertex AI

The enterprise lane

Vertex AI is the same Gemini models with the controls your security team will ask about: IAM, audit logging, VPC Service Controls, CMEK, regional residency, and an SLA. If you'd answer "yes" to "does anyone need to approve this AI vendor?", you probably want Vertex.

What Vertex gives you over AI Studio

IAM: per-principal allow/deny on every model call.
VPC Service Controls: keep prompts & outputs inside your perimeter.
Regional endpoints: pick where inference runs (US, EU, Asia, and several other regions).
CMEK: customer-managed encryption keys for at-rest data.
Provisioned Throughput: buy reserved capacity for predictable latency.
Model Garden: managed access to Claude, Llama, Mistral, and others.
Agent Engine: hosted runtime for ADK/LangGraph/LlamaIndex agents.
Tuning: supervised fine-tuning on Gemini Flash and Gemma.

What AI Studio gives you that Vertex doesn't

A free tier you can sign up for in 30 seconds.
The simplest possible auth — a single bearer key.
A polished prompt-authoring UI.
Day-one access to brand-new model previews (Vertex usually lags by days to weeks).
No GCP project required — useful for hackathons, demos, and side projects.

Switching the SDK to Vertex

vertex.ts

import { GoogleGenAI } from "@google/genai";

// Same SDK, two-flag switch from AI Studio → Vertex
const ai = new GoogleGenAI({
  vertexai: true,
  project: "my-gcp-project",
  location: "us-central1"     // or "global" for the multi-region router
});

// auth is ADC: `gcloud auth application-default login` locally,
// the metadata server in Cloud Run/Functions, or a service-account JSON.
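One way to keep the AI Studio and Vertex paths in a single codebase is to derive the constructor options from the environment. The sketch below is a pure function so it can be tested without credentials. The option names (`apiKey`, `vertexai`, `project`, `location`) follow the constructor fields used in this guide; the `USE_VERTEX`, `GCP_PROJECT`, and `GCP_LOCATION` variable names are hypothetical.

```typescript
// Either an AI Studio key or the Vertex trio shown in vertex.ts above.
type ClientOptions =
  | { apiKey: string }
  | { vertexai: true; project: string; location: string };

type Env = Record<string, string | undefined>;

// Hypothetical env-variable names, except GEMINI_API_KEY which this guide uses earlier.
function clientOptionsFromEnv(env: Env): ClientOptions {
  if (env["USE_VERTEX"] === "true") {
    const project = env["GCP_PROJECT"];
    if (!project) throw new Error("GCP_PROJECT is required when USE_VERTEX=true");
    return { vertexai: true, project, location: env["GCP_LOCATION"] ?? "us-central1" };
  }
  const apiKey = env["GEMINI_API_KEY"];
  if (!apiKey) throw new Error("Set GEMINI_API_KEY or USE_VERTEX=true");
  return { apiKey };
}
```

In an app, the result would be passed straight to the constructor, for example `new GoogleGenAI(clientOptionsFromEnv(process.env))`.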

Agent Engine in one paragraph

Agent Engine is Vertex's managed runtime for agent frameworks. You ship an ADK, LangGraph, LangChain, LlamaIndex, or CrewAI agent; Agent Engine handles sessions, memory, tracing, autoscaling, and gives you a stable HTTPS endpoint. The mental model is "Cloud Run for agents, with sessions baked in" — no need to stand up Redis for chat history, a queue for long jobs, or your own OTel pipeline for traces.

07 — Firebase Genkit

The web-developer-native agent framework

Genkit is the framework most web developers want and don't know to ask for. TypeScript-first (Go and Python in beta), flow-based authoring, a local dev UI for inspecting traces, dotprompt files for prompt-as-code, evals, and one-command deploys to Cloud Functions or Cloud Run. It composes with Gemini, Vertex, Claude, OpenAI, Ollama, and any model behind an OpenAI-shaped API.

terminal

# set up a new Genkit project (Node)
$ npm init -y && npm install genkit @genkit-ai/google-genai

# the `genkit` command ships in the separate genkit-cli package
$ npm install -D genkit-cli tsx typescript

# start the local dev UI and inspect every flow run, every trace
$ npx genkit start -- npx tsx --watch src/index.ts

A minimal flow

src/index.ts

import { genkit, z } from "genkit";
import { googleAI } from "@genkit-ai/google-genai";

const ai = genkit({
  plugins: [googleAI()],
  model: googleAI.model("gemini-2.5-flash")
});

export const summarize = ai.defineFlow({
  name: "summarize",
  inputSchema: z.object({ url: z.string().url() }),
  outputSchema: z.object({ bullets: z.array(z.string()) })
}, async ({ url }) => {
  const page = await fetch(url).then(r => r.text());
  const { output } = await ai.generate({
    prompt: `Summarize this page in 5 bullets:\n\n${page}`,
    output: { schema: z.object({ bullets: z.array(z.string()) }) }
  });
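  // output is null when the model's reply did not match the schema
  if (!output) throw new Error("Model returned no structured output");
  return output;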
});

Why this is the right choice for a Next.js app

Same language as the rest of your app. No Python sidecar.
Flows are just functions. Call them from a route handler, a Server Action, or a Cloud Function.
The dev UI is honest. Every flow run is a clickable trace tree with inputs, outputs, model latency, and token counts.
Dotprompt files separate prompt text from code — version, review, and diff prompts like any other source file.
Evals are built in. Hook a dataset and a scoring function; CI fails when quality regresses.
Deploy is one command. Cloud Functions for HTTP/scheduled, Cloud Run for long-running, or Firebase Hosting for the simplest cases.

08 — Google Antigravity

An IDE built for agents, not autocomplete

Antigravity is Google's standalone agent-development desktop app — Mac, Windows, Linux. It's not a VS Code extension; it's a separate workspace where the unit of work is an agent task, not a file open in a tab. The agent gets a planner, an editor, a managed browser, and a terminal; you get "Artifacts" (a plan, a recording of the browser session, a diff, screenshots of every important step) so you can verify what happened instead of trusting the chat log.

What makes it different from "an agent in VS Code"

Multi-agent workspaces. Run several agents in parallel; the manager view shows what each is doing and lets you intervene without context-switching.
Browser-as-tool, first-class. The agent's browser session is a panel you can watch live and rewind. Critical for QA, scraping, and any web-automation task.
Artifacts. Every agent step that touches the world (a command run, a page visited, a file written) produces a structured artifact you can audit and link to.
Model-agnostic. Ships with Gemini 2.5 Pro/Flash as defaults; Claude Sonnet/Opus and GPT models are pluggable.
Local + cloud. Agents can run on your machine for sensitive code or in a Google-hosted sandbox for fan-out jobs.

Antigravity 2.0 status: "Antigravity 2.0" is reported as announced at I/O 2026, which fell in mid-May. Treat specific 2.0 features as worth confirming against antigravity.google.com and Google's I/O recap before quoting them in production planning. The capabilities above are stable across the 1.x line.

When to reach for Antigravity vs. Gemini Code Assist

Antigravity wins when…

The task spans editor + browser + terminal.
You want to run several agents at once and pick the best result.
You need to audit what an agent did, not just see the diff.
You're building agents and want a workbench tuned for that.

Code Assist wins when…

You live in VS Code or a JetBrains IDE and don't want to leave.
You want completions and chat in the same window as your code.
You're touching a repo with strict tooling (debuggers, native extensions) that's hard to move.
Your org already has Code Assist licensed and configured.

09 — Gemini Nano & Chrome built-in AI

An LLM inside the browser, free

Chrome ships Gemini Nano with the browser. There is no API key, no network call, and no per-token cost — inference runs on the user's machine against a model the browser downloads and shares across sites. The JS APIs are designed for a web developer's mental model: await Summarizer.create(), then summarizer.summarize(text).

The current API surface

API	Use it for	Status
`Summarizer`	Article summaries, meeting notes, TL;DRs.	Stable
`Writer`	Generate net-new text from a prompt & context.	Stable
`Rewriter`	Tone shift, length change, simplify, formalize.	Stable
`Translator`	On-device translation across major languages.	Stable
`LanguageDetector`	Detect the language of arbitrary text.	Stable
`LanguageModel` (Prompt API)	General-purpose chat against Nano. Free-form prompts.	Origin trial / EPP
`Proofreader`	Grammar & style suggestions over a span of text.	Origin trial

A real example

summarize.js

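// 0. feature-detect: the Summarizer global only exists in supporting browsers
if (!("Summarizer" in self)) throw new Error("Summarizer API not supported");

// 1. check availability: the model may need to download on first call
const status = await Summarizer.availability();
// "unavailable" | "downloadable" | "downloading" | "available"
if (status === "unavailable") throw new Error("No on-device model; use a server-side fallback");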

// 2. create a session with the parameters you want
const s = await Summarizer.create({
  type: "key-points",
  format: "markdown",
  length: "short",
  monitor(m) { m.addEventListener("downloadprogress", e => {
    console.log(`${(e.loaded * 100).toFixed(0)}%`);
  }); }
});

// 3. use it
const summary = await s.summarize(longArticleText);

// 4. stream if you want incremental output
const stream = s.summarizeStreaming(longArticleText);
for await (const chunk of stream) ui.append(chunk);

The killer use case: client-side features that used to require a paid API — summarizing user-pasted content, translating UI strings, rewriting form drafts, grading the tone of an outgoing message — now run for free on the user's machine, with the user's privacy preserved by default.

Things to know before you ship

Hardware floor. Roughly 4GB+ VRAM or a modern integrated GPU, and ~22GB free disk for the model. Older devices will get availability() === "unavailable" — design a server-side fallback.
First-run download. The model is shared across sites but the user still needs to fetch it once. Always monitor downloadprogress and show a UI.
Context windows are small. Nano is in the 32K-token range. Chunk long inputs.
Cross-browser story is partial. Edge ships Phi-based equivalents under a different API name; Safari and Firefox have nothing yet. Feature-detect, don't UA-sniff.
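Two of the points above, the server-side fallback and chunking long inputs, can be sketched as plain functions that do not depend on the browser API. The availability strings are the ones listed in summarize.js; the strategy names, the `chunkText` helper, and the four-characters-per-token estimate are assumptions for illustration, not a real tokenizer.

```typescript
type Availability = "unavailable" | "downloadable" | "downloading" | "available";
type Strategy = "on-device" | "download-then-on-device" | "server-fallback";

// Maps the availability() result to what the UI should do next.
function pickStrategy(status: Availability): Strategy {
  switch (status) {
    case "available":
      return "on-device";
    case "downloadable":
    case "downloading":
      return "download-then-on-device"; // show download progress first
    case "unavailable":
      return "server-fallback";
  }
}

// Rough heuristic, not a tokenizer: assume about 4 characters per token.
const CHARS_PER_TOKEN = 4;

// Splits on blank lines and packs paragraphs into chunks under the budget.
function chunkText(text: string, maxTokens: number): string[] {
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  const chunks: string[] = [];
  let current = "";
  for (const paragraph of text.split(/\n{2,}/)) {
    // Start a new chunk when adding this paragraph would exceed the budget.
    if (current && current.length + paragraph.length + 2 > maxChars) {
      chunks.push(current);
      current = "";
    }
    // A single oversized paragraph is hard-split at the character budget.
    let rest = paragraph;
    while (rest.length > maxChars) {
      chunks.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    current = current ? `${current}\n\n${rest}` : rest;
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk can then be summarized on its own and the partial summaries summarized again, which keeps every call inside the small context window.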

10 — MediaPipe

On-device ML for everything that isn't a chat model

MediaPipe is Google's cross-platform on-device ML SDK. For a web developer it's a set of @mediapipe/tasks-* npm packages: import, point at a model file, call .detect(). Runs on WebGPU when available and WebAssembly everywhere else.

The task catalog

Domain	Tasks	Package
Vision	Object detection, image classification, image segmentation, face landmarker, hand landmarker, pose landmarker, gesture recognizer, image embedder, interactive segmentation	`@mediapipe/tasks-vision`
Text	Text classification, text embedder, language detector	`@mediapipe/tasks-text`
Audio	Audio classification, audio embedder	`@mediapipe/tasks-audio`
GenAI	LLM Inference (Gemma, Phi, Falcon), Image Generation	`@mediapipe/tasks-genai`

LLM Inference in the browser (Gemma, locally)

llm.ts

import { FilesetResolver, LlmInference }
  from "@mediapipe/tasks-genai";

const genai = await FilesetResolver.forGenAiTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm"
);

const llm = await LlmInference.createFromOptions(genai, {
  baseOptions: { modelAssetPath: "/models/gemma-3-1b-it.task" },
  maxTokens: 1024,
  topK: 40, temperature: 0.8
});

const reply = await llm.generateResponse("why is the sky blue?");

Pick MediaPipe over the Chrome Prompt API when… you need a specific model (not whatever Chrome ships), broader browser support, or capabilities the Chrome APIs don't expose (e.g. embeddings, image classification, hand tracking). Pick the Chrome Prompt API when you want zero model download for the user and you're targeting recent Chrome/Edge.

11 — Persistent autonomy

Gemini Spark & what you can build today

Gemini Spark has been described as a 24/7 autonomous operations runtime announced at I/O 2026. As of this writing, the public, stable surface area for "always-on, schedule-driven Gemini agents" is the combination of Vertex Agent Engine + Cloud Scheduler + Cloud Tasks, with the ADK or Genkit providing the agent loop. If Spark ships a packaged SKU that wraps this, the pattern below is the one it would most likely encapsulate.

Verify before quoting in a proposal. "Gemini Spark" as a named product wasn't confirmed in Google's documentation when this page was written. Treat any specific Spark feature claims (pricing, regions, exact SDK names) as to-be-confirmed; the architectural pattern below is real and works today regardless of what Google brands it.

The pattern: a persistent worker, today

1. Define the agent as a Genkit flow or ADK agent. Tools, model, system prompt, optional memory backend (Firestore, Vertex AI Memory Bank).
2. Deploy to Cloud Run or Vertex Agent Engine. Stable HTTPS endpoint, autoscales to zero between invocations.
3. Schedule with Cloud Scheduler for periodic runs, Eventarc for event-triggered runs, or Pub/Sub for fan-out.
4. Persist state in Firestore (or the Agent Engine sessions API). Pass a session ID into every invocation; the agent picks up where it left off.
5. Observe with Cloud Trace + Logs Explorer. ADK and Genkit both emit OpenTelemetry spans by default, so agent runs show up as traces you can drill into.
6. Cap with budgets & quotas. Set a per-project Vertex token quota and a billing budget alert so a stuck loop can't run away.
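The core of the pattern above (load state by session ID, do a bounded amount of work, save state) fits in a short sketch. Everything here is hypothetical scaffolding: the `SessionState` shape, the in-memory `Map` standing in for Firestore or a sessions API, and the trivial `step` function standing in for a real agent turn. It is synchronous for brevity; real stores and model calls would be awaited.

```typescript
// Hypothetical state shape and store; swap the Map for Firestore or a sessions API.
type SessionState = { cursor: number; done: boolean; log: string[] };
type Store = Map<string, SessionState>;

const MAX_STEPS_PER_RUN = 3; // hard cap so a stuck loop cannot run away

// One unit of agent work. Here it just "processes" the next item in a list.
function step(state: SessionState, items: string[]): SessionState {
  const item = items[state.cursor];
  if (item === undefined) return { ...state, done: true };
  return {
    cursor: state.cursor + 1,
    done: false,
    log: [...state.log, `processed ${item}`],
  };
}

// Called by the scheduler on every tick: load state, do bounded work, save state.
function runOnce(store: Store, sessionId: string, items: string[]): SessionState {
  let state: SessionState =
    store.get(sessionId) ?? { cursor: 0, done: false, log: [] };
  for (let i = 0; i < MAX_STEPS_PER_RUN && !state.done; i++) {
    state = step(state, items);
  }
  store.set(sessionId, state);
  return state;
}

const store: Store = new Map();
const backlog = ["a", "b", "c", "d", "e"];
const firstRun = runOnce(store, "nightly-job", backlog); // stops at the cap
const secondRun = runOnce(store, "nightly-job", backlog); // resumes and finishes
```

Because all progress lives in the store, the worker itself can scale to zero or be redeployed between ticks without losing its place.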

12 — Deep dive

Building for the Agentic Web

How to optimize your web apps for autonomous AI consumers — schemas, API structures, and making e-commerce platforms readable for agents that buy without rendering your CSS.

The shift

For 25 years the web optimized for one consumer: a human reading a screen. Page layouts, ad units, paywalls, dark patterns, infinite scroll — all of it was tuned for an eyeball moving across pixels. That consumer is no longer alone. A second consumer is now showing up to your origin: an autonomous agent acting on behalf of a person who never loaded your page. It doesn't render your CSS. It doesn't see your hero image. It doesn't accept your cookie banner. It reads structured data, calls JSON endpoints, follows links your sitemap promised it could, and abandons your domain the moment it can't find what it needs.

The web apps that win the agentic decade will be the ones that noticed the second consumer and shipped for it. The rest will keep getting traffic that bounces in milliseconds.

Three things agents need that humans don't

need 01

A machine-readable contract

An agent should not have to scrape your HTML to know your name, your prices, your inventory, your operating hours, your return policy. Schema.org JSON-LD, well-formed feeds, and a stable JSON API state "this is what I am" in 200 lines, so the agent never has to guess.

need 02

An entry point it can find

Humans land on your homepage. Agents land on whatever you advertised in /.well-known/, robots.txt, your sitemap, and your llms.txt. If those files don't exist or contradict each other, you've handed the agent a coin flip.

need 03

An action it can take

Reading is the easy half. Buying, booking, reserving, submitting — the action the user actually asked for — requires a predictable, authenticated, agent-friendly API. If the only way to check out is to click through three JS-heavy modals, you're invisible to the buyer.

Layer 1: Structured data the agent will trust

Schema.org JSON-LD is the lowest-cost, highest-leverage thing you can ship. Every page should describe itself. For an e-commerce product page, the contract is roughly:

product.html — <head>

<script type="application/ld+json">
{
  "@context": "https://schema.org/",
  "@type": "Product",
  "sku": "AX-104-BLK",
  "name": "Field Notebook, A5, black",
  "image": ["https://cdn.example.com/ax104/main.jpg"],
  "description": "96-page lay-flat field notebook...",
  "brand": { "@type": "Brand", "name": "Acme Stationery" },
  "offers": {
    "@type": "Offer",
    "url": "https://example.com/p/ax-104-blk",
    "priceCurrency": "USD",
    "price": "18.00",
    "priceValidUntil": "2026-12-31",
    "availability": "https://schema.org/InStock",
    "itemCondition": "https://schema.org/NewCondition",
    "shippingDetails": { "@type": "OfferShippingDetails",
      "shippingRate": { "@type": "MonetaryAmount",
        "value": "4.95", "currency": "USD" } },
    "hasMerchantReturnPolicy": { "@type": "MerchantReturnPolicy",
      "returnPolicyCategory": "https://schema.org/MerchantReturnFiniteReturnWindow",
      "merchantReturnDays": 30 }
  },
  "aggregateRating": { "@type": "AggregateRating",
    "ratingValue": "4.7", "reviewCount": "312" }
}
</script>

Five rules for product schema that agents actually use:

Price is a string, not a number. Schema.org allows price to be a Number or Text, but a string such as "18.00" keeps its trailing zeros. The currency belongs in priceCurrency, never inside the price value.
priceValidUntil is the freshness signal. An agent that sees a stale date will discount your price as unreliable.
Use the canonical availability enums (InStock, OutOfStock, LimitedAvailability, PreOrder). Free-form text is invisible.
Always include shipping and returns. Agents that compare two products with identical specs decide on these.
Match on-page price exactly. Mismatched price between JSON-LD and rendered DOM gets your offer disqualified.
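
The five rules above can be turned into a pre-publish check. The following is a minimal sketch, not a standard validator: the names checkOffer and OfferInput are hypothetical, the field names follow the Schema.org example above, and the caller is assumed to pass in the price string that the rendered page shows.

```typescript
// Hypothetical pre-publish check for a Product offer (names are illustrative).
const AVAILABILITY: readonly string[] = [
  "https://schema.org/InStock",
  "https://schema.org/OutOfStock",
  "https://schema.org/LimitedAvailability",
  "https://schema.org/PreOrder",
];

interface OfferInput {
  price: unknown;
  priceValidUntil?: string; // ISO date, e.g. "2026-12-31"
  availability?: string;
  shippingDetails?: object;
  hasMerchantReturnPolicy?: object;
}

// Returns one message per violated rule; an empty array means the offer passes.
function checkOffer(offer: OfferInput, renderedPrice: string, today: Date): string[] {
  const problems: string[] = [];

  // Rule 1: price is a numeric string, so trailing zeros survive.
  if (typeof offer.price !== "string" || !/^\d+(\.\d+)?$/.test(offer.price)) {
    problems.push('price must be a numeric string such as "18.00"');
  }
  // Rule 2: priceValidUntil must parse and must not be in the past.
  const validUntil = Date.parse(offer.priceValidUntil ?? "");
  if (Number.isNaN(validUntil) || validUntil < today.getTime()) {
    problems.push("priceValidUntil is missing or in the past");
  }
  // Rule 3: availability must be one of the canonical enum URLs.
  if (!AVAILABILITY.includes(offer.availability ?? "")) {
    problems.push("availability must be a schema.org enum URL");
  }
  // Rule 4: shipping and returns must both be present.
  if (!offer.shippingDetails || !offer.hasMerchantReturnPolicy) {
    problems.push("shippingDetails and hasMerchantReturnPolicy are required");
  }
  // Rule 5: the JSON-LD price must equal the price shown on the page.
  if (offer.price !== renderedPrice) {
    problems.push("JSON-LD price does not match the rendered price");
  }
  return problems;
}
```

Running a check like this in CI against a rendered product page catches drift between the markup and the DOM before an agent does.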

Layer 2: A machine-shaped API that mirrors your UI

If your website is an e-commerce front, your API is your agent surface. Any action a human can take with a click should be possible with an authenticated JSON request. Treat your API as a first-class product, not as a leftover.

UI action	Agent equivalent	Shape
Browse a category	List products in a category	`GET /api/v1/products?category=notebooks&limit=50&cursor=…`
View a product	Fetch full product detail	`GET /api/v1/products/ax-104-blk`
Check stock	Authoritative inventory snapshot	`GET /api/v1/products/ax-104-blk/availability`
Add to cart	Create an order intent	`POST /api/v1/order-intents`
Get a quote	Price a basket with shipping & tax	`POST /api/v1/quotes` → totals, breakdown, expires_at
Check out	Confirm with a signed payment mandate	`POST /api/v1/orders` with attached mandate
Track	Order status & tracking	`GET /api/v1/orders/{id}`

Six properties make an API agent-friendly:

Idempotency keys on every write. An agent will retry. Treat Idempotency-Key as required, not optional.
Stable, semantically versioned URLs. /v1/ doesn't change shape; new fields are additive; breaking changes go to /v2/.
Cursor-based pagination. Offset pagination breaks under concurrent writes and burns agent budget on retries.
Structured errors with stable codes — { "error": { "code": "out_of_stock", "message": "…", "retryable": false } }. Free-text errors are unparseable.
Quotes that expire. Always return expires_at on prices, shipping rates, and tax. Agents that can't trust freshness will price defensively.
Webhooks for state changes. Long-running flows (shipping, returns, refunds) should push updates, not require polling.
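
Two of these properties, idempotency keys and structured errors, fit in one short handler. This is a sketch under stated assumptions: createOrderIntent, the in-memory Map, and the idempotency_key_required code are hypothetical, and a production service would keep the replay table in a durable store with a TTL.

```typescript
// Structured error shape taken from the list above.
interface ApiError {
  error: { code: string; message: string; retryable: boolean };
}
interface OrderIntent { id: string; sku: string; quantity: number }
interface HandlerResult { status: number; body: OrderIntent | ApiError }

// Hypothetical replay table: Idempotency-Key -> first result for that key.
const replayTable = new Map<string, HandlerResult>();
let nextIntentId = 1;

function createOrderIntent(
  idempotencyKey: string | undefined,
  sku: string,
  quantity: number,
  stock: number,
): HandlerResult {
  // Treat the key as required, not optional.
  if (!idempotencyKey) {
    return {
      status: 400,
      body: { error: { code: "idempotency_key_required", message: "Send an Idempotency-Key header.", retryable: false } },
    };
  }
  // A retry with the same key gets the original result, not a second order intent.
  const replay = replayTable.get(idempotencyKey);
  if (replay) return replay;

  const result: HandlerResult = quantity > stock
    ? { status: 409, body: { error: { code: "out_of_stock", message: `Only ${stock} left.`, retryable: false } } }
    : { status: 201, body: { id: `oi_${nextIntentId++}`, sku, quantity } };

  replayTable.set(idempotencyKey, result);
  return result;
}
```

Storing failures as well as successes under the key is a design choice: it keeps a retried request from succeeding where the first attempt was rejected, at the cost of requiring a fresh key once stock returns.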

Layer 3: Authentication an agent can complete

The OAuth flows you built for humans don't work for agents. A consent screen with a "Continue" button stops them cold. Three patterns work:

API keys scoped to the user. Simple, works everywhere, awful for revocation at scale. Fine for B2B portals; not for consumer.
OAuth 2.0 with the Device Authorization Grant (RFC 8628). The agent gets a code; the human approves once on their phone; the agent gets a long-lived refresh token. This is the right primitive for most consumer agent flows.
Signed user mandates. The user signs a payload that delegates a specific capability ("buy up to $200 of office supplies from merchants in this allowlist before Dec 31") to a specific agent. The agent presents the mandate to your API; you verify the signature against the user's public key. This is the model that emerging agent-commerce protocols (AP2 and similar efforts, sometimes labeled "UCP") standardize.
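
The signed-mandate pattern reduces to a sign step on the user's side and a verify step on yours. The sketch below uses Ed25519 from Node's built-in crypto module. The Mandate fields and the function names are hypothetical, and real protocols such as AP2 define their own payload and signature formats, so treat this as an illustration of the idea only.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical mandate shape; not the AP2 wire format.
interface Mandate {
  agentId: string;
  maxAmountCents: number;
  currency: string;
  merchants: string[];
  notAfter: string; // ISO timestamp
}

interface SignedMandate { payload: string; signature: string }

// User side: serialize once and sign those exact bytes.
function signMandate(mandate: Mandate, privateKey: KeyObject): SignedMandate {
  const payload = JSON.stringify(mandate);
  // For Ed25519 keys the algorithm argument must be null.
  const signature = sign(null, Buffer.from(payload, "utf8"), privateKey).toString("base64");
  return { payload, signature };
}

// Merchant side: verify the bytes as received, then parse.
function verifyMandate(payload: string, signature: string, publicKey: KeyObject): Mandate | null {
  const ok = verify(null, Buffer.from(payload, "utf8"), publicKey, Buffer.from(signature, "base64"));
  return ok ? (JSON.parse(payload) as Mandate) : null;
}
```

The payload travels as the exact string that was signed, which avoids having to agree on a canonical JSON serialization. A key pair for local experiments can be produced with generateKeyPairSync("ed25519").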

Layer 4: A discovery contract

The agent has to find your machine-readable surface before it can use it. Four files do most of the work:

File	Purpose	Required keys
`/robots.txt`	Crawl & agent allow/deny	`User-agent`, `Allow`, `Disallow`, optional `Crawl-delay`
`/sitemap.xml`	The canonical URL list	`<loc>`, `<lastmod>`; optional `<changefreq>`
`/llms.txt`	A prose contract for LLM consumers — short description, primary URLs, "what we'd like you to know"	plain Markdown; one H1, link sections
`/.well-known/agent.json`	Capabilities the site exposes to agents (search endpoint, catalog endpoint, checkout endpoint, auth mode)	JSON; convention emerging via A2A and related specs

Ship all four. They take an afternoon. The agent that finds them rewards you with traffic that converts; the agent that doesn't, doesn't return.
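
Of the four files, /llms.txt is the easiest to generate from data you already have. The following sketch renders one H1, a short summary, and link sections, following the layout described in the table above. renderLlmsTxt and its input types are hypothetical names, and the blockquote summary line is an assumption based on the common llms.txt layout.

```typescript
interface LlmsLink { label: string; url: string; note?: string }
interface LlmsSection { title: string; links: LlmsLink[] }

// Renders plain Markdown: one H1, a summary, then one H2 per link section.
function renderLlmsTxt(siteName: string, summary: string, sections: LlmsSection[]): string {
  const lines: string[] = [`# ${siteName}`, "", `> ${summary}`, ""];
  for (const section of sections) {
    lines.push(`## ${section.title}`, "");
    for (const link of section.links) {
      const note = link.note ? `: ${link.note}` : "";
      lines.push(`- [${link.label}](${link.url})${note}`);
    }
    lines.push("");
  }
  return lines.join("\n");
}
```

Generating the file from the same route table that feeds sitemap.xml keeps the two from contradicting each other.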

Layer 5: A checkout an agent can complete unattended

This is the hard one — and the one with the highest payoff in e-commerce. A traditional human checkout has too many failure modes for an agent: CAPTCHAs, hidden fees revealed late, JavaScript-only address validators, payment redirects, 3-D Secure prompts. The agent-friendly equivalent is a small, predictable API:

Quote. The agent posts a basket and a destination; your server returns line items, shipping options, tax, totals, and an expires_at.
Mandate verification. The agent presents a signed user mandate. You verify the signature, the spend cap, the merchant allowlist, and the freshness, and confirm that the basket fits within the mandate.
Authorize. Charge against the payment method described in the mandate (typically a payment-network token issued specifically for agent purchases), with a clear "settled by" timestamp.
Confirm. Return an order ID, a tracking URL pattern, and a webhook subscription endpoint.
Audit. Persist the mandate, the quote, the basket, and the authorization decision together — disputes are won and lost on this trail.
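
The quote and mandate-verification steps come down to a handful of comparisons once the signature has been checked. A minimal sketch, assuming the signature was verified earlier: the type names, field names, and error codes below are hypothetical, not part of any published protocol.

```typescript
interface Quote { merchant: string; totalCents: number; currency: string; expiresAt: string }
interface MandateTerms { maxAmountCents: number; currency: string; merchants: string[]; notAfter: string }
type Decision = { approved: true } | { approved: false; code: string };

// Checks freshness, allowlist, and spend cap. Signature verification happens before this.
function authorize(quote: Quote, mandate: MandateTerms, now: Date): Decision {
  const t = now.getTime();
  if (Date.parse(quote.expiresAt) <= t) return { approved: false, code: "quote_expired" };
  if (Date.parse(mandate.notAfter) <= t) return { approved: false, code: "mandate_expired" };
  if (!mandate.merchants.includes(quote.merchant)) return { approved: false, code: "merchant_not_allowed" };
  if (mandate.currency !== quote.currency) return { approved: false, code: "currency_mismatch" };
  if (quote.totalCents > mandate.maxAmountCents) return { approved: false, code: "spend_cap_exceeded" };
  return { approved: true };
}
```

Amounts are held in integer cents so the cap comparison never depends on floating-point rounding. Persisting the returned decision next to the quote and mandate gives the audit step a single record to point at.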

Layer 6: Don't show the agent things meant for humans

Equally important: detect the agent and skip the friction.

No cookie banners on the agent path — they're meaningless and they break server-rendered scrapes.
No interstitials ("subscribe to our newsletter") on API or product-detail endpoints.
No JavaScript-required content. If the agent needs to execute your bundle to see your price, you've lost.
No CAPTCHAs on read paths. Reserve them for actual abuse signals; agent traffic is not abuse by default.
Honest Cache-Control. Stale prices behind aggressive caching produce angry users when the agent's basket re-prices at checkout.

A 60-minute checklist

Audit one high-value page. Pick your best-selling product. View source. Count fields that an agent could extract reliably without rendering JavaScript.
Ship Product JSON-LD. Use the schema above. Validate at search.google.com/test/rich-results.
Add /llms.txt. One H1, a paragraph describing the site, a list of canonical URLs, a short "do/don't" for agents.
Audit robots.txt. Distinguish search crawlers, AI training crawlers, and shopping/agent crawlers. Decide each, deliberately.
Pick one write endpoint and add idempotency. "Add to cart" or "request a quote" are good starting points.
Document your API in one OpenAPI file. Even a partial one. Host it at /openapi.yaml and link it from /.well-known/agent.json.
Test with an actual agent. Use Gemini with the URL Context tool, Claude with Computer Use, or your own ADK agent. Watch what it fails on and fix the top three.

The one-sentence summary: agents are a second kind of customer with a different sensory apparatus; build the contract, the API, and the checkout they need, and your conversion rate on agent traffic will be higher than your conversion rate on humans within a year.

13 — Reference architecture

A scalable stack for an agent-native developer portal

This reference stack assumes a Next.js/Tailwind frontend, a LAMP/Node.js backend, and integration with the unified Gemini SDK, Firebase Genkit, and an agent-commerce protocol. Below is the shape that holds up under both human and agent traffic without forking the codebase.

Edge / Presentation

Next.js 15 (App Router)

Tailwind v4

React Server Components

Streaming responses

Firebase Hosting / Cloud Run

App Check (web/mobile)

Agent & AI Layer

Firebase Genkit (flows)

@google/genai (Gemini)

Vertex Agent Engine (hosted)

ADK (Python sidecar, optional)

MCP servers (tools)

A2A (multi-agent interop)

Backend / Data

Node API routes (Cloud Functions / Cloud Run)

LAMP origin (Apache + PHP + MySQL) for legacy CMS

Firestore (sessions, agent state)

Cloud Storage (artifacts, uploads)

Cloud Scheduler + Pub/Sub (persistent workers)

OpenAPI 3.1 + .well-known/agent.json

How the pieces talk

Browser → Next.js. Human visitors hit RSC-rendered pages. Each product page emits Schema.org JSON-LD, the same data that powers the API.
Browser → Firebase AI Logic. Client-side AI features (chat, suggest, summarize) call Gemini through Firebase AI Logic, which proxies the request so no Gemini API key ships in client code. App Check rejects anything that isn't your real app.
Next.js API route → Genkit flow. Server-side AI features (recommendations, agent-driven actions) call a Genkit flow. Each flow is a function — call it directly, no HTTP hop.
Genkit flow → Gemini API or Vertex. Choose per-flow via plugin config. Long-running flows that need sessions or memory deploy to Agent Engine instead of running in-process.
Genkit flow → MCP tools. Tool calls (search inventory, look up an order, call the legacy LAMP API) go through Model Context Protocol servers — one MCP server per backend system, mounted into every flow that needs it.
Agent traffic → /api/v1/*. External agents hit the same JSON API your own Genkit flows use, authenticated with OAuth device grants or signed mandates from emerging agent-commerce protocols.
Cloud Scheduler → Cloud Run flow. Cron-driven jobs (nightly summaries, inventory reconciliations) invoke a Genkit flow on a schedule, persist results to Firestore, fire a webhook.

One codebase, two consumers. The shape above means your Schema.org markup, your JSON API, and your Genkit flows all describe the same data model — no fork between "the human site" and "the agent site." A change to a product field shows up in JSON-LD, in /api/v1/products/*, in the recommendation flow's tool schema, and in the agent's catalog response, all from the same source of truth.
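
The "same source of truth" idea can be sketched as one product record with two projections. Everything named here (Product, toJsonLd, toApiResponse, and the API response fields) is hypothetical. The point is only that both outputs derive the price string from the same integer.

```typescript
// Hypothetical canonical record, as it might come out of the database.
interface Product { sku: string; name: string; priceCents: number; currency: string; inStock: boolean; url: string }

interface ProductJsonLd {
  "@context": string;
  "@type": "Product";
  sku: string;
  name: string;
  offers: { "@type": "Offer"; url: string; priceCurrency: string; price: string; availability: string };
}
interface ProductApiResponse { sku: string; name: string; price: string; currency: string; available: boolean; url: string }

// Integer cents to a two-decimal string, without floating-point division.
function formatPrice(cents: number): string {
  return `${Math.floor(cents / 100)}.${String(cents % 100).padStart(2, "0")}`;
}

// Projection 1: the JSON-LD embedded in the product page.
function toJsonLd(p: Product): ProductJsonLd {
  return {
    "@context": "https://schema.org/",
    "@type": "Product",
    sku: p.sku,
    name: p.name,
    offers: {
      "@type": "Offer",
      url: p.url,
      priceCurrency: p.currency,
      price: formatPrice(p.priceCents),
      availability: p.inStock ? "https://schema.org/InStock" : "https://schema.org/OutOfStock",
    },
  };
}

// Projection 2: the body returned by the product detail endpoint.
function toApiResponse(p: Product): ProductApiResponse {
  return { sku: p.sku, name: p.name, price: formatPrice(p.priceCents), currency: p.currency, available: p.inStock, url: p.url };
}
```

Because both projections call formatPrice on the same field, the JSON-LD price and the API price cannot drift apart.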

14 — Pricing & plans

What it costs

Three ways to pay for Gemini, and they mix. The AI Studio free tier is generous for prototyping and indie projects. The Gemini API paid tier bills per token against your Google AI Studio billing profile. Vertex AI bills against a Google Cloud project with the rest of your cloud spend. Consumer plans (Google AI Pro, Google AI Ultra) are separate — they cover the Gemini app, NotebookLM, Veo, and higher quotas in AI Studio, not API token costs.

Developer billing

Surface	Pricing model	Free tier	Best for
AI Studio (free)	Free with daily/minute limits	Yes — generous	Prototyping, learning, hackathons
Gemini API (paid)	Per million input/output tokens; cached input billed at ~25% of the normal rate; batch 50% off	—	Production apps using AI Studio keys
Vertex AI	Same per-token model; billed to GCP project	$300 GCP credit for new accounts	Anything that needs IAM, residency, SLA
Provisioned Throughput	Reserved capacity, fixed monthly	—	Predictable, latency-sensitive workloads

Relative model costs (per million tokens)

Model	Relative cost	What you're paying for
`gemini-2.5-pro`	Frontier-tier	Hardest reasoning, longest context.
`gemini-2.5-flash`	~5-10× cheaper than Pro	Near-Pro quality, much faster.
`gemini-2.5-flash-lite`	~3-5× cheaper than Flash	Routing, extraction, simple Q&A.
`gemini-2.5-flash-image`	Image-token billing	Image generation/editing.
`gemma-3 (self-hosted)`	Your hardware	No per-token cost — pay for the GPU/CPU.

Specific dollar amounts move. Google publishes current rates at ai.google.dev/pricing and cloud.google.com/vertex-ai/pricing. Confirm before committing to a budget — the shape of the table above is stable, the cents are not.

Consumer plans (Gemini app, not the API)

Plan	Price (USD)	What it unlocks
Free	$0	Gemini 2.5 Flash in the app, limited Deep Research, basic image gen.
Google AI Pro	~$20/mo (often bundled with Google One)	Gemini 2.5 Pro in the app, more Deep Research, Veo for video, NotebookLM Plus, higher AI Studio limits.
Google AI Ultra	~$250/mo	Highest quotas, "Deep Think" reasoning, more Veo, early access to new models.
Workspace AI	Bundled with Workspace plans	"Help me write"/"Help me organize" across Docs, Sheets, Gmail, Meet.

Cost-control tips that actually save money

Default to Flash. Reach for Pro only when you can point to a measurable quality gap. Most production apps run 80%+ of traffic on Flash.
Use the thinking budget. thinkingConfig: { thinkingBudget: 0 } on Flash for routine tasks saves a lot of tokens; raise it for hard problems.
Cache long prompts. System prompts >1K tokens and grounding documents are billed at ~25% of normal when served from ai.caches.
Batch anything async. Nightly enrichment, eval runs, bulk summarization — half-price via the Batch API.
Set a hard cap. GCP billing budgets + Vertex per-project quotas. A runaway agent loop is a real failure mode; cap it.
Tune maxOutputTokens. Most apps over-allocate by 4×. The token you don't generate is free.
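
The caching and batch tips are easier to weigh with the arithmetic written out. The sketch below is a rough estimator, not Google's billing formula: the rates passed in are placeholders, the 25% cached-input and 50% batch figures are the approximate ones quoted above, and whether the two discounts stack on a given request is an assumption to confirm against the pricing page.

```typescript
interface Usage {
  inputTokens: number;       // total input tokens, including cached ones
  cachedInputTokens: number; // portion of the input served from the cache
  outputTokens: number;
  batch: boolean;            // true if submitted through the Batch API
}
interface Rates { inputPerMillion: number; outputPerMillion: number } // hypothetical USD rates

const CACHED_INPUT_FACTOR = 0.25; // "~25% of normal", per the tips above
const BATCH_FACTOR = 0.5;         // "half-price", per the tips above

function estimateCostUsd(u: Usage, r: Rates): number {
  const freshInput = u.inputTokens - u.cachedInputTokens;
  const total =
    (freshInput * r.inputPerMillion +
      u.cachedInputTokens * r.inputPerMillion * CACHED_INPUT_FACTOR +
      u.outputTokens * r.outputPerMillion) / 1e6;
  // Assumption: the batch discount applies to the whole request.
  return u.batch ? total * BATCH_FACTOR : total;
}
```

With placeholder rates of $1.00 per million input tokens and $4.00 per million output tokens, a request with 1,000,000 input tokens (800,000 of them cached) and 100,000 output tokens comes to (200,000 × 1.00 + 800,000 × 0.25 + 100,000 × 4.00) / 1,000,000 = $0.80, or $0.40 if it runs as a batch job.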

15 — Everything else

The rest of the Google AI surface

Products you'll bump into often, with one paragraph on what they are and when to reach for them. Use the strip below as an index.

NotebookLM: source-grounded research
Project Astra: realtime multimodal assistant
Project Mariner: browser agent (research)
Jules: async coding agent
Gemini Code Assist: IDE plugin
Imagen 4: text-to-image
Veo 3: text-to-video, native audio
Lyria 2: text-to-music
Flow: filmmaking tool (Veo)
Gems: custom Gemini personas
Deep Research: multi-source briefs
Workspace AI: Docs/Sheets/Gmail
Stitch: prompt-to-UI
Firebase Studio: in-browser app workspace
MCP: tool protocol

NotebookLM

A research workspace where every answer is grounded in sources you upload (PDFs, Google Docs, Drive, YouTube, websites). Generates "Audio Overviews" (podcast-style summaries with two synthetic hosts), mind maps, study guides, and timelines. Excellent for digesting long-form material and as a personal "ask my notes" interface. Available free; NotebookLM Plus raises quotas via Google AI Pro/Ultra.

Project Astra

Google DeepMind's prototype universal assistant — realtime multimodal, with video + audio in and audio out. The streaming primitives that power Astra ship publicly through the Gemini Live API (ai.live.connect) and the upcoming voice/video features in the Gemini app. Reach for the Live API when you want sub-second latency and a true conversation, not a turn-based chat.

Project Mariner

A research preview of an agent that drives a web browser on your behalf — clicks, types, fills forms, completes workflows across tabs. Browser-use capabilities are flowing into the Gemini SDK as a hosted browser tool and into the ADK as built-in tools. Currently behind Labs/early access for end users; developers should track the SDK release notes.

Jules

Google's async coding agent at jules.google.com. Connect a GitHub repo, file an issue, hand it off. Jules clones the repo into a cloud VM, plans, edits, runs tests, and opens a PR. Best for backlog grooming, dependency upgrades, "I wonder if this is doable," and any change small enough to verify by reading the diff. The async-PR equivalent of OpenAI's Codex Cloud or Anthropic's Claude Code in headless mode.

Gemini Code Assist

Gemini in your IDE. VS Code, IntelliJ, PyCharm, GoLand, the rest of JetBrains, plus Android Studio and Cloud Workstations. Chat panel, ghost-text completions, "explain this", "fix this", repo-wide context. The agentic mode lets it plan a change across files, propose a diff, and run terminal commands behind an approval gate. Free for individuals at modest limits; Standard and Enterprise tiers add private code awareness, audit logging, and IAM.

Imagen 4

Google's high-fidelity text-to-image model. Stronger than the Gemini Flash Image model on photorealism and on typography (rendering legible text inside images). Available via Vertex AI and the Gemini API. Use Flash Image for conversational editing, Imagen for one-shot generation that has to look like a photograph.

Veo 3

Text- and image-to-video, with native audio (dialog, sound effects, ambient). Up to 8-second clips from the API; longer compositions via the Flow product. Cinematic camera controls, character consistency across shots, and a reference-image input for style and subject. The video equivalent of the jump Imagen made.

Lyria 2

Text-to-music: instrumental pieces, vocal tracks, soundtracks, jingles. Available via the Gemini API and the consumer-facing MusicFX surface. Best for short-form generative audio (under a minute); longer compositions need stitching.

Flow

The filmmaking workspace built on Veo and Imagen. Scene-by-scene composition, camera control, frame-to-video extension, asset library. The product Google built so professional creators stop fighting raw prompts and start storyboarding.

Gems

Custom Gemini personas — a name, an instruction set, optional file knowledge, optional tool access. The Gemini-app equivalent of OpenAI's GPTs. Free for everyone (with Pro/Ultra raising limits); excellent for canned workflows ("rewrite-in-our-voice", "code reviewer", "weekly digest").

Deep Research

A multi-step research mode in the Gemini app — Gemini drafts a plan, browses dozens to hundreds of sources, drafts a brief with citations, and (optionally) generates an Audio Overview. Available on the free tier with limits, expanded on Pro/Ultra. Useful for "do a literature review on X" tasks; not a replacement for a domain expert.

Workspace AI

Gemini in Docs, Sheets, Gmail, Slides, Meet, Drive, Chat. "Help me write", "Help me organize", in-Meet note-taking and translation, in-Drive cross-document search. Included with Workspace Business and Enterprise plans (the standalone "Gemini for Workspace" add-on was rolled into the base SKUs).

Stitch

Prompt-to-UI: describe a screen and get a clean Figma-shaped layout you can export to Figma or React/HTML. Useful for spinning up wireframes and starting Tailwind components without hand-laying them out.

Firebase Studio

An in-browser, agentic app development workspace from the Firebase team (the evolution of Project IDX). Spin up a project from a prompt, get an AI-built scaffold (web, mobile, Genkit backend), preview live, deploy to Firebase Hosting / Cloud Run. The "from idea to deployed app in a session" surface.

Model Context Protocol

An open protocol for connecting LLMs to tools and data sources, originally championed by Anthropic and now broadly adopted across the industry — Gemini, the ADK, Genkit, Antigravity, Code Assist, and Claude all speak it. Run an MCP server in front of your database, your filesystem, your ticketing system, your CMS; mount that one server into every agent that needs it. The closest thing the industry has to "USB for AI tools."

16 — Verification & freshness

What to double-check before you ship

Google's AI portfolio moves week to week; this page is current as of the date in the hero and was built against publicly documented surfaces. The items below have either announcement-only status, fast-moving APIs, or both — confirm against Google's own docs before quoting in a roadmap or proposal.

Item	Status	Where to verify
Gemini Spark (24/7 autonomous SKU)	Not confirmed as a packaged product as of writing	I/O 2026 recap, cloud.google.com/blog
Antigravity 2.0 feature set	Treat specific 2.0 features as forward-looking	antigravity.google.com
Universal Commerce Protocol (UCP)	Name not yet confirmed; the public open spec for agent payments is AP2 (Agent Payments Protocol)	A2A docs, github.com/google/A2A
Exact model IDs (`-001`, `-002`, `-latest`)	Stable shape, churning suffixes	ai.google.dev/gemini-api/docs/models
Per-token prices in USD	Shape stable, exact cents change	ai.google.dev/pricing
Consumer plan names & prices	Renamed periodically (was "Gemini Advanced", now "Google AI Pro/Ultra")	gemini.google.com
Chrome built-in AI API surface	Stable APIs listed; Prompt API still in origin trial	developer.chrome.com/docs/ai/built-in
Firebase Genkit version & plugin names	1.x stable; check the README before pinning	genkit.dev

The site you're reading is part of the WholeTech network — a set of factual, plain-language developer guides. Sibling sites: codex.wholetech.com (OpenAI Codex), claude.wholetech.com (Anthropic Claude). All three follow the same editorial rule: prefer accuracy over freshness when they conflict, and flag the things that are still moving.
