Gemini on the WholeTech network
A complete, factual developer guide

Hello, Gemini.

A plain-language, end-to-end guide to Google's AI for web developers — the Gemini model family, AI Studio, the Gemini API, Vertex AI, Firebase Genkit, MediaPipe, Gemini Nano in Chrome, Google Antigravity, NotebookLM, and the agent protocols that tie them together. What each one is for, when to use which, and the exact commands to type.

Built to be more comprehensive and more factual than its sibling sites at codex.wholetech.com and claude.wholetech.com.

Last updated: May 2026 · Reading time: ~45 min · Hands-on: ~3 hours · Audience: web developers, beginners → builders
01 — Quick Start

Pick where you want to start

Gemini is one model family with many front doors. The fastest is Google AI Studio in your browser: sign in, paste a prompt, get a key. The Gemini API (through @google/genai in Node or google-genai in Python) is what you wire into a Node, Next.js, or Python app. Vertex AI is the same models with enterprise plumbing (VPC-SC, regional residency, IAM). Google Antigravity is Google's standalone agent IDE. The Gemini app at gemini.google.com is the consumer surface. Most developers start in AI Studio and move to the API as soon as a prompt works.

1. Google AI Studio

aistudio.google.com · 0 min setup

Browser playground for every Gemini model. Free tier with generous limits. Tune prompts, attach images/video/audio, compare Flash vs. Pro side-by-side, mint an API key, and export the call as Python, Node, curl, or Swift. The canonical sandbox.

Open aistudio.google.com →

2. Gemini API

@google/genai · ~5 min setup

One npm i @google/genai and a five-line script later, you're calling Gemini from Node. The unified SDK replaces the older @google/generative-ai package and works with both AI Studio keys and Vertex AI.

See the SDK setup →

3. Vertex AI

Google Cloud · enterprise

Same Gemini models, billed to a GCP project, with IAM, VPC Service Controls, regional residency, customer-managed encryption, batch prediction, Model Garden, and a managed agent runtime (Agent Engine). What enterprise security teams want to see.

See Vertex setup →

4. Google Antigravity

agent IDE · macOS · Windows · Linux

Google's agent-first development environment: a standalone desktop app that runs multi-step coding agents across an editor, a browser, and a terminal, and surfaces "Artifacts" (plans, screenshots, browser recordings) so you can verify what the agent did instead of trusting it.

See Antigravity setup →

5. Gemini app & Code Assist

gemini.google.com · IDE plugin

The consumer Gemini app (web, Android, iOS) with Deep Research, Gems, Canvas, and Veo. Gemini Code Assist brings Gemini into VS Code, IntelliJ, and the rest of the JetBrains family with chat, completions, and an agentic mode.

See the app & IDE →
Not sure which to pick? Open AI Studio first — you'll have a working prompt in under a minute and a usable API key in under three. Drop into the API when you want it inside your app, jump to Vertex when a security review asks for IAM and residency, and install Antigravity when you want an agent to actually do the work rather than describe it.
02 — What it actually is

One family, every surface

"Gemini" names three things at once, and untangling them up front saves a lot of confusion later. (1) A family of models: multimodal foundation models from Google DeepMind that come in Pro, Flash, Flash-Lite, and Nano sizes. This guide uses the 2.5 generation throughout, so check the models page for newer releases. (2) A consumer product at gemini.google.com, Google's answer to ChatGPT. (3) A developer platform exposed through Google AI Studio, the Gemini API, and Vertex AI. This site is mostly about (3), with enough of (1) and (2) to keep you oriented.

single API, four sizes

The model family

Gemini Pro for hard problems, Gemini Flash for everyday work, Flash-Lite for cost-sensitive workloads, and Gemini Nano for on-device use. The cloud models are natively multimodal (text, images, audio, video, PDFs) and have a 1M-token input context window across the 2.5 generation.

multimodal · 1M context · tool use
two routes to the same model

AI Studio vs. Vertex AI

AI Studio keys are the fast lane: a free tier with generous limits, then pay-as-you-go billing through the Google Cloud project behind the key once you enable billing. Vertex AI is the enterprise lane: GCP project, IAM, VPC-SC, regional endpoints, and an SLA. The models are the same. What differs is the surrounding controls, the billing, and how quickly new previews arrive.

same models · different controls
The mental model: One model family. Two billing/control surfaces (AI Studio, Vertex). Many SDKs and integrations on top. Pick the surface that matches your governance needs, then pick the smallest model that gets the job done.
03 — Dashboard

The Google AI stack, sorted by web-dev utility

Google ships a lot of AI surface area. For a web developer, the useful split isn't "consumer vs. enterprise" or "old vs. new" — it's where the inference runs and who calls it. Four categories below. Pick the row that matches the problem; pick the product inside it that matches the size.

01

API & Model Orchestration

request → response, in your server

Stateless calls from your backend. Pick a model size, send a prompt (optionally with images, audio, video, or PDFs), and get a response. You can stream tokens, call tools, ground answers on Google Search, or constrain the output to a JSON schema.

Frontier reasoning
Gemini 2.5 Pro
gemini-2.5-pro

Highest-quality long-context reasoning. 1M-token window. Native thinking ("Deep Think" variant for hard problems). Best for code, math, multi-step planning, agent orchestration.

1M context · multimodal · tools
Everyday workhorse
Gemini 2.5 Flash
gemini-2.5-flash

The right default. Faster and cheaper than Pro; near-Pro quality on most tasks. Supports thinking budgets — dial reasoning effort up or down per request.

1M context · fast · tools
Cost-sensitive
Gemini 2.5 Flash-Lite
gemini-2.5-flash-lite

The cheapest token in the family. Classification, extraction, routing, summaries-of-summaries. Great as the first stage in a multi-model pipeline.

cheap · fast
Native image gen
Gemini 2.5 Flash Image
gemini-2.5-flash-image · "Nano Banana"

Conversational image generation and editing. Multi-turn refinement, character consistency, style transfer. Lives inside the same SDK as the chat models.

image gen · editing
Open weights
Gemma 3
gemma-3-{1b,4b,12b,27b}

Google's open-weight family. Self-host on your own hardware, fine-tune freely, run on Ollama or vLLM. Multimodal in the 4B+ sizes; same tokenizer family as Gemini.

open weights · self-host
02

Persistent Cloud Workers

long-running agents, hosted by Google

This is a job, not a single request. You hand Google a goal (plus optional tools, a browser, or a code interpreter), and a managed runtime drives the loop, persists state, and reports back. It is easier to operate than a hand-rolled loop on the stateless API for any task that takes more than one turn or needs to survive a redeploy.

Managed agent runtime
Vertex AI Agent Engine
aiplatform · reasoning engine

Deploy an agent built with the Agent Development Kit (ADK), LangChain, LangGraph, or LlamaIndex onto a managed runtime. Sessions, memory, tracing, autoscaling, IAM. The production target for serious agents.

ADK · sessions · memory
Coding agent
Jules
jules.google.com

Google's async coding agent. Connect a GitHub repo, hand it an issue, walk away. Plans, edits, runs tests in a cloud VM, opens a PR. Best for backlog grooming and "I wonder if this is doable" tasks.

GitHub · async · PR-shaped
Browser agent (research)
Project Mariner
labs.google · research preview

An agent that drives a browser on your behalf: it clicks, types, fills forms, and completes multi-tab workflows. Access is currently through Labs/early access. Google has said these capabilities will reach developers through the Gemini API, so treat the exact SDK and ADK surface as to-be-confirmed.

browser · preview
Batch inference
Gemini Batch API
batches.create · 50% off

Submit thousands of requests as a single job and get results back within 24 hours (the target turnaround) at half the standard per-token price. This is the right surface for nightly summarization, bulk enrichment, and eval runs.

async · 50% discount
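Batch jobs take their requests as a JSONL file. Each line holds one request plus a key you choose, so you can match results back to inputs. The sketch below builds that file. It assumes the `{ key, request }` line shape described in Google's Batch API docs, and the prompts and the `req-` key prefix are illustrative.

```typescript
// Shape of one line in a Gemini Batch API input file, as assumed here: { key, request }.
interface BatchLine {
  key: string; // your own ID; results carry it back so you can join them to inputs
  request: { contents: { role: string; parts: { text: string }[] }[] };
}

// Turn prompts into JSONL: exactly one JSON object per line, newline-separated.
function toBatchJsonl(prompts: string[], keyPrefix = "req"): string {
  return prompts
    .map((text, i): BatchLine => ({
      key: `${keyPrefix}-${i + 1}`,
      request: { contents: [{ role: "user", parts: [{ text }] }] },
    }))
    .map((line) => JSON.stringify(line))
    .join("\n");
}

// Example: two nightly summarization requests (prompt text is illustrative).
const jsonl = toBatchJsonl([
  "Summarize ticket 1001 in one sentence.",
  "Summarize ticket 1002 in one sentence.",
]);
console.log(jsonl);
```

Next, write the string to a .jsonl file, upload it with ai.files.upload, and reference the upload when you call ai.batches.create. Check the SDK reference for the exact source parameter before you rely on it.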
Always-on autonomous (announced)
Gemini Spark ▲ verify
spark · 24/7 ops

Pitched as a 24/7 autonomous operations runtime — agents that wake on schedule or event, run a task, persist state, and report. Treat as forward-looking until a stable SDK surface lands; build the same pattern today on Agent Engine + Cloud Scheduler.

scheduled · autonomous
03

Edge & Browser Deployment

inference on the user's device, no API call

For latency, privacy, or cost reasons, you want the model to run where the user is. Chrome ships Gemini Nano behind a set of built-in Web APIs. MediaPipe gives you the same idea for arbitrary tasks — image classification, hand tracking, on-device LLM inference — across web, Android, and iOS.

In-browser LLM
Gemini Nano in Chrome
Prompt API (LanguageModel) · Summarizer

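On supported hardware, Chrome downloads a small Gemini Nano model and shares it across sites. Summarizer, Translator, and Language Detector are stable JS APIs. Writer, Rewriter, and the Prompt API for free-form use are in origin trials on the web (the Prompt API is stable for extensions). There is no API key and no network call per request, because inference runs against on-device models.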

web API · on-device · free
Cross-platform ML
MediaPipe
@mediapipe/tasks-* packages

Google's on-device ML SDK. It offers drop-in JS packages for image classification, object detection, face/hand/pose landmarking, text classification, and audio classification. It also has an LLM Inference task that runs Gemma and other supported open models (such as Phi-2) locally. On the web, LLM Inference requires WebGPU.

WebGPU · WASM · Android/iOS
Mobile on-device
Gemini Nano on Android
AI Edge SDK · AICore

Pixel 8 Pro and later Pixels, plus a growing list of other Android devices, ship Gemini Nano in AICore. App developers get summarization, smart reply, and a Prompt API through the AI Edge SDK (and ML Kit's GenAI APIs), with no model in the app binary.

AICore · no model bundle
Client-side calls
Firebase AI Logic
firebase/ai · client-safe

Call Gemini from web and mobile client code without exposing an API key. App Check verifies that the request comes from your real app. Firebase's proxy then forwards it server-side to the Gemini Developer API or Vertex AI, whichever provider you configured. This is the right pattern for shipping Gemini in a public web app.

client-safe · App Check
04

Agent Environments

where you build & run agents that act

An "agent" is a loop: model → tool call → result → model. Google ships an opinionated runtime (the Agent Development Kit, ADK), an interoperability protocol (A2A), a hosted runtime (Agent Engine), and a desktop development environment (Antigravity). They compose.
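That loop is small enough to write out by hand. The sketch below makes some assumptions. A hypothetical synchronous `Model` function stands in for a Gemini call, and a scripted fake model lets it run offline. The step cap is the piece hand-rolled loops most often forget.

```typescript
// A minimal agent loop: model -> tool call -> result -> model, with a hard step cap.
// `Model` is a hypothetical interface; in a real app it would wrap a Gemini call.
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelTurn = { text?: string; toolCalls: ToolCall[] };
type Message = { role: "user" | "model" | "tool"; content: string };
type Model = (history: Message[]) => ModelTurn;
type Tool = (args: Record<string, unknown>) => unknown;

function runAgent(model: Model, tools: Record<string, Tool>, goal: string, maxSteps = 8): string {
  const history: Message[] = [{ role: "user", content: goal }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = model(history);
    // No tool calls means the model considers the goal met: return its answer.
    if (turn.toolCalls.length === 0) return turn.text ?? "";
    history.push({ role: "model", content: JSON.stringify(turn.toolCalls) });
    for (const call of turn.toolCalls) {
      const tool: Tool | undefined = tools[call.name];
      // Feed unknown-tool errors back to the model instead of crashing the loop.
      const result = tool ? tool(call.args) : { error: `unknown tool: ${call.name}` };
      history.push({ role: "tool", content: JSON.stringify({ name: call.name, result }) });
    }
  }
  throw new Error(`agent did not finish within ${maxSteps} steps`);
}

// Scripted stand-in for the model: requests the weather once, then answers.
const scriptedModel: Model = (history) => {
  const toolMsg = history.find((m) => m.role === "tool");
  if (!toolMsg) return { toolCalls: [{ name: "get_weather", args: { city: "Austin" } }] };
  const { result } = JSON.parse(toolMsg.content) as { result: { tempF: number } };
  return { text: `It is ${result.tempF}°F in Austin.`, toolCalls: [] };
};

const answer = runAgent(scriptedModel, { get_weather: () => ({ tempF: 91 }) }, "Weather in Austin?");
console.log(answer);
```

If you swap scriptedModel for a wrapper around ai.models.generateContent, the loop shape stays the same. ADK and Genkit implement richer versions of this same loop.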

Agent IDE
Google Antigravity
antigravity.google

Standalone desktop app for building, running, and verifying agents. Editor + browser + terminal, with first-class "Artifacts" — plans, screenshots, browser recordings — so you can audit what the agent did. Multi-agent workspaces, agent-to-agent handoff.

desktop · verifiable · multi-agent
Agent framework
Agent Development Kit (ADK)
google.adk · Python & Java

Open-source framework for defining tools, sessions, memory, planners, and multi-agent orchestration. Model-agnostic (Gemini, Claude, GPT, open weights). Deploys cleanly onto Vertex Agent Engine.

open source · multi-model
Web-dev native
Firebase Genkit
genkit · Node & Go

The TypeScript/Go answer to ADK. Flow-based authoring, built-in tracing, dotprompt files, evals, a local dev UI, and one-command deploys to Cloud Functions or Cloud Run. The cleanest path from a Next.js API route to a production agent.

TypeScript · tracing · dev UI
Interop protocol
A2A (Agent-to-Agent)
a2a-protocol.org

Open protocol for agents from different vendors to discover each other's capabilities and exchange tasks. Pairs with MCP (which standardizes tool-calling) to make multi-vendor agent meshes possible.

protocol · multi-vendor
Commerce for agents (emerging)
Agent Payments Protocol ▲ verify
AP2 · open spec

An emerging open spec, championed by Google and partners, that lets agents complete purchases on a user's behalf. Each purchase is backed by signed mandates that record what the user authorized. You may also see a "Universal Commerce Protocol (UCP)" referenced. Verify whether that is a separate effort or overlaps with AP2 before building against either.

payments · open spec
04 — The model family

Pick the right size

Almost every cost or latency complaint about Gemini traces back to picking the wrong size. Pro for genuinely hard problems; Flash for everything else; Flash-Lite for high-volume routing and extraction; Nano for on-device. Below: the table you actually need.

| Model | Context | Best for | Modalities | Notes |
|---|---|---|---|---|
| gemini-2.5-pro | 1M in · 64K out | Hard reasoning, code, long-document analysis, agent planning | text · image · audio · video · PDF | Native thinking; "Deep Think" mode for the toughest problems. |
| gemini-2.5-flash | 1M in · 64K out | Default for production: chat, summarization, routine tool use | text · image · audio · video · PDF | Configurable thinking budget; trade latency for quality per request. |
| gemini-2.5-flash-lite | 1M in · 64K out | Routing, classification, extraction, cheap first stage | text · image · audio · video · PDF | Cheapest in the family. Pair with Flash/Pro as a second stage. |
| gemini-2.5-flash-image | n/a | Conversational image generation & editing ("Nano Banana") | text + image → image | Multi-turn refinement; preserves identity across edits. |
| gemini-2.0-flash-live | streaming | Realtime voice + video, Project Astra-style interactions | audio in/out · video in | WebSocket-based Live API built for low-latency, interruptible conversation. |
| gemini-nano | Small; varies, so check the session's inputQuota | On-device: Chrome built-in AI, Android AICore | text (image on newer revisions) | No API call. Used by the Prompt API, Summarizer, Writer, Rewriter. |
| imagen-4 | n/a | High-fidelity text-to-image when Nano Banana isn't enough | text → image | Stronger at typography and photorealism than the Flash Image model. |
| veo-3 | n/a | Text- and image-to-video with native audio | text/image → video + audio | 8-second clips in the API; longer sequences in the Flow product. |
| lyria-2 | n/a | Text-to-music for soundtracks, jingles, generative audio | text → audio | Lyria 2 is served on Vertex AI; the Gemini API offers Lyria RealTime (experimental) for streaming music. |
| gemma-3 | up to 128K (32K on 1B) | Open-weight, self-hosted Gemini sibling: Ollama, vLLM, llama.cpp | text · image (4B+) | Sizes: 1B, 4B, 12B, 27B. Same tokenizer family as Gemini. |
Model IDs drift. Google revises model IDs faster than any documentation site can keep up. Minor releases, "-latest" aliases, and date-suffixed preview variants come and go. In production, pin a stable model ID (e.g. gemini-2.5-flash rather than a -latest alias or a dated -preview ID). Check ai.google.dev/gemini-api/docs/models before adopting anything time-sensitive.
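The sizing advice above fits in a lookup table. The sketch below is a rule of thumb, not a Google recommendation. The task labels are hypothetical names you would define yourself, the model IDs come from the table, and the one-step escalation policy is an illustrative choice.

```typescript
// Hypothetical task labels grouped by the sizing advice above.
type Task =
  | "route" | "classify" | "extract"        // high-volume, cheap first stage
  | "chat" | "summarize" | "tool-use"       // everyday production work
  | "hard-reasoning" | "code" | "long-doc"; // genuinely hard problems

// Keep this mapping in one place so pinning a new model ID is a one-line change.
const MODEL_FOR: Record<Task, string> = {
  route: "gemini-2.5-flash-lite",
  classify: "gemini-2.5-flash-lite",
  extract: "gemini-2.5-flash-lite",
  chat: "gemini-2.5-flash",
  summarize: "gemini-2.5-flash",
  "tool-use": "gemini-2.5-flash",
  "hard-reasoning": "gemini-2.5-pro",
  code: "gemini-2.5-pro",
  "long-doc": "gemini-2.5-pro",
};

// Escalate one size up when the cheaper model's answer fails your own quality check.
function pickModel(task: Task, escalate = false): string {
  const base = MODEL_FOR[task];
  if (!escalate) return base;
  if (base === "gemini-2.5-flash-lite") return "gemini-2.5-flash";
  return "gemini-2.5-pro";
}

console.log(pickModel("classify"));       // gemini-2.5-flash-lite
console.log(pickModel("classify", true)); // gemini-2.5-flash
```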
05 — The Gemini API

From zero to streaming in five lines

The unified Google GenAI SDK (@google/genai for Node, google-genai for Python) is the SDK you want. It supersedes the older @google/generative-ai package and speaks to both the Gemini Developer API (AI Studio keys) and Vertex AI through the same surface.

Install & first call (Node)

terminal
# 1. install
$ npm install @google/genai

# 2. get a key from aistudio.google.com → "Get API key"
$ export GEMINI_API_KEY="AIza..."

# 3. five-line program
$ node --input-type=module -e '
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({});
const r = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "Write a haiku about TypeScript."
});
console.log(r.text);
'
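The SDK is a thin layer over a REST endpoint. Seeing the raw request helps when you debug from curl or call Gemini from a runtime without the SDK. The sketch below only builds the request and never sends it. It assumes the v1beta `models/{model}:generateContent` endpoint and the `x-goog-api-key` header from Google's REST reference.

```typescript
// Build (but don't send) the REST request behind ai.models.generateContent.
interface GenerateRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildGenerateContentRequest(model: string, prompt: string, apiKey: string): GenerateRequest {
  const base = "https://generativelanguage.googleapis.com/v1beta";
  return {
    url: `${base}/models/${encodeURIComponent(model)}:generateContent`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "x-goog-api-key": apiKey, // server-side only: never ship this key to a browser
      },
      body: JSON.stringify({ contents: [{ role: "user", parts: [{ text: prompt }] }] }),
    },
  };
}

const req = buildGenerateContentRequest("gemini-2.5-flash", "Write a haiku about TypeScript.", "AIza...");
console.log(req.url);
// On a server: const res = await fetch(req.url, req.init); const json = await res.json();
```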

Streaming, tools, structured output, multimodal

The four things you'll reach for most often, in one place:

examples.ts
import { GoogleGenAI, Type } from "@google/genai";
// Reads GEMINI_API_KEY from the environment (server-side only)
const ai = new GoogleGenAI({});

// 1. STREAMING: token-by-token output, e.g. inside a Next.js route handler
const stream = await ai.models.generateContentStream({
  model: "gemini-2.5-flash",
  contents: "Explain rate limiting in one paragraph."
});
// chunk.text can be undefined for chunks that carry no text
for await (const chunk of stream) process.stdout.write(chunk.text ?? "");

// 2. TOOL USE: let the model ask for your functions
const r = await ai.models.generateContent({
  model: "gemini-2.5-pro",
  contents: "What's the weather in Austin right now?",
  config: {
    tools: [{ functionDeclarations: [{
      name: "get_weather",
      description: "Current weather for a city",
      parameters: { type: Type.OBJECT, properties: {
        city: { type: Type.STRING } }, required: ["city"] }
    }]}]
  }
});
// The model doesn't execute anything: read the requested call(s), run get_weather
// yourself, and send the result back in a follow-up request.
console.log(r.functionCalls);

// 3. STRUCTURED OUTPUT: JSON that matches a schema, no parsing hacks
const typed = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "List 3 capital cities with population.",
  config: {
    responseMimeType: "application/json",
    responseSchema: { type: Type.ARRAY, items: { type: Type.OBJECT,
      properties: { city: {type: Type.STRING}, pop: {type: Type.INTEGER}},
      required: ["city", "pop"] }}
  }
});
const cities = JSON.parse(typed.text ?? "[]") as { city: string; pop: number }[];

// 4. MULTIMODAL: feed a PDF, an image, audio, or a YouTube URL
const file = await ai.files.upload({
  file: "./contract.pdf",
  config: { mimeType: "application/pdf" }
});
const summary = await ai.models.generateContent({
  model: "gemini-2.5-pro",
  contents: [
    { fileData: { fileUri: file.uri!, mimeType: file.mimeType! }},
    "Summarize the indemnity clauses."
  ]
});
console.log(summary.text);
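The tool-use call above only returns the model's request. Your code runs the function and sends the result back. The sketch below shows that middle step without dependencies. The part shapes mirror the API's `functionCall`/`functionResponse` parts and are declared locally so this runs without the SDK. The `get_weather` handler and its data are hypothetical.

```typescript
// Local stand-ins for the API's function-calling part shapes, so this runs without the SDK.
interface FunctionCall { name?: string; args?: Record<string, unknown> }
interface FunctionResponsePart {
  functionResponse: { name: string; response: Record<string, unknown> };
}
type Handler = (args: Record<string, unknown>) => Record<string, unknown>;

// Hypothetical implementation of the get_weather tool declared in example 2.
const handlers: Record<string, Handler | undefined> = {
  get_weather: (args) => ({ city: args.city, tempF: 91, conditions: "sunny" }),
};

// Run each call the model requested and wrap the results as functionResponse parts.
// Following the API reference's convention: results under "output", failures under "error".
function runFunctionCalls(calls: FunctionCall[] | undefined): FunctionResponsePart[] {
  return (calls ?? []).map((call) => {
    const name = call.name ?? "";
    const handler = handlers[name];
    const response = handler
      ? { output: handler(call.args ?? {}) }
      : { error: `no handler for ${name}` };
    return { functionResponse: { name, response } };
  });
}

// In the app: const parts = runFunctionCalls(r.functionCalls);
const parts = runFunctionCalls([{ name: "get_weather", args: { city: "Austin" } }]);
console.log(JSON.stringify(parts));
```

To finish the exchange, append the model's turn and `{ role: "user", parts }` to your contents. Then call generateContent again, and the model answers in plain text.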

What's actually new in the unified SDK

| Capability | What it does | Why it matters |
|---|---|---|
| ai.models.generateContent | Single entry point for chat, completion, multimodal, structured output, and tools. | No more juggling four method names. |
| ai.live.connect | Bidirectional WebSocket for voice/video (Project Astra-style). | Low-latency streaming for realtime agents. |
| ai.files.upload | 48-hour file store for PDFs, audio, video. | Cuts request size; lets you reference one upload across turns. |
| ai.batches.create | Submit thousands of requests as one job; pay 50% less. | The right path for any async/offline workload. |
| ai.caches.create | Explicit context caching. Cached input is billed at a reduced rate (roughly 25% of the input price at the time of writing) plus hourly storage. | Long system prompts & document grounding get dramatically cheaper. |
| config.thinkingConfig | Per-call thinkingBudget: 0 turns thinking off on Flash and Flash-Lite (Pro can't disable it), -1 lets the model decide, and a positive number caps thinking tokens. | Trade latency for quality without switching models. |
| config.tools: googleSearch | Built-in grounding on Google Search results. | Citations & freshness without standing up a RAG stack. |
| config.tools: urlContext | Fetch & reason over URLs in the prompt at request time. | One-liner web reading without writing a scraper. |
| config.tools: codeExecution | Sandboxed Python execution inside the response loop. | Math, data analysis, code-correctness checks in one round-trip. |
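The batch and caching discounts are easiest to reason about with numbers. Below is a back-of-envelope estimator. It assumes you pass in per-million-token prices; the ones below are made up for illustration. It also assumes batch pricing halves the whole bill. Whether the two discounts stack for a given model is an assumption to check on the pricing page, and hourly cache storage isn't modeled.

```typescript
// Tokens per request, split by how they are billed.
interface Workload {
  requests: number;
  promptTokens: number; // input tokens that change on every call
  cachedTokens: number; // input tokens served from an explicit cache
  outputTokens: number; // tokens generated per request
}
// Per-million-token prices you supply from the pricing page.
interface Prices { inputPerM: number; cachedInputPerM: number; outputPerM: number }

function estimateCostUSD(w: Workload, p: Prices, batch = false): number {
  const perM = (tokens: number, pricePerM: number) => (tokens / 1_000_000) * pricePerM;
  const perRequest =
    perM(w.promptTokens, p.inputPerM) +
    perM(w.cachedTokens, p.cachedInputPerM) +
    perM(w.outputTokens, p.outputPerM);
  // Assumption: batch halves the whole bill, including cached input. Cache storage not modeled.
  return perRequest * w.requests * (batch ? 0.5 : 1);
}

// Hypothetical prices for illustration only; the cached rate is set to 25% of input here.
const prices: Prices = { inputPerM: 1.0, cachedInputPerM: 0.25, outputPerM: 4.0 };
const base = { requests: 1000, promptTokens: 2_000, outputTokens: 500 };

const uncached = estimateCostUSD({ ...base, promptTokens: 102_000, cachedTokens: 0 }, prices);
const cached = estimateCostUSD({ ...base, cachedTokens: 100_000 }, prices);
const cachedBatch = estimateCostUSD({ ...base, cachedTokens: 100_000 }, prices, true);
console.log(uncached.toFixed(2), cached.toFixed(2), cachedBatch.toFixed(2));
```

The gap between the first two numbers grows with the size of the shared prefix. That is why caching pays off most for long system prompts and grounding documents.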
Migrating from @google/generative-ai? The old package is deprecated and no longer gets new features. The new SDK reorganizes calls under ai.models.*, ai.files.*, ai.caches.*, etc. Model IDs carry over, and the response changes are small; for example, response.text is now a property rather than a method. For most apps the migration is a short search-and-replace. Do it before you build anything new.

Where keys come from

AI Studio key
Free tier with daily limits. Best for prototyping and indie projects. Gotcha: never expose the key in client code. The Gemini API has no App Check of its own, so anyone can copy and reuse a shipped key. Keep AI Studio keys on the server, or go through Firebase AI Logic.
Vertex auth
Application Default Credentials, a service account, or Workload Identity. Pass vertexai: true, project, location to the SDK constructor.
Firebase AI Logic
The supported way to call Gemini from web/mobile client code. It uses App Check to verify each request, then routes it through your Firebase project to the Gemini Developer API or Vertex AI.
06 — Vertex AI

The enterprise lane

Vertex AI is the same Gemini models with the controls your security team will ask about: IAM, audit logging, VPC Service Controls, CMEK, regional residency, and an SLA. If you'd answer "yes" to "does anyone need to approve this AI vendor?", you probably want Vertex.

What Vertex gives you over AI Studio

  • IAM — per-principal allow/deny on every model call.
  • VPC Service Controls — keep prompts & outputs inside your perimeter.
  • Regional endpoints — pick where inference runs (US, EU, Asia, and more).
  • CMEK — customer-managed encryption keys for at-rest data.
  • Provisioned Throughput — buy reserved capacity for predictable latency.
  • Model Garden — managed access to Claude, Llama, Mistral, and others.
  • Agent Engine — hosted runtime for ADK/LangGraph/LlamaIndex agents.
  • Tuning — supervised fine-tuning on Gemini Flash and Gemma.

What AI Studio gives you that Vertex doesn't

  • A free tier you can sign up for in 30 seconds.
  • The simplest possible auth — a single bearer key.
  • A polished prompt-authoring UI.
  • Day-one access to brand-new model previews (Vertex usually lags by days to weeks).
  • No GCP project required — useful for hackathons, demos, and side projects.

Switching the SDK to Vertex

vertex.ts
import { GoogleGenAI } from "@google/genai";

// Same SDK: swap the API key for three Vertex options
const ai = new GoogleGenAI({
  vertexai: true,
  project: "my-gcp-project",
  location: "us-central1"     // or "global" for the multi-region router
});

// auth is ADC: `gcloud auth application-default login` locally,
// the metadata server in Cloud Run/Functions, or a service-account JSON.
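You can drive the constructor switch above from environment variables, so one build runs against AI Studio locally and Vertex in production. The sketch below uses the variable names the SDK documents: GOOGLE_GENAI_USE_VERTEXAI, GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_LOCATION, and GEMINI_API_KEY. The "global" default location is this sketch's own choice.

```typescript
// Options in the shape the GoogleGenAI constructor accepts:
// { apiKey } for AI Studio, { vertexai, project, location } for Vertex.
type GenAIOptions =
  | { apiKey: string }
  | { vertexai: true; project: string; location: string };

function genAIOptionsFromEnv(env: Record<string, string | undefined>): GenAIOptions {
  if ((env.GOOGLE_GENAI_USE_VERTEXAI ?? "").toLowerCase() === "true") {
    const project = env.GOOGLE_CLOUD_PROJECT;
    if (!project) throw new Error("GOOGLE_CLOUD_PROJECT is required for Vertex AI");
    // Vertex auth comes from ADC, so no key is passed; "global" is this sketch's default.
    return { vertexai: true, project, location: env.GOOGLE_CLOUD_LOCATION ?? "global" };
  }
  const apiKey = env.GEMINI_API_KEY;
  if (!apiKey) throw new Error("GEMINI_API_KEY is required for the Gemini Developer API");
  return { apiKey };
}

// Usage in the app: new GoogleGenAI(genAIOptionsFromEnv(process.env))
console.log(genAIOptionsFromEnv({ GEMINI_API_KEY: "AIza..." }));
```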

Agent Engine in one paragraph

Agent Engine is Vertex's managed runtime for agent frameworks. You ship an ADK, LangGraph, LangChain, LlamaIndex, or CrewAI agent; Agent Engine handles sessions, memory, tracing, autoscaling, and gives you a stable HTTPS endpoint. The mental model is "Cloud Run for agents, with sessions baked in" — no need to stand up Redis for chat history, a queue for long jobs, or your own OTel pipeline for traces.

07 — Firebase Genkit

The web-developer-native agent framework

Genkit is the framework most web developers want and don't know to ask for. TypeScript-first (Go and Python in beta), flow-based authoring, a local dev UI for inspecting traces, dotprompt files for prompt-as-code, evals, and one-command deploys to Cloud Functions or Cloud Run. It composes with Gemini, Vertex, Claude, OpenAI, Ollama, and any model behind an OpenAI-shaped API.

terminal
# scaffold a new Genkit project (Node)
$ npm init -y && npm pkg set type=module
$ npm install genkit @genkit-ai/google-genai
$ npm install -D genkit-cli tsx typescript
$ export GEMINI_API_KEY="AIza..."

# start the local dev UI: inspect every flow run, every trace
$ npx genkit start -- npx tsx --watch src/index.ts

A minimal flow

src/index.ts
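import { genkit, z } from "genkit";
import { googleAI } from "@genkit-ai/google-genai";

// googleAI() reads GEMINI_API_KEY from the environment
const ai = genkit({
  plugins: [googleAI()],
  model: googleAI.model("gemini-2.5-flash")
});

// One schema shared by the flow's output and the model's structured output
const Bullets = z.object({ bullets: z.array(z.string()) });

export const summarize = ai.defineFlow({
  name: "summarize",
  inputSchema: z.object({ url: z.string().url() }),
  outputSchema: Bullets
}, async ({ url }) => {
  // Raw HTML is fine for a demo; strip markup first in production to save tokens
  const page = await fetch(url).then(r => r.text());
  const { output } = await ai.generate({
    prompt: `Summarize this page in 5 bullets:\n\n${page}`,
    output: { schema: Bullets }
  });
  // output can be null if no structured output could be parsed
  if (!output) throw new Error("Model returned no structured output");
  return output;
});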

Why this is the right choice for a Next.js app

08 — Google Antigravity

An IDE built for agents, not autocomplete

Antigravity is Google's standalone agent-development desktop app — Mac, Windows, Linux. It's not a VS Code extension; it's a separate workspace where the unit of work is an agent task, not a file open in a tab. The agent gets a planner, an editor, a managed browser, and a terminal; you get "Artifacts" (a plan, a recording of the browser session, a diff, screenshots of every important step) so you can verify what happened instead of trusting the chat log.

What makes it different from "an agent in VS Code"

Antigravity 2.0 status: "Antigravity 2.0" was reportedly announced at I/O 2026 (mid-May). Confirm specific 2.0 features against antigravity.google and Google's I/O recap before quoting them in production planning. The capabilities described above are stable across the 1.x line.

When to reach for Antigravity vs. Gemini Code Assist

Antigravity wins when…

  • The task spans editor + browser + terminal.
  • You want to run several agents at once and pick the best result.
  • You need to audit what an agent did, not just see the diff.
  • You're building agents and want a workbench tuned for that.

Code Assist wins when…

  • You live in VS Code or a JetBrains IDE and don't want to leave.
  • You want completions and chat in the same window as your code.
  • You're touching a repo with strict tooling (debuggers, native extensions) that's hard to move.
  • Your org already has Code Assist licensed and configured.
09 — Gemini Nano & Chrome built-in AI

An LLM inside the browser, free

Chrome can run Gemini Nano inside the browser. There is no API key, no network call, and no per-token cost. Inference runs on the user's machine against a model the browser downloads once and shares across sites. The JS APIs are designed for a web developer's mental model: await Summarizer.create(), then summarizer.summarize(text).

The current API surface

| API | Use it for | Status |
|---|---|---|
| Summarizer | Article summaries, meeting notes, TL;DRs. | Stable (Chrome 138+) |
| Writer | Generate net-new text from a prompt & context. | Origin trial |
| Rewriter | Tone shift, length change, simplify, formalize. | Origin trial |
| Translator | On-device translation across major languages. | Stable (Chrome 138+) |
| LanguageDetector | Detect the language of arbitrary text. | Stable (Chrome 138+) |
| LanguageModel (Prompt API) | General-purpose chat against Nano. Free-form prompts. | Stable in Chrome Extensions; origin trial / EPP on the web |
| Proofreader | Grammar & style suggestions over a span of text. | Origin trial |

A real example

summarize.js
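// 1. check availability: the model may need to download on first use
if (!("Summarizer" in self)) throw new Error("Summarizer API not supported in this browser");
const status = await Summarizer.availability();
// "unavailable" | "downloadable" | "downloading" | "available"
if (status === "unavailable") throw new Error("On-device summarization unavailable");

// 2. create a session with the parameters you want
// (if status is "downloadable", call create() from a user gesture such as a click)
const s = await Summarizer.create({
  type: "key-points",
  format: "markdown",
  length: "short",
  monitor(m) { m.addEventListener("downloadprogress", e => {
    // e.loaded is a fraction between 0 and 1
    console.log(`${(e.loaded * 100).toFixed(0)}%`);
  }); }
});

// 3. use it (longArticleText is your own input string)
const summary = await s.summarize(longArticleText);

// 4. stream if you want incremental output (ui.append stands in for your rendering code)
const stream = s.summarizeStreaming(longArticleText);
for await (const chunk of stream) ui.append(chunk);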
The killer use case: client-side features that used to require a paid API — summarizing user-pasted content, translating UI strings, rewriting form drafts, grading the tone of an outgoing message — now run for free on the user's machine, with the user's privacy preserved by default.
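Each of the four availability states calls for different UI. The sketch below maps them to a plan. It assumes you have your own server-side fallback, which is hypothetical here, for when the API is missing or unavailable. The global lookup is guarded so the same module also loads in browsers without built-in AI.

```typescript
// Map the four Summarizer.availability() states to a UI plan.
// "use-server" means your own fallback endpoint (hypothetical here).
type Availability = "unavailable" | "downloadable" | "downloading" | "available";
type Plan = "use-on-device" | "download-on-click" | "wait-for-download" | "use-server";

function planFor(availability: Availability | null): Plan {
  switch (availability) {
    case "available":
      return "use-on-device";
    case "downloadable":
      // create() must be called from a user gesture to start the model download.
      return "download-on-click";
    case "downloading":
      return "wait-for-download";
    default:
      // "unavailable", or null when this browser has no Summarizer API at all.
      return "use-server";
  }
}

// Feature-detect before calling: Summarizer is a global only where Chrome exposes it.
async function detectSummarizerPlan(): Promise<Plan> {
  const api = (globalThis as unknown as {
    Summarizer?: { availability(): Promise<Availability> };
  }).Summarizer;
  if (!api) return planFor(null);
  return planFor(await api.availability());
}

detectSummarizerPlan().then((plan) => console.log(plan));
```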

Things to know before you ship

10 — MediaPipe

On-device ML for everything that isn't a chat model

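MediaPipe is Google's cross-platform on-device ML SDK. For a web developer it's a set of @mediapipe/tasks-* npm packages: import one, point it at a model file, and call the task's method (.detect(), .classify(), and so on). Vision, text, and audio tasks run on WebAssembly with optional GPU acceleration, while the LLM Inference task needs WebGPU.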

The task catalog

| Domain | Tasks | Package |
|---|---|---|
| Vision | Object detection, image classification, image segmentation, face landmarker, hand landmarker, pose landmarker, gesture recognizer, image embedder, interactive segmentation | @mediapipe/tasks-vision |
| Text | Text classification, text embedder, language detector | @mediapipe/tasks-text |
| Audio | Audio classification, audio embedder | @mediapipe/tasks-audio |
| GenAI | LLM Inference (Gemma, Phi, Falcon); the Image Generation task is Android-only | @mediapipe/tasks-genai |

LLM Inference in the browser (Gemma, locally)

llm.ts
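import { FilesetResolver, LlmInference }
  from "@mediapipe/tasks-genai";

// Load the WASM runtime for the GenAI tasks (LLM Inference needs WebGPU in the browser)
const genai = await FilesetResolver.forGenAiTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm"
);

// modelAssetPath points at a web-compatible Gemma .task file you host yourself
// (the path below is illustrative)
const llm = await LlmInference.createFromOptions(genai, {
  baseOptions: { modelAssetPath: "/models/gemma-3-1b-it.task" },
  maxTokens: 1024,   // combined input + output token budget
  topK: 40, temperature: 0.8
});

const reply = await llm.generateResponse("why is the sky blue?");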
Pick MediaPipe over the Chrome Prompt API when… you need a specific model (not whatever Chrome ships), broader browser support, or capabilities the Chrome APIs don't expose (e.g. embeddings, image classification, hand tracking). Pick the Chrome Prompt API when you want zero model download for the user and you're targeting recent Chrome/Edge.
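The rule of thumb above comes down to a few runtime checks. The sketch below assumes you detect the Prompt API through the LanguageModel global and WebGPU through navigator.gpu. The `Needs` flags and the server fallback are hypothetical inputs from your own app.

```typescript
interface ClientCaps {
  hasPromptApi: boolean; // Chrome's built-in LanguageModel global is present
  hasWebGPU: boolean;    // navigator.gpu is present (MediaPipe LLM Inference needs it)
}
interface Needs {
  specificModel: boolean; // you must control exactly which LLM runs
  nonChatTask: boolean;   // embeddings, image classification, hand tracking, ...
}
type Backend = "chrome-prompt-api" | "mediapipe" | "server";

function pickBackend(caps: ClientCaps, needs: Needs): Backend {
  // Non-LLM MediaPipe tasks run on WebAssembly, so they don't depend on WebGPU.
  if (needs.nonChatTask) return "mediapipe";
  // A specific local LLM means MediaPipe LLM Inference, which needs WebGPU on the web.
  if (needs.specificModel) return caps.hasWebGPU ? "mediapipe" : "server";
  // Otherwise prefer Chrome's built-in model: no model file for you to ship.
  if (caps.hasPromptApi) return "chrome-prompt-api";
  return caps.hasWebGPU ? "mediapipe" : "server";
}

// Runtime detection, guarded so it also works where these globals don't exist.
function detectCaps(): ClientCaps {
  const g = globalThis as unknown as { LanguageModel?: unknown; navigator?: { gpu?: unknown } };
  return { hasPromptApi: g.LanguageModel !== undefined, hasWebGPU: g.navigator?.gpu !== undefined };
}

console.log(pickBackend(detectCaps(), { specificModel: false, nonChatTask: false }));
```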
11 — Persistent autonomy

Gemini Spark & what you can build today

Gemini Spark has been described as a 24/7 autonomous operations runtime announced at I/O 2026. As of this writing, the public, stable surface for "always-on, schedule-driven Gemini agents" is Vertex Agent Engine combined with Cloud Scheduler and Cloud Tasks, with the ADK or Genkit providing the agent loop. If Spark ships as a packaged SKU, the pattern below is likely what it would encapsulate.

Verify before quoting in a proposal. "Gemini Spark" as a named product wasn't confirmed in Google's documentation when this page was written. Treat any specific Spark feature claims (pricing, regions, exact SDK names) as to-be-confirmed; the architectural pattern below is real and works today regardless of what Google brands it.

The pattern: a persistent worker, today

  1. Define the agent as a Genkit flow or ADK agent. Tools, model, system prompt, optional memory backend (Firestore, Vertex AI Memory Bank).
  2. Deploy to Cloud Run or Vertex Agent Engine. Both give you a stable HTTPS endpoint and autoscaling; Cloud Run can also scale to zero between invocations.
  3. Schedule with Cloud Scheduler for periodic runs, Eventarc for event-triggered runs, or Pub/Sub for fan-out.
  4. Persist state in Firestore (or the Agent Engine sessions API). Pass a session ID into every invocation; the agent picks up where it left off.
  5. Observe with Cloud Trace + Logs Explorer. ADK and Genkit both emit OpenTelemetry spans that you can export to Cloud Trace, so agent runs show up as traces you can drill into.
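  6. Cap with budgets, quotas & loop limits. Set project quota limits and a billing budget alert (alerts notify you; they don't stop spend). Also set a maximum iteration count in the agent itself so a stuck loop can't run away.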
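Steps 1, 2, 4, and 6 come together in the handler that each scheduled run invokes. Here is a sketch of that handler, with some assumptions. An in-memory store stands in for Firestore or Agent Engine sessions, and `doWork` is a hypothetical placeholder for the agent call. The batch size and run cap are arbitrary.

```typescript
// Persistent state for one long-running job, keyed by session ID.
interface WorkerState { sessionId: string; runs: number; cursor: number; done: boolean }

interface StateStore {
  load(id: string): WorkerState | undefined;
  save(s: WorkerState): void;
}

// In-memory stand-in for Firestore or the Agent Engine sessions API.
class MemoryStore implements StateStore {
  private data = new Map<string, WorkerState>();
  load(id: string): WorkerState | undefined {
    const s = this.data.get(id);
    return s ? { ...s } : undefined;
  }
  save(s: WorkerState): void {
    this.data.set(s.sessionId, { ...s });
  }
}

const MAX_RUNS = 100; // hard cap so a stuck job can't run forever

// One scheduled invocation: load state, do one bounded unit of work, save state.
function invoke(store: StateStore, sessionId: string, items: string[],
                doWork: (item: string) => void, batchSize = 2): WorkerState {
  const state = store.load(sessionId) ?? { sessionId, runs: 0, cursor: 0, done: false };
  if (state.done || state.runs >= MAX_RUNS) return state;
  // Pick up where the previous invocation left off.
  const batch = items.slice(state.cursor, state.cursor + batchSize);
  batch.forEach((item) => doWork(item));
  state.cursor += batch.length;
  state.runs += 1;
  state.done = state.cursor >= items.length;
  store.save(state); // persist before returning so a redeploy loses nothing
  return state;
}

// Simulate five scheduler ticks against the same session.
const store = new MemoryStore();
const processed: string[] = [];
const items = ["a", "b", "c", "d", "e"];
for (let tick = 0; tick < 5; tick++) invoke(store, "nightly-digest", items, (i) => processed.push(i));
console.log(processed.join(","), store.load("nightly-digest"));
```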
12 — Deep dive

Building for the Agentic Web

How to optimize your web apps for autonomous AI consumers — schemas, API structures, and making e-commerce platforms readable for agents that buy without rendering your CSS.

The shift

For 25 years the web optimized for one consumer: a human reading a screen. Page layouts, ad units, paywalls, dark patterns, infinite scroll — all of it was tuned for an eyeball moving across pixels. That consumer is no longer alone. A second consumer is now showing up to your origin: an autonomous agent acting on behalf of a person who never loaded your page. It doesn't render your CSS. It doesn't see your hero image. It doesn't accept your cookie banner. It reads structured data, calls JSON endpoints, follows links your sitemap promised it could, and abandons your domain the moment it can't find what it needs.

The web apps that win the agentic decade will be the ones that noticed the second consumer and shipped for it. The rest will keep getting traffic that bounces in milliseconds.

Three things agents need that humans don't

need 01

A machine-readable contract

An agent should not have to scrape your HTML to learn your name, prices, inventory, operating hours, or return policy. Schema.org JSON-LD, well-formed feeds, and a stable JSON API can say "this is what I am" in 200 lines, so the agent doesn't have to guess.

need 02

An entry point it can find

Humans land on your homepage. Agents land on whatever you advertised in /.well-known/, robots.txt, your sitemap, and your llms.txt. If those files don't exist or contradict each other, you've handed the agent a coin flip.

need 03

An action it can take

Reading is the easy half. Buying, booking, reserving, submitting — the action the user actually asked for — requires a predictable, authenticated, agent-friendly API. If the only way to check out is to click through three JS-heavy modals, you're invisible to the buyer.

Layer 1: Structured data the agent will trust

Schema.org JSON-LD is the lowest-cost, highest-leverage thing you can ship. Every page should describe itself. For an e-commerce product page, the contract is roughly:

product.html — <head>
<script type="application/ld+json">
{
  "@context": "https://schema.org/",
  "@type": "Product",
  "sku": "AX-104-BLK",
  "name": "Field Notebook, A5, black",
  "image": ["https://cdn.example.com/ax104/main.jpg"],
  "description": "96-page lay-flat field notebook...",
  "brand": { "@type": "Brand", "name": "Acme Stationery" },
  "offers": {
    "@type": "Offer",
    "url": "https://example.com/p/ax-104-blk",
    "priceCurrency": "USD",
    "price": "18.00",
    "priceValidUntil": "2026-12-31",
    "availability": "https://schema.org/InStock",
    "itemCondition": "https://schema.org/NewCondition",
    "shippingDetails": { "@type": "OfferShippingDetails",
      "shippingRate": { "@type": "MonetaryAmount",
        "value": "4.95", "currency": "USD" },
      "shippingDestination": { "@type": "DefinedRegion",
        "addressCountry": "US" } },
    "hasMerchantReturnPolicy": { "@type": "MerchantReturnPolicy",
      "applicableCountry": "US",
      "returnPolicyCategory": "https://schema.org/MerchantReturnFiniteReturnWindow",
      "merchantReturnDays": 30 }
  },
  "aggregateRating": { "@type": "AggregateRating",
    "ratingValue": "4.7", "reviewCount": "312" }
}
</script>
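This markup stays accurate most easily when it is generated from the same product record that drives the page rather than written by hand. A minimal TypeScript sketch, assuming a hypothetical `ProductRecord` shape and a placeholder `https://example.com` origin; it covers only a subset of the fields shown above, and it escapes `<` so a stray `</script>` inside product text cannot break out of the tag.

```typescript
// Hypothetical internal product record: the single source of truth that the
// page template, the JSON API, and the JSON-LD would all read from.
interface ProductRecord {
  sku: string;
  slug: string;
  name: string;
  description: string;
  brand: string;
  imageUrls: string[];
  priceCents: number; // non-negative integer cents, e.g. 1800 for $18.00
  currency: string;   // ISO 4217 code, e.g. "USD"
  inStock: boolean;
}

const SITE_ORIGIN = "https://example.com"; // placeholder origin

// Integer arithmetic avoids binary floating-point surprises: 1800 -> "18.00".
function formatCents(cents: number): string {
  return `${Math.floor(cents / 100)}.${String(cents % 100).padStart(2, "0")}`;
}

// Build a subset of the Product JSON-LD shown above from one record.
function toProductJsonLd(p: ProductRecord): Record<string, unknown> {
  return {
    "@context": "https://schema.org/",
    "@type": "Product",
    sku: p.sku,
    name: p.name,
    image: p.imageUrls,
    description: p.description,
    brand: { "@type": "Brand", name: p.brand },
    offers: {
      "@type": "Offer",
      url: `${SITE_ORIGIN}/p/${p.slug}`,
      priceCurrency: p.currency,
      price: formatCents(p.priceCents),
      availability: p.inStock ? "https://schema.org/InStock" : "https://schema.org/OutOfStock",
    },
  };
}

// Serialize for the <head>. Escaping "<" as \u003c keeps a product field that
// contains "</script>" from closing the tag early; JSON parsers decode it back.
function jsonLdScriptTag(data: Record<string, unknown>): string {
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}
```

Because the API handler and the template would call the same record, a price change cannot drift between the JSON-LD and `/api/v1/products/*`.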

Five rules for product schema that agents actually use:

Layer 2: A machine-shaped API that mirrors your UI

If your website is an e-commerce front, your API is your agent surface. Any action a human can take with a click should be possible with an authenticated JSON request. Treat your API as a first-class product, not as a leftover.

| UI action | Agent equivalent | Shape |
|---|---|---|
| Browse a category | List products in a category | GET /api/v1/products?category=notebooks&limit=50&cursor=… |
| View a product | Fetch full product detail | GET /api/v1/products/ax-104-blk |
| Check stock | Authoritative inventory snapshot | GET /api/v1/products/ax-104-blk/availability |
| Add to cart | Create an order intent | POST /api/v1/order-intents |
| Get a quote | Price a basket with shipping & tax | POST /api/v1/quotes → totals, breakdown, expires_at |
| Check out | Confirm with a signed payment mandate | POST /api/v1/orders with attached mandate |
| Track | Order status & tracking | GET /api/v1/orders/{id} |
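The `cursor=…` parameter in the first row works best for agents when it is opaque and stable. A sketch of keyset (cursor) pagination over an in-memory list, assuming hypothetical `Product` records ordered by slug; a real handler would push the same `slug > after` comparison into its database query. Because the cursor records the last slug seen rather than an offset, inserting items between calls does not cause duplicates.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical minimal product shape for the listing endpoint.
interface Product { slug: string; category: string; }

interface Page { items: Product[]; next_cursor: string | null; }

// The cursor is base64url-encoded JSON so clients treat it as opaque.
function encodeCursor(lastSlug: string): string {
  return Buffer.from(JSON.stringify({ after: lastSlug })).toString("base64url");
}

function decodeCursor(cursor: string): string {
  const parsed = JSON.parse(Buffer.from(cursor, "base64url").toString("utf8"));
  if (typeof parsed.after !== "string") throw new Error("invalid cursor");
  return parsed.after;
}

// Sketch of GET /api/v1/products?category=…&limit=…&cursor=…
function listProducts(all: Product[], category: string, limit: number, cursor?: string): Page {
  const pageSize = Math.min(Math.max(limit, 1), 100); // clamp to 1..100
  const after = cursor ? decodeCursor(cursor) : null;
  const matches = all
    .filter((p) => p.category === category)
    .sort((a, b) => (a.slug < b.slug ? -1 : a.slug > b.slug ? 1 : 0))
    .filter((p) => after === null || p.slug > after);
  const items = matches.slice(0, pageSize);
  const hasMore = matches.length > pageSize;
  return { items, next_cursor: hasMore ? encodeCursor(items[items.length - 1].slug) : null };
}
```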

Six properties make an API agent-friendly:

Layer 3: Authentication an agent can complete

The OAuth flows you built for humans don't work for agents. A consent screen with a "Continue" button stops them cold. Three patterns work:
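One flow that avoids the consent-screen problem, and the one the reference architecture later names for external agents, is the OAuth 2.0 Device Authorization Grant (RFC 8628): the agent shows the user a code, the user approves on their own device, and the agent polls the token endpoint. A sketch of the polling side; the token endpoint URL, `client_id`, and device code are placeholders, and the response handling follows the error codes RFC 8628 defines (`authorization_pending`, `slow_down`, `access_denied`, `expired_token`).

```typescript
type TokenResponse =
  | { access_token: string; token_type: string; expires_in?: number }
  | { error: string; error_description?: string };

type PollStep =
  | { kind: "done"; accessToken: string }
  | { kind: "wait"; intervalSec: number }
  | { kind: "fail"; reason: string };

// Decide what to do after one poll of the token endpoint.
function nextPollStep(resp: TokenResponse, intervalSec: number): PollStep {
  if ("access_token" in resp) return { kind: "done", accessToken: resp.access_token };
  switch (resp.error) {
    case "authorization_pending": // user hasn't approved yet; keep the same interval
      return { kind: "wait", intervalSec };
    case "slow_down": // RFC 8628: increase the interval by 5 seconds
      return { kind: "wait", intervalSec: intervalSec + 5 };
    default: // access_denied, expired_token, or anything unexpected ends the flow
      return { kind: "fail", reason: resp.error };
  }
}

// Poll until the user approves or the device code is rejected.
// RFC 8628 says clients default to a 5-second interval when none is given.
async function pollForToken(tokenUrl: string, clientId: string, deviceCode: string, intervalSec = 5): Promise<string> {
  let interval = intervalSec;
  for (;;) {
    await new Promise((resolve) => setTimeout(resolve, interval * 1000));
    const res = await fetch(tokenUrl, {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: new URLSearchParams({
        grant_type: "urn:ietf:params:oauth:grant-type:device_code",
        device_code: deviceCode,
        client_id: clientId,
      }),
    });
    const step = nextPollStep((await res.json()) as TokenResponse, interval);
    if (step.kind === "done") return step.accessToken;
    if (step.kind === "fail") throw new Error(`device grant failed: ${step.reason}`);
    interval = step.intervalSec;
  }
}
```

Keeping the decision in a pure `nextPollStep` function makes the RFC's rules testable without a live token endpoint.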

Layer 4: A discovery contract

The agent has to find your machine-readable surface before it can use it. Four files do most of the work:

| File | Purpose | Key fields |
|---|---|---|
| /robots.txt | Crawl & agent allow/deny | User-agent plus Allow/Disallow rules; optional Crawl-delay (Googlebot ignores it) |
| /sitemap.xml | The canonical URL list | <loc> required; <lastmod> strongly recommended; <changefreq> optional |
| /llms.txt | A prose contract for LLM consumers: short description, primary URLs, "what we'd like you to know" | Plain Markdown; one H1, link sections |
| /.well-known/agent.json | Capabilities the site exposes to agents (search endpoint, catalog endpoint, checkout endpoint, auth mode) | JSON; convention emerging via A2A and related specs (recent A2A versions publish the Agent Card at /.well-known/agent-card.json) |

Ship all four; together they take about an afternoon. An agent that finds them can send you traffic that converts; an agent that can't may not come back.
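Two of these files are plain text and easy to generate at build time. A sketch that emits `/llms.txt` in the layout proposed at llmstxt.org (one H1, a blockquote summary, H2 sections of links) and a `/robots.txt` with one group per user-agent token plus a `Sitemap` line. The site name, URLs, and the example policy of disallowing `Google-Extended` (the robots.txt token Google provides for opting content out of Gemini training) are illustrative assumptions, not recommendations.

```typescript
// Hypothetical site description; headings and URLs are placeholders.
interface SiteInfo {
  name: string;
  summary: string;
  sections: Record<string, { title: string; url: string; note?: string }[]>;
}

interface RobotsRule { userAgent: string; allow: string[]; disallow: string[]; }

// llms.txt: one H1, a blockquote summary, then H2 sections of Markdown links.
function buildLlmsTxt(site: SiteInfo): string {
  const lines = [`# ${site.name}`, "", `> ${site.summary}`];
  for (const [heading, links] of Object.entries(site.sections)) {
    lines.push("", `## ${heading}`, "");
    for (const l of links) {
      lines.push(`- [${l.title}](${l.url})${l.note ? `: ${l.note}` : ""}`);
    }
  }
  return lines.join("\n") + "\n";
}

// robots.txt: one blank-line-separated group per user-agent token.
function buildRobotsTxt(rules: RobotsRule[], sitemapUrl: string): string {
  const groups = rules.map((r) =>
    [
      `User-agent: ${r.userAgent}`,
      ...r.allow.map((p) => `Allow: ${p}`),
      ...r.disallow.map((p) => `Disallow: ${p}`),
    ].join("\n"),
  );
  return [...groups, `Sitemap: ${sitemapUrl}`].join("\n\n") + "\n";
}
```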

Layer 5: A checkout an agent can complete unattended

This is the hard one — and the one with the highest payoff in e-commerce. A traditional human checkout has too many failure modes for an agent: CAPTCHAs, hidden fees revealed late, JavaScript-only address validators, payment redirects, 3-D Secure prompts. The agent-friendly equivalent is a small, predictable API:

  1. Quote. The agent posts a basket and a destination; your server returns line items, shipping options, tax, totals, and an expires_at.
  2. Mandate verification. The agent presents a signed user mandate. You verify the signature, the spend cap, the merchant allowlist, and the mandate's freshness, and confirm that the basket fits within the mandate.
  3. Authorize. Charge against the payment method described in the mandate (typically a payment-network token issued specifically for agent purchases), with a clear "settled by" timestamp.
  4. Confirm. Return an order ID, a tracking URL pattern, and a webhook subscription endpoint.
  5. Audit. Persist the mandate, the quote, the basket, and the authorization decision together — disputes are won and lost on this trail.
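Step 2 is where most of the risk sits. A minimal sketch of the checks it lists, under loud assumptions: the `Mandate` and `Quote` shapes are hypothetical, and an HMAC-SHA256 over a shared secret stands in for the public-key signatures a real agent-payments spec would use. Step 5's audit trail would persist the mandate, the quote, and the returned verdict together.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

// Hypothetical mandate shape for illustration only. Real agent-payment specs
// define their own signed credential formats; HMAC is a stand-in here.
interface Mandate {
  mandateId: string;
  merchantAllowlist: string[];
  spendCapCents: number;
  currency: string;
  issuedAt: number;  // epoch milliseconds
  expiresAt: number; // epoch milliseconds
  signature: string; // hex HMAC-SHA256 over the fields above
}

interface Quote {
  quoteId: string;
  merchantId: string;
  totalCents: number;
  currency: string;
  expiresAt: number; // the quote's own expires_at, epoch milliseconds
}

interface Verdict { ok: boolean; reason?: string; }

// Serialize fields in a fixed order so signer and verifier hash identical bytes.
function mandatePayload(m: Omit<Mandate, "signature">): string {
  return JSON.stringify([m.mandateId, m.merchantAllowlist, m.spendCapCents, m.currency, m.issuedAt, m.expiresAt]);
}

function signMandate(m: Omit<Mandate, "signature">, secret: string): Mandate {
  const signature = createHmac("sha256", secret).update(mandatePayload(m)).digest("hex");
  return { ...m, signature };
}

// Every check must pass before the payment is authorized.
function verifyMandate(m: Mandate, q: Quote, secret: string, now: number): Verdict {
  const expected = createHmac("sha256", secret).update(mandatePayload(m)).digest();
  const given = Buffer.from(m.signature, "hex");
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) {
    return { ok: false, reason: "bad signature" };
  }
  if (now < m.issuedAt || now >= m.expiresAt) return { ok: false, reason: "mandate not fresh" };
  if (!m.merchantAllowlist.includes(q.merchantId)) return { ok: false, reason: "merchant not allowed" };
  if (q.currency !== m.currency) return { ok: false, reason: "currency mismatch" };
  if (q.totalCents > m.spendCapCents) return { ok: false, reason: "over spend cap" };
  if (now >= q.expiresAt) return { ok: false, reason: "quote expired" };
  return { ok: true };
}
```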

Layer 6: Don't show the agent things meant for humans

Equally important: detect the agent and skip the friction.

A 60-minute checklist

  1. Audit one high-value page. Pick your best-selling product. View source. Count fields that an agent could extract reliably without rendering JavaScript.
  2. Ship Product JSON-LD. Use the schema above. Validate at search.google.com/test/rich-results.
  3. Add /llms.txt. One H1, a paragraph describing the site, a list of canonical URLs, a short "do/don't" for agents.
  4. Audit robots.txt. Distinguish search crawlers, AI training crawlers, and shopping/agent crawlers. Decide each, deliberately.
  5. Pick one write endpoint and add idempotency. "Add to cart" or "request a quote" are good starting points.
  6. Document your API in one OpenAPI file. Even a partial one. Host it at /openapi.yaml and link it from /.well-known/agent.json.
  7. Test with an actual agent. Use Gemini with the URL Context tool, Claude with Computer Use, or your own ADK agent. Watch what it fails on and fix the top three.
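Step 5 of the checklist in practice: a sketch of idempotency for a write endpoint such as a hypothetical `POST /api/v1/quotes`, keyed by a client-supplied idempotency key. A retry with the same key and body replays the stored response instead of creating a second quote; the same key with a different body is rejected (this sketch returns 422). The in-memory `Map` is an assumption for brevity; production code needs a shared, expiring store.

```typescript
import { createHash } from "node:crypto";

interface HandlerResult { status: number; body: unknown; }
interface StoredResponse extends HandlerResult { requestHash: string; }

// In-memory store keyed by idempotency key (illustration only).
const store = new Map<string, StoredResponse>();

function hashBody(body: unknown): string {
  return createHash("sha256").update(JSON.stringify(body)).digest("hex");
}

function handleWithIdempotency(
  key: string,
  body: unknown,
  handler: (body: unknown) => HandlerResult,
): HandlerResult & { replayed: boolean } {
  const requestHash = hashBody(body);
  const prior = store.get(key);
  if (prior) {
    if (prior.requestHash !== requestHash) {
      // Same key, different payload: almost certainly a client bug.
      return { status: 422, body: { error: "idempotency key reused with a different request" }, replayed: false };
    }
    // Safe retry: return exactly what the first call returned.
    return { status: prior.status, body: prior.body, replayed: true };
  }
  const result = handler(body);
  store.set(key, { requestHash, ...result });
  return { ...result, replayed: false };
}
```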
The one-sentence summary: agents are a second class of customer with a different sensory apparatus; build the contract, the API, and the checkout they need, and your conversion rate on agent traffic can end up higher than your conversion rate on human traffic.
13 — Reference architecture

A scalable stack for an agent-native developer portal

This section assumes a Next.js/Tailwind frontend, a LAMP or Node.js backend, and integration with the unified Gemini SDK, Firebase Genkit, and an agent-commerce protocol. Below is a shape that holds up under both human and agent traffic without forking the codebase.

Edge / Presentation

Next.js 15 (App Router)
Tailwind v4
React Server Components
Streaming responses
Firebase Hosting / Cloud Run
App Check (web/mobile)

Agent & AI Layer

Firebase Genkit (flows)
@google/genai (Gemini)
Vertex Agent Engine (hosted)
ADK (Python sidecar, optional)
MCP servers (tools)
A2A (multi-agent interop)

Backend / Data

Node API routes (Cloud Functions / Cloud Run)
LAMP origin (Apache + PHP + MySQL) for legacy CMS
Firestore (sessions, agent state)
Cloud Storage (artifacts, uploads)
Cloud Scheduler + Pub/Sub (persistent workers)
OpenAPI 3.1 + .well-known/agent.json

How the pieces talk

  1. Browser → Next.js. Human visitors hit RSC-rendered pages. Each product page emits Schema.org JSON-LD, the same data that powers the API.
  2. Browser → Firebase AI Logic. Client-side AI features (chat, suggest, summarize) call Gemini through Firebase AI Logic, which proxies the request so your Gemini API key never ships in the client bundle. App Check rejects traffic that doesn't come from your real app.
  3. Next.js API route → Genkit flow. Server-side AI features (recommendations, agent-driven actions) call a Genkit flow. Each flow is a function — call it directly, no HTTP hop.
  4. Genkit flow → Gemini API or Vertex. Choose per flow via plugin config. Long-running agents that need managed sessions or memory are candidates for Vertex Agent Engine (for example, as an ADK agent in the optional Python sidecar) rather than running in-process.
  5. Genkit flow → MCP tools. Tool calls (search inventory, look up an order, call the legacy LAMP API) go through Model Context Protocol servers — one MCP server per backend system, mounted into every flow that needs it.
  6. Agent traffic → /api/v1/*. External agents hit the same JSON API your own Genkit flows use, authenticated with OAuth device grants or signed mandates from emerging agent-commerce protocols.
  7. Cloud Scheduler → Cloud Run flow. Cron-driven jobs (nightly summaries, inventory reconciliations) invoke a Genkit flow on a schedule, persist results to Firestore, fire a webhook.
One codebase, two consumers. The shape above means your Schema.org markup, your JSON API, and your Genkit flows all describe the same data model — no fork between "the human site" and "the agent site." A change to a product field shows up in JSON-LD, in /api/v1/products/*, in the recommendation flow's tool schema, and in the agent's catalog response, all from the same source of truth.
14 — Pricing & plans

What it costs

Three ways to pay for Gemini, and they mix. The AI Studio free tier is generous for prototyping and indie projects. The Gemini API paid tier bills per token to the Cloud Billing account linked to your API key's Google Cloud project. Vertex AI bills against a Google Cloud project alongside the rest of your cloud spend. Consumer plans (Google AI Pro, Google AI Ultra) are separate: they cover the Gemini app, NotebookLM, Veo, and higher quotas in AI Studio, not API token costs.

Developer billing

| Surface | Pricing model | Free tier | Best for |
|---|---|---|---|
| AI Studio (free) | Free with per-minute and per-day limits | Yes, generous | Prototyping, learning, hackathons |
| Gemini API (paid) | Per million input/output tokens; cached input billed at ~25% of the input rate; Batch API 50% off | n/a | Production apps using AI Studio keys |
| Vertex AI | Same per-token model; billed to a GCP project | $300 GCP credit for new accounts | Anything that needs IAM, residency, SLA |
| Provisioned Throughput (Vertex AI) | Reserved capacity for a fixed term at a fixed price | n/a | Predictable, latency-sensitive workloads |
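The discounts in the table are easy to misjudge when they combine. A sketch of the arithmetic, using placeholder per-million-token rates (not Google's prices) and the table's approximate factors: cached input at about 25% of the input rate, batch at 50% off. Treating the two discounts as multiplicative and ignoring cache storage fees are simplifying assumptions; check ai.google.dev/pricing for how they actually combine.

```typescript
// Placeholder rates in USD per million tokens. These are NOT Google's prices.
interface Rates { inputPerM: number; outputPerM: number; }

interface Usage {
  inputTokens: number;       // all prompt tokens, cached ones included
  cachedInputTokens: number; // subset of inputTokens served from a context cache
  outputTokens: number;
  batch: boolean;            // submitted through the Batch API
}

const CACHED_INPUT_FACTOR = 0.25; // "cached input ~25%" of the normal input rate
const BATCH_FACTOR = 0.5;         // "batch 50% off"

function estimateCostUsd(r: Rates, u: Usage): number {
  if (u.cachedInputTokens > u.inputTokens) throw new Error("cached tokens exceed input tokens");
  const freshInput = u.inputTokens - u.cachedInputTokens;
  const inputCost = (freshInput * r.inputPerM + u.cachedInputTokens * r.inputPerM * CACHED_INPUT_FACTOR) / 1e6;
  const outputCost = (u.outputTokens * r.outputPerM) / 1e6;
  // Assumption: the batch discount multiplies the whole bill; cache storage fees are ignored.
  const total = inputCost + outputCost;
  return u.batch ? total * BATCH_FACTOR : total;
}
```

With these factors, 1M input tokens of which 800k are cached cost the same as 400k uncached tokens (200k + 800k × 0.25), which is why a stable, cacheable prompt prefix matters more than shaving output length.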

Relative model costs (per million tokens)

| Model | Relative cost | What you're paying for |
|---|---|---|
| gemini-2.5-pro | Frontier tier | Hardest reasoning, longest context. |
| gemini-2.5-flash | Roughly 4-8× cheaper than Pro | Near-Pro quality, much faster. |
| gemini-2.5-flash-lite | ~3-5× cheaper than Flash | Routing, extraction, simple Q&A. |
| gemini-2.5-flash-image | Image-token billing | Image generation/editing. |
| gemma-3 (self-hosted) | Your hardware | No per-token cost; pay for the GPU/CPU. |
Specific dollar amounts move. Google publishes current rates at ai.google.dev/pricing and cloud.google.com/vertex-ai/pricing. Confirm before committing to a budget — the shape of the table above is stable, the cents are not.

Consumer plans (Gemini app, not the API)

| Plan | Price (USD) | What it unlocks |
|---|---|---|
| Free | $0 | Gemini 2.5 Flash in the app, limited Deep Research, basic image gen. |
| Google AI Pro | ~$20/mo (often bundled with Google One) | Gemini 2.5 Pro in the app, more Deep Research, Veo for video, NotebookLM Plus, higher AI Studio limits. |
| Google AI Ultra | ~$250/mo | Highest quotas, "Deep Think" reasoning, more Veo, early access to new models. |
| Workspace AI | Bundled with Workspace plans | "Help me write"/"Help me organize" across Docs, Sheets, Gmail, Meet. |

Cost-control tips that actually save money

15 — Everything else

The rest of the Google AI surface

Products you'll bump into often, with one paragraph on what they are and when to reach for them. Use the strip below as an index.

NotebookLM

A research workspace where every answer is grounded in sources you upload (PDFs, Google Docs, Drive, YouTube, websites). Generates "Audio Overviews" (podcast-style summaries with two synthetic hosts), mind maps, study guides, and timelines. Excellent for digesting long-form material and as a personal "ask my notes" interface. Available free; NotebookLM Plus raises quotas via Google AI Pro/Ultra.

Project Astra

Google DeepMind's prototype universal assistant: realtime multimodal, with video + audio in and audio out. The streaming primitives that power Astra ship publicly through the Gemini Live API (ai.live.connect in the @google/genai SDK) and through Gemini Live's camera and screen-sharing features in the Gemini app. Reach for the Live API when you want sub-second latency and a true conversation, not a turn-based chat.

Project Mariner

A research preview of an agent that drives a web browser on your behalf: it clicks, types, fills forms, and completes workflows across tabs. For developers, the same browser-control capability surfaces as the Computer Use tool in the Gemini API, where the model proposes UI actions (clicks, typing, navigation) and your own client code executes them in a browser you control. End-user access is still limited to Labs/early access; developers should track the Gemini API release notes.

Jules

Google's async coding agent at jules.google.com. Connect a GitHub repo, file an issue, hand it off. Jules clones the repo into a cloud VM, plans, edits, runs tests, and opens a PR. Best for backlog grooming, dependency upgrades, "I wonder if this is doable," and any change small enough to verify by reading the diff. The async-PR equivalent of OpenAI's Codex Cloud or Anthropic's Claude Code in headless mode.

Gemini Code Assist

Gemini in your IDE: VS Code, IntelliJ, PyCharm, GoLand, and the rest of JetBrains, plus Android Studio and Cloud Workstations. Chat panel, ghost-text completions, "explain this", "fix this", repo-wide context. The agentic mode lets it plan a change across files, propose a diff, and run terminal commands behind an approval gate. Free for individuals within usage limits; the Standard and Enterprise tiers add enterprise controls such as IAM and audit logging, and Enterprise adds code customization based on your private repositories.

Imagen 4

Google's high-fidelity text-to-image model. Stronger than the Gemini Flash Image model on photorealism and on typography (rendering legible text inside images). Available via Vertex AI and the Gemini API. Use Flash Image for conversational editing, Imagen for one-shot generation that has to look like a photograph.

Veo 3

Text- and image-to-video, with native audio (dialog, sound effects, ambient). Up to 8-second clips from the API; longer compositions via the Flow product. Cinematic camera controls, character consistency across shots, and a reference-image input for style and subject. The video equivalent of the jump Imagen made.

Lyria 2

Text-to-music: instrumental pieces, soundtracks, jingles. Lyria 2 is available through Vertex AI, the Gemini API offers Lyria RealTime for interactive streaming generation, and consumers can try it through the MusicFX surface. Best for short-form generative audio (under a minute); longer compositions need stitching.

Flow

The filmmaking workspace built on Veo and Imagen. Scene-by-scene composition, camera control, frame-to-video extension, asset library. The product Google built so professional creators stop fighting raw prompts and start storyboarding.

Gems

Custom Gemini personas — a name, an instruction set, optional file knowledge, optional tool access. The Gemini-app equivalent of OpenAI's GPTs. Free for everyone (with Pro/Ultra raising limits); excellent for canned workflows ("rewrite-in-our-voice", "code reviewer", "weekly digest").

Deep Research

A multi-step research mode in the Gemini app — Gemini drafts a plan, browses dozens to hundreds of sources, drafts a brief with citations, and (optionally) generates an Audio Overview. Available on the free tier with limits, expanded on Pro/Ultra. Useful for "do a literature review on X" tasks; not a replacement for a domain expert.

Workspace AI

Gemini in Docs, Sheets, Gmail, Slides, Meet, Drive, Chat. "Help me write", "Help me organize", in-Meet note-taking and translation, in-Drive cross-document search. Included with Workspace Business and Enterprise plans (the standalone "Gemini for Workspace" add-on was rolled into the base SKUs).

Stitch

Prompt-to-UI: describe a screen and get a UI design you can paste into Figma or export as front-end code. Useful for spinning up wireframes and starting Tailwind components without laying them out by hand.

Firebase Studio

An in-browser, agentic app development workspace from the Firebase team (the evolution of Project IDX). Spin up a project from a prompt, get an AI-built scaffold (web, mobile, Genkit backend), preview live, deploy to Firebase Hosting / Cloud Run. The "from idea to deployed app in a session" surface.

Model Context Protocol

An open protocol for connecting LLMs to tools and data sources, introduced by Anthropic in November 2024 and now broadly adopted across the industry: the Gemini CLI and SDKs, the ADK, Genkit, Antigravity, Code Assist, and Claude all speak it. Run an MCP server in front of your database, your filesystem, your ticketing system, or your CMS, then mount that one server into every agent that needs it. The closest thing the industry has to "USB for AI tools."
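Under the hood, MCP messages are JSON-RPC 2.0. A sketch of the two tool methods a server answers, `tools/list` and `tools/call`, dispatched in-process with a hypothetical `check_stock` tool backed by an in-memory table. Transport (stdio or HTTP), the `initialize` handshake, and argument validation are omitted; a real server would normally use an official MCP SDK rather than hand-rolled dispatch.

```typescript
type Json = Record<string, unknown>;

interface Tool {
  description: string;
  inputSchema: Json;           // JSON Schema describing the arguments
  run: (args: Json) => string; // returns the tool's text output
}

// Hypothetical stand-in for the inventory system an MCP server would front.
const inventory = new Map<string, number>([["AX-104-BLK", 42]]);

const tools = new Map<string, Tool>([
  ["check_stock", {
    description: "Return units in stock for a SKU",
    inputSchema: { type: "object", properties: { sku: { type: "string" } }, required: ["sku"] },
    run: (args) => String(inventory.get(String(args.sku)) ?? 0),
  }],
]);

interface RpcRequest { jsonrpc: "2.0"; id: number | string; method: string; params?: Json; }

// Handle one JSON-RPC 2.0 request and return the response object.
function handle(req: RpcRequest): Json {
  const reply = (body: Json): Json => ({ jsonrpc: "2.0", id: req.id, ...body });
  if (req.method === "tools/list") {
    const list = Array.from(tools.entries()).map(([name, t]) => ({
      name, description: t.description, inputSchema: t.inputSchema,
    }));
    return reply({ result: { tools: list } });
  }
  if (req.method === "tools/call") {
    const name = String(req.params?.name);
    const tool = tools.get(name);
    // Unknown tool names are reported as a JSON-RPC "invalid params" error.
    if (!tool) return reply({ error: { code: -32602, message: `Unknown tool: ${name}` } });
    const args = (req.params?.arguments ?? {}) as Json;
    return reply({ result: { content: [{ type: "text", text: tool.run(args) }], isError: false } });
  }
  return reply({ error: { code: -32601, message: "Method not found" } });
}
```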

16 — Verification & freshness

What to double-check before you ship

Google's AI portfolio moves week to week; this page is current as of the date in the hero and was built against publicly documented surfaces. The items below have either announcement-only status, fast-moving APIs, or both — confirm against Google's own docs before quoting in a roadmap or proposal.

| Item | Status | Where to verify |
|---|---|---|
| Gemini Spark (24/7 autonomous SKU) | Not confirmed as a packaged product as of writing | I/O 2026 recap, cloud.google.com/blog |
| Antigravity 2.0 feature set | Treat specific 2.0 features as forward-looking | antigravity.google.com |
| Universal Commerce Protocol (UCP) | Name not yet confirmed; the public open spec for agent payments is AP2 (Agent Payments Protocol) | A2A docs, github.com/google/A2A |
| Exact model IDs (-001, -002, -latest) | Stable shape, churning suffixes | ai.google.dev/gemini-api/docs/models |
| Per-token prices in USD | Shape stable, exact cents change | ai.google.dev/pricing |
| Consumer plan names & prices | Renamed periodically (was "Gemini Advanced", now "Google AI Pro/Ultra") | gemini.google.com |
| Chrome built-in AI API surface | Stable APIs listed; Prompt API still in origin trial | developer.chrome.com/docs/ai/built-in |
| Firebase Genkit version & plugin names | 1.x stable; check the README before pinning | genkit.dev |
The site you're reading is part of the WholeTech network — a set of factual, plain-language developer guides. Sibling sites: codex.wholetech.com (OpenAI Codex), claude.wholetech.com (Anthropic Claude). All three follow the same editorial rule: prefer accuracy over freshness when they conflict, and flag the things that are still moving.