A reference guide for building AI agents: every method, how to authenticate, and the permissions each one needs.
The Google Gemini API is how an app or AI agent works with Google's Gemini models: generating text and images from a prompt, turning text into embeddings for search, uploading files for a model to read, caching context to reuse, and fine-tuning a model. Access is granted through an account-wide API key from Google AI Studio. The key has no per-endpoint permissions, so it can make any call it can reach. Versions and dates attach to the model rather than to the API path, and the API can notify an endpoint when asynchronous batch work finishes.
How an app or AI agent connects to Gemini determines what it can reach. There is a route for making calls, a real-time route for live audio and video, and a hosted documentation server, and each is governed by the API key behind it.
The REST API answers at https://generativelanguage.googleapis.com, with the stable surface under /v1 and the broader preview surface under /v1beta. A call authenticates with an API key from Google AI Studio, sent in the x-goog-api-key header or as a query parameter.
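The two ways of sending the key can be sketched as follows. This is a minimal illustration using only the Python standard library, not an official client: the helper name build_request is hypothetical, and the query parameter name key is an assumption based on the usual Google API convention, since the text above does not name it. The request is built but never sent.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://generativelanguage.googleapis.com"


def build_request(api_key, path, body=None, key_in_query=False):
    """Build (but do not send) an authenticated request for an API path.

    Hypothetical helper for illustration only.
    """
    url = BASE_URL + path
    headers = {"Content-Type": "application/json"}
    if key_in_query:
        # Assumption: the query parameter is named "key".
        url += "?" + urllib.parse.urlencode({"key": api_key})
    else:
        # Header form keeps the key out of URLs, which often end up in logs.
        headers["x-goog-api-key"] = api_key
    data = json.dumps(body).encode("utf-8") if body is not None else None
    method = "POST" if body is not None else "GET"
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

Passing the result to urllib.request.urlopen would perform the call. The header form is usually the safer default because the key never appears in the URL.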
The Live API uses a bidirectional WebSocket connection for real-time, low-latency interaction, streaming audio and video in and audio and text out. It authenticates with the same API key and suits voice and live multimodal agents rather than single request-and-response calls.
Google hosts a public Model Context Protocol server at https://gemini-api-docs-mcp.dev that exposes a search_documentation function, so an agent can pull current Gemini API definitions and patterns into its context. It serves documentation lookup, not calls against the generative API itself, which are made over REST or the Live API.
A Gemini API key from Google AI Studio authenticates every call, sent in the x-goog-api-key header or as a query parameter. The key is account-wide and carries no per-endpoint scopes, so it can make any call it can reach. Never expose a key in client-side code, where anyone who can read the code can use it.
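One common way to keep the key out of client code is to read it from the server's environment at startup. The sketch below assumes an environment variable named GEMINI_API_KEY; the variable name and both helper functions are illustrative choices, not something the text above prescribes.

```python
import os


def load_api_key(env_var="GEMINI_API_KEY"):
    """Read the key from the server environment (variable name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set {env_var} on the server; do not embed the key in client code."
        )
    return key


def mask_key(key, visible=4):
    """Return a log-safe form of the key showing only its last few characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Because the key is account-wide, masking it in logs matters as much as keeping it out of shipped code.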
Some operations, like calling a tuned model, can use OAuth 2.0 access tokens tied to a Google account rather than a plain API key. OAuth suits flows that act on behalf of a user, while most generation work uses an API key.
The Gemini API is split into areas an agent can act on, like generating content, creating embeddings, counting tokens, uploading files, caching context, running batches, and tuning models. Each area has its own methods, and some create lasting resources or spend against the account's quota.
Generate content from a model, stream it back as it is produced, and list or read the available models.
Turn text into embedding vectors, one input at a time or in a batch, for search and similarity work.
Run a model's tokenizer over input to count tokens before sending a generation call.
Upload files for a model to read, then list, read, or delete them. Files are held for 48 hours.
Save precomputed input tokens as cached content and reuse them across calls, then list, read, update, or delete the cache.
Submit many generation requests as one asynchronous job at half the standard cost, then read, list, or cancel it.
Create a fine-tuned model from training data, then list, read, generate from, or delete it.
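For the embeddings area above, search and similarity work usually comes down to comparing vectors. The sketch below assumes the vectors have already been returned by embedContent or batchEmbedContents; the function names and the idea of keying documents by id are illustrative, and cosine similarity is one common choice of measure rather than something the API mandates.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar to the query."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Embedding the query with the same model as the documents keeps the vectors comparable.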
| Method | Endpoint | What it does | Access | Permission | Version | |
|---|---|---|---|---|---|---|
| Models & content (4 methods): Generate content from a model, stream it back as it is produced, and list or read the available models. | | | | | | |
| POST | /v1beta/{model=models/*}:generateContent | Generate a single model response from an input request, which can include text, images, audio, code, and tool calls. | write | — | Current | |
| The API key carries no per-method scope. Marked as a write because the call consumes token quota and can run tools, though it creates no stored resource. Acts on: model. Permission (capability): None required. Version: Available since the API’s base version. Webhook event: None. Rate limit: Standard limits apply. Source: Official documentation. | | | | | | |
| POST | /v1beta/{model=models/*}:streamGenerateContent | Generate a model response that streams back in chunks as it is produced, rather than waiting for the full output. | write | — | Current | |
| Takes the same request as generateContent and returns a stream of partial responses. The API key carries no per-method scope. Acts on: model. Permission (capability): None required. Version: Available since the API’s base version. Webhook event: None. Rate limit: Standard limits apply. Source: Official documentation. | | | | | | |
| GET | /v1beta/models | List the models available to the key, with their token limits and metadata. | read | — | Current | |
| Read-only. The API key carries no per-method scope. Acts on: model. Permission (capability): None required. Version: Available since the API’s base version. Webhook event: None. Rate limit: Standard limits apply. Source: Official documentation. | | | | | | |
| GET | /v1beta/{name=models/*} | Read the details of one model, including its version and token limits. | read | — | Current | |
| Read-only. The API key carries no per-method scope. Acts on: model. Permission (capability): None required. Version: Available since the API’s base version. Webhook event: None. Rate limit: Standard limits apply. Source: Official documentation. | | | | | | |
| Embeddings (2 methods): Turn text into embedding vectors, one input at a time or in a batch, for search and similarity work. | | | | | | |
| POST | /v1beta/{model=models/*}:embedContent | Generate one text embedding vector from input content using an embedding model. | write | — | Current | |
| Marked as a write because it consumes token quota; it creates no stored resource. The API key carries no per-method scope. Acts on: embedding. Permission (capability): None required. Version: Available since the API’s base version. Webhook event: None. Rate limit: Standard limits apply. Source: Official documentation. | | | | | | |
| POST | /v1beta/{model=models/*}:batchEmbedContents | Generate many embedding vectors in one call from a batch of input content. | write | — | Current | |
| Each input in the batch is an EmbedContentRequest. The API key carries no per-method scope. Acts on: embedding. Permission (capability): None required. Version: Available since the API’s base version. Webhook event: None. Rate limit: Standard limits apply. Source: Official documentation. | | | | | | |
| Token counting (1 method): Run a model's tokenizer over input to count tokens before sending a generation call. | | | | | | |
| POST | /v1beta/{model=models/*}:countTokens | Run a model's tokenizer over input content and return the token count, without generating anything. | read | — | Current | |
| A read-only utility used to size a prompt before a generation call. The API key carries no per-method scope. Acts on: model. Permission (capability): None required. Version: Available since the API’s base version. Webhook event: None. Rate limit: Standard limits apply. Source: Official documentation. | | | | | | |
| Files (4 methods): Upload files for a model to read, then list, read, or delete them. Files are held for 48 hours. | | | | | | |
| POST | /upload/v1beta/files | Upload a file for a model to read later, such as an image, audio clip, video, or document. | write | — | Current | |
| Uses the separate /upload path prefix. Stored files are held for 48 hours, up to 2 GB per file and 20 GB per project. The API key carries no per-method scope. Acts on: file. Permission (capability): None required. Version: Available since the API’s base version. Webhook event: None. Rate limit: Standard limits apply. Source: Official documentation. | | | | | | |
| GET | /v1beta/files | List the files uploaded with the key. | read | — | Current | |
| Read-only. The API key carries no per-method scope. Acts on: file. Permission (capability): None required. Version: Available since the API’s base version. Webhook event: None. Rate limit: Standard limits apply. Source: Official documentation. | | | | | | |
| GET | /v1beta/{name=files/*} | Read the metadata of one uploaded file. | read | — | Current | |
| Read-only. The API key carries no per-method scope. Acts on: file. Permission (capability): None required. Version: Available since the API’s base version. Webhook event: None. Rate limit: Standard limits apply. Source: Official documentation. | | | | | | |
| DELETE | /v1beta/{name=files/*} | Delete an uploaded file before its 48-hour expiry. | write | — | Current | |
| Irreversible. The API key carries no per-method scope. Acts on: file. Permission (capability): None required. Version: Available since the API’s base version. Webhook event: None. Rate limit: Standard limits apply. Source: Official documentation. | | | | | | |
| Context caching (5 methods): Save precomputed input tokens as cached content and reuse them across calls, then list, read, update, or delete the cache. | | | | | | |
| POST | /v1beta/cachedContents | Create cached content, saving precomputed input tokens to reuse across later calls. | write | — | Current | |
| A time-to-live is set with the ttl field or an expireTime. The API key carries no per-method scope. Acts on: cachedContent. Permission (capability): None required. Version: Available since the API’s base version. Webhook event: None. Rate limit: Standard limits apply. Source: Official documentation. | | | | | | |
| GET | /v1beta/cachedContents | List the cached content created with the key. | read | — | Current | |
| Read-only. The API key carries no per-method scope. Acts on: cachedContent. Permission (capability): None required. Version: Available since the API’s base version. Webhook event: None. Rate limit: Standard limits apply. Source: Official documentation. | | | | | | |
| GET | /v1beta/{name=cachedContents/*} | Read one cached content resource. | read | — | Current | |
| Read-only. The API key carries no per-method scope. Acts on: cachedContent. Permission (capability): None required. Version: Available since the API’s base version. Webhook event: None. Rate limit: Standard limits apply. Source: Official documentation. | | | | | | |
| PATCH | /v1beta/{cachedContent.name=cachedContents/*} | Update a cached content resource, such as extending its time-to-live. | write | — | Current | |
| The API key carries no per-method scope. Acts on: cachedContent. Permission (capability): None required. Version: Available since the API’s base version. Webhook event: None. Rate limit: Standard limits apply. Source: Official documentation. | | | | | | |
| DELETE | /v1beta/{name=cachedContents/*} | Delete a cached content resource. | write | — | Current | |
| Irreversible. The API key carries no per-method scope. Acts on: cachedContent. Permission (capability): None required. Version: Available since the API’s base version. Webhook event: None. Rate limit: Standard limits apply. Source: Official documentation. | | | | | | |
| Batch (5 methods): Submit many generation requests as one asynchronous job at half the standard cost, then read, list, or cancel it. | | | | | | |
| POST | /v1beta/{batch.model=models/*}:batchGenerateContent | Submit many generation requests as one asynchronous batch job, at half the standard cost. | write | — | Current | |
| Targets a 24-hour turnaround and can notify a registered endpoint on completion. The API key carries no per-method scope. Acts on: batch. Permission (capability): None required. Version: Available since the API’s base version. Webhook event: batch-completed. Rate limit: Standard limits apply. Source: Official documentation. | | | | | | |
| GET | /v1beta/{name=batches/*} | Read the status and results of one batch job. | read | — | Current | |
| Read-only. The API key carries no per-method scope. Acts on: batch. Permission (capability): None required. Version: Available since the API’s base version. Webhook event: None. Rate limit: Standard limits apply. Source: Official documentation. | | | | | | |
| GET | /v1beta/batches | List the batch jobs created with the key. | read | — | Current | |
| Read-only. The API key carries no per-method scope. Acts on: batch. Permission (capability): None required. Version: Available since the API’s base version. Webhook event: None. Rate limit: Standard limits apply. Source: Official documentation. | | | | | | |
| POST | /v1beta/{name=batches/*}:cancel | Cancel a batch job that is still running. | write | — | Current | |
| The API key carries no per-method scope. Acts on: batch. Permission (capability): None required. Version: Available since the API’s base version. Webhook event: None. Rate limit: Standard limits apply. Source: Official documentation. | | | | | | |
| DELETE | /v1beta/{name=batches/*} | Delete a batch job record. | write | — | Current | |
| The API key carries no per-method scope. Acts on: batch. Permission (capability): None required. Version: Available since the API’s base version. Webhook event: None. Rate limit: Standard limits apply. Source: Official documentation. | | | | | | |
| Tuned models (5 methods): Create a fine-tuned model from training data, then list, read, generate from, or delete it. | | | | | | |
| POST | /v1beta/tunedModels | Create a fine-tuned model from training data. | write | — | Current | |
| Starts a long-running tuning job that produces a lasting tuned model. The API key carries no per-method scope. Acts on: tunedModel. Permission (capability): None required. Version: Available since the API’s base version. Webhook event: None. Rate limit: Standard limits apply. Source: Official documentation. | | | | | | |
| GET | /v1beta/tunedModels | List the tuned models created with the key. | read | — | Current | |
| Read-only. The API key carries no per-method scope. Acts on: tunedModel. Permission (capability): None required. Version: Available since the API’s base version. Webhook event: None. Rate limit: Standard limits apply. Source: Official documentation. | | | | | | |
| GET | /v1beta/{name=tunedModels/*} | Read the details of one tuned model, including its tuning state. | read | — | Current | |
| Read-only. The API key carries no per-method scope. Acts on: tunedModel. Permission (capability): None required. Version: Available since the API’s base version. Webhook event: None. Rate limit: Standard limits apply. Source: Official documentation. | | | | | | |
| POST | /v1beta/{model=tunedModels/*}:generateContent | Generate a response from a tuned model. | write | — | Current | |
| Calling a tuned model can need proper authentication beyond a plain API key. The key carries no per-method scope. Acts on: tunedModel. Permission (capability): None required. Version: Available since the API’s base version. Webhook event: None. Rate limit: Standard limits apply. Source: Official documentation. | | | | | | |
| DELETE | /v1beta/{name=tunedModels/*} | Delete a tuned model. | write | — | Current | |
| Irreversible. The API key carries no per-method scope. Acts on: tunedModel. Permission (capability): None required. Version: Available since the API’s base version. Webhook event: None. Rate limit: Standard limits apply. Source: Official documentation. | | | | | | |
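
The endpoint column in the table above uses path templates such as /v1beta/{model=models/*}:generateContent, where the braces name a variable and the pattern after the equals sign describes the resource name it accepts. A minimal sketch of expanding such a template into a concrete path follows. The helper expand_endpoint is hypothetical, it assumes * stands for exactly one path segment, and the resource names in the usage note are placeholders rather than real models or files.

```python
import re

# Matches template variables such as {model=models/*} or {cachedContent.name=cachedContents/*}.
_TEMPLATE_VAR = re.compile(r"\{([\w.]+)=([^}]+)\}")


def expand_endpoint(template, **resources):
    """Fill each {var=pattern} in a path template with a matching resource name.

    Dots in a variable name become underscores in the keyword argument,
    so {batch.model=...} is supplied as batch_model=...
    """
    def substitute(match):
        var, pattern = match.group(1), match.group(2)
        key = var.replace(".", "_")
        if key not in resources:
            raise KeyError(f"missing value for template variable {var!r}")
        value = resources[key]
        # Assumption: "*" matches exactly one path segment (no slashes).
        regex = "^" + re.escape(pattern).replace(r"\*", "[^/]+") + "$"
        if not re.match(regex, value):
            raise ValueError(f"{value!r} does not match pattern {pattern!r}")
        return value

    return _TEMPLATE_VAR.sub(substitute, template)
```

For example, expand_endpoint("/v1beta/{name=files/*}", name="files/abc123") gives /v1beta/files/abc123, while passing a models/ name to the same template raises ValueError before any request is made.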
Gemini has no general webhook system for content generation, so an app reads results back from the call it made. The exception is asynchronous work, like a batch job or a long-running operation, which can notify a registered endpoint when it finishes.
| Event | What it signals | Triggered by |
|---|---|---|
| Batch / long-running operation completion | Fires when an asynchronous batch job or long-running operation finishes, so an integration learns the result is ready without polling. This event-driven notification was introduced for the Batch API and long-running operations in May 2026. | /v1beta/{batch.model=models/*}:batchGenerateContent |
Gemini limits how fast and how much an app or AI agent can call, through per-model ceilings on requests per minute, tokens per minute, and requests per day, with the ceilings rising as the billing account moves up a usage tier.
Gemini sets per-model limits across three dimensions at once: requests per minute, tokens per minute, and requests per day. Usage is checked against each, and exceeding any one returns HTTP 429 with the status RESOURCE_EXHAUSTED. The ceilings rise as the billing account moves up a usage tier, decided by cumulative spend: a Free tier, then Tier 1 once billing is set up, Tier 2 after 100 US dollars of spend and 30 days, and Tier 3 after 1,000 US dollars and 30 days. Higher tiers and newer or larger models carry higher ceilings, and a separate, much larger token allowance applies to enqueued batch work, for example several million enqueued tokens for a flash model at Tier 1 rising into the billions at Tier 3.
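A common way to handle the HTTP 429 response described above is exponential backoff with jitter. The sketch below is one such pattern under stated assumptions, not a prescribed client behavior: the send callable is hypothetical and is assumed to return a (status, payload) pair, and the delay values are arbitrary defaults.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call send() until it stops returning 429 or attempts run out.

    send is a hypothetical zero-argument callable returning (http_status, payload).
    """
    for attempt in range(max_attempts):
        status, payload = send()
        if status != 429:
            return status, payload
        if attempt == max_attempts - 1:
            break  # out of attempts; hand the 429 back to the caller
        # Exponential ceiling (1s, 2s, 4s, ...) capped at max_delay,
        # with full jitter so many clients do not retry in lockstep.
        ceiling = min(max_delay, base_delay * 2 ** attempt)
        sleep(random.uniform(0, ceiling))
    return status, payload
```

Backoff helps with the per-minute ceilings. A request that hits the requests-per-day ceiling will keep failing until the daily window resets, so a capped number of attempts is a sensible default.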
List endpoints page through results with a pageSize parameter that caps the page and a pageToken parameter that requests the next page. A response returns a nextPageToken when more results remain, and an empty nextPageToken means the last page has been reached.
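The paging contract above can be sketched as a generator that keeps requesting pages until nextPageToken comes back empty. The fetch_page callable is a hypothetical stand-in for the HTTP call and is assumed to take the query parameters and return the decoded JSON response; items_key is the response field holding the list, which varies by endpoint.

```python
def iter_items(fetch_page, items_key, page_size=50):
    """Yield every item from a paged list endpoint.

    fetch_page(params) is a hypothetical callable that performs the request
    with the given query parameters and returns the parsed JSON body.
    """
    page_token = None
    while True:
        params = {"pageSize": page_size}
        if page_token:
            params["pageToken"] = page_token
        response = fetch_page(params)
        yield from response.get(items_key, [])
        # A missing or empty nextPageToken means the last page was reached.
        page_token = response.get("nextPageToken")
        if not page_token:
            break
```

Because it is a generator, a caller can stop early without fetching the remaining pages.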
Limits are set by the model and the resource rather than one global cap. The Files API holds up to 20 GB per project, with a per-file maximum of 2 GB and files retained for 48 hours; uploaded PDFs are capped at 50 MB. Each model has its own input and output token limits, readable on the model resource, and context caching has its own minimum token count to be eligible.
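The Files API limits above can be checked locally before an upload is attempted. This is a rough sketch under stated assumptions: GB and MB are treated as binary units (1024-based), which the text does not specify, PDFs are recognized by the application/pdf MIME type, and the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

MB = 1024 ** 2  # assumption: binary units; the source does not say which
GB = 1024 ** 3
MAX_FILE_BYTES = 2 * GB
MAX_PROJECT_BYTES = 20 * GB
MAX_PDF_BYTES = 50 * MB
RETENTION = timedelta(hours=48)


def check_upload(size_bytes, mime_type, project_used_bytes=0):
    """Return a list of limit violations; an empty list means the upload fits."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds the 2 GB per-file limit")
    if mime_type == "application/pdf" and size_bytes > MAX_PDF_BYTES:
        problems.append("PDF exceeds the 50 MB limit")
    if project_used_bytes + size_bytes > MAX_PROJECT_BYTES:
        problems.append("upload would exceed the 20 GB per-project limit")
    return problems


def expires_at(uploaded_at):
    """Time after which an uploaded file is no longer retained."""
    return uploaded_at + RETENTION
```

The server remains the authority on these limits. A local check only avoids spending time on uploads that are likely to be rejected.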
The status codes an agent should handle, and what to do about each.
| Status | Code | Meaning | What to do |
|---|---|---|---|
| 400 | INVALID_ARGUMENT | The request body is malformed, with a typo, a missing required field, or an invalid value. | Check the request against the API reference, correct the named field, and resend. |
| 400 | FAILED_PRECONDITION | The free tier is not available in the caller's region, or billing is not enabled for a request that needs it. | Enable a paid plan or billing on the project in Google AI Studio. |
| 403 | PERMISSION_DENIED | The API key lacks permission for the request, often a wrong key or missing authentication for a tuned model. | Confirm the right key is sent and use the correct authentication for the resource. |
| 404 | NOT_FOUND | The requested resource was not found, such as a model, file, or tuned model that does not exist for this key or version. | Check the resource name and the API version in the path, then retry. |
| 429 | RESOURCE_EXHAUSTED | A rate limit was exceeded for the model, on requests per minute, tokens per minute, or requests per day. | Back off and retry, smooth the request rate, or request a quota increase or a higher usage tier. |
| 500 | INTERNAL | An unexpected error on Google's side, sometimes triggered by an unusually long input context. | Retry with backoff, and reduce the input context if the error persists. |
| 503 | UNAVAILABLE | The service is temporarily overloaded or down. | Retry with backoff, or switch to another model for the moment. |
| 504 | DEADLINE_EXCEEDED | The request could not finish within the deadline, often because the prompt or context is too large. | Raise the client timeout, or reduce the prompt and context size. |
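
The table above can be turned into a small lookup that tells an agent whether to retry as-is or change something first. The sketch below is one possible encoding: the Advice type and advise function are hypothetical, and the choice to mark 504 as not retryable without a change is an interpretation of the table's guidance.

```python
from typing import NamedTuple


class Advice(NamedTuple):
    retry: bool   # True if resending the same request with backoff is reasonable
    action: str   # short reminder of what to do


ERROR_ADVICE = {
    (400, "INVALID_ARGUMENT"): Advice(False, "correct the request body and resend"),
    (400, "FAILED_PRECONDITION"): Advice(False, "enable a paid plan or billing"),
    (403, "PERMISSION_DENIED"): Advice(False, "check the key or authentication method"),
    (404, "NOT_FOUND"): Advice(False, "check the resource name and API version"),
    (429, "RESOURCE_EXHAUSTED"): Advice(True, "back off and smooth the request rate"),
    (500, "INTERNAL"): Advice(True, "retry with backoff; reduce context if it persists"),
    (503, "UNAVAILABLE"): Advice(True, "retry with backoff or switch models"),
    # Interpretation: a timeout needs a longer deadline or a smaller prompt first.
    (504, "DEADLINE_EXCEEDED"): Advice(False, "raise the timeout or reduce the prompt"),
}


def advise(http_status, code):
    """Look up handling advice for an (HTTP status, error code) pair."""
    default = Advice(False, "unrecognized error; inspect the response")
    return ERROR_ADVICE.get((http_status, code), default)
```

Keying on both the HTTP status and the code matters because 400 maps to two different fixes.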
Gemini exposes a stable v1 surface and a broader v1beta surface that carries preview features first, and the model itself is the thing that is versioned and dated, not the API path.
The API exposes a stable v1 surface and a broader v1beta preview surface that carries newer features first, and the SDKs default to v1beta. The API path itself has no dated version string; the model is the versioned, dated thing, promoted from preview to general availability and later retired. The dated entries below are notable model and platform changes from the Gemini API release notes.
Streaming support was added for the text-to-speech preview model.
The native visual models known as Nano Banana 2 and Pro were released as generally available versions.
Gemini 3.5 Flash was released as generally available, and managed agents launched in public preview.
Event-driven webhook support was introduced for the Batch API and long-running operations, so an integration can be notified on completion rather than polling.
The second-generation embedding model was released as generally available.
New Flex and Priority inference tiers were introduced to trade off cost against latency.
An integration can target the stable surface or opt into the preview surface for newer features.
Source: Gemini API release notes.
Bollard AI sits between a team's AI agents and Google Gemini. Grant each agent exactly the access it needs, read or write, area by area, and every call is checked and logged.