A reference guide for building AI agents: every method, how to authenticate, and the permissions each one needs.
The Cohere API is how an app or AI agent works with Cohere's models: generating a chat reply, turning text or images into embeddings, reranking a list of documents against a query, or classifying text into labels. Access is granted through a single API key sent as a bearer token, and that key reaches every method the account is entitled to, since Cohere does not offer per-method permissions. Each key is either a Trial key for evaluation or a Production key for paid use, which sets the rate limits and the models it can reach rather than which methods it can call.
How an app or AI agent connects to Cohere determines what it can reach. There is one route for calling the models, and the key behind it carries the same access across every method.
The REST API takes JSON request bodies and returns JSON, at https://api.cohere.com. The current v2 path serves Chat, Embed, Rerank and Classify; the v1 path serves Tokenize, Detokenize, Datasets, Embed Jobs and Fine-tuning. Every call authenticates with an API key sent as a bearer token in the Authorization header. Official SDKs exist for Python, TypeScript, Java and Go.
Cohere authenticates every first-party call with an API key sent as a bearer token in the Authorization header. The key is not scoped to specific endpoints, so it reaches every method the account is entitled to. A key can be confirmed with the check-api-key method, which returns whether it is valid and the organization and owner it belongs to.
Every key is one of two types. A Trial key is free for evaluation, capped at 1,000 calls a month with lower per-minute limits, and cannot reach the newest models. A Production key is paid, with much higher per-minute limits and the full model lineup. The type sets rate limits and model access, not which methods can be called.
The Cohere API is split into areas an agent can act on, like generating chat replies, turning text into embeddings, reranking search results, and classifying text. Some methods only return generated output, while others create and delete stored resources like datasets and fine-tuned models.
Methods that generate text replies or classify text into labels.
Methods that turn text or images into embeddings, or reorder documents by relevance.
Methods that convert text to tokens and back.
Methods that list and retrieve the models available to the account.
Methods for uploading, listing, retrieving and deleting datasets used for training and embed jobs.
Methods for running and tracking batch embedding jobs over a dataset.
Methods for creating, listing, retrieving and deleting fine-tuned models.
Methods for checking the API key.
Filter by method, access, or permission, or search any path. Select a row for version detail, rate limits, the related webhook event, and the source.
| Method | Endpoint | What it does | Access | Permission | Version | |
|---|---|---|---|---|---|---|
Generate (Chat & Classify)Methods that generate text replies or classify text into labels.2 | ||||||
| POST | /v2/chat | Generate a chat reply to a messages array, with optional tools, documents for citations, structured outputs, and streaming. | write | — | Current | |
No per-method permission; any valid key can call it. Trial keys cannot use the newest models. Acts onchat Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitTrial: 20 req/min. Production: 500 req/min (per model). SourceOfficial documentation ↗ | ||||||
| POST | /v1/classify | Predict which of a set of labels best fits each input text, using example text-and-label pairs or a fine-tuned classifier. | write | — | Current | |
No per-method permission; any valid key can call it. Acts onclassify Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitTrial and Production: 500 req/min (default tier). SourceOfficial documentation ↗ | ||||||
Represent (Embed & Rerank)Methods that turn text or images into embeddings, or reorder documents by relevance.2 | ||||||
| POST | /v2/embed | Return embeddings for up to 96 texts or for images, with model, input_type and embedding_types as required parameters. | write | — | Current | |
No per-method permission; any valid key can call it. Generation only, nothing is stored. Acts onembed Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitText: 2,000 inputs/min. Images: Trial 5/min, Production 400/min. SourceOfficial documentation ↗ | ||||||
| POST | /v2/rerank | Take a query and a list of documents and return them ordered by a relevance score, with model and query required. | write | — | Current | |
No per-method permission; any valid key can call it. Each document is truncated at max_tokens_per_doc (default 4,096). Acts onrerank Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitTrial: 10 req/min. Production: 1,000 req/min. SourceOfficial documentation ↗ | ||||||
TokensMethods that convert text to tokens and back.2 | ||||||
| POST | /v1/tokenize | Split text into the tokens used by a given model's tokenizer. | read | — | Current | |
Text must be 1 to 65,536 characters. Returns token data only. Acts ontoken Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitTrial: 100 req/min. Production: 2,000 req/min. SourceOfficial documentation ↗ | ||||||
| POST | /v1/detokenize | Turn a list of token IDs back into text, using a given model's tokenizer. | read | — | Current | |
Returns text only; nothing is stored. Acts ontoken Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗ | ||||||
ModelsMethods that list and retrieve the models available to the account.2 | ||||||
| GET | /v1/models | List the models available to the account, with their endpoints, context length and features. | read | — | Current | |
Read-only; paginated with a next_page_token. Acts onmodel Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗ | ||||||
| GET | /v1/models/{model} | Retrieve metadata for a single model by name. | read | — | Current | |
Read-only. Acts onmodel Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗ | ||||||
DatasetsMethods for uploading, listing, retrieving and deleting datasets used for training and embed jobs.4 | ||||||
| POST | /v1/datasets | Upload a dataset for use in training or embed jobs. | write | — | Current | |
Creates a stored dataset on the account. Acts ondataset Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗ | ||||||
| GET | /v1/datasets | List the datasets on the account. | read | — | Current | |
Read-only. Acts ondataset Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗ | ||||||
| GET | /v1/datasets/{id} | Retrieve a single dataset by ID. | read | — | Current | |
Read-only. Acts ondataset Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗ | ||||||
| DELETE | /v1/datasets/{id} | Permanently delete a dataset by ID. | write | — | Current | |
Irreversible; removes the stored dataset. Acts ondataset Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗ | ||||||
Embed jobsMethods for running and tracking batch embedding jobs over a dataset.4 | ||||||
| POST | /v1/embed-jobs | Start a batch job that embeds an entire dataset. | write | — | Current | |
Starts a long-running job; track it by polling its status. Acts onembed-job Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitTrial: 5 req/min. Production: 50 req/min. SourceOfficial documentation ↗ | ||||||
| GET | /v1/embed-jobs | List the embed jobs run on the account. | read | — | Current | |
Read-only. Acts onembed-job Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗ | ||||||
| GET | /v1/embed-jobs/{id} | Retrieve a single embed job by ID, including its status. | read | — | Current | |
Read-only; poll this to learn when a job completes. Acts onembed-job Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗ | ||||||
| POST | /v1/embed-jobs/{id}/cancel | Cancel a running embed job by ID. | write | — | Current | |
Stops a running job. Acts onembed-job Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗ | ||||||
Fine-tuningMethods for creating, listing, retrieving and deleting fine-tuned models.3 | ||||||
| POST | /v1/finetuning/finetuned-models | Create a fine-tuned model, trained on a dataset named in the request. | write | — | Current | |
Requires name and settings (base_model and dataset_id); starts a training run. Acts onfinetuned-model Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗ | ||||||
| GET | /v1/finetuning/finetuned-models | List the fine-tuned models on the account. | read | — | Current | |
Read-only. Acts onfinetuned-model Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗ | ||||||
| GET | /v1/finetuning/finetuned-models/{id} | Retrieve a single fine-tuned model by ID. | read | — | Current | |
Read-only. Acts onfinetuned-model Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗ | ||||||
AccountMethods for checking the API key.1 | ||||||
| POST | /v1/check-api-key | Check that the API key in the Authorization header is valid and active. | read | — | Current | |
Returns whether the key is valid and the organization and owner it belongs to. Acts onapi-key Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗ | ||||||
Cohere does not push events. An app or AI agent learns the outcome of a long-running job, like an embed job or a fine-tuning run, by polling its status method.
| Event | What it signals | Triggered by |
|---|
Cohere limits how fast an app can call each endpoint, with separate ceilings for trial keys and production keys, and a monthly cap on trial-key calls.
Cohere meters requests per minute, per endpoint, with separate ceilings for Trial keys and Production keys. The free Trial key is also capped at 1,000 API calls a month across all endpoints. The per-minute limits differ widely by endpoint and key type, so they are listed on each method's row rather than summarized here, for example Chat at 20 requests per minute on a Trial key and 500 on a Production key, and Rerank at 10 versus 1,000. Going over returns HTTP 429, and the fix is to slow down or move to a Production key.
List methods that can return many items, like List Embed Jobs, List Datasets, List Models and List Fine-tuned Models, page with a page_size parameter and return a next_page_token to fetch the following page. Generation methods like Chat, Embed and Rerank return their full result in one response and do not paginate.
Per-call input limits apply by method. Embed takes up to 96 texts per call. Tokenize accepts text of 1 to 65,536 characters. Rerank truncates each document at max_tokens_per_doc, which defaults to 4,096 tokens. Beyond these, the model's context length sets how much text a single Chat or Embed call can process.
The status codes an agent should handle, and what to do about each.
| Status | Code | Meaning | What to do |
|---|---|---|---|
| 400 | Bad Request | The request body is not valid, for example a required field is missing or a value is out of range. | Check the request against the API spec, fix the fields or values, and resend. |
| 401 | Unauthorized | The API key is missing, invalid, or has expired. | Send a valid API key in the Authorization header, and rotate the key if it was compromised. |
| 402 | Payment Required | The account has reached a billing or spending limit. | Add or update the payment method in the dashboard to continue. |
| 404 | Not Found | The requested resource does not exist, for example a wrong or deleted model, dataset, or job ID. | Verify the resource identifier and confirm it belongs to this account. |
| 429 | Too Many Requests | The rate limit for the endpoint and key type was exceeded. | Back off and retry, smooth the request rate, or move from a Trial key to a Production key. |
| 499 | Request Cancelled | The client cancelled the request before it completed. | Retry the request when ready. |
| 500 | Internal Server Error | An unexpected error occurred on Cohere's side. | Retry with backoff, and contact Cohere support with the request details if it persists. |
Cohere runs two numbered API versions side by side, an older v1 and a current v2, and ships dated model and API changes through its release notes.
The v2 API reworked the four model endpoints. Chat combines message and chat history into a single messages array and uses JSON-schema tool definitions; Embed and Rerank made model a required parameter, and Embed added a required embedding_types parameter; streaming moved to server-sent events and citation handling was consolidated. The rest of the API stays on v1.
Older Command models were deprecated in favour of the current lineup.
Embed v4.0 embeds interleaved text and images in the same vector space, so screenshots of PDFs, slides, figures and tables can be indexed alongside text.
command-a-03-2025 is Cohere's most performant Command model, built for enterprise tool use, retrieval-augmented generation, agents and multilingual tasks.
Rerank-v3.5 shipped alongside the v2 Rerank API, with a 4,096-token context length and stronger multilingual retrieval. The Rerank-v2.0 model family was deprecated.
Call the v2 endpoints for chat, embed, rerank and classify; the rest of the API stays on v1.
Cohere release notes ↗Bollard AI sits between a team's AI agents and Cohere. Grant each agent exactly the access it needs, method by method, and every call is checked and logged.