Everything an AI agent can do with the Cohere API.

A reference guide for building AI agents: every method, how to authenticate, and the permissions each one needs.

Endpoints20

API versionv2

Last updated23 June 2026

Orientation

How the Cohere API works.

The Cohere API is how an app or AI agent works with Cohere's models: generating a chat reply, turning text or images into embeddings, reranking a list of documents against a query, or classifying text into labels. Access is granted through a single API key sent as a bearer token, and that key reaches every method the account is entitled to, since Cohere does not offer per-method permissions. Each key is either a Trial key for evaluation or a Production key for paid use, which sets the rate limits and the models it can reach rather than which methods it can call.

20Endpoints

8Capability groups

11Read

9Write

0Permissions

Authentication

Cohere authenticates every call with an API key sent as a bearer token in the Authorization header. There is no OAuth flow for first-party calls and no per-method permission system, so the key reaches every endpoint the account is entitled to. A key can be confirmed with the check-api-key method, which returns whether it is valid and the organization it belongs to.

Permissions

Cohere does not scope a key to specific endpoints. The only distinction is the key type: a Trial key for free evaluation, capped at 1,000 calls a month with lower per-minute limits, or a Production key for paid use with higher limits. Some of the newest models additionally require a Production key or sales approval. There is no Read-only or write-only key.

Versioning

Cohere runs two numbered API versions at once. The current v2 covers Chat, Embed, Rerank and Classify, where model is a required parameter and chat history is a single messages array. The older v1 still serves the rest of the API, like Tokenize, Datasets, Embed Jobs and Fine-tuning. Models carry their own dated names, like command-a-03-2025 and embed-v4.0.

Data model

Most calls are stateless generation: text in, generated output back, with nothing stored. The exceptions are stored resources, datasets uploaded for training, embed jobs that run over a dataset, and fine-tuned models, each of which can be created, listed, retrieved and deleted. Long-running work like an embed job or a fine-tuning run is tracked by polling its status, since Cohere does not push events.

Connect & authenticate

Connection & authentication methods.

How an app or AI agent connects to Cohere determines what it can reach. There is one route for calling the models, and the key behind it carries the same access across every method.

Ways to connect

REST API

The REST API takes JSON request bodies and returns JSON, at https://api.cohere.com. The current v2 path serves Chat, Embed, Rerank and Classify; the v1 path serves Tokenize, Detokenize, Datasets, Embed Jobs and Fine-tuning. Every call authenticates with an API key sent as a bearer token in the Authorization header. Official SDKs exist for Python, TypeScript, Java and Go.

Best forConnecting an app or AI agent to Cohere's models.

Governed byThe API key, which reaches every method the account is entitled to.

Docs ↗

Authentication

API key (bearer)

Cohere authenticates every first-party call with an API key sent as a bearer token in the Authorization header. The key is not scoped to specific endpoints, so it reaches every method the account is entitled to. A key can be confirmed with the check-api-key method, which returns whether it is valid and the organization and owner it belongs to.

TokenBearer API key

Best forServer-side calls to Cohere's models.

Docs ↗

Key type (Trial or Production)

Every key is one of two types. A Trial key is free for evaluation, capped at 1,000 calls a month with lower per-minute limits, and cannot reach the newest models. A Production key is paid, with much higher per-minute limits and the full model lineup. The type sets rate limits and model access, not which methods can be called.

TokenTrial or Production API key

Best forChoosing between free evaluation and paid production use.

Docs ↗

Capability map

What an AI agent can do in Cohere.

The Cohere API is split into areas an agent can act on, like generating chat replies, turning text into embeddings, reranking search results, and classifying text. Some methods only return generated output, while others create and delete stored resources like datasets and fine-tuned models.

Generate (Chat & Classify)

2 endpoints

Methods that generate text replies or classify text into labels.

These calls send text to Cohere's models and return generated output.

View endpoints →

Represent (Embed & Rerank)

2 endpoints

Methods that turn text or images into embeddings, or reorder documents by relevance.

These calls send text or images to Cohere's models and return scores or vectors.

View endpoints →

Tokens

2 endpoints

Methods that convert text to tokens and back.

These calls return token data and do not change anything.

View endpoints →

Models

2 endpoints

Methods that list and retrieve the models available to the account.

These calls only read model metadata.

View endpoints →

Datasets

4 endpoints

Methods for uploading, listing, retrieving and deleting datasets used for training and embed jobs.

A write here creates or deletes a stored dataset.

View endpoints →

Embed jobs

4 endpoints

Methods for running and tracking batch embedding jobs over a dataset.

A write here starts or cancels a batch job.

View endpoints →

Fine-tuning

3 endpoints

Methods for creating, listing, retrieving and deleting fine-tuned models.

A write here creates or deletes a fine-tuned model.

View endpoints →

Account

1 endpoint

Methods for checking the API key.

This call only reads the key's status.

View endpoints →

Endpoint reference

Every Cohere API method.

Filter by method, access, or permission, or search any path. Select a row for version detail, rate limits, the related webhook event, and the source.

Hide deprecated

Method	Endpoint	What it does	Access	Permission	Version
Generate (Chat & Classify) Methods that generate text replies or classify text into labels.2
POST	`/v2/chat`	Generate a chat reply to a messages array, with optional tools, documents for citations, structured outputs, and streaming.	write	—	Current
No per-method permission; any valid key can call it. Trial keys cannot use the newest models. Acts onchat Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitTrial: 20 req/min. Production: 500 req/min (per model). SourceOfficial documentation ↗
POST	`/v1/classify`	Predict which of a set of labels best fits each input text, using example text-and-label pairs or a fine-tuned classifier.	write	—	Current
No per-method permission; any valid key can call it. Acts onclassify Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitTrial and Production: 500 req/min (default tier). SourceOfficial documentation ↗
Represent (Embed & Rerank) Methods that turn text or images into embeddings, or reorder documents by relevance.2
POST	`/v2/embed`	Return embeddings for up to 96 texts or for images, with model, input_type and embedding_types as required parameters.	write	—	Current
No per-method permission; any valid key can call it. Generation only, nothing is stored. Acts onembed Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitText: 2,000 inputs/min. Images: Trial 5/min, Production 400/min. SourceOfficial documentation ↗
POST	`/v2/rerank`	Take a query and a list of documents and return them ordered by a relevance score, with model and query required.	write	—	Current
No per-method permission; any valid key can call it. Each document is truncated at max_tokens_per_doc (default 4,096). Acts onrerank Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitTrial: 10 req/min. Production: 1,000 req/min. SourceOfficial documentation ↗
Tokens Methods that convert text to tokens and back.2
POST	`/v1/tokenize`	Split text into the tokens used by a given model's tokenizer.	read	—	Current
Text must be 1 to 65,536 characters. Returns token data only. Acts ontoken Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitTrial: 100 req/min. Production: 2,000 req/min. SourceOfficial documentation ↗
POST	`/v1/detokenize`	Turn a list of token IDs back into text, using a given model's tokenizer.	read	—	Current
Returns text only; nothing is stored. Acts ontoken Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
Models Methods that list and retrieve the models available to the account.2
GET	`/v1/models`	List the models available to the account, with their endpoints, context length and features.	read	—	Current
Read-only; paginated with a next_page_token. Acts onmodel Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
GET	`/v1/models/{model}`	Retrieve metadata for a single model by name.	read	—	Current
Read-only. Acts onmodel Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
Datasets Methods for uploading, listing, retrieving and deleting datasets used for training and embed jobs.4
POST	`/v1/datasets`	Upload a dataset for use in training or embed jobs.	write	—	Current
Creates a stored dataset on the account. Acts ondataset Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
GET	`/v1/datasets`	List the datasets on the account.	read	—	Current
Read-only. Acts ondataset Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
GET	`/v1/datasets/{id}`	Retrieve a single dataset by ID.	read	—	Current
Read-only. Acts ondataset Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
DELETE	`/v1/datasets/{id}`	Permanently delete a dataset by ID.	write	—	Current
Irreversible; removes the stored dataset. Acts ondataset Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
Embed jobs Methods for running and tracking batch embedding jobs over a dataset.4
POST	`/v1/embed-jobs`	Start a batch job that embeds an entire dataset.	write	—	Current
Starts a long-running job; track it by polling its status. Acts onembed-job Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitTrial: 5 req/min. Production: 50 req/min. SourceOfficial documentation ↗
GET	`/v1/embed-jobs`	List the embed jobs run on the account.	read	—	Current
Read-only. Acts onembed-job Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
GET	`/v1/embed-jobs/{id}`	Retrieve a single embed job by ID, including its status.	read	—	Current
Read-only; poll this to learn when a job completes. Acts onembed-job Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
POST	`/v1/embed-jobs/{id}/cancel`	Cancel a running embed job by ID.	write	—	Current
Stops a running job. Acts onembed-job Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
Fine-tuning Methods for creating, listing, retrieving and deleting fine-tuned models.3
POST	`/v1/finetuning/finetuned-models`	Create a fine-tuned model, trained on a dataset named in the request.	write	—	Current
Requires name and settings (base_model and dataset_id); starts a training run. Acts onfinetuned-model Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
GET	`/v1/finetuning/finetuned-models`	List the fine-tuned models on the account.	read	—	Current
Read-only. Acts onfinetuned-model Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
GET	`/v1/finetuning/finetuned-models/{id}`	Retrieve a single fine-tuned model by ID.	read	—	Current
Read-only. Acts onfinetuned-model Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
Account Methods for checking the API key.1
POST	`/v1/check-api-key`	Check that the API key in the Authorization header is valid and active.	read	—	Current
Returns whether the key is valid and the organization and owner it belongs to. Acts onapi-key Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗

No endpoints match those filters.

Webhooks

Webhook events.

Cohere does not push events. An app or AI agent learns the outcome of a long-running job, like an embed job or a fine-tuning run, by polling its status method.

Event	What it signals	Triggered by

No events match that search.

Rate limits & pagination

Rate limits, pagination & request size.

Cohere limits how fast an app can call each endpoint, with separate ceilings for trial keys and production keys, and a monthly cap on trial-key calls.

Request rate

Cohere meters requests per minute, per endpoint, with separate ceilings for Trial keys and Production keys. The free Trial key is also capped at 1,000 API calls a month across all endpoints. The per-minute limits differ widely by endpoint and key type, so they are listed on each method's row rather than summarized here, for example Chat at 20 requests per minute on a Trial key and 500 on a Production key, and Rerank at 10 versus 1,000. Going over returns HTTP 429, and the fix is to slow down or move to a Production key.

Pagination

List methods that can return many items, like List Embed Jobs, List Datasets, List Models and List Fine-tuned Models, page with a page_size parameter and return a next_page_token to fetch the following page. Generation methods like Chat, Embed and Rerank return their full result in one response and do not paginate.

Request size

Per-call input limits apply by method. Embed takes up to 96 texts per call. Tokenize accepts text of 1 to 65,536 characters. Rerank truncates each document at max_tokens_per_doc, which defaults to 4,096 tokens. Beyond these, the model's context length sets how much text a single Chat or Embed call can process.

Errors

Status codes & error handling.

The status codes an agent should handle, and what to do about each.

Status	Code	Meaning	What to do
400	`Bad Request`	The request body is not valid, for example a required field is missing or a value is out of range.	Check the request against the API spec, fix the fields or values, and resend.
401	`Unauthorized`	The API key is missing, invalid, or has expired.	Send a valid API key in the Authorization header, and rotate the key if it was compromised.
402	`Payment Required`	The account has reached a billing or spending limit.	Add or update the payment method in the dashboard to continue.
404	`Not Found`	The requested resource does not exist, for example a wrong or deleted model, dataset, or job ID.	Verify the resource identifier and confirm it belongs to this account.
429	`Too Many Requests`	The rate limit for the endpoint and key type was exceeded.	Back off and retry, smooth the request rate, or move from a Trial key to a Production key.
499	`Request Cancelled`	The client cancelled the request before it completed.	Retry the request when ready.
500	`Internal Server Error`	An unexpected error occurred on Cohere's side.	Retry with backoff, and contact Cohere support with the request details if it persists.

Versioning & freshness

Version history.

Cohere runs two numbered API versions side by side, an older v1 and a current v2, and ships dated model and API changes through its release notes.

Version history

What changed, and when

Latest versionv2

v2Current version

Current major version (Chat, Embed, Rerank, Classify on v2)

The v2 API reworked the four model endpoints. Chat combines message and chat history into a single messages array and uses JSON-schema tool definitions; Embed and Rerank made model a required parameter, and Embed added a required embedding_types parameter; streaming moved to server-sent events and citation handling was consolidated. The rest of the API stays on v1.

What changed

Chat v2: messages array replaces separate message and chat_history.
Embed v2 and Rerank v2: model is now a required parameter.
Embed v2: embedding_types is now a required parameter.
Streaming switched to server-sent events; citations consolidated under citation_options.
v1-only features removed from v2: connectors, server-side conversation management, prompt truncation.

2025-09-15Requires migration

Major Command model deprecations

Older Command models were deprecated in favour of the current lineup.

What changed

Deprecated: command-r-03-2024, command-r-plus-04-2024, command-light, command, and summarize.
Recommended replacements: command-r-08-2024, command-r-plus-08-2024, or command-a-03-2025.

2025-04-15Feature update

Embed Multimodal v4

Embed v4.0 embeds interleaved text and images in the same vector space, so screenshots of PDFs, slides, figures and tables can be indexed alongside text.

What changed

embed-v4.0 released, with multimodal (text and image) embeddings in one model.
Available on the Cohere platform and major cloud providers.

2025-03

Command A

command-a-03-2025 is Cohere's most performant Command model, built for enterprise tool use, retrieval-augmented generation, agents and multilingual tasks.

What changed

command-a-03-2025 released as the flagship Chat model.

2024-12-02Requires migration

Rerank-v3.5 and the v2 Rerank API

Rerank-v3.5 shipped alongside the v2 Rerank API, with a 4,096-token context length and stronger multilingual retrieval. The Rerank-v2.0 model family was deprecated.

What changed

rerank-v3.5 released; v2 Rerank API introduced.
model made a required parameter; max_chunks_per_doc replaced by max_tokens_per_doc.
Rerank-v2.0 model family deprecated.

Call the v2 endpoints for chat, embed, rerank and classify; the rest of the API stays on v1.

Cohere release notes ↗

Questions

Cohere API, answered.

What is the difference between a Trial key and a Production key?+

A Trial key is free and meant for evaluation. It is capped at 1,000 API calls a month and has lower per-minute rate limits, and the newest models are not available on it. A Production key is paid, has much higher per-minute limits, and unlocks the full model lineup. Both keys authenticate the same way and call the same endpoints; the difference is rate limits and model access, not which methods are allowed.

Does Cohere support OAuth or per-method permissions?+

No. First-party calls authenticate with a single API key sent as a bearer token, and that key reaches every endpoint the account is entitled to. Cohere does not offer a Read-only key, a write-only key, or a way to scope a key to specific methods. To limit what an agent can do, the access has to be controlled in front of the key, which is what Bollard does.

What is the difference between the v1 and v2 APIs?+

Cohere runs both at once. The v2 endpoints cover Chat, Embed, Rerank and Classify, where model is a required parameter, chat history is a single messages array, and streaming uses server-sent events. The v1 endpoints still serve the rest of the API, like Tokenize, Detokenize, Datasets, Embed Jobs and Fine-tuning. New integrations should call v2 for the four model APIs and v1 for everything else.

How does an agent track a long-running job?+

Cohere does not push events or send webhooks. An embed job or a fine-tuning run returns an ID when it is created, and the agent polls the matching get method to read its status until it completes or fails. There is no callback or event stream for job completion.

How are rate limits enforced?+

Limits are per minute, per endpoint, and differ by key type. For example Chat allows 20 requests per minute on a Trial key and 500 on a Production key, while Rerank allows 10 versus 1,000. Trial keys also have a hard cap of 1,000 calls a month across all endpoints. Exceeding a limit returns HTTP 429; the response should be retried with backoff or the key upgraded to Production.

What does the v2 Chat endpoint do?+

It generates a text response to a conversation. The request carries a model and a messages array holding the conversation so far, and the response is the assistant's reply plus usage metrics and, when documents are supplied, citations. It also supports tool use, structured outputs, and streaming the reply token by token over server-sent events.

What is Bollard AI?

Control what every AI agent can do in Cohere.

Bollard AI sits between a team's AI agents and Cohere. Grant each agent exactly the access it needs, method by method, and every call is checked and logged.

Allow only the methods an agent needs, never a shared Cohere key.
Denied by default, so an agent reaches only what has been explicitly allowed.
Every call recorded in plain English: who, what, where, and the decision.

Control Cohere access in Bollard Browse all APIs →

Cohere

Search Agent

Generate chat replies ActionOffReadFull use

Create embeddings ActionOffReadFull use

Delete fine-tuned models ActionOffReadFull use

Per-agent access, set in Bollard AI, not in Cohere

How the Cohere API works.

Connection & authentication methods.

REST API

API key (bearer)

Key type (Trial or Production)

What an AI agent can do in Cohere.

Generate (Chat & Classify)

Represent (Embed & Rerank)

Tokens

Models

Datasets

Embed jobs

Fine-tuning

Account

Every Cohere API method.

Generate (Chat & Classify)

Represent (Embed & Rerank)

Tokens

Models

Datasets

Embed jobs

Fine-tuning

Account

Webhook events.

Rate limits, pagination & request size.

Request rate

Pagination

Request size

Status codes & error handling.

Version history.

What changed, and when

Cohere API, answered.

More ai API guides for agents

Hugging Face

ElevenLabs

Replicate

Google Gemini

OpenAI

Anthropic

Control what every AI agent can do in Cohere.