A reference guide for building AI agents: every method, how to authenticate, and the permissions each one needs.
The Replicate API is how an app or AI agent runs machine-learning models: creating a prediction to generate an image or transcribe audio, fine-tuning a model through a training, fetching a result, or listing models and deployments. Access is granted through an account API token, which carries the full access of the account it belongs to with no per-endpoint scopes to narrow it. Replicate can push a prediction's state changes to a webhook, so an integration learns when a long-running job finishes without polling.
How an app or AI agent connects to Replicate determines what it can reach. There is a route for making calls, a route for receiving events when a prediction changes state, and a hosted server that exposes Replicate operations to agents, and each is governed by the API token behind it.
The HTTP API answers at https://api.replicate.com/v1. It takes JSON request bodies, returns JSON, and pages through lists with a cursor. Every call authenticates with an account API token sent as a Bearer token in the Authorization header.
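The request shape described above can be sketched with the Python standard library. This is a minimal illustration, not Replicate's client: the helper name `build_request` is hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request

API_BASE = "https://api.replicate.com/v1"


def build_request(method, path, token, body=None):
    """Build (but do not send) an authenticated JSON request.

    `build_request` is a hypothetical helper for illustration only.
    """
    # JSON-encode the body when one is given; GET calls carry no body.
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(API_BASE + path, data=data, method=method)
    # The account API token travels as a Bearer token.
    request.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request
```

Passing the result to `urllib.request.urlopen` would send it, for example `build_request("GET", "/account", token)` to confirm which account a token belongs to.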
Replicate POSTs the prediction or training object to an HTTPS URL named on the request when the job changes state, filtered by start, output, logs, and completed. To confirm the request came from Replicate, the receiver uses the webhook-id, webhook-timestamp, and webhook-signature headers together with the default endpoint's signing secret (whsec_...) to check an HMAC-SHA256 signature computed over the signed content.
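A receiver's check might look like the sketch below. It rests on assumptions that should be confirmed against Replicate's webhook documentation: that the signed content is the id, timestamp, and raw body joined by dots, that the key is the base64 text after the `whsec_` prefix, and that the signature header holds space-separated `v1,<base64>` entries. The function name and the five-minute tolerance are hypothetical.

```python
import base64
import hashlib
import hmac
import time


def verify_webhook(secret, webhook_id, timestamp, signature_header, body,
                   tolerance_seconds=300, now=None):
    """Return True if the header carries a matching HMAC-SHA256 signature."""
    # Assumption: the key is the base64 text after the "whsec_" prefix.
    key = base64.b64decode(secret.removeprefix("whsec_"))
    # Assumption: signed content is "<id>.<timestamp>.<raw body bytes>".
    signed = f"{webhook_id}.{timestamp}.".encode("utf-8") + body
    expected = base64.b64encode(
        hmac.new(key, signed, hashlib.sha256).digest()
    )

    # Reject stale or malformed timestamps to limit replayed requests.
    current = time.time() if now is None else now
    try:
        if abs(current - int(timestamp)) > tolerance_seconds:
            return False
    except ValueError:
        return False

    # Assumption: the header may list several "v1,<signature>" entries.
    for entry in signature_header.split():
        version, _, candidate = entry.partition(",")
        if version == "v1" and hmac.compare_digest(
            candidate.encode("utf-8"), expected
        ):
            return True
    return False
```

The comparison uses `hmac.compare_digest` so the check takes the same time whether or not the signature matches.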
Replicate's official Model Context Protocol server exposes the operations of the HTTP API to AI agents and LLM clients, like searching and fetching models, running predictions and retrieving results, and managing deployments and webhooks. The remote server at mcp.replicate.com authenticates through a web flow where an account API key is provided for the server to use; a local npm package, replicate-mcp, runs with an API token set in the client. It stays current as the HTTP API adds features.
Replicate authenticates every call with an account API token sent as a Bearer token in the Authorization header. A token is account-level: it carries the full access of the user or organization it belongs to, with no per-endpoint or per-resource scopes to narrow it. The same token that reads a model can create predictions that cost money, delete a private model, or cancel a training. A token is created and revoked on the account's API tokens page, and an organization token can be tied to a service account.
The Replicate API is split into areas an agent can act on, like predictions, models, deployments, trainings, files, and the account. A Replicate API token carries the full access of the account it belongs to, so the same token that lists a model can also create predictions that cost money, delete a private model, or cancel a training.
Predictions: Create a prediction to run a model, retrieve its state and output, list past predictions, and cancel a running one.
Models: Get and list models, create and update a model, run a model's official version, and manage its versions.
Deployments: Create, read, update, and delete deployments, and run predictions against a deployment.
Trainings: Start a training to fine-tune a model, retrieve its state, list past trainings, and cancel a running one.
Files: Upload a file to use as model input, retrieve a file's metadata, and list uploaded files.
Account & hardware: Read the authenticated account and list the hardware a model can run on.
Webhooks: Retrieve the signing secret used to verify that a webhook came from Replicate.
Predictions (5 endpoints)

| Method | Endpoint | What it does | Access | Permission | Version | Acts on | Webhook event | Rate limit (requests per minute) | Notes |
|---|---|---|---|---|---|---|---|---|---|
| POST | /v1/predictions | Create a prediction to run a model version, optionally with a webhook for state changes. | write | API token | Current | prediction | prediction.completed | 600 | Runs a model and bills the account. Takes a version and input, and an optional webhook plus webhook_events_filter (start, output, logs, completed). |
| GET | /v1/predictions/{prediction_id} | Retrieve the current state and output of a prediction. | read | API token | Current | prediction | None | 3000 | Status moves through starting, processing, then succeeded, failed, or canceled. |
| GET | /v1/predictions | List the authenticated account's predictions (cursor-paginated). | read | API token | Current | prediction | None | 3000 | Read-only. |
| POST | /v1/predictions/{prediction_id}/cancel | Cancel a prediction that is still running. | write | API token | Current | prediction | None | 3000 | Stops billing for any remaining run time. |
| POST | /v1/models/{model_owner}/{model_name}/predictions | Create a prediction using an official model's latest or pinned version. | write | API token | Current | prediction | prediction.completed | 600 | Runs a model and bills the account, addressing the model by owner and name rather than a version id. |

Models (8 endpoints)

| Method | Endpoint | What it does | Access | Permission | Version | Acts on | Webhook event | Rate limit (requests per minute) | Notes |
|---|---|---|---|---|---|---|---|---|---|
| GET | /v1/models/{model_owner}/{model_name} | Get a model's details, including its latest version. | read | API token | Current | model | None | 3000 | Read-only. |
| GET | /v1/models | List public models (cursor-paginated). | read | API token | Current | model | None | 3000 | Read-only. |
| POST | /v1/models | Create a new model. | write | API token | Current | model | None | 3000 | Sets owner, name, visibility (public or private), and the hardware it runs on. |
| PATCH | /v1/models/{model_owner}/{model_name} | Update a model's metadata. | write | API token | Current | model | None | 3000 | Changes properties on a model the account owns. |
| DELETE | /v1/models/{model_owner}/{model_name} | Delete a private model that has no versions. | write | API token | Current | model | None | 3000 | Only a private model with no published versions can be deleted. |
| GET | /v1/models/{model_owner}/{model_name}/versions/{version_id} | Get a specific version of a model, including its input and output schema. | read | API token | Current | model version | None | 3000 | Read-only. |
| GET | /v1/models/{model_owner}/{model_name}/versions | List all versions of a model. | read | API token | Current | model version | None | 3000 | Read-only. |
| DELETE | /v1/models/{model_owner}/{model_name}/versions/{version_id} | Delete a model version and its associated output files. | write | API token | Current | model version | None | 3000 | Deletes the version and the predictions and output files tied to it. |

Deployments (4 endpoints)

| Method | Endpoint | What it does | Access | Permission | Version | Acts on | Webhook event | Rate limit (requests per minute) | Notes |
|---|---|---|---|---|---|---|---|---|---|
| POST | /v1/deployments | Create a deployment that serves a model version on chosen hardware. | write | API token | Current | deployment | None | 3000 | Sets the model version, hardware, and the minimum and maximum number of running instances. |
| GET | /v1/deployments/{deployment_owner}/{deployment_name} | Get a deployment's details, including its current release. | read | API token | Current | deployment | None | 3000 | Read-only. |
| PATCH | /v1/deployments/{deployment_owner}/{deployment_name} | Update a deployment, such as its version or instance count. | write | API token | Current | deployment | None | 3000 | Changing instance counts affects running cost. |
| POST | /v1/deployments/{deployment_owner}/{deployment_name}/predictions | Create a prediction against a deployment. | write | API token | Current | prediction | prediction.completed | 600 | Runs the deployment's model and bills the account. |

Trainings (4 endpoints)

| Method | Endpoint | What it does | Access | Permission | Version | Acts on | Webhook event | Rate limit (requests per minute) | Notes |
|---|---|---|---|---|---|---|---|---|---|
| POST | /v1/models/{model_owner}/{model_name}/versions/{version_id}/trainings | Start a training to fine-tune a model from a base version. | write | API token | Current | training | training.completed | 3000 | Runs a training job and bills the account, writing the result into a destination model. |
| GET | /v1/trainings/{training_id} | Retrieve the current state of a training. | read | API token | Current | training | None | 3000 | Read-only. |
| GET | /v1/trainings | List the authenticated account's trainings (cursor-paginated). | read | API token | Current | training | None | 3000 | Read-only. |
| POST | /v1/trainings/{training_id}/cancel | Cancel a training that is still running. | write | API token | Current | training | None | 3000 | Stops billing for any remaining run time. |

Files (3 endpoints)

| Method | Endpoint | What it does | Access | Permission | Version | Acts on | Webhook event | Rate limit (requests per minute) | Notes |
|---|---|---|---|---|---|---|---|---|---|
| POST | /v1/files | Upload a file to use as input to a model. | write | API token | Current | file | None | 3000 | Stores a file on the account and returns a URL to reference as model input. |
| GET | /v1/files/{file_id} | Retrieve a file's metadata. | read | API token | Current | file | None | 3000 | Read-only. |
| GET | /v1/files | List the files uploaded by the authenticated account. | read | API token | Current | file | None | 3000 | Read-only. |

Account & hardware (2 endpoints)

| Method | Endpoint | What it does | Access | Permission | Version | Acts on | Webhook event | Rate limit (requests per minute) | Notes |
|---|---|---|---|---|---|---|---|---|---|
| GET | /v1/account | Get the authenticated user or organization the token belongs to. | read | API token | Current | account | None | 3000 | Read-only; confirms which account a token authenticates as. |
| GET | /v1/hardware | List the hardware a model can run on. | read | API token | Current | hardware | None | 3000 | Read-only; each entry has a name and an SKU used when creating a model. |

Webhooks (1 endpoint)

| Method | Endpoint | What it does | Access | Permission | Version | Acts on | Webhook event | Rate limit (requests per minute) | Notes |
|---|---|---|---|---|---|---|---|---|---|
| GET | /v1/webhooks/default/secret | Get the signing secret for the default webhook endpoint, used to verify webhooks. | read | API token | Current | webhook | None | 3000 | Returns a key prefixed with whsec_ used to check the webhook-signature header. Read-only. |

Every endpoint above has been available since the API's base version, and the source for each is the official documentation.

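The body of a create-prediction call can be sketched as follows, using the field names the reference lists (version, input, webhook, webhook_events_filter). The helper `prediction_body` and every example value are hypothetical, and the validation shown is an illustration rather than Replicate's own rules.

```python
ALLOWED_EVENTS = {"start", "output", "logs", "completed"}


def prediction_body(version, model_input, webhook=None, events=None):
    """Assemble the JSON body for POST /v1/predictions (hypothetical helper)."""
    body = {"version": version, "input": model_input}
    if webhook is not None:
        # The reference describes the webhook target as an HTTPS URL.
        if not webhook.startswith("https://"):
            raise ValueError("webhook must be an HTTPS URL")
        body["webhook"] = webhook
        if events:
            unknown = set(events) - ALLOWED_EVENTS
            if unknown:
                raise ValueError(f"unknown webhook events: {sorted(unknown)}")
            body["webhook_events_filter"] = list(events)
    return body
```

For example, `prediction_body("example-version-id", {"prompt": "a cat"}, "https://example.com/hook", ["completed"])` asks for a single delivery when the prediction finishes. Because this call bills the account and is capped at 600 requests per minute, an agent would normally build the body once and avoid blind retries.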
Replicate can notify an app when a prediction or training changes state, like starting, producing output, or completing. It posts the prediction or training object to an HTTPS URL named on the request, so an integration learns when a long-running job finishes without polling.
| Event | What it signals | Triggered by |
|---|---|---|
| prediction.completed | A prediction finished, reaching succeeded, failed, or canceled. The completed event in webhook_events_filter delivers the final prediction object, while start, output, and logs deliver earlier states. | POST /v1/predictions; POST /v1/models/{model_owner}/{model_name}/predictions; POST /v1/deployments/{deployment_owner}/{deployment_name}/predictions |
| training.completed | A training finished, reaching succeeded, failed, or canceled. A training is a kind of prediction, so the same webhook and webhook_events_filter options apply. | POST /v1/models/{model_owner}/{model_name}/versions/{version_id}/trainings |
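A receiver can tell a final delivery from an earlier one by looking at the object's status. The sketch below assumes the posted JSON carries `id` and `status` fields; the function name `summarize_event` is hypothetical.

```python
import json

# Terminal states named in this reference.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def summarize_event(raw_body):
    """Parse a webhook body and report whether the job has finished."""
    obj = json.loads(raw_body)
    status = obj.get("status")  # assumed field name
    return {
        "id": obj.get("id"),  # assumed field name
        "status": status,
        "finished": status in TERMINAL_STATUSES,
    }
```

A handler would typically verify the signature first, then act only when `finished` is true.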
Replicate limits how fast an app can call by a per-minute request rate, with a separate, lower ceiling on creating predictions, and stricter limits apply as account credit runs low.
Replicate meters requests by a per-minute rate, not by a point or quota cost. Creating predictions is capped at 600 requests per minute, and every other endpoint at 3000 requests per minute. Short bursts above these defaults are allowed before throttling begins, and the ceilings tighten as account credit runs low to prevent overspending. Going over returns HTTP 429 with a detail field that names when the limit resets, for example 'Request was throttled. Expected available in 30s.'
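The wait time can be pulled out of the 429 detail message with a small parser. This sketch assumes the message keeps the wording quoted above; the function name and the one-second fallback are hypothetical choices.

```python
import re

# Matches the number of seconds in text like "Expected available in 30s."
_RESET = re.compile(r"available in (\d+(?:\.\d+)?)\s*s")


def retry_after_seconds(detail, default=1.0):
    """Return the wait in seconds named in a 429 detail message."""
    match = _RESET.search(detail or "")
    return float(match.group(1)) if match else default
```

A caller would sleep for the returned value before retrying, falling back to the default when the message does not name a time.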
List endpoints, like predictions, trainings, models, and files, are cursor-paginated. A response carries next and previous URLs, and following the next URL fetches the next page until it is absent, rather than building the URL by hand.
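Following the next URL can be sketched as a generator. The `results` key for the items on a page is an assumption about the response shape, and `fetch_json` is a hypothetical callable that performs the authenticated GET and returns the decoded JSON.

```python
def iter_results(first_url, fetch_json):
    """Yield every item from a cursor-paginated list endpoint."""
    url = first_url
    while url:
        page = fetch_json(url)
        # Assumption: each page lists its items under "results".
        yield from page.get("results", [])
        # Follow the server-provided URL; stop when it is absent or null.
        url = page.get("next")
```

Keeping the HTTP call behind `fetch_json` lets the same loop serve predictions, trainings, models, and files, and makes it easy to test without network access.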
Responses are JSON. A file can be uploaded through the files endpoint to reference as model input rather than inlining large data, and output files and the predictions tied to a model version are removed when that version is deleted.
The status codes an agent should handle, and what to do about each.
| Status | Code | Meaning | What to do |
|---|---|---|---|
| 401 | Unauthorized | No valid API token was provided, or it is invalid or revoked. | Send a valid token as a Bearer token in the Authorization header. |
| 402 | Payment Required | The account cannot be billed for the request, for example when it has no payment method or has run out of credit. | Add or update the account's billing details, then retry. |
| 404 | Not Found | The requested object does not exist, or the token's account cannot see it. | Check the path, owner, name, or id, and confirm the token's account has access. |
| 422 | Unprocessable Entity | The request was well-formed but a field failed validation, such as model input that does not match the version's input schema. | Read the detail field, fix the named input, and resend. |
| 429 | Throttled | The request rate was exceeded. Creating predictions is capped lower than other endpoints, and limits tighten as account credit runs low. The body's detail field names when the limit resets, for example 'Request was throttled. Expected available in 30s.' | Back off and retry after the time named in the detail message, and smooth the request rate. |
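The table above can be turned into a small dispatch that an agent consults after each call. The action labels and the function name `next_step` are hypothetical; only 429 is treated as retryable here, since the other codes need a change to the request or the account first.

```python
# Map each documented status code to a hypothetical action label.
ACTIONS = {
    401: "send_valid_token",
    402: "fix_billing",
    404: "check_path_and_access",
    422: "fix_input",
    429: "back_off_and_retry",
}


def next_step(status_code):
    """Return (action, retry_unchanged) for an HTTP status code."""
    if 200 <= status_code < 300:
        return ("ok", False)
    action = ACTIONS.get(status_code, "report_unexpected_status")
    # Only throttling is worth retrying with the same request.
    return (action, status_code == 429)
```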
Replicate serves a single current version of its HTTP API and ships changes through a public changelog rather than minting new version strings. A model is versioned separately, and a prediction can pin the exact model version it runs.
Replicate serves one current version of its HTTP API at the /v1 path and ships changes through a public changelog rather than minting new dated version strings. Models are versioned separately from the API, and a prediction can pin the exact model version it runs. The entries below are notable dated changes from the changelog.
Replicate's MCP server became discoverable through the official MCP Registry, publishing metadata at a /.well-known/mcp/server.json endpoint following the server.json specification.
The HTTP API is a single current version; pin the model version a prediction runs.
Source: Replicate changelog.
Bollard AI sits between a team's AI agents and Replicate. Grant each agent exactly the access it needs, read or write, action by action, and every call is checked and logged.