A reference guide for building AI agents: every method, how to authenticate, and the permissions each one needs.
The Perplexity API is how an app or AI agent gets answers from the web: asking a grounded question and getting an answer with its sources, running a raw web search, or kicking off an in-depth research report. Access is granted through a single account API key, and that one key carries the same full access on every method, with no per-method permissions to narrow it. Perplexity reads results back on request rather than pushing events, and a long research job is submitted and then polled until it is done.
How an app or AI agent connects to Perplexity determines what it can reach. There is a route for grounded chat answers, a route for raw web search, a long-running route for deep research, and a hosted server that exposes Perplexity tools to agents, and each is governed by the API key behind it.
The REST API takes JSON request bodies and returns JSON, at https://api.perplexity.ai. The chat completions route is OpenAI-compatible, so the OpenAI client libraries work by pointing their base URL at Perplexity. A call authenticates with an API key sent as a Bearer token.
Perplexity publishes an official Model Context Protocol server, @perplexity-ai/mcp-server, that exposes Perplexity to AI agents and LLM clients. It runs locally over stdio by default and also supports an HTTP deployment for shared use. It exposes four tools: perplexity_search for raw web search, perplexity_ask for grounded answers on sonar-pro, perplexity_research on sonar-deep-research, and perplexity_reason on sonar-reasoning-pro. It authenticates with the PERPLEXITY_API_KEY environment variable.
Every call authenticates with a single account API key, created in the API portal and sent as a Bearer token in the Authorization header. The key carries the full access of the account, the same access on every endpoint, with no per-endpoint scopes to narrow it. A leaked key reaches everything the account can do and spends its credits, so it must stay server-side and be rotated if exposed.
The Perplexity API is split into areas an agent can act on, like grounded chat answers over the Sonar models, raw web search, long-running asynchronous jobs, a unified agent endpoint, text embeddings, and managing the keys that authenticate calls. Most answers can be backed by live web results with the sources they drew on.
Grounded chat answers over the Sonar models, optionally backed by live web search.
Raw web search that returns ranked results without generating an answer.
Long-running chat jobs, used for the deep-research model, submitted then polled by id.
A single agent endpoint that answers with optional web search, reasoning, and tools.
Turn text into vectors for semantic search, clustering, and retrieval.
Read-only listing of the models available to the account.
Generate and revoke the auth tokens that authenticate API calls.
Filter by method, access, or permission, or search any path. Select a row for version detail, rate limits, the related webhook event, and the source.
| Method | Endpoint | What it does | Access | Permission | Version | |
|---|---|---|---|---|---|---|
Chat CompletionsGrounded chat answers over the Sonar models, optionally backed by live web search.1 | ||||||
| POST | /chat/completions | Generate a grounded chat completion over a Sonar model from a message history, optionally streamed. | write | — | Current | |
OpenAI-compatible; the model field selects sonar, sonar-pro, or sonar-reasoning-pro. Each call spends credits. Acts onchat_completion Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limit50 RPM at Tier 0, up to 4,000 RPM at Tier 4 (per model) SourceOfficial documentation ↗ | ||||||
SearchRaw web search that returns ranked results without generating an answer.1 | ||||||
| POST | /search | Search the web and return ranked page results without generating an answer. | write | — | Current | |
Returns results[] of title, url, and snippet, with optional date and last_updated. query may be one string or an array. Acts onsearch_result Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗ | ||||||
Async Chat CompletionsLong-running chat jobs, used for the deep-research model, submitted then polled by id.3 | ||||||
| POST | /async/chat/completions | Submit a long-running chat completion as an asynchronous job, used for the deep-research model. | write | — | Current | |
Returns a request id with status CREATED. Used with the sonar-deep-research model, which runs an extended search. Acts onasync_chat_completion Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limit5 RPM at Tier 0 for sonar-deep-research, up to 100 RPM at Tier 5 SourceOfficial documentation ↗ | ||||||
| GET | /async/chat/completions/{request_id} | Retrieve a single async chat completion by its request id, including status and the result when done. | read | — | Current | |
status is one of CREATED, IN_PROGRESS, COMPLETED, or FAILED. Poll until COMPLETED. Acts onasync_chat_completion Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗ | ||||||
| GET | /async/chat/completions | List the asynchronous chat completion requests created by the account. | read | — | Current | |
Returns a requests[] of id, model, status, and timestamps, plus a next_token for the next page. Acts onasync_chat_completion Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗ | ||||||
AgentA single agent endpoint that answers with optional web search, reasoning, and tools.1 | ||||||
| POST | /v1/agent | Generate an agent response that answers with optional web search, reasoning, and tools. | write | — | New | |
Supports web, finance, and people search, URL fetch, a code sandbox, and reasoning effort from low to max. Became generally available in February 2026. Acts onagent_response Permission (capability)None required VersionIntroduced 2026-02-01 Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗ | ||||||
EmbeddingsTurn text into vectors for semantic search, clustering, and retrieval.2 | ||||||
| POST | /v1/embeddings | Generate embeddings for a list of texts, for semantic search, clustering, and retrieval. | write | — | New | |
No web search; turns text into vectors. Launched February 2026. Acts onembedding Permission (capability)None required VersionIntroduced 2026-02-01 Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗ | ||||||
| POST | /v1/contextualizedembeddings | Generate contextualized embeddings for document chunks that share context awareness across a document. | write | — | New | |
Chunks from the same document share context, improving retrieval for document-based applications. Acts onembedding Permission (capability)None required VersionIntroduced 2026-02-01 Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗ | ||||||
ModelsRead-only listing of the models available to the account.2 | ||||||
| GET | /v1/models | List the models available to the agent endpoint for dynamic model selection. | read | — | New | |
Read-only. Returns model identifiers usable with the agent endpoint. Introduced in April 2026. Acts onmodel Permission (capability)None required VersionIntroduced 2026-04-01 Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗ | ||||||
| GET | /models | List the models available to the account for dynamic model selection. | read | — | New | |
Read-only. Introduced in April 2026 for dynamic model selection. Acts onmodel Permission (capability)None required VersionIntroduced 2026-04-01 Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗ | ||||||
API KeysGenerate and revoke the auth tokens that authenticate API calls.2 | ||||||
| POST | /generate_auth_token | Generate a new authentication token for API access. | write | — | Current | |
Mints a credential that authenticates calls for the whole account. Acts onauth_token Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗ | ||||||
| POST | /revoke_auth_token | Revoke an existing authentication token so it can no longer be used. | write | — | Current | |
Pass the token to revoke in the request body; it is invalidated immediately. Acts onauth_token Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗ | ||||||
Perplexity does not push events. An agent gets results by reading a response directly, and for a long-running deep-research job it submits the request, then polls for the result by its id until the job is done.
| Event | What it signals | Triggered by |
|---|
Perplexity limits how fast an app can call by a per-minute request rate that depends on the model and on the account's spend-based usage tier, with the heavier deep-research model capped far lower than the standard Sonar models.
Perplexity meters requests by a per-minute request rate, not by a per-call cost or point weighting, and the ceiling depends on both the model and the account's usage tier. A usage tier is a level set by how much the account has spent over its lifetime, from Tier 0 (new accounts, $0) up through Tier 5 ($5,000+ cumulative), and a tier is kept permanently once reached. The standard chat models (sonar, sonar-pro, sonar-reasoning-pro) allow 50 requests per minute at Tier 0, rising to 4,000 per minute at Tier 4. The heavier sonar-deep-research model is capped far lower, at 5 per minute at Tier 0 rising to 100 at Tier 5. Going over returns HTTP 429, and the limit uses a leaky-bucket model that allows short bursts.
The list async chat completions method returns a next_token in its response, and passing that token back fetches the next page of requests. The search method instead caps the number of results returned through its max_results parameter rather than paging, and the chat completions method returns a single answer per call.
The search method returns up to 20 results per call through max_results (default 10), and search_context_size (low, medium, high) plus max_tokens and max_tokens_per_page bound how much page content is pulled in. A chat completion's length is bounded by the model's context window and the max_tokens parameter on the request.
The status codes an agent should handle, and what to do about each.
| Status | Code | Meaning | What to do |
|---|---|---|---|
| 400 | bad_request | The request was malformed or a parameter was invalid, for example user and assistant messages that are not alternating, or an unknown model name. | Read the error message, fix the request body, and resend. The request is not retryable as-is. |
| 401 | unauthorized | The API key is missing, invalid, or deleted, or the account has run out of credits. | Confirm a valid key is sent in the Authorization header, and top up credits in the API portal if the balance is exhausted. |
| 429 | too_many_requests | The per-minute rate limit for the account's usage tier and the chosen model was exceeded. Limits use a leaky-bucket model that allows short bursts. | Back off and retry with exponential backoff and jitter, smooth the request rate, and raise the usage tier for sustained higher throughput. |
| 500 | internal_server_error | An error on Perplexity's side, which can also appear as 502, 503, or 504. | Retry with backoff, and contact support if it persists. |
Perplexity does not put a dated version in the API. There is one continuously updated API, and changes such as new models and new endpoints ship through dated release notes.
Perplexity does not put a dated version in the API. There is one continuously updated API, and notable changes ship through dated release notes rather than a pinned version string.
The sonar-reasoning model was deprecated and removed, leaving sonar-reasoning-pro as the reasoning model. The Search API also gained max_tokens and last-updated filtering around this time.
Perplexity launched the OpenAI-compatible Sonar chat completions API and, later, the standalone Search API and the asynchronous deep-research jobs.
There is no version to pin; track the changelog for new models and changes.
Perplexity API changelog ↗Bollard AI sits between a team's AI agents and Perplexity. Grant each agent exactly the access it needs, search and answer or nothing, and every call is checked and logged.