Resources/The Agent API Atlas/AI/Perplexity

Everything an AI agent can do with the Perplexity API.

A reference guide for building AI agents: every method, how to authenticate, and the permissions each one needs.

Endpoints12

AuthenticationAPI key

Last updated23 June 2026

Orientation

How the Perplexity API works.

The Perplexity API is how an app or AI agent gets answers from the web: asking a grounded question and getting an answer with its sources, running a raw web search, or kicking off an in-depth research report. Access is granted through a single account API key, and that one key carries the same full access on every method, with no per-method permissions to narrow it. Perplexity reads results back on request rather than pushing events, and a long research job is submitted and then polled until it is done.

12Endpoints

7Capability groups

4Read

8Write

0Permissions

Authentication

Perplexity authenticates every call with a single account API key (pplx-...), sent as a Bearer token in the Authorization header. There is no OAuth and no per-endpoint scope: the one key carries the full access of the account on every endpoint. A leaked key reaches everything the account can do and spends its credits, so it must stay server-side and be rotated if exposed.

Permissions

The API has no granular permission model. The same key can call chat completions, search, the async deep-research jobs, and the models list, with no way to narrow it to a subset of endpoints. Control over what an agent may do therefore sits outside the key, in whatever layer mediates the calls.

Versioning

Perplexity does not put a dated version in the API. There is one continuously updated API, and changes such as new models and new endpoints ship through dated release notes rather than a pinned version string. A model can be deprecated and removed, as sonar-reasoning was on 2025-12-15.

Data model

The chat completions endpoint is OpenAI-compatible, taking a messages array and a model field that selects a Sonar model, and answers are grounded in live web results with the sources they drew on. The search endpoint returns ranked web results without an answer. The async endpoint runs long jobs, used for deep research, that are submitted then polled by id. Perplexity does not push events.

Connect & authenticate

Connection & authentication methods.

How an app or AI agent connects to Perplexity determines what it can reach. There is a route for grounded chat answers, a route for raw web search, a long-running route for deep research, and a hosted server that exposes Perplexity tools to agents, and each is governed by the API key behind it.

Ways to connect

REST API

The REST API takes JSON request bodies and returns JSON, at https://api.perplexity.ai. The chat completions route is OpenAI-compatible, so the OpenAI client libraries work by pointing their base URL at Perplexity. A call authenticates with an API key sent as a Bearer token.

Best forConnecting an app or AI agent to Perplexity.

Governed byThe API key.

Docs ↗

MCP server

Perplexity publishes an official Model Context Protocol server, @perplexity-ai/mcp-server, that exposes Perplexity to AI agents and LLM clients. It runs locally over stdio by default and also supports an HTTP deployment for shared use. It exposes four tools: perplexity_search for raw web search, perplexity_ask for grounded answers on sonar-pro, perplexity_research on sonar-deep-research, and perplexity_reason on sonar-reasoning-pro. It authenticates with the PERPLEXITY_API_KEY environment variable.

Best forConnecting an AI agent to Perplexity through MCP.

Governed byThe API key in PERPLEXITY_API_KEY.

Docs ↗

Authentication

API key

Every call authenticates with a single account API key, created in the API portal and sent as a Bearer token in the Authorization header. The key carries the full access of the account, the same access on every endpoint, with no per-endpoint scopes to narrow it. A leaked key reaches everything the account can do and spends its credits, so it must stay server-side and be rotated if exposed.

TokenBearer API key (pplx-...)

Best forAll server-side calls to Perplexity.

Docs ↗

Capability map

What an AI agent can do with Perplexity.

The Perplexity API is split into areas an agent can act on, like grounded chat answers over the Sonar models, raw web search, long-running asynchronous jobs, a unified agent endpoint, text embeddings, and managing the keys that authenticate calls. Most answers can be backed by live web results with the sources they drew on.

Chat Completions

1 endpoint

Grounded chat answers over the Sonar models, optionally backed by live web search.

Each call spends credits and can send the conversation to the web for search.

View endpoints →

Search

1 endpoint

Raw web search that returns ranked results without generating an answer.

Each call spends credits and queries the live web.

View endpoints →

Async Chat Completions

3 endpoints

Long-running chat jobs, used for the deep-research model, submitted then polled by id.

Each submitted job spends credits and runs an extended web search.

View endpoints →

Agent

1 endpoint

A single agent endpoint that answers with optional web search, reasoning, and tools.

Each call spends credits and can search the web, fetch URLs, and run code.

View endpoints →

Embeddings

2 endpoints

Turn text into vectors for semantic search, clustering, and retrieval.

Each call spends credits; no web search.

View endpoints →

Models

2 endpoints

Read-only listing of the models available to the account.

Read-only; lists the models on offer.

View endpoints →

API Keys

2 endpoints

Generate and revoke the auth tokens that authenticate API calls.

A write here mints or revokes a credential for the whole account.

View endpoints →

Endpoint reference

Every Perplexity API method.

Filter by method, access, or permission, or search any path. Select a row for version detail, rate limits, the related webhook event, and the source.

Hide deprecated

Method	Endpoint	What it does	Access	Permission	Version
Chat Completions Grounded chat answers over the Sonar models, optionally backed by live web search.1
POST	`/chat/completions`	Generate a grounded chat completion over a Sonar model from a message history, optionally streamed.	write	—	Current
OpenAI-compatible; the model field selects sonar, sonar-pro, or sonar-reasoning-pro. Each call spends credits. Acts onchat_completion Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limit50 RPM at Tier 0, up to 4,000 RPM at Tier 4 (per model) SourceOfficial documentation ↗
Search Raw web search that returns ranked results without generating an answer.1
POST	`/search`	Search the web and return ranked page results without generating an answer.	write	—	Current
Returns results[] of title, url, and snippet, with optional date and last_updated. query may be one string or an array. Acts onsearch_result Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
Async Chat Completions Long-running chat jobs, used for the deep-research model, submitted then polled by id.3
POST	`/async/chat/completions`	Submit a long-running chat completion as an asynchronous job, used for the deep-research model.	write	—	Current
Returns a request id with status CREATED. Used with the sonar-deep-research model, which runs an extended search. Acts onasync_chat_completion Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limit5 RPM at Tier 0 for sonar-deep-research, up to 100 RPM at Tier 5 SourceOfficial documentation ↗
GET	`/async/chat/completions/{request_id}`	Retrieve a single async chat completion by its request id, including status and the result when done.	read	—	Current
status is one of CREATED, IN_PROGRESS, COMPLETED, or FAILED. Poll until COMPLETED. Acts onasync_chat_completion Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
GET	`/async/chat/completions`	List the asynchronous chat completion requests created by the account.	read	—	Current
Returns a requests[] of id, model, status, and timestamps, plus a next_token for the next page. Acts onasync_chat_completion Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
Agent A single agent endpoint that answers with optional web search, reasoning, and tools.1
POST	`/v1/agent`	Generate an agent response that answers with optional web search, reasoning, and tools.	write	—	New
Supports web, finance, and people search, URL fetch, a code sandbox, and reasoning effort from low to max. Became generally available in February 2026. Acts onagent_response Permission (capability)None required VersionIntroduced 2026-02-01 Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
Embeddings Turn text into vectors for semantic search, clustering, and retrieval.2
POST	`/v1/embeddings`	Generate embeddings for a list of texts, for semantic search, clustering, and retrieval.	write	—	New
No web search; turns text into vectors. Launched February 2026. Acts onembedding Permission (capability)None required VersionIntroduced 2026-02-01 Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
POST	`/v1/contextualizedembeddings`	Generate contextualized embeddings for document chunks that share context awareness across a document.	write	—	New
Chunks from the same document share context, improving retrieval for document-based applications. Acts onembedding Permission (capability)None required VersionIntroduced 2026-02-01 Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
Models Read-only listing of the models available to the account.2
GET	`/v1/models`	List the models available to the agent endpoint for dynamic model selection.	read	—	New
Read-only. Returns model identifiers usable with the agent endpoint. Introduced in April 2026. Acts onmodel Permission (capability)None required VersionIntroduced 2026-04-01 Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
GET	`/models`	List the models available to the account for dynamic model selection.	read	—	New
Read-only. Introduced in April 2026 for dynamic model selection. Acts onmodel Permission (capability)None required VersionIntroduced 2026-04-01 Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
API Keys Generate and revoke the auth tokens that authenticate API calls.2
POST	`/generate_auth_token`	Generate a new authentication token for API access.	write	—	Current
Mints a credential that authenticates calls for the whole account. Acts onauth_token Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
POST	`/revoke_auth_token`	Revoke an existing authentication token so it can no longer be used.	write	—	Current
Pass the token to revoke in the request body; it is invalidated immediately. Acts onauth_token Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗

No endpoints match those filters.

Webhooks

Webhook events.

Perplexity does not push events. An agent gets results by reading a response directly, and for a long-running deep-research job it submits the request, then polls for the result by its id until the job is done.

Event	What it signals	Triggered by

No events match that search.

Rate limits & pagination

Rate limits, pagination & request size.

Perplexity limits how fast an app can call by a per-minute request rate that depends on the model and on the account's spend-based usage tier, with the heavier deep-research model capped far lower than the standard Sonar models.

Request rate

Perplexity meters requests by a per-minute request rate, not by a per-call cost or point weighting, and the ceiling depends on both the model and the account's usage tier. A usage tier is a level set by how much the account has spent over its lifetime, from Tier 0 (new accounts, $0) up through Tier 5 ($5,000+ cumulative), and a tier is kept permanently once reached. The standard chat models (sonar, sonar-pro, sonar-reasoning-pro) allow 50 requests per minute at Tier 0, rising to 4,000 per minute at Tier 4. The heavier sonar-deep-research model is capped far lower, at 5 per minute at Tier 0 rising to 100 at Tier 5. Going over returns HTTP 429, and the limit uses a leaky-bucket model that allows short bursts.

Pagination

The list async chat completions method returns a next_token in its response, and passing that token back fetches the next page of requests. The search method instead caps the number of results returned through its max_results parameter rather than paging, and the chat completions method returns a single answer per call.

Request size

The search method returns up to 20 results per call through max_results (default 10), and search_context_size (low, medium, high) plus max_tokens and max_tokens_per_page bound how much page content is pulled in. A chat completion's length is bounded by the model's context window and the max_tokens parameter on the request.

Errors

Status codes & error handling.

The status codes an agent should handle, and what to do about each.

Status	Code	Meaning	What to do
400	`bad_request`	The request was malformed or a parameter was invalid, for example user and assistant messages that are not alternating, or an unknown model name.	Read the error message, fix the request body, and resend. The request is not retryable as-is.
401	`unauthorized`	The API key is missing, invalid, or deleted, or the account has run out of credits.	Confirm a valid key is sent in the Authorization header, and top up credits in the API portal if the balance is exhausted.
429	`too_many_requests`	The per-minute rate limit for the account's usage tier and the chosen model was exceeded. Limits use a leaky-bucket model that allows short bursts.	Back off and retry with exponential backoff and jitter, smooth the request rate, and raise the usage tier for sustained higher throughput.
500	`internal_server_error`	An error on Perplexity's side, which can also appear as 502, 503, or 504.	Retry with backoff, and contact support if it persists.

Versioning & freshness

Version history.

Perplexity does not put a dated version in the API. There is one continuously updated API, and changes such as new models and new endpoints ship through dated release notes.

Version history

What changed, and when

Latest versionCurrent

CurrentCurrent version

Continuously updated API

Perplexity does not put a dated version in the API. There is one continuously updated API, and notable changes ship through dated release notes rather than a pinned version string.

What changed

May 2026: Finance Search tool added to the Agent API.
April 2026: GET /models endpoint introduced for dynamic model selection; one-time key reveal for new API keys.
February 2026: Embeddings API launched, with contextualized embeddings.

2025-12-15Requires migration

sonar-reasoning removed

The sonar-reasoning model was deprecated and removed, leaving sonar-reasoning-pro as the reasoning model. The Search API also gained max_tokens and last-updated filtering around this time.

What changed

sonar-reasoning deprecated and removed (December 15).
Search API: max_tokens parameter and last_updated filter added.

2025-Earlier

Sonar API and Search API

Perplexity launched the OpenAI-compatible Sonar chat completions API and, later, the standalone Search API and the asynchronous deep-research jobs.

What changed

Sonar chat completions API (OpenAI-compatible) launched.
Sonar Pro introduced for complex queries.
Standalone Search API and async deep-research jobs added.

There is no version to pin; track the changelog for new models and changes.

Perplexity API changelog ↗

Questions

Perplexity API, answered.

Is the Perplexity API OpenAI-compatible?+

The chat completions endpoint is. It takes the same messages array and model field, so the OpenAI client libraries can call Perplexity by pointing their base URL at https://api.perplexity.ai and passing a Perplexity key. The model field selects a Sonar model, and the response adds the web sources the answer was grounded in.

Which models can the API use?+

The current Sonar models are sonar (a lightweight, low-cost grounded search model), sonar-pro (advanced search for complex queries and follow-ups), sonar-reasoning-pro (grounded reasoning with a chain-of-thought), and sonar-deep-research (exhaustive research run as an async job). The earlier sonar-reasoning model was removed on 2025-12-15.

How do rate limits and usage tiers work?+

Limits are a per-minute request rate that depends on the model and on the account's usage tier. A usage tier is a level set by lifetime spend, from Tier 0 (new accounts) up through Tier 5 ($5,000+ cumulative), and once a tier is reached it is kept permanently. The standard chat models allow 50 requests per minute at Tier 0 up to 4,000 at Tier 4, while the deep-research model is capped far lower. Going over returns a 429.

How is deep research handled (the async API)?+

A deep-research request is submitted to the async chat completions endpoint, which returns a request id with status CREATED. The job runs an extended web search in the background, and the result is fetched by polling the get-by-id method until its status reaches COMPLETED. The list method shows all of the account's async requests and their statuses.

Does Perplexity send webhooks?+

No. Perplexity does not push events to an endpoint. A standard chat or search result is read straight from the response, and a long-running async job is retrieved by polling its request id until it is done.

Does Perplexity have an official MCP server?+

Yes. Perplexity publishes @perplexity-ai/mcp-server, an official Model Context Protocol server that runs locally over stdio (with an HTTP deployment option) and exposes four tools: perplexity_search, perplexity_ask, perplexity_research, and perplexity_reason. It authenticates with the PERPLEXITY_API_KEY environment variable.

What is Bollard AI?

Control what every AI agent can do with Perplexity.

Bollard AI sits between a team's AI agents and Perplexity. Grant each agent exactly the access it needs, search and answer or nothing, and every call is checked and logged.

Allow live search and answers per agent, never a shared Perplexity key.
Denied by default, so an agent reaches only what has been explicitly allowed.
Every call recorded in plain English: who, what, where, and the decision.

Control Perplexity access in Bollard Browse all APIs →

Perplexity

Research Agent

Search the web ActionOffReadFull use

Ask Sonar (grounded answers) ActionOffReadFull use

Deep research reports ActionOffReadFull use

Per-agent access, set in Bollard AI, not in Perplexity

How the Perplexity API works.

Connection & authentication methods.

REST API

MCP server

API key

What an AI agent can do with Perplexity.

Chat Completions

Search

Async Chat Completions

Agent

Embeddings

Models

API Keys

Every Perplexity API method.

Chat Completions

Search

Async Chat Completions

Agent

Embeddings

Models

API Keys

Webhook events.

Rate limits, pagination & request size.

Request rate

Pagination

Request size

Status codes & error handling.

Version history.

What changed, and when

Perplexity API, answered.

More ai API guides for agents

Hugging Face

ElevenLabs

Replicate

Google Gemini

OpenAI

Anthropic

Control what every AI agent can do with Perplexity.