Everything an AI agent can do with the Perplexity API.

A reference guide for building AI agents: every method, how to authenticate, and the permissions each one needs.

Endpoints12
AuthenticationAPI key
Last updated23 June 2026
Orientation

How the Perplexity API works.

The Perplexity API is how an app or AI agent gets answers from the web: asking a grounded question and getting an answer with its sources, running a raw web search, or kicking off an in-depth research report. Access is granted through a single account API key, and that one key carries the same full access on every method, with no per-method permissions to narrow it. Perplexity reads results back on request rather than pushing events, and a long research job is submitted and then polled until it is done.

12Endpoints
7Capability groups
4Read
8Write
0Permissions
Authentication
Perplexity authenticates every call with a single account API key (pplx-...), sent as a Bearer token in the Authorization header. There is no OAuth and no per-endpoint scope: the one key carries the full access of the account on every endpoint. A leaked key reaches everything the account can do and spends its credits, so it must stay server-side and be rotated if exposed.
Permissions
The API has no granular permission model. The same key can call chat completions, search, the async deep-research jobs, and the models list, with no way to narrow it to a subset of endpoints. Control over what an agent may do therefore sits outside the key, in whatever layer mediates the calls.
Versioning
Perplexity does not put a dated version in the API. There is one continuously updated API, and changes such as new models and new endpoints ship through dated release notes rather than a pinned version string. A model can be deprecated and removed, as sonar-reasoning was on 2025-12-15.
Data model
The chat completions endpoint is OpenAI-compatible, taking a messages array and a model field that selects a Sonar model, and answers are grounded in live web results with the sources they drew on. The search endpoint returns ranked web results without an answer. The async endpoint runs long jobs, used for deep research, that are submitted then polled by id. Perplexity does not push events.
Connect & authenticate

Connection & authentication methods.

How an app or AI agent connects to Perplexity determines what it can reach. There is a route for grounded chat answers, a route for raw web search, a long-running route for deep research, and a hosted server that exposes Perplexity tools to agents, and each is governed by the API key behind it.

Ways to connect

REST API

The REST API takes JSON request bodies and returns JSON, at https://api.perplexity.ai. The chat completions route is OpenAI-compatible, so the OpenAI client libraries work by pointing their base URL at Perplexity. A call authenticates with an API key sent as a Bearer token.

Best forConnecting an app or AI agent to Perplexity.
Governed byThe API key.
Docs ↗

MCP server

Perplexity publishes an official Model Context Protocol server, @perplexity-ai/mcp-server, that exposes Perplexity to AI agents and LLM clients. It runs locally over stdio by default and also supports an HTTP deployment for shared use. It exposes four tools: perplexity_search for raw web search, perplexity_ask for grounded answers on sonar-pro, perplexity_research on sonar-deep-research, and perplexity_reason on sonar-reasoning-pro. It authenticates with the PERPLEXITY_API_KEY environment variable.

Best forConnecting an AI agent to Perplexity through MCP.
Governed byThe API key in PERPLEXITY_API_KEY.
Docs ↗
Authentication

API key

Every call authenticates with a single account API key, created in the API portal and sent as a Bearer token in the Authorization header. The key carries the full access of the account, the same access on every endpoint, with no per-endpoint scopes to narrow it. A leaked key reaches everything the account can do and spends its credits, so it must stay server-side and be rotated if exposed.

TokenBearer API key (pplx-...)
Best forAll server-side calls to Perplexity.
Docs ↗
Capability map

What an AI agent can do with Perplexity.

The Perplexity API is split into areas an agent can act on, like grounded chat answers over the Sonar models, raw web search, long-running asynchronous jobs, a unified agent endpoint, text embeddings, and managing the keys that authenticate calls. Most answers can be backed by live web results with the sources they drew on.

Endpoint reference

Every Perplexity API method.

Filter by method, access, or permission, or search any path. Select a row for version detail, rate limits, the related webhook event, and the source.

MethodEndpointWhat it doesAccessPermissionVersion

Chat Completions

Grounded chat answers over the Sonar models, optionally backed by live web search.1

OpenAI-compatible; the model field selects sonar, sonar-pro, or sonar-reasoning-pro. Each call spends credits.

Acts onchat_completion
Permission (capability)None required
VersionAvailable since the API’s base version
Webhook eventNone
Rate limit50 RPM at Tier 0, up to 4,000 RPM at Tier 4 (per model)
Raw web search that returns ranked results without generating an answer.1

Returns results[] of title, url, and snippet, with optional date and last_updated. query may be one string or an array.

Acts onsearch_result
Permission (capability)None required
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

Async Chat Completions

Long-running chat jobs, used for the deep-research model, submitted then polled by id.3

Returns a request id with status CREATED. Used with the sonar-deep-research model, which runs an extended search.

Acts onasync_chat_completion
Permission (capability)None required
VersionAvailable since the API’s base version
Webhook eventNone
Rate limit5 RPM at Tier 0 for sonar-deep-research, up to 100 RPM at Tier 5

status is one of CREATED, IN_PROGRESS, COMPLETED, or FAILED. Poll until COMPLETED.

Acts onasync_chat_completion
Permission (capability)None required
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

Returns a requests[] of id, model, status, and timestamps, plus a next_token for the next page.

Acts onasync_chat_completion
Permission (capability)None required
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

Agent

A single agent endpoint that answers with optional web search, reasoning, and tools.1

Supports web, finance, and people search, URL fetch, a code sandbox, and reasoning effort from low to max. Became generally available in February 2026.

Acts onagent_response
Permission (capability)None required
VersionIntroduced 2026-02-01
Webhook eventNone
Rate limitStandard limits apply

Embeddings

Turn text into vectors for semantic search, clustering, and retrieval.2

No web search; turns text into vectors. Launched February 2026.

Acts onembedding
Permission (capability)None required
VersionIntroduced 2026-02-01
Webhook eventNone
Rate limitStandard limits apply

Chunks from the same document share context, improving retrieval for document-based applications.

Acts onembedding
Permission (capability)None required
VersionIntroduced 2026-02-01
Webhook eventNone
Rate limitStandard limits apply

Models

Read-only listing of the models available to the account.2

Read-only. Returns model identifiers usable with the agent endpoint. Introduced in April 2026.

Acts onmodel
Permission (capability)None required
VersionIntroduced 2026-04-01
Webhook eventNone
Rate limitStandard limits apply

Read-only. Introduced in April 2026 for dynamic model selection.

Acts onmodel
Permission (capability)None required
VersionIntroduced 2026-04-01
Webhook eventNone
Rate limitStandard limits apply

API Keys

Generate and revoke the auth tokens that authenticate API calls.2

Mints a credential that authenticates calls for the whole account.

Acts onauth_token
Permission (capability)None required
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

Pass the token to revoke in the request body; it is invalidated immediately.

Acts onauth_token
Permission (capability)None required
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply
No endpoints match those filters.
Webhooks

Webhook events.

Perplexity does not push events. An agent gets results by reading a response directly, and for a long-running deep-research job it submits the request, then polls for the result by its id until the job is done.

EventWhat it signalsTriggered by
No events match that search.
Rate limits & pagination

Rate limits, pagination & request size.

Perplexity limits how fast an app can call by a per-minute request rate that depends on the model and on the account's spend-based usage tier, with the heavier deep-research model capped far lower than the standard Sonar models.

Request rate

Perplexity meters requests by a per-minute request rate, not by a per-call cost or point weighting, and the ceiling depends on both the model and the account's usage tier. A usage tier is a level set by how much the account has spent over its lifetime, from Tier 0 (new accounts, $0) up through Tier 5 ($5,000+ cumulative), and a tier is kept permanently once reached. The standard chat models (sonar, sonar-pro, sonar-reasoning-pro) allow 50 requests per minute at Tier 0, rising to 4,000 per minute at Tier 4. The heavier sonar-deep-research model is capped far lower, at 5 per minute at Tier 0 rising to 100 at Tier 5. Going over returns HTTP 429, and the limit uses a leaky-bucket model that allows short bursts.

Pagination

The list async chat completions method returns a next_token in its response, and passing that token back fetches the next page of requests. The search method instead caps the number of results returned through its max_results parameter rather than paging, and the chat completions method returns a single answer per call.

Request size

The search method returns up to 20 results per call through max_results (default 10), and search_context_size (low, medium, high) plus max_tokens and max_tokens_per_page bound how much page content is pulled in. A chat completion's length is bounded by the model's context window and the max_tokens parameter on the request.

Errors

Status codes & error handling.

The status codes an agent should handle, and what to do about each.

StatusCodeMeaningWhat to do
400bad_requestThe request was malformed or a parameter was invalid, for example user and assistant messages that are not alternating, or an unknown model name.Read the error message, fix the request body, and resend. The request is not retryable as-is.
401unauthorizedThe API key is missing, invalid, or deleted, or the account has run out of credits.Confirm a valid key is sent in the Authorization header, and top up credits in the API portal if the balance is exhausted.
429too_many_requestsThe per-minute rate limit for the account's usage tier and the chosen model was exceeded. Limits use a leaky-bucket model that allows short bursts.Back off and retry with exponential backoff and jitter, smooth the request rate, and raise the usage tier for sustained higher throughput.
500internal_server_errorAn error on Perplexity's side, which can also appear as 502, 503, or 504.Retry with backoff, and contact support if it persists.
Versioning & freshness

Version history.

Perplexity does not put a dated version in the API. There is one continuously updated API, and changes such as new models and new endpoints ship through dated release notes.

Version history

What changed, and when

Latest versionCurrent
CurrentCurrent version
Continuously updated API

Perplexity does not put a dated version in the API. There is one continuously updated API, and notable changes ship through dated release notes rather than a pinned version string.

What changed
  • May 2026: Finance Search tool added to the Agent API.
  • April 2026: GET /models endpoint introduced for dynamic model selection; one-time key reveal for new API keys.
  • February 2026: Embeddings API launched, with contextualized embeddings.
2025-12-15Requires migration
sonar-reasoning removed

The sonar-reasoning model was deprecated and removed, leaving sonar-reasoning-pro as the reasoning model. The Search API also gained max_tokens and last-updated filtering around this time.

What changed
  • sonar-reasoning deprecated and removed (December 15).
  • Search API: max_tokens parameter and last_updated filter added.
2025-Earlier
Sonar API and Search API

Perplexity launched the OpenAI-compatible Sonar chat completions API and, later, the standalone Search API and the asynchronous deep-research jobs.

What changed
  • Sonar chat completions API (OpenAI-compatible) launched.
  • Sonar Pro introduced for complex queries.
  • Standalone Search API and async deep-research jobs added.

There is no version to pin; track the changelog for new models and changes.

Perplexity API changelog ↗
Questions

Perplexity API, answered.

Is the Perplexity API OpenAI-compatible?+
The chat completions endpoint is. It takes the same messages array and model field, so the OpenAI client libraries can call Perplexity by pointing their base URL at https://api.perplexity.ai and passing a Perplexity key. The model field selects a Sonar model, and the response adds the web sources the answer was grounded in.
Which models can the API use?+
The current Sonar models are sonar (a lightweight, low-cost grounded search model), sonar-pro (advanced search for complex queries and follow-ups), sonar-reasoning-pro (grounded reasoning with a chain-of-thought), and sonar-deep-research (exhaustive research run as an async job). The earlier sonar-reasoning model was removed on 2025-12-15.
How do rate limits and usage tiers work?+
Limits are a per-minute request rate that depends on the model and on the account's usage tier. A usage tier is a level set by lifetime spend, from Tier 0 (new accounts) up through Tier 5 ($5,000+ cumulative), and once a tier is reached it is kept permanently. The standard chat models allow 50 requests per minute at Tier 0 up to 4,000 at Tier 4, while the deep-research model is capped far lower. Going over returns a 429.
How is deep research handled (the async API)?+
A deep-research request is submitted to the async chat completions endpoint, which returns a request id with status CREATED. The job runs an extended web search in the background, and the result is fetched by polling the get-by-id method until its status reaches COMPLETED. The list method shows all of the account's async requests and their statuses.
Does Perplexity send webhooks?+
No. Perplexity does not push events to an endpoint. A standard chat or search result is read straight from the response, and a long-running async job is retrieved by polling its request id until it is done.
Does Perplexity have an official MCP server?+
Yes. Perplexity publishes @perplexity-ai/mcp-server, an official Model Context Protocol server that runs locally over stdio (with an HTTP deployment option) and exposes four tools: perplexity_search, perplexity_ask, perplexity_research, and perplexity_reason. It authenticates with the PERPLEXITY_API_KEY environment variable.
Related

More ai API guides for agents

What is Bollard AI?

Control what every AI agent can do with Perplexity.

Bollard AI sits between a team's AI agents and Perplexity. Grant each agent exactly the access it needs, search and answer or nothing, and every call is checked and logged.

  • Allow live search and answers per agent, never a shared Perplexity key.
  • Denied by default, so an agent reaches only what has been explicitly allowed.
  • Every call recorded in plain English: who, what, where, and the decision.
Perplexity
Research Agent
Search the web ActionOffReadFull use
Ask Sonar (grounded answers) ActionOffReadFull use
Deep research reports ActionOffReadFull use
Per-agent access, set in Bollard AI, not in Perplexity