Everything an AI agent can do with the OpenAI API.

A reference guide for building AI agents: every method, how to authenticate, and the permissions each one needs.

Endpoints: 26
API version: v1
Last updated: 23 June 2026
Orientation

How the OpenAI API works.

The OpenAI API is how an app or AI agent generates text and tool-using responses, turns text into embeddings, transcribes or synthesizes audio, generates images, and runs content moderation. Access is granted through an API key scoped to a project, and a key can be set read-only or restricted so it reaches only the areas it was given. OpenAI can also push an event to a registered endpoint when a long-running job, like a batch or a background response, finishes.

Endpoints: 26
Capability groups: 11
Read: 10
Write: 16
Permissions: 12
Authentication
OpenAI authenticates with a Bearer API key, sent as Authorization: Bearer. A key belongs to one project and inherits that project's resource access, and an optional OpenAI-Organization or OpenAI-Project header targets a specific organization or project when an account has more than one. There is no OAuth for first-party calls; the key is the credential.
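
The header scheme described above can be sketched as a small helper. This is a minimal illustration rather than an official client: the function name `build_headers` and the example key and project values are hypothetical, and the `Content-Type` header is an assumption for JSON calls.

```python
from typing import Optional


def build_headers(
    api_key: str,
    organization: Optional[str] = None,
    project: Optional[str] = None,
) -> dict:
    """Build request headers for a JSON call authenticated with a Bearer API key."""
    if not api_key:
        raise ValueError("an API key is required")
    headers = {
        "Authorization": f"Bearer {api_key}",
        # Assumption: the call sends a JSON body; multipart uploads differ.
        "Content-Type": "application/json",
    }
    # The optional headers are only needed when an account has more than
    # one organization or project.
    if organization:
        headers["OpenAI-Organization"] = organization
    if project:
        headers["OpenAI-Project"] = project
    return headers
```

Calling `build_headers("sk-proj-...", project="proj_123")` would return the Bearer header plus the project selector, leaving the organization header out.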
Permissions
A project API key is created with one of three permission levels: All (full project access), Read Only (list and read metadata, no content generation), or Restricted, where each endpoint group is set to None, Read, or Write independently. A call to an endpoint group the key was not granted returns a 401, so a leaked restricted key reaches only the areas it was scoped to.
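
One way to reason about a Restricted key is as a per-group lookup. The sketch below is a local model for planning which grants an agent needs, not how OpenAI evaluates keys. The group names are illustrative, and the rule that Write also covers Read is an assumption.

```python
# Ordered permission levels for one endpoint group on a Restricted key.
LEVELS = {"none": 0, "read": 1, "write": 2}


def is_allowed(grants: dict, group: str, needed: str) -> bool:
    """Return True if the key's grant for `group` covers the `needed` level.

    Groups missing from `grants` are treated as "none".
    Assumption: a "write" grant also permits reads.
    """
    granted = grants.get(group, "none")
    return LEVELS[granted] >= LEVELS[needed]


def expected_status(grants: dict, group: str, needed: str) -> int:
    """Model the outcome: 200 when granted, 401 when the group was not granted."""
    return 200 if is_allowed(grants, group, needed) else 401
```

A key scoped as `{"Files": "read", "Batch": "write"}` would pass a Files read and a Batch write in this model, and fail a Files write with a 401.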
Versioning
The API serves a single path version and is updated continuously, so there is no dated version string to pin per request. Notable changes, additions, and deprecations are published with dates in the changelog, and models carry their own names and deprecation dates separately from the API itself.
Data model
OpenAI is resource-oriented over HTTPS: JSON request and response bodies, with multipart form uploads for files and audio. The Responses API is the current interface for generating tool-using output, with Chat Completions still supported for compatibility. Long-running work runs as Batches and fine-tuning jobs, whose completion can be delivered by webhook. Lists are cursor-paginated with an after parameter.
Connect & authenticate

Connection & authentication methods.

How an app or AI agent connects to OpenAI determines what it can reach. There is a route for making calls, and a route for receiving events when a long-running job finishes, and each is governed by the key behind it and the permissions that key carries.

Ways to connect

REST API

The REST API takes JSON request bodies and returns JSON, with multipart form uploads for files and audio, at https://api.openai.com/v1. A call authenticates with a project API key sent as Authorization: Bearer, and optional OpenAI-Organization and OpenAI-Project headers select the organization or project. Lists page through results with an after cursor.

Best for: Connecting an app or AI agent to OpenAI.
Governed by: The project API key and the permission level it carries.

Webhooks

OpenAI POSTs an event to an HTTPS endpoint registered per project when a long-running job finishes, like a background response, a batch, or a fine-tuning job. Deliveries follow the Standard Webhooks specification, carrying webhook-id, webhook-timestamp, and webhook-signature headers that the receiver verifies against the endpoint's signing secret.

Best for: Receiving job-completion events at an app or AI agent.
Governed by: The signing secret on the endpoint.
Authentication

Project API key (All access)

A standard project API key with full access to every endpoint in its project. It is sent as a Bearer token and is the default permission level. A key belongs to one project and inherits that project's resources; OpenAI recommends restricting keys for production rather than using full access everywhere.

Token: Bearer API key (sk-proj-...)
Best for: Server-side calls that need broad project access.

Project API key (Restricted / Read Only)

A project key can be created Read Only, limited to listing and reading metadata so it cannot generate content, or Restricted, where each endpoint group is independently set to None, Read, or Write. It authenticates the same way as a full key, so a leaked restricted key reaches only the areas it was scoped to.

Token: Bearer API key (sk-proj-...) with scoped permissions
Best for: Least-privilege server access scoped to specific endpoint groups.

Service account key

A service account is a bot user inside a project, used to issue an API key that is not tied to an individual person, so access survives a person leaving the team. Its key defaults to read and write across the project's resources and can be narrowed in the project's API key settings.

Token: Bearer API key issued to a project service account
Best for: Automated workloads that should not be tied to a personal account.
Capability map

What an AI agent can do with OpenAI.

The OpenAI API is split into areas an agent can act on, like generating text and tool-using responses, creating embeddings, transcribing and synthesizing audio, generating images, running moderation, and managing files, batches, fine-tuning, and vector stores. Each area has its own methods, and a write here spends usage and creates billable jobs or stored objects.

Responses

3 endpoints

Methods for generating model output through the Responses API.

A write here spends model usage and is billed per token.

Chat Completions

1 endpoint

Methods for the Chat Completions interface, supported for compatibility.

A write here spends model usage and is billed per token.

Embeddings

1 endpoint

Methods for turning text into embedding vectors.

A write here spends model usage and is billed per token.

Models

2 endpoints

Methods for listing and inspecting available models.

Read-only metadata about available models.

Files

5 endpoints

Methods for uploading, listing, and retrieving stored files.

A write here uploads or deletes stored file data.

Images

2 endpoints

Methods for generating and editing images.

A write here spends model usage and is billed per image.

Audio

2 endpoints

Methods for transcribing audio and synthesizing speech.

A write here spends model usage and is billed per request.

Moderations

1 endpoint

Methods for classifying whether content is potentially harmful.

A write here submits content for classification.

Batches

3 endpoints

Methods for running large request sets asynchronously at lower cost.

A write here creates a billable batch job.

Fine-tuning

3 endpoints

Methods for training a custom model on uploaded data.

A write here creates a billable training job.

Vector Stores

3 endpoints

Methods for managing vector stores used by file search.

A write here creates or changes a stored, billable vector store.
Endpoint reference

Every OpenAI API method.


Responses

Methods for generating model output through the Responses API. (3 endpoints)

A core write; with background mode, completion is delivered by webhook. Spends model usage.

Acts on: response
Permission (capability): Responses: write
Version: Available since the API’s base version
Webhook event: response.completed
Rate limit: Standard limits apply

Read-only.

Acts on: response
Permission (capability): Responses: read
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Only applies to responses created in background mode.

Acts on: response
Permission (capability): Responses: write
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Chat Completions

Methods for the Chat Completions interface, supported for compatibility. (1 endpoint)

Responses is the recommended interface for new integrations. Spends model usage.

Acts on: chat.completion
Permission (capability): Model capabilities: write
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Embeddings

Methods for turning text into embedding vectors. (1 endpoint)

Spends model usage; returns vectors for search and similarity.

Acts on: embedding
Permission (capability): Model capabilities: write
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Models

Methods for listing and inspecting available models. (2 endpoints)

Read-only.

Acts on: model
Permission (capability): Models: read
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Read-only.

Acts on: model
Permission (capability): Models: read
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Files

Methods for uploading, listing, and retrieving stored files. (5 endpoints)

A file is up to 512 MB; larger sources use the Uploads API. Tagged with a purpose, like fine-tune or batch.

Acts on: file
Permission (capability): Files: write
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Read-only; can filter by purpose and page with after.

Acts on: file
Permission (capability): Files: read
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Read-only.

Acts on: file
Permission (capability): Files: read
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Read-only; returns the stored bytes.

Acts on: file
Permission (capability): Files: read
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Irreversible; also detaches the file from any vector store.

Acts on: file
Permission (capability): Files: write
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Images

Methods for generating and editing images. (2 endpoints)

Spends model usage; billed per generated image.

Acts on: image
Permission (capability): Model capabilities: write
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Takes a multipart upload of the source image(s).

Acts on: image
Permission (capability): Model capabilities: write
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Audio

Methods for transcribing audio and synthesizing speech. (2 endpoints)

Multipart audio upload; spends model usage.

Acts on: transcription
Permission (capability): Model capabilities: write
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Returns an audio stream; spends model usage.

Acts on: speech
Permission (capability): Model capabilities: write
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Moderations

Methods for classifying whether content is potentially harmful. (1 endpoint)

Submits content for classification; the moderation endpoint is free to use.

Acts on: moderation
Permission (capability): Model capabilities: write
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Batches

Methods for running large request sets asynchronously at lower cost. (3 endpoints)

Runs within a window at lower cost; completion is delivered by webhook.

Acts on: batch
Permission (capability): Batch: write
Version: Available since the API’s base version
Webhook event: batch.completed
Rate limit: Standard limits apply

Read-only.

Acts on: batch
Permission (capability): Batch: read
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Stops further processing of the batch.

Acts on: batch
Permission (capability): Batch: write
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Fine-tuning

Methods for training a custom model on uploaded data. (3 endpoints)

A billable training job; completion is delivered by webhook.

Acts on: fine_tuning.job
Permission (capability): Fine-tuning: write
Version: Available since the API’s base version
Webhook event: fine_tuning.job.succeeded
Rate limit: Standard limits apply

Read-only.

Acts on: fine_tuning.job
Permission (capability): Fine-tuning: read
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Stops the training run.

Acts on: fine_tuning.job
Permission (capability): Fine-tuning: write
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Vector Stores

Methods for managing vector stores used by file search. (3 endpoints)

A stored, billable object that indexes attached files.

Acts on: vector_store
Permission (capability): Vector stores: write
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Read-only.

Acts on: vector_store
Permission (capability): Vector stores: read
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

A POST that reads, not writes; returns ranked results without changing the store.

Acts on: vector_store
Permission (capability): Vector stores: read
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply
Webhooks

Webhook events.

OpenAI can notify an app when a long-running job finishes, like a background response completing, a batch finishing, or a fine-tuning job succeeding. It POSTs an event describing what changed, so an integration learns about completion without polling.

Event | What it signals | Triggered by
response.completed | A background response finished generating and is ready to retrieve. An integration fetches the response on this event instead of polling. | /v1/responses
response.failed | A background response failed to complete. | /v1/responses
batch.completed | A batch job finished and its output file is ready to download. | /v1/batches
batch.failed | A batch job failed. | /v1/batches
fine_tuning.job.succeeded | A fine-tuning job completed and the fine-tuned model is available. | /v1/fine_tuning/jobs
fine_tuning.job.failed | A fine-tuning job failed before producing a model. | /v1/fine_tuning/jobs
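
The events listed above can be routed with a small lookup table. The sketch below assumes the event body is JSON with a top-level `type` and the affected object's id under `data.id`. That payload shape and the action names are assumptions for illustration. A receiver would check the signature before routing.

```python
import json

# Hypothetical action names; the event types come from the list above.
ACTIONS = {
    "response.completed": "fetch_response",
    "response.failed": "log_response_failure",
    "batch.completed": "download_batch_output",
    "batch.failed": "log_batch_failure",
    "fine_tuning.job.succeeded": "register_fine_tuned_model",
    "fine_tuning.job.failed": "log_fine_tuning_failure",
}


def route_event(raw_body: str) -> tuple:
    """Map a webhook body to (action, object_id); unknown types are ignored."""
    event = json.loads(raw_body)
    event_type = event.get("type")
    # Assumption: the id of the finished object sits under data.id.
    object_id = event.get("data", {}).get("id")
    return ACTIONS.get(event_type, "ignore"), object_id
```

Returning an action name rather than calling a handler directly keeps the routing testable without network access.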
Rate limits & pagination

Rate limits, pagination & request size.

OpenAI limits how fast an app can call through per-model rates, measured in requests and tokens per minute, that rise with an organization's usage tier.

Request rate

OpenAI meters usage per model, not by a single account-wide quota. Each model has limits measured in requests per minute and tokens per minute, with some endpoints also limited in requests or tokens per day, images per minute, or audio minutes per minute; models in a shared-limit group draw from one pool. The ceilings rise automatically as an organization moves up usage tiers (Free, then Tier 1 to Tier 5) with cumulative spend. Going over returns HTTP 429, and every response carries x-ratelimit-limit-requests, x-ratelimit-remaining-tokens, and related headers showing the current allowance and reset time. A 429 with type insufficient_quota is a billing problem, not a speed problem, and backing off will not clear it.
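
An agent can read its current allowance from the rate-limit headers on any response. The sketch below collects every `x-ratelimit-*` header into a dictionary. The function name is hypothetical, and the reset header in the usage note is assumed to be one of the "related headers" mentioned above.

```python
def rate_limit_snapshot(headers: dict) -> dict:
    """Collect x-ratelimit-* headers, keyed by the part after the prefix.

    Purely numeric values become ints; anything else (for example a reset
    duration) is kept as the original string.
    """
    prefix = "x-ratelimit-"
    snapshot = {}
    for name, value in headers.items():
        key = name.lower()  # header names are case-insensitive
        if not key.startswith(prefix):
            continue
        text = str(value)
        snapshot[key[len(prefix):]] = int(text) if text.isdigit() else text
    return snapshot
```

Given `x-ratelimit-limit-requests: 500` and `x-ratelimit-reset-requests: 1s`, the snapshot would hold `limit-requests` as the integer 500 and `reset-requests` as the string `1s`.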

Pagination

A list endpoint is cursor-based: after takes an object id to fetch the next page, limit sets the page size, and order sorts by creation time. A has_more field in the response signals whether more pages remain, and the SDKs offer auto-pagination over the cursor. Page-size ranges and defaults vary by endpoint, for example the Files list accepts a limit of 1 to 10,000.
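
The cursor loop described above can be sketched without any network code. Here `fetch_page` is a hypothetical callable standing in for one list call. It is assumed to return a dictionary with a `data` list of objects that each carry an `id`, plus the `has_more` flag.

```python
from typing import Callable, Iterator, Optional


def iter_objects(
    fetch_page: Callable[[Optional[str], int], dict],
    limit: int = 100,
) -> Iterator[dict]:
    """Yield every object from a cursor-paginated list endpoint.

    fetch_page(after, limit) performs one list call and returns its JSON body.
    """
    after = None
    while True:
        page = fetch_page(after, limit)
        items = page.get("data", [])
        for item in items:
            yield item
        # Stop when the server reports no more pages, or on an empty page
        # (which leaves no id to use as the next cursor).
        if not page.get("has_more") or not items:
            break
        # The id of the last object on this page is the cursor for the next.
        after = items[-1]["id"]
```

Because the generator only depends on the callable, the same loop works for files, batches, or any other list that follows the `after` and `has_more` convention.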

Request size

An individual file uploaded through the Files API can be up to 512 MB, and a project can store up to roughly 2.5 TB of files in total. Larger source files can be sent in parts through the Uploads API, which assembles them into a single File. Token limits per request are set by the chosen model's context window rather than by the API.
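
The size rule above reduces to a single comparison before an upload starts. This sketch reads "512 MB" as 512 × 1024 × 1024 bytes, which is an assumption. The function and route names are hypothetical.

```python
# Assumption: "512 MB" is interpreted as 512 MiB.
FILES_API_MAX_BYTES = 512 * 1024 * 1024


def choose_upload_route(size_bytes: int) -> str:
    """Return "files" for a single Files API upload, "uploads" for multipart."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return "files" if size_bytes <= FILES_API_MAX_BYTES else "uploads"
```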

Errors

Status codes & error handling.

The status codes an agent should handle, and what to do about each.

Status | Code | Meaning | What to do
400 | invalid_request_error | The request was malformed or missing a required parameter. The error object names the offending field in error.param. | Read error.message and error.param, fix the request body, and resend. It is not retryable as-is.
401 | invalid_api_key | The API key is missing, wrong, revoked, or lacks permission for this endpoint group under a restricted key. | Send a valid key for the right project, and grant the endpoint group on a restricted key if that is the cause.
403 | unsupported_country_region_territory | The request came from a country, region, or territory OpenAI does not support. | Call from a supported location, and check the supported-countries documentation.
404 | not_found | The requested object, like a file, model, or job, does not exist or is not visible to this key or project. | Verify the id and confirm it lives in the same project the key belongs to.
429 | rate_limit_exceeded | The request rate or token rate for the model was exceeded for the current usage tier. | Back off and retry with exponential backoff, and smooth the request rate. The x-ratelimit headers show the remaining allowance and reset time.
429 | insufficient_quota | The account has no remaining credits or has hit its billing limit. This is a billing problem, not a speed one. | Add a payment method or raise the spend limit. Backing off will not clear this error.
500 | server_error | An error on OpenAI's side while processing the request. It is uncommon. | Retry after a brief wait, and check the status page if it persists.
503 | engine_overloaded | The service is temporarily overloaded by traffic. | Reduce the request rate, hold steady, then ramp back up gradually.
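
The table above can be turned into a retry policy. The sketch below is one possible mapping, not an official one: the step names, the backoff base, and the cap are all assumptions. The key point from the table is that a 429 with `insufficient_quota` is never retried.

```python
import random

# Hypothetical step names for the non-retryable statuses in the table.
NON_RETRYABLE = {
    400: "fix_request",
    401: "fix_credentials",
    403: "unsupported_region",
    404: "check_id",
}


def next_step(status: int, code: str = "") -> str:
    """Decide what an agent should do with an error response."""
    if status == 429 and code == "insufficient_quota":
        return "fix_billing"  # backing off will not clear this
    if status in (429, 500, 503):
        return "retry"
    return NON_RETRYABLE.get(status, "unknown")


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  jitter: float = 0.1) -> float:
    """Exponential backoff in seconds: base * 2**attempt, capped, plus jitter.

    `jitter` is the maximum random fraction added on top of the delay.
    """
    delay = min(cap, base * (2 ** attempt))
    return delay * (1 + random.uniform(0, jitter))
```

A caller would loop while `next_step` returns `"retry"`, sleeping for `backoff_delay(attempt)` between attempts and giving up after a fixed number of tries.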
Versioning & freshness

Version history.

OpenAI serves a single, continuously updated API under one path version, and ships dated, mostly backward-compatible changes through its changelog rather than minting new version numbers per release.

Version history

What changed, and when

Latest version: v1
v1 (current version)
Single path version, continuously updated

OpenAI serves one path version and ships dated, mostly backward-compatible changes through its changelog rather than minting a new version number per release. Models carry their own names and deprecation dates separately from the API.

What changed
  • Responses API is the current interface for tool-using model output; Chat Completions remains supported.
  • Webhooks deliver completion of background responses, batches, fine-tuning jobs, and eval runs.
  • Permission levels on project API keys: All, Read Only, and Restricted (per-group None/Read/Write).
2026-06-04: Feature update
Moderation scores in Responses and Chat Completions

Moderation scores were added to the Responses and Chat Completions APIs for evaluating input and output.

2026-05-29: Feature update
Extended prompt caching defaults to 24h

For organizations without zero-data-retention, prompt_cache_retention now defaults to 24-hour retention instead of in-memory only.

2026-02-23: Feature update
WebSocket mode for the Responses API

A WebSocket mode launched for the Responses API, enabling streaming connections.

2025-08-20: Feature update
Conversations API

The Conversations API was released to manage long-running discussions alongside the Responses API.

There is one path version; track the changelog for dated additions and deprecations.

OpenAI API changelog ↗
Questions

OpenAI API, answered.

How does authentication work, and is there OAuth?
First-party API calls authenticate with a Bearer API key sent in the Authorization header. There is no OAuth flow for calling the API itself. A key belongs to a single project, and an optional OpenAI-Organization or OpenAI-Project header selects which organization or project a request runs against when an account has more than one.
Can an API key be limited to certain endpoints?
Yes. When a project key is created it can be set to All (full access), Read Only (list and read metadata only, so it cannot generate content), or Restricted, where each endpoint group is independently set to None, Read, or Write. A restricted key that is denied an area returns a 401 if it tries to call it, which limits the blast radius of a leaked key.
What is the difference between the Responses API and Chat Completions?
Responses is OpenAI's current API for generating model output, built for tool use and agents, with built-in tools like web search and file search and stateful conversation handling. Chat Completions is the older interface and remains supported for compatibility. New integrations are pointed at Responses; both ultimately call the underlying models.
How do I verify a webhook came from OpenAI?
Webhooks follow the Standard Webhooks specification. Each delivery carries webhook-id, webhook-timestamp, and webhook-signature headers, and the receiver verifies the signature against the endpoint's signing secret. The OpenAI SDKs provide an unwrap helper that validates the signature and parses the event, raising an error on a mismatch so a spoofed request is rejected.
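
For receivers that cannot use an SDK helper, the check can be sketched with the standard library. This follows the Standard Webhooks scheme as generally described: an HMAC-SHA256 over `id.timestamp.body`, keyed with the base64-decoded secret, and compared against the `v1,` entries in the signature header. The `whsec_` prefix handling, the five-minute tolerance, and the function names are assumptions. The SDK helper remains the safer choice.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional


def expected_signature(secret: str, msg_id: str, timestamp: str, body: bytes) -> str:
    """Compute the "v1,<base64>" signature for one delivery."""
    # Assumption: the secret is base64 text, optionally prefixed with "whsec_".
    raw = secret[len("whsec_"):] if secret.startswith("whsec_") else secret
    key = base64.b64decode(raw)
    signed = msg_id.encode() + b"." + timestamp.encode() + b"." + body
    digest = hmac.new(key, signed, hashlib.sha256).digest()
    return "v1," + base64.b64encode(digest).decode()


def verify(secret: str, headers: dict, body: bytes,
           tolerance_seconds: int = 300, now: Optional[float] = None) -> bool:
    """Return True only for a fresh delivery with a matching signature."""
    try:
        msg_id = headers["webhook-id"]
        timestamp = headers["webhook-timestamp"]
        signatures = headers["webhook-signature"].split(" ")
        sent_at = int(timestamp)
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if abs(current - sent_at) > tolerance_seconds:
        return False  # too old or too far in the future: possible replay
    expected = expected_signature(secret, msg_id, timestamp, body).encode()
    # The header may carry several space-separated signatures.
    return any(hmac.compare_digest(expected, s.encode()) for s in signatures)
```

The body must be the raw bytes as received. Re-serializing parsed JSON before verifying would change the bytes and break the signature.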
Why am I getting a 429 when I have not sent many requests?
A 429 has two causes. With type rate_limit_exceeded the request rate or token rate for the model was exceeded, and exponential backoff clears it. With type insufficient_quota the account has no remaining credits or has hit its billing limit, which backing off will not fix; it needs a payment method or a higher spend limit.
Are API rate limits the same for every model?
No. Limits are set per model in requests and tokens per minute, and some models share a single pooled limit. The ceilings rise as an organization moves up usage tiers, from Free through Tier 1 to Tier 5, based on cumulative spend and time. The exact numbers for an account are shown in its limits settings.
Does OpenAI host a Model Context Protocol server for its own API?
No first-party hosted MCP server exposes the OpenAI API itself. OpenAI's MCP support runs the other way: the Responses API includes a hosted MCP tool that lets a model connect out to third-party MCP servers and connectors. To govern access to the OpenAI API, the credential is still the project API key.

What is Bollard AI?

Control what every AI agent can do with OpenAI.

Bollard AI sits between a team's AI agents and OpenAI. Grant each agent exactly the access it needs, read or write, area by area, and every call is checked and logged.

  • Set read, write, or full access per agent, never a shared OpenAI key.
  • Denied by default, so an agent reaches only what has been explicitly allowed.
  • Every call recorded in plain English: who, what, where, and the decision.