Everything an AI agent can do with the OpenAI API.

A reference guide for building AI agents: every method, how to authenticate, and the permissions each one needs.

Endpoints: 26

API version: v1

Last updated: 23 June 2026

Orientation

How the OpenAI API works.

The OpenAI API is how an app or AI agent generates text and tool-using responses, turns text into embeddings, transcribes or synthesizes audio, generates images, and runs content moderation. Access is granted through an API key scoped to a project, and a key can be set read-only or restricted so it reaches only the areas it was given. OpenAI can also push an event to a registered endpoint when a long-running job, like a batch or a background response, finishes.

26 Endpoints

11 Capability groups

10 Read

16 Write

12 Permissions

Authentication

OpenAI authenticates with a Bearer API key, sent in the Authorization header as Bearer followed by the key. A key belongs to one project and inherits that project's resource access. An optional OpenAI-Organization or OpenAI-Project header targets a specific organization or project when an account has more than one. There is no OAuth for first-party calls; the key is the credential.

Permissions

A project API key is created with one of three permission levels: All (full project access), Read Only (list and read metadata, no content generation), or Restricted, where each endpoint group is set to None, Read, or Write independently. A call to an endpoint group the key was not granted returns a 401, so a leaked restricted key reaches only the areas it was scoped to.

Versioning

The API serves a single path version and is updated continuously, so there is no dated version string to pin per request. Notable changes, additions, and deprecations are published with dates in the changelog, and models carry their own names and deprecation dates separately from the API itself.

Data model

OpenAI is resource-oriented over HTTPS: JSON request and response bodies, with multipart form uploads for files and audio. The Responses API is the current interface for generating tool-using output, with Chat Completions still supported for compatibility. Long-running work runs as Batches and fine-tuning jobs, whose completion can be delivered by webhook. Lists are cursor-paginated with an after parameter.

Connect & authenticate

Connection & authentication methods.

How an app or AI agent connects to OpenAI determines what it can reach. There is a route for making calls, and a route for receiving events when a long-running job finishes, and each is governed by the key behind it and the permissions that key carries.

Ways to connect

REST API

The REST API takes JSON request bodies and returns JSON, with multipart form uploads for files and audio, at https://api.openai.com/v1. A call authenticates with a project API key sent as Authorization: Bearer, and optional OpenAI-Organization and OpenAI-Project headers select the organization or project. Lists page through results with an after cursor.

Best for: Connecting an app or AI agent to OpenAI.

Governed by: The project API key and the permission level it carries.

Docs ↗
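As a rough illustration of the REST call shape described above, the sketch below builds (but does not send) an authenticated request using only the Python standard library. The helper name `build_request`, the placeholder key, and the `proj_example` project id are assumptions made for this example; the rule that a JSON body implies POST is a simplification for the sketch, not something the API prescribes.

```python
import json
import urllib.request

API_BASE = "https://api.openai.com/v1"


def build_request(path, api_key, body=None, organization=None, project=None):
    """Build, but do not send, an authenticated request to the REST API."""
    headers = {"Authorization": f"Bearer {api_key}"}
    # Optional headers that select an organization or project.
    if organization:
        headers["OpenAI-Organization"] = organization
    if project:
        headers["OpenAI-Project"] = project
    data = None
    if body is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    # Assumption for this sketch: a JSON body means POST, no body means GET.
    method = "POST" if data is not None else "GET"
    return urllib.request.Request(
        API_BASE + path, data=data, headers=headers, method=method
    )


# Placeholder credentials; nothing is sent over the network here.
request = build_request("/models", "sk-proj-EXAMPLE", project="proj_example")
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would perform the call; in practice an official SDK is likely the more convenient route.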

Webhooks

OpenAI POSTs an event to an HTTPS endpoint registered per project when a long-running job finishes, like a background response, a batch, or a fine-tuning job. Deliveries follow the Standard Webhooks specification, carrying webhook-id, webhook-timestamp, and webhook-signature headers that the receiver verifies against the endpoint's signing secret.

Best for: Receiving job-completion events at an app or AI agent.

Governed by: The signing secret on the endpoint.

Docs ↗
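The signature check described above can be sketched by hand as follows. This follows the Standard Webhooks convention as generally documented: the signed content is the id, timestamp, and raw body joined by dots, signed with HMAC-SHA256, and sent as a space-separated list of `v1,<base64>` values. The `whsec_` prefix handling, the function names, and the five-minute tolerance are assumptions for this sketch rather than details confirmed by this guide.

```python
import base64
import hashlib
import hmac
import time


def compute_signature(secret, msg_id, timestamp, body):
    """Return the 'v1,<base64>' signature for one delivery."""
    # Assumption: the secret is base64 text, optionally prefixed with 'whsec_'.
    key = base64.b64decode(secret.removeprefix("whsec_"))
    signed_content = f"{msg_id}.{timestamp}.".encode("utf-8") + body
    digest = hmac.new(key, signed_content, hashlib.sha256).digest()
    return "v1," + base64.b64encode(digest).decode("ascii")


def verify_delivery(secret, headers, body, tolerance_seconds=300, now=None):
    """Return True only if the timestamp is fresh and a signature matches."""
    try:
        msg_id = headers["webhook-id"]
        timestamp = headers["webhook-timestamp"]
        candidates = headers["webhook-signature"].split()
        sent_at = int(timestamp)
    except (KeyError, ValueError):
        return False
    now = time.time() if now is None else now
    if abs(now - sent_at) > tolerance_seconds:
        return False  # too old or too far in the future: possible replay
    expected = compute_signature(secret, msg_id, timestamp, body).encode("ascii")
    # Constant-time comparison against every signature in the header.
    return any(
        hmac.compare_digest(expected, c.encode("utf-8")) for c in candidates
    )
```

The check must run against the raw request bytes, before any JSON parsing. Where an SDK helper is available, it is usually the safer choice than hand-rolled verification.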

Authentication

Project API key (All access)

A standard project API key with full access to every endpoint in its project. It is sent as a Bearer token and is the default permission level. A key belongs to one project and inherits that project's resources; OpenAI recommends restricting keys for production rather than using full access everywhere.

Token: Bearer API key (sk-proj-...)

Best for: Server-side calls that need broad project access.

Docs ↗

Project API key (Restricted / Read Only)

A project key can be created Read Only, limited to listing and reading metadata so it cannot generate content, or Restricted, where each endpoint group is independently set to None, Read, or Write. It authenticates the same way as a full key, so a leaked restricted key reaches only the areas it was scoped to.

Token: Bearer API key (sk-proj-...) with scoped permissions

Best for: Least-privilege server access scoped to specific endpoint groups.

Docs ↗
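To make the None / Read / Write model concrete, here is a small local pre-flight check an agent could run before calling an endpoint. It is a hypothetical model, not an OpenAI API: the `REQUIRED` map copies a few rows from the endpoint reference in this guide, and the sketch assumes that Write also covers Read, which the guide does not state.

```python
# Assumption: permission levels are ordered, so Write also covers Read.
LEVELS = {"None": 0, "Read": 1, "Write": 2}

# A few rows from the endpoint reference: (method, path) -> (group, access).
REQUIRED = {
    ("POST", "/v1/responses"): ("Responses", "Write"),
    ("GET", "/v1/models"): ("Models", "Read"),
    ("POST", "/v1/embeddings"): ("Model capabilities", "Write"),
    ("GET", "/v1/files"): ("Files", "Read"),
}


def is_allowed(key_permissions, method, path):
    """Return True if a restricted key's grants cover the given endpoint."""
    group, needed = REQUIRED[(method, path)]
    # Groups missing from the key's settings are treated as None.
    granted = key_permissions.get(group, "None")
    return LEVELS[granted] >= LEVELS[needed]


# Hypothetical restricted key: can read files and list models, nothing else.
reader_key = {"Files": "Read", "Models": "Read"}
print(is_allowed(reader_key, "GET", "/v1/files"))
print(is_allowed(reader_key, "POST", "/v1/responses"))
```

A check like this only avoids predictable 401s on the client side; the server remains the authority on what a key may do.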

Service account key

A service account is a bot user inside a project, used to issue an API key that is not tied to an individual person, so access survives a person leaving the team. Its key defaults to read and write across the project's resources and can be narrowed in the project's API key settings.

Token: Bearer API key issued to a project service account

Best for: Automated workloads that should not be tied to a personal account.

Docs ↗

Capability map

What an AI agent can do with OpenAI.

The OpenAI API is split into areas an agent can act on, like generating text and tool-using responses, creating embeddings, transcribing and synthesizing audio, generating images, running moderation, and managing files, batches, fine-tuning, and vector stores. Each area has its own methods, and a write can spend usage, create a billable job, or create a stored object.

Responses

3 endpoints

Methods for generating model output through the Responses API.

A write here spends model usage and is billed per token.

Chat Completions

1 endpoint

Methods for the Chat Completions interface, supported for compatibility.

A write here spends model usage and is billed per token.

Embeddings

1 endpoint

Methods for turning text into embedding vectors.

A write here spends model usage and is billed per token.

Models

2 endpoints

Methods for listing and inspecting available models.

Read-only metadata about available models.

Files

5 endpoints

Methods for uploading, listing, and retrieving stored files.

A write here uploads or deletes stored file data.

Images

2 endpoints

Methods for generating and editing images.

A write here spends model usage and is billed per image.

Audio

2 endpoints

Methods for transcribing audio and synthesizing speech.

A write here spends model usage and is billed per request.

Moderations

1 endpoint

Methods for classifying whether content is potentially harmful.

A write here submits content for classification.

Batches

3 endpoints

Methods for running large request sets asynchronously at lower cost.

A write here creates a billable batch job.

Fine-tuning

3 endpoints

Methods for training a custom model on uploaded data.

A write here creates a billable training job.

Vector Stores

3 endpoints

Methods for managing vector stores used by file search.

A write here creates or changes a stored, billable vector store.

Endpoint reference

Every OpenAI API method.

Method	Endpoint	What it does	Access	Permission	Version
Responses: Methods for generating model output through the Responses API. (3 endpoints)
POST	`/v1/responses`	Create a model response, the current interface for generating text and tool-using output.	write	`Responses: write`	Current
A core write; with background mode, completion is delivered by webhook. Spends model usage. | Acts on: response | Permission (capability): `Responses: write` | Version: Available since the API’s base version | Webhook event: `response.completed` | Rate limit: Standard limits apply | Source: Official documentation ↗
GET	`/v1/responses/{response_id}`	Retrieve a previously created model response.	read	`Responses: read`	Current
Read-only. | Acts on: response | Permission (capability): `Responses: read` | Version: Available since the API’s base version | Webhook event: None | Rate limit: Standard limits apply | Source: Official documentation ↗
POST	`/v1/responses/{response_id}/cancel`	Cancel a background response that is still running.	write	`Responses: write`	Current
Only applies to responses created in background mode. | Acts on: response | Permission (capability): `Responses: write` | Version: Available since the API’s base version | Webhook event: None | Rate limit: Standard limits apply | Source: Official documentation ↗
Chat Completions: Methods for the Chat Completions interface, supported for compatibility. (1 endpoint)
POST	`/v1/chat/completions`	Create a chat completion, the older interface for generating model output, supported for compatibility.	write	`Model capabilities: write`	Current
Responses is the recommended interface for new integrations. Spends model usage. | Acts on: chat.completion | Permission (capability): `Model capabilities: write` | Version: Available since the API’s base version | Webhook event: None | Rate limit: Standard limits apply | Source: Official documentation ↗
Embeddings: Methods for turning text into embedding vectors. (1 endpoint)
POST	`/v1/embeddings`	Create an embedding vector representing the input text.	write	`Model capabilities: write`	Current
Spends model usage; returns vectors for search and similarity. | Acts on: embedding | Permission (capability): `Model capabilities: write` | Version: Available since the API’s base version | Webhook event: None | Rate limit: Standard limits apply | Source: Official documentation ↗
Models: Methods for listing and inspecting available models. (2 endpoints)
GET	`/v1/models`	List the models available to the account, with basic metadata.	read	`Models: read`	Current
Read-only. | Acts on: model | Permission (capability): `Models: read` | Version: Available since the API’s base version | Webhook event: None | Rate limit: Standard limits apply | Source: Official documentation ↗
GET	`/v1/models/{model}`	Retrieve a single model's metadata, such as ownership and permissions.	read	`Models: read`	Current
Read-only. | Acts on: model | Permission (capability): `Models: read` | Version: Available since the API’s base version | Webhook event: None | Rate limit: Standard limits apply | Source: Official documentation ↗
Files: Methods for uploading, listing, and retrieving stored files. (5 endpoints)
POST	`/v1/files`	Upload a file for use by other endpoints, such as fine-tuning, batches, or assistants.	write	`Files: write`	Current
A file is up to 512 MB; larger sources use the Uploads API. Tagged with a purpose, like fine-tune or batch. | Acts on: file | Permission (capability): `Files: write` | Version: Available since the API’s base version | Webhook event: None | Rate limit: Standard limits apply | Source: Official documentation ↗
GET	`/v1/files`	List the files that belong to the project (cursor-paginated).	read	`Files: read`	Current
Read-only; can filter by purpose and page with after. | Acts on: file | Permission (capability): `Files: read` | Version: Available since the API’s base version | Webhook event: None | Rate limit: Standard limits apply | Source: Official documentation ↗
GET	`/v1/files/{file_id}`	Retrieve metadata about a specific file.	read	`Files: read`	Current
Read-only. | Acts on: file | Permission (capability): `Files: read` | Version: Available since the API’s base version | Webhook event: None | Rate limit: Standard limits apply | Source: Official documentation ↗
GET	`/v1/files/{file_id}/content`	Download the contents of a file.	read	`Files: read`	Current
Read-only; returns the stored bytes. | Acts on: file | Permission (capability): `Files: read` | Version: Available since the API’s base version | Webhook event: None | Rate limit: Standard limits apply | Source: Official documentation ↗
DELETE	`/v1/files/{file_id}`	Delete a file and remove it from all vector stores.	write	`Files: write`	Current
Irreversible; also detaches the file from any vector store. | Acts on: file | Permission (capability): `Files: write` | Version: Available since the API’s base version | Webhook event: None | Rate limit: Standard limits apply | Source: Official documentation ↗
Images: Methods for generating and editing images. (2 endpoints)
POST	`/v1/images/generations`	Generate one or more images from a text prompt.	write	`Model capabilities: write`	Current
Spends model usage; billed per generated image. | Acts on: image | Permission (capability): `Model capabilities: write` | Version: Available since the API’s base version | Webhook event: None | Rate limit: Standard limits apply | Source: Official documentation ↗
POST	`/v1/images/edits`	Edit or extend an image from one or more source images and a prompt.	write	`Model capabilities: write`	Current
Takes a multipart upload of the source image(s). | Acts on: image | Permission (capability): `Model capabilities: write` | Version: Available since the API’s base version | Webhook event: None | Rate limit: Standard limits apply | Source: Official documentation ↗
Audio: Methods for transcribing audio and synthesizing speech. (2 endpoints)
POST	`/v1/audio/transcriptions`	Transcribe an audio file into text.	write	`Model capabilities: write`	Current
Multipart audio upload; spends model usage. | Acts on: transcription | Permission (capability): `Model capabilities: write` | Version: Available since the API’s base version | Webhook event: None | Rate limit: Standard limits apply | Source: Official documentation ↗
POST	`/v1/audio/speech`	Generate spoken audio from input text.	write	`Model capabilities: write`	Current
Returns an audio stream; spends model usage. | Acts on: speech | Permission (capability): `Model capabilities: write` | Version: Available since the API’s base version | Webhook event: None | Rate limit: Standard limits apply | Source: Official documentation ↗
Moderations: Methods for classifying whether content is potentially harmful. (1 endpoint)
POST	`/v1/moderations`	Classify whether text or image content is potentially harmful.	write	`Model capabilities: write`	Current
Submits content for classification; the moderation endpoint is free to use. | Acts on: moderation | Permission (capability): `Model capabilities: write` | Version: Available since the API’s base version | Webhook event: None | Rate limit: Standard limits apply | Source: Official documentation ↗
Batches: Methods for running large request sets asynchronously at lower cost. (3 endpoints)
POST	`/v1/batches`	Create and run a batch from an uploaded file of requests, processed asynchronously.	write	`Batch: write`	Current
Runs within a window at lower cost; completion is delivered by webhook. | Acts on: batch | Permission (capability): `Batch: write` | Version: Available since the API’s base version | Webhook event: `batch.completed` | Rate limit: Standard limits apply | Source: Official documentation ↗
GET	`/v1/batches/{batch_id}`	Retrieve a batch and its current status.	read	`Batch: read`	Current
Read-only. | Acts on: batch | Permission (capability): `Batch: read` | Version: Available since the API’s base version | Webhook event: None | Rate limit: Standard limits apply | Source: Official documentation ↗
POST	`/v1/batches/{batch_id}/cancel`	Cancel a batch that is still in progress.	write	`Batch: write`	Current
Stops further processing of the batch. | Acts on: batch | Permission (capability): `Batch: write` | Version: Available since the API’s base version | Webhook event: None | Rate limit: Standard limits apply | Source: Official documentation ↗
Fine-tuning: Methods for training a custom model on uploaded data. (3 endpoints)
POST	`/v1/fine_tuning/jobs`	Create a fine-tuning job to train a custom model from an uploaded training file.	write	`Fine-tuning: write`	Current
A billable training job; completion is delivered by webhook. | Acts on: fine_tuning.job | Permission (capability): `Fine-tuning: write` | Version: Available since the API’s base version | Webhook event: `fine_tuning.job.succeeded` | Rate limit: Standard limits apply | Source: Official documentation ↗
GET	`/v1/fine_tuning/jobs`	List fine-tuning jobs in the project (cursor-paginated).	read	`Fine-tuning: read`	Current
Read-only. | Acts on: fine_tuning.job | Permission (capability): `Fine-tuning: read` | Version: Available since the API’s base version | Webhook event: None | Rate limit: Standard limits apply | Source: Official documentation ↗
POST	`/v1/fine_tuning/jobs/{fine_tuning_job_id}/cancel`	Cancel a fine-tuning job that is still running.	write	`Fine-tuning: write`	Current
Stops the training run. | Acts on: fine_tuning.job | Permission (capability): `Fine-tuning: write` | Version: Available since the API’s base version | Webhook event: None | Rate limit: Standard limits apply | Source: Official documentation ↗
Vector Stores: Methods for managing vector stores used by file search. (3 endpoints)
POST	`/v1/vector_stores`	Create a vector store that file search can run against.	write	`Vector stores: write`	Current
A stored, billable object that indexes attached files. | Acts on: vector_store | Permission (capability): `Vector stores: write` | Version: Available since the API’s base version | Webhook event: None | Rate limit: Standard limits apply | Source: Official documentation ↗
GET	`/v1/vector_stores`	List the vector stores in the project (cursor-paginated).	read	`Vector stores: read`	Current
Read-only. | Acts on: vector_store | Permission (capability): `Vector stores: read` | Version: Available since the API’s base version | Webhook event: None | Rate limit: Standard limits apply | Source: Official documentation ↗
POST	`/v1/vector_stores/{vector_store_id}/search`	Search a vector store for the most relevant chunks for a query.	read	`Vector stores: read`	Current
A POST that reads, not writes; returns ranked results without changing the store. | Acts on: vector_store | Permission (capability): `Vector stores: read` | Version: Available since the API’s base version | Webhook event: None | Rate limit: Standard limits apply | Source: Official documentation ↗

Webhooks

Webhook events.

OpenAI can notify an app when a long-running job finishes, like a background response completing, a batch finishing, or a fine-tuning job succeeding. It POSTs an event describing what changed, so an integration learns about completion without polling.

Event	What it signals	Triggered by
`response.completed`	A background response finished generating and is ready to retrieve. An integration fetches the response on this event instead of polling.	`/v1/responses`
`response.failed`	A background response failed to complete.	`/v1/responses`
`batch.completed`	A batch job finished and its output file is ready to download.	`/v1/batches`
`batch.failed`	A batch job failed.	`/v1/batches`
`fine_tuning.job.succeeded`	A fine-tuning job completed and the fine-tuned model is available.	`/v1/fine_tuning/jobs`
`fine_tuning.job.failed`	A fine-tuning job failed before producing a model.	`/v1/fine_tuning/jobs`
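One way to act on the events in the table above is a small dispatch map keyed by event type. The sketch below assumes the payload is JSON with a top-level `type` and the affected object's id under `data.id`; that shape, the handler names, and the returned action tuples are assumptions for illustration, and signature verification is expected to have happened before this step.

```python
import json


def on_response_completed(event):
    # Fetch the finished response instead of polling for it.
    return ("retrieve_response", event["data"]["id"])


def on_batch_completed(event):
    return ("download_batch_output", event["data"]["id"])


def on_job_succeeded(event):
    return ("record_fine_tuned_model", event["data"]["id"])


def on_failure(event):
    return ("alert", event["data"]["id"])


HANDLERS = {
    "response.completed": on_response_completed,
    "response.failed": on_failure,
    "batch.completed": on_batch_completed,
    "batch.failed": on_failure,
    "fine_tuning.job.succeeded": on_job_succeeded,
    "fine_tuning.job.failed": on_failure,
}


def dispatch(raw_body):
    """Route one already-verified webhook body to its handler."""
    event = json.loads(raw_body)
    handler = HANDLERS.get(event.get("type"))
    if handler is None:
        # Unknown event types are ignored rather than treated as errors.
        return ("ignored", event.get("type"))
    return handler(event)


print(dispatch(b'{"type": "batch.completed", "data": {"id": "batch_123"}}'))
```

Ignoring unknown types keeps the receiver tolerant of events added later in the changelog.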

Rate limits & pagination

Rate limits, pagination & request size.

OpenAI limits how fast an app can call, by a per-model rate measured in requests and tokens per minute that rises with an organization's usage tier.

Request rate

OpenAI meters usage per model, not by a single account-wide quota. Each model has limits measured in requests per minute and tokens per minute, with some endpoints also limited in requests or tokens per day, images per minute, or audio minutes per minute; models in a shared-limit group draw from one pool. The ceilings rise automatically as an organization moves up usage tiers (Free, then Tier 1 to Tier 5) with cumulative spend. Going over returns HTTP 429, and every response carries x-ratelimit-limit-requests, x-ratelimit-remaining-tokens, and related headers showing the current allowance and reset time. A 429 with type insufficient_quota is a billing problem, not a speed problem, and backing off will not clear it.
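The rate-limit headers can be read into a small structure so an agent can slow down before it hits a 429. The sketch below assumes the header family follows the `x-ratelimit-{limit|remaining}-{requests|tokens}` pattern suggested by the two names given above; the function names and the 10% threshold are hypothetical choices for this example.

```python
def read_rate_limits(headers):
    """Extract limit and remaining counts from response headers."""
    lowered = {name.lower(): value for name, value in headers.items()}
    limits = {}
    for kind in ("requests", "tokens"):
        for field in ("limit", "remaining"):
            raw = lowered.get(f"x-ratelimit-{field}-{kind}")
            if raw is not None:
                limits[f"{field}_{kind}"] = int(raw)
    return limits


def is_running_low(limits, threshold=0.1):
    """Return True if requests or tokens are below the given fraction."""
    for kind in ("requests", "tokens"):
        limit = limits.get(f"limit_{kind}")
        remaining = limits.get(f"remaining_{kind}")
        if limit and remaining is not None and remaining / limit < threshold:
            return True
    return False


# Made-up header values, for illustration only.
example_headers = {
    "x-ratelimit-limit-requests": "500",
    "x-ratelimit-remaining-requests": "499",
    "x-ratelimit-limit-tokens": "30000",
    "x-ratelimit-remaining-tokens": "1200",
}
limits = read_rate_limits(example_headers)
print(limits, is_running_low(limits))
```

Here the token allowance is the binding constraint even though almost no requests have been used, which is why both dimensions are checked.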

Pagination

A list endpoint is cursor-based: after takes an object id to fetch the next page, limit sets the page size, and order sorts by creation time. A has_more field in the response signals whether more pages remain, and the SDKs offer auto-pagination over the cursor. Page-size ranges and defaults vary by endpoint, for example the Files list accepts a limit of 1 to 10,000.
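The cursor loop described above can be sketched as a generator. `fetch_page` is a hypothetical callable standing in for one HTTP list call, and the sketch assumes each page carries its items in a `data` array alongside `has_more`; the in-memory `fake_fetch` exists only so the example runs without a network.

```python
def iter_all(fetch_page, limit=100):
    """Yield every item from a cursor-paginated list endpoint."""
    after = None
    while True:
        page = fetch_page(after=after, limit=limit)
        items = page["data"]
        yield from items
        # Stop when the server reports no further pages (or sends none).
        if not page.get("has_more") or not items:
            return
        # The id of the last item becomes the cursor for the next page.
        after = items[-1]["id"]


# In-memory stand-in for a list endpoint, for illustration only.
FAKE_FILES = [{"id": f"file-{n}"} for n in range(5)]


def fake_fetch(after=None, limit=2):
    start = 0
    if after is not None:
        start = next(i for i, f in enumerate(FAKE_FILES) if f["id"] == after) + 1
    chunk = FAKE_FILES[start:start + limit]
    return {"data": chunk, "has_more": start + limit < len(FAKE_FILES)}


ids = [item["id"] for item in iter_all(fake_fetch, limit=2)]
print(ids)
```

The guard on an empty page avoids an infinite loop if a server ever returned `has_more` without items; the SDKs' auto-pagination covers the same ground.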

Request size

An individual file uploaded through the Files API can be up to 512 MB, and a project can store up to roughly 2.5 TB of files in total. Larger source files can be sent in parts through the Uploads API, which assembles them into a single File. Token limits per request are set by the chosen model's context window rather than by the API.
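The choice between a single Files API upload and a multi-part upload can be planned from the file size alone. In the sketch below, the 512 MB ceiling is read as 512 MiB and the 64 MiB default part size is a hypothetical value picked for illustration; neither interpretation is confirmed by this guide, so both are parameters worth checking against the official limits.

```python
# Assumption: "512 MB" is interpreted as 512 MiB for this sketch.
FILES_API_MAX_BYTES = 512 * 1024 * 1024


def plan_upload(size_bytes, part_bytes=64 * 1024 * 1024):
    """Return a list of (route, offset, length) steps for one file."""
    if size_bytes <= FILES_API_MAX_BYTES:
        # Small enough for a single call to the Files API.
        return [("files", 0, size_bytes)]
    steps = []
    offset = 0
    while offset < size_bytes:
        # The final part may be shorter than part_bytes.
        length = min(part_bytes, size_bytes - offset)
        steps.append(("uploads", offset, length))
        offset += length
    return steps


small = plan_upload(10 * 1024 * 1024)
large = plan_upload(600 * 1024 * 1024)
print(len(small), len(large))
```

Each `uploads` step corresponds to one byte range that would be read from disk and sent as a part before the Uploads API assembles the final File.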

Errors

Status codes & error handling.

The status codes an agent should handle, and what to do about each.

Status	Code	Meaning	What to do
400	`invalid_request_error`	The request was malformed or missing a required parameter. The error object names the offending field in error.param.	Read error.message and error.param, fix the request body, and resend. It is not retryable as-is.
401	`invalid_api_key`	The API key is missing, wrong, revoked, or lacks permission for this endpoint group under a restricted key.	Send a valid key for the right project, and grant the endpoint group on a restricted key if that is the cause.
403	`unsupported_country_region_territory`	The request came from a country, region, or territory OpenAI does not support.	Call from a supported location, and check the supported-countries documentation.
404	`not_found`	The requested object, like a file, model, or job, does not exist or is not visible to this key or project.	Verify the id and confirm it lives in the same project the key belongs to.
429	`rate_limit_exceeded`	The request rate or token rate for the model was exceeded for the current usage tier.	Back off and retry with exponential backoff, and smooth the request rate. The x-ratelimit headers show the remaining allowance and reset time.
429	`insufficient_quota`	The account has no remaining credits or has hit its billing limit. This is a billing problem, not a speed one.	Add a payment method or raise the spend limit. Backing off will not clear this error.
500	`server_error`	An error on OpenAI's side while processing the request. It is uncommon.	Retry after a brief wait, and check the status page if it persists.
503	`engine_overloaded`	The service is temporarily overloaded by traffic.	Reduce the request rate, hold steady, then ramp back up gradually.
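The table above implies a retry policy: back off on rate limits and server-side errors, but never on `insufficient_quota` or client errors. A minimal sketch of that policy follows. The `send` callable, which returns a `(status, code, payload)` tuple, is a hypothetical stand-in for a real HTTP call, and the full-jitter backoff with a 60-second cap is one common choice rather than anything OpenAI prescribes.

```python
import random
import time

RETRYABLE_STATUSES = {429, 500, 503}


def should_retry(status, code=None):
    """Decide whether waiting and resending can help."""
    if status == 429 and code == "insufficient_quota":
        return False  # billing problem: backing off will not clear it
    return status in RETRYABLE_STATUSES


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Exponential backoff with full jitter, capped at `cap` seconds."""
    ceiling = min(cap, base * 2 ** attempt)
    return ceiling * rng()


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call `send` until it succeeds or retrying stops making sense."""
    for attempt in range(max_attempts):
        status, code, payload = send()
        if status < 400:
            return payload
        last_attempt = attempt == max_attempts - 1
        if last_attempt or not should_retry(status, code):
            raise RuntimeError(f"request failed: {status} {code}")
        sleep(backoff_delay(attempt))
```

Injecting `sleep` and `rng` keeps the policy testable without real waiting; a production version might also honor the reset time from the x-ratelimit headers.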

Versioning & freshness

Version history.

OpenAI serves a single, continuously updated API under one path version, and ships dated, mostly backward-compatible changes through its changelog rather than minting new version numbers per release.

Version history

What changed, and when

Latest version: v1

v1: Current version

Single path version, continuously updated

OpenAI serves one path version and ships dated, mostly backward-compatible changes through its changelog rather than minting a new version number per release. Models carry their own names and deprecation dates separately from the API.

What changed

Responses API is the current interface for tool-using model output; Chat Completions remains supported.
Webhooks deliver completion of background responses, batches, fine-tuning jobs, and eval runs.
Permission levels on project API keys: All, Read Only, and Restricted (per-group None/Read/Write).

2026-06-04: Feature update

Moderation scores in Responses and Chat Completions

Moderation scores were added to the Responses and Chat Completions APIs for evaluating input and output.

2026-05-29: Feature update

Extended prompt caching defaults to 24h

For organizations without zero-data-retention, prompt_cache_retention now defaults to 24-hour retention instead of in-memory only.

2026-02-23: Feature update

WebSocket mode for the Responses API

A WebSocket mode launched for the Responses API, enabling streaming connections.

2025-08-20: Feature update

Conversations API

The Conversations API was released to manage long-running discussions alongside the Responses API.

There is one path version; track the changelog for dated additions and deprecations.

OpenAI API changelog ↗

Questions

OpenAI API, answered.

How does authentication work, and is there OAuth?

First-party API calls authenticate with a Bearer API key sent in the Authorization header. There is no OAuth flow for calling the API itself. A key belongs to a single project, and an optional OpenAI-Organization or OpenAI-Project header selects which organization or project a request runs against when an account has more than one.

Can an API key be limited to certain endpoints?

Yes. When a project key is created it can be set to All (full access), Read Only (list and read metadata only, so it cannot generate content), or Restricted, where each endpoint group is independently set to None, Read, or Write. A restricted key that is denied an area returns a 401 if it tries to call it, which limits the blast radius of a leaked key.

What is the difference between the Responses API and Chat Completions?

Responses is OpenAI's current API for generating model output, built for tool use and agents, with built-in tools like web search and file search and stateful conversation handling. Chat Completions is the older interface and remains supported for compatibility. Responses is the recommended interface for new integrations; both ultimately call the underlying models.

How do I verify a webhook came from OpenAI?

Webhooks follow the Standard Webhooks specification. Each delivery carries webhook-id, webhook-timestamp, and webhook-signature headers, and the receiver verifies the signature against the endpoint's signing secret. The OpenAI SDKs provide an unwrap helper that validates the signature and parses the event, raising an error on a mismatch so a spoofed request is rejected.

Why am I getting a 429 when I have not sent many requests?

A 429 has two causes. With type rate_limit_exceeded the request rate or token rate for the model was exceeded, and exponential backoff clears it. With type insufficient_quota the account has no remaining credits or has hit its billing limit, which backing off will not fix; it needs a payment method or a higher spend limit.

Are API rate limits the same for every model?

No. Limits are set per model in requests and tokens per minute, and some models share a single pooled limit. The ceilings rise as an organization moves up usage tiers, from Free through Tier 1 to Tier 5, based on cumulative spend and time. The exact numbers for an account are shown in its limits settings.

Does OpenAI host a Model Context Protocol server for its own API?

No first-party hosted MCP server exposes the OpenAI API itself. OpenAI's MCP support runs the other way: the Responses API includes a hosted MCP tool that lets a model connect out to third-party MCP servers and connectors. To govern access to the OpenAI API, the credential is still the project API key.

What is Bollard AI?

Control what every AI agent can do with OpenAI.

Bollard AI sits between a team's AI agents and OpenAI. Grant each agent exactly the access it needs, read or write, area by area, and every call is checked and logged.

Set read, write, or full access per agent, never a shared OpenAI key.
Denied by default, so an agent reaches only what has been explicitly allowed.
Every call recorded in plain English: who, what, where, and the decision.
