A reference guide for building AI agents: every method, how to authenticate, and the permissions each one needs.
The OpenAI API is how an app or AI agent generates text and tool-using responses, turns text into embeddings, transcribes or synthesizes audio, generates images, and runs content moderation. Access is granted through an API key scoped to a project, and a key can be set read-only or restricted so it reaches only the areas it was given. OpenAI can also push an event to a registered endpoint when a long-running job, like a batch or a background response, finishes.
How an app or AI agent connects to OpenAI determines what it can reach. There is a route for making calls, and a route for receiving events when a long-running job finishes, and each is governed by the key behind it and the permissions that key carries.
The REST API takes JSON request bodies and returns JSON, with multipart form uploads for files and audio, at https://api.openai.com/v1. A call authenticates with a project API key sent as Authorization: Bearer, and optional OpenAI-Organization and OpenAI-Project headers select the organization or project. Lists page through results with an after cursor.
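As a minimal sketch of the authentication described above, the snippet below assembles the headers and a JSON request without sending anything. The helper names `build_headers` and `build_request`, the key `sk-example`, the project id `proj_example`, and the model name are hypothetical placeholders, not values from this guide.

```python
import json
import urllib.request

BASE_URL = "https://api.openai.com/v1"


def build_headers(api_key, organization=None, project=None):
    """Return the headers for a JSON call authenticated with a project API key."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # The organization and project headers are optional selectors.
    if organization:
        headers["OpenAI-Organization"] = organization
    if project:
        headers["OpenAI-Project"] = project
    return headers


def build_request(api_key, path, payload, **scope):
    """Build (but do not send) a POST request with a JSON body."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=build_headers(api_key, **scope),
        method="POST",
    )


# Placeholder values for illustration only; nothing is sent over the network.
request = build_request(
    "sk-example",
    "/embeddings",
    {"model": "example-model", "input": "hello"},
    project="proj_example",
)
```

Keeping header construction in one function makes it easier to swap a full key for a restricted one without touching call sites.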
OpenAI POSTs an event to an HTTPS endpoint registered per project when a long-running job finishes, like a background response, a batch, or a fine-tuning job. Deliveries follow the Standard Webhooks specification, carrying webhook-id, webhook-timestamp, and webhook-signature headers that the receiver verifies against the endpoint's signing secret.
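A receiver might verify a delivery along the following lines. This is a sketch that assumes the usual Standard Webhooks conventions: the signed content is `id.timestamp.body`, the signature is a base64-encoded HMAC-SHA256, the secret is base64 with a `whsec_` prefix, and each signature entry is prefixed with `v1,`. The function names, the five-minute tolerance, and `EXAMPLE_SECRET` are illustrative, and header names are assumed to arrive lowercased.

```python
import base64
import hashlib
import hmac
import time

# Illustrative secret only; a real one comes from the endpoint's settings.
EXAMPLE_SECRET = "whsec_" + base64.b64encode(b"example-signing-key").decode()


def sign_payload(secret, msg_id, timestamp, body):
    """Compute the 'v1,<base64>' signature for a delivery (assumed scheme)."""
    key = base64.b64decode(secret.removeprefix("whsec_"))
    signed = f"{msg_id}.{timestamp}.".encode() + body
    digest = hmac.new(key, signed, hashlib.sha256).digest()
    return "v1," + base64.b64encode(digest).decode()


def verify_webhook(secret, headers, body, tolerance_seconds=300, now=None):
    """Return True only if the signature matches and the timestamp is recent."""
    msg_id = headers.get("webhook-id")
    timestamp = headers.get("webhook-timestamp")
    signatures = headers.get("webhook-signature")
    if not (msg_id and timestamp and signatures):
        return False
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    now = time.time() if now is None else now
    # Reject stale or future-dated deliveries to limit replay attacks.
    if abs(now - sent_at) > tolerance_seconds:
        return False
    expected = sign_payload(secret, msg_id, timestamp, body)
    # The header may carry several space-separated signatures.
    return any(
        hmac.compare_digest(candidate, expected)
        for candidate in signatures.split()
    )
```

The check runs against the raw request bytes, so the body should be verified before any JSON parsing or re-serialization.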
A standard project API key has full access to every endpoint in its project. It is sent as a Bearer token and is the default permission level. A key belongs to one project and inherits that project's resources; OpenAI recommends restricting keys for production rather than using full access everywhere.
A project key can be created Read Only, limited to listing and reading metadata so it cannot generate content, or Restricted, where each endpoint group is independently set to None, Read, or Write. It authenticates the same way as a full key, so a leaked restricted key reaches only the areas it was scoped to.
A service account is a bot user inside a project, used to issue an API key that is not tied to an individual person, so access survives a person leaving the team. Its key defaults to read and write across the project's resources and can be narrowed in the project's API key settings.
The OpenAI API is split into areas an agent can act on, like generating text and tool-using responses, creating embeddings, transcribing and synthesizing audio, generating images, running moderation, and managing files, batches, fine-tuning, and vector stores. Each area has its own methods, and a write here spends usage and creates billable jobs or stored objects.
Methods for generating model output through the Responses API.
Methods for the Chat Completions interface, supported for compatibility.
Methods for turning text into embedding vectors.
Methods for listing and inspecting available models.
Methods for uploading, listing, and retrieving stored files.
Methods for generating and editing images.
Methods for transcribing audio and synthesizing speech.
Methods for classifying whether content is potentially harmful.
Methods for running large request sets asynchronously at lower cost.
Methods for training a custom model on uploaded data.
Methods for managing vector stores used by file search.
| Area | Method | Endpoint | What it does | Access | Permission | Version | Acts on | Webhook event | Notes |
|---|---|---|---|---|---|---|---|---|---|
| Responses | POST | /v1/responses | Create a model response, the current interface for generating text and tool-using output. | write | Responses: write | Current | response | response.completed | A core write; with background mode, completion is delivered by webhook. Spends model usage. |
| Responses | GET | /v1/responses/{response_id} | Retrieve a previously created model response. | read | Responses: read | Current | response | None | Read-only. |
| Responses | POST | /v1/responses/{response_id}/cancel | Cancel a background response that is still running. | write | Responses: write | Current | response | None | Only applies to responses created in background mode. |
| Chat Completions | POST | /v1/chat/completions | Create a chat completion, the older interface for generating model output, supported for compatibility. | write | Model capabilities: write | Current | chat.completion | None | Responses is the recommended interface for new integrations. Spends model usage. |
| Embeddings | POST | /v1/embeddings | Create an embedding vector representing the input text. | write | Model capabilities: write | Current | embedding | None | Spends model usage; returns vectors for search and similarity. |
| Models | GET | /v1/models | List the models available to the account, with basic metadata. | read | Models: read | Current | model | None | Read-only. |
| Models | GET | /v1/models/{model} | Retrieve a single model's metadata, such as ownership and permissions. | read | Models: read | Current | model | None | Read-only. |
| Files | POST | /v1/files | Upload a file for use by other endpoints, such as fine-tuning, batches, or assistants. | write | Files: write | Current | file | None | A file is up to 512 MB; larger sources use the Uploads API. Tagged with a purpose, like fine-tune or batch. |
| Files | GET | /v1/files | List the files that belong to the project (cursor-paginated). | read | Files: read | Current | file | None | Read-only; can filter by purpose and page with after. |
| Files | GET | /v1/files/{file_id} | Retrieve metadata about a specific file. | read | Files: read | Current | file | None | Read-only. |
| Files | GET | /v1/files/{file_id}/content | Download the contents of a file. | read | Files: read | Current | file | None | Read-only; returns the stored bytes. |
| Files | DELETE | /v1/files/{file_id} | Delete a file and remove it from all vector stores. | write | Files: write | Current | file | None | Irreversible; also detaches the file from any vector store. |
| Images | POST | /v1/images/generations | Generate one or more images from a text prompt. | write | Model capabilities: write | Current | image | None | Spends model usage; billed per generated image. |
| Images | POST | /v1/images/edits | Edit or extend an image from one or more source images and a prompt. | write | Model capabilities: write | Current | image | None | Takes a multipart upload of the source image(s). |
| Audio | POST | /v1/audio/transcriptions | Transcribe an audio file into text. | write | Model capabilities: write | Current | transcription | None | Multipart audio upload; spends model usage. |
| Audio | POST | /v1/audio/speech | Generate spoken audio from input text. | write | Model capabilities: write | Current | speech | None | Returns an audio stream; spends model usage. |
| Moderations | POST | /v1/moderations | Classify whether text or image content is potentially harmful. | write | Model capabilities: write | Current | moderation | None | Submitting content for classification; the moderation endpoint is free to use. |
| Batches | POST | /v1/batches | Create and run a batch from an uploaded file of requests, processed asynchronously. | write | Batch: write | Current | batch | batch.completed | Runs within a window at lower cost; completion is delivered by webhook. |
| Batches | GET | /v1/batches/{batch_id} | Retrieve a batch and its current status. | read | Batch: read | Current | batch | None | Read-only. |
| Batches | POST | /v1/batches/{batch_id}/cancel | Cancel a batch that is still in progress. | write | Batch: write | Current | batch | None | Stops further processing of the batch. |
| Fine-tuning | POST | /v1/fine_tuning/jobs | Create a fine-tuning job to train a custom model from an uploaded training file. | write | Fine-tuning: write | Current | fine_tuning.job | fine_tuning.job.succeeded | A billable training job; completion is delivered by webhook. |
| Fine-tuning | GET | /v1/fine_tuning/jobs | List fine-tuning jobs in the project (cursor-paginated). | read | Fine-tuning: read | Current | fine_tuning.job | None | Read-only. |
| Fine-tuning | POST | /v1/fine_tuning/jobs/{fine_tuning_job_id}/cancel | Cancel a fine-tuning job that is still running. | write | Fine-tuning: write | Current | fine_tuning.job | None | Stops the training run. |
| Vector Stores | POST | /v1/vector_stores | Create a vector store that file search can run against. | write | Vector stores: write | Current | vector_store | None | A stored, billable object that indexes attached files. |
| Vector Stores | GET | /v1/vector_stores | List the vector stores in the project (cursor-paginated). | read | Vector stores: read | Current | vector_store | None | Read-only. |
| Vector Stores | POST | /v1/vector_stores/{vector_store_id}/search | Search a vector store for the most relevant chunks for a query. | read | Vector stores: read | Current | vector_store | None | A POST that reads, not writes; returns ranked results without changing the store. |

Every method listed has been available since the API's base version, and standard rate limits apply to each.
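The permission column can be read as a lookup from method and endpoint to an endpoint group and level. The sketch below checks a restricted key's grants against a few rows of that lookup. `REQUIRED`, `LEVELS`, and `is_allowed` are hypothetical names, only a sample of endpoints is included, and the ordering None < Read < Write (so that Write also covers reads) is an assumption rather than something this guide states.

```python
# Assumed ordering: a Write grant also satisfies a Read requirement.
LEVELS = {"none": 0, "read": 1, "write": 2}

# A sample of the endpoint table: (method, endpoint) -> (group, level needed).
REQUIRED = {
    ("POST", "/v1/responses"): ("Responses", "write"),
    ("GET", "/v1/responses/{response_id}"): ("Responses", "read"),
    ("POST", "/v1/embeddings"): ("Model capabilities", "write"),
    ("GET", "/v1/files"): ("Files", "read"),
    ("DELETE", "/v1/files/{file_id}"): ("Files", "write"),
    # A POST that only needs read access.
    ("POST", "/v1/vector_stores/{vector_store_id}/search"): ("Vector stores", "read"),
}


def is_allowed(grants, method, endpoint):
    """Return True if the key's grants cover the call; unknown groups mean None."""
    group, needed = REQUIRED[(method, endpoint)]
    granted = grants.get(group, "none").lower()
    return LEVELS[granted] >= LEVELS[needed]


# Example grants for a restricted key that can read files and search stores.
reader_grants = {"Files": "Read", "Vector stores": "Read"}
```

Checking by required permission rather than by HTTP verb matters for rows like vector store search, where the verb is POST but the access is read.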
OpenAI can notify an app when a long-running job finishes, like a background response completing, a batch finishing, or a fine-tuning job succeeding. It POSTs an event describing what changed, so an integration learns about completion without polling.
| Event | What it signals | Triggered by |
|---|---|---|
| response.completed | A background response finished generating and is ready to retrieve. An integration fetches the response on this event instead of polling. | /v1/responses |
| response.failed | A background response failed to complete. | /v1/responses |
| batch.completed | A batch job finished and its output file is ready to download. | /v1/batches |
| batch.failed | A batch job failed. | /v1/batches |
| fine_tuning.job.succeeded | A fine-tuning job completed and the fine-tuned model is available. | /v1/fine_tuning/jobs |
| fine_tuning.job.failed | A fine-tuning job failed before producing a model. | /v1/fine_tuning/jobs |
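One way to act on these events is to map each type to a follow-up step. The sketch below assumes the event body carries a `type` field and the affected object's id under `data.id`; that payload shape, the `plan_follow_up` name, and the action labels are assumptions for illustration.

```python
# Events whose object can be fetched with a retrieval endpoint from this guide.
RETRIEVE_PATHS = {
    "response.completed": "/v1/responses/{id}",
    "batch.completed": "/v1/batches/{id}",
}
FAILURE_EVENTS = {"response.failed", "batch.failed", "fine_tuning.job.failed"}


def plan_follow_up(event):
    """Return an (action, target) pair describing what to do with an event."""
    event_type = event["type"]
    object_id = event["data"]["id"]  # assumed payload shape
    if event_type in RETRIEVE_PATHS:
        # Fetch the finished object instead of polling for it.
        return ("fetch", RETRIEVE_PATHS[event_type].format(id=object_id))
    if event_type in FAILURE_EVENTS:
        return ("alert", object_id)
    if event_type == "fine_tuning.job.succeeded":
        # The fine-tuned model is available; this sketch only records the job id.
        return ("record", object_id)
    return ("ignore", object_id)
```

Returning a plan rather than making the call directly keeps the webhook handler fast, so the delivery can be acknowledged before any follow-up request runs.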
OpenAI limits how fast an app can call, by a per-model rate measured in requests and tokens per minute that rises with an organization's usage tier.
OpenAI meters usage per model, not by a single account-wide quota. Each model has limits measured in requests per minute and tokens per minute, with some endpoints also limited in requests or tokens per day, images per minute, or audio minutes per minute; models in a shared-limit group draw from one pool. The ceilings rise automatically as an organization moves up usage tiers (Free, then Tier 1 to Tier 5) with cumulative spend. Going over returns HTTP 429, and every response carries x-ratelimit-limit-requests, x-ratelimit-remaining-tokens, and related headers showing the current allowance and reset time. A 429 with type insufficient_quota is a billing problem, not a speed problem, and backing off will not clear it.
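The difference between the two kinds of 429 can be sketched as a small decision helper: retry with exponential backoff for a rate limit, stop for a quota problem. The function names, the base and cap values, and the use of random jitter are illustrative choices, and the error body is assumed to be a parsed JSON object with an `error` field.

```python
import random


def backoff_delay(attempt, base=1.0, cap=60.0, jitter=random.random):
    """Exponential backoff with jitter: up to base * 2**attempt, capped at cap."""
    return min(cap, base * (2 ** attempt)) * jitter()


def handle_429(error_body, attempt):
    """Decide what to do with a 429: wait and retry, or stop for a billing fix."""
    error = error_body.get("error", {})
    if "insufficient_quota" in (error.get("type"), error.get("code")):
        # A billing problem; waiting will not clear it.
        return ("stop", None)
    return ("retry", backoff_delay(attempt))


def remaining_tokens(headers):
    """Read the remaining token allowance from the response headers, if present."""
    value = headers.get("x-ratelimit-remaining-tokens")
    return int(value) if value is not None else None
```

Jitter spreads retries from many workers over time, which avoids all of them hitting the limit again at the same instant.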
A list endpoint is cursor-based: after takes an object id to fetch the next page, limit sets the page size, and order sorts by creation time. A has_more field in the response signals whether more pages remain, and the SDKs offer auto-pagination over the cursor. Page-size ranges and defaults vary by endpoint, for example the Files list accepts a limit of 1 to 10,000.
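The cursor loop can be sketched as a generator that keeps requesting pages until `has_more` is false. Here `fetch_page` is a hypothetical callable standing in for the HTTP call, the items are assumed to arrive in a `data` list, and `make_fake_fetch` is an in-memory stand-in used only to show the loop working.

```python
def iter_all(fetch_page, limit=100):
    """Yield every object from a cursor-paginated list endpoint."""
    after = None
    while True:
        params = {"limit": limit}
        if after is not None:
            params["after"] = after
        page = fetch_page(params)
        items = page["data"]  # assumed field holding the page's objects
        yield from items
        if not page.get("has_more") or not items:
            return
        # The id of the last object becomes the cursor for the next page.
        after = items[-1]["id"]


def make_fake_fetch(ids):
    """Build an in-memory stand-in for a list endpoint, for illustration only."""
    def fetch_page(params):
        start = ids.index(params["after"]) + 1 if "after" in params else 0
        chunk = ids[start:start + params["limit"]]
        return {
            "data": [{"id": item_id} for item_id in chunk],
            "has_more": start + len(chunk) < len(ids),
        }
    return fetch_page


all_ids = [obj["id"] for obj in iter_all(make_fake_fetch(["a", "b", "c", "d", "e"]), limit=2)]
```

Stopping on an empty page as well as on `has_more` guards against looping forever if a server ever returns an empty page with the flag still set.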
An individual file uploaded through the Files API can be up to 512 MB, and a project can store up to roughly 2.5 TB of files in total. Larger source files can be sent in parts through the Uploads API, which assembles them into a single File. Token limits per request are set by the chosen model's context window rather than by the API.
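Choosing between the two upload routes comes down to a size check, sketched below. The 512 MB limit is taken from the text and interpreted here as 512 × 1024 × 1024 bytes; the 64 MB part size and the `plan_upload` helper are assumptions for illustration, so check the Uploads API documentation for the actual part limits.

```python
MAX_FILE_BYTES = 512 * 1024 * 1024   # single-file limit from the text (assumed binary MB)
PART_BYTES = 64 * 1024 * 1024        # assumed part size; not stated in this guide


def plan_upload(size_bytes, part_bytes=PART_BYTES):
    """Pick an upload route and the byte ranges to send for a file of this size."""
    if size_bytes <= MAX_FILE_BYTES:
        # Small enough for a single Files API upload.
        return {"route": "files", "parts": [(0, size_bytes)]}
    # Otherwise split into consecutive [start, end) ranges for the Uploads API.
    parts = [
        (start, min(start + part_bytes, size_bytes))
        for start in range(0, size_bytes, part_bytes)
    ]
    return {"route": "uploads", "parts": parts}
```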
The status codes an agent should handle, and what to do about each.
| Status | Code | Meaning | What to do |
|---|---|---|---|
| 400 | invalid_request_error | The request was malformed or missing a required parameter. The error object names the offending field in error.param. | Read error.message and error.param, fix the request body, and resend. It is not retryable as-is. |
| 401 | invalid_api_key | The API key is missing, wrong, revoked, or lacks permission for this endpoint group under a restricted key. | Send a valid key for the right project, and grant the endpoint group on a restricted key if that is the cause. |
| 403 | unsupported_country_region_territory | The request came from a country, region, or territory OpenAI does not support. | Call from a supported location, and check the supported-countries documentation. |
| 404 | not_found | The requested object, like a file, model, or job, does not exist or is not visible to this key or project. | Verify the id and confirm it lives in the same project the key belongs to. |
| 429 | rate_limit_exceeded | The request rate or token rate for the model was exceeded for the current usage tier. | Back off and retry with exponential backoff, and smooth the request rate. The x-ratelimit headers show the remaining allowance and reset time. |
| 429 | insufficient_quota | The account has no remaining credits or has hit its billing limit. This is a billing problem, not a speed one. | Add a payment method or raise the spend limit. Backing off will not clear this error. |
| 500 | server_error | An error on OpenAI's side while processing the request. It is uncommon. | Retry after a brief wait, and check the status page if it persists. |
| 503 | engine_overloaded | The service is temporarily overloaded by traffic. | Reduce the request rate, hold steady, then ramp back up gradually. |
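The table above can be encoded as a lookup so an agent handles each error consistently. In this sketch the action labels, `action_for`, and `is_retryable` are hypothetical names; the status and code pairs come from the table.

```python
# (status, code) -> what the agent should do, following the table above.
ACTIONS = {
    (400, "invalid_request_error"): "fix_request",
    (401, "invalid_api_key"): "fix_credentials",
    (403, "unsupported_country_region_territory"): "call_from_supported_region",
    (404, "not_found"): "check_id_and_project",
    (429, "rate_limit_exceeded"): "retry_with_backoff",
    (429, "insufficient_quota"): "fix_billing",
    (500, "server_error"): "retry_after_wait",
    (503, "engine_overloaded"): "slow_down_then_ramp",
}

# Only these actions mean the same request can be sent again unchanged.
RETRYABLE = {"retry_with_backoff", "retry_after_wait", "slow_down_then_ramp"}


def action_for(status, code):
    """Look up the recommended action; unknown pairs need a human to inspect."""
    return ACTIONS.get((status, code), "inspect_manually")


def is_retryable(status, code):
    """Return True if resending the same request later may succeed."""
    return action_for(status, code) in RETRYABLE
```

Keying on both status and code is what separates the two 429 cases, since the status alone cannot tell a rate limit from a billing problem.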
OpenAI serves a single, continuously updated API under one path version, and ships dated, mostly backward-compatible changes through its changelog rather than minting new version numbers per release.
Models carry their own names and deprecation dates separately from the API.
Moderation scores were added to the Responses and Chat Completions APIs for evaluating input and output.
For organizations without zero-data-retention, prompt_cache_retention now defaults to 24-hour retention instead of in-memory only.
A WebSocket mode launched for the Responses API, enabling streaming connections.
The Conversations API was released to manage long-running discussions alongside the Responses API.
There is one path version; track the changelog for dated additions and deprecations.
Bollard AI sits between a team's AI agents and OpenAI. Grant each agent exactly the access it needs, read or write, area by area, and every call is checked and logged.