Everything an AI agent can do with the Hugging Face API.

A reference guide for building AI agents: every method, how to authenticate, and the permissions each one needs.

Endpoints32
API versionv1
Last updated23 June 2026
Orientation

How the Hugging Face API works.

The Hugging Face API is how an app or AI agent works with the Hugging Face Hub and the models on it: searching and reading models, datasets, and Spaces, creating and committing to a repository, and running a model for chat, embeddings, or image generation. Access is granted through a user access token, where a fine-grained token sets each scope to read or write on chosen repositories or organizations, and an agent is limited to what that token reaches. The Hub serves one continuously updated API, and it can push events to a webhook URL when a repository changes.

32Endpoints
8Capability groups
17Read
15Write
6Permissions
Authentication
Every write and every private read needs a user access token sent as 'Authorization: Bearer '. Three token roles exist: read, which reads repositories the user can see; write, which adds write access to repositories the user can write; and fine-grained, which sets individual scopes on chosen repositories or organizations. The same token authenticates the Hub API and Inference Providers. Hugging Face recommends fine-grained tokens for production, because a leaked token is limited to the scopes it was given.
Permissions
A fine-grained token carries named scopes that decide what each call can do. Repository scopes include repo.content.read for reading files and repo.write for creating, committing, moving, or deleting a repository; discussion.write covers discussions, Pull Requests, and comments. Inference is its own scope: inference.serverless.write calls models through Inference Providers, and inference.endpoints.write manages dedicated endpoints. Further scopes cover collections, organizations, and Jobs. A read or write token instead grants coarse access across every repository the account can reach.
Versioning
The Hub API has no dated version. It is a single, continuously updated API served under one path prefix, so there is no version header to pin and no version to migrate between. Notable changes are announced through dated release notes, and the OpenAPI specification, published at a well-known path, is the always-current machine reference. Inference Providers exposes an OpenAI-compatible surface for chat completions under its own path.
Data model
The Hub is organised around repositories, which are one of three types: a model, a dataset, or a Space. Each repository lives at a namespace and name, holds files under Git references such as branches, tags, and commits, and carries discussions and Pull Requests. Hub methods read and write these repositories and the account and organization data around them. A separate inference router runs the models themselves rather than touching repository contents.
Connect & authenticate

Connection & authentication methods.

How an app or AI agent connects to Hugging Face determines what it can reach. There are several routes, one for working with the Hub and its repositories, one for running models, and one for receiving events, each governed by the access token behind it and the scopes that token carries.

Ways to connect

Hub API

The Hub API answers at huggingface.co under the /api path. It reads and writes models, datasets, and Spaces, creates and commits to repositories, and manages the account and organization data around them. It is a single, continuously updated API with no version to pin.

Best forConnecting an app or AI agent to the Hugging Face Hub.
Governed byThe user access token and the scopes it carries.
Docs ↗

Inference Providers router

The inference router answers at router.huggingface.co and runs models across partner providers through one token. Its chat completions endpoint is OpenAI-compatible, so existing OpenAI client code can target it by swapping the base URL to the router's v1 path.

Best forRunning a model for chat, embeddings, or image generation.
Governed byThe user access token and the inference scope it carries.
Docs ↗

Webhooks

Webhooks deliver the chosen repository events to a receiver URL, and an optional secret on the X-Webhook-Secret header confirms each delivery came from Hugging Face. Webhooks are created and listed through the settings webhooks endpoints or the settings page.

Best forReacting to repository changes without polling.
Governed byThe user access token and the scopes it carries.
Docs ↗

MCP server (Model Context Protocol)

Hugging Face's first-party MCP server at huggingface.co/mcp lets an agent search and explore models, datasets, Spaces, and papers, search the documentation, run Jobs, and call community Gradio Spaces as tools. It authenticates with a Hugging Face token and supports streamable HTTP and server-sent-events transports.

Best forConnecting an MCP-compatible assistant to the Hub.
Governed byThe user access token and the scopes it carries.
Docs ↗
Authentication

Fine-grained access token

A fine-grained access token sets individual scopes, each read or write, on chosen repositories or a specific organization, such as repo.content.read on one model or inference.serverless.write for running models. It is the least-privilege choice and what Hugging Face recommends for production.

TokenFine-grained access token
Best forLeast-privilege access to specific repositories or scopes
Docs ↗

Read token

A read token grants read access to every repository the account can see, public and private, across the user and the organizations they belong to. It cannot write, which suits downloading models or running inference.

TokenRead access token
Best forDownloading content or running inference
Docs ↗

Write token

A write token adds write access to every repository the account can write, on top of read. It is coarse, all-or-nothing access, which suits a trusted local workflow more than a shared production agent.

TokenWrite access token
Best forPushing content from a trusted workflow
Docs ↗
Capability map

What an AI agent can do in Hugging Face.

The Hugging Face API is split into areas an agent can act on, such as models, datasets, Spaces, repository management, inference, and webhooks. Each area has its own methods and its own scopes, and some grant access to far more than others.

Models

3 endpoints

Search and list models, read a single model's details, and read its model tags.

These are read methods over public and accessible model repositories.
View endpoints

Datasets

3 endpoints

Search and list datasets, read a single dataset's details, and read its dataset tags.

These are read methods over public and accessible dataset repositories.
View endpoints

Spaces

2 endpoints

Search and list Spaces and read a single Space's details.

These are read methods over public and accessible Space repositories.
View endpoints

Repository management

9 endpoints

Create, move, rename, and delete repositories, update their visibility, list commits and references, read the file tree, and commit files.

Writes here change real repository data, and deleting a repository removes it.
View endpoints

Discussions & Pull Requests

3 endpoints

List a repository's discussions and Pull Requests, create a discussion, and add a comment.

Writes here are visible to everyone who can see the repository.
View endpoints

Inference

4 endpoints

Run a model through the provider router for chat completions, feature extraction embeddings, and text-to-image generation, and list available models.

Inference calls run models and are metered for billing.
View endpoints

Account & collections

3 endpoints

Read the authenticated user, and list and create collections.

Writes here change the account's own collections.
View endpoints

Webhooks

5 endpoints

List, read, create, update, and delete the account's webhooks.

Writes here change which events are delivered and where.
View endpoints
Endpoint reference

Every Hugging Face API method.

Filter by method, access, or permission, or search any path. Select a row for version detail, rate limits, the related webhook event, and the source.

MethodEndpointWhat it doesAccessPermissionVersion

Models

Search and list models, read a single model's details, and read its model tags.3

Public models are listed without a token. A read or fine-grained token with repo.content.read is needed to include private models the account can see. Paginated through the Link header, with search, author, filter, sort, limit, and full parameters.

Acts onmodel
Permission (capability)repo.content.read
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

Equivalent to model_info in the Python client. A revision can be appended as /revision/{revision}. Public models need no token; private ones need repo.content.read.

Acts onmodel
Permission (capability)repo.content.read
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

A read-only catalogue of tags, returned without a token.

Acts ontag
Permission (capability)None required
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

Datasets

Search and list datasets, read a single dataset's details, and read its dataset tags.3

Public datasets are listed without a token; repo.content.read includes private ones. Paginated through the Link header with the same search, author, filter, sort, limit, and full parameters as models.

Acts ondataset
Permission (capability)repo.content.read
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

Equivalent to dataset_info in the Python client. A revision can be appended as /revision/{revision}.

Acts ondataset
Permission (capability)repo.content.read
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

A read-only catalogue of tags, returned without a token.

Acts ontag
Permission (capability)None required
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

Spaces

Search and list Spaces and read a single Space's details.2

Public Spaces are listed without a token; repo.content.read includes private ones. Paginated through the Link header.

Acts onspace
Permission (capability)repo.content.read
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

Equivalent to space_info in the Python client. A revision can be appended as /revision/{revision}.

Acts onspace
Permission (capability)repo.content.read
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

Repository management

Create, move, rename, and delete repositories, update their visibility, list commits and references, read the file tree, and commit files.9

The body sets type, name, organization, private, and, for a Space, sdk. Needs repo.write on a fine-grained token, or a write token. Subject to a separate, undocumented repository-creation limit.

Acts onrepository
Permission (capability)repo.write
VersionAvailable since the API’s base version
Webhook eventrepo
Rate limitStandard limits apply

The body sets type, name, and organization. Deleting a repository removes it and its files. Needs repo.write or a write token.

Acts onrepository
Permission (capability)repo.write
VersionAvailable since the API’s base version
Webhook eventrepo
Rate limitStandard limits apply

The body sets fromRepo, toRepo, and type. Needs repo.write or a write token.

Acts onrepository
Permission (capability)repo.write
VersionAvailable since the API’s base version
Webhook eventrepo
Rate limitStandard limits apply

Changing visibility to or from private is recorded as a repo.config update event. Needs repo.write or a write token.

Acts onrepository
Permission (capability)repo.write
VersionAvailable since the API’s base version
Webhook eventrepo.config
Rate limitStandard limits apply

The same path shape exists under /api/datasets and /api/spaces. Public repositories need no token.

Acts oncommit
Permission (capability)repo.content.read
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

The same path shape exists under /api/datasets and /api/spaces. Public repositories need no token.

Acts onreference
Permission (capability)repo.content.read
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

The same path shape exists under /api/datasets and /api/spaces. Public repositories need no token.

Acts onfile
Permission (capability)repo.content.read
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

The first half of an upload: it tells the client whether content goes through Git LFS. Needs repo.write or a write token.

Acts onfile
Permission (capability)repo.write
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

Records the upload prepared by preupload as a commit, firing a repo.content update event. The same path shape exists under /api/datasets and /api/spaces. Subject to a separate, undocumented commit limit. Needs repo.write or a write token.

Acts oncommit
Permission (capability)repo.write
VersionAvailable since the API’s base version
Webhook eventrepo.content
Rate limitStandard limits apply

Discussions & Pull Requests

List a repository's discussions and Pull Requests, create a discussion, and add a comment.3

On the Hub, a Pull Request is a special type of discussion. Public repositories need no token.

Acts ondiscussion
Permission (capability)repo.content.read
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

Fires a discussion create event. Needs discussion.write on a fine-grained token, or a write token. Subject to a separate, undocumented discussion limit.

Acts ondiscussion
Permission (capability)discussion.write
VersionAvailable since the API’s base version
Webhook eventdiscussion
Rate limitStandard limits apply

Fires a discussion.comment create event. Needs discussion.write or a write token.

Acts oncomment
Permission (capability)discussion.write
VersionAvailable since the API’s base version
Webhook eventdiscussion.comment
Rate limitStandard limits apply

Inference

Run a model through the provider router for chat completions, feature extraction embeddings, and text-to-image generation, and list available models.4

Served at router.huggingface.co, not the Hub host. Drop-in OpenAI-compatible: swap the base URL. A model id can carry a provider or policy suffix such as :fastest or :cheapest. Needs inference.serverless.write on a fine-grained token.

Acts oncompletion
Permission (capability)inference.serverless.write
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

Served at router.huggingface.co. Lists models reachable through the router, including per-provider pricing and performance where available.

Acts onmodel
Permission (capability)inference.serverless.write
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

Served through the inference router. Feature extraction returns embeddings for semantic search, retrieval, and recommendation. Needs inference.serverless.write.

Acts onembedding
Permission (capability)inference.serverless.write
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

Served through the inference router. Returns the generated image bytes. Needs inference.serverless.write.

Acts onimage
Permission (capability)inference.serverless.write
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

Account & collections

Read the authenticated user, and list and create collections.3

Identifies the account the token belongs to and returns its orgs and the token's permissions. Any valid token works.

Acts onuser
Permission (capability)None required
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

Public collections are returned without a token; collection.read includes private ones.

Acts oncollection
Permission (capability)collection.read
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

Needs collection.write on a fine-grained token, or a write token.

Acts oncollection
Permission (capability)collection.write
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

Webhooks

List, read, create, update, and delete the account's webhooks.5

Returns the account's webhooks, their watched repositories, and their target URLs. A valid token for the account is required.

Acts onwebhook
Permission (capability)None required
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

Sets the watched repositories or namespaces, the target URL, and an optional secret sent back as the X-Webhook-Secret header. Each webhook is limited to 1,000 triggers per 24 hours.

Acts onwebhook
Permission (capability)None required
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

Returns one webhook's watched repositories, target URL, and status. A valid token for the account is required.

Acts onwebhook
Permission (capability)None required
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

Changes which events are delivered and where. A valid token for the account is required.

Acts onwebhook
Permission (capability)None required
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply

Stops all delivery for that webhook. A valid token for the account is required.

Acts onwebhook
Permission (capability)None required
VersionAvailable since the API’s base version
Webhook eventNone
Rate limitStandard limits apply
No endpoints match those filters.
Webhooks

Webhook events.

Hugging Face can notify an app or AI agent when something happens to a repository, instead of the app repeatedly asking. Hugging Face posts the event payload to a webhook URL that has been registered for the chosen repositories and events.

EventWhat it signalsTriggered by
repoGlobal events on a repository. The action is one of create, delete, update, or move, fired when a repository is created, deleted, renamed, or has its details change./api/repos/create
/api/repos/delete
/api/repos/move
repo.contentEvents on a repository's content, such as new commits or tags, including the commit created when a Pull Request opens. The action is always update, and the payload lists the references that changed./api/models/{namespace}/{repo}/commit/{rev}
repo.configEvents on a repository's config, such as updating Space secrets, settings, or visibility. The action is always update, and the payload carries the updated config keys./api/repos/{repo_type}/{repo_id}/settings
discussionCreating a discussion or Pull Request, updating its title or status, or merging it. The action is one of create, delete, or update./api/{repoType}/{namespace}/{repo}/discussions
discussion.commentCreating, updating, or hiding a comment on a discussion or Pull Request. The action is one of create or update./api/{repoType}/{namespace}/{repo}/discussions/{num}/comment
No events match that search.
Rate limits & pagination

Rate limits, pagination & request size.

Hugging Face limits how fast an app or AI agent can call, through a request quota counted over a rolling five-minute window that depends on the account tier behind the token, with separate, higher quotas for downloading repository files.

Request rate

Hugging Face counts requests in three buckets over a rolling five-minute window. The Hub APIs bucket covers calls like search, repository creation, and user management; the Resolvers bucket covers file downloads and carries a much higher quota; and the Pages bucket covers web pages. Quotas rise with the account tier behind the token: an anonymous caller gets about 500 Hub API requests per window per IP address, a free user 1,000, a PRO user 2,500, a Team organization 3,000, and Enterprise plans from 6,000 up to 100,000 when organization IP ranges are set. Going over returns a 429, and the RateLimit and RateLimit-Policy response headers report the remaining quota and the seconds until reset. Certain actions, such as repository creation, commits, and discussions, carry their own separate, undocumented limits. Each webhook is capped at 1,000 triggers per 24 hours.

Pagination

List endpoints, such as listing models, datasets, or Spaces, are paginated and return a Link header with a rel="next" URL, which should be followed rather than built by hand, until it is absent. A limit parameter caps the number of results fetched, and a full parameter requests the fuller record for each item. The Python client follows the Link header automatically.

Request size

Hub API requests and responses are JSON. Large files are not sent inline: an upload first calls the preupload check to decide whether content goes through Git LFS or the standard path, then a commit call records it, so file size is handled by the storage layer rather than a single request body limit. Listing endpoints return trimmed records by default and the fuller record only when full is requested.

Errors

Status codes & error handling.

The status codes an agent should handle, and what to do about each.

StatusCodeMeaningWhat to do
401UnauthorizedAuthentication is missing, or the access token is invalid or has been deleted.Send a valid token in the Authorization header as 'Bearer '.
403ForbiddenThe token is valid but lacks the scope for this call, or an organization has denied, revoked, or restricted it. A read or write token used where the organization requires a fine-grained token is also rejected here.Grant the missing scope, or have an organization administrator approve the token.
404Not FoundThe repository does not exist, or the token cannot see a private repository. A private repository is returned as 404 rather than 403 so that its existence is not confirmed.Confirm the repository id and that the token has access to it.
429Too Many RequestsA rate limit was exceeded for the current five-minute window. The RateLimit and RateLimit-Policy response headers report the remaining quota and the seconds until it resets.Wait until the window resets, spread requests out, pass a token, or upgrade the account tier.
Versioning & freshness

Version history.

The Hub API is served under a single, continuously updated version. There is no dated version to pin; changes ship through dated release notes rather than versioned endpoints.

Version history

What changed, and when

Latest versionv1
v1Current version
Unversioned Hub API, OpenAI-compatible inference router

The Hub API is served under a single, continuously updated version with no dated version header to pin. The machine-readable reference moved to an OpenAPI specification published at a well-known path and an OpenAPI Playground in late 2025. Inference Providers exposes an OpenAI-compatible router under a v1 path for chat completions, so existing OpenAI client code can target Hugging Face by swapping the base URL.

What changed
  • Hub API reference moved to an always-current OpenAPI specification and Playground
  • Inference Providers router is OpenAI-compatible for chat completions, with model listing at /v1/models
  • Rate limits standardised into Hub APIs, Resolvers, and Pages buckets over five-minute windows
2025-06-06Feature update
Official Hugging Face MCP server

Hugging Face launched a first-party MCP server at huggingface.co/mcp, letting an AI assistant search and explore models, datasets, Spaces, and papers, search the documentation, run Jobs, and call community Gradio Spaces as tools, authenticated with a Hugging Face token.

What changed
  • First-party hosted MCP server at huggingface.co/mcp
  • Built-in tools for model, dataset, Space, paper, and documentation search
  • Streamable HTTP and server-sent-events transports
2025-01-28Feature update
Inference Providers launched

Hugging Face launched Inference Providers, a unified router that runs hundreds of models across partner providers through one Hugging Face token, with an OpenAI-compatible chat completions endpoint and native Python and JavaScript clients.

What changed
  • Single token routes to partner providers through one proxy
  • OpenAI-compatible chat completions endpoint for drop-in migration
  • A free tier with extra credits for PRO, Team, and Enterprise accounts

Because the API is unversioned, an integration tracks behaviour through the release notes rather than pinning a version.

Hugging Face changelog ↗
Questions

Hugging Face API, answered.

Read token, write token, or fine-grained, which should I use?+
A fine-grained token is the better default for an agent or production app. A read token can read every repository the account can see and a write token can write every repository it can write, both all-or-nothing, while a fine-grained token can be limited to specific repositories or a specific organization with each scope set to read or write, such as only repo.content.read on one model. Fine-grained tokens are what Hugging Face recommends for production, since a leaked token is confined to the scopes it was granted.
Is the same token used for the Hub and for running models?+
Yes. One user access token authenticates both the Hub API, for working with repositories, and Inference Providers, for running models, each sent as a bearer token. The scopes differ: reading and writing repositories use the repo scopes, while calling a model through the router needs the inference.serverless.write scope on a fine-grained token. A single token can hold both.
What are the rate limits?+
Hugging Face counts requests over a rolling five-minute window in three buckets: Hub APIs, Resolvers for file downloads, and Pages. The Hub API quota depends on the account tier, from about 500 requests per window for an anonymous caller and 1,000 for a free user up to several thousand for PRO and Enterprise plans. Exceeding a quota returns a 429, and the RateLimit response header gives the seconds until reset. Passing a token, rather than calling anonymously, is the most common fix for being rate limited.
How do I receive repository events instead of polling?+
Webhooks deliver events without polling. A webhook is registered against chosen repositories or whole namespaces and a set of scopes, such as repo for repository changes, repo.content for new commits and tags, discussion for Pull Requests and discussions, and discussion.comment for comments. Hugging Face posts a JSON payload when each event fires, and an optional secret, sent as the X-Webhook-Secret header, confirms the payload came from Hugging Face. Each webhook is limited to 1,000 triggers per 24 hours.
How does the API handle versions?+
There is no dated version to pin. The Hub serves a single, continuously updated API, so there is no version header and no migration between versions. Notable changes are announced through dated release notes, and the OpenAPI specification at the well-known path is the always-current machine reference. An integration tracks behaviour through the release notes rather than pinning a version.
Does Hugging Face have an official MCP server?+
Yes. The Hugging Face MCP server, launched in June 2025, is a first-party hosted server at huggingface.co/mcp that lets an AI assistant search and explore models, datasets, Spaces, and papers, search the documentation, run Jobs, and call community Gradio Spaces as tools. It authenticates with a Hugging Face token and supports the streamable HTTP and server-sent-events transports.
Related

More ai API guides for agents

What is Bollard AI?

Control what every AI agent can do in Hugging Face.

Bollard AI sits between a team's AI agents and Hugging Face. Grant each agent exactly the access it needs, read or write, area by area, and every call is checked and logged.

  • Set read, write, or full access per agent, never a shared Hugging Face token.
  • Denied by default, so an agent reaches only what has been explicitly allowed.
  • Every call recorded in plain English: who, what, where, and the decision.
Hugging Face
Model Ops Agent
Search models and datasets ResourceOffReadFull use
Run inference ActionOffReadFull use
Commit files to a repository ActionOffReadFull use
Delete repositories ResourceOffReadFull use
Per-agent access, set in Bollard AI, not in Hugging Face