Hugging Face Inference API
Inference Platform by Hugging Face
Access 800,000+ open-source AI models through a unified API. Supports text, image, audio, and more with pay-as-you-go pricing.
Registration & API Key Steps
Use Free Tier Credits
Free accounts receive roughly $0.10/month in inference credits; PRO accounts ($9/month) receive $2/month.
Upgrade for Production
For production workloads, use dedicated Inference Endpoints, which start at $0.03/hour for CPU instances.
Pricing
| Tier | Price | Features |
|---|---|---|
| Free | Free (~$0.10/month credits) | 800K+ models, Rate limited, Community support, 10GB storage |
| PRO | $9/month | $2/month inference credits, 8x ZeroGPU quota, 100GB storage, Priority access |
| Enterprise Hub | $20/user/month | SSO, Advanced security, Priority support, Audit logs |
| Inference Endpoints | From $0.03/hour (CPU) | Dedicated infrastructure, Auto-scaling, Custom models, Pay-as-you-go |
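As a rough sanity check on the Inference Endpoints row, an always-on CPU endpoint at the $0.03/hour base rate comes to about $21.60 over a 30-day month (larger instances cost more):

```javascript
// Rough monthly cost of an always-on Inference Endpoint at the CPU base rate.
const hourlyRate = 0.03;       // USD per hour (CPU base tier from the table)
const hoursPerMonth = 24 * 30; // 720 hours in a 30-day month
const monthlyCost = hourlyRate * hoursPerMonth;
console.log(`~$${monthlyCost.toFixed(2)}/month`); // ~$21.60/month
```

Endpoints can also scale to zero when idle, so real bills for bursty workloads are typically lower than this always-on figure.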
Application Tips
Massive Model Library
Access 800K+ models including Llama, Mistral, Stable Diffusion, Whisper, and more from a single API.
No Markup Pricing
Hugging Face charges the same rates as the underlying providers, with no markup.
Use Serverless for Testing
Use the free Serverless Inference API for testing, then switch to Inference Endpoints for production.
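One way to make that switch painless is to keep the request shape identical and vary only the base URL: the serverless API routes by model ID, while a dedicated endpoint has its own URL. This is a minimal `fetch` sketch; the endpoint URL below is a placeholder you would copy from your Endpoints dashboard:

```javascript
// Same request body works against both the serverless Inference API and a
// dedicated Inference Endpoint; only the base URL changes.
const SERVERLESS_URL =
  'https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.3';
const ENDPOINT_URL =
  'https://xxxxxx.us-east-1.aws.endpoints.huggingface.cloud'; // placeholder

async function generate(baseUrl, prompt) {
  const res = await fetch(baseUrl, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.HF_TOKEN}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      inputs: prompt,
      parameters: { max_new_tokens: 200 },
    }),
  });
  if (!res.ok) throw new Error(`HF request failed: ${res.status}`);
  return res.json();
}

// Testing:    generate(SERVERLESS_URL, 'Hello');
// Production: generate(ENDPOINT_URL, 'Hello');
```

Because only the URL differs, you can drive the choice from an environment variable and deploy the same code to both stages.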
China Access Solutions
Direct Access
Hugging Face is generally accessible in China, though speeds may vary.
HF Mirror
Use hf-mirror.com for faster model downloads in China.
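If model downloads are slow, a common approach (assuming you use the official Python tooling, `huggingface_hub` or `huggingface-cli`) is to point the `HF_ENDPOINT` environment variable at the mirror:

```shell
# Route huggingface_hub / huggingface-cli model downloads through the mirror.
# This affects downloads only, not Inference API traffic.
export HF_ENDPOINT=https://hf-mirror.com
# Subsequent downloads now go through the mirror, e.g.:
#   huggingface-cli download mistralai/Mistral-7B-Instruct-v0.3
```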
Code Example
```javascript
// Hugging Face Inference API via the official @huggingface/inference client
import fs from 'node:fs';
import { HfInference } from '@huggingface/inference';

const hf = new HfInference(process.env.HF_TOKEN);

// Text generation
const textResult = await hf.textGeneration({
  model: 'mistralai/Mistral-7B-Instruct-v0.3',
  inputs: 'Explain quantum computing in simple terms:',
  parameters: { max_new_tokens: 200 },
});
console.log('Generated:', textResult.generated_text);

// Image generation (returns a Blob)
const imageBlob = await hf.textToImage({
  model: 'stabilityai/stable-diffusion-xl-base-1.0',
  inputs: 'A beautiful landscape painting in impressionist style',
});
fs.writeFileSync('output.png', Buffer.from(await imageBlob.arrayBuffer()));

// Speech recognition
const transcription = await hf.automaticSpeechRecognition({
  model: 'openai/whisper-large-v3',
  data: fs.readFileSync('audio.mp3'),
});
console.log('Transcription:', transcription.text);
```
Rate Limits
| Tier | Limits |
|---|---|
| Free | A few hundred requests per hour (approximate) |
| PRO | Roughly 20× the free-tier limit |
| Inference Endpoints | Based on dedicated hardware |
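When the serverless API rate-limits you, it responds with HTTP 429. A simple way to handle this is retrying with exponential backoff; the retry counts and delays below are arbitrary starting points, not official guidance:

```javascript
// Exponential backoff delay: 1s, 2s, 4s, ...
function backoffMs(attempt, baseMs = 1000) {
  return baseMs * 2 ** attempt;
}

// Retry a serverless Inference API call on HTTP 429 (rate limited).
async function queryWithRetry(url, body, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch(url, {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${process.env.HF_TOKEN}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    });
    if (res.ok) return res.json();
    if (res.status !== 429) throw new Error(`HF request failed: ${res.status}`);
    await new Promise((resolve) => setTimeout(resolve, backoffMs(attempt)));
  }
  throw new Error('Rate limited: retries exhausted');
}
```

For sustained production traffic, backoff only papers over the problem; a dedicated Inference Endpoint sized to your load is the more reliable fix.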
Related API Guides
OpenAI GPT-4o / GPT-4.1 / o3
OpenAI
OpenAI's flagship LLM family including GPT-4o for multimodal tasks, GPT-4.1 for long-context coding, and o3 for advanced reasoning. Industry-leading models with the largest developer ecosystem.
Anthropic Claude (Sonnet 4.5 / Opus 4.5)
Anthropic
Anthropic's Claude model family excels in nuanced reasoning, safety, and long-context tasks. Claude Sonnet 4.5 offers the best balance of cost and performance, while Opus 4.5 delivers frontier intelligence.
Google Gemini (2.5 Pro / 2.5 Flash)
Google
Google's Gemini models offer a generous free tier, 1M token context window, and strong multimodal capabilities. Gemini 2.5 Pro leads in reasoning, while Flash models provide cost-effective alternatives.
Meta Llama 4 (Scout / Maverick)
Meta
Meta's open-source Llama 4 models are free to use and available through multiple cloud providers. Llama 4 Scout and Maverick offer competitive performance at extremely low cost through partner APIs.