Hugging Face Inference API
Inference Platform by Hugging Face
Access 800,000+ open-source AI models through a unified API. Supports text, image, audio, and more with pay-as-you-go pricing.
Registration & API Key Steps
Use Free Tier Credits
Free accounts receive roughly $0.10/month in inference credits; PRO accounts ($9/month) receive $2/month.
Upgrade for Production
For production workloads, use dedicated Inference Endpoints, which start at $0.03/hour for CPU instances.
Pricing
| Tier | Price | Features |
|---|---|---|
| Free | Free (~$0.10/month credits) | 800K+ models, Rate limited, Community support, 10GB storage |
| PRO | $9/month | $2/month inference credits, 8x ZeroGPU quota, 100GB storage, Priority access |
| Enterprise Hub | $20/user/month | SSO, Advanced security, Priority support, Audit logs |
| Inference Endpoints | From $0.03/hour (CPU) | Dedicated infrastructure, Auto-scaling, Custom models, Pay-as-you-go |
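As a rough sanity check on the Inference Endpoints row, an always-on CPU endpoint at the $0.03/hour base rate comes to about $21.60 over a 30-day month (larger instances cost more):

```javascript
// Rough monthly cost of an always-on Inference Endpoint at the CPU base rate.
const hourlyRate = 0.03;       // USD per hour (CPU base tier from the table)
const hoursPerMonth = 24 * 30; // 720 hours in a 30-day month
const monthlyCost = hourlyRate * hoursPerMonth;
console.log(`~$${monthlyCost.toFixed(2)}/month`); // ~$21.60/month
```

Endpoints can also scale to zero when idle, so real bills for bursty workloads are typically lower than this always-on figure.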
Application Tips
Massive Model Library
Access 800K+ models including Llama, Mistral, Stable Diffusion, Whisper, and more from a single API.
No Markup Pricing
Hugging Face charges the same rates as the underlying providers, with no markup.
Use Serverless for Testing
Use the free Serverless Inference API for testing, then switch to Inference Endpoints for production.
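One way to make that switch painless is to keep the request shape identical and vary only the base URL: the serverless API routes by model ID, while a dedicated endpoint has its own URL. This is a minimal `fetch` sketch; the endpoint URL below is a placeholder you would copy from your Endpoints dashboard:

```javascript
// Same request body works against both the serverless Inference API and a
// dedicated Inference Endpoint; only the base URL changes.
const SERVERLESS_URL =
  'https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.3';
const ENDPOINT_URL =
  'https://xxxxxx.us-east-1.aws.endpoints.huggingface.cloud'; // placeholder

async function generate(baseUrl, prompt) {
  const res = await fetch(baseUrl, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.HF_TOKEN}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      inputs: prompt,
      parameters: { max_new_tokens: 200 },
    }),
  });
  if (!res.ok) throw new Error(`HF request failed: ${res.status}`);
  return res.json();
}

// Testing:    generate(SERVERLESS_URL, 'Hello');
// Production: generate(ENDPOINT_URL, 'Hello');
```

Because only the URL differs, you can drive the choice from an environment variable and deploy the same code to both stages.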
China Access Solutions
Direct Access
Hugging Face is generally accessible in China, though speeds may vary.
HF Mirror
Use hf-mirror.com for faster model downloads in China.
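If model downloads are slow, a common approach (assuming you use the official Python tooling, `huggingface_hub` or `huggingface-cli`) is to point the `HF_ENDPOINT` environment variable at the mirror:

```shell
# Route huggingface_hub / huggingface-cli model downloads through the mirror.
# This affects downloads only, not Inference API traffic.
export HF_ENDPOINT=https://hf-mirror.com
# Subsequent downloads now go through the mirror, e.g.:
#   huggingface-cli download mistralai/Mistral-7B-Instruct-v0.3
```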
Code Example
```javascript
// Hugging Face Inference API via the official @huggingface/inference client
import fs from 'node:fs';
import { HfInference } from '@huggingface/inference';

const hf = new HfInference(process.env.HF_TOKEN);

// Text generation
const textResult = await hf.textGeneration({
  model: 'mistralai/Mistral-7B-Instruct-v0.3',
  inputs: 'Explain quantum computing in simple terms:',
  parameters: { max_new_tokens: 200 },
});
console.log('Generated:', textResult.generated_text);

// Image generation (returns a Blob)
const imageBlob = await hf.textToImage({
  model: 'stabilityai/stable-diffusion-xl-base-1.0',
  inputs: 'A beautiful landscape painting in impressionist style',
});
fs.writeFileSync('output.png', Buffer.from(await imageBlob.arrayBuffer()));

// Speech recognition
const transcription = await hf.automaticSpeechRecognition({
  model: 'openai/whisper-large-v3',
  data: fs.readFileSync('audio.mp3'),
});
console.log('Transcription:', transcription.text);
```
Rate Limits
| Tier | Limits |
|---|---|
| Free | A few hundred requests per hour (approximate) |
| PRO | Roughly 20× the free-tier limit |
| Inference Endpoints | Based on dedicated hardware |
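When the serverless API rate-limits you, it responds with HTTP 429. A simple way to handle this is retrying with exponential backoff; the retry counts and delays below are arbitrary starting points, not official guidance:

```javascript
// Exponential backoff delay: 1s, 2s, 4s, ...
function backoffMs(attempt, baseMs = 1000) {
  return baseMs * 2 ** attempt;
}

// Retry a serverless Inference API call on HTTP 429 (rate limited).
async function queryWithRetry(url, body, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch(url, {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${process.env.HF_TOKEN}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    });
    if (res.ok) return res.json();
    if (res.status !== 429) throw new Error(`HF request failed: ${res.status}`);
    await new Promise((resolve) => setTimeout(resolve, backoffMs(attempt)));
  }
  throw new Error('Rate limited: retries exhausted');
}
```

For sustained production traffic, backoff only papers over the problem; a dedicated Inference Endpoint sized to your load is the more reliable fix.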
Related API Guides
OpenAI GPT-4o / GPT-4.1 / o3
OpenAI
OpenAI's flagship LLM family including GPT-4o for multimodal tasks, GPT-4.1 for long-context coding, and o3 for advanced reasoning. Industry-leading models with the largest developer ecosystem.
Anthropic Claude (Sonnet 4.5 / Opus 4.5)
Anthropic
Anthropic's Claude model family excels in nuanced reasoning, safety, and long-context tasks. Claude Sonnet 4.5 offers the best balance of cost and performance, while Opus 4.5 delivers frontier intelligence.
Google Gemini (2.5 Pro / 2.5 Flash)
Google
Google's Gemini models offer a generous free tier, 1M token context window, and strong multimodal capabilities. Gemini 2.5 Pro leads in reasoning, while Flash models provide cost-effective alternatives.
Meta Llama 4 (Scout / Maverick)
Meta
Meta's open-source Llama 4 models are free to use and available through multiple cloud providers. Llama 4 Scout and Maverick offer competitive performance at extremely low cost through partner APIs.