
Hugging Face Inference API

Inference Platform

by Hugging Face

Access 800,000+ open-source AI models through a unified API. Supports text, image, audio, and more with pay-as-you-go pricing.


Registration & API Key Steps

1. Create a Hugging Face account: Sign up at huggingface.co for free.
2. Get an API token: Go to Settings > Access Tokens and create a token with inference permissions.
3. Use free tier credits: Free accounts get ~$0.10/month in inference credits; PRO ($9/mo) gets $2/month.
4. Upgrade for production: For production workloads, use dedicated Inference Endpoints, starting at $0.03/hour for CPU.
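Once the token from step 2 is in hand, authentication is a single Authorization header. A minimal sketch against the serverless Inference API (`HF_TOKEN` is a conventional environment variable name, not required by the API):

```typescript
// Sketch: authenticating against the serverless Inference API over raw HTTP.
// The access token created in step 2 goes in the Authorization header.
const model = 'mistralai/Mistral-7B-Instruct-v0.3';
const url = `https://api-inference.huggingface.co/models/${model}`;
const headers = {
  Authorization: `Bearer ${process.env.HF_TOKEN ?? 'hf_xxx'}`,
  'Content-Type': 'application/json',
};

// Uncomment to send a real request (requires a valid token):
// const res = await fetch(url, {
//   method: 'POST',
//   headers,
//   body: JSON.stringify({ inputs: 'Hello!' }),
// });
console.log(url);
```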

Pricing

| Tier | Price | Features |
| --- | --- | --- |
| Free | Free (~$0.10/month credits) | 800K+ models, rate limited, community support, 10 GB storage |
| PRO | $9/month | $2/month inference credits, 8x ZeroGPU quota, 100 GB storage, priority access |
| Enterprise Hub | $20/user/month | SSO, advanced security, priority support, audit logs |
| Inference Endpoints | From $0.03/hour (CPU) | Dedicated infrastructure, auto-scaling, custom models, pay-as-you-go |

Application Tips

Massive Model Library

Access 800K+ models including Llama, Mistral, Stable Diffusion, Whisper, and more from a single API.

No Markup Pricing

Hugging Face charges the same rates as underlying providers with no markup.

Use Serverless for Testing

Use the free Serverless Inference API for testing, then switch to Inference Endpoints for production.
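The serverless-to-dedicated switch is mostly a base-URL change. A hedged sketch (the dedicated URL below is a placeholder; your real one appears in the Inference Endpoints dashboard after you create an endpoint):

```typescript
// Sketch: the request shape is identical for both backends; only the
// base URL differs. The dedicated endpoint URL here is a placeholder.
const SERVERLESS =
  'https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.3';
const DEDICATED =
  'https://xyz.eu-west-1.aws.endpoints.huggingface.cloud'; // placeholder

function inferenceRequest(baseUrl: string, inputs: string): Request {
  return new Request(baseUrl, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.HF_TOKEN ?? 'hf_xxx'}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ inputs }),
  });
}

// Prototype on the serverless URL, then flip the constant for production:
const req = inferenceRequest(SERVERLESS, 'Hello!');
console.log(req.url);
```

The official `@huggingface/inference` client exposes a similar switch via `hf.endpoint(url)`, which returns a client scoped to a dedicated endpoint.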

China Access Solutions

Direct Access

Hugging Face is generally accessible in China, though speeds may vary.

HF Mirror

Use hf-mirror.com for faster model downloads in China.
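As a sketch under the assumption that hf-mirror.com mirrors the Hub's download URL layout, a file URL can be redirected by swapping the host (repo and filename below are examples):

```typescript
// Hedged sketch: point a Hub file-download URL at the hf-mirror.com
// mirror by swapping the host. Repo and filename are examples only.
const HUB = 'https://huggingface.co';
const MIRROR = 'https://hf-mirror.com';

function toMirrorUrl(hubUrl: string): string {
  return hubUrl.replace(HUB, MIRROR);
}

const original = `${HUB}/openai/whisper-large-v3/resolve/main/config.json`;
console.log(toMirrorUrl(original));
```

CLI and Python `huggingface_hub` users can instead set the environment variable `HF_ENDPOINT=https://hf-mirror.com`, which routes all downloads through the mirror.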

Code Example

JavaScript / TypeScript
// Hugging Face Inference API
import { HfInference } from '@huggingface/inference';
import fs from 'node:fs';

const hf = new HfInference(process.env.HF_TOKEN);

// Text generation
const textResult = await hf.textGeneration({
  model: 'mistralai/Mistral-7B-Instruct-v0.3',
  inputs: 'Explain quantum computing in simple terms:',
  parameters: { max_new_tokens: 200 },
});
console.log('Generated:', textResult.generated_text);

// Image generation (returns a Blob)
const imageBlob = await hf.textToImage({
  model: 'stabilityai/stable-diffusion-xl-base-1.0',
  inputs: 'A beautiful landscape painting in impressionist style',
});
fs.writeFileSync('output.png', Buffer.from(await imageBlob.arrayBuffer()));

// Speech recognition
const transcription = await hf.automaticSpeechRecognition({
  model: 'openai/whisper-large-v3',
  data: fs.readFileSync('audio.mp3'),
});
console.log('Transcription:', transcription.text);

Rate Limits

| Tier | Limits |
| --- | --- |
| Free | A few hundred requests/hour (approximate) |
| PRO | Roughly 20x the free tier |
| Inference Endpoints | Determined by the dedicated hardware you provision |
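When a free-tier limit is hit, the API responds with HTTP 429. A minimal client-side retry with exponential backoff (a sketch, not part of any official SDK) can smooth this over:

```typescript
// Sketch: retry HTTP 429 responses with capped exponential backoff.
// Delay doubles each attempt, capped at maxMs.
function backoffMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

async function fetchWithRetry(
  url: string,
  init: RequestInit,
  maxRetries = 3,
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, init);
    if (res.status !== 429 || attempt >= maxRetries) return res;
    await new Promise((r) => setTimeout(r, backoffMs(attempt)));
  }
}
```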

Recommended Use Cases

Model prototyping, multi-model pipelines, research & experimentation, production inference, custom model deployment
Last Updated: 2025-02
