Replicate
Model Deployment by Replicate
Run open-source AI models with a simple API. Pay-per-use pricing for thousands of models including image, video, audio, and text generation.
Registration & API Key Steps
Install Client Library
Install the Python or Node.js client: pip install replicate or npm install replicate.
Run Your First Model
Browse models at replicate.com/explore and run one with a single API call. Pay only for compute used.
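The single-call flow above can also be sketched against Replicate's HTTP predictions endpoint directly, without the client library. This is a minimal sketch: the model name and prompt are examples, and the `Prefer: wait` header (which asks the API to hold the connection until the prediction finishes) is optional. The request only fires when a real token is configured.

```javascript
// Build a request for Replicate's model predictions endpoint.
// Model name and input below are examples, not requirements.
function buildPredictionRequest(model, input, token) {
  return {
    url: `https://api.replicate.com/v1/models/${model}/predictions`,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${token}`,
        'Content-Type': 'application/json',
        Prefer: 'wait', // block until the prediction completes, where supported
      },
      body: JSON.stringify({ input }),
    },
  };
}

const { url, options } = buildPredictionRequest(
  'black-forest-labs/flux-schnell',
  { prompt: 'a watercolor fox' },
  process.env.REPLICATE_API_TOKEN ?? '<your-token>'
);

// Only hit the network when a real token is present.
if (process.env.REPLICATE_API_TOKEN) {
  fetch(url, options)
    .then((res) => res.json())
    .then((prediction) => console.log(prediction.status));
}
```

The same request shape works for any public model path; only the `input` object changes per model.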
Pricing
| Tier | Price | Features |
|---|---|---|
| Pay-as-you-go | Varies by model | No subscription required, Billed per prediction, GPU-time based, No minimum |
| SDXL Example | ~$0.003/image (~4 sec) | ~30 images per $0.10, Nvidia L40S GPU, Fast generation |
| Flux Schnell Example | ~$0.003/image | Latest Flux model, High quality, Fast inference |
| Custom Models | Based on GPU time | Deploy your own models, Auto-scaling, Cold start optimization |
Application Tips
Extremely Affordable for Images
Models like SDXL and Flux cost ~$0.003/image. You can generate ~300 images for $1.
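As a back-of-envelope check on the figures above (using the approximate ~$0.003/image price, which varies by model and run time):

```javascript
// Rough cost estimate at the ~$0.003/image price quoted above.
const pricePerImage = 0.003; // approximate SDXL / Flux Schnell price
const budget = 1.0;          // dollars

const imagesPerDollar = Math.floor(budget / pricePerImage);
console.log(imagesPerDollar); // 333
```

So $1 buys roughly 300+ images at this rate, consistent with the estimate above.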
Official Models Have Predictable Pricing
Official models are priced by output (per image, per second of video, per token) rather than raw GPU time.
Deploy Custom Models
Use Cog (Replicate's open-source tool) to package and deploy your own models on Replicate.
No Upfront Cost
No subscription or minimum spend. Pay only for what you use. Great for experimentation.
China Access Solutions
API Proxy
Use an overseas proxy or relay server to access the Replicate API from China.
Direct Access
Replicate may be accessible directly in some regions.
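If you route traffic through a relay, the official JS client accepts a baseUrl override so requests go to your endpoint instead of api.replicate.com. The relay URL below is a placeholder; substitute your own proxy endpoint.

```javascript
// Sketch: build client options that point at a relay forwarding to Replicate.
// The relay URL is a placeholder — use your own proxy endpoint.
function clientOptions(token, proxyBase) {
  return {
    auth: token,
    // baseUrl overrides the default https://api.replicate.com/v1
    ...(proxyBase ? { baseUrl: proxyBase } : {}),
  };
}

const opts = clientOptions(
  process.env.REPLICATE_API_TOKEN,
  'https://my-relay.example.com/v1'
);
// const replicate = new Replicate(opts); // then use the client as usual
```

When no proxy is passed, the options fall back to the client's default endpoint.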
Code Example
```javascript
import fs from 'node:fs';
import Replicate from 'replicate';

const replicate = new Replicate({ auth: process.env.REPLICATE_API_TOKEN });

// Image generation with Flux Schnell
const output = await replicate.run(
  'black-forest-labs/flux-schnell',
  { input: { prompt: 'An iguana on the beach, pointillism' } }
);

// output is an array of FileOutput objects; write each to its own file
for (const [i, item] of output.entries()) {
  const buffer = await item.blob().then((b) => b.arrayBuffer());
  fs.writeFileSync(`output-${i}.png`, Buffer.from(buffer));
}

// Video generation with Minimax
const video = await replicate.run(
  'minimax/video-01',
  {
    input: {
      prompt: 'A cat playing piano in a jazz club',
      prompt_optimizer: true,
    },
  }
);
// video is a FileOutput; .url() returns the hosted file's URL
console.log('Video URL:', video.url());

// Run any model by name
const result = await replicate.run('owner/model-name', {
  input: { /* model-specific inputs */ },
});
```

Rate Limits
| Tier | Limits |
|---|---|
| Default | No hard limits, billed per use |
| Cold Start | First request may be slower (model loading) |
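One way to cope with cold starts is to create a prediction, then poll its status with a capped exponential backoff instead of assuming an immediate result. The status values and the `urls.get` polling URL follow Replicate's predictions API; the backoff schedule itself is an assumption you should tune.

```javascript
// Capped exponential backoff: 1s, 2s, 4s, 8s, 8s, ...
function backoffSchedule(maxAttempts, baseMs = 1000, capMs = 8000) {
  return Array.from({ length: maxAttempts }, (_, i) =>
    Math.min(baseMs * 2 ** i, capMs)
  );
}

// Poll a prediction's GET URL (the `urls.get` field from the create
// response) until it reaches a terminal status.
async function waitForPrediction(getUrl, token, maxAttempts = 10) {
  for (const delay of backoffSchedule(maxAttempts)) {
    const res = await fetch(getUrl, {
      headers: { Authorization: `Bearer ${token}` },
    });
    const prediction = await res.json();
    if (['succeeded', 'failed', 'canceled'].includes(prediction.status)) {
      return prediction;
    }
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
  throw new Error('prediction still running after max attempts');
}
```

For long video generations, raise `maxAttempts` or the cap rather than polling faster; tight polling loops add cost without speeding up the cold start.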
Related API Guides
OpenAI GPT-4o / GPT-4.1 / o3
OpenAI
OpenAI's flagship LLM family including GPT-4o for multimodal tasks, GPT-4.1 for long-context coding, and o3 for advanced reasoning. Industry-leading models with the largest developer ecosystem.
Anthropic Claude (Sonnet 4.5 / Opus 4.5)
Anthropic
Anthropic's Claude model family excels in nuanced reasoning, safety, and long-context tasks. Claude Sonnet 4.5 offers the best balance of cost and performance, while Opus 4.5 delivers frontier intelligence.
Google Gemini (2.5 Pro / 2.5 Flash)
Google
Google's Gemini models offer a generous free tier, 1M token context window, and strong multimodal capabilities. Gemini 2.5 Pro leads in reasoning, while Flash models provide cost-effective alternatives.
Meta Llama 4 (Scout / Maverick)
Meta
Meta's open-source Llama 4 models are free to use and available through multiple cloud providers. Llama 4 Scout and Maverick offer competitive performance at extremely low cost through partner APIs.