Replicate
Model Deployment by Replicate
Run open-source AI models with a simple API. Pay-per-use pricing for thousands of models including image, video, audio, and text generation.
Registration & API Key Steps
Install Client Library
Install the Python or Node.js client: pip install replicate or npm install replicate.
Run Your First Model
Browse models at replicate.com/explore and run one with a single API call. Pay only for compute used.
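The single-call flow above can also be sketched against Replicate's HTTP predictions endpoint directly, without the client library. This is a minimal sketch: the model name and prompt are examples, and the `Prefer: wait` header (which asks the API to hold the connection until the prediction finishes) is optional. The request only fires when a real token is configured.

```javascript
// Build a request for Replicate's model predictions endpoint.
// Model name and input below are examples, not requirements.
function buildPredictionRequest(model, input, token) {
  return {
    url: `https://api.replicate.com/v1/models/${model}/predictions`,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${token}`,
        'Content-Type': 'application/json',
        Prefer: 'wait', // block until the prediction completes, where supported
      },
      body: JSON.stringify({ input }),
    },
  };
}

const { url, options } = buildPredictionRequest(
  'black-forest-labs/flux-schnell',
  { prompt: 'a watercolor fox' },
  process.env.REPLICATE_API_TOKEN ?? '<your-token>'
);

// Only hit the network when a real token is present.
if (process.env.REPLICATE_API_TOKEN) {
  fetch(url, options)
    .then((res) => res.json())
    .then((prediction) => console.log(prediction.status));
}
```

The same request shape works for any public model path; only the `input` object changes per model.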
Pricing
| Tier | Price | Features |
|---|---|---|
| Pay-as-you-go | Varies by model | No subscription required, Billed per prediction, GPU-time based, No minimum |
| SDXL Example | ~$0.003/image (~4 sec) | ~30 images per $0.10, Nvidia L40S GPU, Fast generation |
| Flux Schnell Example | ~$0.003/image | Latest Flux model, High quality, Fast inference |
| Custom Models | Based on GPU time | Deploy your own models, Auto-scaling, Cold start optimization |
Application Tips
Extremely Affordable for Images
Models like SDXL and Flux cost ~$0.003/image. You can generate ~300 images for $1.
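As a back-of-envelope check on the figures above (using the approximate ~$0.003/image price, which varies by model and run time):

```javascript
// Rough cost estimate at the ~$0.003/image price quoted above.
const pricePerImage = 0.003; // approximate SDXL / Flux Schnell price
const budget = 1.0;          // dollars

const imagesPerDollar = Math.floor(budget / pricePerImage);
console.log(imagesPerDollar); // 333
```

So $1 buys roughly 300+ images at this rate, consistent with the estimate above.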
Official Models Have Predictable Pricing
Official models are priced by output (per image, per second of video, per token) rather than raw GPU time.
Deploy Custom Models
Use Cog (Replicate's open-source tool) to package and deploy your own models on Replicate.
No Upfront Cost
No subscription or minimum spend. Pay only for what you use. Great for experimentation.
China Access Solutions
API Proxy
Use an overseas proxy or relay server to access the Replicate API from China.
Direct Access
Replicate may be accessible directly in some regions.
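If you route traffic through a relay, the official JS client accepts a baseUrl override so requests go to your endpoint instead of api.replicate.com. The relay URL below is a placeholder; substitute your own proxy endpoint.

```javascript
// Sketch: build client options that point at a relay forwarding to Replicate.
// The relay URL is a placeholder — use your own proxy endpoint.
function clientOptions(token, proxyBase) {
  return {
    auth: token,
    // baseUrl overrides the default https://api.replicate.com/v1
    ...(proxyBase ? { baseUrl: proxyBase } : {}),
  };
}

const opts = clientOptions(
  process.env.REPLICATE_API_TOKEN,
  'https://my-relay.example.com/v1'
);
// const replicate = new Replicate(opts); // then use the client as usual
```

When no proxy is passed, the options fall back to the client's default endpoint.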
Code Example
```javascript
import fs from 'node:fs';
import Replicate from 'replicate';

const replicate = new Replicate({ auth: process.env.REPLICATE_API_TOKEN });

// Image generation with Flux Schnell
const output = await replicate.run(
  'black-forest-labs/flux-schnell',
  { input: { prompt: 'An iguana on the beach, pointillism' } }
);

// output is an array of FileOutput objects; write each to its own file
for (const [i, item] of output.entries()) {
  const buffer = await item.blob().then((b) => b.arrayBuffer());
  fs.writeFileSync(`output-${i}.png`, Buffer.from(buffer));
}

// Video generation with Minimax
const video = await replicate.run(
  'minimax/video-01',
  {
    input: {
      prompt: 'A cat playing piano in a jazz club',
      prompt_optimizer: true,
    },
  }
);
// video is a FileOutput; .url() returns the hosted file's URL
console.log('Video URL:', video.url());

// Run any model by name
const result = await replicate.run('owner/model-name', {
  input: { /* model-specific inputs */ },
});
```

Rate Limits
| Tier | Limits |
|---|---|
| Default | No hard limits, billed per use |
| Cold Start | First request may be slower (model loading) |
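One way to cope with cold starts is to create a prediction, then poll its status with a capped exponential backoff instead of assuming an immediate result. The status values and the `urls.get` polling URL follow Replicate's predictions API; the backoff schedule itself is an assumption you should tune.

```javascript
// Capped exponential backoff: 1s, 2s, 4s, 8s, 8s, ...
function backoffSchedule(maxAttempts, baseMs = 1000, capMs = 8000) {
  return Array.from({ length: maxAttempts }, (_, i) =>
    Math.min(baseMs * 2 ** i, capMs)
  );
}

// Poll a prediction's GET URL (the `urls.get` field from the create
// response) until it reaches a terminal status.
async function waitForPrediction(getUrl, token, maxAttempts = 10) {
  for (const delay of backoffSchedule(maxAttempts)) {
    const res = await fetch(getUrl, {
      headers: { Authorization: `Bearer ${token}` },
    });
    const prediction = await res.json();
    if (['succeeded', 'failed', 'canceled'].includes(prediction.status)) {
      return prediction;
    }
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
  throw new Error('prediction still running after max attempts');
}
```

For long video generations, raise `maxAttempts` or the cap rather than polling faster; tight polling loops add cost without speeding up the cold start.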
Related API Guides
OpenAI GPT-4o / GPT-4.1 / o3
OpenAI
OpenAI's flagship LLM family including GPT-4o for multimodal tasks, GPT-4.1 for long-context coding, and o3 for advanced reasoning. Industry-leading models with the largest developer ecosystem.
Anthropic Claude (Sonnet 4.5 / Opus 4.5)
Anthropic
Anthropic's Claude model family excels in nuanced reasoning, safety, and long-context tasks. Claude Sonnet 4.5 offers the best balance of cost and performance, while Opus 4.5 delivers frontier intelligence.
Google Gemini (2.5 Pro / 2.5 Flash)
Google
Google's Gemini models offer a generous free tier, 1M token context window, and strong multimodal capabilities. Gemini 2.5 Pro leads in reasoning, while Flash models provide cost-effective alternatives.
Meta Llama 4 (Scout / Maverick)
Meta
Meta's open-source Llama 4 models are free to use and available through multiple cloud providers. Llama 4 Scout and Maverick offer competitive performance at extremely low cost through partner APIs.