Meta Llama 4 (Scout / Maverick)
LLM by Meta
Meta's open-source Llama 4 models are free to use and available through multiple cloud providers. Llama 4 Scout and Maverick offer competitive performance at extremely low cost through partner APIs.
Registration & API Key Steps
Step 1
Request access to the Llama models and accept the license agreement.
Step 2
For the official Meta API (preview): generate an API key from the developer portal.
Pricing
| Tier | Price | Features |
|---|---|---|
| Llama 4 Scout (DeepInfra) | $0.08 / $0.30 per 1M tokens | Input / Output. 10M token context window. Via third-party providers. |
| Llama 4 Maverick (DeepInfra) | $0.50 / $0.50 per 1M tokens | Input / Output. Most capable Llama model. |
| Meta Official API (Preview) | Free during preview | Limited preview, not production-ready. |
| Self-hosted | Free (open-source) | Download weights and run locally. Only pay for compute. |
Application Tips
Tip 1
Llama models are open-source — self-hosting gives you full control and no per-token cost.
Tip 2
For hosted API access, DeepInfra offers the lowest per-token price and Groq the fastest inference.
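As a minimal sketch of the DeepInfra route, the request below targets DeepInfra's OpenAI-compatible chat endpoint using only the Python standard library. The endpoint URL and model id are assumptions — verify both against DeepInfra's current documentation before use.

```python
import json
import urllib.request

# Assumed OpenAI-compatible endpoint and model id — confirm in DeepInfra's docs.
API_URL = "https://api.deepinfra.com/v1/openai/chat/completions"
MODEL = "meta-llama/Llama-4-Scout-17B-16E-Instruct"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completion request."""
    payload = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request("Summarize the Llama 4 license in one line.", "YOUR_API_KEY")
# With a real key set, send it via urllib.request.urlopen(req).
```

Because the endpoint is OpenAI-compatible, the same payload shape works unchanged if you later switch to the `openai` SDK with a custom `base_url`.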
Tip 3
Llama 4 Scout has a 10M token context window — largest among open models.
Tip 4
Meta's official API guarantees data privacy: inputs and outputs are not used for training.
Tip 5
Use vLLM or TGI for efficient self-hosted deployment.
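For the vLLM route, a self-hosted deployment can be launched as sketched below. The model id and flags are illustrative assumptions — adjust `--tensor-parallel-size` to your GPU count and check vLLM's docs for current options.

```shell
# Install vLLM, then serve Llama 4 Scout behind an OpenAI-compatible
# server on port 8000. Flags below are assumptions for an 8-GPU node.
pip install vllm
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --tensor-parallel-size 8 \
    --max-model-len 131072
```

Existing OpenAI SDK code can then point at `http://localhost:8000/v1` with no other changes.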
Tip 6
Consider fireworks.ai or together.ai for serverless deployment with fine-tuning support.
China Access Solutions
Access Solution
Open-source models can be downloaded and self-hosted in China. Partner APIs (DeepInfra, Together) may require VPN. Consider domestic providers like SiliconFlow (siliconflow.cn) that host Llama models.
Code Example
```python
from openai import OpenAI

# Using Together AI as the provider (OpenAI-compatible endpoint)
client = OpenAI(
    api_key="your-together-api-key",
    base_url="https://api.together.xyz/v1",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[
        {"role": "user", "content": "What are the benefits of open-source AI?"}
    ],
)
print(response.choices[0].message.content)

# --- Using Meta's official API ---
# curl https://api.llama.com/v1/chat/completions \
#   -H "Authorization: Bearer YOUR_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d '{
#     "model": "llama-4-scout",
#     "messages": [{"role": "user", "content": "Hello!"}]
#   }'
```
Rate Limits
| Tier | Limits |
|---|---|
| Default | Varies by provider. Meta official API (preview): limited. DeepInfra: up to 1,000 RPM. Together AI: varies by plan. Self-hosted: unlimited. |
Related API Guides
OpenAI GPT-4o / GPT-4.1 / o3
OpenAI
OpenAI's flagship LLM family including GPT-4o for multimodal tasks, GPT-4.1 for long-context coding, and o3 for advanced reasoning. Industry-leading models with the largest developer ecosystem.
Anthropic Claude (Sonnet 4.5 / Opus 4.5)
Anthropic
Anthropic's Claude model family excels in nuanced reasoning, safety, and long-context tasks. Claude Sonnet 4.5 offers the best balance of cost and performance, while Opus 4.5 delivers frontier intelligence.
Google Gemini (2.5 Pro / 2.5 Flash)
Google
Google's Gemini models offer a generous free tier, 1M token context window, and strong multimodal capabilities. Gemini 2.5 Pro leads in reasoning, while Flash models provide cost-effective alternatives.
Mistral AI (Mistral Large / Small / Codestral)
Mistral AI
French AI company offering efficient open and commercial models. Mistral Large for complex reasoning, Mistral Small for cost-effective tasks, and Codestral for code generation. Known for strong European data privacy.