Meta Llama 4 (Scout / Maverick)
LLM by Meta
Meta's open-source Llama 4 models are free to use and available through multiple cloud providers. Llama 4 Scout and Maverick offer competitive performance at extremely low cost through partner APIs.
Registration & API Key Steps
Step 1
Request access to the Llama models and accept the license agreement.
Step 2
For the official Meta API (preview): generate an API key from the developer portal.
Pricing
| Tier | Price | Features |
|---|---|---|
| Llama 4 Scout (DeepInfra) | $0.08 / $0.30 per 1M tokens | Input / Output. 10M token context window. Via third-party providers. |
| Llama 4 Maverick (DeepInfra) | $0.50 / $0.50 per 1M tokens | Input / Output. Most capable Llama model. |
| Meta Official API (Preview) | Free during preview | Limited preview, not production-ready. |
| Self-hosted | Free (open-source) | Download weights and run locally. Only pay for compute. |
Application Tips
Tip 1
Llama models are open-source — self-hosting gives you full control and no per-token cost.
Tip 2
For hosted API access, DeepInfra offers the lowest per-token price and Groq the fastest inference.
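As a minimal sketch of the DeepInfra route, the request below targets DeepInfra's OpenAI-compatible chat endpoint using only the Python standard library. The endpoint URL and model id are assumptions — verify both against DeepInfra's current documentation before use.

```python
import json
import urllib.request

# Assumed OpenAI-compatible endpoint and model id — confirm in DeepInfra's docs.
API_URL = "https://api.deepinfra.com/v1/openai/chat/completions"
MODEL = "meta-llama/Llama-4-Scout-17B-16E-Instruct"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completion request."""
    payload = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request("Summarize the Llama 4 license in one line.", "YOUR_API_KEY")
# With a real key set, send it via urllib.request.urlopen(req).
```

Because the endpoint is OpenAI-compatible, the same payload shape works unchanged if you later switch to the `openai` SDK with a custom `base_url`.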
Tip 3
Llama 4 Scout has a 10M token context window — largest among open models.
Tip 4
Meta's official API guarantees data privacy: inputs and outputs are not used for training.
Tip 5
Use vLLM or TGI for efficient self-hosted deployment.
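For the vLLM route, a self-hosted deployment can be launched as sketched below. The model id and flags are illustrative assumptions — adjust `--tensor-parallel-size` to your GPU count and check vLLM's docs for current options.

```shell
# Install vLLM, then serve Llama 4 Scout behind an OpenAI-compatible
# server on port 8000. Flags below are assumptions for an 8-GPU node.
pip install vllm
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --tensor-parallel-size 8 \
    --max-model-len 131072
```

Existing OpenAI SDK code can then point at `http://localhost:8000/v1` with no other changes.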
Tip 6
Consider fireworks.ai or together.ai for serverless deployment with fine-tuning support.
China Access Solutions
Access Solution
Open-source models can be downloaded and self-hosted in China. Partner APIs (DeepInfra, Together) may require VPN. Consider domestic providers like SiliconFlow (siliconflow.cn) that host Llama models.
Code Example
```python
from openai import OpenAI

# Using Together AI as the provider (OpenAI-compatible endpoint)
client = OpenAI(
    api_key="your-together-api-key",
    base_url="https://api.together.xyz/v1",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[
        {"role": "user", "content": "What are the benefits of open-source AI?"}
    ],
)
print(response.choices[0].message.content)

# --- Using Meta's official API ---
# curl https://api.llama.com/v1/chat/completions \
#   -H "Authorization: Bearer YOUR_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d '{
#     "model": "llama-4-scout",
#     "messages": [{"role": "user", "content": "Hello!"}]
#   }'
```
Rate Limits
| Tier | Limits |
|---|---|
| Default | Varies by provider. Meta official API (preview): limited. DeepInfra: up to 1,000 RPM. Together AI: varies by plan. Self-hosted: unlimited. |
Related API Guides
OpenAI GPT-4o / GPT-4.1 / o3
OpenAI
OpenAI's flagship LLM family including GPT-4o for multimodal tasks, GPT-4.1 for long-context coding, and o3 for advanced reasoning. Industry-leading models with the largest developer ecosystem.
Anthropic Claude (Sonnet 4.5 / Opus 4.5)
Anthropic
Anthropic's Claude model family excels in nuanced reasoning, safety, and long-context tasks. Claude Sonnet 4.5 offers the best balance of cost and performance, while Opus 4.5 delivers frontier intelligence.
Google Gemini (2.5 Pro / 2.5 Flash)
Google
Google's Gemini models offer a generous free tier, 1M token context window, and strong multimodal capabilities. Gemini 2.5 Pro leads in reasoning, while Flash models provide cost-effective alternatives.
Mistral AI (Mistral Large / Small / Codestral)
Mistral AI
French AI company offering efficient open and commercial models. Mistral Large for complex reasoning, Mistral Small for cost-effective tasks, and Codestral for code generation. Known for strong European data privacy.