
Meta Llama 4 (Scout / Maverick)

LLM

by Meta

Meta's open-source Llama 4 models are free to use and available through multiple cloud providers. Llama 4 Scout and Maverick offer competitive performance at extremely low cost through partner APIs.

Official API · Documentation · Free during preview

Registration & API Key Steps

Step 1: Visit llama.developer.meta.com and sign up with a Meta account.

Step 2: Request access to the Llama 4 models and accept the license agreement.

Step 3: For the official Meta API (preview), generate an API key from the developer portal.

Step 4: For production, use a partner provider. DeepInfra is the cheapest option.

Step 5: Groq offers the fastest inference.

Step 6: Together AI offers a good balance of price and speed.

Step 7: AWS Bedrock, Vertex AI, and Azure AI are also available as managed enterprise options.

Step 8: For self-hosting, download the model weights from Hugging Face after approval.
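The partner providers above all expose OpenAI-compatible endpoints, so switching between them is mostly a matter of changing the base URL. A minimal helper sketch follows; the Together URL matches the code example later in this guide, while the DeepInfra and Groq URLs are taken from their public docs and may change, so treat them as assumptions:

```python
# OpenAI-compatible base URLs for the partner providers listed above.
# The DeepInfra and Groq URLs are assumptions from their public docs.
PROVIDERS = {
    "deepinfra": "https://api.deepinfra.com/v1/openai",
    "groq": "https://api.groq.com/openai/v1",
    "together": "https://api.together.xyz/v1",
}

def provider_config(provider: str, api_key: str) -> dict:
    """Build the kwargs you would pass to openai.OpenAI(**config)."""
    return {"api_key": api_key, "base_url": PROVIDERS[provider]}

# Example: provider_config("together", "your-together-api-key") yields the
# same client configuration used in the code example further down.
```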

Pricing

Tier | Price | Features
Llama 4 Scout (DeepInfra) | $0.08 / $0.30 per 1M tokens | Input / output pricing. 10M-token context. Via third-party providers.
Llama 4 Maverick (DeepInfra) | $0.50 / $0.50 per 1M tokens | Input / output pricing. Most capable Llama 4 model.
Meta Official API (Preview) | Free during preview | Limited preview; not production-ready.
Self-hosted | Free (open-source) | Download the weights and run locally. You only pay for compute.
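The per-token rates in the table make cost estimation a one-line calculation. A small sketch using the DeepInfra prices above (the model keys here are made up for illustration):

```python
# Per-1M-token prices from the table above (DeepInfra, USD).
# Keys are illustrative labels, not official model identifiers.
PRICES = {
    "llama-4-scout": (0.08, 0.30),      # (input, output) per 1M tokens
    "llama-4-maverick": (0.50, 0.50),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD from the per-1M-token rates."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# A 100k-token prompt with a 2k-token reply on Scout:
# 100_000 * 0.08/1M + 2_000 * 0.30/1M = 0.008 + 0.0006 = 0.0086 USD
```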

Application Tips

Tip 1

Llama models are open-source — self-hosting gives you full control and no per-token cost.

Tip 2

For API access, DeepInfra and Groq offer the best price and speed respectively.

Tip 3

Llama 4 Scout has a 10M token context window — largest among open models.

Tip 4

Meta's official API guarantees data privacy: inputs and outputs are not used for training.

Tip 5

Use vLLM or TGI for efficient self-hosted deployment.

Tip 6

Consider fireworks.ai or together.ai for serverless deployment with fine-tuning support.
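The self-hosting route from Tip 5 typically starts with `vllm serve`, which exposes an OpenAI-compatible endpoint. A sketch of assembling that command; the model id is the Hugging Face repo name for Scout, and the flags should be checked against the vLLM docs for your version:

```python
# Sketch: build the `vllm serve` command for a self-hosted deployment.
# The model id and flags are assumptions; verify against the vLLM docs.
def vllm_command(model: str, port: int = 8000) -> list[str]:
    """Return an argv list suitable for subprocess.run()."""
    return ["vllm", "serve", model, "--port", str(port)]

cmd = vllm_command("meta-llama/Llama-4-Scout-17B-16E-Instruct")
# Run `cmd` on a GPU host; vLLM then serves an OpenAI-compatible API at
# http://localhost:8000/v1, usable with the same client code shown below.
```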

China Access Solutions

Access Solution

Open-source models can be downloaded and self-hosted in China. Partner APIs (DeepInfra, Together) may require VPN. Consider domestic providers like SiliconFlow (siliconflow.cn) that host Llama models.

Code Example

Python
from openai import OpenAI

# Using Together AI as provider (OpenAI-compatible)
client = OpenAI(
    api_key="your-together-api-key",
    base_url="https://api.together.xyz/v1"
)

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[
        {"role": "user", "content": "What are the benefits of open-source AI?"}
    ]
)

print(response.choices[0].message.content)

# --- Using Meta's official API ---
# curl https://api.llama.com/v1/chat/completions \
#   -H "Authorization: Bearer YOUR_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d '{
#     "model": "llama-4-scout",
#     "messages": [{"role": "user", "content": "Hello!"}]
#   }'

Rate Limits

Tier | Limits
Default | Varies by provider. Meta official API (preview): limited. DeepInfra: up to 1,000 RPM. Together AI: varies by plan. Self-hosted: unlimited.
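Whichever provider you choose, bursty traffic will eventually hit these limits. A common pattern is to wrap calls in exponential backoff with jitter; this is a generic sketch, not tied to any particular SDK:

```python
import random
import time

def with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0):
    """Call fn(), retrying with exponential backoff and jitter on failure.

    Which exceptions are worth retrying depends on your client library
    (the OpenAI SDK raises RateLimitError, for example); this sketch
    retries on any exception for simplicity.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # 1x, 2x, 4x, ... the base delay, plus jitter to avoid
            # synchronized retries from concurrent workers.
            time.sleep(base_delay * (2 ** attempt + random.random()))
```

Usage: `with_backoff(lambda: client.chat.completions.create(...))`.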

Recommended Use Cases

Self-hosted AI applications · Fine-tuning for specific domains · Cost-sensitive production deployments · Research & experimentation · Edge deployment
Last Updated: 2026-02-10
