Cohere Command R+ / Embed / Rerank
LLM by Cohere
Cohere specializes in enterprise AI: Command R+ for generation, Embed for embeddings, and Rerank for search optimization. Its products focus strongly on Retrieval-Augmented Generation (RAG) workflows.
Registration & API Key Steps
Step 2
Register with email, Google, or GitHub account.
Step 3
A Trial API key is automatically created for your account.
Step 4
Find your API key in the "API Keys" section of the dashboard sidebar.
Step 5
Trial key allows 1,000 calls/month for free (non-production).
Step 6
Upgrade to Production key with billing for production use.
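Once a key exists, it is usually safer to read it from the environment than to paste it into source files. The following is a minimal sketch of that idea; the helper name `load_api_key` is hypothetical, and the variable name `CO_API_KEY` is an assumption you can change to match your own setup.

```python
import os


def load_api_key(env_var: str = "CO_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The variable name is an assumption for this sketch; pass a
    different name if your deployment uses another one.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Cohere API key"
        )
    return key
```

The returned string can then be passed as `api_key` when constructing the client, so the same code works with a Trial key locally and a Production key in deployment.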
Pricing
| Tier | Price | Features |
|---|---|---|
| Command R+ | $2.50 / $10.00 per 1M tokens | Input / Output. Most capable generative model. |
| Command R | $0.15 / $0.60 per 1M tokens | Input / Output. Cost-effective generation. |
| Embed v4 | $0.10 per 1M tokens | Best-in-class embedding model for search/RAG. |
| Rerank 3.5 | $2.00 per 1K searches | Re-rank search results for better relevance. |
| Trial Key | Free | 1,000 API calls/month. All models. Non-production use. |
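To make the per-token prices concrete, here is a small sketch that estimates the cost of a chat workload from the figures in the table above. The function name and the dictionary are illustrative, the `command-r` model ID is assumed by analogy with `command-r-plus`, and the prices are copied from this page, so check current pricing before relying on the result.

```python
# USD per 1M tokens as (input, output), copied from the pricing table above.
PRICES_PER_MILLION = {
    "command-r-plus": (2.50, 10.00),
    "command-r": (0.15, 0.60),
}


def estimate_chat_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a generation workload for one model."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # 200K input tokens and 50K output tokens on Command R+.
    print(f"${estimate_chat_cost('command-r-plus', 200_000, 50_000):.2f}")  # $1.00
```

At these prices the same workload on Command R would cost $0.06, which is why routing simpler requests to the smaller model can matter at scale.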
Application Tips
Tip 1
Trial key gives 1,000 free API calls/month — generous for experimentation.
Tip 2
Cohere Embed v4 is one of the best embedding models for search and RAG.
Tip 3
Rerank API dramatically improves search relevance — great for RAG pipelines.
Tip 4
Command R is designed specifically for RAG — includes built-in citation generation.
Tip 5
The API supports grounded generation with a web search connector.
Tip 6
Enterprise deployment available on AWS, GCP, Azure, and private cloud.
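The Rerank tip can be sketched as a small helper that sits between a first-pass retriever and the generation step. This is an illustrative sketch, not an official recipe: the helper name is hypothetical, the model ID `rerank-v3.5` is assumed to correspond to Rerank 3.5, and `client` is expected to be a `cohere.ClientV2` instance whose `rerank` call returns results carrying `index` and `relevance_score`.

```python
from typing import Any, List, Tuple


def rerank_documents(
    client: Any, query: str, documents: List[str], top_n: int = 3
) -> List[Tuple[str, float]]:
    """Return the top documents for a query, paired with relevance scores.

    `client` is assumed to expose a Cohere-style `rerank` method; the
    model ID below is an assumption for this sketch.
    """
    response = client.rerank(
        model="rerank-v3.5",
        query=query,
        documents=documents,
        top_n=min(top_n, len(documents)),
    )
    # Each result points back into the original list by index.
    return [(documents[r.index], r.relevance_score) for r in response.results]
```

In a RAG pipeline you would typically retrieve a few dozen candidates with embeddings, pass them through a helper like this, and send only the top few to the generative model.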
China Access Solutions
Access Solution
Requires VPN/proxy for API access from China. No specific China-friendly access method. Consider AWS Bedrock for Cohere models in supported regions.
Code Example
import cohere

co = cohere.ClientV2(api_key="your-api-key")

response = co.chat(
    model="command-r-plus",
    messages=[
        {"role": "user", "content": "Explain RAG in simple terms"}
    ],
)
print(response.message.content[0].text)

# --- Embedding example ---
# embeddings = co.embed(
#     texts=["hello", "world"],
#     model="embed-v4.0",
#     input_type="search_document",
#     embedding_types=["float"],
# )

# --- cURL example ---
# curl https://api.cohere.com/v2/chat \
#   -H "Authorization: Bearer your-api-key" \
#   -H "Content-Type: application/json" \
#   -d '{"model":"command-r-plus","messages":[{"role":"user","content":"Hello!"}]}'

Rate Limits
| Tier | Limits |
|---|---|
| Trial | 20 RPM, 1,000 calls/month |
| Production | 10,000 RPM default |
| Enterprise | Custom limits |
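With a Trial key, staying under 20 requests per minute is easiest to enforce on the client side. Below is a minimal sliding-window throttle, offered as a sketch rather than a prescribed approach: the class name is hypothetical, the default of 20 calls per 60 seconds comes from the Trial limit listed above, and the injectable `clock` and `sleep` arguments exist only to make the logic testable.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Block until a call is allowed under max_calls per period seconds."""

    def __init__(self, max_calls=20, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent calls, oldest first

    def acquire(self):
        while True:
            now = self._clock()
            # Forget calls that have left the window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Window is full: wait until the oldest call expires.
            self._sleep(self.period - (now - self._calls[0]))
```

Call `limiter.acquire()` immediately before each API request. This only smooths the per-minute rate; the 1,000 calls/month Trial quota still needs separate tracking.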