Cohere Command R+ / Embed / Rerank

LLM

by Cohere

Cohere specializes in enterprise AI, offering Command R+ for generation, Embed for embeddings, and Rerank for search optimization. Its products are built around RAG (Retrieval-Augmented Generation) workflows.

API Endpoint

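https://api.cohere.com/v2/ (API base URL, as used in the cURL example in the Code Example section; accounts and keys are managed at https://dashboard.cohere.com/)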

Registration & API Key Steps

Step 1

Visit the Cohere Dashboard (https://dashboard.cohere.com/) and sign up.

Step 2

Step 3

A Trial API key is automatically created for your account.

Step 4

Find your API key in the "API Keys" section of the dashboard sidebar.

Step 5

The Trial key allows 1,000 free API calls per month, for non-production use only.

Step 6

Upgrade to a Production key (billing required) for production use.
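
Once a key exists, it is usually kept out of source code. The following is a minimal sketch of that pattern, assuming the key is stored in an environment variable named `CO_API_KEY`. The variable name and the helper names `load_api_key` and `auth_headers` are illustrative assumptions, not something the dashboard prescribes. The header format follows the cURL example in the Code Example section.

```python
import os


def load_api_key(env_var: str = "CO_API_KEY") -> str:
    """Read the API key from the environment (the variable name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set {env_var} to the key shown in the dashboard's API Keys section"
        )
    return key


def auth_headers(api_key: str) -> dict:
    """Build the bearer-token headers used for HTTP calls to the API."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Swapping a Trial key for a Production key then only requires changing the environment variable, not the code.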

Pricing

Tier	Price	Features
Command R+	$2.50 / $10.00 per 1M tokens	Input / Output. Most capable generative model.
Command R	$0.15 / $0.60 per 1M tokens	Input / Output. Cost-effective generation.
Embed v4	$0.10 per 1M tokens	Best-in-class embedding model for search/RAG.
Rerank 3.5	$2.00 per 1K searches	Re-rank search results for better relevance.
Trial Key	Free	1,000 API calls/month. All models. Non-production use.
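
To make the table concrete, here is a small cost-estimation sketch. The prices are copied from the rows above. The dictionary keys and function names are illustrative labels chosen for this example, not API model IDs, and actual billing may differ from this simple arithmetic.

```python
# Prices in USD, copied from the pricing table above.
# Values are (input price, output price) per 1M tokens.
PRICE_PER_MILLION_TOKENS = {
    "Command R+": (2.50, 10.00),
    "Command R": (0.15, 0.60),
}
EMBED_V4_PER_MILLION_TOKENS = 0.10
RERANK_PER_THOUSAND_SEARCHES = 2.00


def generation_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one generation call from its token counts."""
    input_price, output_price = PRICE_PER_MILLION_TOKENS[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def embed_cost(tokens: int) -> float:
    """Estimate the cost of embedding the given number of tokens with Embed v4."""
    return tokens * EMBED_V4_PER_MILLION_TOKENS / 1_000_000


def rerank_cost(searches: int) -> float:
    """Estimate the cost of the given number of Rerank 3.5 searches."""
    return searches * RERANK_PER_THOUSAND_SEARCHES / 1_000
```

For instance, a Command R+ call with 2,000 input tokens and 500 output tokens comes to (2,000 × $2.50 + 500 × $10.00) / 1,000,000 = $0.01, while the same call on Command R costs $0.0006.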

Application Tips

Tip 1

The Trial key gives 1,000 free API calls per month, which is generous for experimentation.

Tip 2

Cohere Embed v4 is one of the best embedding models for search and RAG.

Tip 3

The Rerank API re-orders candidate search results by relevance to the query, which makes it a good second stage in RAG pipelines.

Tip 4

Command R is designed specifically for RAG and includes built-in citation generation.

Tip 5

The API supports grounded generation with a web search connector.

Tip 6

Enterprise deployment available on AWS, GCP, Azure, and private cloud.
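
The reranking tip can be sketched as a two-step flow: send the query and candidate documents to the API, then map the returned scores back onto the documents. This sketch uses only the Python standard library. The endpoint path `/v2/rerank`, the model ID `rerank-v3.5`, and the response shape (a `results` list of `index` and `relevance_score` entries) are assumptions based on the v2 base URL used in this guide and should be checked against the official documentation. The function names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint, derived from the v2 base URL used elsewhere in this guide.
RERANK_URL = "https://api.cohere.com/v2/rerank"


def build_rerank_request(api_key, query, documents, top_n=3):
    """Prepare (but do not send) a POST request asking for the top_n documents."""
    payload = {
        "model": "rerank-v3.5",  # assumed model ID
        "query": query,
        "documents": documents,
        "top_n": top_n,
    }
    return urllib.request.Request(
        RERANK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def order_documents(documents, results):
    """Pair each result's original document with its score, best match first."""
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [(documents[r["index"]], r["relevance_score"]) for r in ranked]


def rerank(api_key, query, documents, top_n=3):
    """Send the request and return (document, score) pairs in relevance order."""
    request = build_rerank_request(api_key, query, documents, top_n)
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    return order_documents(documents, body["results"])
```

In a RAG pipeline, `documents` would typically be the candidates returned by a first-stage keyword or embedding search, and only the reranked top few would be passed to the generation model.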

China Access Solutions

Access Solution

API access from China requires a VPN or proxy; there is no China-specific access method. Consider AWS Bedrock for Cohere models in supported regions.

Code Example

Python

import cohere

co = cohere.ClientV2(api_key="your-api-key")

response = co.chat(
    model="command-r-plus",
    messages=[
        {"role": "user", "content": "Explain RAG in simple terms"}
    ]
)

print(response.message.content[0].text)

# --- Embedding example ---
# embeddings = co.embed(
#     texts=["hello", "world"],
#     model="embed-v4.0",
#     input_type="search_document",
#     embedding_types=["float"]
# )

# --- cURL example ---
# curl https://api.cohere.com/v2/chat \
#   -H "Authorization: Bearer your-api-key" \
#   -H "Content-Type: application/json" \
#   -d '{"model":"command-r-plus","messages":[{"role":"user","content":"Hello!"}]}'
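
The embedding example above stops at the API call. For semantic search, the returned vectors are usually compared with cosine similarity. The sketch below shows that comparison on plain lists of floats. How the vectors are extracted from the SDK response is not shown and depends on the SDK version, and the function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(query_vector, doc_vectors, k=3):
    """Return up to k (document index, similarity) pairs, most similar first."""
    scored = [
        (index, cosine_similarity(query_vector, vector))
        for index, vector in enumerate(doc_vectors)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

When embedding for search, documents are embedded with `input_type="search_document"` as in the example above, and the query is embedded separately before being compared against the stored document vectors.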

Rate Limits

Tier	Limits
Trial	20 RPM, 1,000 calls/month
Production	10,000 RPM (default)
Enterprise	Custom limits
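
A simple way to stay under a requests-per-minute cap is to space calls out on the client side. The sketch below is one possible approach, not a Cohere-provided mechanism. The `Throttle` class is hypothetical, and it covers only the per-minute limit, not the monthly call quota.

```python
import time


class Throttle:
    """Space out calls so they stay under a requests-per-minute cap."""

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        # Minimum gap between calls, e.g. 20 RPM gives 3.0 seconds.
        self.min_interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = float("-inf")  # the first call is never delayed

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the following call one interval after this one is released.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay
```

With the Trial limit of 20 RPM, `Throttle(20)` enforces a gap of 3 seconds; calling `wait()` before each API request keeps a batch job within the limit.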

Recommended Use Cases

RAG systems
Semantic search
Enterprise knowledge bases
Document search & retrieval
Multilingual applications

Last Updated: 2026-02-10

Related API Guides

OpenAI GPT-4o / GPT-4.1 / o3

OpenAI

OpenAI's flagship LLM family including GPT-4o for multimodal tasks, GPT-4.1 for long-context coding, and o3 for advanced reasoning. Industry-leading models with the largest developer ecosystem.

Anthropic Claude (Sonnet 4.5 / Opus 4.5)

Anthropic

Anthropic's Claude model family excels in nuanced reasoning, safety, and long-context tasks. Claude Sonnet 4.5 offers the best balance of cost and performance, while Opus 4.5 delivers frontier intelligence.

Google Gemini (2.5 Pro / 2.5 Flash)

Google

Google's Gemini models offer a generous free tier, 1M token context window, and strong multimodal capabilities. Gemini 2.5 Pro leads in reasoning, while Flash models provide cost-effective alternatives.
