RAG vs. MCP: when to use what for your content system
RAG and MCP both connect AI models to external data, but they're not interchangeable. RAG is a retrieval pattern for grounding models in your documents. MCP is a protocol for structured access and actions against live systems.
If you're building AI-powered content experiences, you'll usually want both: RAG-style semantic retrieval for user-facing Q&A, and MCP-powered tools for schema-aware queries, workflows, and write operations.

Knut Melvær
Principal Developer Marketing Manager
Your AI agent needs access to your content. The question is how.
RAG (retrieval-augmented generation) and MCP (Model Context Protocol) both connect AI models to external data, but they’re not the same kind of thing. RAG is an application pattern: it retrieves relevant content and injects it into a prompt so the model can give better answers. MCP is a protocol: it defines how AI applications connect to external systems through tools and resources, which can mean reading data, running queries, taking actions, or all three depending on the server.
They’re not competing. They solve different problems. Use RAG for semantic retrieval over documents. Use MCP for structured data access and operations. Use both when your agent needs to find the right content and act on it.
This guide breaks down what each approach actually does, where they overlap, and how to decide which one fits your use case. If you’re building AI-powered features on top of a CMS, this is the decision framework you need.
TL;DR
RAG is a pattern for semantic retrieval over documents. MCP is a protocol for connecting AI to external systems. They solve different problems and work well together. Use RAG for stable content Q&A. Use MCP for structured queries and write operations. Use both when your agent needs retrieval and actions.
What is RAG?
RAG is a retrieval pattern. It works by fetching relevant information from an external source and injecting it into an LLM’s prompt so the model can generate a more grounded response.
The typical pipeline:
- Ingest documents (product pages, help articles, policy docs) and split them into chunks
- Convert each chunk into a vector embedding and store it in a vector database
- When a user asks a question, convert that query into an embedding too
- Find the most semantically similar chunks from the vector database
- Combine those chunks with the original query into an augmented prompt
- Send the augmented prompt to the LLM, which now has real context to work with
The result: answers grounded in your actual content instead of the model’s general training data.
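The pipeline above can be sketched in a few dozen lines. This is an illustrative toy, not a production implementation: the embed() function here is a stand-in for a real embedding model (which would be an API call), using word counts over a tiny fixed vocabulary so the example is self-contained and runnable.

```typescript
type Chunk = { text: string; vector: number[] }

// Hypothetical stand-in for a real embedding model: term frequency over a
// tiny fixed vocabulary. A real pipeline would call an embedding API here.
const VOCAB = ["return", "policy", "shipping", "days", "refund"]

function embed(text: string): number[] {
  const words = text.toLowerCase().split(/\W+/)
  return VOCAB.map((term) => words.filter((w) => w === term).length)
}

// Cosine similarity: the usual metric for comparing embedding vectors.
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0)
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0))
  return dot / (norm(a) * norm(b) || 1)
}

// Steps 1-2: ingest chunks and store their embeddings.
const chunks: Chunk[] = [
  "Our return policy allows refunds within 30 days.",
  "Shipping takes 3-5 business days.",
].map((text) => ({ text, vector: embed(text) }))

// Steps 3-5: embed the query, rank chunks by similarity, build the prompt.
function buildPrompt(query: string, topK = 1): string {
  const qv = embed(query)
  const context = [...chunks]
    .sort((a, b) => cosine(b.vector, qv) - cosine(a.vector, qv))
    .slice(0, topK)
    .map((c) => c.text)
    .join("\n")
  return `Context:\n${context}\n\nQuestion: ${query}`
}

const prompt = buildPrompt("What is the refund policy?")
console.log(prompt.includes("return policy")) // the policy chunk ranks first
```

Step 6 is simply sending that augmented prompt to the LLM; everything before it is plumbing you own and operate.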
Where RAG works well
RAG shines for stable, unstructured content where semantic similarity is the right retrieval strategy:
- Help centers and support documentation
- Policy documents and compliance material
- Product catalogs (descriptions, specs, FAQs)
- Blog archives and knowledge bases
- Onboarding guides and training content
If the content doesn’t change every hour and the user’s intent is “find me the right information,” RAG handles it.
Where RAG falls short
RAG has some well-known limitations that matter for content systems.
Staleness. Your retrieval layer has to stay in sync with source content. Some vector databases support incremental upserts and record-level updates, but you still need to design for synchronization. When content changes frequently, that operational burden adds up fast.
Content structure gets reduced. A basic RAG pipeline chunks content into text for retrieval and loses the relational structure of your content model. It doesn’t inherently know that a product document has a price field, a category reference, and a variants array. You can preserve structure through metadata, filters, and reranking, but that’s additional design work on top of the baseline pipeline.
No write operations. RAG is read-only. It can help an AI answer “What’s our return policy?” but it can’t update that policy, publish a new version, or trigger a workflow.
Chunking is fragile. How you split your documents directly affects retrieval quality. Split a paragraph at the wrong boundary and you lose context. Too large and you burn tokens on irrelevant text.
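The chunking step that last point describes can be as crude as a fixed-size splitter with overlap, which makes the fragility easy to see: the boundaries fall wherever the character count says, not where the meaning does. A minimal sketch (real pipelines usually split on semantic boundaries like headings or paragraphs instead):

```typescript
// Naive fixed-size chunker with overlap. Boundaries are purely positional,
// so a sentence can be cut mid-thought -- exactly the fragility described above.
function chunk(text: string, size = 200, overlap = 50): string[] {
  const chunks: string[] = []
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size))
  }
  return chunks
}

// With toy parameters the positional cuts are obvious:
console.log(chunk("abcdefghij", 4, 2)) // [ 'abcd', 'cdef', 'efgh', 'ghij', 'ij' ]
```

Tuning size and overlap trades lost context against wasted tokens, which is why chunking strategy gets so much attention in RAG design.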
What is MCP?
MCP is a protocol, not a retrieval strategy. It’s an open standard that defines how AI applications connect to external systems through a structured interface. Think of it as a universal adapter between an LLM and your tools.
Where RAG injects retrieved text into prompts, MCP exposes tools that the model can call. Those tools can read data, execute queries, create documents, trigger workflows, or call APIs. What a given MCP server actually does depends on the server’s implementation, not the protocol itself. The model decides which available tools to use based on the user’s request.
An MCP setup has three parts:
- MCP server that exposes capabilities from an external system (your CMS, a database, a third-party API)
- MCP client built into the AI application (Claude, Cursor, a custom agent)
- Primitives that the server exposes: tools (operations the model can invoke), resources (read-only data the model can access), and prompts (templated interactions)
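To make the tool primitive concrete, here is an illustrative sketch of its shape: a name, an input schema, and a handler the server runs when the model invokes the tool. Real servers use an MCP SDK and JSON Schema; these types are simplified stand-ins, and the content store is made-up data.

```typescript
// Simplified stand-in for an MCP tool definition (not the real SDK types).
type Tool = {
  name: string
  description: string
  inputSchema: Record<string, string> // field name -> type, simplified
  handler: (args: Record<string, string>) => unknown
}

// Hypothetical content store standing in for a live CMS.
const fakeCms: Array<Record<string, string>> = [
  { _type: "product", title: "Trail Shoe", seoDescription: "Light and fast" },
  { _type: "product", title: "Road Shoe" }, // missing seoDescription
]

// A hypothetical tool: find documents of a type that lack a given field.
const findMissingField: Tool = {
  name: "find_missing_field",
  description: "Find documents of a type that are missing a field",
  inputSchema: { type: "string", field: "string" },
  handler: ({ type, field }) =>
    fakeCms.filter((doc) => doc._type === type && !(field in doc)),
}

// The client discovers tools; the model picks one and supplies arguments.
const missing = findMissingField.handler({
  type: "product",
  field: "seoDescription",
}) as Array<Record<string, string>>

console.log(missing.map((d) => d.title)) // [ 'Road Shoe' ]
```

The key design point: the model never sees the implementation, only the name, description, and schema. Those three things are the interface it reasons over when deciding which tool to call.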
Where MCP works well
MCP is the right fit when the model needs to interact with live, structured data or take actions:
- Querying content by type, field value, or relationship (“Find all product pages missing SEO descriptions”)
- Creating or updating documents in a CMS
- Managing workflows: staging content, publishing releases, coordinating across teams
- Fetching real-time data that changes frequently (inventory, pricing, user-specific content)
- Chaining operations: retrieve content, analyze it, then act on the analysis
Where MCP has trade-offs
Semantic search depends on the server. MCP itself doesn’t define retrieval quality or retrieval style. A given MCP server may expose semantic search, structured queries, both, or neither. If the server does include semantic search (like Sanity’s Agent Context), you may not need a separate vector database at all. But if your use case is pure document retrieval over a large unstructured corpus, a dedicated vector store with a RAG pipeline may still perform better.
Requires a well-structured data source. MCP works best when the underlying system has a schema. Dumping flat files behind an MCP server doesn’t buy you much over RAG.
More complex to set up for simple retrieval. If all you need is “answer questions about our docs,” a RAG pipeline is simpler and cheaper than standing up an MCP server.
Does MCP replace RAG?
No. MCP does not replace RAG. They solve different problems and are often used together.
Here’s what most comparison articles get wrong: they frame RAG and MCP as alternatives when they’re actually complementary layers.
RAG handles semantic retrieval. You describe what you’re looking for in natural language, and it finds content that’s conceptually similar. That’s powerful for unstructured, exploratory queries.
MCP handles structured access and actions. You connect the model to a system with a schema, and it can query by field, follow references, and write back. That’s powerful for operational work.
The interesting space is where these overlap. A production agent often needs both: semantic search to understand user intent, and structured access to retrieve precise data and act on it.
Consider a shopping assistant on an ecommerce site. A user asks: “Do you have anything good for a beach vacation?” The agent needs semantic search to interpret “good for a beach vacation” (that’s a RAG problem). But it also needs to check real-time inventory, filter by size and price, and add items to a cart (that’s an MCP problem).
For content systems specifically, most teams will end up using both. User-facing retrieval (search, recommendations, Q&A) benefits from semantic search. Operational workflows (content audits, bulk updates, release management) need structured access and write capabilities. A CMS that supports both patterns from a single content layer gives you the most flexibility.
When should you use RAG vs. MCP?
Use this framework to figure out what you need for a given use case.
Start with three questions
1. What kind of data is the model working with?
| Data type | Leans toward |
|---|---|
| Long-form text (docs, articles, policies) | RAG |
| Structured records (products, pages, events) | MCP |
| Mix of both | Both |
2. Does the data change frequently?
| Change frequency | Leans toward |
|---|---|
| Rarely (quarterly, annually) | RAG |
| Daily or real-time | MCP |
| Some stable, some dynamic | Both |
3. Does the model need to take actions?
| Capabilities needed | Approach |
|---|---|
| Read-only Q&A | RAG is sufficient |
| Read + query by schema | MCP |
| Read + write + workflow | MCP |
| Semantic search + structured ops | Both |
Common patterns mapped to approach
Internal knowledge bot (answer employee questions from company docs) → RAG. The content is stable, unstructured, and the use case is pure Q&A.
AI support agent (answer customer questions from help docs + check order status) → Both. RAG for the help documentation, MCP for querying live order data.
Content operations assistant (find stale pages, bulk-update SEO fields, stage releases) → MCP. The model needs schema awareness and write access. Semantic search is a bonus, not the core.
Shopping assistant (product recommendations + cart management) → Both. Semantic search for discovery (“something for a beach trip”), MCP for inventory queries, filtering, and cart actions.
Developer copilot for CMS (query content, scaffold schemas, manage releases from an IDE) → MCP. The developer needs the model to operate the content system directly.
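The three questions and the pattern mapping above condense into a small decision helper. This is illustrative only; real decisions involve more nuance than four booleans.

```typescript
type Approach = "RAG" | "MCP" | "Both"

// Condensed form of the decision framework above (illustrative).
function chooseApproach(needs: {
  structured: boolean // structured records rather than long-form text
  realtime: boolean // data changes daily or in real time
  writes: boolean // the model must take actions
  semanticSearch: boolean // fuzzy natural-language retrieval matters
}): Approach {
  const needsMcp = needs.structured || needs.realtime || needs.writes
  if (!needsMcp) return "RAG" // read-only Q&A over stable documents
  return needs.semanticSearch ? "Both" : "MCP"
}

// Internal knowledge bot: stable docs, read-only Q&A.
console.log(
  chooseApproach({ structured: false, realtime: false, writes: false, semanticSearch: true })
) // prints "RAG"

// Content operations assistant: schema-aware queries and writes.
console.log(
  chooseApproach({ structured: true, realtime: true, writes: true, semanticSearch: false })
) // prints "MCP"
```

Run the shopping-assistant case through it (real-time inventory plus semantic discovery) and it lands on "Both", matching the pattern mapping above.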
What this looks like in practice with Sanity
Both patterns work best when the underlying content is structured, queryable, and available in real time. Sanity is a Content Operating System, not a traditional CMS. Content Lake stores content as typed JSON documents with a schema, references, and real-time sync. That same content layer powers websites, apps, and AI agents without maintaining separate systems for each pattern.
Structured retrieval via Agent Context
Agent Context is how Sanity solves the retrieval problem for production agents. It’s a hosted MCP server designed for agents that serve end users. But unlike a typical RAG pipeline, it doesn’t require you to maintain a separate vector database or chunking pipeline.
Agent Context gives your agent access to two retrieval modes:
- Semantic search across your content (when embeddings are enabled on your dataset), built on top of Content Lake
- Structured queries that follow your schema: filter by document type, traverse references, query specific fields
Here’s how you’d connect an agent to your Sanity content using Agent Context:
```typescript
import { createMCPClient } from '@ai-sdk/mcp'

const mcpClient = await createMCPClient({
  transport: {
    type: 'http',
    url: process.env.SANITY_CONTEXT_MCP_URL,
    headers: {
      Authorization: `Bearer ${process.env.SANITY_API_READ_TOKEN}`,
    },
  },
})
```

Agent Context exposes three tools that your MCP client discovers automatically. Your agent can run GROQ queries (including text::semanticSimilarity() for semantic ranking when embeddings are enabled), explore your schema, and retrieve content with structured filters.
Here’s what that looks like in GROQ. A user asks your shopping agent for “lightweight trail running shoes under $150.” Instead of searching for semantically similar text chunks and hoping the right one surfaces, the agent writes a query that combines structural constraints with semantic ranking:
```groq
*[_type == "product" && category == "shoes" && price < 150]
  | score(text::semanticSimilarity("lightweight trail runner"))
  | order(_score desc)
  { title, price, description }[0...5]
```

The structural filter (category == "shoes" && price < 150) enforces real constraints. The semantic ranking (text::semanticSimilarity()) handles the fuzzy part: "lightweight trail runner" doesn't need to be an exact field value. Every result matches every constraint and is ranked by relevance. That's the difference between a search engine and a shopping assistant.
No vector database to manage, no chunking pipeline to maintain. Agent behavior still depends on how you design the system prompt and tool orchestration, but the retrieval layer itself is handled for you.
You configure access scope, content filters, and business rules through an Agent Context document in Sanity Studio. Your editors control what the agent can see without touching code.
This is the approach for production-facing agents: shopping assistants, support bots, recommendation engines, or any feature where an AI needs read access to your published content.
Full workspace access via the MCP Server
The Sanity MCP Server is a different tool for a different job. It connects AI assistants (Claude, Cursor, VS Code, Lovable, v0) directly to your Sanity workspace with full read/write access.
With the MCP Server, an agent can:
- Execute GROQ queries against your content
- Create and patch documents
- Deploy schema changes
- Manage Content Releases (stage, schedule, publish)
- Generate images and manage media assets
Set it up from the CLI in one command:
```shell
# For Claude Code
claude mcp add Sanity -t http https://mcp.sanity.io --scope user
```
No API tokens to manage. No local npm install. OAuth handles authentication, and your edits show up as you in revision history.
This is the approach for developer and editorial workflows: content audits, bulk updates, schema management, and any task where the AI operates your content system rather than serving content to end users.
When to use which
| Use case | Sanity tool | Why |
|---|---|---|
| Customer-facing search or Q&A | Agent Context | Scoped, read-only, semantic + structured |
| Shopping or recommendation agent | Agent Context | Schema-aware retrieval with business rules |
| Content audit (find pages missing alt text) | MCP Server | Needs GROQ queries across the full dataset |
| Bulk content updates | MCP Server | Needs write access to patch documents |
| AI coding assistant that manages content | MCP Server | Full workspace operations from an IDE |
Building a combined architecture
For teams that need both patterns, the architecture looks like this:
Layer 1: Structured content in Content Lake. Your single source of truth. Content is stored as JSON documents with a schema, references, and real-time sync. This is the foundation for both retrieval patterns.
Layer 2: Agent Context for user-facing retrieval. Production agents connect via Agent Context MCP to query published content with semantic search and schema-aware filters. Access scope is governed by the Agent Context document in Studio. Editors manage what agents can see. Developers build features on top.
Layer 3: MCP Server for operational access. Developer and editorial tools connect via the MCP Server for full workspace operations: querying, writing, releasing, deploying.
Layer 4: Compute and Agent API for automation. Compute is Sanity’s serverless execution environment for event-driven functions. Agent API provides schema-aware AI instructions you can trigger from code. Together, they handle the work that should happen without human prompting: translations triggered by publish events, SEO fields generated on save, content synced across datasets.
The key insight: you don’t build a separate RAG pipeline and a separate MCP integration and hope they play nice. You build on one structured content layer and access it through the right interface for each use case.
How should you evaluate your CMS for AI readiness?
If you’re choosing a CMS with AI integration in mind, here’s what to look for:
Does it store content as structured data? Flat HTML blobs are hard for both RAG and MCP. Structured JSON with typed fields and references gives AI models something useful to work with.
Does it expose content via MCP? The Model Context Protocol is becoming the standard interface between AI models and external systems. A CMS with a native MCP server saves you from building and maintaining a custom integration layer.
Can you scope access for production agents? Your shopping assistant shouldn’t have the same permissions as your developer’s IDE. Look for per-agent configuration: content filters, field-level access, read-only modes.
Does it support real-time content? If your RAG pipeline requires re-indexing to reflect content changes, you’ll always have a staleness window. A CMS with live data and real-time sync eliminates that lag.
Is the schema available to AI agents? An MCP server that knows your schema can translate natural language into precise queries. Without schema awareness, the agent is guessing at your data structure.
Where are RAG and MCP heading?
RAG isn’t going away. For large, stable document collections, semantic retrieval remains the most efficient pattern. But the role of RAG is shifting from “the way AI gets context” to one retrieval mechanism among several.
MCP is expanding. As agents become more autonomous, the ability to chain operations (retrieve, analyze, act, verify) becomes more important than any single retrieval step. MCP’s tool-based architecture fits that trajectory.
The interesting development is the convergence: MCP servers that include semantic search as one of their tools, sitting alongside structured queries, write operations, and workflow triggers. That’s already happening. The distinction between “RAG system” and “MCP system” will matter less over time. What matters is whether your content layer gives AI models what they need: structure, real-time data, semantic understanding, and the ability to act.
Build on structured content. Expose it through the right interfaces. Let the patterns follow.