RAG Implementation: Lessons from the Field


Retrieval-Augmented Generation (RAG) has become the default architecture for enterprise AI applications that need to work with organisation-specific data. As MIT Technology Review has explained, the concept is straightforward: instead of relying only on what the language model already knows, retrieve relevant documents and provide them as context.

Simple concept. Harder execution. Here’s what I’ve learned from RAG deployments over the past year.
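
To make the loop concrete, here's a minimal sketch of retrieve-then-generate in Python. The `retrieve` and `generate` callables are stand-ins for whatever search layer and model call your stack actually uses; nothing here is specific to any vendor.

```
from typing import Callable, List

def answer_with_rag(
    question: str,
    retrieve: Callable[[str, int], List[str]],  # your search layer: (query, top_k) -> text chunks
    generate: Callable[[str], str],             # your LLM call: prompt -> answer
    top_k: int = 5,
) -> str:
    """Retrieve relevant chunks, then ask the model to answer from them."""
    chunks = retrieve(question, top_k)
    context = "\n\n".join(chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return generate(prompt)
```

Most of the lessons below are about what it takes to make those few lines work well in practice.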

The Promise vs. Reality

The promise: Upload your documents, point AI at them, get accurate answers grounded in your data.

The reality: Document quality matters enormously. Retrieval is imperfect. Generation can still hallucinate. User experience depends on getting many things right simultaneously.

RAG works, but not as easily as vendors suggest.

Lesson 1: Document Quality Is Everything

The single biggest determinant of RAG success is the quality of your source documents.

Problems I’ve seen:

  • PDF documents that OCR poorly, creating garbled text
  • Documents with embedded images that contain critical information
  • Scanned documents with handwritten annotations
  • Complex tables that lose structure when converted to text
  • Documents with inconsistent formatting
  • Multiple versions of the same document in the corpus

What works:

  • Clean, well-structured digital documents
  • Consistent formatting across the corpus
  • Explicit section headings and document structure
  • Text-based content (not information in images)
  • Deduplicated, authoritative versions only

If your documents are messy, no amount of RAG sophistication will compensate. Start with document hygiene before worrying about chunking strategies.
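
One concrete piece of hygiene that's easy to automate is removing exact duplicates before indexing. A minimal sketch, assuming documents arrive as extracted text keyed by ID; near-duplicates and competing versions still need human curation.

```
import hashlib
import re

def dedupe_documents(docs: dict[str, str]) -> dict[str, str]:
    """Keep one copy of each document whose normalised text is identical."""
    seen_digests: set[str] = set()
    kept: dict[str, str] = {}
    for doc_id, text in docs.items():
        # Normalise whitespace and case so trivial formatting differences
        # don't hide exact duplicates.
        normalised = re.sub(r"\s+", " ", text).strip().lower()
        digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        if digest not in seen_digests:
            seen_digests.add(digest)
            kept[doc_id] = text
    return kept
```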

Lesson 2: Chunking Strategy Matters More Than You’d Think

How you split documents into retrievable chunks significantly affects quality.

Too small: Chunks lack context. The retrieved snippet doesn’t contain enough information to answer questions fully.

Too large: Chunks include irrelevant content. The language model must filter noise, reducing accuracy.

Wrong boundaries: Splitting mid-paragraph or mid-section loses coherence. Important information gets separated from necessary context.

What works:

  • Respect document structure (sections, paragraphs) as chunk boundaries
  • Include overlap between chunks to preserve context at boundaries
  • Vary chunk size by document type (contracts need different handling from FAQs)
  • Test with real queries to validate chunking approach

There’s no universal optimal chunk size. It depends on your documents and your queries. Experimentation is required.
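
As a starting point, here's a minimal sketch of structure-aware chunking with overlap: split on paragraph breaks and carry the last paragraph into the next chunk. The size limit and overlap here are arbitrary placeholders to tune against your own documents and queries, not recommendations.

```
def chunk_by_paragraph(text: str, max_chars: int = 1500, overlap: int = 1) -> list[str]:
    """Split text at paragraph boundaries, carrying `overlap` trailing
    paragraphs into the next chunk to preserve context across boundaries."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    length = 0
    for para in paragraphs:
        if current and length + len(para) > max_chars:
            chunks.append("\n\n".join(current))
            # Start the next chunk with the last `overlap` paragraphs.
            current = current[-overlap:] if overlap else []
            length = sum(len(p) for p in current)
        current.append(para)
        length += len(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

For documents with explicit headings, splitting on sections first and then on paragraphs usually preserves more coherence.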

Lesson 3: Retrieval Quality Is the Bottleneck

Even with good documents and smart chunking, retrieval must find the right content.

Common retrieval problems:

  • Semantic search misses when queries use different terminology from the documents
  • Relevant content spread across multiple chunks isn’t reassembled
  • High-quality chunks ranked below mediocre ones
  • Retrieval returns the obvious answer when users need the nuanced one

What works:

  • Hybrid retrieval combining semantic and keyword search
  • Query expansion to capture terminology variations
  • Reranking retrieved results before passing to LLM
  • Metadata filtering to narrow search scope appropriately
  • Iterative refinement based on retrieval failure analysis

Retrieval is a separate problem from generation. Treat it as such. Monitor retrieval quality independent of final answer quality.
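
One simple, widely used way to combine keyword and semantic results is reciprocal rank fusion. A minimal sketch; the chunk IDs are made up, and a dedicated reranking model can still be applied to the fused list afterwards.

```
from collections import defaultdict

def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk IDs into one ranking:
    score(id) = sum over lists of 1 / (k + rank of id in that list)."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, chunk_id in enumerate(results, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse keyword and semantic hits for one query.
keyword_hits = ["c7", "c2", "c9"]
semantic_hits = ["c2", "c5", "c7"]
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
# "c2" and "c7" appear in both lists, so they rise to the top.
```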

Lesson 4: Generation Still Needs Guardrails

Even with perfect retrieval, generation can fail.

Generation problems:

  • Hallucinated content that sounds plausible but isn’t in the retrieved documents
  • Answers that ignore the retrieved context and fall back on the model’s base knowledge
  • Confidently wrong answers when retrieved content is ambiguous
  • Failure to acknowledge when retrieved content doesn’t address the question

What works:

  • Explicit prompting to only use retrieved content
  • Citation requirements in output format
  • Confidence indicators for uncertain answers
  • Clear “I don’t know” responses when content is insufficient
  • Human review for high-stakes applications

RAG reduces hallucination but doesn’t eliminate it. Design for the failure mode.
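
A grounded prompt template is the cheapest guardrail to add. A sketch of the idea; the exact wording varies by model, and none of this guarantees grounding, which is why citations and human review still matter.

```
GROUNDED_PROMPT = """You are answering questions about internal documents.

Rules:
- Use ONLY the numbered context passages below. Do not use outside knowledge.
- Cite the passage numbers you relied on, e.g. [1], [3].
- If the passages do not contain the answer, reply exactly:
  "I don't know based on the available documents."

Context passages:
{context}

Question: {question}
Answer:"""

def build_grounded_prompt(question: str, passages: list[str]) -> str:
    """Number the retrieved passages so the model can cite them."""
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return GROUNDED_PROMPT.format(context=context, question=question)
```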

Lesson 5: User Experience Determines Success

A technically excellent RAG system can still fail if users don’t trust or use it.

User experience issues:

  • Slow response times (retrieval plus generation can take several seconds)
  • Answers that are technically correct but not useful
  • No way to verify why the system gave a particular answer
  • Inconsistent quality across different query types
  • Poor handling of queries outside the system’s scope

What works:

  • Show sources with answers so users can verify
  • Acknowledge uncertainty explicitly
  • Fast response times (even if it means simpler approaches)
  • Clear communication about what the system can and can’t answer
  • Feedback mechanisms to capture when answers are wrong

The best RAG system is the one people actually use. User-centred design matters as much as technical architecture.
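
One way to make sources and uncertainty non-optional is to bake them into the response type the interface consumes, rather than returning a bare string. A sketch; the field names are illustrative, not taken from any particular framework.

```
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class RagAnswer:
    """What the UI receives: never just a bare answer string."""
    answer: str
    sources: list[str] = field(default_factory=list)  # titles or links shown to the user
    confident: bool = True                            # drives an "unverified" warning in the UI
    latency_ms: Optional[int] = None                  # surfaced to monitoring, not to the user

def render(result: RagAnswer) -> str:
    """Plain-text rendering that always shows sources and uncertainty."""
    lines = [result.answer]
    if not result.confident:
        lines.append("(Low confidence: please check the sources before relying on this.)")
    if result.sources:
        lines.append("Sources: " + "; ".join(result.sources))
    return "\n".join(lines)
```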

Lesson 6: Maintenance Is Ongoing

RAG isn’t deploy-and-forget. Sources change. User needs evolve. Quality degrades.

Maintenance requirements:

  • Regular document corpus updates as source content changes
  • Monitoring retrieval and generation quality metrics
  • Addressing emerging failure patterns
  • Reindexing when chunking strategy needs adjustment
  • User feedback triage and system improvement

Budget for:

  • Ongoing infrastructure costs
  • Regular quality assessment
  • Maintenance development capacity
  • User support

RAG systems that aren’t maintained degrade. Plan for operational costs from the start.
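
For the monitoring piece, even a small labelled evaluation set goes a long way. A sketch of a retrieval hit-rate check, assuming you can name an expected chunk ID for each evaluation query and that `retrieve` is your search function returning chunk IDs; running it after every corpus update or reindex helps catch silent degradation.

```
def retrieval_hit_rate(eval_set: list[dict], retrieve, top_k: int = 5) -> float:
    """Fraction of evaluation queries whose expected chunk appears in the
    top-k results. Each eval item looks like
    {"query": "...", "expected_chunk_id": "..."}."""
    if not eval_set:
        return 0.0
    hits = 0
    for item in eval_set:
        retrieved_ids = retrieve(item["query"], top_k)
        if item["expected_chunk_id"] in retrieved_ids:
            hits += 1
    return hits / len(eval_set)
```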

When RAG Is and Isn’t Appropriate

Good RAG use cases:

  • Internal knowledge bases with established documentation
  • Customer support with consistent, documented answers
  • Policy and compliance questions with authoritative sources
  • Research assistance across large document collections

Challenging RAG use cases:

  • Rapidly changing information (frequent reindexing needed)
  • Nuanced questions requiring synthesis across many sources
  • Content primarily in images, videos, or non-text formats
  • Questions requiring reasoning beyond document content

Not every AI application needs RAG. Sometimes fine-tuning, prompt engineering, or traditional search is more appropriate.

Final Thought

RAG is a powerful architecture that’s become enterprise AI’s workhorse. But it’s not magic. Success requires attention to documents, chunking, retrieval, generation, and user experience – all simultaneously.

The organisations getting value from RAG are those treating it as serious engineering, not just plugging documents into a platform. The complexity is manageable, but it can’t be ignored.

Do the work. Get the details right. RAG rewards thoroughness.