Chunking is the most underrated part of RAG. How you split your documents determines whether retrieval finds the right information or returns garbage.
| Strategy | How It Works | Pros | Cons | Best For |
|---|---|---|---|---|
| Fixed-Size | Split every N characters/tokens | Simple, predictable | Splits mid-sentence, loses context | Quick prototypes |
| Sentence-Based | Split on sentence boundaries | Preserves meaning | Uneven chunk sizes | Prose documents |
| Recursive | Split by headers, then paragraphs, then sentences | Respects document structure | Requires structured input | Technical docs, Markdown |
| Semantic | Embed sentences, group by similarity | Groups related content | Expensive, slow | Diverse documents |
| Parent-Child | Small chunks for search, large chunks for context | Best of both worlds | Complex to implement | Production systems |
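To make the first two rows of the table concrete, here is a minimal sketch of fixed-size and sentence-based chunking. The function names, the character-based sizing, and the defaults are illustrative, not a library API; production code would count tokens, not characters.

```python
import re

def fixed_size_chunks(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Fixed-size: split every `size` characters, overlapping by `overlap`.
    Simple and predictable, but happily splits mid-sentence."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def sentence_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Sentence-based: pack whole sentences into chunks up to `max_chars`.
    Preserves meaning, but chunk sizes come out uneven."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks

doc = "RAG systems retrieve context. Chunking controls what gets retrieved. Bad chunks mean bad answers."
print(fixed_size_chunks(doc, size=40, overlap=10))  # cuts mid-sentence
print(sentence_chunks(doc, max_chars=60))           # one sentence per chunk here
```

Running both on the same input makes the trade-off visible: the fixed-size splitter cuts words in half, while the sentence splitter keeps each statement intact at the cost of varying chunk lengths.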
Parent-child chunking works in three steps:

1. Create small chunks (~200 tokens) for embedding and retrieval.
2. Each small chunk points to its parent (a ~2000-token section).
3. Search returns small chunks, but you send the parent to the LLM.

Small chunk (for search): "React 19 introduces server components..."
↓ maps to ↓
Parent chunk (for LLM): [Full 2000-token section about React 19 architecture]
This gives you precise retrieval (small chunks match queries better) with rich context (the LLM sees the full section).
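The child-to-parent mapping above can be sketched as a pair of data structures. This is illustrative only: the class names are made up, the child split is a naive fixed-size cut, and the keyword "retrieval" stands in for the embedding-similarity search a real system would use (with the mapping stored in a vector DB).

```python
from dataclasses import dataclass

@dataclass
class ParentChunk:
    id: str
    text: str  # the full ~2000-token section sent to the LLM

@dataclass
class ChildChunk:
    text: str       # the small ~200-token chunk that gets embedded
    parent_id: str  # back-pointer to the parent section

def split_into_children(parent: ParentChunk, size: int = 80) -> list[ChildChunk]:
    """Naive fixed-size split of a parent into small, searchable children."""
    return [ChildChunk(parent.text[i:i + size], parent.id)
            for i in range(0, len(parent.text), size)]

def retrieve_parent(query: str, children: list[ChildChunk],
                    parents: dict[str, ParentChunk]) -> str:
    """Toy retrieval: match a child, but return its parent's full text.
    A real system ranks children by embedding similarity instead."""
    for child in children:
        if query.lower() in child.text.lower():
            return parents[child.parent_id].text
    return ""

section = ParentChunk("react-19", "React 19 introduces server components. " * 5)
parents = {section.id: section}
children = split_into_children(section)

# The match happens on a small chunk; the LLM receives the whole parent.
context = retrieve_parent("server components", children, parents)
```

The key design point is the `parent_id` back-pointer: the index only ever stores small chunks, and the expansion to full context happens at retrieval time.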
Once documents are chunked, each chunk is embedded. Common embedding model options:

| Model | Dimensions | Max Tokens | Cost (per 1M tokens) | Quality |
|---|---|---|---|---|
| text-embedding-3-large | 3072 | 8191 | $0.13 | Highest |
| text-embedding-3-small | 1536 | 8191 | $0.02 | Good |
| voyage-3 | 1024 | 32000 | $0.06 | Excellent for code |
| cohere-embed-v3 | 1024 | 512 | $0.10 | Great for multilingual |
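The per-1M-token prices in the table make corpus-level cost easy to estimate. A quick back-of-envelope calculation (prices change, so treat the figures as a snapshot, and check current provider pricing):

```python
# Per-1M-token prices from the table above (USD; subject to change).
PRICE_PER_1M = {
    "text-embedding-3-large": 0.13,
    "text-embedding-3-small": 0.02,
    "voyage-3": 0.06,
    "cohere-embed-v3": 0.10,
}

def embedding_cost(total_tokens: int, model: str) -> float:
    """Cost in USD to embed `total_tokens` with `model`."""
    return total_tokens / 1_000_000 * PRICE_PER_1M[model]

# Example corpus: 10,000 documents x 5,000 tokens each = 50M tokens.
tokens = 10_000 * 5_000
for model in PRICE_PER_1M:
    print(f"{model}: ${embedding_cost(tokens, model):.2f}")
```

Even at 50M tokens, embedding is cheap relative to generation, which is why re-embedding a corpus after changing your chunking strategy is usually an acceptable cost.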