# RAG System with Qwen

A Retrieval-Augmented Generation (RAG) system that lets you query your documents using Ollama and Qwen models locally.
## Table of Contents

- [Installation and Setup](#installation-and-setup)
  - [Step 1: Install Ollama](#step-1-install-ollama)
  - [Step 2: Pull Required Ollama Models](#step-2-pull-required-ollama-models)
  - [Step 3: Enable CPU-Only Mode (For Low-End Computers)](#step-3-enable-cpu-only-mode-for-low-end-computers)
  - [Step 4: Install Pandoc (Optional)](#step-4-install-pandoc-optional)
  - [Step 5: Setup Python Environment](#step-5-setup-python-environment)
  - [Step 6: Configure Environment](#step-6-configure-environment)
  - [Step 7: Add Your Documents](#step-7-add-your-documents)
  - [Step 8: Ingest Documents](#step-8-ingest-documents)
  - [Step 9: Start the Frontend](#step-9-start-the-frontend)
- [Command-Line Query (Optional)](#command-line-query-optional)
- [Performance Notes](#performance-notes)
  - [CPU vs GPU Mode](#cpu-vs-gpu-mode)
  - [Model Recommendations by Hardware](#model-recommendations-by-hardware)
- [Model Upgrade Guide](#model-upgrade-guide)
  - [Current Setup (Fast/Testing)](#current-setup-fasttesting)
  - [Upgrading to Production Quality](#upgrading-to-production-quality)
  - [🏆 Recommended Production Configurations](#-recommended-production-configurations)
  - [⚡ Performance Impact Summary](#-performance-impact-summary)
- [🌍 Multilingual Functionality Guide](#-multilingual-functionality-guide)
  - [How Each Component Affects Multilingual Support](#how-each-component-affects-multilingual-support)
  - [Current System Multilingual Capability](#current-system-multilingual-capability)
  - [Upgrading to Full Multilingual Support](#upgrading-to-full-multilingual-support)
  - [Testing Multilingual Functionality](#testing-multilingual-functionality)
- [Chunking Configuration Guide](#chunking-configuration-guide)
  - [What is Chunking?](#what-is-chunking)
  - [Current Default Settings](#current-default-settings)
  - [How Chunk Size Affects Quality](#how-chunk-size-affects-quality)
  - [Why Overlap Matters](#why-overlap-matters)
  - [Recommended Settings by Document Type](#recommended-settings-by-document-type)
  - [How to Adjust Chunking](#how-to-adjust-chunking)
  - [Chunk Size Impact on Your System](#chunk-size-impact-on-your-system)

---

## Installation and Setup
### Step 1: Install Ollama
**Windows (via winget):**
```powershell
winget install Ollama.Ollama -e
```
Verify installation:
```powershell
ollama --version
```

On Windows, Ollama normally starts in the background automatically. If it is not running, start it manually:
```powershell
ollama serve
```
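
To confirm that the server is actually reachable before moving on, a small check like the one below can help. It is a minimal sketch using only the Python standard library; `ollama_is_up` is an illustrative name, and the default URL assumes Ollama is listening on its usual local port (11434).

```python
import json
import urllib.error
import urllib.request


def ollama_is_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at base_url."""
    try:
        # GET /api/tags lists the models that are available locally.
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            json.load(resp)  # a healthy server replies with a JSON document
            return True
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: treat as "not running".
        return False
```

Calling `ollama_is_up()` from a Python shell should return `True` once `ollama serve` (or the background app) is running.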

**macOS (via Homebrew):**
```bash
brew install ollama
```

Verify installation:
```bash
ollama --version
```

Start Ollama service:
```bash
ollama serve
```

**macOS (Manual Download):**
Download from [https://ollama.ai/download](https://ollama.ai/download) and install the .dmg file.

### Step 2: Pull Required Ollama Models
**LLM Model (for answering queries):**
```bash
ollama pull qwen2.5:14b-instruct
```

**Embedding Model (for semantic search):**
```bash
ollama pull mxbai-embed-large
```

**Note for low-end computers:** The 14b model requires ~16GB RAM. If you have less RAM, use:
```bash
ollama pull qwen2.5:7b-instruct  # Requires ~8GB RAM
```

### Step 3: Enable CPU-Only Mode (For Low-End Computers)
**If you have a low-end computer or insufficient GPU memory**, force Ollama to run on CPU only:

**Windows:**
```powershell
[System.Environment]::SetEnvironmentVariable('OLLAMA_NUM_GPU', '0', 'User')
$env:OLLAMA_NUM_GPU = '0'
```

**macOS/Linux:**
```bash
echo 'export OLLAMA_NUM_GPU=0' >> ~/.bashrc  # or ~/.zshrc for zsh
source ~/.bashrc  # or source ~/.zshrc
```

Restart your terminal (and Ollama, if it is already running) after setting this so the new value takes effect. The model will run slower but work on any computer.

**To re-enable GPU later (if you upgrade hardware):**

```powershell
# Windows
[System.Environment]::SetEnvironmentVariable('OLLAMA_NUM_GPU', '1', 'User')
```

```bash
# macOS/Linux - remove the line from ~/.bashrc or ~/.zshrc
```

### Step 4: Install Pandoc (Optional)
Only needed if you have OpenDocument (.odt) files:

**Windows:**
```powershell
winget install --id JohnMacFarlane.Pandoc -e
```

**macOS:**
```bash
brew install pandoc
```

### Step 5: Setup Python Environment
**Windows:**
```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
```

**Note:** If you get an execution policy error:
```powershell
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
```

**macOS/Linux:**
```bash
python3 -m venv .venv
source .venv/bin/activate
```

**Install dependencies (all platforms):**
```bash
pip install -r requirements.txt
```

### Step 6: Configure Environment
**Windows:**
```powershell
Copy-Item .env.example .env
```

**macOS/Linux:**
```bash
cp .env.example .env
```

Edit `.env` with your settings:
```env
# Model Configuration
OLLAMA_MODEL=qwen2.5:14b-instruct
OLLAMA_BASE_URL=http://localhost:11434
EMBEDDING_MODEL=mxbai-embed-large

# Retrieval Settings
RETRIEVAL_CHUNKS=100
TOP_N_RERANK=15
USE_RERANKING=true

# Document Processing
CHUNK_SIZE=800
CHUNK_OVERLAP=160
```

**Note:** If you are using the 7b model on a low-end computer, change the model line to `OLLAMA_MODEL=qwen2.5:7b-instruct`

### Step 7: Add Your Documents
Place your documents (Word, PDF, PowerPoint, Text, Markdown, etc.) in the `docs/` folder.
### Step 8: Ingest Documents
Run the ingestion script to process your documents:

**Windows:**
```powershell
python rag\ingest.py
```

**macOS/Linux:**
```bash
python rag/ingest.py
```

This will:
- Extract ZIP files automatically
- Load and process all documents
- Generate embeddings
- Store vectors in the database
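
The first step in the list above (ZIP extraction) could look roughly like the sketch below. This is an illustration built on the standard library, not the actual code in `rag/ingest.py`; the function name `extract_archives` and the choice to unpack each archive into a folder named after it are assumptions.

```python
import zipfile
from pathlib import Path


def extract_archives(docs_dir: str = "docs") -> list[Path]:
    """Unpack every .zip under docs_dir into a sibling folder named after the archive."""
    extracted = []
    # sorted() materializes the list first, so folders created below are not re-scanned.
    for archive in sorted(Path(docs_dir).rglob("*.zip")):
        target = archive.with_suffix("")  # docs/specs.zip -> docs/specs/
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted
```

Archives nested inside other archives are not handled by this sketch.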
### Step 9: Start the Frontend
Start the web interface:

**Windows:**
```powershell
python frontend\app.py
```

**macOS/Linux:**
```bash
python frontend/app.py
```

The server will start at: **http://127.0.0.1:8000**

Open this URL in your browser to start querying your documents!

---

## Command-Line Query (Optional)
You can also run queries directly from the command line:

**Windows:**
```powershell
python rag\query.py "Your question here"
```

**macOS/Linux:**
```bash
python rag/query.py "Your question here"
```

## Performance Notes
### CPU vs GPU Mode
- **GPU Mode (default):** Fast responses (1-2 seconds with 14b model)
- **CPU-Only Mode:** Slower responses (8-15 seconds with 14b model) but works on any computer
### Model Recommendations by Hardware
| RAM Available | Recommended Model | CPU Query Time | Quality |
|---------------|-------------------|----------------|----------|
| 8GB | qwen2.5:7b-instruct | 3-5 seconds | Good |
| 16GB+ | qwen2.5:14b-instruct | 8-15 seconds | Excellent |
| 32GB+ | qwen2.5:32b-instruct | 30-60 seconds | Best |

**Note:** These times are for CPU-only mode. GPU mode is 6-10x faster.

---

## Model Upgrade Guide

### Current Setup (Fast/Testing)
Your system is currently configured for **speed and testing**:
- **Embedding:** `sentence-transformers/all-MiniLM-L6-v2` (384-dim, very fast)
- **Reranker:** `BAAI/bge-reranker-base` (good quality)
- **LLM:** `qwen2.5:14b-instruct` (excellent balance)

### Upgrading to Production Quality

#### 🚀 **Embedding Model Upgrades**

**Current:** `sentence-transformers/all-MiniLM-L6-v2` (384-dim)
- Speed: ⚡⚡⚡⚡⚡ Very Fast (5x faster than BGE-large)
- Quality: ⭐⭐⭐ Good
- Use case: Testing, prototyping, fast iterations

**Option 1 - Balanced:** `BAAI/bge-base-en-v1.5` (768-dim)
- Speed: ⚡⚡⚡⚡ Fast (2x faster than BGE-large)
- Quality: ⭐⭐⭐⭐ Very Good
- Use case: Production with good performance/quality balance
- **Recommended for most users**

**Option 2 - Best Quality:** `BAAI/bge-large-en-v1.5` (1024-dim)
- Speed: ⚡⚡⚡ Moderate
- Quality: ⭐⭐⭐⭐⭐ Excellent
- Use case: Production where quality is critical
- Trade-off: Slower ingestion (but queries remain fast)

**Option 3 - Multilingual:** `BAAI/bge-m3` (1024-dim)
- Speed: ⚡⚡⚡ Moderate  
- Quality: ⭐⭐⭐⭐⭐ Excellent
- Use case: Multi-language documents (100+ languages)
- Supports: Chinese, French, Spanish, German, etc.

**To upgrade embedding model:**
```env
# In .env file, change:
EMBEDDING_MODEL=BAAI/bge-base-en-v1.5  # or bge-large-en-v1.5
```
Then re-run: `python rag/ingest.py`

#### 🎯 **Reranker Model Upgrades**

**Current:** `BAAI/bge-reranker-base` (278M params)
- Speed: ⚡⚡⚡⚡ Fast
- Quality: ⭐⭐⭐⭐ Very Good
- Already excellent for most use cases

**Option 1 - Higher Quality:** `BAAI/bge-reranker-large` (560M params)
- Speed: ⚡⚡⚡ Moderate
- Quality: ⭐⭐⭐⭐⭐ Excellent
- Use case: When answer quality is critical
- Trade-off: 2x slower reranking (still fast overall)

**Option 2 - Best Available:** `BAAI/bge-reranker-v2-m3` (568M params)
- Speed: ⚡⚡⚡ Moderate
- Quality: ⭐⭐⭐⭐⭐ State-of-the-art
- Use case: Maximum accuracy, multilingual support
- Supports: 100+ languages

**To upgrade reranker:**
```env
# In .env file, change:
RERANKER_MODEL=BAAI/bge-reranker-large  # or bge-reranker-v2-m3
```
No re-ingestion needed. Update `.env` and restart for the change to take effect.

#### 🤖 **LLM Model Upgrades**

**Current:** `qwen2.5:14b-instruct` (14B params, 8GB VRAM/16GB RAM)
- Speed: ⚡⚡⚡⚡ Fast
- Quality: ⭐⭐⭐⭐ Excellent
- Already very good for most tasks

**Option 1 - More Capable:** `qwen2.5:32b-instruct` (32B params, 20GB VRAM/32GB RAM)
- Speed: ⚡⚡⚡ Moderate (2x slower)
- Quality: ⭐⭐⭐⭐⭐ Outstanding
- Use case: Complex reasoning, technical documents
- Requirements: 32GB+ RAM recommended

**Option 2 - Maximum Quality:** `qwen2.5:72b-instruct` (72B params, 48GB VRAM/64GB RAM)
- Speed: ⚡⚡ Slow (5x slower)
- Quality: ⭐⭐⭐⭐⭐ Best available
- Use case: Research, critical analysis, highest accuracy
- Requirements: 64GB+ RAM, powerful hardware

**Option 3 - Faster Lightweight:** `qwen2.5:7b-instruct` (7B params, 4GB VRAM/8GB RAM)
- Speed: ⚡⚡⚡⚡⚡ Very Fast (2x faster)
- Quality: ⭐⭐⭐ Good
- Use case: Low-end hardware, quick responses

**To upgrade LLM:**
```bash
# Pull new model
ollama pull qwen2.5:32b-instruct

# Then edit .env (this line goes in the file, not the shell):
# OLLAMA_MODEL=qwen2.5:32b-instruct
```
No re-ingestion needed!
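
After pulling a new model, it can be worth sending it one prompt directly, outside the RAG pipeline, to see how fast it responds on your hardware. The sketch below talks to Ollama's REST endpoint `/api/generate` with the standard library; the helper names `build_generate_request` and `ask` are illustrative, and the base URL assumes the default local port.

```python
import json
import urllib.request


def build_generate_request(model: str, prompt: str,
                           base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a non-streaming POST /api/generate request for Ollama."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text (needs a running Ollama server)."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

For example, `ask("qwen2.5:32b-instruct", "Reply with one short sentence.")` should return text once the model has loaded; the first call is usually the slowest because the model is being read into memory.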

### 🏆 **Recommended Production Configurations**

#### **Configuration 1: Balanced (Recommended)**
```env
EMBEDDING_MODEL=BAAI/bge-base-en-v1.5
RERANKER_MODEL=BAAI/bge-reranker-base
OLLAMA_MODEL=qwen2.5:14b-instruct
```
- **Speed:** Fast
- **Quality:** Very Good
- **Hardware:** 16GB RAM minimum
- **Best for:** Most production use cases

#### **Configuration 2: Maximum Quality**
```env
EMBEDDING_MODEL=BAAI/bge-large-en-v1.5
RERANKER_MODEL=BAAI/bge-reranker-v2-m3
OLLAMA_MODEL=qwen2.5:32b-instruct
```
- **Speed:** Moderate
- **Quality:** Excellent
- **Hardware:** 32GB RAM minimum
- **Best for:** Critical applications, research

#### **Configuration 3: Fast & Efficient (Current)**
```env
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
RERANKER_MODEL=BAAI/bge-reranker-base
OLLAMA_MODEL=qwen2.5:14b-instruct
```
- **Speed:** Very Fast
- **Quality:** Good
- **Hardware:** 16GB RAM minimum
- **Best for:** Testing, development, rapid iteration

#### **Configuration 4: Multilingual**
```env
EMBEDDING_MODEL=BAAI/bge-m3
RERANKER_MODEL=BAAI/bge-reranker-v2-m3
OLLAMA_MODEL=qwen2.5:14b-instruct
```
- **Speed:** Moderate
- **Quality:** Excellent
- **Hardware:** 16GB RAM minimum
- **Best for:** Multi-language document collections

### ⚡ **Performance Impact Summary**

| Component | Affects | Re-ingestion Required? |
|-----------|---------|------------------------|
| Embedding Model | Ingestion speed, retrieval quality | ✅ Yes |
| Reranker Model | Query speed (minimal), answer quality | ❌ No |
| LLM Model | Response generation speed/quality | ❌ No |

**Note:** Upgrading the embedding model requires re-running `python rag/ingest.py` to rebuild the vector database with new embeddings.
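
The reason only the embedding model forces a rebuild is that stored vectors are only comparable with query vectors produced by the same model. A guard along the following lines makes that explicit. It is a sketch: `IndexInfo` and `needs_reingest` are hypothetical names, and it assumes the model name and vector dimension are recorded at ingestion time.

```python
from dataclasses import dataclass


@dataclass
class IndexInfo:
    """Metadata recorded at ingestion time (hypothetical structure)."""
    embedding_model: str
    dimension: int


def needs_reingest(index: IndexInfo, configured_model: str, configured_dim: int) -> bool:
    """True when the stored vectors cannot be searched with the configured model."""
    # Matching dimensions are not enough: bge-large-en-v1.5 and bge-m3 are both
    # 1024-dim but place text in unrelated vector spaces, so the name must match too.
    return index.embedding_model != configured_model or index.dimension != configured_dim


index = IndexInfo("sentence-transformers/all-MiniLM-L6-v2", 384)
print(needs_reingest(index, "BAAI/bge-base-en-v1.5", 768))
print(needs_reingest(index, "sentence-transformers/all-MiniLM-L6-v2", 384))
```

The first call reports that a rebuild is needed (384-dim index, 768-dim model); the second reports that the existing index can be reused.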

---

## 🌍 Multilingual Functionality Guide

The chatbot **automatically responds in the language you use** to ask questions (English, French, Spanish, etc.). However, **each model component affects multilingual quality differently**:

### How Each Component Affects Multilingual Support

#### **1. Embedding Model - CRITICAL for Multilingual Retrieval** 🔴

**Impact:** Determines whether a question in any language can find relevant documents

**Current Model:** `sentence-transformers/all-MiniLM-L6-v2`
- ⚠️ **Optimized for English only**
- Non-English queries will retrieve less relevant documents
- Works well for English, poorly for French/Spanish/other languages

**Recommended for Multilingual:**
```env
EMBEDDING_MODEL=BAAI/bge-m3
# or
EMBEDDING_MODEL=sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
```

**Why it matters:**
- French question → English-focused embeddings → retrieves wrong documents → LLM gets irrelevant context → poor answer **even if LLM speaks French**
- Multilingual embeddings → retrieves correct documents in any language → LLM gets relevant context → excellent answer

**⚠️ Requires re-ingestion:** YES - `python rag/ingest.py`

---

#### **2. Reranker Model - Important for Multilingual Precision** 🟡

**Impact:** Refines which documents are most relevant to your question

**Current Model:** `BAAI/bge-reranker-base`
- ⚠️ **English- and Chinese-focused**
- Can rerank, but less accurate for non-English queries

**Recommended for Multilingual:**
```env
RERANKER_MODEL=BAAI/bge-reranker-v2-m3
```

**Why it matters:**
- Even if embeddings retrieve 10 good multilingual documents, a reranker trained mainly on English and Chinese might rank them poorly
- Multilingual reranker correctly identifies the most relevant chunks in any language

**⚠️ Requires re-ingestion:** NO - just update `.env` and restart
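
The division of labour between embeddings and reranker can be sketched as a two-stage search: a cheap vector lookup keeps a wide candidate set (`RETRIEVAL_CHUNKS`), then the reranker scores each (question, chunk) pair and keeps the best few (`TOP_N_RERANK`). The code below is a toy illustration with hand-made vectors; `word_overlap` is only a stand-in for a real cross-encoder such as `bge-reranker-base`, and none of the function names come from this project.

```python
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_and_rerank(query_vec, query_text, chunks,
                        rerank_score: Callable[[str, str], float],
                        retrieval_chunks: int = 100, top_n_rerank: int = 15):
    """chunks is a list of (text, vector) pairs; returns the top_n_rerank texts."""
    # Stage 1: vector search keeps the retrieval_chunks nearest chunks.
    candidates = sorted(chunks, key=lambda c: cosine(query_vec, c[1]),
                        reverse=True)[:retrieval_chunks]
    # Stage 2: the slower reranker reads the question and each chunk together.
    reranked = sorted(candidates, key=lambda c: rerank_score(query_text, c[0]),
                      reverse=True)
    return [text for text, _ in reranked[:top_n_rerank]]


def word_overlap(query: str, text: str) -> float:
    """Toy scorer: fraction of query words that also appear in the chunk."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


chunks = [
    ("V-PCC compression results for 2023", [0.9, 0.1]),
    ("Cafeteria opening hours", [0.1, 0.9]),
    ("Latest V-PCC compression results", [0.8, 0.2]),
]
top = retrieve_and_rerank([1.0, 0.0], "latest V-PCC compression results", chunks,
                          word_overlap, retrieval_chunks=2, top_n_rerank=1)
print(top)  # ['Latest V-PCC compression results']
```

In the toy data the vector stage ranks the 2023 chunk first, and the reranker then promotes the chunk that matches the question more closely. A reranker that handles the query language poorly would make that second decision less reliably, which is why the reranker model matters for non-English questions.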

---

#### **3. LLM (Text Generation Model) - Determines Answer Language** 🟢

**Impact:** Generates the actual response in the target language

**Current Model:** `qwen2.5:14b-instruct`
- ✅ **Excellent multilingual support** (29+ languages)
- Strong in: English, Chinese, French, Spanish, German, Japanese, Korean, Arabic, and more
- The prompt automatically instructs it to respond in the question's language
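
A prompt that carries such an instruction might be assembled as in the sketch below. The wording and the `build_prompt` helper are hypothetical, written only to illustrate the idea; the project's actual prompt may differ.

```python
def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble a RAG prompt that asks the model to answer in the question's language."""
    # Number the retrieved chunks so the model can tell them apart.
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1))
    return (
        "Answer the question using only the context below.\n"
        "Respond in the same language as the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Because the instruction refers to the question rather than to a fixed language, the same template serves English, French, or Spanish queries without any per-language configuration.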

**Alternative Multilingual LLMs:**
```bash
ollama pull qwen2.5:32b-instruct    # Best multilingual quality
ollama pull llama3.1:8b             # Good for European languages
ollama pull mistral:7b-instruct     # Good for French/English
```

**Why it matters:**
- Even with perfect retrieval, if the LLM doesn't support the language, answers will be poor or in the wrong language
- Qwen models are already excellent for multilingual use; upgrading mainly improves reasoning depth

**⚠️ Requires re-ingestion:** NO - just update `.env` and restart

---

### Current System Multilingual Capability

| Component | Current Model | Multilingual? | Impact on Non-English |
|-----------|---------------|---------------|------------------------|
| **Embedding** | all-MiniLM-L6-v2 | ❌ English-only | 🔴 **Poor retrieval** for non-English questions |
| **Reranker** | bge-reranker-base | ⚠️ English/Chinese-focused | 🟡 **Suboptimal ranking** for other languages |
| **LLM** | qwen2.5:14b-instruct | ✅ Excellent | ✅ **Strong responses** in its supported languages |

**Result:** The LLM **CAN respond** in French/Spanish/etc., but will work with **lower-quality context** retrieved by English-only embeddings.

---

### Upgrading to Full Multilingual Support

**Recommended Configuration:**

```env
# In .env file
EMBEDDING_MODEL=BAAI/bge-m3
RERANKER_MODEL=BAAI/bge-reranker-v2-m3
OLLAMA_MODEL=qwen2.5:14b-instruct
```

**Steps:**
1. Update `.env` with multilingual models
2. Re-ingest documents: `python rag/ingest.py` (required for embedding change)
3. Restart frontend/queries

**Benefits:**
- ✅ Excellent retrieval for questions in **any language**
- ✅ Accurate reranking regardless of language
- ✅ High-quality answers in the **29+ languages** Qwen2.5 supports

**Trade-offs:**
- Slower ingestion and query embedding (BGE-m3 is a far larger model than all-MiniLM-L6-v2)
- Larger model downloads (~3GB vs 90MB)

---

### Testing Multilingual Functionality

```powershell
# English
python rag/query.py "What are the latest V-PCC compression results?"

# French
python rag/query.py "Quels sont les derniers résultats de compression V-PCC ?"

# Spanish
python rag/query.py "¿Cuáles son los últimos resultados de compresión V-PCC?"
```

**Expected behavior:**
- ✅ LLM responds in the correct language (works with current setup)
- ⚠️ Answer quality may be lower for non-English with current English-only embeddings
- ✅ Full quality in all languages after upgrading to multilingual embeddings

---

## Chunking Configuration Guide

### What is Chunking?

Chunking splits large documents into smaller pieces for better retrieval and processing. **Chunk settings significantly impact answer quality!**

### Current Default Settings:
```env
CHUNK_SIZE=800          # ~150-200 words, 2-3 paragraphs
CHUNK_OVERLAP=100       # 12.5% overlap between chunks
```

### How Chunk Size Affects Quality:

| Chunk Size | Best For | Pros | Cons |
|------------|----------|------|------|
| **300-600** | FAQs, snippets, Q&A | Precise retrieval, fast | May fragment ideas |
| **800-1000** | General technical docs | Balanced context/precision | Few (good all-around) |
| **1200-1500** | Dense specs, standards | Complete explanations | Slower retrieval |
| **1500-2000** | Research papers, articles | Preserves narrative | May dilute relevance |

### Why Overlap Matters:

**Without overlap (0):**
```
Chunk 1: "...the solution requires three steps. First, initialize."
Chunk 2: "Second, process the data. Third, validate..."
```
❌ Retrieving Chunk 2 alone misses the first step!

**With overlap (10-20%):**
```
Chunk 1: "...the solution requires three steps. First, initialize."
Chunk 2: "...three steps. First, initialize. Second, process..."
```
✅ Important information appears in multiple chunks!
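
The sliding-window idea behind overlap can be written in a few lines. The sketch below is a simplification that assumes `CHUNK_SIZE` and `CHUNK_OVERLAP` are measured in characters and cuts at fixed positions; real splitters usually prefer to break on paragraph or sentence boundaries, and `chunk_text` is an illustrative name rather than a function from this project.

```python
def chunk_text(text: str, chunk_size: int = 800, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


text = "The solution requires three steps. First, initialize. Second, process. Third, validate."
for chunk in chunk_text(text, chunk_size=40, chunk_overlap=10):
    print(repr(chunk))
```

Each printed chunk starts with the last 10 characters of the previous one, so a sentence that straddles a boundary is still readable in at least one chunk.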

### Recommended Settings by Document Type:

#### **Dense Technical Specifications** (MPEG, ISO, IEEE standards)
```env
CHUNK_SIZE=1200
CHUNK_OVERLAP=200
```
- **Why:** Technical specs need complete multi-paragraph explanations
- **Example:** Algorithm descriptions, performance tables, conformance requirements
- **Impact:** Better context for complex technical questions

#### **Short FAQs / Knowledge Base**
```env
CHUNK_SIZE=500
CHUNK_OVERLAP=75
```
- **Why:** Quick, focused answers without excess context
- **Example:** Troubleshooting guides, quick reference docs
- **Impact:** Faster, more precise retrieval

#### **Long-Form Articles / Research Papers**
```env
CHUNK_SIZE=1500
CHUNK_OVERLAP=300
```
- **Why:** Preserves argument flow and narrative structure
- **Example:** White papers, academic articles, detailed reports
- **Impact:** Maintains logical connections between ideas

#### **Mixed Document Collection** (Recommended)
```env
CHUNK_SIZE=1000
CHUNK_OVERLAP=150
```
- **Why:** Good balance for varied content types
- **Example:** Mix of specs, guides, and reports
- **Impact:** Versatile performance across document types

### How to Adjust Chunking:

1. **Edit `.env` file:**
   ```env
   CHUNK_SIZE=1200
   CHUNK_OVERLAP=200
   ```

2. **Re-ingest your documents:**
   ```powershell
   python rag/ingest.py
   ```

3. **Test with same questions** to compare quality
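
For step 3, a small harness that runs a fixed list of questions through the command-line entry point makes before/after comparisons repeatable. This is a sketch: it assumes `rag/query.py` prints its answer to standard output, and `run_questions` is an illustrative helper, not part of the project.

```python
import subprocess
import sys


def run_questions(questions, base_cmd=(sys.executable, "rag/query.py")):
    """Run each question through the CLI and collect its stdout, keyed by question."""
    answers = {}
    for question in questions:
        # check=True raises CalledProcessError if the query script exits with an error.
        result = subprocess.run([*base_cmd, question],
                                capture_output=True, text=True, check=True)
        answers[question] = result.stdout.strip()
    return answers
```

Run it once before changing the chunk settings and once after re-ingesting, save both dictionaries (for example with `json.dump`), and compare the answers side by side.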

### Chunk Size Impact on Your System:

| Setting | Total Chunks | Retrieval Speed | Context Quality |
|---------|--------------|-----------------|------------------|
| 500/75 | ~45,000 | Fastest | Fragmented |
| 800/100 | ~29,000 | Fast | Good |
| 1000/150 | ~23,000 | Medium | Better |
| 1500/300 | ~15,000 | Slower | Most Complete |

**Rule of Thumb:** Overlap should be 10-20% of chunk size for optimal results.
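
To check a candidate setting against this rule and get a feel for how many chunks it will produce, a quick calculation helps. The sketch below assumes sizes are in characters and that chunks are cut with a plain sliding window; the function names and the 1,000,000-character corpus size are made up for illustration.

```python
import math


def overlap_ratio(chunk_size: int, chunk_overlap: int) -> float:
    """Overlap as a fraction of the chunk size."""
    return chunk_overlap / chunk_size


def estimate_chunks(total_chars: int, chunk_size: int, chunk_overlap: int) -> int:
    """Approximate chunk count for a sliding window over total_chars characters."""
    if total_chars <= chunk_size:
        return 1 if total_chars > 0 else 0
    step = chunk_size - chunk_overlap  # each new chunk adds this many new characters
    return 1 + math.ceil((total_chars - chunk_size) / step)


for size, overlap in [(500, 75), (800, 100), (1000, 150), (1500, 300)]:
    ratio = overlap_ratio(size, overlap)
    print(f"{size}/{overlap}: overlap {ratio:.1%}, "
          f"~{estimate_chunks(1_000_000, size, overlap)} chunks per million characters")
```

The four settings used in this guide come out at 15%, 12.5%, 15%, and 20% overlap, so all of them fall inside the recommended band.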