Newer
Older
A Retrieval-Augmented Generation (RAG) system that lets you query your documents using Qwen models from HuggingFace Transformers locally.
Arthur Delarue
committed
---
Minh Hoang Anh TRAN
committed
## Table of Contents
- [Installation and Setup](#installation-and-setup)
- [Linux Prerequisites](#linux-prerequisites)
- [Step 1: Setup Python Environment](#step-1-setup-python-environment)
- [Step 2: Install Pandoc (Optional)](#step-2-install-pandoc-optional)
- [Step 3: Configure Environment](#step-3-configure-environment)
- [Step 4: Add Your Documents](#step-4-add-your-documents)
- [Step 5: Ingest Documents](#step-5-ingest-documents)
- [Step 6: Start the Frontend](#step-6-start-the-frontend)
Minh Hoang Anh TRAN
committed
- [Command-Line Query (Optional)](#command-line-query-optional)
- [Interactive Mode (Recommended)](#interactive-mode-recommended)
- [Single Query Mode](#single-query-mode)
Minh Hoang Anh TRAN
committed
- [Performance Notes](#performance-notes)
- [CPU vs GPU Mode](#cpu-vs-gpu-mode)
- [Model Recommendations by Hardware](#model-recommendations-by-hardware)
- [Model Upgrade Guide](#model-upgrade-guide)
- [Current Setup (Fast/Testing)](#current-setup-fasttesting)
- [Upgrading to Production Quality](#upgrading-to-production-quality)
- [🏆 Recommended Production Configurations](#-recommended-production-configurations)
- [⚡ Performance Impact Summary](#-performance-impact-summary)
- [🌍 Multilingual Functionality Guide](#-multilingual-functionality-guide)
- [How Each Component Affects Multilingual Support](#how-each-component-affects-multilingual-support)
- [Current System Multilingual Capability](#current-system-multilingual-capability)
- [Upgrading to Full Multilingual Support](#upgrading-to-full-multilingual-support)
- [Testing Multilingual Functionality](#testing-multilingual-functionality)
Minh Hoang Anh TRAN
committed
- [Chunking Configuration Guide](#chunking-configuration-guide)
- [What is Chunking?](#what-is-chunking)
- [Current Default Settings](#current-default-settings)
- [How Chunk Size Affects Quality](#how-chunk-size-affects-quality)
- [Why Overlap Matters](#why-overlap-matters)
- [Recommended Settings by Document Type](#recommended-settings-by-document-type)
- [How to Adjust Chunking](#how-to-adjust-chunking)
- [Chunk Size Impact on Your System](#chunk-size-impact-on-your-system)
---
Arthur Delarue
committed
### Linux Prerequisites
Arthur Delarue
committed
**For Ubuntu/Debian-based distributions:**
```bash
# Update package list
sudo apt update
Arthur Delarue
committed
# Install Python 3.10+ and pip
sudo apt install python3 python3-pip python3-venv
Arthur Delarue
committed
# Install development tools (required for some Python packages)
sudo apt install build-essential python3-dev
**For Fedora/RHEL/CentOS:**
# Install Python 3.10+ and pip
sudo dnf install python3 python3-pip
Arthur Delarue
committed
# Install development tools
sudo dnf groupinstall "Development Tools"
sudo dnf install python3-devel
Arthur Delarue
committed
```
**For Arch Linux:**
# Install Python and pip
sudo pacman -S python python-pip
Arthur Delarue
committed
# Install base development tools
sudo pacman -S base-devel
Arthur Delarue
committed
```
**Verify Python installation:**
python3 --version # Should be 3.10 or higher
pip3 --version
Arthur Delarue
committed
```
### Step 1: Setup Python Environment
python -m venv .venv
.\.venv\Scripts\Activate.ps1
```
**Note:** If you get an execution policy error:
```powershell
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
python3 -m venv .venv
source .venv/bin/activate
**Install dependencies (all platforms):**
pip install -r requirements.txt
Arthur Delarue
committed
```
**Note:** The first time you run a query, the Qwen model (~28GB for 14B model) will download automatically to `~/.cache/huggingface/`. This may take some time depending on your internet connection.
### Step 2: Install Pandoc (Optional)
Arthur Delarue
committed
Only needed if you have OpenDocument (.odt) files:
Arthur Delarue
committed
Arthur Delarue
committed
```powershell
Arthur Delarue
committed
```
Arthur Delarue
committed
Arthur Delarue
committed
```
**Linux:**
# Ubuntu/Debian
sudo apt update
sudo apt install pandoc
Arthur Delarue
committed
# Fedora/RHEL/CentOS
sudo dnf install pandoc
# Arch Linux
sudo pacman -S pandoc
Arthur Delarue
committed
```
Arthur Delarue
committed
### Step 3: Configure Environment
Arthur Delarue
committed
Arthur Delarue
committed
```powershell
Arthur Delarue
committed
**macOS/Linux:**
```bash
cp .env.example .env
Arthur Delarue
committed
```
Arthur Delarue
committed
```env
LLM_PROVIDER=transformers
TRANSFORMERS_MODEL=Qwen/Qwen2.5-14B-Instruct
MAX_NEW_TOKENS=4096
TEMPERATURE=0
LLM_SEED=42
QUANTIZATION=4bit
# Device Configuration (auto, cuda, or cpu)
LLM_DEVICE=auto
EMBEDDING_DEVICE=cuda
RERANKER_DEVICE=cuda
# Embedding Configuration
EMBEDDING_PROVIDER=huggingface
EMBEDDING_MODEL=intfloat/multilingual-e5-large-instruct
TOP_N_RERANK=8
CHUNK_OVERLAP=100
**Note:** If using a lower-spec computer, change to `TRANSFORMERS_MODEL=Qwen/Qwen2.5-7B-Instruct` for faster performance. If you don't have a GPU, set all device settings to `cpu`.
### Step 4: Add Your Documents
Arthur Delarue
committed
Place your documents (Word, PDF, PowerPoint, Text, Markdown, etc.) in the `docs/` folder.
Arthur Delarue
committed
### Step 5: Ingest Documents
Arthur Delarue
committed
Run the ingestion script to process your documents:
Arthur Delarue
committed
python rag\ingest.py
```
**macOS/Linux:**
```bash
python rag/ingest.py
This will:
- Extract ZIP files automatically
- Load and process all documents
- Generate embeddings
- Store vectors in the database
### Step 6: Start the Frontend
**macOS/Linux:**
```bash
python frontend/app.py
The server will start at: **http://127.0.0.1:8000**
Open this URL in your browser to start querying your documents!
### Interactive Mode (Recommended)
For multiple queries without reloading the model each time:
**macOS/Linux:**
```bash
python rag/query_interactive.py
```
python rag\query_interactive.py
```
This loads the model **once** and keeps it in memory. You can then ask multiple questions without the 15-second checkpoint loading delay.
**Example session:**
Query: What is V-PCC?
[Answer streams in real-time...]
Query: How does it compare to G-PCC?
[Answer streams immediately - no reload!]
Query: quit
```
### Single Query Mode
For one-off queries from the command line:
**macOS/Linux:**
```bash
python rag/query.py "Your question here"
**Windows:**
```powershell
python rag\query.py "Your question here"
```
**Note:** This reloads the model each time (~15s startup)
Arthur Delarue
committed
---
Arthur Delarue
committed
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
The system can run on either CPU or GPU for optimal performance. You can configure which device each component uses in your `.env` file:
```env
# Device configuration
# Options: auto (auto-detect GPU), cuda (force GPU), cpu (force CPU)
LLM_DEVICE=auto # Qwen language model
EMBEDDING_DEVICE=cuda # Document/query embeddings
RERANKER_DEVICE=cuda # Re-ranking model
```
**Device Options:**
- `auto` - Automatically detects and uses GPU if available (recommended for LLM)
- `cuda` - Forces GPU usage (fastest, requires NVIDIA GPU with CUDA)
- `cpu` - Forces CPU usage (slower but works on any computer)
**Performance Comparison (14B model):**
- **GPU Mode (cuda):** Fast responses (1-2 seconds)
- **CPU-Only Mode (cpu):** Slower responses (8-15 seconds) but works on any computer
- **Auto Mode (auto):** Best of both worlds - uses GPU if available, falls back to CPU
**Recommended Configurations:**
*For systems with NVIDIA GPU:*
```env
LLM_DEVICE=auto # Use GPU if available
EMBEDDING_DEVICE=cuda # Embeddings are 10-50x faster on GPU
RERANKER_DEVICE=cuda # Re-ranking is faster on GPU
```
*For CPU-only systems (no GPU):*
```env
LLM_DEVICE=cpu
EMBEDDING_DEVICE=cpu
RERANKER_DEVICE=cpu
```
*For systems with limited GPU memory:*
```env
LLM_DEVICE=cpu # Save GPU memory
EMBEDDING_DEVICE=cuda # Embeddings use less memory
RERANKER_DEVICE=cpu # Only when needed
```
**Note:** After changing device settings, restart the application for changes to take effect. Re-ingestion is not required unless you change `EMBEDDING_DEVICE` after already ingesting documents.
Arthur Delarue
committed
| RAM Available | Recommended Model | CPU Query Time | Quality |
|---------------|-------------------|----------------|----------|
| 8GB | qwen2.5:7b-instruct | 3-5 seconds | Good |
| 16GB+ | qwen2.5:14b-instruct | 8-15 seconds | Excellent |
| 32GB+ | qwen2.5:32b | 30-60 seconds | Best |
Arthur Delarue
committed
**Note:** These times are for CPU-only mode. GPU mode is 6-10x faster.
Minh Hoang Anh TRAN
committed
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
---
## Model Upgrade Guide
### Current Setup (Fast/Testing)
Your system is currently configured for **speed and testing**:
- **Embedding:** `sentence-transformers/all-MiniLM-L6-v2` (384-dim, very fast)
- **Reranker:** `BAAI/bge-reranker-base` (good quality)
- **LLM:** `qwen2.5:14b-instruct` (excellent balance)
### Upgrading to Production Quality
#### 🚀 **Embedding Model Upgrades**
**Current:** `sentence-transformers/all-MiniLM-L6-v2` (384-dim)
- Speed: ⚡⚡⚡⚡⚡ Very Fast (5x faster than BGE-large)
- Quality: ⭐⭐⭐ Good
- Use case: Testing, prototyping, fast iterations
**Option 1 - Balanced:** `BAAI/bge-base-en-v1.5` (768-dim)
- Speed: ⚡⚡⚡⚡ Fast (2x faster than BGE-large)
- Quality: ⭐⭐⭐⭐ Very Good
- Use case: Production with good performance/quality balance
- **Recommended for most users**
**Option 2 - Best Quality:** `BAAI/bge-large-en-v1.5` (1024-dim)
- Speed: ⚡⚡⚡ Moderate
- Quality: ⭐⭐⭐⭐⭐ Excellent
- Use case: Production where quality is critical
- Trade-off: Slower ingestion (but queries remain fast)
**Option 3 - Multilingual:** `BAAI/bge-m3` (1024-dim)
- Speed: ⚡⚡⚡ Moderate
- Quality: ⭐⭐⭐⭐⭐ Excellent
- Use case: Multi-language documents (100+ languages)
- Supports: Chinese, French, Spanish, German, etc.
**To upgrade embedding model:**
```env
# In .env file, change:
EMBEDDING_MODEL=BAAI/bge-base-en-v1.5 # or bge-large-en-v1.5
```
Then re-run: `python rag/ingest.py`
#### 🎯 **Reranker Model Upgrades**
**Current:** `BAAI/bge-reranker-base` (278M params)
- Speed: ⚡⚡⚡⚡ Fast
- Quality: ⭐⭐⭐⭐ Very Good
- Already excellent for most use cases
**Option 1 - Higher Quality:** `BAAI/bge-reranker-large` (560M params)
- Speed: ⚡⚡⚡ Moderate
- Quality: ⭐⭐⭐⭐⭐ Excellent
- Use case: When answer quality is critical
- Trade-off: 2x slower reranking (still fast overall)
**Option 2 - Best Available:** `BAAI/bge-reranker-v2-m3` (568M params)
- Speed: ⚡⚡⚡ Moderate
- Quality: ⭐⭐⭐⭐⭐ State-of-the-art
- Use case: Maximum accuracy, multilingual support
- Supports: 100+ languages
**To upgrade reranker:**
```env
# In .env file, change:
RERANKER_MODEL=BAAI/bge-reranker-large # or bge-reranker-v2-m3
```
No re-ingestion needed, changes apply immediately!
#### 🤖 **LLM Model Upgrades**
**Current:** `qwen2.5:14b-instruct` (14B params, 8GB VRAM/16GB RAM)
- Speed: ⚡⚡⚡⚡ Fast
- Quality: ⭐⭐⭐⭐ Excellent
- Already very good for most tasks
**Option 1 - More Capable:** `qwen2.5:32b-instruct` (32B params, 20GB VRAM/32GB RAM)
- Speed: ⚡⚡⚡ Moderate (2x slower)
- Quality: ⭐⭐⭐⭐⭐ Outstanding
- Use case: Complex reasoning, technical documents
- Requirements: 32GB+ RAM recommended
**Option 2 - Maximum Quality:** `Qwen/Qwen2.5-72B-Instruct` (72B params, 48GB VRAM/64GB RAM)
Minh Hoang Anh TRAN
committed
- Speed: ⚡⚡ Slow (5x slower)
- Quality: ⭐⭐⭐⭐⭐ Best available
- Use case: Research, critical analysis, highest accuracy
- Requirements: 64GB+ RAM, powerful hardware
**Option 3 - Faster Lightweight:** `Qwen/Qwen2.5-7B-Instruct` (7B params, 4GB VRAM/8GB RAM)
Minh Hoang Anh TRAN
committed
- Speed: ⚡⚡⚡⚡⚡ Very Fast (2x faster)
- Quality: ⭐⭐⭐ Good
- Use case: Low-end hardware, quick responses
**To upgrade LLM:**
Minh Hoang Anh TRAN
committed
# Update .env
TRANSFORMERS_MODEL=Qwen/Qwen2.5-32B-Instruct
Minh Hoang Anh TRAN
committed
```
The new model will download automatically on first use. No re-ingestion needed!
Minh Hoang Anh TRAN
committed
### 🏆 **Recommended Production Configurations**
#### **Configuration 1: Balanced (Recommended)**
```env
EMBEDDING_MODEL=BAAI/bge-base-en-v1.5
RERANKER_MODEL=BAAI/bge-reranker-base
TRANSFORMERS_MODEL=Qwen/Qwen2.5-14B-Instruct
Minh Hoang Anh TRAN
committed
```
- **Speed:** Fast
- **Quality:** Very Good
- **Hardware:** 16GB RAM minimum
- **Best for:** Most production use cases
#### **Configuration 2: Maximum Quality**
```env
EMBEDDING_MODEL=BAAI/bge-large-en-v1.5
RERANKER_MODEL=BAAI/bge-reranker-v2-m3
TRANSFORMERS_MODEL=Qwen/Qwen2.5-32B-Instruct
Minh Hoang Anh TRAN
committed
```
- **Speed:** Moderate
- **Quality:** Excellent
- **Hardware:** 32GB RAM minimum
- **Best for:** Critical applications, research
#### **Configuration 3: Fast & Efficient (Current)**
```env
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
RERANKER_MODEL=BAAI/bge-reranker-base
TRANSFORMERS_MODEL=Qwen/Qwen2.5-14B-Instruct
Minh Hoang Anh TRAN
committed
```
- **Speed:** Very Fast
- **Quality:** Good
- **Hardware:** 16GB RAM minimum
- **Best for:** Testing, development, rapid iteration
#### **Configuration 4: Multilingual**
```env
EMBEDDING_MODEL=BAAI/bge-m3
RERANKER_MODEL=BAAI/bge-reranker-v2-m3
TRANSFORMERS_MODEL=Qwen/Qwen2.5-14B-Instruct
Minh Hoang Anh TRAN
committed
```
- **Speed:** Moderate
- **Quality:** Excellent
- **Hardware:** 16GB RAM minimum
- **Best for:** Multi-language document collections
### ⚡ **Performance Impact Summary**
| Component | Affects | Re-ingestion Required? |
|-----------|---------|------------------------|
| Embedding Model | Ingestion speed, retrieval quality | ✅ Yes |
| Reranker Model | Query speed (minimal), answer quality | ❌ No |
| LLM Model | Response generation speed/quality | ❌ No |
**Note:** Upgrading embedding model requires re-running `python rag/ingest.py` to rebuild the vector database with new embeddings.
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
---
## 🌍 Multilingual Functionality Guide
The chatbot **automatically responds in the language you use** to ask questions (English, French, Spanish, etc.). However, **each model component affects multilingual quality differently**:
### How Each Component Affects Multilingual Support
#### **1. Embedding Model - CRITICAL for Multilingual Retrieval** 🔴
**Impact:** Determines if your question in ANY language can find relevant documents
**Current Model:** `sentence-transformers/all-MiniLM-L6-v2`
- ⚠️ **English-only optimized**
- Non-English queries will retrieve less relevant documents
- Works for English, poor for French/Spanish/other languages
**Recommended for Multilingual:**
```env
EMBEDDING_MODEL=BAAI/bge-m3
# or
EMBEDDING_MODEL=sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
```
**Why it matters:**
- French question → English-focused embeddings → retrieves wrong documents → LLM gets irrelevant context → poor answer **even if LLM speaks French**
- Multilingual embeddings → retrieves correct documents in any language → LLM gets relevant context → excellent answer
**⚠️ Requires re-ingestion:** YES - `python rag/ingest.py`
---
#### **2. Reranker Model - Important for Multilingual Precision** 🟡
**Impact:** Refines which documents are most relevant to your question
**Current Model:** `BAAI/bge-reranker-base`
- ⚠️ **English-focused**
- Can rerank, but less accurate for non-English queries
**Recommended for Multilingual:**
```env
RERANKER_MODEL=BAAI/bge-reranker-v2-m3
```
**Why it matters:**
- Even if embeddings retrieve 10 good multilingual documents, English-only reranker might rank them poorly
- Multilingual reranker correctly identifies the most relevant chunks in any language
**⚠️ Requires re-ingestion:** NO - just update `.env` and restart
---
#### **3. LLM (Text Generation Model) - Determines Answer Language** 🟢
**Impact:** Generates the actual response in the target language
**Current Model:** `qwen2.5:14b-instruct`
- ✅ **Excellent multilingual support** (100+ languages)
- Strong in: English, Chinese, French, Spanish, German, Japanese, Korean, Arabic, and more
- The prompt automatically instructs it to respond in the question's language
**Alternative Multilingual LLMs:**
```env
# In .env file
TRANSFORMERS_MODEL=Qwen/Qwen2.5-14B-Instruct # Excellent for 100+ languages
TRANSFORMERS_MODEL=Qwen/Qwen2.5-32B-Instruct # Best multilingual quality
# Other alternatives:
# TRANSFORMERS_MODEL=meta-llama/Llama-3.1-8B-Instruct # Good for European languages
```
**Why it matters:**
- Even with perfect retrieval, if LLM doesn't support the language, answers will be poor or in wrong language
- Qwen models are already excellent for multilingual - upgrading mainly improves reasoning depth
---
### Current System Multilingual Capability
| Component | Current Model | Multilingual? | Impact on Non-English |
|-----------|---------------|---------------|------------------------|
| **Embedding** | all-MiniLM-L6-v2 | ❌ English-only | 🔴 **Poor retrieval** for non-English questions |
| **Reranker** | bge-reranker-base | ⚠️ English-focused | 🟡 **Suboptimal ranking** for non-English |
| **LLM** | Qwen2.5-14B-Instruct | ✅ Excellent | ✅ **Perfect responses** in any language |
**Result:** The LLM **CAN respond** in French/Spanish/etc., but will work with **lower-quality context** retrieved by English-only embeddings.
---
### Upgrading to Full Multilingual Support
**Recommended Configuration:**
```env
# In .env file
EMBEDDING_MODEL=BAAI/bge-m3
RERANKER_MODEL=BAAI/bge-reranker-v2-m3
TRANSFORMERS_MODEL=Qwen/Qwen2.5-14B-Instruct
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
```
**Steps:**
1. Update `.env` with multilingual models
2. Re-ingest documents: `python rag/ingest.py` (required for embedding change)
3. Restart frontend/queries
**Benefits:**
- ✅ Excellent retrieval for questions in **any language**
- ✅ Accurate reranking regardless of language
- ✅ High-quality answers in **100+ languages**
**Trade-offs:**
- Slightly slower (BGE-m3 is ~2x slower than all-MiniLM-L6-v2)
- Larger model downloads (~3GB vs 90MB)
---
### Testing Multilingual Functionality
```powershell
# English
python rag/query.py "What are the latest V-PCC compression results?"
# French
python rag/query.py "Quels sont les derniers résultats de compression V-PCC ?"
# Spanish
python rag/query.py "¿Cuáles son los últimos resultados de compresión V-PCC?"
```
**Expected behavior:**
- ✅ LLM responds in the correct language (works with current setup)
- ⚠️ Answer quality may be lower for non-English with current English-only embeddings
- ✅ Full quality in all languages after upgrading to multilingual embeddings
Minh Hoang Anh TRAN
committed
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
---
## Chunking Configuration Guide
### What is Chunking?
Chunking splits large documents into smaller pieces for better retrieval and processing. **Chunk settings significantly impact answer quality!**
### Current Default Settings:
```env
CHUNK_SIZE=800 # ~150-200 words, 2-3 paragraphs
CHUNK_OVERLAP=100 # 12.5% overlap between chunks
```
### How Chunk Size Affects Quality:
| Chunk Size | Best For | Pros | Cons |
|------------|----------|------|------|
| **300-600** | FAQs, snippets, Q&A | Precise retrieval, fast | May fragment ideas |
| **800-1000** | General technical docs | Balanced context/precision | Good all-around |
| **1200-1500** | Dense specs, standards | Complete explanations | Slower retrieval |
| **1500-2000** | Research papers, articles | Preserves narrative | May dilute relevance |
### Why Overlap Matters:
**Without overlap (0):**
```
Chunk 1: "...the solution requires three steps. First,"
Chunk 2: "Second, process the data. Third, validate..."
```
❌ Retrieving Chunk 2 misses "First" step!
**With overlap (10-20%):**
```
Chunk 1: "...the solution requires three steps. First, initialize."
Chunk 2: "...three steps. First, initialize. Second, process..."
```
✅ Important information appears in multiple chunks!
### Recommended Settings by Document Type:
#### **Dense Technical Specifications** (MPEG, ISO, IEEE standards)
```env
CHUNK_SIZE=1200
CHUNK_OVERLAP=200
```
- **Why:** Technical specs need complete multi-paragraph explanations
- **Example:** Algorithm descriptions, performance tables, conformance requirements
- **Impact:** Better context for complex technical questions
#### **Short FAQs / Knowledge Base**
```env
CHUNK_SIZE=500
CHUNK_OVERLAP=75
```
- **Why:** Quick, focused answers without excess context
- **Example:** Troubleshooting guides, quick reference docs
- **Impact:** Faster, more precise retrieval
#### **Long-Form Articles / Research Papers**
```env
CHUNK_SIZE=1500
CHUNK_OVERLAP=300
```
- **Why:** Preserves argument flow and narrative structure
- **Example:** White papers, academic articles, detailed reports
- **Impact:** Maintains logical connections between ideas
#### **Mixed Document Collection** (Recommended)
```env
CHUNK_SIZE=1000
CHUNK_OVERLAP=150
```
- **Why:** Good balance for varied content types
- **Example:** Mix of specs, guides, and reports
- **Impact:** Versatile performance across document types
### How to Adjust Chunking:
1. **Edit `.env` file:**
```env
CHUNK_SIZE=1200
CHUNK_OVERLAP=200
```
2. **Re-ingest your documents:**
```powershell
python rag/ingest.py
```
3. **Test with same questions** to compare quality
### Chunk Size Impact on Your System:
| Setting | Total Chunks | Retrieval Speed | Context Quality |
|---------|--------------|-----------------|------------------|
| 500/75 | ~45,000 | Fastest | Fragmented |
| 800/100 | ~29,000 | Fast | Good |
| 1000/150 | ~23,000 | Medium | Better |
| 1500/300 | ~15,000 | Slower | Most Complete |
**Rule of Thumb:** Overlap should be 10-20% of chunk size for optimal results.