# RAG System with Qwen

A Retrieval-Augmented Generation (RAG) system that lets you query your documents using Ollama and Qwen models locally.

---

## Installation and Setup

### Step 1: Install Ollama

**Windows (via Winget):**
```powershell
winget install Ollama.Ollama -e
```

Verify installation:
```powershell
ollama --version
```

On Windows, Ollama normally starts in the background automatically after installation. If it is not running, start it manually:
```powershell
ollama serve
```

**macOS (via Homebrew):**
```bash
brew install ollama
```

Verify installation:
```bash
ollama --version
```

Start the Ollama server:
```bash
ollama serve
```

**macOS (Manual Download):**
Download from [https://ollama.ai/download](https://ollama.ai/download) and install the .dmg file.
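
On any platform, you can confirm that the server is reachable before moving on. The snippet below is a minimal sketch that assumes Ollama listens on its default address, `http://localhost:11434`; the helper name `is_ollama_running` is illustrative and not part of this project.

```python
import urllib.error
import urllib.request


def is_ollama_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("Ollama reachable:", is_ollama_running())
```

If this prints `False`, start the server with `ollama serve` and try again.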

### Step 2: Pull Required Ollama Models

**LLM Model (for answering queries):**
```bash
ollama pull qwen2.5:14b-instruct
```

**Embedding Model (for semantic search):**
```bash
ollama pull mxbai-embed-large
```

**Note for low-end computers:** The 14b model requires ~16GB RAM. If you have less RAM, use:
```bash
ollama pull qwen2.5:7b-instruct  # Requires ~8GB RAM
```
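
To check that both pulls completed, you can compare the required model names with what the server reports at its `/api/tags` endpoint. This is a sketch under the assumption that the server runs at the default address; `REQUIRED_MODELS`, `normalize`, `missing_models`, and `fetch_tags` are hypothetical names chosen for illustration.

```python
import json
import urllib.request

REQUIRED_MODELS = ["qwen2.5:14b-instruct", "mxbai-embed-large"]


def normalize(name: str) -> str:
    """Ollama lists untagged pulls as '<name>:latest'; make names comparable."""
    return name if ":" in name else f"{name}:latest"


def missing_models(tags_payload: dict, required: list[str]) -> list[str]:
    """Return the required models that do not appear in an /api/tags payload."""
    installed = {normalize(model["name"]) for model in tags_payload.get("models", [])}
    return [name for name in required if normalize(name) not in installed]


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the list of locally available models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return json.load(response)
```

With the server running, `missing_models(fetch_tags(), REQUIRED_MODELS)` should return an empty list once both models are available. If you pulled the 7b model instead, adjust `REQUIRED_MODELS` accordingly.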

### Step 3: Enable CPU-Only Mode (For Low-End Computers)

**If you have a low-end computer or insufficient GPU memory**, force Ollama to run on CPU only:

**Windows:**
```powershell
[System.Environment]::SetEnvironmentVariable('OLLAMA_NUM_GPU', '0', 'User')
$env:OLLAMA_NUM_GPU = '0'
```

**macOS/Linux:**
```bash
echo 'export OLLAMA_NUM_GPU=0' >> ~/.bashrc  # or ~/.zshrc for zsh
source ~/.bashrc  # or source ~/.zshrc
```

Restart your terminal and Ollama after setting this so that the new value is picked up. Responses will be slower, but no GPU is required.

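**To re-enable GPU later (if you upgrade hardware):**
```powershell
# Windows - remove the variable so Ollama returns to its default behavior
[System.Environment]::SetEnvironmentVariable('OLLAMA_NUM_GPU', $null, 'User')
Remove-Item Env:OLLAMA_NUM_GPU -ErrorAction SilentlyContinue
```
```bash
# macOS/Linux - remove the export line from ~/.bashrc or ~/.zshrc, then clear it in the current shell
unset OLLAMA_NUM_GPU
```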
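
As an alternative to a machine-wide setting, Ollama's HTTP API accepts a `num_gpu` option per request, which sets how many model layers are offloaded to the GPU. The sketch below shows how a request could ask for CPU-only inference. Whether this project's scripts pass such options is not stated here, so treat `build_generate_payload` and `generate` as hypothetical helpers.

```python
import json
import urllib.request


def build_generate_payload(prompt: str, model: str = "qwen2.5:14b-instruct",
                           cpu_only: bool = False) -> dict:
    """Build the JSON body for a non-streaming /api/generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if cpu_only:
        # num_gpu is the number of layers offloaded to the GPU; 0 keeps all of them on the CPU.
        payload["options"] = {"num_gpu": 0}
    return payload


def generate(prompt: str, base_url: str = "http://localhost:11434", **kwargs) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    body = json.dumps(build_generate_payload(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.load(response)["response"]
```

For example, `generate("Hello", cpu_only=True)` would request a CPU-only run for that single call and leave other requests unaffected.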

### Step 4: Install Pandoc (Optional)

Only needed if you have OpenDocument (.odt) files:

**Windows:**
```powershell
winget install --id JohnMacFarlane.Pandoc -e
```

**macOS:**
```bash
brew install pandoc
```

### Step 5: Setup Python Environment

**Windows:**
```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
```

**Note:** If you get an execution policy error:
```powershell
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
```

**macOS/Linux:**
```bash
python3 -m venv .venv
source .venv/bin/activate
```

**Install dependencies (all platforms):**
```bash
pip install -r requirements.txt
```

### Step 6: Configure Environment

**Windows:**
```powershell
Copy-Item .env.example .env
```

**macOS/Linux:**
```bash
cp .env.example .env
```

Edit `.env` with your settings:
```env
# Model Configuration
OLLAMA_MODEL=qwen2.5:14b-instruct
OLLAMA_BASE_URL=http://localhost:11434
EMBEDDING_MODEL=mxbai-embed-large

# Retrieval Settings
RETRIEVAL_CHUNKS=100
TOP_N_RERANK=15
USE_RERANKING=true

# Document Processing
CHUNK_SIZE=800
CHUNK_OVERLAP=160
```

**Note:** If you are using the 7b model on a low-end computer, set `OLLAMA_MODEL=qwen2.5:7b-instruct`.
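
To illustrate how these settings might be consumed, here is a minimal sketch of a typed loader with basic validation. It assumes the values from `.env` have already been exported into the process environment. The `Settings` class and `load_settings` function are hypothetical and do not describe how this project actually reads its configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    ollama_model: str
    ollama_base_url: str
    embedding_model: str
    retrieval_chunks: int
    top_n_rerank: int
    use_reranking: bool
    chunk_size: int
    chunk_overlap: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from env, falling back to the defaults shown in .env.example."""
    settings = Settings(
        ollama_model=env.get("OLLAMA_MODEL", "qwen2.5:14b-instruct"),
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        embedding_model=env.get("EMBEDDING_MODEL", "mxbai-embed-large"),
        retrieval_chunks=int(env.get("RETRIEVAL_CHUNKS", "100")),
        top_n_rerank=int(env.get("TOP_N_RERANK", "15")),
        use_reranking=env.get("USE_RERANKING", "true").strip().lower() in {"1", "true", "yes"},
        chunk_size=int(env.get("CHUNK_SIZE", "800")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "160")),
    )
    # An overlap as large as the chunk itself would prevent chunking from advancing.
    if not 0 <= settings.chunk_overlap < settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be non-negative and smaller than CHUNK_SIZE")
    # Reranking can only keep as many chunks as retrieval returned.
    if settings.top_n_rerank > settings.retrieval_chunks:
        raise ValueError("TOP_N_RERANK must not exceed RETRIEVAL_CHUNKS")
    return settings
```

Validating early like this surfaces a mistyped value at startup rather than partway through ingestion.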

### Step 7: Add Your Documents

Place your documents (Word, PDF, PowerPoint, Text, Markdown, etc.) in the `docs/` folder.

### Step 8: Ingest Documents

Run the ingestion script to process your documents:

**Windows:**
```powershell
python rag\ingest.py
```

**macOS/Linux:**
```bash
python rag/ingest.py
```

This will:
- Extract ZIP files automatically
- Load and process all documents
- Generate embeddings
- Store vectors in the database
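
For intuition about the `CHUNK_SIZE` and `CHUNK_OVERLAP` settings from Step 6, the following sketch splits text into overlapping windows. It assumes both values are measured in characters, which this README does not confirm; the real ingestion script may count tokens or split on sentence boundaries instead. `chunk_text` is a hypothetical helper.

```python
def chunk_text(text: str, chunk_size: int = 800, chunk_overlap: int = 160) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = "".join(chr(97 + i % 26) for i in range(2000))
    print([len(chunk) for chunk in chunk_text(sample)])
```

With the default values, each window starts 640 characters after the previous one, so the last 160 characters of one chunk reappear at the start of the next. The overlap reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.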

### Step 9: Start the Frontend

Start the web interface:

**Windows:**
```powershell
python frontend\app.py
```

**macOS/Linux:**
```bash
python frontend/app.py
```

The server will start at: **http://127.0.0.1:8000**

Open this URL in your browser to start querying your documents!

---

## Command-Line Query (Optional)

You can also run queries directly from the command line:

**Windows:**
```powershell
python rag\query.py "Your question here"
```

**macOS/Linux:**
```bash
python rag/query.py "Your question here"
```
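
The `RETRIEVAL_CHUNKS` and `TOP_N_RERANK` settings suggest a two-stage flow: fetch a broad set of candidates by vector similarity, then keep only the best few after a second scoring pass. The project's actual query code is not shown in this README, so the sketch below is a generic illustration only. All function names are hypothetical, and `keyword_overlap` is a toy stand-in for a real reranking model.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Vector, index: Sequence[tuple[str, Vector]], k: int = 100) -> list[str]:
    """Stage 1: return the k chunk texts whose embeddings are closest to the query."""
    ranked = sorted(index, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def rerank(query: str, candidates: Sequence[str],
           score: Callable[[str, str], float], top_n: int = 15) -> list[str]:
    """Stage 2: rescore the candidates against the query text and keep the top_n."""
    return sorted(candidates, key=lambda text: score(query, text), reverse=True)[:top_n]


def keyword_overlap(query: str, text: str) -> float:
    """Toy scorer: the fraction of query words that also appear in the chunk."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & set(text.lower().split())) / len(query_words)
```

Retrieving many candidates cheaply and reranking a short list is a common design because the second scorer is usually more accurate but also more expensive per chunk.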

---

## Performance Notes

### CPU vs GPU Mode
- **GPU Mode (default):** Fast responses (about 1-2 seconds with the 14b model)
- **CPU-Only Mode:** Slower responses (about 8-15 seconds with the 14b model), but no GPU is required

### Model Recommendations by Hardware

| RAM Available | Recommended Model | CPU Query Time | Quality |
|---------------|-------------------|----------------|----------|
| 8GB | qwen2.5:7b-instruct | 3-5 seconds | Good |
| 16GB+ | qwen2.5:14b-instruct | 8-15 seconds | Excellent |
| 32GB+ | qwen2.5:32b | 30-60 seconds | Best |

**Note:** These times are for CPU-only mode. GPU mode is 6-10x faster.
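
The table can be expressed as a small lookup, for example to pick a default for `OLLAMA_MODEL`. This is a sketch with two assumptions that the table does not settle: when several rows apply, the largest model is chosen, and machines with less than 8GB fall back to the smallest listed model. `recommend_model` is a hypothetical helper.

```python
def recommend_model(ram_gb: float) -> str:
    """Map available RAM in GB to a model tag, following the table above."""
    if ram_gb >= 32:
        return "qwen2.5:32b"
    if ram_gb >= 16:
        return "qwen2.5:14b-instruct"
    # Assumption: the table lists nothing below 8GB, so use the smallest listed model.
    return "qwen2.5:7b-instruct"


if __name__ == "__main__":
    for ram in (8, 16, 32):
        print(f"{ram}GB -> {recommend_model(ram)}")
```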