# RAG System with Qwen

A Retrieval-Augmented Generation (RAG) system that lets you query your documents locally using Ollama and Qwen models.

## Installation and Setup
### Step 1: Install Ollama

**Windows:**
```powershell
winget install Ollama.Ollama -e
```

Verify the installation:
```powershell
ollama --version
```

Ollama normally runs in the background automatically after installation. If it is not running, start it manually:
```powershell
ollama serve
```

**macOS (via Homebrew):**
```bash
brew install ollama
```

Verify the installation:
```bash
ollama --version
```

Start the Ollama server:
```bash
ollama serve
```

**macOS (Manual Download):**
Download from [https://ollama.ai/download](https://ollama.ai/download) and install the .dmg file.
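
Since the project itself is Python, you can also check from Python that the Ollama server is reachable. This is a minimal standard-library sketch, assuming Ollama listens on its default address `http://localhost:11434` and exposes `GET /api/version`; the helper `ollama_version` is hypothetical and not part of this repository:

```python
import json
import urllib.request


def ollama_version(base_url: str = "http://localhost:11434", timeout: float = 2.0):
    """Return the Ollama server version string, or None if the server cannot be reached.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    url = base_url.rstrip("/") + "/api/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers urllib's URLError/HTTPError, refused connections and timeouts;
        # ValueError covers a malformed URL or a body that is not valid JSON.
        return None
    return payload.get("version") if isinstance(payload, dict) else None


if __name__ == "__main__":
    version = ollama_version()
    if version is None:
        print("Ollama is not reachable; start it with `ollama serve`.")
    else:
        print(f"Ollama {version} is running.")
```

Run it as a script: if it reports that Ollama is not reachable, start the server as shown above.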
### Step 2: Pull Required Ollama Models

**LLM Model (for answering queries):**
```bash
ollama pull qwen2.5:14b-instruct
```

**Embedding Model (for semantic search):**
```bash
ollama pull mxbai-embed-large
```

**Note for low-end computers:** The 14b model requires about 16 GB of RAM. If you have less, pull the 7b model instead:
```bash
ollama pull qwen2.5:7b-instruct  # Requires ~8 GB RAM
```
### Step 3: Enable CPU-Only Mode (For Low-End Computers)

**If you have a low-end computer or insufficient GPU memory**, force Ollama to run on the CPU only:

**Windows:**
```powershell
[System.Environment]::SetEnvironmentVariable('OLLAMA_NUM_GPU', '0', 'User')
$env:OLLAMA_NUM_GPU = '0'
```

**macOS/Linux:**
```bash
echo 'export OLLAMA_NUM_GPU=0' >> ~/.bashrc  # or ~/.zshrc for zsh
source ~/.bashrc  # or source ~/.zshrc
```

Restart your terminal after setting this, and restart Ollama if it is already running, since a running process does not see newly set environment variables. Responses will be slower, but no GPU is required.

**To re-enable the GPU later (if you upgrade your hardware):**
```powershell
# Windows: remove the variable to restore the default behavior
[System.Environment]::SetEnvironmentVariable('OLLAMA_NUM_GPU', $null, 'User')
```

```bash
# macOS/Linux: remove the export line from ~/.bashrc or ~/.zshrc, then run:
unset OLLAMA_NUM_GPU
```
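
If you call Ollama from your own Python code, CPU-only inference can also be requested per call: Ollama's REST API accepts an `options` object, and its `num_gpu` option sets how many model layers are offloaded to the GPU, so `0` keeps the model on the CPU. The sketch below builds such a `/api/generate` request with the standard library. Whether this project passes options this way is not stated here, and `build_cpu_only_request` and `generate` are hypothetical names:

```python
import json
import urllib.request


def build_cpu_only_request(prompt: str,
                           model: str = "qwen2.5:14b-instruct",
                           base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build (but do not send) a non-streaming /api/generate request with no GPU offload."""
    body = {
        "model": model,
        "prompt": prompt,
        "stream": False,            # ask for a single JSON object instead of a stream
        "options": {"num_gpu": 0},  # offload 0 layers to the GPU, i.e. CPU only
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    # Sends the request; needs a running Ollama server with the model pulled.
    with urllib.request.urlopen(build_cpu_only_request(prompt), timeout=300) as resp:
        return json.load(resp)["response"]
```

Building the request separately from sending it keeps the payload easy to inspect without a running server.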
### Step 4: Install Pandoc (Optional)

Only needed if you have OpenDocument (.odt) files:

**Windows:**
```powershell
winget install --id JohnMacFarlane.Pandoc -e
```

**macOS:**
```bash
brew install pandoc
```
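
How the ingestion code invokes Pandoc is not described in this README. As an illustration only, converting an `.odt` file to plain text through the Pandoc command line could look like this; `pandoc_command` and `odt_to_text` are hypothetical helpers:

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(odt_path: Path) -> list:
    """Build a pandoc command that prints an .odt file as plain text on stdout."""
    return ["pandoc", "--from=odt", "--to=plain", str(odt_path)]


def odt_to_text(odt_path: Path) -> str:
    # Fail early with a clear message if Step 4 was skipped.
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH; install it to read .odt files")
    result = subprocess.run(
        pandoc_command(odt_path),
        capture_output=True,
        encoding="utf-8",  # pandoc emits UTF-8 regardless of the OS locale
        check=True,        # raise CalledProcessError if pandoc fails
    )
    return result.stdout
```

Without `-o`, Pandoc writes the converted text to standard output, which `subprocess.run` captures.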
### Step 5: Set Up the Python Environment

**Windows:**
```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
```

**Note:** If you get an execution policy error, run the following and then activate the environment again:
```powershell
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
```

**macOS/Linux:**
```bash
python3 -m venv .venv
source .venv/bin/activate
```

**Install dependencies (all platforms):**
```bash
pip install -r requirements.txt
```
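
To check that the `.venv` interpreter is the one actually running (for example before `pip install`), a standard-library check is enough: inside an environment created by `venv`, `sys.prefix` points at the environment while `sys.base_prefix` still points at the base installation. The script below is a hypothetical helper, not a file in this repository:

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv: sys.prefix differs from sys.base_prefix."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active" if in_virtualenv() else "no virtual environment active")
    print("interpreter:", sys.executable)
```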
### Step 6: Configure Environment

**Windows:**
```powershell
Copy-Item .env.example .env
```

**macOS/Linux:**
```bash
cp .env.example .env
```

Edit `.env` with your settings:
```bash
# Model Configuration
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=qwen2.5:14b-instruct
EMBEDDING_MODEL=mxbai-embed-large

# Retrieval Settings
RETRIEVAL_CHUNKS=100
TOP_N_RERANK=15
USE_RERANKING=true

# Document Processing
CHUNK_SIZE=800
CHUNK_OVERLAP=160
```

**Note:** If you use the 7b model on a low-end computer, change it to `OLLAMA_MODEL=qwen2.5:7b-instruct`.
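
The settings file is plain `KEY=VALUE` text. This README does not say how the project loads it (a library such as python-dotenv is common), so the following is only a standard-library sketch of reading `.env` into typed values. The defaults mirror the example settings, `OLLAMA_MODEL` is assumed to default to the 14b model from Step 2, and the two sanity checks are assumptions about sensible values rather than documented rules:

```python
from pathlib import Path

# Assumed defaults mirroring the example .env; OLLAMA_MODEL's default is an assumption.
DEFAULTS = {
    "OLLAMA_BASE_URL": "http://localhost:11434",
    "OLLAMA_MODEL": "qwen2.5:14b-instruct",
    "EMBEDDING_MODEL": "mxbai-embed-large",
    "RETRIEVAL_CHUNKS": "100",
    "TOP_N_RERANK": "15",
    "USE_RERANKING": "true",
    "CHUNK_SIZE": "800",
    "CHUNK_OVERLAP": "160",
}
INT_KEYS = ("RETRIEVAL_CHUNKS", "TOP_N_RERANK", "CHUNK_SIZE", "CHUNK_OVERLAP")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(path: str = ".env") -> dict:
    """Merge .env over the defaults and convert numeric and boolean settings."""
    raw = dict(DEFAULTS)
    env_file = Path(path)
    if env_file.exists():
        raw.update(parse_env(env_file.read_text(encoding="utf-8")))
    settings = dict(raw)
    for key in INT_KEYS:
        settings[key] = int(raw[key])
    settings["USE_RERANKING"] = raw["USE_RERANKING"].lower() in {"1", "true", "yes"}
    # Assumed sanity checks: overlapping more than a whole chunk, or reranking
    # more chunks than were retrieved, would not make sense.
    if settings["CHUNK_OVERLAP"] >= settings["CHUNK_SIZE"]:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    if settings["TOP_N_RERANK"] > settings["RETRIEVAL_CHUNKS"]:
        raise ValueError("TOP_N_RERANK cannot exceed RETRIEVAL_CHUNKS")
    return settings
```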
### Step 7: Add Your Documents

Place your documents (Word, PDF, PowerPoint, Text, Markdown, etc.) in the `docs/` folder.
### Step 8: Ingest Documents

Run the ingestion script to process your documents:

**Windows:**
```powershell
python rag\ingest.py
```

**macOS/Linux:**
```bash
python rag/ingest.py
```

This will:
- Extract ZIP files automatically
- Load and process all documents
- Generate embeddings
- Store vectors in the database
### Step 9: Start the Frontend

Start the web interface:

**Windows:**
```powershell
python frontend\app.py
```

**macOS/Linux:**
```bash
python frontend/app.py
```

The server will start at **http://127.0.0.1:8000**. Open this URL in your browser to start querying your documents!

---
## Command-Line Query (Optional)

You can also run queries directly from the command line:

**Windows:**
```powershell
python rag\query.py "Your question here"
```

**macOS/Linux:**
```bash
python rag/query.py "Your question here"
```
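
The `RETRIEVAL_CHUNKS`, `TOP_N_RERANK` and `USE_RERANKING` settings in `.env` suggest a two-stage search behind each query: a broad similarity search followed by a reranking step that keeps the best few chunks. The sketch below shows that pattern under stated assumptions: embeddings are plain lists of floats, stage 1 uses cosine similarity, the reranker is a hypothetical `rerank_score` callable (in practice often a cross-encoder model), and with reranking off it simply keeps the top `top_n_rerank` by similarity. It is not the code in `rag/query.py`.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunk_vecs, retrieval_chunks=100, top_n_rerank=15,
             use_reranking=True, rerank_score=None):
    """Return indices of the chunks passed to the LLM as context.

    Stage 1: keep the `retrieval_chunks` chunks most similar to the query.
    Stage 2 (optional): re-score those candidates with the hypothetical
    `rerank_score(index)` callable and keep the best `top_n_rerank`.
    """
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]), reverse=True)
    candidates = ranked[:retrieval_chunks]
    if not use_reranking or rerank_score is None:
        return candidates[:top_n_rerank]
    return sorted(candidates, key=rerank_score, reverse=True)[:top_n_rerank]
```

The usual motivation for this design is cost: a cheap vector comparison narrows the search to 100 candidates, so the more expensive reranker only has to score those before the best 15 reach the LLM.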
## Performance Notes

### CPU vs GPU Mode
- **GPU Mode (default):** Fast responses (1-2 seconds with the 14b model)
- **CPU-Only Mode:** Slower responses (8-15 seconds with the 14b model), but no GPU is required

### Model Recommendations by Hardware

| RAM Available | Recommended Model | CPU Query Time | Quality |
|---------------|-------------------|----------------|---------|
| 8 GB | qwen2.5:7b-instruct | 3-5 seconds | Good |
| 16 GB+ | qwen2.5:14b-instruct | 8-15 seconds | Excellent |
| 32 GB+ | qwen2.5:32b | 30-60 seconds | Best |

**Note:** These times are for CPU-only mode. GPU mode is 6-10x faster.
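
The table and note can be turned into a small helper, for example to print a suggestion in a setup script. This is a hypothetical sketch: `recommend_model` and `estimate_gpu_seconds` are not part of the project, and the 6-10x factor is the rough figure from the note, so the GPU estimate is only indicative.

```python
# RAM thresholds (GB) and models from the "Model Recommendations by Hardware" table.
RECOMMENDATIONS = [
    (32, "qwen2.5:32b"),
    (16, "qwen2.5:14b-instruct"),
    (8, "qwen2.5:7b-instruct"),
]


def recommend_model(ram_gb: float):
    """Return the largest model whose RAM threshold is met, or None below 8 GB."""
    for min_ram, model in RECOMMENDATIONS:
        if ram_gb >= min_ram:
            return model
    return None


def estimate_gpu_seconds(cpu_low: float, cpu_high: float) -> tuple:
    """Rough GPU time range from a CPU range, using the 6-10x speedup quoted above."""
    # Best case divides the fastest CPU time by 10; worst case divides the slowest by 6.
    return (cpu_low / 10, cpu_high / 6)
```

For the 14b model, 8-15 seconds on CPU works out to roughly 0.8-2.5 seconds on GPU, which brackets the 1-2 seconds quoted under CPU vs GPU Mode.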