A Retrieval-Augmented Generation (RAG) system that lets you query your documents using Ollama and Qwen models locally.
---
**Windows (via Winget):**
Verify installation:
Ollama runs as a Windows service automatically. If it is not running:
**macOS (via Homebrew):**
```bash
brew install ollama
```
Verify installation:
```bash
ollama --version
```
**macOS (Manual Download):**
Download from [https://ollama.ai/download](https://ollama.ai/download) and install the .dmg file.
**LLM Model (for answering queries):**
```bash
ollama pull qwen2.5:14b-instruct
```
**Embedding Model (for semantic search):**
```bash
ollama pull mxbai-embed-large
```
**Note for low-end computers:** The 14b model requires ~16GB RAM. If you have less RAM, use:
```bash
ollama pull qwen2.5:7b-instruct # Requires ~8GB RAM
```
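After pulling, it can help to confirm that both models are actually installed. The sketch below is an illustration rather than part of this project: it assumes a running Ollama server on the default `http://localhost:11434` and uses its `/api/tags` endpoint, which lists local models. The helper names `normalize`, `missing_models`, and `installed_models` are hypothetical.

```python
import json
import urllib.request

# Models this README asks you to pull (swap in the 7b tag if you use it).
REQUIRED = ["qwen2.5:14b-instruct", "mxbai-embed-large"]


def normalize(name: str) -> str:
    """Ollama lists untagged models as 'name:latest'; make both forms comparable."""
    return name if ":" in name else f"{name}:latest"


def missing_models(tags_payload: dict, required=REQUIRED) -> list:
    """Return the required models that do not appear in an /api/tags response."""
    installed = {normalize(m["name"]) for m in tags_payload.get("models", [])}
    return [name for name in required if normalize(name) not in installed]


def installed_models(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the list of locally installed models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

Calling `missing_models(installed_models())` returns an empty list when everything is in place; otherwise, run `ollama pull` for each name it reports.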
### Step 3: Enable CPU-Only Mode (For Low-End Computers)
**If you have a low-end computer or insufficient GPU memory**, force Ollama to run on CPU only:
**Windows (PowerShell):**
```powershell
[System.Environment]::SetEnvironmentVariable('OLLAMA_NUM_GPU', '0', 'User')
$env:OLLAMA_NUM_GPU = '0'
```
**macOS/Linux:**
```bash
echo 'export OLLAMA_NUM_GPU=0' >> ~/.bashrc  # or ~/.zshrc for zsh
source ~/.bashrc  # or source ~/.zshrc
```
Restart your terminal after setting this. The model will run slower but will work on any computer.
**To re-enable GPU later (if you upgrade hardware):**
```powershell
# Windows
[System.Environment]::SetEnvironmentVariable('OLLAMA_NUM_GPU', '1', 'User')
```
```bash
# macOS/Linux - remove the line from ~/.bashrc or ~/.zshrc
```
Only needed if you have OpenDocument (.odt) files:
Create and activate a Python virtual environment:
**Windows (PowerShell):**
```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
```
If PowerShell blocks the activation script because of its execution policy, allow locally created scripts for your user and try again:
```powershell
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
```
**macOS/Linux:**
```bash
python3 -m venv .venv
source .venv/bin/activate
```
**Install dependencies (all platforms):**
```bash
pip install -r requirements.txt
```
Create a `.env` file from the provided example:
**macOS/Linux:**
```bash
cp .env.example .env
```
Then adjust the settings in `.env` as needed:
```env
OLLAMA_MODEL=qwen2.5:14b-instruct
OLLAMA_BASE_URL=http://localhost:11434
EMBEDDING_MODEL=mxbai-embed-large
# Retrieval Settings
RETRIEVAL_CHUNKS=100
TOP_N_RERANK=15
USE_RERANKING=true
# Document Processing
CHUNK_SIZE=800
CHUNK_OVERLAP=160
```
**Note:** If you use the 7b model on a low-end computer, change the model setting to `OLLAMA_MODEL=qwen2.5:7b-instruct`.
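To illustrate how `RETRIEVAL_CHUNKS`, `TOP_N_RERANK`, and `USE_RERANKING` might interact, here is a minimal sketch of two-stage retrieval. It is an assumption about the pipeline, not the project's actual code: `cosine`, `retrieve`, and the `rerank_score` callback are hypothetical names, and a real reranker would typically be a dedicated scoring model. Falling back to the vector-search order when reranking is off is also an assumption.

```python
import math
from typing import Callable, List, Sequence, Tuple

Chunk = Tuple[str, Sequence[float]]  # (chunk text, embedding vector)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: Sequence[float],
    chunks: List[Chunk],
    rerank_score: Callable[[str], float],
    retrieval_chunks: int = 100,   # RETRIEVAL_CHUNKS
    top_n_rerank: int = 15,        # TOP_N_RERANK
    use_reranking: bool = True,    # USE_RERANKING
) -> List[str]:
    # Stage 1: cheap vector search keeps a broad candidate set.
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    candidates = [text for text, _ in ranked[:retrieval_chunks]]
    if not use_reranking:
        # Assumed fallback: keep the vector-search order.
        return candidates[:top_n_rerank]
    # Stage 2: a more precise scorer reorders the candidates; keep the best few.
    return sorted(candidates, key=rerank_score, reverse=True)[:top_n_rerank]
```

Under this reading, a larger `RETRIEVAL_CHUNKS` gives the reranker more candidates to choose from, at the cost of more scoring work per query.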
Place your documents (Word, PDF, PowerPoint, Text, Markdown, etc.) in the `docs/` folder.
Run the ingestion script to process your documents:
**Windows (PowerShell):**
```powershell
python rag\ingest.py
```
**macOS/Linux:**
```bash
python rag/ingest.py
```
This will:
- Extract ZIP files automatically
- Load and process all documents
- Generate embeddings
- Store vectors in the database
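The "load and process" step typically splits each document into overlapping chunks before embedding. The sketch below shows one way `CHUNK_SIZE=800` and `CHUNK_OVERLAP=160` could be applied. It is not the project's implementation: `chunk_text` is a hypothetical helper, and it assumes sizes are measured in characters (the real script may count tokens or split on sentence boundaries).

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 800, overlap: int = 160) -> List[str]:
    """Split text into windows of chunk_size that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks
```

With these defaults, each window starts 640 characters after the previous one, so the last 160 characters of one chunk reappear at the start of the next. The overlap helps keep a sentence that straddles a boundary retrievable from at least one chunk.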
Start the web server:
**macOS/Linux:**
```bash
python frontend/app.py
```
The server will start at **http://127.0.0.1:8000**. Open this URL in your browser to start querying your documents.
You can also run queries directly from the command line:
**macOS/Linux:**
```bash
python rag/query.py "Your question here"
```
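For orientation, this is roughly what a request to the local model could look like once relevant chunks have been retrieved. It is a sketch under assumptions, not the contents of `rag/query.py`: it calls Ollama's `/api/generate` REST endpoint directly with the standard library, whereas the project may use a client library. The prompt wording and the helper names `build_generate_request` and `ask` are hypothetical.

```python
import json
import urllib.request


def build_generate_request(
    question: str,
    context: str,
    model: str = "qwen2.5:14b-instruct",        # OLLAMA_MODEL
    base_url: str = "http://localhost:11434",   # OLLAMA_BASE_URL
) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, context: str) -> str:
    """Send the request to a running Ollama server and return the answer text."""
    request = build_generate_request(question, context)
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.load(resp)["response"]
```

Building the request separately from sending it makes it possible to inspect the payload without a running server.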
---
### CPU vs GPU Mode
- **GPU Mode (default):** Fast responses (1-2 seconds with 14b model)
- **CPU-Only Mode:** Slower responses (8-15 seconds with 14b model) but works on any computer
| RAM Available | Recommended Model | CPU Query Time | Quality |
|---------------|-------------------|----------------|----------|
| 8GB | qwen2.5:7b-instruct | 3-5 seconds | Good |
| 16GB+ | qwen2.5:14b-instruct | 8-15 seconds | Excellent |
| 32GB+ | qwen2.5:32b | 30-60 seconds | Best |
**Note:** These times are for CPU-only mode. GPU mode is 6-10x faster.
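The table can also be expressed as a small lookup, for example to choose a value for `OLLAMA_MODEL` in a setup script. This is a sketch only: `MODEL_TIERS` and `recommend_model` are hypothetical names, and returning `None` below 8GB is an assumption, since the table does not cover that case.

```python
from typing import Optional

# (minimum RAM in GB, model tag), largest first, copied from the table above.
MODEL_TIERS = [
    (32, "qwen2.5:32b"),
    (16, "qwen2.5:14b-instruct"),
    (8, "qwen2.5:7b-instruct"),
]


def recommend_model(ram_gb: float) -> Optional[str]:
    """Return the largest listed model whose RAM requirement fits, else None."""
    for min_ram, model in MODEL_TIERS:
        if ram_gb >= min_ram:
            return model
    return None  # assumption: no listed model fits in less than 8GB
```

For example, `recommend_model(24)` returns `qwen2.5:14b-instruct`, because 24GB clears the 16GB tier but not the 32GB one.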