# Qwen RAG (CPU-friendly)

This is a minimal local RAG setup using:
- Ollama for a local Qwen chat model
- LangChain + Chroma for retrieval
- FastEmbed embeddings (CPU-friendly, no PyTorch required)

Works on Windows without GPU. You can upgrade the model later (e.g., Qwen 8B) when you have GPU.

## Prerequisites

- Python 3.9+
- Ollama installed and running (local server at `http://localhost:11434`)

### Install Ollama on Windows
If you're not sure Ollama is installed:
1) Install via Winget (requires admin approval on first use):
```powershell
winget install Ollama.Ollama -e
```
2) Start the Ollama daemon (it usually runs as a Windows service):
```powershell
ollama --version
ollama serve
```
Leave it running in a terminal, or rely on the service.

### Pull a small Qwen model for CPU
For better CPU performance, start with a smaller instruct model:
```powershell
ollama pull qwen2.5:3b-instruct
```
You can switch to larger models later (e.g., `qwen2.5:7b-instruct` or `qwen3:8b`) once you have a GPU.

### Install Pandoc (required for ODT files)
If you plan to use `.odt` files, install Pandoc:
```powershell
winget install --id JohnMacFarlane.Pandoc -e --accept-source-agreements --accept-package-agreements
```

## Setup Python environment
From the repo root (`qwen/` folder):
```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
```

**Note:** If you get an execution policy error when activating the venv, run:
```powershell
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
```

## Configure
Copy `.env.example` to `.env` and adjust if needed:
- `OLLAMA_MODEL` – default is `qwen2.5:3b-instruct`
- `DOCS_DIR` – folder with your documents (default: `docs`)
- `CHROMA_DIR` – vector DB storage (default: `storage/chroma`)

```powershell
Copy-Item .env.example .env
```

## Ingest documents
Put `.md`, `.txt`, `.docx`, `.pptx`, or `.odt` files in the `docs/` folder, then run:
```powershell
python .\rag\ingest.py
```
This will build a Chroma vector store under `storage/chroma`.

## Ask questions (RAG)
```powershell
python .\rag\query.py "What does this project do?"
```
The script retrieves relevant chunks and asks the local Qwen model to answer using that context.

## Upgrading to Qwen 8B later
When you have a GPU, pull and use a larger model:
```powershell
ollama pull qwen3:8b
# then set in .env
OLLAMA_MODEL=qwen3:8b
```

## Notes
- Supports `.md`, `.txt`, `.docx`, `.pptx`, and `.odt` files
- For `.odt` files, Pandoc must be installed (see Prerequisites above)
- FastEmbed uses ONNX under the hood and is light-weight for CPU