A fully local Retrieval-Augmented Generation (RAG) assistant that answers questions about your internal documentation. It indexes Markdown files, stores embeddings in ChromaDB, and uses Ollama to generate answers, with no external APIs, no cloud, and no data leaving your machine.
Install these tools before running the project:
| Tool | Version | Install |
|---|---|---|
| Python | 3.11+ | python.org |
| uv | latest | astral.sh/uv |
| Ollama | latest | ollama.com |
| Docker | latest | docker.com |
| Git | latest | git-scm.com |
- Index: scans `docs/` for `.md` files, generates embeddings, stores them in ChromaDB
- Retrieve: embeds the user question, finds the most similar docs in ChromaDB
- Generate: sends the question and the relevant docs to Ollama, returns a grounded answer
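The Generate step can be sketched with the standard library alone. The example below is a minimal illustration, not the project's actual `src/searcher.py`: the function names and the prompt wording are assumptions, while the endpoint (`/api/generate`), the `stream` flag, and the `response` field follow Ollama's REST API.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default Ollama address
MODEL = "llama3.2"


def build_prompt(question: str, docs: list[str]) -> str:
    """Combine retrieved docs and the user question into one grounded prompt.

    The wording of the instructions is an assumption for illustration.
    """
    context = "\n\n---\n\n".join(docs)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def build_request(prompt: str) -> urllib.request.Request:
    """Prepare a non-streaming POST to Ollama's /api/generate endpoint."""
    payload = {"model": MODEL, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(question: str, docs: list[str]) -> str:
    """Send the prompt to a running Ollama server and return the answer text."""
    request = build_request(build_prompt(question, docs))
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]
```

Calling `generate("how do I rollback a deployment?", retrieved_docs)` requires a local Ollama server with `llama3.2` already pulled. Keeping the prompt and request builders separate from the network call makes them easy to inspect without a running server.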
Your Markdown docs
↓
sentence-transformers converts text to vectors (one time)
↓
ChromaDB stores vectors on disk
↓
User asks a question
↓
ChromaDB finds most relevant docs by semantic search
↓
Ollama (llama3.2) reads context and answers accurately
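The semantic search step in the flow above ranks documents by how close their vectors are to the question's vector. The sketch below shows that idea with hand-made 3-dimensional vectors standing in for real embeddings and a brute-force cosine similarity ranking. It is an illustration only: ChromaDB uses its own index and distance metric, and the vectors and helper names here are invented for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the names of the k documents most similar to the query vector."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy vectors (assumed values); real all-MiniLM-L6-v2 embeddings have 384 dimensions.
toy_index = {
    "deploy.md": [0.9, 0.1, 0.0],
    "database.md": [0.1, 0.9, 0.1],
    "incidents.md": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]  # pretend embedding of a deployment question

print(top_k(query_vector, toy_index, k=2))  # ['deploy.md', 'database.md']
```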
| Component | Tool | Purpose |
|---|---|---|
| Embedding model | all-MiniLM-L6-v2 | Convert text to vectors |
| Vector database | ChromaDB | Store and search vectors |
| LLM | Ollama / llama3.2 | Answer questions |
| Package manager | uv | Python dependency management |
| Language | Python 3.11 | Application runtime |
# 1. Clone the repo
git clone https://github.com/artisan22/rag-docs-assistant.git
cd rag-docs-assistant
# 2. Install Python dependencies
uv sync
# 3. Pull the Ollama model (one-time, ~2GB)
ollama pull llama3.2
Run the assistant:
uv run main.py
First run: indexes all docs and saves the index to chroma_db/:
🤖 RAG Docs Assistant
========================================
Loading model...
Indexing documents...
✅ Documents indexed!
Ready for questions!
Ask a question (or 'quit' to exit): how do I rollback a deployment?
🤖 Answer: To rollback the deployment:
1. Run `docker compose down`
2. Run `git checkout previous-tag`
3. Run `docker compose up -d`
----------------------------------------
Ask a question (or 'quit' to exit): quit
👋 Goodbye!
Subsequent runs reuse the existing index and start without re-indexing:
✅ Using existing index (3 docs)
Ready for questions!
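One simple way to decide between the two startup paths is to check whether the index directory already holds data. The sketch below assumes that approach. The helper name `index_exists` is hypothetical, and the real `main.py` may instead ask ChromaDB for the number of stored documents, as the "(3 docs)" message suggests.

```python
import tempfile
from pathlib import Path


def index_exists(db_path: Path) -> bool:
    """Return True when the index directory exists and contains at least one entry."""
    return db_path.is_dir() and any(db_path.iterdir())


# Demonstrate the three possible states in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "chroma_db"
    print(index_exists(db_path))  # directory missing: indexing needed

    db_path.mkdir()
    print(index_exists(db_path))  # directory empty: indexing still needed

    (db_path / "placeholder.bin").touch()  # stand-in for files ChromaDB would write
    print(index_exists(db_path))  # directory populated: index can be reused
```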
rag-docs-assistant/
├── src/
│ ├── indexer.py # load docs, generate embeddings, write to ChromaDB
│ └── searcher.py # embed query, retrieve relevant docs, call Ollama
├── docs/ # your Markdown documentation goes here
│ ├── deploy.md # deployment and rollback procedures
│ ├── database.md # database failover and restore procedures
│ └── incidents.md # memory and disk space incident response
├── chroma_db/ # auto-generated vector database (git-ignored)
├── main.py # entry point: model loading, indexing, Q&A loop
├── pyproject.toml # project config and dependencies
└── .env.example # example environment variables
- Add any `.md` files to the `docs/` folder
- Delete `chroma_db/` to force re-indexing: `rm -rf chroma_db/`
- Run `uv run main.py`; it will re-index automatically
Key values you may want to change:
| Setting | Location | Default |
|---|---|---|
| Embedding model | `src/indexer.py` and `src/searcher.py` | `all-MiniLM-L6-v2` |
| LLM model | `src/searcher.py` | `llama3.2` |
| Ollama URL | `src/searcher.py` | `http://localhost:11434` |
| Docs folder | `main.py` | `docs/` |
| ChromaDB path | `main.py` | `chroma_db/` |
MIT