I take notes constantly. I save articles. I bookmark papers. I paste things into Notion. After years of this, I had a large, disorganized archive of things I thought were worth keeping — and almost no ability to retrieve them.
The solution I actually shipped is a local RAG system over my personal knowledge. Here's exactly how it works.
## The Problem with Search
Full-text search is inadequate for personal knowledge retrieval. My notes don't always use the exact words I'm searching for later. Something I saved about "LLM context limits" might be indexed under "transformer attention span" or "token window" in my own writing.
Semantic search — where you search by meaning rather than keywords — solves this. Embeddings capture the meaning of text in a vector space where "context limits" and "token window" are close together because they mean similar things.
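To make "close together in a vector space" concrete, here's a toy sketch of cosine similarity, the standard closeness measure for embeddings. The three-dimensional vectors below are made up for illustration; real models like nomic-embed-text produce vectors with hundreds of dimensions:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Pretend embeddings: phrases about the same concept point the same way.
context_limits = [0.9, 0.1, 0.2]
token_window = [0.8, 0.2, 0.3]
pasta_recipe = [0.1, 0.9, 0.1]

print(cosine_similarity(context_limits, token_window))  # ~0.98, very similar
print(cosine_similarity(context_limits, pasta_recipe))  # ~0.24, unrelated
```

Keyword search would score "context limits" and "token window" as having zero overlap; in embedding space they are near neighbors.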
## The Architecture

```
Notes, articles, bookmarks, PDFs
                ↓
Ingest pipeline → Embedding → ChromaDB

Query pipeline → Embed query → Similarity search → LLM synthesis → Answer
```
Nothing fancy. Three components: ingestion, storage, retrieval.
## Building It

### Step 1: Ingestion
I pull from three sources: markdown notes, web bookmarks (exported as HTML), and saved PDFs.
```python
import hashlib
from pathlib import Path

import chromadb
import ollama

client = chromadb.PersistentClient(path=str(Path.home() / ".kb"))
collection = client.get_or_create_collection("personal-kb")


def chunk_text(text: str, chunk_size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of roughly chunk_size words."""
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size - overlap):
        chunks.append(" ".join(words[i:i + chunk_size]))
    return chunks


def ingest_file(filepath: Path, source_type: str = "note"):
    text = filepath.read_text(encoding="utf-8", errors="ignore")
    chunks = chunk_text(text)
    for i, chunk in enumerate(chunks):
        doc_id = hashlib.md5(f"{filepath}-{i}".encode()).hexdigest()
        # Skip if already indexed
        if collection.get(ids=[doc_id])["ids"]:
            continue
        embedding = ollama.embeddings(
            model="nomic-embed-text",
            prompt=chunk,
        )["embedding"]
        collection.add(
            documents=[chunk],
            embeddings=[embedding],
            ids=[doc_id],
            metadatas=[{"source": str(filepath), "type": source_type, "chunk": i}],
        )
    print(f"Indexed {len(chunks)} chunks from {filepath.name}")
```
### Step 2: Index Your Notes

```python
notes_dir = Path.home() / "Notes"
for md_file in notes_dir.rglob("*.md"):
    ingest_file(md_file, "note")
```
Run this once to build the initial index. Run it again periodically to pick up new notes — the ID check means you don't re-embed things you've already processed.
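One caveat worth noting (my observation, not a feature of the pipeline above): because the ID is derived from the file path and chunk index, editing an existing note doesn't change its IDs, so the stale chunks stay in the index. Hashing the chunk content instead is one way to sidestep that:

```python
import hashlib

def chunk_id(chunk: str) -> str:
    """ID derived from content: an edited chunk gets a fresh ID and is
    re-embedded, while unchanged chunks keep their ID and are skipped."""
    return hashlib.md5(chunk.encode("utf-8")).hexdigest()

a = chunk_id("distributed consensus notes, v1")
b = chunk_id("distributed consensus notes, v2")
print(a == b)  # False: the edited chunk is seen as new
print(a == chunk_id("distributed consensus notes, v1"))  # True: stable for unchanged text
```

The tradeoff is that the old version's chunks linger until you also delete by source path, so content hashing works best paired with a per-file cleanup pass.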
### Step 3: Query
```python
def ask_kb(question: str, n_results: int = 5) -> str:
    query_embedding = ollama.embeddings(
        model="nomic-embed-text",
        prompt=question,
    )["embedding"]
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=n_results,
    )
    if not results["documents"][0]:
        return "No relevant notes found."
    context_parts = []
    for doc, meta in zip(results["documents"][0], results["metadatas"][0]):
        source = Path(meta["source"]).stem
        context_parts.append(f"[From: {source}]\n{doc}")
    context = "\n\n---\n\n".join(context_parts)
    response = ollama.chat(
        model="llama3.2",
        messages=[{
            "role": "user",
            "content": f"""Answer the following question using only the provided notes.
If the notes don't contain enough information, say so.

Notes:
{context}

Question: {question}""",
        }],
    )
    return response["message"]["content"]
```
### Step 4: Make It Usable
A CLI makes this actually usable:
```python
# kb.py
import sys

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python kb.py 'your question here'")
        sys.exit(1)
    question = " ".join(sys.argv[1:])
    answer = ask_kb(question)
    print(answer)
```

Wire it up with a shell alias:

```sh
alias kb="python ~/scripts/kb.py"

kb "What did I write about distributed consensus?"
kb "Notes on RAG chunking strategies"
```
## What I Learned Building This
**Chunk size matters a lot.** Too small (100 words) and each chunk lacks context. Too large (1,000 words) and retrieval gets less precise: you match the right document but include a lot of irrelevant text in the context. I landed on 400 words with 50-word overlap.
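A quick back-of-the-envelope, reusing the chunking logic from Step 1 (the 2,000-word note is just an example figure): chunk starts advance by chunk_size minus overlap, so the chunk count scales roughly as total_words / (chunk_size - overlap).

```python
def count_chunks(total_words: int, chunk_size: int, overlap: int) -> int:
    """How many chunks chunk_text() produces for a given word count."""
    step = chunk_size - overlap
    return len(range(0, total_words, step))

# A 2,000-word note at each setting discussed above:
print(count_chunks(2000, 100, 50))   # 40 tiny chunks
print(count_chunks(2000, 400, 50))   # 6 chunks
print(count_chunks(2000, 1000, 50))  # 3 large chunks
```

With 5 results retrieved per query, the 400-word setting puts roughly 2,000 words of context in front of the LLM, which fits comfortably in a small local model's window.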
**Source metadata is essential.** Knowing which note a result came from lets you validate and follow up. Without it, you have answers with no provenance.
**The model for answering matters less than the retrieval.** Getting the right chunks back is 80% of the problem. The LLM synthesis from good chunks is almost always decent. The LLM synthesis from bad chunks is always bad, no matter how good the model.
**Re-index regularly.** Your notes are only useful if the index is current. I run the ingestion script via a weekly cron job.
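For reference, a weekly crontab entry looks like this (the script path and schedule here are illustrative, not the exact ones I use):

```
# Re-index every Monday at 08:00
0 8 * * 1 /usr/bin/python3 /home/me/scripts/ingest.py
```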
The whole thing runs locally, costs nothing per query, and has become one of my most-used personal tools.