Swiftbeard

Building a Personal Knowledge Base with AI

Using local embeddings, vector search, and RAG to build a searchable personal knowledge base from your notes, bookmarks, and articles.

rag · local-ai · knowledge-management · productivity

I take notes constantly. I save articles. I bookmark papers. I paste things into Notion. After years of this, I had a large, disorganized archive of things I thought were worth keeping — and almost no ability to retrieve them.

The solution I actually shipped is a local RAG system over my personal knowledge. Here's exactly how it works.

Full-text search is inadequate for personal knowledge retrieval. My notes don't always use the exact words I'm searching for later. Something I saved about "LLM context limits" might be indexed under "transformer attention span" or "token window" in my own writing.

Semantic search — where you search by meaning rather than keywords — solves this. Embeddings capture the meaning of text in a vector space where "context limits" and "token window" are close together because they mean similar things.
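To make "close together" concrete: closeness is usually measured with cosine similarity. A toy illustration with made-up 3-dimensional vectors (real embeddings from a model like nomic-embed-text have hundreds of dimensions, but the math is identical):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for embeddings of three phrases.
context_limits = [0.9, 0.1, 0.2]
token_window   = [0.8, 0.2, 0.3]
cooking_recipe = [0.1, 0.9, 0.1]

print(cosine_similarity(context_limits, token_window))   # high: similar meaning
print(cosine_similarity(context_limits, cooking_recipe)) # low: unrelated
```

Keyword search would see zero overlap between "context limits" and "token window"; in embedding space they land near each other.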

The Architecture

Notes, articles, bookmarks, PDFs
        ↓
Ingest pipeline → Embedding → ChromaDB

Query pipeline → Embed query → Similarity search → LLM synthesis → Answer

Nothing fancy. Three components: ingestion, storage, retrieval.

Building It

Step 1: Ingestion

I pull from three sources: markdown notes, web bookmarks (exported as HTML), and saved PDFs.

import ollama
import chromadb
from pathlib import Path
import hashlib

client = chromadb.PersistentClient(path=str(Path.home() / ".kb"))
collection = client.get_or_create_collection("personal-kb")

def chunk_text(text: str, chunk_size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks."""
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size - overlap):
        chunk = " ".join(words[i:i + chunk_size])
        chunks.append(chunk)
    return chunks

def ingest_file(filepath: Path, source_type: str = "note"):
    text = filepath.read_text(encoding="utf-8", errors="ignore")
    chunks = chunk_text(text)

    for i, chunk in enumerate(chunks):
        # Content-aware ID: if a chunk's text changes, it gets a new ID and
        # is re-embedded on the next run. (Stale chunks from old versions
        # linger; an occasional full rebuild clears them out.)
        doc_id = hashlib.md5(f"{filepath}-{i}-{chunk}".encode()).hexdigest()

        # Skip if already indexed
        existing = collection.get(ids=[doc_id])
        if existing["ids"]:
            continue

        embedding = ollama.embeddings(
            model="nomic-embed-text",
            prompt=chunk
        )["embedding"]

        collection.add(
            documents=[chunk],
            embeddings=[embedding],
            ids=[doc_id],
            metadatas=[{"source": str(filepath), "type": source_type, "chunk": i}]
        )
    print(f"Indexed {len(chunks)} chunks from {filepath.name}")

Step 2: Index Your Notes

notes_dir = Path.home() / "Notes"
for md_file in notes_dir.rglob("*.md"):
    ingest_file(md_file, "note")

Run this once to build the initial index. Run it again periodically to pick up new notes — the ID check means you don't re-embed things you've already processed.
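Markdown notes are plain text already; bookmark exports need a preprocessing pass before ingest_file can handle them. A minimal sketch using only the stdlib, assuming the usual browser export format where each bookmark is an anchor tag:

```python
from html.parser import HTMLParser

class BookmarkParser(HTMLParser):
    """Pull (title, url) pairs out of a browser bookmark export."""

    def __init__(self):
        super().__init__()
        self.bookmarks: list[tuple[str, str]] = []
        self._href: str | None = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":  # HTMLParser lowercases tag names
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        if self._href and data.strip():
            self.bookmarks.append((data.strip(), self._href))
            self._href = None

parser = BookmarkParser()
parser.feed('<DT><A HREF="https://example.com/rag">RAG notes</A>')
print(parser.bookmarks)  # [('RAG notes', 'https://example.com/rag')]
```

Each (title, url) pair can then be written out as a small text file and fed through ingest_file like any note. Fetching and cleaning the linked page itself is a bigger job, and I'd reach for a proper readability library for that.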

Step 3: Query

def ask_kb(question: str, n_results: int = 5) -> str:
    query_embedding = ollama.embeddings(
        model="nomic-embed-text",
        prompt=question
    )["embedding"]

    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=n_results
    )

    if not results["documents"][0]:
        return "No relevant notes found."

    context_parts = []
    for doc, meta in zip(results["documents"][0], results["metadatas"][0]):
        source = Path(meta["source"]).stem
        context_parts.append(f"[From: {source}]\n{doc}")

    context = "\n\n---\n\n".join(context_parts)

    response = ollama.chat(
        model="llama3.2",
        messages=[{
            "role": "user",
            "content": f"""Answer the following question using only the provided notes.
If the notes don't contain enough information, say so.

Notes:
{context}

Question: {question}"""
        }]
    )
    return response["message"]["content"]
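One refactor worth considering (hypothetical, not in the script above): pulling the prompt assembly out into its own function lets you sanity-check the grounding logic without a model running.

```python
def build_prompt(question: str, context_parts: list[str]) -> str:
    """Assemble the grounded prompt from already-retrieved chunks."""
    context = "\n\n---\n\n".join(context_parts)
    return (
        "Answer the following question using only the provided notes.\n"
        "If the notes don't contain enough information, say so.\n\n"
        f"Notes:\n{context}\n\nQuestion: {question}"
    )

prompt = build_prompt(
    "What is RAG?",
    ["[From: rag-notes]\nRAG retrieves relevant chunks before generation."],
)
print(prompt)
```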

Step 4: Make It Usable

A CLI makes this actually usable:

# kb.py (assumes ask_kb and its setup from the steps above live in this file)
import sys

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python kb.py 'your question here'")
        sys.exit(1)

    question = " ".join(sys.argv[1:])
    answer = ask_kb(question)
    print(answer)
Then alias it in your shell config:

alias kb="python ~/scripts/kb.py"
kb "What did I write about distributed consensus?"
kb "Notes on RAG chunking strategies"

What I Learned Building This

Chunk size matters a lot. Too small (100 words) and each chunk lacks context. Too large (1000 words) and the retrieval is less precise — you match the right document but include a lot of irrelevant text in the context. I landed on 400 words with 50-word overlap.
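With those settings the stride is 400 − 50 = 350 words, so chunk counts are easy to predict. A quick check against the chunk_text logic from the ingestion script:

```python
def chunk_text(text: str, chunk_size: int = 400, overlap: int = 50) -> list[str]:
    """Same sliding-window logic as the ingestion script."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size - overlap)]

doc = " ".join(f"w{i}" for i in range(3500))  # a 3,500-word note
chunks = chunk_text(doc)
print(len(chunks))             # 10 chunks (one per 350-word stride)
print(len(chunks[0].split()))  # 400 words in a full chunk
```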

Source metadata is essential. Knowing which note a result came from lets you validate and follow up. Without it, you have answers with no provenance.

The model for answering matters less than the retrieval. Getting the right chunks back is 80% of the problem. The LLM synthesis from good chunks is almost always decent. The LLM synthesis from bad chunks is always bad, no matter how good the model.

Re-index regularly. Your notes are only useful if the index is current. I run the ingestion script via a weekly cron job.
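The cron setup is a single crontab entry; something like this (script path and log location are placeholders, adjust to yours):

```shell
# crontab -e: re-index every Sunday at 06:00
0 6 * * 0 /usr/bin/python3 ~/scripts/ingest_kb.py >> ~/.kb/ingest.log 2>&1
```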

The whole thing runs locally, costs nothing per query, and has become one of my most-used personal tools.