Swiftbeard

GitHub Repos Every AI Developer Should Know

Seven GitHub repos that are genuinely useful for developers building AI-powered products — not just stars, but tools you'll actually use.

Tags: github, open-source, ai, developer-tools

GitHub is full of AI repos collecting stars from people who will never actually use them. This list is different — these are repos I've actually shipped code with or learned something concrete from.

1. BerriAI/litellm

LiteLLM gives you a unified interface to 100+ LLM providers. One SDK, one API contract, swap models without changing code.

from litellm import completion

# Works for OpenAI, Anthropic, Cohere, local models, all of them
response = completion(model="anthropic/claude-opus-4-5", messages=[...])
response = completion(model="ollama/llama3", messages=[...])

The killer feature for production: a proxy server that lets you add rate limiting, load balancing, and cost tracking without touching application code.
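Because every provider goes through the same call signature, failover logic stays trivial. Here's a minimal sketch of a fallback helper — the helper name and retry policy are my own, not part of LiteLLM; in real use you'd pass `litellm.completion` as `call`:

```python
# Hypothetical helper: try models in order until one call succeeds.
# `call` would be litellm.completion in practice; it is injected here
# so the logic stays independent of any provider SDK.
def complete_with_fallback(models, messages, call):
    last_err = None
    for model in models:
        try:
            return call(model=model, messages=messages)
        except Exception as err:  # rate limit, outage, bad key, ...
            last_err = err
    raise RuntimeError("all providers failed") from last_err
```

Because the interface never changes between providers, the fallback list is just data.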

2. chroma-core/chroma

The simplest vector database to get started with. Runs in-process (no server required), scales to a hosted version when you need it, and has a clean Python API.

import chromadb

client = chromadb.Client()
collection = client.create_collection("my-docs")
collection.add(documents=["doc 1", "doc 2"], ids=["1", "2"])
results = collection.query(query_texts=["relevant question"], n_results=2)

For RAG prototypes and small to medium production use cases, Chroma is often the right choice. You can graduate to Qdrant or Pinecone later without rearchitecting.
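Under the hood, that query is a nearest-neighbor search over embedding vectors. A toy sketch of the core operation — plain cosine similarity with no ANN index, which is what Chroma and every other vector store add on top:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, docs, k=2):
    # docs: list of (doc_id, vector) pairs; returns ids of the k nearest
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```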

3. simonw/llm

Simon Willison's llm CLI tool. Run LLMs from the command line, log every prompt and response to SQLite, chain operations with pipes.

llm "Summarize this" < article.txt
cat code.py | llm "What does this do?"
llm logs list  # See everything you've ever run

The logging alone is worth installing. Every prompt stored locally — useful for debugging, auditing, and cost estimation.
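The same idea is easy to replicate inside your own apps. A minimal sketch of prompt logging to SQLite — this is my own throwaway schema, not the one the `llm` tool uses:

```python
import sqlite3
import time

def open_log(path=":memory:"):
    # One table, append-only: timestamp, model, prompt, response
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS log "
        "(ts REAL, model TEXT, prompt TEXT, response TEXT)"
    )
    return db

def log_call(db, model, prompt, response):
    db.execute(
        "INSERT INTO log VALUES (?, ?, ?, ?)",
        (time.time(), model, prompt, response),
    )
    db.commit()
```

Once every call lands in a table, cost estimation and debugging become SQL queries.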

4. microsoft/promptflow

Promptflow is a framework for building, testing, and evaluating LLM applications. The useful parts are the evaluation tools — you can define evaluation metrics and run them against your prompts systematically.

Less useful for simple apps. Very useful if you're iterating on prompt quality and want a structured way to measure whether changes are improvements.
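The pattern Promptflow encourages is worth having even without the framework: metrics as plain functions, run over a fixed test set. A hand-rolled sketch of that loop (my own function names, not Promptflow's API):

```python
def run_eval(cases, predict, metrics):
    # cases: list of (input, expected) pairs
    # metrics: dict of name -> fn(expected, got) returning a float score
    scores = {name: [] for name in metrics}
    for inp, expected in cases:
        got = predict(inp)
        for name, fn in metrics.items():
            scores[name].append(fn(expected, got))
    # Average each metric across the test set
    return {name: sum(vals) / len(vals) for name, vals in scores.items()}
```

Swap the prompt, re-run, and compare the numbers — that's the structured iteration the framework gives you.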

5. guidance-ai/guidance

Guidance gives you structural control over LLM output — constrained generation, guaranteed JSON schemas, conditional logic in prompts.

from guidance import models, select

llm = models.Anthropic("claude-haiku-4-5")
lm = llm + "Is this review positive or negative? " + select(["positive", "negative"])

When you need deterministic output structure and regular prompting isn't reliable enough, Guidance is the tool.

6. run-llama/llama_index

LlamaIndex is mature, well-documented, and has excellent support for the retrieval patterns that actually matter in production. The data connectors alone — PDF, Notion, Slack, Google Docs, 100+ others — save significant time.

Use it when your RAG pipeline pulls from diverse sources and you don't want to write 15 custom parsers.

7. instructor-ai/instructor

Instructor patches the Anthropic and OpenAI SDKs to reliably return structured Pydantic models from LLM calls. No more writing JSON validation code for model outputs.

import instructor
from anthropic import Anthropic
from pydantic import BaseModel

client = instructor.from_anthropic(Anthropic())

class UserProfile(BaseModel):
    name: str
    age: int
    skills: list[str]

user = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=1024,  # required by the Anthropic messages API
    response_model=UserProfile,
    messages=[{"role": "user", "content": "Extract: John, 28, knows Python and Rust"}],
)
# user is a UserProfile, not a string

Instructor handles retries, validation errors, and re-prompting automatically. If you're parsing structured data from LLMs, you need this.
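That retry loop is simple to picture. A stripped-down sketch of the validate-or-retry pattern Instructor automates — a plain validator function stands in for Pydantic, and all names here are hypothetical:

```python
import json

def extract_with_retry(ask, validate, prompt, max_attempts=3):
    # ask(prompt) -> raw model text; validate(raw) -> parsed value or raises
    for _ in range(max_attempts):
        raw = ask(prompt)
        try:
            return validate(raw)
        except ValueError as err:  # JSONDecodeError is a ValueError
            # Feed the error back so the model can correct itself
            prompt = f"{prompt}\nPrevious answer was invalid ({err}); return valid JSON."
    raise RuntimeError("no valid output after retries")

def parse_profile(raw):
    data = json.loads(raw)
    if not isinstance(data.get("age"), int):
        raise ValueError("age must be an int")
    return data
```

The point of Instructor is that you never write this loop yourself — the Pydantic model is both the schema and the validator.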

The Meta-Pattern

What these repos have in common: they wrap LLMs at the right abstraction level. They don't try to make LLMs do things they're bad at — they handle the infrastructure so you focus on what matters.

Star them all, but actually use them. Most AI developer productivity comes from good tooling, not better prompting.