Claude Code is excellent. It's also not free, and every prompt goes through Anthropic's servers. For some workflows — especially anything touching proprietary code or sensitive systems — that matters.
Here's how to run a Claude Code-equivalent setup locally, for free, with your code staying on your machine.
## What "Local Claude Code" Actually Means
Let's be precise: you cannot run Claude itself locally. The actual Claude models are not open-source and not publicly available for self-hosting.
What you can run locally is an equivalent setup: an AI coding assistant that uses an open-weight model running on your machine, with a CLI interface similar to Claude Code. The key tools are:
- Ollama — runs open-weight models locally
- Aider or Continue.dev — coding assistants (CLI and IDE extension, respectively)
- Qwen2.5-Coder or DeepSeek-Coder — among the strongest open-weight coding models
## Setting Up the Stack
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a coding model
# Qwen2.5-Coder-32B is the strongest option for coding tasks
ollama pull qwen2.5-coder:32b

# Or a smaller model if you're on limited hardware
ollama pull qwen2.5-coder:7b

# Install Aider (Claude Code equivalent for open models)
pip install aider-chat
```
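Before wiring up Aider, it helps to confirm Ollama is actually up and serving the model you pulled. A minimal sketch, assuming Ollama's default local endpoint (`http://localhost:11434`) and its `/api/tags` route, which lists pulled models; `has_model` is a small helper written for this example:

```python
import json
import urllib.request

def has_model(tags: dict, name: str) -> bool:
    """Check an Ollama /api/tags response for a model by exact name."""
    return any(m.get("name") == name for m in tags.get("models", []))

def list_local_models(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the list of locally pulled models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return json.load(resp)

if __name__ == "__main__":
    try:
        tags = list_local_models()
        print("qwen2.5-coder:32b pulled:", has_model(tags, "qwen2.5-coder:32b"))
    except OSError:
        print("Ollama doesn't appear to be running on localhost:11434")
</antml>```

If the model name doesn't show up, `ollama pull` hasn't finished (the 32B download is around 20GB, so it can take a while).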
Run Aider against your local Ollama instance:
```bash
aider --model ollama/qwen2.5-coder:32b --no-auto-commits
```
You now have a coding assistant with full repo context, file editing, git integration — running 100% locally.
## The Real Trade-offs
This is where I want to be honest rather than hype the local setup.
Quality gap: Qwen2.5-Coder-32B is impressive for an open-weight model. It's not as good as Claude Opus or Sonnet on complex, multi-file refactors. For simple tasks and single-file edits, the gap is smaller. For architectural reasoning across a large codebase, the gap is noticeable.
Hardware requirements: At 4-bit quantization, the 32B model needs about 20GB of RAM/VRAM (roughly 16GB for the weights themselves, plus context overhead). On an M2 MacBook Pro with 32GB of unified memory, it runs comfortably. On a machine with 16GB, you're looking at the 7B model, which is meaningfully weaker.
Speed: Local inference on CPU is slow. On Apple Silicon with Metal acceleration, the 32B model runs at about 10-15 tokens/second. That's usable but slower than the API.
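Both of those numbers can be sanity-checked with back-of-envelope arithmetic. A rough sketch (weights only — KV cache and runtime overhead add a few GB on top, and the 500-token response length is just an illustrative figure):

```python
def weights_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate size of quantized model weights in GB (1 GB ~ 1e9 bytes)."""
    return params_billions * bits_per_weight / 8

def response_seconds(tokens: int, tokens_per_second: float) -> float:
    """Rough wall-clock time to generate a response of the given length."""
    return tokens / tokens_per_second

# 32B params at 4 bits each -> 16GB of weights; ~20GB once overhead is added
print(weights_gb(32, 4))
# A 500-token answer at ~12.5 tok/s -> about 40 seconds of generation
print(response_seconds(500, 12.5))
</antml>```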
## When Local Makes Sense
Local is genuinely the right choice when:
- Privacy is a hard requirement — If your company policy prohibits sending code to third-party APIs, local is your only option. This applies to defense contractors, financial institutions, and many enterprise environments.
- You're iterating on something highly sensitive — A new product feature you haven't announced, research you haven't published, security-related code.
- You want to run experiments at scale — If you're batch-processing code analysis across thousands of files, API costs add up. Local is free after the hardware.
- You're learning and experimenting — The API costs nothing when there are no API calls.
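The batch-processing case is where local really shines, and it doesn't even need Aider — you can drive Ollama's generate API directly. A minimal sketch, assuming Ollama's default endpoint and its documented non-streaming `/api/generate` route; `REVIEW_PROMPT` and the `src/*.py` glob are illustrative choices, not fixed conventions:

```python
import json
import pathlib
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
REVIEW_PROMPT = "List any obvious bugs in this file:\n\n{code}"  # illustrative prompt

def build_request(model: str, code: str) -> dict:
    """Build a non-streaming Ollama generate request for one file's contents."""
    return {"model": model, "prompt": REVIEW_PROMPT.format(code=code), "stream": False}

def analyze(path: pathlib.Path, model: str = "qwen2.5-coder:32b") -> str:
    """Send one file to the local model and return its analysis text."""
    payload = json.dumps(build_request(model, path.read_text())).encode()
    req = urllib.request.Request(OLLAMA_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

if __name__ == "__main__":
    # Free after the hardware: loop over as many files as you like.
    for f in sorted(pathlib.Path("src").rglob("*.py")):
        print(f"--- {f} ---")
        print(analyze(f))
</antml>```

At API prices, running a prompt like this over a few thousand files would cost real money; locally it only costs time.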
## A Hybrid Approach
What I actually do: use Aider with local models for low-stakes tasks and exploratory work, and switch to Claude Code (the real Anthropic API) for complex tasks that need the model quality.
```bash
# Quick edits, refactors, test writing → local
aider --model ollama/qwen2.5-coder:32b file.py

# Complex architecture work → API
claude  # Claude Code with the real Anthropic API
```
The context-switching is minor. You get privacy for the work that needs it and model quality for the work that needs that.
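If the two commands start to blur together, the switch itself can be scripted. A tiny sketch — `pick_command` and the `--sensitive` flag are hypothetical conveniences invented for this example, not features of Aider or Claude Code:

```python
import subprocess
import sys

def pick_command(sensitive: bool, files: list[str]) -> list[str]:
    """Choose the local stack for sensitive work, the hosted one otherwise."""
    if sensitive:
        return ["aider", "--model", "ollama/qwen2.5-coder:32b", *files]
    return ["claude"]  # Claude Code works from the repo interactively

if __name__ == "__main__":
    args = sys.argv[1:]
    files = [a for a in args if a != "--sensitive"]
    if args:
        subprocess.run(pick_command("--sensitive" in args, files))
    else:
        print("usage: ai [--sensitive] [files...]")
</antml>```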
## The Getting Started Shortcut
If you just want to try this quickly without reading the full Aider docs:
```bash
ollama pull qwen2.5-coder:7b
pip install aider-chat
cd your-project
aider --model ollama/qwen2.5-coder:7b src/main.py
```
You'll have a working local coding assistant in about 10 minutes (plus model download time). From there you can tune the model size and settings for your hardware.