Swiftbeard

AI in the Developer Workflow: What Sticks

After a year of daily AI tool use, here's what actually stayed in my workflow and what got dropped.

workflow, ai-tools, productivity, developer-tools

There's a gap between "tried this AI tool" and "actually integrated it into how I work." A year ago I ran an experiment: adopt every AI tool that seemed useful, use it for two weeks, see what survived.

Here's the honest accounting.

What Stayed

Claude Code for everything code-adjacent. This one I didn't expect to use as heavily as I do. The things I use it for daily: writing boilerplate I'd otherwise look up, understanding unfamiliar code, generating test cases, explaining error messages, reviewing diffs before committing. None of these are glamorous but they compound.

Fabric for text processing. When I have text that needs transforming — summarize, extract, reformat — fabric --pattern extract_wisdom < file.txt is faster than opening a chat window. The patterns are reusable, and because it's a CLI, the output composes: I can pipe it into other tools.
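That composability is the whole appeal. A hypothetical pipeline — pattern names are examples from the public fabric pattern set, and the exact flags may differ across fabric versions:

```shell
# Pull the key ideas out of a meeting transcript, then have a second
# pattern turn them into a short markdown summary. Each stage is just
# text in, text out, so ordinary shell plumbing works.
fabric --pattern extract_wisdom < transcript.txt \
  | fabric --pattern create_summary \
  > notes/meeting-summary.md
```

Because each stage reads stdin and writes stdout, anything else in the pipeline — grep, tee, pbcopy — slots in without ceremony.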

Local models for sensitive contexts. Anything involving client code, internal systems, or personal data runs through Ollama locally. Not because I'm paranoid, but because it's a reasonable default once you have local models running anyway.
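In practice "runs through Ollama" just means piping the file to a local model instead of a hosted API. A sketch, assuming you've already pulled a model — the model name here is an example, not a recommendation:

```shell
# Nothing leaves the machine: the ollama CLI reads the piped file as
# context and answers locally (the daemon listens on localhost:11434).
cat internal_handler.py | ollama run qwen2.5-coder "Explain what this code does."
```

Once this is muscle memory, routing sensitive material locally stops being a policy decision and becomes the path of least resistance.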

AI for commit messages. This is embarrassing in a good way. I resisted it for a long time — "I can write my own commit messages" — then tried it and realized mine were worse than what the AI generated with the diff as context. Now a hook generates a draft message and I edit it.
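The hook itself can be tiny. A minimal sketch of a prepare-commit-msg hook — this is illustrative, not my exact setup, and it assumes a claude CLI on the PATH (Claude Code's -p flag prints a non-interactive response):

```shell
#!/bin/sh
# .git/hooks/prepare-commit-msg (make it executable)
# Generates a draft message from the staged diff; git still opens the
# editor afterwards, so the human edit step is preserved.
MSG_FILE="$1"
COMMIT_SOURCE="$2"

# Only draft a message when none was supplied (skip merges, -m, amends).
if [ -z "$COMMIT_SOURCE" ]; then
  git diff --cached \
    | claude -p "Write a concise conventional commit message for this diff." \
    > "$MSG_FILE"
fi
```

The important design choice is that the AI writes to the message file, not to the commit: the editor still opens, so "generate, then edit" stays the default rather than "generate, then hope."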

Documentation first drafts. I write the key points, Claude writes the documentation, I edit it. This is genuinely better than writing docs from scratch because the AI produces the boilerplate structure and I focus on the parts where I have something to say.

What Got Dropped

AI code review on every PR. The signal-to-noise ratio was too low: it would find real issues sometimes and generate confident-sounding non-issues just as often. I switched to running AI review on demand, for specific areas I want scrutinized, rather than as a default gate.

AI-generated tests for existing code. The tests it generates for existing code are often testing the wrong things — they test that the code does what it does, not that the code does what it should. Generating tests for new code before it's implemented works better; it catches design issues before they're frozen into the implementation.

Chat for technical research. I tried using Claude as a research tool — "explain how Raft consensus works" — and the answers were good, but I got more from reading actual papers and blog posts with real links I could follow. The chat format doesn't let me follow the trail. I use Perplexity or plain search for this now.

AI-generated UI from scratch. The output wasn't bad but the iteration cycle to get something I actually wanted to ship was longer than just building it. I've found AI UI generation useful for starting components — get a rough scaffold, then edit heavily — but not for complete features.

The Pattern I Noticed

What stayed is what made existing work faster, not what replaced existing work. Faster boilerplate, faster docs, faster commit messages, faster understanding of unfamiliar code. The tool is doing the mechanical parts and I'm doing the judgment parts.

What got dropped is what tried to replace judgment calls — code review, test design, UI design decisions. These require understanding what "good" means in a specific context, and the AI's definition of good is generic.

The Honest Caveat

My workflow is solo consulting on custom internal tools. If I were working on a larger team, with stricter code review standards, or on consumer-facing products — the calculus would be different. Context determines what's useful.

The meta-lesson: run experiments with specific tools for specific tasks, track honestly whether they helped, cut the ones that didn't. Don't adopt AI tools because everyone says you should — adopt them when the specific benefit in your specific workflow is clear.