Swiftbeard

Autoresearch: AI Agents Doing Your Research

Autoresearch runs AI agents to conduct deep research autonomously — what this means for developers building knowledge-heavy products.

ai-agents, research, automation, tools

Research is one of those tasks that sounds simple and is actually brutal to automate. It's not just "find relevant information" — it's iterative. You find something, realize you need to understand something else first, find that, update your understanding, and repeat. It's a loop, not a pipeline.

Autoresearch is a tool that runs that loop autonomously with AI agents. After using it for a few weeks, here's my honest take.

What It Does

Autoresearch takes a research question and runs a multi-step agent loop:

  1. Breaks the question into sub-questions
  2. Searches the web for each sub-question
  3. Reads and extracts relevant content
  4. Identifies gaps in the current understanding
  5. Generates follow-up questions for those gaps
  6. Repeats until it has enough to synthesize an answer

The output is a structured report with citations, not just a chat response.

autoresearch run \
  "What are the main architectural patterns for multi-agent AI systems?" \
  --depth 3 \
  --format markdown \
  --output report.md

The --depth 3 flag means it will recurse three levels deep on follow-up questions. Depth 1 is fast (2-3 minutes); depth 3 can take 15-20 minutes but produces substantially better output.
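The reason depth is expensive is that each level of follow-up questions multiplies the work. As a rough sketch (the fan-out per question is my assumption, not a documented Autoresearch number — suppose each question spawns about three follow-ups):

```python
# Back-of-the-envelope estimate of how many questions a recursive
# research loop touches at a given depth. Assumes each question
# spawns `branching` follow-ups per level; the real fan-out varies.
def estimated_questions(depth: int, branching: int = 3) -> int:
    """Total questions researched across all levels, root included."""
    return sum(branching ** level for level in range(depth + 1))

for d in (1, 2, 3):
    print(d, estimated_questions(d))
# depth 1 -> 4 questions, depth 2 -> 13, depth 3 -> 40
```

Each of those questions triggers searches, page reads, and model calls, which is why the jump from depth 1 to depth 3 is a 10x jump in wall-clock time, not a 3x one.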

The Iteration That Matters

The part that makes Autoresearch different from a single AI call isn't the search — it's the gap detection. After each round of research, the agent explicitly asks: "What don't I understand yet that I need to understand?"

This catches the classic research failure mode where you answer the question you asked rather than the question you meant. If your initial question was "how do vector databases work," a single search gives you an overview. Autoresearch keeps going — it finds the overview, identifies that you'd need to understand HNSW indexing to really get it, goes and reads about HNSW, identifies that you'd need to understand approximate nearest neighbor search, and so on.

It's doing what a good researcher does: following the thread until you actually understand the thing.
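If you wanted to reproduce that behavior yourself, the gap-detection step is essentially one extra prompt after each research round. Here's a hypothetical sketch — this is not Autoresearch's actual prompt, and the LLM client call is assumed:

```python
# Hypothetical gap-identification step. After each research round,
# ask the model what is still missing, and treat each line of its
# answer as a new follow-up question to research.
def gap_prompt(question: str, notes: str) -> str:
    return (
        f"Research question: {question}\n\n"
        f"Notes gathered so far:\n{notes}\n\n"
        "List the concepts that must be understood to answer the "
        "question but are not yet covered by the notes. Phrase each "
        "as a standalone follow-up question, one per line. If nothing "
        "is missing, reply DONE."
    )

# follow_ups = call_llm(gap_prompt(q, notes)).splitlines()  # assumed client
```

The "reply DONE" escape hatch matters: without an explicit stopping condition, a loop like this will generate follow-up questions forever.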

What It's Useful For

Best use cases I've found:

  • Technical due diligence — "What are the known failure modes of this library?" turns into a thorough report that would take me 2 hours to write manually.
  • Competitor research — Structured reports on what a competitor has shipped and how they talk about it.
  • Prepping for unfamiliar domains — When I have to work in a codebase or technology I don't know well, Autoresearch builds me a quick primer.
  • Literature review — For any research-adjacent topic, it surfaces the main papers and key arguments.

It's not great for:

  • Anything requiring access to internal/private data (it's web-only by default)
  • Real-time information (there's a knowledge lag from whatever the search results contain)
  • Highly nuanced topics where the answer depends heavily on context you haven't provided

What This Means for Products

The interesting thing for developers building knowledge-heavy products is that Autoresearch is essentially a template for a whole class of agent architectures.

If your product is in a domain where understanding evolves — legal, medical, technical documentation, competitive intelligence — this iterative research loop is a pattern worth building. The core components are:

  1. A question decomposer
  2. A web (or internal) search tool
  3. A content extractor
  4. A gap identifier that generates follow-up questions
  5. A synthesizer that writes the final output

Each of those is a prompt or tool call. The architecture is straightforward once you see it.
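The five components above compose into a short control loop. A minimal skeleton, with the components passed in as stand-in functions (each would wrap an LLM or search API in practice — only the control flow is shown, and all names here are illustrative):

```python
# Skeleton of the iterative research loop. decompose, search, extract,
# find_gaps, and synthesize are assumed callables wrapping an LLM or
# search backend; this shows only how they fit together.
def research(question, depth, decompose, search, extract,
             find_gaps, synthesize):
    notes = []
    frontier = decompose(question)                 # 1. question decomposer
    for _ in range(depth):
        next_frontier = []
        for q in frontier:
            for doc in search(q):                  # 2. search tool
                notes.append(extract(q, doc))      # 3. content extractor
            next_frontier += find_gaps(q, notes)   # 4. gap identifier
        frontier = next_frontier
        if not frontier:                           # no gaps left: stop early
            break
    return synthesize(question, notes)             # 5. synthesizer
```

Swapping the search function for an internal-documents retriever is the obvious adaptation for products that can't rely on the public web.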

Honest Limitations

Citations are inconsistent — sometimes the final report omits sources for claims. This matters a lot if you're using it for anything that will be published or shared.

The depth setting has diminishing returns. Depth 1 gets you 80% of the way there. Depth 3 adds real value but also adds a lot of tangential material you'll need to edit out.

Run time and cost scale with depth. At depth 3, you're making a lot of API calls. Budget accordingly.

For exploratory research where you want a solid first draft, it's genuinely excellent. For anything requiring primary source accuracy, treat it as a starting point, not an endpoint.