LangChain exists. LlamaIndex exists. A dozen other orchestration frameworks exist. So when Paperclip showed up on my radar, my first instinct was "do we need another one?"
After spending time with it, my answer is: it depends on what you're building.
What Paperclip Does
Paperclip is an open-source orchestration platform focused on building, running, and monitoring AI pipelines in production. The core concept is a pipeline — a directed graph of steps where each step is either an LLM call, a tool execution, or a data transformation.
You define pipelines in YAML or Python:
pipeline:
  name: research_pipeline
  steps:
    - id: search
      type: tool
      tool: web_search
      input: "{{ query }}"
    - id: summarize
      type: llm
      model: claude-opus-4-5
      depends_on: [search]
      prompt: |
        Summarize the following search results for: {{ query }}
        Results: {{ search.output }}
    - id: extract_facts
      type: llm
      model: claude-haiku-4-5
      depends_on: [summarize]
      prompt: "Extract key facts as a JSON list: {{ summarize.output }}"
The pipeline is a first-class object. You can version it, run it, inspect its execution history, and replay failed runs.
How It Compares to LangChain
LangChain is code-first. You import classes, compose chains, and wire things together in Python. This gives you full flexibility but means your pipeline logic lives in application code — hard to inspect, hard to modify without a deploy, hard to hand to a non-engineer.
Paperclip treats pipelines as configuration, not code. The pipeline definition is separate from the runtime. This makes a few things easier:
- Visibility: you can see what a pipeline does without reading code. Product managers can read it. Ops can inspect it.
- Modification without redeploy: change a prompt in the pipeline definition, and the next run uses the new prompt. No code change, no deploy.
- Retry and replay: because pipeline execution is tracked step by step, you can retry from any failed step with the same inputs. This is genuinely useful in production.
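The replay semantics come down to one idea: persist each step's output, and on retry skip any step that already has a recorded result. A minimal sketch of that general technique (not Paperclip's actual implementation):

```python
# Replay-from-failed-step sketch: outputs are persisted in `store`,
# so a retry re-runs only steps without a recorded result.

def run(steps, store, handlers):
    """steps: list of (step_id, depends_on); store: dict of saved outputs."""
    for step_id, deps in steps:
        if step_id in store:            # already succeeded on a prior run
            continue
        inputs = {d: store[d] for d in deps}
        store[step_id] = handlers[step_id](inputs)
    return store

steps = [("search", []), ("summarize", ["search"]), ("extract", ["summarize"])]

attempts = {"summarize": 0}
def flaky_summarize(inputs):
    # Fails on the first attempt to simulate a transient model error.
    attempts["summarize"] += 1
    if attempts["summarize"] == 1:
        raise RuntimeError("model timeout")
    return f"summary of {inputs['search']}"

handlers = {
    "search": lambda i: "raw results",
    "summarize": flaky_summarize,
    "extract": lambda i: f"facts from {i['summarize']}",
}

store = {}
try:
    run(steps, store, handlers)
except RuntimeError:
    pass                                # first run fails at summarize

run(steps, store, handlers)             # replay: search is NOT re-executed
print(store["extract"])                 # prints "facts from summary of raw results"
```

Note that `search` runs exactly once across both attempts: the retry reuses its stored output instead of re-spending the tool call.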
The trade-off is flexibility. LangChain can express almost anything because it's code. Paperclip has a defined set of step types — if your use case doesn't fit, you're fighting the framework.
When to Reach for Paperclip
Use Paperclip when:
- Your pipeline needs to be visible to non-engineers
- You want retry/replay semantics out of the box
- Your pipelines are relatively linear (not deeply recursive multi-agent loops)
- You care about observability and want execution traces without setting up OpenTelemetry yourself
Stick with LangChain (or just raw API calls) when:
- Your pipeline logic is highly dynamic — the structure changes at runtime based on results
- You need to express complex branching, recursion, or multi-agent patterns
- You're prototyping and don't need production observability yet
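The "highly dynamic" case is easiest to see in code: when the next step depends on the previous result, there is no fixed graph to declare up front, so a static YAML definition has nothing to describe. A toy sketch with a stubbed model call:

```python
# Runtime-dynamic pipeline logic that a static step graph can't express:
# the loop decides the next step from the last result.

def dynamic_research(query, call_llm, max_hops=5):
    notes = []
    step = f"search: {query}"
    while True:
        result = call_llm(step)
        notes.append(result)
        if "DONE" in result or len(notes) >= max_hops:
            return notes
        step = f"follow up on: {result}"   # structure chosen at runtime

# Stubbed model for illustration: finishes after two hops.
responses = iter(["lead A", "lead B DONE"])
notes = dynamic_research("quantum batteries", lambda _: next(responses))
print(notes)  # ['lead A', 'lead B DONE']
```

Nothing stops you from wrapping logic like this around a config-driven runner, but at that point the interesting control flow lives in code anyway, and the config layer stops paying for itself.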
The Monitoring Story
The part that genuinely surprised me was the monitoring. Paperclip ships with a dashboard that shows pipeline execution history, per-step latency, token costs, and failure rates. You get this for free — no LangSmith subscription, no custom instrumentation.
For small teams building production AI features, that's real value.
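Because every step execution is already tracked, the dashboard's numbers fall straight out of the execution records. A sketch of the aggregation (the record field names here are illustrative, not Paperclip's schema):

```python
from collections import defaultdict
from statistics import mean

# Illustrative execution records, one per step attempt.
records = [
    {"step": "search",    "latency_ms": 420,  "tokens": 0,   "ok": True},
    {"step": "summarize", "latency_ms": 1800, "tokens": 950, "ok": True},
    {"step": "summarize", "latency_ms": 2100, "tokens": 990, "ok": False},
    {"step": "extract",   "latency_ms": 600,  "tokens": 310, "ok": True},
]

by_step = defaultdict(list)
for r in records:
    by_step[r["step"]].append(r)

# Per-step rollups: average latency, total tokens, failure rate.
metrics = {}
for step, rs in by_step.items():
    metrics[step] = {
        "avg_latency_ms": mean(r["latency_ms"] for r in rs),
        "total_tokens": sum(r["tokens"] for r in rs),
        "failure_rate": sum(not r["ok"] for r in rs) / len(rs),
    }

print(metrics["summarize"])
# {'avg_latency_ms': 1950, 'total_tokens': 1940, 'failure_rate': 0.5}
```

Token costs then follow from multiplying `total_tokens` by each model's per-token price.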
Bottom Line
Paperclip fills a gap between "raw API calls" and "full LangChain complexity." If you're building pipelines that need to run reliably in production and be visible to non-engineers, it's worth a look. If you're building deeply dynamic multi-agent systems, you'll probably need something lower-level.
Paperclip being open source also matters: you own your pipeline definitions and your execution data, rather than renting them from a hosted observability vendor.