Introducing Haven: An AI-Native Backlog Manager

Haven is a local-first backlog manager for projects that span humans, AI agents, and real life. It keeps product intent, readiness, evidence, ownership, and delivery state in one graph agents can actually use.

Haven is a local-first backlog manager for work that spans humans, AI agents, and real life.

That sounds like a project-management tool, but that is not quite how I think about it. Most project-management and issue-tracking tools are good at managing work once it has already become work: a ticket, an issue, a sprint item, a task assigned to someone.

Haven is aimed slightly earlier and slightly lower. It is for managing the backlog itself: the fuzzy ideas, half-formed bets, dependencies, supporting evidence, decisions, specs, readiness, handoffs, and completion evidence that turn intent into something worth building.

A normal TODO list is too flat. A chat transcript is too ephemeral. A repo-local PLAN.md is useful until the work crosses repositories, gets handed to another agent, or turns into a decision someone has to make next week.

Haven is the durable backlog graph underneath that mess.

It keeps track of what needs doing, who owns it, what is blocked, what changed shape, what is ready to pick up next, and what evidence proves something is actually done. You can use it directly from the haven CLI, or talk to it through an AI agent over MCP (the Model Context Protocol) in tools like Codex and Claude.

That last part is the important bit. I do not want every agent session to infer the state of a project from old chat, stale notes, and whatever happens to be open in the editor. I want to ask:

What is ready for you to work on?

And I want the answer to come from a real backlog, not vibes.

The missing tool is the backlog

A lot of AI tooling is focused on the software development lifecycle: generate the plan, write the code, run the tests, open the PR, review the diff. That makes sense. It is where the first obvious productivity gains showed up.

But if implementation gets much faster, the scarce work moves upstream.

The hard part becomes deciding what should be built, why it matters, what evidence supports it, what is ready, what is still fuzzy, what depends on what, and what would count as done. That is backlog management. It is a product-management job, not just a software-delivery job.

Most existing tools do not feel shaped around that problem. Jira, Linear, GitHub Issues, Asana, and similar tools are useful once an organisation needs shared execution, reporting, permissions, and team coordination. They are not primarily built as AI-readable backlogs in which one person and their agents continuously shape intent into ready work.

That is the gap Haven is trying to fill.

Backlog management is not lifecycle management

This distinction is easy to blur because lots of tools have something called a backlog.

But a backlog view is not the same thing as backlog management. A lot of the current AI tooling starts after the product work has already happened: a PRD exists, a feature has been chosen, an issue has been written, or a developer has asked an agent to implement a task.

Those tools can be very useful. Turning a PRD into tasks is useful. Running an agent from an issue is useful. Using Linear, Jira, or GitHub Issues to coordinate team delivery is useful.

It is just a different layer.

The backlog layer asks earlier questions:

  1. Why does this idea exist?
  2. What evidence supports it?
  3. Is it one thing, or several?
  4. What is it blocked by?
  5. What decision would make it ready?
  6. What would count as done?
  7. Who or what owns the next move?

That is the product-manager-shaped work that gets more important when implementation gets faster. If the delivery system can burn through ready work quickly, the limiting factor becomes whether the backlog contains enough work that is actually ready.

Haven is not trying to replace the tools that execute work downstream. It is trying to make the upstream backlog durable enough that humans and agents can work from it.

Why I built it

Haven is not my first attempt at this problem.

For a while I had local harnesses that generated spec files, backlog files, and JSON files. That sort of worked. It was much better than asking an agent to reconstruct the state of a project from chat history.

But the files kept creating awkward questions.

Where should they live? If they sit inside the repository, they are easy for the agent to find, but now I have planning material, half-formed product thinking, and AI working notes sitting beside code. That is uncomfortable when the repo is public, and noisy even when it is private.

If they sit outside the repo, they are safer, but now they are harder for tools to discover. They drift from the work. They are not naturally queryable. They are difficult to use from a remote agent or a different machine. If there is a generated JSON file and a Markdown file and a hand-edited note, sooner or later one of them becomes stale.

I also tried the more product-shaped version of this idea. One archived prototype was a backlog manager with a database, REST API, evidence links, evolution history, ticket rendering, and delivery-tool links. Another version pulled it back toward a local CLI: a SQLite store, JSON output for agents, and a Claude skill that could add, split, groom, and query items conversationally.

Those attempts were useful because they taught me the boundary.

I did not want a full project-management product first. I wanted the durable backlog underneath my own work: local-first, structured enough for agents, not trapped inside a repo, and still made of plain files where the thinking needs to be plain files.

That became Haven.

Why a backlog graph?

Real projects are not flat lists.

Ideas arrive half-formed. Some get parked. Some become commitments. Some depend on other work. Some are waiting on a person, a decision, an external account, or a thing happening in the real world. Some split into smaller items. Some merge with older ideas. Some are superseded by a better approach.

Most lightweight tools lose that shape. You end up with a list of tasks, plus the real structure hidden in prose, memory, comments, or chat.

Haven makes the structure explicit:

  1. Items hold the work.
  2. Dependencies say what blocks what.
  3. Grouping says what ships or gets built together.
  4. Lineage records what an old item became.
  5. Ownership says whether a human or AI currently has the baton.
  6. Artifacts attach specs, research, decisions, handoff notes, and delivery evidence.

That makes the backlog queryable. An agent can ask for the next AI-owned item. A human can ask why nothing is ready. A completion can report what it just unblocked. A stale idea can resolve forward to the item that replaced it.
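Resolving forward is a short walk along lineage links. A minimal Rust sketch, assuming each superseded item records exactly one successor (a split into several items would need a list instead); the `Lineage` alias and `resolve_forward` function are hypothetical names, not Haven's code:

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical lineage table: old item id -> the id of the item it became.
type Lineage = HashMap<String, String>;

/// Follows lineage links from `id` to the newest item in the chain.
/// Returns None if the links loop back on themselves.
fn resolve_forward(lineage: &Lineage, id: &str) -> Option<String> {
    let mut current = id.to_string();
    let mut seen = HashSet::new();
    while let Some(next) = lineage.get(&current) {
        // A repeated id means the chain is a cycle (e.g. A -> B -> A).
        if !seen.insert(current.clone()) {
            return None;
        }
        current = next.clone();
    }
    Some(current)
}
```

The `seen` set matters because lineage is edited history; a bad merge could otherwise send the walk round a loop forever.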

This is the part I think most tools miss. The backlog is not just a queue of tickets. It is where product thinking changes shape.

Graph of Haven backlog items showing parked ideas, committed work, dependencies, blocked items, AI-owned ready work, handoffs, lineage, and completed items with evidence.
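Dependencies alone are enough to group a backlog into layers: items with no blockers first, then items that depend only on the first layer, and so on. One standard way to compute this is Kahn's algorithm. The sketch below is a hypothetical Rust illustration of that technique, not Haven's implementation; `ItemId` and `dependency_layers` are made-up names, and the input map records each item's blockers.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical item id, e.g. "HV-12".
type ItemId = &'static str;

/// Groups items into dependency layers: layer 0 has no blockers, and every
/// later layer depends only on items in earlier layers. Returns Err with the
/// unplaced items if the dependencies contain a cycle.
fn dependency_layers(
    blocked_by: &BTreeMap<ItemId, Vec<ItemId>>,
) -> Result<Vec<Vec<ItemId>>, Vec<ItemId>> {
    // Unplaced blockers per item; blockers are registered as items too.
    let mut pending: BTreeMap<ItemId, BTreeSet<ItemId>> = BTreeMap::new();
    for (&item, deps) in blocked_by {
        pending.entry(item).or_default().extend(deps.iter().copied());
        for &dep in deps {
            pending.entry(dep).or_default();
        }
    }
    let mut layers = Vec::new();
    while !pending.is_empty() {
        // Every item whose blockers are all placed joins this layer.
        let layer: Vec<ItemId> = pending
            .iter()
            .filter(|(_, deps)| deps.is_empty())
            .map(|(&id, _)| id)
            .collect();
        if layer.is_empty() {
            // Nothing can progress: the remaining items form or wait on a cycle.
            return Err(pending.keys().copied().collect());
        }
        for id in &layer {
            pending.remove(id);
        }
        for deps in pending.values_mut() {
            for id in &layer {
                deps.remove(id);
            }
        }
        layers.push(layer);
    }
    Ok(layers)
}
```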

Talking to Haven through AI

Haven is a single Rust binary with a local SQLite store, a CLI, and a stdio MCP server. The CLI and MCP server use the same store, so the human and the agent are looking at the same project state.

From the shell, that looks like this:

haven item add "Rate-limit the public search endpoint"
haven item update HV-12 --status ready \
  --done-looks-like "p95 verify under replay load"
haven item commit HV-12 --priority 1
haven next --explain
haven item handoff HV-12 --to ai --note "Spec attached"
haven item complete HV-12 --evidence "cargo test: 92 green; PR #14"

Through an AI agent, it is more conversational:

Capture this as a parked idea.

What is ready for you to build next?

Why is the launch work blocked?

Groom this item so it is ready for an agent.

Hand this back to me for review.

The agent is not maintaining a private task list inside its context window. It is reading and updating the same durable graph I can inspect from the terminal.

Readiness is separate from commitment

One of the core modelling choices in Haven is separating two questions that many backlog tools blur together:

  1. How well understood is this work?
  2. Have I decided to do it?

An item can move from discovery, to definition, to ready, to in_progress, to done. Separately, it can be committed and prioritised, or it can remain floating.

That distinction matters.

A well-specified idea can sit parked for months. A committed bet can still be fuzzy and need definition before anyone should pick it up. haven next only returns work that is committed, ready, and unblocked. If nothing is eligible, Haven can explain why: not committed, not specified, blocked by a dependency, waiting on a human, or waiting on an external event.

For AI-assisted work, this is more than tidiness. It is a guardrail. An agent should not build an item that is not ready, and "ready" should mean something more concrete than "there is a ticket with a title."
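The two axes and the eligibility rule can be sketched in a few lines of Rust. Everything here is a hypothetical model for illustration: the `Status`, `Waiting`, `Item` and `NotEligible` types, the `pick_next` function, and the assumption that a lower `priority` number means more urgent. It shows the idea that a next-item query should return one eligible item or, failing that, a reason for every rejection.

```rust
/// How well understood the work is (hypothetical status names).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Status {
    Discovery,
    Definition,
    Ready,
    InProgress,
    Done,
}

/// What, if anything, the item is waiting on besides its dependencies.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Waiting {
    Nothing,
    Human,
    External,
}

#[derive(Debug)]
struct Item {
    id: &'static str,
    status: Status,       // axis 1: understanding
    committed: bool,      // axis 2: the decision to do it
    priority: u8,         // assumption: 1 is the most urgent
    open_blockers: usize, // unfinished dependencies
    waiting: Waiting,
}

/// Why an item was not returned by `pick_next`.
#[derive(Debug, PartialEq)]
enum NotEligible {
    NotCommitted,
    NotReady(Status),
    Blocked(usize),
    Waiting(Waiting),
}

/// Checks one item and returns the first reason it cannot be picked up.
fn eligibility(item: &Item) -> Result<(), NotEligible> {
    if !item.committed {
        return Err(NotEligible::NotCommitted);
    }
    if item.status != Status::Ready {
        return Err(NotEligible::NotReady(item.status));
    }
    if item.open_blockers > 0 {
        return Err(NotEligible::Blocked(item.open_blockers));
    }
    if item.waiting != Waiting::Nothing {
        return Err(NotEligible::Waiting(item.waiting));
    }
    Ok(())
}

/// Returns the most urgent eligible item, or every rejection with its reason.
fn pick_next(items: &[Item]) -> Result<&Item, Vec<(&'static str, NotEligible)>> {
    let mut best: Option<&Item> = None;
    let mut rejected = Vec::new();
    for item in items {
        match eligibility(item) {
            Ok(()) => {
                if best.map_or(true, |b| item.priority < b.priority) {
                    best = Some(item);
                }
            }
            Err(reason) => rejected.push((item.id, reason)),
        }
    }
    best.ok_or(rejected)
}
```

Keeping `status` and `committed` as separate fields is the point: a well-specified but uncommitted item and a committed but fuzzy one are rejected for different, reportable reasons.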

Specs live with the work

Haven keeps the graph in SQLite, but the working material lives as ordinary files under ~/.haven: specs, research, decisions, handoff notes, delivery evidence, and project documents.

That split is deliberate. The graph answers structural questions:

  • What is blocked?
  • What is committed?
  • What can an AI pick up next?
  • What did this old item become?
  • What evidence completed this work?

The files hold the actual thinking. They are Markdown, grep-able, directly editable, and readable by agents without round-tripping through the database.

This is where Haven connects to spec-driven AI work. An item can carry acceptance criteria through done_looks_like, plus a spec artifact when the one-line item is not enough. Multi-item batches can carry a Context Pack when several items share architecture, behaviour, contracts, or sequencing.

The point is not that every item gets a large document. The point is that the relevant intent stays attached to the work and can be routed to the agent that needs it.

It also answers the repo question. The work belongs to the product, not necessarily to a single checkout. A repo can have a generated Haven/ workspace so humans and agents have a visible entry point, but the canonical graph and artifacts live outside the repo. I can keep private planning material out of public project history without making it invisible to the agent.

Handoffs are first-class

AI-assisted work creates a lot of small handoffs.

An agent implements something and needs a human review. A human makes a product decision and hands the item back to an agent. A task is blocked on a license key, a stakeholder answer, or a manual sign-off.

In Haven, ownership and waiting state are part of the item. A handoff is one operation: it records the note, flips the owner, and sets the wait state. That means a handed-off item drops out of the wrong queue instead of sitting around looking actionable.
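A minimal sketch of a handoff as a single operation, assuming a hypothetical `Item` with an owner, a wait state and a list of notes (Haven's real schema and field names may differ). All validation happens before any field changes, so a rejected handoff leaves the item untouched.

```rust
/// Who currently has the baton (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Owner {
    Human,
    Ai,
}

/// Anything outside the owner that the item is waiting on (hypothetical).
#[derive(Clone, Debug, PartialEq)]
enum Wait {
    Nothing,
    External(String), // e.g. a license key or a stakeholder answer
}

#[derive(Debug)]
struct Item {
    id: String,
    owner: Owner,
    wait: Wait,
    notes: Vec<String>,
}

impl Item {
    /// Records the note, flips the owner and sets the wait state as one step.
    /// Validation runs first, so an invalid handoff changes nothing.
    fn handoff(&mut self, to: Owner, note: &str, wait: Wait) -> Result<(), String> {
        if to == self.owner {
            return Err(format!("{} is already owned by {:?}", self.id, to));
        }
        if note.trim().is_empty() {
            return Err(format!("{}: a handoff needs a note", self.id));
        }
        self.notes.push(note.to_string());
        self.owner = to;
        self.wait = wait;
        Ok(())
    }

    /// True only for the current owner, and only when nothing external is pending.
    fn actionable_by(&self, who: Owner) -> bool {
        self.owner == who && self.wait == Wait::Nothing
    }
}
```

Because `actionable_by` checks both owner and wait state, an item handed to an AI drops out of the human's queue, and an item waiting on something external drops out of everyone's.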

Completion also requires evidence: test output, a PR link, a short delivery note, or whatever proves the work is done. When an item is completed, Haven reports what it just unblocked.

That is the shape I want for agent work: not "the model said it was finished," but "the item met its acceptance, evidence was attached, and these downstream items are now ready."
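Completion with evidence and the unblocked report can be sketched the same way. This hypothetical in-memory `Backlog` (not Haven's SQLite store, and with made-up field names) refuses an empty evidence string, records the evidence, and returns the items that were waiting on the completed one and now have no open blockers.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical in-memory backlog: what is done, its evidence, and what blocks what.
struct Backlog {
    done: BTreeSet<&'static str>,
    evidence: BTreeMap<&'static str, String>,
    blocked_by: BTreeMap<&'static str, Vec<&'static str>>,
}

impl Backlog {
    /// Marks `id` done if evidence is supplied, then returns the unfinished
    /// items that depended on `id` and whose blockers are now all done.
    fn complete(&mut self, id: &'static str, evidence: &str) -> Result<Vec<&'static str>, String> {
        if evidence.trim().is_empty() {
            return Err(format!("{id}: completion requires evidence"));
        }
        self.done.insert(id);
        self.evidence.insert(id, evidence.to_string());
        let unblocked: Vec<&'static str> = self
            .blocked_by
            .iter()
            .filter(|(item, deps)| {
                !self.done.contains(*item)
                    && deps.contains(&id)
                    && deps.iter().all(|d| self.done.contains(d))
            })
            .map(|(&item, _)| item)
            .collect();
        Ok(unblocked)
    }
}
```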

What Haven is not

Haven is not an agent orchestrator. It does not launch coding agents, run tests, or merge PRs. Other tools can sit above it and do that.

Haven is not a replacement for GitHub, Linear, or Jira when those tools are the right centre of gravity. A merged PR is better evidence than any tracker field. A team tracker is better when many people need reporting, analytics, permissions, and management workflows.

It is also not yet a team product. Sync is on the short-term horizon, but the public local-first version is single-player. Multi-user collaboration is a longer-term product boundary, not something I want to fake with a shallow feature flag.

Haven is also not a memory bank for dumping agent thoughts. It is the durable shape of the work: what exists, what matters, what is blocked, what is ready, who owns it, and what it became.

The immediate use case is one person and their agents maintaining a high-quality backlog: capturing fuzzy ideas, refining them, attaching evidence, shaping specs, deciding what is ready, and handing work between human and AI without losing the thread.

Stack diagram showing Haven as the durable backlog and artifact layer underneath coding agents, CI, pull requests, and team execution tools.

Status

Haven is in daily use. I run my own work through it.

The local workflow is working end to end: items, dependency layers, handoffs, lineage, haven next, haven dispatch, full-text search, artifacts, generated repo workspaces, and the MCP server.

Cloud sync is partly built but not yet part of the public release. That release is local-first: one binary, one store, usable from the terminal and from AI agents on the same machine.

There is also a shorter project page for Haven with the current status and implementation shape.

Why it matters

The reason I care about Haven now is the same reason I care about specs in AI-assisted development.

As implementation gets faster, the bottleneck moves toward backlog quality: intent, evidence, prioritisation, readiness, coordination, and verification. It is not enough for an agent to have a prompt. It needs to know what work is real, what is ready, what context matters, what would count as done, and who gets the baton next.

That information needs to live somewhere more durable than the current chat.

For me, that place is Haven.