Phase-aware product agent

One agent. Six phases. The whole opportunity-to-launch lifecycle, on a real ask.

PM-OS turns 50 codified PM skills into callable nodes inside a stateful agent. It loads persistent business context, infers which phase to run, executes node-by-node, and writes an inspectable trace — including the skills it deliberately did not run.



The gap

The canonical PM thinking sequence exists. It is just not addressable.

Without PM-OS

Sequence happens in someone's head, in Slack threads, in scattered docs.
Context is loaded implicitly, every time, from memory.
Skips happen — but the skipped move is invisible.
Reasoning rarely auditable. The verdict ships; the why doesn't.
Across PMs the sequence drifts. No shared substrate.

With PM-OS

Persistent business context loads into every run.
Skills are callable nodes with input/output contracts.
Skips are first-class — every skip carries reasoning.
Mode (executable / executable-but-degraded / advisory) is required per node.
Every run writes a flow.json trace you can argue with.
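A minimal sketch of what "mode is required" and "every skip carries reasoning" could look like as a record-level check. The class name, field names, and the "executed" / "skipped" status values are assumptions made for illustration; only the three mode names and the example node and artifact come from this page.

```python
from dataclasses import dataclass
from typing import Optional

# Mode names come from the page's mode legend.
MODES = ("executable", "executable-but-degraded", "advisory")


@dataclass(frozen=True)
class NodeRecord:
    """One trace entry. Field names are assumptions, not the PM-OS schema."""

    node: str                       # skill id, e.g. "aipm-prd-stress-test"
    status: str                     # "executed" or "skipped" (assumed values)
    mode: Optional[str] = None      # required when status == "executed"
    reasoning: str = ""             # required when status == "skipped"
    artifact: Optional[str] = None  # file written, if any

    def __post_init__(self) -> None:
        if self.status not in ("executed", "skipped"):
            raise ValueError(f"unknown status: {self.status!r}")
        if self.status == "executed" and self.mode not in MODES:
            raise ValueError(f"{self.node}: executed node needs a mode from {MODES}")
        if self.status == "skipped" and not self.reasoning.strip():
            raise ValueError(f"{self.node}: a skip must carry reasoning")


# The first node of the price-alerts run, as described on this page.
brief = NodeRecord(
    node="aipm-prd-stress-test",
    status="executed",
    mode="executable",
    reasoning="Decoupled bundled asks.",
    artifact="01-opportunity-brief.md",
)
```

Validating at construction time means a silent skip can never reach the trace: a record without reasoning fails before it is written.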

Architecture

Three layers. Every architectural choice anchors to one of them.

context/ · 6 files · loaded into every run

business.md owns: what GDT is, what it sells, the market role.

Editing a context file changes the lens for every future run; past artifacts stay frozen.
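One way to picture the context layer: read every file in context/ at the start of a run and record a fingerprint of what was read. This is a sketch under assumptions. The function names, the restriction to `*.md` files, and the fingerprint itself are illustrative; the page only states that the files load into every run and that past artifacts stay frozen.

```python
import hashlib
from pathlib import Path


def load_context(context_dir):
    """Read every Markdown file in the context directory into a dict keyed by filename."""
    files = sorted(Path(context_dir).glob("*.md"))
    return {p.name: p.read_text(encoding="utf-8") for p in files}


def context_fingerprint(context):
    """Hash names and contents so a run can record which context version it saw.

    Assumption: storing this hash with the run is one way to keep old artifacts
    tied to the context they were produced under.
    """
    digest = hashlib.sha256()
    for name in sorted(context):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(context[name].encode("utf-8"))
        digest.update(b"\0")
    return digest.hexdigest()
```

Calling `context_fingerprint(load_context("context"))` before and after editing business.md would yield two different hashes, which is enough to tell which runs saw the old lens.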

The centerpiece

Six phases. Same orchestrator. Different node sets, artifacts, and verdict vocab.

Build and Eval are siblings off the same Spec parent. Retro reads accumulated runs across every phase.

Phase 01 · Discovery

Reframe the ask. Decouple bundled questions. Run synthetic interviews against persistent context. Verdict + brief out.

Trigger input: asks/<date>-<slug>.md
Headline artifact: 03-discovery-memo.md
Verdict vocab: build · explore · kill · deferred · not-strategy-shaped

upstream_run_id chains linear phases · branch_of ties Build and Eval as siblings off Spec · upstream_run_ids[] lets Retro read across many runs.
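The three link fields can be sketched as plain run metadata. The field names `upstream_run_id`, `branch_of`, and `upstream_run_ids` come from the line above; the `RunMeta` class, the helper functions, and every run id except the Discovery one are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RunMeta:
    run_id: str
    phase: str
    upstream_run_id: Optional[str] = None   # linear parent
    branch_of: Optional[str] = None         # shared parent for sibling phases
    upstream_run_ids: List[str] = field(default_factory=list)  # many parents (Retro)


def lineage(run_id: str, runs: Dict[str, RunMeta]) -> List[str]:
    """Follow upstream_run_id or branch_of back to the root; return root-first."""
    chain, seen, current = [], set(), run_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in run links at {current}")
        seen.add(current)
        chain.append(current)
        meta = runs[current]
        current = meta.upstream_run_id or meta.branch_of
    return list(reversed(chain))


def siblings(run_id: str, runs: Dict[str, RunMeta]) -> List[str]:
    """Other runs that branch off the same parent."""
    parent = runs[run_id].branch_of
    if parent is None:
        return []
    return sorted(r.run_id for r in runs.values()
                  if r.branch_of == parent and r.run_id != run_id)


# Illustrative ids: only the discovery run id appears on this page.
DISC = "discovery/2026-05-11-price-alerts"
SPEC = "spec/2026-05-11-price-alerts"
BUILD = "build/2026-05-11-price-alerts"
EVAL = "eval/2026-05-11-price-alerts"
runs = {r.run_id: r for r in [
    RunMeta(DISC, "discovery"),
    RunMeta(SPEC, "spec", upstream_run_id=DISC),
    RunMeta(BUILD, "build", branch_of=SPEC),
    RunMeta(EVAL, "eval", branch_of=SPEC),
    RunMeta("retro/hypothetical", "retro", upstream_run_ids=[DISC, SPEC, BUILD, EVAL]),
]}
```

Here `lineage(BUILD, runs)` walks Discovery, Spec, Build in order, and `siblings(BUILD, runs)` returns the Eval run.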

Active run · 2026-05-11-price-alerts

Sales asked for price alerts with a paid tier. The agent reframed it as two coupled asks.

Real GDT ask. Six artifacts produced in one run.

Artifact 01 · aipm-prd-stress-test
Opportunity brief
01-opportunity-brief.md
Reframed as two coupled asks: basic alerts (low strategic risk) and paid analytics (high strategic risk). The bundling problem is the brief's first move — it changes everything downstream.
- Two coupled asks separated.
- Basic alerts → low strategic risk.
- Paid analytics → high strategic risk, deferred to its own discovery.
$ cat runs/discovery/2026-05-11-price-alerts/01-opportunity-brief.md
→ artifact written · verdict: asks · decoupled

Artifact 02 · aipm-synthetic-personas
Synthetic interviews
02-synthetic-interviews.md
5 of 8 personas, with reasoning per pick and skip. Marlies (P_B3): sophisticated AND not event-driven — wouldn't adopt the basic feature. A real population gap nobody had named.
- Reasoning logged per pick AND per skip.
- Marlies (P_B3): sophisticated but not event-driven — basic feature won't land.
- Real population gap surfaced before any spec is written.
$ cat runs/discovery/2026-05-11-price-alerts/02-synthetic-interviews.md
→ artifact written · verdict: 5 / 8 personas relevant

Artifact 03 · aipm-ai-moat · aipm-automated-experimentation
Discovery memo
03-discovery-memo.md
Verdict: build basic, defer paid as its own discovery. Confidence medium-high. Moat-by-moat per tier. P_B5 and P_S2 — different sides of the platform — independently surfaced the neutrality concern.
- Confidence: medium-high.
- Moat-by-moat reasoning per tier.
- Two independent personas (P_B5, P_S2) flagged neutrality — not a one-off.
$ cat runs/discovery/2026-05-11-price-alerts/03-discovery-memo.md
→ artifact written · verdict: build · basic only

Artifact 04 · aipm-spec-writer
PRD outline
04-prd-outline.md
Only produced because the verdict was build. Section 3 names what's deliberately out of scope: forecasts, comparable trade, paid tier of any kind. The contract with Strategy + Legal.
- Section 3 names what is deliberately out of scope.
- No forecasts. No comparable trade. No paid tier of any kind.
- Locks the contract with Strategy + Legal up front.
$ cat runs/discovery/2026-05-11-price-alerts/04-prd-outline.md
→ artifact written · verdict: spec · scoped

Artifact 05 · aipm-eval-framework
Eval criteria
05-eval-criteria.md
Four dimensions: adoption, correctness, neutrality, IOSCO posture. Stop thresholds are non-net-able — a neutrality failure can't be offset by strong adoption.
- Four dimensions: adoption, correctness, neutrality, IOSCO posture.
- Neutrality failure cannot be offset by adoption wins.
- Stop thresholds belong to the artifact, not the room.
$ cat runs/discovery/2026-05-11-price-alerts/05-eval-criteria.md
→ artifact written · verdict: eval · gates set
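"Non-net-able" can be read as: each dimension is checked against its own floor, and any single miss stops the launch, regardless of the average. The sketch below assumes scores on a 0 to 1 scale and made-up threshold values; only the four dimension names come from the artifact.

```python
from statistics import mean


def gate(scores, stop_thresholds):
    """Return ("stop", failed_dimensions) if any dimension is below its floor.

    Dimensions are never averaged, so a strong score in one
    cannot offset a failure in another.
    """
    missing = sorted(d for d in stop_thresholds if d not in scores)
    if missing:
        raise KeyError(f"no score for: {missing}")
    failed = sorted(d for d, floor in stop_thresholds.items() if scores[d] < floor)
    return ("stop" if failed else "continue", failed)


# Hypothetical floors and scores, for illustration only.
STOP = {"adoption": 0.3, "correctness": 0.9, "neutrality": 0.8, "iosco_posture": 0.9}
scores = {"adoption": 0.95, "correctness": 0.97, "neutrality": 0.4, "iosco_posture": 0.95}

decision, failed = gate(scores, STOP)

# What a netted (averaged) gate would have concluded on the same numbers.
netted_would_pass = mean(scores.values()) >= mean(STOP.values())
```

With these numbers the per-dimension gate stops on neutrality, while an averaged gate would have let the launch through. That difference is the point of keeping thresholds in the artifact.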

Artifact 06 · advisory
Next action
next-action.md
Two advisory nodes the agent specifically did NOT execute. Real interviews — agent designed, you run. Seller engagement — agent advised the framing, you run the conversations.
- Two nodes the agent specifically did NOT execute.
- Real interviews — agent designed, you run.
- Seller engagement — agent advised the framing, you run the conversation.
$ cat runs/discovery/2026-05-11-price-alerts/next-action.md
→ artifact written · verdict: advisory · human-run

run · closed

Six artifacts, one decoupling, two advisory hand-offs. The trace is the artifact — what was decided, what was skipped, and what the human still owns.

The agentic bit

Skips are first-class. Modes separate what the agent did from what you need to do.

Compare two runs of the same plan. The full-library version produces the same memo every time. The real run skips nodes with reasoning — and tags every executed node with a mode.

flow.json · executed: 6 · skipped: 3 · choices on the record

node: aipm-prd-stress-test (PRD stress-test)
mode: executable
artifact written: 01-opportunity-brief.md
reasoning: Decoupled bundled asks. Basic vs paid framed as separate decisions.

mode legend

executable: The agent did the work. Output lives in the artifact. You read and decide.
executable-but-degraded: The agent did the work with a named gap: a synthetic stand-in for missing real signal.
advisory: The agent designed the move; you execute it. Real customer contact lives here. Always.
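A sketch of how the modes could split a trace into what the agent did and what you still own. The JSON shape, the key names, and the mode assigned to each sample node are assumptions; the real flow.json schema is not shown on this page. Node ids prefixed `hypothetical-` are invented.

```python
import json

# Illustrative trace only: shape, statuses, and per-node modes are assumed.
TRACE_JSON = """
{
  "run_id": "2026-05-11-price-alerts",
  "nodes": [
    {"node": "aipm-prd-stress-test", "status": "executed",
     "mode": "executable", "artifact": "01-opportunity-brief.md"},
    {"node": "aipm-synthetic-personas", "status": "executed",
     "mode": "executable-but-degraded", "artifact": "02-synthetic-interviews.md"},
    {"node": "hypothetical-real-interviews", "status": "executed",
     "mode": "advisory", "artifact": "next-action.md"},
    {"node": "hypothetical-skipped-skill", "status": "skipped",
     "reasoning": "Illustrative reason for skipping."}
  ]
}
"""

BUCKET_BY_MODE = {
    "executable": "agent_did",
    "executable-but-degraded": "agent_did_with_gap",
    "advisory": "you_run",
}


def summarize(trace):
    """Group node ids by mode; keep each skip together with its reasoning."""
    summary = {"agent_did": [], "agent_did_with_gap": [], "you_run": [], "skipped": []}
    for entry in trace["nodes"]:
        if entry["status"] == "skipped":
            summary["skipped"].append((entry["node"], entry["reasoning"]))
        else:
            summary[BUCKET_BY_MODE[entry["mode"]]].append(entry["node"])
    return summary


summary = summarize(json.loads(TRACE_JSON))
```

The `you_run` bucket is the hand-off list: everything in it is designed by the agent and executed by a person.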

The week

Built around how a real PM week actually pulses.

The agent doesn't share artifacts with stakeholders — you do. PM-OS stops at the workspace boundary. Friday retros are the habit worth keeping.

Monday · Triage

Open asks/. Decide which need a flow this week. File the new ones.

$ ls asks/

inputs

raw asks dropped this week
stakeholder pings
your own backlog

outputs

asks/<date>-<slug>.md filed for each kept ask
deferred asks moved out of the queue with a one-line reason

rule · Triage on Monday, not on Wednesday at 4pm.
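Filing a kept ask follows the asks/<date>-<slug>.md pattern. A small sketch of deriving that path, assuming an ISO date and a lowercase hyphenated slug (consistent with the run id 2026-05-11-price-alerts, but not confirmed as the actual rule). The function name is hypothetical.

```python
import re
from datetime import date
from pathlib import PurePosixPath


def ask_path(title: str, filed_on: date) -> PurePosixPath:
    """Build asks/<date>-<slug>.md from a free-text title and a filing date."""
    # Lowercase, collapse every run of non-alphanumerics into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    if not slug:
        raise ValueError("title has no usable characters for a slug")
    return PurePosixPath("asks") / f"{filed_on.isoformat()}-{slug}.md"


path = ask_path("Price alerts", date(2026, 5, 11))
print(path)  # asks/2026-05-11-price-alerts.md
```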

The 5-minute pitch

Six diagrams. Five minutes. The whole system, on one wall.

The interview deck. Walk it manually, or hit play and let it advance with timed narration cues.


Chapter 01 · 0:00 – 0:45

Why PM-OS

"What problem does this solve?"

The PM thinking sequence exists in every senior PM's head — it just isn't addressable. PM-OS makes it addressable.

The gap: context implicit, skips invisible, reasoning hidden.
No shared substrate. Skills are not callable.
The bet: the trace is the value, not the verdict.


The bet

The most valuable PM artifact is not the verdict. It is the trace.

A flow that writes a memo is a fancy LLM prompt. A flow that runs five skills, skips four with named reasoning, and writes the same memo plus a flow.json — that is a different thing. The agent's reasoning becomes inspectable, and therefore correctable.

Closed-loop self-modification deliberately avoided