One agent. Six phases. The whole opportunity-to-launch lifecycle, on a real ask.
PM-OS turns 50 codified PM skills into callable nodes inside a stateful agent. It loads persistent business context, infers which phase to run, executes node-by-node, and writes an inspectable trace — including the skills it deliberately did not run.
The canonical PM thinking sequence exists. It is just not addressable.
- Sequence happens in someone's head, in Slack threads, in scattered docs.
- Context is loaded implicitly, every time, from memory.
- Skips happen — but the skipped move is invisible.
- Reasoning is rarely auditable. The verdict ships; the why doesn't.
- Across PMs the sequence drifts. No shared substrate.
PM-OS makes it addressable.
- Persistent business context loads into every run.
- Skills are callable nodes with input/output contracts.
- Skips are first-class — every skip carries reasoning.
- Mode (executable / degraded / advisory) is required per node.
- Every run writes a flow.json trace you can argue with.
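A minimal sketch of what one entry in such a trace might look like, assuming a hypothetical schema. The field names (`skill`, `status`, `mode`, `reason`), the `NodeRecord` class, and the `write_trace` helper are illustrative and not taken from PM-OS; only the three mode names and the `flow.json` filename come from the text. It encodes two of the rules above: an executed node must carry a mode, and a skipped node must carry reasoning.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional

# Mode names as given in the text; "degraded" is short for
# "executable-but-degraded".
MODES = {"executable", "executable-but-degraded", "advisory"}


@dataclass
class NodeRecord:
    """One node in a run trace (hypothetical schema)."""
    skill: str
    status: str                   # "ran" or "skipped"
    mode: Optional[str] = None    # required when status == "ran"
    reason: Optional[str] = None  # required when status == "skipped"

    def __post_init__(self) -> None:
        if self.status == "ran":
            if self.mode not in MODES:
                raise ValueError(f"{self.skill}: executed node needs a valid mode")
        elif self.status == "skipped":
            if not self.reason:
                raise ValueError(f"{self.skill}: skip needs reasoning")
        else:
            raise ValueError(f"{self.skill}: unknown status {self.status!r}")


def write_trace(path: str, nodes: List[NodeRecord]) -> None:
    """Serialize the node records to a JSON trace file such as flow.json."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"nodes": [asdict(n) for n in nodes]}, fh, indent=2)
```

Validating at construction time means a skip without reasoning never reaches the trace, which is one way to make "skips are first-class" enforceable rather than conventional.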
Three layers. Every architectural choice anchors to one of them.
What GDT is, what it sells, the market role
Editing a context file changes the lens for every future run; past artifacts stay frozen.
Six phases. Same orchestrator. Different node sets, artifacts, and verdict vocab.
Build and Eval are siblings off the same Spec parent. Retro reads accumulated runs across every phase.
Discovery
Reframe the ask. Decouple bundled questions. Run synthetic interviews against persistent context. Verdict + brief out.
- build
- explore
- kill
- deferred
- not-strategy-shaped
upstream_run_id chains linear phases · branch_of ties Build and Eval as siblings off Spec · upstream_run_ids[] lets Retro read across many runs.
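The three link fields could be modelled roughly as follows. This is a sketch under stated assumptions: `upstream_run_id`, `branch_of`, and `upstream_run_ids` are the names used above, while `RunMeta`, `run_id`, `phase`, the `lineage` helper, and the example run ids are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RunMeta:
    """Metadata for one run (hypothetical shape)."""
    run_id: str
    phase: str
    upstream_run_id: Optional[str] = None  # single parent in a linear chain
    branch_of: Optional[str] = None        # shared parent for sibling phases
    upstream_run_ids: List[str] = field(default_factory=list)  # many parents (Retro)


def lineage(run_id: str, runs: Dict[str, RunMeta]) -> List[str]:
    """Walk back through single-parent links and return run ids, oldest first."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = run_id
    while current is not None and current not in seen:  # guard against cycles
        seen.add(current)
        chain.append(current)
        meta = runs[current]
        current = meta.upstream_run_id or meta.branch_of
    return list(reversed(chain))


# Hypothetical run ids, for illustration only.
runs = {
    "disc-1": RunMeta("disc-1", "discovery"),
    "spec-1": RunMeta("spec-1", "spec", upstream_run_id="disc-1"),
    "build-1": RunMeta("build-1", "build", branch_of="spec-1"),
    "eval-1": RunMeta("eval-1", "eval", branch_of="spec-1"),
}
print(lineage("eval-1", runs))  # ['disc-1', 'spec-1', 'eval-1']
```

Build and Eval share every ancestor up to the Spec run, which is what makes them siblings; a Retro run would instead list many ids in `upstream_run_ids` and is not walked by this helper.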
Sales asked for price alerts with a paid tier. The agent reframed it as two coupled asks.
Real GDT ask. Six artifacts produced in one run. The verdict and next action update as each artifact lands.
- Artifact · 01 · aipm-prd-stress-test
Opportunity brief
01-opportunity-brief.md
Reframed as two coupled asks: basic alerts (low strategic risk) and paid analytics (high strategic risk). The bundling problem is the brief's first move; it changes everything downstream.
- Two coupled asks separated.
- Basic alerts → low strategic risk.
- Paid analytics → high strategic risk, deferred to its own discovery.
$ cat runs/discovery/2026-05-11-price-alerts/01-opportunity-brief.md
→ artifact written · verdict: asks · decoupled
- Artifact · 02 · aipm-synthetic-personas
Synthetic interviews
02-synthetic-interviews.md
5 of 8 personas, with reasoning per pick and skip. Marlies (P_B3): sophisticated AND not event-driven; wouldn't adopt the basic feature. A real population gap nobody had named.
- Reasoning logged per pick AND per skip.
- Marlies (P_B3): sophisticated but not event-driven; basic feature won't land.
- Real population gap surfaced before any spec is written.
$ cat runs/discovery/2026-05-11-price-alerts/02-synthetic-interviews.md
→ artifact written · verdict: 5 / 8 personas relevant
- Artifact · 03 · aipm-ai-moat · aipm-automated-experimentation
Discovery memo
03-discovery-memo.md
Verdict: build basic, defer paid as its own discovery. Confidence medium-high. Moat-by-moat per tier. P_B5 and P_S2, on different sides of the platform, independently surfaced the neutrality concern.
- Confidence: medium-high.
- Moat-by-moat reasoning per tier.
- Two independent personas (P_B5, P_S2) flagged neutrality, so it is not a one-off.
$ cat runs/discovery/2026-05-11-price-alerts/03-discovery-memo.md
→ artifact written · verdict: build · basic only
- Artifact · 04 · aipm-spec-writer
PRD outline
04-prd-outline.md
Only produced because the verdict was build. Section 3 names what's deliberately out of scope: forecasts, comparable trade, paid tier of any kind. The contract with Strategy + Legal.
- Section 3 names what is deliberately out of scope.
- No forecasts. No comparable trade. No paid tier of any kind.
- Locks the contract with Strategy + Legal up front.
$ cat runs/discovery/2026-05-11-price-alerts/04-prd-outline.md
→ artifact written · verdict: spec · scoped
- Artifact · 05 · aipm-eval-framework
Eval criteria
05-eval-criteria.md
Four dimensions: adoption, correctness, neutrality, IOSCO posture. Stop thresholds are non-net-able: a neutrality failure can't be offset by strong adoption.
- Four dimensions: adoption, correctness, neutrality, IOSCO posture.
- Neutrality failure cannot be offset by adoption wins.
- Stop thresholds belong to the artifact, not the room.
$ cat runs/discovery/2026-05-11-price-alerts/05-eval-criteria.md
→ artifact written · verdict: eval · gates set
- Artifact · 06 · advisory
Next action
next-action.md
Two advisory nodes the agent specifically did NOT execute. Real interviews: agent designed, you run. Seller engagement: agent advised the framing, you run the conversations.
- Two nodes the agent specifically did NOT execute.
- Real interviews: agent designed, you run.
- Seller engagement: agent advised the framing, you run the conversations.
$ cat runs/discovery/2026-05-11-price-alerts/next-action.md
→ artifact written · verdict: advisory · human-run
Six artifacts, one decoupling, two advisory hand-offs. The trace is the artifact — what was decided, what was skipped, and what the human still owns.
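The "non-net-able" stop thresholds from the eval criteria artifact can be illustrated with a small gate function. This is a sketch, not the PM-OS implementation: the four dimension names come from the walkthrough, but the numeric thresholds, the 0 to 1 score scale, the `gate` function, and the assumption that every dimension carries a stop threshold are all hypothetical.

```python
from typing import Dict

# Hypothetical stop thresholds on a 0..1 scale. The source names the four
# dimensions but gives no numbers.
STOP_THRESHOLDS: Dict[str, float] = {
    "adoption": 0.2,
    "correctness": 0.9,
    "neutrality": 0.95,
    "iosco_posture": 0.9,
}


def gate(scores: Dict[str, float]) -> str:
    """Return "stop" if any dimension is below its threshold, else "continue".

    Each dimension is checked on its own and scores are never averaged, so a
    strong result on one dimension cannot compensate for a failure on another.
    A missing score is treated as 0.0 and therefore fails.
    """
    failed = [
        name
        for name, floor in STOP_THRESHOLDS.items()
        if scores.get(name, 0.0) < floor
    ]
    return "stop" if failed else "continue"
```

With these placeholder numbers, perfect adoption paired with a neutrality score of 0.5 still returns `"stop"`, which is the behaviour the eval criteria describe.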
Skips are first-class. Modes separate what the agent did from what you need to do.
Compare two runs of the same plan. The full-library version produces the same memo every time. The real run skips nodes with reasoning — and tags every executed node with a mode.
PRD stress-test
Decoupled bundled asks. Basic vs paid framed as separate decisions.
- executable: The agent did the work. Output lives in the artifact. You read and decide.
- executable-but-degraded: The agent did the work with a named gap, such as a synthetic stand-in for missing real signal.
- advisory: The agent designed the move; you execute it. Real customer contact lives here. Always.
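One way to turn the three modes into a to-do split, sketched under assumptions: the mode names are the ones listed above, while the `Mode` enum, the `split_by_owner` helper, and the dictionary keys `skill` and `mode` are hypothetical.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Mode(str, Enum):
    EXECUTABLE = "executable"
    DEGRADED = "executable-but-degraded"
    ADVISORY = "advisory"


def split_by_owner(
    nodes: List[Dict[str, str]],
) -> Tuple[List[str], List[str], List[str]]:
    """Split executed nodes into (agent did, agent did with a gap, human must run)."""
    done: List[str] = []
    gaps: List[str] = []
    todo: List[str] = []
    buckets = {Mode.EXECUTABLE: done, Mode.DEGRADED: gaps, Mode.ADVISORY: todo}
    for node in nodes:
        mode = Mode(node["mode"])  # raises ValueError on an unknown mode
        buckets[mode].append(node["skill"])
    return done, gaps, todo
```

The third list is the part of the run the human still owns; the second is work worth rechecking once real signal replaces the stand-in.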
Built around how a real PM week actually pulses.
The agent doesn't share artifacts with stakeholders — you do. PM-OS stops at the workspace boundary. Friday retros are the habit worth keeping.
Open asks/. Decide which need a flow this week. File the new ones.
- raw asks dropped this week
- stakeholder pings
- your own backlog
- asks/<date>-<slug>.md filed for each kept ask
- deferred asks moved out of the queue with a one-line reason
rule · Triage on Monday, not on Wednesday at 4pm.
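Filing a kept ask under the `asks/<date>-<slug>.md` pattern could look like the sketch below. The `ask_path` helper and its slug rules are hypothetical, and the ISO date format is an assumption based on the `2026-05-11-price-alerts` run directory shown earlier.

```python
import re
from datetime import date


def ask_path(filed_on: date, title: str) -> str:
    """Build an asks/<date>-<slug>.md path from a filing date and a free-text title."""
    # Lowercase, then collapse every run of non-alphanumerics into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    if not slug:
        raise ValueError("title must contain at least one letter or digit")
    return f"asks/{filed_on.isoformat()}-{slug}.md"


print(ask_path(date(2026, 5, 11), "Price alerts"))  # asks/2026-05-11-price-alerts.md
```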
Six diagrams. Five minutes. The whole system, on one wall.
The interview deck. Walk it manually, or hit play and let it advance with timed narration cues.
Why PM-OS
"What problem does this solve?"
The PM thinking sequence exists in every senior PM's head — it just isn't addressable. PM-OS makes it addressable.
- The gap: context implicit, skips invisible, reasoning hidden.
- No shared substrate. Skills are not callable.
- The bet: the trace is the value, not the verdict.
The most valuable PM artifact is not the verdict. It is the trace.
A flow that writes a memo is a fancy LLM prompt. A flow that runs five skills, skips four with named reasoning, and writes the same memo plus a flow.json — that is a different thing. The agent's reasoning becomes inspectable, and therefore correctable.