PM-OS
Phase-aware product agent

One agent. Six phases. The whole opportunity-to-launch lifecycle, on a real ask.

PM-OS turns 50 codified PM skills into callable nodes inside a stateful agent. It loads persistent business context, infers which phase to run, executes node-by-node, and writes an inspectable trace — including the skills it deliberately did not run.

  • 50 codified PM skills
  • 33 promoted as nodes
  • 6 phases, one orchestrator
  • 1 inspectable trace per run
The gap

The canonical PM thinking sequence exists. It is just not addressable.

Without PM-OS
  • Sequence happens in someone's head, in Slack threads, in scattered docs.
  • Context is loaded implicitly, every time, from memory.
  • Skips happen — but the skipped move is invisible.
  • Reasoning is rarely auditable. The verdict ships; the why doesn't.
  • Across PMs the sequence drifts. No shared substrate.
With PM-OS
  • Persistent business context loads into every run.
  • Skills are callable nodes with input/output contracts.
  • Skips are first-class — every skip carries reasoning.
  • Mode (executable / degraded / advisory) is required per node.
  • Every run writes a flow.json trace you can argue with.
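
A minimal sketch of what "callable nodes with input/output contracts" and a required mode could look like in Python. The names `Mode`, `Node`, `NodeResult`, and `run_node` are hypothetical and are not taken from PM-OS itself; only the three mode labels and the rule that a skip must carry reasoning come from the list above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Mode(Enum):
    # Mode labels as used on this page; the enum is an illustrative assumption.
    EXECUTABLE = "executable"
    DEGRADED = "executable-but-degraded"
    ADVISORY = "advisory"


@dataclass(frozen=True)
class NodeResult:
    node: str
    status: str                      # "executed" or "skipped"
    reasoning: str
    mode: Optional[Mode] = None      # set only for executed nodes
    artifact: Optional[str] = None   # artifact filename, if one was written


@dataclass(frozen=True)
class Node:
    name: str
    mode: Mode                       # required: a node cannot be built without one
    run: Callable[[dict], str]       # contract: context dict in, artifact name out


def run_node(node: Node, context: dict, reasoning: str = "",
             skip_reason: Optional[str] = None) -> NodeResult:
    """Execute a node, or record a skip that carries its reasoning."""
    if skip_reason is not None:
        if not skip_reason.strip():
            raise ValueError("a skip must carry reasoning")
        return NodeResult(node.name, "skipped", skip_reason)
    artifact = node.run(context)
    return NodeResult(node.name, "executed", reasoning, node.mode, artifact)
```

In this sketch a skip is just another result type, so it lands in the same record as an executed node instead of disappearing.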
Architecture

Three layers. Every architectural choice anchors to one of them.

context/ · 6 files · loaded into every run

  • business.md owns: what GDT is, what it sells, the market role

Editing a context file changes the lens for every future run; past artifacts stay frozen.
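
One way the context layer could be loaded at the start of a run is sketched below. The function name `load_context` is hypothetical, and the sketch assumes every context file is Markdown, which the page only confirms for business.md.

```python
from pathlib import Path


def load_context(context_dir: str) -> dict[str, str]:
    """Read every Markdown file in context_dir into a filename -> text mapping.

    Assumption: context files are plain Markdown and small enough to read whole.
    """
    root = Path(context_dir)
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(root.glob("*.md"))
    }
```

Because the files are read once per run, later edits to context/ affect only later runs, which matches the rule that past artifacts stay frozen.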

The centerpiece

Six phases. Same orchestrator. Different node sets, artifacts, and verdict vocab.

Build and Eval are siblings off the same Spec parent. Retro reads accumulated runs across every phase.

Phase 01

Discovery

Reframe the ask. Decouple bundled questions. Run synthetic interviews against persistent context. Verdict + brief out.

Trigger input
asks/<date>-<slug>.md
Headline artifact
03-discovery-memo.md
Verdict vocab
  • build
  • explore
  • kill
  • deferred
  • not-strategy-shaped

upstream_run_id chains linear phases · branch_of ties Build and Eval as siblings off Spec · upstream_run_ids[] lets Retro read across many runs.
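
The three linkage fields can be pictured as run metadata. This is a sketch under assumptions: the field names `upstream_run_id`, `branch_of`, and `upstream_run_ids` come from the line above, while the `RunMeta` class, the `siblings` helper, and the run ids in the usage note are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunMeta:
    run_id: str
    phase: str
    upstream_run_id: Optional[str] = None   # linear chain to the previous phase
    branch_of: Optional[str] = None         # shared Spec parent for Build and Eval
    upstream_run_ids: list[str] = field(default_factory=list)  # Retro reads many


def siblings(runs: list[RunMeta], run: RunMeta) -> list[RunMeta]:
    """Return the other runs that branch off the same parent as `run`."""
    if run.branch_of is None:
        return []
    return [
        other for other in runs
        if other.branch_of == run.branch_of and other.run_id != run.run_id
    ]
```

For example, with a Build run and an Eval run that both set `branch_of="spec-1"`, `siblings` returns the Eval run when given the Build run, and an empty list for any run on the linear chain.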

Active run · 2026-05-11-price-alerts

Sales asked for price alerts with a paid tier. The agent reframed it as two coupled asks.

Real GDT ask. Six artifacts produced in one run. The verdict and next action update as each artifact lands.

  1. Artifact 01 · aipm-prd-stress-test

    Opportunity brief

    01-opportunity-brief.md

    Reframed as two coupled asks: basic alerts (low strategic risk) and paid analytics (high strategic risk). The bundling problem is the brief's first move — it changes everything downstream.

    • Two coupled asks separated.
    • Basic alerts → low strategic risk.
    • Paid analytics → high strategic risk, deferred to its own discovery.
    $ cat runs/discovery/2026-05-11-price-alerts/01-opportunity-brief.md
    → artifact written · verdict: asks · decoupled
  2. Artifact 02 · aipm-synthetic-personas

    Synthetic interviews

    02-synthetic-interviews.md

    5 of 8 personas, with reasoning per pick and skip. Marlies (P_B3): sophisticated AND not event-driven — wouldn't adopt the basic feature. A real population gap nobody had named.

    • Reasoning logged per pick AND per skip.
    • Marlies (P_B3): sophisticated but not event-driven — basic feature won't land.
    • Real population gap surfaced before any spec is written.
    $ cat runs/discovery/2026-05-11-price-alerts/02-synthetic-interviews.md
    → artifact written · verdict: 5 / 8 personas relevant
  3. Artifact 03 · aipm-ai-moat · aipm-automated-experimentation

    Discovery memo

    03-discovery-memo.md

    Verdict: build basic, defer paid as its own discovery. Confidence medium-high. Moat-by-moat per tier. P_B5 and P_S2 — different sides of the platform — independently surfaced the neutrality concern.

    • Confidence: medium-high.
    • Moat-by-moat reasoning per tier.
    • Two independent personas (P_B5, P_S2) flagged neutrality — not a one-off.
    $ cat runs/discovery/2026-05-11-price-alerts/03-discovery-memo.md
    → artifact written · verdict: build · basic only
  4. Artifact 04 · aipm-spec-writer

    PRD outline

    04-prd-outline.md

    Only produced because the verdict was build. Section 3 names what's deliberately out of scope: forecasts, comparable trade, paid tier of any kind. The contract with Strategy + Legal.

    • Section 3 names what is deliberately out of scope.
    • No forecasts. No comparable trade. No paid tier of any kind.
    • Locks the contract with Strategy + Legal up front.
    $ cat runs/discovery/2026-05-11-price-alerts/04-prd-outline.md
    → artifact written · verdict: spec · scoped
  5. Artifact 05 · aipm-eval-framework

    Eval criteria

    05-eval-criteria.md

    Four dimensions: adoption, correctness, neutrality, IOSCO posture. Stop thresholds are non-net-able — a neutrality failure can't be offset by strong adoption.

    • Four dimensions: adoption, correctness, neutrality, IOSCO posture.
    • Neutrality failure cannot be offset by adoption wins.
    • Stop thresholds belong to the artifact, not the room.
    $ cat runs/discovery/2026-05-11-price-alerts/05-eval-criteria.md
    → artifact written · verdict: eval · gates set
  6. Artifact 06 · advisory

    Next action

    next-action.md

    Two advisory nodes the agent specifically did NOT execute. Real interviews — agent designed, you run. Seller engagement — agent advised the framing, you run the conversations.

    • Two nodes the agent specifically did NOT execute.
    • Real interviews — agent designed, you run.
    • Seller engagement — agent advised the framing, you run the conversation.
    $ cat runs/discovery/2026-05-11-price-alerts/next-action.md
    → artifact written · verdict: advisory · human-run
run · closed

Six artifacts, one decoupling, two advisory hand-offs. The trace is the artifact — what was decided, what was skipped, and what the human still owns.
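
The "non-net-able" stop thresholds from the eval criteria (artifact 05) can be illustrated with a small gate. This is a sketch, not the contents of 05-eval-criteria.md: the four dimension names come from the trace, but the `gate` function and the pass/fail booleans are hypothetical, and no actual threshold values are assumed.

```python
DIMENSIONS = ("adoption", "correctness", "neutrality", "IOSCO posture")


def gate(passed: dict[str, bool]) -> str:
    """Return "pass" only if every dimension cleared its own stop threshold.

    Dimensions are checked independently and never summed or averaged,
    so a strong result on one cannot offset a failure on another.
    """
    missing = [d for d in DIMENSIONS if d not in passed]
    if missing:
        raise KeyError(f"no result for: {', '.join(missing)}")
    failed = [d for d in DIMENSIONS if not passed[d]]
    if failed:
        return "stop: " + ", ".join(failed)
    return "pass"
```

A weighted score would let adoption wins mask a neutrality failure; checking each dimension on its own is what keeps the thresholds non-net-able.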

The agentic bit

Skips are first-class. Modes separate what the agent did from what you need to do.

Compare two runs of the same plan. The full-library version produces the same memo every time. The real run skips nodes with reasoning — and tags every executed node with a mode.

flow.json · executed: 6 · skipped: 3 · choices on the record

node: aipm-prd-stress-test (PRD stress-test)
mode: executable
artifact written: 01-opportunity-brief.md
reasoning: Decoupled bundled asks. Basic vs paid framed as separate decisions.

mode legend
  • executable: The agent did the work. Output lives in the artifact. You read and decide.
  • executable-but-degraded: The agent did the work with a named gap (a synthetic stand-in for missing real signal).
  • advisory: The agent designed the move; you execute it. Real customer contact lives here. Always.
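
To make the trace concrete, here is a sketch of how a flow.json file might be assembled and validated. The actual schema is not shown on this page, so every field name (`run_id`, `executed`, `skipped`, `nodes`, `status`) is an assumption, as is the `build_trace` helper. The executed entry in the usage note reuses the node shown above; the skipped entry is a placeholder, not a real PM-OS node.

```python
import json

# Mode labels from the legend above.
MODES = {"executable", "executable-but-degraded", "advisory"}


def build_trace(run_id: str, entries: list[dict]) -> str:
    """Validate node entries and serialize them as a flow.json-style document."""
    for entry in entries:
        if not entry.get("reasoning"):
            raise ValueError(f"{entry['node']}: reasoning is required")
        if entry["status"] == "executed" and entry.get("mode") not in MODES:
            raise ValueError(f"{entry['node']}: executed nodes need a valid mode")
    trace = {
        "run_id": run_id,
        "executed": sum(e["status"] == "executed" for e in entries),
        "skipped": sum(e["status"] == "skipped" for e in entries),
        "nodes": entries,
    }
    return json.dumps(trace, indent=2)
```

A usage example:

```python
entries = [
    {
        "node": "aipm-prd-stress-test",
        "status": "executed",
        "mode": "executable",
        "artifact": "01-opportunity-brief.md",
        "reasoning": "Decoupled bundled asks. Basic vs paid framed as separate decisions.",
    },
    {
        "node": "example-skipped-node",  # placeholder name, not a real PM-OS node
        "status": "skipped",
        "reasoning": "Placeholder reason for the skip.",
    },
]
print(build_trace("2026-05-11-price-alerts", entries))
```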
The week

Built around how a real PM week actually pulses.

The agent doesn't share artifacts with stakeholders — you do. PM-OS stops at the workspace boundary. Friday retros are the habit worth keeping.

Monday · Triage

Open asks/. Decide which need a flow this week. File the new ones.

$ ls asks/
inputs
  • raw asks dropped this week
  • stakeholder pings
  • your own backlog
outputs
  • asks/<date>-<slug>.md filed for each kept ask
  • deferred asks moved out of the queue with a one-line reason

rule · Triage on Monday, not on Wednesday at 4pm.
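
Filing a kept ask under the asks/<date>-<slug>.md convention could be sketched as follows. The `ask_path` helper and its slug rule (lowercase, with runs of non-alphanumeric characters collapsed to hyphens) are assumptions; the page specifies only the path pattern.

```python
import re
from datetime import date


def ask_path(title: str, filed_on: date) -> str:
    """Build an asks/<date>-<slug>.md path from an ask title and a filing date."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    if not slug:
        raise ValueError("title must contain at least one letter or digit")
    return f"asks/{filed_on.isoformat()}-{slug}.md"
```

For instance, `ask_path("Price alerts", date(2026, 5, 11))` gives `asks/2026-05-11-price-alerts.md`, which lines up with the run name used in the trace above.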

The 5-minute pitch

Six diagrams. Five minutes. The whole system, on one wall.

The interview deck. Walk it manually, or hit play and let it advance with timed narration cues.

pitch · slide 01 of 6 · why pm-os · slide 0:45 · total 5:00

The gap → the bet: the senior-PM thinking sequence exists. PM-OS makes it addressable.

Before · the gap
  • Context lives in your head
  • Reasoning hides in Slack
  • Skipped moves are invisible
  • 50 PM moves exist as concepts
After · PM-OS
  • Context persistent in context/*
  • Every node carries reasoning
  • Skips first-class in flow.json
  • 50 SKILL.md = callable functions

The bet: the most valuable artifact is NOT the verdict. It is the trace.

Chapter 01 · 0:00 – 0:45

Why PM-OS

"What problem does this solve?"

The PM thinking sequence exists in every senior PM's head — it just isn't addressable. PM-OS makes it addressable.

  • The gap: context implicit, skips invisible, reasoning hidden.
  • No shared substrate. Skills are not callable.
  • The bet: the trace is the value, not the verdict.
The bet

The most valuable PM artifact is not the verdict. It is the trace.

A flow that writes a memo is a fancy LLM prompt. A flow that runs five skills, skips four with named reasoning, and writes the same memo plus a flow.json — that is a different thing. The agent's reasoning becomes inspectable, and therefore correctable.

Closed-loop self-modification deliberately avoided