The Cloverleaf Guide

How AI-first teams build software.

The canonical reference. Read top to bottom for the full methodology, or jump to a chapter.

§ 1 · The bottleneck moved

Why Cloverleaf

AI inverts the cost structure of software. Implementation becomes near-free; specification, review, coordination, and failure-handling become the new bottlenecks. Methodologies built on the assumption that humans are the slow part — Scrum sprints, Shape Up cycles, traditional code review — under-allocate effort to the parts that actually gate throughput in an AI-first team.

Cloverleaf re-balances effort around the new bottlenecks. Specifically:

  • Human attention moves upstream to strategic decisions, where it has the most leverage.
  • Risk classification is deterministic and conservative so AI volume can scale without compromising on what matters.
  • Specs are treated as binding contracts so agents can execute without ambiguity.
  • Fresh-eyes code review is enforced by structure, not left as an aspiration.
  • Failure handling is typed so mechanical issues never burn human attention.

§ 2 · The 9 principles

Principles

These principles bind throughout the methodology. They are not aspirational — they are the contract.

  1. Humans gate, AI executes. Humans never implement on the critical path. They define, approve, and reject.
  2. Specs are contracts, not suggestions. Every artifact has a mandatory schema. Agents can rely on the contract.
  3. Fresh eyes, always. Code review happens in a separate AI context. The agent that wrote the code never reviews it.
  4. Risk is deterministic, not judged. AI never decides “is this risky?” — risk is rules-based, conservative, and only ever escalates.
  5. Strategic decisions are batched upstream; tactical decisions stay with the agent. Humans approve at the RFC/batch level, not per-Task by default.
  6. Failure type determines handling. Mechanical → auto-retry; quality → bounce once; strategic → escalate immediately.
  7. Phases are contracts, agents are recommendations. The methodology binds at phase input/output. Agent split is a recommended default.
  8. Roles are agnostic; gates carry responsibilities. The methodology defines what each gate decides; how teams staff it is an org choice.
  9. Docs are for AI; Work Items are for humans. Documenter Agent output serves AI; humans communicate intent through RFCs, Spikes, Plans, and Tasks.

§ 3 · Architecture

Two tracks

[Diagram: two tracks. Discovery (heavyweight, batched; strategic gating) passes tasks to Delivery (per-task, parallel; tactical execution), and Delivery passes questions back to Discovery.]

Discovery feeds Delivery via approved Tasks.
Delivery feeds Discovery via emergent questions, mid-implementation discoveries, or Final-Gate rejections that require revisiting the RFC.

The stages

[Diagram: the stages of each track, each marked Agent or Human.]

Discovery: RFC → Spike(s) (optional) → Strategy Gate → Plan & Breakdown → Task Batch Gate
Delivery: Pickup → Risk Classifier → Fast Lane / Full Pipeline → Final Approval → Merge & Deploy

Two gates, not one

Strategy approval and breakdown approval are split because they need different competencies: strategy is product + architecture; breakdown is engineering decomposition. Bouncing a Plan back for re-decomposition is also far cheaper than re-running Spikes.

§ 4 · Discovery track

Discovery

The Discovery Track converts an idea into a ratified plan with well-formed Tasks.

[Diagram: Discovery stages, each marked Agent or Human: RFC → Spike(s) (optional) → Strategy Gate → Plan & Breakdown → Task Batch Gate.]

Stages

  1. RFC — A proposal authored by the Researcher Agent. Who originates an RFC is left to the organization.
  2. Spike(s) — Zero or more research/de-risk tasks. Each Spike has a single specific question, a method (research/prototype/benchmark), evidence-backed findings, and a recommended path.
  3. Strategy Gate (Human) — Approves the RFC and its Spike findings as a strategic decision. Rejection sends the RFC back for revision.
  4. Plan & Task Breakdown — Executed by the Plan Agent. Emits the Plan artifact: finalized RFC, Task DAG, Tasks (each conforming to the mandatory Task schema), and path → reviewer mappings.
  5. Task Batch Gate (Human) — Verifies the Task breakdown is well-decomposed, achievable, and faithfully implements the RFC intent.

Approved Tasks land in the Delivery backlog.

Splitting strategy from breakdown

Plan & Breakdown is non-trivial AI effort. Gating the RFC first prevents wasting it on a doomed strategy. The two decisions also need different competencies — strategy is product + architecture; breakdown is engineering decomposition.

Failure handling on Discovery

  • A failed Spike does not invalidate the RFC; it informs the next Spike or the RFC revision.
  • A failed Strategy Gate sends the RFC back for re-drafting; if multiple iterations fail, the RFC is closed.
  • A failed Task Batch Gate bounces back to Plan for re-decomposition (cheaper than re-spiking).
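
The three rules above can be read as a small transition function. The following is a minimal sketch under stated assumptions, not part of the Cloverleaf standard: the stage names follow this guide, but `DiscoveryState`, `on_failure`, and `MAX_STRATEGY_REJECTIONS` are hypothetical names, and the limit of 3 is an assumed value because the guide says only "multiple iterations".

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    RFC = "rfc"
    SPIKE = "spike"
    STRATEGY_GATE = "strategy_gate"
    PLAN = "plan_and_breakdown"
    TASK_BATCH_GATE = "task_batch_gate"
    CLOSED = "closed"


# Assumed value: the guide says only "multiple iterations".
MAX_STRATEGY_REJECTIONS = 3


@dataclass
class DiscoveryState:
    stage: Stage
    strategy_rejections: int = 0


def on_failure(state: DiscoveryState) -> DiscoveryState:
    """Return the next Discovery state after a failure at the current stage."""
    if state.stage is Stage.SPIKE:
        # A failed Spike informs the RFC revision; the RFC itself stays open.
        return DiscoveryState(Stage.RFC, state.strategy_rejections)
    if state.stage is Stage.STRATEGY_GATE:
        rejections = state.strategy_rejections + 1
        if rejections >= MAX_STRATEGY_REJECTIONS:
            return DiscoveryState(Stage.CLOSED, rejections)
        return DiscoveryState(Stage.RFC, rejections)
    if state.stage is Stage.TASK_BATCH_GATE:
        # Re-decompose the Plan instead of re-running Spikes.
        return DiscoveryState(Stage.PLAN, state.strategy_rejections)
    raise ValueError(f"no failure rule for stage {state.stage.value}")
```

Keeping the rejection counter in the state is one way to make the "close after repeated rejections" rule checkable; a team could equally track it on the RFC itself.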

§ 5 · Delivery track

Delivery

The Delivery Track executes Tasks through one of two lanes based on deterministic risk classification.

[Diagram: Delivery stages, each marked Agent or Human: Pickup → Risk Classifier → Fast Lane / Full Pipeline → Final Approval → Merge & Deploy.]

Stages

  1. Pickup — The Implementer Agent picks up an approved Task with its Definition of Done, Acceptance Criteria, and risk markers.
  2. Risk Classifier — Rules-based, deterministic. Examines the Task’s touched paths against project-defined rules (e.g., changes under auth/ are always Full Pipeline). Conservative by design: only escalates, never downgrades.
  3. Fast Lane / Full Pipeline — Fast Lane: Implementer → Reviewer → Final Approval. Full Pipeline adds: UI Reviewer (if frontend touched), QA Agent (if functional tests warranted), Documenter Agent (always).
  4. Final Approval Gate (Human) — The single human gate in Delivery. Approves merge.
  5. Merge & Deploy — The Implementer (or a deploy agent) merges and ships.
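
To make the difference between the two lanes concrete, here is a minimal sketch of how the step list for a Task might be assembled. The `TaskFlags` fields and the `pipeline_steps` function are hypothetical, and the relative order of the UI Reviewer, QA Agent, and Documenter Agent is an assumption; the guide states only which steps the Full Pipeline adds.

```python
from dataclasses import dataclass
from enum import Enum


class Lane(Enum):
    FAST = "Fast Lane"
    FULL = "Full Pipeline"


@dataclass(frozen=True)
class TaskFlags:
    # Hypothetical flags; the guide does not define how they are derived.
    touches_frontend: bool = False
    needs_functional_tests: bool = False


def pipeline_steps(lane: Lane, flags: TaskFlags) -> list[str]:
    """List the ordered steps a Task passes through before Merge & Deploy."""
    steps = ["Implementer", "Reviewer"]
    if lane is Lane.FULL:
        if flags.touches_frontend:
            steps.append("UI Reviewer")
        if flags.needs_functional_tests:
            steps.append("QA Agent")
        steps.append("Documenter Agent")  # always runs in the Full Pipeline
    steps.append("Final Approval")  # the single human gate in Delivery
    return steps
```

Note that the flags only matter once the lane is Full Pipeline: a Fast Lane Task that touches the frontend still gets just Implementer, Reviewer, and Final Approval.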

Why deterministic risk

AI is good at reasoning but bad at calibrating its own risk tolerance. Hard rules prevent an agent from talking itself into shipping a Fast Lane change that should have been Full Pipeline.

Failure handling on Delivery

  • Mechanical (build break, test flakiness, lint) — auto-retry with the same agent.
  • Quality (Reviewer rejection) — bounce once back to Implementer; if rejected again, escalate.
  • Strategic (mid-implementation discovery that the Task is wrong) — escalate immediately to Discovery; the Task does not block the rest of the batch.

§ 6 · Artifacts

Work items

Cloverleaf has four Work Item types. Each maps to one leaf of the cloverleaf interchange and has a mandatory schema.

RFC

A proposal. Captures problem, context, recommended approach, alternatives, and acceptance criteria. Authored by the Researcher Agent.

Spike

A bounded research task with a single specific question, a method (research/prototype/benchmark), evidence-backed findings, and a recommended path.

Plan

The output of Plan & Breakdown. Contains: finalized RFC, Task DAG, Tasks (each schema-valid), and path → reviewer mappings.
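
Because the Plan carries a Task DAG, a tool can check that the dependency edges really are acyclic before the batch reaches the Task Batch Gate. The sketch below uses Python's standard `graphlib` module; the `execution_order` function and the `T-1` style ids are hypothetical and not taken from the Cloverleaf standard.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return Task ids in an order that respects their dependencies.

    `dependencies` maps each Task id to the ids it depends on.
    Raises ValueError if the graph contains a cycle, i.e. is not a DAG.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # The second element of args holds the detected cycle.
        raise ValueError(f"Task graph is not a DAG: {exc.args[1]}") from exc


if __name__ == "__main__":
    deps = {"T-2": {"T-1"}, "T-3": {"T-1", "T-2"}}
    print(execution_order(deps))
```

Tasks with no path between them in the graph can run in parallel, which is what makes the Delivery track per-task and parallel.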

Task

A unit of implementation work. Has DoD, AC, risk markers, dependency edges, and assigned reviewers.

All four types share a common envelope (project, id, status, relationships, extensions). The full schemas live in the Cloverleaf Interoperability Standard at standard/.
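
As an illustration only, the shared envelope and the Task fields named above might be modeled as follows. The authoritative schemas are the ones in standard/; every field type here, the Python field names, and the `missing_fields` helper are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Envelope:
    """Fields shared by every Work Item; the types are assumptions."""
    project: str
    id: str
    status: str
    relationships: list[dict[str, str]] = field(default_factory=list)
    extensions: dict[str, Any] = field(default_factory=dict)


@dataclass
class Task(Envelope):
    definition_of_done: list[str] = field(default_factory=list)
    acceptance_criteria: list[str] = field(default_factory=list)
    risk_markers: list[str] = field(default_factory=list)
    depends_on: list[str] = field(default_factory=list)
    reviewers: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Names of empty fields, assuming DoD and AC are the required ones."""
        required = {
            "definition_of_done": self.definition_of_done,
            "acceptance_criteria": self.acceptance_criteria,
        }
        return [name for name, value in required.items() if not value]
```

A check like `missing_fields` is the kind of guard that lets an agent reject a malformed Task at pickup instead of guessing at intent.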

§ 7 · Roles

Agents

Cloverleaf defines 7 default agent roles. They are recommended, not mandated — phases bind, agents are flexible.

Agent        Phase                     Responsibility
-----------  ------------------------  ----------------------------------------
Researcher   Discovery                 Authors RFCs and runs Spikes
Plan         Discovery                 Decomposes approved RFCs into Tasks
Implementer  Delivery                  Picks up Tasks, writes code
Reviewer     Delivery                  Code review in fresh context
UI Reviewer  Delivery (Full Pipeline)  Frontend-specific review
QA           Delivery (Full Pipeline)  Functional test execution
Documenter   Delivery (Full Pipeline)  Updates AI-facing docs after every change

A solo developer can fill all 7 roles by switching personas, keeping code review in a separate AI context as principle 3 requires. A 50-person team can map them to specialized teams. The methodology only requires that the function of each role is filled at the right phase.

§ 8 · Human decisions

Gates

Cloverleaf has three human gates. Each decides one thing.

Strategy Gate (Discovery)

Decides: “Is this strategy sound?” Considers RFC + Spike findings. Outputs: approve, reject (with reason), or send back for revision.

Task Batch Gate (Discovery)

Decides: “Does this breakdown faithfully implement the strategy?” Considers the Plan artifact. Outputs: approve, reject (re-decompose), or escalate (re-RFC).

Final Approval Gate (Delivery)

Decides: “Is this Task ready to merge?” Considers the implementation, review reports, and any QA artifacts. Outputs: approve (merge) or reject (back to Implementer).
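
Since each gate has a fixed set of outputs, a tool can reject any decision that falls outside them. This is a minimal sketch: the outcome keywords (`approve`, `reject`, `revise`, `escalate`) are shorthand chosen here for the outputs listed above, `GateDecision` is a hypothetical type, and requiring a reason for every non-approval is an assumption (the guide asks for one explicitly only on a Strategy Gate rejection).

```python
from dataclasses import dataclass

# Allowed outcomes per gate, using shorthand for the outputs in this guide.
ALLOWED_OUTCOMES: dict[str, set[str]] = {
    "Strategy Gate": {"approve", "reject", "revise"},
    "Task Batch Gate": {"approve", "reject", "escalate"},
    "Final Approval Gate": {"approve", "reject"},
}


@dataclass(frozen=True)
class GateDecision:
    gate: str
    outcome: str
    reason: str = ""

    def __post_init__(self) -> None:
        allowed = ALLOWED_OUTCOMES.get(self.gate)
        if allowed is None:
            raise ValueError(f"unknown gate: {self.gate}")
        if self.outcome not in allowed:
            raise ValueError(f"{self.gate} cannot output {self.outcome!r}")
        # Assumption: every non-approval must carry a reason.
        if self.outcome != "approve" and not self.reason:
            raise ValueError("a non-approval decision needs a reason")
```

For example, `GateDecision("Final Approval Gate", "escalate", "scope changed")` raises `ValueError`, because escalation is an output of the Task Batch Gate only.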

Path → reviewer mappings

Project-configurable rules suggest reviewers based on which code paths a Task touches:

  • Changes under auth/ request a Security Reviewer
  • Schema changes request a Database Reviewer
  • Frontend changes request a Frontend Reviewer

The framework provides the mechanism (path → role rules); teams define the mappings per project.
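
A path → role rule set can be as small as a list of glob patterns. The sketch below is illustrative: only `auth/` comes from this guide, while the `migrations/` and `frontend/` prefixes, the `REVIEWER_RULES` name, and the `suggest_reviewers` function are hypothetical. It uses `fnmatchcase` from the standard library, whose `*` also matches path separators.

```python
from fnmatch import fnmatchcase

# (glob pattern, reviewer role); only "auth/" is named in the guide.
REVIEWER_RULES: list[tuple[str, str]] = [
    ("auth/*", "Security Reviewer"),
    ("migrations/*", "Database Reviewer"),
    ("frontend/*", "Frontend Reviewer"),
]


def suggest_reviewers(
    touched_paths: list[str],
    rules: list[tuple[str, str]] = REVIEWER_RULES,
) -> list[str]:
    """Return reviewer roles whose pattern matches any touched path, in rule order."""
    suggested: list[str] = []
    for pattern, role in rules:
        if role in suggested:
            continue
        if any(fnmatchcase(path, pattern) for path in touched_paths):
            suggested.append(role)
    return suggested
```

Passing `rules` as a parameter keeps the matcher generic, so each project can supply its own mappings without touching the function.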

§ 9 · Determinism

Risk classification

Risk classification is deterministic and conservative. AI never decides “is this risky?” — risk is rules-based and only ever escalates.

Inputs

  • Touched code paths (from the Task DAG)
  • Touched data schemas
  • Author identity (the Implementer Agent does not influence its own risk score)
  • Project-specific rule configuration

Outputs

A single classification: Fast Lane or Full Pipeline.

Conservative principle

If any rule matches Full Pipeline, the Task is Full Pipeline. Rules can only escalate; nothing in the Task or its execution can downgrade the lane after classification.

Example rules

  • Any change under auth/ → Full Pipeline
  • Any schema migration → Full Pipeline
  • Any change touching > 200 LOC → Full Pipeline
  • Frontend-only change in a single component → Fast Lane
  • Documentation-only change → Fast Lane (skips QA)

Teams configure their own rules at project setup. The framework provides the matcher; the rules belong to the project.
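
The conservative principle can be sketched as "evaluate every rule, and let any Full Pipeline match win". The code below is one possible reading, not the framework's matcher: `Change`, `RULES`, and `classify` are hypothetical names, the `docs/` prefix is an assumed location for documentation, and defaulting unmatched changes to Full Pipeline is an assumption, since the guide does not say what happens when no rule matches.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable

FAST, FULL = "Fast Lane", "Full Pipeline"


@dataclass(frozen=True)
class Change:
    touched_paths: tuple[str, ...]
    lines_changed: int
    has_schema_migration: bool = False


Rule = tuple[Callable[[Change], bool], str]


def under(prefix: str) -> Callable[[Change], bool]:
    """Build a predicate that is true if any touched path starts with prefix."""
    return lambda c: any(fnmatchcase(p, prefix + "*") for p in c.touched_paths)


def docs_only(c: Change) -> bool:
    # Assumed convention: documentation lives under docs/.
    return bool(c.touched_paths) and all(
        fnmatchcase(p, "docs/*") for p in c.touched_paths
    )


RULES: list[Rule] = [
    (under("auth/"), FULL),
    (lambda c: c.has_schema_migration, FULL),
    (lambda c: c.lines_changed > 200, FULL),
    (docs_only, FAST),
]


def classify(change: Change, rules: list[Rule] = RULES) -> str:
    """Collect the lanes of all matching rules; Full Pipeline always wins."""
    lanes = {lane for matches, lane in rules if matches(change)}
    if FULL in lanes:
        return FULL
    # Assumption: a change that matches no rule defaults to Full Pipeline.
    return FAST if FAST in lanes else FULL
```

Because every rule is evaluated and escalation wins, a documentation-only change of more than 200 lines still lands in the Full Pipeline; rule order never affects the result.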

§ 10 · When things break

Failure handling

Cloverleaf classifies failure types so handling is mechanical, not improvised.

Mechanical failures

Build breaks, lint errors, test flakes, transient infra issues. Handling: auto-retry with the same agent, up to N attempts. If still failing, escalate to a human via the next available gate.

Quality failures

Reviewer rejections, QA findings, documentation gaps. Handling: bounce once back to the Implementer with the rejection rationale. If rejected again, escalate.

Strategic failures

Mid-implementation discovery that the Task is wrong, that the RFC is incomplete, or that a key assumption was false. Handling: escalate immediately to Discovery. The escalated Task does not block the batch; the remaining Tasks continue. The escalation may trigger a new Spike or a revised RFC.
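
The three handling rules reduce to a lookup on failure type plus a count of earlier failures. The sketch below is a hedged illustration: `FailureType`, `next_action`, and the action strings are hypothetical, and `MAX_RETRIES = 3` is an assumed value standing in for the guide's unspecified N.

```python
from enum import Enum


class FailureType(Enum):
    MECHANICAL = "mechanical"
    QUALITY = "quality"
    STRATEGIC = "strategic"


MAX_RETRIES = 3  # assumed value; the guide says only "up to N attempts"


def next_action(failure: FailureType, prior_failures: int) -> str:
    """Decide what to do after a failure.

    `prior_failures` counts earlier failures of the same type on this Task,
    not including the current one.
    """
    if failure is FailureType.MECHANICAL:
        if prior_failures < MAX_RETRIES:
            return "retry_same_agent"
        return "escalate_to_next_gate"
    if failure is FailureType.QUALITY:
        # One bounce back to the Implementer, then escalate.
        return "bounce_to_implementer" if prior_failures == 0 else "escalate"
    # Strategic failures skip retries entirely.
    return "escalate_to_discovery"
```

The function takes no free-form input from the failing agent, which keeps the routing decision mechanical in the same way the risk classifier is.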

Typed failures preserve human attention

Without typing, every failure looks like a human problem. With typing, mechanical failures reach humans only after automatic retries are exhausted, quality failures get one bounce, and strategic failures route to the right place. Human attention is reserved for decisions only humans can make.

§ 11 · Reference

Glossary

Agent — A specialized AI worker with a defined role. Cloverleaf recommends 7 by default but binds at phase contracts, not at agent count.

Cloverleaf Interoperability Standard — The machine-readable spec (JSON Schemas, OpenAPI contracts, conformance runner, reference validators) that lives in standard/.

DAG — Directed Acyclic Graph. Cloverleaf’s Task structure is a DAG: tasks have ordered dependencies but no cycles.

Delivery Track — The per-Task execution loop. Pickup → Risk Classifier → Fast Lane / Full Pipeline → Final Approval → Merge & Deploy.

Discovery Track — The strategic planning loop. RFC → Spike(s) (optional) → Strategy Gate → Plan & Breakdown → Task Batch Gate.

Fast Lane — Delivery path for low-risk Tasks. Implementer → Reviewer → Final Approval.

Full Pipeline — Delivery path for higher-risk Tasks. Adds UI Reviewer, QA, and Documenter steps.

Gate — A human approval point. Cloverleaf has three: Strategy Gate, Task Batch Gate, Final Approval Gate.

Plan — The output of Plan & Breakdown. Contains finalized RFC, Task DAG, schema-valid Tasks, and path → reviewer mappings.

RFC — A proposal Work Item. The starting point for any non-trivial change.

Risk Classifier — Deterministic, rules-based mechanism that decides Fast Lane vs Full Pipeline.

Spike — A bounded research task attached to an RFC (a Work Item type of its own, not a Task). Single question, single method, evidence-backed findings, and a recommended path.

Task — A unit of implementation work with DoD, AC, risk markers, and dependency edges.

Work Item — Any of the four artifact types, each with a mandatory schema: RFC, Spike, Plan, Task.

This glossary expands over time: terms are added as teams practicing the methodology encounter them, not speculatively.