Iterative delivery, after malleable software

Forty years of iterative-delivery literature describes the loop precisely and assumes, without saying so, that the iterator is the same legal entity that owns the artefact. The configuration this essay describes lifts the assumption.

6 May 2026

On a Monday in late April 2026, the valuer at the practice we work with was halfway through a real chattel valuation — Day 5 on a residential property in Christchurch — and decided the report needed something it didn't have. Each chattel he was justifying carried a market value he could defend — a Bosch built-in oven, a Fisher & Paykel French-door fridge-freezer, a Fujitsu ducted heat pump, Goldair towel rails, a Merlin garage door opener, Kwila decks — but the report didn't show his working. He wanted source-notes per line: a retail-price band, the retailer the comparable came from, an anchor strength, a methodology grade. He opened a session, told the agent on his side what he wanted the report to contain, and within an hour the format-spec for the workpaper contracts had a new line type — W| — defined with seven fields; the contract-validator had been extended to require it where applicable; the report-renderer had been updated to inject source-note rows beneath each line at the right anchor-strength threshold. He used the new schema on the same job he was working on. Twenty-six source-notes rendered into the chattel valuation that went to the client.

We watched this in the next pull. We were not in the loop. We had built the original report's tagged-contract structure as part of the engagement's end-to-end delivery; the valuer changed what the report was required to contain, and what the validators were required to enforce, on a Monday afternoon, working inside a job that had a deadline. The substrate updated. The next session read the change. The thing that needed changing — the report's evidentiary chain — got changed by the person with the most direct evidence of what was wrong, at the cadence the evidence arrived, with no contractual surface to cross.

Forty years of iterative-delivery literature describes the shape of that loop precisely. The same literature also assumes, without saying so, that the iterator is the same legal entity that owns the artefact. SpaceX iterates on SpaceX rockets. Toyota iterates on Toyota cars. Goldratt's Alex Rogo iterates on UniCo's plant. The loop runs inside an organisational boundary because no other configuration was practical. The valuer's addition of the W| schema to the report contract ran across one. This essay is about what changes when the contractual surface the loop always wanted to cross stops being a wall.

What the literature describes, and what it doesn't

Three sources sit in the canon, each pointed at the loop from a different angle.

Mary Poppendieck's 2019 keynote at swissICT's Lean-Agile-Scrum conference, Learning How to Learn, tightens Peter Senge's Fifth Discipline into a sharper claim: the cadence of integration is the unit of learning. Her load-bearing example is SpaceX, told through John Muratore — a former NASA Launch Director who joined SpaceX in the same role. Poppendieck calls the pattern Sync-and-Stabilize: the launch date does not move, every component is integrated against the whole on a frequent cadence, and the engineer responsible for any failure has to explain the next morning what went wrong and how it will not go wrong again. Sync-and-Stabilize learns; the waterfall alternative hopes.

Muratore's framing of why this works, quoted by Poppendieck, is the load-bearing line: because we can design, build, and test at low cost. Does that look like low cost to you? It's cheaper than thinking too hard. We can afford to learn through experience, rather than consuming schedule and attempting to anticipate all possible system interactions. The economics flip when each iteration is cheap relative to the program. SpaceX could afford to lose rockets because each rocket was cheap relative to the program. A government contractor on a different cost structure could not, and so iterated less, and so learned less.

The cultural artefact of this orientation is SpaceX's 2017 video How Not to Land an Orbital Rocket Booster, a compilation of 2013-2016 booster failures set to the Monty Python theme, with text overlays like “Look, that's not an explosion. It's just a rapid unscheduled disassembly.” The same failure mode (propellant depletion) appears twice across two years, which means the loop is genuinely a loop. Publishing the failures inverts the usual signal-of-competence: in most cultures the absence of public failures signals competence; in the SpaceX artefact, publishing them does, because their absence becomes reframable as evidence the organisation isn't iterating fast enough to produce them.

The third frame is older. Eliyahu Goldratt's The Goal: A Process of Ongoing Improvement came out in 1984, and the Theory of Constraints names the loop operationally. Identify the constraint, exploit it, subordinate everything else to it, expand its capacity only after the previous steps are exhausted, then repeat — the constraint will have moved. Goldratt also supplies the warning that pairs with the loop: Herbie, the slowest scout in The Goal's hiking parable, sets the pace of the troop. Speeding up the fast hikers stretches the line and loses Herbie at the back. Iterating on the wrong layer is worse than not iterating.
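The Herbie point can be put as arithmetic. The following is a minimal sketch, assuming a serial line whose steps each have a fixed hourly capacity; the step names and numbers are made up for illustration and do not come from The Goal.

```python
def throughput(capacities: dict[str, float]) -> float:
    """A serial line moves only as fast as its slowest step."""
    return min(capacities.values())


def constraint(capacities: dict[str, float]) -> str:
    """Name of the step that currently sets the pace (the 'Herbie')."""
    return min(capacities, key=capacities.get)


# Hypothetical capacities in units per hour.
line = {"cut": 12.0, "weld": 5.0, "paint": 9.0}

# Doubling a non-constraint step changes nothing at the end of the line.
faster_non_constraint = {**line, "cut": 24.0}

# Expanding the constraint raises throughput, and the constraint moves.
elevated = {**line, "weld": 10.0}

print(constraint(line), throughput(line))
print(constraint(faster_non_constraint), throughput(faster_non_constraint))
print(constraint(elevated), throughput(elevated))
```

In this toy line, speeding up cutting leaves output where it was, while expanding welding raises output and hands the constraint to painting, which is why the loop has to repeat.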

Three vocabularies, one shape: ship, observe in reality, change, ship again. None of the three names the implicit boundary condition. None of them has to. The boundary condition was never violated in the texts.

Why the loop hasn't crossed contracts

Most service businesses do not run the loop, and the reason is not engineering. It is the cost-allocation surface around the loop, which fails in three different ways once a contractual boundary appears.

The first failure is cultural. Most organisations are structured to hide failure. Performance reviews penalise public failures; procurement processes select for vendors who promise success. The SpaceX move — publishing the failure reel — is culturally radical precisely because it inverts the signal-of-competence in most workplaces. An organisation whose leadership has not normalised public failure cannot adopt the iteration loop without first changing its political surface.

The second failure is contractual. The dominant engagement shapes — fixed-fee project, time-and-materials advisory, subscription — were designed before iteration was cheap. Fixed-fee assumes the artefact is specified before construction; iteration becomes change-orders. Time-and-materials pays for input hours; iteration is unbillable because nobody owns the failure cycle. Subscription decouples revenue from delivery; iteration cycles are invisible to the buyer. The loop survives only inside organisational boundaries where the contractual surface is internal.

The third failure is risk-allocation. Iteration produces failures, and failures cost something. Under fixed-fee or time-and-materials the customer pays for the failures and reasonably refuses; under subscription the vendor absorbs them but the customer cannot tell whether iteration is happening because the contractual surface is opaque. Either way, the party with the most direct knowledge of whether iteration is producing value — the customer — is structurally separated from the party deciding whether to iterate. The loop runs slower than it should, or doesn't run at all.

The pattern that survives in the absence of the loop is well-known. The vendor iterates on the artefact. The customer reacts. Reaction is verbal, not modificatory (“this terminology is wrong”; “take this line out, our process can't honour it”), because the customer cannot modify what the vendor owns. The vendor receives the verbal feedback, interprets it against their own model of the customer's reality, makes a change, ships again. Two decision cadences instead of one. Every cycle pays a translation tax.

There is a worse failure mode underneath the cycle-time one. Some kinds of feedback are not verbalisable. A customer who would have moved a paragraph two lines up does not, in general, ask for that. A customer who would have changed a single word — valuation to report, engagement to contract — frequently cannot articulate which word is wrong, because the wrongness is felt at the editing-of-the-sentence layer, not at the rephrase-to-the-vendor layer. When the loop cannot cross the contract, the feedback that survives the boundary is the feedback the customer can articulate. The rest is feedback the artefact silently fails to receive.

What changed

Malleable software is software the customer can modify without writing code. The customer modifies it by directing an AI agent on their end — describing what should change, against the artefact's current state, with the agent translating intent into the modification. The customer is not a programmer in this loop. The customer is an editor, a domain expert, a person who knows what the right workflow shape feels like. The agent supplies leverage on the editorial taste.

When this works, the iterator is the customer. The vendor's role moves up a layer: build the substrate the customer iterates on. That means the deployable surface where the customer's content and configuration live, separated from the immutable runtime and audit core; the agent runtime on the customer's side; the source-of-truth documents the agent reads; the deploy workflow that bridges the edits and the live system. The vendor's craft is the substrate's malleability and the agent's competence inside it.

The two cadences collapse to one. The translation tax goes to zero, because the customer's modification is the specification of what was missing. The gap was specified in the act of filling it.

Editorial taste at the workflow-shape layer

There is a kind of taste this configuration depends on, and the practitioner literature does not have a vocabulary for it. The customer is not bringing programming skill to the loop. They are bringing something else, and naming it precisely matters because the configuration only works when this thing is present.

The valuer in our engagement knows what a chattel valuation report should read like in a way the agency cannot. He knows the cadence of the section headings; he knows which fields the insurer expects in which order; he knows that replacement value and insured value are different numbers in his practice and conflating them is a small but real error a lay reader would not catch. When he authored the skill that produces the practice's invoices, end-to-end via the agent, the result specified a Xero-style NZ tax invoice whose columns and totals lines match the IRD's expectations. That was not because he wrote the implementation, but because he supplied the shape. The agent supplied the leverage; the shape was his.

This is editorial taste at the workflow-shape layer. It is not engineering judgement. It is closer to the taste a senior copy-editor brings to a manuscript — knowing what reads correctly, which moves are tradition and which are weight-bearing, where the reader's eye is going next. Applied to software, the taste produces SKILL.md files that fit the practice because the practitioner is the author. The agent cannot supply the taste. The agent can supply everything else.

The configuration only works when the customer brings this taste. It is the load-bearing precondition. A customer who does not bring it produces, via the agent, a workflow whose shape fits nobody. The shape needs an author.

Three signals in one observation

When the customer modifies the workflow themselves, the single observation reveals three things product research usually treats as separate streams.

First: what they need that doesn't yet exist. The modification is the specification of the gap. There is no requirements document to write, no user-research session to schedule, no synthesis pass to interpret what the customer meant. The diff is the spec. The valuer wanted the report to carry per-line retail-evidence anchors; he added the W| line type to the workpaper contract spec, extended the validators to enforce it, and used twenty-six of them on the live job. The spec is in the contract file and the validator extensions, dated the moment he decided the report needed to show its working.

Second: the cost the customer was willing to bear to change it. Revealed willingness, not stated. The customer iterated, therefore the modification was worth the customer's effort against everything else they could have spent it on. Stated willingness is unreliable because the cost of stating is low. Revealed willingness is reliable because the cost of acting was real. The vendor doesn't have to ask whether the customer cares; the customer's calendar already answered.

Third: the direction the workflow should evolve. Modifications across customers, or across time for one customer, reveal the trajectory in a way no roadmap exercise can. The cluster of customer-authored skills that emerge in the first six weeks tells the vendor where the workflow's actual centre of gravity is — which often turns out to be a place no one would have predicted from the engagement letter. Roadmap discovery becomes archaeology of commits.
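As a rough illustration of what "archaeology of commits" could look like in practice, the sketch below counts which top-level areas of a repository the customer's own commits touch. It assumes the commit history has already been exported as (author, changed paths) pairs; the author labels and directory names are hypothetical, not the engagement's actual layout.

```python
from collections import Counter


def centre_of_gravity(commits, customer):
    """Rank top-level areas by how many of the customer's commits touch them."""
    counts = Counter()
    for author, paths in commits:
        if author != customer:
            continue  # vendor-side commits are a different loop
        # Count each area once per commit, however many files changed in it.
        areas = {path.split("/", 1)[0] for path in paths}
        counts.update(areas)
    return counts.most_common()


# Hypothetical history: (author, paths changed in that commit).
history = [
    ("valuer", ["contracts/spec.md", "validators/check.py"]),
    ("valuer", ["contracts/spec.md"]),
    ("agency", ["infra/deploy.yml"]),
    ("valuer", ["contracts/job.txt", "renderer/report.tpl"]),
]

ranking = centre_of_gravity(history, customer="valuer")
print(ranking[0])
```

Counting once per commit rather than once per file is a design choice: it weights how often the customer returned to an area, not how large any single change was.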

Three traditional research questions — specification, willingness-to-pay, roadmap — collapse into one observable event. The strongest signal of what needs changing is the customer actually making the change.

How the loop runs at the customer's cadence

The shape of the engagement, in the operator's words on the day this thinking ripened: “very much an iterative delivery from a spiky 1 hour to working demo proving the core workflow followed by a few weeks of infra / hosting build out plus iteratively working on the productised version of that demo with the client.” The first hour produced a working end-to-end demonstration on a single real case, fast enough to feel real. The weeks that followed scaffolded the infrastructure and built the productised version. That was vendor-side iteration. What happened next is the subject.

From mid-April 2026 onward, the customer-side repository started accruing commits the customer authored himself via the agent on his end. The most load-bearing pattern is iteration on the deliverable's product shape — the chattel valuation report itself. The W| source-note schema that opens this essay is one example: a new line type defined in the workpaper contract spec, the contract-validator extended to enforce it, the report-renderer updated to inject source-note rows beneath each line item, and 26 source-notes used the same day on the live job. The retail-price evidence those notes carry — Bosch via Noel Leeming, Fisher & Paykel via the manufacturer's direct retail catalogue, Fujitsu via Kiwi Heat Pumps, Kwila decks against construction-cost rates — is the valuer's domain knowledge made auditable by the document.

None of this requires programming literacy. The workflow's load-bearing surfaces are plain text on every axis. Each step the workflow runs is a SKILL.md file — markdown prose with a small frontmatter block declaring the step's inputs and triggers. The structured records that flow between steps are pipe-delimited contracts the customer reads line by line: V| for valuation lines, G| for grouping, W| for evidence anchors. The customer-facing report renders from those contracts via templates the customer can edit by reading. Nothing is compiled. A practitioner who can read a contract and describe what is missing can describe it to the agent and have the agent edit the spec, the validators, and the renderer in one session. The agent supplies the engineering leverage. The practitioner supplies the shape. Programming is not the precondition. Domain literacy is.
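To make the plain-text claim concrete, here is a minimal sketch of reading and checking a pipe-delimited contract. Only the V|, G| and W| tags and the seven-field count for W| come from the essay. The field contents, the sample lines, and the rule that every V| line must be followed directly by a W| line are assumptions made for illustration; the real validator's rules are not shown here.

```python
from dataclasses import dataclass

W_FIELD_COUNT = 7  # the essay gives the count; the field names are not specified


@dataclass
class Record:
    line_no: int
    tag: str
    fields: list[str]


def parse_contract(text: str) -> list[Record]:
    """Split each non-blank line into a tag and its pipe-delimited fields."""
    records = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        tag, *fields = line.split("|")
        records.append(Record(line_no, tag, [f.strip() for f in fields]))
    return records


def validate(records: list[Record]) -> list[str]:
    """Return human-readable problems; an empty list means the contract passes."""
    errors = []
    for i, rec in enumerate(records):
        if rec.tag == "W" and len(rec.fields) != W_FIELD_COUNT:
            errors.append(
                f"line {rec.line_no}: W| needs {W_FIELD_COUNT} fields, "
                f"got {len(rec.fields)}"
            )
        if rec.tag == "V":
            # Assumed rule: a valuation line is followed by its source-note.
            nxt = records[i + 1] if i + 1 < len(records) else None
            if nxt is None or nxt.tag != "W":
                errors.append(f"line {rec.line_no}: V| line has no W| source-note")
    return errors


# Placeholder contract; every value below is hypothetical.
SAMPLE = "\n".join([
    "G|Kitchen",
    "V|built-in oven|1",
    "W|f1|f2|f3|f4|f5|f6|f7",
    "V|towel rail|2",
])

errors = validate(parse_contract(SAMPLE))
print(errors)
```

The error strings are written to be read by the practitioner rather than by a programmer, which fits a substrate where the person fixing the contract is the person who wrote it. Note that in this sketch a trailing pipe counts as an extra empty field.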

End-to-end customer iteration on this substrate has been observed continuously since mid-April 2026: a weekly cadence of customer-authored commits on the report contract, the validators, the depreciation-rules workpaper, and the per-job domain calls. The cadence runs at the customer's pace; the work is in his domain; the substrate moulds to the preferences the work reveals; the agency sees it in the next pull. The configuration runs.

Why this only works when the agency runs the same loops

A reader paying attention to the asymmetry will be wondering why the customer should trust a substrate the agency built but the customer depends on. The structural answer is dogfooding, and it is load-bearing.

The loops the agency ships are the loops the agency runs internally. The agent runtime on the customer's machine is the same one the agency uses on its own. The skill-authoring pattern the customer uses is the pattern the agency uses to maintain its own workspace. The failure modes the customer hits are failure modes the agency hits first, on its own work, with consequences for its own delivery. An agency whose internal operations do not depend on the substrate ships customers a substrate it has not stress-tested. Dogfooding is the structural answer to the trust question, not a marketing claim.

This is what makes the configuration composable with the agency's pricing position, which we wrote down separately in The wrong unit of measurement. The iteration claim and the pricing claim compose into a three-leg position: the agency dogfoods the substrate it ships, the pricing is pegged to the surplus the customer's business produces, and the artefact is iterated by the customer. Drop any one leg and the structure is unanchored: dogfooding without outcome-pricing pays the agency the same whether the loop works; outcome-pricing without dogfooding ships customers a substrate the agency hasn't stress-tested; malleable software without either is a deliverable nobody is on the hook for.

What the configuration doesn't lift

The vendor still owns the substrate, the runtime, and the observability layer. When the substrate breaks, the customer cannot fix it via the agent. The audit core in the practice's workflow is a worked example: the customer can edit prose freely, but the audit blob is immutable because external actors (regulators, future disputes) require stability there. The malleable surface lives around the immutable core, not over it. Regulated artefacts with audit and compliance constraints have the same shape: a workflow whose entire surface is regulated has no malleable surface to ship.

There are now two iteration loops. The customer iterates on the artefact. The vendor iterates on the substrate that enables the iteration — deploy workflows, agent runtime, source-of-truth doc patterns, audit boundary. The vendor's loop hasn't disappeared; it has moved up a layer.

Not all customers are iterators, and this is the configuration's most operator-relevant failure mode. A customer who arrives without editorial taste at the workflow-shape layer produces, via the agent, a workflow whose shape fits nobody. The early signal is unmistakable in the first two weeks: the customer-side repository is silent, the agent is on the customer's machine, the substrate is in place, and the customer is not making changes. Every meeting returns to the agency for direction. The configuration has no native answer beyond honest acknowledgement: this customer wanted a deliverable they didn't have to think about, and the configuration is a poor fit. The remedy is recognising it early and not extending on a value-share basis.

Iteration without bottleneck-identification is local optimisation in disguise, and Goldratt's Herbie warning lives inside the engagement. If the customer iterates heavily on terminology and not on workflow shape, the customer's attention is on the wrong layer — and possibly the workflow shape is wrong and terminology is the only surface they are touching. A malleable substrate that makes the wrong work easier to do faster is the failure mode under a fresh coat of paint.

Customer iteration also produces customer-side technical debt. When the customer edits via the agent and the vendor doesn't see the edits in advance, the substrate accumulates decisions the vendor didn't vet. The audit and source-of-truth surface needs careful design so the substrate does not silently drift past its observability boundary. The standard answer — automated checks against customer-iterated branches — is operationally underdeveloped at the time of writing.
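One small piece of such a check can be sketched under heavy assumptions: given the list of paths a customer-side change touched, flag any that fall inside the immutable core. The directory prefixes below are hypothetical stand-ins for wherever the audit blob and runtime actually live, and a real check would need far more than path matching.

```python
# Hypothetical locations of the immutable core; not the engagement's real layout.
IMMUTABLE_PREFIXES = ("audit/", "runtime/")


def boundary_violations(changed_paths, immutable_prefixes=IMMUTABLE_PREFIXES):
    """Return the changed paths that sit inside the immutable core, sorted."""
    prefixes = tuple(immutable_prefixes)
    return sorted(path for path in changed_paths if path.startswith(prefixes))


# Hypothetical set of paths touched on a customer-iterated branch.
changed = [
    "contracts/spec.md",
    "audit/blob.bin",
    "renderer/report.tpl",
]

violations = boundary_violations(changed)
print(violations)
```

A check like this only guards the boundary; it says nothing about whether edits on the malleable side are sound, which is the part the essay describes as underdeveloped.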

The cross-customer surface, deliberately open

There is a question this essay does not settle, and it becomes a real question at three customers, not at one. When more than one customer is iterating on their own substrate, the agency's meta-loop sees patterns no individual customer can. A skill the valuer authored to handle an edge case might be the same shape as something a future research-workflow customer would benefit from; a deploy-workflow refinement one customer wires up might generalise. The mechanism for surfacing class-wide patterns and feeding them back into shared substrate components — without violating the per-customer audit boundary, without flattening customer-specific shapes into a generic that fits no one — is unspecified at one customer.

What gets tested at customer #2, specifically: whether the onboarding cadence holds when the substrate is the first deliverable rather than the result of iterative discovery, and whether editorial taste at the workflow-shape layer is portable across domains. What gets tested at customer #3: whether cross-customer learning can surface generalisable substrate components without flattening either customer's shape. The answer will be empirical, not theoretical, and we have not run the experiment yet.

The contractual surface the loop always wanted to cross

The literature was right about the loop and silent about the iterator, and the silence was load-bearing. Iterative delivery's classic shape — vendor iterates, customer reacts — was a constraint of pre-AI tooling, not a law of iteration. The constraint lifts when the vendor builds the substrate, the agency runs the same loops it ships, and the customer brings editorial taste at the workflow-shape layer. The contractual surface the loop always wanted to cross has stopped being a wall.

We have observed this on one engagement, continuously since mid-April 2026. That is what we have. The next convergence point is the second customer.