Every Decision Was a Decision to Build Less

A meta-analysis of the reasoning pattern that ran through one refactor session — and how to turn it into a thinking skill you can run on purpose.

Ninety-six turns into a refactor session, I noticed a pattern. Every decision that had actually improved the system was a decision to remove work, not add it. Not “what should I build here” — “what can I stop building because something else already handles it.”

Six moves. Each one a question that removes work before code is written.

Move 1 — Ask What the Host Already Does

The first question before writing any validation or enforcement logic: does the host environment already do this?

In the refactor context, the harness already enforced agent boundaries, file permissions, and tool access scopes. Writing a second layer of boundary-checking inside the skill was not defense-in-depth — it was duplication that would drift out of sync with the host’s actual enforcement.

The move: declare governance in the contract and delegate enforcement to the host. Do not rebuild that enforcement inside the skill.

Concept: Governance declared in a contract is lighter than governance implemented in code, because the host’s enforcement applies universally without maintenance overhead from the skill.
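A minimal sketch of what "declared, not implemented" could look like, assuming a Python skill. The names SkillContract, JUDGE_CONTRACT, and host_permits_tool are hypothetical, and the last function only stands in for whatever enforcement the real host performs; the skill itself contains no checking logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillContract:
    """Declarative governance: what the skill asks for, not how it is enforced."""
    name: str
    allowed_tools: frozenset[str]
    writable_paths: tuple[str, ...]


# Hypothetical contract for a judge skill; the values are illustrative only.
JUDGE_CONTRACT = SkillContract(
    name="judge",
    allowed_tools=frozenset({"read_file"}),
    writable_paths=("reports/",),
)


def host_permits_tool(contract: SkillContract, tool: str) -> bool:
    """Stand-in for the host's own enforcement; the skill never calls this."""
    return tool in contract.allowed_tools
```

The contract is data. If the host changes how it enforces tool access, nothing in the skill needs to change.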

Move 2 — Look for the Orthogonal Axis

The rubric had genres and platforms. First instinct: build a matrix — one rubric file per genre-platform combination. Eight genres, six platforms, forty-eight files.

The question that killed the matrix: are these variables actually dependent, or am I treating them as dependent because they co-occur?

They were orthogonal. Genre and platform are independent axes. A how-to for LinkedIn and a how-to for X share all the genre structure and differ only in platform-specific checks. Composition turns the forty-eight-file maintenance problem into a fourteen-file one: eight genre files, six platform files, composed at scoring time.

A change to the how-to genre standard propagates to every platform automatically. A change to the LinkedIn platform standard propagates to every genre automatically. No file touches both axes.

Concept: When two variables are independent, a matrix is a trap. Composition turns an N×M maintenance problem into an N+M one.
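Here is one way composition at scoring time might look, sketched in Python. The check functions, the dictionary names, and the length limits are placeholders invented for illustration, not the rubric from the session.

```python
from typing import Callable

Check = Callable[[str], bool]

# One entry per genre (N) and one per platform (M); contents are illustrative.
GENRE_CHECKS: dict[str, list[Check]] = {
    "how-to": [lambda text: "step" in text.lower()],
    "listicle": [lambda text: text.lstrip().startswith("1.")],
}
PLATFORM_CHECKS: dict[str, list[Check]] = {
    "linkedin": [lambda text: len(text) <= 3000],  # placeholder limit
    "x": [lambda text: len(text) <= 280],          # placeholder limit
}


def compose_rubric(genre: str, platform: str) -> list[Check]:
    """Build the rubric for one genre-platform pair at scoring time."""
    return GENRE_CHECKS[genre] + PLATFORM_CHECKS[platform]


def passes(text: str, genre: str, platform: str) -> bool:
    """A draft passes when every composed check passes."""
    return all(check(text) for check in compose_rubric(genre, platform))
```

With eight genres and six platforms, the two dictionaries hold 8 + 6 = 14 entries, yet compose_rubric can still produce all 8 × 6 = 48 combinations on demand.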

Move 3 — Classify Each Step as Code or Inference

Before writing any evaluation logic, classify the step: is this deterministic or does it require judgment?

Deterministic: does the output match the expected format? Does the response contain a valid JSON structure? Is the word count within range? Write these as code. They cost almost nothing to run, give the same answer every time, and never need calibration.

Judgment-requiring: is this actually insightful? Would a reader save this? Does the argument hold together? Spend inference here. These are the checks where a model adds value that code cannot replicate.

The failure mode is spending inference on steps that code could handle for free — then compounding the error by letting code “handle” judgment calls it cannot actually make.

Concept: Write the deterministic part as code; spend inference only where judgment is required. Code is a one-time cost that can be deleted later; inference is a recurring cost paid on every run.
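The split can be sketched as follows, assuming Python. The two deterministic checks mirror the examples above; evaluate and its judge parameter are hypothetical, with judge standing in for any model-backed callable.

```python
import json
from typing import Callable


def is_valid_json(response: str) -> bool:
    """Deterministic: the string either parses as JSON or it does not."""
    try:
        json.loads(response)
    except json.JSONDecodeError:
        return False
    return True


def word_count_in_range(text: str, low: int, high: int) -> bool:
    """Deterministic: whitespace-separated word count within [low, high]."""
    return low <= len(text.split()) <= high


def evaluate(draft: str, judge: Callable[[str], str],
             low: int = 50, high: int = 300) -> str:
    """Run the cheap checks first; call the model only if they pass."""
    if not word_count_in_range(draft, low, high):
        return "rejected: word count out of range"
    return judge(draft)  # judgment call, delegated to the hypothetical judge
```

A draft that fails the word count never reaches the judge, so no inference is spent on a verdict that code can already give.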

Move 4 — Make Each Layer Earn Its Place

The orchestration had a harness, a router, a dispatcher, and a registry. The question: what would break if I removed each layer?

Removing the registry broke nothing: the harness could route directly to skills. The dispatcher was handling a use case that hadn’t occurred yet. The router was doing work the harness could do with two lines of config.

Start with the cheapest form. Add structure when real friction demands it, not in anticipation of friction that might never arrive. A layer that exists in case you need it later is a layer that costs maintenance budget every week against a speculative future need.

Concept: Add structure when it’s needed, not before. Start with the cheapest form and let real friction pull you up the ladder.
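As an illustration of the cheapest form, routing can start as a plain mapping. This is a sketch under assumptions: the skill functions and the ROUTES name are hypothetical, and the real harness config may look quite different.

```python
from typing import Callable


def judge_skill(draft: str) -> str:
    """Placeholder skill body, for illustration only."""
    return f"judged: {draft}"


def rewrite_skill(draft: str) -> str:
    """Placeholder skill body, for illustration only."""
    return f"rewritten: {draft}"


# The whole routing layer: a mapping from skill name to callable.
ROUTES: dict[str, Callable[[str], str]] = {
    "judge": judge_skill,
    "rewrite": rewrite_skill,
}


def run_skill(name: str, draft: str) -> str:
    """Dispatch directly; an unknown name raises KeyError."""
    return ROUTES[name](draft)
```

A router, dispatcher, or registry can replace the mapping later, once a real use case shows what the mapping cannot do.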

Move 5 — Test Each Piece the Moment You Build It

The pattern that caught three real bugs: test against a known-good and a known-bad immediately after writing each check.

The listicle scorer was passing a test case it should have failed — the regex was matching partial structures. The how-to regex was firing on content that was not a how-to. The sales-copy CTA pattern was missing a common CTA form.

All three bugs would have been silent without immediate testing: the check would have run, returned a number, and the number would have been wrong in a way that looked right. Three wrong verdicts before the first real draft went through.

Concept: Test against known-good and known-bad immediately. A check that produces a wrong verdict silently is worse than no check.
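A sketch of the habit, assuming a regex-based listicle check in Python. The pattern below is a hypothetical example, not the regex from the session; the point is the pair of assertions written right next to the check.

```python
import re

# A numbered item must begin at the start of a line: "1. text" or "1) text".
LISTICLE_ITEM = re.compile(r"^\s*\d+[.)]\s+\S", re.MULTILINE)


def looks_like_listicle(text: str, min_items: int = 3) -> bool:
    """True when the text has at least min_items numbered lines."""
    return len(LISTICLE_ITEM.findall(text)) >= min_items


# One case that must pass and one that must fail, checked immediately.
KNOWN_GOOD = "1. Cut scope\n2. Reuse the host\n3. Test early"
KNOWN_BAD = "In 2023. We shipped 1. thing and moved on."

assert looks_like_listicle(KNOWN_GOOD)
assert not looks_like_listicle(KNOWN_BAD)
```

The known-bad case contains numbers followed by periods in the middle of a sentence, which is the kind of partial structure an unanchored pattern would count as list items.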

Move 6 — Keep One Source of Truth and Flag the Seams

The rubric was being read by the judge skill and also copied into the harness config for display purposes. Two copies of the same standard. Within a week they drifted: the judge was scoring on four criteria, the config was displaying three.

One source of truth. Everything else references it. Where uncertainty exists — where a check is based on incomplete evidence, or a threshold was set by feel rather than data — mark it honestly in the file. A comment that says “threshold set at 0.7; based on 12 samples, revisit at 50” is a feature. Hidden uncertainty that looks like certainty is a liability.

Concept: Duplication lets two copies of a standard drift. One source of truth, with honestly-marked uncertainty, is more reliable than two clean-looking copies.
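One possible shape for a single source of truth, sketched in Python. The criterion names are invented for illustration (the session's four criteria are not listed here), and judge_criteria and display_labels are hypothetical consumers.

```python
# Single source of truth for the rubric; everything else derives from it.
# The four criterion names are hypothetical placeholders.
CRITERIA: tuple[str, ...] = ("clarity", "structure", "insight", "actionability")

# Threshold set at 0.7; based on 12 samples, revisit at 50.
PASS_THRESHOLD = 0.7


def judge_criteria() -> list[str]:
    """What the judge scores on: read from the source, never copied."""
    return list(CRITERIA)


def display_labels() -> list[str]:
    """What the config displays: derived from the same tuple."""
    return [name.capitalize() for name in CRITERIA]
```

Because the display labels are computed from CRITERIA, the judge and the display cannot disagree on how many criteria exist, and the comment on the threshold keeps its uncertainty visible.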

The Subtraction Pass

These six moves formalize into a pre-build reasoning pass — run before writing any new code or adding any new layer:

| Move                           | Question                                            | What it removes              |
| ------------------------------ | --------------------------------------------------- | ---------------------------- |
| Ask what the host does         | Does the environment already handle this?           | Redundant enforcement        |
| Look for the orthogonal axis   | Are these variables actually dependent?             | Matrix duplication           |
| Classify code vs. inference    | Is this deterministic or judgment?                  | Misallocated inference spend |
| Make each layer earn its place | What breaks if I remove this?                       | Speculative infrastructure   |
| Test each piece immediately    | Does this produce the right verdict on known cases? | Silent wrong verdicts        |
| Keep one source of truth       | How many copies of this standard exist?             | Drift between copies         |

The pass takes five minutes before a build session and removes an hour of cleanup after. Most of what gets caught is work that felt necessary at proposal time and turned out to be duplication, premature structure, or a test that was never going to run.

Key Takeaways

Every decision that improved the system was a decision to remove work, not add it
The host already enforces more than you think — declare governance, don’t implement it twice
Composition beats matrices when variables are orthogonal
Inference is a recurring cost; code is a one-time cost; classify before building
Add structure when friction demands it, not in anticipation of friction that might not arrive
Test each check against a known-good and a known-bad case the moment you write it
One source of truth, honestly marked, beats two clean-looking copies that will drift

How to Start

Run the subtraction pass on whatever you are about to build:

List what the host or framework already handles. Cross those off your build list.
Identify any grid or matrix in your design. Ask whether the axes are actually dependent.
Classify each planned check: can code answer this, or does it genuinely require judgment?
For each proposed layer: write one sentence on what breaks if you remove it. If you cannot, remove it.
After writing each check, test it against one thing you know should pass and one thing you know should fail before moving on.
Count how many places your standard lives. If more than one, consolidate. Mark any threshold that was set by feel.

The six questions together take less time than the first wrong decision costs to undo.