Output Quality Gate

AI Mastermind | Knowledge Entrepreneurs Edition

“Stop fixing the input. The model is not deterministic. Even great prompts produce slop on some runs. What if instead of monitoring that, we turn it into a number from 0 to 1?” — Lou

This Week in 30 Seconds

Lou’s AIA Quality Gate — the week’s centerpiece: an external evaluation agent built from 20–50 gold-standard examples, with an iterative improvement loop, that any skill can call before publishing. Integrates into the new harness.yaml library architecture.
Harness architecture — solving the token-tax problem: one central AIA-lib, one harness.yaml per project, only the capabilities you declare get loaded.
Gears software release — $2,500 perpetual license, source code + content hub tool (Astro + Cloudflare + Gears schema injection), updates through end of 2026.
Donald’s Codex computer use — GoHighLevel has no API for what he needed. Codex’s computer-use feature navigated the UI anyway. He now has a full inventory of his GHL instance; it’s running a second task as he’s talking.
Scott’s scraper upgrade — headless browser fallback added, cutting JavaScript failures from 20 to 4 out of 144 contacts and processing time from 17s to 9s per contact.
Joanna’s content strategy — moving from daily manual posting into the void to batching with AI, using client VOC + competitive research as inputs.
Kasimir’s avatar video pipeline — HeyGen API + script generation + auto-upload, working toward 20 serialized videos in a story arc.
Don Back’s negotiation rehearsal — loaded the Act, bylaws, meeting transcript, and everyone’s position; ran a full negotiating rehearsal with Claude; the real meeting “rolled out exactly as it was role-played.”
Closing reflection — “Attention is All You Need” as the E=mc² of AI; the gap between mathematics and intelligence is still the most miraculous thing in the room.

1 — The Output Quality Gate: Stop Fixing the Input

Lou opened with a full walkthrough of the quality gate system he’d been building. The starting reframe: we spend enormous effort optimizing prompts, model selection, and memory — all input-side improvements. But the model is non-deterministic. Even great prompts produce slop on some runs, and new model releases reshuffle the behavior you were relying on. The input can never fully solve an output problem.

The system Lou built starts from gold-standard examples: 20–50 pieces of the best content you’d want to produce — your own work, or anyone else’s you’d aspire to match. A command reads through these and derives a scoring rubric: not just grammar and structure, but substance — hook quality, angle, perspective, whether the piece earns its claim. The rubric encodes what makes these pieces worth publishing.

The gate runs downstream of any writing skill. It accepts content plus a content-type label (LinkedIn post, newsletter, thought-leader article), scores the output against the matching rubric on a scale from 0 to 1, and returns what failed. It never edits. If the output scores below threshold, the writing skill reruns. The gate gets harder to fool over time: every use and every correction sharpens its discrimination.
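To make the control flow concrete, here is a minimal sketch of the score-and-rerun loop. It is not Lou's implementation: the names (GateResult, run_with_gate), the 0.8 threshold, the three-attempt cap, and the idea of handing the failure list back to the writer are all assumptions for illustration. The writer and evaluator are plain callables standing in for the writing skill and the evaluation agent.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class GateResult:
    score: float                                        # 0.0 to 1.0
    failures: List[str] = field(default_factory=list)   # what fell short


# Hypothetical signatures: the writer receives the previous failures and
# returns a new draft; the evaluator scores a draft for a content type.
Writer = Callable[[List[str]], str]
Evaluator = Callable[[str, str], GateResult]


def run_with_gate(
    write: Writer,
    evaluate: Evaluator,
    content_type: str,
    threshold: float = 0.8,   # assumed value; the source does not give one
    max_attempts: int = 3,    # assumed cap so the loop always terminates
) -> Tuple[str, GateResult]:
    """Rerun the writer until a draft clears the threshold or attempts run out.

    The gate only scores and reports; it never edits the draft itself.
    Returns the best-scoring draft seen, with its result.
    """
    feedback: List[str] = []
    best_draft, best = "", GateResult(score=-1.0)
    for _ in range(max_attempts):
        draft = write(feedback)
        result = evaluate(draft, content_type)
        if result.score > best.score:
            best_draft, best = draft, result
        if result.score >= threshold:
            break
        feedback = result.failures  # tell the writer what failed
    return best_draft, best
```

Returning the best draft rather than the last one is a design choice in this sketch: if no attempt clears the threshold, the caller still gets the strongest candidate along with the list of what failed, and can decide whether to publish, escalate, or discard.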

Multiple rubrics for multiple content types. Each content type can have its own 20–50 examples and derived criteria. When the invoking skill passes the type, the evaluator uses the right rubric.
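One way to picture the per-type lookup is a registry keyed by content-type label. The sketch below is an assumption throughout: the source does not say how criteria are stored or how the single 0 to 1 number is produced, so the weighted average, the weights, and the names (Rubric, RUBRICS, rubric_for) are illustrative only. The criterion names echo the ones mentioned above (hook, angle, perspective, earning the claim).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Rubric:
    content_type: str
    weights: Dict[str, float]  # criterion name -> relative weight (illustrative)

    def combine(self, criterion_scores: Dict[str, float]) -> float:
        """Weighted average of per-criterion scores, each expected in [0, 1].

        A criterion missing from criterion_scores counts as 0.
        """
        total = sum(self.weights.values())
        weighted = sum(
            weight * criterion_scores.get(name, 0.0)
            for name, weight in self.weights.items()
        )
        return weighted / total


# Hypothetical registry: one rubric per content type, each of which would be
# derived from that type's own set of gold-standard examples.
RUBRICS: Dict[str, Rubric] = {
    "linkedin_post": Rubric(
        "linkedin_post", {"hook": 2.0, "angle": 1.0, "earns_claim": 1.0}
    ),
    "newsletter": Rubric(
        "newsletter", {"hook": 1.0, "perspective": 2.0, "earns_claim": 1.0}
    ),
}


def rubric_for(content_type: str) -> Rubric:
    """Look up the rubric for a label, failing loudly on an unknown type."""
    try:
        return RUBRICS[content_type]
    except KeyError:
        raise ValueError(f"no rubric registered for {content_type!r}") from None
```

Failing loudly on an unknown label, rather than falling back to a default rubric, keeps a mislabeled piece from being scored against the wrong standard.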

Lou tied this to the broader pipeline aspiration: focus on the conversations where ideas happen, automate everything else. The quality gate is the floor that makes automation safe to trust.

💡 What This Means for You

The rubric encodes your taste, not best-practice defaults. Collect 20 pieces of content you’d want to be known for. That collection is more instructive than any prompt about your voice.

2 — The Harness Architecture: One Library, Per-Project Declaration

Connected to the quality gate, Lou described the AIA-lib and harness.yaml pattern he’s been developing to solve the token-tax problem. The issue: as a skills library grows, naïvely embedding everything in a global CLAUDE.md costs 10–15K tokens on every inference — paid whether or not the project needs those capabilities.

The solution: one central library (AIA-lib, version-controlled on GitHub) holds all skills, commands, resources, and agents. Nothing loads automatically. Each project folder contains a harness.yaml that declares only the capabilities it needs. The global CLAUDE.md holds one thing: a pointer to where the library lives.

Lou’s mental model: a USB hub — one cable in, ten ports on top, plug in only what this task requires.

The eval loop agent itself lives in AIA-lib as an ambient folder with its own intelligence — an agent that any skill can inherit. The harness.yaml in a writing project says “I need brand-voice, I need the eval gate.” It doesn’t pay for the coding skills, the intake skills, or anything else.
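The declare-only-what-you-need idea can be sketched in a few lines. Everything specific here is hypothetical: the capability names beyond brand-voice and the eval gate, the token figures, and the function name load_declared are invented for illustration, and the real harness.yaml schema is not shown in the session notes. The declared list is written as a Python list, as if the project's harness.yaml had already been parsed.

```python
from typing import Dict, List

# Illustrative stand-in for the central library: capability name -> rough
# token cost of loading it. Names and numbers are invented for this sketch.
LIBRARY: Dict[str, int] = {
    "brand-voice": 1200,
    "eval-gate": 1800,
    "coding": 5000,
    "intake": 3000,
}


def load_declared(library: Dict[str, int], declared: List[str]) -> Dict[str, int]:
    """Return only the capabilities a project declared.

    Raises KeyError if the project declares something the library lacks,
    so a typo in the declaration fails loudly instead of silently loading
    nothing.
    """
    missing = [name for name in declared if name not in library]
    if missing:
        raise KeyError(f"declared but not in library: {missing}")
    return {name: library[name] for name in declared}


# What a writing project might declare (already parsed into a list).
writing_project = ["brand-voice", "eval-gate"]
loaded = load_declared(LIBRARY, writing_project)

# The project pays only for what it declared, not for the whole library.
print(sum(loaded.values()), "of", sum(LIBRARY.values()), "tokens loaded")
```

Adding a new skill to LIBRARY changes the second number but not the first, which is the point of the pattern: library growth does not raise the cost of projects that do not declare the new capability.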

💡 What This Means for You

Each project's token cost tracks what it declares, not how large your library grows. Every improvement to a shared skill in the library is immediately available to all projects that declare it.

Next session: 2026-06-11