Ambient Intelligence Culmination

AI Mastermind | Knowledge Entrepreneurs Edition

“I’m gonna use the cheapest model available, but I’m gonna use the smartest model available to prompt that cheap model so that it performs as well as it can.” — Lou

This Week in 30 Seconds

The ambient-intelligence culmination — months of scattered pieces (context management, token use, model/effort optimization, skill sharing) snapped together this week into one architecture. Lou walked through it as a 25-minute narrated deck, then unpacked the load-bearing ideas live.
Fork vs Spawn — both isolate context and return only a result; the difference is whether the child inherits what the parent decided. Fork for continuity, spawn for an uncontaminated read.
Model altitude — stop asking “which model for this article?” Ask it per step: research, angle, draft, copy-edit, and fact-check each need different intelligence. Record a rationale so routing is auditable.
Prompt the cheap model with the smart model — commit to Haiku, then have Opus write the prompt that makes Haiku perform like Opus. Pay once for the reasoning; reuse it. Reported 20–75% lifts, especially paired with DSPy.
Plugins beat manifests and symlinks — Lou’s homegrown manifest and symlink approaches don’t integrate with native / invocation. A marketplace plugin repo is the supported path — version-pinnable, and the clean way to share skills with clients.
Capability-Architect — a compiler, not a “make me an agent” prompt: intake → DAG → classify → bundle → execute → route → contract → generate → eval → install.
The deck itself was the demo — generated from three days of chat exports, scripted and slided by Claude, voiced by ElevenLabs. The forked writing team produced a comprehensive article while keeping ~350K tokens of intermediate work out of the main context.
Token economics — Joanna’s Uber-pricing question opened the subsidy conversation: Lou ran 43M tokens last month on a $20 plan — roughly $8,000 of API compute, still subsidized.
The closing turned heavy — Joanna raised the AI-displacement fear; Lou gave an unguarded read on billionaire concentration of power, uncompensated training data, and where small operators stand. The group landed on community and leverage as the answer.

1 — Gears + Hub: The À La Carte Offer

Lou re-opened the Gears/Hub source-code offer after last call’s interest didn’t convert. Rather than the single $2,500 bundle, he broke it into parts so members could buy exactly what they want: both source-code packages (Gears + Hub) for $695 (“I want the source code out in the world”), a $1,000 commercial/derivative license, and six months of coaching at $2,500 — or the full bundle at $2,500 (two payments of $1,300 available). The page: coachlou.com/fix-that-roof.

The honest frame: the prices assume hundreds of buyers, not a handful, but Lou has no overhead and a literal $15,000 roof to cover. He’d like a cohort of 4–5 for the coaching track and would start next week if interest holds.

2 — The Ambient-Intelligence Culmination

This was the spine of the session. For weeks Lou has been working separate problems — context management, token cost, model/effort optimization, sharing skills across projects without duplication. This week they fused: “each one of them helped figure the other thing out.”

The originating pain: many projects, all needing the same skills. He didn’t want skills duplicated into every project, didn’t want all 50–60 skills’ front matter loaded into global context (he measured ~45–50K tokens of MCP-tools-plus-skills overhead paid on every query, even “hello”), and didn’t want to hand-maintain local copies. The destination is an architecture where intelligence lives in the environment and a project inherits only what it declares.

He presented it as a narrated 25-minute deck, “from chat to system,” built for three audiences: members new to the language, builders who care about where context lives, and operators asking “where in my business am I still manually carrying context the environment could carry for me?”

3 — Fork vs Spawn: Should the Child Inherit What the Parent Knows?

The session’s sharpest distinction. Fork and spawn both run work in an isolated context and return only the result — neither pollutes the main conversation. The difference is the starting state.

A fork inherits the parent’s context — useful when the child should know what’s already been decided. A spawn starts cold — useful when inherited context would contaminate the work (an adversarial reviewer shouldn’t inherit the parent’s confidence in the current direction). The whole decision reduces to one question: would the child do better seeing what the parent decided? Yes → fork. No → spawn.
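The fork/spawn distinction can be sketched in a few lines. This is a minimal illustration, not Lou's implementation: the Context class, the delegate function, and the worker callable are all hypothetical names, and a "context" is modeled as nothing more than a list of messages.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """A conversation context, modeled as an ordered list of messages."""
    messages: list = field(default_factory=list)


def fork(parent: Context) -> Context:
    """Child starts with a copy of everything the parent has decided."""
    return Context(messages=list(parent.messages))


def spawn() -> Context:
    """Child starts cold, with no inherited context."""
    return Context()


def delegate(parent: Context, task: str, inherit: bool, worker) -> str:
    """Run a task in an isolated child context and return only its result.

    `worker` is a hypothetical callable (context, task) -> result string.
    Whatever the worker adds to the child context stays in the child.
    """
    child = fork(parent) if inherit else spawn()
    child.messages.append(task)
    result = worker(child, task)
    parent.messages.append(result)  # only the result reaches the parent
    return result
```

In both modes the parent gains exactly one message, the result. The only thing the inherit flag changes is what the child can see when it starts.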

💡 What This Means for You

Before delegating any step, ask out loud: “Would this step do better knowing what I’ve already decided, or would that bias it?” Then tell the child to return only its conclusion or the artifact path — nothing else.

4 — Model Altitude: Route by Step, Not by Artifact

Lou’s correction to his own starting question. “Which model should I use?” is asked at the wrong altitude. The final artifact is one thing, but the process that builds it is many kinds of work (research, angle selection, drafting, copy-editing, fact-checking), each with different needs. Four questions decide each step: Does it need inference at all? What’s the consequence if it’s wrong? Does it need grounding? Will a cheaper model retry so much that it gets expensive?

The router emits a small record per step: component, step-class, model, effort, and a rationale. The rationale is what makes a bad output debuggable. The routing logic lives once in a shared model-effort-routing.md so every workflow reuses it. “This is modularity applied to judgment.” The standing rule: assign the least inference that still clears the bar.
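A per-step routing record might look like the sketch below. The five field names come from the session; everything else is an assumption made for illustration, including the scoring rules, the three-tier ladder, and the tier labels (placeholders, not recommendations). The real logic lives in Lou's model-effort-routing.md, which is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteRecord:
    component: str
    step_class: str
    model: str
    effort: str
    rationale: str


# Hypothetical ladder, cheapest first. Labels are placeholders.
TIERS = ("haiku", "sonnet", "opus")


def route_step(component, step_class, *, needs_inference, high_consequence,
               needs_grounding, retry_prone):
    """Answer the four questions, then pick the cheapest tier that fits."""
    if not needs_inference:
        return RouteRecord(component, step_class, "none", "none",
                           "deterministic step; no model call needed")
    tier = 0
    reasons = []
    if high_consequence:
        tier += 1
        reasons.append("errors are costly")
    if retry_prone:
        tier += 1
        reasons.append("a cheaper model would retry its way past the savings")
    if needs_grounding:
        reasons.append("must be grounded in sources")
    effort = "high" if (high_consequence or needs_grounding) else "low"
    rationale = "; ".join(reasons) or "low stakes; cheapest tier is enough"
    return RouteRecord(component, step_class, TIERS[tier], effort, rationale)
```

Because every record carries its rationale, a bad output can be traced back to the answer that routed it, which is the debuggability point made above.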

💡 What This Means for You

Don’t pick one model for the whole job. Score each step on consequence and grounding, then assign the cheapest model + effort that reliably clears the bar — and write down why.

5 — Capability-Architect: A Compiler for Capabilities

Once a folder can inherit and activate capabilities, how do you create them repeatably? Lou’s answer is a compiler, not a “make me an agent” prompt. Capability-Architect takes a workflow, problem, or existing pipeline and walks a fixed compile path: intake → DAG → classification → bundling → execution → routing → contract → generation → evaluation → install.
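The "compiler, not a prompt" idea comes down to a fixed stage order that every capability passes through. The sketch below shows only that shape. The stage names are taken from the compile path above; compile_capability, the state dictionary, and the per-stage callables are hypothetical and stand in for whatever each stage really does.

```python
COMPILE_PATH = (
    "intake", "DAG", "classification", "bundling", "execution",
    "routing", "contract", "generation", "evaluation", "install",
)


def compile_capability(workflow: str, stages: dict) -> dict:
    """Run every stage in the fixed order, threading one state dict through.

    `stages` maps each stage name to a callable(state) -> state.
    A missing stage fails up front instead of being silently skipped.
    """
    missing = [name for name in COMPILE_PATH if name not in stages]
    if missing:
        raise ValueError(f"missing stages: {missing}")
    state = {"workflow": workflow}
    trace = []
    for name in COMPILE_PATH:
        state = stages[name](state)
        trace.append(name)
    return {**state, "trace": trace}
```

Keeping the order in one constant is the design choice that matters here: a caller can swap what a stage does, but not skip or reorder it.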

The output isn’t a prompt — it’s an inheritable skill bundle that drops into the ambient library. And because judgment is modular (routing lives in a shared reference, not inside the architect), a writing orchestrator, a course-design workflow, and a client-delivery agent all reuse the same routing process.

6 — Plugins: How Skills Actually Get Shared

Lou narrated the dead ends so members don’t repeat them. A manifest works but doesn’t surface under native slash-command (/) invocation. Symlinks fail because Claude doesn’t reliably follow them. What works is plugins from a marketplace: a version-controlled Git repo with a marketplace manifest. A project declares which plugins to install; a tightly-scoped plugin loads only front matter, keeping context cheap.

Two bonuses: version-pinning and clean client distribution (hand them a marketplace link). Open thread: making plugin skills appear in the / menu like native ones. Targeted for Monday.

Gotcha worth its own note: the skill-creator tool tends to fill the 1,024-character description limit. Twenty skills at ~1,000 characters each is ~20K characters of wasted context. When generating a skill, tell it to keep the description tight.
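A quick audit makes the description tax visible. This is a sketch under assumptions: the descriptions are passed in as a plain name-to-text mapping (how you extract them from your skill files is up to you), and the 200-character target is an arbitrary illustration, not a number from the session.

```python
MAX_DESCRIPTION_CHARS = 1024  # the limit cited above


def audit_descriptions(descriptions: dict, target_chars: int = 200):
    """Return (total characters, names over target, longest first)."""
    total = sum(len(text) for text in descriptions.values())
    over = sorted(
        (name for name, text in descriptions.items() if len(text) > target_chars),
        key=lambda name: len(descriptions[name]),
        reverse=True,
    )
    return total, over
```

Run it on your library and trim from the top of the list. Twenty descriptions at 1,000 characters each come back as a total of 20,000.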

💡 What This Means for You

Stop copying skills between projects and stop reaching for symlinks. One version-controlled repo, a marketplace manifest, plugins declared per project. And cap your skill descriptions — the front matter is a context tax you pay on every query.

7 — The Deck Was the Demo: Forked Writing Team in Action

The 25-minute presentation was itself an artifact of the architecture. Lou fed Claude three days of chat exports, said “pull out the main ideas, fill in how they work and what problem they solve, make it a cohesive presentation,” and got a second-draft script. ElevenLabs voiced it; Claude built the script and slides.

The forked writing team that produced a sample article showed the payoff concretely: an orchestrator with scan / architect / draft / review / polish stages, each forked, each returning only a summary plus an artifact path. Drafts moved by file path — never pasted into the parent — so 68K, 63K, 61K, 56K of intermediate work stayed out of the main conversation. Only the final draft entered context.
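The "summary plus artifact path" contract can be sketched as follows. This is an illustration only: run_stage, run_pipeline, and the produce callables are hypothetical stand-ins for the forked model calls, and the stage files are written to a temporary directory by default.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class StageResult:
    summary: str
    artifact_path: Path


def run_stage(name: str, produce, workdir: Path,
              previous: Optional[Path]) -> StageResult:
    """Run one stage, write its full output to disk, return summary and path."""
    source = previous.read_text() if previous else ""
    text = produce(source)  # stand-in for the forked model call
    path = workdir / f"{name}.md"
    path.write_text(text)
    return StageResult(f"{name}: {len(text)} chars written", path)


def run_pipeline(stages, workdir: Optional[Path] = None):
    """Chain stages by file path; the parent log holds summaries only."""
    workdir = workdir or Path(tempfile.mkdtemp())
    parent_log = []
    previous = None
    for name, produce in stages:
        result = run_stage(name, produce, workdir, previous)
        parent_log.append(result.summary)
        previous = result.artifact_path
    final = previous.read_text() if previous else ""
    return parent_log, final
```

The parent log grows by one short line per stage no matter how large each draft is. Only the last file is read back in, which mirrors "only the final draft entered context."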

8 — Prompt the Cheap Model With the Smart Model

An idea Lou read that morning and unpacked three times because the inversion is easy to miss. Instead of using a smart model for hard work and a cheap one for easy work, you commit to the cheap model (say Haiku at high effort) and hire the smart model for one job: write the prompt that lets the cheap model perform like the expensive one. Opus knows Haiku’s capabilities and limits intimately, so it bakes the reasoning, strategy, and “think here” cues into the instructions. Pay once for the intelligence; reuse it on every cheap inference after. Reported 20–75% gains.
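The "pay once, reuse" pattern is essentially a cache around the smart model. In this sketch both models are plain callables that take a prompt and return text; they are placeholders to be wired to whatever client you use, and no real API is shown. The class name and the meta-prompt wording are illustrative assumptions.

```python
from typing import Callable

ModelFn = Callable[[str], str]  # hypothetical: prompt in, text out

META_PROMPT = (
    "I'm going to run this task on {cheap_name}. Knowing that model's "
    "specific capabilities and limits, write a prompt that gets it to "
    "perform this task as well as you would.\n\nTask: {task}"
)


class PromptedCheapModel:
    """Ask the smart model for a prompt once per task, then reuse it."""

    def __init__(self, smart: ModelFn, cheap: ModelFn, cheap_name: str = "Haiku"):
        self.smart = smart
        self.cheap = cheap
        self.cheap_name = cheap_name
        self._prompts = {}  # task description -> prompt written by the smart model

    def prompt_for(self, task: str) -> str:
        if task not in self._prompts:  # the only smart-model call for this task
            self._prompts[task] = self.smart(
                META_PROMPT.format(cheap_name=self.cheap_name, task=task)
            )
        return self._prompts[task]

    def run(self, task: str, item: str) -> str:
        """Every run after the first costs one cheap-model call and nothing else."""
        return self.cheap(f"{self.prompt_for(task)}\n\nInput:\n{item}")
```

Whether the cheap model actually holds up is an empirical question for each task, so compare its output against your premium baseline before trusting the cached prompt.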

💡 What This Means for You

Pick a task you run on a premium model. Tell that model: “I’ll run this on Haiku — knowing its limits, write a prompt that gets it to perform as well as you would.” Test it against your old output. Where it holds, you’ve cut the per-run cost by an order of magnitude.

9 — Token Economics: Who’s Actually Paying

Joanna asked whether AI pricing follows the Uber playbook — lowball to capture the market, then raise prices once you have it. Lou: largely yes — give it away free to capture eyeballs, convert the rabid fans to $20, then build out to API and enterprise as needs grow. He ran ~43M tokens last month on a $20 plan; at API rates that’s roughly $8,000 of compute.

Lou’s own optimization: $20 on Claude + $20 on Codex, alternating between them for different parts of the process. He’s also figured out how to stack additional Claude subscriptions to get $40, then $60, of capacity. He’s been reaching for Haiku a lot lately for day-to-day tasks: “remarkable how efficient it is.”

Community Corner

Joanna J is publicly committing to systemizing her business after naming manual work as her five-year stagnation point.
Don Back is cloning his voice for explainer videos and shared a fully-worked Final Cut production SOP plus the chat-export extension fix.
Scott Delinger keeps surfacing the “simple, specific, non-programmatic” wins — this week, finding one slide across three dozen decks.
Donald Kihenja brought Pictory.ai and the session’s closing wisdom on double-edged tools.
Dirk Ohlmeier issued the unofficial group challenge from chat: “Produce a complete product within 1 million tokens.”
Kasimir connected the cheap-model technique to Nate Herk’s dynamic-downgrade approach.

Links Shared in Chat

Gears/Hub à la carte offer: https://coachlou.com/fix-that-roof
Pictory.ai (Donald — audio-to-video slide/b-roll sync): https://pictory.ai

Try This Before Next Session

Take one task you currently run on Opus (or your premium model) and try to move it down a tier. Hand the premium model this instead:

“I’m going to run this task on Haiku. Knowing that model’s specific capabilities and limits, write a prompt that gets it to perform this task as well as you would — include the strategy, when to think step-by-step, and anything it’s likely to get wrong without being told.”

Run Haiku with that prompt. Compare to your old output. Where it holds up, you’ve just made that task ~10× cheaper.

Next session: 2026-06-18