Bonus 1: How to Hire Judgment When Anyone Can Ship the Artifact

You are hiring a senior practitioner.

Their portfolio is impressive. Three case studies. Two thought-leadership pieces. A framework that looks well-developed.

References say they’re sharp. You’re 60% sure.

Eighteen months ago, you’d have been 90% sure with that data. Now you’re at 60% and you can’t quite say why.

Here’s why.

The artifacts in their portfolio could have been produced by judgment they actually built. Or by AI output they edited. From the outside, you cannot tell which.

Neither can their references. Most references don’t see the work being made. Only the work being delivered.

The hiring problem in this market is not finding people with strong portfolios. AI-augmented portfolios are everywhere. The problem is telling whose portfolio reflects real judgment.

Here is how to actually do that.

There Are Three Types of Senior Candidates Walking Into Your Process

There are candidates whose portfolio is downstream of earned judgment.

There are candidates whose portfolio is downstream of borrowed scaffolding.

And there are candidates who don’t know which they are.

All three look identical on paper.

All three sound thoughtful in the first interview.

Only the first one will perform when the high-stakes situation hits in month nine.

The question is whether your process can tell them apart before you sign.

OPERATOR FILE #24 (Hiring)

Expert hiring managers test for what’s underneath the portfolio.

Average hiring managers evaluate the portfolio.

Commodity hiring managers trust the references.

Here is the mechanism, then the practice.

When someone uses AI extensively in their work over a year or two, two things diverge.

Their output.

And their underlying capability.

The output stays high. The capability quietly atrophies in places, particularly in the judgment-formation pathway: the cognitive activity that builds discrimination from doing, failing, and recovering.

Call this the phantom expert.

They produce real outputs. They are not faking. The problem is that the outputs are downstream of borrowed scaffolding rather than earned discrimination.

The difference does not surface until a high-stakes situation requires the discrimination directly.

A hiring process that evaluates portfolios is evaluating outputs.

You will hire phantom experts at scale if your process doesn’t probe for what’s underneath.

Three protocols change that.

Protocol #1 — The Live Diagnostic

In the second-stage interview, present a real, specific situation from your business.

Not a case study you’ve sanitized. A genuine, ambiguous, contextual problem that has no obvious right answer.

Say:

“Here’s the situation. Walk me through how you’d diagnose what’s actually going on, before you’d recommend any solution.”

No laptop. No notes from them. No prep time beyond the room.

Watch for two things.

Signal #1 — Specificity.

Real judgment surfaces specific objections. Specific variables. Specific failure modes.

Borrowed fluency produces generic frameworks: “I’d want to understand the stakeholders, gather data, identify the root cause…”

A phantom expert can sound thoughtful for ten minutes without making a single specific call.

Signal #2 — Bounded uncertainty.

Real judgment knows where it ends.

The candidate should naturally say things like: “I’d want to know X before I’d commit to a view on Y.” Or: “I have a strong opinion on this but a much weaker one on that.”

Phantom experts have flat confidence across the entire territory. Equally certain about everything. Which means nothing is calibrated.

If you get specificity and bounded uncertainty, you have a real candidate.

If you get fluency and flat confidence, you have a phantom expert.

Hire the first.

Don’t hire the second.

Protocol #2 — The Portfolio Reverse-Engineer

Pick one piece from their portfolio. Tell them:

“Walk me through how you’d produce this from scratch today, without using AI. What would your week look like?”

Watch their face for the first three seconds.

A practitioner who actually produced the work this way will describe a process. They will mention specific decisions. Specific moments where they considered alternatives. Specific failures along the way that shaped the final form.

The process will have texture.

A practitioner whose work was AI-mediated will produce a recipe. They will describe the steps in clean, generic terms (“I’d start with research, then draft an outline, then iterate”) without the texture of actual judgment-formation.

This is not about catching AI use.

AI use is fine.

This is about telling whether the candidate has the capability that would have produced this output if AI hadn’t existed.

If they can’t demonstrate that capability under your gaze, they probably don’t have it.

Protocol #3 — The Failure Reference

When you call references, do not ask about successes.

Successes are confounded by AI.

Ask:

“Describe a time this person was specifically wrong about something, and what they did with the wrongness.”

References will pause at this question. Most are not prepared for it. The pause is informative.

If the reference can produce a specific instance — “They were wrong about the pricing call on the Reynolds account, and what they did was…” — the candidate has been wrong well. Their judgment formation is alive. They have a wrongness log running, even if they don’t call it that.

If the reference produces a generic answer — “Oh, they’re always learning, very humble” — the candidate either hasn’t been wrong meaningfully (which is its own warning sign for senior roles) or hasn’t built the metacognition that turns wrongness into judgment.

The first signal says hire.

The second says don’t.

What These Protocols Won’t Catch

These protocols are designed to detect phantom expertise in existing domains, where the candidate’s portfolio claims judgment.

They will not reliably detect a candidate who’s expanding into new domains and hasn’t yet built judgment there. For new-domain capability, the test is different — see the acquisition discipline in Article 3.

For senior roles, the existing-domain test is usually the load-bearing one.

You’re hiring for established judgment. Not potential.

The Cost of Getting This Wrong

The cost is high. And slow.

A phantom expert hired into a senior role will perform indistinguishably from a real expert for six to twelve months.

Their outputs will be acceptable. Their meetings will sound thoughtful.

The cost surfaces when a high-stakes situation requires real discrimination. The strategic call. The hard client conversation. The technical decision with second-order effects. The phantom expert produces a confident answer built on scaffolding borrowed from somewhere else.

By then, you’ve made decisions on their advice.

Some of those decisions will be irreversible.

The three protocols above add maybe ninety minutes to your hiring process.

They will tell you, with reasonably high reliability, whether the person you’re about to hire actually has what their portfolio implies.

The Operational Close

Before your next senior hire, write down the three signals that would tell you their expertise is real, not phantom.

If you can’t write them down, you don’t have them yet.

That is the work.

The cost of a bad senior hire in this market is far higher than it was three years ago.

The floor on portfolio quality has come up.

The floor on judgment hasn’t.

Hire the difference. Not the artifacts.