Coaching for the age of AI agents

AI will either sharpen your people or dull them.
One skill decides — and it can be trained.

It's called calibrated trust: knowing precisely what an AI agent handles reliably — and when it's time to switch your own brain back on.

Test yourself in 5 minutes

Both the diagnostic and the program are built on published research — Wharton School, NEJM AI.

What the research shows

Thinking: fast, slow — and artificial

Psychology long worked with two systems of thought: fast intuition and slow reasoning. Researchers at Wharton School now describe a third — AI, the system we increasingly hand our thinking to. There's a catch: unlike intuition and reasoning, it isn't in your head. And we are remarkably bad at telling when it's wrong.

Across three preregistered experiments (1,372 participants, 9,593 trials), people solved reasoning problems. Participants were randomly split: some worked without an assistant, others could consult one at any time — except roughly half of its advice was deliberately faulty. Here's how it went:

Share of correct answers (Study 1)

+25 pts−14 pts

With faulty advice, people did worse than with no AI at all. Values include every trial in the category — including those where people didn't consult the assistant.

Cognitive surrender

Researchers have a name for it: cognitive surrender. You adopt the AI's answer without checking it — even though you have everything you need to catch the error. When participants consulted the assistant, they went along with its faulty advice 80% of the time (Study 1). And the errors weren't nonsense: the AI delivered them confidently, with a fluent rationale — aimed precisely at the answer that already feels right.

Shaw, S. D., & Nave, G. (2026). Thinking—Fast, Slow, and Artificial: How AI Is Reshaping Human Reasoning and the Rise of Cognitive Surrender. Working paper, The Wharton School, University of Pennsylvania (preprint, SSRN No. 6097646; not yet peer-reviewed).

The catch

Why a prompting course isn't enough

The more participants leaned on the AI, the more their results mirrored its performance — errors included. Their confidence didn't suffer — quite the opposite: access to AI lifted it by nearly 12 percentage points, and it showed no meaningful decline even as faulty advice piled up. Tool knowledge alone won't save you either: in a randomized trial, physicians with twenty hours of AI-literacy training still scored 14 points worse on diagnostic reasoning when the AI was wrong.

Prompting courses teach what the tool can do. This is about metacognition: knowing when to trust your judgment and when to verify the output. You won't get that from a slide deck — it's built through practice with feedback.

Shaw & Nave 2026 · Qazi et al., NEJM AI 2026

The program

Calibration

Six weeks. One coach. Your real work.

One-on-one coaching and hands-on exercises built around each participant's actual workload — their agents, their code, their documents. No canned scenarios. We measure calibration at the start and again at the end; the difference is the result.

The themes that naturally surface when people work with AI — professional confidence, resilience to the pace of change — belong in the program too. We work on them inside concrete situations, though, not as standalone lectures.

Weeks 1–2

Diagnostics

We start by measuring, not guessing. Each participant takes a diagnostic built on the design of the Wharton experiment: how reliably they catch faulty AI output, and how far their confidence diverges from their actual accuracy. That gives us the baseline — the calibration gap the rest of the program works to close.

What you take away: a personal calibration profile — where you trust AI more than it deserves.

Weeks 3–4

Mapping the boundary

Coach and participant go through the participant’s own workload task by task and sort it into three zones: what to hand to AI freely, what to accept only with verification — and what to keep to yourself. Not by gut feel, but by where AI demonstrably fails in their specific work.

What you take away: a map of your own work — what to delegate, what to verify, what to keep.

Week 5

Verification protocols

Knowing where AI fails isn't enough — at working pace, habits decide. Participants set up their own verification routines with their coach: signals that mean slow down, questions to ask of an output, and the moments to switch from fast adoption to slow thinking.

What you take away: verification habits built right into your working day.

Week 6

Retest

The same diagnostic as at the start. We compare detection of faulty output, confidence calibration, and decisions about what to delegate. Both the participant and the sponsor get the results.

What you take away: a measurable shift, in black and white — before and after.

The outcome isn't an impression. It's a delta: the shift in error detection and confidence calibration, measured before and after.

Why Headstart

Why us

A methodology built on published research

The diagnostic draws on the design of the Wharton experiment, and the verification protocols on what demonstrably improved error detection in the studies: immediate feedback, real stakes, and forced slow-downs at key moments. Where we don't have research, we don't make claims.

Training on real work, not canned scenarios

Calibration forms where decisions happen: on your own agents, your own code, your own documents. A skill built on your actual workload doesn't need to transfer anywhere — you use it the very next day.

EMCC-accredited coaches, QED methodology

Changing habits is a coaching discipline, not a lecturing one. Our coaches hold EMCC accreditation and work in the methodology of QED Group, a Czech coaching school practicing since 1996.

What meta-analyses say about coaching effectiveness: overall effect g = 0.66; performance and skills g = 0.60; goal-directed self-regulation g = 0.74 (Theeboom et al., 2014) · attitudes and well-being δ = 0.51 (Jones et al., 2016).

Try it yourself

Measure your calibration. In 5 minutes.

In the Wharton School study, people who asked the AI went along with its faulty advice 80% of the time — while feeling more confident than the group without AI. Eight problems, optional AI advice on each. Some of the advice is deliberately wrong. Can you tell which?

8 problems 5 minutes no email, everything stays in your browser

Start the test

Built on the protocol of Shaw & Nave (2026), Wharton School.

In HR or leadership? We'll send you the research brief — the research and our methodology in a few pages.

Next step

Start with a diagnostic conversation

We choose who we work with. Every engagement starts with a diagnostic conversation: we look at how your people actually work with AI, and we'll tell you straight whether Calibration makes sense for you. If it doesn't, we'll point you to a better path.

Measure your calibration first →

The team