What a Good AI Pilot Actually Looks Like

MIT says 95% of AI pilot projects fail to deliver. Here is what the other 5% do differently, and a practical framework for running yours.

There is a statistic doing the rounds that should stop any executive mid-sentence. According to MIT, 95% of enterprise AI pilot projects fail to deliver measurable impact.

The source is MIT’s NANDA initiative, whose GenAI Divide report analysed over 300 public deployments in 2025. Not 50%. Not even the 70 to 80% range people were quoting a year earlier. Ninety-five per cent. And “fail” here does not mean the model never ran. It means the pilot never produced a measurable change in cost, revenue or productivity.

That is a remarkable number when you consider how much money, attention and executive energy is being poured into these initiatives. MIT’s research is clear on the cause, and it is not the models. The models are fine. It is the way organisations are adopting them.

We have seen this pattern repeatedly while running AI pilot projects with clients across sectors. Someone senior gets excited, a vendor gets engaged, and a pilot gets scoped around a use case chosen because it makes for a good demo. Four months later, there is a slick presentation, a few impressed stakeholders and absolutely no plan for what happens next. The pilot becomes a story the company tells about being “AI-first.” It does not become a product, a process or a saving.

So what would a good AI pilot project actually look like? One that earns the right to scale?

The demo problem

The fundamental issue is that most AI pilot projects are designed to prove the technology works, which is a bit like designing a pilot to prove that electricity works. Of course it works. The question nobody is answering is whether it works here, for this team, on this problem, in a way that changes how people actually operate.

The MIT research highlights a stark gap between approaches. In the dataset, companies that bought and integrated specialised tools with strong partners were roughly twice as likely to succeed as those trying to build everything in-house. The instinct at most large companies is to build internally, to maintain control, to customise everything. But the data says that instinct is expensive and usually wrong.

There is a parallel here with the early days of mobile. When apps first took off, every business wanted a bespoke app built from scratch that did everything. The winners started with a specific problem and leveraged existing platforms to solve it quickly. AI is in exactly the same phase.

Two kinds of AI pilot project

After running AI pilot projects through our AI for Humans programme and watching plenty of others from the outside, we have found that the difference between pilots that go somewhere and pilots that do not is usually visible from the start.

The pilot that stalls: Starts with the technology. Measures whether the model is accurate. Runs on clean data with a hand-picked team. Success means “it works.” Owned by IT or innovation.

The pilot that scales: Starts with an operational problem. Measures whether the process improved. Runs with real users in real conditions. Success means “it changed behaviour.” Owned by the people who do the work.

This is not a subtle difference. It is the difference between a science experiment and a business decision.

What a good AI pilot project actually answers

Forget technical accuracy for a moment. A pilot that deserves to exist should answer five questions, none of which concern the model.

  1. Does anyone actually care about this problem? If the process you are trying to improve does not frustrate anyone or cost anyone time, the pilot is academic. Pick something where the pain is felt, where people will notice the difference. The best use cases are the ones where staff roll their eyes every day.

  2. Can you measure the before? You need a baseline. How long does this take today? What does it cost? How many errors? Without numbers, you are relying on vibes, and vibes do not survive a budget review. This sounds obvious, but most pilots skip it. A baseline can be as simple as the record sketched after this list.

  3. Will real people use it in real conditions? A pilot on clean sample data with your best engineer does not tell you anything useful. It tells you the technology works. You already knew that. What you need to know is whether Sarah in operations trusts it, uses it and finds it quicker than what she was doing before.

  4. Has anyone agreed what happens if it works? This is the one that kills most pilots. The pilot finishes, the results are positive, and then nothing happens because nobody pre-agreed the next step. No budget, no owner, no timeline. A pilot without a decision at the end is just expensive curiosity.

  5. Does it work with what you already have? If the pilot requires a new data platform, a new integration layer and three months of prep work, it is not a pilot. It is a programme. The best pilots use your existing tools, your existing data and your existing team. That is the whole point.
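
To make question 2 concrete, here is a minimal sketch in Python of what a baseline record could look like. The process name, field names and figures are illustrative assumptions, not a schema we prescribe; the point is that a baseline is a handful of numbers you can collect in a week, not a data platform.

```python
from dataclasses import dataclass

@dataclass
class Baseline:
    """Snapshot of how a process performs today, captured before the pilot."""
    process: str              # the process the pilot targets
    minutes_per_task: float   # how long one instance takes today
    tasks_per_week: int       # how often it happens
    error_rate: float         # fraction of outputs that need rework
    cost_per_hour: float      # loaded cost of the people doing the work

    def weekly_cost(self) -> float:
        # Hours spent per week, multiplied by what those hours cost.
        return (self.minutes_per_task * self.tasks_per_week / 60) * self.cost_per_hour

# Illustrative numbers only: measure your own before the pilot starts.
baseline = Baseline("invoice triage", minutes_per_task=12, tasks_per_week=250,
                    error_rate=0.08, cost_per_hour=45)
print(f"{baseline.process}: £{baseline.weekly_cost():,.0f} per week at the baseline")
```

If the pilot cannot beat those numbers, you have your answer. If it can, the weekly cost turns the improvement into a saving a budget review understands.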

What it looks like when it works

When we worked with Workhouse, a creative agency, we did not walk in with a model and a slide deck. We walked in with questions. Where are your people spending time on things that do not require creative judgment? Where is the friction? What would free them up to do better work?

We mapped their workflows, identified two areas, across creative and strategy, where AI could realistically help, and ran a focused pilot on each. Two pilots, six weeks, real people, real data. One delivered clear improvements and moved to adoption. One did not. Both results were useful.

That last point matters. A pilot that tells you “no, not here” is just as valuable as one that tells you “yes, go.” Both are answers. The 95% that fail do not fail because they get a negative result. They fail because they never produce a clear result at all.

Weeks, not months

There is an assumption that AI pilots need to be long, complex, enterprise-grade projects. They do not. In fact, the longer a pilot runs, the more likely it is to drift. Scope creeps, stakeholders change, and the original question gets buried under layers of additional requirements.

We run ours in weeks, not months. Short enough that nobody loses interest and the results still feel urgent. Long enough to discover, build, test and evaluate properly.

  1. Map the territory: Workflows, pain points, data landscape. Talk to the people who do the work, not just the people who commissioned it. Score potential use cases on impact and feasibility. Pick one or two. Set your baseline numbers.

  2. Build and configure: Set up the tools using existing systems. Define what good looks like in terms someone outside the project can understand. Brief the pilot team. Make tracking lightweight so you can see what is happening live, not in a retrospective six weeks later.

  3. Run it properly: Real users, real data, real conditions. Watch what happens. Listen to the feedback that people give unprompted, not just what they say in the debrief. Adjust if something is clearly not working. The goal is learning, not perfection.

  4. Decide: Compare against baseline, as in the sketch after this list. Present findings with a recommendation that someone can actually act on: scale this, adapt this, or stop. Include what it would take to roll out. The pilot ends with a decision, not a deck.
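
And to make step 4 concrete, a hedged sketch of the comparison itself, reusing the numbers from the earlier baseline example. The 25% threshold is a placeholder for the figure you would pre-agree in step 1, not a recommendation; the shape is what matters: the before and after numbers go in, a decision comes out.

```python
def recommend(baseline_minutes: float, pilot_minutes: float,
              baseline_errors: float, pilot_errors: float) -> str:
    """Turn before/after numbers into the decision the pilot must end with.

    The 25% threshold below is an illustrative assumption; agree your own
    before the pilot starts so the decision is pre-committed.
    """
    time_saved = 1 - pilot_minutes / baseline_minutes
    if pilot_errors > baseline_errors:
        return "stop: faster but less accurate is not an improvement"
    if time_saved >= 0.25:
        return f"scale: {time_saved:.0%} time saved at equal or better quality"
    if time_saved > 0:
        return f"adapt: {time_saved:.0%} saved is real, but not yet worth a rollout"
    return "stop: no measurable improvement over the baseline"

# Hypothetical figures from a six-week pilot of the earlier example.
print(recommend(baseline_minutes=12, pilot_minutes=7,
                baseline_errors=0.08, pilot_errors=0.06))
# -> scale: 42% time saved at equal or better quality
```

The useful property of writing the decision down this way is that it cannot be fudged after the fact. Whatever the pilot produces, one of the three answers comes out.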

The MIT research backs this up, though from a different angle. The highest ROI in the dataset came not from flashy customer-facing applications but from back-office automation: eliminating outsourced processes, cutting agency spend, streamlining operations. The unglamorous work that quietly drives margins. And exactly the kind of thing a focused, well-scoped pilot can address directly.

Warning signs your AI pilot project is in trouble

You can usually tell early whether a pilot is heading for trouble. If any of these feel familiar, it is worth pausing before spending more.

A vendor picked the use case. Nobody can say what success looks like in numbers. The pilot team has never done the job they are trying to improve. No budget exists for what comes after. IT is running it without operations in the room. The timeline keeps extending.

Two or three of those at once, and you are probably building a demo, not running a pilot.

Small is not the same as unambitious

There is a persistent idea that a meaningful AI pilot project needs to be big. A company-wide initiative. A multi-department transformation. This is backwards, and it is one of the reasons the failure rate is so high.

The best pilots are deliberately small. One process. One team. One question. Not because the ambition is limited, but because the learning is concentrated. A small pilot that gives you a clear answer is worth infinitely more than a sprawling one that gives you ambiguous data and a hundred slide decks.

A pilot that answers a clear question in weeks is worth more than a six-month initiative that answers none.

This is how we think about it at Human Kind. Our AI for Humans approach starts with low-risk, high-learning AI pilot projects that use your existing tools and teams. We prove impact before asking anyone to commit further. And if the pilot shows AI is not the right answer for a particular problem, we say so. That is not a failure. That is money saved.

The real question

If your organisation is considering AI, the first step is not buying a platform. It is not hiring a data science team. It is identifying the right question to ask.

Start with the work. The processes that are slow, repetitive or fragile. The decisions that would be better with information that currently sits in a spreadsheet nobody opens. The tasks where your best people are being wasted on things that do not need them.

That is where AI actually delivers value. Not as a transformation programme, but as a focused intervention in a specific place where it makes things measurably better.

According to MIT, the 5% that succeed do exactly this: they pick one pain point, execute well, and partner smartly. The other 95% start with the technology and work backwards. It is a pattern we have seen before. It will sort itself out eventually, but there is no reason you need to be part of that statistic.

Running focused AI pilots is exactly the kind of work we do. If you want experienced support shaping and delivering yours, take a look at our Digital Product and AI service.

Want to discuss this further?

We're always happy to talk through ideas.