How to Run an AI Project Retrospective That Actually Improves Things


You’ve deployed an AI project. Maybe it worked. Maybe it didn’t. Either way, you need to run a retrospective. And if you’re like most enterprise teams I’ve worked with, that retrospective will be a polite 60-minute meeting where everyone agrees things went “mostly well” and the real problems never surface.

That’s not a retrospective. That’s corporate theatre.

Here’s how to run one that actually produces useful insights.

Why AI Retrospectives Are Different

Standard project retrospectives follow well-known frameworks. What went well, what didn’t, what we’d change. But AI projects have quirks that make the standard approach insufficient:

  • Outcomes are probabilistic. A traditional IT project either works or doesn’t. An AI model might work 87% of the time, and you need to decide if that’s good enough.
  • Failure modes are subtle. The model might be technically accurate but practically useless because it doesn’t fit the workflow. Or it might work perfectly in testing and fail in production because the data distribution shifted.
  • Responsibility is distributed. AI projects typically involve data engineers, ML engineers, business analysts, change management teams, and end users. When something goes wrong, the instinct is to blame the adjacent team.

These differences mean you need a more structured approach.

The Five-Stage AI Retrospective

I’ve refined this over about twenty enterprise engagements. It’s not perfect, but it consistently surfaces insights that the standard approach misses.

Stage 1: Data and Assumptions Review (30 minutes)

Before you talk about what happened, get alignment on what you expected to happen. Pull up the original project brief and ask:

  • What assumptions did we make about data quality, availability, and format?
  • Which assumptions turned out to be wrong?
  • How early did we know they were wrong, and what did we do about it?

This stage almost always reveals that the team knew about data problems weeks or months before they became critical, but didn’t escalate because there was no clear mechanism for doing so.

Stage 2: Technical Performance Review (30 minutes)

Look at the numbers. Not just model accuracy, but the full picture:

  • Latency and reliability. Did the system perform within acceptable response times in production? Were there outages?
  • Edge cases. What inputs caused unexpected behaviour? How did the system handle them?
  • Drift. Has model performance changed since deployment? If you don’t have monitoring in place to answer this question, that’s its own finding.

Be specific. “The model performed well” is not a finding. “The model maintained 91% accuracy on standard queries but dropped to 67% on queries involving financial quarter transitions” is a finding.
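One way to get to that level of specificity is to break accuracy out by the segments the business cares about instead of reporting a single blended number. A sketch in plain Python; the segment labels are invented for illustration:

```python
from collections import defaultdict

def accuracy_by_segment(records):
    """Per-segment accuracy from (segment, correct) records.

    A single overall number hides exactly the kind of weak segment
    (e.g. "quarter_transition") that a retrospective needs to surface.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for segment, correct in records:
        totals[segment] += 1
        hits[segment] += int(correct)
    return {seg: hits[seg] / totals[seg] for seg in totals}
```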

Stage 3: User Adoption and Experience Review (30 minutes)

This is where most retrospectives fall apart. Teams spend all their time on technical performance and skip the question that matters most: did people actually use it, and did it help them?

Get real usage data. How many people are actively using the tool? How does that compare to the target? And for the people who stopped using it, why did they stop? If you don’t have this data, your retrospective has already surfaced its most important finding.

Talk to end users directly. Not their managers. Not the project sponsor. The people who sit in front of the tool every day. Their feedback will be blunter and more useful than anything you’ll hear in a steering committee.

Stage 4: Process and Governance Review (20 minutes)

This stage examines how the project was run, not what it delivered:

  • Was the approval process too slow or too fast?
  • Did the right stakeholders have visibility at the right times?
  • Were there decision points where the project should have been paused or redirected?
  • Did the governance framework help or hinder the team?

Stage 5: Forward-Looking Recommendations (10 minutes)

Keep this short and concrete. No more than five recommendations, each with a specific owner and timeline. If you need more than five, you’re not being selective enough.

Common Mistakes to Avoid

Don’t let the loudest voice dominate. Use anonymous input collection before the meeting. Give everyone five minutes to write their observations on sticky notes (physical or digital) before any discussion.

Don’t conflate “the model didn’t work” with “the project failed.” Sometimes the most valuable outcome of an AI project is discovering that the problem doesn’t need AI. That’s a legitimate success.

Don’t skip the retrospective because the project succeeded. Successful projects are the ones where you learn the most about what to replicate. If you only do retrospectives on failures, you’re optimising for loss avoidance rather than learning.

Don’t wait too long. Run the retrospective within two weeks of deployment (or cancellation). After that, memories fade and people have moved on emotionally.

Making It Stick

The retrospective is only useful if its findings change behaviour. Assign each recommendation to a specific person. Schedule a follow-up in 30 days to check progress. Document the findings somewhere that future project teams will actually look, not buried in a SharePoint folder three levels deep.

The organisations that get consistently better at AI are the ones that learn from every project, successful or not. A good retrospective is the mechanism that makes that happen.

Sarah Chen is a Melbourne-based enterprise consultant specialising in AI strategy and digital transformation.