Five Data Readiness Questions Before Starting Any AI Project


I’ve watched dozens of AI projects fail over the past two years. The pattern is depressingly consistent: organisations jump into AI initiatives without understanding whether their data can support what they’re trying to do.

The AI technology is rarely the problem. The data is almost always the problem.

Before starting any AI project, you need honest answers to these five questions. They’re not exciting questions, but they’re the ones that determine success or failure.

1. Where Is Your Data, Actually?

This sounds basic, but most organisations can’t answer it accurately. When I ask “where’s your customer data?” I get answers like:

  • “In Salesforce”
  • “And also some in our data warehouse”
  • “There’s a legacy system that has historical records”
  • “Marketing has their own database”
  • “Actually, some teams use spreadsheets…”

That fragmentation is normal. But it creates real problems for AI. If you want AI to understand your customers, it needs to access customer data across all those systems. That requires integration work that often exceeds the AI implementation effort itself.

What to do: Before your AI project, create a complete map of where the relevant data lives. Include unofficial sources like spreadsheets and department-specific databases. The map will be messier than you expect. That’s valuable information.
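
If it helps to make that map concrete, even a small structured inventory beats a mental list. Below is a minimal sketch in Python; the source names, owners, and access methods are hypothetical placeholders, not a prescribed schema.

```python
# A minimal sketch of a data-source inventory. Every name, owner, and access
# method here is a hypothetical placeholder -- the point is simply to record
# every location, official or not, in one structured place.
data_sources = [
    {"name": "Salesforce",         "type": "CRM",          "owner": "Sales",     "access": "REST API"},
    {"name": "Data warehouse",     "type": "analytics",    "owner": "Data team", "access": "SQL"},
    {"name": "Legacy billing",     "type": "historical",   "owner": "Finance",   "access": "manual export"},
    {"name": "Marketing database", "type": "departmental", "owner": "Marketing", "access": "unknown"},
    {"name": "Team spreadsheets",  "type": "unofficial",   "owner": "various",   "access": "shared drive"},
]

# Even a crude summary like this surfaces the gaps quickly.
for source in data_sources:
    print(f"{source['name']:<20} owner={source['owner']:<10} access={source['access']}")
```

The entries marked "unknown" or "manual export" are exactly the ones that turn into integration work later.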

2. How Good Is Your Data Quality?

“Good enough for reports” is not good enough for AI.

AI systems amplify data quality issues. A human analyst looking at data might notice and correct anomalies. An AI system will treat garbage data as valid input and produce garbage output with confidence.

Specific quality issues that kill AI projects:

  • Duplicate records: The same entity recorded multiple times with slight variations
  • Missing values: Critical fields that are empty or defaulted
  • Inconsistent formats: Dates in different formats, names in different conventions
  • Stale data: Records that haven’t been updated to reflect current reality
  • Incorrect data: Information that was entered wrong and never corrected

What to do: Run quality assessments on the specific data your AI project will use. Measure completeness, consistency, and accuracy. Don’t rely on general impressions – quantify the quality issues.
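
As a starting point, a few lines of pandas can put numbers on those impressions. This is a rough sketch only: the file name and the columns (customer_id, email, updated_at) are assumptions, so substitute the fields your AI project will actually consume.

```python
# A rough sketch of a quantified quality assessment using pandas.
# The file and column names are assumptions, not a real schema.
import pandas as pd

df = pd.read_csv("customers.csv", parse_dates=["updated_at"])

# Completeness: share of non-null values in each critical field
completeness = df[["customer_id", "email", "updated_at"]].notna().mean()

# Duplicates: the same customer_id recorded more than once
duplicate_rate = df["customer_id"].duplicated().mean()

# Staleness: records not updated in the last two years
stale_rate = (df["updated_at"] < pd.Timestamp.now() - pd.DateOffset(years=2)).mean()

print("Completeness by field:\n", completeness)
print(f"Duplicate rate: {duplicate_rate:.1%}")
print(f"Stale rate:     {stale_rate:.1%}")
```

Numbers like these are what turn "the data is probably fine" into a defensible project plan.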

3. Can You Access the Data Programmatically?

Data that exists isn’t the same as data you can use.

Many organisations have data locked in systems without usable APIs. Or data that requires manual exports. Or data protected by access controls that can’t accommodate AI systems.

Questions to answer:

  • Does the source system have APIs that support bulk data access?
  • Are there rate limits that will constrain AI workloads?
  • Can you set up automated data pipelines, or is manual extraction required?
  • What authentication and authorisation are needed?
  • Are there licensing restrictions on how data can be used?

What to do: For each data source your AI project needs, verify programmatic access is feasible. Test the APIs, understand the limitations, and budget for integration work.
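
A quick feasibility probe can be as simple as the sketch below, using the requests library. The base URL, endpoint, response shape, and rate-limit header names are hypothetical, since every source system differs; treat it as a template for the questions above rather than a working client.

```python
# A minimal feasibility check for one REST source. The URL, endpoint,
# auth header, and rate-limit headers are hypothetical placeholders.
import requests

BASE_URL = "https://api.example.com/v1"        # hypothetical source system
HEADERS = {"Authorization": "Bearer <token>"}  # whatever auth the system requires

resp = requests.get(f"{BASE_URL}/customers", headers=HEADERS, params={"page_size": 100})
resp.raise_for_status()

# Does it support bulk/paginated access, and what are the rate limits?
print("Records returned:", len(resp.json().get("results", [])))
print("Rate limit:      ", resp.headers.get("X-RateLimit-Limit", "not advertised"))
print("Remaining:       ", resp.headers.get("X-RateLimit-Remaining", "not advertised"))
```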

4. Is Your Data Appropriately Labelled?

Many AI approaches require labelled data – examples where the desired output is known. For supervised learning, this is essential. For fine-tuning language models, it’s valuable.

Labelling is often the hidden cost that blows up AI budgets. Getting humans to accurately label thousands of examples is expensive and time-consuming.

Questions to consider:

  • Does your AI approach require labelled training data?
  • Do labels already exist in your data (implicit in transactions, explicit in systems)?
  • If labelling is needed, who will do it and at what cost?
  • How will you ensure label quality and consistency?
  • How much labelled data is needed for your approach?

What to do: Understand the labelling requirements early. If extensive manual labelling is needed, factor that into timelines and budgets. Consider approaches that require less labelling (transfer learning, few-shot learning) if labelling costs are prohibitive.
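
If you do commission manual labelling, check consistency early, for example by having two annotators label the same sample and measuring agreement. A minimal sketch using scikit-learn's cohen_kappa_score, with made-up labels:

```python
# Measuring inter-annotator agreement on a shared sample.
# The labels below are invented for illustration.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["churn", "stay", "churn", "stay", "churn", "stay"]
annotator_b = ["churn", "stay", "stay",  "stay", "churn", "stay"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # values well below ~0.8 suggest unclear labelling guidelines
```

Low agreement is usually a guideline problem, not an annotator problem, and it's far cheaper to fix before thousands of examples have been labelled.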

5. Do You Have Ongoing Data Processes?

AI isn’t a one-time project. Models need continuous data to remain accurate. Data quality needs ongoing maintenance. New data sources need integration.

Many organisations treat AI projects like software deployments – build it once and maintain it lightly. This leads to model decay, where AI performance degrades over time as the underlying data patterns shift.

Questions to answer:

  • Who maintains data quality after initial cleanup?
  • How will new data flow into AI systems?
  • What happens when source systems change?
  • Who monitors AI data inputs for drift and anomalies?
  • What’s the budget for ongoing data operations?

What to do: Budget for data operations as part of the AI operating cost. Plan for ongoing quality monitoring, integration maintenance, and periodic refreshes. This isn’t optional – it’s what keeps AI systems working.
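
Drift monitoring doesn't have to start sophisticated. The sketch below compares one numeric input against its training-time distribution with a two-sample Kolmogorov-Smirnov test from scipy; the feature, window, and threshold are illustrative assumptions.

```python
# A simple drift check on one numeric model input: compare the latest
# production window against the training-time distribution.
# The synthetic data and the 0.01 threshold are illustrative only.
import numpy as np
from scipy.stats import ks_2samp

reference = np.random.normal(loc=100, scale=15, size=5_000)  # training-time distribution
current = np.random.normal(loc=110, scale=15, size=5_000)    # latest production window

result = ks_2samp(reference, current)
if result.pvalue < 0.01:
    print(f"Possible drift (KS statistic {result.statistic:.3f}) -- investigate the source before retraining blindly")
else:
    print("No significant shift in this input")
```

A check like this, run on a schedule, is the difference between noticing model decay and discovering it from complaints.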

The Data Readiness Spectrum

Based on these five questions, you can assess where you fall:

Not ready: Data is fragmented, quality is unknown, access is manual, no labels exist, and no ongoing processes are established. AI projects will fail.

Partially ready: Some data is consolidated, quality is understood but not yet addressed, and programmatic access exists for core systems. AI projects can succeed with significant data work.

Ready: Data is consolidated, quality is managed, APIs are available, labels exist or can be generated, and data operations are established. AI projects can focus on AI rather than data.

Most enterprises fall in the “partially ready” category. That’s fine – it just means the AI project needs to include data work, with realistic timelines and budgets.

The Uncomfortable Truth

The data work isn’t glamorous. It doesn’t generate impressive demos or exciting announcements. But it’s where AI projects actually succeed or fail.

I’ve seen organisations spend six months selecting AI platforms and two weeks on data readiness. That ratio should be inverted.

Get your data house in order before you start decorating with AI. Otherwise, you’re building on a foundation that can’t support what you’re trying to construct.

The AI is the easy part. The data is where the work is.