Your Data Isn't AI-Ready: A Practical Assessment Framework


“Our data is in good shape.” I hear this constantly from enterprise leaders planning AI initiatives. Then we look at the actual data, and the story changes.

Data quality issues sink more AI projects than algorithm problems. The model can be perfect, the use case can be compelling, but if your data has gaps, inconsistencies, or structural problems, nothing else matters.

The Data Quality Audit Nobody Wants to Do

Most organisations skip proper data assessment because they don’t want to know the answer. But discovering problems before you’ve committed significant budget and reputation to an AI project is far better than discovering them after.

Here’s a practical framework for assessing whether your data can actually support AI.

Dimension 1: Completeness

For each data field your AI will need:

  • What percentage of records have this field populated?
  • Are missing values random, or do they cluster in particular segments?
  • How has completeness changed over time?

A retail client discovered their product categorisation data was 95% complete—but that remaining 5% represented 30% of revenue because it was mostly new products. Their recommendation engine would have ignored their best-selling items.
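
If your data already sits in tables, a first-pass completeness check takes only a few lines of pandas. The sketch below is illustrative: the file name and the column names (category, launch_date, revenue) are assumptions standing in for whatever your own product extract contains.

```python
import pandas as pd

# Hypothetical product extract; file and column names are illustrative only.
products = pd.read_csv("product_catalogue.csv")

# Per-field completeness: share of records with the field populated.
completeness = products.notna().mean().sort_values()
print(completeness)

# Are missing values random, or do they cluster? Here: null rate of
# 'category' by launch year, alongside each year's share of revenue.
missing_by_year = (
    products.assign(category_missing=products["category"].isna())
    .groupby(products["launch_date"].str[:4])
    .agg(null_rate=("category_missing", "mean"),
         revenue_share=("revenue", "sum"))
)
missing_by_year["revenue_share"] /= products["revenue"].sum()
print(missing_by_year)
```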

Dimension 2: Accuracy

  • When was data last validated against ground truth?
  • What’s your error rate on key fields?
  • Who’s responsible for data accuracy, and how do they measure it?

Financial services firms typically have strong accuracy controls on transaction data but weak controls on customer preference data. The AI use case determines which matters more.
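
One practical way to put a number on accuracy is to validate a sample by hand and compare it with what your systems hold. A minimal sketch of that comparison follows; the file names, the customer_id join key, and the field names are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical: a manually validated sample serves as ground truth.
records = pd.read_csv("customer_records.csv")
ground_truth = pd.read_csv("validated_sample.csv")  # e.g. a few hundred hand-checked rows

merged = records.merge(ground_truth, on="customer_id", suffixes=("", "_truth"))

# Error rate on each key field: share of sampled records that disagree
# with the validated value.
for field in ["postcode", "segment", "preferred_channel"]:
    error_rate = (merged[field] != merged[f"{field}_truth"]).mean()
    print(f"{field}: {error_rate:.1%} error rate")
```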

Dimension 3: Consistency

  • Do the same concepts use the same codes across systems?
  • Are there duplicate records with conflicting information?
  • Do naming conventions match between data sources?

One manufacturing company found they had 47 different ways of recording “Melbourne” as a location across their systems. Their logistics AI couldn’t function until this was resolved.
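
You can get a rough sense of this problem with nothing more than the standard library. The sketch below clusters near-duplicate location strings using difflib; the sample values, the 0.8 similarity threshold, and the normalisation rule are all illustrative, and real consolidation usually needs a proper reference list or a dedicated matching tool.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical location values pulled from several systems.
locations = ["Melbourne", "MELBOURNE", "Melb.", "Melbourne VIC",
             "melbourne ", "Sydney", "SYD"]

def normalise(value: str) -> str:
    # Cheap canonicalisation: lowercase, strip punctuation and whitespace.
    return "".join(ch for ch in value.lower() if ch.isalnum())

# Group raw values whose normalised forms are highly similar.
clusters: dict[str, list[str]] = defaultdict(list)
for raw in locations:
    norm = normalise(raw)
    key = next(
        (k for k in clusters if SequenceMatcher(None, norm, k).ratio() > 0.8),
        norm,
    )
    clusters[key].append(raw)

for canonical, variants in clusters.items():
    print(canonical, "->", variants)
```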

Dimension 4: Timeliness

  • How quickly does data become available after an event?
  • What’s the latency between systems?
  • Is historical data archived in an accessible format?

Real-time AI applications fail when they depend on data that arrives hours late.
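
If your pipeline records both the business timestamp and the time a record landed in the warehouse, latency is easy to quantify. The file, the column names, and the one-hour threshold below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical: events carry both the business timestamp and the time
# they were loaded into the warehouse.
events = pd.read_csv("events.csv", parse_dates=["event_time", "loaded_at"])

latency = events["loaded_at"] - events["event_time"]
print(latency.describe())                        # mean, quartiles, worst case
print((latency > pd.Timedelta(hours=1)).mean())  # share arriving over an hour late
```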

Dimension 5: Relevance

  • Does your data actually capture what you need to predict?
  • Are there proxy variables that correlate with your target?
  • What data would you need that you don’t currently collect?

This is often the hardest assessment. A hospital wanted to predict patient no-shows but discovered their data captured appointment outcomes without capturing the factors that actually drive attendance.
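
A crude first check is to correlate candidate fields with the outcome you're trying to predict. It won't prove the data is sufficient, but it will flag fields that carry no signal at all. The appointments extract and column names below are hypothetical.

```python
import pandas as pd

# Hypothetical appointment extract with a binary no-show outcome.
appointments = pd.read_csv("appointments.csv")

candidate_features = ["lead_time_days", "prior_no_shows", "distance_km"]

# Correlation with the target is a blunt instrument, but it separates
# fields with some signal from fields with none.
correlations = appointments[candidate_features].corrwith(
    appointments["no_show"].astype(float)
)
print(correlations.sort_values(key=abs, ascending=False))
```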

The Assessment Process

Week 1: Inventory

List every data source your AI project will touch. For each source, document:

  • Owner and custodian
  • Update frequency
  • Access method
  • Known quality issues
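
A spreadsheet works fine for this, but if you prefer the inventory in code, a small structure keeps entries consistent and easy to query. The sketch below mirrors the checklist above; the CRM example entry is invented.

```python
from dataclasses import dataclass, field

# A minimal sketch of one inventory entry; fields mirror the checklist above.
@dataclass
class DataSource:
    name: str
    owner: str
    custodian: str
    update_frequency: str         # e.g. "hourly", "nightly batch"
    access_method: str            # e.g. "REST API", "read replica", "SFTP drop"
    known_issues: list[str] = field(default_factory=list)

inventory = [
    DataSource(
        name="CRM customer master",
        owner="Sales Ops",
        custodian="Data Platform team",
        update_frequency="nightly batch",
        access_method="read replica",
        known_issues=["duplicate accounts after 2021 migration"],
    ),
]
```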

Week 2: Profiling

Run automated profiling on each data source:

  • Field-level statistics
  • Null rates
  • Value distributions
  • Anomaly detection

Tools like Great Expectations or commercial alternatives can automate much of this.
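
If you want to see what the output looks like before committing to a tool, a minimal pandas version of the same idea fits in a dozen lines. This is a sketch, not a substitute for a proper profiling tool; the orders.csv file and the four-standard-deviation outlier rule are arbitrary examples.

```python
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    # Field-level statistics: type, null rate, cardinality, most common value.
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "null_rate": df.isna().mean(),
        "distinct_values": df.nunique(),
        "most_common": df.mode().iloc[0],
    })

source = pd.read_csv("orders.csv")   # hypothetical extract
print(profile(source))

# Crude numeric anomaly flag: values more than four standard deviations
# from the column mean.
numeric = source.select_dtypes("number")
outliers = (numeric - numeric.mean()).abs() > 4 * numeric.std()
print(outliers.sum())
```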

Week 3: Cross-Source Analysis

Examine how data flows between systems:

  • Do keys match across sources?
  • Are there timing issues between updates?
  • What happens to data that fails validation?
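
Key matching is the quickest of these checks to automate. The sketch below assumes two hypothetical extracts that share a customer_id column; in practice you would repeat it for every join your AI pipeline depends on.

```python
import pandas as pd

# Hypothetical: the same customers should appear in both systems.
crm = pd.read_csv("crm_customers.csv")
billing = pd.read_csv("billing_accounts.csv")

crm_ids = set(crm["customer_id"])
billing_ids = set(billing["customer_id"])

print("In CRM but not billing:", len(crm_ids - billing_ids))
print("In billing but not CRM:", len(billing_ids - crm_ids))
print("Overlap rate:", len(crm_ids & billing_ids) / len(crm_ids | billing_ids))
```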

Week 4: Remediation Planning

For each issue identified, assess:

  • Severity for your AI use case
  • Effort to fix
  • Ongoing maintenance requirements

Some issues need fixing before you proceed. Others can be worked around. A few might be so fundamental they change your project scope entirely.
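
A simple way to force that prioritisation is to score each issue on severity and effort and rank by the ratio. The example issues and the 1–5 scores below are illustrative, not from a real audit; the point is to make the trade-off explicit.

```python
# Rank remediation work by severity-to-effort ratio (both on a 1-5 scale).
issues = [
    {"issue": "category nulls on new products", "severity": 5, "effort": 2},
    {"issue": "location naming variants", "severity": 4, "effort": 3},
    {"issue": "stale preference data", "severity": 2, "effort": 4},
]

for item in sorted(issues, key=lambda i: i["severity"] / i["effort"], reverse=True):
    print(f"{item['issue']}: priority {item['severity'] / item['effort']:.1f}")
```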

When to Pause the AI Project

If your assessment reveals any of these, stop and address them first:

  • Less than 80% completeness on critical fields: Your model won’t have enough signal.
  • Conflicting ground truth sources: You can’t train a model when you don’t know what “correct” looks like.
  • Data arriving too late for decisions: Real-time AI needs real-time data.
  • No historical data for training: You need enough examples of the thing you’re predicting.

These aren’t minor hurdles—they’re project blockers.
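
If you want the go/no-go decision to be mechanical, the checks above can be rolled into a single gate. The sketch below assumes you've already computed completeness, latency, and example counts during profiling; the 80% figure comes from the list above, while the latency and sample-size thresholds are placeholders you'd set for your own use case.

```python
def ready_to_proceed(completeness: dict[str, float],
                     max_latency_hours: float,
                     training_examples: int,
                     conflicting_ground_truth: bool) -> bool:
    """Return True only if none of the blockers above apply."""
    blockers = []
    for field_name, rate in completeness.items():
        if rate < 0.80:                    # threshold from the list above
            blockers.append(f"{field_name} is only {rate:.0%} complete")
    if conflicting_ground_truth:
        blockers.append("ground truth sources conflict")
    if max_latency_hours > 1:              # illustrative real-time threshold
        blockers.append(f"data arrives {max_latency_hours:.1f} hours late")
    if training_examples < 1000:           # illustrative minimum, use-case dependent
        blockers.append(f"only {training_examples} historical examples")
    for blocker in blockers:
        print("BLOCKER:", blocker)
    return not blockers

# Example call with illustrative numbers.
print(ready_to_proceed({"product_category": 0.95, "price": 0.99},
                       max_latency_hours=0.5,
                       training_examples=50_000,
                       conflicting_ground_truth=False))
```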

The Business Case for Data Quality Investment

Data quality work isn’t exciting, and it’s hard to get funded. But consider: every dollar spent applying AI to poor data is largely wasted.

Some Australian enterprises are now treating data quality as infrastructure investment, separate from any specific AI project. When consulting teams, such as AI consultants in Melbourne, begin engagements, they often recommend this approach: building a data foundation that supports multiple future initiatives rather than fixing problems project by project.

Starting the Assessment

You don’t need expensive tools or external consultants to begin. Start with three questions:

  1. What data will our AI actually use?
  2. Where does that data live today?
  3. When was it last validated?

The answers usually reveal enough to know whether you’re ready to proceed or need to do foundation work first.

Better to learn this now than after you’ve promised stakeholders results.