Data Governance: The Boring Work That Makes AI Possible


I’m going to talk about something that makes most executives’ eyes glaze over: data governance. Stay with me. This is the single biggest predictor of AI success I’ve found in ten years of consulting.

The Pattern I Keep Seeing

Organisation wants to do AI. Organisation hires data scientists. Data scientists ask for data. Data is a mess. Project stalls. Everyone blames the technology.

This happens so often it’s become predictable. The companies that succeed with AI aren’t the ones with the best algorithms or the biggest budgets. They’re the ones who did the unglamorous work of getting their data house in order first.

What Data Governance Actually Means

Let’s demystify this. Data governance answers four questions:

  1. What data do we have? (Data catalogue)
  2. Where does it live, and where did it come from? (Data lineage)
  3. Who’s responsible for it? (Data ownership)
  4. What are we allowed to do with it? (Data policies)

That’s it. Four questions. If you can’t answer them clearly, you can’t do AI at scale. You might get a pilot working, but you’ll never move beyond that.

The AI-Governance Connection

Here’s why this matters specifically for AI:

Training data quality determines model quality. “Garbage in, garbage out” applies in full. If 40% of your customer records are missing postcodes, your model will either learn to predict as if postcodes don’t matter or learn only from the customers who happened to supply one. No algorithm can fix fundamentally bad data.

You need to know where data came from. When an AI makes a prediction, you may need to explain why. Regulators, auditors, and customers increasingly demand this. Without lineage, you can’t trace back from output to input.

Consent and usage rights matter. Training AI on customer data without understanding your consent obligations is a regulatory risk. GDPR, Australia’s Privacy Act, industry-specific rules – all require you to know what data you have and whether you’re permitted to use it for AI training.

Reproducibility requires control. If you need to retrain your model, you need access to the same data used originally. Without governance, data gets deleted, modified, or migrated without tracking. Suddenly you can’t reproduce your own work.
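One lightweight way to notice that training data has changed is to store a fingerprint of it next to the model. The sketch below is a minimal illustration using Python's standard hashlib; the function name, the model name, and the sample rows are hypothetical, and a real pipeline would more likely fingerprint files or table snapshots.

```python
import hashlib
import json


def dataset_fingerprint(rows):
    """Return a SHA-256 hex digest of the rows, independent of row order."""
    # Serialise each row deterministically, then sort so row order does not matter.
    serialised = sorted(json.dumps(row, sort_keys=True) for row in rows)
    digest = hashlib.sha256()
    for line in serialised:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


# Hypothetical snapshot of the data used for training.
training_rows = [
    {"customer_id": "C001", "postcode": "2000"},
    {"customer_id": "C002", "postcode": "3000"},
]

# Record to keep alongside the trained model (names are illustrative).
training_record = {
    "model": "churn_model_v1",
    "dataset": "customer_master",
    "fingerprint": dataset_fingerprint(training_rows),
}

# Later, before retraining: does the data we can access still match?
current_rows = list(reversed(training_rows))
matches = dataset_fingerprint(current_rows) == training_record["fingerprint"]
print("Training data unchanged:", matches)
```

A fingerprint only tells you that something changed, not what. It is a tripwire, and the catalogue and ownership work described below is what lets you investigate.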

The Minimum Viable Governance for AI

You don’t need perfect governance to start. But you need something. Here’s the minimum:

1. Data Catalogue for AI-Relevant Data

You don’t need to catalogue everything. Start with the data assets that will power your AI use cases. For each one:

  • What’s in it?
  • How current is it?
  • What’s the quality like?
  • Who owns it?

A simple spreadsheet is better than nothing, and tools like Alation or Collibra are better than spreadsheets. The Tech Council of Australia has also published resources on data governance best practices for Australian organisations. Whichever route you take, start somewhere.
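To make the spreadsheet option concrete, here is a minimal sketch of a catalogue with one record per dataset, answering the four questions above. The field names and the example entry are my own assumptions for illustration, not a standard schema.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from datetime import date


@dataclass
class CatalogueEntry:
    """One row per dataset; field names are illustrative assumptions."""
    name: str            # which dataset this is
    contents: str        # what is in it
    last_updated: date   # how current it is
    quality_note: str    # what the quality is like
    owner: str           # the accountable business owner


def catalogue_to_csv(entries):
    """Render catalogue entries as CSV text that any spreadsheet tool can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(CatalogueEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


# Hypothetical example entry.
entries = [
    CatalogueEntry(
        name="customer_master",
        contents="One row per customer: contact details, postcode, segment",
        last_updated=date(2024, 1, 31),
        quality_note="Postcode missing on a large share of rows",
        owner="Head of Customer Operations",
    )
]

catalogue_csv = catalogue_to_csv(entries)
print(catalogue_csv)
```

Saving the returned text to a .csv file gives you something a data owner can review in Excel, which matters more at this stage than the tooling.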

2. Basic Quality Metrics

For each key dataset, track:

  • Completeness (percentage of non-null values)
  • Accuracy (percentage of correct values, based on sampling)
  • Timeliness (how current is the data?)
  • Consistency (do the same things have the same identifiers?)

You don’t need sophisticated data quality tools initially. SQL queries and Excel can get you started.
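If you would rather script these checks than run them by hand, the sketch below shows one way to compute completeness, timeliness, and consistency in plain Python. The sample records, column names, and the 365-day freshness threshold are all hypothetical. Accuracy is left out on purpose, because it needs a sampled comparison against a trusted source.

```python
from datetime import date

# Hypothetical sample of customer records; None or "" marks a missing value.
customers = [
    {"customer_id": "C001", "postcode": "2000", "updated": date(2024, 3, 1)},
    {"customer_id": "C002", "postcode": None, "updated": date(2023, 11, 15)},
    {"customer_id": "c001", "postcode": "2000", "updated": date(2024, 2, 20)},
    {"customer_id": "C003", "postcode": "", "updated": date(2022, 6, 30)},
]


def completeness(rows, column):
    """Percentage of rows where the column is neither None nor empty."""
    filled = sum(1 for row in rows if row[column] not in (None, ""))
    return 100.0 * filled / len(rows)


def timeliness(rows, column, as_of, max_age_days):
    """Percentage of rows updated within max_age_days of the as_of date."""
    fresh = sum(1 for row in rows if (as_of - row[column]).days <= max_age_days)
    return 100.0 * fresh / len(rows)


def inconsistent_ids(rows, column):
    """Identifiers that appear in more than one spelling (case or whitespace differs)."""
    variants = {}
    for row in rows:
        raw = row[column]
        variants.setdefault(raw.strip().upper(), set()).add(raw)
    return {key: sorted(forms) for key, forms in variants.items() if len(forms) > 1}


as_of = date(2024, 3, 31)  # hypothetical reporting date
print(f"Postcode completeness: {completeness(customers, 'postcode'):.0f}%")
print(f"Updated in last 365 days: {timeliness(customers, 'updated', as_of, 365):.0f}%")
print(f"Inconsistent IDs: {inconsistent_ids(customers, 'customer_id')}")
```

The same three checks translate directly into SQL COUNT queries. The point is to track the numbers over time, not the language you compute them in.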

3. Clear Ownership

Every dataset needs an owner. Not IT. A business person who understands the data and is accountable for its quality.

This is harder than it sounds. People resist data ownership because it means responsibility. Push through this. Without owners, nobody fixes problems.

4. Usage Policies

Document what you’re allowed to do with each dataset. Can it be used for AI training? Can it be shared with vendors? Can it be moved to the cloud? What consent conditions apply?

Your legal team should be involved here. This isn’t just a technical exercise.
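Once policies are written down, they can be checked automatically before a dataset reaches a training pipeline. The following is a rough sketch of that idea; the policy fields, dataset names, and the deny-by-default rule are my assumptions, and the actual permissions should come from your legal review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsagePolicy:
    """Documented permissions for one dataset; all fields are illustrative."""
    dataset: str
    ai_training_allowed: bool
    vendor_sharing_allowed: bool
    cloud_storage_allowed: bool
    consent_conditions: str


# Hypothetical policies; real entries should be agreed with legal.
POLICIES = {
    "customer_master": UsagePolicy(
        dataset="customer_master",
        ai_training_allowed=True,
        vendor_sharing_allowed=False,
        cloud_storage_allowed=True,
        consent_conditions="Service improvement only; no marketing use",
    ),
}


def can_use_for_training(dataset, policies=POLICIES):
    """Return True only if a documented policy explicitly allows AI training.

    Datasets with no documented policy are refused by default.
    """
    policy = policies.get(dataset)
    return policy is not None and policy.ai_training_allowed


print(can_use_for_training("customer_master"))   # documented and permitted
print(can_use_for_training("call_recordings"))   # no policy on file
```

Refusing undocumented datasets by default is a design choice: it turns a missing policy into a visible blocker instead of a silent risk.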

The ROI of Governance

I know what you’re thinking: “This sounds expensive and boring.”

It is expensive. It is boring. It’s also dramatically cheaper than failing AI projects.

Some numbers from real projects:

  • Average cost of a failed AI pilot: $200,000-500,000
  • Average cost of establishing basic governance for AI: $150,000-300,000
  • Organisations with formal data governance have a 3x higher AI project success rate

The maths isn’t complicated. You can spend the money upfront on governance and succeed, or spend it on failed projects and do governance anyway later.
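For anyone who wants to see that maths written out, here is a back-of-envelope sketch. It uses the midpoints of the ranges above and two assumptions that are mine, not findings: a hypothetical 20% pilot success rate without governance, and the 3x figure treated as a direct uplift on that rate. Swap in your own numbers.

```python
import math

# Midpoints of the cost ranges quoted above.
failed_pilot_cost = (200_000 + 500_000) / 2   # 350,000
governance_cost = (150_000 + 300_000) / 2     # 225,000

# ASSUMPTION: hypothetical 20% pilot success rate without governance.
baseline_success = 0.20
# ASSUMPTION: the 3x figure is applied as a direct uplift, capped at 100%.
governed_success = min(1.0, 3 * baseline_success)

# Expected money lost to failure per pilot, without and with governance.
loss_without = (1 - baseline_success) * failed_pilot_cost
loss_with = (1 - governed_success) * failed_pilot_cost
saving_per_pilot = loss_without - loss_with

# Number of pilots after which the governance spend has paid for itself.
break_even_pilots = math.ceil(governance_cost / saving_per_pilot)

print(f"Expected loss per pilot without governance: ${loss_without:,.0f}")
print(f"Expected loss per pilot with governance:    ${loss_with:,.0f}")
print(f"Pilots needed to recover governance cost:   {break_even_pilots}")
```

The result depends heavily on the assumed baseline success rate, so treat it as a way to frame the conversation with finance, not as a forecast.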

Common Governance Mistakes

Overengineering. You don’t need a perfect governance framework before starting AI. You need enough governance for your current projects. Build incrementally.

Making it an IT project. Data governance is a business discipline that IT enables. When IT owns it alone, business engagement disappears.

Ignoring incentives. People won’t maintain data quality unless their performance metrics reward it. Governance without incentives is just documentation.

Treating it as one-time. Governance is ongoing. Data changes, sources change, requirements change. Build sustainable processes, not one-off efforts.

Starting Tomorrow

If you’re starting from zero, here’s your first month:

Week 1: Identify your top 3 AI use cases and the data they require.

Week 2: For each dataset, find out who owns it, where it lives, and what the quality is like.

Week 3: Document what you learned. Create a simple catalogue.

Week 4: Meet with data owners. Agree on quality standards and improvement plans.

It’s not glamorous. It won’t get you speaking slots at conferences. But it’s the foundation everything else builds on.

Final Thought

The sexiest AI project in the world will fail without good data. The most mundane AI project will succeed with clean, well-governed data.

If you’re an executive wondering why your AI initiatives keep stalling, the answer probably isn’t technology. It’s governance. Do the boring work first.