Redefining Data Governance for the AI and Machine Learning Era

We're going to need a new operating model

I've had several conversations recently around data governance in the AI world. Many vendors in the Data Management space are incorporating AI and ML into their solutions, and applying new solutions to old problems will greatly help.

But we're also going to need solutions to the new and upcoming problems.

 

History doesn't repeat, but it rhymes, so my view is that existing data governance practice can teach us how to adapt it for the future.

 

To research this article I started, as you might expect, by asking ChatGPT. As you would also expect, that text had some good ideas mixed in with a lot of chaff. However, it was a decent base to work from.

 

What follows highlights some topics that merit further analysis and offers a good starting point for future thinking.

 

 

Artificial intelligence (AI) and machine learning models (MLMs) are here. Their swift evolution, while fascinating, throws up unique challenges and pushes us to revisit our prevailing data governance frameworks. So how do we ensure that our data governance is fighting fit for the AI revolution?

 

AI and MLMs have reshaped how we manage data, but their adoption has tested the limits of traditional data governance models, which were principally devised for structured, human-friendly data.

 

Now we have to grapple with unstructured and semi-structured data, as well as the "black box" nature of many AI algorithms, which raises concerns around transparency, interpretability, and fairness.

 

Data governance models need to evolve, ensuring they encompass AI's unique requirements and mitigate its inherent risks.

 

What will we mean by data quality?

 

A robust data governance model for the AI era needs to account for a wide array of data types and sources, including unstructured data, social media feeds, IoT data, real-time data streams, and the like.

We'll need to consider new data quality dimensions beyond our trusted friends of accuracy, consistency, completeness, and timeliness. These may need to look at the inputs as much as, or even in preference to, the outputs.

Imagine concepts such as:

  • Repeatability - does the model produce similar outputs when repeatedly run over the same data?

  • Reversibility - can we understand how the outputs came from the inputs?

  • Availability - is the underlying data still available? e.g. a social media post which has since been edited, deleted, or withheld.

  • Bias - does our input data set contain suitable variance to produce reliable outputs?
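
As a rough illustration of how two of these dimensions might be measured, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names (repeatability, category_shares), the toy models, and the choice to treat "similar outputs" as "identical outputs", which a real check would probably relax.

```python
import random
from collections import Counter
from typing import Callable, Hashable, Sequence


def repeatability(model: Callable[[str], Hashable],
                  inputs: Sequence[str],
                  runs: int = 5) -> float:
    """Share of inputs for which every one of `runs` calls gave the same output.

    Assumption: "similar" is simplified to "identical" for this sketch.
    """
    if not inputs:
        return 1.0
    stable = 0
    for item in inputs:
        # Collect the distinct outputs produced for this one input.
        distinct_outputs = {model(item) for _ in range(runs)}
        if len(distinct_outputs) == 1:
            stable += 1
    return stable / len(inputs)


def category_shares(values: Sequence[Hashable]) -> dict:
    """Proportion of each category in an input column.

    A crude first look at whether the input data has suitable variance.
    """
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


# Hypothetical stand-ins for real models.
def deterministic_model(text: str) -> int:
    return len(text) % 2


def make_noisy_model(seed: int) -> Callable[[str], int]:
    rng = random.Random(seed)

    def noisy_model(text: str) -> int:
        # Ignores the input and answers at random.
        return rng.choice([0, 1])

    return noisy_model


sample_inputs = ["a", "bb", "ccc"]
stable_score = repeatability(deterministic_model, sample_inputs)
noisy_score = repeatability(make_noisy_model(seed=1), sample_inputs)
shares = category_shares(["north", "north", "south", "south"])
```

A score of 1.0 means the model gave the same answer every time for every input. What counts as an acceptable score, or an acceptable spread of categories, is a governance decision rather than something the code can settle.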

 

Data lineage and model governance

 

AI and MLM black boxes are only going to get more complex, producing outputs that will be hard to reproduce through other means. Consider data lineage - instead of just tracking the flow of data and its transformation, from source to consumption, we'll need to promote transparency by tracking and documenting model iterations and their underlying assumptions, parameters, and limitations.
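
To make that concrete, here is one possible shape for a model lineage record, sketched in Python. The class name, field names, and example values are all hypothetical; the only point is that each model iteration carries its data sources, parameters, assumptions, and limitations with it, plus a fingerprint so that any change is detectable.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field, replace


@dataclass(frozen=True)
class ModelLineageRecord:
    """One documented iteration of a model (hypothetical structure)."""
    model_name: str
    version: str
    data_sources: tuple
    parameters: dict = field(default_factory=dict)
    assumptions: tuple = ()
    limitations: tuple = ()

    def fingerprint(self) -> str:
        """SHA-256 hash of the record's contents, stable across key order."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Illustrative values only.
v1 = ModelLineageRecord(
    model_name="capital_model",
    version="1.0",
    data_sources=("policy_admin", "claims"),
    parameters={"simulations": 10000},
    assumptions=("claims history is representative of future claims",),
    limitations=("not validated for new product lines",),
)

# A new iteration: same model, changed parameter, new version label.
v2 = replace(v1, version="1.1", parameters={"simulations": 50000})

changed = v1.fingerprint() != v2.fingerprint()
```

Storing the fingerprint alongside each set of outputs would let you trace an output back to the exact model iteration that produced it, in the same way traditional lineage traces a report back to its source tables.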

When I think back to my start in Data Governance, in insurance and particularly with Capital Models for Solvency II, there was a clear demarcation between the governance of data (my bit) and the governance of the model (the actuaries).

Now it feels like model governance is going to have to move to the data team as well. The relationship of the data team to those building and operating those models will be analogous to current data ownership models.

 

Ethics and Fairness

 

AI carries the potential to perpetuate and amplify existing biases. An effective data governance model should include guidelines to promote fairness, reduce bias, and ensure ethical use of AI. This could involve methods for bias detection and mitigation, and guidelines around the ethical use of data.
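
As one hedged example of what bias detection can look like in practice, the Python sketch below compares the rate of positive outcomes across groups, a measure often called the demographic parity difference. It is only one of several fairness measures and not one this article prescribes; the function names and sample data are made up for illustration.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def positive_rates(groups: Sequence[Hashable],
                   outcomes: Sequence[int]) -> dict:
    """Share of positive (truthy) outcomes within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, outcome in zip(groups, outcomes):
        totals[group] += 1
        if outcome:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def parity_gap(groups: Sequence[Hashable],
               outcomes: Sequence[int]) -> float:
    """Largest difference in positive rate between any two groups."""
    rates = positive_rates(groups, outcomes)
    if not rates:
        return 0.0
    return max(rates.values()) - min(rates.values())


# Made-up example: model decisions (1 = approved) for two groups.
sample_groups = ["a", "a", "b", "b"]
sample_outcomes = [1, 1, 1, 0]

rates = positive_rates(sample_groups, sample_outcomes)
gap = parity_gap(sample_groups, sample_outcomes)
```

Here group "a" is approved every time and group "b" half the time, so the gap is 0.5. A gap on its own does not prove unfairness, but it is the kind of signal a governance process could require someone to investigate and explain.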

 

I think this is an entirely new (and fascinating) area for data governance to get involved in. I don't see what other business function could reasonably take this on, other than a specific ethics and fairness function.

 

A personal insight - I spent several years in the "Independence and Ethics" team of a Big 4 firm. That team considered the ethics of what work was done and for whom, and I can see how that logically extends to what work is done and how.

 

Skills and Literacy

 

Successful data governance in the AI epoch requires a blend of data science, privacy law, ethical guidelines, and industry-specific knowledge.

 

Few individuals will have this mix of skills, so organisations need to invest in upskilling or reskilling their workforce and build collaborative teams that can handle the complexities of data governance in an AI world.

 

It will take time for school curriculums to adapt - universities will likely transition somewhat faster - but I think that businesses will need to handle a lot of this themselves.

 

Conclusion

 

As organisations transition to an AI-powered future, reshaping our data governance models will not be a luxury but a necessity. The journey will call for open dialogues, regulatory insight, technological prowess, and organisational change management.

Through a systematic approach, we can ensure our data governance models evolve in tandem with AI and machine learning, turning the challenges of this new epoch into opportunities for innovation and progress.

 

The AI revolution isn't on the horizon; it's already on our doorstep. The question is, how do we get ready to govern it effectively?

Before you go, I have a favour to ask of you. Can you think of two people who might find this article interesting or relevant? If you can, then please forward this email or share the link with them. It’s through your referrals that I can grow this newsletter and reach more people.

Have a wonderful week,
Charles

Don’t forget that there is a tagged, searchable website of these newsletters at datazed.beehiiv.com, where you can also subscribe to get them sent straight to your inbox.
