4 new elements of Data Governance for Gen AI

As well as, not instead of...

Generative AI is not just a tool; it's the spark that ignites creativity, the engine that drives innovation, and the bridge that connects imagination to reality.

ChatGPT

In our Generative AI world, where adoption is rocketing and a legal framework is still taking shape, people are scrambling to identify, assess and manage the risks so that Gen AI can be used effectively and safely.

Generative AI and LLMs are built on data, and ensuring that the data is reliable and trustworthy is a major hurdle to effective Gen AI use.

Fortunately, many of the concepts that we need to manage this data have already been developed through Data Governance.

Fundamental Aspects of Data Governance in AI

  1. Data Community: People remain the most important component of Data Governance. We need to know who is accountable and responsible for data, and who the subject matter experts are.

  2. Data Lineage: Where did it come from, and how did it get here? We might no longer be able to draw the flow on a single sheet of paper, but it is more critical than ever that we can track data from source to consumption.

  3. Data Definitions: What do we mean when we use a particular term? We often assume a definition is obvious, but rarely does a whole team define a term the same way.

  4. Data Issue Resolution: You’ve found an issue or error. Now what? You need a robust process to assess issues and respond to them accordingly. You’ll also need to consider any Gen AI outputs that were derived from the flawed data.

  5. Regulatory and Legal Standards: Data governance helps ensure that AI initiatives comply with legal and regulatory requirements, such as GDPR and BCBS 239.
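Lineage and issue resolution work together: once you know where data flows, you can find everything a flawed dataset has touched. Here is a minimal sketch, assuming lineage is recorded as a simple directed graph of hypothetical asset names (real lineage tools hold far richer metadata). It walks downstream from a dataset with a known issue and lists every asset, including Gen AI outputs, that may need review.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets built directly from it.
LINEAGE: dict[str, list[str]] = {
    "crm_extract": ["customer_table"],
    "customer_table": ["rag_index", "churn_report"],
    "rag_index": ["chatbot_answer_0412"],
    "churn_report": [],
    "chatbot_answer_0412": [],
}

def downstream_of(asset: str, lineage: dict[str, list[str]]) -> list[str]:
    """Return every asset derived directly or indirectly from `asset`, breadth-first."""
    seen: set[str] = {asset}
    order: list[str] = []
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:  # guard against revisiting shared descendants
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order

# An issue is found in the CRM extract: which downstream assets need review?
print(downstream_of("crm_extract", LINEAGE))
# ['customer_table', 'rag_index', 'churn_report', 'chatbot_answer_0412']
```

The breadth-first walk lists the closest dependants first, which is usually the order in which you would want to investigate them.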

These components lead us to Data Quality. High-quality, trusted data is essential for training accurate and reliable AI models.

However, we are working with larger volumes of data than ever before, and joining them together in ways that were previously inconceivable, so we need to grow our data governance toolkit to meet the specific demands of Gen AI.

Additional Aspects for Data Governance in AI

  1. Transparency and Accountability: AI processes need to be transparent, making it clear how decisions are made and who is responsible for them. Who developed the AI routine, who operated it, and what machine executed it?

  2. Model Cataloguing: Different LLMs exist for different purposes, and each one will have new versions released over time. We’ll need to know which LLM, and which version of it, was used to create a particular output.

  3. Ethical Considerations: Whether the requirements come from laws, regulations or other ethical guidelines, we’ll need to ensure that model outputs align with principles of fairness and non-discrimination.

  4. Availability: Traditionally, few business use cases have required real-time, 24/7 data. Stakeholder expectations of Gen AI put additional pressure not just on the availability of data, but on its timeliness.
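Transparency and model cataloguing both come down to recording the right facts alongside every output. A minimal sketch, assuming a hypothetical `OutputProvenance` record (the field names, model name and team names are illustrative, not from any particular product), might capture who developed and operated the routine, which machine ran it, and which model and version produced the result:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class OutputProvenance:
    """Hypothetical, immutable provenance record attached to each Gen AI output."""
    output_id: str
    model_name: str       # which LLM produced the output
    model_version: str    # which version of that LLM
    developed_by: str     # who built the AI routine
    operated_by: str      # who ran it
    executed_on: str      # which machine or environment executed it
    source_datasets: tuple[str, ...] = ()
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def outputs_from_model(records: list[OutputProvenance],
                       model_name: str, model_version: str) -> list[str]:
    """Return the IDs of outputs produced by one specific model version."""
    return [r.output_id for r in records
            if r.model_name == model_name and r.model_version == model_version]

records = [
    OutputProvenance("out-001", "example-llm", "1.0", "data-science-team",
                     "j.smith", "gpu-node-07", ("customer_table",)),
    OutputProvenance("out-002", "example-llm", "1.1", "data-science-team",
                     "a.jones", "gpu-node-02", ("policy_docs",)),
]
print(outputs_from_model(records, "example-llm", "1.0"))  # ['out-001']
```

Making the record immutable (`frozen=True`) reflects the idea that provenance should be written once at creation time, not edited afterwards. If a model version is later found to be faulty, a query like `outputs_from_model` shows which outputs need to be revisited.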

Are you ready?

There is much to do, and few of us are anywhere near where we need to be. It’s a journey, and the most important thing is that we are continually improving.

dataZED is available to help you on that journey. I really want to see people succeed in data, and it would be my privilege to be able to help you do so.

Have a wonderful week,
Charles
