What steps are you taking to ensure that your data is "AI-ready" and can be effectively utilized by generative AI algorithms?
I'm looking into creating feature stores (with the data stored in Parquet format) and leveraging TensorFlow Extended (TFX) pipelines.
We've been focusing on a data sharing strategy for some time, and it turns out that this strategy does deliver AI-ready data. For us, it is more a question of focus and prioritization: deciding which data is in scope of the strategy now.
The data strategy is threefold:
- A federated operating model for the governance (value and lifecycle) and compliance (security and regulatory) of the data, ensuring that ownership and data contracts are defined for shareable data.
- A FAIR-ification strategy to provide trustworthy, fit-for-purpose, quality data through a proper metadata strategy. This includes a single-source-of-truth catalog, a centrally governed base metadata model, and built-in observability that produces data fitness metrics, which act as a trust meter. Alongside that, we focus on building ontologies, terminologies, and reference and master data to ensure consistency and quality in how the data is described. Finally, we have an automation strategy that embeds policies computationally at the data platform level, in order to build in governance and at the same time decrease the cost of data, because AI needs a lot of it!
- A compute-at-the-data strategy. You can't get results quickly if you are bandwidth-limited in feeding data to your model, so you need to provide your AI compute at scale, on demand, and at the data.
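To make the data-contract and "trust meter" ideas a bit more concrete, here is a minimal Python sketch. The `DataContract` class, the `fitness_metrics` and `trust_meter` functions, and the two checks (completeness and type conformance) are hypothetical illustrations, not part of any specific catalog or observability product; a real platform would run many more checks and publish the results alongside the catalog entry.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DataContract:
    """Hypothetical data contract: what a data owner commits to for a shareable dataset."""
    dataset: str
    owner: str
    schema: dict[str, type]  # column name -> expected Python type
    required: set[str] = field(default_factory=set)  # columns that must not be None


def fitness_metrics(rows: list[dict[str, Any]], contract: DataContract) -> dict[str, float]:
    """Return the share of rows (0.0 to 1.0) passing each illustrative check."""
    total = len(rows)
    if total == 0:
        return {"completeness": 0.0, "conformance": 0.0}
    # Completeness: every required column is present and non-null.
    complete = sum(
        all(row.get(col) is not None for col in contract.required) for row in rows
    )
    # Conformance: every contracted column exists and is either null or of the expected type.
    conformant = sum(
        all(
            col in row and (row[col] is None or isinstance(row[col], typ))
            for col, typ in contract.schema.items()
        )
        for row in rows
    )
    return {"completeness": complete / total, "conformance": conformant / total}


def trust_meter(metrics: dict[str, float]) -> float:
    """Collapse the metrics into one score; here simply the weakest check."""
    return min(metrics.values())


# Illustrative contract and sample rows (hypothetical data).
contract = DataContract(
    dataset="customers",
    owner="sales-domain-team",
    schema={"customer_id": int, "country": str},
    required={"customer_id"},
)
rows = [
    {"customer_id": 1, "country": "FR"},
    {"customer_id": None, "country": "DE"},
    {"customer_id": 3, "country": 49},
    {"customer_id": 4, "country": 7},
]
metrics = fitness_metrics(rows, contract)
print(metrics, trust_meter(metrics))
```

Using the minimum as the trust score is one simple design choice (a dataset is only as trustworthy as its weakest check); a weighted average across checks would be an equally reasonable alternative.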
We put an Enterprise Data Strategy in place in 2019 which standardizes the governance and technology stack for all of our enterprise data. By making progress in ensuring that our most critical data is governed by the business, and enabled through a consistent technology stack including data quality checks, we can very quickly and easily derive new data sets that can be leveraged in GenAI solutions. Our bigger concern is how we create the new vector data sets for GenAI efficiently. GenAI gives us a lot of opportunity to generate significant waste.
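One common way to limit that waste, sketched here under stated assumptions, is to key embeddings by a hash of the normalized chunk text so that duplicated or unchanged content is never embedded twice. `fake_embed` is a hypothetical stand-in for a real embedding model, and the blank-line chunking and whitespace-only normalization are illustrative choices, not a recommended configuration.

```python
import hashlib
from typing import Callable

Vector = list[float]


def fake_embed(text: str) -> Vector:
    """Hypothetical stand-in for a costly embedding model: a tiny hash-based vector."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in digest[:4]]


class EmbeddingCache:
    """Reuse vectors for chunks whose normalized text was already embedded."""

    def __init__(self, embed: Callable[[str], Vector]):
        self._embed = embed
        self._store: dict[str, Vector] = {}
        self.calls = 0  # how many times the embedding function actually ran

    @staticmethod
    def key(chunk: str) -> str:
        # Assumption: collapsing whitespace is enough to treat two chunks as identical.
        normalized = " ".join(chunk.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, chunk: str) -> Vector:
        k = self.key(chunk)
        if k not in self._store:
            self._store[k] = self._embed(chunk)
            self.calls += 1
        return self._store[k]


def build_vector_dataset(docs: dict[str, str], cache: EmbeddingCache) -> list[tuple[str, int, Vector]]:
    """Return (doc_id, chunk_index, vector) rows, splitting each document on blank lines."""
    rows = []
    for doc_id, text in docs.items():
        chunks = [c.strip() for c in text.split("\n\n") if c.strip()]
        for i, chunk in enumerate(chunks):
            rows.append((doc_id, i, cache.get(chunk)))
    return rows


docs = {
    "policy-v1": "Data owners approve access.\n\nRetention is seven years.",
    "policy-v2": "Data owners approve access.\n\nRetention is  seven years.\n\nPII is masked.",
}
cache = EmbeddingCache(fake_embed)
rows = build_vector_dataset(docs, cache)
print(len(rows), cache.calls)
```

In this example the second document version repeats two chunks (one differing only in whitespace), so only three of the five chunks trigger an embedding call.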