Let’s share some genAI learnings. What challenges or considerations have you encountered when integrating generative AI into your data management processes?
Generative AI-based systems are built on, and optimized by, unstructured text data. When integrating GenAI-based systems through copilot functionality, there are significant barriers to leveraging data held in more structured sources as insights for GenAI copilots. Creating complex RAG patterns that use Graph to build the context GenAI needs can help, but it doesn't go nearly far enough to unlock the full depth of insights buried in mountains of structured data.
In our exploration of generative AI for marketing and customer service, we faced two key challenges: data privacy and data quality.
First, data privacy is a critical concern. We need to be extremely cautious about what data is exposed to the AI models, particularly sensitive customer information. During training, we mitigate risks by using synthetic data, but when preparing for production, privacy becomes even more important. Extensive testing is required to ensure real data is handled securely and in compliance with regulations.
Second, data quality proved to be equally challenging. The effectiveness of AI-generated responses depends on the quality of the data the model is trained on. Inconsistent or incomplete data led to inaccurate outputs. We had to invest heavily in data cleansing and enrichment to ensure that the AI produced relevant and valuable results for customers.
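On the privacy point above, one way to limit what reaches a model is to mask obvious identifiers before any text leaves your systems. The snippet below is only a rough sketch of that idea, not the approach described in this answer: the `redact` function, the placeholder tokens, and both regular expressions are illustrative assumptions, and patterns this simple will miss many kinds of sensitive data.

```python
import re

# Hypothetical patterns for illustration only; real PII detection needs
# far broader coverage (names, addresses, account numbers, and so on).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


if __name__ == "__main__":
    print(redact("Contact jane.doe@example.com or +1 (555) 123-4567."))
```

A step like this would sit in front of the model call, and it complements rather than replaces the testing and compliance work mentioned above.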
GenAI has, in a way, exacerbated an issue that has been around for years: the lack of meaningful, trustworthy metadata, ontologies, and quality standards to ensure findability and reusability of data. We've adopted a data mesh approach to address this and to be able to publish FAIR data products internally. A key focus is defining a meaningful data contract for each product that covers security, quality, completeness, freshness, etc., along with the observability needed to report on how well the contract is being fulfilled. In addition, genAI needs data at scale, so all of this needs to be self-service and automated. It's quite a journey, but it's worth the effort because the benefits extend well beyond genAI.
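To make the data contract idea a bit more concrete, here is a minimal sketch of a contract that checks completeness and freshness and returns a small report that could feed observability. Everything in it is an assumption for illustration: the `DataContract` fields, the `check_contract` function, and the thresholds are hypothetical, not the contract format described in this answer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class DataContract:
    """Hypothetical contract: which fields must be filled, and how fresh."""
    required_fields: list[str]
    min_completeness: float  # share of rows with every required field set
    max_age: timedelta       # maximum allowed time since the last update


def check_contract(rows, last_updated, contract, now=None):
    """Return a report on how well a data product fulfils its contract."""
    now = now or datetime.now(timezone.utc)
    # A row counts as complete when no required field is missing or empty.
    complete = sum(
        all(row.get(f) not in (None, "") for f in contract.required_fields)
        for row in rows
    )
    completeness = complete / len(rows) if rows else 0.0
    age = now - last_updated
    return {
        "completeness": completeness,
        "completeness_ok": completeness >= contract.min_completeness,
        "age_hours": age.total_seconds() / 3600,
        "freshness_ok": age <= contract.max_age,
    }
```

Running a check like this automatically on every publish is one way to get the self-service reporting mentioned above. Security and the other contract dimensions would need their own checks.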