New paper 👉 Elevating Data Quality A Paradigm Shift for Data Spaces and AI Needs

29/05/24

BDVA is proud to present its latest paper, “Elevating Data Quality A Paradigm Shift for Data Spaces and AI Needs”.

Elevating Data Quality A Paradigm Shift for Data Spaces and AI Needs_May 2024

In general, data quality refers to the characteristics of data in relation to specific standards and criteria, reflecting how accurately it represents the reality it aims to capture. Data quality impacts most stages of the data lifecycle and is a concern for various stakeholders across different domains within the data value chain. The value derived from data is significantly influenced by its quality, making it a widely discussed topic in the literature from multiple perspectives. Different approaches to data quality exist, each with its own dimensions and metrics.

With the increasing importance of data sharing within the data value chain, quality is crucial for the interoperability, reliability and usability of shared information. The expansion of data sharing into data spaces (collaborative environments where multiple participants access and share data) demands higher levels of quality and trust to ensure successful adoption and operation. In this context, data quality ensures that shared data is reliable, trustworthy and fit for collaborative purposes. Data spaces provide the necessary tools, mechanisms and governance frameworks to assess and maintain data quality.

In this complex ecosystem, data providers and users are often several steps apart or even unknown to each other, making the fitness of data for its intended purpose increasingly important. The rapid growth of data usage in AI applications, driven by advancements in AI technologies, underscores the significance of data quality for AI, as highlighted in the AI Act. Ensuring that data meets specific quality requirements for AI applications is critical as AI becomes more integrated into various aspects of society and industry.

Based on the above, the objectives of this document are:

  • Present data quality from its different perspectives and gather the information and material needed to pave the way for the subsequent discussions.
  • Explore the symbiotic relationship between data quality and data spaces, highlighting the importance of incorporating data quality in all aspects of data spaces (quality-by-design), while also presenting data spaces as a unique environment for ensuring data quality when sharing data in a scalable way.
  • Drive the fit-for-purpose paradigm to focus on data quality for AI, which introduces AI requirements that should be addressed by specific metrics and processes, in line with the AI Act.
  • Provide some recommendations and paths to follow in order to fully achieve those goals and shift data quality to a new dimension.

This document is intended for all stakeholders involved in the data lifecycle, including data providers, users and those focused on value generation through use cases across different industry sectors and domains. It also addresses data space designers and builders, emphasising the need to incorporate data quality by design and enhance the role of data spaces in maintaining and improving data quality. Additionally, it is relevant for AI practitioners who require high-quality data for training, validating and testing models.