Safe and Ethical AI (SEA) Platform Network · Linking Artificial Intelligence Principles (LAIP)

Data Quality Principle

The Data Quality Principle follows from the preceding obligation.

Principle: Universal Guidelines for AI, Oct 2018

Published by Center for AI and Digital Policy

Related Principles

· Prepare Input Data:

1 Following the best practice of responsible data acquisition, handling, classification, and management must be a priority to ensure that results and outcomes align with the AI system’s set goals and objectives. Effective data quality soundness and procurement begin by ensuring the integrity of the data source and data accuracy in representing all observations, to avoid systematically disadvantaging underrepresented groups or advantaging overrepresented groups. The quantity and quality of the data sets should be sufficient and accurate to serve the purpose of the system. The sample size of the data collected or procured has a significant impact on the accuracy and fairness of the outputs of a trained model.

2 Sensitive personal data attributes, which are defined in the plan and design phase, should not be included in the model data, so as not to feed the existing bias against them. Also, the proxies of the sensitive features should be analyzed and not included in the input data. In some cases, this may not be possible due to the accuracy or objective of the AI system. In this case, the justification of the usage of the sensitive personal data attributes or their proxies should be provided.

3 Causality-based feature selection should be ensured. Selected features should be verified with business owners and non-technical teams.

4 Automated decision support technologies present major risks of bias and unwanted application at the deployment phase, so it is critical to set out mechanisms to prevent harmful and discriminatory results at this phase.

Published by SDAIA in AI Ethics Principles, Sept 14, 2022
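As an illustration of the representation and proxy guidance above, the following is a minimal sketch, not a method prescribed by the source. The function names, the `min_share` and `threshold` parameters, and the use of Pearson correlation as a proxy signal are all assumptions made for this example; a real proxy analysis would likely need more than a linear correlation.

```python
from collections import Counter
from math import sqrt


def group_shares(records, group_key):
    """Return each group's share of the sample (hypothetical helper)."""
    counts = Counter(r[group_key] for r in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def underrepresented(records, group_key, min_share):
    """List groups whose share falls below min_share (an assumed threshold)."""
    shares = group_shares(records, group_key)
    return sorted(g for g, share in shares.items() if share < min_share)


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def proxy_candidates(records, sensitive_key, feature_keys, threshold):
    """Flag numeric features whose correlation with a numerically encoded
    sensitive attribute reaches the threshold (assumption: linear signal)."""
    sensitive = [float(r[sensitive_key]) for r in records]
    flagged = []
    for key in feature_keys:
        values = [float(r[key]) for r in records]
        if abs(pearson(values, sensitive)) >= threshold:
            flagged.append(key)
    return flagged
```

Flagged features are candidates for review, not automatic exclusions: the text above allows sensitive attributes or their proxies to remain when a justification is provided.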

· Prepare Input Data:

1 Adequate steps and actions should be taken to measure the data sample’s quality, accuracy, suitability, and credibility when dealing with the data sets of an AI model. This is essential to ensure the accuracy of data interpretation by the AI system, consistency in avoiding misleading measurements, and the relevance of the AI system’s outcomes to the purpose of the model.

2 It is crucial for the build and validate step to test how the system behaves under outlier events, extreme parameters, etc. In this step, stress test data should be prepared for extreme scenarios.

Published by SDAIA in AI Ethics Principles, Sept 14, 2022
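One way to prepare stress test data for extreme scenarios is sketched below, under stated assumptions. The helper names, the `margin` parameter, and the idea of treating the model as a plain callable with an expected output range are hypothetical and do not come from the source.

```python
import math
from itertools import product


def extreme_values(values, margin=0.5):
    """Return observed min and max plus points beyond each end of the range.
    `margin` is an assumed fraction of the observed span."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [lo - margin * span, lo, hi, hi + margin * span]


def stress_cases(columns, margin=0.5):
    """Build every combination of per-feature extreme values."""
    names = sorted(columns)
    grids = [extreme_values(columns[name], margin) for name in names]
    return [dict(zip(names, combo)) for combo in product(*grids)]


def failing_cases(model, cases, lower, upper):
    """Collect cases where the model output is non-finite or out of bounds."""
    bad = []
    for case in cases:
        out = model(case)
        if not math.isfinite(out) or not (lower <= out <= upper):
            bad.append(case)
    return bad
```

The full grid grows exponentially with the number of features, so for wider data sets a sampled subset of combinations may be more practical.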

· Prepare Input Data:

1 An important aspect of the Accountability and Responsibility principle during the Prepare Input Data step in the AI System Lifecycle is data quality, as it affects the outcome of the AI model and the resulting decisions. It is, therefore, important to do necessary data quality checks, clean data, and ensure the integrity of the data in order to get accurate results and capture intended behavior in supervised and unsupervised models.

2 Data sets should be approved and signed off before commencing with developing the AI model. Furthermore, the data should be cleansed from societal biases. In parallel with the fairness principle, the sensitive features should not be included in the model data. In the event that sensitive features need to be included, the rationale or trade-off behind the decision for such inclusion should be clearly explained. The data preparation process and data quality checks should be documented and validated by responsible parties.

3 The documentation of the process is necessary for auditing and risk mitigation. Data must be properly acquired, classified, processed, and accessible to ease human intervention and control at later stages when needed.

Published by SDAIA in AI Ethics Principles, Sept 14, 2022
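The quality checks, documentation, and sign-off described above could be recorded roughly as follows. This is a sketch only: the report fields, the `required_keys` parameter, and the rule that sign-off is refused when a check fails are assumptions for illustration, not requirements stated in the source.

```python
def quality_report(records, required_keys):
    """Summarize missing values and exact duplicate rows.
    Assumes each record is a dict with hashable values."""
    missing = {
        key: sum(1 for r in records if r.get(key) is None)
        for key in required_keys
    }
    seen = set()
    duplicates = 0
    for r in records:
        fingerprint = tuple(sorted(r.items()))
        if fingerprint in seen:
            duplicates += 1
        seen.add(fingerprint)
    passed = duplicates == 0 and not any(missing.values())
    return {
        "rows": len(records),
        "missing": missing,
        "duplicates": duplicates,
        "passed": passed,
    }


def sign_off(report, approver):
    """Attach an approver to a passing report; refuse otherwise."""
    if not report["passed"]:
        raise ValueError("data set failed quality checks; cannot sign off")
    return {**report, "approved_by": approver}
```

Storing the returned dictionary alongside the data set gives auditors a record of what was checked and who approved it.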

4. Privacy and security by design

AI systems are fuelled by data, and Telefónica is committed to respecting people’s right to privacy and their personal data. The data used in AI systems can be personal or anonymous/aggregated. When processing personal data, according to Telefónica’s privacy policy, we will at all times comply with the principles of lawfulness, fairness and transparency, data minimisation, accuracy, storage limitation, integrity and confidentiality. When using anonymized and/or aggregated data, we will use the principles set out in this document. In order to ensure compliance with our Privacy Policy we use a Privacy by Design methodology. When building AI systems, as with other systems, we follow Telefónica’s Security by Design approach. We apply, according to Telefónica’s privacy policy, in all of the processing cycle phases, the technical and organizational measures required to guarantee a level of security adequate to the risk to which the personal information may be exposed and, in any case, in accordance with the security measures established in the law in force in each of the countries and/or regions in which we operate.

Published by Telefónica in AI Principles of Telefónica, Oct 30, 2018

7. Data Quality Obligation.

Institutions must establish data provenance, and assure quality and relevance for the data input into algorithms. [Explanatory Memorandum] The Data Quality Principle follows from the preceding obligation.

Published by The Public Voice coalition, established by Electronic Privacy Information Center (EPIC) in Universal Guidelines for Artificial Intelligence, Oct 23, 2018