2. Data Governance

The quality of the data sets used is paramount for the performance of the trained machine learning solutions. Even if the data is handled in a privacy-preserving way, there are requirements that have to be fulfilled in order to have high-quality AI. The data sets gathered inevitably contain biases, and one has to be able to prune these away before engaging in training. This may also be done in the training itself by requiring symmetric behaviour over known issues in the training set. In addition, the division of the data into training, validation and test sets must be carefully conducted in order to achieve a realistic picture of the performance of the AI system. In particular, anonymisation of the data must be done in a way that still allows the data to be divided so that certain data (for instance, images of the same person) do not end up in both the training and test sets, as this would disqualify the latter. The integrity of the data gathering has to be ensured. Feeding malicious data into the system may change the behaviour of the AI solutions. This is especially important for self-learning systems. It is therefore advisable to always keep a record of the data that is fed to the AI systems. When data is gathered from human behaviour, it may contain misjudgements, errors and mistakes. In large enough data sets these will be diluted, since correct actions usually outnumber the errors, yet a trace of them remains in the data. To trust the data gathering process, it must be ensured that such data will not be used against the individuals who provided the data. Instead, the findings of bias should be used to look forward and lead to better processes and instructions, improving our decision making and strengthening our institutions.
Principle: Draft Ethics Guidelines for Trustworthy AI, Dec 18, 2018

Published by The European Commission’s High-Level Expert Group on Artificial Intelligence
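The requirement above, that data from the same person must not land in both the training and test sets even after anonymisation, can be illustrated with a short sketch. It is not part of the guidelines. It assumes each record keeps a pseudonymous `subject_key` (a hypothetical field name) that survives anonymisation, and it assigns whole subjects to a split by hashing that key; the function names and the 20% test fraction are likewise illustrative.

```python
import hashlib


def assign_split(subject_key: str, test_fraction: float = 0.2) -> str:
    """Deterministically map a pseudonymous subject key to 'train' or 'test'."""
    digest = hashlib.sha256(subject_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"


def split_by_subject(records, test_fraction: float = 0.2):
    """Split records so that all records of one subject share the same set."""
    train, test = [], []
    for record in records:
        if assign_split(record["subject_key"], test_fraction) == "test":
            test.append(record)
        else:
            train.append(record)
    return train, test


# Hypothetical records: several images per pseudonymous subject.
records = [
    {"subject_key": f"subject-{i}", "image": f"img_{i}_{j}.png"}
    for i in range(10)
    for j in range(3)
]
train, test = split_by_subject(records)

# No subject appears on both sides of the split.
train_subjects = {r["subject_key"] for r in train}
test_subjects = {r["subject_key"] for r in test}
assert train_subjects.isdisjoint(test_subjects)
```

Because the split depends only on the key, the resulting test fraction is approximate rather than exact, and a subject stays in the same set when new data arrives. Libraries such as scikit-learn offer comparable group-aware splitters (for example `GroupShuffleSplit`).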

Related Principles

III. Privacy and Data Governance

Privacy and data protection must be guaranteed at all stages of the AI system’s life cycle. Digital records of human behaviour may allow AI systems to infer not only individuals’ preferences, age and gender but also their sexual orientation, religious or political views. To allow individuals to trust the data processing, it must be ensured that they have full control over their own data, and that data concerning them will not be used to harm or discriminate against them. In addition to safeguarding privacy and personal data, requirements must be fulfilled to ensure high-quality AI systems. The quality of the data sets used is paramount to the performance of AI systems. When data is gathered, it may reflect socially constructed biases, or contain inaccuracies, errors and mistakes. This needs to be addressed prior to training an AI system with any given data set. In addition, the integrity of the data must be ensured. Processes and data sets used must be tested and documented at each step, such as planning, training, testing and deployment. This should also apply to AI systems that were not developed in-house but acquired elsewhere. Finally, access to data must be adequately governed and controlled.

Published by European Commission in Key requirements for trustworthy AI, Apr 8, 2019

IV. Transparency

The traceability of AI systems should be ensured; it is important to log and document both the decisions made by the systems and the entire process (including a description of data gathering and labelling, and a description of the algorithm used) that yielded the decisions. Linked to this, explainability of the algorithmic decision-making process, adapted to the persons involved, should be provided to the extent possible. Ongoing research to develop explainability mechanisms should be pursued. In addition, explanations of the degree to which an AI system influences and shapes the organisational decision-making process, of the design choices of the system, and of the rationale for deploying it should be available (hence ensuring not just data and system transparency, but also business model transparency). Finally, it is important to adequately communicate the AI system’s capabilities and limitations to the different stakeholders involved in a manner appropriate to the use case at hand. Moreover, AI systems should be identifiable as such, ensuring that users know they are interacting with an AI system and which persons are responsible for it.

Published by European Commission in Key requirements for trustworthy AI, Apr 8, 2019

3. Artificial intelligence should not be used to diminish the data rights or privacy of individuals, families or communities.

Many of the hopes and the fears presently associated with AI are out of step with reality. The public and policymakers alike have a responsibility to understand the capabilities and limitations of this technology as it becomes an increasing part of our daily lives. This will require an awareness of when and where this technology is being deployed. Access to large quantities of data is one of the factors fuelling the current AI boom. The ways in which data is gathered and accessed need to be reconsidered, so that innovative companies, big and small, have fair and reasonable access to data, while citizens and consumers can also protect their privacy and personal agency in this changing world. Large companies which have control over vast quantities of data must be prevented from becoming overly powerful within this landscape. We call on the Government, with the Competition and Markets Authority, to review proactively the use and potential monopolisation of data by big technology companies operating in the UK.

Published by House of Lords of United Kingdom, Select Committee on Artificial Intelligence in AI Code, Apr 16, 2018

9. Principle of transparency

AI service providers and business users should pay attention to the verifiability of the inputs and outputs of AI systems or AI services and the explainability of their judgments. Note: This principle is not intended to ask for the disclosure of algorithms, source code, or learning data. In interpreting this principle, privacy of individuals and trade secrets of enterprises are also taken into account.

[Main points to discuss]

A) Recording and preserving the inputs and outputs of AI

In order to ensure the verifiability of the inputs and outputs of AI, AI service providers and business users may be expected to record and preserve the inputs and outputs. In light of the characteristics of the technologies to be used and their usage, in what cases and to what extent are the inputs and outputs expected to be recorded and preserved? For example, in the case of using AI in fields where AI systems might harm life, body, or property, such as the field of autonomous driving, the inputs and outputs of AI may be expected to be recorded and preserved to the extent which is necessary for investigating the causes of accidents and preventing the recurrence of such accidents.

B) Ensuring explainability

AI service providers and business users may be expected to ensure explainability of the judgments of AI. In light of the characteristics of the technologies to be used and their usage, in what cases and to what extent is explainability expected to be ensured? Especially in the case of using AI in fields where the judgments of AI might have significant influence on individual rights and interests, such as the fields of medical care, personnel evaluation and recruitment, and financing, explainability of the judgments of AI may be expected to be ensured. (For example, we have to pay attention to the current situation where deep learning has high prediction accuracy, but it is difficult to explain its judgments.)

Published by Ministry of Internal Affairs and Communications (MIC), the Government of Japan in Draft AI Utilization Principles, Jul 17, 2018
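Point A) above, recording and preserving inputs and outputs so that they can be verified later, might look like the following minimal sketch. It is an illustration only, not something the draft principles prescribe: the record layout, the function names `record_decision` and `verify_entry`, and the example values are all assumptions. Each entry stores a SHA-256 digest of its inputs and outputs so that later alteration of a stored record can be detected.

```python
import hashlib
import io
import json
from datetime import datetime, timezone


def _digest(inputs, outputs) -> str:
    """Hash a canonical JSON form of the inputs and outputs."""
    payload = json.dumps({"inputs": inputs, "outputs": outputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def record_decision(log, model_version, inputs, outputs):
    """Append one input/output record to a line-oriented JSON log."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "inputs": inputs,
        "outputs": outputs,
        "digest": _digest(inputs, outputs),
    }
    log.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


def verify_entry(entry) -> bool:
    """Check that the stored inputs and outputs still match their digest."""
    return _digest(entry["inputs"], entry["outputs"]) == entry["digest"]


# Hypothetical usage with an in-memory log; a real system would use durable storage.
log = io.StringIO()
record_decision(log, "demo-0.1", {"speed_kmh": 42.0}, {"action": "brake"})
entries = [json.loads(line) for line in log.getvalue().splitlines()]
assert all(verify_entry(e) for e in entries)
```

How much to record, and for how long, remains the open question the principle raises. Where inputs contain personal data, the note on privacy and trade secrets above suggests storing only what is needed for investigating causes.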

2. Transparent and explainable AI

We will be explicit about the kind of personal and/or non-personal data the AI systems use, as well as about the purpose the data is used for. When people directly interact with an AI system, we will be transparent to the users that this is the case. When AI systems take, or support, decisions, we take the technical and organizational measures required to guarantee a level of understanding adequate to the application area. In any case, if the decisions significantly affect people's lives, we will ensure we understand the logic behind the conclusions. This will also apply when we use third-party technology.

Published by Telefónica in AI Principles of Telefónica, Oct 30, 2018