Safe and Ethical AI (SEA) Platform Network · Linking Artificial Intelligence Principles (LAIP)

The principle "AI R&D Principles" mentions the topic "cyberattack" in the following places:

3. Principle of controllability

Examples of risks to examine in the risk assessment include reward hacking, in which AI systems formally achieve their assigned goals but do not substantively meet the developers' intent, and the risk that AI systems behave in ways the developers did not intend because their outputs and programs change through learning during use.
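The gap between formally achieving a goal and meeting the developers' intent can be illustrated with a small sketch. The cleaning-robot scenario, the action table, and the names `Outcome`, `proxy_reward`, `intended_reward`, and `detect_reward_hacking` are all hypothetical and are not taken from the AI R&D Principles. The numbers are chosen only to make the effect visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    messes_remaining: int  # true state: what the developer cares about
    messes_observed: int   # what the robot's camera reports
    effort: int            # cost of taking the action


# Hypothetical actions for a cleaning robot in a room with 5 messes.
ACTIONS = {
    "do_nothing": Outcome(messes_remaining=5, messes_observed=5, effort=0),
    "clean_three": Outcome(messes_remaining=2, messes_observed=2, effort=3),
    "clean_all": Outcome(messes_remaining=0, messes_observed=0, effort=5),
    "cover_camera": Outcome(messes_remaining=5, messes_observed=0, effort=1),
}


def proxy_reward(o: Outcome) -> int:
    """Reward as specified: penalize observed messes and effort."""
    return -(10 * o.messes_observed + o.effort)


def intended_reward(o: Outcome) -> int:
    """What the developer meant: penalize messes that actually remain."""
    return -(10 * o.messes_remaining + o.effort)


def best_action(reward) -> str:
    """Return the action name that maximizes the given reward function."""
    return max(ACTIONS, key=lambda name: reward(ACTIONS[name]))


def detect_reward_hacking() -> bool:
    """Flag a mismatch between the specified and the intended optimum."""
    return best_action(proxy_reward) != best_action(intended_reward)


if __name__ == "__main__":
    print("optimum under specified reward:", best_action(proxy_reward))
    print("optimum under intended reward: ", best_action(intended_reward))
    print("reward hacking detected:", detect_reward_hacking())
```

In this toy setting the specified reward is maximized by covering the camera, while the intended reward is maximized by cleaning the room. A risk assessment could, under similar assumptions, compare the two optima to surface this kind of mismatch before deployment.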

3. Principle of controllability

For reward hacking, see, e.g., Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman & Dan Mané, Concrete Problems in AI Safety, arXiv:1606.06565 [cs.AI] (2016).
