(in)Complete introduction to AI Safety
2023-08-17, Aula

AI is poised to be "Our final invention," either the key to a never-ending utopia or a direct road to dystopia (or apocalypse). Even without the eschatological framing, it's still a revolutionary technology increasingly embedded in every aspect of our lives, from smartphones to smart cities, from autonomous agents to autonomous weapons. In the face of acceleration, there can be no delay: if we want AI to shape a better tomorrow, we must discuss safety today.


AI is our generation's most important technological breakthrough; beyond all the discussion about hype and Doom lie serious safety, technical, and ethical considerations. In the face of accelerating AI capabilities, if we want to create a better world, we must also accelerate our safety efforts, exploring ethics and biases, deep learning failures and their alignment implications, and AI and computing policies.

This talk aims to provide a brief introduction to, and overview of, the principal axes of AI safety (Ethics, Alignment, and Policies). Hopefully, it will work as a gateway for further forays into these crucial and intertwined areas; references and conceptual maps will accompany the talk's slides.

The talk will be roughly structured like this (sections and subsections may vary, but a higher focus will be put on Alignment):

  • Risks: From misuse to Doom
  • Alignment:
      • Black Boxes: Interpretability or the lack thereof
      • Mesa-Optimizers and Reward Hacking
      • Of RLHF and Waluigis: on the brittleness of LLMs
  • Policies: Can we Regulate A(G)I?
  • Ethics: breaking the vicious cycle of unfair models, datasets, and societies.

Abstract as a tweet:

Making safe, ethical, aligned AIs is the only way to avoid dystopia (or Doom) and build a (hopefully) everlasting utopia. In the face of accelerating capabilities, we need to discuss safety now.

Category [Machine and Deep Learning]:

Algorithmic bias and Trustworthy AI

Expected audience expertise: Domain:

none

Expected audience expertise: Python:

none

"Ever Tried. Ever Failed. Try Again. Fail Again. Fail Better. (Beckett)"

I work full-time as a Senior Machine Learning Scientist (handling many Data Engineering tasks as well) with a focus on ML for medical imaging at Align Tech.

Current tech hobbies: working with LLMs, worrying about AI risks (focusing on x-risks and Alignment, but it's not looking too good even for more "mundane" threats), and contributing to AI Safety.

I was active as a Python & Data Science/Machine Learning teacher and speaker for local and European meetups and conferences, but that ground to a halt due to the plague. I plan to resume in 2023, as I love traveling and teaching!

I can usually be found next to some source of caffeine (be it a chawan of Matcha or a cup of V60), or at bookstores & libraries, cooking classes, tabletop RPGs, and Python/ML/Data meetups.