2024-08-06 –, Firenze
According to the World Economics Forum annual report “Approximately half of executives say that advances in adversarial capabilities (phishing, malware, deep fakes) present the most concerning impact of generative AI on cyber”. It is already a fact that the world is already entering, if not inside, the AI bubble and facing this reality as soon as possible will help companies be better prepared for the future. However, with the velocity required to implement AI and surf into this new technology the risks involved may be put behind to give place to velocity. Based on this scenario this talk is designed to explore the adversarial attacks applied to ML systems and present the results of research made observing cybersecurity communities focused on sharing AI Jailbreaks and how those behave when applied to the most used AIs in the market.
Growing up listening to the power of AI and having in mind that there is no fully secure system in the world I was always triggered by the question: “But, what if someone breaks it?”. During college, when I had an opportunity to do a winter camp in San Francisco to visit some companies, including Tesla, this question gained a new view during a talk with one of the major AI engineers of the company. And if someone hacked the Tesla AI system?
Those questions are a constant in my mind. It was always clear to me that even being fascinated with the power of AI technology, the impact of someone hacking it could be immensurable to society. How could we trust a self-driving car without ensuring its AI algorithm safety?
More recently, with the popularization of AI and a better understanding of how it works, those ideas turned into personal research about how it would be possible to use the hacker mentality to confuse the system that everyone knows is almost superior to human intelligence. Studying counter-adversary ML attacks and looking for Jailbreaks I could find forums focused on that and better understand easy ways to trick some AIs.
This talk was designed to explore those techniques and how those forums work to share the most recent Jailbreakers discovered on the most used AIs in the Market, consequently making products vulnerable to explorations.
During the talk, I plan to explain the basis of counter-adversary ML attacks, go over how the JailBreaks works, and demonstrate in videos some attack examples.
Tools:
– Open AI and Google Bard AIs for making the video demonstrations
– Blog/Articles
– Book: Not with a Bug, But with a Sticker: Attacks on Machine Learning Systems and What To Do About Them - Ram Shankar Siva Kumar, Hyrum Anderson
Outline:
I intend to cover the following in the talk:
Intro – 5 Minutes
– Who we are
– How we got here
What is AI – 5 Minutes
– Basic explanation of AI
– The difference between AI and ML
Counter-Adversary ML attacks – 5 Minutes
– Explaining the type of attacks
– Presenting famous examples
– Demonstration of a counter-attack
Jailbreaks – 5 Minutes
– The basis of how Jailbreaks works
– The research performed on the cybersecurity forums
– Results of the discovered Jailbreaks on most used AI tools and in products developed using AI
– Video demonstrations
Review/Close/Thank You – 5 Minutes
– Conclusions and how to prevent attacks
– Where people can find more information
– Thanks/Kudos to previous researchers
– Questions?
Special Requirements:
– Internet for live demos
– Ability to project slides
According to the World Economics Forum annual report “Approximately half of executives say that advances in adversarial capabilities (phishing, malware, deep fakes) present the most concerning impact of generative AI on cyber”. It is already a fact that the world is already entering, if not inside, the AI bubble and facing this reality as soon as possible will help companies be better prepared for the future. However, with the velocity required to implement AI and surf into this new technology the risks involved may be put behind to give place to velocity. Based on this scenario this talk is designed to explore the adversarial attacks applied to ML systems and present the results of research made observing cybersecurity communities focused on sharing AI Jailbreaks and how those behave when applied to the most used AIs in the market.