2024-08-06, Siena
We previously demonstrated how large language models (LLMs) excel at creating phishing emails (https://www.youtube.com/watch?v=yppjP4_4n40). Now, we continue our research by demonstrating how LLMs can be used to create a self-improving phishing bot that automates all five phases of the phishing process (collecting targets, collecting information about the targets, creating emails, sending emails, and validating the results). We evaluate the tool using a factorial approach, targeting 200 randomly selected participants recruited for the study. First, we compare the success rates (measured by whether a participant clicks a link in an email) of our AI-phishing tool and phishing emails created by human experts. Then, we show how to use our tool to counter AI-enabled phishing bots by creating personalized spam filters and a digital footprint cleaner that helps users optimize the information they share online. We hypothesize that the emails created by our fully automated AI-phishing tool will yield a click-through rate similar to that of emails created by human experts, while reducing the cost by up to 99%. We further hypothesize that the digital footprint cleaner and personalized spam filters will result in tangible security improvements at a minimal cost.
Presentation Outline
1–Introduction: automated phishing attacks and intervention methods
* Toby Ord, a philosopher at the University of Oxford, estimates a one-in-six likelihood of AI leading to humanity's downfall within the next century. Many other leading AI researchers estimate that we will have autonomous intelligence within a decade.
* Our previous research demonstrated how LLMs can be used to automate one part of the phishing kill chain (creating phishing emails). We now demonstrate how to automate the complete phishing kill chain (collecting targets, collecting information about the targets, creating emails, sending emails, and validating the results).
* We also show how to counter AI-enabled phishing attacks using personalized spam filters and a digital footprint-cleaning tool.
* Due to the rapid increase in capabilities and decrease in the cost of AI-enabled spear phishing, these attacks are likely to grow tremendously in the years to come. We provide three simple mitigation strategies for business leaders and security practitioners.
2–The AI-phishing tool (automating the phishing kill chain)
* The tool uses iterative queries to language-model APIs to scrape publicly available information about the target and build a unique vulnerability profile for that target. It then applies the principles of the V-Triad (credibility, customizability, and compatibility) and Robert Cialdini’s influence principles (reciprocity, consistency, social proof, authority, liking, and scarcity) to create phishing emails that match the target's vulnerability profile (see the sketch after this list).
* The tool also creates different sender profiles (based on synthetic or real people), such as a professor or a government official, and determines which sender profile to use for each target. The type of sender profile significantly affects the content of the phishing email.
* Demonstrate the tool and invite the audience to provide feedback.
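To make the pipeline concrete, below is a minimal sketch of the profile-and-draft loop, assuming the OpenAI Python client; the model name, prompts, JSON keys, and helper functions are illustrative assumptions, not the tool's actual implementation.

```python
import json
from openai import OpenAI  # assumes the `openai` package and an API key in OPENAI_API_KEY

client = OpenAI()
MODEL = "gpt-4o"  # placeholder; any chat-completion model works

def build_vulnerability_profile(public_info: str) -> dict:
    """Condense scraped public information into a structured profile."""
    resp = client.chat.completions.create(
        model=MODEL,
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": (
                "Summarize this public information about a person as JSON with keys "
                "'interests', 'affiliations', and 'likely_influence_principles' "
                "(a subset of: reciprocity, consistency, social proof, authority, "
                "liking, scarcity):\n" + public_info
            ),
        }],
    )
    return json.loads(resp.choices[0].message.content)

def draft_email(profile: dict, sender_profile: str) -> str:
    """Draft a tailored email applying the V-Triad and the profile's influence principles."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{
            "role": "user",
            "content": (
                f"Write a short, credible, well-customized email from {sender_profile} "
                f"to the recipient described by this profile, applying the influence "
                f"principles it lists: {json.dumps(profile)}"
            ),
        }],
    )
    return resp.choices[0].message.content

profile = build_vulnerability_profile("Engineering student at Harvard; interested in AI governance.")
print(draft_email(profile, "a professor working on AI policy"))
```

In the real tool, the scraping step would feed `build_vulnerability_profile` iteratively, refining the profile across several queries rather than a single call.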
3–The silver lining: using the tool to counter AI phishing
* We create personalized spam filters by matching a user’s vulnerability profile against an incoming email’s persuasion profile (the influence principles it uses and the sender’s category); the first sketch after this list illustrates the matching. For example, an engineering student at Harvard interested in AI governance and policy (target profile) is sent a phishing email using authority and liking (influence categories) from a professor in the field (sender profile). Over time, we will train a language model to search for correlations between target profiles, influence categories, and sender profiles.
* Our AI tool also helps users categorize an email’s intention and proposes relevant actions for responding. For example, our tool might tell a user who received a collaboration invitation from a professor to go to the university’s official website and verify the professor’s email address.
* The digital footprint cleaner helps users identify which portion of their publicly shared information is most useful to attackers and least useful to themselves. This intersection is a sweet spot of publicly shared information that can be removed to maximize attackers’ difficulty in impersonating users while minimizing the users’ sacrifice (see the second sketch after this list).
* Demonstrate the mitigation strategies and invite the audience to provide feedback.
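As a first sketch, the personalized spam filter's core matching step can be illustrated as a simple overlap score between the user's vulnerability profile and the persuasion tactics detected in an incoming email. The profile schema, weights, and example below are illustrative assumptions, not the tool's real scoring.

```python
# A minimal sketch of the profile-matching idea behind the personalized spam filter.
TARGET_PROFILE = {
    "influence_susceptibility": {"authority", "liking"},  # principles this user tends to respond to
    "trusted_sender_categories": {"professor", "colleague"},
}

def phishing_risk(email_principles: set, sender_category: str) -> float:
    """Score how closely an email's persuasion tactics match the user's weak spots."""
    overlap = email_principles & TARGET_PROFILE["influence_susceptibility"]
    principle_score = len(overlap) / max(len(email_principles), 1)
    # An email that impersonates a trusted sender category is weighted higher.
    sender_weight = 1.0 if sender_category in TARGET_PROFILE["trusted_sender_categories"] else 0.3
    return principle_score * sender_weight

# The Harvard student scenario from the bullet above: authority + liking from a "professor".
print(f"risk={phishing_risk({'authority', 'liking'}, 'professor'):.2f}")  # risk=1.00
```

In practice, the email's influence principles and sender category would come from a language-model classification of the incoming message, and a high score would trigger a warning or a suggested verification step.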
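The second sketch illustrates the footprint cleaner's sweet-spot selection: rank public items by their utility to an attacker versus their importance to the user, and suggest removing those that score high on the former and low on the latter. The items, scores, and thresholds are hypothetical placeholders.

```python
# A minimal sketch of the footprint cleaner's "sweet spot" selection.
footprint = [
    # (item, attacker_utility 0-1, user_importance 0-1) -- hypothetical scores
    ("home address in a conference bio", 0.9, 0.2),
    ("research interests on a lab page", 0.6, 0.9),
    ("old mailing-list posts exposing a phone number", 0.8, 0.1),
]

def removal_candidates(items, attacker_min=0.7, importance_max=0.3):
    """Suggest removing items with high attacker utility and low user importance."""
    return [name for name, attacker, importance in items
            if attacker >= attacker_min and importance <= importance_max]

print(removal_candidates(footprint))
# ['home address in a conference bio', 'old mailing-list posts exposing a phone number']
```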
4–Results from our first user evaluation
* Description of the methodology from the first user evaluation (including the 200 participants and the comparison of success rates between our AI-phishing tool and phishing emails created by human experts; a worked example of the rate comparison follows this list).
* The phishing tool is evaluated quantitatively by noting how many participants open each phishing email, and qualitatively by letting each participant answer a survey gauging their suspicion towards each email. The data measures a user's Cyber Risk Beliefs (CRB), as described by Arun Vishwanath in The Weakest Link. CRBs measure the user's susceptibility to phishing by analyzing their two modes of information processing (heuristic and systematic). We also measure whether a user accurately estimates, overestimates, or underestimates risk, based on three factors (complexity, expertise, and metacognition).
* Quantitative results: TBD.
* Qualitative results: TBD.
* The personalized spam filters are evaluated by letting the tool provide recommendations for phishing emails created by the tool, created by human experts, and fetched from online phishing archives. The recommendations are evaluated by human experts, and the intention classification is evaluated quantitatively by comparing the suggested intention with the email's real intention.
* The digital footprint cleaner is evaluated by removing the suggested information and recreating the phishing email for each participant. After the study, we ask the participants how important the removed information is to them on a seven-point scale (our aim is to suggest removing only unimportant information).
* Results from the personalized spam filters and digital footprint cleaner: TBD.
* Discuss the methodology and tool and invite the audience to provide feedback.
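Since the quantitative results above are still TBD, the following sketch only shows how the planned rate comparison could be computed, using a standard two-proportion z-test on click counts; the counts in the example are hypothetical placeholders, not study data.

```python
import math

def two_proportion_ztest(clicks_a: int, n_a: int, clicks_b: int, n_b: int) -> float:
    """Two-sided z-test for equal click-through rates; returns the p-value."""
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (clicks_a / n_a - clicks_b / n_b) / se
    # Normal-approximation p-value via the error function.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical placeholder counts: 100 participants per arm,
# 30 clicks for the AI tool vs 34 for the human experts.
print(f"p = {two_proportion_ztest(30, 100, 34, 100):.3f}")
```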
5–Benchmarks for evaluating the deceptive capabilities of frontier AI models
* Discuss the problems of correctly evaluating AI models’ capability to deceive humans.
* Discuss existing evaluation schemes and benchmarks for AI deception, and their shortcomings.
* Present new benchmarks and evaluation strategies for the deceptive capabilities of current, near-future, and superintelligent AI models, specifically focusing on phishing.
Fredrik Heiding is a research fellow in computer science at the Harvard John A. Paulson School of Engineering and Applied Sciences (SEAS). He researches AI-enabled cyberattacks at the intersection of technology, business implications, and national security policy. His work demonstrates how AI models can be used to hack devices and users, and develops mitigation strategies for preventing those hacks. He also red-teams the AI models themselves, as well as the US national cybersecurity strategy, to better prepare national security for AI-enabled cyberattacks. In early 2022, Fredrik received media attention for hacking the King of Sweden and the Swedish European Commissioner. Fredrik currently works with the World Economic Forum's Cybercrime Center and White House officials to improve global and domestic cybersecurity standards for AI-based cyber defense. Fredrik is a teaching fellow for the Generative AI for Business Leaders course at Harvard Business School and leads the cybersecurity division of the Harvard AI Safety Student Team (HAISST).
Twitter: @fredheiding
I am currently an independent AI researcher, investigating the misuse risks of AI agents for tasks such as spear phishing and automated forensics.