Jacob is the Head of Labs at Thinkst Applied Research. Prior to that he managed the HW/FW/VMM security team at AWS, and was a Program Manager at DARPA's Information Innovation Office (I2O). At DARPA he managed a cyber security R&D portfolio including the Configuration Security, Transparent Computing, and Cyber Fault-tolerant Attack Recovery programs. Starting his career at Assured Information Security, he led the Computer Architectures group performing bespoke research into low-level systems security and programming languages. Jacob has spoken and keynoted at conferences around the world, including Black Hat USA, SyScan, TROOPERS, and many more. When not in front of the computer, he enjoys trail running, volunteering as a firefighter/EMT, and hiking with his family.
The world is awash in news, predictions, and of course content about and from large language model (LLM) AI such as ChatGPT, for good and ill. This talk takes a step back from the posturing and hype to look at how these models work, and how to detect the content they produce. We will look at the fundamentals of LLM-generated text detection and compare the best in breed (GPTZero, RoBERTa, and OpenAI's detector) with a novel detector, ZipPy.
ZipPy is a new, open-source LLM text detector developed by Thinkst Labs that is 60-100x faster than the competition, over 1000x smaller (< 200KB), and, for many types of content, more accurate. We will explain the intuition behind ZipPy, show how it works, and discuss the types of content it struggles with. Finally, we look at where LLMs can improve their stealth, and at the fundamental shortcomings in their designs that enable detection in the long term.