Understanding file type identifiers & scanners
10-24, 16:45–17:15 (Europe/Luxembourg), Europe - Main Room

Yara, libmagic (file, binwalk, polyfile), TrID, Magika, PEiD, PRONOM, FDD, Shared MIME-info, DiE...
How do they work? What are their pros and cons, their limitations, their risks?


There are many misconceptions about file type identification and scanning:
the existing tools serve different needs and use cases, with different requirements and limitations (which could be abused).
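To make the identifier-versus-scanner distinction concrete, here is a toy sketch. The signature table and the two function names are hypothetical illustrations. This is not how libmagic, TrID, Yara or Magika actually work. It only shows that two simple strategies can disagree on the same bytes: checking for magic bytes at offset 0 versus searching for them anywhere in the file.

```python
# Hypothetical, minimal signature table: (type name, magic bytes).
# Real tools use far richer rules (offsets, masks, structure checks, models...).
SIGNATURES = [
    ("PNG", b"\x89PNG\r\n\x1a\n"),
    ("GIF", b"GIF89a"),
    ("GIF", b"GIF87a"),
    ("PDF", b"%PDF-"),
    ("ZIP", b"PK\x03\x04"),
    ("ELF", b"\x7fELF"),
]


def identify_at_offset0(data: bytes) -> list[str]:
    """Identifier-style check: a type matches only if its magic starts the data."""
    return sorted({name for name, magic in SIGNATURES if data.startswith(magic)})


def scan_anywhere(data: bytes) -> list[str]:
    """Scanner-style check: a type matches if its magic occurs at any offset."""
    return sorted({name for name, magic in SIGNATURES if magic in data})


if __name__ == "__main__":
    # A crafted blob (not a valid file of any of these formats) that starts
    # like a GIF but also contains ZIP and PDF signatures further in.
    blob = b"GIF89a" + b"\x00" * 16 + b"PK\x03\x04" + b"\x00" * 16 + b"%PDF-1.7"
    print(identify_at_offset0(blob))  # ['GIF']
    print(scan_anywhere(blob))        # ['GIF', 'PDF', 'ZIP']
```

The same input gets one answer from the offset-0 check and three from the substring scan. Bytes that one tool ignores can be significant to another, which is one way such differences can be abused.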

Warning: contains raw bytes.

A reverse engineer since the '80s, he started his infosec career as a malware analyst decades ago.

His extensive knowledge of file formats is showcased in his hundreds of Corkami posters and visualisations, and is essential to projects such as Magika, Google's AI-powered file type detection tool.
His passion for retrocomputing and funky files leads him to explore the darkest corners of the file format landscape:
bypassing security with ancient techniques, analyzing parsers and breaking them with extreme files, writing tools to evade detection via mock files or polyglots such as PoC||GTFO, exploiting AES-GCM via crypto-polyglots, and colliding SHA-1 via SHAttered.