PyCon Lithuania 2025

Surprisal and the headache of tokenizer encodings in LLMs!
2025-04-25, 228

What can go wrong with tokenizer encodings? Everything! I will share my experience of understanding, misunderstanding, and ultimately learning to work with tokenization in LLMs. I will discuss what surprisal is, its relevance to my research, and its connection to tokenization. The talk will include various examples illustrating how misunderstandings of tokenization can arise, as well as strategies for debugging and preventing these issues.
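
For readers new to the term, here is a minimal sketch of the standard information-theoretic definition of surprisal, the negative log probability of an item given its context, and of one way it connects to tokenization. It is not taken from the talk. The function names and the probabilities are hypothetical, chosen only for illustration. In practice the probabilities would come from a language model. When a tokenizer splits a word into several subword tokens, the chain rule means the word's surprisal is the sum of its tokens' surprisals.

```python
import math


def surprisal_bits(prob: float) -> float:
    """Surprisal in bits of an event with probability `prob`."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def word_surprisal_bits(token_probs: list[float]) -> float:
    """Surprisal of a word split into subword tokens.

    Each entry is the conditional probability of one token given
    everything before it, so the surprisals add up (chain rule).
    """
    return sum(surprisal_bits(p) for p in token_probs)


# Hypothetical conditional probabilities for a word split into two tokens.
token_probs = [0.5, 0.25]
print(word_surprisal_bits(token_probs))
```

Summing per-token values is equivalent to taking the surprisal of the product of the conditional probabilities, which is why a word-level value depends on how the tokenizer segmented the word.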

Computational Cognitive Science researcher at the University of Potsdam, Potsdam, Germany
