PyCon UK 2025

Why `len('😶‍🌫️') == 4` and other weird things you should know about strings in Python
2025-09-21, Main stage

Strings in Python behave in all sorts of unexpected ways: len('😶‍🌫️') == 4, 'ñ' != 'ñ', '‮dlrow olleh‬'‬.split()[1] == 'olleh‬'! How is this possible? And importantly, why should you care?
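
The three examples in the abstract can be reproduced in a few lines. This sketch writes every string with escape sequences so the invisible characters are explicit. It assumes that the emoji in the title is the standard "face in clouds" sequence (FACE WITHOUT MOUTH, ZERO WIDTH JOINER, FOG, VARIATION SELECTOR-16), that the two ñ characters are the precomposed and decomposed forms, and that the reversed text is wrapped in RIGHT-TO-LEFT OVERRIDE and POP DIRECTIONAL FORMATTING. The variable names are illustrative only.

```python
import unicodedata

# Assumed "face in clouds" sequence: four code points that render as one emoji
face_in_clouds = "\U0001F636\u200d\U0001F32B\uFE0F"
print(len(face_in_clouds))  # 4
for char in face_in_clouds:
    print(f"U+{ord(char):04X} {unicodedata.name(char)}")

# "ñ" as one precomposed code point vs. "n" followed by COMBINING TILDE
precomposed = "\u00f1"
decomposed = "n\u0303"
print(precomposed == decomposed)  # False, although both render as "ñ"
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

# RIGHT-TO-LEFT OVERRIDE makes "dlrow olleh" display as "hello world";
# POP DIRECTIONAL FORMATTING ends the override. Neither is whitespace,
# so each word from split() keeps an invisible control character.
reversed_text = "\u202edlrow olleh\u202c"
second_word = reversed_text.split()[1]
print(second_word == "olleh")  # False
print(second_word == "olleh\u202c")  # True
```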


In this presentation, we’ll give a brief introduction to text encoding, the different Unicode encodings (such as UTF-8, UTF-16 and UTF-32) and how Python represents strings, before we dive deep into Unicode oddities. We’ll cover how a single character can be made up of several code points (and vice versa), why converting between uppercase and lowercase should be locale-dependent, and how emoji work. Finally, we will discuss some best practices for handling Unicode input in Python.
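
The remaining topics can be sketched in the same way. The snippet below does three things. It encodes the emoji from the title in three Unicode encodings to show that the byte count depends on the encoding. It shows that case mapping can change the length of a string. It also compares strings using canonical caseless matching (NFD, then `str.casefold()`, then NFD again), one common recommendation for handling user input. The helper `same_text` is hypothetical and not taken from the talk. Python’s built-in case methods ignore the locale, so a Turkish-specific mapping such as i → İ is not applied.

```python
import unicodedata

face_in_clouds = "\U0001F636\u200d\U0001F32B\uFE0F"

# Same four code points, different byte counts (the -le variants add no BOM)
for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
    print(encoding, len(face_in_clouds.encode(encoding)))  # 14, 12, 16

# Case mapping is not always one character to one character
print("\u00df".upper())  # "ß" becomes SS
print(len("\u0130".lower()))  # "İ" becomes "i" + COMBINING DOT ABOVE, so 2
print("i".upper())  # I, whatever the locale; Turkish rules would give İ


def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: canonical caseless comparison of two strings."""

    def canonical(s: str) -> str:
        return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())

    return canonical(a) == canonical(b)


print(same_text("Stra\u00dfe", "STRASSE"))  # True
print(same_text("\u00f1", "N\u0303"))  # True: precomposed vs. decomposed, different case
```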

After this talk, you’ll have a better understanding of strings in Python and of the Unicode standard, and you’ll be aware of some common pitfalls and how to avoid them.

Audience:
The content is aimed at people who have some experience with Python programming but want a deeper understanding of strings and Unicode. However, we don’t assume any prerequisite knowledge and will use simple examples, so curious beginners will be able to follow the talk as well.


What level of experience do you expect from your audience for this session?

Basic

Marie is a researcher, data scientist and educator with over eight years of experience teaching Python to middle- and high-school students, teachers and professionals in Norway. As part of her work as a researcher at the Norwegian Language Bank at the National Library of Norway, Marie has first-hand experience dealing with the intricacies of Unicode. Marie is also a co-founder and organiser of PyLadies Oslo.

Yngve is also an experienced Python educator, Python developer and data science consultant. He has taught Python to professionals, university students and academics. While working at the Norwegian University of Life Sciences, he headed a project that transformed the introductory Python course into a tutorial-based course focused on active learning. He is currently a tech lead, working on automating the Norwegian power grid.