Petr Andreev
I’m a CPython internals specialist with 8+ years of experience leading technical teams.
My work has moved from ML systems to CPython runtime research (GIL/noGIL, GC, allocators, bytecode).
Since 2024 I’ve been a lecturer at MIPT, where I designed three 15-module Advanced Python courses (OOP, async/parallelism, PVM, GC, noGIL, CPython source code),
delivered 90+ classes to 140+ students, and mentored 13 students through research projects that reached conference level.
My industry work includes building production Python systems:
an RL trading platform with Interactive Brokers (IB) and Saxo connectors and Monte Carlo risk models,
and tooling that improved match accuracy 10× and cut search time 800×.
I also delivered ML/DTS analytics for a Saudi Aramco project.
Session
You’ll learn a repeatable workflow for accelerating real numeric kernels with
CPU SIMD, GPU arrays with custom kernels, and TPU/XLA compilation, all from Python.
For each acceleration tier, we follow the same loop: theory → minimal working code → benchmark
that confirms (or disproves) the theory. You’ll leave with a small benchmark harness you can reuse,
plus a decision checklist for when SIMD is enough, when GPUs pay off, and when XLA/TPU is the right move.