Arthur Andres
A seasoned software engineer working on batch and real-time, data-intensive Python applications.
Session
Adopting a streaming architecture as a Python developer often means abandoning the tools and abstractions you know (DataFrames, batch processing, familiar data workflows) in favour of an entirely different mental model. After ten years of tackling this problem across multiple companies, I've learned it doesn't have to be that way.
In this talk, I'll show how to treat Kafka not as a stream of individual messages but as a source of micro-batches, and how to deserialize those messages, whether JSON or Protobuf, into Arrow-backed DataFrames. The result: your processing code looks the same whether the data comes from a Parquet file or a Kafka topic.
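As a taste of the idea, here is a minimal sketch of that deserialization step: a micro-batch of raw JSON payloads becomes an Arrow table, indistinguishable from one read out of a Parquet file. The `messages_to_table` helper and the sample records are illustrative, not taken from the talk.

```python
import json

import pyarrow as pa


def messages_to_table(raw_messages: list[bytes]) -> pa.Table:
    """Deserialize a micro-batch of JSON payloads into an Arrow table."""
    records = [json.loads(payload) for payload in raw_messages]
    return pa.Table.from_pylist(records)


# Downstream code is source-agnostic: this table could just as well have
# come from pyarrow.parquet.read_table("trades.parquet").
batch = [b'{"symbol": "AAPL", "price": 189.5}', b'{"symbol": "MSFT", "price": 404.1}']
table = messages_to_table(batch)
print(table.column("price"))
```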
No heavy framework required. Using confluent-kafka and Apache Arrow, I'll walk through how to build this from the ground up, so you understand every layer of the stack.
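To give a rough feel for that ground-up build, here is one way the consumer loop could look. The broker address, topic name, and batch size are placeholders, and this is a sketch of the approach rather than the talk's exact code; the key move is that `Consumer.consume()` hands back a list of messages, which is the micro-batch.

```python
import json

import pyarrow as pa
from confluent_kafka import Consumer

consumer = Consumer(
    {
        "bootstrap.servers": "localhost:9092",   # placeholder broker
        "group.id": "arrow-microbatch-demo",
        "auto.offset.reset": "earliest",
    }
)
consumer.subscribe(["events"])  # placeholder topic

try:
    while True:
        # consume() returns up to num_messages at once: the micro-batch.
        messages = consumer.consume(num_messages=1000, timeout=1.0)
        payloads = [m.value() for m in messages if m.error() is None]
        if not payloads:
            continue
        # Same deserialization step as above: JSON payloads to an Arrow table.
        table = pa.Table.from_pylist([json.loads(p) for p in payloads])
        print(f"processed a micro-batch of {table.num_rows} rows")
finally:
    consumer.close()
```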