Nico Kreiling
Nico is a Data Scientist at scieneers, co-organizer of the PyData Cologne meetup, and host of the Techtiefen podcast. His passions are quick and simple solutions and the constant expansion of his own and the community's knowledge base.
@NicoKreiling
GitHub – LinkedIn – Sessions
Pandas is the de facto standard for data manipulation in Python, and I personally love it for its flexible syntax and interoperability. But Pandas has well-known drawbacks, such as memory inefficiency, inconsistent handling of missing data, and a lack of multicore support. Multiple open-source projects aim to solve those issues; the most interesting of them is Polars.
Polars uses Rust and Apache Arrow to win all kinds of performance benchmarks, and it is evolving fast. But is it already stable enough to migrate an existing Pandas codebase? And does it meet the high expectations that long-time Pandas lovers have for query-language flexibility?
In this talk, I will explain how Polars can be that fast and share my insights on where Polars shines and in which scenarios I stay with Pandas (at least for now!).
Innovations such as sentence-transformers, neural search, and vector databases have recently fueled very fast development of question-answering systems. At scieneers, we wanted to test those components on our own information needs by building a Slack bot that answers our questions by reading through our internal documents and Slack conversations. To do so, we combined the Haystack QA framework with a Weaviate vector database and many fine-tuned NLP models.
This talk will give you insights into both the technical challenges we faced and the organizational lessons we learned.