Berlin Buzzwords 2022

Aarne has more than 16 years of experience in software development, consulting and academic research, with a specific focus on NLP and search engines.

Aarne is the CEO and co-founder of Basement AI, Lead AI Engineer at Silo AI and a PhD researcher in NLP at the University of Helsinki.

Aarne is currently working on Muves, a new multilingual and multimodal search engine.

Muves: Multimodal and multilingual vector search with Hardware Acceleration

Alessandro Benedetti

Alessandro Benedetti is a director and R&D Software Engineer at Sease Ltd.
His focus is on information retrieval, information extraction, natural language processing, and machine learning.
At Sease, Alessandro works on search and machine learning R&D and consultancy.
When he isn't working on client projects, he actively contributes to the open-source community and presents applications of leading-edge techniques in real-world scenarios at meet-ups and conferences such as ECIR, the Lucene/Solr Revolution, ApacheCon, Haystack, FOSDEM, and Open Source Summit.

Neural Search Comes to Apache Solr: Approximate Nearest Neighbor, BERT and More (Buzzwords)!

Aline Paponaud

Aline is the CTO of Adelean, where she works with search and provides consulting services and expertise around Elasticsearch, Lucene and Solr. She brings her energy to leveraging search engines, as they become increasingly essential in every domain.

Scaling an online search engine to thousands of physical stores

Amrit Sarkar

Amrit Sarkar is a Software Engineer at Apple India Pvt Ltd with 6+ years of experience in search, big data, e-commerce and product.

Kafka Monitoring: What Matters!

Ana Maria García Sánchez

Hi! My name's Ana and I'm a 22-year-old girl finishing my degree in Software Engineering while working at Empathy.co as a Backend Engineer.

During college, I've been active in student representation. In fact, I've been the president of my school's delegation and assembly for two years, leading groups, making decisions, giving speeches and organizing events such as the EII Tech Fest.

I'm now trying to leave that world, but in the meantime I'm the coordinator of the infrastructure commission at RITSI.

Not only am I not afraid of speaking in public and standing "against" an audience, I actually enjoy it, so giving a talk at Berlin Buzzwords is an exciting new experience.

Relevance is not a Thing but a Perception

Andre Jasiskis

Andre Jasiskis is a founding member of Nubank's data platform and currently works on Nubank's streaming platform. He has been thinking a lot about data ingestion, fast batch processing and stream processing, specifically about how to handle the exponential growth in consumers and data volume.

The perils of building a democratic data platform

Andrea Schuett

Andrea Schütt is a Data Scientist in OTTO’s search team. She is currently working on bringing OTTO’s first learning-to-rank model into production. She has a degree in electrical engineering with a focus on automation.

Offline Ranking Validation - Predicting A/B Test Results

Anshum Gupta

Anshum is an Apache Lucene and Solr committer and Project Management Committee member. He started dabbling with Lucene about 15 years ago, and since then has worked at various organizations building both internal and consumer-facing search platforms on top of Lucene and Solr. He is currently part of Apple's Open Source Technologies group.

What's new in Apache Solr 9.0

Atita Arora

Atita works as a Search Relevance Consultant at OpenSource Connections.
She has been developing, consulting on and optimizing enterprise and e-commerce search engines for more than 10 years. She holds two master's degrees, in Computer Applications and in Strategic Business Management. She specializes in search platform migration to the cloud, building index pipelines, custom component development, and relevance tuning. She has a keen interest in personalizing search and influencing customer interaction using NLP, ML, and AI.

Understanding Vespa with a Lucene mindset

Charlie Davies

Charlie Davies is the CEO of TravelTime and the creator of an API and set of plugins which enable users to search location data using minutes rather than miles. Charlie is responsible for running all aspects of TravelTime including planning the product roadmap and the coordination of the technical team.

Should we stop using distance in our location-based data recommendation models?

Charlotte Henkle

Charlotte Henkle is a Senior Software Development Manager at the OpenSearch Project, supporting the OpenSearch engine, Engineering Efficiency and Clients teams. She has worked at Amazon for 14 years, on Amazon Retail Catalog, Amazon Photos and Amazon Explore. Before Amazon, she wrote software for federal clients such as the Marine Corps and the FBI and worked for the University of Chicago Biological Learning Center.

Working in the Open...Search

Chinmay Soman

Chinmay Soman is a founding engineer at StarTree, building real-time analytics solutions at scale. Previously he led the streaming platform team at Uber, building a large-scale, self-serve platform around messaging, stream processing and OLAP technologies. Before that, he worked at LinkedIn and IBM, focussing on distributed systems and security. He’s a PMC member of Apache Samza and a committer on Apache Pinot, Voldemort, uReplicator and AthenaX.

Next generation OLAP stack using Apache Pinot

Christoph Schubert

Christoph is a Solutions Architect with Confluent. Before joining Confluent, he developed streaming applications for the financial services industry and mobile applications for just about everyone else. He holds a PhD in Mathematics and has a long-lasting interest in developing performant and secure software systems.

Patterns and anti-patterns for production ready Kafka Streams apps

Ciprian Hacman

Ciprian Hacman works mainly as a DevOps/Software Engineer for polypoly, helping them and other clients modernize their infrastructure and migrate to Kubernetes.
He is also a maintainer of the open source projects kOps (Kubernetes Operations), etcd-manager and cloud-provider-aws, and a frequent contributor to other projects in the Kubernetes ecosystem.

Autoscaling Elasticsearch for Logs on Kubernetes

Daniel Hernández Alfageme

I am a software engineer working as a Data Engineer for Empathy.co. My work focuses on building and managing ETL pipelines that feed our search engine with contextual information to improve the search experience for end users and provide aggregated analytics to merchandisers, always keeping privacy in mind. I am passionate about data engineering and work daily with technologies such as Apache Flink, Apache Spark and MongoDB, running on AWS or GCP.

Running Apache Spark on K8s: From AWS EMR to K8s

Daniele Antuzi

Daniele Antuzi is a software engineer passionate about high-performance data structures and algorithms. He worked for 4 years in finance (List spa) and 2 years in cloud services (Amazon Web Services) before his curiosity about information retrieval led him to join Sease Ltd.
He likes studying and experimenting with new technologies, aiming to narrow the gap between academia and industry.

Word2Vec model to generate synonyms on the fly in Apache Lucene

Dmitry Kan

Dmitry has been focusing on search engines since 2010 with Apache Lucene and Solr, and since 2020 with Elasticsearch. He was responsible for building the search team and search technology powering the AlphaSense product, which today is used by thousands of reputable banks, hedge funds and companies in almost every industry vertical around the world. At Silo.AI, Dmitry led a team of NLP researchers, search, frontend and QA engineers working on search at web scale, interacting with Product Management, Engineering and Data teams on a daily basis.
Dmitry has worked on the open source projects Luke and Quepid and co-founded a few startups in the text analytics, edtech and team engagement spaces. He is the founder and host of the Vector Podcast (https://www.youtube.com/c/VectorPodcast). Having established himself as an independent researcher in vector search, Dmitry began working on Muves, a multilingual and multimodal search engine, together with his co-founders. In his free time he enjoys reading, cycling and blogging about AI and search. Dmitry holds a PhD in Applied Mathematics and a Master’s in Computer Science.

Muves: Multimodal and multilingual vector search with Hardware Acceleration

Edoardo Tosca

Edoardo is an experienced software craftsman. He is obsessed with business problems and desperate to find the best technology to solve them. He is an open source enthusiast with a particular interest in search engines and machine learning.

Currently he is Head of Technology at Signal AI.

Entity Linking at scale with Lucene

Eli Fisher

Eli Fisher has worked on databases and analytics for Microsoft, Google, and Amazon. He also co-founded the IoT-based digital signage company, Raydiant. He now leads the Product Management team at AWS that is working on the OpenSearch Project.

Why a Search Engine Makes a Great Log Analytics Solution

Eran Bibi

Eran Bibi is Co-Founder & Chief Product Officer at Firefly. With years of experience in anything DevOps/SRE and security, he has earned a reputation as a CI/CD and SRE expert and an avid admin of Cloud Platforms and containerized environments.

Prior to Firefly, Eran was Head of DevOps & Cloud Platform at Aqua Security and DevOps Group Lead at Finastra. Eran is a frequent speaker at Cloud Native meetups, AWS community meetups, and other cloud workshops and conferences.

Don't Panic: Getting Your Infrastructure Drift Under Control

Erik Erlandson

Erik Erlandson is a Software Engineer at Red Hat’s Open Services Group, where he explores emerging technologies at the intersection of Data Science workloads and the Kubernetes ecosystem.

Scaling the Open Source Climate Community

Fiona Coath

Fiona is a lead software developer and consultant currently working at Thoughtworks. Her specialty is being a generalist. She enjoys problem solving across a range of technologies including natural language processing and data analytics. Her passion for social justice and equality keeps inspiring her to investigate the relationships between technology and society. When she isn’t solving challenging problems, she enjoys craft, exploring the world and the colour purple.

Meet the people fighting surveillance capitalism

Florian Hoenicke

While working at large organizations such as SoundCloud, Deloitte and Axel-Springer, I learned that the hardest challenges for tech companies are not technical in nature. As a Solution Lead at Jina, I analyze our clients' challenges and come up with customized solutions. Based on these learnings, I propose changes to our framework to push the quality and accessibility of neural search.

Neural Search - Let's talk about quality

Francesco Tisiot

Francesco comes from Verona, Italy and works as a Developer Advocate at Aiven. With his many years of experience as a data engineer, he has stories to tell and advice for data-wranglers everywhere. Francesco loves sharing knowledge with others as a speaker and writer, and is on a mission to defend the world from bad Italian food!

Solving the knapsack problem with recursive queries and PostgreSQL

Frank Liu

Frank Liu is an engineering manager and tech lead at Zilliz, where he leads a team of multi-national engineers while serving as a maintainer for the Towhee open-source project (https://towhee.io). In a span of less than three months, he and his team turned Towhee into an easy-to-use platform in use by 3 major corporations ($10B+ market value) and a number of small- and mid-size startups in proof-of-concept and production systems.

Prior to Zilliz, Frank co-founded an indoor localization startup based in Shanghai and worked as a deep learning engineer at Yahoo in San Francisco. Frank holds MS and BS degrees in Electrical Engineering from Stanford University.

Building an Open-source Framework for Generating Embedding Vectors

Hakan Lofcali

Hakan is a Software / Data Engineer. He has built his knowledge of software, data engineering and cloud-native computing in many different environments: from early-stage start-ups to AWS, and from sports media companies to highly regulated FSI enterprises. The experiences gained, problems encountered and solutions found led him to co-found DataCater to improve tooling in the data space.

Cloud-native ETL with Java Quarkus, Kubernetes, and Jib Container Builder

Ido Nadler

I am a big data team lead at Nielsen.
My team focuses on building massive data pipelines (~250 Billion events/day) and infrastructure for running machine learning algorithms. Our projects run on AWS using a variety of technologies like Kafka, Spark, Airflow, Kubernetes, and more.
I like to continuously experiment with new technologies, tackle challenging problems, and find better, more elegant, and more cost-effective solutions.

Scaling your Kafka pipeline can be a pain - but it doesn’t have to be!!

Ilaria Petreti

Ilaria is an Information Retrieval/Machine Learning engineer at Sease. Strongly believing in the power of Big Data and Digital Transformation, she earned a master's degree in Data Science.
She loves the application of data mining and machine learning methods to information retrieval problems. Currently, she is involved in Learning to Rank projects.

Word2Vec model to generate synonyms on the fly in Apache Lucene

Javier Blanco Cordero

Javier Blanco Cordero is a senior data scientist at Quix, where he helps customers get the most out of their data science projects. He was previously a senior data scientist at Orange, developing churn prediction, marketing mix modeling, propensity to purchase models and more. Javier is a master's lecturer and speaker specializing in pragmatic data science and causality.

Live build: How to harness streaming data in real time to track, transform and build on heart rate data

Jo Kristian Bergum

Distinguished Engineer @Yahoo working on @vespaengine. Tweets about Vespa, search, recommendation, ranking, and IR.

AI-powered Semantic Search; A story of broken promises?

Joaquim Torres

Joaquim is a Lead Software Engineer at Nubank, building streaming infrastructure that enables multiple teams to manage their real-time data products.

He is a generalist at heart with a penchant for distributed systems and platform software.

The perils of building a democratic data platform

Joel Lord

Joel Lord (@joel__lord on Twitter) is passionate about the web and technology in general. He likes to learn new things, but most of all, he wants to share his discoveries. He does so by travelling to conferences all across the globe.
He graduated from college in computer programming in the last millennium. Apart from a short break to get his BSc in computational astrophysics, he has always worked in the industry.
In his daily job, Joel is a developer advocate with MongoDB, where he connects with software engineers to help them make the web better by using best practices in web development.
During his free time, he can be found stargazing on a campground somewhere or brewing a fresh batch of beer in his garage.

Help! I Need To UnSQLize My Application

John Dagdelen

John Dagdelen is a PhD student in the Department of Materials Science and Engineering at UC Berkeley. His research focuses on the intersection of artificial intelligence, high-performance computing, and materials discovery and design.

Matscholar: The search engine for materials science researchers

Josh Reed

Josh lives in Montréal, Canada, and works on the Release Engineering team at Aiven. With many years of experience in several parts of the software development lifecycle, he has a passion for processes that enable smooth interactions between developers and operations. When he’s not hacking away at a problem or championing best practices, he loves to cook recipes from around the world, as long as he can garnish them with a little bit of cilantro!

Effective CI/CD for Large Systems

Julien Le Dem

Julien Le Dem is the Chief Architect of Astronomer and Co-Founder of Datakin. He co-created Apache Parquet and is involved in several open source projects including OpenLineage, Marquez (LFAI&Data), Apache Arrow, and Apache Iceberg. Previously, he was a senior principal at WeWork; principal architect at Dremio; tech lead for Twitter’s data processing tools, where he also obtained a two-character Twitter handle (@J_); and a principal engineer and tech lead working on content platforms at Yahoo, where he received his Hadoop initiation. His French accent makes his talks particularly attractive.

Cross-Platform Data Lineage with OpenLineage

Julien Nioche

Having studied Russian language and culture in Paris and taught French in a school in Kiev, Ukraine, Julien went on to graduate in Text Engineering and Natural Language Processing. He moved to the UK to work as a researcher at the University of Sheffield in 2005 and founded DigitalPebble in 2008.

Julien has been involved in several open source projects, mainly at the Apache Software Foundation, and was the PMC chair for Apache Nutch. He is an Emeritus member of the Apache Software Foundation.

Julien runs workshops on web crawling, speaks at conferences and reviews technical books. He has over 20 years of experience with the Java programming language.

URL Frontier, an open source API and implementation for crawl frontiers

Konstantinos Perifanos

Kostas is the Head of Data Science at codec.ai, leading the strategy and the implementation of machine learning, natural language processing and information retrieval across the business. He enjoys high-quality coffee, hiking and landscape photography.
He holds a PhD in Natural Language Processing.

Dense Concept Retrieval

Lara Menéndez García

Lara Menéndez is Product Director and a member of the Executive Board at Empathy.co, a leading innovator in commerce search and discovery. She is responsible for all aspects of the product and plays a critical role in product strategy, focusing on commerce search experiences that evoke positive emotions.
Prior to joining Empathy.co, Lara’s work experience included data engineering and analytics positions with Pull&Bear and Merkle. Her crucial role in overseeing global product management at Empathy is reflected in how she leads the product team: A team that is committed to building human-centred commerce search and discovery platforms that empower big brands to create trustworthy, understanding and joyful experiences.
After living in Asturias and Galicia (Spain), she is now based in London as part of Empathy’s team in their UK headquarters. She loves London vibes but she misses Spanish gastronomy. When she’s not at her desk, you can find Lara enjoying a good Spanish wine.

Goodbye Tracking, Hello Privacy: The Technology & Architecture behind Ethical Search & Discovery

Lester Solbakken

Lester is a Principal Software Engineer at Verizon Media (previously Yahoo), working on the Vespa platform, the open big data serving engine (vespa.ai). His focus areas are machine learning engineering, with an emphasis on serving and search system ranking. He previously pursued a PhD in Artificial Intelligence and Machine Learning, where his main research topics were neural networks, exploratory data analysis and self-organizing systems.

Hybrid search > sum of its parts?

Lily Davies

Lily is a data scientist at Codec.ai, working with NLP, deep learning and IR to understand culture across different modalities.

Dense Concept Retrieval

Lucian Precup

Lucian Precup is the CTO of all.site - the collaborative search engine developed at Station F in Paris. With his colleagues at Adelean, Lucian develops solutions for indexing, searching and analyzing data. Lucian regularly shares his knowledge in specialized conferences and organizes the Search & Data Meetup.

The life of a search engine administrator

Mary Grygleski

Mary is a Java Champion and a passionate Senior/Lead Developer Advocate at HiveMQ, the leading provider of MQTT IoT & IIoT messaging and enterprise-grade cloud-native software. She spent 3.5 years as a very effective advocate at IBM, focusing on Java, Jakarta EE, OpenJ9, Open Source, Cloud, and Distributed Systems. She transitioned from Unix/C to Java around 2000 and has never looked back since then. She considers herself a polyglot and loves to continue learning new and better ways to solve real-life problems. She is an active tech community builder outside of her day job, and currently the President of the Chicago Java Users Group (CJUG), as well as a co-organizer for several IBM-sponsored meetup groups in the Greater Chicago area.

Benefits of MQTT for IoT Messaging and Beyond

Max Irwin

Max Irwin is the founder of https://max.io, and is a contributing author of the book "AI Powered Search". Prior to founding MAX.IO, he was Managing Consultant at OpenSource Connections, and was the founding leader of the Search Center of Excellence at Wolters Kluwer.

Max has over 20 years of experience directing delivery and strategy of large scale applications in various industries, with 10 of those years globally managing large diverse teams to improve search quality to drive results. He has deep practical hands-on technical expertise in search relevance, customer experience, natural language processing, and growing quality-focused culture.

The Race to the Bottom - Low Latency in the age of the Transformer

Maximilian Werk

I enjoy bringing machine learning into production at Jina.ai as an engineering director. The combination of high quality engineering, digging into data and the real-world problem at hand thrills me.

Neural Search - Let's talk about quality

Mike Tapi Nzali

I am a machine learning engineer at CybelAngel with a PhD in Computer Science. I like working in a startup environment, where I also lead the development of machine learning products from idea to production. I am interested in cutting-edge technology, knowledge sharing and the industrialization of machine learning.

Reproducible and shareable notebooks across a data science team

Minakshi Korad

I am a Senior Software Engineer working in the Enterprise Insights team at Twilio. I am excited about distributed systems and data-driven technologies, and I am also interested in the scalability and performance aspects of the product. I am always on the lookout for new technologies to learn and for ways to optimize current use cases.

Min and Max Aggregations with Updates in Real Time.

Nick Burch

Nick is heavily involved in a number of Apache projects, such as Tika and POI, and is fortunate to know many of the people involved in the Apache Big Data and Search space! When not helping out with Apache things, Nick works as the Director of Engineering at FLEC, where he leads a team making heavy use of Open Source technologies. When not helping improve the logistics industry, he can often be found attending or organising BarCamps, Geek Nights, or other such fun events dedicated to sharing what's great and new!

Barcamp
Learning about AI/ML for Text, with Wordle!

Nina Müller

Nina is the Ethical Commerce Alliance Director at Empathy.co. She has lots of marketing and event organisation experience, especially in the tech industry having run Berlin Buzzwords and other conferences before.
In her current role, Nina is responsible for establishing partnerships and connecting people in a network committed to privacy, data and ethics towards a more human-centric approach to the online world.
When she’s not at her desk, Nina enjoys music and cooking, has a weakness for licorice and loves the cinema.

Goodbye Tracking, Hello Privacy: The Technology & Architecture behind Ethical Search & Discovery

Noaa Barki

Noaa is a full-stack developer, community manager, and tech writer who wants to encourage developers to think more deeply about the decisions we make during development, research the technologies we use and share our knowledge. She started her journey in Unit 8200 of the IDF's intelligence forces, where she took her first steps in software development. Over the last 4 years, her work has mainly involved Angular, .NET, VanillaJS, and TypeScript. She currently develops in React, NodeJS and Golang.

What we learned from reading 100+ Kubernetes Post-Mortems

Olena Kutsenko

Olena is a software engineer and a developer advocate currently working at Aiven. She is passionate about open source, data, sustainable software development and teamwork. Her knowledge is shaped by the expertise she acquired working at companies such as Nokia, HERE Technologies and AWS, and by the countries she was lucky to live in: Ukraine, Sweden, Spain and Germany.

Apache Kafka simply explained

Opher Dubrovsky

I am a director of data engineering at Nielsen.
My group builds massive data pipelines that are cost effective and scalable (~250 Billion events/day). Our projects run on AWS, using Kafka, Spark, Aerospike, serverless Lambda functions, Airflow, OpenFAAS, Kubernetes and more.
I am passionate about new technologies, data, algorithms and machine learning. I love to tackle difficult problems and come up with amazing solutions to them.
I have 4 patents in the area of security, and lots of ideas for more.

Scaling your Kafka pipeline can be a pain - but it doesn’t have to be!!

Pascal Godbillot

Pascal is a Data Scientist at CybelAngel, Paris. He is focusing on building robust and efficient machine learning models to identify all kinds of digital threats. He also has a strong interest in various subjects related to Machine Learning Operations (MLOps). He is eager to solve the technological challenges of tomorrow in the AI field where innovation and knowledge sharing are paramount.

Reproducible and shareable notebooks across a data science team

Petros Angelatos

Petros is a software engineer at Materialize where he started the development of the Postgres CDC connector as a demo project. He is the co-founder and former CTO of balena.io, a platform for managing remote fleets of Linux devices.

Change data capture with Debezium…and without

Qi Wu

Qi Wu works as a Machine Learning Engineer at ontolux, translating current research results into usable applications for its customers. She works on topics such as training and optimizing models, with a focus on fine-tuning and distillation. During her master's degree in statistics, she worked with Prof. Dr. Alan Akbik on the NLP framework FLAIR and on machine learning for natural language processing tasks such as information extraction.

Compress giant language models to effective and resource-saving models using knowledge distillation

Radu Gheorghe

Radu Gheorghe works mainly as a search consultant at Sematext, working with clients of all sizes on their Elasticsearch and Solr projects. He is also a trainer and does production support for both these search engines.

Sometimes he helps out with the development of Sematext Cloud (an observability SaaS), mostly when it comes to Elasticsearch and log shippers (e.g. Logstash, rsyslog). He also writes for the Sematext blog and helps others publish new articles.

He co-authored a book (Elasticsearch in Action, Manning), recorded a video tutorial (Working with Elasticsearch, O'Reilly) and has spoken at a number of conferences, such as Berlin Buzzwords, Lucene/Solr Revolution (later Activate) and O'Reilly Velocity.

Autoscaling Elasticsearch for Logs on Kubernetes

Radu Pop

Radu provides consulting services as a Solutions Architect at Adelean. He handles projects around Elasticsearch and Adelean’s A2 search technology, and oversees the integration and evolution of search engines within large e-commerce platforms and marketplaces. Before joining Adelean, Radu gained solid experience in web archiving, operating large-scale crawling systems in the context of several European research projects. He holds a PhD in Computer Science and an MSc in Distributed Systems.

Searching through large graphs using Elasticsearch

Ramiro Alvarez Fernandez

I’m a Senior DevOps Engineer currently working as Tech Lead of the Platform Engineering Team at Empathy.co. I mostly manage Kubernetes clusters, CI/CD orchestration, Elasticsearch and MongoDB, and try to break things on AWS, GCP and Azure. I'm a big fan of Anton Babenko, and if I'm not online you can find me on PagerDuty.

Running Apache Spark on K8s: From AWS EMR to K8s

Ricardo Ferreira

Ricardo is a Senior Developer Advocate at AWS, working in the developer relations team for North America. With 20+ years of experience, he may have learned a thing or two about distributed systems, fast data analytics, software architecture, databases, and observability. Before joining AWS, he worked for software vendors such as Elastic, Confluent, and Oracle. Ricardo is known for his natural ability to explain complex topics: he craftily breaks them down into bite-sized pieces until anyone can understand them.

While not working, he loves barbecuing in his backyard with his family and friends, where he finally gets the chance to talk about anything unrelated to computers. He currently lives in North Carolina, USA, with his wife and son. Follow Ricardo on Twitter: @riferrei.

Do It Yourself: Programmable Metrics using OpenTelemetry

Ricardo Kawase

Dr. Ricardo Kawase leads a team of Data Scientists and Engineers at mobile.de GmbH/Adevinta, the leading online automotive marketplace in Germany. His team is responsible for topics such as fraud fighting/prevention, price prediction, user profiling, customer behavior prediction and personalization. He holds a Ph.D. in Computer Science (Doctor rerum naturalium, Dr. rer. nat.) from the Gottfried Wilhelm Leibniz Universität Hannover, Germany. Before joining mobile.de he worked as a researcher for over 7 years at the L3S Research Center in Hannover on topics such as data mining, information retrieval, semantic web, e-learning, social networks, crowdsourcing and Web science in general. He has written or co-authored over 60 peer-reviewed academic articles, and serves as a reviewer for several conferences and journals.

A smooth ride: Online car buying and selling at mobile.de

Richard Goodman

I am currently a Senior Data Infrastructure Engineer at Brandwatch in Brighton, UK, having joined fresh out of university several years ago. Working in the Data Infrastructure team is all about making Brandwatch's data safe, reliable and available, using technologies such as Solr, Kafka, HBase, Airflow, Postgres and Kubernetes. I developed an interest in Solr early in my career, and it has been my main focus in this team. Over the last couple of years I began contributing to the open-source community, submitting a few small patches for Solr.

I take a keen interest in anything Solr, Python and monitoring, having spent a good deal of time overhauling our entire Solr monitoring stack here at Brandwatch. More recently, I've led a project to move a cluster from the data center to AWS in a cost-effective manner and with better scalability.

Outside of work I enjoy classical music (I am trained in piano and flute), sewing, D&D and video games.

Using Solr unconventionally to serve 26bn+ documents

Sakshi Deo Shukla

I am currently pursuing my master's in Computational Linguistics at the University of Stuttgart. I am working as a Research Assistant at Landes Baden-Württemberg, ISTE. I previously worked as a Senior Data Scientist at Delhivery, India. I have led various technical communities in Delhi, such as Women Techmakers, WiMLDS and GDG Cloud New Delhi.

Unpaired Sentiment-to-Sentiment Translation: A Cycled Reinforcement Learning Approach

Sean Neumann

Sean Neumann is a Seattle-based software engineering manager leading development of OpenSearch Dashboards, a community-driven, visualization and analytics application in the OpenSearch Project. Over the last 9 years at Amazon, Sean has worked on Amazon OpenSearch Service, FinTech business applications, and the eCommerce Platform. Prior to Amazon, Sean was a software developer at Microsoft and various Seattle startups. Outside of work, Sean regularly practices yoga, where he has also been teaching for over six years.

Working in the Open...Search

Sergio Ferragut

Sergio Ferragut is a database veteran turned Developer Advocate. His experience includes 16 years at Teradata in professional services and engineering roles. He has direct experience in building analytic applications spanning the retail, supply chain, pricing optimization and IoT spaces. Sergio has worked at multiple technology start-ups, including APL and Splice Machine, where he helped guide product design and field messaging. He joined Imply as a Developer Advocate in 2021 to help, and learn from, the Apache Druid open-source community.

Build Real-time Analytic Applications: The Easy Way.

Shubhro Jyoti Roy

Shubhro Roy is a Staff Engineer and Tech Lead on the Search Team at Box. His team is responsible for powering search and discovery capabilities for Box, which involves running and maintaining a petabyte-scale search index on Solr. Prior to Box, he was building query engines for the Database group at Oracle. He has been working on distributed systems and information retrieval for 10+ years after graduating from Carnegie Mellon with a master's degree in Information Systems and Machine Learning.

Architecting Solr indexing pipelines in Google Cloud Platform

Simona Meriam

Simona Meriam is a Senior Data Engineer at Aidoc, where she specializes in research and development of solutions for big data infrastructures. In her previous position as a Big Data Engineer at Nielsen, she researched and developed big data solutions using cutting-edge technologies such as Spark, Kafka, and Elasticsearch. In her spare time she enjoys talking, especially about music that you'll probably think is weird, Japan and data.

Logging Apache Spark - How we made it easy

Sophie Watson

Sophie is a Technical Marketing Manager at NVIDIA, where she strives to make Data Scientists' everyday lives easier. Sophie has focused on applying her data science and statistics skills to solving business problems and informing next-generation infrastructure for intelligent application development.

Luxuries, necessities, and the challenges that remain: some experiences with accelerated data science

Steven Kolawole

Steven Kolawole's technical skill set cuts across Data Science and Software Engineering, with a bias toward ML research these days. His research interests focus on machine learning that is efficient in terms of computational resources and that works with low-resource or limited labeled data.

He has been heavily involved in a variety of fields, including ML Engineering, Software Engineering, Data Engineering, Data Science/Analytics, and Cloud Computing.

Steven is also big on knowledge sharing through community mentorship and collective growth, open-source development, meetup facilitation, public speaking, technical writing and research, and he gets a kick out of helping tech muggles find their feet.

Open Science: Building Models Like We Build Open-Source Software

Thomas Fricke

Kubernetes Security in Critical Infrastructure
Founder of Resility, Endocode, Inög

Open Source Enthusiast

Optimizing Containers for Security and Scaling

Timo Walther

Timo Walther is a long-term committer and PMC member of the Apache Flink project. He studied Computer Science at TU Berlin. Alongside his studies, he participated in the Database Systems and Information Management Group there and worked at IBM Germany. Timo worked as a software engineer and lead of the SQL team at Ververica. In Flink, he is working on various topics in the Table & SQL ecosystem.

Changelog Stream Processing with Apache Flink

Tomáš Neubauer

Tomas Neubauer is cofounder and CTO at Quix, responsible for the technical direction of the company across the full technical stack and working as a technical authority for the engineering team. He was previously technical lead at McLaren, where he led architecture uplift for Formula One racing real-time telemetry acquisition. He later led platform development outside motorsport, reusing the know-how he gained from racing.

Live build: How to harness streaming data in real time to track, transform and build on heart rate data

Umesh Dangat

Umesh Dangat is a Principal Engineer and Group Tech Lead for the market engineering platform at Yelp. Umesh joined Yelp in 2015 and has since architected and led the evolution of Yelp’s ranking infrastructure into its third generation. This group at Yelp is responsible for providing search and ranking infrastructure that serves all of Yelp’s search and ranking needs in a cost-efficient, scalable and extensible way.

Prior to Yelp, Umesh worked at various companies for over a decade, mostly solving search, streaming and data ingestion problems for large datasets and building backend systems.

Umesh is also an open-source contributor to popular search projects like Elasticsearch, learning to rank and, most recently, Nrtsearch.

NrtSearch: Yelp’s fast, scalable, and cost-effective open source search engine

Uwe Schindler

Uwe is a committer and PMC member of Apache Lucene and Apache Solr. His main focus is on development of Lucene Core. He implemented fast numerical search and maintains the new attribute-based text analysis API. He studied Physics at the University of Erlangen-Nuremberg and works as managing director for SD DataSolutions GmbH in Bremen, Germany, a company that provides consulting and support for Apache Lucene, Elasticsearch, and Apache Solr. He also works for “PANGAEA – Publishing Network for Geoscientific & Environmental Data”, where he implemented the portal's geo-spatial retrieval functions with Lucene Java. Uwe has given talks about Lucene at various international conferences such as previous editions of Berlin Buzzwords, ApacheCon EU/US, Lucene Revolution, Lucene Eurocon, and at various local meetups.

The future of Lucene's MMapDirectory: Why use it and what's coming with Java 19 and later?

Vincent Bréhin

Search addict since 2006, Vincent has contributed to many search engine integrations and Search Based Applications. With Adelean (http://www.adelean.com), he helps customers find their way to effective Search, from the business case up to JVM tuning.

The life of a search engine administrator

William Benton

William Benton is passionate about making it easier for machine learning practitioners to benefit from advanced infrastructure and making it possible for organizations to manage machine learning systems. His recent roles have included defining product strategy and professional services offerings related to data science and machine learning, leading teams of data scientists and engineers, and contributing to many open source communities related to data, ML, and distributed systems. Will was an early advocate of building machine learning systems on Kubernetes and developed and popularized the “intelligent applications” idiom for machine learning systems in the cloud. He has also conducted research and development related to static program analysis, language runtimes, cluster configuration management, and music technology.

Luxuries, necessities, and the challenges that remain: some experiences with accelerated data science

Yunus Lutz

Yunus is a Data Scientist at Otto, where he works on bringing Otto’s first learning to rank model into production. Prior to joining Otto, he worked as a Data Scientist and Engineer at Deloitte, where he developed pragmatic and data-driven solutions for various clients. He holds an M.Sc. in Quantitative Economics with a focus on statistics and time series analysis.

Offline Ranking Validation - Predicting A/B Test Results

Álvaro Panizo

Data Science & Search Product Owner and developer at Empathy. Over the last few years I have worked in all areas of search, from relevance to data science and from pure backend work to managing merchandiser and customer needs.
I have experience with high-availability systems using k8s across different cloud providers. When it comes to search and search intelligence, I usually have fun with technologies like Spark or Elasticsearch, but I also love multidisciplinary teams with knowledge of the whole development process (CI/CD, metrics, performance...).

As for the real me, I like travelling, being (more or less) healthy and plants, but... who doesn't?
If you see me around, please buy me a drink.

Running Apache Spark on K8s: From AWS EMR to K8s