Persistent and performant ingestion using Data Prepper
2024-05-06, Moscow

We will discuss how you can use Data Prepper pipelines to improve ingestion into OpenSearch, focusing on some of its features to persist data during failures. We will also share suggestions for improving the performance of ingestion into your OpenSearch cluster.


Data Prepper is a last-mile data collector for ingestion into OpenSearch, with a number of capabilities to help users reliably move their data into OpenSearch. We will discuss the Kafka buffer, which uses Apache Kafka to persist data before writing to OpenSearch and can simplify your pipeline creation. We will also show how dead-letter queues ensure your data is saved when a transient error in OpenSearch prevents ingestion, and how end-to-end acknowledgements provide delivery guarantees for pull-based sources. Finally, we will share some of our future plans for Data Prepper and how they will improve on these solutions.
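As a taste of what the talk covers, these features are set up in a Data Prepper pipeline definition. The sketch below is illustrative only: the topic name, ports, hosts, and file path are placeholders, and exact option names can vary between Data Prepper versions, so consult the Data Prepper documentation for your release.

```yaml
log-pipeline:
  source:
    http:
      port: 2021                        # placeholder port for the HTTP source
  buffer:
    kafka:                              # Kafka buffer: persists data before the sink writes it
      bootstrap_servers: ["localhost:9092"]
      topics:
        - name: data-prepper-buffer     # placeholder topic; must exist in your Kafka cluster
          group_id: data-prepper
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]
        index: application-logs
        # Dead-letter queue: documents that fail to index are written
        # here instead of being dropped
        dlq_file: /var/log/data-prepper/dlq.json
```

For pull-based sources such as the s3 source, end-to-end acknowledgements can be enabled with an `acknowledgments: true` option on the source, so the source only acknowledges data after the sink has successfully written it (again, check the documentation for your version).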

David is a senior software engineer working on observability in OpenSearch at Amazon Web Services. He is a maintainer on the Data Prepper project. Prior to working at Amazon, he was the CTO at Allogy Interactive - a start-up creating mobile-learning solutions for healthcare.
