Apache StreamPipes for Pythonistas: IIoT data handling made easy!
2023-04-17, B09

The industrial environment offers many interesting use cases for data enthusiasts, with a myriad of challenges that data scientists can solve.
However, collecting industrial data in general, and industrial IoT (IIoT) data in particular, is cumbersome and unappealing for anyone who just wants to work with data.
Apache StreamPipes addresses this pitfall and allows anyone to extract data from IIoT data sources without messing around with (old-fashioned) protocols. In addition, StreamPipes' newly developed Python client now gives Pythonistas the ability to access and work with this data programmatically in a Pythonic way.

This talk will provide a basic introduction to the functionality of Apache StreamPipes itself, followed by a deeper discussion of the Python client. Finally, a live demo will show how IIoT data can be easily retrieved in Python and used directly for visualization and ML model training.


The industrial environment is becoming an increasingly attractive field for data enthusiasts, with challenges ranging from predictive maintenance to robotics to autonomous vehicles.
Building a full-fledged IIoT architecture is a big endeavor, especially for small and medium-sized companies with limited resources. It requires IIoT specialists with extensive knowledge of industrial protocols, software architects capable of designing an IIoT platform, and cloud specialists able to operate an infrastructure at scale that can handle potentially massive data streams. However, the added value lies not in the technical infrastructure, but in the data itself. Therefore, it should be as easy as possible for data scientists to analyze data and gain new insights without worrying about the underlying technical details. In practice, such an undertaking has many pitfalls, and many projects are never even started because the costs seem too high. These pitfalls are addressed by Apache StreamPipes, an end-to-end toolbox that allows anyone to easily extract, explore, and analyze IIoT data. With its new Python client, it targets Python data enthusiasts (e.g., data scientists) who want to work with IIoT data but don't want to get their hands dirty interacting with industrial systems.
Via an easy-to-use Python client, developers can get streaming or historic data from StreamPipes' internal data management layer in a Pythonic representation such as dictionaries or pandas DataFrames. This allows data scientists to work with their familiar tech stack and use the extracted data directly for analytics, visualizations, or even machine learning. StreamPipes handles all the infrastructure, such as the message broker and time-series storage, and provides many out-of-the-box features that ease data analytics of industrial sources: more than 20 data adapters for quick access to a variety of industrial protocols, built-in pre-processing rules to harmonize sensor and other data on the fly, and a pipeline editor featuring over 100 algorithms and a rich user interface for interactively building data processing pipelines.
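To give a feeling for what working with such a dictionary representation can look like, here is a minimal sketch using only the Python standard library. It does not call the StreamPipes client at all: the events, their field names (`timestamp`, `sensor_id`, `temperature`), and the helper functions `to_columns` and `mean_by` are all made up for illustration.

```python
from statistics import mean

# Hypothetical events, shaped like plain dictionaries.
# Field names and values are invented for this sketch and do not
# come from a real StreamPipes instance.
events = [
    {"timestamp": 1681718400000, "sensor_id": "flow-01", "temperature": 71.0},
    {"timestamp": 1681718401000, "sensor_id": "flow-01", "temperature": 72.0},
    {"timestamp": 1681718402000, "sensor_id": "flow-02", "temperature": 65.0},
    {"timestamp": 1681718403000, "sensor_id": "flow-01", "temperature": 73.0},
]


def to_columns(rows):
    """Pivot a list of event dicts into a column-oriented dict of lists."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def mean_by(rows, group_key, value_key):
    """Average `value_key` separately for each distinct `group_key` value."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row[value_key])
    return {name: mean(values) for name, values in groups.items()}


columns = to_columns(events)
mean_temperature = mean_by(events, "sensor_id", "temperature")
print(mean_temperature)
```

If pandas is available, the same list of dictionaries can be passed straight to `pandas.DataFrame(events)` to obtain a tabular view for plotting or model training.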

Apache StreamPipes is a large and mature open source project that started as a research project in 2015 and became an Apache top-level project in November 2022, with a community of currently more than 25 active contributors.

The talk will provide a basic introduction to Apache StreamPipes, followed by a deeper discussion of the Python client that focuses on the target audience (Python developers). The main part is about data handling with Python, and the design decisions made within the client for common patterns will be discussed in detail.

To conclude, we will show how IIoT data can be extracted via Apache StreamPipes and used for further analytics within the Python world. Attendees will become familiar with Apache StreamPipes in general, its mission, and its core modules. In addition, common IIoT patterns will be presented and illustrated using the Python client of Apache StreamPipes. The presentation includes an extensive demo with many hands-on examples.


Expected audience expertise: Domain:

None

Expected audience expertise: Python:

None

Abstract as a tweet:

Data enthusiasts love to play with IIoT data. However, the technical challenges remain high (e.g., connecting to devices). @StreamPipes makes this easy by providing a self-service toolbox. In this talk, we introduce a new Python module to work with IIoT data in a Pythonic way.

Public link to supporting material:

https://github.com/bossenti/pycon-23-streampipes-pythonistas

See also: slide deck (3.7 MB)

Tim Bossenmaier works as a Data Engineer at inovex. There he develops and builds modern data infrastructures in customer projects, from streaming ETL pipelines to data catalogs. He is also a developer and member of the project management committee of Apache StreamPipes, an open source solution for IoT data analysis.

I study Applied Artificial Intelligence at the Offenburg University of Applied Sciences and I am very interested in Data Science and AI. During my internship at the startup Bytefabrik.AI in Karlsruhe, I came into contact with Apache StreamPipes and became a committer for the project. I work on the Python integration, which enables easy access to live data streams that can be quickly connected through StreamPipes.