DevConf.CZ

Manuel Dewald

Principal Site Reliability Engineer, Red Hat OpenShift

Manuel completed his studies in applied computer science with a Master's degree in Heidelberg in 2013 and started working for SAP shortly thereafter. He was involved in projects using SAP's Cloud Foundry platform and, in that role, contributed to BOSH, the Cloud Foundry lifecycle manager. In 2019, he decided to broaden his open source contributions and joined Red Hat as a Site Reliability Engineer for Red Hat OpenShift. He is interested in working with and combining all kinds of technology to build cool new things, striving to make people's lives easier.


Session

06-15
11:00
35min
Making Sense of Metrics: Crafting and Leveraging Prometheus Metrics for Infrastructure Intelligence
Manuel Dewald

Audience
This talk is aimed at System Administrators and Site Reliability Engineers who want to learn how to make sense of the Prometheus metrics their systems expose. If you know PromQL, but the queries behind your dashboards are still a mystery to you, you are not alone. This talk will show how to extract information from your metrics to gain more insight and make data-based decisions.

Outline
Creating new metrics and collecting them with Prometheus is easier today than ever before. Site Reliability Engineers and System Administrators have all the data at hand that they need to make the right, data-based decisions. But how?
Making sense of all that information is still a challenge. Crafting the right PromQL query to answer your question and capturing it in a Grafana dashboard is a complex and time-consuming task, not to mention understanding that query when you need to change it a few weeks later.

In this session, you will see different approaches to making sense of the Prometheus metrics exposed by a software deployment: starting from the default Prometheus UI, via PromLens, an improved, open source query-building UI, all the way to an experiment on transforming Prometheus metrics into a data warehouse for improved data exploration and visualization. Data analysts have used business intelligence software for decades. What can we learn from these systems to discover knowledge in the ocean of metrics and make better decisions for our infrastructure?

Key Takeaways
After this talk, attendees should have learned (1) how best to explore and query the available metrics in their environment, (2) which tools are available today, and (3) how infrastructure intelligence can leverage data warehouse concepts for improved knowledge discovery and decision making.

Cloud, Hybrid Cloud, and Hyperscale Infrastructure
D105 (capacity 300)