Keynote - Towards Learned Database Systems
2023-04-18 , Kuppelsaal

Database Management Systems (DBMSs) are the backbone for managing large volumes of data efficiently and thus play a central role in business and science today. For providing high performance, many of the most complex DBMS components such as query optimizers or schedulers involve solving non-trivial problems. To tackle such problems, very recent work has outlined a new direction of so-called learned DBMSs where core parts of DBMSs are being replaced by machine learning (ML) models which has shown to provide significant performance benefits. However, a major drawback of the current approaches to enabling learned DBMS components is that they not only cause very high overhead for training an ML model to replace a DBMS component but that the overhead occurs repeatedly which renders these approaches far from practical. Hence, in this talk, I present my vision of Learned DBMS Components 2.0 to tackle these issues. First, I will introduce data-driven learning where the idea is to learn the data distribution over a complex relational schema. In contrast to workload-driven learning, no large workload has to be executed on the database to gather training data. While data-driven learning has many applications such as cardinality estimation or approximate query processing, many DBMS tasks such as physical cost estimation cannot be supported. I thus propose a second technique called zero-shot learning which is a general paradigm for learned DBMS components. Here, the idea is to train model


Database Management Systems (DBMSs) are the backbone for managing large volumes of data efficiently and thus play a central role in business and science today. For providing high performance, many of the most complex DBMS components such as query optimizers or schedulers involve solving non-trivial problems. To tackle such problems, very recent work has outlined a new direction of so-called learned DBMSs where core parts of DBMSs are being replaced by machine learning (ML) models which has shown to provide significant performance benefits. However, a major drawback of the current approaches to enabling learned DBMS components is that they not only cause very high overhead for training an ML model to replace a DBMS component but that the overhead occurs repeatedly which renders these approaches far from practical. Hence, in this talk, I present my vision of Learned DBMS Components 2.0 to tackle these issues. First, I will introduce data-driven learning where the idea is to learn the data distribution over a complex relational schema. In contrast to workload-driven learning, no large workload has to be executed on the database to gather training data. While data-driven learning has many applications such as cardinality estimation or approximate query processing, many DBMS tasks such as physical cost estimation cannot be supported. I thus propose a second technique called zero-shot learning which is a general paradigm for learned DBMS components. Here, the idea is to train models that generalize to unseen data sets out of the box. The idea is to train a model that has observed a variety of workloads on different data sets and can thus generalize. Initial results on the task of physical cost estimates suggest the feasibility of this approach. Finally, I discuss further opportunities which are enabled by zero-shot learning.


Expected audience expertise: Domain:

Novice

Expected audience expertise: Python:

Novice

Abstract as a tweet:

ML and DBMSs? Carsten talks about data-driven learning where the idea is to learn the data distribution over a complex relational schema.

Carsten Binnig is a Full Professor in the Computer Science department at TU Darmstadt and a Visiting Researcher at the Google Systems Research Group. Carsten received his Ph.D. at the University of Heidelberg in 2008. Afterwards, he spent time as a postdoctoral researcher in the Systems Group at ETH Zurich and at SAP working on in-memory databases. Currently, his research focus is on the design of scalable data systems on modern hardware as well as machine learning for scalable data systems. His work has been awarded a Google Faculty Award, as well as multiple best paper and best demo awards.