2024-04-24, B05-B06
Reinforcement learning (RL) has great potential for industrial applications, but few mature software frameworks exist to facilitate its use. This talk discusses efforts to improve the software landscape for RL, making it easier for researchers to contribute algorithms and for engineers to apply RL in real-world settings. Specifically, we highlight the open-source library Tianshou, which provides high-level interfaces for painless RL application development along with lower-level APIs that cater to the needs of researchers. By improving RL software, we aim to accelerate research progress and expand RL adoption in industry.
Despite the very general applicability of reinforcement learning (RL) to a wide variety of decision and control problems, comparatively few industrial applications of it exist today. Moreover, many important developments emerging from the highly active RL research community never make it into existing frameworks or libraries. Code written for successful RL applications in industry is also rarely contributed to open source software (OSS). This is in stark contrast to other areas of machine learning (ML), where reported progress is often transferred to mature OSS within weeks, if not days.
Part of the reason behind this lamentable state may be the intrinsically higher complexity of RL compared to, say, supervised learning. However, we believe that the limited penetration of RL into mature software arises in large part because writing RL-based software is currently much harder than it has to be. Widely used OSS for RL is either too complex for researchers to contribute to (like ray/RLlib or Pearl), too buggy and unstable for industry to rely on (also RLlib), or too limited in scope (like stable-baselines3, which includes relatively few algorithms); other libraries lack high-level interfaces (like torch-rl) or give up on modularity entirely (like cleanRL).
Another reason is the difference in focus between RL research and applications. In research, an important goal is to find an algorithm that works well in a variety of environments, whereas in applications, one is usually interested in solving a particular environment of interest, by any means. This leads to wildly differing evaluation scenarios and selection criteria.
We believe that the current state of RL software is reminiscent of the pre-PyTorch/pre-Keras era of supervised deep learning, when a task like training a convolutional network on a large image dataset was non-trivial to implement. Today, it requires but a few lines of code. We therefore expect that significant progress in the software landscape supporting RL is still to be made, and that this progress will have a high impact on both researchers and ML engineers.
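To make the analogy concrete, here is a rough, illustrative sketch (not material from the talk) of how little code a modern framework such as Keras needs to train a small convolutional network; the dataset, architecture and hyper-parameters are arbitrary placeholders:

```python
# Illustrative only: training a small convolutional network in a few lines with Keras.
from tensorflow import keras
from tensorflow.keras import layers

# Load and normalize an image dataset (MNIST as a placeholder).
(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0

# Define a minimal convolutional network.
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation="softmax"),
])

# Compile and train.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=128)
```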
With this goal in mind, the appliedAI Institute for Europe, together with the core developers of the open source RL library Tianshou, took on the task of extending the latter in order to democratize RL in applications and accelerate reliable and trustworthy research on it. In this talk, we will highlight Tianshou’s high-level interfaces, which allow RL algorithms to be applied painlessly in industrial settings, as well as the lower-level interfaces that researchers can base their work on. Research code that is compatible with Tianshou’s interfaces will not only get mature evaluation, reporting and hyper-parameter optimization “for free”, but will also be much easier to use in applications, thereby boosting its impact. We will also address the question of environment design, a highly important RL engineering topic that is largely ignored in RL research.
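As a flavour of what working with Tianshou looks like, the sketch below follows the structure of the library’s public DQN quick-start example (the lower-level, procedural setup); exact class names and argument signatures differ between Tianshou versions, so treat it as an illustration rather than a definitive recipe:

```python
# Illustrative sketch, loosely following Tianshou's DQN quick-start example;
# names and signatures vary across Tianshou versions.
import gymnasium as gym
import torch
import tianshou as ts
from tianshou.utils.net.common import Net

task = "CartPole-v1"
env = gym.make(task)
train_envs = ts.env.DummyVectorEnv([lambda: gym.make(task) for _ in range(10)])
test_envs = ts.env.DummyVectorEnv([lambda: gym.make(task) for _ in range(10)])

# A small MLP Q-network and its optimizer.
net = Net(
    state_shape=env.observation_space.shape,
    action_shape=env.action_space.n,
    hidden_sizes=[128, 128],
)
optim = torch.optim.Adam(net.parameters(), lr=1e-3)

# DQN policy with n-step return estimation and a target network.
policy = ts.policy.DQNPolicy(
    model=net,
    optim=optim,
    action_space=env.action_space,
    discount_factor=0.99,
    estimation_step=3,
    target_update_freq=320,
)

# Collectors gather transitions from the vectorized environments.
train_collector = ts.data.Collector(
    policy, train_envs, ts.data.VectorReplayBuffer(20000, 10), exploration_noise=True
)
test_collector = ts.data.Collector(policy, test_envs, exploration_noise=True)

# Off-policy training loop with epsilon schedules and a stopping criterion.
result = ts.trainer.OffpolicyTrainer(
    policy=policy,
    train_collector=train_collector,
    test_collector=test_collector,
    max_epoch=10,
    step_per_epoch=10000,
    step_per_collect=10,
    update_per_step=0.1,
    episode_per_test=100,
    batch_size=64,
    train_fn=lambda epoch, env_step: policy.set_eps(0.1),
    test_fn=lambda epoch, env_step: policy.set_eps(0.05),
    stop_fn=lambda mean_rewards: mean_rewards >= 195,
).run()
```

The high-level interfaces highlighted in the talk are designed to wrap this kind of procedural setup further, so that an application developer mainly supplies an environment factory, hyper-parameters and callbacks, while evaluation, reporting and persistence are handled by the framework.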
Expected audience expertise (domain): Intermediate
Expected audience expertise (Python): Novice
Abstract as a tweet (X) or toot (Mastodon): Reinforcement learning (RL) has untapped potential for industry. This talk presents Tianshou, an open-source library with interfaces facilitating both industrial RL applications and new algorithm research, with the dual goals of accelerating progress and adoption.
Speaker bio: Mischa is a researcher with a background in physics and mathematics who decided to change course and go into AI (for the sake of falsifiability of ideas). On his path since then, he has worked on multiple projects in ML and data analysis and, as a bonus, gained some experience in DevOps and in developing production-grade solutions.