JuliaCon 2024

RxEnvironments: Reactive multi-agent environments
07-12, 16:00–16:30 (Europe/Amsterdam), REPL (2, main stage)

In reinforcement learning problems, interactions between an agent and its environment are simulated in discrete time steps: after a pre-determined amount of time passes, the agent and environment exchange actions and observations, the environment generates new observations for the agent, and the cycle repeats. RxEnvironments.jl drops this assumption and uses a reactive programming framework to model agent-environment interactions, allowing interactions at any time and natively supporting multi-agent environments.


Introduction

At the core of reinforcement learning and self-organizing systems are agent-environment interactions, which form the framework for designing agents that learn by interacting with an external environment. In traditional reinforcement learning implementations, the agent and environment share a synchronized clock and communicate at a predetermined frequency. While this framework is convenient, it imposes a fixed temporal discretization on the communication protocol between the agent and the environment. Furthermore, because the environmental transition function usually takes the agent's action as an argument, such environments cannot natively simulate multiple agents acting on the same environment. The package RxEnvironments.jl alleviates these shortcomings by employing a reactive programming strategy to model agent-environment interactions. Users specify how agents and environments react to different stimuli and how internal states change in the absence of external stimuli. In this paradigm, the agent and environment can handle infrequent stimuli at any time and react accordingly. Furthermore, by modeling the impact of each individual interaction, environments written in RxEnvironments generalize to multi-agent settings. The reactive behavior of RxEnvironments is implemented using Rocket.jl.
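To give a flavor of this paradigm, the sketch below loosely follows the thermostat-style example from the package documentation: the user specifies how the environment reacts to an agent's action, what it emits back to the agent, and how its state drifts when no stimuli arrive. The names RxEnvironments.receive!, RxEnvironments.what_to_send, RxEnvironments.update!, RxEnvironment, and add! are assumptions based on the package README and may differ in the current release.

```julia
using RxEnvironments

# A stateless agent and an environment with hidden internal state.
struct ThermostatAgent end

mutable struct Thermostat
    temperature::Float64
end

# How the environment incorporates an action from the agent (called reactively on arrival).
function RxEnvironments.receive!(env::Thermostat, agent::ThermostatAgent, action::Float64)
    env.temperature += action
end

# What the environment emits to a subscribed agent.
RxEnvironments.what_to_send(agent::ThermostatAgent, env::Thermostat) = env.temperature

# How the environment evolves when no stimuli arrive (it slowly cools down).
function RxEnvironments.update!(env::Thermostat, elapsed_time)
    env.temperature -= 0.1 * elapsed_time
end

# Wrap the environment and subscribe an agent to it.
environment = RxEnvironment(Thermostat(20.0))
agent = add!(environment, ThermostatAgent())
```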

Markov blankets

RxEnvironments is inspired by the Active Inference community: every entity, whether agent or environment, is separated from its surroundings by a Markov blanket. By implementing interactions on these Markov blankets, we fully specify how agents and environments incorporate incoming stimuli and determine the data to emit to subscribers. This separates the internal states of the agent and the environment from the observable states that are received and emitted on the Markov blanket.
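Independently of the package's concrete API, this separation can be read as follows: an entity's internal state stays private, and the only functions that cross the blanket are one that incorporates incoming stimuli and one that produces the data emitted to subscribers. A minimal plain-Julia sketch with hypothetical names (incorporate!, emit), not the RxEnvironments API:

```julia
# Conceptual sketch of a Markov blanket interface (hypothetical names).
mutable struct Entity{S}
    internal_state::S   # hidden from other entities
end

# Stimuli cross the blanket inward: update the internal state from an incoming stimulus.
incorporate!(e::Entity, stimulus) = (e.internal_state += stimulus)

# Data crosses the blanket outward: compute what subscribers are allowed to see.
emit(e::Entity) = e.internal_state + randn()  # e.g. a noisy reading, never the raw state

e = Entity(0.0)
incorporate!(e, 1.5)
observation = emit(e)   # subscribers only ever see this, not internal_state directly
```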

The power of reactivity

Reactivity is at the core of the philosophy of RxEnvironments. One consequence is that the single transition function of reinforcement learning is decomposed into two separate functions in RxEnvironments: a state transition function that models the behavior of the environment in the absence of any stimuli, and a function that describes how actions affect the environment's internal state. By calling the state transition function continually, and calling the action incorporation function reactively whenever an action is observed, the environment is simulated even in the absence of stimuli, or when an agent emits multiple actions in a short period. This decomposition also natively enables multi-agent environments: since the action is not part of the transition over time and the environment knows how to react to any incoming action, we can add multiple agents to the same environment and it will behave accordingly.
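A self-contained sketch of this decomposition (plain Julia, illustrating the idea rather than the package's internals): the state transition function is applied continually as time passes, while the action incorporation function is only invoked when an action arrives, from whichever agent sent it.

```julia
# Conceptual sketch of the decomposed transition function.
mutable struct World
    temperature::Float64
end

# 1) State transition in the absence of stimuli: applied continually as time passes.
time_evolution!(w::World, dt) = (w.temperature -= 0.1 * dt)

# 2) Action incorporation: applied reactively whenever any agent emits an action.
apply_action!(w::World, agent_id, action) = (w.temperature += action)

# Toy driver: actions from any number of agents arrive asynchronously on a channel.
actions = Channel{Tuple{Int,Float64}}(32)
put!(actions, (1, 2.0))   # agent 1 acts
put!(actions, (2, -0.5))  # agent 2 acts shortly after

world = World(20.0)
dt = 0.01
for _ in 1:100
    time_evolution!(world, dt)      # the world keeps evolving with or without actions
    while isready(actions)          # incorporate whatever actions arrived meanwhile
        agent_id, a = take!(actions)
        apply_action!(world, agent_id, a)
    end
end
```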

Contrast with traditional frameworks

The impact of splitting the transition function becomes apparent when comparing with popular reinforcement learning frameworks such as OpenAI Gym. In Gym, the environmental simulation can only proceed when presented with an action from an agent, which fixes both the number of agents and the amount of time simulated by each call to the transition function. By introducing reactivity and splitting the transition function, we are not bound by these constraints: the environmental simulation keeps running until an agent conducts an action, and the environment reacts to this action by updating its internal state. This allows more complex agent-environment interactions, within a framework that is closer to real-world interaction.
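For contrast, a Gym-style lockstep loop hard-wires a single agent and a fixed amount of simulated time per transition; the environment cannot advance without an action. The sketch below uses hypothetical reset!/step! names that mirror the Gym convention rather than any particular Julia package:

```julia
# Gym-style lockstep interaction for contrast (conceptual sketch, hypothetical names).
mutable struct ToyEnv
    temperature::Float64
end

reset!(env::ToyEnv) = (env.temperature = 20.0; env.temperature)

# One transition: the action is incorporated and a fixed, pre-determined
# amount of time is simulated, all in the same call.
function step!(env::ToyEnv, action::Float64)
    env.temperature += action
    env.temperature -= 0.1
    return env.temperature
end

# The classic loop: one agent, exactly one action per step,
# and no environmental progress between step! calls.
function run_episode!(env::ToyEnv, policy; n_steps = 100)
    obs = reset!(env)
    for _ in 1:n_steps
        obs = step!(env, policy(obs))
    end
    return obs
end

run_episode!(ToyEnv(20.0), obs -> obs < 21.0 ? 0.5 : -0.5)
```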

Conclusions

RxEnvironments implements a different paradigm for simulating self-organizing agents in an environment than is common in the reinforcement learning literature. By relaxing the constraints on the communication protocol between agents and environments, we can model a wider range of environments. A reactive programming approach keeps this paradigm computationally feasible while providing useful byproducts, such as native support for multi-agent environments.

PhD student @ Eindhoven University of Technology
