Continuous Control with MuZero

Paper | Poster

During my bachelor thesis at TU Delft, I explored how action sampling strategies affect the performance of Sampled MuZero, a model-based reinforcement learning algorithm designed for continuous control tasks such as robotics.

Unlike discrete domains (e.g., chess or Go), continuous actions, such as torques applied to robot joints, cannot be enumerated. Sampled MuZero addresses this by sampling a fixed number of candidate actions from a distribution β during Monte Carlo Tree Search (MCTS). However, little was known about how the choice of β or the use of progressive widening influences performance.
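
To make the sampling idea concrete, here is a minimal Python sketch. It is not the code from the thesis. It assumes β is a diagonal Gaussian clipped to the action bounds [-1, 1], and it uses a common form of progressive widening in which a node visited n times may hold at most ⌈c · n^α⌉ child actions. The function names, the six action dimensions, and the values of `c` and `alpha` are all illustrative assumptions.

```python
import math
import random


def sample_candidate_actions(mean, std, num_samples, rng, low=-1.0, high=1.0):
    """Draw candidate actions from a diagonal Gaussian beta (an assumed choice).

    Each action is a list of floats, one per action dimension, clipped to
    the interval [low, high].
    """
    candidates = []
    for _ in range(num_samples):
        action = [min(high, max(low, rng.gauss(m, s))) for m, s in zip(mean, std)]
        candidates.append(action)
    return candidates


def max_children(visit_count, c=1.0, alpha=0.5):
    """Progressive widening limit: ceil(c * n^alpha), with at least one child.

    c and alpha are hypothetical constants chosen for illustration.
    """
    return max(1, math.ceil(c * visit_count ** alpha))


# Illustrative usage: 8 candidates for a 6-dimensional action space.
rng = random.Random(0)
candidates = sample_candidate_actions(
    mean=[0.0] * 6, std=[0.5] * 6, num_samples=8, rng=rng
)

# Without progressive widening, all 8 candidates are available at once.
# With it, the number of usable candidates grows with the node's visit count.
for visits in (1, 4, 16):
    print(visits, max_children(visits))
```

In this sketch, a fixed sample set gives every node the same number of candidates from its first visit, whereas progressive widening starts with one candidate and adds more only as a node accumulates visits.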

If you’re interested in the full technical details, including results on the Brax HalfCheetah benchmark and further experiments, you can read the paper here: