MCPG-ME

Monte Carlo Policy Gradient MAP-Elites: A synergy between Deep Reinforcement Learning and Quality-Diversity Algorithms.

The MCPG-ME method employs two distinct variation operators within the standard MAP-Elites loop: (1) \(\textbf{Isoline Variation}\), which mutates a parent genotype using the genotype of a second, randomly selected elite as a direction; (2) \(\textbf{MCPG Variation}\), in which the behavior of a parent genotype is evaluated over a trajectory of states previously encountered by a second genotype. This behavior is then adjusted with \(\boldsymbol{P}\), based on quality metrics estimated from comparisons with the second genotype's behavior, and the adjusted behavior is mapped back into genotype space using \(\boldsymbol{J}\) to mutate the parent genotype.
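As a concrete illustration of the first operator, here is a minimal NumPy sketch of an isoline-style variation, assuming the common iso+line formulation; the function name and the sigma values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def isoline_variation(x1, x2, iso_sigma=0.005, line_sigma=0.05, rng=None):
    """Isoline variation (sketch): perturb parent genotype x1 with isotropic
    Gaussian noise plus noise directed along the line toward a second elite
    x2. The sigma defaults here are illustrative, not the paper's values."""
    rng = np.random.default_rng() if rng is None else rng
    iso_noise = iso_sigma * rng.standard_normal(x1.shape)        # isotropic term
    line_noise = line_sigma * rng.standard_normal() * (x2 - x1)  # directional term
    return x1 + iso_noise + line_noise
```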

Summary

Quality-Diversity (QD) optimization is a family of evolutionary algorithms designed to produce a diverse set of high-performing solutions for a given problem. MAP-Elites (Mouret & Clune, 2015) is a simple yet effective QD optimization algorithm that has found success in a variety of fields. It has been used to teach robots how to adapt to damage (Cully et al., 2015; Vassiliades et al., 2016; Tarapore et al., 2016), to generate aerodynamic designs (Gaier et al., 2017), and to create content for games (Alvarez et al., 2017). MAP-Elites, however, is driven by Genetic Algorithms: it often struggles to evolve neural networks with numerous parameters and navigates the search space inefficiently. Actor-critic based QD algorithms, like PGA-MAP-Elites (Nilsson & Cully, 2021) and DCRL (Faldor et al., 2024), can handle more complex models efficiently, but suffer from slow execution times and depend heavily on the effectiveness of the actor-critic training, which compromises scalability. Addressing these challenges, this work introduces the Monte Carlo Policy Gradient MAP-Elites (MCPG-ME) algorithm, which utilizes a Monte Carlo Policy Gradient (MCPG) based variation operator to apply quality-guided mutations to the solutions. This novel variation operator allows MCPG-ME to optimize solutions independently, without relying on any actor-critic training, enhancing scalability and runtime efficiency while maintaining competitive sample efficiency.
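To make the overall scheme concrete, below is a minimal sketch of a MAP-Elites loop into which such variation operators plug. All names are illustrative assumptions: `evaluate` stands in for a task rollout that returns a fitness and the behavior-descriptor cell of a genotype, and `variation_ops` could hold the isoline and MCPG operators described above.

```python
import numpy as np

def map_elites(evaluate, init_genotypes, variation_ops, n_iterations, seed=0):
    """Minimal MAP-Elites loop (illustrative sketch, not MCPG-ME itself).
    `evaluate(genotype)` -> (fitness, cell), where cell indexes the archive
    via the behavior descriptor; `variation_ops` holds two-parent operators."""
    rng = np.random.default_rng(seed)
    archive = {}  # cell -> (fitness, genotype); keeps one elite per cell

    def try_insert(genotype):
        fitness, cell = evaluate(genotype)
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, genotype)

    for genotype in init_genotypes:  # seed the archive with initial solutions
        try_insert(genotype)

    for _ in range(n_iterations):
        elites = [genotype for _, genotype in archive.values()]
        i, j = rng.integers(len(elites), size=2)         # two random parents
        op = variation_ops[rng.integers(len(variation_ops))]
        try_insert(op(elites[i], elites[j]))

    return archive
```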

Results

Evaluations across various continuous-control locomotion tasks demonstrate that MCPG-ME is:

  • Fast: Runs significantly faster than the actor-critic based QD algorithms, in some cases up to nine times faster.

  • Sample Efficient: Surpasses the state-of-the-art algorithm, DCRL, in some of the tasks and consistently outperforms all the other baselines in most of the tasks.

  • Scalable: Demonstrates promising scalability under massive parallelization.

  • Novel Solutions: Excels at finding a diverse set of solutions, achieving a diversity score (coverage) equal to or higher than that of all competitors in all tasks (see the metrics sketch after this list).
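
The diversity score referenced above is the archive coverage. A hedged sketch of these standard QD metrics, reusing the `archive` dictionary from the MAP-Elites sketch above and assuming the total number of descriptor cells is known:

```python
def archive_metrics(archive, n_cells):
    """Standard QD metrics (illustrative sketch). Coverage is the fraction
    of descriptor cells filled, the QD score is the sum of elite fitnesses
    (meaningful when fitnesses are non-negative), and max fitness is the
    best single elite. `n_cells` is the archive's total cell count."""
    fitnesses = [fitness for fitness, _ in archive.values()]
    coverage = len(archive) / n_cells
    qd_score = sum(fitnesses)
    max_fitness = max(fitnesses) if fitnesses else float("-inf")
    return coverage, qd_score, max_fitness
```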

In these tasks, a solution corresponds to a walking behavior for the robot. The quality of a solution is how effectively the robot completes a given task, and the diversity of solutions is the range of different walking styles the robot can successfully employ. Some interesting solutions are shown below:

A jumping Walker

A hopping Walker

Ant going around the trap

A fast Walker

For technical details of the method and the official results, please stay tuned. The paper is coming soon!

Author

  • Konstantinos Mitsides (konstantinos.mistides23@imperial.ac.uk)

References

2024

  1. Maxence Faldor, Félix Chalumeau, Manon Flageat, and Antoine Cully
    Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement Learning, 2024

2021

  1. Olle Nilsson and Antoine Cully
    Policy Gradient Assisted MAP-Elites. In Proceedings of the Genetic and Evolutionary Computation Conference, Lille, France, 2021

2017

  1. Adam Gaier, Alexander Asteroth, and Jean-Baptiste Mouret
    Aerodynamic Design Exploration through Surrogate-Assisted Illumination. In 18th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, 2017

2016

  1. Vassilis Vassiliades, Konstantinos Chatzilygeroudis, and Jean-Baptiste Mouret
    Using Centroidal Voronoi Tessellations to Scale Up the Multidimensional Archive of Phenotypic Elites Algorithm, Oct 2016
  2. Danesh Tarapore, Jeff Clune, Antoine Cully, and Jean-Baptiste Mouret
    How Do Different Encodings Influence the Performance of the MAP-Elites Algorithm?, 2016

2015

  1. Jean-Baptiste Mouret and Jeff Clune
    Illuminating Search Spaces by Mapping Elites. arXiv:1504.04909, Apr 2015
  2. Antoine Cully, Jeff Clune, Danesh Tarapore, and Jean-Baptiste Mouret
    Robots That Can Adapt Like Animals. Nature, May 2015