MCPG-ME
Monte Carlo Policy Gradient MAP-Elites: A synergy between Deep Reinforcement Learning and Quality Diversity Algorithms.

Summary
Quality-Diversity (QD) optimization is a family of evolutionary algorithms designed to produce a diverse set of high-performing solutions to a given problem. MAP-Elites (Mouret & Clune, 2015) is a simple yet effective QD optimization algorithm that has been applied successfully in a variety of fields: it has been used to teach robots how to adapt to damage (Cully et al., 2015; Vassiliades et al., 2016; Tarapore et al., 2016), to generate aerodynamic designs (Gaier et al., 2017), and to create content for games (Alvarez et al., 2017).

MAP-Elites, however, is driven by Genetic Algorithms: it often struggles to evolve neural networks with many parameters and navigates the search space of the optimization problem inefficiently. Actor-critic based QD algorithms, such as PGA-MAP-Elites (Nilsson & Cully, 2021) and DCRL (Faldor et al., 2024), can handle more complex models efficiently, but they suffer from slow execution times and depend heavily on the effectiveness of the actor-critic training, which compromises scalability. To address these challenges, this work introduces the Monte Carlo Policy Gradient MAP-Elites (MCPG-ME) algorithm, which uses a Monte Carlo Policy Gradient (MCPG) based variation operator to apply quality-guided mutations to the solutions. This variation operator lets MCPG-ME optimize solutions independently, without relying on any actor-critic training, improving scalability and runtime efficiency while maintaining competitive sample efficiency.
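The official MCPG-ME implementation is not yet released, so the snippet below is only a rough, hypothetical sketch of what a REINFORCE-style (Monte Carlo policy gradient) variation operator can look like. It assumes a linear Gaussian policy and a gym-style environment; the function name, policy class, and hyperparameters are illustrative assumptions, not the paper's method.

```python
# Hypothetical sketch of a Monte Carlo policy gradient (REINFORCE-style)
# variation operator -- NOT the official MCPG-ME implementation.
import numpy as np

def mcpg_mutate(theta, env, n_episodes=4, sigma=0.1, lr=1e-2):
    """Quality-guided mutation: estimate the policy gradient from Monte
    Carlo rollouts and take one ascent step on ``theta``.

    Assumes a linear Gaussian policy a ~ N(theta @ s, sigma^2 I) and a
    gym-style ``env`` with reset()/step(); both are assumptions here.
    """
    grad = np.zeros_like(theta)
    for _ in range(n_episodes):
        states, actions, rewards = [], [], []
        s, done = env.reset(), False
        while not done:
            a = theta @ s + sigma * np.random.randn(theta.shape[0])
            s_next, r, done, _ = env.step(a)
            states.append(s); actions.append(a); rewards.append(r)
            s = s_next
        # Reward-to-go return G_t for every timestep of the episode.
        returns = np.flip(np.cumsum(np.flip(np.asarray(rewards))))
        for s_t, a_t, g_t in zip(states, actions, returns):
            # G_t * grad_theta log N(a_t | theta @ s_t, sigma^2 I)
            grad += g_t * np.outer(a_t - theta @ s_t, s_t) / sigma**2
    return theta + lr * grad / n_episodes  # the mutated offspring
```

Inside a MAP-Elites loop, an operator of this kind would replace the purely random Genetic Algorithm mutation, nudging each sampled parent toward higher fitness before the offspring is evaluated and inserted into the archive.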
Results
Evaluations across a range of continuous-control locomotion tasks demonstrate that MCPG-ME is:
- Fast: it achieves high execution speeds, running significantly faster than actor-critic based QD algorithms, in some cases up to nine times faster.
- Sample efficient: it surpasses the state-of-the-art algorithm, DCRL, on some tasks and consistently outperforms all other baselines on most tasks.
- Scalable: it shows promising scalability when subjected to massive parallelization.
- Diverse: it excels at finding a diverse set of solutions, achieving an equal or higher diversity score (coverage) than all competitors on all tasks.
In these tasks, a solution is a policy controlling the robot's walking behavior. The quality of a solution measures how effectively the robot completes the given task, and the diversity of solutions denotes the range of distinct walking styles the robot can successfully employ; coverage, the fraction of filled cells in the behavior archive, serves as the diversity score (see the sketch below).
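As the paper is not yet out, the following is only a minimal, generic sketch of how these QD metrics are typically computed for a MAP-Elites grid archive; the array layout and function name are assumptions, not the official evaluation code.

```python
# Generic sketch of the standard QD metrics over a MAP-Elites archive;
# layout and names are assumptions, not the MCPG-ME experimental setup.
import numpy as np

def qd_metrics(fitnesses, n_cells):
    """``fitnesses``: shape (n_cells,), best fitness per descriptor cell,
    with -inf marking cells that no solution has reached yet."""
    filled = fitnesses > -np.inf
    coverage = filled.sum() / n_cells    # diversity score in [0, 1]
    # QD score sums elite fitnesses (often offset to be non-negative),
    # rewarding quality and diversity jointly.
    qd_score = fitnesses[filled].sum()
    max_fitness = fitnesses[filled].max() if filled.any() else None
    return coverage, qd_score, max_fitness
```

Some interesting solutions from the experiments are shown below: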

A hopping Walker

Ant going around the trap

A fast Walker
For technical details of the method and the official results, please stay tuned. The paper is coming soon!
Author
- Konstantinos Mitsides (konstantinos.mitsides23@imperial.ac.uk)
References
2024
- Faldor, M., Chalumeau, F., Flageat, M., & Cully, A. Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement Learning. ACM Transactions on Evolutionary Learning and Optimization, 2024.
2021
- Nilsson, O., & Cully, A. Policy Gradient Assisted MAP-Elites. In Proceedings of the Genetic and Evolutionary Computation Conference, Lille, France, 09–15 Jun 2021.
2017
- Gaier, A., Asteroth, A., & Mouret, J.-B. Aerodynamic Design Exploration through Surrogate-Assisted Illumination. In 18th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, 2017.