
Divergent Search


[In progress]

Deception actively leads search away from the global optimum, often by causing premature convergence to local optima in the search space. Numerous approaches have been proposed to discourage this behavior, as surveyed by Lehman et al. (2013). Many diversity maintenance techniques, such as speciation (Stanley and Miikkulainen, 2002) and niching (Wessing et al., 2013), enforce local competition among similar solutions. Similarity can be measured at the genotypic level (Goldberg et al., 1987), on fitness scores (Hu et al., 2005), or on the age of individuals (Hornby, 2006). An alternative way of exploring the search space is coevolution, where the fitness of an individual depends on the current population (Angeline and Pollack, 1994); competition between individuals in the same population ideally leads to an arms race towards better solutions and provides a better gradient for search. However, coevolution runs the risk of mediocre stalemates, where all competitors perform poorly and cannot improve, or of one competitor becoming so much better than the others that no gradient can be found (Ficici and Pollack, 1998). An interesting alternative is proposed by Hutter and Legg (2006), where selection pressure is distributed uniformly across sparsely populated fitness regions. Such a method has the advantage of not biasing the search towards highly fit areas of the search space, and can therefore counter the effect of deception.

Another way to counter deceptive landscapes is to use hand-crafted (or hand-shaped) objective functions. The idea, named incremental learning (Elman, 1993), is to design multiple objective functions of increasing difficulty: the algorithm first learns the task from the easier objectives and then progressively learns more complex and interesting tasks. However, this approach requires strong human involvement and an in-depth knowledge of the task to solve, which is not always available in a black-box scenario. Techniques from multi-objective evolutionary algorithms can, at least in theory, explore the search space more effectively by evaluating individuals on more than one measure of quality (Knowles et al., 2001), and thus avoid local optima by attempting to improve other objectives; however, multi-objective optimization cannot guarantee to circumvent deception (Deb, 1999). Moreover, optimizing for multiple objectives does not ensure a more manageable problem (Brockhoff et al., 2007), as the number of non-dominated solutions can grow with the number of objectives (Purshouse and Fleming, 2007). Further, multi-modal function optimization (MMFO) can be employed (Goldberg et al., 1987; Mahfoud, 1995). For instance, the niching algorithm NEA2 by Preuss (2015) uses nearest-better clustering to spread several parallel populations over the multiple local optima present in the fitness landscape. However, Lehman et al. (2013) argue that in perversely deceptive problems, such as maze navigation and biped robot locomotion, genotypic diversity is not sufficient, as all the possible “right” innovation steps towards the global optimum are punished by the optimization process.
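To make the notion of local competition concrete, below is a minimal sketch of fitness sharing in the spirit of Goldberg et al. (1987). The function names, the niche radius `sigma_share`, and the exponent `alpha` are illustrative assumptions, not details taken from any of the cited works.

```python
# Minimal fitness-sharing sketch: individuals that are close to many
# others (within the niche radius) have their fitness deflated, which
# enforces local competition among similar solutions.

def shared_fitness(population, raw_fitness, distance, sigma_share=1.0, alpha=1.0):
    """Return fitness values deflated by how crowded each niche is.

    population   -- list of genotypes
    raw_fitness  -- objective fitness value of each genotype
    distance     -- problem-dependent distance between two genotypes
    sigma_share  -- niche radius: individuals closer than this share fitness
    """
    shared = []
    for i, ind in enumerate(population):
        # Niche count: sum of the sharing function over all individuals.
        niche_count = 0.0
        for other in population:
            d = distance(ind, other)
            if d < sigma_share:
                niche_count += 1.0 - (d / sigma_share) ** alpha
        # niche_count >= 1 because each individual is in its own niche.
        shared.append(raw_fitness[i] / niche_count)
    return shared
```

Selection then operates on the shared fitness values, so densely populated peaks stop dominating the population and sparsely populated regions remain competitive.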

Divergent search tackles the problem directly by rewarding diversity at the phenotypic level: a shift of paradigm is proposed where rewarding behavioral diversity becomes the predominant driver of evolution. Divergence can therefore tackle a deceptive fitness landscape or a deceptive fitness characterization by rewarding diverse behaviors (Lehman and Stanley, 2011a), which may eventually lead to optimal results. As a popular example of divergent search methods, novelty search by Lehman and Stanley (2011a) is inspired by open-ended evolution and rewards behaviors that have not been seen previously during search. Like novelty search, surprise search approaches can be framed within this paradigm (Yannakakis and Liapis, 2016; Gravina et al., 2016b, 2017c). In particular, surprise can be seen as a proxy measure that guides the search towards the stepping stones needed to find the global solution of a given (deceptive) problem.

Novelty search

Novelty search (NS) (Lehman and Stanley, 2011a) differs from previous approaches to handling deceptive problems in that it explicitly ignores the objective of the problem it attempts to solve. While traditional convergent approaches provide control mechanisms, modifiers or alternate objectives that complement the gradient search towards a better solution, novelty search motivates exploration of the search space by rewarding individuals that are phenotypically (or behaviorally) different, without considering whether they are objectively “better” than others. Novelty search is different from a random walk, however, as it explicitly provides higher rewards to more diverse solutions and also maintains a memory of the areas of the search space it has previously explored. The latter is accomplished via a novelty archive of past novel individuals, to which individuals with a high novelty score are continuously added. Each individual’s novelty score is the average distance from a number of closest neighbors in the behavioral space; neighbors can be members of the current population or of the novelty archive (see the figure below). The distance measure is problem-dependent and can bias the search (Lehman and Stanley, 2011a), and thus affect the performance and behavior of the novelty search algorithm: examples include the agents’ final positions in a two-dimensional maze solving task, the position of a robot’s center of mass (Lehman and Stanley, 2011a), properties of images such as brightness and symmetry (Lehman and Stanley, 2012), machine-learned encodings (Liapis et al., 2013), or the amount of collected reward (Risi et al., 2009, 2010).

The novelty score is computed as the average distance to the n closest individuals in the current population or in an archive of novel solutions. In every generation, individuals with a novelty score above a fluctuating threshold are added to a novelty archive that persists throughout the evolutionary run. The distance d, used both to compute the novelty score and to determine an individual’s closest neighbors, is based on the difference between the behaviors (rather than the genotypes) of individuals. This allows novelty search to explore a diverse set of behaviors without explicitly favoring behaviors closer to the desired behavior, i.e., the solution of the problem.

Figure: A high-level diagram of the novelty search algorithm. The novelty score is computed for every individual in the population by averaging the distance from the closest neighbors collected from the current population and an archive. Image inspired by Liapis (2014).
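For concreteness, the following sketch shows one way the novelty score and archive update described above could be implemented. The neighbor count `k`, the threshold handling, and all function names are illustrative assumptions, not a reference implementation from Lehman and Stanley (2011a).

```python
import math

def novelty_score(behavior, population_behaviors, archive, distance, k=15):
    """Average behavioral distance to the k nearest neighbors drawn from
    the current population and the novelty archive."""
    others = [b for b in population_behaviors if b is not behavior] + list(archive)
    dists = sorted(distance(behavior, b) for b in others)
    nearest = dists[:k]
    if not nearest:  # no neighbors yet: treat as maximally novel
        return float("inf")
    return sum(nearest) / len(nearest)

def update_archive(population_behaviors, archive, distance, threshold, k=15):
    """Add individuals whose novelty exceeds the (fluctuating) threshold.

    Scores are computed first so that additions within one generation
    do not affect each other's novelty.
    """
    scores = [novelty_score(b, population_behaviors, archive, distance, k)
              for b in population_behaviors]
    for b, score in zip(population_behaviors, scores):
        if score > threshold:
            archive.append(b)

# Example of a problem-dependent behavioral distance for a maze task:
# Euclidean distance between the agents' final (x, y) positions.
def final_position_distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])
```

In an evolutionary run, `novelty_score` would replace the objective fitness during selection, which is what makes the search divergent rather than convergent.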
