# List of Tables xi

Generational Parallel Varying Mutation GAs and their Applications

A dissertation submitted in partial ful?llment of the requirements for the degree of Doctor of Philosophy at Shinshu University

Hern? an Eduardo Aguirre Dur? an

March 2003 Shinshu University Japan

Copyright 2003 Hern? an E. Aguirre All Rights Reserved

Dedication

To my parents.

i

Contents

List of Figures List of Tables Abstract 1 Introduction 1.1 Evolutionary Algorithms . . . . . . 1.1.1 Genetic Algorithms . . . . . 1.1.2 Evolution Strategies . . . . 1.1.3 Evolutionary Programming . 1.2 Background . . . . . . . . . . . . . 1.3 Issues and Goals . . . . . . . . . . 1.4 Outline . . . . . . . . . . . . . . . vii xi xiii 1 2 2 3 4 4 5 6 9 10 10 11 11 12 13 15 16 17 19 20 21 22 23 27 28 28 28 30 31

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

2

Generational Varying Mutation Genetic Algorithms 2.1 A Canonical Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 A Generational Standard Varying Mutation Genetic Algorithm . . . . . . . . . . 2.3 A Generational Model of Parallel Varying Mutation Genetic Algorithm (GA-SRM) 2.3.1 Parallel Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3.2 Extinctive Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3.3 Mutation Rate Control . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3.4 Mutation Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3.5 Duplicates Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4 GA-SRM’s Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Test Problems Generators 3.1 0/1 Multiple Knapsacks Problems 3.1.1 Real World Problems . . . 3.1.2 Large Random Problems . 3.2 NK-Landscapes . . . . . . . . . .

3

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

4

Studying the Structure of the Parallel Varying Mutation GA-SRM 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . 4.3 Operators’ Balance . . . . . . . . . . . . . . . . . . . . . . . . 4.4 Mutation Probability in CM . . . . . . . . . . . . . . . . . . . . 4.5 Extinctive Selection Pressure . . . . . . . . . . . . . . . . . . . iii

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

4.6 4.7 4.8 4.9 4.10 4.11 4.12 4.13 5

SRM’s Adaptive Minimum Level τ . . . . . . . . . . . . . . . . . . Two-point and Uniform Crossover . . . . . . . . . . . . . . . . . . Impact of the Population Size . . . . . . . . . . . . . . . . . . . . . Search Ability and Number of Evaluations . . . . . . . . . . . . . . Contribution of Parallel Genetic Operators and Extinctive Selection Results for Real-World 0/1 Multiple Knapsack Problems . . . . . . Results for Random 0/1 Multiple Knapsack Problems . . . . . . . . Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

33 33 34 34 35 39 41 46

Comparing Standard and Parallel Varying Mutation GAs: Effect of the Major Components 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3 Simple GA and Extinctive Selection . . . . . . . . . . . . . . . . . . . . . . . . 5.4 Varying Mutation without Extinctive Selection . . . . . . . . . . . . . . . . . . 5.5 Extinctive Selection and Varying Mutation . . . . . . . . . . . . . . . . . . . . . 5.5.1 Deterministic Varying Mutation . . . . . . . . . . . . . . . . . . . . . . 5.5.2 Self-Adaptive Varying Mutation . . . . . . . . . . . . . . . . . . . . . . 5.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Studying the Behavior of GA-SRM on Epistatic Problems 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2 RBC+ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.3 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.4 Selection Pressure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.5 Parallel Varying Mutation, Mutation Strategy, and Patterns of Epistasis 6.6 Duplicates Elimination . . . . . . . . . . . . . . . . . . . . . . . . . 6.7 Eliminating Fitness Duplicates and Parallel Varying Mutation . . . . 6.8 No Crossover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.9 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

47 48 48 48 50 51 51 57 61 65 66 67 67 68 70 73 78 79 82 83 84 84 85 85 85 87 87 87 88 88 89 92 92 92 94 94

6

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

7

Distributed GA with Parallel Varying Mutation 7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.2 Distributed GA (Island Model) . . . . . . . . . . . . . . . . . . . . . . . . 7.3 DGA-SRM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.3.1 Communication Topology . . . . . . . . . . . . . . . . . . . . . . 7.3.2 CM and SRM in DGA . . . . . . . . . . . . . . . . . . . . . . . . 7.3.3 (?, λ) Proportional Selection in DGA . . . . . . . . . . . . . . . . 7.3.4 Migration Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.3.5 Algorithm of DGA-SRM . . . . . . . . . . . . . . . . . . . . . . . 7.4 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.5 Results and Discussion for Large Random 0/1 Multiple Knapsack Problems 7.5.1 Problem Dif?culty . . . . . . . . . . . . . . . . . . . . . . . . . . 7.5.2 Subpopulation Size . . . . . . . . . . . . . . . . . . . . . . . . . . 7.5.3 Migration Rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.5.4 Adaptation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.5.5 Extinctive Selection . . . . . . . . . . . . . . . . . . . . . . . . . 7.6 Results and Discussion for Real-World 0/1 multiple knapsack problems . . iv

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

7.7 8

Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 101 102 102 103 103 106 106 106 108 110 111 111 112 117 118 118 120 120 123 124 124 125 126 128 128 133 143 147 149 157

Real World Application: Halftone Image Generation 8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.2 Conventional Image Halftoning Technique Using GA . . . . . . . . . . 8.3 Accelerated Halftoning Scheme Using GA-SRM . . . . . . . . . . . . 8.3.1 Extension to Two Dimensional Image Halftoning Problem . . . 8.4 Experimental Results and Discussion . . . . . . . . . . . . . . . . . . . 8.4.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . 8.4.2 Performance Observation with Same Size Offspring Population Conventional Scheme . . . . . . . . . . . . . . . . . . . . . . 8.4.3 Effect of Population Size Reduction . . . . . . . . . . . . . . . 8.4.4 Effect of Parameters’ Balance for Offspring Creation . . . . . . 8.4.5 Other Benchmark Images . . . . . . . . . . . . . . . . . . . . 8.4.6 Generated Images . . . . . . . . . . . . . . . . . . . . . . . . . 8.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simultaneous Halftone Image Generation with Multiobjective GA-SRM 9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.2 Halftoning Problem with GAs . . . . . . . . . . . . . . . . . . . . . 9.3 Multiobjective Optimization (MO) . . . . . . . . . . . . . . . . . . . 9.4 Multiobjective GA-SRM for Halftoning Problem . . . . . . . . . . . 9.4.1 CM and SRM for Halftoning Problem . . . . . . . . . . . . 9.5 Experimental Results and Discussion . . . . . . . . . . . . . . . . . . 9.5.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . 9.5.2 Comparison Between Simple and Multiobjective GAs . . . . 9.5.3 Non-dominated Pareto Solutions . . . . . . . . . . . . . . . . 9.5.4 Effect of Information Sharing . . . . . . . . . . . . . . . . . 9.5.5 Dynamic Con?gurations for Further Improvement . . . . . . 9.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . Used . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . by . . . . . . . . . . . .

9

. . . . . . . . . . . .

. . . . . . . . . . . .

. . . . . . . . . . . .

. . . . . . . . . . . .

. . . . . . . . . . . .

. . . . . . . . . . . .

10 Conclusions Acknowledgements Bibliography Publications

v

List of Figures

1.1 2.1 2.2 2.3 2.4 2.5 2.6 3.1 3.2 The outline of an evolutionary algorithm . . . . . . . . . . . . . . . . . . . . . . Canonical Genetic Algorithm . . . . . . . . . . . . . . . . . . Genetic Algorithm with Varying Mutation . . . . . . . . . . . Deterministic Hyperbolic Schedule for Mutation Rate Control ADS (Adaptive Dynamic-Segment) mutation. . . . . . . . . . ADP (Adaptive Dynamic-Probability) mutation. . . . . . . . . GA with Parallel Varying Mutation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 10 11 14 16 16 17 20

Knapsack Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (3) (3) An example of the ?tness function f3 (x3 , z1 , z2 ) associated to bit x3 in which (3) x3 epistatically interacts with its left and right neighboring bits, z1 = x2 and (3) z2 = x4 (N = 8, K = 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Operators’ balance and search ability. . . . . . . . . . . . . . . . . . . . . . . . CM’s mutation probability and search ability (elimination of ?tness duplicates is off). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SRM’s offspring number after extinctive selection. . . . . . . . . . . . . . . . . Extinctive selection pressure and search ability. . . . . . . . . . . . . . . . . . . SRM’s minimum level and search ability. . . . . . . . . . . . . . . . . . . . . . Average ?tness in 100 runs of the best-so-far individual. . . . . . . . . . . . . . Con?guration transition: GA(50,100) to GA-SRM(50,100). . . . . . . . . . . . . Con?guration transition: GA-SRM(50,100) to all SRM or all CM. . . . . . . . . Diversity and search ability. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Reducing the feasible region φ = {0.75, 0.50, 0.25}, m = 30, n = 100. . . . . . Increasing the number of constraints m = {5, 10, 30}, n = 100, φ = 0.25. . . . . Increasing the search space n = {100, 250, 500}, m = 30, φ = 0.25 . . . . . . . Transition from GA(50,100) to GA-SRM(50,100) at various fractions of T . . . . Transitions from GA-SRM(50,100) to either all CM(50,100) or all SRM(50,100) regimes at = {2, 3} . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SRM’s contribution to the parent population (?SRM ) and number of bits actually (CM ) = {0, 1/n} . . . . ?ipped by SRM (b) for CM’s mutation probabilities of pm (CM ) = {0, 1/n} . . . . . Fitness Transition for CM’s mutation probabilities of pm Effect of Extinctive Selection on a simple GA: Fitness transition over the generations by cGA and GA(?, λ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Effect of Extinctive Selection on a simple GA: M’s contribution to the parent population (?M ) in GA(?, λ) (pc = 0.6) . . . . . . . . . . . . . . . . . . . . . . . . vii

24 30 31 32 33 34 37 37 38 38 42 42 43 44 44 45 45

4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 4.10 4.11 4.12 4.13 4.14 4.15 4.16 5.1 5.2

49 49

5.3 5.4 5.5 5.6 5.7 5.8 5.9

Deterministic Varying Mutation without Extinctive Selection . . . . . . . . . . . Deterministic Varying Mutation Serial to Crossover (m = 30, n = 100, φ = 0.25) Deterministic Varying Mutation Parallel to Crossover (m = 30, n = 100, φ = 0.25) Convergence Reliability of Deterministic Varying Mutation GAs: Reducing the feasible region φ = {0.75, 0.50, 0.25}, m = 30, n = 100. . . . . . . . . . . . . . Convergence Reliability of Deterministic Varying Mutation GAs: Increasing the number of constraints m = {5, 10, 30}, n = 100, φ = 0.25. . . . . . . . . . . . Convergence Reliability of Deterministic Varying Mutation GAs: Increasing the search space n = {100, 250, 500}, m = 30, φ = 0.25. . . . . . . . . . . . . . .

(t=0) pm (i)

50 52 52 53 53 54 58 58 59 59 60 60 61 69 71 71 72 72 73 73 74 74 75 75 77 77

= pmax (m = 30, Self-Adaptive Varying Mutation Serial to Crossover , m n = 100, φ = 0.25). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (t=0) 5.10 Self-Adaptive Varying Mutation Parallel to Crossover, pm (i) = pmax m (m = 30, n = 100, φ = 0.25). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (t=0) = 0.5 for hGA and 5.11 Convergence Velocity (m = 30, n = 100, φ = 0.25). pm (t=0) GA-hM. pm (i) = rand[1/n, 0.5] for sGA and GA-SM . . . . . . . . . . . .

(t=0) pm

= 0.5 for 5.12 Average Number of Flipped Bits (m = 30, n = 100, φ = 0.25). (t=0) hGA and GA-hM. pm (i) = rand[1/n, 0.5] for sGA and GA-SM . . . . . . . 5.13 Convergence Reliability of Self-Adaptive Varying Mutation GAs: Reducing the feasible region φ = {0.75, 0.50, 0.25}, m = 30, n = 100. . . . . . . . . . . . . . 5.14 Convergence Reliability of Self-Adaptive Varying Mutation GAs: Increasing the number of constraints m = {5, 10, 30}, n = 100, φ = 0.25. . . . . . . . . . . . 5.15 Convergence Reliability of Self-Adaptive Varying Mutation GAs: Increasing the search space n = {100, 250, 500}, m = 30, φ = 0.25. . . . . . . . . . . . . . . 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9 6.10 6.11 6.12 6.13 Higher selection pressure. Landscapes with random patterns of epistasis. . . . . . Nearest neighbor patterns of epistasis: Mean ?tness after 104 generations for N = 48, K = 0, · · · , N ? 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Random patterns of epistasis: Mean ?tness after 104 generations for N = 48, K = 0, · · · , N ? 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nearest neighbor patterns of epistasis: Mean ?tness after 104 generations for N = 96, K = 0, · · · , N ? 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Random patterns of epistasis: Mean ?tness after 104 generations for N = 96, K = 0, · · · , N ? 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nearest neighbor patterns of epistasis: Fitness transition of best so far, N = 48, K = 12, 104 generations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Random patterns of epistasis: Fitness transition of best so far, N = 48, K = 12, 104 generations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nearest neighbor patterns of epistasis: Fitness transition of best so far, N = 96, K = 24, 105 generations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Random patterns of epistasis: Fitness transition of best so far, N = 96, K = 24, 105 generations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Duplicates elimination: Higher selection pressure. . . . . . . . . . . . . . . . . . Duplicates elimination: Effect of number of evaluations (mean ?tness after 2 × 105 and 2 × 106 evaluations). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Offspring ranking by GA(?, λ) (K = 12). . . . . . . . . . . . . . . . . . . . . . Offspring ranking by GAed(?, λ) (K = 12). . . . . . . . . . . . . . . . . . . . . viii

6.14 Accumulated ?tness of the unique genotype offspring, which is re?ected in the selection probabilities of GA(?, λ) (K = 12). An ensemble of clones is treated as one individual and its ?tness is the accumulated ?tness of the clones in the ensemble. 6.15 Number of eliminated duplicates. . . . . . . . . . . . . . . . . . . . . . . . . . . 6.16 Parallel varying mutation and mutation strategy on NK-Landscapes with N = 96 and random epistatic pattern . Duplicates elimination feature is turned on . . . . 6.17 Parallel varying mutation and mutation strategy on NKP-Landscapes with N = P = 96 and nearest neighbor epistatic pattern. Duplicates elimination feature is turned on . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.18 No recombination, M-SRMed (100,200) . . . . . . . . . . . . . . . . . . . . . . 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9 7.10 7.11 7.12 +1+2 communication topology. . . . . . . . . . . . . . . . . . . . . . . . . . . Migration policy and extinctive selection. . . . . . . . . . . . . . . . . . . . . . Algorithm of DGA-SRM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tightness of the Capacities φ (K = 16, 10% migration, m = 30, n = 100). . . . Objects n (K = 16, 10% migration, m = 30, φ = 0.25). . . . . . . . . . . . . . Knapsacks m (K = 16, 10% migration, n = 100, φ = 0.25). . . . . . . . . . . . Subpopulation size λ, K (10% migration, m = 30, n = 100, φ = 0.25). . . . . . Migration Rate λm /λ (K = 16, m = 30, n = 100, φ = 0.25). . . . . . . . . . . SRM’s adaptation in DGA-SRM (K = 16, m = 30, n = 100, φ = 0.25). . . . . Effect of Extinctive Selection (K = 16, m = 30, n = 100, φ = 0.25). . . . . . . Fitness Transition (K = 16, m = 30, n = 100, φ = 0.25, M = 20). . . . . . . . Effect of the Migration Interval: Four subpopulations (K = 4), migration rate λm /λ = 10%. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.13 Effect of the Migration Interval: Sixteen subpopulations (K = 16), migration rate λm /λ = 10%. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.14 Fitness transition over the generations (K = 16) . . . . . . . . . . . . . . . . . . 7.15 Fitness transition over time (K = 16) . . . . . . . . . . . . . . . . . . . . . . . 8.1 8.2 8.3 8.4 8.5 8.6 8.7 8.8 8.9 8.10 8.11 8.12 8.13 9.1 9.2

78 78 80

80 81 86 87 88 90 90 91 92 93 93 95 95 98 98 99 99

Illustration of 2D Crossover . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 Adaptive Dynamic-Block mutation (ADB) . . . . . . . . . . . . . . . . . . . . . 105 Bit swapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 Illustration of bit swapping process for qualitative mutation (4 × 4 mutation block) 106 cGA and GA-SRM’s performance using same size offspring population . . . . . 107 Mutation block’s side length reduction and SRM-ADB offspring that survive selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 Performance by cGA with different populations sizes . . . . . . . . . . . . . . . 108 Performance by GA-SRMf (quantitative mutation) with different population sizes 109 Performance by GA-SRMs (qualitative mutation) with different population sizes 109 Performance with different parameter balance for offspring creation (? = 2 and λ = 4 con?guration) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 cGA and GA-SRMs’s performance for “Moon Surface” and “Aerial” ((?, λCM , λSRM ) = (2, 2, 2) con?guration in GA-SRMs) . . . . . . . . . . . . . . . . . . . . . . . . 111 Original and generated halftone images (“Girl”) . . . . . . . . . . . . . . . . . . 113 Original and generated halftone images (“Moon Surface” and “Aerial”) . . . . . 115 Block diagram of the extended multiobjective GA-SRM . . . . . . . . . . . . . . 121 Algorithm to simultaneously generate N halftone images with the extended multiobjective GA-SRM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 ix

moGA-SRM’s average parent population distribution after 0.1T (Lenna) . . . . . 129 moGA-SRM’s average parent population distribution after T evaluations (Lenna) 129 Error transition for various ω n (Lenna) . . . . . . . . . . . . . . . . . . . . . . . 130 Error transition for various ω n using GA-SRM D? (2, 44) (Lenna) running for 0.70T (Lenna) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 9.7 Lennas’s original and simultaneously generated halftone images by moGA-SRMD? (2,44) after 0.70T . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 9.8 Girls’s original and simultaneously generated halftone images by moGA-SRMD? (2,44) after 0.70T . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 9.9 Aerial’s original and simultaneously generated halftone images by moGA-SRMD? (2,44) after 0.70T . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 9.10 Moon’s original and simultaneously generated halftone images by moGA-SRMD? (2,44) after 0.70T . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141 9.3 9.4 9.5 9.6

x

List of Tables

3.1 3.2 4.1 4.2 4.3 4.4 4.5 4.6 4.7 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 Real world 0/1 multiple knapsack test problems. . . . . . . . . . . . . . . . . . . 7 Subclasses of problems (10 random problems in each subclass) . . . . . . . . . Genetic algorithms parameters. . . . . . . . . . . . . . . . . . . . . . . . . . . . Results for Weing 7 using GA-SRM with different population sizes. . . . . . . . Results for Weing 7 using GA-SRM(50,100) with ADS under various evaluation times. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Knapsack test problems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Results for various knapsack test problems. . . . . . . . . . . . . . . . . . . . . Results by Khuri et al. [52]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Results for Weing 7 using other mutation schedules [31]. . . . . . . . . . . . . . Factorial ANOVA for hGA and GA-hM . . . . . . . . . . . . . . . . . . . . . . Error Gap Means by hGA and GA-hM reducing the feasible region (ratio φ) . . . Error Gap Means by hGA and GA-hM increasing the number of constraints (knapsacks m) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Error Gap Means by hGA and GA-hM increasing the search space (number of objects n) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Factorial ANOVA for sGA and GA-sM . . . . . . . . . . . . . . . . . . . . . . Error Gap Means by sGA and GA-sM reducing the feasible region (ratio φ) . . . Error Gap Means by sGA and GA-sM increasing the number of constraints (knapsacks m) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Error Gap Means by sGA and GA-sM increasing the search space (number of objects n) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . GAs Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 22 29 35 36 40 40 40 40 55 55 56 56 62 62 63 63 68 89 89 96 96 97 124 125 127 128

6.1 7.1 7.2 7.3 7.4 7.5

Genetic algorithms parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . Results by single population cGA and GA-SRM . . . . . . . . . . . . . . . . . . Results for Weing7 (105,2) by cGA and GA-SRM using different population sizes (T = 8 × 105 ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Results for Weing7 (105,2) by DGA and DGA-SRM (λtotal = 800, T = 8 × 105 ). Results for other problems by DGA and DGA-SRM (K = 16, λtotal = 800, T = 4 × 105 ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Genetic algorithms parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . Evaluations needed to generate high quality images by cGA(200) . . . . . . . . . Obtained Pareto front (Lenna) . . . . . . . . . . . . . . . . . . . . . . . . . . . Actual percentage of evaluations expended in each direction by the GAs (Lenna) xi

9.1 9.2 9.3 9.4

9.5

Reduced evaluations to generate high quality images by moGA-SRM(2,44) with dynamic con?gurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

xii

Abstract

Evolutionary algorithms (EAs) describe computer-based problem solving systems that use computational models of evolutionary processes as primary elements of its design. Global search abilities, adaptation to the task in hand, and robust performance are favorable characteristics of EAs making them successful solving complex optimization problems. This work focuses on genetic algorithms (GAs), one of the mainstreams of EAs. Canonical GAs use a binary representation for the individuals, i.e. bit strings with {0,1} alphabet. Offspring are generated by randomized process intended to model recombination and mutation. Crossover exchanges segments of information between pairs of individuals with probability pc and then mutation inverts bits with a probability pm per bit. In the absence of crossover (1 ? pc ) mutation is applied alone. The canonical GA emphasizes crossover as the main genetic operator and considers mutation only as a secondary “background” operator to be used with very small probability. In addition, a common practice has been to run the GA with its search strategies ?xed. A run of a GA, however, is itself an intrinsically dynamic, adaptive process, and there is experimental evidence suggesting that different strategies might be optimal at different stages of the evolutionary process. Methods that modify its search strategies during the run of the algorithm are considered as one of the most important and promising areas of research in evolutionary algorithms, because they could lead to signi?cant improvements in performance. Among these, some approaches that seek to combine crossover with (higher) varying mutation rates during the course of a run have been proposed recently. It has been shown that deterministically varying mutation rates over the generations and/or across the representation can improve the performance of GAs. Self-adaptive mutation rate schedules have also been proposed. Self-adaptive algorithms evolve simultaneously strategy parameters and object variables and are regarded as the most promising way of combining forms of control (strategy parameters co-adaptation). From the application of operators standpoint, deterministic, adaptive, and self-adaptive varying mutation GAs have been mostly designed similar to a canonical GA. That is, crossover is applied with high probability pc and then follows mutation. Thus, under these standard varying mutation approaches, higher mutations are mostly applied serial to crossover. This model of standard varying mutation GAs raises several important questions regarding the interference between crossover and high mutation, how this affects performance of the algorithm, whether this affect the mutation rate control itself in the case of (adaptive) self-adaptive varying mutation algorithms, and more generally whether this is an appropriate model for combining forms of control (co-adaptation of strategy parameters). This work proposes a model of parallel varying mutation GA that addresses the questions raised by the standard model of varying mutation GAs. The proposed model detaches varying mutation from crossover, applies “background” mutation (or none at all) after crossover (CM) and varying mutations (SRM) only parallel to crossover, putting the operators CM and SRM in a cooperative-competitive stand with each other by subjecting their offspring to extinctive selection. The model relies in adaptive (self-adaptive) mutation schedules to increase the effectiveness of SRM and enhance selection by eliminating ?tness duplicates, which postpones genetic drift and creates a fair competition between the offspring created by both operators. The motivation of this work comes from the desire to design effective and ef?cient varying mutation GAs that can be used in real-world applications. There are several advantages in the proposed model. First, it gives an ef?cient framework to xiii

achieve better balances for varying mutation and crossover in which the strengths of the operators can be kept without interfering one with the other. Second, since varying mutation is detached from crossover, the instantaneous effectiveness of the varying mutation operator depends only upon itself and its relative success can be directly tied to the mutation rate to create adaptive (selfadaptive) schemes for mutation rate control. The same can be said for crossover, especially if no “background” mutation is applied after it. Third, parallel varying mutation can be studied on its own seeking to increase the performance of GAs. Fourth, the individual roles and the interaction of crossover and varying mutation throughout the run of the algorithm can be better understood, which could be important for co-adaptation studies. The model is studied using two test problem generators. One of the generators is for 0/1 multiple knapsack problems, which allows to test the model on a broad range of classes of constrained problems by varying the feasible region of the search space, number of constraints, and the size of the search space. Real-world 0/1 multiple knapsack problems with known global optimum are also used. These latter problems allow studying the global search abilities of the algorithms. The second test problem generator is the well known Kauffman’s NK-Landscapes. NK-Landscapes are stochastically generated ?tness functions on bit strings, parameterized with N bits and K epistatic interactions between bits. The term epistasis describes nonlinearities in ?tness functions due to changes in the values of interacting bits. For a giving N , we can tune the ruggedness of the ?tness function by varying K . In the limits, K = 0 corresponds to a model in which there are no epistatic interactions and the ?tness contribution from each bit value is simply additive, which yields a single peaked smooth ?tness landscape. On the opposite extreme, K = N ? 1 corresponds to a model in which each bit value is epistatically affected by all the remaining bit values, yielding a maximally rugged fully random ?tness landscape. Varying K from 0 to N ? 1 gives a family of increasingly rugged multi- peaked landscapes. NK-Landscapes allow testing the model in a broad range of classes of epistatic, non-linear, problems. First, the internal structure of the proposed model (GA-SRM) is studied in depth using an adaptive schedule for mutation. Important structural issues are the balance for offspring creation between CM and SRM, the ratio between number of parents and number of offspring (extinctive selection pressure), “background” mutation probability in CM, and the threshold to trigger adaptation in SRM. The effect of population size and number of evaluations is observed, too. Two mutation strategies to select the bits that will undergo mutation are investigated. In addition, the importance and effect on performance of extinctive selection and the interaction of varying mutation parallel to crossover is assessed. It is found that GA-SRM greatly improves the performance of GAs for global optimization in constrained problems. Extinctive selection accelerates the search process and parallel varying mutation increase the convergence reliability of the algorithm. Robustness even with small populations is also a remarkable characteristic observed in the improved GA-SRM. Mutation strategy for parallel varying mutation turns out to be an important issue to improve the performance of parallel varying mutation. Second, the proposed model is compared with the standard model of varying mutations GAs across a broad range of problems using deterministic and self-adaptive schedules for mutation rate control. The statistical signi?cance of the results is veri?ed with ANOVA tests. It is found that the proposed model is more effective and ef?cient than the standard model. In deterministic varying mutation GAs, a GA with varying mutation parallel to crossover showed faster convergence and higher robustness to initial settings of mutation rate than a GA with varying mutation serial to crossover. In self-adaptive varying mutation GAs, the convergence velocity of a parallel self-adaptive mutation GA was matched by a serial self-adaptive mutation GA only when initial diversity of parameters was allowed. Convergence reliability of the parallel varying mutation model was higher than the standard model of varying mutation in both deterministic and self-adaptive xiv

GAs. It was also found that the standard model of varying mutations in fact affects negatively the (adaptive) self-adaptive mutation rate control. This strongly suggests that the standard model of varying mutation GAs may not be appropriate for combining forms of control. Then, the behavior of the parallel varying mutation GA-SRM is examined on epistatic problems using NK-Landscapes. Properties of NK-Landscapes are discussed and the effect on performance of selection, drift, mutation, and recombination is veri?ed. Mutation strategy for the varying mutation operator is also studied in detail. Experiments are conducted using NK-Landscapes with nearest neighbor and random patterns of epistasis. Comparisons are made with a canonical GA, a simple GA with extinctive selection, a mutation only EA, and a random bit climber RBC+. It is shown that GAs can be robust algorithms on NK-Landscapes postponing drift by eliminating ?tness duplicates and using selection pressure higher than a canonical GA. Different to previous works, even simple GAs with these two features perform better than the single bit climber RBC+ for a broad range of classes of problems. It is also shown that the interaction of parallel varying mutation with crossover (GA-SRM), similar to constrained problems, improves further the reliability of the GA. Contrary to intuition it is found that a mutation only EA can perform as well as GA-SRM that includes crossover for small values of K , where crossover is supposed to be advantageous; but the relative importance of crossover interacting with varying mutation increases with K performing better than mutation alone for medium and high K . Better overall performance by population based mutation only evolutionary algorithms over random bit climbers is also observed. With regards to mutation strategy for parallel varying mutation, it is found that a dynamic segment mutation strategy improves further the performance of GAs on problems with nearest neighbor patterns of epistasis. After analyzing the proposed model and comparing it with the standard model of varying mutation GAs, it is shown that the fundamental concept of the model can be successfully extended to other important classes of GAs and that it can be effectively applied to real-world problems. An important area of research is the parallelization of GAs. Evolutionary algorithms are population based methods and it is considered that its full potential would come from implementing the algorithm in parallel architectures. It is shown that the proposed model extended to a parallel distributed GA (DGA-SRM) achieves higher search speed, higher convergence reliability, and less communication cost for migration than a canonical distributed GA. It is also shown that DGASRM scales up better as the dif?culty of the problem increases and tolerates population reductions better than a canonical distributed GA. Next, it is shown that GA-SRM can be successfully applied to real world problems in which ef?ciency in processing time and computer memory is a major issue. The improved GA-SRM is extended to the two dimensional image halftoning problem and an accelerated image halftoning technique using GA-SRM with tiny populations is proposed. Simulation results verify that the proposed scheme impressively reduces computer memory and processing time, making the improved approach appealing for practical implementation. Furthermore, it is shown that the concept of GA-SRM can also be effective for multi-objective optimization of real world applications, which is important due to the multi-objective nature of most real-world problems. The improved GA-SRM is extended to a multi-objective optimization GA to simultaneously generate halftone images with various combinations of gray level precision and spatial resolution. Simulation results verify that the proposed scheme can effectively generate several high quality images simultaneously in a single run reducing even further the overall processing time. Finally, this work is summarized presenting conclusions and suggesting future research.

xv

xvi

Chapter 1

Introduction

This chapter presents a brief introduction to evolutionary algorithms. Then, it describes the background, motivation and goals of this research. Finally, it outlines the contents of this work.

1

1.1 Evolutionary Algorithms

An evolutionary algorithm describes a computer-based problem solving system that uses computational models of evolutionary processes as primary elements of its design and implementation. The origins of evolutionary inspired algorithms for optimization and machine learning can be traced to the 1950s and 1960s[1, 2, 3, 4, 5, 6]. In addition, several evolutionary biologists used computers to simulate evolution for the purpose of controlled experiments[7, 8, 2, 9, 10]. However, historically the three mainstream instances of evolutionary algorithms that have received considerably attention are Genetic Algorithms[11], Evolution Strategies[12, 13], and Evolutionary Programming[14]. These approaches have inspired the development of additional evolutionary algorithms such as genetic programming and classi?er systems. Moreover, hybridizations of evolutionary algorithms with other soft computing techniques, such neural networks and fuzzy systems, or with other search heuristics, such local search, tabu search, and simulated annealing, are intensely being developed. Although a variety of evolutionary algorithms have been proposed, all of them share the following general and basic properties: (i) Evolutionary algorithms utilize the collective learning process of a population of individuals. Each individual represents and encodes a search point in the space of potential solutions to a given problem. (ii) By means of evaluating individuals in their environment, a measure of quality or ?tness can be assigned to the individuals. According to the quality measure, a selection process favors ?tter individuals to reproduce more often than those that are relatively less quali?ed. (iii) Descendants of individuals are generated by randomized process intended to model mutation and recombination. Mutation corresponds to an erroneous self-replication of individuals and recombination interchanges information between two or more individuals. Figure 1.1 outlines a typical evolutionary algorithm (EA). A population P of individuals is initialized and then evolved from generation t to generation t + 1 by repeated application of ?tness evaluation, selection, recombination, and mutation. An evolutionary algorithm typically initializes its population randomly, although domain-speci?c knowledge can also be used. Evaluation measures the ?tness of each individual according to its worth in some environment (problem). Fitness evaluation can be as simple as computing a function or as complex as running an elaborate simulation. Selection is often performed in two steps, parent selection and survival. Parent selection decides who becomes parent and how many children the parents have; higher-?tness individuals are more likely to be parents and have more offspring. Offspring are created via recombination and mutation. Recombination exchanges information between parents and mutation further perturbs the offspring. The offspring are then evaluated. Finally the survival step decides who survives in the population. Evolutionary algorithms have been used in a large number of scienti?c and engineering problems and models. Some of the areas where evolutionary algorithms are being used are optimization, automatic programming, machine learning, economics, immune systems, ecology, population genetics, evolution and learning, and social systems[15].

1.1.1 Genetic Algorithms

Genetic Algorithms (GAs) were invented and developed by Holland[11]. Holland’s original goal was to formally study the phenomenon of natural adaptation and to develop ways in which its mechanism might be imported into computer systems. GAs were presented as an abstraction of biological evolution and derived its behavior from a genetic/evolutionary metaphor. Traditionally, GAs use a binary representation for the individuals (chromosomes or structures). 2

procedure EA(); begin t = 0; /* Initial Generation */ initialize population(P (t)); evaluate(P (t)); while (not termination condition) begin P (t) = select parents(P (t)); P (t) = recombine(P (t)); P (t) = mutate(P (t)); evaluate(P (t)); P (t + 1) = select survivors(P (t) ∪ Q); /* Q ∈ {?, P (t)} */ t = t + 1; /* Next Generation */ end end Figure 1.1: The outline of an evolutionary algorithm Recently, however, many applications have focused on other representations such integers, realvalued vectors, graphs (neural networks), Lisp expressions, and ordered lists. Selection is a probabilistic function based on relative ?tness. With this selection method, known as ?tness-proportional selection, the expected number of times an individual will be selected to reproduce is the individual’s ?tness divided by the average ?tness of the population. A simple method of implementing ?tness-proportional selection is roulette-wheel sampling[16]. The number of offspring created is the same as the number of parents ?. Later, in the survivors selection step, the ? newly created offspring will replace the ? parents in the population. This form of selection is not elitist1 . Offspring are created by recombination (crossover) of parent individuals with probability pc . After that, mutation is applied with a very small probability pm per bit. In its initial conception, GAs emphasize recombination (crossover) as the primary search operator and apply mutation solely as a “background operator”. Interest in mutation has increased recently, partly due to the in?uence of Evolution Strategies and Evolutionary Programming.

1.1.2 Evolution Strategies

Evolution strategies (ESs) were developed by Rechenberg[12], using selection, mutation, and a population of one parent and one offspring. Schwefel[13] introduced recombination and populations with more than one individual, and compared ESs with more traditional optimization techniques. Evolution strategies typically use real-valued, vector representations. Individuals to be parents are selected randomly from a uniform distribution. The number of offspring λ created is greater than the number of parents ?. The selection of survivors is deterministic and is implemented in one of two methods. The ?rst method selects the best ? out of λ offspring and replaces the parents with this newly created individuals. In other words, only the best ? offspring are allowed to survive. This method is known as a (?,λ) selection strategy. The second method selects the best ? individuals among ? parents and λ offspring. Thus, in this method both the best parents

1

Elitist: the best individual always survive, ensuring that once an optimum is found it cannot be lost.

3

and offspring are allowed to survive. This second method is known as a (? + λ) selection strategy. Both methods belong to the kind of extinctive (truncation) selection methods. (? + λ) selection is elitist but (?,λ) selection is not. Offspring are created by recombination (when ? > 1) of parent individuals followed by mutation. A variety of different recombination mechanisms are currently used in ESs and the operators are sexual and panmictic. In sexual operators, recombination acts on two randomly chosen parent individuals. Conversely, in panmictic operators, one parent is randomly chosen and held ?xed while for each component of its vectors the second parent is randomly chosen anew from the population. Mutation perturbs the individuals using a normal distribution with expectation zero. In ESs considerably effort has been put on (self) adapting the mutations during the run of the algorithm. ESs allow each individual (or each variable within the individual) to have adaptive mutation rates that are normally distributed with a zero expectation. ESs emphasize recombination and mutation as essential operators for searching simultaneously in the variables space and in the strategy parameters space (self-adaptation).

1.1.3 Evolutionary Programming

Evolutionary programming (EP) was developed by Fogel et al. [14]. EP arose from the desire to generate machine intelligence using selection and mutation to evolve ?nite-state machines. EP traditionally has used representations for the individual that are tailored to the problem domain. In the case of ?nite-state machine applications, the individuals within the population are represented as graphs. For other applications, appropriate representations such real-valued vectors and ordered lists are used. Selection is a probabilistic function based on tournament. The number of offspring created is the same as the number of parents ?. In the survivors selection step, ? individuals are chosen from the 2? (parents and offspring) individuals. This form of selection is elitist and can be considered to be a (? + ?) selection strategy. Offspring are created by mutation of parent individuals. The form of mutation is based on the representation used, and similar to ESs is often (self) adaptive. For real valued optimization problems, for example, it works with normally distributed mutations with expectation zero and extends the evolutionary process to the strategy parameters (self-adaptation). The forms of mutation used are usually quite ?exible and can produce perturbations similar to recombination, if desired. However, EP emphasizes mutation and does not incorporate the recombination of individuals. The justi?cation for this is that in EP each individual is usually viewed as the analog of a species, and there is no sexual recombination between species.

1.2 Background

This work concentrates on GAs. Canonical GAs use a binary representation for the individuals, i.e. bit strings with {0,1} alphabet. Crossover exchanges segments of information between pairs of chromosomes with probability pc , and mutation inverts bits with a probability pm per bit. Crossover and mutation are applied one after the other; in the absence of crossover (1 ? pc ), mutation is applied alone. Holland[11] de?ned crossover as the main genetic operator and considered mutation only as a “background” operator to be used with very small mutation rates. The role of crossover is to construct high order building blocks (hyperplanes) from low order ones; whereas the primary role of mutation is to replace allele values lost from the population, assuring that crossover has a full range of alleles so that the “adaptive plan” is not trapped on local optima. 4

Common settings recommended for crossover probability pc are 0.60[17], 0.95[18], and 0.750.95[19]. Similarly, common settings recommended for the mutation probability pm are 0.001[17], 0.01[18], and 0.005-0.01[19]. All these values were obtained by experimental investigations and the combined setting of pc and pm , especially in [19], usually depends on population size. Mutation rates within the ranges mentioned above are still widely used in applications of canonical GAs (i.e. using binary representation), because these settings are consistent with Holland’s proposal for mutation as a background operator and Goldberg’s recommendation to invert on the order of one thousand bits by mutation[16]. Recently, some analytical results concerning optimal schedules of the mutation rate in the cases of simple objective functions and simpli?ed genetic algorithms suggest that a 1/ constant mutation rate is almost optimal[20, 21, 22, 23], where is the bit string length. It is important to mention that not useful analytical results are known for the dependence of the optimal mutation rates on offspring population size[24]. In addition, a common practice has been to run the GA with its parameters set to constant values. A run of a GA, however, is an intrinsically dynamic, adaptive process, and there is experimental evidence suggesting that different values of parameters might be optimal at different stages of the evolutionary process. In order to pursue better balances for crossover and mutation, parameter control methods that modify the values of the strategy parameters during the run of the algorithm by taking into account the actual search process are being proposed. These methods are an alternative form to the common practice of tuning parameters “by hand” and are considered as one of the most important and promising areas of research in evolutionary algorithms[25]. One way to pursue better dynamic balances is to use adaptive or self-adaptive mechanisms to control the rate of operators[26, 27, 28]. Another approach seeks to combine crossover with (higher) varying mutation rates during the course of a run. It has been shown that deterministically varying mutation rates over the generations and/or across the representation can improve the performance of GAs[29, 30, 31]. Self-adaptive mutation rate schedules inspired from Evolution Strategies and Evolutionary Programming have also been proposed to control the mutation rate of generational and steady state GAs[31, 32, 33]. The deterministic approach uses one mutation rate for all individuals in the population. Conversely, the principle of self-adaptation incorporates strategy parameters into the representation of individuals evolving simultaneously strategy parameters and object variables. The self-adaptive approach is regarded as the method having the advantage of reducing the number of exogenous parameters[33] and is thought to be the most promising way of combining forms of control (parameters co-adaptation)[25].

1.3 Issues and Goals

Varying mutation approaches differ from canonical GAs mainly in the mutation rate control. Also, some approaches use a selection mechanism with higher selection pressure; for example, (?,λ) selection instead of proportional selection. However, from the application of operators standpoint, deterministic, adaptive, and self-adaptive varying mutation GAs have been mostly designed similar to a canonical GA. That is, crossover is applied with probability pc and then follows mutation; in the absence of crossover (1 ? pc ), mutation is applied alone. As mentioned above, the probability pc is usually set to 0.6 and higher values are often used. Thus, under these standard varying mutation approaches, higher mutations are mostly applied serial to crossover. This model of standard varying mutation GAs raises several important questions such as: Is there interference between crossover and high mutation? If so, does it affect performance of the algorithm? In the case of (adaptive) self-adaptive varying mutation algorithms, does it affect the mutation rate control itself? And more generally, is it an appropriate model for combining forms 5

of control (co-adaptation of parameters)? The objective of this work is to design effective and ef?cient varying mutation GAs that can be used in real world application. In order to achieve this goal it is important to explore models of varying mutation GAs that address the questions raised by the standard model of varying mutation GAs. An alternative to standard varying mutation methods is to design approaches that apply background mutation after crossover (or none at all) and higher mutations only parallel to crossover. There are several advantages in these models. First, such approaches could give an ef?cient framework to achieve better balances for varying mutation and crossover in which the strengths of the operators can be kept without interfering one with the other. Second, since varying mutation is detached from crossover, the instantaneous effectiveness of the varying mutation operator depends only upon itself and its relative success can be directly tied to the mutation rate to create adaptive (self-adaptive) schemes for mutation rate control. Third, parallel mutation can be studied on its own. For example, higher mutation rates raise the question of mutation strategy and its relevance to a given epistatic pattern or class of problems. Fourth, the individual roles and the interaction of crossover and varying mutation throughout the run of the algorithm can be better understood, which could be important for co-adaptation studies. From this point of view, this work explores a model of GA that applies varying mutations parallel to crossover & background mutation putting the operators in a cooperative-competitive stand with each other by subjecting their offspring to extinctive selection (GA-SRM)[34, 35]. Adaptation and mutation strategy are designed to improve the effectiveness of parallel varying mutations. Selection is also enhanced by eliminating ?tness duplicates to postpone drift and to create a fair competition between operators. Two test problem generators are used to study systematically the proposed model. One of the generators is for 0/1 multiple knapsack problems (constrained problems). The other one is the well known Kauffman’s NK-Landscapes (epistatic non-linear problems). These generators are well suited to study the behavior of the model on classes of problems which characteristics resemble those of the dif?cult, constrained, and non-linear problems found in real world-applications. The model of parallel varying mutation GAs shall be compared against the standard model of varying mutation GAs using deterministic and self-adaptive schedules for mutation rate control. This will help to clarify and answer some of the questions raised above. The model should be able to prove its worth in a real-world application and in order to have wide applicability the fundamental concept of the model should be extended to other important classes of genetic algorithms such distributed GAs and multiobjective GAs.

1.4 Outline

The central theme of this work is the design of ef?cient and effective generational parallel varying mutation GAs that can be used in practical applications to optimize dif?cult and highly constrained problems. Chapter 2 describes and contrasts two models of designing generational varying mutation GAs. One of the models is a simple extension of a canonical GA that applies varying mutations mostly after crossover, which has been the standard approach for designing generational varying mutation GAs. The second, called GA-SRM, is the proposed model that applies varying mutations only parallel to crossover. Chapter 3 describes two test problem generators used in this work for testing the performance of genetic algorithms (GAs). One is a test problem generator for 0/1 multiple knapsacks problems, 6

and the other one is the well known Kauffman’s NK-landscapes model. Three chapters are dedicated to study the model of parallel varying mutation GAs. Chapter 4 focuses on studying the structure of the parallel varying mutation model GA-SRM. Important structural issues include the balance between operators, the importance of mutation after crossover, the contribution of parallel varying mutation to the search, extinctive pressure, and the threshold parameter to trigger adaptation of the parallel varying mutation operator. The search velocity and search reliability of the model is observed under various evaluation times and different population sizes. Chapter 5 studies and compares in detail the impact on performance (convergence reliability and convergence velocity) of extinctive selection and higher mutations in the standard and parallel models of varying mutation GAs. Deterministic and self-adaptive mutation rate controls are used for varying mutation in both models of GA. Chapter 6 examines the behavior of GA-SRM on epistatic problems and looks into the effect of elimination of ?tness duplicates to further improve the model. Another three chapters are dedicated to show that the fundamental concept of GA-SRM can be extended successfully to other important classes of GAs, such parallel and multiobjective GAs, and that it can be effectively applied to real world problems. An important area of research is the parallelization of GAs. Evolutionary algorithms are population based methods and it is considered that its full potential would come from implementing the algorithm in parallel architectures. Most models of parallel GAs have considered the parallelization of the evaluation function, the structure of the population, and the selection strategy. However, from a processing time standpoint, the parallel application of operators has been overlooked because it is considered that only minor gains would come from parallelizing simple operators. Chapter 7 extends GA-SRM to a parallel distributed GA (DGA-SRM) arguing that the parallel application of crossover and higher varying mutations within parallel GAs is worth exploring, because the parallelization of operators would exploit their interaction in a more effective way achieving signi?cant gains in performance and robustness. Simulation results show that DGA-SRM achieves higher search speed and higher convergence reliability with less communication cost for migration. It is also shown that DGA-SRM scales up better as the dif?culty of the problem increases and tolerates population reductions better than a canonical distributed GA. Chapter 8 shows that GA-SRM can be successfully applied to real world problems in which ef?ciency in processing time and computer memory is a major issue. Here, the improved GA-SRM is extended to the two dimensional image halftoning problem and an accelerated image halftoning technique using GA-SRM with tiny populations is proposed. Simulation results show that the proposed scheme impressively reduces computer memory and processing making the improved approach appealing for practical implementation. The multiobjective nature of most real-world problems makes multiobjective optimization a very important research topic. Chapter 9 shows that the concept of GA-SRM can also be effective for multiobjective optimization of real world applications. The improved GA-SRM is extended to a multiobjective optimization GA to simultaneously generate halftone images with various combinations of gray level precision and spatial resolution. Simulation results verify that the proposed scheme can effectively generate several high quality images simultaneously in a single run reducing even further the overall processing time. Finally, Chapter 10 summarizes this work, presents conclusions, and suggests future research.

7

Chapter 2

Generational Varying Mutation Genetic Algorithms

This chapter starts with a brief description of the main components of a canonical genetic algorithm. Then, it describes the standard approach that has been used for designing generational varying mutation GAs. This model is a simple extension of a canonical GA in which varying mutations are mostly applied after crossover. Finally, it presents in detail the proposed model of GA that applies varying mutations only parallel to crossover.

9

2.1 A Canonical Genetic Algorithm

A canonical GA[11, 16] selects individuals from the parent population P (t) with a selection probability proportional to their ?tness and applies crossover with probability pc followed by mutation with a very small constant mutation probability pm per bit (background mutation). In the absence of crossover (1 ? pc ) mutation is applied alone. From the application of operators standpoint, it can be said that within the canonical GA the probability of crossover pc enables an implicit parallel application of two operators. One of the operators is crossover followed by mutation (CM) and the other one is mutation alone (M). Figure 2.1 presents a block diagram of the canonical GA and illustrates the implicit parallel application of CM and M.

P(t) ? = λ = λCM(t) + λM(t) Selection 1 - pc pm

Proportional

CM C

λCM(t)

pc

M

λCM(t)

M

λM(t)

Figure 2.1: Canonical Genetic Algorithm It should be noted that mutation in both CM and M is governed by the same constant mutation probability pm and applies the same (bit by bit) mutation strategy. The number of offspring created (t) (t) by CM and M, λCM and λM , respectively, depends on the probability of crossover pc and may vary at each generation t due to the stochastic process. However, the total number of offspring λ remains constant and is equal to the number of parents ?.

2.2 A Generational Standard Varying Mutation Genetic Algorithm

Generational standard varying mutation GAs differ from canonical GAs mainly in the mutation rate control. Also, some of them use a selection mechanism with selection pressure higher than the canonical GA; for example, (?,λ) selection instead of proportional selection. However, the application of operators has been similar to canonical GAs. That is, crossover is applied with probability pc and then follows mutation with probability pm per bit. Following the same line of thought used in 2.1, a GA that applies crossover with probability pc followed by mutation with varying probability can also be seen as implicitly applying CM and M in parallel with a sole mutation probability pm governing mutation rates in both CM and M. Figure 2.2 illustrates the implicit parallel application of CM and M when varying mutations are used. Since the probability pc is usually set to 0.6, and higher values are often used[25], it turns out that mutation is mostly applied serial to crossover. In canonical GAs pm is small, therefore the amount of diversity introduced by mutation either through CM or M is modest. For the same reason, the disruption that mutation causes to crossover in CM is also expected to be small. In 10

CM C

λCM(t)

pc pm

1 - pc

M

λCM(t)

M

λM(t)

Figure 2.2: Genetic Algorithm with Varying Mutation varying mutation GAs, however, mutations are higher than canonical GAs and the combined effect of crossover and mutation in CM as well as the effect of mutation alone in M should be carefully reconsidered. In the case of CM, besides those cases in which crossover and mutation aggregate in a positive manner or are neutral, those cases in which one of the operators is working well but is being hampered by the other should also be taken into account. For example, if mutation probabilities were high then although crossover could be doing a good job it is likely that some of the just created favorable recombinations would be lost, before they become ?xed in the offspring, due to the high disruption introduced by mutation. We can think of this case as a mutation interference with crossover in the creation of bene?cial recombinations. On the other hand, mutation could be working well but crossover may produce poor performing individuals affecting the survivability of bene?cial mutations that can contribute to the search. We can think of this case as a crossover interference with mutation in the introduction of bene?cial mutations. In the case of mutation alone M, its instantaneous effectiveness depends only upon itself and does not diminish the effectiveness of other operator. High mutations in M, when are harmful, will have a negative impact on the propagation of bene?cial recombinations already present in the parent population. However, it will not affect the creation of bene?cial recombinations by crossover as high mutation can do it in CM. In the following these approaches that apply varying mutations after crossover are referred as varying mutation serial to crossover. Although as explained above, due to the crossover probability they also implicitly apply parallel varying mutation.

2.3 A Generational Model of Parallel Varying Mutation Genetic Algorithm (GA-SRM)

2.3.1 Parallel Operators

The model of standard varying mutation GAs of 2.2 raises several important questions such as 1. Is there interference between crossover and high mutation? If so, 2. Does it affect performance of the algorithm? 3. Does it affect the mutation rate control itself? 4. Is it an appropriate model for combining forms of control (co-adaptation of parameters)? 11

To answer the questions raised by the standard varying mutation approach, this work explores a model of GA that explicitly differentiate the mutation operator applied parallel to crossover from the mutation operator applied after crossover. There are some advantages to this differentiation. 1. Each mutation operator could be assigned its own mutation probability. Thus, varying mutation can be applied only parallel to crossover and mutation after crossover can be applied with a small mutation rate (or none at all) avoiding interferences between crossover and high mutation. 2. Since in this case the instantaneous effectiveness of the varying mutation operator depends only upon itself its relative success can be directly tied to the mutation rate to create adaptive (self-adaptive) schemes for mutation rate control. 3. Parallel mutation can be studied on its own. For example, higher mutation rates raise the question of mutation strategy and its relevance to a given epistatic pattern or class of problems. 4. The individual roles and the interaction of crossover and varying mutation throughout the run of the algorithm can be better understood, which could be important for co-adaptation studies. Thus, this work explores a model of GA that in addition to crossover followed by background mutation (CM) it also explicitly applies parallel varying mutation[34, 35]. To clearly distinguish between the mutation operator applied after crossover and the mutation operator applied parallel to crossover, the parallel varying mutation operator is called Self-Reproduction with Mutation (SRM). In the following this model of GA is called GA-SRM. As suggested above, the explicit parallel formulation of CM and SRM can give an ef?cient framework to achieve better balances for mutation and crossover during the run of the algorithm in which the strengths of higher mutation and crossover can be kept without interfering one with the other. SRM parallel to CM implicitly increases the levels of cooperation to introduce bene?cial mutations and create bene?cial recombinations. It also sets the stage for competition between operators’ offspring. In the following GAs that explicitly apply varying mutations only parallel to crossover are referred as varying mutation parallel to crossover.

2.3.2 Extinctive Selection

Recent works have given more insights to better characterize the roles of recombination and mutation in evolutionary algorithms. An important issue to consider is the deleterious effects caused by the operators and especially how to deal with them. On the one hand, it has been shown that mutation is more powerful than recombination (crossover) in terms of exploration or disruption[36, 37, 38, 39] and that mutation’s disruption capabilities are directly related to the mutation rate[36, 38]. While the explorative effect of mutation is desirable, we should expect that, at all times of the search process, only a number of the individuals created by parallel mutation SRM would offer variability and still keep a reasonable high performance (bene?cial mutants). The others would be diverse but poor performing individuals (harmful mutants). On the other hand, it has also been shown that recombination (crossover) would have a deleterious effect especially on multimodal ?tness landscapes[38, 39], performing worse as the number of peaks increase. The recombination of individuals on different peaks will likely produce offspring 12

in the valleys between peaks, where the ?tness is lower. This effect will be more evident during the latest stages of the search when the population moves towards the peaks of the landscape. The parallel formulation of CM and SRM can avoid interferences between crossover and high mutation; however it cannot prevent SRM from creating deleterious mutations or CM from producing ineffective crossing over operations. To cope with these cases the model also incorporates the concept of extinctive selection that has been widely used in Evolution Strategies[12]. Through extinctive selection the offspring created by CM and SRM coexist competing for survival (the poor performing individuals created by both operators are eliminated) and reproduction. Among the various extinctive selection mechanisms available in the EA literature (?, λ) proportional selection[24] is chosen to implement the required extinctive selection mechanism. Selection probabilities for this kind of selection are computed by

? ? ? ? ? ?

f (xi )

? (t) f (xj )

(t)

(t) Ps (xi )

(1 ≤ i ≤ ?) (2.1) (? < i ≤ λ)

(t)

=

? ? ? j =1 ? ?

0

where xi is an individual at generation t which has the i-th highest ?tness value f (xi ), ? is the number of parents and λ is the number of offspring. The parallel formulation of genetic operators tied to extinctive selection creates a cooperativecompetitive environment for the offspring created by CM and SRM.

(t)

2.3.3 Mutation Rate Control

Deterministic, adaptive, and self-adaptive mutation rate control schedules are used to study the serial/parallel application of varying mutations. The adaptive schedule is applied only parallel to crossover and the deterministic and self-adaptive schedules are applied both serial and parallel to crossover. Deterministic The deterministic approach implements a time-dependent mutation schedule that reduces mutation rate in a hyperbolic shape. It was originally proposed in [31] and is expressed by

(t) = ro + pm

n ? ro t T ?1

?1

(2.2)

where T is the maximum number of generations, t ∈ {0, 1, · · ·, T ? 1} is the current generation, (t) and n is the bit string length. The mutation rate pm varies in the range [1/ro , 1/n]. In the original formulation ro = 2. Here ro is included as a parameter in order to study different ranges for mutation. In the deterministic approach the mutation rate calculated at time t is applied to all individuals created by SRM. Figure 2.3 illustrates the mutation rates over the generations by this schedule for three initial (0) mutation rates, pm = {0.5, 0.10, 0.05}. Self-Adaptive To include self-adaptation, each individual incorporates its own mutation probability within the representation. SRM to produce offspring ?rst mutates the mutation probability of the selected 13

pm 0.5

0.45 0.4 0.35 0.3 0.25 0.2 0.15 0.1 0.05 0 0 1000 2000 3000 Generations 4000 5000 hM [0.50, 1/n] [0.10, 1/n] [0.05, 1/n]

Figure 2.3: Deterministic Hyperbolic Schedule for Mutation Rate Control individual and then mutates the object variable using the individual’s mutated probability. This work applies the self-adaptive approach originally proposed in [31, 33], which uses a continuous representation for the mutation rate and mutates the mutation probability of each individual by

(t) pm (i)

=

1+

1 ? pm pm

(t?1)

(i)

?1

(t?1)

(i)

exp(?γN (0, 1))

(2.3)

where i indicates the i-th individual, γ is a learning rate that control the speed of self-adaptation, and N (0, 1) is a normally distributed random number with expectation zero and standard deviation one. Note that individuals selected to reproduce with SRM at generation t could have been created either by SRM or CM at generation t?1. Since the mutation rate of each individual is mutated only by SRM, individuals created by CM do not carry an updated mutation rate. Thus, the mutation rate of individuals that were created by CM at generation t ? 1 is ?rst updated by

(t?1) (j ) pm

=

1 ?SRM

?SRM k =1

(t?1) pm (k)

(2.4)

where j indicates an individual created by CM at (t ? 1), k indicates the individuals created by SRM at (t ? 1) that survived extinctive selection, and ?SRM is the number of offspring created by SRM that survived extinctive selection. In the case that no offspring created by SRM survived (t?1) extinctive selection, pm (j ) is set to the mutation value of the best SRM’s offspring. SRM will mutate this updated mutation in order to mutate the object variable. It should be mentioned that besides the method described here, other self-adaptive approaches exclusively for parallel varying mutation have been also implemented successfully[40]. Adaptive Since the instantaneous effectiveness of SRM depends only upon itself its relative success can be directly tied to the mutation rate to create adaptive (self-adaptive) schemes for mutation rate control. In this work the adaptive mutation rate control dynamically adjusts the mutation rate 14

within SRM every time a normalized mutants survival ratio γ falls under a threshold τ . The ratio γ is speci?ed by ?SRM λ (2.5) · γ= λSRM ? where ?SRM is the number of individuals created by SRM present in the parent population P (t) after extinctive selection at time t, λSRM is the number of offspring created by SRM, λ is the total number of offspring (λCM + λSRM ), and ? is the number of individuals in P (t). The number of offspring λCM and λSRM that will be created by CM and SRM, respectively, is deterministically decided at the beginning of the run. It should be noted that the deterministic and adaptive schedules use only one mutation rate for all the individuals in the population. The self-adaptive schedule on the other hand, uses one mutation rate per individual.

2.3.4 Mutation Strategy

In the case of background mutation we expect in the average to ?ip 1 bit (or less) in each individual at each generation. When higher mutations are applied, however, many more bits would be ?ipped in the same individual. This raises the question of whether a mutation strategy to choose the bits that will undergo mutation would be more effective than other and for which classes of problems. To study this point, two mutation strategies are investigated for SRM: (i) adaptive dynamic-segment (ADS), and (ii) adaptive dynamic-probability (ADP). ADS (Adaptive Dynamic-Segment) ADS directs mutation only to a segment of the chromosome using constant mutation probabilities per bit α (if the bit is in the segment ) SRM ) = p( m 0 (otherwise ) while the mutation segment size is dynamically adjusted every time γ falls under τ . The segment reduction is summarized below: = n (t = 0) if (γ < τ ) and ( > 1/α) = /2 where the segment size varies from n (bit string length) to 1/α, [n, 1/α] following a step decreasing approach as shown in Figure 2.4. The segment initial position, for each chromosome, is chosen at random, si = N [0, n), and its ?nal position is calculated by (2.6) sf = (si + ) mod n. With this scheme, the average number of ?ipped bits goes down from nα to 1, [nα, 1]. ADP (Adaptive Dynamic-Probability) With ADP, every bit in the chromosome is always subject to mutation with probability pm varying each time γ falls under τ : 15

(SRM )

=n si at random

?

= n/2 si at random

?

t

(SRM ) pm

=0

6

= n/4 =α

? 6

. . .

(SRM ) pm

Chromosome Mutation Segment

Figure 2.4: ADS (Adaptive Dynamic-Segment) mutation. =n pm =n pm =n

(SRM ) (SRM )

=α

6

6

t

= α/2

6

. . .

pm

(SRM )

= α/4

?

Chromosome

Figure 2.5: ADP (Adaptive Dynamic-Probability) mutation. = α (t = 0) pm (SRM ) > 1/n) if (γ < τ ) and (pm (SRM ) (SRM ) = pm /2 pm follows a step decreasing In other words, the segment size is kept constant, = n, but pm (SRM ) = [α, 1/n] as shown in Figure 2.5. approach from α to 1/n per bit, pm Both schemes, ADS and ADP, impose an adaptive mutation rate control with the same expected average number of ?ipped bits; the difference lies whether mutation is applied locally inside the segment (ADS) or globally inside the whole chromosome (ADP).

(SRM ) (SRM )

2.3.5 Duplicates Elimination

Genetic drift is postponed enhancing selection by eliminating ?tness duplicates. If several individuals have exactly the same ?tness then one of them is chosen at random and kept. The other 16

λ = λCM + λSRM (? < λ ) Extinctive P(t) ? Selection

?SRM

Proportional Selection

CM C

λCM

λCM : λSRM pm(CM)

pm(SRM)

SRM

λSRM

M

λCM

Figure 2.6: GA with Parallel Varying Mutation equal ?tness individuals are eliminated from the population. The ?tness duplicates elimination is carried out before extinctive selection. Note that under this approach it is possible that two individuals with the same ?tness may actually be different at the genotype level. Preventing duplicates removes an unwanted source of selective bias[41] and allows a fair competition between offspring created by CM and SRM.

2.4 GA-SRM’s Algorithm

The algorithm of the parallel varying mutation GA based on the proposed model is presented below and its block diagram is shown in Figure 2.6. procedure GA-SRM begin t := 0; initialize (P (0)); evaluate (P (0)); while (not termination condition) begin P (t) = crossover and mutation (P (t)); P (t) = self-reproduction with mutation (P (t)); evaluate (P (t) ∪ P (t)); P (t)= eliminate ?tness duplicates (P (t) ∪ P (t)); P (t + 1) = (?, λ) proportional selection (P (t)); t := t + 1; end end

/* creates λCM */ /* creates λSRM */ /* λ = λCM + λSRM */ /* ? < λ */

17

Chapter 3

Test Problems Generators

Test function generators for broad classes of problems are seen as the correct approach for testing the performance of genetic algorithms. This chapter describes two test problems generators and the test data used in this work. One of generators is for 0/1 multiple knapsack problems, which allows to test the algorithms on a broad range of classes of constrained problems by varying the feasible region of the search space, number of constraints, and the size of the search space. Real-world 0/1 multiple knapsack problems with known global optimum are also used. These latter problems allow studying the global search abilities of the algorithms. The second generator is the well known Kauffman’s NK-landscapes model of epistatic interactions that has been the center of several theoretical and empirical studies both for the statistical properties of the generated landscapes and for their GA-hardness. This generator allows testing the algorithms on a broad range of classes of epistatic non-linear problems. Both, 0/1 multiple knapsack problems and NKLandscapes, are known to be NP-hard combinatorial problems.

19

3.1 0/1 Multiple Knapsacks Problems

In the 0/1 multiple knapsack problem there are m knapsacks and n objects. The capacities of the knapsacks are c1 , c2 , ..., cm . For each object there is a pro?t pi (1 ≤ i ≤ n) and a set of weights wij (1 ≤ j ≤ m), one weight per knapsack. If an object is selected its pro?t is accrued and the knapsacks are ?lled with the object’ weights. The problem consists on ?nding the subset of objects that maximizes pro?t without over?lling any of the knapsacks with objects’ weights. The 0/1 multiple knapsack problem can be formulated to maximize the function

n

g(x) =

i=1

pi xi

(3.1)

subject to

n

wij xi ≤ cj

i=1

(j = 1, ..., m)

(3.2)

where xi ∈ {0,1} (i = 1, ..., n) are elements of a solution vector x = (x1 , x2 , ..., xn ), which is the combination of objects we are interested in ?nding. Solutions to this problem have a natural binary representation in the GA constructed by mapping each object to a locus within the binary chromosome. A 1 in locus i indicates that the object i is being selected and a 0 otherwise. A solution vector x should guarantee that no knapsack is over?lled and the best solution should yield the maximum pro?t. An x that over?lls at least one of the knapsacks is considered as an infeasible solution. Figure 3.1 illustrates the problem.

?

x = (x1 ,…, xi ,…, xn) xi {0,1}

Maximize: n i=1

Objects

$$$ $ $$

profit

p1 1 … wi1

pi i wij … wim

pn n

∑ pi xi

weights

Knapsacks

c1 1 cj … j … cm m

Subject to: n

∑

wij xi <= cj

(j=1,…,m)

i=1

Figure 3.1: Knapsack Problem Besides de?ning the number of knapsacks m (number of constraints) and the number of objects n (size of the search space, 2n ), it is also possible to de?ne the tightness ratio φ between knapsack capacities and object weights (which implies a ratio between the feasible region and the whole search space). Thus, by varying m, n, and φ, 0/1 multiple knapsacks allows us to carefully observe the behavior and scalability of the algorithms in three important aspects that are correlated to the dif?culty of a problem. The 0/1 multiple knapsack problem is also known as the 0/1 multidimensional or m-dimensional knapsack problem and can be regarded as a general statement of any zero-one integer programming problem with non-negative coef?cients. Indeed much of the early work on the problem viewed the problem in this way. 20

Table 3.1: Real world 0/1 multiple knapsack test problems. Name n m Max Petersen 3 15 10 4015 Petersen 4 20 10 6120 Petersen 5 28 10 12400 Petersen 6 39 5 10618 Petersen 7 50 5 16537 Sento 1 60 30 7772 Sento 2 60 30 8722 Weing 7 105 2 1095445

The 0/1 multiple knapsack problem is a NP-hard combinatorial optimization problem and its importance is well known both from a theoretic and practical point of view. It is a generalization of the 0/1 simple (m = 1) knapsack problem. Simple knapsack problems can be used as subproblems to solve more complicated ones[42] and other combinatorial problems, such the partition problem, can be polinomially transformed into it [43]. Also, well known NP-complete problems, such satis?ability problems (SAT), can be formulated as special instances of a 0/1 multiple knapsack problem [44]. In addition, many practical problems can be formulated as a 0/1 multiple knapsack problem. For example, the capital budgeting problem where project i (object) has pro?t pi and consumes wij units of resource j (knapsack). The goal is to ?nd a subset of the n projects such that the total pro?t is maximized and all resources constraints are satis?ed. Other applications of the problem include allocating processors and databases in a distributed computer system[45], project selection and cargo loading[46], cutting stock problems[47], and maximizing the number of served clients or the use of bandwidth for ad hoc networks[48]. Two sets of 0/1 multiple knapsacks problems are selected to test the algorithms. One of the sets consists of real-world problems with known optimum solution. The other set consists of larger randomly generated problems with unknown optimum solution. Both sets of problems were obtained from the OR-Library1 .

3.1.1 Real World Problems

The problems in this set range from 2 to 30 knapsacks and from 15 to 105 objects (see Table 3.1). Problems Petersen 3-7 are due to Petersen[49], Sento 1- 2 were introduced by Senyu and Toyoda[50], and Weing-7 was introduced by Weingartner and Ness [51]. Since the optimum solution is known for these problems we can observe the behavior of the genetic algorithms for global optimization. These problems have been used by other authors to test canonical GAs and standard varying mutation GAs, allowing initial relative comparisons with other GA approaches. The ?tness function for the real-world knapsack problems introduces the same penalty term used in [52] to deal with infeasible solutions (no repair strategy is used). Thus, the ?tness function is speci?ed by f1 (x) = g(x) ? s · max{pi } where s (0 ≤ s ≤ m) is the number of over?lled knapsacks.

1

(3.3)

http://mscmga.ms.ic.ac.uk/jeb/orlib/info.html

21

Table 3.2: 7 Subclasses of problems (10 random problems in each subclass) Paremeters m n φ 30 100 0.75 0.50 0.25 5 100 0.25 10 30 30 100 0.25 250 500

Subclass 1 2 3 4 5 (3) (3) 6 7

Comment reducing feasible region increasing number constraints increasing search space

3.1.2 Large Random Problems

The problems in this set consists of classes of larger and highly constrained instances of 0/1 multiple knapsack. These classes of problems were created using a knapsack test problem generator. The problems and the generator were initially proposed in [53]. The generator itself is based in the procedure suggested in [54] and its main characteristics are as follows: 1. The weights wij are drawn at random from a uniform distribution U (0, 1000). 2. For each combination of m-n, the capacities of the knapsacks are set by

n

cj = φ

j =1

wij

where φ is the tightness ratio. 3. The pro?ts of the objects are correlated to the weights of the objects2 by

m

pi =

i=1

wij /m + 500qj

where qj is a real number drawn from a continuous uniform distribution U (0, 1). To obtain a broader perspective on the performance of the GAs and to have a clear idea on scalability the GAs are applied to several knapsacks problems systematically varying φ, m, and n. Each combination of φ, m, and n de?nes a subclass of problem. Table 3.2 shows the combination of values of the parameters φ, m, and n used to de?ne the subclasses of problems. This allows to observe the robustness of the algorithms reducing the feasible region (tightness ratio between knapsack capacities and object weights φ = {0.75, 0.50, 0.25}), increasing the number of constraints (m = {5, 10, 30} knapsacks), and increasing the search space (n = {100, 250, 500} objects). We use 7 subclasses and 10 random problems in each subclass. The quality of the solutions found by the algorithms are measured by the average percentage error gap in a subclass of problems, which is calculated as the normalized difference between the

2

The dif?culty of the problems is greatly increased by the correlation between pro?ts and weights[42].

22

best solutions found and the optimal value given by the linear programming relaxation (LP) (the optimal integer solutions are unknown)[53]. The average error gap is taken for the 10 random problems in each subclass performing 50 runs for each problem by % Error Gap = LP ? P roblem Average 1 10 100 × 10 p=1 LP 1 50 Best Solution 50 r=1

P roblem Average =

Similar to the real-world problems, to deal with infeasible solutions a penalty term is introduced into the ?tness function. The two following ?tness functions are tried f1 (x) = g(x) ? s · max{pi } f2 (x) = g(x)/s · max{oj } (s > 0) g(x) (s = 0) (3.4) (3.5)

where s (0 ≤ s ≤ m) is the number of over?lled knapsacks and oj (> 1) is the over?lling ratio of n knapsack j calculated by wij xi /cj . (3.6) oj =

i=1

Note that the penalty term of f1 is merely a function of the number of violated constraints (s), but has no direct correlation with any metric that indicates the distance from feasibility. On the other hand, the penalty term of f2 is a function of both number of violated constraints (s) and distance from feasibility (oj ). The ?tness function f1 is the one used for the real-world problems, which has been successfully used before with smaller 0/1 multiple knapsacks problems [31, 34, 35, 52, 55]. In the larger randomly generated problems, however, f1 did not produce feasible solutions or the results were very poor especially on problems with restrictive knapsack capacity and small number of knapsacks. This is because the penalty term used in f1 is too weak for these problems, causing a major portion of the infeasible region to end up with ?tness higher than the feasible region. In such case, the ?tness function misleads the algorithm to search within the infeasible region of the search space. Similar behavior with other penalty functions was observed in [30] for single (m = 1) 0/1 knapsack problems of restrictive capacity. Results by f1 are in agreement with [56]. That is, penalties that are solely functions of the number of violated constraints are not likely to ?nd feasible solutions for problems having few constraints and reduced feasible region. As mentioned above, the ?tness functions f2 of Eq. (3.5) includes a penalty that is also a function of the distance from feasibility. In this case, feasible solutions were found on all test problems. In general, a good penalty function should balance the preservation of information of the infeasible solutions with the pressure for feasibility[56]. The results reported here for the randomly generated problems are obtained using f2 .

3.2 NK-Landscapes

NK-Landscapes are stochastically generated ?tness functions on bit strings, parameterized with N bits and K epistatic interactions between bits. In biology, the term epistasis is used to describe a range of non-additive phenomena due to the non-linear inter-dependence of gene values, i.e. the expression of one gene masks the genotypic effect of another. In the context of GAs this 23

(3 ) z 1(3 ) x 3 z 2

x

0 1 0 0 1 1 0 0

z1 x 3 z2 000 001 010 011 100 101 110 111

f3 0.83 0.34 0.68 0.10 0.67 0.24 0.60 0.64

Figure 3.2: An example of the ?tness function f3 (x3 , z1 , z2 ) associated to bit x3 in which x3 (3) (3) epistatically interacts with its left and right neighboring bits, z1 = x2 and z2 = x4 (N = 8, K = 2)

(3)

(3)

terminology is used to describe nonlinearities in ?tness functions due to changes in the values of interacting bits3 . More formally, an NK-Landscape is a function f : B N → where B = {0, 1}, N is the bit string length, and K is the number of bits in the string that epistatically interact with each bit. Kauffman’s original NK-Landscape[57, 58] can be expressed as an average of N functions as follows 1 N (i) (i) (i) fi (xi , z1 , z2 , · · · , zK ) (3.7) f (x) = N i=1 where fi : B K +1 → gives the ?tness contribution of bit xi , and z1 , z2 , · · · , zK are the K bits interacting with bit xi in the string x. That is, there is one ?tness function associated to each bit in the string. NK-Landscapes are stochastically generated and usually the ?tness contribution fi of bit xi is a number between [0.0, 1.0] drawn from a uniform distribution. Figure 3.2 shows (3) (3) an example of the ?tness function f3 (x3 , z1 , z2 ) associated to bit x3 for N = 8, K = 2, in (3) (3) which x3 epistatically interacts with its left and right neighboring bits, z1 = x2 and z2 = x4 , respectively. For a giving N , we can tune the ruggedness of the ?tness function by varying K . In the limits, K = 0 corresponds to a model in which there are no epistatic interactions and the ?tness contribution from each bit value is simply additive, which yields a single peaked smooth ?tness landscape. On the opposite extreme, K = N ? 1 corresponds to a model in which each bit value is epistatically affected by all the remaining bit values yielding a maximally rugged fully random ?tness landscape. Varying K from 0 to N ? 1 gives a family of increasingly rugged multi- peaked landscapes. Besides de?ning N and K , it is also possible to arrange the epistatic pattern between bit xi and K other interacting bits. That is, the distribution of the K bits among the N . Kauffman investigated NK-Landscapes with two kinds of epistatic patterns: (i) nearest neighbor, in which a bit interacts with its K/2 left and right adjacent bits, and (ii) random, in which a bit interacts with K other randomly chosen bits in the chromosome. By varying N , K , and the distribution of K among the N , we can study the effects of the size of the search space, intensity of epistatic

3

(i)

(i)

(i)

In this work a gene can be interpreted as a bit and an allele (gene value) as a bit value, i.e. 0 or 1

24

interactions, and epistatic pattern on the performance of genetic algorithms. The works of Altenberg[59, 60] Heckendorn et al.[61], and Smith and Smith[62, 63] are important contributions extending NK-landscapes to a more general framework of tunable random landscapes and on their analysis. The simplest of the generalized NK-Landscapes[59, 61, 62] can be expressed as the average of P functions as follows f (x) = 1 P

P j =1

fj (xi , z1

(j )

(i,j )

, z2

(i,j )

, · · · , zK )

(i,j )

(3.8)

where fj : B K +1 → gives the ?tness contribution of the K + 1 interacting bits. That is, in this model there could be zero, one, or more than one ?tness function associated to each bit in the string. An implication of epistatic interactions among bits is that the ?tness function develops con?icting constraints[59]. For example, a mutation in one bit may improve its own contribution to ?tness but may decrease the contributions of other bits with which it interacts. Conversely, if a bit value at other interacting position in the bit string changes, a bit value that had been optimal may not longer be optimal. Thus, epistatic interactions increase the dif?culty in trying to optimize all bits simultaneously. The study of the effects of epistasis is of great interest to the evolutionary algorithm’s community because epistatic interactions are directly correlated to the underlying representation that the evolutionary algorithm uses and to the multimodality and non-linearity of the ?tness landscape that the algorithm searches. Lately, the in?uence of epistasis on the performance of Genetic Algorithms (GAs) is being increasingly investigated. NK-Landscapes, particularly, have been the center of several theoretical and empirical studies both for the statistical properties of the generated landscapes and for their GA-hardness[60, 64, 65, 66, 67]. Previous works that investigate properties of GAs with NK-Landscapes have mostly limited their study to small landscapes, typically 10 ≤ N ≤ 48, and observed the behavior of GAs only for few generations. Recently, studies are being conducted on larger landscapes expending more evaluations[61, 68, 69, 70] and the performance of GAs is being benchmarked against hill climbers[61, 68, 70]. In this work experiments are conducted on landscapes with random and near neighbor patterns of epistasis for N = 48 and N = 96 varying K from 0 to N ? 1 in intervals of 4. Each combination of pattern of epistasis, N , and K de?nes a class of problem. For each one of these classes 50 random problems are created. This allows testing the performance of GAs on a broad range of classes of epistatic non-linear problems. There are several empirical and theoretical studies of the statistical properties of the generated NK-Landscapes. The following is a synopsis of the results for one-mutant adaptive walks on NK landscapes taken from [59]. For K = 0, the ?tness function becomes the classical adaptive multilocus model. 1. There is a single globally attractive genotype. 2. The adaptive walk from any genotype will proceed by reducing the Hamming distance to the optimum by one at each step, and the number of ?tter one-mutant neighbors is equal to this Hamming distance. Therefore, the expected number of steps to the global optimum is N/2. 3. The ?tness of one-mutant neighbor genotypes are highly correlated, as N ? 1 of the N ?tness components are unrelated between the neighbors. For K = N ? 1, the ?tness function is equivalent to the random assignment of ?tness over the genotype space. 25

1. The probability that a genotype is a local optimum is 1/(N + 1). 2. The expected total number of local optima is 2N /(N + 1). 3. The expected fraction of one-mutant neighbors that are ?tter decreases by 1/2 each step of the adaptive walk. 4. The expected length of adaptive walks is approximately ln(N ? 1). 5. The expected number of mutants tested before reaching local optimum is

log2 (N ?1)?1 i 2. i=0

6. As N increases, the expected ?tness of the local optimum reached from a random initial genotype decreases towards the mean ?tness of the entire space, 0.5. Kauffman[57, 58] calls this the complexity catastrophe. For intermediate K , it is found that 1. For small K , the highest local optima share many of their alleles in common. As K increases, this allelic correlation among local optima falls away, and more rapidly for random neighbors than adjacent neighbors. 2. For large K , the ?tness of local optima are distributed with an asymptotically normal distribution with mean approximately. 2 ln(K + 1) ?+σ K +1 and variance approximately (K + 1)σ 2 N [K + 1 + 2(K + 2) ln(K + 1)] where ? is the expected value of Fi , and σ 2 its variance. In the case of uniform distribution, ? = 1/2 and σ = (1/12)1/2 . 3. The average Hamming distance between local optima, which is roughly twice the length of a typical adaptive walk, is approximately N log2 (K + 1) 2(K + 1) 4. The ?tness correlation between genotypes that differ at d loci is R(d) = 1 ? for the random neighborhood model, and R(d) = 1 ? K +1 d+ N 1 N d

min(K,N +1?d) 1/2

d N

1?

K N ?1

d

(K ? j + 1)

j =1

N ?j?1 d?2

for the adjacent neighborhood model.

26

Chapter 4

Studying the Structure of the Parallel Varying Mutation GA-SRM

This chapter studies in detail the internal structure of the parallel varying mutation GA-SRM using adaptive mutations. It analyzes the impact of important issues that affect the performance of GAs and study the contribution of extinctive selection, adaptation, mutation strategy, and the interaction between parallel varying mutation and crossover. Real-world and randomly generated 0/1 knapsack problems are used to study and test the model. Comparisons are made with simple GAs and other improved GAs.

27

4.1 Introduction

The GA-SRM model applies varying mutations parallel (SRM) to crossover & background mutation (CM) putting the operators in a cooperative-competitive stand with each other through extinctive selection. Important structural issues to be studied are the balance between CM and SRM for offspring creation, the mutation probability in CM, the ratio between number of parents, number of offspring (extinctive selection pressure), and the threshold to trigger adaptation in SRM. Recombination in canonical GAs has been emphasized and deeply studied. It is known that different crossover operators create dissimilar degrees of disruption and construction of building blocks. Therefore it is also important to verify whether the kind of diversity that we expect from SRM can be produced by other kinds of recombination operators. The impact of other important issues related to the performance of GAs, such as population size and evaluation time, is observed, too. Besides the internal structure, the effect on performance of extinctive selection, the interaction of varying mutation parallel to crossover, and the effect of adaptation and mutation strategy are also investigated. In this chapter, all these issues are studied using real-world and randomly generated 0/1 knapsack problems.

4.2 Experimental Setup

Experiments are conducted using the following algorithms: (i) a canonical GA denoted as cGA (CM and proportional selection), (ii) a simple GA with (?, λ) proportional selection denoted as GA(?, λ) (CM and extinctive proportional selection), and (iii) the proposed algorithm denoted as GA-SRM(?, λ) (CM, SRM and extinctive proportional selection). In GA-SRM, SRM is used with ADS and ADP mutation strategies. Unless stated otherwise, the genetic algorithms used here are set with the parameters speci?ed in Table 4.1. It should be noted that throughout this chapter elimination of ?tness duplicates is not considered and that SRM’s mutation rate control is the adaptive mechanism explained in 2.3.3. Experiment with the real-world knapsack problems consisted of 100 independent runs for each problem. For the randomly generated problems 50 runs were performed for each one of the seventy problem in the seven subclasses of problems, as explained in 3.1.2. Each run was set with a different seed for the random initial population and ended after T evaluations were performed. The values of T used for each real-world problem are indicated in Table 4.4. For all randomly generated problems the number of evaluations was set to T = 5 × 105 . The number of generations for each experiment is calculated as T /λ. In GA-SRM, the values of τ (threshold to trigger adaptation in SRM) are sampled and the one that produces the overall higher performance on the seven different classes of random problems is chosen. The threshold is set to τ = 0.64 for ADS and τ = 0.54 for ADP. The values of τ used for the real-world problems are indicated in Table 4.4.

4.3 Operators’ Balance

First, the importance of SRM and the operators’ balance for offspring creation is investigated. Three general cases are considered. 1. The size of the parent population P (t) is equal to the size of CM’s and SRM’s offspring populations, λSRM = ? = λCM . 2. P (t) is smaller than CM’s but bigger than SRM’s population, λSRM < ? < λCM . 28

Table 4.1: Genetic algorithms parameters. P arameter Representation Selection Scaling M ating Crossover pc (CM ) pm pm

(SRM )

cGA Binary Proport. Linear (xi , xj ), i = j one point 0.6 1/n -

GA(?, λ) Binary (?, λ) Proport. Linear (xi , xj ), i = j one point 0.6 1/n 1:2 -

?:λ λCM : λSRM

GA ? SRM (?, λ) Binary (?, λ) Proport. Linear (xi , xj ), i = j one point 1.0 1/n α = 0.5, = [n, 1/α] (ADS ) α = [0.5, 1/n], = n (ADP ) 1:2 1:1

3. P (t) is bigger than CM’s but smaller than SRM’s population, λSRM > ? > λCM . In the case of equal size populations, both operators could allocate all their offspring to P (t). Therefore, there is competition between the two operators’ offspring for every spot in P (t). The normalized mutant survival ratio γ , speci?ed by eq.(2.5), re?ects the number of mutants winners that survive after competing with CM’s offspring. Also, in this case the number of mutants that survive selection equals the number of CM’s offspring being eliminated. However, if one of the offspring populations is smaller than P (t), then it could at most cover a fraction of the parent population. Hence, competition for survival between operators is not for the ? spots but rather for ?c speci?ed by the size of the smaller population. This is because the best ? ? ?c of the exceeding population need not to compete in order to survive. For example, if the bigger population corresponds to SRM, ?SRM in eq.(2.5) includes not only the mutants winners but also those that survive without competition. Also, note that in this case the number of mutants that survive selection does not equal to the number of CM’s offspring being eliminated. To re?ect the competition between operators when different offspring population sizes are used the mutants survival ratio of eq.(2.5) is extended to γ=

c ?w SRM λ · c λc SRM ?

(4.1)

where ?w SRM is the number of individuals created by SRM that compete and survive selection (mutants winners), λc SRM is the offspring number created by SRM that undergoes competition, c c λc is the total offspring number that compete for survival (λc CM + λSRM ), and ? is the number of spots that SRM’s and CM’s offspring compete for in the parent population P (t). Note that eqs.(2.5) and (4.1) are the same if equal size populations are used (case 1). The balance between operators for offspring creation is studied using eq.(4.1). Setting the parent and offspring population to (?, λ) = (50, 100) several experiments are conducted especially for Weing 71 varying the number of offspring created by CM and SRM from an all CM regime to a 90% SRM regime. A 100% CM regime represents a genetic algorithm that applies “background” mutation after crossover and uses (?,λ) Proportional Selection, i.e. GA(50,100). Also, note that

1 For this problem the number of evaluations is set to T = 2 × 105 . The same number of evaluations were used in [31] and [52] using offspring populations of 100 and 50 individuals, respectively.

29

1

&0

V2IIVSULQJ

$YHUDJH ×

$YH 1

650

V2IIVSULQJ

Figure 4.1: Operators’ balance and search ability. because SRM’s adaptive mutation schedule is based on a mutant survival ratio, which re?ects the competition among SRM’s and CM’s offspring, it is not possible to test the algorithm with an all SRM regime and simultaneously keep its adaptive feature. The relationship between operators’ offspring balance and search ability (best individuals’ average and number of times the global optimum was found in 100 runs, Average and N respectively) is shown in Figure 4.1 when SRM is implemented with ADS. From this ?gure the following observations can be drawn. Ratios that favor SRM’s offspring, i.e. λSRM > 50%, produce better results than its opposites. A λCM : λSRM = 1 : 1 ratio is the best choice for stable and robust performance that simultaneously maximizes N and Average. In the following sections we use the best 1 : 1 operators’ balance.

4.4 Mutation Probability in CM

Second, the relevance of CM’s mutation probability is studied ?xing λCM : λSRM = 50 : 50. (CM ) values in the range [0.5/n, 1.5/n] are shown in Figure The model’s searching ability for pm 4.2, where n is the bit string length. From this ?gure the following observations are relevant. A (CM ) = 1/n turns out to be the probability that gives the highest values for Average and N , that pm (CM ) > 1/n are less deteriorative than values is a coincidence with the results in [24]. Values of pm (CM ) < 1/n are. of pm Segment size reduction, , as well as the number of individuals produced by SRM that survive (CM ) = 1/n in Figure 4.3 (a). Here it can be selection, ?SRM , are shown for one of the runs for pm observed that SRM contributes to the survivor parent population in every generation of the search process. The key factor for SRM to be an effective operator lies in its own regulation mechanism, i.e. the mutation rate is adjusted every time the number of mutants that survive selection falls 30

1 Q Q

$YHUDJH × Q

$YH 1

Q &0

VSP

Q

Figure 4.2: CM’s mutation probability and search ability (elimination of ?tness duplicates is off). under a minimum level τ . Also, the average number of SRM’s offspring that survive selection increases as the mutation segment is reduced. (CM ) ≤ 0.5/n it is observed that SRM’s offspring ?tness cannot compete with CM’s For pm offspring, which is specially critical during the early stages of the search, causing a premature reduction of SRM’s mutation rate, lost of diversity in the population and convergence to a local optimum. Another typical ?gure on SRM’s contribution ?SRM and segment size reduction for (CM ) = 0 (without mutation after crossover) is shown in Figure 4.3 (b) to compare with Figure pm 4.3(a). Small mutation after crossover is required in CM to achieve a robust search performance by keeping an appropriate balance between CM and SRM. Note that in this case, elimination of ?tness duplicates is not being considered.

4.5 Extinctive Selection Pressure

Next, the effect that extinctive selection has in the model is studied varying the size of the parent (CM ) = 1/n. Figure 4.4 shows results for population ? and setting λCM : λSRM = 50 : 50, pm (?, λ) = ({10, 20, ..., 90}, 100). High values of Average are attained for ratios of extinctive selection pressure in the range ?/λ=[40/100, 70/100]. For ? < 50 both CM and SRM produce offspring in excess of the parent population’s requirement (λCM > ? and λSRM > ?). In this case, there exists competition for survival even among CM’s offspring, and SRM’s offspring have to outperform CM’s best offspring to survive. As the parent population size is reduced competition conditions become severer. Conversely, for ? > 50 neither CM alone nor SRM can cover the parent population’s demand (λCM < ? and λSRM < ?). In this situation, even if CM totally outperforms SRM, the latter has guaranteed at least ? ? λCM of its best progeny for the next generation. However, with this scheme 31

?

3

O

650

VPLQLPXPOHYHO τ γ IDOOVXQGHU τ

O ? 650

*HQHUDWLRQ1XPEHU

(a) pm

(CM )

= 1/n.

?

50 40 30 20 10 0

105 ( n )

l

l ? SRM

SRM's minimum level τ γ falls under τ

70

35

1

10

100 Generation Number

(b) pm

(CM )

0 1000 2000

= 0.

Figure 4.3: SRM’s offspring number after extinctive selection.

the removal of CM’s offspring that are performing poorly is not facilitated. Note that for the worst CM’s offspring to be eliminated it has to be worse than the best ? ? λSRM SRM’s offspring. 32

1

$YHUDJH ×

λ=100 λ :λ $#

$YH 1

3DUHQW3RSXODWLRQ ?

Figure 4.4: Extinctive selection pressure and search ability.

4.6 SRM’s Adaptive Minimum Level τ

Figure 4.5 plots Average and N for values of τ in the range [0.20,0.52] for Weing 7. SRM’s offspring that survive selection increases as mutation rates are reduced. Therefore, if τ is too small the mutation rate could remain too high at the end of the search, i.e. after certain point there are no further reductions in SRM’s mutation rate because the mutants survival ratio γ is always higher than τ . In the experiments the minimum mutation rates were 3/n in more than 90% of the runs for τ ≤ 0.28 and 1.5/n in 92% of the runs for τ = 0.32. In both cases Average is high but there is a big difference in N. As τ is increased the minimum value of the mutation rate will tend to be 1/n and its reduction will be faster. In this example, the average time (on the hundred runs) at which 1/n mutation rate is reached is about 0.5T for τ = 0.48, and 0.25T for τ = 0.52. A proper reduction speed of SRM’s mutation rates guarantees a high Average and a SRM’s mutation rate close to 1/n during the ?nal stage of the search helps to locate the global optimum. From Figure 4.5, it can be observed that there is a broad range for the threshold τ in which the Average is very high. Also, that there is a safety-range in which both Average and N are high. Similar behavior is observed on other problems used to test the model.

4.7 Two-point and Uniform Crossover

Experiments using two point and uniform crossover are also conducted. Results using two point crossover by cGA and GA(?,λ) after T evaluations are (N, Ave, Stdev)={(0, 1086886.8, 1590.8), (0, 1092647.5, 3032.0)}, respectively. Similarly, results using uniform crossover are (N, Ave, Stdev)={(0, 1090392.1, 1020.7), (0, 1093748.4, 1776.9)}. 33

? 650

1 $YHUDJH ×

λ=100 λ :λ $#

$YH 1

650

V0LQLPXP/HYHO τ

Figure 4.5: SRM’s minimum level and search ability. These results are better than those obtained with one point crossover, yet the global optimum solution could not be found. In the case of GA-SRM(?,λ), however, the obtained results are quite similar to one point crossover.

4.8 Impact of the Population Size

The impact of the population size in the method’s robustness is veri?ed in Table 4.2 (a) and (b) by GA-SRM using ADS and ADP, respectively. All population con?gurations use the same T evaluations. The right number indicates the value of the global (local) optimum and the left one the number of times it was found. At the bottom of each column, Average and Stdev are also presented. From Table 4.2 it can be seen that the model using only 40% of the population size still produces high values for Average and N. These results are encouraging and suggest that another important bene?t of the cooperative-competitive model could be the reduction of the population size. Results by a larger population con?guration, i.e. GA-SRM(100,200), are also included. Under the same evaluation time, no considerable difference in results by GA-SRM could be seen varying its con?guration from (100,200) to (30,60) for this particular problem.

4.9 Search Ability and Number of Evaluations

The global search ability at various evaluation times is observed by letting the algorithms run for 4T evaluations. Table 4.3 presents results by GA-SRM for some intermediate times, where Stdev denotes the value of standard deviation around Average. Results by cGA and GA(?,λ) after 4T evaluations are (N, Ave, Stdev)={(0, 1087312.0, 1555.2), (0, 1093797.3, 1624.1) }, respectively. 34

Table 4.2: Results for Weing 7 using GA-SRM with different population sizes. (a) ADS.

GA-SRM(100,200) (CM ) pm = 0.01 GA-SRM(50,100) (CM ) pm = 0.01 GA-SRM(30,60) (CM ) pm = 0.01 GA-SRM(20,40) (CM ) pm = 0.01

22 58 2 4 1 4

τ = 0.46 1095445 1095382 1095357 1095352 1095295 1095264

26 47 15 2 3

τ = 0.40 1095445 1095382 1095357 1095352 1095266

22 43 12 3 2 5

τ = 0.33 1095445 1095382 1095357 1095352 1095266 1095264

9 < 1095264 Ave = 1095344.1 Stdev = 267.9

7 < 1095264 Ave = 1095345.47 Stdev = 337.17

13 < 1095264 Ave = 1095350.46 Stdev = 174.50

τ = 0.30 15 1095445 34 1095382 13 1095357 5 1095352 1 1095295 4 1095266 7 1095264 21 < 1095264 Ave = 1095265.45 Stdev = 498.52

(b) ADP.

GA-SRM(100,200) (CM ) pm = 0.01 GA-SRM(50,100) (CM ) pm = 0.01 GA-SRM(30,60) (CM ) pm = 0.01 GA-SRM(20,40) (CM ) pm = 0.01

τ = 0.46 11 1095445 21 1095382 11 1095357 1 1095352 3 1095295 15 1095264 38 < 1095264 Ave = 1095050.63 Stdev = 752.3

τ = 0.40 11 1095445 25 1095382 8 1095357 2 1095352 2 1095295 4 1095266 6 1095264 42 < 1095264 Ave = 1094908.34 Stdev = 1106.3

τ = 0.33 13 1095445 24 1095382 14 1095357 3 1095266 3 1095264

τ = 0.30 5 1095445 29 1095382 4 1095357 3 1095295 2 1095266 13 1095264 44 < 1095264 Ave = 1094823.49 Stdev = 1032.36

43 < 1095264 Ave = 1094877.47 Stdev = 986.56

Table 4.3 empirically show the effect of the proposed cooperative-competitive model in terms of higher search velocity and higher search reliability (reach better solutions with small Stdev values). Note the Average, N and Stdev for 0.25T and 0.5T . They also indicate that SRM is a continuous and effective source of diversity, which at the expense of time could be used to improve the search results. For example, when the algorithm was allowed to run for 2T evaluations for this particular problem, a remarkable improvement was achieved ?nding the global maximum 57% of the times with Average greater than the second known optimum and very small Stdev values.

4.10 Contribution of Parallel Genetic Operators and Extinctive Selection

In order to isolate the contributions of parallel genetic operators and higher selection pressure induced by extinctive selection several additional experiments are conducted using cGA, GA(?, λ), and GA-SRM(?, λ). Figure 4.6 plots the average objective ?tness in 100 runs of the best-so-far 35

Table 4.3: Results for Weing 7 using GA-SRM(50,100) with ADS under various evaluation times. Maximum 1095445 1095382 1095357 1095352 < 1095352 Average Stdev 0.25 T 2 8 2 1 87 1094854.1 602.8 0.5 T 11 29 8 2 50 1095177.6 545.3 T = 2 × 105 26 47 15 2 3 1095345.5 337.2 2T 57 41 1 1 1095417.4 41.6 4T 77 23

1095430.5 27.63

individual over the generations by a cGA(100), GA(50,100), and GA-SRM(50,100). From this ?gure it can be seen that the higher selection pressure of extinctive selection causes an increase on search velocity. GA(?,λ) in this problem also exhibits higher convergence reliability than cGA without extinctive selection; however, GA(?,λ) is still not able to ?nd the global optimum and the Average is lower compared to GA-SRM(?,λ) (See Table 4.5 for Weing 7). The only difference between GA(?,λ) and GA-SRM(?,λ) is the inclusion of adaptive mutation SRM in the latter. Therefore any difference in performance between these algorithms can be attributed to SRM. To better observe SRM’s contribution experiments are conducted in which starting with a GA(?,λ) con?guration (all CM and extinctive selection) after a predetermined number of evaluations the algorithm switches to a GA-SRM(?,λ) con?guration (CM, SRM and extinctive selection). Figure 4.7 plots results by an algorithm that makes the con?guration transition from GA(50,100) to GA-SRM(50,100) at {0.02T, 0.05T, 0.10T, 0.20T, 0.5T } evaluations, respectively. As a reference it also includes the results presented in Figure 4.6 by GA(?,λ) and GA-SRM(?,λ). From Figure 4.7 it can be seen that as soon as SRM is included ?tness starts to pick up increasing the convergence reliability of the algorithm. Also, early transitions produce higher performance. For example, ?nal results for the algorithms that perform transitions at 0.10T and 0.50T are (N, Ave)={(22, 1095242.1), (10, 1094912.8) }, respectively. Summarizing Figure 4.6 and Figure 4.7, GA-SRM(?,λ) gains its increase on search velocity from extinctive selection and its higher convergence reliability from the inclusion of parallel adaptive mutation. To further clarify the contribution of the interaction between CM and SRM during the latest stages of the search experiments are also conducted in which starting with a GA-SRM(?,λ) con?guration, after the mutation rate on SRM has reached a predetermined value, the algorithm switches either to a all CM regime with extinctive selection or to a all SRM regime with extinctive selection (in the latter case no further reductions on SRM’s mutation rate are done). Figure 4.8 plots results by an algorithm that makes the con?guration transition from GA-SRM(50,100) when the mutation segment length in SRM has reached = {6, 3}. As a reference it also includes the results presented in Figure 4.6 by GA-SRM(?,λ). From Figure 4.8 it can be seen that neither CM nor SRM alone but the interaction of both CM and SRM leads to a higher convergence reliability. To explain why the interaction of both CM and SRM works better than CM alone diversity values and performance should be looked at simultaneously. Figure 4.9 presents the ?tness value of the best individual in the population and the average hamming distance to the best individual ? over the generations for one of the runs by cGA and GA-SRM. The SRM’s mutation segment h reduction is also presented for GA-SRM. ? higher than GA-SRM after T evaluations. It can be seen that cGA ends up with values of h 36

Fitness 1.1 (× 10 )

6

1.05 1 0.95 0.9 0.85 1 10 100 Generation Number 1000 2000

GA-SRM (50,100) GA (50,100) cGA (100)

Figure 4.6: Average ?tness in 100 runs of the best-so-far individual.

)LWQHVV (YDOXDWLRQV

×

7

*$ 7R*$650DIWHU7 7R*$650DIWHU7 7R*$650DIWHU7 7R*$650DIWHU7 7R*$650DIWHU7 *$650

*HQHUDWLRQ1XPEHU

Figure 4.7: Con?guration transition: GA(50,100) to GA-SRM(50,100).

Also, at the end of the run the number of diverse individuals in the parent population is about 95% ? and the number of in cGA and 83% in GA-SRM. Letting the cGA run for 4T it is found that h 37

×

)LWQHVV

(YDOXDWLRQV

7

*$650 7RDOO650DIWHU O 7RDOO650DIWHU O 7RDOO&0DIWHU O 7RDOO&0DIWHU O

*HQHUDWLRQ1XPEHU

Figure 4.8: Con?guration transition: GA-SRM(50,100) to all SRM or all CM.

)LWQHVV ×

Q

O

*$650%HVW F*$%HVW O K LQ*$650 K LQF*$

*HQHUDWLRQ1XPEHU

Figure 4.9: Diversity and search ability.

diverse individuals remains at the same levels and that the quality of the solution does not increase signi?cantly (See 4.9). The higher levels of diversity observed in cGA after T evaluations and the 38

lack of improvement in the quality of solutions after 4T evaluations seem rather contradictory. The explanation for this comes from the highly multimodal nature of the landscape that the problem used in our simulations has and it is in accordance with the ?ndings in [38]. In [38] it is shown that the recombination of high ?tness individuals located on multiple peaks can often produce poor performing individuals in the valley between peaks, with the side effect of increasing the values of ? and the number of diverse individuals. On the other hand, the presence diversity metrics such as h of multiple peaks will have less in?uence on the mutation of a high ?tness individual on a particular ? in Figure 4.9 rather than being an peak (assuming small mutation probabilities). The values of h indication of the cGA’s ability to keep diversity show that CM alone has dif?culties pulling the population to higher peaks. In the case of GA-SRM, the lost of effectiveness by CM during the ?nal stages of the search is supplemented by the augment of SRM’s contribution conform mutation rates are reduced as shown in Figure 4.3(a).

4.11 Results for Real-World 0/1 Multiple Knapsack Problems

In this section additional results for various real-world 0/1 multiple knapsack problems are presented. In Table 4.4 column Problem indicates the knapsack instance Name, the number of objects n (it corresponds to the chromosome bit string length), the number of knapsacks m and the known global optimum value Max. Column Parameters shows the speci?c values set for τ (used only (CM ) ≈ 1/n, and number of evaluations T . Table in GA-SRM), CM’s mutation probability pm 4.5 show results by cGA (population of 100 individuals), GA(50,100), and GA-SRM(50,100) using either adaptive dynamic segment SRM-ADS or adaptive dynamic probability SRM-ADP. As a reference for comparison, Table 4.6 presents results for the same problems reported by Khuri et al.[52] running a genetic algorithm for the same T evaluations with a population of 50 individuals. Table 4.7 shows the latest results by B¨ ack et al.[31] particularly for Weing 7 running a genetic algorithm with constant mutation rates and other enhanced genetic algorithms that apply varying mutations for the same T = 2 × 105 evaluations using offspring populations of 100 individuals. From Table 4.5 it can be seen that the proposed method outperforms cGA and GA(?,λ) in every knapsack test problem where simulations were conducted. Also, although direct comparisons are not possible between GA-SRM(?,λ) and the algorithms used in [31] and [52], looking at Table 4.5, Table 4.6 and Table 4.7 it can be seen that the proposed algorithm gives better results. It should be specially noticed the results obtained for Weing 7 where the proposed GA-SRM found the global optimum 26% of the runs. In the same problem, genetic algorithms with either constant mutation rate or self-adaptive mutation rates could not ?nd the global optimum. Note that the algorithm that uses a time dependent hyperbolic deterministic schedule for mutation rate control with a (15,100) selection mechanism found it only 3% of the runs[31, 52]. Based on observations of the ?nal feasible solutions reached by the algorithms used in the simulations, it can be seen that for Weing 7 in the ranges [1095000,1095445], [1094000,1095445], and [1093000,1095445], there are at least 30, 134, and 189 peaks of different heights, respectively. This data might help to visualize the kind of landscape and the global search ability of the algorithms. It should be mentioned that ADS and ADP exhibit similar behavior. Although better results were obtained with ADS for most of the knapsack test problems used here, at this time it cannot be concluded whether ADS is superior to ADP. Also, the difference in performance between ADS and ADP might be relevant to the kind of epistasis[59] associated to the test problem. For the test problems used here there is no knowledge about the degree or the pattern of epistasis of the test problems. More investigation should be done to clarify this point in the future. 39

Table 4.4: Knapsack test problems. Problem n m 15 10 20 10 28 10 39 5 50 5 60 30 60 30 105 2 Parameters (CM ) pm T 0.067 5 × 103 0.050 104 0.036 5 × 104 0.030 105 0.020 105 0.017 105 0.017 105 0.01 2 × 105

Name Petersen 3 Petersen 4 Petersen 5 Petersen 6 Petersen 7 Sento 1 Sento 2 Weing 7

Max 4015 6120 12400 10618 16537 7772 8722 1095445

τ 0.48 0.52 0.48 0.48 0.48 0.52 0.52 0.40

Table 4.5: Results for various knapsack test problems.

cGA(100) Average Stdev 4007.0 12.0 6031.1 50.8 12278.0 55.7 10454.5 36.5 16300.9 53.7 7505.1 50.5 8506.3 33.9 1085421.8 1881.2 GA(50,100) Average 4013.4 6099.7 12375.1 10524.7 16367.8 7712.7 8682.1 1092615.0 GA-SRM(50,100) SRM-ADS SRM-ADP Average Stdev N Average Stdev 4015.0 0.0 97 4014.7 1.2 6112.5 8.4 54 6113.5 8.1 12398.9 5.5 98 12399.8 1.2 10588.2 37.6 16 10587.3 24.5 16485.2 53.1 21 16474.2 50.0 7770.3 5.4 67 7765.1 9.0 8718.5 5.3 50 8717.7 6.6 1095345.5 337.17 11 1094908.3 1106.3

Name Petersen 3 Petersen 4 Petersen 5 Petersen 6 Petersen 7 Sento 1 Sento 2 Weing 7

N 48 6 2 -

N 85 35 50 4 14 -

Stdev 3.8 59.6 67.5 67.4 93.6 57.8 31.7 2843.4

N 100 42 94 16 23 85 55 26

Table 4.6: Results by Khuri et al. [52]. Name N Average Petersen 3 83 4012.7 Petersen 4 33 6102.3 Petersen 5 33 12374.7 Petersen 6 4 10536.9 Petersen 7 1 16378.0 Sento 1 5 7626 Sento 2 2 8685 Weing 7 - 1093897 Table 4.7: Results for Weing 7 using other mutation schedules [31]. Mutation Schedule Constant mutation rate pm = 1/n Self-adaptive Time-dependent hyperbolic deterministic Selection (15,100) selection proportional selection (15,100) selection proportional selection (15,100) selection proportional selection N 3 Average 1091268 1093924 1092743 1094311 1094711 1094479

40

4.12 Results for Random 0/1 Multiple Knapsack Problems

In the previous sections the structure of GA-SRM has been deeply examined. The effect of extinctive selection, varying mutation parallel to crossover, and the interaction of adaptively varying mutations parallel to crossover on several small real-world knapsacks problems were studied too. In this section, to obtain a broader perspective on performance and scalability, the algorithms used before are applied to classes of larger random knapsacks problems varying the dif?culty of the problem. As mentioned before, factors related to the dif?culty of the problem are the number of knapsacks m (constraints), the size of the search space (number of objects n), and the tightness ratio φ (ratio of the feasible region to the whole search space). First, the performance of the algorithms on classes of problems with different ratios of the feasible region (φ) is observed. Figure 4.10 illustrates results by the algorithms on problems with m = 30 knapsacks, n = 100 objects, and ratio φ = {0.75, 0.50, 0.25}. The main conclusions drawn from Figure 4.10 are as follows. (i) The quality of solutions found by the GAs decrease (larger Gap values) as the ratio φ is reduced. These results are quite intuitive since reductions on φ imply reductions on the fraction of possible subsets of objects that constitute feasible solutions. Consequently, the ratio between the feasible part of the search space and the whole search space gets smaller and the smaller this ratio is the harder it is to ?nd feasible results. This was also observed for 0/1 single (m = 1) knapsack problems in [30]. (ii) The performance of GA(50,100) is by far superior to cGA(100) for all values of φ, indicating that extinctive selection is an important factor to increase the performance of GAs in problems with infeasible regions. (iii) The algorithms that combine extinctive selection with varying mutations give better results than a simple GA with extinctive selection. (iv) GA-SRM with ADS is superior to GA-SRM with ADP; the gain in quality of the solutions comes from the use of segment mutation strategy (ADS). Note that the difference in performance between GA-SRM with ADS and the other algorithms becomes more apparent for problems with larger infeasible regions (more dif?cult problems). Second, the effect of the number of constraints (knapsacks m) is observed ?xing the size of the search space (number of objects n) and the ratio of the feasible region to the whole search space (tightness ratio φ). Figure 4.11 illustrates results by the GAs on problems of m = {5, 10, 30} capacities, n = 100 objects, and ratio φ = 0.25. Similar to the tightness ratio φ of the feasible region, from Figure 4.11 it can be seen that increasing the number of constraints (knapsacks m) also has an strong impact on the performance of the algorithms. The behavior of the algorithms is similar to that observed in Figure 4.10. Looking at Figure 4.10 and Figure 4.11, judging from the relative increase on the Gap values and from the slopes of the curves, reducing the ratio of the feasible region (φ) has an stronger impact than increasing the number of constraints (knapsacks m). Third, the effect of the size of the search space (number of objects n) is observed ?xing the number of constraints (knapsacks m) and the ratio of the feasible region (tightness ratio φ). Figure 4.12 illustrates results by GAs on problems of m = 30 knapsacks, n = {100, 250, 500} objects, and ratio φ = 0.25. From Figure 4.12 it can be seen that increasing the size of the search space also makes it harder for the algorithms to ?nd high quality solutions. Looking at Figure 4.10, Figure 4.11, and Figure 4.12 and again judging from the slopes of the curves, it can be seen that (for the values of n, m, and φ used here) increasing the size of the search space (n) presents less dif?culties to the GAs than reducing the feasible region (φ) or increasing the number of constraints (m). An important observation is that changing the mutation strategy from ADP to ADS in SRM noticeably increases the quality of results achieved by GA-SRM. The better results achieved by 41

% Error Gap

10 9 8 7 6 5 4 3 2 1 0 0.75

cGA (50,100) GA (50,100) GA-SRM (50,100) ADP GA-SRM (50,100) ADS Tightness Ratio (φ)

0.5

0.25

Figure 4.10: Reducing the feasible region φ = {0.75, 0.50, 0.25}, m = 30, n = 100.

% Error Gap

10 9 8 7 6 5 4 3 2 1 5 10

cGA (50,100) GA (50,100) GA-SRM (50,100) ADP GA-SRM (50,100) ADS

Knapsacks (m)

30

Figure 4.11: Increasing the number of constraints m = {5, 10, 30}, n = 100, φ = 0.25.

GA-SRM with ADS2 suggests that the design of parallel mutation could be an important factor to improve further the search performance of GAs. The mutation strategies used by parallel mutation (SRM-ADS and SRM-ADP) is studied in detail in chapter 6 using NK-Landscapes[57] for problems having either nearest neighbor or random ?tness epistatic patterns among bits. It is shown that SRM-ADS’s convergence reliability

2

GA-SRM with ADS has proven effective for other problems as well[71, 72]

42

% Error Gap

10 9 8 7 6 5 4 3 2 1 0 100 250

cGA (50,100) GA (50,100) GA-SRM (50,100) ADP GA-SRM (50,100) ADS

Objects (n)

500

Figure 4.12: Increasing the search space n = {100, 250, 500}, m = 30, φ = 0.25

(which applies mutation within a continuous segment of the chromosome) is higher than SRMADP’s on problems with nearest neighbor epistatic patterns. When random epistatic patterns are present convergence reliability by SRM-ADS and SRM-ADP are similar[69, 73]. Since the object weights of the knapsack problems are drawn from a uniform generator, as mentioned in 5.2, random ?tness epistatic patterns among bits would seem more likely to be present. However, the results obtained in this work by GA-SRM with ADS resemble those achieved in NK-Landscapes with nearest neighbor epistatic patterns (especially for medium and large values of K ; that is, for highly multimodal problems). It should be noticed that in NK-Landscapes the whole search space constitutes a feasible region whereas the knapsack problems are highly constrained. Since a penalty term is used in the ?tness function, unfeasible solutions would map to low ?tness values increasing the sharpness around the peaks. However, it is not clear whether and how the constraints (number of knapsacks m) and the restrictiveness of the capacities (ratio φ) could affect the ?tness epistatic interactions among bits. This deserves further research. Similar to the case of real-world knapsack problems, the contribution of parallel adaptive mutation SRM and the robustness of its adaptive mechanism in the larger random problems are also veri?ed. Figure 4.13 plots results by an algorithm that makes the con?guration transition from GA(50,100) to GA-SRM(50,100) at {0.10T, 0.20T, 0.5T } evaluations, respectively. Every (SRM ) = 0.5. As time a transition takes place, initial mutation probability for SRM is set to pm a reference it also includes results by GA(50,100) and GA-SRM(50,100). From Figure 4.13 it can be seen that as soon as SRM is included ?tness starts to pick up increasing the convergence reliability of the algorithm. The almost instantaneous pick up on ?tness after the transitions in Figure 4.13 is also an indication of the robustness of the adaptation mechanism. That is, it quickly ?nds the suitable mutation probability regardless of the stage of the search. Figure 4.14 plots results by an algorithm that makes the con?guration transition from GASRM(50,100) to a all CM regime with extinctive selection or to a all SRM regime with extinctive selection (in the latter case no further adaptations on SRM’s mutation rate are done) as soon as 43

Fitness

22000

0.05 0.10

Evaluations 0.50

T

21000

GA-SRM (50,100) to GA-SRM after 0.10T to GA-SRM after 0.20T to GA-SRM after 0.50T GA (50,100)

20000

0

1000

2000

3000

4000

5000

Generations

Figure 4.13: Transition from GA(50,100) to GA-SRM(50,100) at various fractions of T

Fitness

22000

21000

20000

GA-SRM (50,100) to all CM after l = 2 to all SRM after l = 2 to all CM after l = 3 to all SRM after l = 3

19000

0

1000

2000

3000

4000

5000

Generations

Figure 4.14: Transitions from GA-SRM(50,100) to either all CM(50,100) or all SRM(50,100) regimes at = {2, 3} the mutation segment length in SRM has reached = {2, 3}. As a reference it also includes the results by GA-SRM(50,100). From Figure 4.14 it can be seen that neither CM nor SRM alone but the interaction of both CM and SRM leads to a higher convergence reliability. Figure 4.14 shows that CM alone (even though extinctive selection is on) has dif?culties pulling the population to higher peaks. In the case of GA-SRM, the lost of effectiveness by CM during the ?nal stages 44

?SRM

30 25 20 15 10 5 0 1

pm(CM) = 1/n ? SRM b

6 5 4 3 2 1 0

b

pm(CM) = 0 ? SRM b

10

100

Generations

1000

Figure 4.15: SRM’s contribution to the parent population (?SRM ) and number of bits actually (CM ) = {0, 1/n} ?ipped by SRM (b) for CM’s mutation probabilities of pm

Fitness

22000

20000

18000

pm(CM) 1/n 0 GA-SRM (50,100) GA-hM (50,100) GA (50,100) cGA

16000

0

1000

2000

3000

4000

5000

Generations

Figure 4.16: Fitness Transition for CM’s mutation probabilities of pm

(CM )

= {0, 1/n}

of the search is supplemented by the augment of SRM’s contribution conform mutation rates are (CM ) = 1/n). reduced as shown in Figure 4.15 (pm Furthermore, the behavior of GA-SRM when CM’s mutation probability is either on or off is also veri?ed for random problems. Figure 4.15 plots SRM’s average contribution (in 50 runs) to the parent population (?SRM ) and the average number of bits actually ?ipped by SRM (b) for 45

CM’s mutation probabilities of pCM = {0, 1/n}, respectively. From Figure 4.15 it can be seen m (CM ) = 1/n the average number of SRM’s offspring that survive selection (?SRM ) that when pm (CM ) = 0, however, it can be seen that after increases as SRM’s mutation rates are reduced. For pm few generations SRM does not contribute to the parent population, causing a premature reduction of SRM’s mutation rate, lost of diversity in the population and convergence to a local optimum. (CM ) = 0) nor by Note that in this case diversity is not being introduced at all neither by CM (pm SRM (?SRM = 0). Finally, Figure 4.16 shows the ?tness of the best so far individual by GA-SRM(50,100) and GA-hM(50,100) when CM’s mutation probability is either on or off. GA-hM(50,100) is a parallel varying mutation algorithm that includes deterministic varying mutations instead of the adaptive approach. Results by cGA(100) and GA(50,100) are also included for comparison. The major conclusion from this ?gure is that the lack of “background” mutation in CM affects negatively the convergence reliability even if deterministic mutation is used, although in this case the mutation schedule is not affected because mutation rate depends on time. Again, in the case that ?tness duplicates are not eliminated, small mutation after crossover in CM helps to achieve a robust search performance on random problems by keeping an appropriate balance between CM and SRM.

4.13 Conclusions

This chapter has studied the internal structure of GA-SRM applying adapting varying mutations (SRM) parallel to crossover & background mutation (CM) putting the operators in a cooperativecompetitive stand with each other by way of extinctive selection. Important structural issues such the balance between CM and SRM for offspring creation, the mutation probability in CM, the ratio between number of parents and number of offspring (extinctive selection pressure), and the threshold to trigger adaptation in SRM were analyzed. It was found that the sexual operator CM performs better than the asexual operator SRM during the initial stages of the search. On the other hand, SMR’s contribution signi?cantly increases as the search progresses, mutation rates are reduced, and the population approaches the global optimum. Also, in spite of CM’s initial effectiveness, con?gurations favoring SRM (mutation in general) result into better performance than con?gurations favoring CM. Properly balancing the operators the cooperation expected from CM and SRM emerges producing a higher search velocity and higher search reliability in both realworld and random generated 0/1 multiple knapsacks problems. Small mutation after crossover in CM helps to achieve a robust search performance by keeping an appropriate balance between CM and SRM. The robustness of the adaptation mechanism in SRM was veri?ed and it was also shown that neither CM nor SRM alone but the interaction of both CM and SRM leads to higher convergence reliability. Regarding mutation strategy, it was found that ADS performs better than ADP in these classes of problems. Comparisons were performed with canonical GAs and other enhanced GAs.

46

Chapter 5

Comparing Standard and Parallel Varying Mutation GAs: Effect of the Major Components

This chapter compares the proposed model of parallel varying mutation with the standard model of varying mutation across a broad range of problems. The statistical signi?cance of the results is veri?ed with ANOVA tests. First, the models are compared using deterministic varying mutations. Then, the models are compared using self-adaptive mutations. It is found that the proposed model is more effective and ef?cient than the standard model in both deterministic and self-adaptive varying mutation GAs. It is also found that the standard model of varying mutations affects negatively the (adaptive) self-adaptive mutation rate control. This strongly suggests that the standard model of varying mutation GAs may not be appropriate for combining forms of control.

47

5.1 Introduction

The standard model of varying mutation GAs raises several important questions such as: Is there interference between crossover and high mutation? If so, does it affect performance of the algorithm? In the case of (adaptive) self-adaptive varying mutation algorithms, does it affect the mutation rate control itself? And more generally, is it an appropriate model for combining forms of control (co-adaptation of parameters)? This chapter tries to answer such questions by comparing the performance of standard and parallel models of varying mutation GAs using deterministic and self-adaptive mutation rate controls for varying mutation[74, 75, 76]. As explained in 2.3.3. The deterministic time-dependent schedule reduces mutation rates in a hyperbolic shape[31]. The self-adaptive approach uses a continuous representation for mutation rates and updates mutation probabilities using a learning rate[31, 33]. In the following varying mutation serial to crossover refers to the standard model of varying mutation and varying mutation parallel to crossover refers to the proposed model of varying mutation. The effect on performance of extinctive selection and higher mutations is studied in both models of varying mutation GAs.

5.2 Experimental Setup

The following GAs are used in the simulations. A simple canonical GA that applies crossover followed by background mutation, denoted as cGA. A GA with deterministic varying mutation serial (parallel) to crossover, denoted as hGA (GA-hM). A GA with self-adaptive varying mutation serial (parallel) to crossover, denoted as sGA (GA-sM). The GAs use either proportional selection or (?,λ) proportional selection. This is indicated by appending to the name of the GA (?) or (?,λ), respectively1 . All algorithms use ?tness linear scaling and mating is restricted to (xi , xj ), i = j , so a solution will not cross with itself. For cGA, hGA, and sGA pc = 0.6 and for GA-hM and GA-sM the ratio for offspring creation is set (CM ) = 1/n. The learning rate for to λCM : λSRM = 1 : 1. Background mutation is set to pm self-adaptation is set to γ = 0.2. This chapter uses the dif?cult, large, and highly constrained, 0/1 multiple knapsack problems2 [53], created by the test problem generator as explained in 3.1, to observe the behavior of the varying mutation GAs in a broad range of classes of problems. Results are averaged over 10 problems for each of the seven classes of problems (50 runs per problem) and the number of generations is set to T = 5000. Vertical bars overlaying the mean curves represent 95% con?dence intervals. The statistical signi?cance of the results is veri?ed using ANOVA tests.

5.3 Simple GA and Extinctive Selection

First, we observe the effect of extinctive selection on the performance of a simple GA. Figure 5.1 plots the ?tness of the best-so-far individual over the generations by the canonical cGA(100) and a simple GA using (?, λ) = {(15, 100), (50, 100)} extinctive ratios. From this ?gure we see that extinctive selection alone remarkable improves the solution quality reached by the cGA in this kind of problems.

1 2

a simple GA with (?,λ) Proportional Selection is denoted GA(?,λ) http://mscmga.ms.ic.ac.uk/jeb/orlib/info.html

48

Fitness

22000

20000

cGA (100) GA (50,100) GA (15,100)

18000

0

1000

2000

3000

4000

5000

Generations

Figure 5.1: Effect of Extinctive Selection on a simple GA: Fitness transition over the generations by cGA and GA(?, λ)

?M

40 35 30 25 20 15 10 5 0 1 10 100

Generations ? M GA (50,100) ? M GA (15,100)

1000

Figure 5.2: Effect of Extinctive Selection on a simple GA: M’s contribution to the parent population (?M ) in GA(?, λ) (pc = 0.6)

Figure 5.2 plots the number of individuals created by implicit parallel mutation M3 present in the parent population after extinctive selection (?M ). Looking at GA(50,100), for example, we can see that ?M ? 20 throughout the run. That is, only about 50% of the offspring created by CM survive extinctive selection (pc = 0.6). Since GA(50,100) exhibits higher convergence than cGA(100), this suggests that the crossover based operator is highly disruptive.

3

Implicit parallel mutation M refers to mutation when is applied alone in the absence of crossover (1 ? pc ), see 2.2

49

As mentioned before, the problems used in this study are highly constrained with sparse feasible regions where algorithms with penalty functions have a hard time ?nding feasible solutions[30, 53]. A higher selection pressure in these problems is helping the algorithm to focus the search around the feasible regions. Extinctive selection has also another effect. It increases the convergence speed of the algorithm[24]. Both GA(15,100) and GA(50,100) are faster than cGA(100). However, between (15,100) and (50,100) we observe that the latter gives better ?nal results than the former.

5.4 Varying Mutation without Extinctive Selection

Second, we observe the effect of deterministically varying mutations either serial or parallel to crossover when no extinctive selection is being used. Figure 5.3 plots results by hGA(100) and GA-hM(100). In both algorithms the range for either serial or parallel varying mutation is [0.5, 1/n]. As a reference for comparison, it also presents results obtained by cGA(100) and GA(50,100).

Fitness

22000

20000

18000

16000

cGA (100) GA (50,100) GA-hM (100) hGA (100)

14000

0

1000

2000

3000

4000

5000

Generations

Figure 5.3: Deterministic Varying Mutation without Extinctive Selection From Figure 5.3 we can observe that varying mutations either serial or parallel to crossover without extinctive selection undermines the reliability of the algorithm (?nal results are poorer than cGA’s). Convergence velocity is considerably affected as well. It should be remarked that there is an initial period consisting of a large number of generations in which the ?tness of the best-so-far individual by hGA and GA-hM corresponds to the best feasible individual randomly created at generation 0 (horizontal lines). The mutation probability applied by these algorithms is too high at the beginning of the search. As mutation is reduced with the number of generations, the mutation probability becomes more suitable for these constrained problems and ?tness picks up. In the case of GA-hM this initial ?at period can be attributed exclusively to a too high mutation probability used by the varying mutation operator. However, in the case of hGA the disruption 50

that mutation causes to crossover also contributes to extend this ?at period. Note that between hGA and GA-hM, which apply the same determinist schedule for mutation rate control, GA-hM’s ?tness picks up much earlier than hGA’s. This is a clear indication of the disruption caused by high mutation after crossover in the case of hGA.

5.5 Extinctive Selection and Varying Mutation

5.5.1 Deterministic Varying Mutation

Deterministic mutation varies mutation rates with exactly the same schedule whether it is applied serial (hGA) or parallel to crossover (GA-hM) and therefore is an ideal candidate to isolate and observe the impact of higher mutations in both models of GAs. In order to observe in detail the effect of extinctive selection and deterministic varying mutation serial/parallel to crossover experiments are conducted using various populations (?, λ) = (t=0) = {0.50, 0.10, 0.05}. {(15, 100), (50, 100), (100, 100)} and initial mutation probabilities pm Figure 5.4 and 5.5 plot the average ?tness of the best-so-far individual over the generations illustrating the convergence behavior by hGA and GA-hM, respectively. Results by the simple GA with extinctive selection GA(50,100) are also included for comparison. Looking at Figure 5.3 and Figure 5.4 we can see that extinctive selection, similar to the case of cGA, increases the performance of hGA. Moreover, the combination of extinctive selection and varying mutation produces results with better ?nal quality than the simple GA with extinctive selection. As we increase the extinctive pressure (reduce ? while keeping constant λ) the initial ?at (t=0) = 0.5. In these period is shorten. See for example hGA(50,100) and hGA(15,100) for pm cases mutation would not be less harmful than it is in hGA(100) of Figure 5.3, but extinctive selection will discard much of the poor performing individuals. Note, however, that better ?nal results are given by (?,λ)=(50,100) rather than by (?,λ)=(15,100). Also, as expected, using lower values for the initial mutation probability helps to speed up (t=0) (t=0) = 0.5 and pm = 0.05. These smaller convergence. See for example hGA(50,100) for pm initial mutation values, however, do not help to increase the quality of the ?nal results. If convergence speed is to be considered, then from this ?gure it also becomes clear that deterministic varying mutations mechanisms, besides the number of generations, incorporates the initial mutation probability as an additional parameter, which must be set properly in order to achieve high performance. From Figure 5.5 we can see that the inclusion of extinctive selection and the reductions of the initial mutation probability in GA-hM produce similar effects to those remarked for hGA. Looking at both Figure 5.4 and Figure 5.5 becomes more apparent that varying mutation parallel to crossover is less disruptive than varying mutation serial to crossover. Contrary to hGA, in the case of GA-hM the initial ?at periods have disappeared and in all cases GA-hM converges faster than hGA for similar values of (?,λ) extinctive ratio and initial mutation proba(hM )(t=0) . Also, GA-hM(50,100)’s ?nal quality is better than hGA(50,100)’s (the statistical bility pm signi?cance is shown below). As a consequence of the less disruptiveness of mutation parallel to crossover, the initial value (t=0) pm set for varying mutation in GA-hM has a smaller impact on convergence speed than it does (t=0) (t=0) = 0.5 and pm = 0.05 and compare it in hGA. See for example GA-hM(50,100) for pm with hGA for similar settings in Figure 5.4. Thus, GA-hM can be considered as a more robust algorithm than hGA. 51

Fitness

22000

20000

18000

GA(50,100)

16000

hGA (15,100) (50,100)

pm(t=0) 0.50 0.10 0.05

14000

0

1000

2000

3000

4000

5000

Generations

Figure 5.4: Deterministic Varying Mutation Serial to Crossover (m = 30, n = 100, φ = 0.25)

Fitness

22000

20000

18000

GA(50,100)

16000

14000

GA-hM pm(t=0) (15,100) (50,100) 0.50 0.10 0.05

0

1000

2000

3000

4000

5000

Generations

Figure 5.5: Deterministic Varying Mutation Parallel to Crossover (m = 30, n = 100, φ = 0.25)

In GA-hM, similar to GA and hGA, a (?, λ)=(50,100) extinctive ratio gives better ?nal results than (?, λ)=(15,100). In fact, notice that GA-hM(15,100)’s ?nal quality is not better than GA(50,100)’s, which does not apply varying mutations. This is because the offspring created by varying parallel mutation hM within GA-hM have to compete with CM’s offspring (that always applies M with background mutation) and a (15,100) extinctive selection turns out to be too strong. A less strong selection pressure, such (50,100), gives better chances to hM’s offspring, which in turn would help to improve the search process. Figure 5.6, Figure 5.7, and Figure 5.8 plot the average percentage error gap by hGA and 52

% Error Gap

6

5

4

3

2

hGA (50,100) GA-hM (50,100)

1

0.75

Tightness Ratio (φ)

0.5

0.25

Figure 5.6: Convergence Reliability of Deterministic Varying Mutation GAs: Reducing the feasible region φ = {0.75, 0.50, 0.25}, m = 30, n = 100.

% Error Gap

6

5

4

3

hGA (50,100) GA-hM (50,100)

2

1

5

10

Knapsacks (m)

30

Figure 5.7: Convergence Reliability of Deterministic Varying Mutation GAs: Increasing the number of constraints m = {5, 10, 30}, n = 100, φ = 0.25.

GA-hM. These ?gures show the effect on convergence reliability of reducing the feasible region (φ), increasing the number of constraints (m), and increasing the search space (n), respectively. The vertical bars, overlaying the mean curves, represent 95% con?dence intervals. The statistical signi?cance of the results achieved by hGA and GA-hM is veri?ed. Table 53

% Error Gap

6

5

4

3

hGA (50,100) GA-hM (50,100)

2

1

100

250

Objects (n)

500

Figure 5.8: Convergence Reliability of Deterministic Varying Mutation GAs: Increasing the search space n = {100, 250, 500}, m = 30, φ = 0.25.

5.1 summarizes the three two-factor factorial analysis of variance (ANOVA) corresponding to the plots presented in Figure 5.6, Figure 5.7, and Figure 5.8. In Table 5.1 Source indicates the source of variation, df the degrees of freedom, F is the ratio between the mean square treatment and the mean square error, and Pval is the p value (the smallest signi?cant level α that would allow rejection of the null hypothesis). The treatment means are illustrated in Table 5.2, Table 5.3, and Table 5.4, respectively. From Table 5.1, inspection of the p values reveals that in the case of reducing the feasible region (φ) there is some indication of an effect by the GA type factor (serial/parallel application of deterministic varying mutation). Note that Pval = 0.0766 is not much greater than α = 0.05. In the cases of increasing the number of constraints (m) and the size of the search space (n) there are indications of a strong main effect by the GA type concluding that the parallel deterministic varying mutation (GA-hM) attains signi?cantly smaller error than the standard deterministic varying mutation GA (hGA). Notice that the p values 0.0157 and 0.0047, respectively, are considerably less than 0.05. Furthermore, note that in all three cases there is indication of a main effect by the problem dif?culty factor (φ, m, and n) but the interaction between GA type and problem dif?culty factor was not signi?cant.

54

Table 5.1: Factorial ANOVA for hGA and GA-hM Source GA φ GA-φ Error Total GA m GA-m Error Total GA n GA-n Error Total SS 0.18371 132.47870 0.05120 3.04276 135.75637 0.40017 70.77277 0.00654 3.47476 74.65424 0.54150 11.72973 0.00121 3.36696 15.63940 df 1 2 2 54 59 1 2 2 54 59 1 2 2 54 59 MS 0.18371 66.23935 0.02560 0.05635 0.40017 35.38639 0.00327 0.06435 0.54150 5.86487 0.00060 0.06235 F 3.260 1175.553 0.454 Pval 0.0766 0.0000 0.6373

6.219 549.927 0.051

0.0157 0.0000 0.9505

8.685 94.062 0.010

0.0047 0.0000 0.9903

Table 5.2: Error Gap Means by hGA and GA-hM reducing the feasible region (ratio φ) GA 0.75 (φ1 ) 1.49, 1.48 1.37, 1.31 1.48, 1.42 1.36, 1.65 1.45, 1.57 14.58 1.46 1.38, 1.42 1.35, 1.3 1.54, 1.36 1.31, 1.59 1.35, 1.47 14.07 1.41 Tφ1 = 28.65 ?φ1 = 1.43 Ratio φ 0.50 (φ2 ) 2.56, 2.38 2.51, 2.63 2.74, 2.7 2.32, 2.36 2.56, 2.37 25.13 2.51 2.46, 2.27 2.38, 2.56 2.56, 2.57 2.36, 2.26 2.46, 2.34 24.22 2.42 Tφ2 = 49.35 ?φ2 = 2.47

hGA (g1 )

GA-hM (g2 )

Totals Averages Totals Averages

0.25 (φ3 ) 4.92, 5.14 5.22, 5.33 4.96, 5.5 4.7, 5.39 5.17, 4.34 50.67 5.07 4.88, 5.02 5.06, 5.13 4.59, 5.33 4.43, 5.22 5.06, 4.05 48.77 4.88 Tφ3 = 99.44 ?φ3 = 4.97

Tg1 = 90.38 ?g1 = 3.01

Tg1 = 87.06 ?g2 = 2.90

55

Table 5.3: Error Gap Means by hGA and GA-hM increasing the number of constraints (knapsacks m) GA 5 (m1 ) 2.27, 2.43 2.53, 2.22 2.50, 2.42 2.60, 2.24 2.30, 2.36 23.87 2.39 2.10, 2.36 2.35, 1.99 2.24, 2.39 2.49, 2.25 2.02, 2.29 22.48 2.25 Tm1 = 46.35 ?m1 = 2.32 Knapsacks m 10 (m2 ) 3.35, 3.44 3.64, 3.43 3.55, 3.71 3.44, 3.58 3.68, 3.91 =35.73 3.57 3.25, 3.36 3.55, 3.11 3.56, 3.35 3.29, 3.48 3.44, 3.73 34.12 3.41 Tm2 = 69.85 ?m2 = 3.49

hGA (g1 )

Totals Averages GA-hM (g2 )

Totals Averages Totals Averages

30 (m3 ) 4.92, 5.14 5.22, 5.33 4.96, 5.50 4.70, 5.39 5.17, 4.34 50.67 5.07 4.88, 5.02 5.06, 5.13 4.59, 5.33 4.43, 5.22 5.06, 4.05 48.77 4.88 Tm3 = 99.44 ?m3 = 4.97

Tg1 = 110.27 ?g1 = 3.68

Tg2 = 105.37 ?g2 = 3.51

Table 5.4: Error Gap Means by hGA and GA-hM increasing the search space (number of objects n) GA 100 (n1 ) 4.92, 5.14 5.22, 5.33 4.96, 5.5 4.7, 5.39 5.17, 4.34 50.67 5.07 4.88, 5.02 5.06, 5.13 4.59, 5.33 4.43, 5.22 5.06, 4.05 48.77 4.88 Tn1 = 99.44 ?n1 = 4.97 Objecs n 250 (n2 ) 4.34, 4.04 4.21, 4.05 3.89, 4.25 4.21, 4.27 4.31, 4.27 41.84 4.18 4.02, 3.86 4, 3.95 3.64, 4.1 4.17, 4.01 4.14, 4.16 40.05 4.01 Tn2 = 81.89 ?n2 = 4.09

hGA (g1 )

Totals Averages GA-hM (g2 )

Totals Averages Totals Averages

500 (n3 ) 3.94, 4.34 4.11, 4.12 4.16, 3.76 4.19, 4.13 4, 4.09 40.84 4.08 3.98, 4.1 3.93, 3.87 3.91, 3.6 4.01, 3.91 3.73, 3.79 38.83 3.88 Tn3 = 79.67 ?n3 = 3.98

Tg1 = 133.35 ?g1 = 4.45

Tg2 = 127.65 ?g2 = 4.26

56

5.5.2 Self-Adaptive Varying Mutation

A self-adaptive scheme uses one mutation rate per individual, which are usually set at t = 0 to random values in the range allowed for mutation. Two important ingredients of self-adaptation are the diversity of parameter settings and the capability of the method to adapt the parameters. It has been indicated that some of the implementations of self-adaptation exploit more the diversity of parameter settings rather than adapting them. However, it has also been argued that the key to the success of self-adaptation seems to consist in using at the same time both a reasonably fast adaptation and reasonably large diversity to achieve a good convergence velocity and a good convergence reliability, respectively[33]. To observe the in?uence that the serial/parallel application of varying mutations could have on the self-adaptive capability itself we avoid initial diversity of parameters. Experiments are conmax ducted using populations (?, λ) = {(15, 100), (50, 100)} and mutation ranges of pm = [pmin m , pm ] = [1/n, {0.50, 0.25, 0.10, 0.05}]. In all cases initial mutation for each individual is set to the max(t=0) = pmax imum value allowed for the range, pm m . Figure 5.9 and Figure 5.10 plot the average ?tness of the best-so-far individual over the generations illustrating the convergence behavior by sGA and GA-sM, respectively. Results by GA(50,100) are also included for comparison. From these and previous ?gures it is worth noting the following. (i) Self-adaptive mutation increases convergence speed compared to deterministic mutation either serial or parallel to crossover. Looking at Figure 5.9 and Figure 5.4, note that in sGA the initial ?at periods observed in hGA have disappeared completely. Also, looking at Figure 5.10 and Figure 5.5 we can see that GA(t=0) sM(50,100)’s ?tness picks up much earlier than GA-hM(50,100)’s for similar values of pm . Between sGA and GA-sM, however, looking at Figure 5.9 and Figure 5.10 note that sGA can (t=0) match GA-sM’s convergence velocity only for small values of pm . This is an indication that even in the presence of adaptation the convergence velocity of a GA that applies varying mutation serial to crossover would depend heavily on initial mutation rates, which is not an issue if adaptive mutation is applied parallel to crossover. (ii) Contrary to deterministic varying mutation, convergence reliability of self-adaptive mutation serial to crossover could be severely affected, which becomes quite notorious if no initial diversity of parameters is allowed. Note in Figure 5.9 (t=0) = {0.10, 0.05} achieved better ?nal rethat only the con?gurations of sGA(50,100) having pm sults than GA(50,100). On the other hand, the initial lack of diversity of parameters does not affect convergence reliability of GA-sM. Note in Figure 5.10 that for the same selection pressure conver(t=0) gence reliability of GA-sM is similar for all values of pm . (iii) Similar to deterministic varying mutation, better results are achieved by (?, λ) = (50, 100) rather than by (?, λ) = (15, 100). (t=0) to a random value between Next, we allow for initial diversity of parameters setting pm the minimum and maximum value allowed for mutation. In this case, the disruption that higher serial mutation causes to crossover becomes less apparent due to the initial diversity of parameters and convergence speed is similar for both sGA and GA-sM. Convergence reliability of sGA also improves. However, the negative impact on reliability remains quite signi?cant for sGA. Figure 5.11 and Figure 5.12 illustrate the ?tness transition and the average ?ipped bits (Log scale) by sGA and GA-sM both with random initial mutation rates between [1/n,0.50]. Results for hGA and GA-hM are also included in Figure 5.11 for comparison. From these ?gures note that sGA converges to lower ?tness and reduces mutation rates faster than GA-sM. The self-adaptation principle tries to exploit the indirect link between favorable strategy parameters and objective function values. That is, appropriate parameters would lead to ?tter individuals, which in turn are more likely to survive and hence propagate the parameter they carry with them to their offspring. A GA that applies varying mutation parallel to crossover as GA-sM can interpret better the self-adaptation principle and achieve higher performance because (i) inappro57

Fitness 22000

21000

20000

19000

GA(50,100) sGA (15,100) (50,100) pm(t=0) 0.50 0.25 0.10 0.05

18000

17000

0

1000

2000 3000 Generations

4000

5000

(t=0)

Figure 5.9: Self-Adaptive Varying Mutation Serial to Crossover , pm n = 100, φ = 0.25).

(i) = pmax (m = 30, m

Fitness 22000

21000

20000

19000

GA(50,100) GA-sM (15,100) (50,100) pm(t=0) 0.50 0.25 0.10 0.05

18000

17000

0

1000

2000 3000 Generations

4000

5000

(t=0)

Figure 5.10: Self-Adaptive Varying Mutation Parallel to Crossover, pm n = 100, φ = 0.25).

(i) = pmax (m = 30, m

priate mutation parameters do not disrupt crossover, and (ii) it preserves mutation rates (see Eq. (2.4)) that are being useful to the search. A GA that applies varying mutation serial to crossover as sGA, however, can mislead the mutation rate control because (i) appropriate parameters can be eliminated due to ineffective crossover operations, and (ii) in sGA an appropriate parameter 58

Fitness

22000 21000 20000 19000 18000 17000 16000 15000 1 10 100 1000

(t=0)

hGA (50,100) GA-hM (50,100) sGA (50,100) GA-sM (50,100)

Generations

Figure 5.11: Convergence Velocity (m = 30, n = 100, φ = 0.25). pm (t=0) GA-hM. pm (i) = rand[1/n, 0.5] for sGA and GA-SM

= 0.5 for hGA and

Flipped Bits

24

sGA

20 16 12 8 4 0

GA-sM

1

10

Generations

100

1000

(t=0)

Figure 5.12: Average Number of Flipped Bits (m = 30, n = 100, φ = 0.25). pm (t=0) hGA and GA-hM. pm (i) = rand[1/n, 0.5] for sGA and GA-SM

= 0.5 for

implies parameters that would not affect greatly crossover. Thus, in sGA there is a selective bias towards smaller mutation rates. Figure 5.13 , Figure 5.14, and Figure 5.15 plot the average percentage error gap by sGA and GA-sM showing the effect on performance of reducing the feasible region (φ), increasing the 59

% Error Gap

6

5

4

3

2

sGA (50,100) GA-sM (50,100)

1

0.75

Tightness Ratio (φ)

0.5

0.25

Figure 5.13: Convergence Reliability of Self-Adaptive Varying Mutation GAs: Reducing the feasible region φ = {0.75, 0.50, 0.25}, m = 30, n = 100.

% Error Gap

6

5

4

3

sGA (50,100) GA-sM (50,100)

2

1

5

10

Knapsacks (m)

30

Figure 5.14: Convergence Reliability of Self-Adaptive Varying Mutation GAs: Increasing the number of constraints m = {5, 10, 30}, n = 100, φ = 0.25.

number of constraints (m), and increasing the search space (n). Table 5.5 summarizes the three two-factor factorial ANOVA corresponding to the plots presented in Figure 5.13, Figure 5.14, and Figure 5.15. The treatment means are illustrated in Table 5.6, Table 5.7, and Table 5.8, respectively. 60

% Error Gap

6

5

4

3

sGA (50,100) GA-sM (50,100)

2

1

100

250

Objects (n)

500

Figure 5.15: Convergence Reliability of Self-Adaptive Varying Mutation GAs: Increasing the search space n = {100, 250, 500}, m = 30, φ = 0.25.

From Table 5.5, inspection of the p values reveals that in all three cases (reducing the feasible region, increasing the number of constraints, and increasing the size of the search space) there are indications of a strong main effect by the GA type concluding that the parallel self-adaptive varying mutation GA (GA-sM) attains signi?cantly smaller error than the standard self-adaptive varying mutation GA (sGA). Notice that the p values 0.0029, 0.0009, and 0.0000, respectively, are considerably less than 0.05. Furthermore, note that in all three cases there are indications of a main effect by the problem dif?culty factor (φ, m, and n). In other words, increasing the dif?culty of the problem makes it more dif?cult for the self-adaptive GA to ?nd good solutions. In addition, note that the interaction between GA type and problem dif?culty factor was signi?cant for ratio φ and number of constraints m, which means that the self-adaptive parallel varying mutation GA (GA-sM) not only performs better but also scales up better than the standard self-adaptive varying mutation GA (sGA) as the dif?culty of the problem increases.

5.6 Conclusions

We have studied the application of varying mutation either serial or parallel to crossover and discussed its effect on the performance of deterministic and self-adaptive varying mutation GAs. Experiments were conducted with several classes of 0/1 multiple knapsacks problems. We found that mutation parallel to crossover is more effective and ef?cient than mutation serial to crossover. In deterministic varying mutation GAs, a GA with varying mutation parallel to crossover showed faster convergence and higher robustness to initial settings of mutation rate than a GA with varying mutation serial to crossover. Reducing the feasible region (φ) an ANOVA gave some indication of higher convergence reliability by the parallel application of deterministic varying mutation. Increasing the number of constraints (m) and the size of the search space (n) there are indications of a strong main effect concluding that the parallel deterministic varying mutation 61

Table 5.5: Factorial ANOVA for sGA and GA-sM Source GA φ GA-φ Error Total GA m GA-m Error Total GA n GA-n Error Total SS 0.70634 82.77031 0.59193 3.91955 87.98813 0.89060 74.48359 0.69806 3.87291 79.94516 3.33233 9.25077 0.08025 4.27902 16.94237 df 1 2 2 54 59 1 2 2 54 59 1 2 2 54 59 MS 0.70634 41.38516 0.29596 0.07258 0.89060 37.24180 0.34903 0.07172 3.33233 4.62539 0.04013 0.07924 F 9.731 570.167 4.078 Pval 0.0029 0.0000 0.0224

12.418 519.263 4.867

0.0009 0.0000 0.0114

42.053 58.371 0.506

0.0000 0.0000 0.6055

Table 5.6: Error Gap Means by sGA and GA-sM reducing the feasible region (ratio φ) GA 0.75 (φ1 ) 1.73, 1.8 1.84, 1.85 1.86, 1.82 1.77, 1.93 1.76, 1.99 18.35 1.84 1.78, 1.72 1.61, 1.58 1.75, 1.78 1.63, 1.93 1.74, 1.77 17.29 1.73 Tφ1 = 35.64 ?φ1 = 1.78 Ratio φ 0.50 (φ2 ) 2.97, 2.95 3, 3.2 3.36, 3.22 2.76, 3.01 3.09, 2.66 30.22 3.02 2.78, 2.93 2.93, 3.22 3.2, 3.24 2.73, 2.93 3.06, 2.71 29.73 2.97 Tφ2 = 59.95 ?φ2 = 3.00

sGA (g1 )

Totals Averages GA-sM (g2 )

Totals Averages Totals Averages

0.25 (φ3 ) 4.75, 4.92 4.96, 5.05 4.8, 5.73 4.49, 5.13 5.05, 4.08 48.96 4.90 4.33, 4.44 4.48, 4.71 4.22, 4.82 3.95, 4.81 4.62, 3.62 44.00 4.40 Tφ3 = 92.96 ?φ3 = 4.65

Tg1 = 97.53 ?g1 = 3.25

Tg2 = 91.02 ?g2 = 3.03

62

Table 5.7: Error Gap Means by sGA and GA-sM increasing the number of constraints (knapsacks m) GA 5 (m1 ) 1.73, 1.99 2.09, 1.66 1.98, 2.02 2.12, 1.74 1.86, 1.98 19.17 1.92 1.79, 1.99 2.1, 1.78 2.04, 1.95 2.01, 1.89 1.97, 1.96 19.48 1.95 Tm1 = 38.65 ?m1 = 1.93 Knapsacks m 10 (m2 ) 3.08, 3.21 3.3, 2.93 3.39, 2.95 3.22, 3.13 3.18, 3.48 31.87 3.19 2.76, 2.9 3.05, 2.67 2.9, 2.9 2.83, 2.96 2.94, 3.3 29.21 2.92 Tm2 = 61.08 ?m2 = 3.05

sGA (g1 )

Totals Averages GA-sM (g2 )

Totals Averages Totals Averages

30 (m3 ) 4.75, 4.92 4.96, 5.05 4.8, 5.73 4.49, 5.13 5.05, 4.08 48.96 4.90 4.33, 4.44 4.48, 4.71 4.22, 4.82 3.95, 4.81 4.62, 3.62 44.00 4.40 Tm3 = 92.96 ?m3 = 4.65

Tg1 = 100.00 ?g1 = 3.33

Tg2 = 92.69 ?g2 = 3.09

Table 5.8: Error Gap Means by sGA and GA-sM increasing the search space (number of objects n) GA 100 (n1 ) 4.75, 4.92 4.96, 5.05 4.8, 5.73 4.49, 5.13 5.05, 4.08 48.96 4.90 4.33, 4.44 4.48, 4.71 4.22, 4.82 3.95, 4.81 4.62, 3.62 44.00 4.40 Tn1 = 92.96 ?n1 = 4.65 Objecs n 250 (n2 ) 4.01, 4.04 4.09, 3.86 3.76, 4.12 4.33, 4.75 4.28, 4.33 41.57 4.16 3.56, 3.47 3.57, 3.54 3.28, 3.65 3.7, 3.88 3.67, 3.79 36.11 3.61 Tn2 = 77.68 ?n2 = 3.88

sGA (g1 )

Totals Averages GA-sM (g2 )

Totals Averages Totals Averages

500 (n3 ) 4, 4.11 4.02, 3.93 4.06, 3.7 3.94, 3.98 3.76, 3.96 39.46 3.95 3.59, 3.74 3.52, 3.57 3.77, 3.29 3.62, 3.57 3.52, 3.55 35.74 3.57 Tn3 = 75.20 ?n3 = 3.76

Tg1 = 129.99 ?g1 4.33

Tg2 = 115.85 ?g2 = 3.86

63

attains signi?cantly smaller error than the standard deterministic varying mutation GA. In self-adaptive varying mutation GAs, the convergence velocity of a parallel self-adaptive mutation GA was matched by a serial self-adaptive mutation GA only when initial diversity of parameters was allowed. Convergence reliability was higher for the parallel varying self-adaptive mutation GA with or without initial diversity of parameters. An ANOVA gave a strong indication in this direction weather the feasible region (φ) is reduced, the number of constraints (m) are increases, or the size of the search space (n) is enlarge. We also found that the standard model of varying mutations in fact affects negatively the (adaptive) self-adaptive mutation rate control. This strongly suggests that the standard model of varying mutation GAs may not be appropriate for combining forms of control. Among deterministic and self-adaptive varying mutation GAs, best performance was achieved by a parallel varying mutation self-adaptive GA.

64

Chapter 6

Studying the Behavior of GA-SRM on Epistatic Problems

This chapter examines the behavior of the parallel varying mutation GA-SRM on epistatic problems using NK-Landscapes. We discuss properties of NK-Landscapes and analyze the major components of GA-SRM. Comparisons are made with a canonical GA, a simple GA with extinctive selection, a mutation only evolutionary algorithm, and a random bit climber RBC+. We show that avoiding unwanted selective bias by eliminating ?tness duplicates and using an appropriate selection pressure make GAs quite robust in NK-Landscapes. We also show that parallel varying mutation improves further reliability of the GA and that mutation strategy is an important factor to improve the performance of GAs on problems with nearest neighbor patterns of epistasis. Similar to recent works, relatively larger landscapes than previous studies are used in order to be a step closer to problems found in real world applications.

65

6.1 Introduction

Test function generators[64] for broad classes of problems are seen as the correct approach for testing the performance of genetic algorithms (GAs). Kauffman’s NK-Landscapes model of epistatic interactions[57, 58] is a well known example of a class of test function generator and has been the center of several theoretical and empirical studies both for the statistical properties of the generated landscapes and for their GA-hardness[60, 64, 65, 66, 67]. Previous works that investigate properties of GAs with NK-Landscapes have mostly limited their study to small landscapes, typically 10 ≤ N ≤ 48, and observed the behavior of GAs only for few generations. Recently, studies are being conducted on larger landscapes expending more evaluations[61, 68, 69, 70] and the performance of GAs is being benchmarked against hill climbers[61, 68, 70]. Heckendorn et al.[61] analyzed the epistatic features of embedded landscapes showing that for NK-Landscapes all the schema information is random if K is suf?ciently large predicting that, “since genetic algorithms theoretically only perform well when the algorithm can effectively exploit relationships between schemata”[61], a standard GA would have no advantage over a strictly local search algorithm. To verify this, the authors empirically compared the performance of a random bit climber (RBC+), a simple GA, and an enhanced GA (CHC)[77] known to be robust in a wide variety of problems. Experiments were conducted for N = 100 varying K from 0 to 65. A striking result of this study was the overall better performance of the random bit climber RBC+. The authors encourage test generators for broad classes of problems but suggest that NK-Landscapes (and kSAT) seem not to be appropriate for testing the performance of genetic algorithms. Motivated by [61], Mathias et al.[68] provided an exhaustive experimental examination of the performance of similar algorithms including also Davis’ RBC[78]. A main conclusion of this study is that over the range 19 ≤ N ≤ 100 there is a niche for the enhanced CHC in the region of N > 30 for K = 1 and N > 60 for 1 ≤ K < 12. Yet, this niche is very narrow compared to the broad region where RBC and RBC+ show superiority. Adaptive evolution is a search process driven by mutation, recombination, drift, and selection over ?tness landscapes[58]. All of these are important components in a GA and are carefully considered within GA-SRM. (i) GA-SRM applies varying mutation (SRM) parallel to crossover & background mutation (CM) using extinctive selection to put higher mutations and crossover in a cooperative-competitive stand with each other. (ii) It relies in adaptation and mutation strategy to increase effectiveness of the parallel varying mutation operator. (iii) It eliminates ?tness duplicates enhancing selection by removing an unwanted source of selective bias, postponing drift, and providing a more fair competition between CM and SRM. This chapter examines the behavior of the parallel varying mutation GA-SRM on epistatic problems using NK-Landscapes[70]. Properties of NK-Landscapes are discussed and the effect on performance of the four major processes mentioned above (mutation, recombination, drift, and selection) is veri?ed. Mutation strategy for the varying mutation operator is also studied in detail. Experiments are conducted with NK-Landscapes for N = 48 and N = 96 varying K from 0 to N ? 1 in increments of 4. Comparisons are made with a canonical GA, a simple GA with extinctive selection, a mutation only EA, and the random bit climber RBC+. It is shown that avoiding unwanted selective bias by eliminating ?tness duplicates and using an appropriate selection pressure make GAs quite robust in NK-Landscapes. With regards to strictly local search algorithms, different to previous works, it is shown that even simple GAs with these two features perform better than RBC+ for a broad range of classes of problems (K ≥ 4). It is also shown that the interaction of parallel varying mutation with crossover improves further the reliability of the GA for 12 < K < 32. Contrary to intuition, it is found that for small K a mutation only EA is very effective and crossover may be omitted; but the relative importance 66

of crossover interacting with varying mutation increases with K performing better than mutation alone for K > 12. Finally, Mutation strategy for parallel varying mutation turns out to be an important factor to improve the performance of GAs on problems with nearest neighbor patterns of epistasis.

6.2 RBC+

RBC+[61] is a variant of a random bit climber (RBC) de?ned by Davis[78]. Both are local search algorithms that use a single bit neighborhood. We implement RBC+ following indications given in [61, 68, 78]. RBC+ begins with a random string of length N . A random permutation of the string positions is created and bits are complemented (i.e. ?ipped) one at the time following the order indicated by the random permutation. Each time a bit is complemented the string is re-evaluated. All changes that results in equally good or better solutions are kept and accounted as an accepted change. After testing all N positions indicated by the random permutation, if accepted changes were detected a new permutation of string positions is generated and testing continues. If no accepted changes were detected a local optimum has been found, in which case RBC+ opts for a “soft-restart”. That is, a random bit is complemented, the change is accepted regardless of the resulting ?tness, a new permutation of string positions is generated, and testing continues. These soft-restarts are allowed until 5 × N changes are accepted (including the bit changes that constituted the soft restarts). When RBC+ has exhausted the possibility of a soft-restart it opts for a “hard-restart” generating a new random bit string. This process continues until a given total number of evaluations have been expended. The difference between RBC and RBC+ is the inclusion of soft-restarts in the latter.

6.3 Experimental Setup

The study is conducted on NK Landscapes with N = 48 and N = 96 bits varying the number of epistatic interactions from K = 0 to K = N ? 1 in increments of 4. Two kinds of epistatic patterns are investigated for the distribution of the K bits among the N : (i) nearest neighbor, in which each bit interacts with its K/2 left and right adjacent bits, and (ii) random, in which each bit interacts with K other randomly chosen bits in the chromosome. Circular genotypes are considered to avoid boundary effects. The landscapes created with random espistatic pattern correspond to Kauffman’s random neighborhood whereas the landscapes created with nearest neighbor epistatic pattern are of two types: Kauffman’s adjacent neighborhood and generalized NK-Landscapes with P = N . Unless indicated otherwise, results are reported for landscapes with random epistatic pattern In order to observe and compare the effect on performance of selection pressure, parallel varying mutation, and recombination, GA-SRM is decomposed to create the four following algorithms: (i) A simple canonical GA with proportional selection and crossover & background mutation (CM), denoted as cGA; (ii) a simple GA with (?, λ) proportional selection and CM, denoted as GA(?, λ); (iii) a GA that uses (?, λ) proportional selection, CM, and parallel varying mutation SRM, denoted as GA-SRM(?, λ); and (iv ) a version of GA-SRM(?, λ) with no crossover (pc = 0.0), denoted as M-SRM(?, λ). To observe the effect of drift the algorithms are used with the ?tness duplicates elimination feature either on or off. The superscript ed attached to the name of a GA indicates that the elimination of duplicates feature is on, i.e cGAed , GAed(?, λ), GA-SRMed(?, λ), and M-SRMed(?, λ). GA-SRM’s adaptive mutation strategies, ADS and ADP, are used to study the relevance of mutation strategy to epistatic pattern. 67

Table 6.1: GAs Parameters

Parameter cGA Proportional 0.6 1/N – – GA(?, λ) (?, λ) Prop. 0.6 1/N – – GA GA-SRM(?, λ) ADS ADP (?, λ) Prop. (?, λ) Prop. 0.6 0.6 1/N 1/N α = 0.5 α = [0.5, 1/N ] = [N, 1/α] =N 1:1 1:1

Selection pc (CM ) pm (SRM ) pm λCM : λSRM

Similar to [64], for CM two-point crossover is used with probability pc as the recombination (CM ) as the mutation operator after operator and the standard bit-?ipping method with probability pm (SRM ) . crossover. In the case of GA-SRM, SRM is used with ADS or ADP with mutation rate pm Also, since adaptation is used for SRM the threshold τ that triggers adaptation is sampled for all combinations of mutation strategy, N and K . Results presented here for ADS and ADP are for its best sampled τ . The number of evaluations is set to 2 × 106 for all GAs and RBC+. The offspring population λ is set to 200. Several parent populations (?) are tried for the GAs that use extinctive selection. All GAs use linear scaling. The parameters used for the GAs are summarized in Table 6.1. The parameters for M-SRM(?, λ) are the same as GA-SRM(?, λ) with ADP, except that pc = 0.0 (no recombination). Results presented here are the mean value of the best solution observed over 50 different randomly generated problems starting each time with the same initial population. The vertical bars overlaying the mean curves in the plots represent 95% con?dence intervals. Note that the problems are maximized.

6.4 Selection Pressure

First, the effect of a higher selection pressure on the performance of a simple GA is observed. Figure 6.1 plots results by cGA(200) and GA with (?, λ)={(100,200), (60,200), (30,200)}. Results by the random bit climber RBC+ are also included for comparison. K = 0 and K = N ? 1 In the limits, K = 0 corresponds to an additive genetic model in which there are no epistatic interactions and yields a single peaked smooth ?tness landscape. On the opposite extreme, K = N ? 1 corresponds to a maximally rugged fully random ?tness landscape in which each gene is epistatically affected by all the remaining genes. In the extremes the performance of the algorithms is expected to be similar. This can be seen for K = 0 in Figure 6.1. 0<K <N ?1 In [57, 58] sampling the landscapes by one-mutant adaptive walks, it was observed that low levels of epistatic interactions seem to bend the landscape and yield higher optima than the K = 0 landscape. Increasing K , however, would both cause a complexity catastrophe (the attainable 68

Fitness

0.8

0.8

cGA(200) GA(100,200) GA(60,200) GA(30,200) RBC+

0.78 0.76 0.74 0.72 0.7 0.68 0.66 0.64 0.62 0.6

0 8 16 24

0.78 0.76 0.74 0.72 0.7 0.68 0.66 0.64 0.62

K

32

40

48

95

0.6

Figure 6.1: Higher selection pressure. Landscapes with random patterns of epistasis.

optima falls toward the mean ?tness of the landscape) and make ?tness landscapes more multipeaked. Recent works, using exhaustive search for N ≤ 25, con?rm that there is a positive correlation between K and the number of peaks[62] and that the global optima of the landscape increase for small K [68]. Note from Figure 6.1 that the highest optima is for small K . Contrary to [57, 58], however, [62] and [68] show that the global optima is not a function of K for a given N as K increases from small to large values. Thus, the decreasing best values found by the algorithms when K increases indicates that the search performance is in fact worsening, and not that the values of the optima in the NK-Landscapes are decreasing. For 0 < K < N ? 1, different from K = 0 and K = N ? 1, differences in performance by the GAs can be seen. Some important observations are as follows. 1. cGA does relatively well for very low epistasis (K ≤ 8) but its performance falls sharply for medium and high epistasis (K ≥ 12). Similar behavior by another simple GA has been observed in [61] and [68]. 2. The behavior of GA(?,λ) is revealing. GA(?, λ) that includes an stronger selection pressure performs worse than cGA for low epistasis but it outperforms cGA(200) for medium and high epistasis (K ≥ 12). The behavior of GA(100,200), GA(60,200), and GA(30,200) are similar. Results by cGA and GA(?, λ) indicate the importance of an appropriate selection pressure to pull the population to ?ttest regions of the search space (note that the genetic operators are the same in both algorithms). The selection pressure induced by proportional selection seems to be appropriate only for very small K . As K increases a stronger selection pressure works better. It is worth noting that a similar behavior by extinctive selection has been observed in large, dif?cult, and highly constrained combinatorial optimization problems[79]. 3. The overall performance of RBC+ is better than both cGA and GA(?, λ). 69

6.5 Parallel Varying Mutation, Mutation Strategy, and Patterns of Epistasis

Second, the effect of the mutation strategy used in parallel varying mutation is investigated on landscapes with different patterns of epistasis. Figure 6.2 and Figure 6.3 present the mean ?tness of the best solutions achieved by GA-SRM(100,200) with ADS, and GA-SRM(100,200) with ADP on landscapes with nearest neighbor and random epistatic patterns for N = 48 and K = 0, · · · , N ? 1. Similarly, Figure 6.4 and Figure 6.5 present results for N = 96. Results by cGA and GA(100,200) are also included for comparison. Kauffman[57, 58] has observed that (i) the actual ?tness of optima are largely insensitive to the distribution of K among N . That is, for the same values of N and K , the expected attainable optima is similar for nearest neighbor and random epistatic patterns. Also, he showed that (ii) for small values of K and two alleles either for random or nearest neighbor epistatic patterns there is a global structure to the ?tness landscape. That is, the global optima are not distributed randomly in genotype space but instead are near one another. As K increases this correlation drops, more rapidly for random than for nearest neighbor epistatic pattern. The correlation of K to the dispersion of sub optima, its relative ?tness and Hamming distance from the global optimum, is corroborated in [62]. Looking at Figure 6.2 and Figure 6.3 it can be seen that for same values of K (N = 48) higher optima are achieved by the GAs when nearest neighbor epistatic interactions exist (4 < K < N ? 1). Similar behavior can be observed for N = 96 in Figure 6.4 and Figure 6.5. If Kauffman’s observation that the achievable optima is similar for random and nearest neighbor epistatic patterns holds, this indicates that the GAs perform better on the presence of nearest neighbor epistatic patterns. Otherwise, this suggests that a nearest neighbor epistatic pattern does not only impose a stronger structure to the landscape as showed by Kauffman but also that it is capable of bending the landscape more than a random epistatic pattern does, which yields higher achievable optima. Either case, the epistatic pattern should be taken into account during the design of representations for problems to be solved by GAs. Representations that induce nearest neighbor epistatic interactions should be preferred over those that produce random interactions. From Figure 6.2 and Figure 6.4 looking at the results by ADS and ADP it is observed that in the case of a nearest neighbor epistatic pattern the strategy that mutates within a continue mutation segment, i.e. ADS, performs signi?cantly better than the strategy that mutates any bit of the chromosome, i.e. ADP. Bigger differences are observed for N = 96 than N = 48, which suggest that mutation strategy could be a more signi?cant factor as the search space increases. ADP performs similar to ADS only for low epistatic levels, K ≤ 8 and K ≤ 4 for N = 48 and N = 96, respectively. In the case of a random epistatic pattern, from ?gure Figure 6.3 it can be seen that for N = 48 there is no difference in performance by ADS and ADP for any value of K . From Figure 6.5, however, it can be observed that when N = 96 ADS performs better than ADP as K increases from medium to high levels of epistatis (K ≥ 20). ADS’s better performance is explained from Kaufmman’s ?ndings that epistatic interactions, even a random epistatic pattern, in?ict a global structure to the ?tness landscape in which high peaks are close in genotype space. In such case a segment mutation strategy as ADS can take advantage of this underlying structure to perform a more effective search. The global structure of the landscape is stronger for a nearest neighbor epistatic pattern and the difference in performance between ADS and ADP can be clearly noticed as indicated above. In the case of a random epistatic pattern this structure is weaker and its effect would remain hidden in small search spaces as N = 48, but would become more evident for medium and high K as N increases. 70

Fitness

0.8

cGA GA(?,λ) ADS ADP

0.75

0.7

0.65

0.6

0

4

8

12

16

20

24

47

K

Figure 6.2: Nearest neighbor patterns of epistasis: Mean ?tness after 104 generations for N = 48, K = 0, · · · , N ? 1

Fitness

0.8

cGA GA(?,λ) ADS ADP

0.75

0.7

0.65

0.6

0

4

8

12

16

20

24

47

K

Figure 6.3: Random patterns of epistasis: Mean ?tness after 104 generations for N = 48, K = 0, · · · , N ? 1

Figure 6.6 and Figure 6.7 illustrate typical ?tness transitions of the best so far individual on landscapes with nearest neighbor and random epistatic patterns for N = 48 and medium epistasis K = 12. Similarly, Figure 6.8 and Figure 6.9 present results for N = 96 and K = 24. Note that the plots for N = 96 are for 105 rather than 104 generations so the behavior of the algorithms can be observed in the long run. From these ?gures it can be seen that GA(?, λ) and GA-SRM(?, λ) that include extinctive 71

Fitness

0.8

cGA GA(?,λ) ADS ADP

0.75

0.7

0.65

0.6

0

8

16

24

32

40

48

95

K

Figure 6.4: Nearest neighbor patterns of epistasis: Mean ?tness after 104 generations for N = 96, K = 0, · · · , N ? 1

Fitness

0.8

cGA GA(?,λ) ADS ADP

0.75

0.7

0.65

0.6

0

8

16

24

32

40

48

95

K

Figure 6.5: Random patterns of epistasis: Mean ?tness after 104 generations for N = 96, K = 0, · · · , N ? 1

selection converge faster than cGA. As mentioned above, because of the higher selection pressure of extinctive selection, GA(?, λ) also converges to higher optima than cGA (for medium and higher epistasis). However, it stagnates because of lack of diversity. GA-SRM(?, λ), either with ADS or ADP, can improve further the ?tness achieved by GA(?, λ) because of its better balance between higher selection pressure and diversity introduced by SRM. 72

Fitness 0.8

0.75

0.7

0.65

cGA GA(?,λ) GA-SRM(ADS) GA-SRM(ADP)

0 2,000 4,000 6,000 8,000 10,000 Generation

0.6

Figure 6.6: Nearest neighbor patterns of epistasis: Fitness transition of best so far, N = 48, K = 12, 104 generations

Fitness 0.8

0.75

0.7

0.65

cGA GA(?,λ) GA-SRM(ADS) GA-SRM(ADP)

0 2,000 4,000 6,000 8,000 10,000 Generation

0.6

Figure 6.7: Random patterns of epistasis: Fitness transition of best so far, N = 48, K = 12, 104 generations

6.6 Duplicates Elimination

Third, the effect of genetic drift is observed by setting on the ?tness duplicates elimination feature. Figure 6.10 plots results by cGAed (200) and GAed(?, λ) with (?, λ)= {(100,200), (60,200), (30,200)}. Results by cGA(200) and RBC+ are also included for comparison. From this ?gure it can seen that eliminating duplicates affects differently the performance of the GAs. (i) It 73

Fitness 0.8

0.75

0.7

0.65

cGA GA(?,λ) GA-SRM(ADS) GA-SRM(ADP)

0 20,000 40,000 60,000 80,000 100,000 Generation

0.6

Figure 6.8: Nearest neighbor patterns of epistasis: Fitness transition of best so far, N = 96, K = 24, 105 generations

Fitness 0.8

0.75

0.7

0.65

cGA GA(?,λ) GA-SRM(ADS) GA-SRM(ADP)

0 20,000 40,000 60,000 80,000 100,000 Generation

0.6

Figure 6.9: Random patterns of epistasis: Fitness transition of best so far, N = 96, K = 24, 105 generations

deteriorates even more the performance of cGA. (ii) Conversely, the combination of higher selection pressure with duplicates elimination produces a striking increase on performance. Note that all GAed (?, λ) algorithms achieved higher optima than RBC+ for 4 ≤ K ≤ 40. The optima achieved by GAed(100,200) is lower than RBC+ for K = 48. However, note that GAed(60,200) and GAed (30,200) achieved higher optima than RBC+. This suggest that for K = 48 even the pressure imposed by (?, λ)=(100,200) is not strong enough. 74

Fitness

0.8

0.8 0.78 0.76 0.74 0.72 0.7 0.68 0.66

cGA(200