GA Grid (Beta) is an in memory Genetic Algorithm (GA) component for Apache Ignite. A GA is a method of solving optimization problems by simulating the process of biological evolution. GA Grid provides a distributive GA library built on top of a mature and scalable Apache Ignite platform. GAs are excellent for searching through large and complex data sets for an optimal solution. Real world applications of GAs include: automotive design, computer gaming, robotics, investments, traffic/shipment routing and more.
The following diagram depicts the steps performed by GA Grid.
NOTE: If you are unfamiliar with GA terminology please refer to the Glossary .
Figure 1: Genetic Algorithm Process
GA Grid is a viable solution for developing scalable GAs as it leverages Apache Ignite's core components :
- Advanced Clustering
- Compute Grid
- Data Grid
- SQL Grid
The following diagram depicts how GA Grid component fits into Apache Ignite.
Figure 2: GA Grid component in Apache Ignite.
The following diagram depicts GA Grid's architecture:
Figure 3: GA Grid Architecture
The diagram above depicts the major elements of GA Grid.
(F)itness Calculation, (C)rossover, and (M)utation operations are modeled as a ComputeTask for distributive behavior. The ComputeTask is split into multiple ComputeJobs, (ie: Fn,Cn,Mn) assigned to respective nodes, and executed in parallel.
All of these ComputeTasks leverage Apache Ignite's Affinity Colocation to route ComputeJobs to respective nodes where Chromosomes are stored in cache.
NOTE: GAGrid_HOME refers to GA Grid installation folder.
Prerequisites:
| Name | Value |
|---|---|
| JDK | Oracle JDK 7 and above |
| Ignite | 1.9.x and above |
Steps:
Here is the quick summary on installation of GA Grid on Apache Ignite:
- Copy the GAGrid_HOME\dist\gagrid-beta.jar to IGNITE_HOME\libs
- Copy the GAGrid_HOME\dist\gagrid-beta-tests.jar to IGNITE_HOME\libs
NOTE: gagrid-beta-tests.jar is required to run the examples
In order to begin using GA Grid, you will follow these steps:
- Create a GAConfiguration object
- Define the Gene and Chromosome
- Implement a fitness function
- Define terminate condition
- Evolve the population
We will use the HelloWorldGATest example to demonstrate.
In the HelloWorldGATest example, our goal is to derive the phrase:
"HELLO WORLD"
Create a GAConfiguration
To begin, we create a GAConfiguration object. This class is utilized to customize the behaviour of GA Grid.
ignite = Ignition.start("config/gagrid-config.xml");
// Create GAConfiguration
gaConfig = new GAConfiguration();
Define the Gene and Chromosome
Next, we define our Gene. For our problem domain, an optimal solution is the phrase "HELLO WORLD". Since the discrete parts are letters, we use a Character to model our Gene. Next, we need to initialize a Gene pool of 27 Gene objects utilizing Characters. The code snippet below depicts this process.
NOTE: GA Grid utilizes the Gene pool to initialize a replicated Gene cache.
List<Gene> genePool = new ArrayList();
char[] chars = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', ' ' };
for (int i = 0; i < chars.length; i++) {
Gene gene = new Gene(new Character(chars[i]));
genePool.add(gene);
gaConfig.setGenePool(genes);
Next, we define the Chromosome as is central to a GA, because it models an optimal solution. The Chromosome is made of Genes which represent discrete parts of a particular solution.
For our GA, since the goal is a solution containing the phrase "HELLO WORLD", our Chromosomes should have 11 Genes (ie:Characters). As a result, we use the GAConfiguration to set to our Chromosome length to 11.
//set the Chromosome Length to '11' since 'HELLO WORLD' contains 11 characters.
gaConfig.setChromosomeLength(11);
During GA Grid execution, Chromosomes evolve into optimal solutions through the process of crossover and mutation. Next, only the best Chromosomes (ie: solutions) are chosen based on a fitness score.
NOTE: GA Grid internally stores potential (Chromosomes) in a Chromosome cache.
Optimal Solution:
Implement Fitness Function
GA Grid is intelligent enough to perform a majority of the process of natural selection. However, the GA Grid has no knowledge of the problem domain. For this reason, we need to define a fitness function. We will need to extend GA Grid’s IFitnessFunction class to calculate a 'fitness score' for a potential Chromosome. A ‘fitness score’ is used to determine how optimal the solution is relative to other potential solutions in the population. The code below demonstrates our fitness function.
public class HelloWorldFitnessFunction implements IFitnessFunction {
private String targetString = "HELLO WORLD";
@Override
public double evaluate(List<Gene> genes) {
double matches = 0;
for (int i = 0; i < genes.size(); i++) {
if (((Character) (genes.get(i).getValue())).equals(targetString.charAt(i))) {
matches = matches + 1;
}
}
return matches;
}
}
Next, we configure GAConfiguration with our HelloWorldFitnessFunction
// create and set Fitness function
HelloWorldFitnessFunction function = new HelloWorldFitnessFunction();
gaConfig.setFitnessFunction(function);
Define terminate condition
The next step is to specify a suitable terminate condition for GA Grid. The terminate condition will vary
depending the problem domain. For our use case, we want GA Grid to terminate when
a Chromosome's fitness score equals 11. We specify a terminate condition by implementing the ITerminateCriteria interface which has a single method isTerminateConditionMet().
public class HelloWorldTerminateCriteria implements ITerminateCriteria {
private IgniteLogger igniteLogger = null;
private Ignite ignite = null;
public HelloWorldTerminateCriteria(Ignite ignite) {
this.ignite = ignite;
this.igniteLogger = ignite.log();
}
public boolean isTerminationConditionMet(Chromosome fittestChromosome, double averageFitnessScore, int currentGeneration) {
boolean isTerminate = true;
igniteLogger.info("##########################################################################################");
igniteLogger.info("Generation: " + currentGeneration);
igniteLogger.info("Fittest is Chromosome Key: " + fittestChromosome);
igniteLogger.info("Chromsome: " + fittestChromosome);
printPhrase(GAGridUtils.getGenesInOrderForChromosome(ignite, fittestChromosome));
igniteLogger.info("Avg Chromsome Fitness: " + averageFitnessScore);
igniteLogger.info("##########################################################################################");
if (!(fittestChromosome.getFitnessScore() > 10)) {
isTerminate = false;
}
return isTerminate;
}
/**
* Helper to print Phrase
*
* @param genes
*/
private void printPhrase(List<Gene> genes) {
StringBuffer sbPhrase = new StringBuffer();
for (Gene gene : genes) {
sbPhrase.append(((Character) gene.getValue()).toString());
}
igniteLogger.info(sbPhrase.toString());
}
Next, we configure GAConfiguration with our HelloWorldTerminateCriteria.
//create and set TerminateCriteria
HelloWorldTerminateCriteria termCriteria = new HelloWorldTerminateCriteria(ignite);
gaConfig.setTerminateCriteria(termCriteria);
The final step is to initialize a GAGrid instance using our GAConfiguration and Ignite instances. Then we evolve the population by invoking GAGrid.evolve().
// initialize GAGrid
gaGrid = new GAGrid(gaConfig, ignite);
// evolve the population
Chromosome fittestChromosome = gaGrid.evolve();
To begin using GA Grid, open the command shell and type the following:
$ bin/ignite.sh GAGrid_HOME/config/ga-config.xml
NOTE: gs-config.xml is a copy Ignite's default Spring configuration .xml. Feel free to further customize based on use case.
Repeat this step for the number nodes you desire in your cluster.
Next, go to GAGrid_HOME folder and type:
mvn test -Dignite.version={ignite.version} -Dtest=HelloWorldGATest
Upon startup, you should see the following similar output on all nodes in the topology:
[21:41:49,327][INFO][main][GridCacheProcessor] Started cache [name=populationCache, mode=PARTITIONED]
[21:41:49,365][INFO][main][GridCacheProcessor] Started cache [name=geneCache, mode=REPLICATED]
Next, You will see the following output after some number of generations:
[19:04:17,307][INFO][main][] Generation: 208
[19:04:17,307][INFO][main][] Fittest is Chromosome Key: Chromosome [fitnessScore=11.0, id=319, genes=[8, 5, 12, 12, 15, 27, 23, 15, 18, 12, 4]]
[19:04:17,307][INFO][main][] Chromosome: Chromosome [fitnessScore=11.0, id=319, genes=[8, 5, 12, 12, 15, 27, 23, 15, 18, 12, 4]]
[19:04:17,310][INFO][main][] HELLO WORLD
[19:04:17,311][INFO][main][] Avg Chromosome Fitness: 5.252
[19:04:17,311][INFO][main][]
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 53.883 sec
All examples are JUnit tests and can be found in GAGrid_Home\src\test directory:
Please see examples for additional help on using GA Grid.
Apache Zeppelin is a web-based notebook that enables interactive data analytics. In this section we show how to leverage Zeppelin's visualization features to display optimal solutions generated by GA Grid. We will demonstrate these features using the JUnit HelloWorldGATest as an example.
NOTE: The steps presented in this section apply to all examples included with GA Grid.
Using the Ignite SQL interpreter, we can retrieve solutions from GA Grid's distributive caches:
[21:41:49,327][INFO][main][GridCacheProcessor] Started cache [name=populationCache, mode=PARTITIONED]
[21:41:49,365][INFO][main][GridCacheProcessor] Started cache [name=geneCache, mode=REPLICATED]
Be sure to download Apache Zeppelin 0.7.3 and follow these instructions prior to going to next section.
NOTE: Apache Zeppelin 0.7.3's highest supported version of Apache Ignite is 1.9.0.
Once, Apache Zeppelin is installed and running you should see the following start page:
Next, configure Ignite interpreter:
Click on "Interpreter" menu item. This page contains settings for all configured interpreter groups. Next, scroll down to the "Ignite" section and update properties as you need using "Edit" button.
The Ignite SQL interpreter requires only the ignite.jdbc.url property that contains the JDBC connection URL. Next, edit ignite.jdbc.url property and set to the following value: jdbc:ignite://localhost:11211/geneCache as shown below.
Then, click "Save" button to update changes in configuration. Be sure to restart interpreter after changes in configuration.
Next, click the "Create new note" menu item under "Notebook" to create a new Notebook.
Give your notebook the name "GAGridNotebook" and select "ignite" as the default interpreter as shown below. Next, click "Create Note" to continue.
Now, your newly created notebook should now be displayed as shown below. In order to execute SQL queries, we must use "%ignite.ignitesql" prefix.
GA Grid provides improved knowledge discovery by adding custom SQL functions to 'pivot' genetic optimization results. Columns in a result set are dynamically driven by Chromosome size for an individual genetic sample.
GA Grid Custom SQL Functions:
The following SQL functions are available in GA Grid's distributive 'geneCache':
| Function Name | Description |
|---|---|
| getSolutionsDesc() | Retrieve optimal solutions in descending order based on fitness score |
| getSolutionsAsc() | Retrieve optimal solutions in ascending order based on fitness score |
| getSolutionById(key) | Retrieve an optimal solution by Chromosome key |
To begin using your "GAGridNotebook", start a standalone Ignite node:
$ bin/ignite.sh GAGrid_HOME/config/ga-config.xml
Next, run the HelloWorldGATest example:
mvn test -Dignite.version={ignite.version} -Dtest=HelloWorldGATest
Once GA Grid has begun generating optimal solutions for the first generation you may begin querying data.
1st Generation:
##########################################################################################
[13:55:22,589][INFO][main][] Generation: 1
[13:55:22,596][INFO][main][] Fittest is Chromosome Key: Chromosome [fitnessScore=3.0, id=460, genes=[8, 3, 21, 5, 2, 22, 1, 19, 18, 12, 9]]
[13:55:22,596][INFO][main][] Chromosome: Chromosome [fitnessScore=3.0, id=460, genes=[8, 3, 21, 5, 2, 22, 1, 19, 18, 12, 9]]
[13:55:22,598][INFO][main][] HCUEBVASRLI
[13:55:22,598][INFO][main][] Avg Chromosome Fitness: 0.408
In Zeppelin paragraph window, simply type the following SQL and click the execute button:
%ignite.ignitesql call "geneCache".getSolutionsDesc();
After a few seconds, you will see a result similar to the the display below. For the HellowWorldGATest example, GA Grid will maintain a population of 500 solutions for each generation. Also, solutions in this example will contain a total of '11' genes and highest fitness score are considered 'most fit'. In the display below, the highlighted Chromosome record is the best solution because it has a fitness score of "9". If you wait a few seconds and re-execute query, you will see solution continue to evolve until the phrase "HELLO WORLD" is found.
After several generations, a GA Grid calculates a final solution as displayed below.
By clicking the bar chart icon, the SQL query results can be displayed as a Bar Chart as shown below:
Once you know the Chromosome key, you can search for an individual record. The display below depicts how to search for a single Chromosome record.
%ignite.ignitesql call "geneCache".getSolutionById(289);
Additional SQL for Data Analysis
#The following SQL retrieves all genes stored in Gene pool.
%ignite.ignitesql select * from "geneCache".Gene;
#The following SQL retrieves average fitness score among all Chromosomes in the current generation.
%ignite.ignitesql select AVG(FITNESSSCORE) from "populationCache".Chromosome;
Chromosome is a sequence of Genes. A Chromosome represents a potential solution.
Crossover is the process in which the genes within chromosomes are combined to derive new chromosomes.
Fitness Score is a numerical score that measures the value of a particular Chromosome (ie: solution) relative to other Chromosome in the population.
Gene is the discrete building blocks that make up the Chromosome.
Genetic Algorithm (GA) is a method of solving optimization problems by simulating the process of biological evolution. A GA continuously enhances a population of potential solutions. With each iteration, a GA selects the 'best fit' individuals from the current population to create offspring for the next generation. After subsequent generations, a GA will "evolve" the population toward an optimal solution.
Mutation is the process where genes within a chromosomes are randomly updated to produce new characteristics.
Population is the collection of potential solutions or Chromosomes.
Selection is the process of choosing candidate solutions (Chromosomes) for the next generation.
















