GitHub - techbysample/gagrid: GA Grid (Beta) is a distributive in memory Genetic Algorithm (GA) component for Apache Ignite. A GA is a method of solving complex optimization problems by simulating the process of biological evolution. GA Grid provides a distributive GA library built on top of a mature and scalable Apache Ignite platform.

Summary

GA Grid (Beta) is an in memory Genetic Algorithm (GA) component for Apache Ignite. A GA is a method of solving optimization problems by simulating the process of biological evolution. GA Grid provides a distributive GA library built on top of a mature and scalable Apache Ignite platform. GAs are excellent for searching through large and complex data sets for an optimal solution. Real world applications of GAs include: automotive design, computer gaming, robotics, investments, traffic/shipment routing and more.

The following diagram depicts the steps performed by GA Grid.

NOTE: If you are unfamiliar with GA terminology please refer to the Glossary .

Figure 1: Genetic Algorithm Process

GA Grid is a viable solution for developing scalable GAs as it leverages Apache Ignite's core components :

Advanced Clustering
Compute Grid
Data Grid
SQL Grid

The following diagram depicts how GA Grid component fits into Apache Ignite.

Figure 2: GA Grid component in Apache Ignite.

The following diagram depicts GA Grid's architecture:

Figure 3: GA Grid Architecture

The diagram above depicts the major elements of GA Grid.
(F)itness Calculation, (C)rossover, and (M)utation operations are modeled as a ComputeTask for distributive behavior. The ComputeTask is split into multiple ComputeJobs, (ie: Fn,Cn,Mn) assigned to respective nodes, and executed in parallel.

All of these ComputeTasks leverage Apache Ignite's Affinity Colocation to route ComputeJobs to respective nodes where Chromosomes are stored in cache.

How to Install

NOTE: GAGrid_HOME refers to GA Grid installation folder.

Prerequisites:

Name	Value
JDK	Oracle JDK 7 and above
Ignite	1.9.x and above

Steps:

Here is the quick summary on installation of GA Grid on Apache Ignite:

Copy the GAGrid_HOME\dist\gagrid-beta.jar to IGNITE_HOME\libs
Copy the GAGrid_HOME\dist\gagrid-beta-tests.jar to IGNITE_HOME\libs

NOTE: gagrid-beta-tests.jar is required to run the examples

Usage Guide

In order to begin using GA Grid, you will follow these steps:

We will use the HelloWorldGATest example to demonstrate.

In the HelloWorldGATest example, our goal is to derive the phrase:

"HELLO WORLD"

Create a GAConfiguration

To begin, we create a GAConfiguration object. This class is utilized to customize the behaviour of GA Grid.

            ignite = Ignition.start("config/gagrid-config.xml");
            // Create GAConfiguration
            gaConfig = new GAConfiguration();

Define the Gene and Chromosome

Next, we define our Gene. For our problem domain, an optimal solution is the phrase "HELLO WORLD". Since the discrete parts are letters, we use a Character to model our Gene. Next, we need to initialize a Gene pool of 27 Gene objects utilizing Characters. The code snippet below depicts this process.

NOTE: GA Grid utilizes the Gene pool to initialize a replicated Gene cache.

    
       List<Gene> genePool = new ArrayList();

       char[] chars = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', ' ' };

       for (int i = 0; i < chars.length; i++) {
           Gene gene = new Gene(new Character(chars[i]));
           genePool.add(gene);
           
           gaConfig.setGenePool(genes);

Next, we define the Chromosome as is central to a GA, because it models an optimal solution. The Chromosome is made of Genes which represent discrete parts of a particular solution.

For our GA, since the goal is a solution containing the phrase "HELLO WORLD", our Chromosomes should have 11 Genes (ie:Characters). As a result, we use the GAConfiguration to set to our Chromosome length to 11.

//set the Chromosome Length to '11' since 'HELLO WORLD' contains 11 characters.
gaConfig.setChromosomeLength(11);

During GA Grid execution, Chromosomes evolve into optimal solutions through the process of crossover and mutation. Next, only the best Chromosomes (ie: solutions) are chosen based on a fitness score.

NOTE: GA Grid internally stores potential (Chromosomes) in a Chromosome cache.

Optimal Solution:

Implement Fitness Function

GA Grid is intelligent enough to perform a majority of the process of natural selection. However, the GA Grid has no knowledge of the problem domain. For this reason, we need to define a fitness function. We will need to extend GA Grid’s IFitnessFunction class to calculate a 'fitness score' for a potential Chromosome. A ‘fitness score’ is used to determine how optimal the solution is relative to other potential solutions in the population. The code below demonstrates our fitness function.

    public class HelloWorldFitnessFunction implements IFitnessFunction {

    private String targetString = "HELLO WORLD";

    @Override
    public double evaluate(List<Gene> genes) {

        double matches = 0;

        for (int i = 0; i < genes.size(); i++) {
            if (((Character) (genes.get(i).getValue())).equals(targetString.charAt(i))) {
                matches = matches + 1;
            }
        }
        return matches;
    }
}

Next, we configure GAConfiguration with our HelloWorldFitnessFunction

  // create and set Fitness function
            HelloWorldFitnessFunction function = new HelloWorldFitnessFunction();
            gaConfig.setFitnessFunction(function);

Define terminate condition
The next step is to specify a suitable terminate condition for GA Grid. The terminate condition will vary depending the problem domain. For our use case, we want GA Grid to terminate when a Chromosome's fitness score equals 11. We specify a terminate condition by implementing the ITerminateCriteria interface which has a single method isTerminateConditionMet().

public class HelloWorldTerminateCriteria implements ITerminateCriteria {

    private IgniteLogger igniteLogger = null;
    private Ignite ignite = null;
    
    public HelloWorldTerminateCriteria(Ignite ignite) {
        this.ignite = ignite;
        this.igniteLogger = ignite.log();

    }

    public boolean isTerminationConditionMet(Chromosome fittestChromosome, double averageFitnessScore, int currentGeneration) {
        boolean isTerminate = true;

        igniteLogger.info("##########################################################################################");
        igniteLogger.info("Generation: " + currentGeneration);
        igniteLogger.info("Fittest is Chromosome Key: " + fittestChromosome);
        igniteLogger.info("Chromsome: " + fittestChromosome);
        printPhrase(GAGridUtils.getGenesInOrderForChromosome(ignite, fittestChromosome));
        igniteLogger.info("Avg Chromsome Fitness: " + averageFitnessScore);
        igniteLogger.info("##########################################################################################");

        if (!(fittestChromosome.getFitnessScore() > 10)) {
            isTerminate = false;
        }
        
        return isTerminate;
    }
    
    
    /**
     * Helper to print Phrase
     * 
     * @param genes
     */
    private void printPhrase(List<Gene> genes) {
        
        StringBuffer sbPhrase = new StringBuffer();
        
        for (Gene gene : genes) {
            sbPhrase.append(((Character) gene.getValue()).toString());
        }
        igniteLogger.info(sbPhrase.toString());
    }

Next, we configure GAConfiguration with our HelloWorldTerminateCriteria.

//create and set TerminateCriteria
HelloWorldTerminateCriteria termCriteria = new HelloWorldTerminateCriteria(ignite);
gaConfig.setTerminateCriteria(termCriteria);

Evolve the population

The final step is to initialize a GAGrid instance using our GAConfiguration and Ignite instances. Then we evolve the population by invoking GAGrid.evolve().

// initialize GAGrid
gaGrid = new GAGrid(gaConfig, ignite);
// evolve the population
Chromosome fittestChromosome = gaGrid.evolve();

Start GA Grid

To begin using GA Grid, open the command shell and type the following:

$ bin/ignite.sh GAGrid_HOME/config/ga-config.xml

NOTE: gs-config.xml is a copy Ignite's default Spring configuration .xml. Feel free to further customize based on use case.

Repeat this step for the number nodes you desire in your cluster.

Next, go to GAGrid_HOME folder and type:

mvn test -Dignite.version={ignite.version} -Dtest=HelloWorldGATest

Upon startup, you should see the following similar output on all nodes in the topology:

[21:41:49,327][INFO][main][GridCacheProcessor] Started cache [name=populationCache, mode=PARTITIONED]
[21:41:49,365][INFO][main][GridCacheProcessor] Started cache [name=geneCache, mode=REPLICATED]

Next, You will see the following output after some number of generations:

[19:04:17,307][INFO][main][] Generation: 208
[19:04:17,307][INFO][main][] Fittest is Chromosome Key: Chromosome [fitnessScore=11.0, id=319, genes=[8, 5, 12, 12, 15, 27, 23, 15, 18, 12, 4]]
[19:04:17,307][INFO][main][] Chromosome: Chromosome [fitnessScore=11.0, id=319, genes=[8, 5, 12, 12, 15, 27, 23, 15, 18, 12, 4]]
[19:04:17,310][INFO][main][] HELLO WORLD
[19:04:17,311][INFO][main][] Avg Chromosome Fitness: 5.252
[19:04:17,311][INFO][main][] 
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 53.883 sec

Examples

All examples are JUnit tests and can be found in GAGrid_Home\src\test directory:

Please see examples for additional help on using GA Grid.

Apache Zeppelin Integration

Apache Zeppelin is a web-based notebook that enables interactive data analytics. In this section we show how to leverage Zeppelin's visualization features to display optimal solutions generated by GA Grid. We will demonstrate these features using the JUnit HelloWorldGATest as an example.

NOTE: The steps presented in this section apply to all examples included with GA Grid.

Using the Ignite SQL interpreter, we can retrieve solutions from GA Grid's distributive caches:

[21:41:49,327][INFO][main][GridCacheProcessor] Started cache [name=populationCache, mode=PARTITIONED]
[21:41:49,365][INFO][main][GridCacheProcessor] Started cache [name=geneCache, mode=REPLICATED]

Be sure to download Apache Zeppelin 0.7.3 and follow these instructions prior to going to next section.

NOTE: Apache Zeppelin 0.7.3's highest supported version of Apache Ignite is 1.9.0.

Configure Ignite interpreter

Once, Apache Zeppelin is installed and running you should see the following start page:

Next, configure Ignite interpreter:

Click on "Interpreter" menu item. This page contains settings for all configured interpreter groups. Next, scroll down to the "Ignite" section and update properties as you need using "Edit" button.

The Ignite SQL interpreter requires only the ignite.jdbc.url property that contains the JDBC connection URL. Next, edit ignite.jdbc.url property and set to the following value: jdbc:ignite://localhost:11211/geneCache as shown below.

Then, click "Save" button to update changes in configuration. Be sure to restart interpreter after changes in configuration.

Create a New Notebook

Next, click the "Create new note" menu item under "Notebook" to create a new Notebook.

Give your notebook the name "GAGridNotebook" and select "ignite" as the default interpreter as shown below. Next, click "Create Note" to continue.

Now, your newly created notebook should now be displayed as shown below. In order to execute SQL queries, we must use "%ignite.ignitesql" prefix.

Using the Notebook

GA Grid provides improved knowledge discovery by adding custom SQL functions to 'pivot' genetic optimization results. Columns in a result set are dynamically driven by Chromosome size for an individual genetic sample.

GA Grid Custom SQL Functions:

The following SQL functions are available in GA Grid's distributive 'geneCache':

Function Name	Description
getSolutionsDesc()	Retrieve optimal solutions in descending order based on fitness score
getSolutionsAsc()	Retrieve optimal solutions in ascending order based on fitness score
getSolutionById(key)	Retrieve an optimal solution by Chromosome key

To begin using your "GAGridNotebook", start a standalone Ignite node:

$ bin/ignite.sh GAGrid_HOME/config/ga-config.xml

Next, run the HelloWorldGATest example:

mvn test -Dignite.version={ignite.version} -Dtest=HelloWorldGATest

Once GA Grid has begun generating optimal solutions for the first generation you may begin querying data.

1st Generation:

 ##########################################################################################
[13:55:22,589][INFO][main][] Generation: 1
[13:55:22,596][INFO][main][] Fittest is Chromosome Key: Chromosome [fitnessScore=3.0, id=460, genes=[8, 3, 21, 5, 2, 22, 1, 19, 18, 12, 9]]
[13:55:22,596][INFO][main][] Chromosome: Chromosome [fitnessScore=3.0, id=460, genes=[8, 3, 21, 5, 2, 22, 1, 19, 18, 12, 9]]
[13:55:22,598][INFO][main][] HCUEBVASRLI
[13:55:22,598][INFO][main][] Avg Chromosome Fitness: 0.408

In Zeppelin paragraph window, simply type the following SQL and click the execute button:

%ignite.ignitesql call "geneCache".getSolutionsDesc();

After a few seconds, you will see a result similar to the the display below. For the HellowWorldGATest example, GA Grid will maintain a population of 500 solutions for each generation. Also, solutions in this example will contain a total of '11' genes and highest fitness score are considered 'most fit'. In the display below, the highlighted Chromosome record is the best solution because it has a fitness score of "9". If you wait a few seconds and re-execute query, you will see solution continue to evolve until the phrase "HELLO WORLD" is found.

After several generations, a GA Grid calculates a final solution as displayed below.

By clicking the bar chart icon, the SQL query results can be displayed as a Bar Chart as shown below:

Once you know the Chromosome key, you can search for an individual record. The display below depicts how to search for a single Chromosome record.

%ignite.ignitesql call "geneCache".getSolutionById(289);

Additional SQL for Data Analysis

#The following SQL retrieves all genes stored in Gene pool.

%ignite.ignitesql select * from "geneCache".Gene;

#The following SQL retrieves average fitness score among all Chromosomes in the current generation.

%ignite.ignitesql select AVG(FITNESSSCORE) from "populationCache".Chromosome;

Glossary

Chromosome is a sequence of Genes. A Chromosome represents a potential solution.

Crossover is the process in which the genes within chromosomes are combined to derive new chromosomes.

Fitness Score is a numerical score that measures the value of a particular Chromosome (ie: solution) relative to other Chromosome in the population.

Gene is the discrete building blocks that make up the Chromosome.

Genetic Algorithm (GA) is a method of solving optimization problems by simulating the process of biological evolution. A GA continuously enhances a population of potential solutions. With each iteration, a GA selects the 'best fit' individuals from the current population to create offspring for the next generation. After subsequent generations, a GA will "evolve" the population toward an optimal solution.

Mutation is the process where genes within a chromosomes are randomly updated to produce new characteristics.

Population is the collection of potential solutions or Chromosomes.

Selection is the process of choosing candidate solutions (Chromosomes) for the next generation.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
config		config
dist		dist
javadoc		javadoc
src		src
License.txt		License.txt
ReadMe.md		ReadMe.md
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Summary

How to Install

Usage Guide

Examples

Apache Zeppelin Integration

Glossary

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Uh oh!

Uh oh!

Uh oh!

Languages