Skip to content

techbysample/gagrid

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

GA Process Diagram

GA Grid (Beta) is an in memory Genetic Algorithm (GA) component for Apache Ignite. A GA is a method of solving optimization problems by simulating the process of biological evolution. GA Grid provides a distributive GA library built on top of a mature and scalable Apache Ignite platform. GAs are excellent for searching through large and complex data sets for an optimal solution. Real world applications of GAs include: automotive design, computer gaming, robotics, investments, traffic/shipment routing and more.

The following diagram depicts the steps performed by GA Grid.

NOTE: If you are unfamiliar with GA terminology please refer to the Glossary .

Figure 1: Genetic Algorithm Process


GA Process Diagram


GA Grid is a viable solution for developing scalable GAs as it leverages Apache Ignite's core components :

  • Advanced Clustering
  • Compute Grid
  • Data Grid
  • SQL Grid

The following diagram depicts how GA Grid component fits into Apache Ignite.

Figure 2: GA Grid component in Apache Ignite.

GA Process Diagram


The following diagram depicts GA Grid's architecture:

Figure 3: GA Grid Architecture

GA Process Diagram

The diagram above depicts the major elements of GA Grid.
(F)itness Calculation, (C)rossover, and (M)utation operations are modeled as a ComputeTask for distributive behavior. The ComputeTask is split into multiple ComputeJobs, (ie: Fn,Cn,Mn) assigned to respective nodes, and executed in parallel.

All of these ComputeTasks leverage Apache Ignite's Affinity Colocation to route ComputeJobs to respective nodes where Chromosomes are stored in cache.

NOTE: GAGrid_HOME refers to GA Grid installation folder.

Prerequisites:

Name Value
JDK Oracle JDK 7 and above
Ignite 1.9.x and above

Steps:

Here is the quick summary on installation of GA Grid on Apache Ignite:

  • Copy the GAGrid_HOME\dist\gagrid-beta.jar to IGNITE_HOME\libs
  • Copy the GAGrid_HOME\dist\gagrid-beta-tests.jar to IGNITE_HOME\libs

    NOTE: gagrid-beta-tests.jar is required to run the examples

In order to begin using GA Grid, you will follow these steps:

  1. Create a GAConfiguration object
  2. Define the Gene and Chromosome
  3. Implement a fitness function
  4. Define terminate condition
  5. Evolve the population

We will use the HelloWorldGATest example to demonstrate.

In the HelloWorldGATest example, our goal is to derive the phrase:

"HELLO WORLD"

Create a GAConfiguration

To begin, we create a GAConfiguration object. This class is utilized to customize the behaviour of GA Grid.

            ignite = Ignition.start("config/gagrid-config.xml");
            // Create GAConfiguration
            gaConfig = new GAConfiguration();
            

Define the Gene and Chromosome

Next, we define our Gene. For our problem domain, an optimal solution is the phrase "HELLO WORLD". Since the discrete parts are letters, we use a Character to model our Gene. Next, we need to initialize a Gene pool of 27 Gene objects utilizing Characters. The code snippet below depicts this process.

NOTE: GA Grid utilizes the Gene pool to initialize a replicated Gene cache.

    
       List<Gene> genePool = new ArrayList();

       char[] chars = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', ' ' };

       for (int i = 0; i < chars.length; i++) {
           Gene gene = new Gene(new Character(chars[i]));
           genePool.add(gene);
           
           gaConfig.setGenePool(genes);
      
   

Next, we define the Chromosome as is central to a GA, because it models an optimal solution. The Chromosome is made of Genes which represent discrete parts of a particular solution.

For our GA, since the goal is a solution containing the phrase "HELLO WORLD", our Chromosomes should have 11 Genes (ie:Characters). As a result, we use the GAConfiguration to set to our Chromosome length to 11.

//set the Chromosome Length to '11' since 'HELLO WORLD' contains 11 characters.
gaConfig.setChromosomeLength(11);

During GA Grid execution, Chromosomes evolve into optimal solutions through the process of crossover and mutation. Next, only the best Chromosomes (ie: solutions) are chosen based on a fitness score.

NOTE: GA Grid internally stores potential (Chromosomes) in a Chromosome cache.


Optimal Solution:

HelloWorld


Implement Fitness Function

GA Grid is intelligent enough to perform a majority of the process of natural selection. However, the GA Grid has no knowledge of the problem domain. For this reason, we need to define a fitness function. We will need to extend GA Grid’s IFitnessFunction class to calculate a 'fitness score' for a potential Chromosome. A ‘fitness score’ is used to determine how optimal the solution is relative to other potential solutions in the population. The code below demonstrates our fitness function.


    public class HelloWorldFitnessFunction implements IFitnessFunction {

    private String targetString = "HELLO WORLD";

    @Override
    public double evaluate(List<Gene> genes) {

        double matches = 0;

        for (int i = 0; i < genes.size(); i++) {
            if (((Character) (genes.get(i).getValue())).equals(targetString.charAt(i))) {
                matches = matches + 1;
            }
        }
        return matches;
    }
}
       

Next, we configure GAConfiguration with our HelloWorldFitnessFunction

  // create and set Fitness function
            HelloWorldFitnessFunction function = new HelloWorldFitnessFunction();
            gaConfig.setFitnessFunction(function);

Define terminate condition
The next step is to specify a suitable terminate condition for GA Grid. The terminate condition will vary depending the problem domain. For our use case, we want GA Grid to terminate when a Chromosome's fitness score equals 11. We specify a terminate condition by implementing the ITerminateCriteria interface which has a single method isTerminateConditionMet().

public class HelloWorldTerminateCriteria implements ITerminateCriteria {

    private IgniteLogger igniteLogger = null;
    private Ignite ignite = null;
    
    public HelloWorldTerminateCriteria(Ignite ignite) {
        this.ignite = ignite;
        this.igniteLogger = ignite.log();

    }

    public boolean isTerminationConditionMet(Chromosome fittestChromosome, double averageFitnessScore, int currentGeneration) {
        boolean isTerminate = true;

        igniteLogger.info("##########################################################################################");
        igniteLogger.info("Generation: " + currentGeneration);
        igniteLogger.info("Fittest is Chromosome Key: " + fittestChromosome);
        igniteLogger.info("Chromsome: " + fittestChromosome);
        printPhrase(GAGridUtils.getGenesInOrderForChromosome(ignite, fittestChromosome));
        igniteLogger.info("Avg Chromsome Fitness: " + averageFitnessScore);
        igniteLogger.info("##########################################################################################");

        if (!(fittestChromosome.getFitnessScore() > 10)) {
            isTerminate = false;
        }
        
        return isTerminate;
    }
    
    
    /**
     * Helper to print Phrase
     * 
     * @param genes
     */
    private void printPhrase(List<Gene> genes) {
        
        StringBuffer sbPhrase = new StringBuffer();
        
        for (Gene gene : genes) {
            sbPhrase.append(((Character) gene.getValue()).toString());
        }
        igniteLogger.info(sbPhrase.toString());
    }

Next, we configure GAConfiguration with our HelloWorldTerminateCriteria.

//create and set TerminateCriteria
HelloWorldTerminateCriteria termCriteria = new HelloWorldTerminateCriteria(ignite);
gaConfig.setTerminateCriteria(termCriteria);

Evolve the population

The final step is to initialize a GAGrid instance using our GAConfiguration and Ignite instances. Then we evolve the population by invoking GAGrid.evolve().

// initialize GAGrid
gaGrid = new GAGrid(gaConfig, ignite);
// evolve the population
Chromosome fittestChromosome = gaGrid.evolve();

Start GA Grid

To begin using GA Grid, open the command shell and type the following:

$ bin/ignite.sh GAGrid_HOME/config/ga-config.xml

NOTE: gs-config.xml is a copy Ignite's default Spring configuration .xml. Feel free to further customize based on use case.

Repeat this step for the number nodes you desire in your cluster.

Next, go to GAGrid_HOME folder and type:

mvn test -Dignite.version={ignite.version} -Dtest=HelloWorldGATest

Upon startup, you should see the following similar output on all nodes in the topology:

[21:41:49,327][INFO][main][GridCacheProcessor] Started cache [name=populationCache, mode=PARTITIONED]
[21:41:49,365][INFO][main][GridCacheProcessor] Started cache [name=geneCache, mode=REPLICATED]

Next, You will see the following output after some number of generations:

[19:04:17,307][INFO][main][] Generation: 208
[19:04:17,307][INFO][main][] Fittest is Chromosome Key: Chromosome [fitnessScore=11.0, id=319, genes=[8, 5, 12, 12, 15, 27, 23, 15, 18, 12, 4]]
[19:04:17,307][INFO][main][] Chromosome: Chromosome [fitnessScore=11.0, id=319, genes=[8, 5, 12, 12, 15, 27, 23, 15, 18, 12, 4]]
[19:04:17,310][INFO][main][] HELLO WORLD
[19:04:17,311][INFO][main][] Avg Chromosome Fitness: 5.252
[19:04:17,311][INFO][main][] 
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 53.883 sec

All examples are JUnit tests and can be found in GAGrid_Home\src\test directory:

Please see examples for additional help on using GA Grid.

Apache Zeppelin is a web-based notebook that enables interactive data analytics. In this section we show how to leverage Zeppelin's visualization features to display optimal solutions generated by GA Grid. We will demonstrate these features using the JUnit HelloWorldGATest as an example.

NOTE: The steps presented in this section apply to all examples included with GA Grid.

Using the Ignite SQL interpreter, we can retrieve solutions from GA Grid's distributive caches:

[21:41:49,327][INFO][main][GridCacheProcessor] Started cache [name=populationCache, mode=PARTITIONED]
[21:41:49,365][INFO][main][GridCacheProcessor] Started cache [name=geneCache, mode=REPLICATED]

Be sure to download Apache Zeppelin 0.7.3 and follow these instructions prior to going to next section.

NOTE: Apache Zeppelin 0.7.3's highest supported version of Apache Ignite is 1.9.0.

Configure Ignite interpreter

Once, Apache Zeppelin is installed and running you should see the following start page:

Zeppelin Configuration

Next, configure Ignite interpreter:

Click on "Interpreter" menu item. This page contains settings for all configured interpreter groups. Next, scroll down to the "Ignite" section and update properties as you need using "Edit" button.

The Ignite SQL interpreter requires only the ignite.jdbc.url property that contains the JDBC connection URL. Next, edit ignite.jdbc.url property and set to the following value: jdbc:ignite://localhost:11211/geneCache as shown below.

Then, click "Save" button to update changes in configuration. Be sure to restart interpreter after changes in configuration.

Zeppelin Configuration

Create a New Notebook

Next, click the "Create new note" menu item under "Notebook" to create a new Notebook.

Zeppelin Configuration

Give your notebook the name "GAGridNotebook" and select "ignite" as the default interpreter as shown below. Next, click "Create Note" to continue.

Zeppelin Configuration

Now, your newly created notebook should now be displayed as shown below. In order to execute SQL queries, we must use "%ignite.ignitesql" prefix.

Zeppelin Configuration

Using the Notebook

GA Grid provides improved knowledge discovery by adding custom SQL functions to 'pivot' genetic optimization results. Columns in a result set are dynamically driven by Chromosome size for an individual genetic sample.

GA Grid Custom SQL Functions:

The following SQL functions are available in GA Grid's distributive 'geneCache':

Function Name Description
getSolutionsDesc() Retrieve optimal solutions in descending order based on fitness score
getSolutionsAsc() Retrieve optimal solutions in ascending order based on fitness score
getSolutionById(key) Retrieve an optimal solution by Chromosome key

To begin using your "GAGridNotebook", start a standalone Ignite node:

$ bin/ignite.sh GAGrid_HOME/config/ga-config.xml

Next, run the HelloWorldGATest example:

mvn test -Dignite.version={ignite.version} -Dtest=HelloWorldGATest

Once GA Grid has begun generating optimal solutions for the first generation you may begin querying data.

1st Generation:

 ##########################################################################################
[13:55:22,589][INFO][main][] Generation: 1
[13:55:22,596][INFO][main][] Fittest is Chromosome Key: Chromosome [fitnessScore=3.0, id=460, genes=[8, 3, 21, 5, 2, 22, 1, 19, 18, 12, 9]]
[13:55:22,596][INFO][main][] Chromosome: Chromosome [fitnessScore=3.0, id=460, genes=[8, 3, 21, 5, 2, 22, 1, 19, 18, 12, 9]]
[13:55:22,598][INFO][main][] HCUEBVASRLI
[13:55:22,598][INFO][main][] Avg Chromosome Fitness: 0.408

In Zeppelin paragraph window, simply type the following SQL and click the execute button:

%ignite.ignitesql call "geneCache".getSolutionsDesc();

Zeppelin Configuration

After a few seconds, you will see a result similar to the the display below. For the HellowWorldGATest example, GA Grid will maintain a population of 500 solutions for each generation. Also, solutions in this example will contain a total of '11' genes and highest fitness score are considered 'most fit'. In the display below, the highlighted Chromosome record is the best solution because it has a fitness score of "9". If you wait a few seconds and re-execute query, you will see solution continue to evolve until the phrase "HELLO WORLD" is found.

Zeppelin Configuration

After several generations, a GA Grid calculates a final solution as displayed below.

Zeppelin Configuration

By clicking the bar chart icon, the SQL query results can be displayed as a Bar Chart as shown below:

Zeppelin Configuration

Once you know the Chromosome key, you can search for an individual record. The display below depicts how to search for a single Chromosome record.

%ignite.ignitesql call "geneCache".getSolutionById(289);

Zeppelin Configuration

Additional SQL for Data Analysis

#The following SQL retrieves all genes stored in Gene pool.

%ignite.ignitesql select * from "geneCache".Gene;

Zeppelin Configuration

#The following SQL retrieves average fitness score among all Chromosomes in the current generation.

%ignite.ignitesql select AVG(FITNESSSCORE) from "populationCache".Chromosome;

Zeppelin Configuration

Chromosome is a sequence of Genes. A Chromosome represents a potential solution.

Crossover is the process in which the genes within chromosomes are combined to derive new chromosomes.

Fitness Score is a numerical score that measures the value of a particular Chromosome (ie: solution) relative to other Chromosome in the population.

Gene is the discrete building blocks that make up the Chromosome.

Genetic Algorithm (GA) is a method of solving optimization problems by simulating the process of biological evolution. A GA continuously enhances a population of potential solutions. With each iteration, a GA selects the 'best fit' individuals from the current population to create offspring for the next generation. After subsequent generations, a GA will "evolve" the population toward an optimal solution.

Mutation is the process where genes within a chromosomes are randomly updated to produce new characteristics.

Population is the collection of potential solutions or Chromosomes.

Selection is the process of choosing candidate solutions (Chromosomes) for the next generation.

About

GA Grid (Beta) is a distributive in memory Genetic Algorithm (GA) component for Apache Ignite. A GA is a method of solving complex optimization problems by simulating the process of biological evolution. GA Grid provides a distributive GA library built on top of a mature and scalable Apache Ignite platform.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages