HeatWave ML Code for Performance Benchmarks

This set of benchmarks is based on popular machine-learning datasets fetched from multiple sources.

Software prerequisites:

  1. Python 3.8
  2. MySQL Shell

Download and preprocess the datasets to the current directory

Click a link below to download the corresponding benchmark, or fetch it with wget from the command line.

airlines

bank_marketing

cnae-9

connect-4

fashion_mnist

nomao

numerai

higgs

census

titanic

creditcard

appetency

twitter

nyc_taxi

news_popularity

black_friday

mercedes

diamonds

After you have downloaded a benchmark, run the preprocess.py script with the benchmark name as below

python3 sql/preprocess.py --benchmark <name>
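If you have downloaded several benchmarks, the per-benchmark command can be scripted. A minimal sketch (the benchmark list is copied from this README; actually running the commands assumes `sql/preprocess.py` is reachable from the current directory):

```python
# Sketch: build (and optionally run) the preprocess command for each benchmark.
import subprocess

BENCHMARKS = [
    "airlines", "bank_marketing", "cnae-9", "connect-4", "fashion_mnist",
    "nomao", "numerai", "higgs", "census", "titanic", "creditcard",
    "appetency", "twitter", "nyc_taxi", "news_popularity",
    "black_friday", "mercedes", "diamonds",
]

def preprocess_cmd(benchmark):
    # Mirrors the command shown above: python3 sql/preprocess.py --benchmark <name>
    return ["python3", "sql/preprocess.py", "--benchmark", benchmark]

for name in BENCHMARKS:
    cmd = preprocess_cmd(name)
    # Uncomment to actually run each preprocessing step:
    # subprocess.run(cmd, check=True)
    print(" ".join(cmd))
```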

Running a benchmark

Launch MySQL Shell as below

mysqlsh user@hostname --mysql --sql

At the MySQL Shell prompt, run

> source sql/<benchmark_name>.sql

where <benchmark_name> is a name from the list above. The train and test CSVs generated by the preprocessing step must be present in MySQL Shell's current directory. Each SQL file creates the schemas for a benchmark, trains a HeatWave ML model on it, and scores the model on the test data. The test score is printed at the end.
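Before sourcing a benchmark's SQL file, it can help to confirm the CSVs are where MySQL Shell expects them. A small sketch, assuming a `<benchmark>_train.csv` / `<benchmark>_test.csv` naming scheme (adjust the names to match what preprocess.py actually writes):

```python
# Sketch: check that a benchmark's train/test CSVs exist in a directory.
from pathlib import Path

def expected_csvs(benchmark):
    # Assumed naming convention; verify against preprocess.py's output files.
    return [f"{benchmark}_train.csv", f"{benchmark}_test.csv"]

def missing_csvs(benchmark, directory="."):
    # Return the expected files that are NOT present in `directory`.
    return [f for f in expected_csvs(benchmark)
            if not (Path(directory) / f).exists()]
```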

Running scalability experiments

To measure HeatWave ML scalability on the benchmarks above, run the ML_TRAIN commands from each benchmark's SQL file on 1, 2, 4, 8, and 16 nodes. Measure the end-to-end training time (ML_TRAIN time as observed from the MySQL client) for each configuration (benchmark + node count). Plotting runtime against the number of nodes gives the scalability curve for a benchmark.
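Once the per-configuration training times are collected, the scalability numbers reduce to speedups relative to the single-node runtime. A minimal sketch (the timings below are placeholders, not measured results):

```python
# Sketch: compute speedup over the 1-node runtime for each node count.

def speedups(times_by_nodes):
    """times_by_nodes maps node count -> measured ML_TRAIN wall-clock seconds.
    Returns node count -> speedup relative to the 1-node runtime."""
    base = times_by_nodes[1]
    return {n: base / t for n, t in sorted(times_by_nodes.items())}

# Hypothetical measurements (seconds) for one benchmark; substitute your own.
times = {1: 800.0, 2: 420.0, 4: 230.0, 8: 130.0, 16: 80.0}
print(speedups(times))
```

Ideal linear scaling would give a speedup equal to the node count; the gap between the measured curve and the ideal line shows how well a benchmark scales.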