This set of benchmarks is built around widely used machine-learning datasets fetched from multiple sources.
Click the link below to download the respective benchmark, or fetch the URLs with wget from the command line.
airlines
bank_marketing
cnae-9
connect-4
fashion_mnist
- https://github.com/zalandoresearch/fashion-mnist/blob/master/data/fashion/t10k-images-idx3-ubyte.gz
- https://github.com/zalandoresearch/fashion-mnist/blob/master/data/fashion/t10k-labels-idx1-ubyte.gz
- https://github.com/zalandoresearch/fashion-mnist/blob/master/data/fashion/train-images-idx3-ubyte.gz
- https://github.com/zalandoresearch/fashion-mnist/blob/master/data/fashion/train-labels-idx1-ubyte.gz
nomao
numerai
higgs
census
- https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data
- https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test
titanic
creditcard
appetency
nyc_taxi
news_popularity
black_friday
mercedes
diamonds
After you have downloaded a benchmark, run the preprocess.py script, passing the benchmark name:
python3 sql/preprocess.py --benchmark <name>
Launch MySQL Shell:
mysqlsh user@hostname --mysql --sql
At the MySQL Shell prompt, run
> source sql/<benchmark_name>.sql
where <benchmark_name> is a name from the list above. The train and test CSVs generated by the preprocessing step must be present in MySQL Shell's current directory. Each SQL file creates the schema for a benchmark, trains a HeatWave ML model on the training data, and scores the model on the test data. The test score is printed at the end.
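The benchmark SQL files follow roughly this pattern, using HeatWave ML's sys.ML_TRAIN and sys.ML_SCORE routines. The sketch below is illustrative only: the schema, table, and column names are placeholders, not the actual names from the sql files, and routine signatures can vary slightly across HeatWave versions.

```sql
-- Illustrative sketch of a benchmark .sql file; all names are placeholders.
-- 1. Create the schema and load the train/test CSVs (LOAD DATA details omitted).
CREATE DATABASE IF NOT EXISTS bench;

-- 2. Train a HeatWave ML model on the train table; the model handle lands in @model.
CALL sys.ML_TRAIN('bench.train_tbl', 'target_col',
                  JSON_OBJECT('task', 'classification'), @model);

-- 3. Load the model and score it on the test table.
CALL sys.ML_MODEL_LOAD(@model, NULL);
CALL sys.ML_SCORE('bench.test_tbl', 'target_col', @model, 'accuracy', @score, NULL);
SELECT @score;
```

The trailing options argument to ML_SCORE was added in later HeatWave releases; on older versions the call takes five arguments.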
To measure HeatWave ML scalability for the benchmarks above, run the ML_TRAIN commands from the SQL files for each benchmark on 1, 2, 4, 8, and 16 nodes. Measure the end-to-end training time (ML_TRAIN time as seen from the MySQL client) for each configuration (benchmark + node count). Plotting runtime against the number of nodes gives the scalability curve for a benchmark.
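One way to take the client-side timing is to bracket the ML_TRAIN call with timestamps, as sketched below. The table and target column shown are placeholders; substitute the actual ML_TRAIN statement from the benchmark's sql file.

```sql
-- Time one ML_TRAIN run from the client's perspective.
SET @t0 = SYSDATE(6);

-- Placeholder: use the ML_TRAIN statement from the benchmark's sql file here.
CALL sys.ML_TRAIN('bench.train_tbl', 'target_col',
                  JSON_OBJECT('task', 'classification'), @model);

-- Elapsed wall-clock time in seconds.
SELECT TIMESTAMPDIFF(MICROSECOND, @t0, SYSDATE(6)) / 1e6 AS train_seconds;
```

Repeat this at each node count (1, 2, 4, 8, 16) and plot train_seconds against the number of nodes.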