GPUUtilNet: GPU Hardware Metrics Estimator for Deep Learning Training

This repository contains the artifacts for our work on building deep learning-based estimators that predict GPU hardware metrics (SMACT, SMOCC, DRAMA) during deep learning model training. Since data is central to this effort, the workflow is organized in stages; the stage covered in this repository is:

Analysis & Modeling: Using the cleaned data from GPUMemNet, we performed exploratory analysis and trained various models to estimate GPU hardware metrics, focusing on MLPs for classification.

Repository Structure

GPUUtilNet/
├── Analysis/                           # Data analysis and model training notebooks
│   └── 00-Cleaned-NoteBooks/
│       ├── 001-visualizations/         # Data visualization notebooks
│       └── 002-MLP-based-estimators/   # Model training notebooks
│           ├── Classifiers/MLP/        # Classification models
│           │   ├── MLP_*.ipynb         # MLP dataset models
│           │   ├── CNN_*.ipynb         # CNN dataset models
│           │   └── Transformer_*.ipynb # Transformer dataset models
│           ├── Regressors/             # Regression models
│           └── data/                   # Datasets
│               ├── CNN.csv
│               ├── MLP.csv
│               └── Transformers.csv
├── test/                              # Testing and model evaluation
│   ├── estimator/                     # Trained model checkpoints
│   ├── estimations/                   # Prediction outputs
│   ├── cnn_models/                    # CNN model architectures
│   ├── models/                        # Additional models
│   ├── Trans_models/                  # Transformer models
│   ├── 01-mlp4cnn.py                  # Main CNN estimator script
│   └── trans_test.py                  # Transformer test script
└── README.md
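The datasets under data/ are plain CSV files. As a minimal sketch, assuming only that each file has a header row (the column names are not documented in this README), they can be loaded with Python's standard csv module. The helper load_dataset and the DATA_DIR and DATASETS constants are hypothetical; the path and file names are taken from the tree above.

```python
import csv
from pathlib import Path

# Hypothetical constant: dataset location, taken from the repository tree.
DATA_DIR = Path("Analysis/00-Cleaned-NoteBooks/002-MLP-based-estimators/data")

# File names as listed in the repository tree, keyed by model family.
DATASETS = {"MLP": "MLP.csv", "CNN": "CNN.csv", "Transformer": "Transformers.csv"}


def load_dataset(path):
    """Read a CSV with a header row into a list of dicts, one per row.

    Column names are not documented here, so values are kept as strings.
    """
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    for name, filename in DATASETS.items():
        path = DATA_DIR / filename
        if not path.exists():
            print(f"{name}: {path} not found")
            continue
        rows = load_dataset(path)
        columns = list(rows[0].keys()) if rows else []
        print(f"{name}: {len(rows)} rows, {len(columns)} columns")
```

Run from the repository root, this prints the row and column count of each dataset, which is a quick way to inspect the schema before opening the notebooks.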

MLP Model Results

The MLP-based classifiers predict the utilization level (low, medium, or high) of each GPU hardware metric. Results on the three datasets, one per model family (MLP, CNN, Transformer):

Dataset      SMACT Accuracy  SMACT F1  SMOCC Accuracy  SMOCC F1  DRAMA Accuracy  DRAMA F1
MLP          92%             0.91      94%             0.62      88%             0.69
CNN          76%             0.72      71%             0.69      73%             0.73
Transformer  82%             0.65      68%             0.63      62%             0.63
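Some rows pair high accuracy with a noticeably lower F1-score (e.g., SMOCC on the MLP dataset: 94% accuracy, F1 0.62). One way this can arise is class imbalance combined with an F1-score averaged over classes. The README does not state which F1 averaging was used or how the classes are distributed; the sketch below assumes macro-averaging and uses made-up labels purely to illustrate the effect, not to explain these specific numbers.

```python
LABELS = ("low", "medium", "high")


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 scores (macro averaging)."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Synthetic, imbalanced example: 90 "high", 5 "medium", 5 "low" samples,
# scored against a classifier that always predicts "high".
y_true = ["high"] * 90 + ["medium"] * 5 + ["low"] * 5
y_pred = ["high"] * 100

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")  # accuracy = 0.90
print(f"macro F1 = {macro_f1(y_true, y_pred):.2f}")  # macro F1 = 0.32
```

The majority class alone yields 90% accuracy, but the two minority classes each contribute an F1 of 0, pulling the macro average down to about 0.32.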

Hardware Metrics

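SMACT: SM Activity, the fraction of time at least one warp is active on a GPU SM (Streaming Multiprocessor), averaged over all SMs
SMOCC: SM Occupancy, the number of warps resident on an SM relative to the maximum the SM supports, averaged over all SMs
DRAMA: DRAM Activity, the fraction of cycles the device memory interface is busy sending or receiving data (a measure of memory bandwidth utilization)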
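The classifiers in MLP Model Results predict a level rather than a raw metric value. Assuming the metrics are stored as fractions in [0, 1], a continuous measurement can be turned into a class label by thresholding. The thresholds used to build the actual datasets are not given in this README; the cut points (0.33, 0.66) and the function to_level below are hypothetical.

```python
from bisect import bisect_right

# Hypothetical cut points; the thresholds used for the real datasets may differ.
THRESHOLDS = (0.33, 0.66)
LEVELS = ("low", "medium", "high")


def to_level(value, thresholds=THRESHOLDS):
    """Map a metric value in [0, 1] (e.g., SMACT, SMOCC, or DRAMA) to a level."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"expected a fraction in [0, 1], got {value}")
    # bisect_right counts how many thresholds are <= value,
    # which is exactly the index into LEVELS.
    return LEVELS[bisect_right(thresholds, value)]


for v in (0.10, 0.33, 0.50, 0.90):
    print(v, to_level(v))
```

With bisect_right, a value that falls exactly on a cut point (here 0.33) is assigned to the higher level; use bisect_left instead if the boundary should belong to the lower level.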

How to Use

Training/Testing Models

Refer to the notebooks in Analysis/00-Cleaned-NoteBooks/002-MLP-based-estimators.

Test on unseen real-world models

For testing on unseen real-world Transformers, pass one of smact, smocc, or drama to --metric:

python test/trans_test.py --metric {smact,smocc,drama}

For testing on unseen real-world CNNs:

python test/01-mlp4cnn.py --metric {smact,smocc,drama}

For example: python test/01-mlp4cnn.py --metric smact
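To evaluate every metric for both model families in one pass, the two commands above can be wrapped in a small driver. This sketch uses only the --metric flag documented here; it assumes it is run from the repository root and that each script exits with status 0 on success, which this README does not state. The names METRICS, SCRIPTS, build_commands, and run_all are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

METRICS = ("smact", "smocc", "drama")
# Test scripts documented in this README, keyed by model family.
SCRIPTS = {"Transformer": "test/trans_test.py", "CNN": "test/01-mlp4cnn.py"}


def build_commands(python=sys.executable):
    """Return one argv list per (script, metric) pair."""
    return [
        [python, script, "--metric", metric]
        for script in SCRIPTS.values()
        for metric in METRICS
    ]


def run_all():
    """Run each command in turn; stop at the first non-zero exit status."""
    for cmd in build_commands():
        if not Path(cmd[1]).exists():
            print("Skipping (script not found):", cmd[1])
            continue
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all()
```

Passing the argv as a list (rather than a single shell string) avoids shell quoting issues, and check=True raises subprocess.CalledProcessError on the first failing run.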

Vision

Future contributions could extend this study with more data points, data collected on different GPU models, a broader range of arguments, and new ways of framing the GPU hardware metric estimation problem.

License & Citation

This repository is released for non-commercial academic research purposes only under the following terms:

Code and Notebooks: Custom research-only license. You may use, modify, and share for academic research, but commercial use is prohibited.
Trained Models: Provided for academic evaluation only. Do not use in commercial products or services without explicit permission.
Dataset: Licensed under CC BY-NC 4.0.
Figures and Visualizations: Also under CC BY-NC 4.0.

Citation

If you use this repository (code, models, data, or ideas), you must cite:

GitHub Repository: https://github.com/itu-rad/GPUUtilNet