This repository contains the artifacts for our work on building deep learning-based estimators for predicting GPU hardware metrics (SMACT, SMOCC, DRAMA) during deep learning model training. Since data is central to this effort, we structured the workflow in several key stages:
- Analysis & Modeling: With the cleaned data from GPUMemNet, we performed exploratory analysis and trained various models to estimate GPU hardware metrics.
- We explored MLPs for classification.
    GPUUtilNet/
    ├── Analysis/                               # Data analysis and model training notebooks
    │   └── 00-Cleaned-NoteBooks/
    │       ├── 001-visualizations/             # Data visualization notebooks
    │       └── 002-MLP-based-estimators/       # Model training notebooks
    │           ├── Classifiers/MLP/            # Classification models
    │           │   ├── MLP_*.ipynb             # MLP dataset models
    │           │   ├── CNN_*.ipynb             # CNN dataset models
    │           │   └── Transformer_*.ipynb     # Transformer dataset models
    │           ├── Regressors/                 # Regression models
    │           └── data/                       # Datasets
    │               ├── CNN.csv
    │               ├── MLP.csv
    │               └── Transformers.csv
    ├── test/                                   # Testing and model evaluation
    │   ├── estimator/                          # Trained model checkpoints
    │   ├── estimations/                        # Prediction outputs
    │   ├── cnn_models/                         # CNN model architectures
    │   ├── models/                             # Additional models
    │   ├── Trans_models/                       # Transformer models
    │   ├── 01-mlp4cnn.py                       # Main CNN estimator script
    │   └── trans_test.py                       # Transformer test script
    └── README.md
The MLP-based classifiers predict GPU hardware utilization levels (low, medium, high) for three different neural network datasets:
| Dataset | SMACT (Accuracy) | SMACT (F1-Score) | SMOCC (Accuracy) | SMOCC (F1-Score) | DRAMA (Accuracy) | DRAMA (F1-Score) |
|---|---|---|---|---|---|---|
| MLP | 92% | 0.91 | 94% | 0.62 | 88% | 0.69 |
| CNN | 76% | 0.72 | 71% | 0.69 | 73% | 0.73 |
| Transformer | 82% | 0.65 | 68% | 0.63 | 62% | 0.63 |
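Accuracy and F1-score can diverge noticeably in the table above, which is typical when the utilization classes are imbalanced. The following is a minimal sketch of how the two metrics are computed for a three-class problem. It assumes a macro-averaged F1 (the averaging mode used for the reported numbers is not stated here), and the labels are synthetic, not data from this repository.

```python
LEVELS = ("low", "medium", "high")


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted level equals the true level."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels=LEVELS):
    """Unweighted mean of the per-class F1 scores (macro averaging)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); define it as 0 if the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Synthetic, heavily imbalanced example (assumption for illustration only):
# 90 "low", 6 "medium", and 4 "high" samples.
y_true = ["low"] * 90 + ["medium"] * 6 + ["high"] * 4
y_pred = (
    ["low"] * 90                      # all "low" samples predicted correctly
    + ["low"] * 4 + ["medium"] * 2    # most "medium" samples mistaken for "low"
    + ["medium"] * 2 + ["high"] * 2   # half of the "high" samples mistaken for "medium"
)

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")   # accuracy = 0.94
print(f"macro F1 = {macro_f1(y_true, y_pred):.2f}")   # macro F1 = 0.68
```

In this toy case the majority class dominates accuracy, while the macro F1 is pulled down by the two rare classes.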
- SMACT: SM Activity - measures GPU SM (Streaming Multiprocessor) utilization
- SMOCC: SM Occupancy - measures the occupancy of GPU SMs
- DRAMA: DRAM Activity - measures GPU memory bandwidth utilization
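Because the classifiers predict utilization levels rather than raw values, each measured metric has to be mapped to a level at some point. The sketch below shows one possible way to do that. It assumes each metric is a fraction in [0, 1], and the cut points in `HYPOTHETICAL_EDGES` as well as the helper `to_level` are illustrative only; the thresholds actually used in the notebooks are not stated here.

```python
from bisect import bisect_right

LEVELS = ("low", "medium", "high")

# Hypothetical cut points (assumption): equal-width thirds of the [0, 1] range.
HYPOTHETICAL_EDGES = (1 / 3, 2 / 3)


def to_level(value, edges=HYPOTHETICAL_EDGES):
    """Map a metric value in [0, 1] to 'low', 'medium', or 'high'."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"expected a fraction in [0, 1], got {value!r}")
    # bisect_right counts how many edges are <= value, which is the level index.
    return LEVELS[bisect_right(edges, value)]


# Made-up readings for one training run (not measurements from this repository).
sample = {"smact": 0.82, "smocc": 0.41, "drama": 0.12}
levels = {metric: to_level(value) for metric, value in sample.items()}
print(levels)  # {'smact': 'high', 'smocc': 'medium', 'drama': 'low'}
```

Other binning schemes, such as quantile-based cut points, would fit the same interface by passing a different `edges` tuple.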
For model training, refer to the notebooks in Analysis/00-Cleaned-NoteBooks/002-MLP-based-estimators/.
For testing on unseen real-world transformers:

    python test/trans_test.py --metric [smact, smocc, drama]

For testing on unseen real-world CNNs:

    python test/01-mlp4cnn.py --metric [smact, smocc, drama]

Contributions and improvements to the current study could come from more data points, data points collected on different GPU models, a broader range of arguments, and new ways of framing GPU hardware metric estimation.
© 2025. All rights reserved.
This repository is released for non-commercial academic research purposes only under the following terms:
- Code and Notebooks: Custom research-only license. You may use, modify, and share for academic research, but commercial use is prohibited.
- Trained Models: Provided for academic evaluation only. Do not use in commercial products or services without explicit permission.
- Dataset: Licensed under CC BY-NC 4.0.
- Figures and Visualizations: Also under CC BY-NC 4.0.
If you use this repository (code, models, data, or ideas), you must cite:
GitHub Repository: https://github.com/itu-rad/GPUUtilNet