itu-rad/GPUUtilNet

GPUUtilNet: GPU Hardware Metrics Estimator for Deep Learning Training

This repository contains the artifacts for our work on deep learning-based estimators that predict GPU hardware metrics (SMACT, SMOCC, DRAMA) during deep learning model training. Since data is central to this effort, the workflow builds on a cleaned dataset:

  • Analysis & Modeling: With the cleaned data from GPUMemNet, we performed exploratory analysis and trained various models to estimate GPU hardware metrics.
  • We explored MLPs for classification.
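
To make the classification setup concrete, here is a minimal, dependency-free sketch of an MLP forward pass that maps a feature vector to one of the three utilization levels. It is not the architecture used in the notebooks: the layer sizes, the four input features, and the random weights are all assumptions chosen for illustration.

```python
import math
import random

LEVELS = ["low", "medium", "high"]  # class labels named in this README


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (illustrative only)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Affine transform: one output per weight row."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(logits):
    """Numerically stable softmax over the class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_level(features, layers):
    """Run hidden layers with ReLU, then softmax over the three levels."""
    h = features
    for layer in layers[:-1]:
        h = relu(dense(h, layer))
    probs = softmax(dense(h, layers[-1]))
    return LEVELS[probs.index(max(probs))], probs


rng = random.Random(0)
# Hypothetical shape: 4 input features -> 8 hidden units -> 3 classes.
layers = [init_layer(4, 8, rng), init_layer(8, 3, rng)]
level, probs = predict_level([0.5, 0.1, 0.9, 0.3], layers)
print(level, [round(p, 3) for p in probs])
```

With untrained weights the predicted level is meaningless; the sketch only shows the shape of the computation that a trained classifier would perform.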

Repository Structure

GPUUtilNet/
├── Analysis/                           # Data analysis and model training notebooks
│   └── 00-Cleaned-NoteBooks/
│       ├── 001-visualizations/         # Data visualization notebooks
│       └── 002-MLP-based-estimators/   # Model training notebooks
│           ├── Classifiers/MLP/        # Classification models
│           │   ├── MLP_*.ipynb         # MLP dataset models
│           │   ├── CNN_*.ipynb         # CNN dataset models
│           │   └── Transformer_*.ipynb # Transformer dataset models
│           ├── Regressors/             # Regression models
│           └── data/                   # Datasets
│               ├── CNN.csv
│               ├── MLP.csv
│               └── Transformers.csv
├── test/                              # Testing and model evaluation
│   ├── estimator/                     # Trained model checkpoints
│   ├── estimations/                   # Prediction outputs
│   ├── cnn_models/                    # CNN model architectures
│   ├── models/                        # Additional models
│   ├── Trans_models/                  # Transformer models
│   ├── 01-mlp4cnn.py                  # Main CNN estimator script
│   └── trans_test.py                  # Transformer test script
└── README.md

MLP Model Results

The MLP-based classifiers predict GPU hardware utilization levels (low, medium, high) for three different neural network datasets:

Dataset      SMACT (Accuracy)  SMACT (F1-Score)  SMOCC (Accuracy)  SMOCC (F1-Score)  DRAMA (Accuracy)  DRAMA (F1-Score)
MLP          92%               0.91              94%               0.62              88%               0.69
CNN          76%               0.72              71%               0.69              73%               0.73
Transformer  82%               0.65              68%               0.63              62%               0.63
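
Several rows above pair a high accuracy with a much lower F1-score (for example, 94% and 0.62 for SMOCC on the MLP dataset). One common cause of such a gap is class imbalance. The sketch below computes both metrics on synthetic labels to show the effect. It assumes a macro-averaged F1, which this README does not confirm, and the labels are made up for illustration, not taken from the datasets.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


labels = ["low", "medium", "high"]
# Synthetic, imbalanced example: 18 "low", 1 "medium", 1 "high".
y_true = ["low"] * 18 + ["medium", "high"]
# The classifier misses the single "medium" sample.
y_pred = ["low"] * 18 + ["low", "high"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, labels)
print(f"accuracy={acc:.2f} macro_f1={f1:.2f}")
```

Here one misclassified minority sample leaves accuracy at 19/20 but pulls the macro F1 down sharply, because the "medium" class contributes an F1 of zero.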

Hardware Metrics

  • SMACT: SM Activity - measures GPU SM (Streaming Multiprocessor) utilization
  • SMOCC: SM Occupancy - measures the occupancy of GPU SMs
  • DRAMA: DRAM Activity - measures GPU memory bandwidth utilization
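
Because the classifiers predict levels rather than raw values, each measured metric has to be mapped to low, medium, or high at some point. The sketch below shows one way such a mapping could look. The thresholds (thirds of the 0 to 1 range) and the function name are hypothetical; the actual bin edges used in the notebooks are not stated in this README.

```python
def utilization_level(value, low_max=1 / 3, medium_max=2 / 3):
    """Map a metric in [0, 1] to a level using hypothetical thresholds."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"expected a fraction in [0, 1], got {value}")
    if value < low_max:
        return "low"
    if value < medium_max:
        return "medium"
    return "high"


# Example readings for SMACT, SMOCC, and DRAMA (made-up values).
readings = {"smact": 0.82, "smocc": 0.41, "drama": 0.12}
levels = {name: utilization_level(v) for name, v in readings.items()}
print(levels)
```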

How to Use

Training/Testing Models

Refer to the notebooks in Analysis/00-Cleaned-NoteBooks/002-MLP-based-estimators.

Test on unseen real-world models

For testing on unseen real-world transformers:

python test/trans_test.py --metric [smact, smocc, drama]

For testing on unseen real-world CNNs:

python test/01-mlp4cnn.py --metric [smact, smocc, drama]
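
To evaluate every metric in one go, the commands above can be generated programmatically. This sketch assumes that `--metric` accepts exactly one of `smact`, `smocc`, or `drama` per run (our reading of the bracket notation) and that the scripts are launched from the repository root. The helper name `build_command` is ours, not part of the repository.

```python
import sys

METRICS = ("smact", "smocc", "drama")
# Script paths as listed in the repository structure.
SCRIPTS = {
    "transformer": "test/trans_test.py",
    "cnn": "test/01-mlp4cnn.py",
}


def build_command(kind, metric):
    """Build the argument list for one test run (hypothetical helper)."""
    if metric not in METRICS:
        raise ValueError(f"unknown metric: {metric}")
    return [sys.executable, SCRIPTS[kind], "--metric", metric]


commands = [build_command(kind, metric) for kind in SCRIPTS for metric in METRICS]
for cmd in commands:
    print(" ".join(cmd))
```

Each list can then be passed to `subprocess.run(cmd, check=True)` to execute the run; the sketch only prints the commands so that it has no side effects.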

Vision

This study could be extended and improved with more data points, data collected on different GPU models, a broader range of arguments, and new ways of framing GPU hardware metric estimation.

License & Citation

© 2025. All rights reserved.

This repository is released for non-commercial academic research purposes only under the following terms:

  • Code and Notebooks: Custom research-only license. You may use, modify, and share for academic research, but commercial use is prohibited.
  • Trained Models: Provided for academic evaluation only. Do not use in commercial products or services without explicit permission.
  • Dataset: Licensed under CC BY-NC 4.0.
  • Figures and Visualizations: Also under CC BY-NC 4.0.

Citation

If you use this repository (code, models, data, or ideas), you must cite:

GitHub Repository: https://github.com/itu-rad/GPUUtilNet
