This directory contains Slurm job scripts for running experiments on the Klab HPC infrastructure.
The submit_temporal_stability.sh script trains an RNN model with a temporal stability loss as an alternative objective to energy efficiency.
Usage:
- Update the following placeholders in the script:
  - [email protected] → your actual email address
  - your_env_name → your conda environment name
  - your_username → your username on the cluster
- Make the script executable:
    chmod +x jobscripts/submit_temporal_stability.sh
- Submit the job:
    sbatch jobscripts/submit_temporal_stability.sh
- Monitor the job:
    squeue -u $USER
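The placeholder step above can be scripted. The following is a minimal sketch, not part of the repository: `fill_placeholders` and its arguments are hypothetical, and it assumes the replacement values contain no `/` or `&` characters. It handles only `your_env_name` and `your_username`; the email placeholder is left to be edited by hand.

```shell
#!/usr/bin/env bash
# Sketch: replace two of the placeholders in a job script, keeping a .bak backup.
# fill_placeholders is a hypothetical helper, not something the repo provides.
fill_placeholders() {
  local script="$1" env_name="$2" user_name="$3"
  # -i.bak edits the file in place and writes a backup next to it
  # (this form is accepted by both GNU and BSD sed).
  sed -i.bak \
    -e "s/your_env_name/${env_name}/g" \
    -e "s/your_username/${user_name}/g" \
    "$script"
}

# Example, run from the repository root:
#   fill_placeholders jobscripts/submit_temporal_stability.sh my_env "$USER"
#   job_id=$(sbatch --parsable jobscripts/submit_temporal_stability.sh)
#   squeue -j "${job_id%%;*}"   # --parsable may append ";cluster" to the ID
```

Capturing the job ID with `sbatch --parsable` makes it possible to query that one job instead of listing everything owned by `$USER`.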
Parameters:
- Partition: klab-gpu (uses H100 80GB GPU)
- Resources: 8 CPUs, 64GB RAM, 1 GPU
- Time limit: 48 hours
- Model: Temporal stability loss with L2 objective
- Dataset: MS-COCO with DeepGaze fixations
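For orientation, the resource parameters above would typically map onto a Slurm header like the one sketched here. This is an assumption about what the script contains, not a copy of it: the job name is inferred from the log file names, and the cluster may expect a different GPU request syntax (for example `--gpus=1`).

```shell
#!/bin/bash
# Sketch of an #SBATCH header matching the listed parameters.
# The real submit_temporal_stability.sh may differ.

# Assumed job name, inferred from the log file names.
#SBATCH --job-name=temporal_stability
# H100 80GB partition.
#SBATCH --partition=klab-gpu
# 1 GPU, 8 CPUs, 64GB RAM.
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
# 48-hour time limit.
#SBATCH --time=48:00:00
```

Slurm reads `#SBATCH` lines only until the first executable command, so directives belong at the top of the script.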
Output:
- Logs: logs/temporal_stability_<job_id>.out/err
- Models: models/temporal_stability/
- Wandb logging enabled
Partition selection:
- Single experiment: Use the H100 GPU (klab-gpu partition)
- Multiple experiments: Use L40S GPUs (klab-l40s partition) to run parallel jobs
- CPU-only baseline: Use the klab-cpu partition
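Running several experiments in parallel on the L40S partition could look like the sketch below. It relies on the documented Slurm behavior that command-line options override `#SBATCH` directives in the script. `submit_sweep`, the `SUBMIT` switch, and the `SEED` variable are all hypothetical: the job script would need to read `SEED` itself for it to have any effect.

```shell
#!/usr/bin/env bash
# Sketch: one sbatch call per experiment on the klab-l40s partition.
# Dry run by default (commands are printed); set SUBMIT=sbatch to really submit.
SUBMIT="${SUBMIT:-echo sbatch}"

submit_sweep() {
  local seed
  for seed in "$@"; do
    # --partition on the command line overrides the partition set in the script.
    # SEED is a hypothetical variable exported into the job environment.
    $SUBMIT --partition=klab-l40s --export=ALL,SEED="$seed" \
      jobscripts/submit_temporal_stability.sh
  done
}

submit_sweep 0 1 2
```

Printing the commands first is a cheap way to check the partition and arguments before occupying several GPUs.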
Check GPU usage during training:
    watch -n 1 nvidia-smi
View job logs in real time:
    tail -f logs/temporal_stability_<job_id>.out
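When the job ID is not at hand, the most recent log can be located by modification time. This is a small sketch under the assumption that logs follow the `logs/temporal_stability_<job_id>.out` naming described above; `latest_log` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: print the newest temporal_stability_*.out file in a log directory.
# latest_log is a hypothetical helper; the directory defaults to ./logs.
latest_log() {
  # ls -t sorts newest first; errors are silenced so a missing or empty
  # directory simply yields no output.
  ls -t "${1:-logs}"/temporal_stability_*.out 2>/dev/null | head -n 1
}

# Example:
#   tail -f "$(latest_log)"
```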