I have seen a large discrepancy between identical PINO training runs on a 4090 with torch 2.6.0 and a 5090 with torch 2.8.0. After some investigation, I found the cause to be the TF32 tensor-core optimization that cuDNN applies to float32 operations by default. In my case, I was able to fix the discrepancy (and gain a substantial improvement in accuracy) by setting:
torch.backends.cudnn.allow_tf32 = False
for torch < 2.9.0, and
torch.backends.fp32_precision = "ieee"
for torch >= 2.9.0
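The two settings above can be wrapped in a small version-aware helper. This is a minimal sketch, assuming the API change lands exactly at torch 2.9.0 as described; the helper name and the extra `torch.backends.cuda.matmul.allow_tf32` flag (which covers cuBLAS matmuls alongside cuDNN convolutions) are my additions, not from the original post.

```python
def torch_version_tuple(version: str) -> tuple:
    # Handle local version suffixes like "2.8.0+cu128".
    parts = version.split("+")[0].split(".")
    return tuple(int(p) for p in parts[:2])

def disable_tf32(torch) -> None:
    # Force float32 ops to use full IEEE FP32 precision instead of TF32.
    if torch_version_tuple(torch.__version__) >= (2, 9):
        # Unified precision API (torch >= 2.9.0)
        torch.backends.fp32_precision = "ieee"
    else:
        # Per-backend flags (torch < 2.9.0)
        torch.backends.cudnn.allow_tf32 = False        # cuDNN convolutions
        torch.backends.cuda.matmul.allow_tf32 = False  # cuBLAS matmuls (related flag, assumption)
```

Usage: call `import torch; disable_tf32(torch)` once, before building the model and starting training.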