Inspiration

I wanted to make a meaningful contribution to PyTorch — one of the world's most widely used deep learning frameworks — while learning how it works under the hood at the C++ level.

What it does

This project documents independent reproductions of two open PyTorch bugs and an answer to a community question on the PyTorch discussion forum.

How we built it

Using Google Colab, I wrote minimal Python scripts to reproduce open GitHub issues and verify the bugs independently. I also engaged with the PyTorch discussion forum to help other users.

Contributions:

  • #175866 — Reproduced silent precision loss in torch.nested.to_padded_tensor for 64-bit integer padding values
  • #175953 — Reproduced torch.set_default_device being ignored by torch.compile on CUDA
  • Forum answer — Explained the "unbound" method terminology used in PyTorch's C extension; the post was approved within 48 hours
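
The forum answer can be illustrated in plain Python. Python 3 no longer has true unbound methods, but the distinction survives in how C-extension code (such as torch._C) treats a function retrieved from a class versus a method bound to an instance. A minimal sketch, with illustrative names rather than PyTorch's own:

```python
class Model:
    def forward(self, x):
        return x * 2

# Retrieved from the class: a plain function in Python 3
# (what older docs and C-extension code call "unbound").
unbound = Model.forward
print(type(unbound).__name__)   # function

# Retrieved from an instance: a bound method carrying `self`.
bound = Model().forward
print(type(bound).__name__)     # method

# An "unbound" function must receive self explicitly.
print(unbound(Model(), 3))      # 6
print(bound(3))                 # 6
```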

Challenges we ran into

Understanding why the precision loss occurs at the C++ level, and identifying which open issues could be reproduced without specialized hardware setups.

Accomplishments that we're proud of

Confirming that padding value -7704671799122851728 is silently corrupted to -7704671799122851840 — a difference of 112 — due to float64 mantissa truncation in the ATen dispatcher.
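
The corruption can be reproduced without PyTorch at all: routing a 64-bit integer through a float64 — as a float-typed padding parameter does — is enough. A minimal sketch:

```python
# float64 has a 53-bit significand, so not every integer
# above 2**53 can be represented exactly.
assert float(2**53) == float(2**53 + 1)   # adjacent ints collapse

# The padding value from #175866, round-tripped through float64:
original = -7704671799122851728
corrupted = int(float(original))

print(corrupted)                   # -7704671799122851840
print(abs(corrupted - original))   # 112
```

At this magnitude (between 2**62 and 2**63) consecutive float64 values are 1024 apart, so the value snaps to the nearest representable integer, 112 away.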

What we learned

  • How PyTorch's ATen dispatcher types parameters in C++ (float vs Scalar)
  • How float64 mantissa limits (53 bits) silently corrupt 64-bit integers
  • How torch.compile's FakeTensor tracing handles device propagation differently from eager mode
  • How to engage with a large open source codebase as a first-time contributor

What's next for PyTorch Bug Hunter

Attempt a fix PR for #175866 by changing the padding parameter type from float to Scalar in the ATen dispatcher definition.
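
The planned change can be sketched against the operator schema. This is an assumption about the exact schema line in native_functions.yaml (the real signature and argument list may differ); the idea is to widen padding from float to Scalar so integer payloads survive dispatch intact:

```yaml
# Sketch only: assumed current schema in native_functions.yaml
- func: to_padded_tensor(Tensor self, float padding, SymInt[]? output_size=None) -> Tensor

# Proposed: Scalar preserves an int64 padding value through the dispatcher
- func: to_padded_tensor(Tensor self, Scalar padding, SymInt[]? output_size=None) -> Tensor
```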

Built With

  • Python
  • PyTorch
  • Google Colab
