| Role | Name |
| --- | --- |
| Author | Maria Smirnova |
| Consultant | Alexey Kravatskiy |
| Advisor | Dmitry Kovalev, PhD |
SignSGD is a popular algorithm due to its strong performance in both centralized and decentralized settings: it performs competitively with Adam while being communication-efficient, transmitting up to 32x fewer bits than standard optimizers. This work introduces SignMuon, an algorithm that combines the structured LMO-based update of Muon with the sign compression of SignSGD. Our experiments show that SignMuon achieves performance nearly on par with Muon in the centralized setting and outperforms SignSGD in both centralized and federated settings. We establish theoretical convergence guarantees for SignMuon in the smooth non-convex regime. We further analyze specific modifications to the algorithm that enhance numerical stability without compromising communication efficiency. Finally, this work extends our framework to the broader class of SignA algorithms, where A denotes any LMO-based optimizer, providing a unified convergence analysis for sign-compressed LMO methods. Empirical validation is conducted on synthetic convex and nonconvex problems with known smoothness constants, CIFAR-airbench, federated MNIST/CIFAR-10 classification, and NanoGPT training.
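To make the combination concrete, here is a minimal NumPy sketch of one possible SignMuon step, under the assumption that it applies SignSGD-style elementwise sign compression to Muon's orthogonalized-momentum (LMO) update; the function names, the momentum coefficient, and the Newton-Schulz iteration count are illustrative choices, not the repository's actual implementation.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5):
    """Approximately orthogonalize G via a quintic Newton-Schulz
    iteration, as in Muon's LMO-based update (coefficients taken
    from the public Muon reference code)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + 1e-7)  # normalize spectral scale
    transposed = G.shape[0] > G.shape[1]
    if transposed:
        X = X.T  # iterate on the wide orientation
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    return X.T if transposed else X

def signmuon_step(momentum, grad, beta=0.95):
    """One hypothetical SignMuon step: accumulate momentum,
    orthogonalize it (Muon), then transmit only elementwise signs
    (SignSGD-style 1-bit compression)."""
    momentum = beta * momentum + grad
    update = newton_schulz_orthogonalize(momentum)
    return momentum, np.sign(update)  # signs cost 1 bit per entry

# Illustrative usage on a random 8x4 weight matrix.
rng = np.random.default_rng(0)
m = np.zeros((8, 4))
g = rng.standard_normal((8, 4))
m, u = signmuon_step(m, g)
```

The sign-valued update `u` is what a worker would communicate, which is the source of the up-to-32x bandwidth saving relative to sending 32-bit floats.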
Example of running the code in the centralized setting:

```shell
python3 -m main --dataset cifar10 --optimizer signmuon --data data --device cuda:1 --epochs 50
```

Example of running the code in the federated setting:

```shell
python3 -m federated_main --model cnn2 --dataset cifar10 --algorithm signmuon --rounds 2000 --n_parties 10 --n_steps 3 --batch_size 64 --device cuda:3 --eval_freq 100
```