Abstract
Training deep neural networks (DNNs) is predominantly carried out using stochastic gradient descent and its variants.
While these methods are robust and widely applicable, their convergence often deteriorates for large-scale, ill-conditioned, or stiff problems commonly encountered in scientific machine learning.
This has motivated the development of more advanced training strategies that can accelerate convergence, offer better parallelism, enable convergence control, and facilitate the automatic tuning of hyperparameters.
To this end, we introduce a novel training framework for DNNs inspired by nonlinear multilevel and domain-decomposition (ML-DD) methods.
Starting from deterministic ML-DD algorithms [1, 2], we will discuss how to ensure convergence in the presence of subsampling noise.
Moreover, we will present several strategies for constructing a hierarchy of subspaces by exploiting the properties of the network architecture, the data representation, and the loss function.
The performance of the proposed ML-DD training algorithms will be demonstrated through a series of numerical experiments from scientific machine learning, including physics-informed neural networks and operator learning.
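The parameter-space decomposition underlying such methods can be illustrated with a deliberately simplified toy sketch. The NumPy code below is a hypothetical illustration, not the algorithms of [1] or [2]: it minimizes a convex quadratic surrogate loss by splitting the parameters into two blocks ("subdomains"), taking a few local gradient steps per block with the other block frozen, and averaging the corrections (a damped additive update, which preserves descent for convex losses).

```python
import numpy as np

def dd_sweep(A, b, w, blocks, lr, local_iters=5):
    """One damped additive domain-decomposition sweep for the quadratic
    loss f(w) = 0.5 w^T A w - b^T w (a toy stand-in for a training loss).
    Each parameter block takes a few local gradient steps with the other
    blocks frozen; averaging the corrections keeps f non-increasing."""
    corrections = np.zeros_like(w)
    for idx in blocks:
        w_loc = w.copy()
        for _ in range(local_iters):
            grad = A @ w_loc - b           # gradient of the quadratic loss
            w_loc[idx] -= lr * grad[idx]   # update only this subdomain
        corrections[idx] = w_loc[idx] - w[idx]
    return w + corrections / len(blocks)   # damping by the block count

rng = np.random.default_rng(0)
n = 8
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                # SPD toy "Hessian"
b = rng.standard_normal(n)
f = lambda v: 0.5 * v @ A @ v - b @ v

lr = 1.0 / np.linalg.eigvalsh(A)[-1]       # stable local step size
blocks = [np.arange(0, 4), np.arange(4, 8)]  # two parameter subdomains
w = np.zeros(n)
for _ in range(50):
    w = dd_sweep(A, b, w, blocks, lr)
# w approaches the minimizer of f
```

The actual ML-DD framework operates on nonconvex, subsampled network losses with a multilevel hierarchy of such decompositions; this sketch only conveys the block-wise, parallelizable structure of the updates.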
[1] Gratton, S., Kopaničáková, A., & Toint, P. L. (2023). Multilevel objective-function-free optimization with an application to neural networks training. SIAM Journal on Optimization, 33(4), 2772-2800.
[2] Gratton, S., Kopaničáková, A., & Toint, P. L. (2025). Recursive bound-constrained AdaGrad with applications to multilevel and domain decomposition minimization. arXiv preprint arXiv:2507.11513.