Department Mathematik

Mathematisches Kolloquium


On Thursday, June 29, 2023, at 4:30 p.m.,

Holger Rauhut
(RWTH Aachen)

will speak in lecture hall A027 on the topic

Implicit bias of gradient descent methods in deep learning

Abstract: Deep neural networks are usually trained by minimizing a non-convex loss functional via (stochastic) gradient descent methods. Unfortunately, the convergence properties are not well understood. Moreover, a puzzling empirical observation is that learning neural networks with a number of parameters exceeding the number of training examples often leads to zero loss, i.e., the network exactly interpolates the data. Nevertheless, it generalizes very well to unseen data, in stark contrast to intuition from classical statistics, which would predict overfitting. A current working hypothesis is that the chosen optimization algorithm has a significant influence on the selection of the learned network. In fact, in this overparameterized context there are many global minimizers, so the optimization method induces an implicit bias on the computed solution. It seems that gradient descent methods and their stochastic variants favor networks of low complexity (in a suitable sense yet to be understood) and hence appear to be very well suited for large classes of real data. Initial attempts at understanding the implicit bias phenomenon consider the simplified setting of linear networks, i.e., (deep) factorizations of matrices. This has revealed a surprising relation to the field of low-rank matrix recovery (a variant of compressive sensing), in the sense that gradient descent favors low-rank matrices in certain situations. Moreover, restricting further to diagonal matrices, or equivalently factorizing the entries of a vector to be recovered, shows a connection to compressive sensing and l1-minimization. Despite such initial theoretical results on simplified scenarios, the understanding of the implicit bias phenomenon in deep learning remains largely open. The talk will give a basic introduction to this topic, highlighting initial results and open problems.
Based on joint work with Bubacarr Bah, Hung-Hsu Chou, Johannes Maly, Ulrich Terstiege, Rachel Ward and Michael Westdickenberg.
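The diagonal-network setting mentioned in the abstract can be sketched numerically. The following is an illustrative toy experiment, not code from the talk: the dimensions, step size, initialization scale, and the signed parameterization x = u⊙u − v⊙v are all assumptions made here. Plain gradient descent on this overparameterized factorization, started from a small initialization, tends to interpolate an underdetermined linear system with a solution of small l1-norm, in line with the connection to compressive sensing described above.

```python
import numpy as np

# Toy experiment: fit an underdetermined system A x = y by gradient
# descent on the overparameterized "diagonal network" x = u*u - v*v
# (entrywise products; signed entries via the two factors).
rng = np.random.default_rng(0)
m, n = 20, 50                              # fewer measurements than unknowns
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[:3] = [2.0, -1.5, 1.0]              # sparse ground truth
y = A @ x_true

u = np.full(n, 1e-3)                       # small initialization: crucial
v = np.full(n, 1e-3)                       # for the implicit sparsity bias
lr = 0.01
for _ in range(20000):
    x = u * u - v * v
    g = A.T @ (A @ x - y)                  # gradient of 0.5*||A x - y||^2 in x
    u -= lr * 2 * u * g                    # chain rule through u*u
    v += lr * 2 * v * g                    # chain rule through -v*v

x_gd = u * u - v * v                       # solution selected by gradient descent
x_ls = np.linalg.lstsq(A, y, rcond=None)[0]  # minimum-l2-norm interpolant

print(np.sum(np.abs(x_gd)), np.sum(np.abs(x_ls)))
```

Both x_gd and x_ls (nearly) interpolate the data, but in runs of this kind the gradient-descent solution has a markedly smaller l1-norm than the minimum-l2-norm solution, which spreads energy over all coordinates. This is the implicit bias: among the many global minimizers, the algorithm and its initialization select a particular one.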