
A Dynamical Central Limit Theorem for Shallow Neural Networks
Recent theoretical work has characterized the dynamics of wide shallow neural networks trained via gradient descent in an asymptotic regime called the mean-field limit as the number of parameters tends towards infinity. At initialization, the randomly sampled parameters lead to a deviation from the mean-field limit that is dictated by the classical Central Limit Theorem (CLT). However, the dynamics of training introduces correlations among the parameters, raising the question of how the fluctuations evolve during training. Here, we analyze the mean-field dynamics as a Wasserstein gradient flow and prove that the deviations from the mean-field limit scaled by the width, in the width-asymptotic limit, remain bounded throughout training. In particular, they eventually vanish in the CLT scaling if the mean-field dynamics converges to a measure that interpolates the training data. This observation has implications for both the approximation rate and the generalization: the upper bound we obtain is given by a Monte-Carlo type resampling error, which does not depend explicitly on the dimension. This bound motivates a regularization term on the 2-norm of the underlying measure, which is also connected to generalization via the variation-norm function spaces.
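The CLT scaling at initialization can be illustrated numerically. The sketch below (not from the paper; all names and parameter choices are illustrative assumptions) builds a mean-field-scaled shallow network, f_n(x) = (1/n) Σ_i a_i tanh(w_i x), with i.i.d. Gaussian parameters, and checks that the fluctuation of f_n(x) around its mean-field limit shrinks like 1/√n, a Monte-Carlo type resampling error:

```python
import numpy as np

def shallow_net(a, w, x):
    # Mean-field scaling: the output is the empirical average over neurons,
    # f_n(x) = (1/n) * sum_i a_i * tanh(w_i * x)
    return np.mean(a * np.tanh(w * x))

rng = np.random.default_rng(0)
x = 0.7
stds = []
for n in (100, 10_000):
    # i.i.d. initialization: the empirical measure over (a_i, w_i)
    # approximates the mean-field limit, here E[a * tanh(w x)] = 0
    samples = [
        shallow_net(rng.standard_normal(n), rng.standard_normal(n), x)
        for _ in range(200)
    ]
    stds.append(np.std(samples))  # fluctuation around the limit

# CLT scaling: the standard deviation shrinks like 1/sqrt(n),
# so widening the network 100x should shrink it by a factor of about 10
ratio = stds[0] / stds[1]
```

This only probes the deviation at initialization; the paper's contribution is that the width-scaled deviation stays bounded along the training trajectory as well.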