[Submitted on 10 Nov 2016 (v1), last revised 26 Feb 2017 (this version, v2)]
Abstract: Despite their massive size, successful deep artificial neural networks can
exhibit a remarkably small difference between training and test performance.
Conventional wisdom attributes small generalization error either to properties
of the model family or to the regularization techniques used during training.
Through extensive systematic experiments, we show how these traditional
approaches fail to explain why large neural networks generalize well in
practice. Specifically, our experiments establish that state-of-the-art
convolutional networks for image classification, trained with stochastic
gradient methods, easily fit a random labeling of the training data. This
phenomenon is qualitatively unaffected by explicit regularization, and it occurs
even if we replace the true images with completely unstructured random noise. We
corroborate these experimental findings with a theoretical construction showing
that simple depth-two neural networks already achieve perfect finite-sample
expressivity once the number of parameters exceeds the number of data
points, as it usually does in practice.
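A construction of this flavor can be illustrated concretely. The NumPy sketch below (our own toy illustration, not code from the paper; all variable names and sizes are ours) builds a depth-two ReLU network with roughly 2n + d parameters that exactly memorizes n arbitrary real labels: project the inputs onto a random direction, interleave the hidden-unit biases with the sorted projections so the activation matrix is triangular and invertible, then solve a linear system for the output weights.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 16, 4                      # n data points in d dimensions
X = rng.standard_normal((n, d))   # sample inputs
y = rng.standard_normal(n)        # arbitrary real labels to memorize

# Project inputs onto a random direction a; with probability 1 the
# projections z_i = a.x_i are all distinct.
a = rng.standard_normal(d)
z = X @ a
order = np.argsort(z)
X, y, z = X[order], y[order], z[order]

# Choose biases b_j interleaved with the sorted projections, so that the
# n x n matrix A_ij = relu(z_i - b_j) is lower-triangular with a strictly
# positive diagonal, hence invertible.
b = np.empty(n)
b[0] = z[0] - 1.0
b[1:] = (z[:-1] + z[1:]) / 2.0
A = np.maximum(z[:, None] - b[None, :], 0.0)

# Solve for output weights w: the depth-two ReLU network
# f(x) = sum_j w_j * relu(a.x - b_j) then fits every training point,
# using d + n + n = 2n + d parameters in total.
w = np.linalg.solve(A, y)

def f(x):
    return np.maximum(x @ a - b, 0.0) @ w

preds = np.array([f(xi) for xi in X])
print(np.max(np.abs(preds - y)))  # exact fit up to floating-point error
```

The triangular structure is the whole trick: unit j is inactive on every point whose projection falls below b_j, so the linear system is solvable for any labeling.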
We interpret our experimental findings by comparison with traditional models.
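The random-labeling experiment can also be sketched at toy scale. The following NumPy example (our own illustration, not the paper's experimental code; the network size, learning rate, and step count are ad hoc choices) trains a small over-parameterized two-layer network by full-batch gradient descent to fit random ±1 labels assigned to random-noise inputs:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 12, 16, 64               # data points, input dim, hidden width
X = rng.standard_normal((n, d))    # unstructured random-noise "images"
y = rng.choice([-1.0, 1.0], n)     # random labels with no real structure

# Over-parameterized two-layer tanh network: d*h + h parameters >> n points.
W1 = rng.standard_normal((d, h)) / np.sqrt(d)
W2 = rng.standard_normal(h) / np.sqrt(h)

lr = 0.02
for step in range(5000):           # full-batch gradient descent on MSE
    H = np.tanh(X @ W1)            # hidden activations, shape (n, h)
    out = H @ W2                   # network outputs, shape (n,)
    err = out - y
    loss = np.mean(err ** 2)
    g_out = 2.0 * err / n                            # dL/d(out)
    g_W2 = H.T @ g_out                               # dL/dW2
    g_W1 = X.T @ (np.outer(g_out, W2) * (1.0 - H ** 2))  # dL/dW1
    W1 -= lr * g_W1
    W2 -= lr * g_W2

acc = np.mean(np.sign(np.tanh(X @ W1) @ W2) == y)
print(f"final training loss {loss:.2e}, training accuracy {acc:.0%}")
```

Because the parameter count far exceeds the number of points, plain gradient descent drives the training loss toward zero even though the labels carry no signal, mirroring at toy scale the memorization behavior the experiments establish for full-size convolutional networks.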