Colloquium | Institute of Mathematical Sciences
Time: 16:00-17:00, Monday, June 24
Location: Room S407, IMS
Speaker: Zhouwang Yang, University of Science and Technology of China
Abstract: Deep neural networks (DNNs) have recently demonstrated amazing performance on many challenging tasks. Overfitting is one of their notorious drawbacks. Empirical evidence suggests that inducing sparsity can relieve overfitting and that weight normalization can accelerate convergence of training algorithms. In this talk, we report our work on L_{1,\infty}-weight normalization for deep neural networks with bias neurons, which yields sparse architectures. We theoretically establish generalization error bounds for both regression and classification under L_{1,\infty}-weight normalization. The upper bounds are independent of the network width and depend on the network depth k only through a factor of \sqrt{k}; these are the best available bounds for networks with bias neurons. These results provide theoretical justification for using such weight normalization to reduce the generalization error. We also develop an easily implemented gradient projection descent algorithm to obtain sparse neural networks in practice. We perform various experiments to validate our theory and demonstrate the effectiveness of the resulting approach.
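To make the constrained-training idea in the abstract concrete, below is a minimal sketch (not the speaker's implementation) of a projected gradient descent step under a per-layer L_{1,\infty} constraint, assuming the norm is defined as ||W||_{1,\infty} = max_i sum_j |W_{ij}| and the radius c is a hypothetical hyperparameter. The projection acts row-wise via the standard sorting-based Euclidean projection onto the L1 ball, whose soft-thresholding zeroes out small entries and is one natural source of sparsity.

```python
import numpy as np

def project_l1_ball(v, c):
    """Euclidean projection of vector v onto the L1 ball of radius c."""
    if np.abs(v).sum() <= c:
        return v  # already feasible
    u = np.sort(np.abs(v))[::-1]                  # sorted magnitudes, descending
    cssv = np.cumsum(u) - c                       # cumulative sums shifted by the radius
    rho = np.nonzero(u * np.arange(1, v.size + 1) > cssv)[0][-1]
    theta = cssv[rho] / (rho + 1.0)               # soft-thresholding level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def project_l1_inf(W, c):
    """Project each row of W onto the L1 ball so that ||W||_{1,inf} <= c."""
    return np.vstack([project_l1_ball(row, c) for row in W])

def pgd_step(W, grad_W, lr, c):
    """One projected-gradient-descent step: gradient update, then projection."""
    return project_l1_inf(W - lr * grad_W, c)

# Toy usage: a random layer and gradient; after the step every row has L1 norm <= c.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
grad_W = rng.normal(size=(4, 8))
W_new = pgd_step(W, grad_W, lr=0.1, c=1.0)
print(np.abs(W_new).sum(axis=1))  # each entry <= 1.0, with some weights exactly zero
```

This is only an illustrative sketch under the stated assumptions; the algorithm presented in the talk may differ in how the constraint, bias neurons, and step sizes are handled.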