A Local Convergence Theory for Mildly Over-Parameterized Two-Layer Neural Network
Computer Science Department
Abstract: While over-parameterization is widely believed to be crucial for the success of optimization for neural networks, most existing theories on over-parameterization do not fully explain the reason — they either work in the Neural Tangent Kernel regime, where neurons don't move much, or require an enormous number of neurons. In practice, when the data is generated by a teacher neural network, even mildly over-parameterized neural networks can achieve 0 loss and recover the directions of the teacher neurons. In this paper we develop a local convergence theory for mildly over-parameterized two-layer neural networks. We show that as long as the loss is already lower than a threshold (polynomial in the relevant parameters), all student neurons in an over-parameterized two-layer neural network will converge to one of the teacher neurons, and the loss will go to 0. Our result holds for any number of student neurons as long as it is at least as large as the number of teacher neurons, and our convergence rate is independent of the number of student neurons. A key component of our analysis is a new characterization of the local optimization landscape — we show the gradient satisfies a special case of the Łojasiewicz property, which is different from the local strong convexity or PL conditions used in previous work.
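The teacher–student setting the abstract refers to can be sketched in a few lines. The following is a minimal illustration, not the paper's exact setup: it assumes ReLU activations, unit second-layer weights for both networks, Gaussian inputs, squared loss, and plain gradient descent; all dimensions and step sizes are arbitrary choices for demonstration. The student has n >= k neurons, i.e. it is mildly over-parameterized relative to the teacher.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 8, 3, 5  # input dim, teacher neurons, student neurons (n >= k)


def forward(W, X):
    """Two-layer net with unit output weights: f(x) = sum_i relu(w_i . x)."""
    return np.maximum(X @ W.T, 0.0).sum(axis=1)


# Teacher network generates the labels (the realizable, teacher-student setting)
W_teacher = rng.standard_normal((k, d))
X = rng.standard_normal((512, d))
y = forward(W_teacher, X)

# Student: mildly over-parameterized, trained by gradient descent on squared loss
W = 0.5 * rng.standard_normal((n, d))
lr = 0.02
losses = []
for step in range(1000):
    pre = X @ W.T                              # (m, n) pre-activations
    resid = np.maximum(pre, 0.0).sum(axis=1) - y  # (m,) residuals
    losses.append(0.5 * np.mean(resid ** 2))
    # grad wrt student weight w_i: mean over samples of resid * 1[w_i.x > 0] * x
    grad = ((pre > 0) * resid[:, None]).T @ X / len(X)
    W -= lr * grad

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.6f}")
```

In this realizable setting the loss typically decreases toward 0 and student weight vectors drift toward (multiples of) teacher directions, which is the phenomenon the talk's local convergence theory makes rigorous once the loss is below the stated threshold.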
Bio: Rong Ge is an assistant professor at Duke University. He received his Ph.D. from Princeton University, advised by Sanjeev Arora. Before joining Duke, Rong Ge was a post-doc at Microsoft Research New England. Rong Ge's research focuses on proving theoretical guarantees for modern machine learning algorithms and understanding non-convex optimization, in particular for neural networks. Rong Ge has received an NSF CAREER Award and a Sloan Research Fellowship.