Clark Hall 314
Understanding Deep Neural Networks with Rectified Linear Units
We will begin by giving a precise characterization of the family of functions representable by deep neural networks (DNNs) with rectified linear units (ReLUs). In this characterization, we will also attempt to give a gradation within this family of functions based on the size and depth of the network. As a consequence of our classification, we give polynomial time algorithms for training ReLU DNNs with one hidden layer to *global optimality*, for a fixed size and number of inputs; in other words, the running time is polynomial in the number of data points to be trained on, assuming the size and number of inputs to the network are fixed constants.
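As a minimal illustration of the function class in question (a sketch, not taken from the talk): a one-hidden-layer ReLU network computes a continuous piecewise linear function of its input. For example, |x| is exactly representable with two hidden units, since |x| = ReLU(x) + ReLU(-x). The weights below are illustrative and chosen by hand.

```python
def relu(z):
    # Rectified linear unit: max(z, 0)
    return max(z, 0.0)

def one_hidden_layer(x, W, b, a, c):
    # W, b: hidden-layer weights and biases; a: output weights; c: output bias
    return sum(ai * relu(wi * x + bi) for wi, bi, ai in zip(W, b, a)) + c

# |x| as a two-unit ReLU network: |x| = 1*ReLU(x) + 1*ReLU(-x)
W, b, a, c = [1.0, -1.0], [0.0, 0.0], [1.0, 1.0], 0.0
assert one_hidden_layer(-3.0, W, b, a, c) == 3.0
assert one_hidden_layer(2.5, W, b, a, c) == 2.5
```

The same template with more hidden units produces any continuous piecewise linear function of one variable, which conveys the flavor of the representability results described above.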
We will next investigate the depth vs. size trade-off for such neural networks. In particular, we will construct a smoothly parameterized family of R -> R “hard” functions that lead to an exponential blow-up in size if the number of layers is decreased by a small amount. An example consequence of our gap theorem is that for every natural number N, there exists a smoothly parameterized family of R -> R functions representable by ReLU DNNs with N^2 hidden layers and total size N^3, such that any ReLU DNN with at most N hidden layers will require at least (1/2)*N^(N+1) - 1 total nodes to represent any function from this class.
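One well-known mechanism behind depth-vs-size gaps of this kind (a sketch in the style of triangle-wave constructions, not necessarily the talk's exact family): composing a “tent” map, itself a tiny two-unit ReLU net, k times yields a sawtooth with 2^k linear pieces. Depth multiplies pieces; a shallow network must pay for each piece with nodes.

```python
def relu(z):
    return max(z, 0.0)

def tent(x):
    # On [0, 1]: tent(x) = 2*min(x, 1 - x), written with two ReLU units.
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def tent_k(x, k):
    # k-fold composition of the tent map: a depth-k ReLU network.
    for _ in range(k):
        x = tent(x)
    return x

def count_pieces(k, n=1 << 14):
    # Count linear pieces on [0, 1] by detecting slope changes on a dyadic
    # grid (breakpoints of tent_k are dyadic, so the slopes come out exact).
    xs = [i / n for i in range(n + 1)]
    ys = [tent_k(x, k) for x in xs]
    slopes = [round((ys[i + 1] - ys[i]) * n) for i in range(n)]
    return 1 + sum(slopes[i] != slopes[i - 1] for i in range(1, n))

assert [count_pieces(k) for k in (1, 2, 3, 4)] == [2, 4, 8, 16]
```

Each extra layer doubles the number of affine pieces at constant width, which is the qualitative phenomenon the gap theorem quantifies.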
Finally, we construct a family of R^n -> R functions for n >= 2 (also smoothly parameterized), whose number of affine pieces scales exponentially with the dimension ‘n’ for any fixed size and depth. To the best of our knowledge, such a construction with exponential dependence on ‘n’ has not been achieved by previous families of “hard” functions in the neural nets literature.
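To convey where such dimension dependence can come from (a hedged illustration of a standard polyhedral fact, not the talk's construction itself): a zonotope in R^n generated by m line segments in general position has 2 * sum_{k=0}^{n-1} C(m-1, k) vertices, so for a fixed number of generators m the vertex count grows rapidly with the dimension n, saturating at 2^m once n >= m.

```python
from math import comb

def zonotope_vertices(m, n):
    # Vertex count of a zonotope with m generators in general position in
    # R^n: 2 * sum_{k=0}^{n-1} C(m-1, k), capped at k = m-1 (where it
    # equals 2^m, the vertex count of the m-cube's image).
    return 2 * sum(comb(m - 1, k) for k in range(min(n, m)))

# With m = 20 generators, the count grows quickly as n increases:
print([zonotope_vertices(20, n) for n in (2, 4, 8, 16)])
```

Functions built from such polytopes can have one affine piece per vertex-adjacent region, which is the flavor of the exponential dependence on ‘n’ described above.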
The mathematical tools used to obtain the above results range from tropical geometry to the theory of zonotopes from polyhedral theory.
This is a joint work with Raman Arora, Poorya Mianjy and Anirbit Mukherjee.
Bio: Amitabh Basu has been an assistant professor in the Dept. of Applied Mathematics and Statistics at Johns Hopkins University since 2013. He was a visiting assistant professor in the Dept. of Mathematics at the University of California, Davis, from 2010 to 2013. He obtained his Ph.D. in 2010 from Carnegie Mellon University with a thesis on mixed-integer optimization. He received an M.S. in Computer Science from Stony Brook University in 2006, and a bachelor’s in Computer Science from the Indian Institute of Technology, Delhi in 2004.
Amitabh Basu is the recipient of the NSF CAREER award in 2015. He was one of the three finalists for the A.W. Tucker Prize 2012 awarded by the Mathematical Optimization Society. Basu serves as an Associate Editor for the journal Mathematics of Operations Research. He also currently serves as Vice Chair for the Integer and Discrete Optimization cluster within the INFORMS Optimization Society.