Clark Hall, Room 316
12:00 – 1:00 Seminar
1:00 – 1:30 Lunch
?The Fastest L1,oo Prox in the West?
Abstract: Proximal operators are of particular interestin optimization problems dealing with non-smooth objectives because in many practical cases they lead to optimization algorithms whose updates can becomputed in closed form or very efficiently. A well-known example is the proximal operator of the vector L1norm, which is given by the soft-thresholding operator. In many applications, one is also interested in enforcing some additional structure (i.e., block-sparsity) to the variables of interest, which can be done using mixed norms such as the Lp,1 matrix norm (p >1). In particular, the Loo,1 norm has been shown to be useful for estimating a set of covariate regressors in problems such as multi-task learning, and representative (exemplar) selection. A general formulation for these type of problems is to minimize some convex loss function subject to norm constraints. However, in solving these problems one faces the main computational challenge of how to project onto the Loo,1 norm.
In this talk, we address the problem of efficiently projecting onto the mixed Loo,1 norm via the computation of the proximal operator of its dual norm–the mixed L1,oo norm. Weshow that it can be computed in “closed form” by applying the well-known soft-thresholding operator to each column of the matrix. However,unlike the vector L1norm case where the threshold is constant, in the mixed L1,oomatrixnorm case each column of the matrix might require a different threshold and all thresholds depend on the given matrix. We propose a general iterative algorithm for computing these thresholds, as well as two efficient implementations that further exploit easy to compute lower bounds fortheL1,oonorm of the optimal solution. Experiments on large-scale synthetic and real data indicate that the proposed methods can be orders of magnitude faster than state-of-the-art methods. We further illustrate the use of the Loo,1 norm as a feature selection mechanism for the problem of cancer classification from gene expression data (e.g., problems in the low samplesize high dimensionality regime).