April 26, 2019

12:00 pm – 1:15 pm


Hackerman Hall B17 @ 3400 N Charles St, Baltimore, MD 21218, USA

The cocktail party problem, or speech separation, has evaded a solution for decades in speech and audio processing. I have been advocating a new formulation of this old challenge that estimates an ideal time-frequency mask (binary or ratio). This formulation turns the classical signal processing problem into a machine learning problem, and deep neural networks (DNNs) are particularly well-suited for this task due to their representational capacity. I will describe recent algorithms that employ deep learning for supervised speech separation, including speech enhancement and speaker separation. DNN-based mask estimation elevates speech separation performance to new levels, and has produced the first demonstration of substantial speech intelligibility improvements for both hearing-impaired and normal-hearing listeners in background interference. These advances represent major strides toward solving the cocktail party problem.
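To make the mask formulation concrete, here is a minimal sketch of how the two ideal masks mentioned above are defined, assuming the clean speech and noise magnitude spectrograms are available (as they are when constructing training targets). The array shapes, the 0 dB threshold, and the random stand-in spectrograms are illustrative assumptions, not details from the talk.

```python
import numpy as np

# Stand-in magnitude spectrograms (frequency bins x time frames);
# in practice these would come from an STFT of clean speech and noise.
rng = np.random.default_rng(0)
speech = rng.random((257, 100)) + 1e-8
noise = rng.random((257, 100)) + 1e-8

# Ideal binary mask (IBM): 1 where the local SNR exceeds a threshold
# (0 dB here), 0 elsewhere.
snr_db = 20.0 * np.log10(speech / noise)
ibm = (snr_db > 0.0).astype(np.float64)

# Ideal ratio mask (IRM): a soft speech-to-mixture energy ratio in [0, 1].
irm = speech**2 / (speech**2 + noise**2)

# In supervised separation, a DNN is trained to predict such a mask from
# features of the noisy mixture; the estimated mask is then applied to
# the mixture spectrogram to recover the target speech.
```

This is only a sketch of the training targets; the talk concerns the DNN architectures and learning algorithms that estimate these masks from the mixture alone.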
DeLiang Wang received the B.S. degree and the M.S. degree from Peking (Beijing) University and the Ph.D. degree in 1991 from the University of Southern California, all in computer science. Since 1991, he has been with the Department of Computer Science & Engineering and the Center for Cognitive and Brain Sciences at The Ohio State University, where he is a Professor and University Distinguished Scholar. He received the U.S. Office of Naval Research Young Investigator Award in 1996, the 2005 Best Paper Award of IEEE Transactions on Neural Networks, and the 2008 Helmholtz Award from the International Neural Network Society. He is an IEEE Fellow and Co-Editor-in-Chief of Neural Networks.