Hackerman Hall B17 @ 3400 N Charles St, Baltimore, MD 21218, USA
Based on Kolmogorov’s Superposition Theorem and the Universal Approximation Theorem of Cybenko and Barron, any multivariate function can be approximated by a multi-layer perceptron. We therefore cast classical speech pre-processing problems into a regression setting, learning a nonlinear spectral mapping from noisy to clean speech feature vectors with deep neural networks (DNNs), combining the emerging deep learning and big data paradigms. DNN-enhanced speech usually demonstrates good quality and intelligibility in challenging acoustic conditions. Furthermore, this paradigm facilitates an integrated learning framework that trains the three key modules of an automatic speech recognition (ASR) system, namely signal processing, feature extraction, and acoustic modeling, in a unified manner. The proposed approach was tested on recent challenging ASR tasks in CHiME-2, CHiME-4, CHiME-5, and REVERB, designed to evaluate ASR robustness in mixed-speaker, multi-channel, multi-device, and reverberant conditions, respectively. Leveraging the top speech quality achieved in speech separation, microphone-array-based speech enhancement, real-world recording environments, and speech dereverberation, as required by the corresponding speaking conditions, our team scored the lowest word error rates in all four scenarios. More recently, the same approach has been extended to the enhancement of microphone-array speech with multiple sources of interference in reverberant conditions. We also achieved significant gains in speech quality and word error rate reductions even with a black-box LVCSR system.
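The regression formulation above can be illustrated with a minimal sketch: a small multi-layer perceptron trained to map noisy log-spectral feature vectors to their clean counterparts by minimizing mean-squared error. This is not the speakers' actual system; the feature dimensions, network size, synthetic data, and hyperparameters below are all illustrative assumptions.

```python
import numpy as np

# Minimal illustration of DNN-based spectral mapping (assumed setup, not
# the authors' system): an MLP regresses clean feature vectors from noisy
# ones under a mean-squared-error objective.

rng = np.random.default_rng(0)

# Synthetic stand-in for paired (noisy, clean) log-spectral features.
n_frames, dim = 512, 40
clean = rng.standard_normal((n_frames, dim))
noisy = clean + 0.5 * rng.standard_normal((n_frames, dim))  # additive distortion

# One hidden layer: noisy -> tanh(hidden) -> clean estimate.
hidden = 64
W1 = 0.1 * rng.standard_normal((dim, hidden)); b1 = np.zeros(hidden)
W2 = 0.1 * rng.standard_normal((hidden, dim)); b2 = np.zeros(dim)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

lr = 0.05
losses = []
for _ in range(200):
    h, pred = forward(noisy)
    err = pred - clean                      # gradient of MSE w.r.t. pred, up to a constant
    losses.append(float(np.mean(err ** 2)))
    # Backpropagate through the two layers.
    gW2 = h.T @ err / n_frames; gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)        # tanh derivative
    gW1 = noisy.T @ dh / n_frames; gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

print(f"MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In a real system the same idea scales up: multi-frame context windows at the input, much larger hidden layers, and enhanced features that can then feed the downstream ASR acoustic model.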
Chin-Hui Lee is a professor in the School of Electrical and Computer Engineering at the Georgia Institute of Technology. Before joining academia in 2001, he had accumulated 20 years of industrial experience, ending at Bell Laboratories, Murray Hill, as a Distinguished Member of Technical Staff and Director of the Dialogue Systems Research Department. Dr. Lee is a Fellow of the IEEE and a Fellow of ISCA. He has published over 500 papers and holds 30 patents, with more than 35,000 citations and an h-index of 80 on Google Scholar. He has received numerous awards, including the Bell Labs President’s Gold Award in 1998. He won the IEEE Signal Processing Society’s 2006 Technical Achievement Award for “Exceptional Contributions to the Field of Automatic Speech Recognition”. In 2012 he gave an ICASSP plenary talk on the future of automatic speech recognition. In the same year he was awarded the ISCA Medal for scientific achievement for “pioneering and seminal contributions to the principles and practice of automatic speech and speaker recognition”.