Clark Hall, Room 316
Title: An Overview of Computer Vision Research at Comcast
Comcast is one of the largest media companies in the world, analyzing huge amounts of content every second to create engaging experiences for millions of customers. Machine learning and computer vision play an immense role in the research and development of customer-focused products at Comcast. In this talk, we discuss the role of computer vision and deep learning in analyzing and interpreting vast amounts of content in media analytics and smart home applications.
We introduce scalable, efficient, and robust computer vision algorithms for retrieving relevant video events from millions of cameras installed in the homes of Comcast customers who opt for home security monitoring. Layout-Induced Video Representation (LIVR) is a novel representation of scene layouts aimed at recognizing agent-in-place actions, which associate “who” (object detection, facial and person recognition) does “what” (action recognition) and “where” (region-of-interest). We also present a novel neural network for motion event detection, ReMotENet, a unified, end-to-end data-driven method that uses spatial-temporal attention-based 3D ConvNets to jointly model the appearance and motion of objects-of-interest in a video.
Computer vision algorithms paired with computational linguistics and audio analytics can provide further insights into multi-modal content. We tackle the problem of semantic chaptering of video in recorded TV shows, allowing the user to navigate automatically through content segments such as commercials, intros, and end credits. We introduce a machine learning algorithm that extracts features from audio (e.g., background music), video (e.g., placement of the channel logo), and language (e.g., keywords appearing in the closed captions), and uses gradient boosting, conditional random fields, and recurrent neural networks to make an accurate prediction.