Hackerman B17 @ 3400 N. Charles Street, Baltimore, MD 21218
Cross-Lingual Transfer of Natural Language Processing Systems
Accurate natural language processing systems rely heavily on annotated datasets. In the absence of such datasets, transfer methods can help to develop a model by transferring annotations from one or more rich-resource languages to the target language of interest. These methods generally fall into two approaches: 1) annotation projection from translation data, also known as parallel data, using supervised models in rich-resource languages, and 2) direct model transfer from annotated datasets in rich-resource languages.
In this talk, we present different methods for transferring syntactic and semantic dependency parsers. We propose an annotation projection method that performs well in scenarios where a large amount of in-domain parallel data is available. We also propose a method combining annotation projection and direct transfer that can leverage a minimal amount of information from a small out-of-domain parallel dataset to develop highly accurate transfer models. Furthermore, we present an unsupervised syntactic reordering model that improves the accuracy of dependency parser transfer for non-European languages. Finally, we propose a semantic transfer method based on multi-task learning that leverages supervised syntactic information in the target language of interest.
Mohammad Sadegh Rasooli is a research scientist on the language and translation technologies (LATTE) team at Facebook AI. He completed a Ph.D. in computer science at Columbia University in 2018, advised by Michael Collins.