Deep learning relies on massive training sets of labeled examples to learn from – often tens of thousands to millions to reach peak predictive performance. However, large amounts of training data are only available for very few standardized learning problems. Even small variations of the problem specification or changes in the data distribution would necessitate re-annotation of large amounts of data. However, domain knowledge can often be expressed by sets of prototypical descriptions: For example, vision experts can exploit meta information for image labeling, linguists can describe phenomena by prototypical realization patterns, social scientists can specify events of interest by characteristic key phrases, and bio-medical researchers have databases of known interactions between drugs or proteins that can be used for heuristic labeling. This tutorial will present a software framework that provides a unified interface for the modular integration of weak labeling sources with the training of deep neural networks. We hope that this will enable researchers to apply weakly supervised learning to new problems in text analysis.
Benjamin Roth is a joint professor of Computer Science and Philology at the University of Vienna, Austria. His research interests are the extraction of knowledge from text with statistical methods and weakly supervised learning.