Public event

Data Science Brown Bag Series: Analysing Political Texts with Less Data and Deep Transfer Learning

Join us for a talk by Moritz Laurer on the advantages of deep transfer learning, particularly with BERT models, for analyzing political text data.

Abstract from speaker: 

Supervised machine learning is an increasingly popular tool for analyzing large political text corpora. The main disadvantage of supervised machine learning is the need for thousands of manually annotated training data points. This issue is particularly important in the social sciences where most new research questions require new training data for a new task tailored to the specific research question. This presentation discusses how deep transfer learning can help address this challenge by accumulating “prior knowledge” in language models. Models like BERT can learn statistical language patterns through pre-training (“language knowledge”), and reliance on task-specific data can be reduced by training on universal tasks like natural language inference (NLI; “task knowledge”). We demonstrate the benefits of transfer learning on a wide range of eight tasks. Across these eight tasks, our BERT-NLI model fine-tuned on 100 to 2,500 texts performs on average 10.7 to 18.3 percentage points better than classical models without transfer learning. Our study indicates that BERT-NLI fine-tuned on 500 texts achieves similar performance as classical models trained on around 5,000 texts. Moreover, we show that transfer learning works particularly well on imbalanced data. We conclude by discussing limitations of transfer learning and by outlining new opportunities for social science research. The presentation is based on a recently published paper (https://doi.org/10.1017/pan.2023.20) and follow-up experiments by one of the authors.

Bring your own lunch bag! Light pastries and drinks will be available in case you forget to bring one.

The Data Science Brown Bag Series is an informal and interactive gathering where participants bring their own brown bag lunch and engage in discussions on research and insights in the field of data and computational social science.

The series provides a platform for data enthusiasts, researchers, and practitioners to share their experiences, best practices, and emerging methodologies and research in using data science to analyze and understand social and political phenomena. It is open to anyone interested in data science and social science who wants to network, learn, and share ideas in a casual and friendly setting.