Quantitative Text Analysis II

Prof. Madrid-Morales is an outstanding instructor, extremely passionate and helpful. I hope he will continue to offer this great course. — participant from Hong Kong

This course offers students an advanced exploration of computational text analysis methods for systematically extracting insights from text data. It builds on the foundational concepts covered in Quantitative Text Analysis I and introduces students to more sophisticated techniques, including unsupervised machine learning, word embeddings, transformer-based models, and large language models. Students will learn how to implement these methods using R and apply them to diverse research questions. Students will also gain experience with multilingual datasets and networked representations of text, addressing complex challenges often encountered in advanced text analysis.

This course is the second part of a two-course sequence. Participants should either be familiar with the material covered in the introductory Quantitative Text Analysis I or have prior experience with basic computational text analysis, preferably using the R programming language.

Dates

This one-week, 17.5-hour course runs Monday to Friday, July 8-12, 2024, from 1:30 to 5:00 pm each day.

Classroom Location

Faculty of Arts and Social Science, AS1 02-08

Instructor

Dani Madrid-Morales, University of Sheffield

Detailed Description

This course provides an applied and advanced introduction to quantitative text analysis, focusing on methods that allow participants to uncover latent structures in text data, work with semantic representations, and analyse multilingual or networked corpora. Designed for students with some experience in R and basic computational text analysis, the course helps learners extend their skillset with techniques for handling larger and more complex datasets.

The main learning objectives of the course are:

  • Employing unsupervised methods, such as structural topic models (STM), to uncover themes in text data.
  • Creating and applying word embeddings to analyse semantic relationships in text.
  • Using transformer-based models (e.g., BERT) for classification and other tasks.
  • Analysing multilingual text datasets and addressing language-specific challenges.
  • Constructing and interpreting text-based networks (e.g., co-occurrence or citation networks).

The course begins with a review of unsupervised text analysis methods, focusing on structural topic models (STM) and clustering techniques to identify latent themes in text data. Participants will then explore word embeddings, learning how they can be used to analyse the meanings of words in a corpus and the relationships between them. These sessions provide a foundation for working with transformer-based models, such as BERT. The latter part of the course covers specialised topics, including multilingual text analysis and network-based text representations.

The course combines lectures with hands-on labs, providing practical experience with modern tools and workflows for analysing text data. Throughout the week, participants will engage in both individual and collaborative exercises to reinforce their learning and apply it to their own research contexts. At the end of the week, students will have the opportunity to share their work with their peers and receive feedback and guidance on how to take their research projects further.

Prerequisites

We strongly encourage participants to combine this course with the introductory Quantitative Text Analysis I. Alternatively, participants should have prior experience with quantitative text analysis and some familiarity with the statistical software R. Participants may contact the instructor with questions regarding the knowledge required to follow this course.

Requirements

Participants are expected to have access to an internet-connected computer. Access to data, temporary licenses for the course software, and installation support will be provided by the Methods School.

Core Readings

Van Atteveldt, Wouter; Trilling, Damian, and Arcila Calderón, Carlos. 2022. Computational Analysis of Communication. Hoboken: Wiley.

Suggested Readings

Grimmer, Justin; Roberts, Margaret E., and Stewart, Brandon M. 2022. Text As Data. Princeton and Oxford: Princeton University Press.

Jurafsky, Daniel, and Martin, James H. 2023. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Stanford.