Workshop 4: Introduction to natural language processing and topic modeling with Python

Lecturers

Dr. Nicolò Gozzi & Dr. N. Gizem Bacaksizlar Turbic

Description

Documents and full texts have a long history as data in the social sciences. Computational Social Science is also concerned with newer forms of text data that can be collected from digital platforms and the web. All such datasets are expressions of natural language, and they bring methods from computational linguistics and machine learning, such as Natural Language Processing (NLP) and automated content analysis, to center stage.

In this workshop, we will give an introduction to how text data can be preprocessed and analyzed in Python. In particular, we will discuss how information can be extracted from raw texts using regular expressions, how words can be reduced to their base forms, what language models are and how they allow us to extract meaningful units of text such as n-grams, how grammatical parts of speech (e.g., nouns, verbs) can be identified, and how all those steps combine into a text preprocessing pipeline. At the end of such a pipeline stands a document-term matrix that is ready for analysis.

For the analysis, we will introduce Latent Dirichlet Allocation (LDA), a widely used topic modeling method. LDA is a fully automated content analysis method that reduces the dimensionality of the document-term matrix: it assumes that documents are generated from topics and infers topics as groups of words.

As data, we will use a popular text corpus that is still to be determined. The workshop will alternate between live-coding demonstrations and periods in which participants apply that knowledge in context, both using Jupyter Notebooks. The software we will be using is spaCy and Gensim, two standard Python libraries for NLP and topic modeling.
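To make the preprocessing steps described above more concrete, here is a minimal sketch that uses only the Python standard library. It is not the workshop's spaCy/Gensim pipeline: the toy corpus, the stop-word list, and the helper names (`extract_hashtags`, `tokenize`, `bigrams`, `document_term_matrix`) are all assumptions made up for illustration. Reduction to base forms and part-of-speech tagging are left out because they need a trained language model.

```python
import re
from collections import Counter

# Toy corpus, invented for illustration only (not the workshop data).
docs = [
    "The cats sat on the mat. #pets",
    "Dogs and cats are popular pets! #pets #animals",
    "Topic models group words into topics.",
]

HASHTAG = re.compile(r"#(\w+)")   # captures the word after '#'
TOKEN = re.compile(r"[a-z]+")     # runs of lowercase letters
STOPWORDS = {"the", "on", "and", "are", "into"}  # tiny hypothetical list


def extract_hashtags(text):
    """Pull hashtags out of raw text with a regular expression."""
    return HASHTAG.findall(text.lower())


def tokenize(text):
    """Lowercase, drop hashtags, split into word tokens, remove stop words."""
    text = HASHTAG.sub(" ", text.lower())
    return [t for t in TOKEN.findall(text) if t not in STOPWORDS]


def bigrams(tokens):
    """Return adjacent token pairs (n-grams with n = 2)."""
    return list(zip(tokens, tokens[1:]))


def document_term_matrix(token_lists):
    """Build a count matrix: one row per document, one column per term."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    rows = []
    for tokens in token_lists:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


tokens = [tokenize(d) for d in docs]
vocab, dtm = document_term_matrix(tokens)

print(extract_hashtags(docs[1]))
print(bigrams(tokens[0]))
print(vocab)
for row in dtm:
    print(row)
```

Each row of `dtm` counts how often every vocabulary term occurs in one document. A count matrix of this kind is what a topic model such as LDA takes as its input.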


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Workshop 4: Introduction to natural language processing and topic modeling with Python

Lecturers

Description

FilesExpand file tree

workshop4

Directory actions

More options

Directory actions

More options

Latest commit

History

workshop4

Folders and files

parent directory

README.md

Workshop 4: Introduction to natural language processing and topic modeling with Python

Lecturers

Description