LukeFarrell/Word-Embeddings

Word-Embeddings (In progress)

This project combines web scraping, the gensim word embedding library, and vector analysis to collect large text corpuses, build a novel word embedding, and quantify certain aspects of the embedding's dimensionality.

Specifically, my goal is to analyze racial, gender, and religious bias in word embedding models constructed from different media corpuses that I am actively collecting. I am currently collecting data on three media groups: Fox News, CNN, and MSNBC. From that data, I have built a program that parses the text corpuses, builds word vector spaces, and then programmatically analyzes the racial biases present in the language used on air.
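
As a rough illustration of what "programmatically analyzing bias" in a word vector space can look like, the sketch below scores how much closer a target word sits to one group of words than to another, using cosine similarity. This README does not specify the metric the project actually uses, so the `association_gap` function, the word lists, and the 3-dimensional `toy_vectors` are all hypothetical stand-ins. With a trained gensim model, the vectors would presumably come from something like `model.wv[word]` instead.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association_gap(target, group_a, group_b, vectors):
    """Mean similarity of `target` to group_a minus its mean similarity to group_b.

    A positive value means the target word lies closer to group_a,
    a negative value means it lies closer to group_b.
    """
    t = vectors[target]
    sim_a = sum(cosine(t, vectors[w]) for w in group_a) / len(group_a)
    sim_b = sum(cosine(t, vectors[w]) for w in group_b) / len(group_b)
    return sim_a - sim_b


# Hypothetical toy vectors, invented for illustration only.
# Real embeddings would have many more dimensions and be learned from a corpus.
toy_vectors = {
    "he": [1.0, 0.0, 0.2],
    "she": [0.0, 1.0, 0.2],
    "engineer": [0.9, 0.1, 0.5],
    "nurse": [0.1, 0.9, 0.5],
}

gap_engineer = association_gap("engineer", ["he"], ["she"], toy_vectors)
gap_nurse = association_gap("nurse", ["he"], ["she"], toy_vectors)

print("engineer:", round(gap_engineer, 3))
print("nurse:", round(gap_nurse, 3))
```

Running the same score on embeddings trained separately on each network's corpus would give one number per word per network, which is one possible way to compare the structure of the language rather than individual statements.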

From my findings I hope to illuminate the underlying biases present in specific news media not through single instances of bias, but rather by quantifying the structure of the language used on air and the bias mathematically defined within it.

(Updates to text corpuses and analysis will continue through August 2017)
