ryanjguo/WLIT

WLIT - What Language Is This?

Authors:
Ethan Huynh, Sahitya Nulu, Ryan Guo, Roshan Dewmima

What is WLIT?

WLIT is a beginner hackathon project that uses machine learning to detect which language a given text input is written in. It can currently detect English, French, Spanish, Portuguese, Italian, German, Swedish, Danish, Dutch, Russian, and Greek. This list is limited by the dataset we used and by the modules we implemented.

A video demo of the product can be seen here

DevPost: here

Dataset

The dataset we chose for this project is the Language Detection dataset by Basil Saji on Kaggle. According to Basil, he "collected the data from Wikipedia by scraping using BeautifulSoup python library".

Methodology

To read and manipulate our data, we used the _pandas_ module. To filter out unwanted characters, we used the _re_ module. We used the _spaCy_ module, which provides language-specific resources such as stopword lists, lemmatization, and punctuation handling. With spaCy, we removed punctuation and stopwords (e.g. "the", "a") and lemmatized each word (e.g. "is" -> "be", "going" -> "go"). After pre-processing, we used _sklearn_ to convert the words of each text into a numeric vector, and then used _sklearn_ again to classify which language an input is written in. The whole project was written in Python on Google Colab.
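
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook: the `Text` and `Language` column names, the `clean_text` and `detect_language` helpers, and the choice of `CountVectorizer` with `MultinomialNB` are all assumptions, since the description above only says that _sklearn_ was used. The tiny inline table stands in for the Kaggle dataset, and the spaCy stopword and lemmatization step is left out so the example runs without downloading language models.

```python
import re

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hand-written stand-in for the Kaggle dataset (illustrative only).
# The column names "Text" and "Language" are assumptions.
data = pd.DataFrame(
    {
        "Text": [
            "The weather is nice today, and we are going to the park!",
            "She reads a book every evening before going to sleep.",
            "Le temps est beau aujourd'hui et nous allons au parc.",
            "Elle lit un livre chaque soir avant de dormir.",
            "El tiempo es bueno hoy y vamos al parque.",
            "Ella lee un libro cada noche antes de dormir.",
        ],
        "Language": [
            "English", "English",
            "French", "French",
            "Spanish", "Spanish",
        ],
    }
)


def clean_text(text: str) -> str:
    """Lowercase the text and replace digits and punctuation with spaces."""
    text = re.sub(r"[\d_]|[^\w\s]", " ", text.lower())
    # Collapse the runs of whitespace left behind by the substitution.
    return re.sub(r"\s+", " ", text).strip()


data["Clean"] = data["Text"].apply(clean_text)

# Bag-of-words counts feeding a naive Bayes classifier (assumed model choice).
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(data["Clean"], data["Language"])


def detect_language(text: str) -> str:
    """Predict the language label for a single input string."""
    return str(model.predict([clean_text(text)])[0])


print(detect_language("We are going to read a book in the park"))
```

With a real dataset, one would typically also hold out a test split (for example with `sklearn.model_selection.train_test_split`) to measure accuracy before trusting the predictions.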

In the future, we hope to support more languages and to improve accuracy, for example by detecting spelling errors in the input.
