Language Identification

Enter any sentence
Get its language identified!

Inspiration

We are fascinated by the languages of the world and are always interested in learning about new ones! We wanted to combine this interest with machine learning to satisfy our curiosity whenever we stumble upon an unknown language.

What it does

It takes a sentence as input and returns the language that the sentence is written in.

How we built it

We represented each sentence as a bag of words (word counts, ignoring word order) and trained a Naive Bayes classifier on those counts to assign each sentence to its language.
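
To make the idea concrete, here is a minimal sketch of a bag-of-words Naive Bayes classifier written with only the Python standard library. It is not our actual project code: the class name NaiveBayesLanguageClassifier, the whitespace tokenizer, the add-one smoothing, and the six toy training sentences are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(sentence):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return sentence.lower().split()


class NaiveBayesLanguageClassifier:
    """Hypothetical multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, sentences, labels):
        self.word_counts = defaultdict(Counter)  # language -> word -> count
        self.label_counts = Counter(labels)      # language -> number of sentences
        self.vocabulary = set()
        for sentence, label in zip(sentences, labels):
            tokens = tokenize(sentence)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        # Total number of word occurrences seen for each language.
        self.total_words = {
            lang: sum(counts.values()) for lang, counts in self.word_counts.items()
        }
        return self

    def predict(self, sentence):
        tokens = tokenize(sentence)
        n_sentences = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior: how often this language appears in the training data.
            score = math.log(count / n_sentences)
            for token in tokens:
                # Smoothed log likelihood of each word given the language.
                numerator = self.word_counts[label][token] + 1
                denominator = self.total_words[label] + vocab_size
                score += math.log(numerator / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (illustrative only, not the project's dataset).
train_sentences = [
    "the cat is on the table",
    "where is the train station",
    "le chat est sur la table",
    "où est la gare",
    "el gato está en la mesa",
    "dónde está la estación de tren",
]
train_labels = ["English", "English", "French", "French", "Spanish", "Spanish"]

model = NaiveBayesLanguageClassifier().fit(train_sentences, train_labels)
print(model.predict("the station is here"))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the add-one smoothing keeps a single unseen word from ruling out a language entirely.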

Challenges we ran into

Our dataset is large: it contains sentences in 235 languages. Our model was not efficient enough to train on all 235 languages, so we reduced the data to the ten most common languages. This kept training efficient while still giving us a highly accurate model.
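
The reduction step could look roughly like the sketch below. It assumes that "most common" means the languages with the most examples in the dataset, and the helper name keep_most_common_languages and the toy data are hypothetical.

```python
from collections import Counter


def keep_most_common_languages(sentences, labels, n=10):
    """Keep only the examples whose label is among the n most frequent labels."""
    top_languages = {lang for lang, _ in Counter(labels).most_common(n)}
    kept = [(s, lang) for s, lang in zip(sentences, labels) if lang in top_languages]
    kept_sentences = [s for s, _ in kept]
    kept_labels = [lang for _, lang in kept]
    return kept_sentences, kept_labels


# Toy data: three languages with different numbers of examples.
sentences = ["hello", "good morning", "thank you", "bonjour", "merci", "hola"]
labels = ["English", "English", "English", "French", "French", "Spanish"]

# Keep the two most frequent languages here; the project kept ten.
top_sentences, top_labels = keep_most_common_languages(sentences, labels, n=2)
print(sorted(set(top_labels)))
```

Filtering before training shrinks both the number of classes and the vocabulary, which is where most of the training cost of a bag-of-words model comes from.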