Inspiration

We are fascinated by the languages of the world and are always interested in learning about new languages! We wanted to combine this interest with machine learning to help settle our curiosity whenever we stumble upon an unknown language.

What it does

It takes a sentence as an input, and it returns the language of the sentence as an ouput.

How we built it

We used a bag of words and Naive Bayes to train and classify sentences into their appropriate language.

Challenges we ran into

Our dataset is very big (contains sentences in 235 languages). Our model was not efficient enough to train on all 235 languages, so instead we reduced the data to the ten most common languages, which allowed us to have an efficient and highly accurate model.

Accomplishments that we're proud of

We have a very high accuracy on our test set!

What we learned

We learned how to apply machine learning to solve a problem that we encounter

What's next for Language Identification

We can increase the number of languages that our model can detect. We can also improve the design of our front end

Share this project:

Updates