Inspiration

With recent data breaches at big companies like Facebook affecting millions of people, and the new GDPR rules implemented by the EU, it is more important than ever to know what data a service you use collects about you. This project is inspired by the work done by tosdr.com, where contributors read through privacy policies and terms of service to summarize and assess what a company does with users' data. However, tosdr.com relies on real people to read the statements and write the assessments, so it is difficult to apply to small services that are not well known worldwide.

What it does

To address this, we created Priv.io. It searches privacy policy statements for important sentences about how the company collects data, what data it collects, how it manages the data, how it shares the data, and how users can control the data. Critical statements are presented to users so that they can decide whether they agree with the company's practices. Priv.io thus helps users assess the privacy policies of companies, particularly smaller ones that do not receive wide public scrutiny.

How we built it

To get an accurate understanding of which aspects are important, we read through the GDPR rules, tosdr.com opinions, and several privacy policy statements from Google, Apple, GitHub, and others. We then wrote an algorithm that searches the input privacy policy statements for these aspects and presents the critical statements to the user.
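The search step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the category names, the keyword lists, and the helper names `split_sentences` and `find_statements` are all assumptions made for the example, and only three of the aspects are shown.

```python
import re

# Hypothetical categories and keywords, chosen for illustration only.
CATEGORIES = {
    "collection": ["collect", "gather", "record"],
    "sharing": ["share", "sell", "third party", "third parties", "transfer"],
    "control": ["delete", "opt out", "access your", "withdraw"],
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_statements(policy_text):
    """Map each category to the sentences that mention one of its keywords."""
    hits = {name: [] for name in CATEGORIES}
    for sentence in split_sentences(policy_text):
        lowered = sentence.lower()
        for name, keywords in CATEGORIES.items():
            # Plain substring matching; a sentence can land in several categories.
            if any(keyword in lowered for keyword in keywords):
                hits[name].append(sentence)
    return hits


policy = (
    "We collect your email address. "
    "We may share data with third parties. "
    "Contact us for help."
)
hits = find_statements(policy)
```

Here the first sentence is filed under "collection", the second under "sharing", and the third matches nothing, so it would not be shown to the user.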

Challenges we ran into

Different companies use different terms, and different forms of the same word, to describe the same thing. We needed to account for all of these possibilities without casting the net so wide that there is too much for the user to read. We solved this by including as many terms as we could and by applying lemmatization to the input. A ranking system then decides which sentences are important and which are not relevant.
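A minimal sketch of how lemmatization and ranking could fit together is shown below. To keep the example self-contained, a tiny hand-written lemma table stands in for a real lemmatizer, and the term weights and threshold are hypothetical values, not the ones Priv.io uses.

```python
import re

# Tiny hand-written lemma table standing in for a real lemmatizer (assumption).
LEMMAS = {
    "shares": "share", "shared": "share", "sharing": "share",
    "sells": "sell", "sold": "sell", "selling": "sell",
    "collects": "collect", "collected": "collect", "collecting": "collect",
    "parties": "party",
}

# Hypothetical weights: a higher value marks a term as more critical.
WEIGHTS = {"sell": 3, "share": 2, "party": 2, "collect": 1}


def lemmatize(sentence):
    """Lowercase, tokenize, and map each word form to its base form."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [LEMMAS.get(token, token) for token in tokens]


def score(sentence):
    """Sum the weights of all weighted lemmas found in the sentence."""
    return sum(WEIGHTS.get(lemma, 0) for lemma in lemmatize(sentence))


def rank(sentences, threshold=2):
    """Keep sentences scoring at least `threshold`, highest score first."""
    scored = [(score(s), s) for s in sentences]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


ranked = rank([
    "Your data is sold to third parties.",
    "We collected your email.",
    "Welcome to our site.",
])
```

Because "sold" and "parties" are reduced to "sell" and "party" before scoring, the first sentence is matched even though it contains neither base form. The threshold is what keeps the output short enough to read.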

Accomplishments that we're proud of

We knew from the beginning that a simple summarizing algorithm would not work, since such algorithms favour sentences containing words that occur frequently across the entire text. In privacy policies, some critical aspects, such as sharing with third parties, are mentioned only once or twice.
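The problem with frequency-based summarizers can be illustrated with a toy scorer. This is a simplified stand-in for that family of algorithms (no stop-word removal, average word count as the score), and the sample sentences are made up for the example.

```python
import re
from collections import Counter


def tokens(sentence):
    """Lowercase word tokens of a sentence."""
    return re.findall(r"[a-z]+", sentence.lower())


def frequency_scores(sentences):
    """Score each sentence by the average document-wide count of its words."""
    counts = Counter(t for s in sentences for t in tokens(s))
    scores = []
    for s in sentences:
        words = tokens(s)
        scores.append(sum(counts[w] for w in words) / max(len(words), 1))
    return scores


sentences = [
    "We use cookies to improve the site.",
    "Cookies help us improve the site.",
    "We use cookies on the site.",
    "Data goes to advertisers.",
]
scores = frequency_scores(sentences)
lowest = scores.index(min(scores))
```

The one sentence about advertisers uses words that appear almost nowhere else, so it receives the lowest score, even though it is the one a privacy-conscious reader most needs to see.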

What we learned

500px.com tries to obfuscate the fact that it sells users' data to third parties. Its policy first states:

"Rest assured that we will not rent or sell your personal information to anyone and that we will share your personal information only as described below:"

It then lists the following exception:

"Business Transfers: In some cases, we may choose to buy or sell assets. In these types of transactions, user information is typically one of the business assets that is transferred."

What's next for Priv.io

While the algorithm currently assesses the importance of each statement, it does not assess whether the statement is good or bad for the user, or to what degree. Semantic analysis can be added to the algorithm to make Priv.io a complete tool.

Group members

675, 642, 648
