Inspiration

The inspiration for this project was my computer science teacher, who said that he needed some form of plagiarism checker for student work, as he had started to see many similar code files. He said that students were able to change the variable names and the order of some lines, but it was still clearly not their own work, and he would have to go through everyone's code from now on.

What it does

The job of the program is to take in two files of text, either of writing or of code, and return a "percent match".

How we built it

The program looks over two specified files and compares each line of the first file to every line of the second one, altering the percent match as it goes. The checking is done by breaking down each line into strings of comparable words, usually items separated by spaces.
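As a rough illustration of the line-by-line comparison described above, here is a minimal Python sketch. It is not the project's actual code: the function names (`tokenize`, `line_similarity`, `percent_match`) are made up for this example, and the scoring rule (shared words divided by all distinct words, averaged over the best match for each line) is only one assumed way of "altering the percent match as it goes".

```python
def tokenize(line):
    """Split a line into lowercase words separated by whitespace."""
    return line.lower().split()


def line_similarity(words_a, words_b):
    """Share of distinct words that two lines have in common (assumed metric)."""
    set_a, set_b = set(words_a), set(words_b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def percent_match(lines_a, lines_b):
    """Compare each line of A to every line of B and average the best scores."""
    words_a = [tokenize(line) for line in lines_a]
    words_a = [words for words in words_a if words]  # skip blank lines
    words_b = [tokenize(line) for line in lines_b]
    if not words_a:
        return 0.0
    total = 0.0
    for words in words_a:
        # Best match for this line against every line of the second file.
        total += max((line_similarity(words, other) for other in words_b),
                     default=0.0)
    return 100.0 * total / len(words_a)
```

Because every line is checked against every line of the other file, reordering lines does not lower the score in this sketch. With real files, the two lists could come from something like `open(path).read().splitlines()`.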

Challenges we ran into

One of the challenges I ran into was overly high matches on two files that were fairly distinct in content. In other words, the program would often return a strikingly high percent for files that were most likely not plagiarized. I realized that this was because common words such as "and", "the", and "a" would increase the percent. The program was simply looking instead of understanding.

Accomplishments that we're proud of

I was proud of how I was able to combat the challenge I mentioned above and make the program look at the content of the two files as opposed to simply the words. In other words, I started to make the program understand meaning. To do this, I created a "most common words" file. This file contains around 100 words that don't really add meaning to writing and are found in almost every text. When the code came across these words, it would ignore them and base the match on the words that carry meaning. This greatly increased the reliability and accuracy of the program, so it wouldn't increase the match percentage over items that are most likely not plagiarism.
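The common-words idea can be sketched in Python as follows. This is an illustration under assumptions, not the project's code: `COMMON_WORDS` is a tiny placeholder for the roughly 100-word file (whose contents are not listed here), and `meaningful_words` and `overlap` are hypothetical helper names.

```python
# Placeholder list; the real "most common words" file has around 100 entries.
COMMON_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it"}


def meaningful_words(line, common_words=COMMON_WORDS):
    """Return the set of words in a line, ignoring the common ones."""
    words = (w.strip(".,;:!?\"'()") for w in line.lower().split())
    return {w for w in words if w and w not in common_words}


def overlap(line_a, line_b, common_words=COMMON_WORDS):
    """Share of meaningful words that two lines have in common."""
    a = meaningful_words(line_a, common_words)
    b = meaningful_words(line_b, common_words)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

For example, "The cat and the dog" and "The sun and the moon" share only "the" and "and". With the filter in place their overlap is zero, while passing `common_words=set()` to turn the filter off lets those two words push the score up.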

What we learned

During the course of this project, I learned that there is really no defined match percent that separates copied from original material. The decision is always up to the teacher or whoever is viewing the writing, but the program is still useful for flagging work that has the potential to be copied.

What's next for Original Work Detection

Next, I would like to create an app or webpage for the file comparison, as not everyone may understand how to use it as a program. It would also be beneficial if the program could match across multiple files as opposed to comparing just two at a time.
