CodeWatch

Project Description

CodeWatch is an AI code detection tool that applies recently published research techniques to classify code as either synthetic (AI-generated) or human-written.

Intuition

Generative LLMs "write" code by continuously predicting the next most likely token. We can leverage this to detect whether code is likely AI-generated or human-written. If we ask an LLM to write code for a given prompt, its answers will likely be similar across different responses, because it keeps picking one of the few most likely tokens.

Humans, however, don't think the same way and typically have a somewhat unique style of writing code. If we ask an LLM to rewrite human code, it typically ditches the human structure and changes quite a lot of it (because it uses its token-prediction method for generation). AI-generated code, by contrast, follows a very similar structure across different iterations, so if the code doesn't change much when we rewrite it, it is likely AI-generated.

How it works

  1. We take in source code that may be either human-written or AI-generated. Our task is to classify it appropriately.
  2. We ask an LLM to understand the functionality of the code and then rewrite it itself.
  3. We now have two code snippets: one that could have been written by either an AI or a human, and one that was definitely generated by an AI.
  4. We rewrite the code k times. We used k=2 to avoid rate limits, but k >= 4 is ideal according to the paper.
  5. We create embeddings of all of the code snippets and apply cosine similarity to compare them. We compare the original code (which, as a reminder, may be AI-generated or human-written) against its rewrites. If there is substantial deviation, the code is likely human-written; if the deviation is small, it is most likely AI-generated.
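The decision rule in steps 4–5 can be sketched as follows. The toy vectors, k, and the threshold value here are illustrative assumptions, not the actual parameters used by CodeWatch, which derives embeddings from a fine-tuned model:

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def detection_score(original_emb, rewrite_embs):
    # Mean similarity between the original snippet and its k LLM rewrites.
    return sum(cosine_similarity(original_emb, r) for r in rewrite_embs) / len(rewrite_embs)

def classify(original_emb, rewrite_embs, threshold=0.9):
    # Small deviation from the rewrites -> likely AI-generated.
    # The threshold is a hypothetical value for illustration only.
    score = detection_score(original_emb, rewrite_embs)
    return "ai-generated" if score >= threshold else "human-written"
```

In practice the embeddings would come from the fine-tuned embedding model, and the threshold would be tuned on labeled examples.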

Features

Implementation of novel research techniques

The paper we implemented was published earlier this year, and to our knowledge no existing product implements these methods.

Fine-tuning of embedding models to work better with code

We fine-tuned GraphCodeBERT to create more useful embeddings of code snippets.
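This README does not describe the training objective, so as a hypothetical illustration, one common choice for fine-tuning an embedding model is a cosine-embedding (contrastive) loss: pull embeddings of functionally equivalent snippets together and push unrelated ones apart. A minimal sketch of that objective:

```python
def cosine_embedding_loss(cos_sim, is_similar_pair, margin=0.2):
    # Hypothetical contrastive objective, not necessarily the one CodeWatch used.
    # Similar pairs are penalized for any gap below perfect similarity;
    # dissimilar pairs incur zero loss once their similarity drops below `margin`.
    if is_similar_pair:
        return 1.0 - cos_sim
    return max(0.0, cos_sim - margin)
```

During fine-tuning, this loss would be computed over batches of code-snippet pairs and backpropagated through the embedding model.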

Cosine Similarity

We compared vectors in the embedding space with cosine similarity to gauge how similar different code snippets are to one another. We then used this measure to classify code as either synthetic or human-written.
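Cosine similarity measures the angle between two vectors, ignoring their magnitudes, which makes it a natural way to compare embeddings. A small self-contained example (the vectors are toy values, not real code embeddings):

```python
import math

def cosine_similarity(u, v):
    # Dot product of the vectors divided by the product of their norms.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Vectors pointing the same direction score close to 1.0;
# orthogonal vectors score close to 0.0.
same = cosine_similarity([1.0, 2.0], [2.0, 4.0])
different = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```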

Tech Stack

  • Frontend: React.js, TypeScript
  • Backend: Python, Flask
  • AI/ML: PyTorch, Transformers, CodeT5+

Team Members

  • Zhen Tao Pan
  • Ashraful Mahin
  • Ataur Muhith
  • Nakib Abedin

Acknowledgments

  • OpenAI GPT-4 for code analysis
  • CodeT5+ for code embeddings
  • React and TypeScript for frontend framework
  • Flask for backend API
  • Tailwind CSS for styling
