CodeWatch

Project Description

CodeWatch is an AI code detection tool that applies recently published research techniques to classify code as either synthetic (AI-generated) or human-written.

Intuition

Generative LLMs "write" code by continuously predicting the next most likely token. We can leverage this to detect whether code is likely AI-generated or human-written. If we ask an LLM to write code for a given prompt, its answers will likely be similar across different responses, because it keeps picking one of the few most likely tokens.

Humans, however, don't think the same way and typically have a somewhat unique style of writing code. If we ask an LLM to rewrite human code, it typically ditches the human structure and changes quite a lot of it (because it uses its token-prediction method for generation). AI-generated code, by contrast, follows a very similar structure across different iterations, so if the code doesn't change much when we rewrite it, it is likely AI-generated.

How it works

  1. We take in source code that may be either human-written or AI-generated. Our task is to classify it appropriately.
  2. We ask an LLM to understand the functionality of the code and then rewrite it itself.
  3. We now have two code snippets: one that could have been written by either an AI or a human, and one that was definitely generated by an AI.
  4. We rewrite the code k times. We used k=2 to avoid rate limits, but k >= 4 is ideal according to the paper.
  5. We create embeddings of all of the code snippets and apply cosine similarity to compare them. We compare the original code (which, as a reminder, may be AI-generated or human-written) against its rewrites. If there is substantial deviation, the code is likely human-written; if the deviation is small, it is most likely AI-generated.
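The decision rule in steps 4–5 can be sketched as follows. The toy vectors, k, and the threshold value here are illustrative assumptions, not the actual parameters used by CodeWatch, which derives embeddings from a fine-tuned model:

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def detection_score(original_emb, rewrite_embs):
    # Mean similarity between the original snippet and its k LLM rewrites.
    return sum(cosine_similarity(original_emb, r) for r in rewrite_embs) / len(rewrite_embs)

def classify(original_emb, rewrite_embs, threshold=0.9):
    # Small deviation from the rewrites -> likely AI-generated.
    # The threshold is a hypothetical value for illustration only.
    score = detection_score(original_emb, rewrite_embs)
    return "ai-generated" if score >= threshold else "human-written"
```

In practice the embeddings would come from the fine-tuned embedding model, and the threshold would be tuned on labeled examples.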

Features

Implementation of novel research techniques

The paper we implemented was published earlier this year, and to our knowledge no existing product implements these methods.

Fine-tuning of embedding models to work better with code

We fine-tuned GraphCodeBERT to create more useful embeddings of code snippets.
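This README does not describe the training objective, so as a hypothetical illustration, one common choice for fine-tuning an embedding model is a cosine-embedding (contrastive) loss: pull embeddings of functionally equivalent snippets together and push unrelated ones apart. A minimal sketch of that objective:

```python
def cosine_embedding_loss(cos_sim, is_similar_pair, margin=0.2):
    # Hypothetical contrastive objective, not necessarily the one CodeWatch used.
    # Similar pairs are penalized for any gap below perfect similarity;
    # dissimilar pairs incur zero loss once their similarity drops below `margin`.
    if is_similar_pair:
        return 1.0 - cos_sim
    return max(0.0, cos_sim - margin)
```

During fine-tuning, this loss would be computed over batches of code-snippet pairs and backpropagated through the embedding model.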

Cosine Similarity

We compared vectors in the embedding space with cosine similarity to gauge how similar different code snippets are to one another. We then used this measure to classify code as either synthetic or human-written.
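Cosine similarity measures the angle between two vectors, ignoring their magnitudes, which makes it a natural way to compare embeddings. A small self-contained example (the vectors are toy values, not real code embeddings):

```python
import math

def cosine_similarity(u, v):
    # Dot product of the vectors divided by the product of their norms.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Vectors pointing the same direction score close to 1.0;
# orthogonal vectors score close to 0.0.
same = cosine_similarity([1.0, 2.0], [2.0, 4.0])
different = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```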

Tech Stack

  • Frontend: React.js, TypeScript
  • Backend: Python, Flask
  • AI/ML: PyTorch, Transformers, CodeT5+

Team Members

  • Zhen Tao Pan
  • Ashraful Mahin
  • Ataur Muhith
  • Nakib Abedin

Acknowledgments

  • OpenAI GPT-4 for code analysis
  • CodeT5+ for code embeddings
  • React and TypeScript for frontend framework
  • Flask for backend API
  • Tailwind CSS for styling
