About the project

Software engineering teams often find themselves repeating the exact same security mistakes. Despite having extensive historical records of resolved vulnerabilities, this institutional knowledge is usually trapped in closed, forgotten Pull Requests. Standard regex-based security scanners lack contextual understanding and often miss logic flaws. The Agentic Code Reviewer solves this by acting as an autonomous DevSecOps Auditor that actively guards your codebase using historical context, powered end-to-end by Elasticsearch's semantic search and AI orchestration.

In this prototype, once a review is triggered, the Elastic-powered agent autonomously fetches the raw file changes using the GitHub Model Context Protocol (MCP). It translates the code diff into dense vectors and leverages an Elasticsearch vector database to perform a $k$-Nearest Neighbors (kNN) semantic search, surfacing anti-patterns that caused vulnerabilities in the past even when the surface syntax differs. If a match is found, the Elastic agent automatically posts a warning comment directly on GitHub, citing the exact historical PR and providing a secure code fix.

Inspiration

Developers spend hours doing manual code reviews, only to miss logic flaws that the team has already solved months or years prior. Traditional SAST tools rely on brittle syntax matching, and standalone LLMs lack company-specific context. I realized that by leveraging Elasticsearch as an intelligent, long-term memory store, I could turn dead PR history into an active DevSecOps guard. I wanted to build an agent that didn't just understand code, but understood my code's history, using Elastic's blazing-fast vector search to spot repeating patterns.

What it does

Agentic Code Reviewer lives in your repository but thinks in the Elastic Cloud. When a PR is created, the agent pulls the code via the GitHub MCP and queries an Elasticsearch index for similar historical vulnerabilities. Instead of relying on rigid keyword matches, it understands the fundamental logic of the vulnerability by calculating the cosine similarity between the incoming code vector $A$ and the historical PR vector $B$:

$$\text{similarity}(A, B) = \frac{A \cdot B}{|A| |B|}$$
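As a minimal illustration of that score (using NumPy with tiny toy vectors, not the real 384-dimensional embeddings or Elasticsearch itself):

```python
import numpy as np

def cosine_similarity(a, b):
    """similarity(A, B) = (A · B) / (|A| |B|)"""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Vectors pointing the same way score 1.0; orthogonal components pull
# the score toward 0, regardless of keyword overlap in the source text.
a = [1.0, 0.0, 1.0]
b = [1.0, 1.0, 0.0]
print(round(cosine_similarity(a, a), 3))  # 1.0
print(round(cosine_similarity(a, b), 3))  # 0.5
```

Elasticsearch computes the same score internally when the `dense_vector` field is configured with cosine similarity, so the agent never has to do this math client-side.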

If it spots a developer making a known mistake (like using f-strings for database queries instead of parameterized queries), the Elastic agent orchestrates the MCP tools to post a detailed GitHub comment blocking the merge, referencing the historical PR where this was fixed before, and offering the secure code snippet.
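The f-string mistake and its parameterized fix look like this in Python's built-in sqlite3 (the table and attacker input here are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

user_input = "nobody' OR '1'='1"

# Vulnerable anti-pattern the agent flags: interpolating input into SQL.
rows = conn.execute(
    f"SELECT role FROM users WHERE name = '{user_input}'"
).fetchall()
print(rows)  # [('admin',)] — the injected OR clause matches every row

# Secure fix the agent suggests: a parameterized query.
rows = conn.execute(
    "SELECT role FROM users WHERE name = ?", (user_input,)
).fetchall()
print(rows)  # [] — the malicious string is treated as a literal name
```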

How I built it

I built the architecture around the Elastic ecosystem, splitting it into three main components: the "Brain," the "Logic Engine," and the "Hands."

  1. The Brain (Elastic Vector Database): I wrote a custom Python ETL pipeline using Pandas and Hugging Face to extract 5,000 real-world PRs from the hao-li/AIDev dataset, encoded the PR titles and bodies into 384-dimensional vectors with all-MiniLM-L6-v2, and bulk-indexed them into an Elastic Cloud deployment under the pr-code-reviews index. The title_vector field is mapped as a dense_vector to optimize for kNN similarity searches.
  2. The Logic Engine (Elastic Agent Builder): This was the core of the orchestration. Inside the Elastic Agent Builder, I created a custom DevSecOps Auditor persona. I built a dedicated index search tool, codebase.search_historical_prs, which allowed the agent to autonomously execute kNN queries against the Elastic index to find semantic matches.
  3. The Hands (MCP Action Pipeline): To give the Elastic Agent access to the live repository, I deployed a local GitHub MCP server orchestrated via Supergateway and exposed it through a Pinggy Streamable HTTP tunnel. I then bound the MCP tools (get_pull_request, get_file_contents, add_issue_comment) directly into the Elastic Agent UI, completing the loop.
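The indexing step can be sketched roughly as below. The endpoint, API key, and sample PR are placeholders rather than the real deployment; the mapping mirrors the 384-dimensional output of all-MiniLM-L6-v2.

```python
INDEX = "pr-code-reviews"

# dense_vector mapping sized for all-MiniLM-L6-v2's 384-dimensional output
MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "body": {"type": "text"},
            "title_vector": {
                "type": "dense_vector",
                "dims": 384,
                "index": True,
                "similarity": "cosine",
            },
        }
    }
}

def build_actions(prs, vectors, index=INDEX):
    """Pair each PR record with its embedding as a bulk-index action."""
    for pr, vec in zip(prs, vectors):
        yield {"_index": index, "_source": {**pr, "title_vector": list(vec)}}

if __name__ == "__main__":
    from elasticsearch import Elasticsearch, helpers
    from sentence_transformers import SentenceTransformer

    # Placeholder Elastic Cloud endpoint and credentials.
    es = Elasticsearch("https://my-deployment.es.cloud:443", api_key="...")
    es.indices.create(index=INDEX, body=MAPPING)

    model = SentenceTransformer("all-MiniLM-L6-v2")
    prs = [{"title": "Fix SQL injection in login handler", "body": "..."}]
    vectors = model.encode([p["title"] for p in prs])
    helpers.bulk(es, build_actions(prs, vectors))
```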

Challenges I ran into

The toughest challenge was the "silent failure" of autonomous tool execution across the network bridge. The local MCP server booted as an anonymous user, stripping the agent of its write permissions. I had to debug the Node.js execution environment and manually pass repository-scoped, fine-grained Personal Access Tokens into the terminal environment variables to give the agent its voice back. Additionally, ensuring the Elastic Agent consistently formatted the exact JSON payload required by the add_issue_comment MCP tool required highly specific prompt engineering within the Elastic Agent Builder instructions.

Accomplishments that I am proud of

I'm most proud of getting an Elastic AI Agent to orchestrate multiple disjointed systems completely autonomously. Watching the Elastic Agent read a live GitHub URL via a secure tunnel, semantically recognize that the code was vulnerable to SQL injection, query an Elasticsearch database of thousands of past PRs, find the exact matching pattern (PR #76), and write a secure fix directly to GitHub was an incredible "Aha!" moment. It perfectly validated the Elastic-to-GitHub architecture.

What I have learned

I learned the sheer power of the Elastic ecosystem when paired with the Model Context Protocol (MCP). Elastic Data Management made it incredibly easy to verify my vector embeddings, and the Elastic Agent Builder abstracted away the complex orchestration logic usually required for multi-tool LLM workflows. I also learned that vector search fundamentally changes DevSecOps: semantic matching catches logic flaws that brittle, syntax-based scanners miss.

What's next for Agentic Code Reviewer

Currently, the agent is triggered manually via the Elastic chat interface. The next iteration will implement an event-driven architecture: a lightweight Python webhook listener will let GitHub pull_request.opened events automatically trigger the Elastic Agent API, resulting in a fully automated, zero-touch DevSecOps pipeline. After that, I plan to implement an "Auto-Fixer" loop: instead of just leaving a warning comment, the Elastic Agent will use the push_files tool to commit the secure code alternative directly to the developer's branch. I also plan to expand the Elasticsearch cluster to ingest ticket data (Jira/Linear) and Slack threads, giving the agent a 360-degree view of the company's historical engineering context.
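The planned webhook listener could be sketched with the standard library alone. The Elastic Agent API URL and request payload shape below are assumptions (they depend on the deployment), and real GitHub webhooks should also have their signatures verified:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

# Placeholder: the real Elastic Agent API endpoint depends on the deployment.
AGENT_API_URL = "https://my-deployment.kb.cloud/api/agent/converse"

def extract_pr_event(headers, payload):
    """Return the PR URL for a pull_request.opened event, else None."""
    if headers.get("X-GitHub-Event") != "pull_request":
        return None
    if payload.get("action") != "opened":
        return None
    return payload["pull_request"]["html_url"]

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        pr_url = extract_pr_event(self.headers, json.loads(body))
        if pr_url:
            # Ask the agent to review the new PR (payload shape is assumed).
            req = Request(
                AGENT_API_URL,
                data=json.dumps({"input": f"Review {pr_url}"}).encode(),
                headers={"Content-Type": "application/json"},
            )
            urlopen(req)
        self.send_response(204)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), WebhookHandler).serve_forever()
```

Filtering on both the `X-GitHub-Event` header and the `action` field ensures other repository events (pushes, PR closes, comments) never wake the agent.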

Built With

  • agentic-ai
  • client
  • devsecops
  • elastic-agent-builder
  • elastic-cloud
  • elasticsearch
  • github-api
  • huggingface
  • index
  • knn-search
  • mcp
  • model-context-protocol
  • node.js
  • pandas
  • parquet
  • pinggy
  • python
  • rag
  • search
  • sentence-transformers
  • server
  • supergateway
  • vector-database