Inspiration

One of our developers recently discovered – 4 months into an 8-month thesis course – that the codebase they were building upon had become obsolete years earlier. The documentation was outdated, files were missing, dependencies were unclear, and 4 months were wasted trying to put the pieces together.

That experience made one thing clear: developers are often misled and forced to trust repositories without truly understanding them. GitConnect was built so that no developer has to learn that lesson the hard way again.

What it does

GitConnect performs a full-repository analysis to build structured representations of how code components interact. By mapping dependencies, evaluating repository viability, and answering user questions, GitConnect acts as a mini project expert, helping developers understand strengths, weaknesses, and risks before building.

How we built it

GitConnect uses a pipeline-based architecture to analyze repositories end-to-end. We begin by ingesting the entire repository, leveraging Moorcheh’s Python SDK to build a retrieval-augmented generation (RAG) system, enabling an AI agent to answer user questions using grounded, repository-specific context.
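The core of that ingestion step can be sketched as follows. This is a simplified, hypothetical outline rather than the actual Moorcheh SDK calls: source files are split into chunks small enough to embed, and at question time the retrieved chunks become the only context the agent may draw on.

```python
def chunk_source(text: str, max_lines: int = 40) -> list[str]:
    """Split a source file into fixed-size line chunks so each one
    fits comfortably within an embedding model's context window."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]


def build_grounded_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Assemble the prompt sent to the agent: the retrieved repository
    chunks are the only context the model is allowed to draw on."""
    context = "\n---\n".join(retrieved_chunks)
    return (
        "Answer using ONLY the repository context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

In the real pipeline, the chunks are embedded and stored through the vector store's SDK, and retrieval ranks them by similarity to the question before the prompt is assembled.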

During ingestion, relationships between code components are extracted and stored as a graph in Neo4j, allowing us to model the codebase as an interconnected system and making its architecture easier to explore, understand, and analyze.
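A minimal sketch of that extraction, simplified from what the full pipeline does: Python's `ast` module pulls import edges out of a module, and each edge is rendered as an idempotent Cypher `MERGE` statement ready to run against Neo4j. The `Module` label and `DEPENDS_ON` relationship names here are illustrative.

```python
import ast


def extract_imports(module_name: str, source: str) -> list[tuple[str, str]]:
    """Return (importer, imported) edges found in one Python module."""
    edges = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            edges.extend((module_name, alias.name) for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            edges.append((module_name, node.module))
    return edges


def to_cypher(edges: list[tuple[str, str]]) -> list[str]:
    """Render each dependency edge as an idempotent MERGE statement,
    so re-ingesting a repository never duplicates nodes or edges."""
    return [
        f"MERGE (a:Module {{name: '{src}'}}) "
        f"MERGE (b:Module {{name: '{dst}'}}) "
        f"MERGE (a)-[:DEPENDS_ON]->(b)"
        for src, dst in edges
    ]
```

Using `MERGE` rather than `CREATE` keeps ingestion re-runnable: the graph converges to the same state no matter how many times a repository is processed.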

In parallel, Google’s Gemini API performs a secondary analysis of the repository, evaluating documentation quality, platform relevance, and project health. This external analysis contributes to a viability assessment that helps developers determine whether a project is worth building upon.

Challenges we ran into

Parsing and modeling large repositories introduced significant performance and complexity challenges. In particular, the ingestion pipeline relies on heavy vector-embedding work to extract relationships between files, and that work became a serious bottleneck as we tried to avoid timeout errors while preserving a responsive user experience.
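The pattern we converged on is the usual mitigation for this kind of bottleneck: embed in small batches and retry failed batches with exponential backoff. This is a simplified sketch; `embed_batch` stands in for the actual embedding call.

```python
import time


def batched(items: list, size: int):
    """Yield successive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def embed_all(chunks: list[str], embed_batch, batch_size: int = 32,
              retries: int = 3) -> list:
    """Embed chunks in small batches, retrying each batch with
    exponential backoff if the embedding call times out."""
    vectors = []
    for batch in batched(chunks, batch_size):
        for attempt in range(retries):
            try:
                vectors.extend(embed_batch(batch))
                break
            except TimeoutError:
                if attempt == retries - 1:
                    raise  # give up after the final attempt
                time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
    return vectors
```

Smaller batches keep each request under the provider's timeout, at the cost of more round trips, so the batch size is a tuning knob rather than a fixed constant.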

Furthermore, deciding which nodes and dependencies to show for a large codebase proved challenging: we had to surface the most relevant components and their dependencies without overwhelming the user by displaying too much information.
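One simple heuristic for that balance, sketched here as an assumption rather than our exact ranking logic, is to keep only the most-connected nodes and drop edges that touch anything else:

```python
from collections import Counter


def top_nodes(edges: list[tuple[str, str]], limit: int = 25) -> set[str]:
    """Rank nodes by degree (number of incident edges) and keep the top `limit`."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    return {n for n, _ in degree.most_common(limit)}


def prune_graph(edges: list[tuple[str, str]], limit: int = 25) -> list[tuple[str, str]]:
    """Return only the edges whose endpoints both survive the cut."""
    keep = top_nodes(edges, limit)
    return [(s, d) for s, d in edges if s in keep and d in keep]
```

Degree is a crude proxy for importance; weighting by edge type or centrality would refine it, but even this cut keeps a large graph readable.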

Finally, while a variety of APIs let us ingest, process, and extract a great deal of information from immense repositories, the real challenge lay in properly joining those API outputs with our frontend components. Latency, differing API conventions, and data handling all became hurdles to overcome when putting the pieces of our product together.
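The latency side of that problem has a standard shape: independent backend calls can be fanned out concurrently, each with its own deadline, so one slow API cannot stall the whole page. A minimal sketch of that pattern with `asyncio` (the function names here are illustrative):

```python
import asyncio


async def fetch_with_timeout(coro, timeout: float = 10.0, default=None):
    """Run one backend call with a deadline; return `default` instead of
    hanging the UI if the call is too slow."""
    try:
        return await asyncio.wait_for(coro, timeout)
    except asyncio.TimeoutError:
        return default


async def gather_analyses(calls, timeout: float = 10.0) -> list:
    """Fan out independent backend calls concurrently and collect
    whatever finished within the deadline, in order."""
    return await asyncio.gather(*(fetch_with_timeout(c, timeout) for c in calls))
```

The frontend can then render whichever analyses arrived and show placeholders for the ones that timed out, instead of blocking on the slowest service.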

Accomplishments that we're proud of

Despite the scale and complexity of large repositories and external APIs, we successfully built a system capable of ingesting and analyzing entire codebases end-to-end, delivering a unique and practical tool for developers.

We’re particularly proud of our custom RAG pipeline and integration of Moorcheh, Neo4j, and Google’s Gemini APIs to create a powerful agentic AI system. This architecture enables grounded question answering, interactive dependency graph visualization, and repository-specific summaries driven directly by a repo’s code structure.

Finally, we delivered a working viability assessment pipeline that goes beyond surface-level documentation (or the lack thereof), combining standardized criteria with external research to form a truly informed judgment for developers.

What we learned

During this project, we learned how to combine and implement a variety of APIs and concepts to make a cohesive and scalable product. More importantly, we learned that truly understanding a codebase requires more than skimming files or documentation: repositories must be analyzed as complete systems, where every component plays an important role.

We also learned about the power and importance of proper grounding and retrieval in AI systems, especially when dealing with smaller-scale or more niche data. A RAG pipeline is a powerful tool that lets developers restrict an AI agent to a specific, vetted body of information, greatly reducing hallucinations and increasing the confidence we can place in AI-generated outputs.

Finally, we learned that a project’s viability goes well beyond good documentation or well-written code. A wide variety of factors goes into making a good system: documentation, platform relevance, and code quality all matter when analyzing a repository, and both our concept and our experience building the product taught us that.

What's next for GitConnect

Going forward, we want to further improve the scalability and performance of the platform. Supporting larger repositories more efficiently will make GitConnect a more robust, industry-level tool with the potential for a much broader user base.

Furthermore, we wish to expand GitConnect’s analysis capabilities by introducing a more sophisticated viability scoring system. Specifically, we want to implement deeper evaluation metrics such as maintenance activity, commit history analysis, and code quality.

Finally, we plan to deploy GitConnect as a hosted application. While time constraints and the scope of the project limited our ability to launch a live website during the hackathon, hosting the platform is definitely our next step so developers around the world can access and benefit from the tool.
