GitChat

Arun Krishna Vajjala, Ajay Krishna Vajjala, Deval Parikh

Features

Lets users link a GitHub repo and navigate it through the CLI chat interface
Stores previously used repos for easy access and retrieval
Uses Deep Lake vector storage to segment code and documentation, avoiding major token-limit issues
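
The segmentation idea in the last feature can be illustrated with a minimal sketch. This is not GitChat's actual code: the function name split_into_chunks and its parameters are hypothetical, and character counts stand in for token counts as a rough proxy. It only shows why splitting a large file into overlapping pieces keeps each piece within a model's input limit.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into pieces of at most chunk_size characters.

    Consecutive pieces share `overlap` characters so that code spanning a
    boundary still appears whole in at least one neighbouring piece.
    (Hypothetical helper; sizes are in characters, not model tokens.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current piece reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "x = 1\n" * 500  # stand-in for the contents of a large source file
    pieces = split_into_chunks(sample, chunk_size=1000, overlap=100)
    print(len(pieces), "chunks, longest is", max(len(p) for p in pieces), "characters")
```

Each piece can then be embedded and stored on its own, so a question only needs the few pieces relevant to it rather than the whole repository.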

Using GitChat

Provide your OpenAI and VectorStore keys in the gitchat.py file
Run python3 driver.py to start the CLI interface

Problem and Motivation

Many companies maintain large software codebases that are challenging and time-consuming to navigate for engineers, non-technical team members, and leadership. Developers typically need training and time to familiarize themselves with a codebase before they can make meaningful contributions, which consumes a significant amount of a company’s time and resources.

To tackle these challenges, we introduce GitChat, an interactive AI developer tool designed to comprehend codebases and let developers ask questions about them in plain English. This gives developers a way to interact directly with large codebases without tediously navigating their documentation and technical components. The tool has the potential to significantly reduce the time it takes developers to familiarize themselves with source code, and it can enhance the productivity of developers at any level by enabling them to "chat" directly with the codebase. GitChat reduces the need for laborious exploration to comprehend complex logic and architecture. Because developers no longer need to spend as much time manually searching the code, their output and performance can potentially improve, and they can focus on more creative problem solving and innovation. Companies can benefit from this heightened productivity to achieve their goals.

Use Case

Case 1: Team Members of Any Technical Background. In multidisciplinary teams, some members may not understand the codebase but still need to understand the overall functionality of the product's features. GitChat helps those with an advanced technical background and also provides natural-language explanations of features in the codebase for those with less technical expertise. This improves team productivity by enabling smooth knowledge transfer between members with varied technical experience.
Case 2: Onboarding Process. New members of a company or team must familiarize themselves with the codebase before they can make meaningful contributions, a process that can take weeks to months. With GitChat, new hires can simply “chat with the codebase” to understand it. Queries can be as simple as asking what a function does or as complex as asking how large systems work.
Case 3: Experienced Developers. Experienced developers are well versed in the codebase and provide creative solutions to problems that arise. These problems often require searching the documentation before the necessary changes can be made. GitChat lets these highly skilled developers query the codebase directly for the design details and architecture information they need to implement a solution.

Technical Overview

Built with: Python, OpenAI GPT API, LangChain, GitHub Python Package

Index the Codebase: Clone the target repository, load all of its files, split the files into chunks, and start the indexing procedure. Alternatively, you can skip this step and use a pre-indexed dataset.
Store Embeddings and the Code: Code segments are embedded using a code-aware embedding model and saved in the Deep Lake VectorStore. This is done via LangChain.
Assemble the Retriever: The Conversational Retrieval Chain searches the VectorStore for the code segments most relevant to a given query. It uses context-aware filtering and ranking to determine which code snippets and information are most relevant. This is also done via LangChain.
Build the Conversational Chain: Customize retriever settings and define any custom filters as necessary.
Pose Questions: Create a list of questions about the codebase, then use the Conversational Retrieval Chain to produce context-sensitive responses. The LLM (GPT-4, in this case) generates detailed, context-aware answers based on the retrieved code segments and the conversation history.
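
The retrieval flow in the steps above can be sketched with a small, dependency-free toy. It is an illustration under stated assumptions, not GitChat's implementation: word counts stand in for the code-aware embedding model, the in-memory ToyVectorStore stands in for the Deep Lake VectorStore, and no LLM is called. All class, function, and snippet names here are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lower-cased word counts (stand-in for a learned model)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector store: keeps (chunk, vector) pairs."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, chunks: list[str]) -> None:
        for chunk in chunks:
            self._items.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 2) -> list[str]:
        """Return the k stored chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, retrieved: list[str], history: list[tuple[str, str]]) -> str:
    """Combine past turns, retrieved code, and the new question into one prompt."""
    past = "\n".join(f"Q: {q}\nA: {a}" for q, a in history)
    context = "\n---\n".join(retrieved)
    return f"Conversation so far:\n{past}\n\nRelevant code:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    store = ToyVectorStore()
    store.add([
        "def load_repo(url): clone the repository and read every file",
        "def save_history(path, repos): write previously used repos to history.json",
        "def ask(question): send the question and retrieved code to the model",
    ])
    question = "Where is the history of repos saved?"
    hits = store.search(question, k=1)
    print(build_prompt(question, hits, history=[("What does this tool do?", "It answers questions about a repo.")]))
```

In a real pipeline, the prompt produced by build_prompt would be sent to the LLM, and each question and answer pair would be appended to the history so that follow-up questions keep their context.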
