citewise

Inspiration

Legal research is tedious and time-consuming, with lawyers spending hours finding relevant cases. For instance, if a client was punched, lawyers must manually try different keywords like “kick” or “slap” to find matches. Analyzing cases is also challenging. Even with case summaries, lawyers must still verify their accuracy by cross-referencing with the original text, sometimes hundreds of pages long. This is inefficient, given the sheer length of cases and the need to constantly toggle between tabs to find the relevant paragraphs.

What it does

Our tool transforms legal research by offering AI-powered legal case search. Lawyers can input meeting notes or queries in natural language, and our AI scans open-source databases to identify the most relevant cases, ranked by similarity score. Once the best matches are identified, users can quickly review our AI-generated case summaries and full case texts side by side in a split-screen view, minimizing context-switching and enhancing research efficiency.

How we built it

Our tool was developed to create a seamless user experience for lawyers. The backend process began with transforming legal text into embeddings using the all-MiniLM-L6-v2 model for efficient memory usage. We sourced data from the CourtListener, which is backed by the Free Law Project non-profit, and stored the embeddings in LanceDB, allowing us to retrieve relevant cases quickly.

To facilitate the search process, we integrated the CourtListener API, which enables keyword searches of court cases. A FastAPI backend server was established to connect LanceDB and CourtListener for effective data retrieval.

Challenges we ran into

The primary challenge was bridging the gap between legal expertise and software development. Analyzing legal texts proved difficult due to their complex and nuanced language. Legal terminology can vary significantly across cases, necessitating a deep understanding of context for accurate interpretation. This complexity made it challenging to develop an AI system that could generate meaningful similarity scores while grasping the subtleties inherent in legal documents.

Accomplishments that we're proud of

Even though we started out as strangers and were busy building our product, we took the time to get to know each other personally and have fun during the hackathon too! This was the first hackathon for two of our teammates, but they quickly adapted and contributed meaningfully. Most importantly, we supported each other throughout the process, making the experience both rewarding and memorable.

What we learned

Throughout the process, we emphasized constant communication within the team. The law student offered insights into complex research workflows, while the developers shared their expertise on technical feasibility and implementation. Together, we balanced usability with scope within the limited time available, and all team members worked hard to train the AI to generate meaningful similarity scores, which was particularly demanding.

One teammate delved deep into embeddings, learning about their applications in similarity search, chunking, prompt engineering, and adjacent concepts like named entity recognition, hybrid search, and retrieval-augmented generation (RAG) — all within the span of the hackathon.

Additionally, two of our members had no front-end development experience and minimal familiarity with design tools like Figma. By leveraging resources like assistant-ui, we quickly learned the necessary skills.

What's next for citewise

We aim to provide support for the complete workflow of legal research. This includes enabling lawyers to download relevant cases easily and facilitating collaboration by allowing the sharing of client files with colleagues.

Additionally, we plan to integrate with paid legal databases, making our product platform agnostic. This will enable lawyers to search across multiple databases simultaneously, streamlining their research process and eliminating the need to access each database individually.

Built With

cloudflare
courtlistener
fastapi
gpt-4o-mini
huggingface
javascript
lancedb
python
react
text-embedding-3-small

Submitted to

HackHarvard 2024
- Winner Best AI Project with Databricks Open Source

Created by

I worked on the backend. I used Python and FastAPI to ingest CourtListener API data into LanceDB vector store to serve similarity search queries. I also used OpenAI SDK to generate titles & summaries of legal documents from ChatGPT 4o.

Thinh Nguyen
Grace Kong
Jane Shin
Tan Dexter

Updates

Thinh Nguyen started this project — Oct 13, 2024 05:58 AM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.