Inspiration
We have all worked as interns at various government agencies, and in those roles we each had to judge and verify the security clearance for every section of every document by hand. As a result, we decided to try to automate this process using generative AI.
What it does
Classify.ai is an AI-driven solution for government document classification. Input text from a government document, and it classifies the security clearance for you using an extensive vector database that we built. This speeds up your workflow from 30 minutes per subsection to just a few seconds.
How we built it
We first gathered test documents and fed them into an upload script that converted each section into a vector embedding (using LangChain and OpenAI's API). We then upserted these embeddings into a Pinecone vector database so they could be retrieved to help with our security classification. Finally, we used TypeScript and Next.js to fetch the stored vectors most similar to the one we wanted to search for, and displayed the results in a full-stack web application.
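The embed, upsert, and query flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not our production code: the bag-of-words embed function stands in for OpenAI embeddings, the InMemoryIndex class stands in for Pinecone, and the labels, sample sections, and similarity-weighted vote in classify are all hypothetical choices made for the example.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple

Vector = Dict[str, float]

def embed(text: str) -> Vector:
    # Toy stand-in for an embedding model: sparse bag-of-words counts.
    return dict(Counter(text.lower().split()))

def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity between two sparse vectors.
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class InMemoryIndex:
    # Hypothetical stand-in for a vector database.
    def __init__(self) -> None:
        self._records: Dict[str, Tuple[Vector, dict]] = {}

    def upsert(self, record_id: str, vector: Vector, metadata: dict) -> None:
        # Insert a new record, or overwrite the record with the same id.
        self._records[record_id] = (vector, metadata)

    def query(self, vector: Vector, top_k: int = 3) -> List[Tuple[float, str, dict]]:
        # Return the top_k records, most similar first.
        scored = [(cosine(vector, vec), rid, meta)
                  for rid, (vec, meta) in self._records.items()]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]

    def __len__(self) -> int:
        return len(self._records)

def classify(index: InMemoryIndex, text: str, top_k: int = 3) -> Optional[str]:
    # Similarity-weighted vote over the labels of the nearest sections.
    votes: Counter = Counter()
    for score, _rid, meta in index.query(embed(text), top_k):
        votes[meta["label"]] += score
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical labeled sections, invented for this example.
SECTIONS = [
    ("sec-1", "cafeteria menu and public holiday schedule", "UNCLASSIFIED"),
    ("sec-2", "troop deployment schedule and satellite imagery", "SECRET"),
    ("sec-3", "satellite imagery analysis of deployment sites", "SECRET"),
    ("sec-4", "public press release about holiday events", "UNCLASSIFIED"),
]

index = InMemoryIndex()
for section_id, section_text, label in SECTIONS:
    index.upsert(section_id, embed(section_text), {"label": label})

print(classify(index, "new satellite imagery of troop deployment"))
```

In a real deployment the embed call would hit an embedding API and the index would be a hosted vector database. The retrieve-then-vote shape stays the same, and swapping the vote for a rule such as "take the most restrictive label among the neighbors" is a one-line change.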
Challenges we ran into
We faced challenges integrating new technologies across languages. For instance, we used Python to generate the vector embeddings and upload them to the Pinecone database, but working with them from TypeScript was harder: the TypeScript documentation was not clearly laid out or described, which made it difficult to adopt a new technology. As a result, we went back to LangChain to retrieve our vector embeddings.
Accomplishments that we're proud of
We are proud of the interface for interacting with the application, which creates a seamless integration between the React.js front end and the machine learning/AI back end. Classifying the security clearance of documents was also very rewarding.
What we learned
We learned about the government technology space and the opportunities and restrictions surrounding it. We also learned technologies we had never used before, like LlamaIndex and Pinecone.
What's next for Classify.AI
We want to continue adding to our Pinecone vector database to increase the accuracy of our classifications, and potentially add more encryption and authentication to better protect government documents. We would also like to add more UI components and elevate the design of both the front-end and back-end portions. Finally, we want to look into government incubator programs to develop our MVP, and potentially pursue a government contract and expand our startup.
Built With
- html
- javascript
- langchain
- pinecone
- python
- tailwind
- typescript
