Inspiration
In a world where time is money, the traditional process of expense reporting is a glaring inefficiency in corporate workflows. Inspired by the staggering statistics from the Global Business Travel Association—highlighting that the average expense report takes 20 minutes to complete and costs companies $58, coupled with a high error rate—our team set out to revolutionize this outdated system. Our goal was to harness the power of artificial intelligence to create a solution that not only saves time and money but also brings a new level of simplicity and accuracy to expense management. Thus, ExpenseAI was born.
What We Learned
For half of our team, this was our first hackathon, so it was a major learning experience for us. We learned a lot about close collaboration across our team to ensure successful implementation. We also learned the value of simplicity and efficiency. We made sure that our project reinforces the value of a simple, intuitive user interface that minimizes the effort required from the user to submit expense reports. We learned about the importance of stress testing our program to identify pain points and areas for enhancement, particularly in understanding user expectations for automation and interaction with the system. Lastly, we learned about optimizing for speed using techniques like preprocessing.
How We Built It
- ExpenseAI was built on an array of cutting-edge AI technologies. We used LangChain, an open-source Python framework, to create the AI agent.
- We used GPT-4 Vision to convert images into high-quality text descriptions, and OpenAI’s Whisper to reliably convert audio to text.
- We then fed these text inputs into GPT-4 to generate the expense reports. A major issue we encountered was that, despite adding a system message and spending hours on prompt engineering, GPT-4 could not consistently output an expense report containing most of the details from the prompt along with the user’s contact info. To fix this, we combined the little real-world expense report data we could find with our own synthetic dataset, indexed the result in Elasticsearch’s vector database using OpenAI’s embeddings, and applied Retrieval-Augmented Generation with GPT-4. We host our LangChain agent with FastAPI so the frontend can call it, and we made optimizations for speed such as preprocessing. This produced much higher-quality, more consistent output.
- We used Streamlit for the frontend, largely due to its beautiful design and ease of use.
- We used Firebase’s Firestore as the backend to store user data and expense reports, allowing easy communication between the client and manager sides of the application.
- We then wrote our own autoencoder-based anomaly detection model, trained on our vector database of real and synthetic expense report data. The model accepts or rejects each report that appears in the manager’s dashboard, and it updates the database so the change is reflected on the client side.
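The accept/reject step above can be sketched in miniature. This is a toy illustration, not our production model: a linear autoencoder trained with plain-NumPy gradient descent on hypothetical “expense” feature vectors standing in for the report embeddings, where a report is rejected when its reconstruction error exceeds a threshold derived from the training data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for report embeddings: "normal" reports lie
# near a 1-D subspace of feature space, plus a little noise.
a = rng.uniform(1.0, 2.0, size=(200, 1))
X = a * np.array([[1.0, 2.0, 3.0]]) + rng.normal(0, 0.05, size=(200, 3))

# Linear autoencoder: 3 features -> 1-D bottleneck -> 3 features.
W_enc = rng.normal(0, 0.1, size=(3, 1))
W_dec = rng.normal(0, 0.1, size=(1, 3))

lr = 0.01
for _ in range(3000):
    h = X @ W_enc                  # encode
    X_hat = h @ W_dec              # decode
    d = 2 * (X_hat - X) / len(X)   # gradient of mean squared error
    g_dec = h.T @ d
    g_enc = X.T @ (d @ W_dec.T)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

def recon_error(x):
    """Per-sample mean squared reconstruction error."""
    x = np.atleast_2d(x)
    return np.mean((x @ W_enc @ W_dec - x) ** 2, axis=1)

# Accept/reject threshold taken from the training distribution.
train_err = recon_error(X)
threshold = train_err.mean() + 4 * train_err.std()

normal_report = np.array([1.5, 3.0, 4.5])  # lies on the normal subspace
odd_report = np.array([2.0, 1.0, 5.0])     # lies off the subspace

accepted = bool(recon_error(normal_report)[0] < threshold)
rejected = bool(recon_error(odd_report)[0] > threshold)
print(accepted, rejected)  # True True
```

Our actual model operates on the embedding vectors stored in Elasticsearch, but the decision rule is the same: high reconstruction error flags a report for rejection in the manager’s dashboard.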
Challenges We Faced
We encountered many challenges while building this project. Here are some of them.
- Streamlit’s execution model reruns the entire script from top to bottom whenever anything on screen updates. This had major downsides, such as the lack of a cookie manager, so we needed many workarounds to get the behavior we wanted.
- We had a major issue where our API was rerunning all of our preprocessing each time it was called, leading to slow response times. We eventually fixed this using FastAPI’s methods so the preprocessing runs only once.
- Prompt engineering alone proved unreliable, as mentioned above, so we eventually adopted RAG to improve outputs.
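The preprocessing fix is a run-once-and-cache pattern. Here is a minimal framework-agnostic sketch; the function name and workload are hypothetical, and in a FastAPI app the same effect comes from doing this work once at startup (e.g., in a lifespan hook) instead of inside the request handler.

```python
from functools import lru_cache

CALLS = {"count": 0}  # tracks how often the expensive step really runs

@lru_cache(maxsize=1)
def load_preprocessed_index():
    """Hypothetical stand-in for expensive preprocessing
    (embedding documents, building a vector index, ...)."""
    CALLS["count"] += 1
    return {"docs": ["receipt-1", "receipt-2"]}

def handle_request(query: str) -> bool:
    # Each "request" reuses the cached result instead of rebuilding it.
    index = load_preprocessed_index()
    return query in index["docs"]

# Three requests trigger exactly one preprocessing run.
for q in ["receipt-1", "receipt-2", "receipt-3"]:
    handle_request(q)
print(CALLS["count"])  # 1
```

The same idea applies to Streamlit’s rerun-everything model: hoisting expensive setup behind a cache keeps each rerun or request fast.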