Inspiration
We have always had a passion for finance and investing, so the financial services theme was an obvious choice for us. We set out to understand the real bottlenecks investors face when doing investment research and realised that reading and understanding long financial documents is time-consuming and arduous.
Documents such as SEC filings are crucial for making informed investment decisions, as they cover a company's financial reports, insider dealings, and more. Flora AI makes understanding these reports ridiculously easy by letting investors talk to an AI assistant.
We also decided to go beyond financial documents and include the latest news for each company, for a more holistic investment research experience.
What it does
Flora AI lets users pick a publicly traded US stock and ask questions about the company's financial documents and SEC filings. The LLM answers by drawing on the company's latest financial documents and provides citations and links to its sources.
Users can also browse the latest financial news articles about the company and get a concise summary of each article, giving them a clear sense of the real-time events surrounding the stock they chose.
How we built it
Our project scrapes the SEC website (using Beautiful Soup) to get the latest filings (Forms 10-K, 10-Q, 8-K, 3, 4, and 144) for the largest publicly traded US companies by market cap.
It then splits each document into chunks and indexes them in a vector database (Pinecone). The LLM (Cohere's Command R+) answers questions about the documents through retrieval-augmented generation (RAG), and we use each retrieved chunk's metadata to attach citations and links to the answer.
The vectors for each company's financial documents are stored in a separate namespace, which ensures that Flora AI never retrieves another company's documents during semantic search.
Next, we built a real-time news pipeline that scrapes the top financial news articles for the requested ticker from Finviz, a financial news aggregator. At the click of a button, the app scrapes the article's text and passes it to our LLM for summarization.
We tested many different prompts and retrieval techniques before arriving at the final solution.
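The write-up scrapes SEC pages with Beautiful Soup. As a minimal sketch of the same "latest filing per form type" step, the code below instead reads the JSON index SEC publishes at `data.sec.gov/submissions/`. That endpoint, its field names (`form`, `filingDate`, `accessionNumber`, `primaryDocument`), and the archive URL pattern are assumptions about SEC's public layout, not Flora AI's actual code, and `fetch_submissions` and `latest_filings` are hypothetical helpers.

```python
import json
import urllib.request
from dataclasses import dataclass

# Form types named in the write-up.
TARGET_FORMS = {"10-K", "10-Q", "8-K", "3", "4", "144"}


@dataclass
class Filing:
    form: str
    filing_date: str  # ISO "YYYY-MM-DD", so string comparison orders by date
    url: str


def fetch_submissions(cik: int, user_agent: str) -> dict:
    """Download a company's filing index (assumed SEC submissions endpoint)."""
    # SEC asks automated clients to identify themselves with a User-Agent.
    url = f"https://data.sec.gov/submissions/CIK{cik:010d}.json"
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def latest_filings(cik: int, submissions: dict, forms=TARGET_FORMS) -> list[Filing]:
    """Keep the most recent filing of each target form and build its document URL."""
    recent = submissions["filings"]["recent"]  # parallel lists, one entry per filing
    latest: dict[str, Filing] = {}
    rows = zip(recent["form"], recent["filingDate"],
               recent["accessionNumber"], recent["primaryDocument"])
    for form, date, accession, doc in rows:
        if form not in forms:
            continue
        if form not in latest or date > latest[form].filing_date:
            folder = accession.replace("-", "")  # archive folders drop the dashes
            url = f"https://www.sec.gov/Archives/edgar/data/{cik}/{folder}/{doc}"
            latest[form] = Filing(form, date, url)
    return list(latest.values())
```

Calling `latest_filings(320193, fetch_submissions(320193, "Your Name you@example.com"))` would return one `Filing` per form type for Apple; the resulting URLs are the pages an HTML parser would then process.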
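To make the chunk, retrieve, and cite loop concrete, here is a small sketch that runs without any external services. It is not Flora AI's pipeline: term-count cosine similarity stands in for Cohere embeddings and Pinecone search, the chunk size and overlap are arbitrary, and `split_into_chunks`, `retrieve`, and `build_prompt` are hypothetical names. What it illustrates is that each chunk carries its source URL as metadata, so the prompt can number the sources and the answer can cite them.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source_url: str  # metadata reused for citations and links
    form: str


def split_into_chunks(text, source_url, form, size=200, overlap=50):
    """Split text into overlapping windows of `size` words, tagging each with its source."""
    assert 0 <= overlap < size
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [Chunk(" ".join(words[start:start + size]), source_url, form)
            for start in range(0, max(len(words) - overlap, 1), step)]


def embed(text):
    # Toy stand-in for an embedding model: lowercase term counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=3):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c.text)), reverse=True)[:k]


def build_prompt(question, hits):
    """Number each retrieved chunk so the model can cite it as [n]."""
    sources = "\n".join(f"[{i}] ({c.form}, {c.source_url}) {c.text}"
                        for i, c in enumerate(hits, start=1))
    return ("Answer using only the numbered sources below and cite them like [1].\n\n"
            f"{sources}\n\nQuestion: {question}")
```

The overlap keeps a sentence that straddles a window boundary intact in at least one chunk. Command R+ can also take retrieved chunks through the `documents` argument of Cohere's chat endpoint and return citations itself; check the SDK version in use for the exact signature.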
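Per-company namespaces mean every query is scoped before any similarity is computed. The sketch below is a hypothetical in-memory `NamespacedIndex` that shows the idea with dot-product scoring; in the Pinecone Python client the equivalent is passing `namespace=` (for example, the ticker) to `index.upsert(...)` and `index.query(...)`.

```python
from collections import defaultdict


class NamespacedIndex:
    """In-memory stand-in for a vector index partitioned by namespace (one per ticker)."""

    def __init__(self):
        # namespace -> list of (id, vector, metadata)
        self._spaces = defaultdict(list)

    def upsert(self, namespace, items):
        self._spaces[namespace].extend(items)

    def query(self, namespace, vector, top_k=3):
        # Only vectors in the requested namespace are scored, so another
        # company's filings can never be returned.
        def score(item):
            return sum(a * b for a, b in zip(vector, item[1]))

        return sorted(self._spaces.get(namespace, []), key=score, reverse=True)[:top_k]
```

Scoping by namespace is cheaper and safer than filtering by a ticker field after retrieval, because irrelevant companies never compete for the top-k slots.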
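A rough sketch of the news step, assuming Finviz quote pages live at `finviz.com/quote.ashx?t=<TICKER>` and list headlines as links inside a table with `id="news-table"`. Both are assumptions about third-party markup that can change. It uses the standard-library `HTMLParser` rather than Beautiful Soup so it runs without dependencies; `fetch_quote_page`, `NewsTableParser`, and `summary_prompt` are hypothetical names, and the prompt wording is illustrative, not the project's.

```python
import urllib.request
from html.parser import HTMLParser


def fetch_quote_page(ticker: str) -> str:
    url = f"https://finviz.com/quote.ashx?t={ticker}"
    # Assumption: the site rejects requests without a browser-like User-Agent.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")


class NewsTableParser(HTMLParser):
    """Collect (headline, url) pairs from links inside <table id="news-table">."""

    def __init__(self):
        super().__init__()
        self.table_depth = 0  # > 0 while inside the news table (handles nested tables)
        self.current_href = None
        self.current_text = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table":
            if self.table_depth or attrs.get("id") == "news-table":
                self.table_depth += 1
        elif tag == "a" and self.table_depth and attrs.get("href"):
            self.current_href = attrs["href"]
            self.current_text = []

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth:
            self.table_depth -= 1
        elif tag == "a" and self.current_href:
            headline = " ".join("".join(self.current_text).split())
            self.items.append((headline, self.current_href))
            self.current_href = None

    def handle_data(self, data):
        if self.current_href:
            self.current_text.append(data)


def summary_prompt(ticker: str, article_text: str, max_words: int = 80) -> str:
    """Illustrative summarization prompt for one scraped article."""
    return (f"Summarize this news article about {ticker} in at most {max_words} words, "
            "keeping only facts stated in the article.\n\n" + article_text)
```

Each extracted URL would then be fetched and its article text passed through a prompt like `summary_prompt` to the LLM when the user clicks the summarize button.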
The website was built and hosted on Streamlit.
Challenges we ran into
The structure of the web pages that display the filings made them extremely difficult to parse. We tried multiple approaches over many days before we could reliably extract all of the text and financial tables, which we then indexed in our vector database.
Another challenge was that the Pinecone starter index is limited to 50 namespaces, which capped the number of companies whose data we could index. We also realised that certain non-US companies, such as LVMH, don't file their forms with the SEC, so we could not scrape any data for them.
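The write-up does not describe the final parsing approach, so this is only one plausible sketch: walk the filing HTML with the standard-library `HTMLParser`, emit ordinary text as paragraphs, and flatten each table row into `cell | cell` so a financial figure stays next to its label after chunking. Empty spacer cells, common in SEC tables, are dropped. `FilingTextParser` is a hypothetical name, and real filings (inline XBRL documents, for example) will need more cases than this handles.

```python
from html.parser import HTMLParser


class FilingTextParser(HTMLParser):
    """Flatten filing HTML into paragraphs, rendering each table row as 'cell | cell'."""

    BLOCK_TAGS = {"p", "div", "br", "h1", "h2", "h3", "h4", "li"}

    def __init__(self):
        super().__init__()
        self.lines = []
        self.buffer = []  # text of the current paragraph
        self.row = None   # cells of the current table row, or None outside rows
        self.cell = None  # text pieces of the current cell, or None outside cells

    def _flush(self):
        text = " ".join("".join(self.buffer).split())
        if text:
            self.lines.append(text)
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._flush()
            self.row = []
        elif tag in ("td", "th") and self.row is not None:
            self.cell = []
        elif tag in self.BLOCK_TAGS and self.cell is None:
            self._flush()

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None:
            text = " ".join("".join(self.cell).split())
            if text:  # skip empty spacer cells
                self.row.append(text)
            self.cell = None
        elif tag == "tr" and self.row is not None:
            if self.row:
                self.lines.append(" | ".join(self.row))
            self.row = None
        elif tag in self.BLOCK_TAGS and self.cell is None:
            self._flush()

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)
        elif self.row is None:
            self.buffer.append(data)

    def close(self):
        super().close()
        self._flush()  # emit any trailing paragraph text
```

Keeping each table row on one line means a chunk boundary is less likely to separate "Net sales" from its number, which matters when the LLM is asked for a specific figure.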
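Given the 50-namespace cap, choosing which companies to index becomes a small selection problem. A hypothetical helper might rank candidates by market cap, skip tickers with no SEC CIK (as with LVMH), and stop at the limit. The `(ticker, market_cap)` input and the `cik_by_ticker` mapping are assumptions; SEC's published ticker-to-CIK mapping is one possible source for the latter.

```python
# Starter-index namespace limit reported in the write-up.
PINECONE_NAMESPACE_LIMIT = 50


def pick_companies(companies, cik_by_ticker, limit=PINECONE_NAMESPACE_LIMIT):
    """Return (chosen, skipped): up to `limit` SEC filers by market cap, plus non-filers seen."""
    ranked = sorted(companies, key=lambda c: c[1], reverse=True)
    chosen, skipped = [], []
    for ticker, _market_cap in ranked:
        if ticker not in cik_by_ticker:
            skipped.append(ticker)  # no EDGAR filings to scrape, e.g. some foreign issuers
        elif len(chosen) < limit:
            chosen.append(ticker)   # one namespace per chosen ticker
    return chosen, skipped
```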
Accomplishments that we're proud of
We are very proud to have built an entire web application, hosted by the end of the hackathon, with most of the functionality we aimed for at the start. We are also proud of how much we accomplished in such a short time, since we joined the hackathon two weeks late because of our universities' final exams.
What we learned
This project taught us a lot about both coding and financial documents. We learned how to scrape a complex web page and parse the text and tabular data within it, and we gained a whole new perspective on building sophisticated RAG pipelines and on the nuances of prompt engineering.
On the financial side, we had a fair idea of what annual and quarterly reports contain, but building this project made us dive into the fine details of these forms so that we could verify whether the LLM was giving accurate answers.
What's next for Flora AI
We have several ideas and a clear roadmap for the features we want to implement next, including:
- Fetch older filings for each company; we currently have only the latest ones.
- Expand our database to include more US companies, as well as non-US companies.
- Embed the financial documents in the app so users can see the source in front of them.
- Add earnings call transcripts to our database.
- Add a sidebar link to a portfolio page showing the latest news, stock charts, and other information for the stocks you choose.
Built With
- beautiful-soup
- cohere
- langchain
- openai
- pinecone
- python
- streamlit