Inspiration
Financial information about companies and their stock valuations is difficult for the average person to find. Much of it is spread across several documents and sites, making it hard to understand a company's performance. We wanted to build a simple automated tool that collects financial information from the US EDGAR database and several other financial sites and gives the user a company valuation based on the Net Asset Valuation method.
What it does
This program collects information from the US EDGAR (Electronic Data Gathering, Analysis, and Retrieval) system, the database that holds the financial filings of every company publicly traded on the US stock market. Outside of EDGAR, it collects information from platforms such as Yahoo Finance. Our program uses one of the standard financial valuation methods, the net asset valuation method: a company's total liabilities are subtracted from its total assets to give the Net Asset Value (NAV). There is no "standardized" way of applying this method, and our group takes a relatively simple approach. Once the NAV is calculated, we divide it by the total shares outstanding to get a per-share price. We also collect the current market price to see whether the company is "overvalued" or "undervalued". The details of the assets, liabilities, and other information are written out to a CSV file.
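The calculation above can be sketched in a few lines of Python. This is a minimal illustration of the method as described, with made-up example numbers rather than real company data:

```python
# Per-share NAV calculation: (assets - liabilities) / shares outstanding.
# The figures in the example below are invented for illustration.

def nav_per_share(total_assets: float, total_liabilities: float,
                  shares_outstanding: float) -> float:
    """Net Asset Value divided by shares outstanding."""
    nav = total_assets - total_liabilities
    return nav / shares_outstanding

def verdict(nav_share: float, current_price: float) -> str:
    """Compare the NAV-implied price to the current market price."""
    return "undervalued" if current_price < nav_share else "overvalued"

# Example: $500M assets, $200M liabilities, 10M shares -> $30 per share.
implied_price = nav_per_share(500e6, 200e6, 10e6)
```

If the market currently prices that stock at, say, $25, the simple rule above would call it undervalued.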
How we built it
We used Python for all of the tasks, writing our own scripts to collect and parse the data from EDGAR and Yahoo Finance. For reformatting, we made use of Pandas to build the tables and to add and remove the information we wanted.
Challenges we ran into
For the data collecting part: Sites such as Yahoo Finance store key financial information under different names depending on the company. This made it much harder to retrieve information accurately and required multiple conditional checks depending on the retrieved data's format and naming scheme. The complexity of Yahoo Finance also made it difficult to figure out which of the network requests made by the site were important (~300 requests for a normal page load).
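One way to handle fields that appear under different names is a fallback lookup over a list of aliases. This is a hypothetical sketch of that idea; the key names below are illustrative, not Yahoo Finance's actual schema:

```python
# Map each metric we want to the alias keys it might appear under.
# These alias names are made up for illustration.
FIELD_ALIASES = {
    "total_assets": ["totalAssets", "TotalAssets", "assets"],
    "total_liabilities": ["totalLiab", "totalLiabilities", "liabilities"],
}

def get_metric(record: dict, metric: str):
    """Return the first alias of `metric` found in `record`, else None."""
    for key in FIELD_ALIASES.get(metric, []):
        if key in record:
            return record[key]
    return None
```

A lookup table like this keeps the per-company conditional checks in one place instead of scattering them through the parsing code.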
For the formatting part: We retrieved the majority of the data from the EDGAR database. Each company files a form called a "10-K" or "10-Q" that states its financial information. These forms are collected from an HTML file listed on the EDGAR website, and each HTML file consists of many tables. Finding the table with the information we needed was a big challenge. Once the table was retrieved, we mainly used Pandas to reformat it and to remove and add the information we wanted. Many rows and columns were duplicated or carried information we did not want, so we had to cut out everything unneeded. Adding information and rearranging it was another challenging part.
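The table-hunting and cleanup steps above can be sketched roughly as follows. This is an assumption-laden illustration: in practice the list of tables would come from something like `pandas.read_html()` on the filing's HTML, and the keyword and cleanup rules would be more involved:

```python
import pandas as pd

def find_table(tables, keyword):
    """Return the first DataFrame whose text mentions `keyword` (case-insensitive)."""
    for df in tables:
        text = df.astype(str).apply(" ".join, axis=1).str.cat(sep=" ")
        if keyword.lower() in text.lower():
            return df
    return None

def clean_table(df):
    """Drop duplicate rows and rows that are entirely empty."""
    return df.drop_duplicates().dropna(how="all").reset_index(drop=True)
```

Scanning every table for a keyword such as "Total assets" is a blunt instrument, but it reflects the kind of search needed when a filing contains dozens of unlabeled tables.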
Accomplishments that we're proud of
- Parsing financial information from EDGAR and especially Yahoo Finance
- Reformatting Pandas DataFrames
- Reverse engineering the sites we collected the data from
What we learned
Cale: Large sites like Yahoo Finance make a significant number of requests on page load to various site resources and trackers. This made the initial reverse engineering of the site difficult, as combing through requests to find the important ones took a long time. A key simplification for retrieving information from client-side rendered sites (e.g. React) is that most of the data used by Yahoo Finance is stored in the React application context under various stores. It was interesting for me to see discrepancies between the data used in the backend and the information displayed to the user; the displayed data includes multiple rounding errors, which made it difficult to verify that our retrieved results were correct. As a nuclear engineer, this was my first time diving into the complexities of the finance sector and the importance of certain financial indicators in a company's valuation.
Casey: Using the different Python modules. I have never coded much in my life; I went through COSC 102 and 130 and did some side projects in C++, but not really in Python. I found out how useful Python modules are: they make the life of a programmer much easier. This was also the first time I engaged in parsing financial forms, which turned out to be a very complicated task that took a lot of trial and error before we finally figured out how to get the information we wanted. Finally, this was the first time I used GitHub. Cale and I coordinated with each other and used GitHub as the hub where we shared updates, and I learned how important and useful it is.
What's next for NAV Python
We want to go further with the Net Asset Valuation method. We currently apply the NAV in a simple way, and the more information we can incorporate, the more reliable our company valuations will be. Stock valuation is subjective, and a given valuation method does not work well for every company or industry. Beyond this, we want to look into different, more complex valuation methods so we can value stocks in multiple ways and use the most suitable method for the company we are looking into.