Skip to content

EvanWeis/edgarDB

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

EDGAR SEC Database

The intent of this database is to enhance the speed and efficiency of fundimental analysis used by some investors to identify potential investment opportunbities.

Note: to use this script you must update the submissions_path, facts_path, and test_path to correspond with the location of your downloaded zip files...

SEC EDGAR (electronic data gathering, analysis, and retrival) system is an APi used to request public company filings. However programatic requets a rate limited, scraping is prohibited and the bilk download zip files filled with json are cumbersome.

The edgarDB.py script is a script used to build and deploy a sqlite3 database to query and analyze financial statements from public traded companies on the CBOE, OTC, NYSE, and Nasdaq exhanges.

Reports included in the databse are 10k and 10Q filings for all publicly traded companies insofar as EDGAR makes them available via the bulk download .zip files downloaded here and here

Contents of the reports include shares outstanding, market value (where aplicable) and the contents of the financial statements complaint with the generally accepted accounting principals (GAAP) reported to the SEC.

Tables included in the datebase and associated schema are

companies:

cik TEXT PRIMARY KEY NOT NULL,
ticker TEXT NOT NULL,
name TEXT NOT NULL,
sic TEXT,
description TEXT,
exchange TEXT

Description:

- cik: the central index key is a unique number assigned by the sec for companies to submit filings

- ticker: a unique sting of letters used to identify a publicly traded company on a stock exchange

- sic: Standard Industrial Classification Code is used to categorize industries that companies belong to based on business activities

- description: is the text description of the sic code

- name: is the name of the company

- exchange: is the primary exhange the company is traded on i.e. NYSE

filings

accn TEXT PRIMARY KEY NOT NULL,
cik TEXT,
form TEXT NOT NULL,
filingDate DATE,
reportDate DATE,

Description:

- accn: accession number for specific filings organizing the line items in the financials

- cik: serves as foreign key

- form: either 10-K or 10-Q

- filingDate: date form was filied with SEC

- reportDate: fiscal period end report is referencing

annualReports and quarterlyReports table schema are the same, difference is in the report contents, 10-K and 10-Q respectively

as of this stage of development the tables are incomplete covering only a portion of the balance sheet

accn TEXT PRIMARY KEY NOT NULL,
report TEXT,
CommonStockSharesOutstanding INT,
CashAndEquivalent INT,
CurrentAssets INT,
Assets INT,
CurrentLiabilities INT,
LiabilitiesAndEquity INT,
NetIncome INT

Description of this table should be self explainatory at this point with the financial components being somewhat intuitive, I wont belabor the point by describing them further.

Future development includes plans to include a json file with a complete description of the GAAP in the repo.

Creating the Database

This process requires the facts_path and submissions_path to be updated to reference the location of the directory in which the json files live.

The output is an sqlite3 databse in the directory where the script is run.

Usage:

create_company_table()
create_filings_table()
create_reports_table('10-Q', 'quarterlyReports')
create_reports_table('10-K', 'annualReports')

Sample usage of sqlite3:

SELECT sic AS "SIC",
descprition AS "Description",
COUNT(name) AS "Compaines Included",

ROUND(AVG(NetIncome), 0) AS "Avg Profit",
ROUND(AVG(Assets), 0) AS "Avg Assets",

ROUND(AVG(CAST(NetIncome AS float)/CAST(Assets AS float)), 4) AS "Avg Gross Profitability"

FROM Companies AS c
JOIN filings AS f
ON c.cik = f.cik
JOIN annualReports AS a
ON f.accn = a.accn

WHERE Assets > 200000 AND strftime('%Y', f.reportDate) IN ('2018', '2019', '2020', '2021')

GROUP BY sic

ORDER BY AVG(CAST(NetIncome AS float)/CAST(Assets AS float)) DESC
LIMIT 15
;

Outputs of top SICs with the heighest average gross profitability ratio for the years 2018-2021

Note:

Due to inconsistencies in filing format from accross companies and years the INSERT failure rate is still quite high. Improvements are ongoing.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages