The Financial Statement and Notes Data Sets provide the text and detailed numeric information from all financial statements and their notes. This data is extracted from exhibits to corporate financial reports filed with the SEC using eXtensible Business Reporting Language (XBRL).
A small golang script to download financial statements and notes to a local sqlite database from sec.gov
- Go & Go modules
- Sqlite
Clone this repository, then run
$ go install
to compile the source to an executable in the bin/
directory.
$ ./bin/filingsdb
Usage: filingsdb <year>
Give it a year and the script will download and store the data to a local filings_$YEAR.db
sqlite database. This can take a while as there's a lot of data to ingest (the 2019 database clocks in at 16G)
The DB schema (tables, columns and types) follows the structure outlined in the dataset official pdf documentation. The script also builds a convenient ticker <> cik table to make querying easier via join. Use this table with caution, as it's a snapshot of today's data. In the past a given ticker could potentially map to a different cik.
The submissions table (data_sub
) contains one entry per submission. A filing's Accession Number (or adsh
) is the main identifier used to join other facts tables.
Facebook's 10-K Accession Number
select data_tickers.*, data_subs.adsh, data_subs.form, data_subs.accepted
from data_subs
left join data_tickers on data_subs.cik = data_tickers.cik
where data_tickers.ticker = "FB"
and form = '10-K';
Numbers from the filing
select *
from data_subs
left join data_nums on data_nums.adsh = data_subs.adsh
where data_subs.adsh = "0001326801-19-000009";
Identify companies in a given SIC (here biotech related)
select data_subs.adsh, data_subs.accepted, data_tickers.ticker , data_txts.tag, data_subs.form, data_subs.sic, data_subs.cik, data_txts.ddate, data_txts.value
from data_txts
left join data_subs on data_txts.adsh = data_subs.adsh
left join data_tickers on data_tickers.cik = data_subs.cik
where data_txts.tag IN ("NatureOfOperations", "BusinessDescriptionAndBasisOfPresentationTextBlock", "BusinessDescriptionAndAccountingPoliciesTextBlock", "OrganizationConsolidationAndPresentationOfFinancialStatementsDisclosureTextBlock",
"OrganizationConsolidationAndPresentationOfFinancialStatementsDisclosureAndSignificantAccountingPoliciesTextBlock",
"OrganizationConsolidationBasisOfPresentationBusinessDescriptionAndAccountingPoliciesTextBlock"
)
and data_subs.sic IN ("2834", "2835", "2836", "8071", "8731")
and data_subs.form = "10-K"
order by data_subs.accepted;
Welcome to the messy nature of XBRL. Different filers sometimes use different tags for the same thing.
MIT