Data

Data Flow

The data-flow diagram below shows events, lambdas, queues, and RDS tables. Most data originates with the FEC API but also uses fecfile to parse FEC filings from https://docquery.fec.gov/paper/posted/{fec_file_id}.fec. You can also view the lambdas, along with monitoring and logging, through the aws console./

Data-Flow Diagram

Data Model

The tables came from: - The FEC API - candidate_detail -> CandidateDetail from https://api.open.fec.gov/developers/#/candidate/get_candidate__candidate_id__ - committee_detail -> CommitteeDetail from https://api.open.fec.gov/developers/#/committee/get_committee__committee_id__ - committee_candidates -> CommitteeDetail - The docquery filings - filings - filings_amendment_chain - filings_schedule_b - filings_schedule_e - form_1_supplemental - committee_totals

If you inspect the data-flow-diagram above you can see what API was used to make any redshift table. Each API endpoint returns a different Data Model. The APIs and their DMs are listed above. The docquery API returns fec files which are pretty free-form

Entity Relationship Diagarm

This was generated using DataGrip and makes uses of the foreign keys which are not enforced by redshift

Database Statistics

There is an extra lambda, GetDBStats, that queries the database for row counts and the like. It then writes this information back to the DB with a date for point-in-time records. The lambda code is in src/get_db_stats.py and you can query for it's results from the database with select * from fec.loading_stats; to get something like this:

query_time candidate_detail_count candidate_detail_min_first_file committee_detail_count committee_detail_min_first_file filings_count filings_min_receipt_date committee_totals_count filing_amendment_chain_count filings_schedule_b_count filings_schedule_e_count form_1_supplemental_count
2020-08-21 15:40:48.731069 1166 2016-11-06 880 1972-06-09 11454 2019-01-20 897 0 18600 1951 0
2020-08-21 15:42:21.751421 1166 2016-11-06 880 1972-06-09 11454 2019-01-20 897 0 18600 1951 0