Congressional Data Model

deserat edited this page Oct 28, 2014 · 6 revisions

Approach to Data

We are dealing with a heterogeneous data set: YAML, JSON, and CSV. The data is mostly JSON structured as documents, which isn't very useful for statistical analysis. We are converting the data set to something that is more easily modeled in Pandas. The result resembles a normalized relational structure and may ultimately become one, but for the time being it is not. Data is stored in CSV files.
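As a rough sketch of that conversion, here is one way a nested JSON document could be flattened into a single row and written out as CSV. It uses only the standard library, and the output can be loaded with Pandas' `read_csv`. The document, its field names (`bill_id`, `sponsor`, `subjects`), and the IDs are hypothetical placeholders, not the project's actual schema.

```python
import csv
import io
import json

# Hypothetical document; the real field names live in the code, not here.
RAW = '''
{"bill_id": "hr1-113", "congress": 113,
 "sponsor": {"id": "X000001", "state": "UT"},
 "subjects": ["Taxation", "Energy"]}
'''


def flatten(doc, prefix=""):
    """Flatten nested dicts into a single-level dict with dotted keys."""
    row = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            # Lists are joined here for brevity; a separate mapping table
            # would be the more normalized alternative.
            row[name] = "|".join(str(v) for v in value)
        else:
            row[name] = value
    return row


def to_csv(rows):
    """Serialize a list of flat dicts to CSV text with a sorted header."""
    fieldnames = sorted({key for row in rows for key in row})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


flat_row = flatten(json.loads(RAW))
csv_text = to_csv([flat_row])
```

Joining list values into one cell keeps the sketch short; for analysis it is usually better to split them into their own table keyed by the parent record.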

Below are notes on the data objects. Some include the fields we are storing for the model; don't consider these authoritative. The code is authoritative; the models here are just notes while we work through things.

Congress

A number and a span of years. We'll add auxiliary data later.

Legislator

Information about a given legislator. Right now we are pulling in the legislators-current and legislator-historic data pretty much as is.

Legislation

The core legislation information. Relationships and events will be abstracted into time series and relational data structures. Note also that the text of the legislation is removed from the core record.

[DataModel](Legislative DataModel)

Topics

Published congressional topics

Committee

  • A committee has members and subcommittees.
  • Committees persist between legislative sessions.
  • Committees can sponsor legislation.
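The committee notes above could be laid out as two flat tables: one for committees, with subcommittees pointing at their parent, and one mapping committees to members. This is only a sketch; the class names, field names, and IDs are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Committee:
    committee_id: str
    name: str
    parent_id: Optional[str] = None  # set only for subcommittees


@dataclass(frozen=True)
class Membership:
    committee_id: str
    legislator_id: str


# Hypothetical sample rows.
committees = [
    Committee("C01", "Example Committee"),
    Committee("C01-A", "Example Subcommittee", parent_id="C01"),
]
memberships = [
    Membership("C01", "X000001"),
    Membership("C01-A", "X000001"),
    Membership("C01", "X000002"),
]


def subcommittees_of(committee_id):
    """Return the committees whose parent is the given committee."""
    return [c for c in committees if c.parent_id == committee_id]


def members_of(committee_id):
    """Return legislator IDs that belong to the given committee."""
    return [m.legislator_id for m in memberships
            if m.committee_id == committee_id]
```

Because committees persist between sessions, neither table carries a Congress number here; a membership row might need one if membership changes per session.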

Legislative Event

This isn't really an object in the data set. It's something we are going to make up. There are a number of events that occur around legislation. Sponsorships occur on a date and are withdrawn on a date. Legislation is introduced, voted upon, passed, etc.
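One possible shape for this made-up object is a plain row of bill, event type, and date, which sorts naturally into a time series per bill. The event type names, bill ID, and dates below are invented for the sketch.

```python
from datetime import date

# Hypothetical event rows: (bill_id, event_type, event_date).
events = [
    ("hr1-113", "passed", date(2013, 6, 27)),
    ("hr1-113", "introduced", date(2013, 1, 3)),
    ("hr1-113", "sponsorship_withdrawn", date(2013, 2, 11)),
    ("hr1-113", "voted", date(2013, 6, 20)),
    ("hr2-113", "introduced", date(2013, 1, 15)),
]


def timeline(bill_id):
    """Return the events for one bill in chronological order."""
    rows = [e for e in events if e[0] == bill_id]
    return sorted(rows, key=lambda e: e[2])


def to_csv_rows(rows):
    """Convert event tuples to lists of strings ready for a CSV writer."""
    return [[bill_id, kind, day.isoformat()] for bill_id, kind, day in rows]


hr1_timeline = timeline("hr1-113")
hr1_csv_rows = to_csv_rows(hr1_timeline)
```

Storing dates in ISO format keeps the CSV sortable as text and easy to parse back into dates later.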

Sponsor

This is technically a relationship, so it will be a mapping table. There is also co-sponsorship.
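A minimal sketch of such a mapping table, assuming one row per bill and legislator pair with a role column to cover co-sponsorship and optional dates for when the sponsorship started and was withdrawn. The names, IDs, and dates are hypothetical.

```python
from datetime import date
from typing import NamedTuple, Optional


class Sponsorship(NamedTuple):
    bill_id: str
    legislator_id: str
    role: str  # "sponsor" or "cosponsor"
    sponsored_on: date
    withdrawn_on: Optional[date] = None  # None while still active


# Hypothetical sample rows.
sponsorships = [
    Sponsorship("hr1-113", "X000001", "sponsor", date(2013, 1, 3)),
    Sponsorship("hr1-113", "X000002", "cosponsor",
                date(2013, 1, 10), date(2013, 2, 11)),
    Sponsorship("hr2-113", "X000002", "sponsor", date(2013, 1, 15)),
]


def active_sponsors(bill_id, on):
    """Return legislator IDs sponsoring a bill on the given date."""
    return [
        s.legislator_id
        for s in sponsorships
        if s.bill_id == bill_id
        and s.sponsored_on <= on
        and (s.withdrawn_on is None or on < s.withdrawn_on)
    ]
```

For example, `active_sponsors("hr1-113", date(2013, 3, 1))` drops the co-sponsor who withdrew in February.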

Amendments

Bills are amended. I've not dug into these yet.

Party

Political affiliation. Yes, there are more than two.

State

The state and its number of representatives, plus whatever else we find useful later.

District

Congressional districts. Perhaps we should consult Stephen Colbert about this. For now just the number; someday maybe geographic, economic, and financial data.