Congressional Data Model
We are dealing with a heterogeneous data set: YAML, JSON, and CSV. The data is mostly JSON and structured as documents, but this isn't very useful for statistical analysis. We are converting the data set to something that is more easily modeled by Pandas. The result resembles a normalized relational data structure and may ultimately become one, but for the time being it is not. Data is stored in CSVs.
Below are notes on the data objects. Some include the fields we are storing for the model; don't consider these authoritative. The code is authoritative; the models here are just notes while we work through things.
A number and a span of years. We'll add auxiliary data later.
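As a rough illustration of the document-to-table conversion, here is a minimal sketch using only the standard library. The bill document, its field names (`bill_id`, `sponsor`, `cosponsors`), and the helpers `flatten` and `to_csv` are hypothetical and only meant to show the idea; the project code remains the authority on the real schema.

```python
import csv
import io
import json

# Hypothetical bill document; the field names are illustrative assumptions,
# not the project's actual schema.
RAW = json.loads("""
{
  "bill_id": "hr1-113",
  "congress": 113,
  "sponsor": {"id": "A000001", "state": "MI"},
  "cosponsors": [{"id": "B000002"}, {"id": "C000003"}]
}
""")


def flatten(doc, prefix=""):
    """Flatten nested dicts into one level with dotted keys; skip lists."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            continue  # one-to-many data belongs in its own table
        else:
            flat[name] = value
    return flat


def to_csv(rows):
    """Serialize a list of flat dicts to CSV text with a sorted header."""
    fieldnames = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


bill_row = flatten(RAW)
csv_text = to_csv([bill_row])
print(csv_text)
```

A CSV written this way can be loaded with `pandas.read_csv`. List-valued fields such as the co-sponsors are skipped on purpose, since they fit better in a separate table keyed by the bill.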
Information about a given legislator. Right now we are pulling in the legislators-current and legislator-historic pretty much as is.
The core legislation information. Relationships and events will be abstracted into time series and relational data structures. Also note that the text of the legislation is removed from the core record.
[DataModel](Legislative DataModel)
Published congressional topics
- A committee has members and subcommittees.
- Committees persist between legislative sessions.
- Committees can sponsor legislation.
This isn't really an object in the data set. It's something we are going to make up. There are a number of events that occur around legislation. Sponsorships occur at a date and are withdrawn at a date. Legislation is introduced, voted upon, passed, etc.
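One way to picture this made-up event object is as a dated record per bill, which can then be sorted into a timeline. This is only a sketch: the `Event` class, its fields, the `kind` labels, and the sample rows are all assumptions for illustration, not the model in the code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """One dated occurrence around a piece of legislation (hypothetical shape)."""
    bill_id: str
    occurred_on: date
    kind: str            # e.g. "introduced", "sponsored", "passed"
    actor_id: str = ""   # legislator involved, if any


# Invented sample data, deliberately out of order.
EVENTS = [
    Event("hr1-113", date(2013, 3, 1), "sponsorship_withdrawn", "B000002"),
    Event("hr1-113", date(2013, 1, 3), "introduced"),
    Event("hr1-113", date(2013, 1, 15), "sponsored", "B000002"),
    Event("s5-113", date(2013, 2, 1), "introduced"),
    Event("hr1-113", date(2013, 6, 20), "passed"),
]


def timeline(events, bill_id):
    """Return the events for one bill in chronological order."""
    selected = (event for event in events if event.bill_id == bill_id)
    return sorted(selected, key=lambda event: event.occurred_on)


hr1_kinds = [event.kind for event in timeline(EVENTS, "hr1-113")]
print(hr1_kinds)
```

Keeping a sponsorship and its withdrawal as two separate dated rows means the same structure covers introductions, votes, and passage without special cases.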
This is technically a relationship, so it will be a mapping table. There is also co-sponsorship.
Bills are amended. I've not dug into these yet.
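A mapping table here could be as simple as one row per (bill, legislator, role) pair. The sketch below assumes that three-column layout, the role labels `sponsor` and `cosponsor`, and invented identifiers; the helper `index_by` is hypothetical too.

```python
from collections import defaultdict

# Hypothetical mapping rows: (bill_id, legislator_id, role).
SPONSORSHIPS = [
    ("hr1-113", "A000001", "sponsor"),
    ("hr1-113", "B000002", "cosponsor"),
    ("hr1-113", "C000003", "cosponsor"),
    ("s5-113", "B000002", "sponsor"),
]


def index_by(rows, position):
    """Group mapping rows by the value found in the given column."""
    index = defaultdict(list)
    for row in rows:
        index[row[position]].append(row)
    return dict(index)


by_bill = index_by(SPONSORSHIPS, 0)
by_legislator = index_by(SPONSORSHIPS, 1)

# Who co-sponsored a bill, and which bills one legislator is attached to.
cosponsors_hr1 = [leg for _, leg, role in by_bill["hr1-113"] if role == "cosponsor"]
bills_b2 = [bill for bill, _, _ in by_legislator["B000002"]]
print(cosponsors_hr1, bills_b2)
```

Because the rows carry no nested data, the same table can be written to CSV as is and joined against the legislator and legislation tables from either side.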
Political affiliation. Yes there are more than two.
The state and its number of representatives. Whatever else we find useful later.
Perhaps we should consult Stephen Colbert about this. Congressional districts. Number, and someday maybe geo... economic and financials.