GitHub - CTCL/bip-data

This repository was archived by the owner on Aug 2, 2024. It is now read-only.

Repository contents:
.idea
data
districts
old_stefan
schema
sql
vf_scripts
.gitignore
__init__.py
benchmark.py
bipbuild.py
cutcode.py
determine_districts.py
ersatzpg
readme
rsyncex

This is essentially the Ballot Information Project. The working branch is new_nat.
bipbuild.py builds the BIP database for each state you pass it. It's mostly based on ersatz: https://github.com/natgaertner/ersatzpg but has a bunch of its own config ideas. You can see some sample state configs in vf_scripts/exception_states.
vf_scripts in general has a ton of quick-and-dirty scripts that helped me manage the parallel state directories. make_state_confs.py might be the most important one, since it uses state_conf_template.py and the exception state files to create a configuration for each state that references the correct locations and names.
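As a rough illustration of the template-plus-exceptions idea, here is a minimal sketch. It does not reproduce make_state_confs.py: the template text, the EXCEPTIONS dict, the file names, and the make_state_conf helper are all hypothetical. Only the data/voterfiles/<state> layout comes from this readme.

```python
from string import Template

# Hypothetical stand-in for state_conf_template.py: one template shared by all states.
STATE_CONF_TEMPLATE = Template(
    "STATE = '$state'\n"
    "VOTER_FILE_DIR = 'data/voterfiles/$state'\n"
    "VOTER_FILE_NAME = '$voter_file'\n"
)

# Hypothetical per-state overrides, standing in for the exception state files.
EXCEPTIONS = {
    "ny": {"voter_file": "ny_special_export.txt"},
}


def make_state_conf(state):
    """Render the config text for one state, applying any exception overrides."""
    values = {"state": state, "voter_file": state + "_voterfile.txt"}
    values.update(EXCEPTIONS.get(state, {}))
    return STATE_CONF_TEMPLATE.substitute(values)


if __name__ == "__main__":
    for state in ("oh", "ny"):
        print(make_state_conf(state))
```

States without an entry in EXCEPTIONS get the template defaults, so only the odd ones need a hand-written override.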
bipbuild.py has a lot of functions, but you usually just run it with -all_no_clean, which ditches the old data, inserts new data, cleans up duplicate districts, merges tables that need to be merged, maps sequential keys correctly, and dumps JSON.
Running it with -all will actually drop and remake the partitioned tables. This is a little more drastic than you might want.
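To make the difference between the two flags concrete, here is a small sketch of the step order. The step names just paraphrase this readme and are not the real function names in bipbuild.py. It also assumes that -all runs the same steps as -all_no_clean after rebuilding the partitions, which the readme does not spell out.

```python
# Step names paraphrase the readme; they are not the real functions in bipbuild.py.
NO_CLEAN_STEPS = [
    "ditch old data",
    "insert new data",
    "clean up duplicate districts",
    "merge tables",
    "map sequential keys",
    "dump json",
]

# Assumption: -all does everything -all_no_clean does, after rebuilding partitions.
STEPS_BY_FLAG = {
    "-all_no_clean": NO_CLEAN_STEPS,
    "-all": ["drop and remake partitioned tables"] + NO_CLEAN_STEPS,
}


def plan(flag, states):
    """Return (state, step) pairs in the order they would run."""
    return [(state, step) for state in states for step in STEPS_BY_FLAG[flag]]


if __name__ == "__main__":
    for state, step in plan("-all_no_clean", ["oh", "ny"]):
        print(state, step)
```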
Generally, the database is built by importing into a buffer table, then performing some SQL commands to square everything away with keying before inserting into the actual table. A lot of the ugly table creation and keying commands are run using functions in schema/table_tools.py. schema/create_partitions.py contains a partition generator that is pretty useful. It also contains a permutation generator. Did I know about itertools.permutations and itertools.combinations? I did not.
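The buffer-table pattern looks roughly like the sketch below. It uses the standard-library sqlite3 module instead of Postgres so it runs anywhere, and the table and column names (district, district_buffer, source_key) are made up for the example. The real keying SQL in schema/table_tools.py is more involved.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a real table and its buffer table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE district (
            id INTEGER PRIMARY KEY,   -- sequential key assigned on insert
            source_key TEXT UNIQUE,   -- key as it appears in the raw data
            name TEXT
        );
        CREATE TABLE district_buffer (source_key TEXT, name TEXT);
        """
    )
    return conn


def load_via_buffer(conn, rows):
    """Import raw rows into the buffer, then move only new, deduplicated rows."""
    # Step 1: dump the raw rows into the buffer table without any constraints.
    conn.execute("DELETE FROM district_buffer")
    conn.executemany(
        "INSERT INTO district_buffer (source_key, name) VALUES (?, ?)", rows
    )
    # Step 2: collapse duplicates and skip keys the real table already has.
    conn.execute(
        """
        INSERT INTO district (source_key, name)
        SELECT source_key, MIN(name)
        FROM district_buffer
        WHERE source_key NOT IN (SELECT source_key FROM district)
        GROUP BY source_key
        """
    )
    conn.commit()


if __name__ == "__main__":
    conn = make_db()
    load_via_buffer(conn, [("oh-1", "District 1"), ("oh-1", "District 1")])
    print(conn.execute("SELECT * FROM district").fetchall())
```

The point of the buffer is that the raw import never has to satisfy the real table's constraints. All the deduplication and keying happens in SQL between the two tables.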
schema/process_schema.py has some attempts to make Python classes that represent a database schema, at least as far as I needed them to. Some of the ugliest SQL generation happens in here for rekeying. It could probably get a lot cleaner. At one point I was trying to build a tree of foreign key relations, detect loops, and use that to generate rekeying commands after data was loaded in. This became unnecessary, thankfully. We really didn't need a lot of explicit relations, since we were dumping the separate tables out to JSON anyway.
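For the curious, loop detection over foreign key relations can be sketched as a depth-first search. This is not the code from schema/process_schema.py (that approach was abandoned). The find_cycle helper and the table names in the usage example are hypothetical.

```python
def find_cycle(foreign_keys):
    """Return one loop of tables as a list, or None if there is no loop.

    foreign_keys maps each table name to the list of tables it references.
    """
    in_progress, done = set(), set()
    path = []  # tables on the current depth-first path, in order

    def visit(table):
        in_progress.add(table)
        path.append(table)
        for ref in foreign_keys.get(table, ()):
            if ref in in_progress:
                # ref is already on the current path, so we found a loop.
                return path[path.index(ref):] + [ref]
            if ref not in done:
                cycle = visit(ref)
                if cycle:
                    return cycle
        path.pop()
        in_progress.discard(table)
        done.add(table)
        return None

    for table in list(foreign_keys):
        if table not in done:
            cycle = visit(table)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    # Hypothetical relations: candidate -> contest -> district.
    print(find_cycle({"candidate": ["contest"], "contest": ["district"]}))
```

If there is no loop, the tables can be rekeyed in dependency order. If there is one, some relation has to be patched up in a second pass.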
Some of the base configs that are extended for the states are in the data directory. univ_settings.py, table_defaults.py, target_smart_defaults.py, and candidate_defaults.py are the main configs that are inherited.
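One way to picture a state config extending a base config is a recursive dict merge, sketched below. This is only an assumption about the general shape. The real configs are Python modules, and the keys and values here (db, host, name, encoding) are invented for the example.

```python
import copy

# Hypothetical base settings, standing in for something like univ_settings.py.
UNIV_SETTINGS = {
    "db": {"host": "localhost", "name": "bip"},
    "encoding": "utf-8",
}


def extend(base, overrides):
    """Return a new config: base values with overrides merged in recursively."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections instead of replacing them wholesale.
            merged[key] = extend(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    # A state config only states what differs from the base.
    oh_settings = extend(UNIV_SETTINGS, {"db": {"name": "bip_oh"}})
    print(oh_settings)
```

Because extend copies the base before merging, one state's overrides never leak into the shared defaults.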

The specific state data sits in data/voterfiles/<two letter state abbreviation> but the voterfiles directory is omitted for a variety of reasons, not least because it is a couple hundred GB of data.