Elex 4549 add model flexibility #109

lennybronner · 2024-09-25T02:19:49Z

Description

Instead of being stuck with OLS as our base model for the bootstrap model, this PR gives us flexibility to use different models. We were wed to OLS because we had a quick way of computing the leave-one-out-residual, which we needed for the bootstrap, since the training residual would be biased towards zero. We now use k-fold cross validation to get an estimate for the leave-one-out residual (k-fold residual will be greater than or equal to loo residual, so this is if anything more conservative). We also add the ability to play with OLS vs. QR models. In the future, we may expand the models that are being used here.

This needs this branch of elex-solver, which allows us to calculate the k-fold residual. Tests will fail until this branch is merged/released.

Jira Ticket

https://arcpublishing.atlassian.net/browse/ELEX-4549

Test Steps

elexmodel 2017-11-07_VA_G --estimands=margin --office_id=G --geographic_unit_type=county --pi_method bootstrap  --features baseline_normalized_margin --percent_reporting 30 --aggregates postal_code --aggregates unit --model_parameters '{"model_type": "QR"}'

vs

elexmodel 2017-11-07_VA_G --estimands=margin --office_id=G --geographic_unit_type=county --pi_method bootstrap  --features baseline_normalized_margin --percent_reporting 30 --aggregates postal_code --aggregates unit --model_parameters '{"model_type": "OLS"}'

…CV branch for testing purposes

dmnapolitano · 2024-09-30T17:06:34Z

setup.py

@@ -5,7 +5,7 @@

 INSTALL_REQUIRES = (
    "click>=8.1",
-    "elex-solver>=2.1.1",
+    "elex-solver @ git+https://github.com/washingtonpost/elex-solver.git@updating-residuals",


I changed this temporarily so we can test / run the tests here in github 🎉

dmnapolitano

This looks pretty good! 🎉 🎉 I have some questions here and on the elex-solver PR, and I was also wondering if you could add documentation about the new solver options to the README? 🤔 🎉

dmnapolitano · 2024-09-30T17:14:20Z

src/elexmodel/models/BootstrapElectionModel.py

+            model.fit(x, y, weights=weights, taus=0.5, **model_kwargs)
+        else:
+            raise BootstrapElectionModelException(
+                f"Please use defined model type 'QR' or 'OLS', input is: {self.model_type}"


This is such a polite and helpful error message! 😄 🎉

dmnapolitano · 2024-09-30T17:19:07Z

src/elexmodel/models/BootstrapElectionModel.py

+        model_kwargs = {"fit_intercept": True, "regularize_intercept": False, "n_feat_ignore_reg": 1}
+        if self.model_type == "OLS":
+            # if we run OLS we use cross validation to find the optimal lambda
+            # TODO: why do we not do this for other models also


lol wait, why do we not do this for other models also? 🤔

Is this a "how can I write code to make this happen" question or a theoretical question (or both)? 🤔

dmnapolitano · 2024-09-30T17:21:44Z

src/elexmodel/models/BootstrapElectionModel.py

+            y_train,
+            weights_train,
+            aggregate_indicator_train,
+            K=5,


Just curious, how did you pick the default of 5 here (and elsewhere)? I mean I've done (and seen) 3-fold, 5-fold, 10-fold CV but usually my choice there depends on the number of examples in the data set and how much computing power I have 🤔

dmnapolitano · 2024-09-30T17:22:29Z

src/elexmodel/models/BootstrapElectionModel.py


        # we cannot just apply ols_y_B/old_z_B to the test units because that would be missing
        # our contest level random effect
+        # TODO: DO WE NEVER DO ANYTHING WITH THE OUTPUT


lennybronner added 2 commits September 24, 2024 22:09

unit tests work

848bf52

linter

13ac4e8

lennybronner requested a review from a team as a code owner September 25, 2024 02:19

Temporarily swapping out the elex-solver package with the new k-fold …

47ecd75

…CV branch for testing purposes

dmnapolitano self-assigned this Sep 30, 2024

dmnapolitano reviewed Sep 30, 2024

View reviewed changes

dmnapolitano mentioned this pull request Dec 13, 2024

Resolve pylint and other warnings #146

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Elex 4549 add model flexibility #109

Elex 4549 add model flexibility #109

lennybronner commented Sep 25, 2024 •

edited

Loading

dmnapolitano Sep 30, 2024

dmnapolitano left a comment

dmnapolitano Sep 30, 2024

dmnapolitano Sep 30, 2024

dmnapolitano Sep 30, 2024

dmnapolitano Sep 30, 2024

Elex 4549 add model flexibility #109

Are you sure you want to change the base?

Elex 4549 add model flexibility #109

Conversation

lennybronner commented Sep 25, 2024 • edited Loading

Description

Jira Ticket

Test Steps

dmnapolitano Sep 30, 2024

Choose a reason for hiding this comment

dmnapolitano left a comment

Choose a reason for hiding this comment

dmnapolitano Sep 30, 2024

Choose a reason for hiding this comment

dmnapolitano Sep 30, 2024

Choose a reason for hiding this comment

dmnapolitano Sep 30, 2024

Choose a reason for hiding this comment

dmnapolitano Sep 30, 2024

Choose a reason for hiding this comment

lennybronner commented Sep 25, 2024 •

edited

Loading