Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected district duplication in ReCom chains #426

Open
proebsting opened this issue Oct 21, 2024 · 1 comment
Open

Unexpected district duplication in ReCom chains #426

proebsting opened this issue Oct 21, 2024 · 1 comment

Comments

@proebsting
Copy link

We are seeing two unexpected behaviors from ReCom/GerryChain:

  1. We're getting an identical plan from a chain twice in a row with some frequency. Our mental model is that two successive plans should share N-2 identical districts along with two new districts (from the pairwise recombination/split).
  2. We're also seeing some district configurations get "reincarnated" with some frequency, by which we mean a district configuration in the current plan wasn't in the immediate predecessor plan in the chain but that district configuration was in one of the plans on the chain before the predecessor. Our mental model here is that generated districts are random and astronomical in number so the same one reappearing should be extraordinarily rare.
    Put another way, we have evidence that some districts are being created by multiple independent steps of the chain. Even more concerning, we are seeing some districts being created many independent times over the course of a single chain. (We had one district appear from 42 different recombination/split steps in a 10,000-long chain.)

I have attached a bundle that can reproduce both phenomena. The quickest is with the 20x30 uniform grid.
bundle.zip

The reproduction with a 40x40 uniform grid with 4 districts is particularly noteworthy given that each step is splitting 800 nodes in half, which ought to have an astronomical number of unique possibilities.

(Note, this happens with both the Kruskal and Wilson spanning tree algorithms.)

@peterrrock2
Copy link
Collaborator

peterrrock2 commented Oct 30, 2024

Hey @proebsting, thanks for the replication code: it was very helpful in figuring out what is going
on here. The good news (for both of us, I believe) is that the district "reincarnation" that you
are seeing does not appear to be a bug in the code: it's actually a subtle result of the structure
of the partitions. I think that this is best illustrated by example.

Let's handle the point (1) first. When I ran the code on a 20x30 grid with a population tolerance
of 0, at some point, I get the following partition

example_of_repetition

And then, in the very next step, I get the exact same partition. But this is weird since ReCom
should be merging the districts and drawing a random spanning tree, right? Well, the issue here is
that the districts that where selected for recombination were districts 1 and 2, which are
only adjacent by two cut edges. In the event that we are trying to draw a random spanning tree,
we have to pick one of the two edges in the bottleneck connecting the districts. We then try to
find a place to cut the spanning tree so that the populations on both sides of the cut are equal,
and since the space of spanning trees which admit such a cut is very small given the structure of
the graph, we end up picking one of the two edges that produces the same partition.

Now on to point (2) the "reincarnation" phenomenon. This is a natural extension of the
structural issue that appeared in part (1). However, for the
"reincarnation" to appear, we need to pick bottlenecked districts twice and then
either have one of the districts change mildly as the result of some other ReCom or have both
districts change as part of a ReCombination between the two before being switched back
at a later step. As an example, please refer to districts 4 and 9 in the following gif showing
the phenomenon on a 50x20 grid:

reincarnation_highlight

And here are some surrounding steps so you can see the chain in action:

reincarnation

We can see here that district 9 in step 1365 is the same as district 4 in step 1348
and that district 4 in step 1365 is the same as district 9 in step 1360. However, the
reason why we see the repetition of these district structures comes from the fact that
we are trying to call ReCom on the pair of bottlenecked districts 4 and 9. In the transition
between step 1364 and step 1365, we do succeed in making a random spanning tree, but the
edge that we use to cut the tree actually produces the same partition and we happen to have
flipped the labels of the districts in the interim.

Hopefully this answers your question! I'll leave this issue open for a little bit so that you
have a chance to respond but will close it next week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants