Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iterator APIs (via ResumableFunctions.jl) #121

Merged
merged 6 commits into from
Aug 3, 2021
Merged

Conversation

InnovativeInventor
Copy link
Member

Since #94 was stagnating, I decided to break out the iterator-style API into this PR using ResumableFunctions.jl.

Caveat: The API exposed by this PR is different from the one proposed in #94. Rather than just returning scores, it also returns the Partition object as well in a tuple. This makes it more versatile.

Although this is a breaking change, this brings GerryChain Julia
more in line with GerryChain Python and allows the user to save
and export state more easily. This API is different from the one
proposed in #94 in that it returns a Partition object and the
scores.
The iterators are now exposed in `recom_chain_iter` and
`flip_chain_iter` respectively to avoid any breaking changes.
src/flip.jl Outdated Show resolved Hide resolved
@InnovativeInventor
Copy link
Member Author

For some more context, I decided to not break backwards compatibility and made recom_chain and flip_chain dumb wrappers around recom_chain_iter and flip_chain_iter respectively.

@InnovativeInventor
Copy link
Member Author

a678eb6#diff-ade2663ea888449e8028f941e69e33ec7f16635c54248240974af0ea18ffe682R288 is a long line, but it is necessary since I can't add it to the multi-line comment above and the workaround itself is finnicky.

Copy link
Collaborator

@pjrule pjrule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am the original author of most of these changes (over in #94), and I've run a bunch of chains this way—it works! However, would love a review from other folks just to make sure I didn't miss any subtle ordering issues, etc.

@InnovativeInventor
Copy link
Member Author

InnovativeInventor commented Jul 26, 2021

I am the original author of most of these changes (over in #94), and I've run a bunch of chains this way—it works! However, would love a review from other folks just to make sure I didn't miss any subtle ordering issues, etc.

Yep, this is a fairly simple, but impactful change (it really changes up the internals), that should hopefully maintain backwards compatibility. I've tested it with my scripts, etc. and it appears to work -- more testing and eyeballs would be better.

Finally, as this change is a minor but backwards-compatible change, the minor version number should be bumped up as per semantic versioning (e.g. 0.1.3 -> 0.2.0).

@pjrule
Copy link
Collaborator

pjrule commented Jul 26, 2021

Actually, I just spotted an issue here: the semantics of the iterator are different from the Python version of GerryChain, which may confuse our users. Specifically, Partition objects are basically immutable in Python; the iterator in the MarkovChain object returns distinct Partition objects. In this version, we update a single Partition struct at each step by default. This behavior can be changed internally with the copy_partition flag of update_partition!, but this is not really exposed to users. Regardless of how easy the behavior is to change, though, the defaults are most relevant here: the Python version is immutable by default, and the Julia version is mutable by default. Julia users expecting the Python behavior will, to their surprise, end up with a bunch of pointers to the same Partition struct. Some ideas, roughly in order of performance impact (least to greatest):

  • Ignore this and add a note in the docstring.
  • Introduce a copy flag to recom_chain_iter; set it to false by default. (Internally, this would be forwarded to the aforementioned copy_partition flag.)
  • For each chain, provide two iterator functions: a basic one that returns just scores (always immutable), and an advanced one that returns the latest Partition (copy or not) and the scores. This sounds clunky—and it kinda is—but in most cases I imagine users will only want the scores. IMO, that's the usage we should encourage.
  • Introduce a copy flag to recom_chain_iter; set it to true by default.

@InnovativeInventor
Copy link
Member Author

I'm mostly in favor of adding a note about this slight divergence from GerryChain Python in the docs for a few reasons. Firstly, I'm not quite sure when/how often someone would want a copy of the partition object outside the for loop (I've certainly never had to do that) -- even then, it should be fairly easy to detect that all the copies one is storing are the same. Secondly, changing the behavior to be less speedy (by default) undercuts the only selling point of GerryChain Julia: speed. I don't think we should compromise on that, especially with the defaults. So, I'm not sure if we're sabotaging our performance out of concern for a phantom use case.

I do like the third idea (however, it doesn't really answer whether we should copy the partition by default). What do others think?

@InnovativeInventor InnovativeInventor linked an issue Jul 26, 2021 that may be closed by this pull request
Copy link
Collaborator

@sunnymatt sunnymatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!!! Glad that we have this option. I wish there were a cleaner way to name the functions besides just adding _iter but that's not, like, a fixable thing afaik. 👍 thank you @InnovativeInventor for this!

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Saving assignments
3 participants