Feature/89 rload fn #90

Merged
merged 17 commits into from
Aug 9, 2019

Conversation

mitzimorris
Member

Submission Checklist

  • Run unit tests
  • Declare copyright holder and open-source license: see below

Summary

Add utility function rload; see issue #89 for discussion.
Test data for the unit tests is taken from Stan's src/test/test-models/good.

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Columbia University

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

Contributor

@ahartikainen ahartikainen left a comment

Looks good. Added a few comments.

*_, vals, dim = rhs.replace('(', ' ').replace(')', ' ').split('c')
vals = [float(v) for v in vals.split(',')[:-1]]
dim = [int(v) for v in dim.split(',')]
val = np.array(vals).reshape(dim[::-1]).T
Contributor

What is going on here?

Would this be equivalent to any of the following?

np.array(vals).reshape(dim[::-1]).T 
--> np.array(vals, order='F').reshape(dim, order='F')
or
--> np.array(vals).reshape(dim, order='F')
or
--> np.reshape(vals, dim, order='F')
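
The equivalence asked about here can be checked directly. The following is a minimal sketch using made-up sample data (the values and dimensions are assumptions, not taken from the PR's test files). R stores arrays in column-major order, so reshaping to the reversed dimensions and transposing should agree with a Fortran-order reshape.

```python
import numpy as np

# Hypothetical sample data: 24 values with R-style dimensions c(2, 3, 4).
vals = [float(v) for v in range(1, 25)]
dim = [2, 3, 4]

# Expression from the diff under review.
original = np.array(vals).reshape(dim[::-1]).T

# The three alternatives proposed in the comment.
alt_a = np.array(vals, order='F').reshape(dim, order='F')
alt_b = np.array(vals).reshape(dim, order='F')
alt_c = np.reshape(vals, dim, order='F')

for alt in (alt_a, alt_b, alt_c):
    assert original.shape == alt.shape
    assert np.array_equal(original, alt)

# In column-major order the first index varies fastest,
# so the second value lands at [1, 0, 0].
assert original[1, 0, 0] == 2.0
```

On this sample all three alternatives give the same array as the original expression; for a flat list of values, `order='F'` in `np.array` has no effect, so the last form is the shortest.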

Member Author

Agreed, that's not very clear. I added a subroutine that processes the Rdump multi-dimensional structure; it uses an ugly regex with named groups for the essential bits (array values and array dimensions). I ran the Rdump file in R to see what R does, then added unit tests that verify that the array values end up in the right place. Hope this helps.
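
As an illustration of the named-group approach described in this comment, here is a minimal sketch. The pattern and the `parse_structure` helper are hypothetical and are not the code merged in this PR; the sketch assumes the right-hand side has the form `structure(c(...), .Dim = c(...))` with plain numeric values.

```python
import re

# Hypothetical pattern, not the one merged in the PR: matches
# structure(c(<values>), .Dim = c(<dims>)) as written by R's dump().
STRUCTURE_RE = re.compile(
    r'structure\(\s*c\((?P<vals>[^)]*)\)\s*,'
    r'\s*\.Dim\s*=\s*c\((?P<dims>[^)]*)\)\s*\)'
)


def parse_structure(rhs):
    """Return (values, dims) for an Rdump structure(...) expression."""
    match = STRUCTURE_RE.fullmatch(rhs.strip())
    if match is None:
        raise ValueError('not a structure(c(...), .Dim = c(...)) expression')
    vals = [float(v) for v in match.group('vals').split(',')]
    # R may write integer dimensions with an L suffix, e.g. 2L.
    dims = [int(d.strip().rstrip('L')) for d in match.group('dims').split(',')]
    return vals, dims


vals, dims = parse_structure('structure(c(1, 2, 3, 4, 5, 6), .Dim = c(2, 3))')
```

The values come back in R's column-major order, so something like `np.reshape(vals, dims, order='F')` would put them in the right place. The sketch does not handle range syntax such as `1:6`, empty vectors, or special values like `NA`.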

idx += 1
next_var = idx
var_data = ''.join(lines[start_idx:next_var]).replace('\n', '')
lhs, rhs = [_.strip() for _ in var_data.split('<-')]
Contributor

This is fine, just a small comment: using _ as the loop variable instead of a name like item seems weird to me.

Member Author

Agreed, fixed.

"""
data_dict = {}
with open(fname, 'r') as fp:
    lines = fp.readlines()
Contributor

This is fine; larger-than-RAM files are not really that common.

cmdstanpy/utils.py (resolved)
@mitzimorris
Member Author

I rewrote the rdump parsing logic and plugged it into the read_rdump_metric helper fn. @ahartikainen, if you have any free cycles, please recheck.

(this feature is as much of a PITA as everything else to do with R, but I can see its utility...)

@mitzimorris mitzimorris merged commit a20b051 into master Aug 9, 2019
@ahartikainen ahartikainen deleted the feature/89-rload-fn branch August 11, 2019 18:44
2 participants