-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Oslo method errors #187
base: master
Are you sure you want to change the base?
Oslo method errors #187
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
setup.py
Outdated
@@ -171,7 +171,8 @@ def write_version_py(filename='ompy/version_setup.py'): | |||
"uncertainties>=3.0.3", | |||
"tqdm", | |||
"pathos", | |||
"pybind11>=2.6.0" | |||
"pybind11>=2.6.0", | |||
"pymc3>=3.11.2,<4.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about making pymc3, pymulintest (...) optional later on? -> So one could have a leightweight version of ompy reading and plotting spectra.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've thought about the same. I'm not sure exactly how this is best solved. Maybe that could be the goal for next major version of the OMpy project?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made an issue. See #195.
error_estimator (ErrorFinder): The algorithm used to estimate the | ||
relative errors of the extracted NLD and γSF. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consistent naming: error_estimator
? -> ErrorEstimator
class...?
rel_err_missing (float): Relative error used for points that cannot be | ||
estimated by error_estimator object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when would that be the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes if the statistics are low some of the realizations may have different number of points in the extracted NLD and GSF (values that are not nan).
regenerate: Optional[bool] = None): | ||
"""Decompose each first generation matrix in an Ensemble | ||
|
||
If `regenerate` is `True` it saves the extracted nld and gsf to file, | ||
or loads them if already generated. Exposes the vectors in the | ||
attributes self.nld and self.gsf. | ||
|
||
If the error_estimator attribute or argument is set then relative | ||
errors will be estimated using the provided algorithm. Points in the | ||
nld and gsf that the algorithm are unable to estimate will be set to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: /Relative error/ of points (...) will be set to self.rel_err_missing
.
nld.save(self.path / f'nld_{i}.npy') # Overwrite with errors! | ||
gsf.save(self.path / f'gsf_{i}.npy') # Overwrite with errors! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the meaning of the inline comment? is this supposed to happen here (but not yet...)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regardless of attribute error_estimator
is set or not, the extracted NLD/GSF will be written to disk here: https://github.com/vetlewi/ompy/blob/1ab2618ec0c2b2e7b8c5935242183587e3451f00/ompy/extractor.py#L169-L170
If error_estimator
is set then these two lines will overwrite the generated files on the disk. Could change the comment to Update files on disk with errors
or something similar
This class uses pyMC3 to calculate the relative errors of the data points | ||
in the extracted NLDs and γSFs. The class implements two different models, | ||
one logarithmic and one linear. The logarithmic model will usually be more | ||
stable and should be used if the linear doesn't converge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add something about the article (to be submitted)
ompy/error_finder.py
Outdated
Tuple of nld energy points, gsf energy points and the observed | ||
matrix for the NLD and γSF. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the observed matrix for the nld and gsf? Outdated? The return in definitively different now
ompy/error_finder.py
Outdated
return vec_all_common | ||
|
||
def condition_data(self, _nlds: List[Vector], _gsfs: List[Vector], | ||
) -> Tuple[ndarray, ndarray, ndarray, ndarray]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return signature inconsistent (see below)?
ompy/error_finder.py
Outdated
else: | ||
return nld_rel_err, gsf_rel_err | ||
|
||
def keep_only(self, vecs: List[Vector]) -> List[Vector]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move to below condition_data
?
ompy/error_finder.py
Outdated
# Make a copy of the values arrays | ||
nld_array = [nld.values.copy() for nld in nlds] | ||
gsf_array = [gsf.values.copy() for gsf in gsfs] | ||
|
||
del nld_array[n] | ||
del gsf_array[n] | ||
|
||
nld_array = np.array(nld_array) | ||
gsf_array = np.array(gsf_array) | ||
|
||
# Set the observed data | ||
nld_obs.append(nld.values[idx_nld]/nld_array) | ||
gsf_obs.append(gsf.values[idx_gsf]/gsf_array) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This is somewhat funky. Why do you need to manually delete the array -- but after setting it by a copy?
I if I understand this correctly, it is because you want to get rid of the reference dataset. Maybe some comments would be appreciated here. - I'm very confused about this section:
Why/ what does it do? Why is it a devision (...)? Is it
# Set the observed data nld_obs.append(nld.values[idx_nld]/nld_array) gsf_obs.append(gsf.values[idx_gsf]/gsf_array)
q
in the article, and not the nld (gsf)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've refactored this entire part of the code. Should be much easier to follow now. Please see the format_data
function (https://github.com/vetlewi/ompy/blob/621c95c4797bf11fefd6a95a032c9296fc0c7054/ompy/error_finder.py#L282-L322)
ompy/error_finder.py
Outdated
nld_obs (ndarray): The observation tensor of the NLD | ||
gsf_obs (ndarray): The observation tensor of the γSF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where and everywhere else: Would it be better to say _ref and the reference tensor, instead of observed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've refactored. Should be more understandable.
ompy/error_finder.py
Outdated
nld_obs = np.array(nld_obs) | ||
gsf_obs = np.array(gsf_obs) | ||
|
||
idx_coef_nld = np.repeat(np.arange(N-1), M_nld).reshape(N-1, M_nld) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have got no idea what idx_coef_nld
is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idx_coef_nld
is just a masking/indexing that are used in the pyMC3
model to broadcast the coef_mask_nld
, values_mask_nld
, coef_mask_gsf
and values_mask_gsf
.
median (bool, optional): If the resulting relative errors should be | ||
the mean or the median. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm falling out here: What do you mean by mean or median? This does not seem to be described in the article?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What to adopt as the value for the relative uncertainty. The mean (average) or the median (50% percentile). This can probably be removed. If the model works as expected these should coincide, I think. If anyone wants to dive deeper into the results they should probably look at the samples directly.
mid = np.median if median else np.mean | ||
nld_rel_err = Vector(E=E_nld, values=mid(trace['σ_ρ'], axis=0), | ||
std=np.std(trace['σ_ρ'], axis=0), units='MeV') | ||
gsf_rel_err = Vector(E=E_gsf, values=mid(trace['σ_f'], axis=0), | ||
std=np.std(trace['σ_f'], axis=0), units='MeV') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned above, not quite sure about this step, yet.
Co-authored-by: Fabio Zeiser <[email protected]>
Co-authored-by: Fabio Zeiser <[email protected]>
Co-authored-by: Fabio Zeiser <[email protected]>
The prior parameters for σ_ρ and σ_f (`lam` & `mu`) can now be modified by the user.
I've added an attribute to make it simple to suppress warnings
A slightly different expression is used as it gives slightly better numerical accuracy at the extremes (q=0 and q=1).
6651bd0
to
3a72c2f
Compare
I've removed the linear version for now. I don't think I want to spend time on making sure everything works correctly just yet. Might come back at a later time when I'm confident that everything else are as it should be.
I've refactored the `ErrorFinder` class significantly to make it much less complex. It should be easier to follow the code and the paper.
The normalizers will automatically remove any nan's in the input GSF or NLD. This will ensure that the normalizers actually gives useful results. It should be considered if the user should be informed if any nan's were removed. Maybe add best solved by adding an optional `logger` argument to the `cut_nan()` method that will be used to inform the user if there is nans in the vector?
This PR introduces a new method for estimating the relative errors of the individual points of the NLD and gSF. A more detailed description on how it works, etc. will be made available once ready for merge.
The PR is still work in progress