Oslo method errors #187

vetlewi · 2021-09-27T11:46:54Z

This PR introduces a new method for estimating the relative errors of the individual points of the NLD and gSF. A more detailed description on how it works, etc. will be made available once ready for merge.

The PR is still work in progress

review-notebook-app · 2021-09-27T11:46:57Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

ompy/dist/fermi_dirac.py

ompy/dist/fermi_dirac_pymc4.py

fzeiser · 2021-09-29T13:25:03Z

setup.py

@@ -171,7 +171,8 @@ def write_version_py(filename='ompy/version_setup.py'):
 "uncertainties>=3.0.3",
 "tqdm",
 "pathos",
- "pybind11>=2.6.0"
+ "pybind11>=2.6.0",
+ "pymc3>=3.11.2,<4.0"


What do you think about making pymc3, pymulintest (...) optional later on? -> So one could have a leightweight version of ompy reading and plotting spectra.

I've thought about the same. I'm not sure exactly how this is best solved. Maybe that could be the goal for next major version of the OMpy project?

Made an issue. See #195.

release_note.md

fzeiser · 2021-09-29T13:28:23Z

ompy/extractor.py

+        error_estimator (ErrorFinder): The algorithm used to estimate the
+            relative errors of the extracted NLD and γSF.


Consistent naming: error_estimator? -> ErrorEstimator class...?

fzeiser · 2021-09-29T13:29:26Z

ompy/extractor.py

+        rel_err_missing (float): Relative error used for points that cannot be
+            estimated by error_estimator object.


when would that be the case?

Sometimes if the statistics are low some of the realizations may have different number of points in the extracted NLD and GSF (values that are not nan).

fzeiser · 2021-09-29T13:30:44Z

ompy/extractor.py

                     regenerate: Optional[bool] = None):
        """Decompose each first generation matrix in an Ensemble

        If `regenerate` is `True` it saves the extracted nld and gsf to file,
        or loads them if already generated. Exposes the vectors in the
        attributes self.nld and self.gsf.

+        If the error_estimator attribute or argument is set then relative
+        errors will be estimated using the provided algorithm. Points in the
+        nld and gsf that the algorithm are unable to estimate will be set to


Suggestion: /Relative error/ of points (...) will be set to self.rel_err_missing.

fzeiser · 2021-09-29T13:33:02Z

ompy/extractor.py

+                nld.save(self.path / f'nld_{i}.npy')  # Overwrite with errors!
+                gsf.save(self.path / f'gsf_{i}.npy')  # Overwrite with errors!


what is the meaning of the inline comment? is this supposed to happen here (but not yet...)?

Regardless of attribute error_estimator is set or not, the extracted NLD/GSF will be written to disk here: https://github.com/vetlewi/ompy/blob/1ab2618ec0c2b2e7b8c5935242183587e3451f00/ompy/extractor.py#L169-L170

If error_estimator is set then these two lines will overwrite the generated files on the disk. Could change the comment to Update files on disk with errors or something similar

ompy/error_finder.py

fzeiser · 2021-09-29T13:40:00Z

ompy/error_finder.py

+    This class uses pyMC3 to calculate the relative errors of the data points
+    in the extracted NLDs and γSFs. The class implements two different models,
+    one logarithmic and one linear. The logarithmic model will usually be more
+    stable and should be used if the linear doesn't converge.


Add something about the article (to be submitted)

fzeiser · 2021-09-29T13:46:05Z

ompy/error_finder.py

+            Tuple of nld energy points, gsf energy points and the observed
+            matrix for the NLD and γSF.


What is the observed matrix for the nld and gsf? Outdated? The return in definitively different now

fzeiser · 2021-09-29T13:46:25Z

ompy/error_finder.py

+        return vec_all_common
+
+    def condition_data(self, _nlds: List[Vector], _gsfs: List[Vector],
+                       ) -> Tuple[ndarray, ndarray, ndarray, ndarray]:


Return signature inconsistent (see below)?

ompy/error_finder.py

fzeiser · 2021-09-29T13:53:18Z

ompy/error_finder.py

+        else:
+            return nld_rel_err, gsf_rel_err
+
+    def keep_only(self, vecs: List[Vector]) -> List[Vector]:


Move to below condition_data?

ompy/error_finder.py

fzeiser · 2021-09-29T14:12:45Z

ompy/error_finder.py

+            # Make a copy of the values arrays
+            nld_array = [nld.values.copy() for nld in nlds]
+            gsf_array = [gsf.values.copy() for gsf in gsfs]
+
+            del nld_array[n]
+            del gsf_array[n]
+
+            nld_array = np.array(nld_array)
+            gsf_array = np.array(gsf_array)
+
+            # Set the observed data
+            nld_obs.append(nld.values[idx_nld]/nld_array)
+            gsf_obs.append(gsf.values[idx_gsf]/gsf_array)


This is somewhat funky. Why do you need to manually delete the array -- but after setting it by a copy?
I if I understand this correctly, it is because you want to get rid of the reference dataset. Maybe some comments would be appreciated here.

I'm very confused about this section:
# Set the observed data nld_obs.append(nld.values[idx_nld]/nld_array) gsf_obs.append(gsf.values[idx_gsf]/gsf_array)
Why/ what does it do? Why is it a devision (...)? Is it q in the article, and not the nld (gsf)?

I've refactored this entire part of the code. Should be much easier to follow now. Please see the format_data function (https://github.com/vetlewi/ompy/blob/621c95c4797bf11fefd6a95a032c9296fc0c7054/ompy/error_finder.py#L282-L322)

fzeiser · 2021-09-29T14:16:17Z

ompy/error_finder.py

+            nld_obs (ndarray): The observation tensor of the NLD
+            gsf_obs (ndarray): The observation tensor of the γSF


Where and everywhere else: Would it be better to say _ref and the reference tensor, instead of observed?

I've refactored. Should be more understandable.

fzeiser · 2021-09-29T14:20:29Z

ompy/error_finder.py

+        nld_obs = np.array(nld_obs)
+        gsf_obs = np.array(gsf_obs)
+
+        idx_coef_nld = np.repeat(np.arange(N-1), M_nld).reshape(N-1, M_nld)


I have got no idea what idx_coef_nld is

The idx_coef_nld is just a masking/indexing that are used in the pyMC3 model to broadcast the $D$, $F$ and $\alpha$ parameters to the correct shape. I've renamed these as coef_mask_nld, values_mask_nld, coef_mask_gsf and values_mask_gsf.

fzeiser · 2021-09-29T14:23:41Z

ompy/error_finder.py

+            median (bool, optional): If the resulting relative errors should be
+                the mean or the median.


I'm falling out here: What do you mean by mean or median? This does not seem to be described in the article?

What to adopt as the value for the relative uncertainty. The mean (average) or the median (50% percentile). This can probably be removed. If the model works as expected these should coincide, I think. If anyone wants to dive deeper into the results they should probably look at the samples directly.

ompy/error_finder.py

fzeiser · 2021-09-29T14:31:11Z

ompy/error_finder.py

+        mid = np.median if median else np.mean
+        nld_rel_err = Vector(E=E_nld, values=mid(trace['σ_ρ'], axis=0),
+                             std=np.std(trace['σ_ρ'], axis=0), units='MeV')
+        gsf_rel_err = Vector(E=E_gsf, values=mid(trace['σ_f'], axis=0),
+                             std=np.std(trace['σ_f'], axis=0), units='MeV')


As mentioned above, not quite sure about this step, yet.

ompy/error_finder.py

release_note.md

Co-authored-by: Fabio Zeiser <[email protected]>

The prior parameters for σ_ρ and σ_f (`lam` & `mu`) can now be modified by the user.

I've added an attribute to make it simple to suppress warnings

A slightly different expression is used as it gives slightly better numerical accuracy at the extremes (q=0 and q=1).

I've removed the linear version for now. I don't think I want to spend time on making sure everything works correctly just yet. Might come back at a later time when I'm confident that everything else are as it should be.

I've refactored the `ErrorFinder` class significantly to make it much less complex. It should be easier to follow the code and the paper.

The normalizers will automatically remove any nan's in the input GSF or NLD. This will ensure that the normalizers actually gives useful results. It should be considered if the user should be informed if any nan's were removed. Maybe add best solved by adding an optional `logger` argument to the `cut_nan()` method that will be used to inform the user if there is nans in the vector?

fzeiser added needs release note Suggestion Suggestion for new feature/changes labels Sep 29, 2021