Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

issue #41 add error layer #45

Merged
merged 5 commits into from
Mar 4, 2024
Merged

issue #41 add error layer #45

merged 5 commits into from
Mar 4, 2024

Conversation

JuergenSchreiber
Copy link
Contributor

I added a function add_error_term to the Image class, treat 3d error arrays including header update

Copy link
Contributor

@semaphoreP semaphoreP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this, Juergen! Because this is such a vital part of the code/backbone, I'm scrutinizing it a bit more than normal. I added a few comments to clean up the code for the future.

I also think we need to make a design choice with @maxwellmb and possibly others. Do we want to be able to handle both 2-D and 3-D error formats, or do we require all corgidrp data to have 3-D error formats. I'm in favor of the latter just so we simplify which use cases we need to handle.

corgidrp/data.py Outdated Show resolved Hide resolved
corgidrp/data.py Outdated Show resolved Hide resolved
corgidrp/data.py Show resolved Hide resolved
corgidrp/data.py Outdated

new_err = np.resize(err, tuple(sh))
#append new error as layer on 3D cube
new_err[sh[0] - 1,:, :] = error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think np.append(self.err, [error], axis=0) is shorter and makes the code more readable than the several lines above. Also please avoid naming things like err = self.err and having an input named error. It makes it difficult to track which variable belongs to which. I would suggest just to use self.err whenever possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use input_error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will look into this. My problem was that np.append only takes arrays with the same dimension, but I try again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the little snippet of code I provided above should work. The [error] should give the extra 3rd dimension that makes the shapes match.


#test copy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at this point, it would be useful to break up this giant test into several smaller unit tests: instead of one test_error_dq, we can have a test_err_dq_creation, test_copy, test_add_errors, test_dataset_err_dq, test_masked_arr. It would be cleaner, and also make the tests easier to read.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind also comment it up a bit so that its clear what the checks are all meant to be checking?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will update that.

corgidrp/data.py Outdated
#append new error as layer on 3D cube
new_err[sh[0] - 1,:, :] = error

comb_err = np.sqrt(new_err[0,:,:]**2 + error**2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about this for a bit. There are cases where we're going to divide by something (e.g. a flat field, the K-gain, even the EMGain) with an error on them as well and so when combining things like that we won't be able to manipulate the errors directly like this. How do we want to handle that? Do we need separate functions? How do we track the errors in that case? Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose one way is to have a "Layer_N Type" (or something smaller) header keyword that clarifies it. and if we ever need to re-build the combined_error then we also read the type.

Copy link
Contributor Author

@JuergenSchreiber JuergenSchreiber Feb 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a very valuable comment. We left that out for the sake of simplicity. I have to think about it. To add a multiplicative error we would need not only the error of the data (e.g. flat) but also the data itself to calculate the error propagation, so we would need more optional parameters. The other way would be to do this directly in the e.g. flatfield step and use this function only for additive errors. We will also have the case where the signal/data is multiplied with a constant (e.g. GAIN, Flux calibration), also the errors should be multiplicated with that constant. What about to have functions for all these cases: add_additive_error, add_multiplicative_error, multiply_error..., just brainstorming. Error propagation is generally a complex thing (JWST pipeline hasn't solve it correctly up to now in my opinion)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like there are possibly a few different things going on:

  1. If we change the units of the data (e.g., by a scale factor), the errors should scale too. I think the function that does this scaling should obviously also change the units of the errors. If it is just a single scale factor, the error on the scale factor doesn't change the SNR of the data. It merely changes the "photometric calibration" of the units. I think we should keep the error map to track the SNR of the data. And we can use header keywords, or something else to track errors that need to be accounted for in the photometric calibration.
  2. For the flat field correction, you can imagine each pixel has a different error on the multiplicative factor. However, we can rewrite a multiplicative error as just an error to be added in quadrature (via standard error propagation equations) with no loss in information. Thus, I don't think this requires special handling.
  3. For something like gain, the error on the gain affects the photon noise calculation. I would say the error on the photon noise error is probably small enough for us to not keep track of, but it certainly could just be an extra slice in the error map.

Overall, it feels like to me we can handle it in our current framework. Additionally, I think these are probably small things that we shouldn't focus on now. I'd rather cross these bridges in the future, after we get the main features of the DRP complete.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a good table of error propagation for different functions (the covariances can be neglected) in:
https://en.wikipedia.org/wiki/Propagation_of_uncertainty
I think I just keep the additive function as is for now and we come to more complex things if necessary

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the conversation on this. I agree that we can make it work in the current framework. Users will have to make sure to treat everything correctly as we go forward within the individual functions. Including making sure to multiply/divide the error terms appropriately and add the error terms. With the current framework, we'd have to keep track of that in them primary header's history only, as the error headers aren't accessed via e.g. the update_after_processing_step function. I think that's probably ok though. @JuergenSchreiber, @JuergenSchreiber any last thoughts on that?

@JuergenSchreiber
Copy link
Contributor Author

Thanks for doing this, Juergen! Because this is such a vital part of the code/backbone, I'm scrutinizing it a bit more than normal. I added a few comments to clean up the code for the future.

I also think we need to make a design choice with @maxwellmb and possibly others. Do we want to be able to handle both 2-D and 3-D error formats, or do we require all corgidrp data to have 3-D error formats. I'm in favor of the latter just so we simplify which use cases we need to handle.

Thanks for reviewing, the question 2 or 3-D, I think now both is possible, but I find it a bit awkward as a users to have a 3D error array on a 2D signal array to initialize it.

@JuergenSchreiber
Copy link
Contributor Author

I pushed my corrections of add_error, so please review again.
I kept add_error_term() only for additive errors for now, I think we can taggle the other sorts of error propagation as needed.
A bit annoying is that I had to keep the possibility of a 3d err input due to the copying functionality. But I could change this and do image.err = new_err in the copy function. What do you think.

Copy link
Contributor

@semaphoreP semaphoreP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies.. I think I lead you in the wrong direction saying we should only accept a 2-D error array, because I forgot the copy command uses the same constructor. I think we should actually go in a different direction and make the init() function bigger and accept the err/dq arrays as well (so that the copy command is more natural).

corgidrp/data.py Outdated
@@ -296,6 +306,8 @@ def copy(self, copy_data=True):
# annoying, but we got to manually update some parameters. Need to keep track of which ones to update
new_img.filename = self.filename
new_img.filedir = self.filedir
new_img.errhd = self.errhd
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm ok. I forgot about the fact copy() needs to access the __init__() function. Perhaps I suggested going in the wrong direction here.

First, I think we should copy() the err/dq headers as well for completeness.
Second, I think given we have to set the err/dq headers anyways, we might as well make the err/dq headers optional arguments into the Image.__init__() function.

Also, I'm wondering how the copy() code works when you pass in an 3-D error, but __init__ assumes it is 2-D.

tests/test_err_dq.py Show resolved Hide resolved
@JuergenSchreiber
Copy link
Contributor Author

Thanks for your comments. I tried to update accordingly, so please check again. err_hdr, dq_hdr is now an optional parameter.

Copy link
Contributor

@semaphoreP semaphoreP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making those changes! I think it looks good.

@semaphoreP semaphoreP merged commit d517d26 into main Mar 4, 2024
1 check passed
@semaphoreP semaphoreP deleted the add_error branch March 4, 2024 05:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants