Add .seq format for DE 16 and Celeritas Camera #11
base: main
Conversation
…e and from the newer software packaged with the celeritas camera
…andling and xml reading
…reased test coverage slightly
…int statements and improved error message
Codecov Report
Patch coverage:
Additional details and impacted files:

@@            Coverage Diff             @@
##             main      #11      +/-   ##
==========================================
+ Coverage   84.95%   85.15%   +0.20%
==========================================
  Files          73       75       +2
  Lines        8894     9250     +356
  Branches     1955     2022      +67
==========================================
+ Hits         7556     7877     +321
- Misses        876      895      +19
- Partials     462      478      +16

View full report in Codecov by Sentry.
@sk1p I know we have talked about adding support for the DE Celeritas camera to LiberTEM and HyperSpy. If you have the chance, can you look over this PR? The hardest part is dealing with the Segment prebuffer for the Celeritas camera. I wanted to add support for distributed scheduling using the scheme proposed by @uellue here, but due to the nature of the prebuffer the data isn't evenly spaced in the binary file. This makes implementing it in a general way fairly difficult.
I can have a look - I'd also like to try this with real data; did you manage to upload some to the drop link I gave you some time ago? In general, what is this project's stance on testing with real input data? It could be possible to publish a set of (small-ish) reference data sets on e.g. Zenodo and download those in CI runs.
Yeah - in case of uneven spacing, it's probably required to do a sparse search pass over the data, for example by reading the image headers at N positions in the whole data set and mapping out where it can be split - if I understood you correctly. Or is the coarse structure evenly spaced, i.e. is it possible to calculate offsets to images just from their index? Anyways, instead of just a straight
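For illustration, a minimal sketch of such a sparse search pass; read_frame_header is a hypothetical helper (not part of this PR) that scans from a byte offset to the next valid frame header and returns its offset and frame index; the real parsing depends on the .seq layout:

import numpy as np

def map_split_points(filename, file_size, n_probes, read_frame_header):
    # Probe the file at n_probes roughly even positions and record where
    # complete frames begin, so the data set can be partitioned there.
    split_points = set()
    for probe in np.linspace(0, file_size, n_probes, endpoint=False):
        header_offset, frame_index = read_frame_header(filename, int(probe))
        split_points.add((frame_index, header_offset))
    return sorted(split_points)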
Right now the data is all hosted in the tests/de_data/celeritas_data folder. There are small-ish (1–20 MB) datasets collected using a couple of different camera modes. These are probably the best data to use for testing.
We try to test with real input data as often as we can. That being said, the data is included with the package, and it might be better to host it somewhere else eventually. I was meaning to create an issue regarding this.
So the data is structured like this:
Any examples of how you do this? Can you just create a function that maps a frame to an offset in the data and then just apply it?
The long-term idea is to host the files in the repo but exclude them from the installation, so they would just be downloaded on demand. I don't remember the name of the package that can do this @ericpre. But it would be a good idea to create an issue to put it on the todo list.
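One package that supports this download-on-demand pattern is, for example, pooch (whether that is the package meant here is an assumption). A minimal sketch; the cache name, base_url, and file names are placeholders:

import pooch

TEST_DATA = pooch.create(
    path=pooch.os_cache("rsciio"),  # local cache directory (placeholder name)
    base_url="https://example.com/rsciio-test-data/",  # placeholder URL
    registry={"celeritas_data/example.seq": None},  # None skips hash checking
)

def fetch_test_file(name):
    # Downloads on first access, then serves the cached copy.
    return TEST_DATA.fetch(name)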
rsciio/utils/tools.py
Outdated
def parse_xml(file):
    try:
        tree = ET.parse(file)
        xml_dict = {}
        for i in tree.iter():
            xml_dict[i.tag] = i.attrib
        # clean_xml: unwrap single "Value" attributes, casting to float where possible
        for k1 in xml_dict:
            for k2 in xml_dict[k1]:
                if k2 == "Value":
                    try:
                        xml_dict[k1] = float(xml_dict[k1][k2])
                    except ValueError:
                        xml_dict[k1] = xml_dict[k1][k2]
    except FileNotFoundError:
        _logger.warning(
            msg="File " + file + " not found. Please "
            "move it to the same directory to read "
            "the metadata."
        )
        return None
    return xml_dict
Does it make sense to have this as a generic utility function? The flattening, cleaning and conversion performed here seems to be specific to the DE metadata XML format. Any reason not to re-use convert_xml_to_dict instead?
I see that convert_xml_to_dict doesn't cope well with human-readable XML, which can have both a .text and child nodes, so that may need fixes before it can be used.
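For illustration, a minimal example of the kind of element that trips this up (hypothetical, not taken from the DE metadata): an element carrying both its own .text and child nodes, where an attribute-only flattening drops part of the information:

import xml.etree.ElementTree as ET

# "Human-readable" XML: the <Sensor> element has both text and a child node.
root = ET.fromstring(
    "<Sensor>\n"
    "  DE-16\n"
    "  <Gain Value='1.5'/>\n"
    "</Sensor>"
)
print(repr(root.text))           # '\n  DE-16\n  ' (text, with layout whitespace)
print(root.find("Gain").attrib)  # {'Value': '1.5'} (child attributes)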
rsciio/de/api.py
Outdated
ImageBitDepth: int
    The bit depth of the image. This should be 16 in most cases
TrueImageSize: int
    The size of each frame buffersin bytes. This includes the time stamp and
Suggested change:
- The size of each frame buffersin bytes. This includes the time stamp and
+ The size of each frame buffer in bytes. This includes the time stamp and
rsciio/de/api.py
Outdated
top_mapped = np.memmap(top, offset=offset, dtype=dtypes, shape=total_buffer_frames)
bottom_mapped = np.memmap(
    bottom, offset=offset, dtype=dtypes, shape=total_buffer_frames
)

if lazy:
    top_mapped = da.from_array(top_mapped)
    bottom_mapped = da.from_array(bottom_mapped)

array = np.concatenate(
    [
        np.flip(
            top_mapped["Array"].reshape(-1, *top_mapped["Array"].shape[2:]), axis=1
        ),
        bottom_mapped["Array"].reshape(-1, *bottom_mapped["Array"].shape[2:]),
    ],
    1,
)
Any examples of how you do this? Can you just create a function that maps a frame to an offset in the data and then just apply it?
I think this was the main point you were asking about. It's more of a mapping from a chunk slice to an array. Each array chunk is created from both the top and bottom memory maps, which are only created inside the delayed function. Structured according to the dask docs on memory mapping, it could look like this (sketch, untested):
def mmap_load_chunk(top, bottom, shape, dtype, offset, sl):
    # Memory-map both detector halves; only the frames selected by `sl` are read.
    top_map = np.memmap(top, mode='r', shape=shape, dtype=dtype, offset=offset)["Array"]
    top_flat = top_map.reshape(-1, *top_map.shape[2:])
    top_sliced = top_flat[sl]
    bottom_map = np.memmap(bottom, mode='r', shape=shape, dtype=dtype, offset=offset)["Array"]
    bottom_flat = bottom_map.reshape(-1, *bottom_map.shape[2:])
    bottom_sliced = bottom_flat[sl]
    # Flip the top half vertically and stack it above the bottom half.
    return np.concatenate([
        np.flip(top_sliced, axis=1),
        bottom_sliced,
    ], 1)
(with mmap_dask_array adjusted accordingly)
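A possible adjustment, assuming n_frames, frame_shape (height and width of one concatenated top+bottom frame) and frame_dtype are known from the file headers; a sketch under those assumptions, not the final API:

import dask
import dask.array as da

def mmap_dask_array(top, bottom, shape, dtype, offset,
                    n_frames, frame_shape, frame_dtype, blocksize=32):
    # Build the frame axis lazily, blocksize frames per chunk; each chunk
    # calls mmap_load_chunk above, so the memmaps live inside the dask task.
    load = dask.delayed(mmap_load_chunk)
    chunks = []
    for start in range(0, n_frames, blocksize):
        stop = min(start + blocksize, n_frames)
        chunks.append(
            da.from_delayed(
                load(top, bottom, shape, dtype, offset, slice(start, stop)),
                shape=(stop - start, *frame_shape),
                dtype=frame_dtype,
            )
        )
    return da.concatenate(chunks, axis=0)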
Does this make sense?
As the np.concatenate + np.flip operation does touch a sizable chunk of data, it may be more efficient to replace it with a numba function that also inlines the application of dark_img/gain_img, in addition to the flipping/concatenation, for cache efficiency.
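For illustration, a minimal sketch of such a fused kernel; the (raw - dark) * gain arithmetic, the (n_frames, h, w) half shapes, and the full-height (2*h, w) correction images are all assumptions, not confirmed by this PR:

import numba
import numpy as np

@numba.njit(parallel=True, cache=True)
def assemble_corrected(top, bottom, dark_img, gain_img, out):
    # top, bottom: (n_frames, h, w) raw halves; dark_img, gain_img: (2*h, w);
    # out: (n_frames, 2*h, w). One pass flips the top half, stacks the bottom
    # half below it, and applies the correction while the data is hot in cache.
    n, h, w = top.shape
    for f in numba.prange(n):
        for y in range(h):
            for x in range(w):
                out[f, y, x] = (top[f, h - 1 - y, x] - dark_img[y, x]) * gain_img[y, x]
                out[f, h + y, x] = (bottom[f, y, x] - dark_img[h + y, x]) * gain_img[h + y, x]
    return out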
…on-distributed implementations.
Formatting: Applied `black`
…code for easier readability.
…irectly point to files instead of using glob.
…it__` to align with hyperspy#62
# Conflicts:
#   docs/supported_formats/de.rst
#   rsciio/de/specifications.yaml
#   rsciio/tests/test_de.py
#   rsciio/utils/tools.py
Description of the change
This adds support for reading the DE 16 and Celeritas cameras.
Some notes about the file format:
* DE 16:
* Celeritas:
Progress of the PR
- [ ] Add DE 16 support for saving (Potentially?)
- [ ] Entry added in the upcoming_changes folder (see upcoming_changes/README.rst)
- [ ] Check the readthedocs doc build of this PR (link in github checks)
- [ ] Minimal example of the bug fix or the new feature