-
-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add option to return Reader for PixelData element #224
Comments
Interesting use case! I'm not sure if we could do this safely without leaving the DICOM file/input io.Reader open. E.g. you'd call parse, and under this option parse would finish, but one of the elemnts would have an io.Reader to read in the whole pixeldata? Maybe this could be done on demand (e.g. with a file offset stored in that element, so when you read from the reader, we reopen the file and start fetching the reader the PixelData bytes only--this gets complex when input isn't a file, but a generic io.Reader, which is what the current API is)? Or maybe with this option, the input io.Reader is open till the user exhausts the PixelData io.Reader? I think it could be workable, but a bit complex with some weird edge cases I'll need to think about. One far out there option could be (under this special hypothetical option) to allow the user to register a special processing function for PixelData (or maybe any tag). Maybe something like
Not sure if I love this though, and I don't think we could allow it for just any tag, as some elements depend on prior elements to parse correctly. Another option that would save you the CPU (but not the memory) is an option that just copies the PixelData bytes into memory without any processing at all, and you could grab it from that special tag during your workflow. Also, note that this library does support an element by element API with ParseNext() (and a streaming API for PixelData Frames), but still maintains a history of all elements read so far in memory (unfortunately, this is necessary in some cases to "lookback" at previous elements, namely for PixelData and I think some Sequence parsing). It's possible this could be revisited carefully in a "memory light" option / case, but I'll need to think more about what that would look like. Finally, and I'm sure you know this already, but if you're doing a true anonymization workflow, you probably would want to look at the images, because sometimes identifiable patient information can be in the images. |
Feel free to suggest an API if you had something specific in mind. Sounds like you were looking for something like I initially mentioned--some API where the returned element from Parser.Next() has an io.Reader to the PixelData you can use to dump the data to your output file. As mentioned, closing the input correctly in a system like this would be something to think about. In situations where the element-by-element Parser API is used directly, I think we could just leave that up to the user. There are some edge cases though, like what if some other thread advances the input reader the PixelData reader was supposed to read from, etc so I'm not sure how safe this is. Probably easiest thing to do for your case in short term is:
|
I saw there was a PR I think that may be related to this, and put some more thoughts there as well: #223 (review) |
We did introduce a SkipPixelData option, which skips the PixelData entirely (doesn't even load it into memory). However, for a use case like this where some tags are modified and others are to be written back identically, we can maybe introduce another option like dicom.NoParseValue(tag1, tag2, tag3) which will still read in the data into the Dataset, but will try to do as little parsing/processing of the value as possible to save CPU. In some cases it may be trickier to not do any parsing at all (e.g. in nested sequences etc), but at least for Native PixelData there would be a speed up. We can start by implementing support for PixelData first, and then go from there to possibly supporting other tags... WDYT? |
Hi there! I have a draft PR #256 that addresses this to some degree--it skips the processing of NativePixelData in favor of just blindly loading the data into memory (so that it can be roundtripped out later). Ultimately I think the right long term solution might be to instead have an option that indicates only certain elements should be processed, or a lazy load type of situation. However, for now, it seems like this is really most important for PixelData so I figure it is good to start there. |
…rving roundtrip read/write (#256) This addresses _some_ of the discussion in #224. The SkipProcessingPixelDataValue option causes the PixelData bytes to be read into the element, but not processed any further. This option provides an option to save the CPU cycles processing NativePixel data but still gives a roundtrip-able Dataset. If you want to save both CPU and Memory, you can instead use the SkipPixelData option, which doesn't load the PixelData bytes into memory at all. In the future, we should consider an option to lazy load/process all Element values if possible.
We currently are using DCMTK to
PatientBirthDate
andPatientName
with non-identifiable value.Current issue is 1+ Gb DICOMs which encode a large sequence of frames. It causes an unreasonable amount of CPU and RAM to read
PixelData
s from such DICOMs, simply to write them to an anonymised file without any processing.This library could be a great fit for anonymisation workflow if, for example, it would allow to It would be great if this library supported returning an
io.Reader
for the wholePixelData
element, so that non-image elements could be interpreted but image elements could be streamed to output file without change.The text was updated successfully, but these errors were encountered: