-
-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor/tag info codegen #169
base: main
Are you sure you want to change the base?
Refactor/tag info codegen #169
Conversation
Thank you @bpeake-illuscio @peake100, apologies for this slipping through the cracks. I think much of this is on the right track. I'm going to take a closer look, and may fork your branch to add commits to make various changes and updates in the next couple of weeks (unless you'd like to make them yourself, but I figure you may be busy no pressure at all). Thank you again for all your contributions! |
Hey! Thanks so much for reaching out. I'm no longer at Illuscio, so this is my current user. I probably don't have the bandwidth for this right now, but I appreciate you reaching out and am happy to field questions about what I did if you have any! |
Sounds good, and totally understood! Will do :) Thanks again! |
This PR is resolving #147. It updates the generate_tag_definitions.py script to read the DICOM standard JSON dump from the Innolitics repo. They provide a few extra fields, so we introduce them in Tag definition, namely, Keyword and Retired. Keyword represents what Name used to represent, see comments on the field definitions for details. Innolitics also captures all possible VRs for tags that allow multiple VR values, so I updated the writer verification code to consider that. That ended up resolving #299 as well. This is a breaking change because some tag variables have been deleted or renamed. In particular, the command tags (group 0000) have been all deleted since Innolitics doesn't provide them. I thought that was okay because these are only related to DIMSE and should never be present in a DICOM file. I can put them back somehow if you think we should keep them, but maybe we could sort that out if we ever work on #181. But mostly, tags were added or unchanged. I ended up writing this from scratch and not branching from #169 because I thought that was a little bit too complicated for this. But we could reconsider that design if there are requests for supporting private data dictionaries. Misc: fixes some warnings (i.e. removed deprecated ioutil, etc.), added a helper Uint32 method.
Hello!
I have taken a crack here at #147. I'll break this summary into sections where I discuss the reasoning behind certain choices.
Overview of Changes
Innolitics Git Submodule
issue #147 mentions using the JSON dump found in the Innolitics Dicom Standard repo. Rather than downloading and copy/pasting the attributes.json, I decided to add the entire repo as a git submodule.
The submodule lives in /pkg/tag/dicom-standard, and here arre the tradeoffs of using this approach, IMO:
Pros
Cons
Overall I feel the pros are worth the cons.
Codegen migrated from Python to Go
I took the opportunity to move the codegen feature from Python to Go. This lowers the barrier to entry for any developers that wish to make PRs around the codgen feature, but may not know python. Go compiles quickly enough that I do not see this being an issue, especially with how few and far between regeneration of the code is going to be need.
The codgen tool now lives in /pkg/tag/codgen.
I'll go into more about the structure of this new codegen below.
Added Keyword-Indexed Map
Added tagDictByKeyword that indexes tag.Info objects by Info.Name, allowing us to do a map-lookup when calling tag.FindByName, rather than iterating over tagDict.
Added Codegen to Makefile
You can now type make codegen to run go generate. Since there were already make commands for tests I figured I would add one for this too.
Challenges
The main challenge was that the Innolitics JSON dump only contains tags for the DICOM file spec, and does not include tags for 0x0000 Group, which is used by the DIMSE / Dicom-Net spec.
Fo this PR, I have retained this group from the original CSV currently being used for code generation, and paired it down to just items in group 0x0000. However, this meant reading from multiple sources of truth, and the architecture of the code reflects that.
New Codgen Architecture.
Interfaces
The new codegen module relies on two interfaces to do the Heavy lifting:
TagReader
This interface reads tag information from a source file. A master implementation of TagReader is defined to pull from multiple TagReaders and return parsed tags until all child readers are exhausted. This means that if we add multiple sources, we can just define a new TagReader and register it.
CodeWriter
The CodeWriter interface is responsible for writing the codegen for a single go file. A Master implementation of this interface distributes each event to all child CodeWriters.
Basic Flow
The above interfaces allow the main flow to be simple:
This architecture allows us to add new generated files or new tag metadata sources by defining and registering new readers and writers.
Open Questions
ACR_NEMA_2C_CoefficientsSDDN
. These are not present in the Innolitics jump and I am afraid I am not quite sure where they come from. I can copy and paste them all from the old CSV into the current dimse.csv, but I would like to understand more about what those values are, why their keywords conform to a different convention, etc.Looking Forward
If you like this approach, I think I would like to start adding some more information into the tag.Info struct that is currently not being captured or fully parsed. Some possibilities include:
Retired bool
field to tag.Info, and including them.'US or SS'
. Right now I am pulling the second value, as that aligned with what the previous CSV did, but I think we should consider capturing both for completeness, especially if users want to use this lib to validate DICOM writes.Let me know what you think of this PR. If you like it, I think all of the above will be pretty low-hanging fruit and I am happy to knock them out in a series of PRs that build on this one.