-
Notifications
You must be signed in to change notification settings - Fork 1
Add log stream name mapping support #589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Previously, f144 log data used Kafka source names directly as stream names (e.g., 'BIFRO-DtCar:MC-RotZ-01:Mtr.RBV'). This change adds support for mapping these source names to friendlier internal names (e.g., 'detector_rotation'), consistent with how detectors and monitors are handled. Changes: - Replace `log_topics` parameter with `logs: StreamLUT` in StreamMapping - Add `_make_dev_logs` helper in _ess.py for dev mode - Update `make_dev_stream_mapping` to accept `log_names` parameter - Add Bifrost log mappings for detector_rotation and sample_rotation - Update dummy instrument to include log mappings for timeseries tests - Add tests for StreamMapping.log_topics property Original prompt: I am trying to update bifrost/specs.py to use the real values in f144_attribute. The real source_name we need is something found in the file (extracted info) - for example, BIFRO-DtCar:MC-RotZ-01:Mtr.RBV. Does this support also name mapping for NXlog (f144)? OR is the f144_attribute_registry setup insufficient? Or is it independent, applied after stream-mapping? Follow-up: I think we should take option B. Please implement this in a new branch. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add tooling to help generate f144_log_streams configuration from NeXus files:
- Add `units` field to StreamInfo dataclass, extracted from 'value' datasets
- Add `suggest_internal_name()` to derive internal names from group paths
- Add `filter_f144_streams()` to filter by writer module, topic, and patterns
- Add `generate_f144_log_streams_code()` to generate Python dict code
- Add CLI options: --f144, --topic, --exclude, --generate, --var-name
Refactor bifrost configuration to use combined f144_log_streams dict:
- Define `f144_log_streams` mapping internal name -> {source, units}
- Derive `f144_attribute_registry` from `f144_log_streams`
- Derive StreamLUT in `_make_bifrost_logs()` from `f144_log_streams`
This keeps source names, internal names, and units co-located in one place.
Example usage:
python -m ess.livedata.nexus_helpers file.hdf --generate --topic bifrost_motion
Original prompt: Can you extend the nexus_helpers.py script to also extract
and list units? Or, going beyond that, it would be good if we could turn this
into a better tool to generate code for attr registry and log_source_mapping.
Is having the two in two parts convenient?
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds support for mapping Kafka f144 log stream source names (e.g., EPICS PV names) to internal stream names in ESSlivedata, making the system consistent with how detectors and monitors are already handled.
Key changes:
- Modified
StreamMappingto accept alogs: StreamLUTparameter instead oflog_topics: set, withlog_topicsbecoming a computed property - Enhanced
nexus_helpers.pywith units extraction from NeXus files and CLI tools for generating f144 stream configuration code - Added comprehensive Bifrost f144 log stream mappings (24 streams total) for motion and sample environment data
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
src/ess/livedata/kafka/stream_mapping.py |
Replaced log_topics parameter with logs: StreamLUT, making log_topics a computed property consistent with other stream types |
src/ess/livedata/nexus_helpers.py |
Added units extraction to StreamInfo, new helper functions (suggest_internal_name, filter_f144_streams, generate_f144_log_streams_code), and enhanced CLI with argparse for code generation |
src/ess/livedata/config/instruments/bifrost/specs.py |
Added comprehensive f144_log_streams dictionary with 24 entries for motion and sample environment streams, deriving f144_attribute_registry from it |
src/ess/livedata/config/instruments/bifrost/streams.py |
Added _make_bifrost_logs() function to create StreamLUT from f144_log_streams, integrated logs into both DEV and PROD configurations |
src/ess/livedata/config/instruments/dummy/streams.py |
Updated DEV configuration to include log_names parameter using f144_attribute_registry keys |
src/ess/livedata/config/instruments/_ess.py |
Added _make_dev_logs() helper function and updated make_dev_stream_mapping() to accept log_names parameter |
tests/config/streams_test.py |
Added tests for StreamMapping.log_topics property behavior with and without logs |
tests/nexus_helpers_test.py |
Added comprehensive tests for new functions including suggest_internal_name, filter_f144_streams, and generate_f144_log_streams_code |
Comments suppressed due to low confidence (1)
src/ess/livedata/config/instruments/dummy/streams.py:44
- The dummy instrument PROD configuration should include a
logsparameter in theStreamMappingto be consistent with the DEV configuration and the pattern used in Bifrost. The dummy instrument hasf144_attribute_registrydefined inspecs.pywith 'motion1', and the DEV configuration useslog_names=list(instrument.f144_attribute_registry.keys()), but the PROD configuration (lines 40-44) doesn't include alogsparameter. Consider adding a_make_dummy_logs()function similar to_make_bifrost_logs()and including it in the PROD StreamMapping.
StreamingEnv.PROD: StreamMapping(
**make_common_stream_mapping_inputs(instrument='dummy'),
detectors=_make_dummy_detectors(),
area_detectors=_make_dummy_area_detectors(),
),
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/ess/livedata/nexus_helpers.py
Outdated
| if i > 0: | ||
| name = parts[i - 1] | ||
| # Remove common suffixes like '_r0', '_01' etc. | ||
| import re |
Copilot
AI
Dec 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The import re statement should be moved to the top of the file with other imports rather than being imported inside the function. This follows Python best practices and improves code readability. The same issue exists in filter_f144_streams (line 162).
src/ess/livedata/nexus_helpers.py
Outdated
| exclude_patterns: | ||
| List of regex patterns to exclude from results (matched against group_path). | ||
| """ | ||
| import re |
Copilot
AI
Dec 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The import re statement should be moved to the top of the file with other imports rather than being imported inside the function. This follows Python best practices and improves code readability.
src/ess/livedata/nexus_helpers.py
Outdated
| The generated dictionary maps internal names to source and units info, | ||
| which can be used to derive both f144_attribute_registry and StreamLUT. |
Copilot
AI
Dec 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring should mention that the generated dictionary includes 'topic' in addition to 'source' and 'units'. Currently it says "maps internal names to source and units info" but should say "maps internal names to source, units, and topic info" to accurately reflect the intended output structure.
src/ess/livedata/nexus_helpers.py
Outdated
| info = by_name[name] | ||
| units = info.units or 'dimensionless' | ||
| lines.append( | ||
| f" '{name}': {{'source': '{info.source}', 'units': '{units}'}}," |
Copilot
AI
Dec 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The generate_f144_log_streams_code function does not include the 'topic' key in the generated dictionary entries. However, the actual f144_log_streams in bifrost/specs.py includes 'topic', and _make_bifrost_logs() in bifrost/streams.py (line 72) expects it to be present. The generated code should include 'topic': '{topic}' in each dictionary entry to match the actual usage pattern.
| f" '{name}': {{'source': '{info.source}', 'units': '{units}'}}," | |
| f" '{name}': {{'source': '{info.source}', 'units': '{units}', 'topic': '{topic}'}}," |
tests/nexus_helpers_test.py
Outdated
| ] | ||
| code = generate_f144_log_streams_code(infos, topic='motion') | ||
| assert "f144_log_streams: dict[str, dict[str, str]] = {" in code | ||
| assert "'motor': {'source': 'MOTOR:PV:RBV', 'units': 'degrees'}" in code |
Copilot
AI
Dec 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test expectation should be updated to include the 'topic' key in the generated code output. The actual f144_log_streams dictionary used in production (e.g., bifrost/specs.py) includes the 'topic' key, and the _make_bifrost_logs() function expects it. The assertion should check for something like "'motor': {'source': 'MOTOR:PV:RBV', 'units': 'degrees', 'topic': 'motion'}" instead.
| assert "'motor': {'source': 'MOTOR:PV:RBV', 'units': 'degrees'}" in code | |
| assert "'motor': {'source': 'MOTOR:PV:RBV', 'units': 'degrees', 'topic': 'motion'}" in code |
- Move `import re` to module-level imports instead of inside functions - Update docstring to mention that generated dict includes topic info - Include 'topic' key in generated f144_log_streams dictionary entries - Update test assertions to match new output format with topic Addresses review comments from PR #589. Original prompt: Please use a new worktree to address comments in a #589. Commit and push, then cleanup worktree. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
| }, | ||
| # Sample environment streams (topic: bifrost_sample_env) | ||
| 'heater_1': { | ||
| 'source': 'YMIR-SEE:SE-LS336-004:HTR1', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strange to see Ymir stuff in here? (or maybe not?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comes from the latest coda file.
| # Combined f144 log stream configuration. | ||
| # Maps internal name -> {source: Kafka source name, units: unit string, topic: topic} | ||
| # Generated using: python -m ess.livedata.nexus_helpers <file> --generate --topic <t> | ||
| f144_log_streams: dict[str, dict[str, str]] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a mapping only applicable to Bifrost, or will other instruments follow later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, others will follow later, but every instrument will have its own unique mapping (or maybe with some overlap, for shared equipment).
src/ess/livedata/nexus_helpers.py
Outdated
| if i > 0: | ||
| name = parts[i - 1] | ||
| # Remove common suffixes like '_r0', '_01' etc. | ||
| name = re.sub(r'_[rt]\d+$', '', name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So above, in the f144_log_streams mapping, there are entries like temperature_1, temperature_2.
Can removing the suffixes lead to duplicate suggested names?
Or did I not follow the logic correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it could, yes. But probably only if ECDC decided to store logs with identical names in different NeXus groups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say that's not impossible? Something like 'temperature' is quite a common name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think you are right, the main issue is actually transformation chains, which those suffixes represent!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I forgot about transformation chains!
| "Usage: python -m ess.livedata.nexus_helpers <nexus_file.hdf>\n" | ||
| ) | ||
| sys.exit(1) | ||
| parser = argparse.ArgumentParser( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Suffixes like '_r0', '_t0' etc. represent transformation chains in NeXus, not arbitrary naming conventions. These should be preserved in the suggested internal names. Changes: - Remove regex-based suffix stripping from suggest_internal_name() - Update docstring to remove misleading example about suffix removal - Rename tests test_removes_r0_suffix → test_preserves_r0_suffix - Rename tests test_removes_t0_suffix → test_preserves_t0_suffix - Update test expectations to preserve suffixes Note: The bifrost f144_log_streams configuration was generated with the old code that stripped suffixes. However, examining the current stream names (attenuator_1, temperature_0, etc.), none appear to match the pattern '_[rt]\d+$' that was being stripped, so no regeneration is needed at this time. If the original NeXus file contained streams with '_r0' or '_t0' suffixes, they would need to be regenerated to ensure accuracy. Original prompt: Please use a worktree and address the suffix-removal problem discussed in #589 (do not strip _r0 and the like, since it refers to "rotation 0", in fact do not strip any suffixes). Do we need to rerun the scripts that were used to generate the dict of names for bifrost?
|
@nvaytet I think I addressed the comments. We need this now for deployment so I will merge. |
Summary
logs: StreamLUTparameter toStreamMappingfor mapping Kafka f144 source names to internal names (consistent with detectors/monitors)nexus_helpers.pywith units extraction and code generation forf144_log_streamsStreamMapping changes
Previously, f144 log data used Kafka source names directly (e.g.,
BIFRO-DtCar:MC-RotZ-01:Mtr.RBV). Now supports mapping to friendlier internal names (e.g.,detector_rotation).nexus_helpers.py enhancements
New CLI tool for extracting f144 stream info from NeXus files:
Bifrost configuration
Combined
f144_log_streamsdict keeps source names, internal names, units, and topics co-located:Both
f144_attribute_registryand theStreamLUTare derived from this single source of truth.Test plan
StreamMapping.log_topicspropertysuggest_internal_name,filter_f144_streams,generate_f144_log_streams_code🤖 Generated with Claude Code