
Add conversion factor to waveform columns #1422


CodyCBakerPhD
Collaborator

@CodyCBakerPhD commented Jan 10, 2022

Motivation

For large amounts of recording data, even adding waveform snippets to the columns of a Units table can be a significant task. One step toward reducing unnecessary data inflation is to add conversion-factor attributes, mirroring the behavior of other TimeSeries-like objects: the data can be stored in some minimal base type, and the conversion factor then scales it into the specified scientific units.

I've added these attributes and attempted to propagate them through the IO, mirroring the patterns established by the waveform_rate attribute.
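To make the storage argument concrete, here is a minimal NumPy sketch of the TimeSeries-style scaling rule (value in unit = data * conversion + offset) applied to waveform snippets. The raw counts, the 0.195e-6 volts-per-count factor and the `to_volts` helper are hypothetical illustrations, not values or functions from this PR.

```python
import numpy as np

# Hypothetical waveform snippets in raw integer counts: 2 spikes x 3 samples x 1 channel.
raw_waveforms = np.array([[[10], [-20], [5]],
                          [[0], [30], [-15]]], dtype=np.int16)

# Hypothetical scaling attributes: 1 count = 0.195 microvolts, no offset.
conversion = 0.195e-6  # volts per count
offset = 0.0           # volts


def to_volts(data, conversion, offset):
    """Scale stored values into the declared unit: data * conversion + offset."""
    return data.astype(np.float64) * conversion + offset


volts = to_volts(raw_waveforms, conversion, offset)

# Storage cost of keeping int16 counts versus pre-scaled float64 values.
print(raw_waveforms.nbytes, volts.nbytes)  # 12 48
```

Only the two scalar attributes need to be stored alongside the compact integer array; readers apply the scaling on access.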

Sister PR: NeurodataWithoutBorders/nwb-schema#491

How to test the behavior?

Behavior is showcased in the steps, identical to waveform_rate. What is currently untested in both cases is the default behavior of calling nwbfile.add_unit() for the first time on a fresh NWBFile, which auto-generates a blank Units table. We should discuss how the attributes of that table ought to be set in that situation (probably just by always making sure to define nwbfile.units = Units(**my_attributes) before adding any actual units).
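The untested default path and the proposed workaround (assigning the table before the first add_unit call) can be modelled with a small pure-Python sketch. `FileSketch` and `UnitsTableSketch` are hypothetical stand-ins for NWBFile and Units, not PyNWB classes; they only reproduce the lazy-creation behavior described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UnitsTableSketch:
    """Hypothetical stand-in for a Units table whose waveform scaling
    attributes are fixed when the table is constructed."""
    waveform_unit: str = "volts"
    conversion: float = 1.0
    offset: float = 0.0
    waveforms: List[list] = field(default_factory=list)

    def add_unit(self, waveforms: list) -> None:
        self.waveforms.append(waveforms)


class FileSketch:
    """Hypothetical stand-in for NWBFile: add_unit() lazily creates a
    table with default attributes if none has been assigned yet."""

    def __init__(self) -> None:
        self.units: Optional[UnitsTableSketch] = None

    def add_unit(self, waveforms: list) -> None:
        if self.units is None:
            # Auto-generated table: the caller never gets to set the scaling.
            self.units = UnitsTableSketch()
        self.units.add_unit(waveforms)


# Implicit path: the table is created with defaults on the first add_unit().
implicit = FileSketch()
implicit.add_unit([[1, 2, 3]])

# Explicit path: assign the table first so its attributes are user-defined.
explicit = FileSketch()
explicit.units = UnitsTableSketch(conversion=0.195e-6)
explicit.add_unit([[1, 2, 3]])
```

The explicit path is the only one in which the scaling attributes reflect the user's data, which is why defining the table up front is the safer convention.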

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Have you checked our Contributing document?
  • Have you ensured the PR clearly describes the problem and the solution?
  • Is your contribution compliant with our coding style? This can be checked by running flake8 from the source directory.
  • Have you checked to ensure that there aren't other open Pull Requests for the same change?

@CodyCBakerPhD
Collaborator Author

@rly It would be more proper to use the MeasurementData extension of VectorData that is being proposed at the higher hdmf level. While it's clear how to specify that in the nwb-schema, it's not as clear to me how to actually adjust the columns of the DynamicTable here in pynwb to use it.

Comment on lines +153 to +154
{'name': 'waveforms', 'description': waveforms_desc, 'index': 2,
'class': MeasurementData, 'unit': 'volts', 'conversion': 1., 'offset': 0.}
Collaborator Author


@rly I had assumed this logic would propagate to https://github.com/catalystneuro/hdmf/blob/dev/src/hdmf/common/table.py#L473-L480 and allow the values to be specified at __init__ of the UnitsTable in this fashion, but it's not recognizing that this subclass of VectorData has any additional arguments. Any ideas?

How would you recommend passing these values (conversion + offset, and maybe unit) both when initializing the UnitsTable (including cases where no units with waveforms are added later) and when users specify them manually (possibly during add_unit)?

Contributor

@rly commented Jan 20, 2022


allow_extra=True needs to be added to the docval of DynamicTable.add_column, i.e.:

    @docval(
            # ... earlier argument specs of add_column, unchanged ...
            {'name': 'col_cls', 'type': type, 'default': VectorData,
             'doc': ('class to use to represent the column data. If table=True, this field is ignored and a '
                     'DynamicTableRegion object is used. If enum=True, this field is ignored and an EnumData '
                     'object is used.')},
            allow_extra=True)
    def add_column(self, **kwargs):  # noqa: C901

The tests still fail even with that change, because conversion=1. is being passed while, based on the spec, a plain VectorData is expected.
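What allow_extra changes can be illustrated with a heavily simplified, hypothetical stand-in for hdmf's docval decorator. `docval_sketch`, `MeasurementColumnSketch`, `StrictTable` and `PermissiveTable` are illustrative names, not hdmf code, and the real docval does far more validation; the sketch only shows that, without allow_extra, unknown keywords such as conversion are rejected, while with it they are forwarded to the column class.

```python
import functools


def docval_sketch(*arg_specs, allow_extra=False):
    """Hypothetical, simplified stand-in for hdmf's docval: it only checks
    names, types and defaults of keyword arguments."""
    known = {spec["name"]: spec for spec in arg_specs}

    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, **kwargs):
            extra = set(kwargs) - set(known)
            if extra and not allow_extra:
                raise TypeError(f"unrecognized arguments: {sorted(extra)}")
            parsed = {}
            for name, spec in known.items():
                if name in kwargs:
                    value = kwargs[name]
                    if not isinstance(value, spec["type"]):
                        raise TypeError(f"{name!r} must be of type {spec['type'].__name__}")
                elif "default" in spec:
                    value = spec["default"]
                else:
                    raise TypeError(f"missing argument {name!r}")
                parsed[name] = value
            # With allow_extra=True, unknown keywords are forwarded untouched.
            parsed.update({key: kwargs[key] for key in extra})
            return func(self, **parsed)

        return wrapper

    return decorator


class MeasurementColumnSketch:
    """Hypothetical column class that accepts scaling attributes."""

    def __init__(self, name, unit="volts", conversion=1.0, offset=0.0):
        self.name = name
        self.unit = unit
        self.conversion = conversion
        self.offset = offset


COL_SPECS = (
    {"name": "name", "type": str},
    {"name": "col_cls", "type": type, "default": MeasurementColumnSketch},
)


class StrictTable:
    @docval_sketch(*COL_SPECS)
    def add_column(self, **kwargs):
        name = kwargs.pop("name")
        col_cls = kwargs.pop("col_cls")
        return col_cls(name, **kwargs)


class PermissiveTable:
    @docval_sketch(*COL_SPECS, allow_extra=True)
    def add_column(self, **kwargs):
        name = kwargs.pop("name")
        col_cls = kwargs.pop("col_cls")
        # Leftover keywords (e.g. conversion, offset) reach the column class.
        return col_cls(name, **kwargs)


col = PermissiveTable().add_column(name="waveforms", conversion=0.5, offset=0.0)
print(col.conversion)  # 0.5
```

Calling `StrictTable().add_column(name="waveforms", conversion=0.5)` raises TypeError instead. Passing extras through only solves the signature problem; the spec still has to declare the column as the subclass for the values to be stored.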

@@ -173,13 +173,14 @@ def test_add_waveforms(self):
[1, 2, 3] # spike 4
]
]
ut.add_unit(waveforms=wf1)
ut.add_unit(waveforms=wf1, unit='volts', conversion=1., offset=0.)
Collaborator Author


@rly I was wondering whether it would be easiest to have these specified (a) the first time waveforms are passed in add_unit, (b) every time they are passed in add_unit, or (c) on the first call of Units(...), with defaults set in case no waveforms are intended to be added to the table (which is also roughly what I was trying to do above).

Contributor


It depends on which use case we want to support: a single conversion and offset for all units, or a conversion and offset for each unit. In my experience, the latter case, where there is more than one set of conversion and offset across the units, is much less common, so I think we need not support it by default. In that case, we could do something like what has been implemented for waveform unit and rate, where the value is set in the constructor.
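The difference between the two use cases can be sketched with NumPy: a shared pair needs only two scalar attributes on the column, while per-unit scaling would need per-row storage. The function names and values below are hypothetical and only illustrate the trade-off.

```python
import numpy as np


def scale_per_table(waveforms, conversion, offset):
    """One (conversion, offset) pair shared by every unit: two scalar
    attributes on the waveforms column are enough."""
    return [np.asarray(w, dtype=np.float64) * conversion + offset for w in waveforms]


def scale_per_unit(waveforms, conversions, offsets):
    """One pair per unit: would require per-row storage (e.g. extra
    columns) instead of two scalar attributes."""
    if not len(waveforms) == len(conversions) == len(offsets):
        raise ValueError("need one conversion and one offset per unit")
    return [np.asarray(w, dtype=np.float64) * c + o
            for w, c, o in zip(waveforms, conversions, offsets)]


waveforms = [[1, 2], [3, 4]]  # hypothetical values for two units
shared = scale_per_table(waveforms, conversion=2.0, offset=1.0)
separate = scale_per_unit(waveforms, conversions=[2.0, 10.0], offsets=[1.0, 0.0])
```

Supporting only the shared case keeps the column's metadata to two scalars, which matches how waveform unit and rate are already set once per table.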

@@ -1,6 +1,6 @@
# minimum versions of package dependencies for installing PyNWB
h5py==2.10 # support for selection of datasets with list of indices added in 2.10
hdmf==3.1.1
hdmf @ git+https://github.com/catalystneuro/hdmf.git@add_measurement_vector_data
Contributor


Unfortunately, this will not clone the repo with its submodule, so the CI breaks.

Collaborator Author


Maybe it needs a --recurse-submodules option on it? I've never worked with this many nested git submodules before, lol.

@CodyCBakerPhD CodyCBakerPhD self-assigned this Apr 20, 2022
@CodyCBakerPhD CodyCBakerPhD deleted the change_waveform_column_to_measurement branch September 6, 2024 19:49
2 participants