Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarity support with pipeline for data processing #124

Closed
wants to merge 19 commits into from

Conversation

mickp
Copy link
Collaborator

@mickp mickp commented Dec 5, 2019

Addresses issue #121. Needs testing with hardware.

@mickp mickp mentioned this pull request Dec 11, 2019
microscope/devices.py Outdated Show resolved Hide resolved
@@ -250,6 +261,7 @@ def initialize(self):
"""
_logger.info('Initializing.')
time.sleep(0.5)
self._initialized = True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not add this attribute during object construction? This would mean that one only neds to check for it's value and not have always also check if the attribute already exists. Also, should it not be set back to False after shutdown?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps not a good reason, but this would still work if a device fails to
set that member during init. In the TestCamera case, it's a simple
attribute. If we used a similar idea on real hardware, we might want some
other test (with library calls) to test that the devices is correctly
initialized, so _initialized could then be a property.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is only being made to TestCamera so there is no what other device could do.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because I've introduced it here for now, but we might want to move it over to the base Device class if it's useful.

(0, 1): numpy.flipud(numpy.rot90(data, rot)),
(1, 0): numpy.fliplr(numpy.rot90(data, rot)),
(1, 1): numpy.fliplr(numpy.flipud(numpy.rot90(data, rot)))
}[flips]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not introduced in this change but isn't this indexing of the dict always perfoming all flips and rotations and then we are discarding them all except one? If so, shouldn't we ensure that we only do the transformation that is actually required?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I assumed lazy evaluation, but maybe that's not the case. Perhaps lambdas would be more appropriate.

Copy link
Collaborator Author

@mickp mickp Dec 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not evaluated lazily. I'll wrap it with lambda.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix included in this PR: cfff4a5.

Subclasses should call super()._process_data(data) after doing their
own processing."""
import functools
return functools.reduce(lambda x, f: f(x), self.pipeline, data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have an attribute which is a list of processing steps, why should subclasses do their own processing first instead of just appending to self.pipeline?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - I think we should do one or the other. This requires some thought, though.

Originally _process_data was a method on DataDevices to implement the processing step, and I think that was meant to be pure virtual.

Now we have Devices that are not DataDevices that need to add processing steps, and I addressed that with the pipeline, but that needs super()._process_data() to ensure that the pipeline is executed after the DataDevice does anything it needs to do (e.g. running it through a LUT, type conversions, reshaping arrays).

The DataDevice could just add it's processing to the start of the pipeline, instead, and _process_data would be implemented on DataDevice and just process the pipeline.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments in review immediately below this one.

@@ -409,6 +409,8 @@ def __init__(self, buffer_length=0, **kwargs):
self._acquiring = False
# A condition to signal arrival of a new data and unblock grab_next_data
self._new_data_condition = threading.Condition()
# A data processing pipeline: a list of f(data) -> data.
self.pipeline = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we not make this a private attribute? Also, would be nice to also have type annotations for it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it should be private, but then we'd need methods to allow other devices to add their processing steps.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked about this at length.

  • Most (all?) classes derived from DataDevice should probably add to the pipeline rather than override _process_data.
  • Devices that override _process_data must call super()._process_data().
  • The docstring on DataDevice._process_data should indicate this clearly.
  • We could make pipeline private, but that would require methods to add/remove steps from it.

Copy link
Collaborator Author

@mickp mickp Dec 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_process_data docstring made more explicit in 1c16359.
CameraDevice made to use pipeline instead of override in 0361a02.
_process_data is now only defined in one place, on the DataDevice class.

cam_kwargs = {}
for key in cam_kw_keys:
cam_kwargs[key.replace("camera.", "")] = kwargs[key]
del kwargs[key]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a camera_kwargs argument with the arguments to construct the camera make this cleaner?

def __init__(self, camera=None, camera_kwargs={}, **kwargs):
    super().__init__(**kwargs)
    if camera is not None:
        self._camera = camera(**camera_kwargs)
        ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make the code a bit cleaner, but then you'd have nested { {} } in the config, and maybe that's not cleaner / more error-prone.

@carandraug
Copy link
Collaborator

I have spoken with Mick earilier today in person about this PR and the
following is just a summary of that.

This Clarity mixing a ControllerDevice with a FilterWheel should
really be some sort of composite device. The problem it tries to
solve by being a controller is the same that we have on AO-tools and
with live SIM reconstruction (at least in the manner that Marcel
envisioned it).

Not mentioned anywhere yet on this PR is that the reasoning behind
this design is that we want to avoid passing data around between
processes for processing: camera gets image, sends it to the Clarity
for processing, Clarity then sends it back to the camera, which
finally sends it to the client. Using a controller makes that pretty
easy since everything ends up in the same process.

However, we are just delaying the problem and we will eventually need
to come up with some interface to composite devices and possibly how
to share data between processes (the multiprocessing module has
classes for that but there's a whole bunch of issues that we may have
to think about, specially since we mainly want to shared numpy arrays
and not python arrays).

For now, we need this Aurox interface and it does do what we need it
to do. It enables us to delay coming up with a general solution to
composite devices.

Leaving this unhandled could prevent other devices in the same process (e.g. on a controller) being shut down.
get_all_settings reports the values of problem settings as None.
Previously, we considered adding a new entry, __errors__, to the returned dict, but this is adding an item that is not a setting to a dict where everything else is a setting, and would require clients to handle this key as a special case.
Fixed ClarityProcessor import and conversion of UMat to ndarray.
Dict evaulation is not lazy: previously, this code evaluated all allowable transforms before picking the appropriate one. This is now fixed by breaking it down into the rotation transfrom, then passing the result to one function from a dict of flip transform functions.
@mickp
Copy link
Collaborator Author

mickp commented Dec 19, 2019

I've addressed some of the review items, and rebased this onto master.
I think the only outstanding issues are:

  • whether or not to make pipeline private
  • whether to use camera.some_parameter=x or camera_kwargs = {'some_parameter': x}

@carandraug
Copy link
Collaborator

whether to use camera.some_parameter=x or camera_kwargs = {'some_parameter': x}

It just occurred to me that we can't use a dot on parameter name, the following will fail because dot can't be used in a keyword:

Clarity(camera.some_parameter=x)

even though this works:

Clarity(**{'camera.some_parameter':x})

@mickp
Copy link
Collaborator Author

mickp commented Jan 6, 2020 via email

@carandraug
Copy link
Collaborator

So it currently only works because the DeviceServer uses the second
method to call the constructor?

Yes.

This point is just about the use of a period on an argument name, it's not about whether passing a dict of arguments to the camera vs passing multiple args marked for the camera.

@mickp
Copy link
Collaborator Author

mickp commented Jan 13, 2020

I think we need another approach for this.

Data processing can modify the shape of returned arrays, yet when we query this, the request goes to the underlying data source which returns the original shape. With the pipeline approach on one method, it's not easy to address this. A better approach might be to take the underlying camera class, dynamically create an augmented class that modifies both the data collection and data shape methods, and serve an instance of that augmented class.

@carandraug
Copy link
Collaborator

We have #133 which is a different implementation also from Mick, that addresses some of the issues from this solution. We are not merging this. Closing.

@carandraug carandraug closed this Sep 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants