Large API changes to DeepLens in feedback branch
Updates
The last commit (on feedback) makes large API changes to the base DeepLens structures and will break most of the pre-existing code. It integrates the changes previously made on the sw-feedback and feedback branches. It also significantly modifies the DataStream class, adds a Materialize operator, adds a PipelineManager, and changes select functions in FullManager. Notable changes:
- DataStreams (including VideoStreams) are required to define how they are materialized via static functions. This encapsulates and hides the internal implementation within each DataStream: for example, we don't have to know that CVVideoStream uses cv2 to write/read a stream, or that a list is serialized with JSON.
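A minimal sketch of the idea. The method names `materialize` and `init_mat` are illustrative placeholders, not the actual DeepLens signatures:

```python
import json


class DataStream:
    """Base class: each subclass encapsulates its own serialization format."""

    @staticmethod
    def materialize(data, path):
        """Write data to path; callers never see the format used."""
        raise NotImplementedError

    @staticmethod
    def init_mat(path):
        """Rebuild a stream from a previously materialized file."""
        raise NotImplementedError


class JSONListStream(DataStream):
    """Happens to use JSON internally, but no caller needs to know that."""

    def __init__(self, data):
        self.data = list(data)

    @staticmethod
    def materialize(data, path):
        with open(path, 'w') as f:
            json.dump(data, f)

    @staticmethod
    def init_mat(path):
        with open(path) as f:
            return JSONListStream(json.load(f))
```

Swapping JSON out for another format would only touch JSONListStream, which is the point of hiding materialization behind static functions.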
- DataStreams return themselves when iterated. This means we can call functions or reference parameters inherent to the stream (e.g., the width of a video) at each iteration. It also lets us avoid expensive materialization of data that goes unused during some iterations of the Pipeline (to be implemented). Finally, it allows more rigorous type-checking on Pipelines.
- Each DataStream has a get() function that returns the materialized data at the current iteration. E.g., ConstantStream returns a constant, and JSONListStream returns the item in the list at the appropriate index.
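A sketch of both behaviors together, self-returning iteration and deferred get(). The internals here are guesses at the shape, not the real implementation:

```python
class ConstantStream:
    """Yields itself a fixed number of times; get() returns the constant."""

    def __init__(self, value, limit):
        self.value = value
        self.limit = limit
        self.index = -1

    def __iter__(self):
        self.index = -1
        return self

    def __next__(self):
        self.index += 1
        if self.index >= self.limit:
            raise StopIteration
        return self  # the stream itself, so callers can query its parameters

    def get(self):
        # Materialization happens only here, and only if the caller asks.
        return self.value


class JSONListStream:
    """get() returns the list item at the current iteration index."""

    def __init__(self, data):
        self.data = list(data)
        self.index = -1

    def __iter__(self):
        self.index = -1
        return self

    def __next__(self):
        self.index += 1
        if self.index >= len(self.data):
            raise StopIteration
        return self

    def get(self):
        return self.data[self.index]
```

Iterating yields the stream object every time; only a call to get() touches the underlying data.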
- Pipeline() is a basic iterator over a dictionary of DataStreams. It should not be constructed directly; instead, it should be manipulated through a PipelineManager. With PipelineManager, you can add one VideoStream, multiple DataStreams, and a list of operators before calling build() to produce a Pipeline. This allows optimizations over the entire pipeline (like in FullOpt previously) before it is built, and enforces stronger requirements on our version of a Pipeline (i.e., requiring a main VideoStream). Finally, it lets us have general functions that act on a Pipeline: PipelineManager.run() iterates through the Pipeline and optionally returns the results.
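A rough sketch of the manager/build split. The method names follow the ones used in the examples further down (update_videostream, add_datastream, add_operator, build, run); everything else, including the lockstep iteration and the frame-dict shape, is an assumption:

```python
class Pipeline:
    """Basic lockstep iterator over a dict of streams; built by a manager."""

    def __init__(self, streams, operators):
        self.streams = streams
        self.operators = operators

    def __iter__(self):
        iterators = {name: iter(s) for name, s in self.streams.items()}
        while True:
            try:
                frame = {name: next(it) for name, it in iterators.items()}
            except StopIteration:
                return  # stop when any stream is exhausted
            for op in self.operators:
                frame = op(frame)
            yield frame


class PipelineManager:
    """Collects one VideoStream, named DataStreams, and operators."""

    def __init__(self):
        self.vstream = None
        self.dstreams = {}
        self.operators = []

    def update_videostream(self, vstream):
        self.vstream = vstream

    def add_datastream(self, stream, name):
        self.dstreams[name] = stream

    def add_operator(self, op):
        self.operators.append(op)

    def build(self):
        # The stronger requirement: a Pipeline must have a main VideoStream.
        if self.vstream is None:
            raise ValueError('a main VideoStream is required')
        streams = dict(self.dstreams, video=self.vstream)
        return Pipeline(streams, self.operators)

    def run(self, results=True):
        out = list(self.build())
        return out if results else None
```

Because build() sees every stream and operator at once, whole-pipeline optimizations can slot in before the Pipeline object exists.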
- FullStorageManager: labels in our storage also function as auxiliary DataStreams. Because of this, the labels table was modified significantly to add type and value. (Further discussion is needed about DataStreams not linked to VideoStreams.)
- put_streams() in FullStorageManager directly puts a VideoStream and a set of auxiliary DataStreams into the manager (note that the VideoStream doesn't have to be materialized, i.e. we don't need to actually store the video data).
- get() in FullStorageManager directly runs SQLite queries on our database. This is in response to the fact that we had more than three get functions wrapping the same queries with small differences, and we will likely come up with more variations during testing. We can change this later for production if needed, or once we have a good idea of which queries we'll actually need after testing. (IMO it should stay like this: otherwise we would be artificially restricting the power of our queries, and SQL queries aren't too difficult.)
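To illustrate the trade-off, here is a toy storage manager with a pass-through get(). The schema and helper names are invented for this sketch, not taken from FullStorageManager:

```python
import sqlite3


class StorageSketch:
    """Toy stand-in for label storage with a single raw-SQL get()."""

    def __init__(self):
        self.conn = sqlite3.connect(':memory:')
        self.conn.execute(
            'CREATE TABLE labels ('
            'label TEXT, type TEXT, value TEXT, '
            'clip_id INTEGER, video_name TEXT)')

    def put_label(self, label, type_, value, clip_id, video_name):
        self.conn.execute(
            'INSERT INTO labels VALUES (?, ?, ?, ?, ?)',
            (label, type_, value, clip_id, video_name))

    def get(self, query, params=()):
        # One entry point instead of many near-identical wrappers:
        # the caller writes the SQL, so any variation is expressible.
        return self.conn.execute(query, params).fetchall()
```

Every former wrapper (get-by-clip, get-by-label, and so on) becomes a one-line query at the call site.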
- Two new functions in FullStorageManager, create_vstream and create_dstream, directly create VideoStreams (respectively DataStreams) based on clip_id and video_name (and label).
- New Materialize operator. Allows us to materialize intermediate (or end-result) streams in our Pipeline. (Note: needs debugging once we actually fix all our previous code.) This is an example of applying Operators to DataStreams, and hopefully looking at its code will help clarify the previous points.
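What such an operator might look like as a callable over per-iteration frames. Buffering in memory rather than writing through a storage manager is an assumption of this sketch:

```python
class Materialize:
    """Operator sketch: captures the named stream's value at each
    iteration and passes the frame through unchanged."""

    def __init__(self, name):
        self.name = name
        self.results = []

    def __call__(self, frame):
        # frame is assumed to be a dict mapping stream names to values.
        self.results.append(frame[self.name])
        return frame
```

Because it returns the frame untouched, it can sit anywhere in the operator list to snapshot an intermediate stream.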
- DataQueue class. Allows us to queue up small VideoStreams and DataStreams to be aligned and fed into the Pipeline. This is part of one of the two ways the Pipeline API will be used.
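A sketch of DataQueue matching the enqueue_videostream / enqueue_datastream calls used in Example Use Case 1 below; the pairing logic (one auxiliary stream per clip under each name) is a guess:

```python
from collections import deque


class DataQueue:
    """Pairs each queued clip with the auxiliary streams queued alongside it."""

    def __init__(self):
        self.vstreams = deque()
        self.dstreams = {}

    def enqueue_videostream(self, clip):
        self.vstreams.append(clip)

    def enqueue_datastream(self, stream, name):
        self.dstreams.setdefault(name, deque()).append(stream)

    def __iter__(self):
        # Assumes one auxiliary stream was enqueued per clip under each name.
        while self.vstreams:
            vstream = self.vstreams.popleft()
            streams = {name: q.popleft() for name, q in self.dstreams.items()}
            yield vstream, streams
```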
Example Use Case 1 of Pipeline(), PipelineManager() and DataQueue()
- Over small VideoStream clips: a complicated case where we use a different platform to process one stream and use the results on the other streams.

clips    -> list of VideoStreams of size batch_size
streams0 -> auxiliary DataStreams 1
streams1 -> auxiliary DataStreams 2
```python
queue = DataQueue()
for clip in clips:
    queue.enqueue_videostream(clip)
for stream in streams0:
    queue.enqueue_datastream(stream, 'random_name0')
for stream in streams1:
    queue.enqueue_datastream(stream, 'random_name1')

pipeline = PipelineManager()
pipeline.add_operator(Materialize(name, args))
# ....
for vstream, streams in queue:
    if user_defined_function(streams['random_name0']):
        pipeline.update_videostream(vstream)
        pipeline.add_datastream(streams['random_name1'], 'random_name1')
        results = pipeline.run()
        pipeline.clear_streams()
```
Example Use Case 2 of Pipeline(), PipelineManager() and itertools
- This is for the use case where we can't separate a video into batches (e.g., a sliding-window implementation).

clips    -> list of VideoStreams of size batch_size
streams0 -> auxiliary DataStreams 1
```python
import itertools

clip = itertools.chain(*clips)        # stitch the clips into one continuous stream
stream0 = itertools.chain(*streams0)  # likewise for the auxiliary streams

pipeline = PipelineManager()
pipeline.add_operator(Materialize(name, args))
pipeline.update_videostream(clip)
pipeline.add_datastream(stream0, 'random_name0')
pipeline = pipeline.build()
for frame in pipeline:  # frame is the result
    # do something with the result of the pipeline per iteration
    pass
```
TODO:
- @swjz: Implement HwangVideoStream again. (Sorry! Redoing DataStreams will probably change this class a lot. Note: does Hwang have an efficient writer? Now we can also use that!)
- Integrate previous code and make design changes to this API as needed. This will include testing the new code (most of it is hard to test while the other components aren't working).