Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

workflow RDF #478

Open
wants to merge 48 commits into
base: main
Choose a base branch
from
Open

workflow RDF #478

wants to merge 48 commits into from

Conversation

FynnBe
Copy link
Member

@FynnBe FynnBe commented Oct 26, 2022

implements spec part of #477
with a few assumptions on open questions...

@FynnBe
Copy link
Member Author

FynnBe commented Oct 26, 2022

A preview of the workflow RDF documentation can be found here:
https://github.com/bioimage-io/spec-bioimage-io/blob/gh-pages-workflow_rdf/workflow_spec_0_2.md

@FynnBe
Copy link
Member Author

FynnBe commented Oct 27, 2022

adapted after furrther discussion with @oeway and @constantinpape

@FynnBe FynnBe marked this pull request as ready for review October 27, 2022 13:58
@FynnBe
Copy link
Member Author

FynnBe commented Oct 28, 2022

hey @oeway !
I was just working on the workflow RDF and found some more aspects we didn't think about before and started discussing quite a bit with @constantinpape .
We really oppose inputs with defaults as it makes it very unclear when to use inputs and when kwargs.

We iterated over several solutions and arrived at this:

  • no kwargs field, only inputs
  • an input may have a default only if all following inputs have one too
  • outputs of one step are always positonal args to the next (like before)
  • outputs of one step may be named (like before)
  • inputs to a step may be given as a list (like before)
  • inputs to a step may be given as a dict (like we gave kwargs)
  • if an input dict is missing inputs, these are taken from the args (outputs of the previous step) if present
    • no interleaving: no input following an input given in an input dict can be taken from the present args

Essentially we don't try to come up with our own logic, but mirror how python callables behave.

@FynnBe
Copy link
Member Author

FynnBe commented Oct 28, 2022

  • inputs to a step may be given as a list (like before)
  • inputs to a step may be given as a dict (like we gave kwargs)
    ...

Essentially we don't try to come up with our own logic, but mirror how python callables behave.

been thinking about this part a bit more... bear with me, it's shorter than it looks!

We can also mix positional inputs and named inputs with list entries containing a dict with a single key (that is the name of an input):

steps:
- op: ...
  inputs: 
  - arg0
  - arg1
  - key0: value0
  - key1: value1

key word order is ignored:
equivalent to above:

steps:
- op: ...
  inputs: 
  - arg0
  - arg1
  - key1: value1
  - key0: value0   # key1 and key2 are given as key word arguments, their order is ignored 👍 

Assuming the previous step returns [arg0, arg1], this is equivalent to the above:

steps:
- op: ...
  inputs:   # positional inputs (arg0, arg1) are taken from previous step, optional inputs may be overwritten 👍 
    key0: value0
    key1: value1

if inputs is a list I would ignore positional outputs from the step above:

steps:
- op: ...
  inputs:  # arg0 is NOT filled in from previous step 
  - arg1  
  - key0: value0
  - key1: value1

correct syntax, but in this example it would
raise "missing arg1" (because arg1 would be given as arg0...)

This pattern implies that input values that are dicts with one key need to be given like this:

steps:
- op: ...
  inputs:  
    arg0: {key: value}

or

steps:
- op: ...
  inputs:  
  - arg0: {key: value}

because

steps:
- op: ...
  inputs:  
  - key: value

would be interpreted as the optional input 'key' with value 'value' and not as the first positional argument (here arg0) with value {key: value}.
Of course we could compare the expected arg name and the given key, but I would not rely on that and mandate to specify single key dicts only as a named input.

@FynnBe
Copy link
Member Author

FynnBe commented Oct 28, 2022

@oeway , one more thing @constantinpape and I discussed:
We should add axes to input if input type: tensor.

@oeway
Copy link
Contributor

oeway commented Oct 29, 2022

Hi, In the first glance, I like the idea of merging inputs and kwargs. However, when think it a bit further I find it actually making the usage much more complicated and confusing.

We really oppose inputs with defaults as it makes it very unclear when to use inputs and when kwargs.

I don't really get what this means, the rule is pretty clear to me. Same for defining a python function, if it's required, go for inputs, and if it's optionally with default, go for kwargs. For inputs, you don't need to name them, but kwargs you need to name them. inputs is a list while kwargs is a dictionary. You can of course merge the two by exploiting list of dict, or dict as you are proposing, but feels very hacky and needs a lot of additional rules when using it.

Essentially we don't try to come up with our own logic, but mirror how python callables behave.

But the actually python callables actually separates the args (a list) and kwargs (a dict). Using a list with 1-key dict or dict to encapsulate the two into one object make it feel hacky and hard to use.

In the following example:

  - op: zero_mean_unit_variance
  - op: model_inference
     kwargs:
         model_id: fearless-crab
         preprocessing: false # disable the preprocessing
         postprocessing: false # disable the postprocessing
  - op: stardist_postprocessing
     kwargs:
        diameter: 2.3

If I understand correctly, to adapt what your proposed it will become:

  - id: step1
     op: zero_mean_unit_variance
     outputs: [out1]
  - id: step2
     op: model_inference
     inputs:
         - step1.outputs.out1
         - model_id: fearless-crab
         - preprocessing: false # disable the preprocessing
         - postprocessing: false # disable the postprocessing
     outputs: [out2]
  - op: stardist_postprocessing
     inputs:
        - step2.outputs.out2
        - diameter: 2.3

Note that, because in step2 and step2 we need to set the kwargs, we are forced to name the inputs, and this propagate to all the other steps to force us set id for the first two steps and name the outputs. This introduces a lot of hassle for the naming.

We can potentially allowing ignoring the positional inputs when we only want to change the kwargs, however, this make the inputs fields looks so confusing, since we are not writing out all the inputs.

Overall, I felt the merging of inputs and kwargs will require more rules, the list of 1-key dict feels hacky, separating them actually make more sense to me.

We should add axes to input if input type: tensor.

Sure, maybe we should bring in more tensor spec (e.g. shape) too, however, for simplicity, maybe adding axes, shape, dtype etc. as an optional field would help.

@FynnBe
Copy link
Member Author

FynnBe commented Oct 30, 2022

Note that, because in step2 and step2 we need to set the kwargs, we are forced to name the inputs, and this propagate to all the other steps to force us set id for the first two steps and name the outputs. This introduces a lot of hassle for the naming.

no. I would not force naming/specifying outputs if inputs is given.
Also named inputs could be overwritten without touching the positional inputs by providing them as dict:

  - id: step0
     op: zero_mean_unit_variance
  - id: step2
     op: model_inference
     inputs:
         model_id: fearless-crab
         preprocessing: false # disable the preprocessing
         postprocessing: false # disable the postprocessing
  - op: stardist_postprocessing
     inputs:
        diameter: 2.3

If positional inputs need to be specified this would become

  - id: step0
     op: zero_mean_unit_variance
     outputs: [out1]
  - id: somehting-in-between  # added this step to the example, because otherwise out1 would not need naming
     op: ...
  - id: step2
     op: model_inference
     inputs:
         - step1.outputs.out1
         - model_id: fearless-crab
         - preprocessing: false # disable the preprocessing
         - postprocessing: false # disable the postprocessing
  - op: stardist_postprocessing
     inputs:
        diameter: 2.3

to refine my idea of merging inputs and kwargs I wanted to propose a slight update:

  • if inputs are given as a list and the last entry is a dictionary, those are named inputs (kwargs).
    This avoids a list of len 1 kwargs which indeed feels very hacky.

The above example then becomes:

  - id: step0
     op: zero_mean_unit_variance
     outputs: [out1]
  - id: somehting-in-between  # added this step to the example, because otherwise out1 would not need naming
     op: ...
  - id: step2
     op: model_inference
     inputs:
         - step1.outputs.out1
         - model_id: fearless-crab
           preprocessing: false # disable the preprocessing
           postprocessing: false # disable the postprocessing
  - op: stardist_postprocessing
     inputs:
        diameter: 2.3

However, I have to agree with @oeway . this is still hacky...
the issue we tried to address, and the confusion we mentioned before was that inputs may have defaults. So another solution to a clearer specification would be mandatory inputs:

  • inputs have no defaults!
  • if inputs are not specified previous outputs are given as inputs
  • we rename kwargs (too python specific) to options (other suggestions welcome!), which always have a default value and a name
  • options are specified as a dict and always referred to by their name (key), because an arbitrary subset of them may be overwritten. This further distinguishes them from inputs and avoids confusing the two sets of parameters.
  • [I would like this one, but it is not a must] inputs may also be given as a dict via their names, but stay separate from the options.

Sure, maybe we should bring in more tensor spec (e.g. shape) too, however, for simplicity, maybe adding axes, shape, dtype etc. as an optional field would help.

I agree that shape and dtype may be added as optional fields, but I think we cannot handle tensors reasonably without knowing their axes. We don't want a bunch of transpose ops littered throughout the workflows which would be the consequence...
many of the ops (pre-/ and postprocessing) defined in spec and implemented in core have an axes kwarg. (mean per channel etc... )...

@FynnBe
Copy link
Member Author

FynnBe commented Oct 31, 2022

@oeway @constantinpape I know this is a lot to read through and think about, but it would be really convenient for me, if you could take a brief moment and just comment if we can proceed with the current state or need more discussion (if so that does not need to happen right away ;-) )

I tried to comment all discussed/proposed decisions in the following example:

inputs:
- name: ipt0
  type: tensor
  axes: bxy      # if type=='tensor' then 'axes' is mandatory

options:   
- name: param0  # although options are given as a dict we specify them as a list to match inputs/outputs pattern
  type: int
  default: null   # options always need a default value (may be null or of type 'type')

outputs: 
- name: ...
  type: tensor
  axes: bxy
- name: ...
  type: bool

steps:
  - id: step0
     op: zero_mean_unit_variance
     outputs: [out1]
  - id: somehting-in-between  # added this step to the example, because otherwise out1 would not need naming
     op: ...
  - id: step2
     op: model_inference
     inputs: [step1.outputs.out1]
# inputs as list or alternatively by name as dict, but either way always all inputs need to be specified:
#     inputs: 
#       raw: step1.outputs.out1
     options:
         model_id: fearless-crab
         preprocessing: false # disable the preprocessing
         postprocessing: false # disable the postprocessing
  - op: stardist_postprocessing
     options:
        diameter: 2.3

test_steps:  # analog to 'steps', but do not take any inputs/options (it may call the 'steps' multiple times with different fixed inputs/options though
  - ... 

@constantinpape
Copy link
Collaborator

@oeway:

I don't really get what this means, the rule is pretty clear to me. Same for defining a python function, if it's required, go for inputs, and if it's optionally with default, go for kwargs.

I agree with this. But when I discussed this with @FynnBe, he said that you wanted to allow default values for inputs. And this would not make sense to me. Judging from your comment here that might have been a misunderstanding.

  • inputs have no defaults!
  • if inputs are not specified previous outputs are given as inputs
  • we rename kwargs (too python specific) to options (other suggestions welcome!), which always have a default value and a name
  • options are specified as a dict and always referred to by their name (key), because an arbitrary subset of them may be > - overwritten. This further distinguishes them from inputs and avoids confusing the two sets of parameters.
  • [I would like this one, but it is not a must] inputs may also be given as a dict via their names, but stay separate from the options.

That sounds good to me. About the last point: I would rather not do it since I think we need to introduce more rules etc. and it makes reading everything more complex. But I don't have a strong opinion on this, so would be fine with this either way.

I also agree that we should have axes mandatory for tensors.

I know this is a lot to read through and think about, but it would be really convenient for me, if you could take a brief moment and just comment if we can proceed with the current state or need more discussion (if so that does not need to happen right away ;-) )

Looks good from my end. My time to look into this will be very limited in the next few weeks, so please don't wait on my feedback to go ahead here.

@oeway
Copy link
Contributor

oeway commented Oct 31, 2022

Hi @FynnBe I like this! No strong opinion about changing kwargs to options. If we do that, would it also make sense to change list/dict to array/object then? To align with the json types.

A minor suggestion, would it make sense to make name field under inputs optional? just to align with the ops spec, so we can use the default name self.inputs.0 (same as unnamed inputs in ops).

For axes, would it make sense to bring this new axes definition into the tensor spec?

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Nov 24, 2022
@FynnBe
Copy link
Member Author

FynnBe commented Feb 9, 2023

this is ready for a pre-alpha release!
more detalis on the new bioimageio.workflows package can be found at https://github.com/bioimage-io/workflows-bioimage-io-python

@FynnBe FynnBe mentioned this pull request Feb 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants