workflow RDF #478

FynnBe · 2022-10-26T22:26:55Z

implements spec part of #477
with a few assumptions on open questions...

FynnBe · 2022-10-26T22:34:18Z

A preview of the workflow RDF documentation can be found here:
https://github.com/bioimage-io/spec-bioimage-io/blob/gh-pages-workflow_rdf/workflow_spec_0_2.md

FynnBe · 2022-10-27T13:58:06Z

adapted after furrther discussion with @oeway and @constantinpape

example_specs/workflows/hpa/single_cell_classification.yaml

example_specs/workflows/stardist/stardist_example.yaml

FynnBe · 2022-10-28T15:32:29Z

hey @oeway !
I was just working on the workflow RDF and found some more aspects we didn't think about before and started discussing quite a bit with @constantinpape .
We really oppose inputs with defaults as it makes it very unclear when to use inputs and when kwargs.

We iterated over several solutions and arrived at this:

no kwargs field, only inputs
an input may have a default only if all following inputs have one too
outputs of one step are always positonal args to the next (like before)
outputs of one step may be named (like before)
inputs to a step may be given as a list (like before)
inputs to a step may be given as a dict (like we gave kwargs)
if an input dict is missing inputs, these are taken from the args (outputs of the previous step) if present
- no interleaving: no input following an input given in an input dict can be taken from the present args

Essentially we don't try to come up with our own logic, but mirror how python callables behave.

FynnBe · 2022-10-28T19:39:51Z

inputs to a step may be given as a list (like before)

inputs to a step may be given as a dict (like we gave kwargs)
...

Essentially we don't try to come up with our own logic, but mirror how python callables behave.

been thinking about this part a bit more... bear with me, it's shorter than it looks!

We can also mix positional inputs and named inputs with list entries containing a dict with a single key (that is the name of an input):

steps:
- op: ...
  inputs: 
  - arg0
  - arg1
  - key0: value0
  - key1: value1

key word order is ignored:
equivalent to above:

steps:
- op: ...
  inputs: 
  - arg0
  - arg1
  - key1: value1
  - key0: value0   # key1 and key2 are given as key word arguments, their order is ignored 👍

Assuming the previous step returns [arg0, arg1], this is equivalent to the above:

steps:
- op: ...
  inputs:   # positional inputs (arg0, arg1) are taken from previous step, optional inputs may be overwritten 👍 
    key0: value0
    key1: value1

if inputs is a list I would ignore positional outputs from the step above:

steps:
- op: ...
  inputs:  # arg0 is NOT filled in from previous step 
  - arg1  
  - key0: value0
  - key1: value1

correct syntax, but in this example it would
raise "missing arg1" (because arg1 would be given as arg0...)

This pattern implies that input values that are dicts with one key need to be given like this:

steps:
- op: ...
  inputs:  
    arg0: {key: value}

or

steps:
- op: ...
  inputs:  
  - arg0: {key: value}

because

steps:
- op: ...
  inputs:  
  - key: value

would be interpreted as the optional input 'key' with value 'value' and not as the first positional argument (here arg0) with value {key: value}.
Of course we could compare the expected arg name and the given key, but I would not rely on that and mandate to specify single key dicts only as a named input.

FynnBe · 2022-10-28T20:43:15Z

@oeway , one more thing @constantinpape and I discussed:
We should add axes to input if input type: tensor.

oeway · 2022-10-29T17:34:27Z

Hi, In the first glance, I like the idea of merging inputs and kwargs. However, when think it a bit further I find it actually making the usage much more complicated and confusing.

We really oppose inputs with defaults as it makes it very unclear when to use inputs and when kwargs.

I don't really get what this means, the rule is pretty clear to me. Same for defining a python function, if it's required, go for inputs, and if it's optionally with default, go for kwargs. For inputs, you don't need to name them, but kwargs you need to name them. inputs is a list while kwargs is a dictionary. You can of course merge the two by exploiting list of dict, or dict as you are proposing, but feels very hacky and needs a lot of additional rules when using it.

Essentially we don't try to come up with our own logic, but mirror how python callables behave.

But the actually python callables actually separates the args (a list) and kwargs (a dict). Using a list with 1-key dict or dict to encapsulate the two into one object make it feel hacky and hard to use.

In the following example:

  - op: zero_mean_unit_variance
  - op: model_inference
     kwargs:
         model_id: fearless-crab
         preprocessing: false # disable the preprocessing
         postprocessing: false # disable the postprocessing
  - op: stardist_postprocessing
     kwargs:
        diameter: 2.3

If I understand correctly, to adapt what your proposed it will become:

  - id: step1
     op: zero_mean_unit_variance
     outputs: [out1]
  - id: step2
     op: model_inference
     inputs:
         - step1.outputs.out1
         - model_id: fearless-crab
         - preprocessing: false # disable the preprocessing
         - postprocessing: false # disable the postprocessing
     outputs: [out2]
  - op: stardist_postprocessing
     inputs:
        - step2.outputs.out2
        - diameter: 2.3

Note that, because in step2 and step2 we need to set the kwargs, we are forced to name the inputs, and this propagate to all the other steps to force us set id for the first two steps and name the outputs. This introduces a lot of hassle for the naming.

We can potentially allowing ignoring the positional inputs when we only want to change the kwargs, however, this make the inputs fields looks so confusing, since we are not writing out all the inputs.

Overall, I felt the merging of inputs and kwargs will require more rules, the list of 1-key dict feels hacky, separating them actually make more sense to me.

We should add axes to input if input type: tensor.

Sure, maybe we should bring in more tensor spec (e.g. shape) too, however, for simplicity, maybe adding axes, shape, dtype etc. as an optional field would help.

FynnBe · 2022-10-30T19:03:06Z

Note that, because in step2 and step2 we need to set the kwargs, we are forced to name the inputs, and this propagate to all the other steps to force us set id for the first two steps and name the outputs. This introduces a lot of hassle for the naming.

no. I would not force naming/specifying outputs if inputs is given.
Also named inputs could be overwritten without touching the positional inputs by providing them as dict:

  - id: step0
     op: zero_mean_unit_variance
  - id: step2
     op: model_inference
     inputs:
         model_id: fearless-crab
         preprocessing: false # disable the preprocessing
         postprocessing: false # disable the postprocessing
  - op: stardist_postprocessing
     inputs:
        diameter: 2.3

If positional inputs need to be specified this would become

  - id: step0
     op: zero_mean_unit_variance
     outputs: [out1]
  - id: somehting-in-between  # added this step to the example, because otherwise out1 would not need naming
     op: ...
  - id: step2
     op: model_inference
     inputs:
         - step1.outputs.out1
         - model_id: fearless-crab
         - preprocessing: false # disable the preprocessing
         - postprocessing: false # disable the postprocessing
  - op: stardist_postprocessing
     inputs:
        diameter: 2.3

to refine my idea of merging inputs and kwargs I wanted to propose a slight update:

if inputs are given as a list and the last entry is a dictionary, those are named inputs (kwargs).
This avoids a list of len 1 kwargs which indeed feels very hacky.

The above example then becomes:

  - id: step0
     op: zero_mean_unit_variance
     outputs: [out1]
  - id: somehting-in-between  # added this step to the example, because otherwise out1 would not need naming
     op: ...
  - id: step2
     op: model_inference
     inputs:
         - step1.outputs.out1
         - model_id: fearless-crab
           preprocessing: false # disable the preprocessing
           postprocessing: false # disable the postprocessing
  - op: stardist_postprocessing
     inputs:
        diameter: 2.3

However, I have to agree with @oeway . this is still hacky...
the issue we tried to address, and the confusion we mentioned before was that inputs may have defaults. So another solution to a clearer specification would be mandatory inputs:

inputs have no defaults!
if inputs are not specified previous outputs are given as inputs
we rename kwargs (too python specific) to options (other suggestions welcome!), which always have a default value and a name
options are specified as a dict and always referred to by their name (key), because an arbitrary subset of them may be overwritten. This further distinguishes them from inputs and avoids confusing the two sets of parameters.
[I would like this one, but it is not a must] inputs may also be given as a dict via their names, but stay separate from the options.

Sure, maybe we should bring in more tensor spec (e.g. shape) too, however, for simplicity, maybe adding axes, shape, dtype etc. as an optional field would help.

I agree that shape and dtype may be added as optional fields, but I think we cannot handle tensors reasonably without knowing their axes. We don't want a bunch of transpose ops littered throughout the workflows which would be the consequence...
many of the ops (pre-/ and postprocessing) defined in spec and implemented in core have an axes kwarg. (mean per channel etc... )...

FynnBe · 2022-10-31T19:33:56Z

@oeway @constantinpape I know this is a lot to read through and think about, but it would be really convenient for me, if you could take a brief moment and just comment if we can proceed with the current state or need more discussion (if so that does not need to happen right away ;-) )

I tried to comment all discussed/proposed decisions in the following example:

inputs:
- name: ipt0
  type: tensor
  axes: bxy      # if type=='tensor' then 'axes' is mandatory

options:   
- name: param0  # although options are given as a dict we specify them as a list to match inputs/outputs pattern
  type: int
  default: null   # options always need a default value (may be null or of type 'type')

outputs: 
- name: ...
  type: tensor
  axes: bxy
- name: ...
  type: bool

steps:
  - id: step0
     op: zero_mean_unit_variance
     outputs: [out1]
  - id: somehting-in-between  # added this step to the example, because otherwise out1 would not need naming
     op: ...
  - id: step2
     op: model_inference
     inputs: [step1.outputs.out1]
# inputs as list or alternatively by name as dict, but either way always all inputs need to be specified:
#     inputs: 
#       raw: step1.outputs.out1
     options:
         model_id: fearless-crab
         preprocessing: false # disable the preprocessing
         postprocessing: false # disable the postprocessing
  - op: stardist_postprocessing
     options:
        diameter: 2.3

test_steps:  # analog to 'steps', but do not take any inputs/options (it may call the 'steps' multiple times with different fixed inputs/options though
  - ...

constantinpape · 2022-10-31T20:02:54Z

@oeway:

I don't really get what this means, the rule is pretty clear to me. Same for defining a python function, if it's required, go for inputs, and if it's optionally with default, go for kwargs.

I agree with this. But when I discussed this with @FynnBe, he said that you wanted to allow default values for inputs. And this would not make sense to me. Judging from your comment here that might have been a misunderstanding.

inputs have no defaults!

if inputs are not specified previous outputs are given as inputs

we rename kwargs (too python specific) to options (other suggestions welcome!), which always have a default value and a name

options are specified as a dict and always referred to by their name (key), because an arbitrary subset of them may be > - overwritten. This further distinguishes them from inputs and avoids confusing the two sets of parameters.

[I would like this one, but it is not a must] inputs may also be given as a dict via their names, but stay separate from the options.

That sounds good to me. About the last point: I would rather not do it since I think we need to introduce more rules etc. and it makes reading everything more complex. But I don't have a strong opinion on this, so would be fine with this either way.

I also agree that we should have axes mandatory for tensors.

I know this is a lot to read through and think about, but it would be really convenient for me, if you could take a brief moment and just comment if we can proceed with the current state or need more discussion (if so that does not need to happen right away ;-) )

Looks good from my end. My time to look into this will be very limited in the next few weeks, so please don't wait on my feedback to go ahead here.

oeway · 2022-10-31T20:13:59Z

Hi @FynnBe I like this! No strong opinion about changing kwargs to options. If we do that, would it also make sense to change list/dict to array/object then? To align with the json types.

A minor suggestion, would it make sense to make name field under inputs optional? just to align with the ops spec, so we can use the default name self.inputs.0 (same as unnamed inputs in ops).

For axes, would it make sense to bring this new axes definition into the tensor spec?

# Conflicts: # tests/conftest.py

applies only for creating an RDF raw node in code

FynnBe · 2023-02-09T12:26:06Z

this is ready for a pre-alpha release!
more detalis on the new bioimageio.workflows package can be found at https://github.com/bioimage-io/workflows-bioimage-io-python

# Conflicts: # README.md

FynnBe added 6 commits October 26, 2022 21:59

draft workflow RDF

aa7cdc6

update passthrough module generation

7924b2f

Tensor -> Arg; ArgType

825cad3

test examples

891093f

add schema validation

8f06f9e

add test_workflow_rdf.py

6889bf8

FynnBe requested review from constantinpape and oeway October 26, 2022 22:27

FynnBe added 3 commits October 27, 2022 00:29

fix missing workflow import

9f0eed2

update generate_rdf_docs.py and generate_json_specs.py

8639d8d

fix typing import

1476670

test_steps and better workflow kwargs

fe9243e

FynnBe marked this pull request as ready for review October 27, 2022 13:58

Merge branch 'main' into workflow_rdf

7afc2c6

oeway reviewed Oct 27, 2022

View reviewed changes

example_specs/workflows/hpa/single_cell_classification.yaml Outdated Show resolved Hide resolved

example_specs/workflows/stardist/stardist_example.yaml Outdated Show resolved Hide resolved

Update example_specs/workflows/hpa/single_cell_classification.yaml

5645d4c

FynnBe added 2 commits October 28, 2022 23:48

wip discussion with constantin

218b67a

wip2

94d1292

Merge branch 'main' into workflow_rdf

5831b04

axes and options

428d605

FynnBe added 2 commits November 4, 2022 23:12

don't accept emtpy strings

7ace197

also log binarized

f5af22f

FynnBe mentioned this pull request Nov 4, 2022

Workflow RDF bioimage-io/core-bioimage-io-python#310

Open

FynnBe added 11 commits November 8, 2022 11:19

Merge branch 'main' into workflow_rdf

96376e5

# Conflicts: # tests/conftest.py

wip remove wf steps

9fe4ca2

rename importable sources

31ecba9

black

ea1b826

update changelog

283da9a

remove steps from workflow spec

73b31b9

split up CallableSource field

9c4a81d

set format_version as default

91e6783

applies only for creating an RDF raw node in code

prohibit serializing a list from a string

8bdb9c9

remove specialized axes classes

a89bf07

remove redundant brackets

5670cbc

github-actions bot added the documentation Improvements or additions to documentation label Nov 24, 2022

FynnBe added 10 commits November 25, 2022 10:16

update workflow tests

9b82e90

rename DEFAULT_TYPE_NAME_MAP

0243665

rename ArbitraryAxes to UnknownAxes

91d141f

make nested_errors optional

a3d97c8

assert for mypy

b7b51a9

some aliases for backward compatibility

eb4e3f8

add AXIS_LETTER_TO_NAME and AXIS_NAME_TO_LETTER

1102f6a

Merge branch 'main' into workflow_rdf

6780a70

Merge branch 'main' into workflow_rdf

924d667

update hello workflow example

b66798a

FynnBe mentioned this pull request Feb 14, 2023

Enhance axes in 0.5.0 #290

Closed

FynnBe added 3 commits March 3, 2023 11:33

Merge branch 'main' into workflow_rdf

c90cdd4

# Conflicts: # README.md

Merge branch 'main' into workflow_rdf

ff5cc6e

remove +\n from CLI help

052c553

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

workflow RDF #478

workflow RDF #478

FynnBe commented Oct 26, 2022

FynnBe commented Oct 26, 2022

FynnBe commented Oct 27, 2022

FynnBe commented Oct 28, 2022

FynnBe commented Oct 28, 2022 •

edited

Loading

FynnBe commented Oct 28, 2022 •

edited

Loading

oeway commented Oct 29, 2022 •

edited

Loading

FynnBe commented Oct 30, 2022 •

edited

Loading

FynnBe commented Oct 31, 2022

constantinpape commented Oct 31, 2022

oeway commented Oct 31, 2022 •

edited

Loading

FynnBe commented Feb 9, 2023 •

edited

Loading

workflow RDF #478

Are you sure you want to change the base?

workflow RDF #478

Conversation

FynnBe commented Oct 26, 2022

FynnBe commented Oct 26, 2022

FynnBe commented Oct 27, 2022

FynnBe commented Oct 28, 2022

FynnBe commented Oct 28, 2022 • edited Loading

FynnBe commented Oct 28, 2022 • edited Loading

oeway commented Oct 29, 2022 • edited Loading

FynnBe commented Oct 30, 2022 • edited Loading

FynnBe commented Oct 31, 2022

constantinpape commented Oct 31, 2022

oeway commented Oct 31, 2022 • edited Loading

FynnBe commented Feb 9, 2023 • edited Loading

FynnBe commented Oct 28, 2022 •

edited

Loading

FynnBe commented Oct 28, 2022 •

edited

Loading

oeway commented Oct 29, 2022 •

edited

Loading

FynnBe commented Oct 30, 2022 •

edited

Loading

oeway commented Oct 31, 2022 •

edited

Loading

FynnBe commented Feb 9, 2023 •

edited

Loading