Definition of implicit tensors #493

Closed
oeway opened this issue Feb 7, 2023 · 5 comments


oeway commented Feb 7, 2023

We are at the HT hackathon, and we had a discussion about the ImplicitOutputShape in the README:

In reference to the shape of an input tensor, the shape of the output tensor is shape = shape(input_tensor) * scale + 2 * offset. ImplicitOutputShape is a Dict with the following keys:...
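As a minimal sketch (function and variable names are illustrative, not taken from the spec), the quoted definition evaluates per axis like this:

def implicit_output_shape(input_shape, scale, offset):
    # current README semantics: shape(output) = shape(input) * scale + 2 * offset
    return [int(s * sc + 2 * off) for s, sc, off in zip(input_shape, scale, offset)]

# e.g. a 256x256 input with scale 1.0 and offset -8 per axis yields a 240x240 output:
# implicit_output_shape([256, 256], [1.0, 1.0], [-8, -8]) -> [240, 240]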

@axtimwalde raised the following questions/suggestions about it:

  • the output shape should be shape(input_tensor) * scale - 2 * offset instead of + 2 * offset
  • It would make sense to remove the factor of 2 if we call it offset; otherwise it acts like a margin on both sides, whereas an offset typically means the offset from the origin.
  • we should replace input_tensor with reference_tensor in shape = shape(input_tensor) * scale + 2 * offset
    cc @FynnBe @constantinpape @dasv74
@axtimwalde

Small corrections to our earlier rant:

  • I think it's fine to keep calling it offset and leaving it as + offset, but the factor of 2 in combination with a 0.5 step size is a weird workaround for specifying integers and hurts in typed languages.
  • the range of scale is underspecified: it is actually a ratio of how many input pixels turn into how many output pixels, both integers. We therefore suggest replacing this with a fraction of integers, one part in the input spec and one in the output spec; no strong opinion about the name, ratio, factor, or even scale would work. The formula to calculate the output shape would be
output.shape[i] = reference.shape[i] * output.scale[i] / reference.scale[i] + output.offset[i] 

In a scenario where e.g. one channel turns into 33 channels like in StarDist, these values would be output.scale[3] = 33, input.scale[0] = 1, output.offset[3] = 0 instead of the confusing output.offset[3] = 16.5! Opinions?
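A minimal sketch of this proposal (illustrative only, following the names in the formula above):

def output_shape(reference_shape, reference_scale, output_scale, output_offset):
    # output.shape[i] = reference.shape[i] * output.scale[i] / reference.scale[i] + output.offset[i]
    return [
        ref * out_sc // ref_sc + off
        for ref, ref_sc, out_sc, off in zip(reference_shape, reference_scale, output_scale, output_offset)
    ]

# StarDist-like channel axis: 1 input channel turns into 33 output channels
# output_shape([1], [1], [33], [0]) -> [33]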

@axtimwalde

Also, many architectures are not translationally invariant and cannot be tiled correctly by tiling the full outputs of valid input shapes. output.halo does not cover this because it has slightly different semantics. I therefore suggest that output can optionally define the same min and step fields as input. Intelligent tiling pipelines can then make sure that they crop the overhanging pixels of outputs to create seamless tilings.


FynnBe commented Feb 7, 2023

we should replace input_tensor with reference_tensor

yes, agreed (we discussed this a little already in #234 (comment)).

I like the addition of scale or similar.
However, a fraction of two integers is not sufficient to describe two different input/output shape ratios, e.g. for two outputs referencing the same input. I think we can solve this by giving meaning to "the intermediate result".
This relates to our discussion about improving the axes description with step and unit, resulting in #290. There, each length axis gets a step and a physical unit. These could be used similarly to scale in @axtimwalde's suggestion.

output.shape[i] = reference.shape[i] * reference.step[i] * reference.unit / output.step[i] / output.unit + output.offset[i]

This assumes that there is no scaling factor between the physical space of tensor (aka output) and reference (aka input). If we do want to support that ("a network generating a meter of output per nanometer of input"), we'd have to add an additional "warp factor" per output... (maybe only if that use case arises ^^)
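A minimal sketch of this step/unit variant (the unit table and function are illustrative assumptions, not spec API; integer steps assumed):

UNIT_IN_NM = {"millimeter": 10**6, "micrometer": 10**3, "nanometer": 1}  # length units expressed in nanometers

def output_size(ref_size, ref_step, ref_unit, out_step, out_unit, out_offset):
    # output.shape[i] = reference.shape[i] * reference.step[i] * reference.unit
    #                   / (output.step[i] * output.unit) + output.offset[i]
    physical_extent = ref_size * ref_step * UNIT_IN_NM[ref_unit]
    return physical_extent // (out_step * UNIT_IN_NM[out_unit]) + out_offset

# e.g. 100 px sampled at 2 mm/px mapped to an output sampled at 4 mm/px:
# output_size(100, 2, "millimeter", 4, "millimeter", 0) -> 50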

Also, many architectures are not translationally invariant and cannot be tiled correctly by tiling the full outputs of valid input shapes. output.halo does not cover this because it has slightly different semantics. I therefore suggest that output can optionally define the same min and step fields as input. Intelligent tiling pipelines can then make sure that they crop the overhanging pixels of outputs to create seamless tilings.

I am not sure how to specify "how to tile", which is what missing translational invariance would require, but giving output min and step leaves the relation of input size to output size underdefined.
If an input cannot be seamlessly tiled due to its size, tiles have to overlap. Where to overlap is an arbitrary choice that we could maybe fix with the guideline "overlap the last two tiles" if necessary. I took this approach when implementing the workflow inference_with_dask()
(edit: the meat is in get_chunk)
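A rough sketch of that guideline (this is not the actual get_chunk code, just an illustration of shifting the last tile so it overlaps its neighbor):

def tile_starts(length, tile, step):
    # start positions covering `length` with tiles of size `tile`, advancing by `step`;
    # the final tile is shifted back so it ends exactly at `length`, overlapping the previous one
    starts = list(range(0, max(length - tile, 0) + 1, step))
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts

# tile_starts(10, 4, 4) -> [0, 4, 6]  (the last two tiles overlap on [6, 8))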

I am totally open to changing how the offset/margin is defined. We can flip the sign and/or leave out the factor of 2... I won't object to any of it. I like flipping the sign (and keeping the factor of 2), such that the output (tensor)'s origin is at offset with respect to the input (reference) origin. To avoid confusion (especially when flipping a sign from one spec version to another) I'd consider changing the field name as well, e.g. origin_at (and specifying the offset in a physical unit).


FynnBe commented Jan 12, 2024

Current draft to implement the changes discussed here:

class SizeReference(Node):
    """A tensor axis size (extent in pixels/frames) defined in relation to a reference axis.

    `axis.size = reference.size * reference.scale / axis.scale + offset`

    note:
    1. The axis and the referenced axis need to have the same unit (or no unit).
    2. A channel axis may only reference another channel axis. Their scales are implicitly set to 1.
    3. Batch axes may not be referenced.
    4. Fractions are rounded down.

    example:
    An anisotropic input image of w*h=100*49 pixels depicts a physical space of 200*196mm².
    Let's assume that we want to express the image height h in relation to its width w
    instead of only accepting input images of exactly 100*49 pixels
    (for example to express a range of valid image shapes by parametrizing w, see `ParametrizedSize`).

    >>> w = SpaceInputAxis(id=AxisId("w"), size=100, unit="millimeter", scale=2)
    >>> h = SpaceInputAxis(
    ...     id=AxisId("h"),
    ...     size=SizeReference(tensor_id=TensorId("input"), axis_id=AxisId("w"), offset=-1),
    ...     unit="millimeter",
    ...     scale=4,
    ... )
    >>> print(h.size.compute(h, w))
    49

    -> h = w * w.scale / h.scale + offset = 100 * 2mm / 4mm - 1 = 49
    """


FynnBe commented Mar 18, 2024

released as part of model description 0.5

@FynnBe FynnBe closed this as completed Mar 18, 2024