Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add block #436

Open
JohnCoene opened this issue Oct 15, 2024 · 10 comments
Open

Add block #436

JohnCoene opened this issue Oct 15, 2024 · 10 comments
Labels

Comments

@JohnCoene
Copy link
Member

We have an issue in the add block, where it does not the right blocks or does not show any block available when it should.

See {blockr.composer} register: https://github.com/BristolMyersSquibb/blockr.composer/blob/master/R/register.R

I would expect to be able to add a new_freq_block after a new_create_table_block since their input and output match but when I have a new_create_table_block I'm unable to add any block after it.

@DivadNojnarg
Copy link
Collaborator

Thanks. I can reproduce it.

@DivadNojnarg
Copy link
Collaborator

Could you add the table_block class to new_create_table_block in the register function? https://github.com/BristolMyersSquibb/blockr.composer/blob/master/R/register.R#L9.

@JohnCoene
Copy link
Member Author

No, it does not use that class. the table_block is used for the stubs that come after it, namely to control the rendering of the table in the block which cannot be done in new_create_table_block.

See here, it does not use that class.

@DivadNojnarg
Copy link
Collaborator

The issue is at this line of get_compatible_blocks():

last_blk_output <- registry[grep(cls, registry$classes), "output"]

There isn't any match between cls (that are the classes of the block) and registry$classes that are the class written during the block registration. A short term workaround: you can add the transform_block class to new_create_table_block.

blockr::register_block(
  new_create_table_block,
  "Clinical table",
  "Create a clinical table",
  input = "parent_data",
  output = "clinical_table",
  package = "blockr.composer",
  classes = c("create_table_block", "transform_block")
)
Screenshot 2024-10-16 at 14 43 34

In the long run, the registry has to be slightly adjusted. For instance, I am not convinced that the registry classes are useful, given that this information can be extracted after constructing the block with the class attribute. This can even lead to more troubles: imagine a select block (select_block, transform_block) where you can pass whatever class in the registry (like data_block ... by mistake), that does not add any information and we can't rely on that.

@JohnCoene
Copy link
Member Author

Why are we comparing on classes? I thought this is why we defined input and output.

Can we not just remove this check on classes instead? I'm not sure why it's there.

Wasn't the logic:

  1. No block on stack: show blocks where input NA
  2. Show blocks that have input that matches last block in stack output

@nbenn
Copy link
Collaborator

nbenn commented Nov 4, 2024

@JohnCoene the problem here is the following I believe: In order to check whether a new block can be inserted in a (tail) position we have to check whether the input type of the new block matches the output type of the previous.

In order to do so, we have to be able to look up the registry entry of the previous block. And this is what I am attempting in #445, essentially by using a "registry ID" and saving that with the block.

One issue wich such an approach (cc @DivadNojnarg) is that we introduce a dependency (for a serialized block) on the runtime (registry) state from when/where the block was created.

This will cause issues if for example a block was added to the registry in one session, but is not available in the session where the block is being restored.

@nbenn
Copy link
Collaborator

nbenn commented Nov 5, 2024

Another issue: blocks that are not created in-app, but eg via set-up as

new_stack(
  new_dataset_block("iris")
)

are missing the required "registry link" as well. In light of these issues, it might make more sense to store the block attributes we require over the entire block-lifetime directly with the block and set this up during object construction.

We keep all registry-concerned metadata with the registry (such as name, description, category), but move input/output to the block definition. This will involve a breaking change to how blocks are defined. I see two ways of doing this:

  1. This can happen directly as part of the block ctor definition, by modifying calls to new_block(), something like

    new_some_block <- function() {
      ...
      new_block(
        ...,
        input = "data.frame",
        output = "data.frame"
      )
    }
  2. We can offer a utility, something like block_constructor() as

    new_some_block <- block_constructor(
      constructor = function() {
        ...
        new_block(
          ...
        )
      },
      input = "data.frame",
      output = "data.frame"
    )
  3. Another approach could be via a new set of generics ofc. We could have something like

    block_input.some_block <- function() "data.frame"
    block_output.some_block <- function() "data.frame" 

    This would have the advantage that we can leverage inheritance directly, e.g. have methods for transform_block and that way it would not have to be explicit for any of the existing blocks.

As for the input/output specification, we could improve this as well. We could do something like

  • input is a predicate function, e.g. function(x) inherits(x, "data.frame")
  • output is a ptype, something like data.frame()

That way the "backend" just has to output of the new block into input of the previous and that's all.

@DivadNojnarg any thoughts? I'm kind of leaning towards the generic approach as we're using this pattern in many places already and this seem to be the least "intrusive" change.

@DivadNojnarg
Copy link
Collaborator

@nbenn

  • From where would you call block_input.some_block in 3?
  • I don't see how 2 can be used. Seems like a major breaking change.

Overall, I'd prefer if we could use generics to have more customisation options. Could we have a mix of 1 and 3 like:

new_block <- function(fields, expr, name = rand_names(),
                      submit = FALSE, ...,
                      class = character()) {
  stopifnot(
    is.list(fields),
    all(lgl_ply(fields, is_field)),
    is.language(expr),
    is_string(name)
  )

  if (is.na(submit)) {
    submit <- 0
  } else {
    submit <- if (submit) 1 else -1
  }

  blk <- structure(fields,
    name = name, expr = expr, result = NULL, submit = submit, ...,
    class = c(class, "block")
  )
  attr(blk, "input") <- block_input(blk)
  attr(blk, "output") <- block_output(blk)
  blk
}

@nbenn
Copy link
Collaborator

nbenn commented Nov 5, 2024

From where would you call block_input.some_block in 3?

Anywhere we need to. For example in get_compatible_blocks(), we could iterate through the registry

for (x in get_registry()) {
  identical(block_input(x()), block_output(stack[[length(stack)]]))
}

to check which new blocks are compatible with the current last stack block.

I don't see how 2 can be used. Seems like a major breaking change.

Whenever a block constructor is defined. And yes that'd be a major breaking change. But the same holds true for option (1). That's why I am leaning towards (3).

@DivadNojnarg
Copy link
Collaborator

Ok, 3 seems fine to me, as we expose a better API around inputs/outputs. With such approach we also don't need all the work around block id, and we can probably drop the get_registry() function?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants