Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] dataclass in list default input error #5985

Open
2 tasks done
Future-Outlier opened this issue Nov 11, 2024 · 6 comments · May be fixed by flyteorg/flytekit#2931
Open
2 tasks done

[BUG] dataclass in list default input error #5985

Future-Outlier opened this issue Nov 11, 2024 · 6 comments · May be fixed by flyteorg/flytekit#2931
Assignees
Labels
bug Something isn't working good first issue Good for newcomers

Comments

@Future-Outlier
Copy link
Member

Future-Outlier commented Nov 11, 2024

Describe the bug

The following example should work, maybe there's something wrong with how we handle click types input.

Expected behavior

it should work.

Additional context to reproduce

env:
python 3.12, flytekit master branch
flyte backend: master branch

from textwrap import shorten

from flytekit import task, workflow
from dataclasses import dataclass
from typing import Any, Dict, Optional, List
from flytekit.core.type_engine import TypeEngine
from dataclasses_json import dataclass_json
from flyteidl.core.execution_pb2 import TaskExecution
from flytekit.core.context_manager import FlyteContextManager
from flytekit.configuration import SerializationSettings
from flytekit.core.base_task import PythonTask
from flytekit.core.interface import Interface
from flytekit.extend.backend.base_agent import (
    AgentRegistry,
    Resource,
    SyncAgentBase,
    SyncAgentExecutorMixin,
)
from flytekit.models.literals import LiteralMap
from flytekit.models.task import TaskTemplate


@dataclass_json
@dataclass
class Foo:
    val: str


class FooAgent(SyncAgentBase):
    def __init__(self) -> None:
        super().__init__(task_type_name="foo")

    def do(
        self,
        task_template: TaskTemplate,
        inputs: Optional[LiteralMap] = None,
        **kwargs: Any,
    ) -> Resource:
        return Resource(
            phase=TaskExecution.SUCCEEDED, outputs={"foos": [Foo(val="a"), Foo(val="b")], "has_foos": True}
        )


AgentRegistry.register(FooAgent())


class FooTask(SyncAgentExecutorMixin, PythonTask):  # type: ignore
    _TASK_TYPE = "foo"

    def __init__(self, name: str, **kwargs: Any) -> None:
        task_config: dict[str, Any] = {}

        outputs = {"has_foos": bool, "foos": Optional[List[Foo]]}

        super().__init__(
            task_type=self._TASK_TYPE,
            name=name,
            task_config=task_config,
            interface=Interface(outputs=outputs),
            **kwargs,
        )

    def get_custom(self, settings: SerializationSettings) -> Dict[str, Any]:
        return {}


foo_task = FooTask(name="foo_task")


@task
def foos_task(foos: list[Foo]) -> None:
    print(f"hi {foos}")


@workflow
def dc_wf(foos: list[Foo] = [Foo(val="a"), Foo(val="b")]) -> None:
    has_foos, foos = foo_task()
    foos_task(foos=foos)


if __name__ == "__main__":
    dc_wf([Foo(val="a"), Foo(val="b")])

Screenshots

image

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@Future-Outlier Future-Outlier added bug Something isn't working untriaged This issues has not yet been looked at by the Maintainers labels Nov 11, 2024
@Future-Outlier Future-Outlier added good first issue Good for newcomers and removed untriaged This issues has not yet been looked at by the Maintainers labels Nov 11, 2024
@dalaoqi
Copy link

dalaoqi commented Nov 11, 2024

#take

@Future-Outlier
Copy link
Member Author

dataclass in dict default input doesn't work too.
Please do them together if possible. <3.

@dataclass
class DC:
    ff: FlyteFile = field(default_factory=lambda: FlyteFile(os.path.realpath(__file__)))
    sd: StructuredDataset = field(default_factory=lambda: StructuredDataset(
        uri="/Users/future-outlier/code/dev/flytekit/build/debugyt/user/FE/src/data/df.parquet",
        file_format="parquet"
    ))
    fd: FlyteDirectory = field(default_factory=lambda: FlyteDirectory(
        "/Users/future-outlier/code/dev/flytekit/build/debugyt/user/FE/src/data/"
    ))


@task(container_image=image)
def t1(dc: DC = DC()) -> DC:
    with open(dc.ff, "r") as f:
        print("File Content: ", f.read())

    print("sd:", dc.sd.open(pd.DataFrame).all())

    df_path = os.path.join(dc.fd.path, "df.parquet")
    print("fd: ", os.path.isdir(df_path))

    return dc

@workflow
def wf(dc: DC = DC()):
    t1(dc=dc)

@task(container_image=image)
def list_t1(list_dc: list[DC] = [DC(), DC()]) -> list[DC]:
    for dc in list_dc:
        with open(dc.ff, "r") as f:
            print("File Content: ", f.read())

        print("sd:", dc.sd.open(pd.DataFrame).all())

        df_path = os.path.join(dc.fd.path, "df.parquet")
        print("fd: ", os.path.isdir(df_path))

    return list_dc

@workflow
def list_wf(list_dc: list[DC] = [DC(), DC()]):
    list_t1(list_dc=list_dc)

@task(container_image=image)
def dict_t1(dict_dc: dict[str, DC] = {"a": DC(), "b": DC()}) -> dict[str, DC]:
    for _, dc in dict:
        with open(dc.ff, "r") as f:
            print("File Content: ", f.read())

        print("sd:", dc.sd.open(pd.DataFrame).all())

        df_path = os.path.join(dc.fd.path, "df.parquet")
        print("fd: ", os.path.isdir(df_path))

    return dict_dc

@workflow
def dict_wf(dict_dc: dict[str, DC] = {"a": DC(), "b": DC()}):
    dict_t1(dict_dc=dict_dc)

@wild-endeavor
Copy link
Contributor

wait why are we touching the click types in the pr? I don't think click is being invoked. if you just run the first example in the description off of master flytekit, you will get the error Got multiple value....

This error should be fixed. It's coming from here: https://github.com/flyteorg/flytekit/blob/e19bbcc1041df1b4826a6d66cad215bf13cad4fa/flytekit/core/promise.py#L1413. This was added by flyteorg/flytekit#2522 when we wanted to start supporting positional args, not just kwargs. The "multiple" comes from the fact that the foos input to the dc_wf workflow has a default value ([Foo(val="a"), Foo(val="b")]). We should fix this, but if you remove the default, it should work.

But it doesn't work. Instead I get another error

  Error converting input 'foos' at position 0:
  The expected python type is 'typing.List[__main__.Foo]' but the received Flyte
literal value is not a collection (Flyte's representation of Python lists).

can we make this issue capture/address both?

@dalaoqi
Copy link

dalaoqi commented Nov 15, 2024

Got it, I also found that errors.
I’ll address both issues simultaneously.

@wild-endeavor
Copy link
Contributor

thank you.

@dalaoqi
Copy link

dalaoqi commented Nov 19, 2024

@Future-Outlier There seems to be a syntax error in the dict_t1 function. It would be better to use dict_dc.items() to iterate over a dictionary in this context, as it provides both keys and values. This might explain why the demo last week didn't execute successfully.

@wild-endeavor I noticed that, in addition to the Got multiple values error and the issue with type Flyte literal, the first example also throws a TypeError: the JSON object must be str, bytes, or bytearray, not Foo. Therefore, I believe we need to make some corrections in click_types.py as well.

Here are the errors I encountered while trying the cases mentioned above:

  • TypeError: the JSON object must be str, bytes or bytearray, not Foo
    image

  • AttributeError: 'NoneType' object has no attribute 'literals'
    image

  • Got multiple values
    image

Here are the results after making the revisions:

  • First case
    image
    image

  • Second case
    To reduce the length, I commented out part of print("File Content: ", f.read()); otherwise, it would print the entire code.

    • 2-1 wf
      image
    • 2-2 list_wf
      image
    • 2-3 dict_wf
      image

@davidmirror-ops davidmirror-ops moved this from Backlog to Assigned in Flyte Issues/PRs maintenance Nov 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers
Projects
Status: Assigned
3 participants