[Python] Optimize performance of pyfury #1993

Open
chaokunyang opened this issue Jan 3, 2025 · 2 comments

Comments

@chaokunyang
Collaborator

chaokunyang commented Jan 3, 2025

Feature Request

pyfury is 3x faster than pickle for serialization and 2x faster for deserialization. Here is the benchmark code:

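import pickle
import time
from dataclasses import dataclass
from typing import Any, Dict, List

import pyfury


@dataclass
class ComplexObject1:
    f1: Any = None
    f2: str = None
    f3: List[str] = None
    f4: Dict[pyfury.Int8Type, pyfury.Int32Type] = None
    f5: pyfury.Int8Type = None
    f6: pyfury.Int16Type = None
    f7: pyfury.Int32Type = None
    f8: pyfury.Int64Type = None
    f9: pyfury.Float32Type = None
    f10: pyfury.Float64Type = None
    f12: List[pyfury.Int16Type] = None


@dataclass
class ComplexObject2:
    f1: Any
    f2: Dict[pyfury.Int8Type, pyfury.Int32Type]


fury = pyfury.Fury(language=pyfury.Language.PYTHON)
fury.register_type(ComplexObject1)
fury.register_type(ComplexObject2)
# COMPLEX_OBJECT is a sample ComplexObject1 instance defined elsewhere in the benchmark (not shown here)
o = COMPLEX_OBJECT
N = 500000  # use the same iteration count for both libraries so the timings are comparable

# pyfury: serialize once outside the timed loop, then time deserialization only
binary = fury.serialize(o)
start = time.perf_counter()
for _ in range(N):
    # binary = fury.serialize(o)  # uncomment to include serialization in the timing
    fury.deserialize(binary)
print("pyfury:", time.perf_counter() - start)

# pickle: same measurement for comparison
binary = pickle.dumps(o)
start = time.perf_counter()
for _ in range(N):
    # binary = pickle.dumps(o)  # uncomment to include serialization in the timing
    pickle.loads(binary)
print("pickle:", time.perf_counter() - start)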
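A more repeatable comparison times serialization and deserialization separately, runs each for the same number of calls, and keeps the fastest of several rounds. Below is a minimal standard-library sketch of such a harness; `bench_roundtrip` and `Sample` are hypothetical names, and only `pickle` is exercised so the snippet runs without pyfury installed.

```python
import pickle
import timeit
from dataclasses import dataclass
from typing import Any, Callable, Dict


def bench_roundtrip(
    dumps: Callable[[Any], bytes],
    loads: Callable[[bytes], Any],
    obj: Any,
    number: int = 100_000,
    repeat: int = 5,
) -> Dict[str, float]:
    """Return the best per-call time in seconds for serialize and deserialize.

    Each phase runs `number` calls per round for `repeat` rounds; the fastest
    round is kept, which filters out some scheduling noise.
    """
    data = dumps(obj)  # serialize once, outside the timed region
    ser = min(timeit.repeat(lambda: dumps(obj), number=number, repeat=repeat))
    de = min(timeit.repeat(lambda: loads(data), number=number, repeat=repeat))
    return {"serialize": ser / number, "deserialize": de / number}


@dataclass
class Sample:
    """Hypothetical stand-in for ComplexObject2 from the issue."""
    f1: Any
    f2: Dict[int, int]


if __name__ == "__main__":
    obj = Sample(f1=["abc", 1.5], f2={1: 2, 3: 4})
    print("pickle:", bench_roundtrip(pickle.dumps, pickle.loads, obj, number=10_000))
```

Assuming pyfury is installed, passing `fury.serialize` and `fury.deserialize` from the benchmark above in place of `pickle.dumps` and `pickle.loads` should produce directly comparable per-call numbers.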

However, performance is still not good enough. The flame graph shows there is still room for improvement:

[Flame graph]
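The issue does not say which tool produced the flame graph. As a standard-library starting point, a sketch like the following ranks functions by cumulative time with `cProfile` and `pstats`; `profile_calls` is a hypothetical helper. cProfile only sees Python-level calls, so Cython code (which pyfury partly uses) appears as opaque calls unless compiled with Cython's `profile=True` directive; a sampling profiler that records native frames (for example py-spy with its `--native` option) may give a more complete picture.

```python
import cProfile
import io
import pickle
import pstats
from typing import Any, Callable


def profile_calls(fn: Callable[[], Any], iterations: int = 10_000, top: int = 10) -> str:
    """Run fn() `iterations` times under cProfile and return a text report
    of the `top` entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(iterations):
        fn()
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    # Stand-in workload: swap in lambda: fury.deserialize(binary) to profile pyfury.
    payload = pickle.dumps({"f1": [1, 2, 3], "f2": {1: 2}})
    print(profile_calls(lambda: pickle.loads(payload), iterations=1_000))
```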

Is your feature request related to a problem? Please describe

No response

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

@pandalee99
Contributor

This is more of a long-term effort, and I have a couple of questions:
1. Can we sacrifice some readability to improve efficiency (for example, by reducing or reusing some variables)?
2. Should the implementation use C++ as much as possible?
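For context on both questions, here is a generic CPython illustration (not pyfury code; the function names are hypothetical). Caching a bound method in a local variable removes one attribute lookup per iteration at a small cost in readability, while moving the whole per-item loop into C code, here via `struct.pack`, typically saves more. Actual speedups depend on the interpreter version and workload, so they should be measured rather than assumed.

```python
import struct
import timeit
from typing import List


def encode_readable(values: List[int]) -> bytes:
    """Straightforward version: looks up buf.extend on every iteration."""
    buf = bytearray()
    for v in values:
        buf.extend(v.to_bytes(4, "little", signed=True))
    return bytes(buf)


def encode_cached(values: List[int]) -> bytes:
    """Same loop, but the bound method is cached in a local variable,
    trading a little readability for fewer attribute lookups."""
    buf = bytearray()
    extend = buf.extend
    for v in values:
        extend(v.to_bytes(4, "little", signed=True))
    return bytes(buf)


def encode_struct(values: List[int]) -> bytes:
    """Moves the loop into C via struct.pack ("<" = little-endian,
    "i" = 4-byte signed int)."""
    return struct.pack(f"<{len(values)}i", *values)


if __name__ == "__main__":
    data = list(range(-500, 500))
    for fn in (encode_readable, encode_cached, encode_struct):
        t = min(timeit.repeat(lambda: fn(data), number=2_000, repeat=3))
        print(f"{fn.__name__}: {t:.4f}s")
```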

@chaokunyang
Collaborator Author

Depending on the performance gains we can get, it's OK to sacrifice some code readability. And if C++ is faster, we should use it.
