Hello team, I'm new to PyCylon and have an issue related to distributed sorting.
It seems that an empty dataframe on one process triggers an exception when I call sort_values in a distributed way.
For context, I'm using pycylon version 0.6.0 in an MPI environment.
To reproduce the issue, I wrote the following code snippet.
The snippet creates a dataframe with 2 rows on each process and then performs a distributed groupby on the id column. The grouped result is filtered so that only one id (id=0) is kept, which leaves every process empty except the one that holds the row with id=0. A second distributed groupby on this data succeeds. The distributed sort_values that follows, however, raises an exception:
ValueError: Operation failed: : empty vector passed onto merge
The following is the code for the scenario above.
Based on this code and behavior, the distributed groupby works when one process has rows and the others have none, but the distributed sort_values does not.
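As an illustration only, here is a pure-Python toy model (no PyCylon or MPI involved) of how this symptom could arise in a range-partitioned sort: each rank locally sorts its rows, sends each row to a destination rank chosen by splitters, and then merges the sorted runs it received. A rank that received no rows ends up handing an empty vector to a merge step that rejects it. This is an assumption about the failure mode, not Cylon's actual implementation; the names strict_merge, guarded_merge and distributed_sort, the (id, value) row layout, and the splitter values are all hypothetical.

```python
import bisect
import heapq
from typing import Callable, List, Sequence, Tuple

# Hypothetical row layout: (id, value). Not taken from the original snippet.
Row = Tuple[int, int]
Run = List[Row]


def strict_merge(runs: Sequence[Run]) -> Run:
    """Toy k-way merge that rejects an empty vector of inputs.

    Empty runs are dropped first, so a rank that received no rows at all
    ends up passing an empty vector and raises, mirroring the reported
    message. This models the symptom only, not Cylon's actual code.
    """
    non_empty = [run for run in runs if run]
    if not non_empty:
        raise ValueError("Operation failed: : empty vector passed onto merge")
    return list(heapq.merge(*non_empty))


def guarded_merge(runs: Sequence[Run]) -> Run:
    """Same merge, but a rank with no incoming rows returns an empty result."""
    non_empty = [run for run in runs if run]
    return list(heapq.merge(*non_empty)) if non_empty else []


def distributed_sort(
    partitions: Sequence[Run],
    merge: Callable[[Sequence[Run]], Run],
    splitters: Sequence[int],
) -> List[Run]:
    """Simulate a range-partitioned sort across len(partitions) "processes"."""
    world = len(partitions)
    # received[dst][src] holds the sorted run that rank src sends to rank dst.
    received: List[List[Run]] = [[[] for _ in range(world)] for _ in range(world)]
    for src, part in enumerate(partitions):
        for row in sorted(part):  # local sort on each rank
            dst = bisect.bisect_right(splitters, row[0])  # range partitioning by id
            received[dst][src].append(row)
    # Each rank merges the sorted runs it received.
    return [merge(runs) for runs in received]


if __name__ == "__main__":
    # After the groupby + filter, only rank 0 still holds a row (id=0).
    after_filter = [[(0, 10)], [], [], []]
    splitters = [2, 4, 6]  # hypothetical range boundaries for 4 ranks

    print(distributed_sort(after_filter, guarded_merge, splitters))
    try:
        distributed_sort(after_filter, strict_merge, splitters)
    except ValueError as exc:
        print(exc)
```

Running the model prints [[(0, 10)], [], [], []] for the guarded merge and then the reported error text for the strict one. If the real merge behaves like strict_merge, treating an empty input as an empty result (as guarded_merge does) would be one possible direction for a fix, but that is a guess, not a confirmed diagnosis.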
Thank you.