Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the user facing DataFrame and Series constructor by hide the query_compiler parameter #7366

Open
sfc-gh-yzou opened this issue Aug 8, 2024 · 3 comments
Assignees
Labels
Interfaces and abstractions Issues with Modin's QueryCompiler, Algebra, or BaseIO objects

Comments

@sfc-gh-yzou
Copy link

Today, the constructor for dataframe looks like the following

def __init__(
        self,
        data=None,
        index=None,
        columns=None,
        dtype=None,
        copy=None,
        query_compiler: BaseQueryCompiler = None,
    ) -> None:

which contains a parameter query_compiler, and occurs in the generated documentation. However, we don't really want user to use this parameter, it is just for our internal construction usage.

In pandas, they allow construction of dataframe/series directly from BlockManager class, and the BlockManager is a property on NDFrame class https://github.com/pandas-dev/pandas/blob/main/pandas/core/generic.py#L257. Pandas provides an internal method _from_mgr https://github.com/pandas-dev/pandas/blob/main/pandas/core/generic.py#L309 to allow construction directly from BlockManager. In the frontend, pandas actually handles when data is a BlockManager directly, but it is not documented in the API.

In modin, maybe we can do something similar, where we can push the query_compiler construction to the BasePandasDataset, and provide an internal constructer _from_query_compiler to allow direct creation of the class from query compiler.
In that way, we will also be consistent with the pandas constructor definition.

@sfc-gh-yzou
Copy link
Author

cc @devin-petersohn @noloerino Do you think we can potentially improve this user facing behavior?

@noloerino
Copy link
Collaborator

I would slightly prefer changing data to accept BaseQueryCompiler and performing an isinstance(data, BaseQueryCompiler) check, since we already do isinstance checks to see if the data is an array or other modin frontend object. I'd also be fine with adding a DataFrame/Series._from_query_compiler static method for internal use like you suggested.

@sfc-gh-yzou
Copy link
Author

@noloerino we can let data to accept the query compiler interface, but if we do that, let's make sure the QueryCompiler doesn't occur in any of the user documentation, so that we can keep that completely hidden from the user.

@noloerino noloerino added the Interfaces and abstractions Issues with Modin's QueryCompiler, Algebra, or BaseIO objects label Sep 5, 2024
@noloerino noloerino self-assigned this Sep 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Interfaces and abstractions Issues with Modin's QueryCompiler, Algebra, or BaseIO objects
Projects
None yet
Development

No branches or pull requests

2 participants