Why are pandas and numpy core dependencies for ibis? #9104
I've just started learning about ibis but am really excited about all the work happening here. Documentation is really top notch. High level question for the team/community -- I was a little surprised to see both pandas and numpy show up in my requirements file when I added ibis-framework to my project's dependencies. Shipping pandas with ibis does seem to transgress on the principle of portability a bit. So I was curious to know a little bit more about what role pandas and numpy are playing inside the library, outside the pandas backend-specific code.
Replies: 2 comments 2 replies
Hi @brendancooley, numpy is required because it is a required dependency of pandas (at least for now). pandas is a required dependency largely for historical reasons and we do intend to make it (and PyArrow) optional dependencies in the future, perhaps for 10.0. There is a draft PR here that can give more insight into the decoupling needed in the codebase before we can do that: #8804. Some more discussion is captured here: #8430.
Thanks! We are still improving here, and any feedback on it (or in general) is more than welcome! I'm focusing on getting more how-to guides, improving the API docs, and more over the next couple of months.
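For readers unfamiliar with what "making pandas optional" tends to involve, here is a minimal sketch of the general lazy-import pattern. It is not Ibis's implementation: `require`, `Result`, `to_pylist`, and the `mylib[...]` extra name are all hypothetical names invented for illustration. The idea is that the heavy library is imported only inside the method that needs it, so importing the package itself never touches pandas.

```python
import importlib


def require(module_name: str, extra: str):
    """Import an optional dependency, or fail with an install hint.

    Hypothetical helper; 'mylib' is a placeholder package name.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is needed for this feature; "
            f"install it with: pip install 'mylib[{extra}]'"
        ) from exc


class Result:
    """Hypothetical query result holding rows as plain Python dicts."""

    def __init__(self, rows):
        self.rows = list(rows)

    def to_pylist(self):
        # Needs nothing beyond the standard library.
        return self.rows

    def to_pandas(self):
        # pandas is imported only when this method is actually called.
        pd = require("pandas", extra="pandas")
        return pd.DataFrame(self.rows)
```

With this shape, a user who never calls `to_pandas()` never needs pandas installed, and a user who does call it without pandas gets an error that says what to install. The decoupling work is mostly about finding every place where the import currently happens at module load time.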
Hi @brendancooley 👋🏻 and welcome!
I can add a bit more color to @lostmygithubaccount's response.
Ibis was originally built for Impala, at a time when there wasn't much in the way of in-memory containers for tabular data except pandas. Arrow was still just a twinkle in Wes's eye at the time (though it would show up very soon after Ibis started!).
Pandas is one of the three output formats that we support:
- .to_pandas(): execute expression and return a pandas DataFrame
- .to_pyarrow(): execute expression and return a PyArrow Table
- .to_polars() (not yet supported on all backends IIRC): execute expression and return a Polars DataFrame
For many users, to_pandas() is still the go-to outpu…
NumPy is a required dependency because we access various numpy APIs when doing some output data type normalization.
We're interested in making pandas optional (and perhaps even removing numpy as an explicit dependency if possible, though it will still show up as a transitive dependency), but doing this is not a top priority at the moment.
Regarding portability, the idea is that portability is at two levels: