This repository has been archived by the owner on Nov 1, 2024. It is now read-only.

Natively support creating a TorchArrow column from a numpy array #179

Open
scotts opened this issue Feb 4, 2022 · 1 comment

Comments

@scotts
Contributor

scotts commented Feb 4, 2022

If users create a column from a Python list, we actually dispatch that directly to C++. For example,

import torcharrow as ta

vals = [1, 2, 3, 4, 5]
col = ta.Column(vals, device="cpu")

We dispatch that directly to C++ through pybind11:
https://github.com/facebookresearch/torcharrow/blob/d680bfdc0f6a6bb6c3a29c2a67d62006782d6558/csrc/velox/lib.cpp#L135-L141
However, if a user creates a column from a numpy array, we currently have to handle that (slowly) in Python. For example,

import numpy
import torcharrow as ta

vals = [1, 2, 3, 4, 5]
arr = numpy.array(vals)
col = ta.Column(arr, device="cpu")

That will be handled only on the Python side:
https://github.com/facebookresearch/torcharrow/blob/d680bfdc0f6a6bb6c3a29c2a67d62006782d6558/torcharrow/scope.py#L226-L233
We should be able to handle numpy arrays natively in C++; pybind11 already exposes a numpy array type.
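To make the cost difference concrete, here is a minimal, dependency-free sketch of the two strategies. It is not TorchArrow code: `column_from_elements` and `column_from_buffer` are hypothetical names, and the standard library's `array.array` stands in for both the numpy input and the native column storage. The assumption is that a native implementation would read the array's contiguous typed buffer in one pass instead of visiting each element as a Python object.

```python
from array import array


def column_from_elements(values):
    """Slow path (hypothetical): touch every element as a Python object."""
    out = array("q")  # signed 64-bit storage, standing in for a native column
    for v in values:
        out.append(int(v))
    return out


def column_from_buffer(buf):
    """Fast path (hypothetical): copy the contiguous typed buffer in one shot."""
    view = memoryview(buf)
    # Only accept 1-D, C-contiguous, signed 64-bit integer data in this sketch.
    if view.ndim != 1 or not view.c_contiguous:
        raise TypeError("expected a 1-D contiguous buffer")
    if view.itemsize != 8 or view.format not in ("q", "l"):
        raise TypeError("expected signed 64-bit integers")
    out = array("q")
    out.frombytes(view.tobytes())  # bulk copy, no per-element Python objects
    return out


src = array("q", [1, 2, 3, 4, 5])
slow = column_from_elements(src)
fast = column_from_buffer(src)
```

Both paths produce the same values; the second never materializes individual Python integers. A numpy array exposes the same buffer protocol that `memoryview` uses here.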

@wenleix
Contributor

wenleix commented Feb 4, 2022

Here is the original from_numpy API prototype: https://github.com/facebookresearch/torcharrow/blob/95daa1fabd5a3098be112d487e085e13f5447786/torcharrow/_interop.py#L88-L100

But I don't think we have ever supported this natively in the CPU backend (only in the "demo" backend, where data was stored as numpy arrays; that backend was removed in #33).

Some API reference:

YLGH pushed a commit to YLGH/torcharrow that referenced this issue May 7, 2022
Summary:
Pull Request resolved: pytorch/torchrec#179

* add the `expand_into_jagged_permute` GPU kernel callsite for generating 1D sparse data permute

Reviewed By: youyou6093

Differential Revision: D34778094

fbshipit-source-id: d14174cea809f3e33b1d860d297c7d318a930e34