Support TorchArrow column with binary type (SQL VARBINARY, pyarrow.binary) #178

scotts · 2022-02-04T23:10:57Z

Summary: Pull Request resolved: pytorch/torchrec#178

Instead of naively showing input as just the pooling factor and output as just the embedding dimension, this diff changes planner stats to use the actual size of input and output, in megabytes per iteration:

**input**: global_batch_size * pooling factor * sizeof(dtype of input)

**output**: global_batch_size * (output size (1 in pooled)) * sizeof(dtype of emb) * emb_dim

This provides a sense of scale for data coming in and out, and additionally makes plans with multiple sharding types directly comparable.

Also fixes a bug with TWCW: we incorrectly specified the ranks as the entire world size, when they should be limited to the local world of the host that the parameter is sharded on.

Reviewed By: dstaay-fb

Differential Revision: D35153224

fbshipit-source-id: c1e7d717ec0c1d074f7e059d843fba2d287eee56
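For a sense of the numbers involved, the input and output size formulas from the summary can be worked through in a few lines of Python. This is only an illustrative sketch, not the planner's actual code: the function names and parameters are hypothetical, and treating one megabyte as 1024 * 1024 bytes is an assumption.

```python
# Illustrative sketch only; these helpers are hypothetical and are not
# part of the TorchRec planner API.

# Assumption: one megabyte is taken as 1024 * 1024 bytes.
BYTES_PER_MB = 1024 * 1024


def input_mb_per_iter(global_batch_size: int,
                      pooling_factor: float,
                      input_dtype_bytes: int) -> float:
    """global_batch_size * pooling factor * sizeof(dtype of input), in MB."""
    return global_batch_size * pooling_factor * input_dtype_bytes / BYTES_PER_MB


def output_mb_per_iter(global_batch_size: int,
                       emb_dim: int,
                       emb_dtype_bytes: int,
                       output_size: float = 1) -> float:
    """global_batch_size * output size * sizeof(dtype of emb) * emb_dim, in MB.

    output_size is 1 for pooled embeddings, per the summary.
    """
    return (global_batch_size * output_size * emb_dtype_bytes * emb_dim
            / BYTES_PER_MB)


if __name__ == "__main__":
    # Example values (made up for illustration): batch of 8192,
    # pooling factor 20, 8-byte integer ids, 128-dim 4-byte float embeddings.
    print(input_mb_per_iter(8192, 20, 8))    # 1.25
    print(output_mb_per_iter(8192, 128, 4))  # 4.0
```

Because both quantities are in the same unit, the input and output of tables with different sharding types can be compared directly, which is the point the summary makes.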
