AttributeError: 'bool' object has no attribute 'map' while using Predicate #789

Open
littlehomelessman opened this issue Jan 19, 2023 · 0 comments

littlehomelessman commented Jan 19, 2023

Hello team,

I'm trying to split my data into a training set and a test set in an 80:20 ratio using a predicate, and I get the following error:

```
/home/xzk/.local/lib/python3.7/site-packages/petastorm/hdfs/namenode.py:270: FutureWarning: pyarrow.hdfs.connect is deprecated as of 2.0.0, please use pyarrow.fs.HadoopFileSystem instead.
  return pyarrow.hdfs.connect(hostname, url.port or 8020, **kwargs)
Worker 3 terminated: unexpected exception:
Traceback (most recent call last):
  File "/home/xzk/.local/lib/python3.7/site-packages/petastorm/workers_pool/thread_pool.py", line 62, in run
    self._worker_impl.process(*args, **kargs)
  File "/home/xzk/.local/lib/python3.7/site-packages/petastorm/arrow_reader_worker.py", line 150, in process
    all_cols = self._load_rows_with_predicate(parquet_file, piece, worker_predicate, shuffle_row_drop_partition)
  File "/home/xzk/.local/lib/python3.7/site-packages/petastorm/arrow_reader_worker.py", line 258, in _load_rows_with_predicate
    erase_mask = match_predicate_mask.map(operator.not_)
AttributeError: 'bool' object has no attribute 'map'
Iteration on Petastorm DataLoader raise error: AttributeError("'bool' object has no attribute 'map'")
```

Looking at the failing frame, I noticed this:

```
~/.local/lib/python3.7/site-packages/petastorm/arrow_reader_worker.py in _load_rows_with_predicate(self, pq_file, piece, worker_predicate, shuffle_row_drop_partition)
    256
    257         match_predicate_mask = worker_predicate.do_include(predicates_data_frame)
--> 258         erase_mask = match_predicate_mask.map(operator.not_)
```

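Here `do_include(...)` seems to return only a single `bool`, but the next line calls `.map(operator.not_)` on the result as if it were a per-row mask.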
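
To illustrate the mismatch without petastorm: a predicate written for one row returns one `bool`. Handed a whole column, it still returns one `bool`, which has no `.map`. Evaluated once per value, it produces the per-row mask that line 258 wants to invert. This is a minimal stdlib sketch with lists standing in for the pandas objects; `keep_row` is a hypothetical hash-based predicate, not petastorm's actual code.

```python
import hashlib
import operator


def keep_row(value, fraction=0.8):
    # Hypothetical row-wise predicate: hash str(value) into [0, 1)
    # and keep the row if the hash lands below `fraction`.
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    # First 13 hex digits = 52 bits, so the division is exact and stays < 1.0.
    return int(digest[:13], 16) / 16 ** 13 < fraction


column = ["row-a", "row-b", "row-c", "row-d"]  # stands in for one DataFrame column

# Whole column in: str(column) is one string, so the result is a single bool.
whole_column_result = keep_row(column)

# One call per value: one bool per row, which can then be inverted.
mask = [keep_row(value) for value in column]
erase_mask = list(map(operator.not_, mask))

print(type(whole_column_result).__name__, hasattr(whole_column_result, "map"))  # → bool False
print(len(mask), len(erase_mask))  # → 4 4
```

With pandas, the per-row version would yield a boolean Series, which is the kind of object `.map(operator.not_)` can operate on.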

Is this a bug, or am I using the predicate incorrectly? Please help, thank you!

My code:

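```python
# Imports assumed (petastorm's PyTorch DataLoader, per the error message above).
from petastorm import make_batch_reader
from petastorm.predicates import in_pseudorandom_split
from petastorm.pytorch import DataLoader

# dataset_url and reader_epochs are assumed to be defined elsewhere in the script.


def train_model(num_epochs=100, batch_size=1000):
    for epoch in range(num_epochs):
        # Keep subset 0 (the 80% share) of a pseudorandom split on "some_column_name".
        with DataLoader(
                make_batch_reader(dataset_url, num_epochs=reader_epochs, schema_fields=None,
                                  transform_spec=None, seed=1, shuffle_rows=False,
                                  shuffle_row_groups=False,
                                  predicate=in_pseudorandom_split([0.8, 0.2], 0, "some_column_name")),
                batch_size=150) as dataloader:
            # The AttributeError above is raised while iterating here.
            for raw in dataloader:
                print(raw)
                break
```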
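
For context on the arguments, `in_pseudorandom_split([0.8, 0.2], 0, "some_column_name")` asks for subset 0, the 80% share of a two-way split keyed on `some_column_name`. The general idea behind a deterministic pseudorandom split is to hash each key into [0, 1) and pick the subset whose cumulative-fraction range contains it. Below is a stdlib sketch of that idea, applied per value so it makes one decision per row; `hash_to_unit_interval` and `assign_subset` are hypothetical helpers, and petastorm's own hashing details may differ.

```python
import hashlib
from itertools import accumulate


def hash_to_unit_interval(key):
    # Deterministically map str(key) to a float in [0, 1) using MD5.
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    # First 13 hex digits = 52 bits, so the division is exact and stays < 1.0.
    return int(digest[:13], 16) / 16 ** 13


def assign_subset(key, fractions):
    # Return the index of the subset whose cumulative range contains the hash.
    point = hash_to_unit_interval(key)
    for index, upper in enumerate(accumulate(fractions)):
        if point < upper:
            return index
    return len(fractions) - 1  # guard against fractions summing to slightly under 1.0


keys = range(10_000)
train_keys = [k for k in keys if assign_subset(k, [0.8, 0.2]) == 0]
test_keys = [k for k in keys if assign_subset(k, [0.8, 0.2]) == 1]

# Roughly 80% / 20%; the same key always lands in the same subset.
print(len(train_keys) / len(keys))
```

Hashing the key instead of drawing random numbers keeps the split stable across epochs and runs, so a given row never moves between the training and test sets.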
            