Does append() work on OSX? #48

Open
sykesdev opened this issue Oct 13, 2020 · 3 comments

Comments

@sykesdev

Has anyone been able to make append() work in a recent release?

Could anyone share a set of dependency versions that allows collection.append() to work?

I've been through the various threads on this, but all seem to result in a silent failure.
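Since the question is which dependency versions work together, a minimal sketch for collecting them may make replies easier to compare. The package list is an assumption based on the libraries that appear in this thread, the function name is hypothetical, and `importlib.metadata` is only in the standard library from Python 3.8 (older interpreters would need the `importlib_metadata` backport).

```python
# Minimal sketch: report installed versions of the packages involved in a
# pystore append. The package list below is an assumption, not an official list.
from importlib import metadata  # standard library on Python 3.8+


def dependency_versions(packages=("pystore", "dask", "fastparquet", "fsspec", "pandas")):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in dependency_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting the printed output into a reply would let others match an environment exactly.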

I also get a lot of deprecation warnings of this sort...

Dask Name: read-parquet, 1 tasks
/Users/andrew/anaconda3/envs/feature_wrapper/lib/python3.8/site-packages/fastparquet/parquet_thrift/parquet/ttypes.py:1929: DeprecationWarning: PY_SSIZE_T_CLEAN will be required for '#' formats
  iprot._fast_decode(self, iprot, [self.__class__, self.thrift_spec])
/Users/andrew/anaconda3/envs/feature_wrapper/lib/python3.8/site-packages/fastparquet/parquet_thrift/parquet/ttypes.py:975: DeprecationWarning: PY_SSIZE_T_CLEAN will be required for '#' formats
  iprot._fast_decode(self, iprot, [self.__class__, self.thrift_spec])
collection1.append('TEST', z1)
/Users/andrew/anaconda3/envs/feature_wrapper/lib/python3.8/site-packages/fastparquet/parquet_thrift/parquet/ttypes.py:1929: DeprecationWarning: PY_SSIZE_T_CLEAN will be required for '#' formats
  iprot._fast_decode(self, iprot, [self.__class__, self.thrift_spec])
/Users/andrew/anaconda3/envs/feature_wrapper/lib/python3.8/site-packages/fastparquet/parquet_thrift/parquet/ttypes.py:1929: DeprecationWarning: PY_SSIZE_T_CLEAN will be required for '#' formats
  iprot._fast_decode(self, iprot, [self.__class__, self.thrift_spec])
/Users/andrew/anaconda3/envs/feature_wrapper/lib/python3.8/site-packages/fastparquet/parquet_thrift/parquet/ttypes.py:975: DeprecationWarning: PY_SSIZE_T_CLEAN will be required for '#' formats
  iprot._fast_decode(self, iprot, [self.__class__, self.thrift_spec])
/Users/andrew/anaconda3/envs/feature_wrapper/lib/python3.8/site-packages/fastparquet/parquet_thrift/parquet/ttypes.py:1929: DeprecationWarning: PY_SSIZE_T_CLEAN will be required for '#' formats
  iprot._fast_decode(self, iprot, [self.__class__, self.thrift_spec])
/Users/andrew/anaconda3/envs/feature_wrapper/lib/python3.8/site-packages/fastparquet/parquet_thrift/parquet/ttypes.py:975: DeprecationWarning: PY_SSIZE_T_CLEAN will be required for '#' formats
  iprot._fast_decode(self, iprot, [self.__class__, self.thrift_spec])
collection1.item('TEST').data
/Users/andrew/anaconda3/envs/feature_wrapper/lib/python3.8/site-packages/fastparquet/parquet_thrift/parquet/ttypes.py:1929: DeprecationWarning: PY_SSIZE_T_CLEAN will be required for '#' formats
  iprot._fast_decode(self, iprot, [self.__class__, self.thrift_spec])
Dask DataFrame Structure:

I realise deprecation warnings aren't necessarily a problem, but perhaps they point to some underlying problem with deps.
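If the warnings are only noise, one option is to hide that specific message so that any real error from append() stands out. This is a sketch using the standard `warnings` module; it assumes the message text matches the log above, it does nothing to fix a possible dependency mismatch, and the function name is hypothetical.

```python
import warnings


def silence_py_ssize_t_clean_warnings():
    """Ignore only the PY_SSIZE_T_CLEAN DeprecationWarning seen in the log.

    `message` is a regular expression matched, case-insensitively, against the
    start of the warning text, so other DeprecationWarnings are still shown.
    """
    warnings.filterwarnings(
        "ignore",
        message="PY_SSIZE_T_CLEAN",
        category=DeprecationWarning,
    )


# Call once, before reading or appending with pystore.
silence_py_ssize_t_clean_warnings()
```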

I haven't included any particular config, as I've been through many with varying degrees of error, all centered around append(). So I'm hoping someone has already trodden this same path more successfully than me :-)

@jen-co

jen-co commented May 12, 2021

I am having the same issue - collection.append() is not appending any data.

These are my environment details:

macOS Catalina 10.15.7
conda environment with Python 3.6.10
pystore installed with pip 20.0.2
pystore version 0.1.22

When I first tried to run my app with pystore, I got an error saying I needed to install fsspec by running the following command:
pip install 'fsspec>=0.3.3'
Once I did that, I didn't get any runtime errors, and everything appears to work except for this append issue.

@jen-co

jen-co commented May 12, 2021

Further update: I realised I hadn't converted my DataFrame's timestamp column to a datetime type, nor set it as the index. Once I had done this:

import pandas as pd
df['time-stamp'] = pd.to_datetime(df['time-stamp'])
df = df.set_index('time-stamp')

the append works fine. My hunch is that it was using a different column as the index, which happened to have the same value for every entry, so it regarded the appended rows as duplicates.
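jen-co's hunch can be illustrated in plain pandas. The function below is a hypothetical model, not pystore's actual code: it assumes append() discards incoming rows whose index value already exists in the stored item. Under that assumption, two frames with default integer indexes (both starting at 0) collide completely, so nothing is added and no error is raised, whereas distinct timestamps survive.

```python
import pandas as pd


def append_dropping_existing_index(stored: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical model of an index-deduplicating append.

    Rows of `new` whose index value already exists in `stored` are discarded
    before concatenation, so they silently never reach the result.
    """
    fresh = new[~new.index.isin(stored.index)]
    return pd.concat([stored, fresh])


# Default integer index on both frames: labels 0 and 1 collide, nothing is appended.
stored = pd.DataFrame({"price": [1.0, 2.0]})
new = pd.DataFrame({"price": [3.0, 4.0]})
print(len(append_dropping_existing_index(stored, new)))  # 2

# Datetime index, as in the fix above: the new timestamps are distinct, so both rows are kept.
stored_ts = pd.DataFrame({"price": [1.0, 2.0]}, index=pd.to_datetime(["2021-05-01", "2021-05-02"]))
new_ts = pd.DataFrame({"price": [3.0, 4.0]}, index=pd.to_datetime(["2021-05-03", "2021-05-04"]))
print(len(append_dropping_existing_index(stored_ts, new_ts)))  # 4
```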

@payasparab

I had similar issues to jen-co, and the advice helped! I converted the data types to match the dtypes already stored in pystore. I also changed the index so that rows would not be dropped as duplicates (I am not using a dedicated index column). With that, I was able to get the append function to run, but I had some serious issues with the result.

For other people having similar issues:

import pystore

data_to_add = ...  # _some dataframe_: the new rows, as a pandas DataFrame
collection = pystore.store('_store_name_').collection('_collection_name_')
existing = collection.item('_item_name_').data  # stored item, read once
item_len = len(existing)
# cast the new rows to the stored dtypes so the schemas match
data_to_add = data_to_add.astype(existing.dtypes)
# continue the stored integer index so the new rows are not dropped as duplicates
data_to_add.index = range(item_len, item_len + len(data_to_add))
collection.append('_item_name_', data_to_add)
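Before calling append(), it may be worth checking for both problems described above. This is a sketch with a hypothetical helper name; it works on plain pandas DataFrames, so for a pystore item you would need to materialise the stored data first (for example with `.compute()` on the Dask DataFrame), which assumes the item fits in memory.

```python
import pandas as pd


def preflight_append_check(stored: pd.DataFrame, new: pd.DataFrame) -> dict:
    """Report the two issues discussed in this thread before appending.

    - "overlapping_index": index values in `new` that already exist in `stored`
      (rows at risk of being treated as duplicates).
    - "dtype_mismatches": {column: (stored dtype, new dtype)} for shared columns
      whose dtypes differ.
    """
    overlap = new.index[new.index.isin(stored.index)]
    mismatches = {
        col: (str(stored[col].dtype), str(new[col].dtype))
        for col in stored.columns.intersection(new.columns)
        if stored[col].dtype != new[col].dtype
    }
    return {"overlapping_index": overlap.tolist(), "dtype_mismatches": mismatches}
```

An empty `overlapping_index` list and an empty `dtype_mismatches` dict suggest the frames are compatible on both of the points raised in this thread.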

A couple of questions for you, @jen-co: what runtime did you see for the append on a large dataset? And what was the size/format of the Parquet files it updated?
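To answer the file-size question, a small sketch can list every Parquet file under an item's directory along with its size. The function name is hypothetical, and so is the directory path in the example: where pystore keeps items depends on the path your store was created with.

```python
from pathlib import Path


def parquet_file_sizes(item_dir):
    """Return {relative path: size in bytes} for every .parquet file under item_dir."""
    root = Path(item_dir).expanduser()
    if not root.is_dir():
        return {}
    return {
        str(path.relative_to(root)): path.stat().st_size
        for path in sorted(root.rglob("*.parquet"))
    }


if __name__ == "__main__":
    # Hypothetical location: adjust to wherever your store keeps this item on disk.
    item_dir = "~/pystore/_store_name_/_collection_name_/_item_name_"
    for name, size in parquet_file_sizes(item_dir).items():
        print(f"{name}: {size / 1_000_000:.1f} MB")
```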

For me, the runtime was very long compared to other pystore operations, and it used far more RAM than I was expecting. It also squeezed all of the data into a single Parquet file (which may contribute to the slow runtime):

[screenshot]

Any help is appreciated!
