Allow simple text as input for upload_data_to_dataset #78

mhaas · 2020-09-18T13:59:16Z

Right now, upload_data_to_dataset only accepts binary input, which means more work for the caller than simply opening a file or passing text.

mhaas · 2020-09-22T13:30:16Z

This is actually not so easy to implement. The requests library strongly prefers that a binary stream (or data) is passed: https://requests.readthedocs.io/en/latest/user/advanced/#streaming-uploads

The naive solution is to read the entire data into memory and just convert it there. This will however require a lot of memory for e.g. a 5 GiB file, so I would rather not do that.
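
For reference, the naive approach would look roughly like this. It is only a minimal sketch: `text_to_binary_stream` is a made-up helper name and not part of the SDK, and UTF-8 is assumed as the target encoding.

```python
import io


def text_to_binary_stream(text_stream, encoding="utf-8"):
    """Hypothetical helper: read all text and return an in-memory binary stream."""
    # The full text and its encoded copy both live in memory at this point,
    # which is exactly the cost that hurts for multi-GiB inputs.
    data = text_stream.read().encode(encoding)
    return io.BytesIO(data)


naive_stream = text_to_binary_stream(io.StringIO("id,name\n1,Müller\n"))
```

The result has a known length and can be passed on like any other binary stream, but memory usage grows with the size of the input.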

If we allow file handles in text (non-binary) mode, we have to create a wrapper which encodes the text to UTF-8 bytes on the fly while also handling multi-byte characters. This SO post provides some insight: https://stackoverflow.com/questions/55889474/convert-io-stringio-to-io-bytesio
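
Such a wrapper could look roughly like the sketch below. `TextToBytesReader` and `chunk_chars` are hypothetical names, and this is not taken from the SO post. The multi-byte problem shows up as follows: a chunk of N characters may encode to more than N bytes, so bytes that do not fit into the caller's buffer are kept for the next read.

```python
import codecs
import io


class TextToBytesReader(io.RawIOBase):
    """Hypothetical wrapper exposing a text-mode stream as a binary stream."""

    def __init__(self, text_stream, encoding="utf-8", chunk_chars=8192):
        self._text = text_stream
        self._encoder = codecs.getincrementalencoder(encoding)()
        self._chunk_chars = chunk_chars
        self._pending = b""  # encoded bytes not yet handed to the caller

    def readable(self):
        return True

    def readinto(self, buffer):
        # Refill until there are bytes to return or the text is exhausted.
        while not self._pending:
            chunk = self._text.read(self._chunk_chars)
            # final=True flushes any encoder state once the text has run out
            self._pending = self._encoder.encode(chunk, final=not chunk)
            if not chunk:
                break
        n = min(len(buffer), len(self._pending))
        buffer[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n  # 0 signals end of stream
```

Only one chunk of text is held in memory at a time. The wrapper does not know its total length up front, though.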

We can implement this ourselves, but it will not be straightforward to determine the total size of the encoded byte stream without processing the entire text first. This may even be OK, as it is linear effort. If we do not have the size of the stream, then requests will switch to a chunk-encoded request (Transfer-Encoding: chunked), and I am not sure if the Data Attribute Recommendation service supports this.
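
The linear pass to get the size could be sketched like this. `encoded_size` is a hypothetical helper, and it assumes UTF-8 and a seekable text stream so that we can rewind afterwards. How the resulting number is then handed to requests is left open here.

```python
import io


def encoded_size(text_stream, chunk_chars=8192):
    """Hypothetical helper: count the UTF-8 bytes of a stream, then rewind it."""
    start = text_stream.tell()
    total = 0
    while True:
        chunk = text_stream.read(chunk_chars)
        if not chunk:
            break
        # UTF-8 is stateless, so per-chunk sizes simply add up
        total += len(chunk.encode("utf-8"))
    text_stream.seek(start)
    return total


sample = io.StringIO("a€b")
sample_size = encoded_size(sample, chunk_chars=1)
```

This reads the input twice (once to count, once to upload) but never holds more than one chunk in memory.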

Another solution is to use the codecs.iterencode function, which encodes text to bytes incrementally. This returns an iterable, which will again cause requests to send a chunk-encoded request.
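
A sketch of the iterator-based variant, using `codecs.iterencode` from the standard library (the text-to-bytes direction of the incremental codec helpers). `iter_text_chunks` and `iter_encoded` are hypothetical helper names.

```python
import codecs
import io


def iter_text_chunks(text_stream, chunk_chars=8192):
    """Yield successive pieces of text from a text-mode stream."""
    while True:
        chunk = text_stream.read(chunk_chars)
        if not chunk:
            return
        yield chunk


def iter_encoded(text_stream, encoding="utf-8"):
    # codecs.iterencode lazily encodes each str produced by the iterator
    return codecs.iterencode(iter_text_chunks(text_stream), encoding)


encoded_body = iter_encoded(io.StringIO("id,name\n1,Müller\n"))
```

An iterable like `encoded_body` has no length, which is why passing it as the request body leads to a chunk-encoded upload.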
