Download to CSV fails when values contain null characters #29508

Afsalmc · 2024-07-07T09:16:21Z

Bug description

If a query result contains null characters, exporting it to CSV fails with the following error in the logs:
_csv.Error: need to escape, but no escapechar set

How to reproduce the bug

Go to SQL Lab
Execute a query whose result contains a value such as '\x00\x00\x00\x00v\x1e\x04\x0c'
Download to CSV
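The failure can be reproduced outside Superset with Python's standard csv module, which pandas' DataFrame.to_csv() uses to write rows. This is a minimal sketch. try_write is a hypothetical helper, and whether the unescaped case fails depends on the Python version: it is assumed here that some older interpreters raise the error for NUL characters while newer ones write them unchanged. For that reason the sketch reports either outcome instead of assuming one.

```python
import csv
import io
from typing import Any, Sequence


def try_write(values: Sequence[object], **fmtparams: Any) -> str:
    """Write one CSV row to a string; return the text or the csv.Error message.

    Hypothetical helper for this sketch; it is not part of Superset.
    """
    buf = io.StringIO()
    try:
        csv.writer(buf, **fmtparams).writerow(values)
    except csv.Error as exc:
        return f"csv.Error: {exc}"
    return buf.getvalue()


# The value from the report: leading NUL characters plus other control bytes.
value = "\x00\x00\x00\x00v\x1e\x04\x0c"

# Without an escape character, affected Python versions report
# "need to escape, but no escapechar set"; others write the NUL as-is.
print(repr(try_write(["id", value])))

# With escapechar set, the row is written instead of raising.
print(repr(try_write(["id", value], escapechar="\\")))
```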

Screenshots/recordings

No response

Superset version

4.0.1

Python version

3.9

Node version

16

Browser

Chrome

Additional context

No response

Checklist

I have searched Superset docs and Slack and didn't find a solution to my problem.
I have searched the GitHub issue tracker and didn't find a similar bug report.
I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.

dosubot · 2024-07-07T09:18:20Z

Hey @Afsalmc! 👋 I'm here to help you with any bugs, questions, or contributions you have while waiting for a human maintainer. Let's squash those bugs together!

To resolve the _csv.Error: need to escape, but no escapechar set error that occurs when values containing null characters are downloaded to CSV, set the escapechar parameter in the CSVReader configuration.

Here is an example of how you can modify the CSVReader configuration to include the escapechar:

UploadCommand(
    upload_database.id,
    CSV_UPLOAD_TABLE,
    create_csv_file(CSV_FILE_WITH_NULLS),
    None,
    CSVReader({"null_values": ["N/A", "None"], "escapechar": "\\"}),
).run()

This sets the escape character to a backslash (\), which will handle the null characters properly during the CSV export.

Additionally, you can use the df_to_escaped_csv function from the superset/utils/csv.py module to ensure that special characters are properly escaped in your DataFrame before exporting it to CSV:

import pandas as pd
from superset.utils.csv import df_to_escaped_csv

# Assuming df is your DataFrame; extra keyword arguments are forwarded to DataFrame.to_csv()
csv_data = df_to_escaped_csv(df, escapechar="\\")
with open("output.csv", "w", encoding="utf-8", newline="") as file:
    file.write(csv_data)

This function escapes problematic characters but does not set an escape character by default, so pass escapechar="\\" explicitly. The extra keyword arguments are forwarded to DataFrame.to_csv(), which lets the export handle null characters and other special cases ^[1]^[2].


Afsalmc · 2024-07-07T09:21:02Z

Solution:

The escape character used when exporting a pandas DataFrame can be controlled through the Superset config. By default, no escape character is set; only the encoding is:
CSV_EXPORT = {"encoding": "utf-8"}

To fix the export, add an escape character to this config:

CSV_EXPORT = {"encoding": "utf-8", "escapechar": "\\"}

These settings are then passed as **kwargs to DataFrame.to_csv() by df_to_escaped_csv() in superset/utils/csv.py:

from typing import Any

import pandas as pd


def df_to_escaped_csv(df: pd.DataFrame, **kwargs: Any) -> Any:
    ...  # escaping of problematic values elided
    return df.to_csv(**kwargs)
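The pass-through can be illustrated without Superset or pandas. The sketch below is hypothetical: export_rows is an invented stand-in for df_to_escaped_csv(), and it uses the standard csv module (which DataFrame.to_csv() also relies on) in place of pandas. The CSV_EXPORT dictionary is splatted into it the same way the config is splatted into to_csv(); "encoding" is applied to the finished text and the remaining keys, such as "escapechar", are forwarded to the writer.

```python
import csv
import io
from typing import Any, Iterable, Sequence

# Mirrors the config from this thread; in Superset it lives in superset_config.py.
CSV_EXPORT = {"encoding": "utf-8", "escapechar": "\\"}


def export_rows(rows: Iterable[Sequence[Any]], **kwargs: Any) -> bytes:
    """Hypothetical stand-in for df_to_escaped_csv(df, **kwargs).

    "encoding" is used to encode the finished CSV text; every other key
    (e.g. "escapechar") is forwarded unchanged to csv.writer().
    """
    options = dict(kwargs)
    encoding = options.pop("encoding", "utf-8")
    buf = io.StringIO()
    csv.writer(buf, **options).writerows(rows)
    return buf.getvalue().encode(encoding)


rows = [["id", "payload"], [1, "\x00\x00\x00\x00v\x1e\x04\x0c"]]
data = export_rows(rows, **CSV_EXPORT)
print(data)
```

One trade-off of escapechar="\\": on recent Python versions the csv writer also escapes any backslash already present in a value, so a\b is exported as a\\b. Tools reading the file back may need to be configured with the same escape character to recover the original values.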

dosubot added the data:csv (Related to import/export of CSVs) and sqllab (Namespace | Anything related to the SQL Lab) labels on Jul 7, 2024

Afsalmc closed this as completed Jul 7, 2024
