Download to CSV fails when values contain null characters #29508

Closed
3 tasks done
Afsalmc opened this issue Jul 7, 2024 · 2 comments
Labels
data:csv Related to import/export of CSVs sqllab Namespace | Anything related to the SQL Lab

Comments

Afsalmc commented Jul 7, 2024

Bug description

If the query result contains null characters, exporting to CSV fails. The logs show this error:
_csv.Error: need to escape, but no escapechar set

How to reproduce the bug

  1. Go to SQL Lab
  2. Execute query that can produce result like '\x00\x00\x00\x00v\x1e\x04\x0c'
  3. Download to CSV
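
The failure can probably be reproduced outside Superset, since the error comes from Python's built-in csv module. Below is a minimal sketch, assuming the export ends up in a csv.writer with no escapechar set. The helper name write_row is hypothetical and is not part of Superset. Whether the first call fails may depend on the Python version: it is expected to fail on the reporter's Python 3.9, while newer interpreters may accept the value.

```python
import csv
import io


def write_row(value, **writer_options):
    """Write one single-field row; return (ok, csv_text_or_error_message)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, **writer_options)
    try:
        writer.writerow([value])
    except csv.Error as exc:
        # e.g. "need to escape, but no escapechar set"
        return False, str(exc)
    return True, buffer.getvalue()


# The sample value from step 2 of the reproduction steps.
value = "\x00\x00\x00\x00v\x1e\x04\x0c"

# No escapechar: may report the error from this issue, depending on the Python version.
print(write_row(value))

# With an escapechar configured, the row is written.
print(write_row(value, escapechar="\\"))
```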

Screenshots/recordings

No response

Superset version

4.0.1

Python version

3.9

Node version

16

Browser

Chrome

Additional context

No response

Checklist

  • I have searched Superset docs and Slack and didn't find a solution to my problem.
  • I have searched the GitHub issue tracker and didn't find a similar bug report.
  • I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.

dosubot bot commented Jul 7, 2024

Hey @Afsalmc! 👋 I'm here to help you with any bugs, questions, or contributions you have while waiting for a human maintainer. Let's squash those bugs together!

To resolve the issue where downloading query results to CSV fails with the error _csv.Error: need to escape, but no escapechar set when the values contain null characters, you need to set the escapechar parameter in the CSVReader configuration.

Here is an example of how you can modify the CSVReader configuration to include the escapechar:

UploadCommand(
    upload_database.id,
    CSV_UPLOAD_TABLE,
    create_csv_file(CSV_FILE_WITH_NULLS),
    None,
    CSVReader({"null_values": ["N/A", "None"], "escapechar": "\\"}),
).run()

This sets the escape character to a backslash (\), which will handle the null characters properly during the CSV export.

Additionally, you can use the df_to_escaped_csv function from the superset/utils/csv.py module to ensure that special characters are properly escaped in your DataFrame before exporting it to CSV:

import pandas as pd
from superset.utils.csv import df_to_escaped_csv

# Assuming df is your DataFrame
csv_data = df_to_escaped_csv(df)
with open('output.csv', 'w') as file:
    file.write(csv_data)

This function will escape problematic characters and set the escape character to a backslash (\), ensuring that the CSV export handles null characters and other special cases correctly [1][2].

@dosubot dosubot bot added data:csv Related to import/export of CSVs sqllab Namespace | Anything related to the SQL Lab labels Jul 7, 2024

Afsalmc commented Jul 7, 2024

Solution:

The escape character used when exporting a pandas DataFrame can be controlled through a Superset config. By default no escape character is set; only the encoding is:

CSV_EXPORT = {"encoding": "utf-8"}

We can modify this config to specify an escape character:

CSV_EXPORT = {"encoding": "utf-8", "escapechar": "\\"}

These settings are then passed as **kwargs to to_csv() in csv.py:

def df_to_escaped_csv(df: pd.DataFrame, **kwargs: Any) -> Any:
    ...  # body elided in the original comment
    return df.to_csv(**kwargs)
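
To check the effect of this setting without running Superset, the same keyword arguments can be passed straight to pandas. This is a rough sketch, assuming the CSV_EXPORT dict reaches DataFrame.to_csv() unchanged as described above. The column name payload and the sample row are made up for illustration.

```python
import pandas as pd

# Same value as the CSV_EXPORT setting proposed above; in Superset this
# would live in the config file rather than in a script.
CSV_EXPORT = {"encoding": "utf-8", "escapechar": "\\"}

# Hypothetical one-row result containing the null characters from the report.
df = pd.DataFrame({"payload": ["\x00\x00\x00\x00v\x1e\x04\x0c"]})

# With no path given, to_csv() returns the CSV as a string.
csv_text = df.to_csv(index=False, **CSV_EXPORT)
print(repr(csv_text))
```

If this call succeeds with the escape character and fails without it in your environment, the config change should be enough to unblock the export.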

@Afsalmc Afsalmc closed this as completed Jul 7, 2024