UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 6: invalid continuation byte #29457

Open
2 of 3 tasks
Habeeb556 opened this issue Jul 2, 2024 · 8 comments
Labels
#bug:regression (Bugs that are identified as regressions) · sqllab (Namespace | Anything related to the SQL Lab)

Comments

@Habeeb556

Bug description

I encountered the following error when querying select * from table in SQL Lab. This issue occurred after upgrading from Superset version 2.1.3 to version 4.0.1.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 6: invalid continuation byte

How to reproduce the bug

Any table column with a TIMESTAMP datatype generates this error.

Screenshots/recordings

image

Superset version

4.0.1

Python version

3.11

Node version

I don't know

Browser

Chrome

Additional context

No response

Checklist

  • I have searched Superset docs and Slack and didn't find a solution to my problem.
  • I have searched the GitHub issue tracker and didn't find a similar bug report.
  • I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.

dosubot bot commented Jul 2, 2024

To resolve the UnicodeDecodeError when querying a table with a TIMESTAMP datatype in SQL Lab after upgrading from Superset version 2.1.3 to version 4.0.1, ensure that the timestamp columns are properly normalized to a pandas-supported datetime format. This can be done using the utils.normalize_dttm_col function in the get_df method of the BaseViz class.

Here is the relevant code snippet from the get_df method:

if not df.empty:
    utils.normalize_dttm_col(
        df=df,
        dttm_cols=tuple(
            [
                DateColumn.get_legacy_time_column(
                    timestamp_format=timestamp_format,
                    offset=self.datasource.offset,
                    time_shift=self.form_data.get("time_shift"),
                )
            ]
        ),
    )

    if self.enforce_numerical_metrics:
        self.df_metrics_to_num(df)

    df.replace([np.inf, -np.inf], np.nan, inplace=True)

Ensure that this normalization step is correctly applied to handle the timestamp columns and avoid UnicodeDecodeError [1].

Additionally, the Pinot database connector is known to cause issues with TIMESTAMP columns in Superset. The Pinot driver infers TIMESTAMP columns as LONG, which can lead to problems such as UnicodeDecodeError. Here is a quick fix for the Pinot driver:

# Pinot driver infers TIMESTAMP column as LONG, so make the quick fix.
# When the Pinot driver fixes this bug, the current method could be removed.
if isinstance(sqla_column_type, types.TIMESTAMP):
    return sqla_column_type.compile().upper()

This fix ensures that TIMESTAMP columns are correctly interpreted and processed [2].


@mistercrunch
Member

#28266 was first cherry-picked in 4.0.2

Please try upgrading and confirm whether the issue is fixed.

@Habeeb556
Author

@mistercrunch unfortunately, this did not fix the issue. I upgraded to version 4.0.2 and encountered the same error.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 6: invalid start byte
ERROR:superset.views.base:'utf-8' codec can't decode byte 0xff in position 6: invalid start byte

Additionally, I noticed that this issue occurs only when selecting columns with the TIMESTAMP datatype. All other columns work fine. It worked correctly with version 2.1.3 when I switched back.
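
One way to narrow down which columns are involved is to check a fetched row for binary values that are not valid UTF-8. This is a minimal sketch, assuming the driver returns the problematic values as Python `bytes`; the helper name, column names, and sample row are all hypothetical.

```python
from typing import Any, Sequence


def find_non_utf8_columns(columns: Sequence[str], row: Sequence[Any]) -> list[str]:
    """Return the names of columns whose value is binary and not valid UTF-8."""
    bad = []
    for name, value in zip(columns, row):
        if isinstance(value, (bytes, bytearray)):
            try:
                bytes(value).decode("utf-8")
            except UnicodeDecodeError:
                bad.append(name)
    return bad


# Hypothetical row: the last column holds an 8-byte binary value.
columns = ["id", "name", "row_ts"]
row = (1, "alice", bytes.fromhex("000000000000ff01"))
print(find_non_utf8_columns(columns, row))  # ['row_ts']
```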

@mistercrunch
Member

mistercrunch commented Jul 3, 2024

Full stack trace, please! Also curious which database engine/driver/version you are using.

@Habeeb556
Author

Database engine: mssql+pyodbc
Version: 5.1.0

Stack trace:

'utf-8' codec can't decode byte 0xff in position 6: invalid start byte
Traceback (most recent call last):
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/flask/app.py", line 1484, in full_dispatch_request
   rv = self.dispatch_request()
        ^^^^^^^^^^^^^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/flask/app.py", line 1469, in dispatch_request
   return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/flask_appbuilder/security/decorators.py", line 95, in wraps
   return f(self, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/superset/views/base_api.py", line 127, in wraps
   raise ex
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/superset/views/base_api.py", line 121, in wraps
   duration, response = time_function(f, self, *args, **kwargs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/superset/utils/core.py", line 1470, in time_function
   response = func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/flask_appbuilder/api/__init__.py", line 183, in wraps
   return f(self, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/superset/utils/log.py", line 255, in wrapper
   value = f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/superset/sqllab/api.py", line 346, in get_results
   payload = json.dumps(
             ^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/simplejson/__init__.py", line 395, in dumps
   **kw).encode(obj)
         ^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/simplejson/encoder.py", line 298, in encode
   chunks = self.iterencode(o)
            ^^^^^^^^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/simplejson/encoder.py", line 379, in iterencode
   return _iterencode(o, 0)
          ^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 6: invalid start byte
2024-07-03 20:26:50,670:ERROR:superset.views.base:'utf-8' codec can't decode byte 0xff in position 6: invalid start byte
Triggering query_id: 41782
2024-07-03 20:26:50,944:INFO:superset.commands.sql_lab.execute:Triggering query_id: 41782
Query 41782: Running query on a Celery worker
2024-07-03 20:26:50,954:INFO:superset.sqllab.sql_json_executer:Query 41782: Running query on a Celery worker
'utf-8' codec can't decode byte 0xff in position 6: invalid start byte
2024-07-03 20:26:59,507:ERROR:superset.views.base:'utf-8' codec can't decode byte 0xff in position 6: invalid start byte
Traceback (most recent call last):
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/flask/app.py", line 1484, in full_dispatch_request
   rv = self.dispatch_request()
        ^^^^^^^^^^^^^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/flask/app.py", line 1469, in dispatch_request
   return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/flask_appbuilder/security/decorators.py", line 95, in wraps
   return f(self, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/superset/views/base_api.py", line 127, in wraps
   raise ex
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/superset/views/base_api.py", line 121, in wraps
   duration, response = time_function(f, self, *args, **kwargs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/superset/utils/core.py", line 1470, in time_function
   response = func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/flask_appbuilder/api/__init__.py", line 183, in wraps
   return f(self, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/superset/utils/log.py", line 255, in wrapper
   value = f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/superset/sqllab/api.py", line 346, in get_results
   payload = json.dumps(
             ^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/simplejson/__init__.py", line 395, in dumps
   **kw).encode(obj)
         ^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/simplejson/encoder.py", line 298, in encode
   chunks = self.iterencode(o)
            ^^^^^^^^^^^^^^^^^^
 File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/simplejson/encoder.py", line 379, in iterencode
   return _iterencode(o, 0)
          ^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 6: invalid start byte
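
The last Superset frame in the trace is `json.dumps` in `superset/sqllab/api.py`, so the error is raised while the result set is being serialized, not while the query runs. With `mssql+pyodbc`, one plausible cause (an assumption, not confirmed in this thread) is that the column uses SQL Server's `timestamp` type, which is a synonym for `rowversion`: an 8-byte binary counter rather than a date or time. If the driver returns those values as Python `bytes` and the encoder tries to read them as UTF-8 text, arbitrary binary content fails in the way reported. The sketch below uses made-up 8-byte values to reproduce both messages seen in this issue; the real column contents are unknown.

```python
# Illustrative 8-byte values only; they are not taken from the reporter's table.
ROWVERSION_A = bytes.fromhex("000000000000ff01")  # 0xff can never start a UTF-8 sequence
ROWVERSION_B = bytes.fromhex("000000000000d431")  # 0xd4 needs a continuation byte after it


def describe_utf8_failure(raw: bytes) -> str:
    """Return the decoder's error message, or 'ok' if raw is valid UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return str(exc)
    return "ok"


print(describe_utf8_failure(ROWVERSION_A))
# 'utf-8' codec can't decode byte 0xff in position 6: invalid start byte
print(describe_utf8_failure(ROWVERSION_B))
# 'utf-8' codec can't decode byte 0xd4 in position 6: invalid continuation byte
```

Both messages point at position 6, which is consistent with a short binary value whose leading bytes happen to be valid single-byte characters.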

@mistercrunch
Member

Oh, it appears 4.0.2 does not include the large JSON refactor that centralized all calls in superset/utils/json.py (#28702).

I believe this should make it into 4.1.x. I don't recommend bringing in this large refactor as a cherry-pick, as it will conflict heavily on merge.

@mistercrunch
Member

mistercrunch commented Jul 3, 2024

@Habeeb556 if you have the ability to test against the master branch, you could confirm that it's working there. I'm tempted to close the issue, but will wait until you confirm the fix.

@Habeeb556
Author

@mistercrunch, I have some good news and bad news. The good news is that I think I have successfully pushed to the master branch, and the query is running fine. However, the bad news is that the output is incorrectly formatted with Chinese characters.

image

I'm not sure if this is a bug or if my push was incorrect and missed something.
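
A possible explanation for the unexpected characters, offered as an assumption rather than a confirmed diagnosis: if the serializer on master falls back to another text encoding such as UTF-16 when UTF-8 decoding fails, binary values (for example an 8-byte SQL Server `rowversion`) decode without error but come out as CJK-looking text. The helper `decode_with_fallback` below is hypothetical and is not Superset's code; the sample bytes are hand-picked so the effect is easy to see.

```python
def decode_with_fallback(raw: bytes) -> str:
    """Hypothetical helper: try UTF-8 first, then UTF-16, then give up."""
    for encoding in ("utf-8", "utf-16-be"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return "[bytes]"


# Hand-picked 8-byte value: invalid as UTF-8, but valid as big-endian UTF-16.
sample = bytes.fromhex("4e2d65875b577b26")

print(decode_with_fallback(sample))  # 中文字符
print("0x" + sample.hex().upper())   # 0x4E2D65875B577B26
```

If this is what is happening, rendering the value as hex (as in the last line) or casting the column to a string in the query would give readable output.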
