Starrocks executing a specific query will cause the Query history page to report an error and not load the data #29991

Open
3 tasks done
kainchow opened this issue Aug 22, 2024 · 4 comments
Labels
data:connect:starrocks sqllab Namespace | Anything related to the SQL Lab validation:required A committer should validate the issue

Comments

@kainchow
Contributor

Bug description

Executing a specific query against Starrocks causes the Query history page to report an error and fail to load its data. Error message: An error occurred while fetching Query historys: Fatal error
Snipaste_2024-08-22_15-14-36

superset_mysql_starrocks_query_history_error.mp4

How to reproduce the bug

  1. Create a MySQL data source and fill in the Starrocks cluster address and account credentials.
  2. Go to the Query history page (/sqllab/history/). At this point the query records display normally.
  3. Go to SQL Lab and select the Starrocks data source you just created.
  4. Execute the following SQL: select date_add(current_date, -1) as yst_date.
  5. Return to the Query history page. The page now reports an error and the query history cannot be browsed.

Screenshots/recordings

superset_app container logs:
2024-08-22 06:53:04,605:ERROR:flask_appbuilder.api:list index out of range
Traceback (most recent call last):
File "/app/superset/sql_parse.py", line 297, in _extract_tables_from_sql
statements = parse(self.stripped(), dialect=self._dialect)
File "/usr/local/lib/python3.10/site-packages/sqlglot/__init__.py", line 87, in parse
return Dialect.get_or_raise(read or dialect).parse(sql, **opts)
File "/usr/local/lib/python3.10/site-packages/sqlglot/dialects/dialect.py", line 490, in parse
return self.parser(**opts).parse(self.tokenize(sql), sql)
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 1153, in parse
return self._parse(
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 1219, in _parse
expressions.append(parse_method(self))
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 1427, in _parse_statement
expression = self._parse_set_operations(expression) if expression else self._parse_select()
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 2486, in _parse_select
from_ = self._parse_from()
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 2693, in _parse_from
exp.From, comments=self._prev_comments, this=self._parse_table(joins=joins)
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3067, in _parse_table
subquery = self._parse_select(table=True)
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 2501, in _parse_select
self._parse_table()
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3067, in _parse_table
subquery = self._parse_select(table=True)
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 2491, in _parse_select
this = self._parse_query_modifiers(this)
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 2639, in _parse_query_modifiers
key, expression = parser(self)
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 942, in <lambda>
TokenType.WHERE: lambda self: ("where", self._parse_where()),
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3394, in _parse_where
exp.Where, comments=self._prev_comments, this=self._parse_conjunction()
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3704, in _parse_conjunction
return self._parse_tokens(self._parse_equality, self.CONJUNCTION)
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 5534, in _parse_tokens
this = parse_method()
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3707, in _parse_equality
return self._parse_tokens(self._parse_comparison, self.EQUALITY)
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 5541, in _parse_tokens
expression=parse_method(),
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3710, in _parse_comparison
return self._parse_tokens(self._parse_range, self.COMPARISON)
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 5534, in _parse_tokens
this = parse_method()
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3713, in _parse_range
this = this or self._parse_bitwise()
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3832, in _parse_bitwise
this = self._parse_term()
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3864, in _parse_term
return self._parse_tokens(self._parse_factor, self.TERM)
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 5534, in _parse_tokens
this = parse_method()
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3868, in _parse_factor
this = parse_method()
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3889, in _parse_unary
return self._parse_at_time_zone(self._parse_type())
File "/usr/local/lib/python3.10/site-packages/sqlglot/dialects/mysql.py", line 602, in _parse_type
return super()._parse_type(parse_interval=parse_interval)
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3910, in _parse_type
data_type = self._parse_types(check_func=True, allow_identifiers=False)
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 4005, in _parse_types
expressions = self._parse_csv(self._parse_type_size)
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 5520, in _parse_csv
parse_result = parse_method()
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3927, in _parse_type_size
this = self._parse_type()
File "/usr/local/lib/python3.10/site-packages/sqlglot/dialects/mysql.py", line 602, in _parse_type
return super()._parse_type(parse_interval=parse_interval)
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3911, in _parse_type
this = self._parse_column()
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 4113, in _parse_column
this = self._parse_column_reference()
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 4117, in _parse_column_reference
this = self._parse_field()
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 4232, in _parse_field
or self._parse_function(anonymous=anonymous_func)
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 4253, in _parse_function
func = self._parse_function_call(
File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 4319, in _parse_function_call
func = function(args)
File "/usr/local/lib/python3.10/site-packages/sqlglot/dialects/dialect.py", line 707, in _builder
raise ParseError(f"INTERVAL expression expected but got '{interval}'")
sqlglot.errors.ParseError: INTERVAL expression expected but got '-1'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/flask_appbuilder/api/__init__.py", line 110, in wraps
return f(self, *args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/flask_appbuilder/api/__init__.py", line 182, in wraps
return f(self, *args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/flask_appbuilder/api/__init__.py", line 1711, in get_list
return self.get_list_headless(**kwargs)
File "/app/superset/queries/api.py", line 340, in get_list_headless
response[flask_appbuilder.const.API_RESULT_RES_KEY] = list_model_schema.dump(lst, many=True)
File "/usr/local/lib/python3.10/site-packages/marshmallow/schema.py", line 557, in dump
result = self._serialize(processed_obj, many=many)
File "/usr/local/lib/python3.10/site-packages/marshmallow/schema.py", line 519, in _serialize
return [
File "/usr/local/lib/python3.10/site-packages/marshmallow/schema.py", line 520, in <listcomp>
self._serialize(d, many=False)
File "/usr/local/lib/python3.10/site-packages/marshmallow/schema.py", line 525, in _serialize
value = field_obj.serialize(attr_name, obj, accessor=self.get_attribute)
File "/usr/local/lib/python3.10/site-packages/marshmallow/fields.py", line 344, in serialize
return self._serialize(value, attr, obj, **kwargs)
File "/usr/local/lib/python3.10/site-packages/marshmallow/fields.py", line 1991, in _serialize
return self._serialize_method(obj)
File "/app/superset/queries/schemas.py", line 76, in get_sql_tables
return obj.sql_tables
File "/app/superset/models/sql_lab.py", line 75, in sql_tables
extract_tables_from_jinja_sql(
File "/app/superset/sql_parse.py", line 1126, in extract_tables_from_jinja_sql
).tables
File "/app/superset/sql_parse.py", line 287, in tables
self._tables = self._extract_tables_from_sql()
File "/app/superset/sql_parse.py", line 303, in _extract_tables_from_sql
**ex.errors[0]
IndexError: list index out of range
2024-08-22 06:53:04,613:INFO:werkzeug:192.168.10.1 - - [22/Aug/2024 06:53:04] "GET /api/v1/query/?q=(filters:!((col:database,opr:rel_o_m,value:2)),order_column:start_time,order_direction:desc,page:0,page_size:25) HTTP/1.1" 500 -
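
The second traceback appears to follow from how the first exception is raised. The `raise ParseError(f"INTERVAL expression expected but got '{interval}'")` call in `dialect.py` passes only a message, so the exception's `errors` list is presumably empty, and the `**ex.errors[0]` lookup in `sql_parse.py` then fails with `IndexError`. A minimal sketch of that mechanism, assuming a stand-in `ParseError` class that mirrors only the `errors` attribute (it is not sqlglot's real class, and the message template is illustrative):

```python
class ParseError(Exception):
    """Stand-in for sqlglot.errors.ParseError (illustrative assumption)."""

    def __init__(self, message, errors=None):
        super().__init__(message)
        # Structured error details; empty when only a message is supplied.
        self.errors = errors or []


def build_message(ex):
    # Same kind of lookup as the `**ex.errors[0]` line in the traceback.
    # The template text here is a placeholder, not Superset's exact string.
    return "Error parsing near '{highlight}' at line {line}:{col}".format(
        **ex.errors[0]
    )


def demo():
    try:
        # Raised with a message only, as in the dialect.py frame above.
        raise ParseError("INTERVAL expression expected but got '-1'")
    except ParseError as ex:
        try:
            return build_message(ex)
        except IndexError as index_error:
            # The error handler itself fails, hiding the original ParseError.
            return f"IndexError: {index_error}"


print(demo())  # IndexError: list index out of range
```

Because the failure happens while serializing each stored query for the list endpoint, a single unparseable query in the history would be enough to turn the whole `GET /api/v1/query/` response into a 500.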

Superset version

4.0.2

Python version

3.10

Node version

16

Browser

Firefox

Additional context

No response

Checklist

  • I have searched Superset docs and Slack and didn't find a solution to my problem.
  • I have searched the GitHub issue tracker and didn't find a similar bug report.
  • I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
@dosubot dosubot bot added data:connect:starrocks sqllab Namespace | Anything related to the SQL Lab validation:required A committer should validate the issue labels Aug 22, 2024

dosubot bot commented Aug 22, 2024

To resolve the "list index out of range" error in the Query history page when executing a specific query in Starrocks on Apache Superset, you should check the handling of the statements list in the execute_sql_statements function. Specifically, ensure that the list of statements is not empty before attempting to access its elements.

Here is a potential fix:

  1. Add a check to ensure that statements is not empty before proceeding with the execution loop.
def execute_sql_statements(
    # pylint: disable=too-many-arguments, too-many-locals, too-many-statements, too-many-branches
    query_id: int,
    rendered_query: str,
    return_results: bool,
    store_results: bool,
    start_time: Optional[float],
    expand_data: bool,
    log_params: Optional[dict[str, Any]],
) -> Optional[dict[str, Any]]:
    """Executes the sql query returns the results."""
    if store_results and start_time:
        # only asynchronous queries
        stats_logger.timing("sqllab.query.time_pending", now_as_float() - start_time)

    query = get_query(query_id)
    payload: dict[str, Any] = {"query_id": query_id}
    database = query.database
    db_engine_spec = database.db_engine_spec
    db_engine_spec.patch()

    if database.allow_run_async and not results_backend:
        raise SupersetErrorException(
            SupersetError(
                message=__("Results backend is not configured."),
                error_type=SupersetErrorType.RESULTS_BACKEND_NOT_CONFIGURED_ERROR,
                level=ErrorLevel.ERROR,
            )
        )

    # Breaking down into multiple statements
    parsed_query = ParsedQuery(
        rendered_query,
        engine=db_engine_spec.engine,
    )
    if not db_engine_spec.run_multiple_statements_as_one:
        statements = parsed_query.get_statements()
        logger.info(
            "Query %s: Executing %i statement(s)", str(query_id), len(statements)
        )
    else:
        statements = [rendered_query]
        logger.info("Query %s: Executing query as a single statement", str(query_id))

    # Check if statements list is empty
    if not statements:
        raise SupersetErrorException(
            SupersetError(
                message=__("No valid SQL statements found."),
                error_type=SupersetErrorType.INVALID_SQL_STATEMENT_ERROR,
                level=ErrorLevel.ERROR,
            )
        )

    logger.info("Query %s: Set query to 'running'", str(query_id))
    query.status = QueryStatus.RUNNING
    query.start_running_time = now_as_float()
    db.session.commit()

    # Rest of the function...

This check ensures that the statements list is not empty before attempting to execute any SQL statements, which should prevent the "list index out of range" error [1].


@rusackas
Member

Maybe @miomiocat has some insight here?

@kainchow
Contributor Author

I made a temporary change to superset/sql_parse.py that avoids this issue, but it may cause other problems, so someone still needs to fix the underlying bug.

def _extract_tables_from_sql(self) -> set[Table]:
    """
    Extract all table references in a query.

    Note: this uses sqlglot, since it's better at catching more edge cases.
    """
    try:
        statements = parse(self.stripped(), dialect=self._dialect)
    except ParseError:
        # Temporary workaround: treat an unparseable query as having no
        # statements, so no table references are extracted for it.
        statements = []
    except SqlglotError as ex:
        ...
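
A narrower alternative to swallowing every `ParseError` might be to keep reporting the error but guard the `ex.errors[0]` lookup, falling back to the exception text when no structured details are attached. This is only a sketch under assumptions: `ParseError` below is a stand-in class rather than sqlglot's, and `safe_parse_message` is a hypothetical helper that does not exist in Superset.

```python
class ParseError(Exception):
    """Stand-in for sqlglot.errors.ParseError (illustrative assumption)."""

    def __init__(self, message, errors=None):
        super().__init__(message)
        self.errors = errors or []


def safe_parse_message(ex):
    """Hypothetical helper: describe a parse error without assuming details exist."""
    if ex.errors:
        first = ex.errors[0]
        # Use .get so a partially filled detail dict cannot raise KeyError.
        return "Error parsing near '{}' at line {}:{}".format(
            first.get("highlight", "?"),
            first.get("line", "?"),
            first.get("col", "?"),
        )
    # No structured details: fall back to the plain exception message.
    return str(ex)


print(safe_parse_message(ParseError("INTERVAL expression expected but got '-1'")))
```

With this shape the original parse error would still surface to the caller, instead of being replaced by an `IndexError` or silently producing an empty table list.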

@nvn01234

nvn01234 commented Oct 15, 2024

Hi, has anyone fixed this? I have the same problem in version 4.0.2; it happens on the Saved Queries page.
