Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let's consider improvements to how internal errors are handled in the Pbench Server #3124

Open
portante opened this issue Dec 21, 2022 · 5 comments
Assignees
Milestone

Comments

@portante
Copy link
Member

portante commented Dec 21, 2022

Summary suggestions

TBD

Initial Discussion

This is a very long description due to the nature of the stack traces discussed here. The referenced code can be found in branch https://github.com/portante/pbench/tree/PBENCH-944, the basis for PR #3081.

There are two "Tracebacks" being discussed, 1 and 2, which are in separate sections below. Both were taking from log messages emitted by the Pbench Server code

For both stack traces the core problem was a database schema that did not match the code. An attempt to delete a Dataset object failed because the DELETING state was not in the stored database schema.

This issue is NOT about the particular base exception, but IS about the Pbench Server code and its error handling / logging code paths.

Both stack traces are logged via the log_exception() method of the Flask() "app" as called by the handle_exception() method of the Flask object when an unhandled exception occurs.

The comments that follow in this issue make reference to this description.

Some Details

  1. The Flask().wsgi_app() calls Flask().full_dispatch_request() at line 2525 of flask/app.py
  2. The Flask().full_dispatch_request() calls Flask().dispatch_request() at line 1820 of flask/app.py
    • Line 1820 is the outer-most line of the stack trace because the try/except block of Flask().wsgi_app() caught the exception, handled it by calling Flask().handle_exception(), which in turn logged it by calling Flask().log_exception() which finally logged the error at line 1741 (see log_exception 1741 in the first line of each log message preceding the tracebacks below).

Traceback 1:

2022-12-20T16:45:23.046216 ERROR 118 140434560441216 pbench.server.api app log_exception 1741 -- Exception on /api/v1/datasets/delete/22a4bc5748b920c6ce271eb68f08d91c [POST]
Traceback (most recent call last):
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/base.py", line 1819, in _execute_context
    self.dialect.do_execute(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.InvalidTextRepresentation: invalid input value for enum states: "DELETING"
LINE 1: UPDATE datasets SET state='DELETING', transition='2022-12-20...
                                  ^

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/pbench-server/lib/pbench/server/database/models/datasets.py", line 665, in update
    Database.db_session.commit()
  File "<string>", line 2, in commit
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/session.py", line 1451, in commit
    self._transaction.commit(_to_root=self.future)
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/session.py", line 829, in commit
    self._prepare_impl()
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/session.py", line 808, in _prepare_impl
    self.session.flush()
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/session.py", line 3383, in flush
    self._flush(objects)
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/session.py", line 3523, in _flush
    transaction.rollback(_capture_exception=True)
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
    compat.raise_(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
    raise exception
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/session.py", line 3483, in _flush
    flush_context.execute()
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/unitofwork.py", line 456, in execute
    rec.execute(self)
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/unitofwork.py", line 630, in execute
    util.preloaded.orm_persistence.save_obj(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/persistence.py", line 237, in save_obj
    _emit_update_statements(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/persistence.py", line 1001, in _emit_update_statements
    c = connection._execute_20(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/base.py", line 1631, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/sql/elements.py", line 325, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/base.py", line 1498, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/base.py", line 1862, in _execute_context
    self._handle_dbapi_exception(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/base.py", line 2043, in _handle_dbapi_exception
    util.raise_(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
    raise exception
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/base.py", line 1819, in _execute_context
    self.dialect.do_execute(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.DataError: (psycopg2.errors.InvalidTextRepresentation) invalid input value for enum states: "DELETING"
LINE 1: UPDATE datasets SET state='DELETING', transition='2022-12-20...
                                  ^

[SQL: UPDATE datasets SET state=%(state)s, transition=%(transition)s WHERE datasets.id = %(datasets_id)s]
[parameters: {'state': 'DELETING', 'transition': datetime.datetime(2022, 12, 20, 16, 45, 23, 37775), 'datasets_id': 4}]
(Background on this error at: https://sqlalche.me/e/14/9h9h)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/pbench-server/lib/pbench/server/api/resources/query_apis/__init__.py", line 756, in _post
    report = self._analyze_bulk(results)
  File "/opt/pbench-server/lib/pbench/server/api/resources/query_apis/__init__.py", line 670, in _analyze_bulk
    for ok, response in results:
  File "/home/pbench/.local/lib/python3.9/site-packages/elasticsearch/helpers/actions.py", line 319, in streaming_bulk
    for bulk_data, bulk_actions in _chunk_actions(
  File "/home/pbench/.local/lib/python3.9/site-packages/elasticsearch/helpers/actions.py", line 155, in _chunk_actions
    for action, data in actions:
  File "/opt/pbench-server/lib/pbench/server/api/resources/query_apis/datasets_delete.py", line 71, in generate_actions
    dataset.advance(States.DELETING)
  File "/opt/pbench-server/lib/pbench/server/database/models/datasets.py", line 641, in advance
    self.update()
  File "/opt/pbench-server/lib/pbench/server/database/models/datasets.py", line 667, in update
    self.logger.error("Can't update {} in DB", str(self))
  File "/opt/pbench-server/lib/pbench/server/database/models/datasets.py", line 614, in __str__
    return f"({self.owner_id})|{self.name}"
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/attributes.py", line 481, in __get__
    return self.impl.get(state, dict_)
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/attributes.py", line 941, in get
    value = self._fire_loader_callables(state, key, passive)
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/attributes.py", line 972, in _fire_loader_callables
    return state._load_expired(state, passive)
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/state.py", line 710, in _load_expired
    self.manager.expired_attribute_loader(self, toload, passive)
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/loading.py", line 1451, in load_scalar_attributes
    result = load_on_ident(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/loading.py", line 407, in load_on_ident
    return load_on_pk_identity(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/loading.py", line 530, in load_on_pk_identity
    session.execute(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/session.py", line 1711, in execute
    conn = self._connection_for_bind(bind)
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/session.py", line 1552, in _connection_for_bind
    return self._transaction._connection_for_bind(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/session.py", line 721, in _connection_for_bind
    self._assert_active()
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/session.py", line 601, in _assert_active
    raise sa_exc.PendingRollbackError(
sqlalchemy.exc.PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (psycopg2.errors.InvalidTextRepresentation) invalid input value for enum states: "DELETING"
LINE 1: UPDATE datasets SET state='DELETING', transition='2022-12-20...
                                  ^

[SQL: UPDATE datasets SET state=%(state)s, transition=%(transition)s WHERE datasets.id = %(datasets_id)s]
[parameters: {'state': 'DELETING', 'transition': datetime.datetime(2022, 12, 20, 16, 45, 23, 37775), 'datasets_id': 4}]
(Background on this error at: https://sqlalche.me/e/14/9h9h) (Background on this error at: https://sqlalche.me/e/14/7s2a)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/pbench-server/lib/pbench/server/api/resources/__init__.py", line 1637, in _dispatch
    response = execute(params, request, {"auditing": auditing})
  File "/opt/pbench-server/lib/pbench/server/api/resources/query_apis/__init__.py", line 758, in _post
    raise APIInternalError("Unexpected backend error") from e
pbench.server.api.resources.APIInternalError: Internal Pbench Server Error: log reference d839a90f-5137-4cd4-bf23-7744f13f489b

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/pbench/.local/lib/python3.9/site-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/pbench/.local/lib/python3.9/site-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/home/pbench/.local/lib/python3.9/site-packages/flask_restful/__init__.py", line 467, in wrapper
    resp = resource(*args, **kwargs)
  File "/home/pbench/.local/lib/python3.9/site-packages/flask/views.py", line 107, in view
    return current_app.ensure_sync(self.dispatch_request)(**kwargs)
  File "/home/pbench/.local/lib/python3.9/site-packages/flask_restful/__init__.py", line 582, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/home/pbench/.local/lib/python3.9/site-packages/flask_httpauth.py", line 172, in decorated
    return self.ensure_sync(f)(*args, **kwargs)
  File "/opt/pbench-server/lib/pbench/server/api/resources/__init__.py", line 1791, in post
    return self._dispatch(ApiMethod.POST, kwargs)
  File "/opt/pbench-server/lib/pbench/server/api/resources/__init__.py", line 1648, in _dispatch
    abort(e.http_status, message=str(e))
  File "/home/pbench/.local/lib/python3.9/site-packages/flask_restful/__init__.py", line 31, in abort
    original_flask_abort(http_status_code)
  File "/home/pbench/.local/lib/python3.9/site-packages/flask/helpers.py", line 310, in abort
    current_app.aborter(code, *args, **kwargs)
  File "/home/pbench/.local/lib/python3.9/site-packages/werkzeug/exceptions.py", line 864, in __call__
    raise self.mapping[code](*args, **kwargs)
werkzeug.exceptions.InternalServerError: 500 Internal Server Error: The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.

Traceback 2:

2022-12-20T17:54:09.977014 ERROR 1052 140645495241600 pbench.server.api app log_exception 1741 -- Exception on /api/v1/datasets/delete/22a4bc5748b920c6ce271eb68f08d91c [POST]
Traceback (most recent call last):
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/base.py", line 1819, in _execute_context
    self.dialect.do_execute(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.InvalidTextRepresentation: invalid input value for enum states: "DELETING"
LINE 1: UPDATE datasets SET state='DELETING', transition='2022-12-20...
                                  ^

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/pbench-server/lib/pbench/server/database/models/datasets.py", line 665, in update
    Database.db_session.commit()
  File "<string>", line 2, in commit
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/session.py", line 1451, in commit
    self._transaction.commit(_to_root=self.future)
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/session.py", line 829, in commit
    self._prepare_impl()
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/session.py", line 808, in _prepare_impl
    self.session.flush()
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/session.py", line 3383, in flush
    self._flush(objects)
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/session.py", line 3523, in _flush
    transaction.rollback(_capture_exception=True)
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
    compat.raise_(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
    raise exception
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/session.py", line 3483, in _flush
    flush_context.execute()
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/unitofwork.py", line 456, in execute
    rec.execute(self)
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/unitofwork.py", line 630, in execute
    util.preloaded.orm_persistence.save_obj(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/persistence.py", line 237, in save_obj
    _emit_update_statements(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/persistence.py", line 1001, in _emit_update_statements
    c = connection._execute_20(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/base.py", line 1631, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/sql/elements.py", line 325, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/base.py", line 1498, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/base.py", line 1862, in _execute_context
    self._handle_dbapi_exception(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/base.py", line 2043, in _handle_dbapi_exception
    util.raise_(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
    raise exception
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/base.py", line 1819, in _execute_context
    self.dialect.do_execute(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.DataError: (psycopg2.errors.InvalidTextRepresentation) invalid input value for enum states: "DELETING"
LINE 1: UPDATE datasets SET state='DELETING', transition='2022-12-20...
                                  ^

[SQL: UPDATE datasets SET state=%(state)s, transition=%(transition)s WHERE datasets.id = %(datasets_id)s]
[parameters: {'state': 'DELETING', 'transition': datetime.datetime(2022, 12, 20, 17, 54, 9, 969114), 'datasets_id': 4}]
(Background on this error at: https://sqlalche.me/e/14/9h9h)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/pbench-server/lib/pbench/server/api/resources/query_apis/__init__.py", line 756, in _post
    report = self._analyze_bulk(results)
  File "/opt/pbench-server/lib/pbench/server/api/resources/query_apis/__init__.py", line 670, in _analyze_bulk
    for ok, response in results:
  File "/home/pbench/.local/lib/python3.9/site-packages/elasticsearch/helpers/actions.py", line 319, in streaming_bulk
    for bulk_data, bulk_actions in _chunk_actions(
  File "/home/pbench/.local/lib/python3.9/site-packages/elasticsearch/helpers/actions.py", line 155, in _chunk_actions
    for action, data in actions:
  File "/opt/pbench-server/lib/pbench/server/api/resources/query_apis/datasets_delete.py", line 71, in generate_actions
    dataset.advance(States.DELETING)
  File "/opt/pbench-server/lib/pbench/server/database/models/datasets.py", line 641, in advance
    self.update()
  File "/opt/pbench-server/lib/pbench/server/database/models/datasets.py", line 669, in update
    raise DatasetSqlError("updating", name=self.name)
pbench.server.database.models.datasets.DatasetSqlError: Error updating dataset name=fio_rw_2018.02.01T22.40.57

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/pbench-server/lib/pbench/server/api/resources/__init__.py", line 1637, in _dispatch
    response = execute(params, request, {"auditing": auditing})
  File "/opt/pbench-server/lib/pbench/server/api/resources/query_apis/__init__.py", line 758, in _post
    raise APIInternalError("Unexpected backend error") from e
pbench.server.api.resources.APIInternalError: Internal Pbench Server Error: log reference 69ee6b38-cdbd-4dd4-b1c5-499f46aa95cc

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/pbench/.local/lib/python3.9/site-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/pbench/.local/lib/python3.9/site-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/home/pbench/.local/lib/python3.9/site-packages/flask_restful/__init__.py", line 467, in wrapper
    resp = resource(*args, **kwargs)
  File "/home/pbench/.local/lib/python3.9/site-packages/flask/views.py", line 107, in view
    return current_app.ensure_sync(self.dispatch_request)(**kwargs)
  File "/home/pbench/.local/lib/python3.9/site-packages/flask_restful/__init__.py", line 582, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/home/pbench/.local/lib/python3.9/site-packages/flask_httpauth.py", line 172, in decorated
    return self.ensure_sync(f)(*args, **kwargs)
  File "/opt/pbench-server/lib/pbench/server/api/resources/__init__.py", line 1791, in post
    return self._dispatch(ApiMethod.POST, kwargs)
  File "/opt/pbench-server/lib/pbench/server/api/resources/__init__.py", line 1648, in _dispatch
    abort(e.http_status, message=str(e))
  File "/home/pbench/.local/lib/python3.9/site-packages/flask_restful/__init__.py", line 31, in abort
    original_flask_abort(http_status_code)
  File "/home/pbench/.local/lib/python3.9/site-packages/flask/helpers.py", line 310, in abort
    current_app.aborter(code, *args, **kwargs)
  File "/home/pbench/.local/lib/python3.9/site-packages/werkzeug/exceptions.py", line 864, in __call__
    raise self.mapping[code](*args, **kwargs)
werkzeug.exceptions.InternalServerError: 500 Internal Server Error: The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.
@portante portante added this to the v0.72 milestone Dec 21, 2022
@portante portante self-assigned this Dec 21, 2022
@portante
Copy link
Member Author

portante commented Dec 21, 2022

Below is the reconstructed initial stack trace to the first exception raised from "Traceback 1" (the reconstruction of Traceback 2 is virtually identical). The process of reconstructing was performed by starting with the ending "Traceback" list, walking up the trace back until the first abort() call, then finding the reference to the same execution contexts in previous "Traceback" frames, repeating that process until the first exception is hit.

Actually, this re-construction is fuzzier than expected, but the _flush() method called at line 3383 from the flush() method in sqlalchemy/orm/session.py is where the direct line of descent ends. Between that call frame and call frame on the very bottom, one can see a number of context manager __exit__ frames and other raise exception calls which make no sense to me. It is probably best to ignore the SQLAlchemy magic for this discussion.

Traceback (most recent call last):
  File "/home/pbench/.local/lib/python3.9/site-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/pbench/.local/lib/python3.9/site-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/home/pbench/.local/lib/python3.9/site-packages/flask_restful/__init__.py", line 467, in wrapper
    resp = resource(*args, **kwargs)
  File "/home/pbench/.local/lib/python3.9/site-packages/flask/views.py", line 107, in view
    return current_app.ensure_sync(self.dispatch_request)(**kwargs)
  File "/home/pbench/.local/lib/python3.9/site-packages/flask_restful/__init__.py", line 582, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/home/pbench/.local/lib/python3.9/site-packages/flask_httpauth.py", line 172, in decorated
    return self.ensure_sync(f)(*args, **kwargs)
  File "/opt/pbench-server/lib/pbench/server/api/resources/__init__.py", line 1791, in post
    return self._dispatch(ApiMethod.POST, kwargs)
  File "/opt/pbench-server/lib/pbench/server/api/resources/__init__.py", line 1637, in _dispatch
    response = execute(params, request, {"auditing": auditing})
  File "/opt/pbench-server/lib/pbench/server/api/resources/query_apis/__init__.py", line 756, in _post
    report = self._analyze_bulk(results)
  File "/opt/pbench-server/lib/pbench/server/api/resources/query_apis/__init__.py", line 670, in _analyze_bulk
    for ok, response in results:
  File "/home/pbench/.local/lib/python3.9/site-packages/elasticsearch/helpers/actions.py", line 319, in streaming_bulk
    for bulk_data, bulk_actions in _chunk_actions(
  File "/home/pbench/.local/lib/python3.9/site-packages/elasticsearch/helpers/actions.py", line 155, in _chunk_actions
    for action, data in actions:
  File "/opt/pbench-server/lib/pbench/server/api/resources/query_apis/datasets_delete.py", line 71, in generate_actions
    dataset.advance(States.DELETING)
  File "/opt/pbench-server/lib/pbench/server/database/models/datasets.py", line 641, in advance
    self.update()
  File "/opt/pbench-server/lib/pbench/server/database/models/datasets.py", line 665, in update
    Database.db_session.commit()
  File "<string>", line 2, in commit
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/session.py", line 1451, in commit
    self._transaction.commit(_to_root=self.future)
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/session.py", line 829, in commit
    self._prepare_impl()
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/session.py", line 808, in _prepare_impl
    self.session.flush()
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/session.py", line 3383, in flush
    self._flush(objects)
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/session.py", line 3523, in _flush
    transaction.rollback(_capture_exception=True)
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
    compat.raise_(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
    raise exception
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/session.py", line 3483, in _flush
    flush_context.execute()
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/unitofwork.py", line 456, in execute
    rec.execute(self)
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/unitofwork.py", line 630, in execute
    util.preloaded.orm_persistence.save_obj(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/persistence.py", line 237, in save_obj
    _emit_update_statements(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/orm/persistence.py", line 1001, in _emit_update_statements
    c = connection._execute_20(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/base.py", line 1631, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/sql/elements.py", line 325, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/base.py", line 1498, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/base.py", line 1862, in _execute_context
    self._handle_dbapi_exception(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/base.py", line 2043, in _handle_dbapi_exception
    util.raise_(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
    raise exception
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/base.py", line 1819, in _execute_context
    self.dialect.do_execute(
  File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.DataError: (psycopg2.errors.InvalidTextRepresentation) invalid input value for enum states: "DELETING"
LINE 1: UPDATE datasets SET state='DELETING', transition='2022-12-20...
                                  ^

[SQL: UPDATE datasets SET state=%(state)s, transition=%(transition)s WHERE datasets.id = %(datasets_id)s]
[parameters: {'state': 'DELETING', 'transition': datetime.datetime(2022, 12, 20, 16, 45, 23, 37775), 'datasets_id': 4}]
(Background on this error at: https://sqlalche.me/e/14/9h9h)

@portante
Copy link
Member Author

For the rest of this issue, we are going to only discuss Traceback 2 because PR #3125 corrects the extra exception handling nesting seen there.

@portante
Copy link
Member Author

portante commented Dec 21, 2022

Here is the reconstructed traceback again, but with the last frame of the Pbench Server code expanded to the full method context.

The *A line is where the update() method calls out to the SQLAlchemy code and where the lowest level exception is initially raised. This method catches that exception, performs the rollback(), logs it (*B), and then raises DatasetSqlError (*C) starting a new exception context. This is what causes the tracaback to have a break with "During the handling of the above exception, another exception occurred:"

Changes to consider:

  1. Don't log an error (*B)
  2. Raise the DatasetSqlError (*C) from the original exception to add context instead of log statement
Traceback (most recent call last):
  File "/home/pbench/.local/lib/python3.9/site-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/pbench/.local/lib/python3.9/site-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/home/pbench/.local/lib/python3.9/site-packages/flask_restful/__init__.py", line 467, in wrapper
    resp = resource(*args, **kwargs)
  File "/home/pbench/.local/lib/python3.9/site-packages/flask/views.py", line 107, in view
    return current_app.ensure_sync(self.dispatch_request)(**kwargs)
  File "/home/pbench/.local/lib/python3.9/site-packages/flask_restful/__init__.py", line 582, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/home/pbench/.local/lib/python3.9/site-packages/flask_httpauth.py", line 172, in decorated
    return self.ensure_sync(f)(*args, **kwargs)
  File "/opt/pbench-server/lib/pbench/server/api/resources/__init__.py", line 1791, in post
    return self._dispatch(ApiMethod.POST, kwargs)
  File "/opt/pbench-server/lib/pbench/server/api/resources/__init__.py", line 1637, in _dispatch
    response = execute(params, request, {"auditing": auditing})
  File "/opt/pbench-server/lib/pbench/server/api/resources/query_apis/__init__.py", line 756, in _post
    report = self._analyze_bulk(results)
  File "/opt/pbench-server/lib/pbench/server/api/resources/query_apis/__init__.py", line 670, in _analyze_bulk
    for ok, response in results:
  File "/home/pbench/.local/lib/python3.9/site-packages/elasticsearch/helpers/actions.py", line 319, in streaming_bulk
    for bulk_data, bulk_actions in _chunk_actions(
  File "/home/pbench/.local/lib/python3.9/site-packages/elasticsearch/helpers/actions.py", line 155, in _chunk_actions
    for action, data in actions:
  File "/opt/pbench-server/lib/pbench/server/api/resources/query_apis/datasets_delete.py", line 71, in generate_actions
    dataset.advance(States.DELETING)
  File "/opt/pbench-server/lib/pbench/server/database/models/datasets.py", line 641, in advance
    self.update()
  File "/opt/pbench-server/lib/pbench/server/database/models/datasets.py", line 665, in update
    def update(self):
        """
        Update the database row with the modified version of the
        Dataset object.
        """
        try:
*A ->       Database.db_session.commit()
        except Exception:
            Database.db_session.rollback()
*B ->       self.logger.error("Can't update {} in DB", str(self))
*C ->       raise DatasetSqlError("updating", name=self.name)

@portante
Copy link
Member Author

portante commented Dec 21, 2022

Here again we reconstruct the traceback, but only to the next frame of the Pbench Server code where we handle exceptions.

The *D line is where the normal code flow proceeds. At line *E we raise an APIInternalError(), but using from, which is why the original traceback has The above exception was the direct cause of the following exception:.

Changes to consider:

  • Don't catch exceptions here
    • Creating the APIInternalError() here with "Unexpected backend error" does not add enough useful context
    • Allow the exception to propagate to its caller, _dispatch()
Traceback (most recent call last):
  File "/home/pbench/.local/lib/python3.9/site-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/pbench/.local/lib/python3.9/site-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/home/pbench/.local/lib/python3.9/site-packages/flask_restful/__init__.py", line 467, in wrapper
    resp = resource(*args, **kwargs)
  File "/home/pbench/.local/lib/python3.9/site-packages/flask/views.py", line 107, in view
    return current_app.ensure_sync(self.dispatch_request)(**kwargs)
  File "/home/pbench/.local/lib/python3.9/site-packages/flask_restful/__init__.py", line 582, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/home/pbench/.local/lib/python3.9/site-packages/flask_httpauth.py", line 172, in decorated
    return self.ensure_sync(f)(*args, **kwargs)
  File "/opt/pbench-server/lib/pbench/server/api/resources/__init__.py", line 1791, in post
    return self._dispatch(ApiMethod.POST, kwargs)
  File "/opt/pbench-server/lib/pbench/server/api/resources/__init__.py", line 1637, in _dispatch
    response = execute(params, request, {"auditing": auditing})
  File "/opt/pbench-server/lib/pbench/server/api/resources/query_apis/__init__.py", line 756, in _post
    def _post(
        self, params: ApiParams, request: Request, context: ApiContext
    ) -> Response:
    ...
            # NOTE: because both generate_actions and streaming_bulk return
            # generators, the entire sequence is inside a single try block.
            try:
                results = helpers.streaming_bulk(
                    elastic,
                    self.generate_actions(params, dataset, context, map),
                    raise_on_exception=False,
                    raise_on_error=False,
                )
*D ->           report = self._analyze_bulk(results)
            except Exception as e:
*E ->           raise APIInternalError("Unexpected backend error") from e

@portante
Copy link
Member Author

Finally, we show the last exception handler in this call stack.

The *F line is where the normal code flow proceeds. At line *G we log an exception, and then proceed to abort() immediately after logging.

Changes to consider:

  1. As above, create the APIInternalError here, instead of in _post() by catching all exceptions here
    • ... And use raise / from
    • ... And be sure to add the api_name to the raised exception
    • This will allow the Flask().log_exception() method to log the details we want
  2. Don't log an exception here, let Flask().log_exception() do that
  3. Don't call abort() in exception context, let the exception propagate
  4. Write/provide our own "unhandled" exception handler which will format responses as needed
Traceback (most recent call last):
  File "/home/pbench/.local/lib/python3.9/site-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/pbench/.local/lib/python3.9/site-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/home/pbench/.local/lib/python3.9/site-packages/flask_restful/__init__.py", line 467, in wrapper
    resp = resource(*args, **kwargs)
  File "/home/pbench/.local/lib/python3.9/site-packages/flask/views.py", line 107, in view
    return current_app.ensure_sync(self.dispatch_request)(**kwargs)
  File "/home/pbench/.local/lib/python3.9/site-packages/flask_restful/__init__.py", line 582, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/home/pbench/.local/lib/python3.9/site-packages/flask_httpauth.py", line 172, in decorated
    return self.ensure_sync(f)(*args, **kwargs)
  File "/opt/pbench-server/lib/pbench/server/api/resources/__init__.py", line 1791, in post
    return self._dispatch(ApiMethod.POST, kwargs)
  File "/opt/pbench-server/lib/pbench/server/api/resources/__init__.py", line 1637, in _dispatch
    def _dispatch(
        self,
        method: ApiMethod,
        uri_params: Optional[JSONOBJECT] = None,
    ) -> Response:
        try:
*F ->       response = execute(params, request, {"auditing": auditing})
            if auditing["finalize"]:
                Audit.create(
                    root=auditing["audit"],
                    status=auditing["status"],
                    reason=auditing["reason"],
                    attributes=auditing["attributes"],
                )
            return response
        except APIInternalError as e:
*G ->       self.logger.exception("{} {}", api_name, e.details)
            abort(e.http_status, message=str(e))

@portante portante changed the title Let's consider improvements to internal errors are handled in the Pbench Server Let's consider improvements to how internal errors are handled in the Pbench Server Dec 21, 2022
@portante portante modified the milestones: v0.72, v0.73 Mar 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: To Do
Development

No branches or pull requests

1 participant