1011 Disable full stack trace when using spark connect #1012
Conversation
src/sql/run/sparkdataframe.py
Outdated
raise exceptions.MissingPackageError("pysark not installed")

return SparkResultProxy(dataframe, dataframe.columns, should_cache)
try:
please integrate this with short_errors:
Line 175 in 0433444
short_errors = Bool(
by default, it should raise the exception; if short_errors is True, then just print it
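A minimal sketch of the requested behavior (how short_errors reaches this function is an assumption for illustration, not the final implementation; SparkResultProxy is the proxy class from the diff above):

from pyspark.sql.utils import AnalysisException

def handle_spark_dataframe(dataframe, should_cache=False, short_errors=False):
    # NOTE: taking short_errors as a parameter is an assumption for this sketch;
    # in jupysql it is the SqlMagic Bool trait referenced above
    try:
        return SparkResultProxy(dataframe, dataframe.columns, should_cache)
    except AnalysisException as e:
        if short_errors:
            print(e)  # show only the Spark SQL error message
        else:
            raise  # default: re-raise so the full stack trace is preserved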
I see this was marked as resolved but I don't see any references to the short_errors option in your PR, am I missing anything?
src/sql/run/sparkdataframe.py
Outdated
return SparkResultProxy(dataframe, dataframe.columns, should_cache)
try:
    return SparkResultProxy(dataframe, dataframe.columns, should_cache)
except AnalysisException as e:
this except is redundant; the except Exception as e can catch all exceptions
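In other words, since pyspark's AnalysisException subclasses Exception, a single broad handler is enough; a rough sketch (handle_exception is a hypothetical stand-in for the existing error handling, not code from this PR):

try:
    return SparkResultProxy(dataframe, dataframe.columns, should_cache)
except Exception as e:  # also catches pyspark's AnalysisException
    handle_exception(e)  # hypothetical stand-in for the existing handling code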
src/sql/util.py
Outdated
@@ -559,6 +559,7 @@ def is_non_sqlalchemy_error(error):
    # Pyspark
    "UNRESOLVED_ROUTINE",
    "PARSE_SYNTAX_ERROR",
    "AnalysisException",
After looking through the code I think adding AnalysisException here will solve the issue, since PARSE_SYNTAX_ERROR works as expected. AnalysisException covers all these error conditions.
I just need to test it somehow. Will try to package jupysql and install it in a Spark environment.
you can install it like this:
pip install git+https://github.com/b1ackout/jupysql@running-sql-using-sparkconnect-should-not-print-full-stack-trace
I tested it but no luck: the error message doesn't contain AnalysisException, only the SQL error conditions listed here, which are included in AnalysisException. So instead of including all of these in the list (which would also need to be updated regularly), I think checking whether the error is an instance of AnalysisException would be a lot cleaner.
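Roughly, the proposal is to key off the exception type instead of maintaining a list of error-condition strings; a sketch (the helper name is illustrative only; the actual change further down folds this into is_non_sqlalchemy_error):

try:
    from pyspark.sql.utils import AnalysisException
except ModuleNotFoundError:
    AnalysisException = None  # pyspark not installed

def is_spark_analysis_error(error):
    # True only when pyspark is available and the error is an AnalysisException
    return AnalysisException is not None and isinstance(error, AnalysisException)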
@@ -9,9 +9,9 @@

def handle_spark_dataframe(dataframe, should_cache=False):
    """Execute a ResultSet sqlaproxy using pysark module."""
    """Execute a ResultSet sqlaproxy using pyspark module."""
Fix typo
@@ -556,11 +562,14 @@ def is_non_sqlalchemy_error(error):
    "pyodbc.ProgrammingError",
    # Clickhouse errors
    "DB::Exception:",
    # Pyspark
Removed these as they are included in AnalysisException
seems like you're still working on this PR, please review our contribution guidelines: https://ploomber-contributing.readthedocs.io/en/latest/contributing/responding-pr-review.html specifically the point about addressing comments. please post links to the code changes so we can review them one by one. I also saw you marked some discussions as resolved, please unmark them so I can review them
closing due to inactivity
if not DataFrame and not CDataFrame:
    raise exceptions.MissingPackageError("pysark not installed")
    raise exceptions.MissingPackageError("pyspark not installed")
Fix typo
try:
    from pyspark.sql.utils import AnalysisException
except ModuleNotFoundError:
    AnalysisException = None
This is to handle the case where the pyspark module is not installed.
is_pyspark_analysis_exception = (
    isinstance(error, AnalysisException) if AnalysisException else False
)
return (
    any(msg in str(error) for msg in specific_db_errors)
    or is_pyspark_analysis_exception
)
If AnalysisException was imported, this checks whether the error is an instance of pyspark's AnalysisException and handles it accordingly.
@edublancas sorry, I was away on paternity leave, I opened a new one: #1024
Describe your changes
When short_errors is enabled, show only the Spark SQL error and not the full stack trace.
Issue number
Closes #1011
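For context, a user toggles this behavior through jupysql's existing config option; a minimal notebook sketch (assuming the standard SqlMagic config surface and the short_errors trait referenced in the review above):

%load_ext sql
# with short_errors enabled, a failing Spark SQL query prints only the
# AnalysisException message instead of the full Spark Connect stack trace
%config SqlMagic.short_errors = True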
Checklist before requesting a review
pkgmt format
📚 Documentation preview 📚: https://jupysql--1012.org.readthedocs.build/en/1012/