[SPARK-52104][CONNECT][SCALA] Validate column name eagerly in Spark Connect Scala Client #50873
What changes were proposed in this pull request?
Currently, calling DataFrame.col(colName) with a non-existent column in the Scala Spark Connect client does not raise an error. In contrast, both PySpark (in either Spark Connect or Spark Classic) and Scala in Spark Classic do raise an exception in such cases. This leads to inconsistent behavior between Spark Connect and Spark Classic in Scala.
PySpark on Spark Classic:
PySpark on Spark Connect:
Scala on Spark Classic:
Scala on Spark Connect:
No exception is thrown.
In this PR, eager validation of column names has been implemented in the DataFrame.col(colName) method of the Scala client to ensure consistent behavior with both Spark Classic and PySpark. The implementation here is based on the __getitem__ and verify_col_name methods in PySpark.
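The idea of eager validation can be illustrated with a minimal, self-contained sketch. It does not reproduce the code in this PR or Spark's API: the `FieldType`, `Struct`, `UnresolvedColumn`, and `EagerColumnValidation` names are hypothetical, the schema model is a simplified stand-in for Spark's `StructType`, and matching is case-sensitive with no support for backtick-quoted names. It only shows the general shape of checking a (possibly dotted) column name against a known schema before returning a column reference.

```scala
// Hypothetical, simplified schema model (an assumption; not Spark's StructType).
sealed trait FieldType
case object Leaf extends FieldType
final case class Struct(fields: Map[String, FieldType]) extends FieldType

// Hypothetical stand-in for a column reference returned to the caller.
final case class UnresolvedColumn(name: String)

object EagerColumnValidation {

  /** Returns true if the dotted `name` resolves against `schema`. */
  def verifyColName(name: String, schema: Struct): Boolean = {
    @annotation.tailrec
    def loop(parts: List[String], current: FieldType): Boolean =
      (parts, current) match {
        // Every name part was matched.
        case (Nil, _) => true
        // Descend into a struct if it contains the next name part.
        case (head :: tail, Struct(fields)) =>
          fields.get(head) match {
            case Some(next) => loop(tail, next)
            case None       => false
          }
        // Name parts remain, but a leaf field has no children.
        case (_, Leaf) => false
      }
    loop(name.split('.').toList, schema)
  }

  /** Validates `name` up front instead of deferring the failure to execution. */
  def col(name: String, schema: Struct): UnresolvedColumn = {
    if (!verifyColName(name, schema)) {
      throw new IllegalArgumentException(
        s"Column '$name' cannot be resolved against the known schema")
    }
    UnresolvedColumn(name)
  }
}
```

In this sketch, calling `EagerColumnValidation.col("address.city", schema)` on a schema with a nested `address` struct returns a column reference, while `EagerColumnValidation.col("missing", schema)` fails immediately rather than when a query is later executed.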
With this change, the Scala client on Spark Connect throws an error:
Why are the changes needed?
This PR ensures consistent behavior between Spark Connect and Spark Classic in the scenario described above.
Does this PR introduce any user-facing change?
Yes. Referencing a non-existent column through DataFrame.col(colName) in the Scala client on Spark Connect now throws an error.
How was this patch tested?
New test case.
Was this patch authored or co-authored using generative AI tooling?
No.