Spline Agent in AWS Glue 4.0 #822
Your config and networking seem to be perfectly fine. There might be several reasons why it fails, and most (if not all) of them are likely related to the Glue environment. Many people have reported different issues related to specific proprietary Spark environments like Databricks or AWS. Unfortunately, because those platforms are closed source, it's sometimes extremely difficult to debug them, or to find qualified tech support that is well aware of the system internals. So all I'm trying to do right now is find a fancy way to say: I don't know what breaks things for you. One thing that I would suspect is bytecode incompatibility (check that the Scala and Spark versions for which the agent bundle is compiled match the ones used in your Glue). So, all in all, keep on looking. |
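As a rough illustration of the version check suggested above, here is a minimal sketch that reads the Spark and Scala versions out of the bundle file name and compares them with the runtime versions. The helper names (`parse_bundle_name`, `is_compatible`) are hypothetical and not part of the Spline agent; the file-name pattern is inferred from the jar name quoted in this issue, and the runtime values in the usage comment are assumptions for a Glue 4.0 job (AWS documents Glue 4.0 as Spark 3.3.0; the exact Scala patch version is a guess).

```python
import re

# Pattern inferred from the jar name in this issue:
#   spark-<spark major.minor>-spline-agent-bundle_<scala binary version>-<agent version>.jar
BUNDLE_NAME = re.compile(
    r"spark-(?P<spark>\d+\.\d+)"
    r"-spline-agent-bundle_(?P<scala>\d+\.\d+)"
    r"-(?P<agent>[\w.\-]+)\.jar$"
)


def parse_bundle_name(path):
    """Extract Spark, Scala and agent versions from a bundle jar path (hypothetical helper)."""
    match = BUNDLE_NAME.search(path)
    if match is None:
        raise ValueError(f"not a recognised Spline agent bundle name: {path}")
    return match.groupdict()


def is_compatible(path, runtime_spark, runtime_scala):
    """Compare major.minor of the runtime Spark/Scala versions with the bundle's."""
    info = parse_bundle_name(path)
    spark_mm = ".".join(runtime_spark.split(".")[:2])
    scala_mm = ".".join(runtime_scala.split(".")[:2])
    return info["spark"] == spark_mm and info["scala"] == scala_mm


# Usage inside a job (assumed runtime values; in PySpark they could be read from
# spark.version and spark.sparkContext._jvm.scala.util.Properties.versionNumberString()):
#   is_compatible(
#       "s3://redacted-s3-bucket-name/spline/spark-3.3-spline-agent-bundle_2.12-2.0.0.jar",
#       "3.3.0", "2.12.15")
```

A matching file name only rules out the most obvious mismatch; it does not prove that a vendor-modified Spark build is binary compatible with the bundle.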
Thanks for the response @wajda, any chance you can point me towards programmatic init of the agent in PySpark? The README shows an example in Scala/Java, I think. |
It's very similar. Here's the example - https://github.com/AbsaOSS/spline-spark-agent/blob/develop/examples/src/main/python/python_example.py |
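For readers who do not want to follow the link, the programmatic initialisation in PySpark looks roughly like the sketch below. It is modelled on the linked example: the call goes through the Py4J gateway to the agent's `SparkLineageInitializer.enableLineageTracking` on the JVM side. The wrapper function name `enable_spline_lineage` is hypothetical, and the sketch assumes the agent bundle jar is already on the driver classpath.

```python
def enable_spline_lineage(spark):
    """Enable Spline lineage tracking on an existing SparkSession.

    `spark` is a pyspark.sql.SparkSession. The agent bundle jar is assumed
    to be on the classpath already; otherwise the JVM lookup below fails.
    """
    jvm = spark.sparkContext._jvm
    # Hand the underlying Java SparkSession to the Scala initializer.
    jvm.za.co.absa.spline.harvester.SparkLineageInitializer.enableLineageTracking(
        spark._jsparkSession
    )
    return spark


# Typical use in a job script:
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("spline-example").getOrCreate()
#   enable_spline_lineage(spark)
```

Note that this only replaces the codeless listener registration; the dispatcher settings (such as the producer URL) still have to be supplied through the Spark configuration.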
Thanks @wajda, I've tried that and unfortunately it yields the same error. I have just tried a much older Spline agent version (0.7.13), which appears to work. I will probably try to find the highest agent version that works, to determine which patch/update is breaking things for me. |
That would be a great help. Thank you! |
It seems like version 0.7.13 is the highest version that works in this situation. Using the next version up (1.0.0) causes the same error. Issue #602 references getting a different issue in Glue 4.0 with version 1.0.0, that was then fixed in 1.1.0. However 1.1.0 also seems to yield the same result for me. I also don't see anything wildly different about their setup vs my own |
I tested the Spline agent version 2.1.0 with AWS Glue 4.0 and everything worked as expected.
|
Just spotted a minor mistake in your config:
Since the agent version 2.x (or even earlier, I don't remember exactly) the |
I have a Glue job in AWS Glue that I'm trying to connect to my Spline server on EC2 using the Spline agent jar and the HTTP lineage dispatcher.
The Glue logs show the correct producer URL, and that Spline is required and successfully initialises.
The Glue job at the moment just does a simple read operation (using spark.sql, from a data catalog in Glue) and writes to an S3 location (using df.write.parquet). The Glue job fails immediately after registering the Spline agent:
The shutdown then continues with the following logs, including an uncaught error in some Java bits:
I have set the Job parameters as follows:
And I have the following set in the Dependent JARs path:
s3://redacted-s3-bucket-name/spline/spark-3.3-spline-agent-bundle_2.12-2.0.0.jar,s3://redacted-s3-bucket-name/spline/snakeyaml-2.2.jar
The Glue job is also on a VPC connection on the same subnet as the EC2 instance. To ensure that the Glue job "sees" the EC2 server, I attempted to ping the Spline producer API with a simple POST request, and received the following response:
I assume this error stems from a badly formed POST body, but figured this is proof enough that basic networking is not the issue.
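A connectivity probe of this kind can be sketched with the Python standard library alone. The function name `probe_producer` and the URL in the usage comment are hypothetical; the point is only that any HTTP status code, including a 4xx caused by a bad body, shows that the network path from the job to the server works, while `None` indicates a connection-level failure.

```python
import json
import urllib.error
import urllib.request


def probe_producer(url, timeout=5.0):
    """POST an empty JSON object to `url`.

    Returns the HTTP status code if the server answered at all,
    or None if the server could not be reached.
    """
    body = json.dumps({}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server replied with 4xx/5xx, so networking itself is fine.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


# Usage (hypothetical host and path):
#   probe_producer("http://my-spline-host:8080/producer")
```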
Any help would be appreciated, even just help in finding more useful log files than the ones that come in default for Glue 4.0