Validate SQL downloads syntax before running the query? #393

marcos-lg · 2025-02-26T09:45:04Z

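Some SQL downloads fail only after they have been launched because the query itself is incorrect (for example a wrong number of function arguments, or mismatched types). Two recent examples: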

0000379-250225085111116: https://airflow.gbif.org/log?execution_date=2025-02-25T12%3A07%3A41.322074%2B00%3A00&task_id=download_query_monitor&dag_id=gbif_occurrence_download_dag&map_index=-1

Exception in thread "main" org.apache.spark.sql.AnalysisException: [WRONG_NUM_ARGS.WITHOUT_SUGGESTION] The `gbif_temporalUncertainty` requires 2 parameters but the actual number is 1. Please, refer to 'https://spark.apache.org/docs/latest/sql-ref-functions.html' for a fix.; line 3 pos 321

0000874-250225214225278: https://airflow.gbif.org/log?execution_date=2025-02-26T09%3A18%3A14.172797%2B00%3A00&task_id=download_query_monitor&dag_id=gbif_occurrence_download_dag&map_index=-1

2025-02-26T09:19:37,159 WARN [task-result-getter-3] org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 0.0 (TID 0) (192.168.145.72 executor 5): org.apache.spark.SparkException: [TASK_WRITE_FAILED] Task failed while writing rows to hdfs://gbif-hdfs/uat2/0000874_250225214225278/.hive-staging_hive_2025-02-26_09-19-26_860_4969043567238980144-1/-ext-10000.
	at org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:774)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:420)
	at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:100)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user defined function (`gbif_within (UDFRegistration$$Lambda$827/0x0000000840820040)`: (string, double, double) => string).
	at org.apache.spark.sql.errors.QueryExecutionErrors$.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala:198)
	at org.apache.spark.sql.errors.QueryExecutionErrors.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.Cast_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.And_2$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.eval(Unknown Source)
	at org.apache.spark.sql.execution.FilterEvaluatorFactory$FilterPartitionEvaluator.$anonfun$eval$1(FilterEvaluatorFactory.scala:42)
	at org.apache.spark.sql.execution.FilterEvaluatorFactory$FilterPartitionEvaluator.$anonfun$eval$1$adapted(FilterEvaluatorFactory.scala:41)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:515)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:91)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:403)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1397)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:410)
	... 17 more
Caused by: java.lang.IllegalArgumentException: The value (false) of the type (java.lang.Boolean) cannot be converted to the string type
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:296)
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:288)
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:106)
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$.$anonfun$createToCatalystConverter$2(CatalystTypeConverters.scala:477)
	... 28 more
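
To make the idea concrete, here is a minimal, standalone sketch of a pre-flight check that would have rejected the first download before it reached Spark. It is not the existing download code: `UDF_ARITY`, `check_udf_arity` and `_split_args` are hypothetical names, and the arity table is hand-written from the two errors above (`gbif_temporalUncertainty` requires 2 parameters, `gbif_within` is `(string, double, double)`). A real implementation would more likely hand the query to Spark's own analyzer instead of scanning the text. Note that it only covers argument counts, so it would not catch the second failure, where the UDF returns a Boolean for a declared string result.

```python
import re

# Hypothetical arity table; the two entries are taken from the errors above.
UDF_ARITY = {
    "gbif_temporaluncertainty": 2,
    "gbif_within": 3,
}

# An identifier directly followed by an opening parenthesis.
CALL = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def _split_args(sql, start):
    """Return the top-level arguments of the call whose '(' is at sql[start].

    Nested parentheses and quoted strings are skipped over; backslash
    escapes inside strings are not handled in this sketch.
    """
    depth, quote, args, current = 0, None, [], []
    for i in range(start, len(sql)):
        ch = sql[i]
        if quote:
            current.append(ch)
            if ch == quote:
                quote = None
            continue
        if ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == "(":
            depth += 1
            if depth > 1:
                current.append(ch)
        elif ch == ")":
            depth -= 1
            if depth == 0:
                text = "".join(current).strip()
                if text or args:
                    args.append(text)
                return args
            current.append(ch)
        elif ch == "," and depth == 1:
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    raise ValueError("unbalanced parentheses")


def check_udf_arity(sql):
    """Return a list of problems; an empty list means no arity issue was found."""
    problems = []
    for match in CALL.finditer(sql):
        name = match.group(1)
        expected = UDF_ARITY.get(name.lower())
        if expected is None:
            continue  # not a function listed in the table
        actual = len(_split_args(sql, match.end() - 1))
        if actual != expected:
            problems.append(
                f"{name} requires {expected} parameters but got {actual}"
            )
    return problems
```

With an illustrative query (the table and column names here are placeholders), `check_udf_arity("SELECT gbif_temporalUncertainty(eventDate) FROM occurrence")` reports one problem, so the download could be rejected at submission time with a readable message instead of failing in the Airflow task.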
