Point not found when injectRules used #237

Open
bdgeise opened this issue Mar 4, 2019 · 4 comments

bdgeise commented Mar 4, 2019

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.TimestampType
import org.apache.spark.sql.magellan.dsl.expressions._

val spark = SparkSession.builder
  .appName("Testing Spark DSL")
  .master("local[1]") // build a local cluster
  .getOrCreate()

// injectRules(spark) // enabling this line is what changes the results (see below)

import spark.implicits._

val data = Array(
  ("US", "TX", "2018-12-08 00:00:00", 12.0123, "ios", 2, 32.813548, -96.835159),
  ("US", "PA", "2018-12-08 00:00:00", 12.0123, "ios", 183, 32.813548, -96.835159),
  ("CA", null, "2018-12-08 00:00:00", 12.0123, "android", 183, 32.813548, -96.835159),
  ("GB", null, "2018-12-08 00:00:00", 12.0123, "ios", 2, 32.813548, -96.835159),
  ("US", "NC", "2018-12-08 00:00:00", 12.0123, "android", 35, 32.813548, -96.835159),
  ("US", "CA", "2018-12-08 00:00:00", 12.0123, null, 2, 32.813548, -96.835159),
  ("A", null, "2018-12-08 00:00:00", 12.0123, "android", 183, 32.813548, -96.835159),
  ("US", "NY", "2018-12-08 00:00:00", 12.0123, "ios", 2, 32.813548, -96.835159))

val df1 = spark.sparkContext.parallelize(data).toDF("country", "state", "location_at",
    "horizontal_accuracy", "platform", "app_id", "latitude", "longitude")
  .withColumn("location_at", col("location_at").cast(TimestampType))
df1.show()
df1.printSchema()

val filterFilePath = "path_to_geojson" // placeholder: path to the attached TX geojson file

val filteringDS = spark.sqlContext.read.format("magellan")
  .option("magellan.index", "true")
  .option("magellan.index.precision", "15")
  .option("type", "geojson")
  .load(filterFilePath)
  .cache()

filteringDS.count()
filteringDS.show(false)

val filtered = df1
  .withColumn("locationPoint", point(col("longitude"), col("latitude")))
  .join(filteringDS)
  .where(col("locationPoint") within col("polygon"))

filtered.show()

Using the example above, if I call injectRules I get 0 results, but if I don't call injectRules I get the proper results.

Also, to note: I've tried different levels of index precision, but the same issue persists when the rules are injected.
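For reference, a minimal sketch of the injection step being toggled above, assuming the Utils.injectRules entry point described in the Magellan README:

// Register Magellan's optimizer rules with the session so that spatial
// joins can be rewritten to use the geohash index. Commenting this call
// out is what flips the behavior described in this issue.
import magellan.Utils

Utils.injectRules(spark)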

Geojson file used for testing attached.
TX.geojson.txt


bdgeise commented Mar 18, 2019

@harsha2010 - Any luck looking at this one?


bdgeise commented Mar 19, 2019

I was able to do some more testing/debugging today. If I do a true cross join and test for point-within-polygon using a withColumn, it returns true. However, when I put the same predicate in a where clause, I still get an empty DataFrame back while using injectRules. @harsha2010
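A sketch of that diagnostic, assuming the same df1 and filteringDS as in the original example (the explicit crossJoin and the "inside" column name are reconstructions, not the reporter's exact code):

// Hedged reconstruction of the test described above: an explicit cross join
// with the "within" predicate evaluated as a column instead of in a where clause.
val diagnostic = df1
  .withColumn("locationPoint", point(col("longitude"), col("latitude")))
  .crossJoin(filteringDS)
  .withColumn("inside", col("locationPoint") within col("polygon"))

// The "inside" column evaluates to true for matching rows...
diagnostic.show()
// ...yet the same predicate in a where clause returns no rows once the rules are injected.
diagnostic.where(col("locationPoint") within col("polygon")).show()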


bdgeise commented Mar 19, 2019

Another update here... it seems to work OK with the master branch and Spark 2.3.2. Are you aware of any changes since the 1.0.5 release that I might be able to look at and test against? @harsha2010

harsha2010 (Owner) commented

@bdgeise there is a bug I noticed and fixed a while back: aa9021e.

Not sure if that is related. Let me try this on the 1.0.5 branch and check today.
