problem with pred_within() - results outside polygon in the download #704

Closed
jfberner opened this issue Feb 14, 2024 · 6 comments

@jfberner

Hi there, I'm downloading a dataset for a small region in Portugal, and the resulting download is 29 million records strong. Although that'd be great for what I need, once the data was loaded into R the very first record fell within countryCode "US", which caught my attention.

It appears the call is ignoring the pred_within argument, while only complying with the other preds. Is this possible? See from the API call below that the within pred is being sent, but it is still giving me over 29M records all over the world.

Here's my call:

occ_download(type = 'and',
  pred_within(region_poly_str),
  pred('hasCoordinate', TRUE),
  pred('hasGeospatialIssue', FALSE),
  pred("occurrenceStatus", "PRESENT"),             # only presence data
  pred_gte("year", 1993),                          # records from 1993 onward
  pred_or(pred("taxonKey", 1),                     # animals or
          pred("taxonKey", 6)),                    # plants
  format = "SIMPLE_CSV",
  user = 'myuser',
  pwd = pwd,
  email = '[email protected]')

where region_poly_str is equal to "POLYGON ((-8.249386 38.163071, -7.70912 38.163071, -7.70912 37.666805, -8.249386 37.666805, -8.249386 38.163071))", which upon examination in a WKT viewer is just fine - valid, closed polygon and all. I've used an object to store the WKT just yesterday and passed it into the pred_within() before and it worked, so I figured that's not the issue. Rerunning the same call with the pasted text does not change the result.

And this is the resulting API call, copied from the download page on the website:

{
  "type": "and",
  "predicates": [
    {
      "type": "within",
      "geometry": "POLYGON ((-8.249386 38.163071, -7.70912 38.163071, -7.70912 37.666805, -8.249386 37.666805, -8.249386 38.163071))"
    },
    {
      "type": "equals",
      "key": "HAS_COORDINATE",
      "value": "true",
      "matchCase": false
    },
    {
      "type": "equals",
      "key": "HAS_GEOSPATIAL_ISSUE",
      "value": "false",
      "matchCase": false
    },
    {
      "type": "equals",
      "key": "OCCURRENCE_STATUS",
      "value": "PRESENT",
      "matchCase": false
    },
    {
      "type": "greaterThanOrEquals",
      "key": "YEAR",
      "value": "1993",
      "matchCase": false
    },
    {
      "type": "or",
      "predicates": [
        {
          "type": "equals",
          "key": "TAXON_KEY",
          "value": "1",
          "matchCase": false
        },
        {
          "type": "equals",
          "key": "TAXON_KEY",
          "value": "6",
          "matchCase": false
        }
      ]
    }
  ]
}

Am I doing something wrong? The same piece of code worked fine just yesterday, and it was with a super complex MULTIPOLYGON.

Output of devtools::session_info() :

─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.3.2 (2023-10-31)
os Pop!_OS 22.04 LTS
system x86_64, linux-gnu
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/Sao_Paulo
date 2024-02-14
rstudio 2023.12.1+402 Ocean Storm (desktop)
pandoc 2.9.2.1 @ /usr/bin/pandoc

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
arrow 14.0.0.2 2023-12-02 [1] CRAN (R 4.3.2)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.3.0)
bit 4.0.5 2022-11-15 [1] CRAN (R 4.3.0)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.0)
cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.0)
class 7.3-22 2023-05-03 [1] CRAN (R 4.3.0)
classInt 0.4-10 2023-09-05 [1] CRAN (R 4.3.1)
cli 3.6.2 2023-12-11 [1] CRAN (R 4.3.2)
codetools 0.2-19 2023-02-01 [1] CRAN (R 4.3.0)
colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.0)
countrycode 1.5.0 2023-05-30 [1] CRAN (R 4.3.0)
crul 1.4.0 2023-05-17 [1] CRAN (R 4.3.0)
curl 5.2.0 2023-12-08 [1] CRAN (R 4.3.2)
data.table 1.14.10 2023-12-08 [1] CRAN (R 4.3.2)
DBI 1.2.1 2024-01-12 [1] CRAN (R 4.3.2)
devtools 2.4.5 2022-10-11 [1] CRAN (R 4.3.0)
digest 0.6.34 2024-01-11 [1] CRAN (R 4.3.2)
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.2)
e1071 1.7-14 2023-12-06 [1] CRAN (R 4.3.2)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.3.0)
fansi 1.0.6 2023-12-08 [1] CRAN (R 4.3.2)
fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0)
fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.1)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)
ggplot2 3.4.4 2023-10-12 [1] CRAN (R 4.3.1)
glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.2)
gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.1)
hoardr 0.5.4 2024-01-23 [1] CRAN (R 4.3.2)
htmltools 0.5.7 2023-11-03 [1] CRAN (R 4.3.2)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.3.2)
httpcode 0.3.0 2020-04-10 [1] CRAN (R 4.3.0)
httpuv 1.6.13 2023-12-06 [1] CRAN (R 4.3.2)
httr 1.4.7 2023-08-15 [1] CRAN (R 4.3.1)
jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.3.2)
KernSmooth 2.23-22 2023-07-10 [1] CRAN (R 4.3.2)
later 1.3.2 2023-12-06 [1] CRAN (R 4.3.2)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.3.0)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.2)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.0)
mime 0.12 2021-09-28 [1] CRAN (R 4.3.0)
miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.3.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.0)
oai 0.4.0 2022-11-10 [1] CRAN (R 4.3.0)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0)
pkgbuild 1.4.3 2023-12-10 [1] CRAN (R 4.3.2)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0)
pkgload 1.3.4 2024-01-16 [1] CRAN (R 4.3.2)
plyr 1.8.9 2023-10-02 [1] CRAN (R 4.3.2)
profvis 0.3.8 2023-05-02 [1] CRAN (R 4.3.0)
promises 1.2.1 2023-08-10 [1] CRAN (R 4.3.1)
proxy 0.4-27 2022-06-09 [1] CRAN (R 4.3.0)
purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.1)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.3.0)
Rcpp 1.0.12 2024-01-09 [1] CRAN (R 4.3.2)
remotes 2.4.2.1 2023-07-18 [1] CRAN (R 4.3.2)
rgbif * 3.7.9 2024-01-11 [1] CRAN (R 4.3.2)
rgeoboundaries * 1.2.9 2023-12-08 [1] CRAN (R 4.3.2)
rlang 1.1.3 2024-01-10 [1] CRAN (R 4.3.2)
rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.1)
scales 1.3.0 2023-11-28 [1] CRAN (R 4.3.2)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
sf 1.0-15 2023-12-18 [1] CRAN (R 4.3.2)
shiny 1.8.0 2023-11-17 [1] CRAN (R 4.3.2)
stringi 1.8.3 2023-12-11 [1] CRAN (R 4.3.2)
stringr 1.5.1 2023-11-14 [1] CRAN (R 4.3.2)
terra * 1.7-71 2024-01-31 [1] CRAN (R 4.3.2)
tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.0)
triebeard 0.4.1 2023-03-04 [1] CRAN (R 4.3.0)
units 0.8-5 2023-11-28 [1] CRAN (R 4.3.2)
urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.3.0)
urltools 1.7.3 2019-04-14 [1] CRAN (R 4.3.0)
usethis 2.2.2 2023-07-06 [1] CRAN (R 4.3.2)
utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.2)
whisker 0.4.1 2022-12-05 [1] CRAN (R 4.3.0)
xml2 1.3.6 2023-12-04 [1] CRAN (R 4.3.2)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.0)

[1] /home/jfb/R/x86_64-pc-linux-gnu-library/4.3
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library

Output of sessionInfo():

R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Pop!_OS 22.04 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3; LAPACK version 3.10.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: America/Sao_Paulo
tzcode source: system (glibc)

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rgeoboundaries_1.2.9 dplyr_1.1.4 terra_1.7-71 rgbif_3.7.9

loaded via a namespace (and not attached):
[1] gtable_0.3.4 ggplot2_3.4.4 remotes_2.4.2.1 htmlwidgets_1.6.4 devtools_2.4.5 vctrs_0.6.5
[7] tools_4.3.2 generics_0.1.3 curl_5.2.0 tibble_3.2.1 proxy_0.4-27 fansi_1.0.6
[13] pkgconfig_2.0.3 KernSmooth_2.23-22 data.table_1.14.10 assertthat_0.2.1 lifecycle_1.0.4 compiler_4.3.2
[19] stringr_1.5.1 munsell_0.5.0 codetools_0.2-19 httpuv_1.6.13 usethis_2.2.2 htmltools_0.5.7
[25] class_7.3-22 lazyeval_0.2.2 urlchecker_1.0.1 later_1.3.2 pillar_1.9.0 whisker_0.4.1
[31] ellipsis_0.3.2 classInt_0.4-10 cachem_1.0.8 sessioninfo_1.2.2 mime_0.12 countrycode_1.5.0
[37] tidyselect_1.2.0 digest_0.6.34 stringi_1.8.3 sf_1.0-15 purrr_1.0.2 arrow_14.0.0.2
[43] fastmap_1.1.1 grid_4.3.2 colorspace_2.1-0 cli_3.6.2 magrittr_2.0.3 pkgbuild_1.4.3
[49] triebeard_0.4.1 crul_1.4.0 utf8_1.2.4 e1071_1.7-14 promises_1.2.1 scales_1.3.0
[55] rappdirs_0.3.3 bit64_4.0.5 oai_0.4.0 httr_1.4.7 bit_4.0.5 shiny_1.8.0
[61] memoise_2.0.1 miniUI_0.1.1.1 hoardr_0.5.4 profvis_0.3.8 rlang_1.1.3 urltools_1.7.3
[67] Rcpp_1.0.12 xtable_1.8-4 glue_1.7.0 DBI_1.2.1 httpcode_0.3.0 xml2_1.3.6
[73] pkgload_1.3.4 rstudioapi_0.15.0 jsonlite_1.8.8 R6_2.5.1 plyr_1.8.9 fs_1.6.3
[79] units_0.8-5

Thanks a bunch for developing and maintaining this, you're all heroes :)

@jfberner jfberner changed the title problem with pred_within() - results outside polygon in the resulting download problem with pred_within() - results outside polygon in the download Feb 14, 2024
@jfberner
Author

I retried manually doing the same thing from the website, with the same results. The issue might be with the database or the API. Is there another place for me to report it? I haven't found one.

@MattBlissett
Collaborator

MattBlissett commented Feb 15, 2024

Hi,

The polygon is clockwise, which our download system interprets as a hole, i.e. the whole world except that bit of Portugal.

It should work in R or through the website if you reverse the direction of the polygon, maybe with st_as_text(st_reverse(u)) if you're using the sf library.
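
For readers who want to check the winding order before submitting a download, here is a minimal base-R sketch. It assumes a simple WKT POLYGON with a single ring and no holes; the helper names wkt_ring, signed_area and ensure_ccw are hypothetical and are not part of rgbif or sf. It uses the shoelace formula, where a negative signed area means the ring is clockwise.

```r
# Hypothetical helpers (not part of rgbif or sf); they assume a simple
# single-ring WKT POLYGON with no holes.

# Parse the ring of a WKT POLYGON into a two-column coordinate matrix.
wkt_ring <- function(wkt) {
  inner <- sub("^[[:space:]]*POLYGON[[:space:]]*\\(\\((.*)\\)\\)[[:space:]]*$",
               "\\1", wkt)
  pts <- strsplit(trimws(strsplit(inner, ",")[[1]]), "[[:space:]]+")
  matrix(as.numeric(unlist(pts)), ncol = 2, byrow = TRUE,
         dimnames = list(NULL, c("x", "y")))
}

# Shoelace formula: positive for counter-clockwise, negative for clockwise.
signed_area <- function(xy) {
  i <- seq_len(nrow(xy) - 1)
  sum(xy[i, "x"] * xy[i + 1, "y"] - xy[i + 1, "x"] * xy[i, "y"]) / 2
}

# Return the WKT with a counter-clockwise ring, reversing it if needed.
ensure_ccw <- function(wkt) {
  xy <- wkt_ring(wkt)
  if (signed_area(xy) < 0) xy <- xy[rev(seq_len(nrow(xy))), , drop = FALSE]
  paste0("POLYGON ((", paste(xy[, "x"], xy[, "y"], collapse = ", "), "))")
}

region_poly_str <- "POLYGON ((-8.249386 38.163071, -7.70912 38.163071, -7.70912 37.666805, -8.249386 37.666805, -8.249386 38.163071))"

is_clockwise <- signed_area(wkt_ring(region_poly_str)) < 0
fixed <- ensure_ccw(region_poly_str)
```

The string in fixed could then be passed to pred_within() in place of the original. For multipolygons or polygons with holes, a proper geometry library is the safer route.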

Query: https://www.gbif.org/occurrence/map?has_coordinate=true&has_geospatial_issue=false&taxon_key=1&taxon_key=6&year=1993,*&geometry=POLYGON((-8.24939%2038.16307,-8.24939%2037.6668,-7.70912%2037.6668,-7.70912%2038.16307,-8.24939%2038.16307))&occurrence_status=PRESENT

I think it's a bug that the search APIs (used by the website) show data within the rectangle for this query.

We have seen this problem several times, especially with R users (e.g. gbif/portal-feedback#2222). Although a clockwise 'hole' polygon is technically valid, we should probably reject them to avoid the confusion. If a user actually wants this, they could provide 4 rectangles, or the world minus the hole, or a suitable lat-long greaterthan/lessthan expression.
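
As an illustration of the "4 rectangles" option, the base-R sketch below builds four counter-clockwise rectangles that together cover the world minus a bounding box. The helper names rect_wkt and world_minus_box are hypothetical, and the -180/180 and -90/90 world bounds are an assumption about what the download system accepts.

```r
# Hypothetical helpers; world bounds of +/-180 and +/-90 are assumed.

# Build a counter-clockwise rectangle as WKT from its bounds.
rect_wkt <- function(xmin, ymin, xmax, ymax) {
  sprintf("POLYGON ((%s %s, %s %s, %s %s, %s %s, %s %s))",
          xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax, xmin, ymin)
}

# Four rectangles covering everything outside the given box.
world_minus_box <- function(xmin, ymin, xmax, ymax) {
  c(west  = rect_wkt(-180, -90, xmin, 90),    # everything west of the box
    east  = rect_wkt(xmax, -90, 180, 90),     # everything east of the box
    south = rect_wkt(xmin, -90, xmax, ymin),  # strip directly below the box
    north = rect_wkt(xmin, ymax, xmax, 90))   # strip directly above the box
}

outside <- world_minus_box(-8.249386, 37.666805, -7.70912, 38.163071)
```

Each element of outside could then presumably be wrapped in pred_within() and the four combined with pred_or(); that combination is an assumption here and has not been checked against the download service.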

https://github.com/gbif/portal-feedback/issues is the best place for general issues with the GBIF website or API, or the "Feedback" link from the top right of the website.

@jhnwllr
Collaborator

jhnwllr commented Feb 16, 2024

#672
Yes I will prioritize checking the polygons in the next version.

@jhnwllr jhnwllr added this to the 3.8.0 milestone Feb 16, 2024
@jfberner
Author

Thank you both for looking into this.

Honest mistake, certainly others have run into it. The behavior does make sense, it just never crossed my mind that this might be the problem.

I was using terra::geom(x, wkt = T) to get the string, as the original object is a SpatVector.

For future reference, I fixed it with:
x %>% as('Spatial') %>% st_as_sfc() %>% st_reverse() %>% st_as_text()
as you suggested, and it worked perfectly.

The fix requires loading raster and sf, as a SpatialPolygons* object is a needed intermediate step between SpatVector and sfc.

Thank you for sharing the issue page for the portal :) will use that in the future if needed.

@jhnwllr
Collaborator

jhnwllr commented Feb 21, 2024

#704

@jhnwllr
Collaborator

jhnwllr commented Mar 5, 2024

This is now fixed server side.
gbif/occurrence#340

@jhnwllr jhnwllr closed this as completed Mar 5, 2024