Error: Some checklists in EBD are missing from sampling event data. #46

Open
sofbol94 opened this issue Sep 23, 2020 · 10 comments

@sofbol94

sofbol94 commented Sep 23, 2020

Hello,

I'm new to auk and am working with data for Great Green Macaws to estimate presence/absence in different seasons. I've filtered my EBD and sampling event data to Costa Rica and then attempted to zero-fill them. Since I'm working with a sensitive species, I'm using a customized EBD. However, I am getting an error that some checklists in the EBD are missing from the sampling event data. I tried filtering on the last edited date to exclude checklists that were added after Mar-2020. Here is my code:

library(auk)
library(tidyverse)

f_ebd <- "~/data/ebd_GGM.txt"
f_sed <- "~/data/sed_GGM.txt"
ebd_2020_GGMA <- auk_ebd("ebd_sensitive_relMar-2020.txt", 
                      file_sampling = "ebd_sampling_relAug-2020.txt") %>% 
  auk_species("Great Green Macaw") %>% 
  auk_country("Costa Rica") %>%
  auk_date(c("2019-01-01", "2019-03-31")) %>% 
  auk_last_edited(date = c("2019-01-01", "2020-02-29")) %>%
  auk_complete() %>%
  auk_filter(f_ebd, file_sampling = f_sed, overwrite=TRUE)

ebd_only <- read_ebd(f_ebd)
sed_only <- read_sampling(f_sed)

nrow(ebd_only)
#[1] 525
nrow(sed_only)
#[1] 18423

ebd_zf <- auk_zerofill(f_ebd, sampling_events = f_sed)
ebd_zf

Error in auk_zerofill.data.frame(x = ebd, sampling_events = sed, species = species,  : 
  Some checklists in EBD are missing from sampling event data.

Does anyone have any insight into why this may be happening, and how I could solve it, considering that I can't download a more recent customized EBD file?

Thanks,
Sofia

@mstrimas
Contributor

You can't use different versions of the EBD and sampling event data. You have a Mar-2020 EBD and an Aug-2020 sampling event data. I understand that since you have sensitive data you probably can't get an Aug-2020 version. It is possible to combine these, but you'll have to do it manually. I'd start by using auk to subset the sampling event data:

library(auk)
library(tidyverse)

f_sed <- "~/data/sed_GGM.txt"
sed_filter <- auk_sampling("ebd_sampling_relAug-2020.txt") %>% 
  auk_country("Costa Rica") %>%
  auk_date(c("2019-01-01", "2019-03-31")) %>% 
  auk_complete() %>%
  auk_filter(f_sed, overwrite=TRUE)

Then read in the EBD directly (there's no need to subset it first since it's a small file) and subset both the EBD and SED so they contain the same set of checklists.

sed <- read_sampling(f_sed, unique = FALSE)
ebd <- read_ebd("ebd_sensitive_relMar-2020.txt", unique = FALSE)
ids <- intersect(sed$checklist_id, ebd$checklist_id)
sed <- filter(sed, checklist_id %in% ids)
ebd <- filter(ebd, checklist_id %in% ids)
zf <- auk_zerofill(ebd, sed, collapse = TRUE)

I don't have time to actually test any of this, so you may need to try it out and adjust the code, but this should get you started.
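Since the root cause here is a release mismatch, one cheap safeguard is to compare the release tags embedded in the two file names before filtering anything. The sketch below is a hypothetical base-R helper (release_tag() and same_release() are not auk functions), and it assumes the file names follow the relMon-YYYY pattern used above; a renamed file, or a custom download without that tag, yields NA and fails the check.

```r
# Hypothetical helper (not part of auk): extract the "relMon-YYYY" release tag
# from an eBird file name, or NA if the name contains no such tag.
release_tag <- function(path) {
  name <- basename(path)
  m <- regmatches(name, regexpr("rel[A-Z][a-z]{2}-[0-9]{4}", name))
  if (length(m) == 0) NA_character_ else m
}

# TRUE only when both file names carry the same, non-missing release tag.
same_release <- function(ebd_path, sed_path) {
  a <- release_tag(ebd_path)
  b <- release_tag(sed_path)
  !is.na(a) && !is.na(b) && identical(a, b)
}

release_tag("ebd_sensitive_relMar-2020.txt")  # "relMar-2020"
same_release("ebd_sensitive_relMar-2020.txt",
             "ebd_sampling_relAug-2020.txt")  # FALSE
```

This only inspects file names, so it cannot confirm that the file contents match; it just catches the obvious case of mixing releases.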

@sofbol94
Author

sofbol94 commented Sep 23, 2020

Thanks,
that was helpful. I'm having some issues with the second part, though:

sed <- read_sampling(f_sed)
ebd <- read_ebd("ebd_sensitive_relMar-2020.txt")
ids <- intersect(sed$checklist_id, ebd$checklist_id)
sed <- filter(sed, checklist_id %in% ids)
ebd <- filter(ebd, checklist_id %in% ids)
zf <- auk_zerofill(ebd, sed, collapse = TRUE)

I removed unique = FALSE because otherwise there was no column called checklist_id. But after intersecting the checklist IDs I have no absences: zf contains only the checklists where the species was recorded.
Any suggestions?

Thanks again,
Sofia

@mstrimas
Contributor

Hmmm, as I think about this more, I don't think you can correctly zero fill the data without the matching sampling event data. I think you'll need to request the most recent version of the Great Green Macaw data so it will match the sampling event data.
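A toy example shows why filtering both tables to the shared IDs, as in the code above, leaves no absences: for a single species, the EBD only contains checklists on which that species was reported, so restricting the sampling event data to those IDs removes every potential non-detection. The zero fill below is a deliberately simplified stand-in for auk_zerofill() (not its actual implementation), run on made-up data.

```r
# Made-up data: the sampling event data lists every checklist, while the
# single-species EBD holds only the checklists where the species was reported.
sed <- data.frame(checklist_id = c("S1", "S2", "S3", "S4"),
                  stringsAsFactors = FALSE)
ebd <- data.frame(checklist_id = c("S1", "S3"),
                  observation_count = c("2", "X"),
                  stringsAsFactors = FALSE)

# Simplified zero fill: one row per sampling event, flagged as a detection
# if that checklist appears in the EBD.
toy_zerofill <- function(ebd, sed) {
  data.frame(checklist_id = sed$checklist_id,
             species_observed = sed$checklist_id %in% ebd$checklist_id,
             stringsAsFactors = FALSE)
}

# Filtering BOTH tables to the shared IDs discards every non-detection ...
ids <- intersect(sed$checklist_id, ebd$checklist_id)
zf_both <- toy_zerofill(ebd[ebd$checklist_id %in% ids, , drop = FALSE],
                        sed[sed$checklist_id %in% ids, , drop = FALSE])
sum(!zf_both$species_observed)      # 0

# ... whereas filtering only the EBD keeps them.
zf_ebd_only <- toy_zerofill(ebd[ebd$checklist_id %in% sed$checklist_id, , drop = FALSE],
                            sed)
sum(!zf_ebd_only$species_observed)  # 2
```

Filtering only the EBD keeps the non-detections, but it does not fix the version mismatch itself: a checklist on which the species was only recorded after the Mar-2020 EBD release could still show up as a non-detection.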

@gking-aug

I wanted to follow up on this issue as I'm having a similar problem with auk_zerofill giving the error: "Some checklists in EBD are missing from sampling event data."

In my case I have ensured that the versions of the EBD and sampling event data match (both are Jan-2021). However, I am using a custom EBD download (all observations in Canada) together with the full sampling event data. Based on a previous issue (now closed; see here), I'm wondering whether a mismatch between the custom dataset and the full sampling event data is the underlying issue. Unfortunately, it seems the only way to check this would be to download the complete EBD, and at 90 GB I'll admit to being a bit reticent.

I read in both of the successfully filtered EBD and sampling event files (via read_ebd and read_sampling, respectively), and they definitely contain different numbers of records (2864 vs. 2052 for my particular filters, a bounding box in Alberta). So that is probably the issue. But when I try the suggestion from @mstrimas to subset manually, I end up with only 454 checklist_id values in common.

This is my first project looking at the eBird data, so maybe I'm missing something here, but it seems there is something strange and maybe zero-filled data REQUIRES the full datasets?
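One way to narrow this down is to compare unique checklist IDs rather than row counts: the EBD has one row per observation, so a multi-species download will usually have more rows than the sampling event data even when the two files match. The helper below is a hypothetical base-R sketch (id_overlap() is not part of auk); with real data it assumes both tables were read with the default read_ebd() and read_sampling() settings, so that each has a checklist_id column.

```r
# Hypothetical helper (not part of auk): summarise how the checklist IDs in an
# EBD table and a sampling event table overlap.
id_overlap <- function(ebd_ids, sed_ids) {
  ebd_ids <- unique(ebd_ids)
  sed_ids <- unique(sed_ids)
  list(
    shared   = length(intersect(ebd_ids, sed_ids)),
    ebd_only = setdiff(ebd_ids, sed_ids),          # IDs behind the zero-fill error
    sed_only = length(setdiff(sed_ids, ebd_ids))   # checklists with no EBD rows
  )
}

# Made-up IDs for illustration; note the duplicated "S1" in the EBD.
res <- id_overlap(ebd_ids = c("S1", "S1", "S2", "S9"),
                  sed_ids = c("S1", "S2", "S3", "S4"))
res$shared    # 2
res$ebd_only  # "S9"
res$sed_only  # 2
```

With real data this would be called as id_overlap(ebd$checklist_id, sed$checklist_id); any IDs in ebd_only are the checklists that make auk_zerofill() stop.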

@BrittanyHBrown

Hi @gking-aug just wondering if you ever found a solution for your problem?

I am having an almost identical issue and am having trouble troubleshooting it myself.

Did you end up needing to download the full EBD dataset? Or did you find a way to match up the custom-download EBD and sampling event files for zero-filling?

Thanks!

@gking-aug

Hi @BrittanyHBrown. This is a really good question -- the project was a directed reading and I haven't touched it in a while. Let me quickly investigate what I ended up doing and I will follow-up and post here.

@nikkiregimbal

nikkiregimbal commented Jul 10, 2024

Building off the initial question in this thread, I am also new to auk and getting the same error. In my case, I am trying to use auk_zerofill on multiple datasets independently. My code works for all except one dataset, even though, as far as I can tell, it's exactly the same. I have ensured that the months covered by the data are consistent and that all species are reported. Here is my code:

```r
# My code works for 2019 (in addition to 5 other years of data)
US2019sed <- "Acadian Flycatcher/US_2019/ebd_US_acafly_201905_201908_smp_relMay-2024_sampling.txt"
US2019check <- read_sampling(US2019sed)
US2019ebd <- "Acadian Flycatcher/US_2019/ebd_US_acafly_201905_201908_smp_relMay-2024.txt"
US2019obs <- read_sampling(US2019ebd)

US2019checksub <- subset(US2019check, all_species_reported == TRUE)
US2019obssub <- subset(US2019obs, all_species_reported == TRUE)

zfUS19 <- auk_zerofill(US2019obssub, US2019checksub, collapse = TRUE)

# When I replicate this for 2020 data, I get the error that some checklists from the EBD are missing sampling event data
US2020sed <- "Acadian Flycatcher/US_2020/ebd_US_acafly_202005_202008_smp_relMay-2024_sampling.txt"
US2020check <- read_sampling(US2020sed)
US2020ebd <- "Acadian Flycatcher/US_2020/ebd_US_acafly_202005_202008_smp_relMay-2024.txt"
US2020obs <- read_sampling(US2020ebd)

US2020checksub <- subset(US2020check, all_species_reported == TRUE)
US2020obssub <- subset(US2020obs, all_species_reported == TRUE)

zfUS20 <- auk_zerofill(US2020obssub, US2020checksub, collapse = TRUE)
```

If anyone has any ideas of what might be going on, I'd really appreciate some feedback! I tried re-downloading the 2020 dataset a couple times now in case there was something wrong with the download, but get the same error.

@mstrimas
Contributor

First, you should be using read_ebd() to read in the observation data, so these lines:

US2019obs <- read_sampling(US2019ebd)
US2020obs <- read_sampling(US2020ebd)

Should be changed to

US2019obs <- read_ebd(US2019ebd)
US2020obs <- read_ebd(US2020ebd)

If you're still having problems after making that change, please post the error and we can try to troubleshoot it. Thanks!

@nikkiregimbal

Thanks for the catch on the read_ebd @mstrimas. I updated that portion of my code and am still getting the same error.

US2020sed <- "Acadian Flycatcher/US_2020/ebd_US_acafly_202005_202008_smp_relMay-2024_sampling.txt"
US2020check <- read_sampling(US2020sed)

US2020ebd <- "Acadian Flycatcher/US_2020/ebd_US_acafly_202005_202008_smp_relMay-2024.txt"
US2020obs <- read_ebd(US2020ebd)

US2020checksub <- subset(US2020check, all_species_reported == TRUE)
US2020obssub <- subset(US2020obs, all_species_reported == TRUE)

zfUS20 <- auk_zerofill(US2020obssub, US2020checksub, collapse = TRUE)

Error in auk_zerofill.data.frame(US2020obssub, US2020checksub, collapse = TRUE) : 
  Some checklists in EBD are missing from sampling event data.

I am stumped because the same code is working on other datasets. Thanks!

@mstrimas
Contributor

This is a rare bug that I've described here: #79 (comment)

In your case, right before you call auk_zerofill(), add something like the following:

# keep only EBD rows whose checklist appears in the sampling event data
US2020obssub <- US2020obssub[US2020obssub$checklist_id %in% US2020checksub$checklist_id, ]
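The same fix can be wrapped in a small function that also reports which checklists are dropped, so the data loss doesn't go unnoticed. This is a hypothetical base-R helper (drop_unmatched_ebd() is not part of auk). It leaves the sampling event data untouched, so the non-detections needed for zero-filling are preserved.

```r
# Hypothetical helper (not part of auk): keep only EBD rows whose checklist_id
# appears in the sampling event data, and report what was removed.
drop_unmatched_ebd <- function(ebd, sed) {
  keep <- ebd$checklist_id %in% sed$checklist_id
  dropped <- unique(ebd$checklist_id[!keep])
  if (length(dropped) > 0) {
    message("Dropping ", sum(!keep), " EBD row(s) from ", length(dropped),
            " checklist(s) missing from the sampling event data: ",
            paste(dropped, collapse = ", "))
  }
  ebd[keep, , drop = FALSE]
}
```

For example, US2020obssub <- drop_unmatched_ebd(US2020obssub, US2020checksub) just before the auk_zerofill() call.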

Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants