I'm running into a duplication problem while trying to produce zero-filled data from the complete eBird dataset in order to create a smaller presence-absence dataset. Following both the "Best Practices for Using eBird Data" tutorial (https://ebird.github.io/ebird-best-practices/) and the "Introduction to auk" tutorial (https://cornelllabofornithology.github.io/auk/articles/auk.html#quick-start), the step that collapses the zero-filled data leaves each entry duplicated, most of them roughly 322 times.
For instance, here is the code I used while following the "Introduction to auk" tutorial:
library(auk)
library(dplyr)
library(ggplot2)
library(gridExtra)
library(lubridate)
library(readr)
library(sf)
states <- c("US-GA", "US-IL", "US-CO", "US-IN", "US-WI", "US-FL", "US-AZ", "US-NY", "US-MO", "US-WA", "US-DE")
input_file <- "/Volumes/UES_LAB/UWIN_acad_perf_analysis/ebird/ebd_US_relJun-2023.txt/ebd_US_relJun-2023.txt"
output_file <- "ebd-filtered-states.txt"
ebird_data <- input_file %>%
  auk_ebd() %>%
  auk_date(date = c("2011-01-01", "2012-12-31")) %>%
  auk_country(country = "United States") %>%
  auk_state(states) %>%
  auk_complete() %>%
  auk_filter(file = output_file) %>%
  read_ebd()
ebird_data %>%
  glimpse()
f_ebd <- output_file
f_smp <- output_file
filters <- auk_ebd(f_ebd, file_sampling = f_smp) %>%
  auk_state(states) %>%
  auk_complete()
filters
ebd_sed_filtered <- auk_filter(filters, file = "ebd_filteredPA.txt", file_sampling = "sampling_filteredPA.txt")
ebd_sed_filtered
read_ebd(ebd_sed_filtered)
# A tibble: 1,070 × 48
read_ebd(f_ebd)
# A tibble: 1,070 × 48
read_ebd(f_smp)
# A tibble: 1,070 × 48
Here the data shows 1,070 entries, and everything has worked as expected up to this point.
ebd_zf <- auk_zerofill(ebd_sed_filtered)
ebd_zf
Zero-filled EBD: 1,096 unique checklists, for 322 species.
ebd_zf_df <- collapse_zerofill(ebd_zf)
class(ebd_zf_df)
ebd_zf_df
# A tibble: 352,912 × 57
After collapse_zerofill(), each entry is duplicated roughly 322 times. The other tutorial, "Best Practices for Using eBird Data", behaves the same way: the entries duplicate after this code:
zerofill <- auk_zerofill(observations, checklists, collapse = TRUE)
It also results in the same total number of entries: 352,912. Code to remove the duplicates has been unsuccessful, for example:
unique.data.frame(zerofill)
unique.array(zerofill)
unique.matrix(zerofill)
("zerofill" is the name of the zero-filled dataset, these result in no change)
Has anyone run into this issue or know of a possible solution? Thanks!
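You seem to be using the same file for both the observation dataset and the checklists dataset. I also don't understand why you have this second round of filtering. The idea is to filter the observations and checklists at the same time, i.e.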
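A minimal sketch of that combined filtering, closely following the auk quick-start; the sampling event data file name and the output file names below are placeholders, and it assumes the sampling (checklists) file was downloaded alongside the EBD:

library(auk)
library(dplyr)

# placeholder paths: the EBD and the separately downloaded sampling event data (checklists) file
f_ebd <- "ebd_US_relJun-2023.txt"
f_smp <- "ebd_sampling_relJun-2023.txt"

states <- c("US-GA", "US-IL", "US-CO", "US-IN", "US-WI", "US-FL",
            "US-AZ", "US-NY", "US-MO", "US-WA", "US-DE")

# define the filters once, on the observations and checklists together
filters <- auk_ebd(f_ebd, file_sampling = f_smp) %>%
  auk_state(states) %>%
  auk_date(date = c("2011-01-01", "2012-12-31")) %>%
  auk_complete()

# run the filters, writing a filtered EBD and a matching filtered sampling file
ebd_filtered <- auk_filter(filters,
                           file = "ebd_filtered.txt",
                           file_sampling = "sampling_filtered.txt")

# zero-fill from the matched pair of filtered files
ebd_zf <- auk_zerofill(ebd_filtered, collapse = TRUE)

Note that collapsed zero-filled data has one row per species per checklist by design, so with hundreds of species the row count will always be far larger than the number of checklists (here, 1,096 checklists × 322 species = 352,912 rows).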