Files in this repository relate to scraping the data on the Ocean City Real-time Whale Buoy (RTWB) website.
This repository uses a GitHub Action to scrape the number of tracks per day from the “Automated detection data” table; the results are posted to a Google Sheet on the private TailWinds Google Drive every day at 09:00 UTC.
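In outline, the scrape reads the HTML table and appends a row to the sheet. The sketch below is illustrative only: the URL, table index, column names, and Sheet ID are placeholders, not the values used in `scrape_rtwb.R`.

```r
library(rvest)
library(googlesheets4)

# Placeholder values -- the real URL and Sheet ID live in scrape_rtwb.R
rtwb_url <- "https://example.org/rtwb"
sheet_id <- "YOUR-GOOGLE-SHEET-ID"

page   <- read_html(rtwb_url)
tables <- html_table(page)        # every <table> on the page as a tibble
auto_detections <- tables[[1]]    # assume the first one is "Automated detection data"

# Append today's track count as a new row in the Google Sheet
# (assumes one table row per track -- the real logic lives in scrape_rtwb.R)
sheet_append(sheet_id, data.frame(
  scrape_date = Sys.Date(),
  n_tracks    = nrow(auto_detections)
))
```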
Refer to the raw walkthrough document, or download the compiled version and open it in your web browser.
This process uses the code in `scrape_rtwb.R` and a secret access token. The process to set the access token is outlined in `google_and_github.html`.
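For orientation, non-interactive authentication inside the Action could look something like the sketch below; the secret name `GSHEET_SERVICE_KEY` and the use of a service-account key file are assumptions, and the authoritative setup is the one described in `google_and_github.html`.

```r
library(googlesheets4)

# Hypothetical secret name: the Action exposes the service-account JSON as an
# environment variable, writes it to a key file, and authenticates from it
# without opening a browser.
key_json <- Sys.getenv("GSHEET_SERVICE_KEY")
writeLines(key_json, "gs-key.json")
gs4_auth(path = "gs-key.json")
```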
Should the Google Sheet get deleted, run `scrape_rtwb_to_current.R` and replace the second date on line 27 with yesterday’s date to bring everything up to date.
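The date range being rebuilt has roughly the shape sketched below; the variable names and start date are illustrative only and may not match line 27 of `scrape_rtwb_to_current.R` exactly.

```r
# Illustrative only -- the second (end) date is the one to replace with yesterday's date
start_date    <- as.Date("2023-01-01")   # placeholder deployment start date
end_date      <- Sys.Date() - 1          # "yesterday"
rebuild_dates <- seq(start_date, end_date, by = "day")
```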
Another GitHub Action is used to pull in the table under “Data analyst review” on the RTWB website. The routine sources `daily_occurrence_scraper.R` and runs immediately after the one outlined above (09:00 UTC).
The general idea is:
- The two sheets named “Scraper - XXXX” are DELETED;
- Two NEW sheets are made, with their creation time stamps (in UTC) included in the sheet names;
- “Scraper - Full table xxxx” is the “Data analyst review” table from the main site, with the color-coded detection/possible detection/no detection information converted to text;
- “Scraper - Summary xxxx” is the sum of detections and possible detections per species, per month;
- The last row of “Scraper - Summary” contains the respective column sums (see the sketch below).
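As a rough illustration of those steps, the sketch below assumes a scraped data frame called `full_table` with a `date` column and one column per species already converted to text; the Sheet ID, detection labels, and sheet names are placeholders rather than the exact contents of `daily_occurrence_scraper.R`.

```r
library(googlesheets4)
library(dplyr)
library(tidyr)

stamp    <- format(Sys.time(), "%Y-%m-%d %H:%M UTC", tz = "UTC")  # time stamp for the new sheet names
sheet_id <- "YOUR-GOOGLE-SHEET-ID"                                # placeholder

# 1. Delete the previous pair of scraper sheets
old_sheets <- grep("^Scraper - ", sheet_names(sheet_id), value = TRUE)
for (s in old_sheets) sheet_delete(sheet_id, s)

# 2. Detections + possible detections per species, per month
summary_tbl <- full_table |>
  pivot_longer(cols = -date, names_to = "species", values_to = "status") |>
  mutate(month = format(date, "%Y-%m")) |>
  group_by(month, species) |>
  summarise(n = sum(status %in% c("Detected", "Possible detection")), .groups = "drop") |>
  pivot_wider(names_from = species, values_from = n, values_fill = 0)

# 3. Add the column sums as the last row of the summary
totals <- summary_tbl |>
  summarise(across(-month, sum)) |>
  mutate(month = "Total", .before = 1)
summary_tbl <- bind_rows(summary_tbl, totals)

# 4. Write the two new time-stamped sheets
sheet_write(full_table,  ss = sheet_id, sheet = paste("Scraper - Full table", stamp))
sheet_write(summary_tbl, ss = sheet_id, sheet = paste("Scraper - Summary", stamp))
```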