Files in this repository relate to scraping the data on the Ocean City Real-time Whale Buoy (RTWB) website.
This repository uses a GitHub Action to scrape the number of tracks per day from the “Automated detection data” table; the results are posted to a Google Sheet on the private TailWinds Google Drive every day at 09:00 UTC.
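In outline, the scrape reads the HTML table and appends a row to the sheet. The sketch below is illustrative only: the URL, table index, column names, and Sheet ID are placeholders, not the values used in `scrape_rtwb.R`.

```r
library(rvest)
library(googlesheets4)

# Placeholder values -- the real URL and Sheet ID live in scrape_rtwb.R
rtwb_url <- "https://example.org/rtwb"
sheet_id <- "YOUR-GOOGLE-SHEET-ID"

page   <- read_html(rtwb_url)
tables <- html_table(page)        # every <table> on the page as a tibble
auto_detections <- tables[[1]]    # assume the first one is "Automated detection data"

# Append today's track count as a new row in the Google Sheet
# (assumes one table row per track -- the real logic lives in scrape_rtwb.R)
sheet_append(sheet_id, data.frame(
  scrape_date = Sys.Date(),
  n_tracks    = nrow(auto_detections)
))
```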
Refer to the raw walkthrough document, or download the compiled version and open it in your web browser.
This process uses the code in `scrape_rtwb.R` and a secret access token. The process to set the access token is outlined in `google_and_github.html`.
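For orientation, non-interactive authentication inside the Action could look something like the sketch below; the secret name `GSHEET_SERVICE_KEY` and the use of a service-account key file are assumptions, and the authoritative setup is the one described in `google_and_github.html`.

```r
library(googlesheets4)

# Hypothetical secret name: the Action exposes the service-account JSON as an
# environment variable, writes it to a key file, and authenticates from it
# without opening a browser.
key_json <- Sys.getenv("GSHEET_SERVICE_KEY")
writeLines(key_json, "gs-key.json")
gs4_auth(path = "gs-key.json")
```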
Should the Google Sheet get deleted, run `scrape_rtwb_to_current.R` and replace the second date on line 27 with yesterday’s date to bring everything up to date.
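The date range being rebuilt has roughly the shape sketched below; the variable names and start date are illustrative only and may not match line 27 of `scrape_rtwb_to_current.R` exactly.

```r
# Illustrative only -- the second (end) date is the one to replace with yesterday's date
start_date    <- as.Date("2023-01-01")   # placeholder deployment start date
end_date      <- Sys.Date() - 1          # "yesterday"
rebuild_dates <- seq(start_date, end_date, by = "day")
```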
Another GitHub Action is used to pull in the table under “Data analyst review” on the RTWB website. The routine sources `daily_occurrence_scraper.R` and runs immediately after the one outlined above (09:00 UTC).
The general idea is:
- The two sheets named “Scraper - XXXX” are DELETED;
- Two NEW sheets are made, with their creation time stamps (in UTC) included in the sheet names;
- “Scraper - Full table xxxx” is the “Data analyst review” table from the main site, with the color-coded detection/possible detection/no detection information converted to text;
- “Scraper - Summary xxxx” is the sum of detections and possible detections per species, per month;
- The last row of “Scraper - Summary” contains the respective column sums (see the sketch below).
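As a rough illustration of those steps, the sketch below assumes a scraped data frame called `full_table` with a `date` column and one column per species already converted to text; the Sheet ID, detection labels, and sheet names are placeholders rather than the exact contents of `daily_occurrence_scraper.R`.

```r
library(googlesheets4)
library(dplyr)
library(tidyr)

stamp    <- format(Sys.time(), "%Y-%m-%d %H:%M UTC", tz = "UTC")  # time stamp for the new sheet names
sheet_id <- "YOUR-GOOGLE-SHEET-ID"                                # placeholder

# 1. Delete the previous pair of scraper sheets
old_sheets <- grep("^Scraper - ", sheet_names(sheet_id), value = TRUE)
for (s in old_sheets) sheet_delete(sheet_id, s)

# 2. Detections + possible detections per species, per month
summary_tbl <- full_table |>
  pivot_longer(cols = -date, names_to = "species", values_to = "status") |>
  mutate(month = format(date, "%Y-%m")) |>
  group_by(month, species) |>
  summarise(n = sum(status %in% c("Detected", "Possible detection")), .groups = "drop") |>
  pivot_wider(names_from = species, values_from = n, values_fill = 0)

# 3. Add the column sums as the last row of the summary
totals <- summary_tbl |>
  summarise(across(-month, sum)) |>
  mutate(month = "Total", .before = 1)
summary_tbl <- bind_rows(summary_tbl, totals)

# 4. Write the two new time-stamped sheets
sheet_write(full_table,  ss = sheet_id, sheet = paste("Scraper - Full table", stamp))
sheet_write(summary_tbl, ss = sheet_id, sheet = paste("Scraper - Summary", stamp))
```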