Cryo 157 - add python and R notebooks for accessing NOAA data #58

Closed
wants to merge 7 commits into from
251 changes: 251 additions & 0 deletions notebooks/NOAA_Access/Python_download_NOAA_NSIDC_data.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,251 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<center>\n",
"<img src='./img/nsidc_logo.png'/>\n",
"\n",
"# How to download NOAA@NSIDC data using Python\n",
"\n",
"</center>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Tutorial Overview \n",
"This notebook demonstrates how to download NOAA@NSIDC data using Python. It includes examples for downloading a single file and all the files in a directory.\n",
"\n",
"### Credits \n",
"This notebook was developed by Jennifer Roebuck of NSIDC.\n",
"\n",
"For questions regarding the notebook or to report problems, please create a new issue in the [NSIDC-Data-Tutorials repo](https://github.com/nsidc/NSIDC-Data-Tutorials/issues).\n",
"\n",
"### Learning Objectives\n",
"\n",
"By the end of this demonstration you will be able to:\n",
"\n",
"1. Download a single file from a NOAA@NSIDC data set\n",
"2. Download all the files in a directory on the NOAA@NSIDC HTTPS server \n",
"\n",
"### Prerequisites \n",
"\n",
"1. The `requests` and `bs4` libraries are already installed. \n",
"\n",
"### Time requirement \n",
"\n",
"Allow approximately 5 to 10 minutes to complete this tutorial."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Tutorial Steps \n",
"\n",
"### Import necessary libraries"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"#import the requests library to send HTTPS requests\n",
"import requests\n",
"#import BeautifulSoup (from the bs4 package) to parse the HTML directory listing\n",
"from bs4 import BeautifulSoup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Downloading a single file\n",
"This demonstrates how to download a single file.\n",
"\n",
"First we need to set the URL of the file we wish to download. The URL will follow the format of: `https://noaadata.apps.nsidc.org/NOAA/<path to data set and file>`\n",
"\n",
"where \\<path to data set and file\\> is specific to the data set and can be determined by exploring https://noaadata.apps.nsidc.org in a web browser. \n",
"\n",
"We will use the [Sea Ice Index (G02135)](https://nsidc.org/data/G02135) data set as an example, and download the CSV file containing daily sea ice extent values for the Arctic (N_seaice_extent_daily_v3.0.csv).\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"#URL of the file \n",
"file_url = \"https://noaadata.apps.nsidc.org/NOAA/G02135/north/daily/data/N_seaice_extent_daily_v3.0.csv\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next we need to create an HTTPS response object for that URL using the `get` method from the `requests` library. If the request fails or the server returns an error status, we stop and report the error."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"#Create an HTTPS response object; exit with the error message if the request fails\n",
"try:\n",
"    r = requests.get(file_url)\n",
"    #raise an exception if the server returned an error status (4xx or 5xx)\n",
"    r.raise_for_status()\n",
"except requests.exceptions.RequestException as err:\n",
"    raise SystemExit(err)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we need to set the filename that we want to save the file as, and write the downloaded content to disk. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"#Save the downloaded content to a local file\n",
"with open(\"N_seaice_extent_daily_v3.0.csv\", \"wb\") as f:\n",
"    f.write(r.content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Downloading all the files in a directory \n",
"This demonstrates downloading all of the files in a single directory.\n",
"\n",
"First we need to set the URL path of the directory we wish to download. It follows a similar format to the one described above for downloading a single file.\n",
"\n",
"Again we will use the [Sea Ice Index (G02135)](https://nsidc.org/data/G02135) data set as an example and download all the daily GeoTIFFs for October 1978. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"#Set the URL of the directory we wish to download all the files from\n",
"archive_url = \"https://noaadata.apps.nsidc.org/NOAA/G02135/north/daily/geotiff/1978/10_Oct/\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next we need to create an HTTPS response object for the URL, again using the `get` method from the `requests` library. \n",
"\n",
"Then we will use `BeautifulSoup` to parse all the filenames that are in the directory. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"#Create an HTTPS response object and stop if the server returned an error status\n",
"r = requests.get(archive_url)\n",
"r.raise_for_status()\n",
"\n",
"#Use BeautifulSoup to get a list of the files in the directory\n",
"data = BeautifulSoup(r.text, \"html.parser\")\n",
"data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we will create a URL for each of the files, set filenames for each of our downloaded files, and download the files. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"#Loop through the HTML links, skipping the first one, which is just a link to the previous directory\n",
"for link in data.find_all(\"a\")[1:]:\n",
"    filename = link[\"href\"]\n",
"    #Request each file using its full URL\n",
"    r = requests.get(archive_url + filename)\n",
"    print(r.status_code, filename) #print the status code and the name of the file\n",
"    r.raise_for_status() #stop if the download failed\n",
"    #Save the file under its original name\n",
"    with open(filename, \"wb\") as f:\n",
"        f.write(r.content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Learning outcomes recap\n",
"\n",
"We have learned how to:\n",
"1. Download a single file from a NOAA@NSIDC data set\n",
"2. Download all the files in a directory related to a NOAA@NSIDC data set. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
7 changes: 7 additions & 0 deletions notebooks/NOAA_Access/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
## Download single or multiple NOAA@NSIDC data files

### Summary
In this tutorial we demonstrate how to download NOAA@NSIDC data, whether it be one file or all the files in a directory. The tutorial is provided in two different languages: Python and R.

We use one NOAA@NSIDC data set as an example:
* [Sea Ice Index (G02135)](https://nsidc.org/data/G02135)
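
Both tutorials skip the first link on a directory page because it points to the parent directory. As a rough sketch of a slightly more defensive variant, the Python snippet below filters the links by their `href` value instead of by position. It uses only the standard library (`html.parser` and `urllib.parse`) rather than `BeautifulSoup`, and the names `LinkCollector` and `list_file_urls`, the sample HTML, and the file names `file_a.tif` and `file_b.tif` are hypothetical. It also assumes that files are listed as plain relative names, which may not hold for every directory on the server.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def list_file_urls(listing_html, directory_url):
    """Return full URLs for the files linked from a directory listing page."""
    # urljoin replaces the last path segment unless the base ends with a slash
    base = directory_url if directory_url.endswith("/") else directory_url + "/"
    collector = LinkCollector()
    collector.feed(listing_html)
    urls = []
    for href in collector.hrefs:
        # Assumption: skip parent-directory links, absolute paths, sort links
        # (query strings), and subdirectories; keep plain relative file names.
        if href.startswith(("..", "/", "?")) or href.endswith("/"):
            continue
        urls.append(urljoin(base, href))
    return urls


# Hypothetical listing; the real page layout and file names may differ
sample_html = """
<a href="../">../</a>
<a href="file_a.tif">file_a.tif</a>
<a href="file_b.tif">file_b.tif</a>
"""
sample_url = "https://noaadata.apps.nsidc.org/NOAA/G02135/north/daily/geotiff/1978/10_Oct/"

for url in list_file_urls(sample_html, sample_url):
    print(url)
```

In practice, `listing_html` would be the text of the response for the directory URL, and each returned URL could then be downloaded as shown in the notebooks.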
88 changes: 88 additions & 0 deletions notebooks/NOAA_Access/R_download_NOAA_NSIDC_data.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,88 @@


---
title: "How to download NOAA@NSIDC data using R"
output: html_notebook
---

## 1. Tutorial Overview
This notebook demonstrates how to download NOAA@NSIDC data using R. It includes examples for downloading a single file and all the files in a directory.

### Credits
This notebook was developed by Jennifer Roebuck of NSIDC.

For questions regarding the notebook or to report problems, please create a new issue in the [NSIDC-Data-Tutorials repo](https://github.com/nsidc/NSIDC-Data-Tutorials/issues).

### Learning Objectives

By the end of this demonstration you will be able to:

1. Download a single file from a NOAA@NSIDC data set
2. Download all the files in a directory on the NOAA@NSIDC HTTPS server

### Prerequisites

1. The `rvest` library is already installed.

### Time requirement

Allow approximately 5 to 10 minutes to complete this tutorial.

## 2. Tutorial steps

### Import necessary libraries
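We need to load the `rvest` library, which provides the functions used below to read and parse HTML. The `base` package is always attached, so it does not need to be loaded.
```{r}
library(rvest)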
```

### Set working directory
We need to set the directory we wish to download our files to. Edit the path below to point to the directory where you wish to store the downloaded data.
```{r}
setwd("/Users/jero7025/Documents/NOAA/HTTPS_switch")
```


### Downloading a single file
This demonstrates how to download a single file. We will use the Sea Ice Index (G02135) data set as an example.
```{r}
# set the url for the file you want to download
url <- "https://noaadata.apps.nsidc.org/NOAA/G02135/south/daily/geotiff/2023/05_May/S_20230501_concentration_v3.0.tif"

# set the name of the file you are downloading
destination <- "S_20230501_concentration_v3.0.tif"

# download the file
download.file(url, destination, mode = "wb")
```

### Downloading all the files in a directory
This demonstrates downloading all of the files in a single directory. Again we will use the Sea Ice Index (G02135) data set as an example.
```{r}
# set the URL for the directory you want to download files from
url <- "https://noaadata.apps.nsidc.org/NOAA/G02135/south/daily/geotiff/2023/05_May"

# read html content from url
page <- read_html(url)

# Get a list of the files listed in the directory at this url
files <- page %>% html_nodes("a") %>% html_attr("href")

# loop over the links, skipping the first one, which points to the parent directory
for (f in files[-1]) {
  # generate the url for each of the files
  u <- paste(url, f, sep = "/")
  # download each of the files
  download.file(u, f, mode = "wb")
}
```

## 3. Learning outcomes recap

We have learned how to:
1. Download a single file from a NOAA@NSIDC data set
2. Download all the files in a directory related to a NOAA@NSIDC data set.

