Here is a flowchart outlining what happens in your web scraping project, followed by an example of how to structure the README.md for your GitHub repository.

Flowchart:

1. **Initialize the R Markdown file**
   - Load required libraries.
2. **Scrape Jackson University data**
   - MBA, Marine Science, Health Informatics, and Fine Arts programs: for each, read the HTML content, extract the relevant nodes, and combine the text into a single string.
3. **Scrape Florida Atlantic University data**
   - Read the HTML content, extract the relevant nodes, and combine the text into a single string.
4. **Scrape University of North Florida data**
   - MBA, Health Informatics, CS/DS, and Fine Arts programs: the same read/extract/combine steps for each.
5. **Scrape Saint Leo University data**
   - Arts Psychology, CS, Health, and MBA programs: the same read/extract/combine steps for each.
6. **Scrape University of Tampa data**
   - Finance, IT, MBA, and Nursing programs: the same read/extract/combine steps for each.
7. **Combine the data**
   - Combine all scraped data into a tibble.
   - Save the combined data as a CSV file.
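Every scraping step in the flowchart follows the same three-step `rvest` pattern. A minimal sketch of that pattern (the function name is a placeholder, and the URL and CSS selector would be the real program page and node selector in your script):

```r
library(rvest)

# Hypothetical helper -- `url` and `selector` stand in for a real
# program page and its CSS selector.
scrape_program <- function(url, selector) {
  page  <- read_html(url)                  # read HTML content
  nodes <- html_nodes(page, selector)      # extract relevant nodes
  paste(html_text(nodes), collapse = " ")  # combine text into a single string
}
```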

Example README.md:

# Web Scraping University Programs

![Web Scraping](https://media.giphy.com/media/3oEjI6SIIHBdRxXI40/giphy.gif)

## Project Overview

This project is a web scraping script written in R, designed to extract and compile academic program information from multiple university websites. The data includes information from various departments such as business, marine science, health informatics, fine arts, and more.

### Libraries Used

- `rvest`
- `tidyverse`
- `data.table`
- `textdata`
- `tidytext`
- `tm`
- `SnowballC`
- `wordcloud`
- `RColorBrewer`
- `leaflet`
- `sf`
- and many more.

### Data Sources

- Jackson University
  - MBA Program
  - Marine Science Program
  - Health Informatics Program
  - Fine Arts Program
- Florida Atlantic University
- University of North Florida
  - MBA Program
  - Health Informatics Program
  - CS/DS Program
  - Fine Arts Program
- Saint Leo University
  - Arts Psychology Program
  - CS Program
  - Health Program
  - MBA Program
- University of Tampa
  - Finance Program
  - IT Program
  - MBA Program
  - Nursing Program

### Script Workflow

1. **Initialize**
   - Load necessary libraries.

2. **Scrape Data**
   - Extract relevant program information from multiple university websites.

3. **Combine Data**
   - Aggregate the scraped data into a single tibble.

4. **Save Data**
   - Write the combined data to a CSV file.
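The last two steps can be sketched as follows; the column names and rows here are illustrative, not the script's actual schema:

```r
library(tibble)
library(readr)

# Illustrative rows -- in the real script, each `text` value is the
# combined string scraped from one program page.
uni <- tibble(
  school  = c("University of Tampa", "Saint Leo University"),
  program = c("MBA", "CS"),
  text    = c("Scraped MBA program text ...", "Scraped CS program text ...")
)

# Write the combined data as CSV text (the script saves it as uni_data.txt)
write_csv(uni, "uni_data.txt")
```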

### File Structure

```
├── web_scraping.Rmd
├── uni_data.txt
├── README.md
└── images
    └── web_scraping.gif
```


### How to Run

1. Install the required R packages.
2. Clone the repository.
3. Run the `web_scraping.Rmd` file in RStudio or any R Markdown-compatible editor.
4. The output data will be saved as `uni_data.txt`.

### Example Output

```r
# Sample of the combined data
print(head(uni))
```

### Author

Dhruv Jain

Feel free to reach out with any questions or contributions!