
Began as duplicate of IRE17 Python 2 repo, potentially for use as template.


meli-lewis/web-scraping-for-journos


CAR18 Chicago

Introduction to Web Scraping

Adapted from Alex Richards' (@alexrichards) excellent IRE17 class.

He'll also be teaching a repeat web scraping session Sunday!

This session will cover:

  • How web scraping will make your life easier
  • How to do so responsibly
  • Using third-party Python packages
  • Fetching web pages with Python
  • Navigating the HTML in those pages to get data
  • Structuring scraped data and writing it to a CSV
  • And a couple of tips on shortcuts with HTML tables!
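As a rough preview of the last three steps (navigating HTML, structuring the data, writing a CSV), here is a minimal sketch. It is not the class's own code: it uses only the standard library so it runs without installing anything, whereas the session itself covers third-party packages. The sample HTML, the `TableParser` class, and the helper names are all made up for illustration, and the HTML string stands in for a page you would normally fetch over the network.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample page; in a real scraper this string would be
# the body of an HTTP response.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>City</th></tr>
  <tr><td>Ada</td><td>Chicago</td></tr>
  <tr><td>Grace</td><td>Denver</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_rows(html):
    """Return the table in `html` as a list of rows (lists of strings)."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialize rows to CSV text; swap the buffer for a file to save it."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = html_table_to_rows(SAMPLE_HTML)
print(rows_to_csv(rows))
```

To write a real file instead of a string, open one with `open("output.csv", "w", newline="")` and pass it to `csv.writer` in place of the buffer.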

Software requirements:

You should have Python 3 on your machine. To check that you have the correct version for the class, type the following in Bash (on macOS, you can access it through an application called Terminal):

which python3

This should return something like

/Library/Frameworks/Python.framework/Versions/3.5/bin/python3

If not, and you're in the CAR18 class, flag down the instructor or a TA. If you're not in the class, download Python 3.
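If the `which` command is not available in your shell, you can also ask Python itself which interpreter is running. This is a small sketch, not part of the class materials; the helper name `is_python3` is made up for illustration. Save it to a file and run it with the interpreter you plan to use.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the major version number is 3."""
    return version_info[0] == 3


# Path of the interpreter that is running this script.
print(sys.executable)
# Version string, for example "3.5.2" (yours will differ).
print(sys.version.split()[0])
print("Ready for class" if is_python3() else "This is not Python 3")
```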

If you already have Python 3, download this repository and run the command pip install -r requirements.txt from inside it to install the packages the class needs.

Have questions?

You can always:

Struggling with installation? Try this updated guide for Windows and OS X.

Resources:

Python

  • PyCAR for in-depth Python learning
  • Codecademy for Python syntax
  • Think Python, a popular introductory book whose digital edition is available free online

Scraping

The Internet
