Scrape URLs from multiple websites 2.0

This repository helps you extract all the URLs found on a website and save them in an Excel file. The code works through multiple weblinks provided in a CSV file, which saves you a lot of manual work. If you are scouting multiple websites to identify press releases, presentations, annual reports, etc., this code will come in handy and save many man-hours.
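As a rough outline of the approach, here is a minimal sketch, not the repository's exact code. It assumes an input file named websites.csv with a weblink column; the actual file and column names in this repository may differ.

```python
# Minimal sketch: read a CSV listing websites, fetch each page,
# collect every link on it, and write everything to one Excel sheet.
# The file names and the "weblink" column name are assumptions.
from urllib.parse import urljoin
from urllib.request import Request, urlopen

import pandas as pd
from bs4 import BeautifulSoup


def extract_urls(page_url):
    """Fetch one page and return the absolute URLs of all anchors on it."""
    req = Request(page_url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=30) as response:
        soup = BeautifulSoup(response.read(), "html.parser")
    # urljoin turns relative hrefs into absolute URLs
    return [urljoin(page_url, a["href"]) for a in soup.find_all("a", href=True)]


def main():
    websites = pd.read_csv("websites.csv")      # assumed input file
    rows = []
    for site in websites["weblink"]:            # assumed column name
        try:
            rows.extend({"source": site, "url": u} for u in extract_urls(site))
        except OSError as exc:                  # skip unreachable sites
            print(f"Skipping {site}: {exc}")
    # to_excel needs an engine such as openpyxl installed
    pd.DataFrame(rows).to_excel("extracted_urls.xlsx", index=False)


if __name__ == "__main__":
    main()
```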

Instructions

  • pip install -r requirements.txt
  • Run url_extract_2.0.py

Reference

I devised the solution from the documentation of the following packages:

  • urllib, the standard-library package that collects several modules for working with URLs
  • beautifulsoup4, to scrape information from web pages
  • feedparser, to parse RSS feeds in Python (see the short example after this list)
  • pandas, for data structuring
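Where a target site publishes its press releases as an RSS feed, feedparser offers a shortcut over HTML scraping. A hedged example with a placeholder feed URL:

```python
import feedparser

# Hypothetical feed URL, for illustration only.
feed = feedparser.parse("https://example.com/press-releases/rss")
for entry in feed.entries:
    # feed entries commonly carry a title and a link
    print(entry.title, entry.link)
```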