
reeeeemo/newscrape


newscrape

A Python-based news scraper that collects articles from Canadian news outlets and filters them by user-supplied keywords, matched against article titles and body text.

How to Use?

When you run the executable, a GUI window opens with some introductory text.

What are Keywords and Entrywords?

Keywords are words the scraper uses to filter article titles. Entrywords are words the scraper uses to filter the text inside the article.
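The distinction between the two word lists can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the function names `contains_any` and `article_matches` are hypothetical, and the sketch assumes that an article is kept only when its title matches at least one Keyword and its body matches at least one Entryword, that matching is a case-insensitive substring test, and that an empty list means "no filter". The real scraper may combine the lists differently.

```python
def contains_any(text: str, words: list[str]) -> bool:
    """Return True if any of the words occurs in text, ignoring case.

    Assumption: an empty word list means "no filter", so it matches everything.
    """
    if not words:
        return True
    lowered = text.casefold()
    return any(word.casefold() in lowered for word in words)


def article_matches(
    title: str, body: str, keywords: list[str], entrywords: list[str]
) -> bool:
    """Keywords are checked against the title, Entrywords against the body.

    Assumption: both checks must pass for the article to be kept.
    """
    return contains_any(title, keywords) and contains_any(body, entrywords)


if __name__ == "__main__":
    kept = article_matches(
        title="Housing prices rise in Toronto",
        body="The Bank of Canada held its key interest rate on Wednesday.",
        keywords=["HOUSING"],
        entrywords=["bank of canada"],
    )
    print(kept)
```

Under these assumptions, a Keyword such as "HOUSING" still matches a title containing "Housing", which mirrors the case-insensitive behaviour described for the entry box.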

To add/remove a Keyword/Entryword:

  • Type your word into the entry box (matching is case-insensitive)
  • Click the "Keyword" or "Entryword" radio button
  • Click "Add"
  • If removing, select the item in the list and click "Remove"

Once you have added all of your keywords and entrywords, click "Submit" to start the scraping process.

The output CSV is written to output/news_articles.csv.
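Once a run has finished, the results can be inspected with Python's standard `csv` module. The sketch below is a hedged example rather than part of the project: `load_articles` is a hypothetical helper, and it assumes the file is UTF-8 encoded and starts with a header row. The column names are not documented here, so the code reads whatever header the file contains instead of assuming specific fields.

```python
import csv
from pathlib import Path


def load_articles(csv_path: str = "output/news_articles.csv") -> list[dict[str, str]]:
    """Read the scraper's output CSV into a list of row dictionaries.

    Assumptions: UTF-8 encoding and a header row. Returns an empty list
    if the file does not exist yet (for example, before the first run).
    """
    path = Path(csv_path)
    if not path.exists():
        return []
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    articles = load_articles()
    print(f"Loaded {len(articles)} rows")
    if articles:
        # Show which columns the file actually contains.
        print("Columns:", list(articles[0].keys()))
```

Printing the column names first is a deliberate choice: it lets you see the real structure of the file before writing any code that depends on particular fields.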

Current list of news sites parsed

More will be added if there is demand

About

Web scraper for Canadian news that filters articles by keywords in their titles and body text.
