This Python-based project is a Hacker News scraper. It reads a list of URLs from a text file, scrapes those pages for news articles, and filters and sorts the articles by number of upvotes.

Cherukuri-Thanu/Web-Scraping-Hacker-News-Website

Web Scraping - Python

Description

This project is a custom scraper for the Hacker News website. It extracts news articles from multiple Hacker News pages, keeps only those with more than 99 upvotes, and sorts them by upvote count. The final output is a curated list of popular news items.

Features

  • Scrapes multiple Hacker News pages.
  • Filters articles with more than 99 upvotes.
  • Sorts articles based on upvote count.
  • Uses BeautifulSoup for HTML parsing.
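
The features above can be illustrated with a minimal sketch. This is not necessarily how main.py is written. The function name `extract_popular_stories`, the `threshold` parameter, and the dictionary keys are hypothetical. The CSS selectors assume Hacker News markup in which each story is a `tr.athing` row whose `id` matches a `span.score` element with id `score_<id>`; adjust them if the site's HTML differs.

```python
from bs4 import BeautifulSoup


def extract_popular_stories(html, threshold=99):
    """Return stories with more than `threshold` upvotes, highest first."""
    soup = BeautifulSoup(html, "html.parser")
    stories = []
    # Assumed markup: <tr class="athing" id="N"> holds the title link and
    # <span class="score" id="score_N"> holds text such as "120 points".
    for row in soup.select("tr.athing"):
        link = row.select_one("span.titleline > a")
        score = soup.find("span", id=f"score_{row.get('id')}")
        if link is None or score is None:
            continue  # rows without a score (e.g. job postings) are skipped
        votes = int(score.get_text().split()[0])
        if votes > threshold:
            stories.append(
                {"title": link.get_text(), "link": link.get("href"), "votes": votes}
            )
    # Sort by upvote count, most upvoted first.
    return sorted(stories, key=lambda story: story["votes"], reverse=True)
```

To cover several pages, the lists returned for each page could be concatenated and sorted once more at the end.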

How to Use

  1. Clone this repository.
  2. Install the required dependencies: requests and beautifulsoup4.
  3. Add the URLs of the Hacker News pages you want to scrape to URLs_list.txt.
  4. Run the script: python main.py.
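
The dependencies in step 2 can typically be installed with `pip install requests beautifulsoup4`. As a rough sketch of step 3, the snippet below reads the URL file assuming it holds one URL per line; that file format and the helper name `read_urls` are assumptions, not something the repository confirms.

```python
from pathlib import Path


def read_urls(path="URLs_list.txt"):
    """Return the non-empty lines of the URL file, stripped of whitespace."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Blank lines are ignored so stray newlines in the file do no harm.
    return [line.strip() for line in lines if line.strip()]
```

Each returned URL could then be fetched with `requests.get(url, timeout=10)` and the response text passed on to the HTML parser.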

Requirements

  • Python 3.x
  • requests
  • beautifulsoup4

Contact

Thanuja Cherukuri - [[email protected]]
