Web Scraping & Natural Language Processing of web articles

  • Extracted the text of each article from its given URL and saved it to a text file.

  • Analysed text using sentiment analysis and readability analysis.

  • Computed variables such as polarity score, subjectivity score and fog index for each of the articles.
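The extraction step can be illustrated with a minimal sketch. It assumes the article body sits in `<p>` elements and uses only the Python standard library; the names `ParagraphText`, `extract_text` and `save_article` are hypothetical and are not taken from this repository, which may use a different scraping library.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.request import urlopen


class ParagraphText(HTMLParser):
    """Collects the text found inside <p> elements (assumed article body)."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False  # True while inside a <p> element
        self.paragraphs = []       # one string per paragraph

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph:
            self.paragraphs[-1] += data


def extract_text(html):
    """Return the paragraph text of an HTML page, one paragraph per line."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return "\n".join(p.strip() for p in parser.paragraphs if p.strip())


def save_article(url, out_path):
    """Download one article and write its text to out_path."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    Path(out_path).write_text(extract_text(html), encoding="utf-8")
```

Calling `save_article(url, "article_1.txt")` once per URL would produce one text file per article. Real pages often need site-specific selectors to skip menus and footers, so this is a starting point only.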

Contents:

  1. Data Extraction

  2. Sentiment Analysis

    • Cleaning using Stop Words
    • Creating dictionary of Positive and Negative words
    • Extracting Derived variables
      • POSITIVE SCORE
      • NEGATIVE SCORE
      • POLARITY SCORE
      • SUBJECTIVITY SCORE
  3. Analysis of Readability

    • Extracting Derived variables
      • SYLLABLE PER WORD
      • PERCENTAGE OF COMPLEX WORDS
      • FOG INDEX
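The derived variables listed above could be computed along the following lines. This is a sketch under stated assumptions, not the repository's code: the formulas are commonly used definitions (polarity as the normalised difference of positive and negative counts, subjectivity as the share of sentiment words, the Gunning fog index as 0.4 times the sum of average sentence length and percentage of complex words), a complex word is assumed to be one with more than two syllables, and syllables are estimated by counting vowel groups. All function names are hypothetical.

```python
import re

EPSILON = 1e-6  # assumed small constant to avoid division by zero


def clean_words(text, stop_words):
    """Lowercase the text, split it into words and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stop_words]


def sentiment_scores(words, positive, negative):
    """Count dictionary hits and derive polarity and subjectivity."""
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    return {
        "POSITIVE SCORE": pos,
        "NEGATIVE SCORE": neg,
        "POLARITY SCORE": (pos - neg) / (pos + neg + EPSILON),
        "SUBJECTIVITY SCORE": (pos + neg) / (len(words) + EPSILON),
    }


def count_syllables(word):
    """Rough heuristic: number of vowel groups, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def readability_scores(text):
    """Compute syllables per word, percentage of complex words and fog index."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not words or not sentences:
        raise ValueError("text contains no words")
    syllables = [count_syllables(w) for w in words]
    complex_pct = 100 * sum(s > 2 for s in syllables) / len(words)
    avg_sentence_length = len(words) / len(sentences)
    return {
        "SYLLABLE PER WORD": sum(syllables) / len(words),
        "PERCENTAGE OF COMPLEX WORDS": complex_pct,
        "FOG INDEX": 0.4 * (avg_sentence_length + complex_pct),
    }


# Usage with tiny, made-up word lists (illustrative only).
text = "The product is good. The delivery was terrible and unacceptable."
words = clean_words(text, stop_words={"the", "is", "was", "and"})
sentiment = sentiment_scores(words, positive={"good"}, negative={"terrible", "unacceptable"})
readability = readability_scores(text)
```

Sentiment is scored on the cleaned words, while readability here uses the raw text so that sentence boundaries survive. Whether the project counts complex words as a fraction or a percentage inside the fog index is not stated in this README, so that choice is an assumption.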