A program to process WARC (Web ARChive) files and extract information from the HTML pages they contain. The program analyses the text to produce insights into public sentiment towards the government in several countries. The user supplies three arguments: the WARC file to process, a text file of words defined as 'positive', and a text file of words defined as 'negative'. The program outputs four lists: the first three contain floating-point statistics derived from the counts of positive and negative words, and the fourth lists the five most frequently occurring domain names in the file, each with its count.
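The description does not say how the three statistics are defined or how the HTML text is tokenised, so the following is only a minimal Python sketch of the described pipeline under stated assumptions, not the repository's implementation. It assumes an uncompressed WARC file read with a small hand-written parser (real archives are usually gzip-compressed and are better handled by a dedicated WARC library), treats only `response` records as pages, extracts visible text with the standard-library `HTMLParser`, and takes a page's "domain name" to be the host of its `WARC-Target-URI`. Word lists are assumed to hold one word per line. It reports raw per-page positive/negative counts instead of the unspecified float statistics. All function names (`iter_warc_responses`, `html_to_words`, `load_words`, `analyse_warc`) are hypothetical.

```python
import re
import sys
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


def iter_warc_responses(stream):
    """Yield (target_uri, http_body) for each 'response' record in an
    uncompressed WARC byte stream. Minimal parser: no gzip, no validation."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines separate records
        # `line` is the version line (e.g. b"WARC/1.0"); named headers follow.
        headers = {}
        while True:
            raw = stream.readline()
            if not raw.strip():
                break  # a blank line ends the WARC header block
            name, _, value = raw.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        block = stream.read(int(headers.get("content-length", "0")))
        if headers.get("warc-type") == "response":
            # The block is a raw HTTP response; the HTML follows its headers.
            _, _, body = block.partition(b"\r\n\r\n")
            yield headers.get("warc-target-uri", ""), body


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_words(html):
    """Lowercase the visible text and split it into alphabetic tokens."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z']+", " ".join(parser.parts).lower())


def load_words(path):
    """Read a word list, assumed to hold one word per line."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def analyse_warc(stream, positive, negative, top_n=5):
    """Return per-page (positive, negative) counts and the top_n hosts."""
    counts = []
    hosts = Counter()
    for uri, body in iter_warc_responses(stream):
        words = html_to_words(body.decode("utf-8", errors="replace"))
        counts.append((sum(w in positive for w in words),
                       sum(w in negative for w in words)))
        hosts[urlparse(uri).hostname or ""] += 1
    return counts, hosts.most_common(top_n)


if __name__ == "__main__" and len(sys.argv) == 4:
    # Same three arguments as the program: WARC file, positive, negative.
    warc_path, pos_path, neg_path = sys.argv[1:4]
    with open(warc_path, "rb") as f:
        counts, top = analyse_warc(f, load_words(pos_path), load_words(neg_path))
    print(counts)
    print(top)
```

The per-page counts are kept separate so that any ratios or averages can be computed from them afterwards; `Counter.most_common(5)` then gives the top-5 domain list with counts directly.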
davidika/Sentiment-analysis-of-web-pages