
dyonende/SDG


Text Mining for Sustainability: Detecting Corporate Greenwashing with The Sustainable Development Goals

This repository is part of the Master's thesis Text Mining for Sustainability: Detecting Corporate Greenwashing with The Sustainable Development Goals and hosts all the tools that were used in the research.

The following scripts were used for my thesis. They were run in the order listed below, as each requires the output of the previous step. The PDFs are not stored in this repository but can be found on the websites of the companies.

  1. pdf_extractor.py Extract paragraphs of text from PDF files

  2. gn_links.py Collect all article urls from a Google News page.

  3. article_scraper.py Scrape paragraphs of online news articles from a list of links

  4. filter_data.py Keep only texts that are at least 20 tokens long.

  5. aurora.py Implementation of the Aurora Universities Network SDG classifier. Requires queries.py to work. This implementation drops the windowing constraints of the original classifier.

  6. osdg.py Label text with the OSDG classifier. Requires the OSDG Docker container to be running.

  7. combine_columns.py Combines the aurora and OSDG output columns into one additional column.

  8. sentiment.py Add a column with a sentiment score from VADER.
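
As a rough illustration of steps 4 and 7 above, the sketch below filters texts by length and merges two sets of SDG labels. It is not the code from filter_data.py or combine_columns.py. The function names, the whitespace tokenization, the use of integer SDG numbers as labels, and the choice to combine labels as a union are all assumptions made for this example.

```python
def keep_long_texts(texts, min_tokens=20):
    """Keep only texts with at least `min_tokens` tokens.

    Assumption: tokens are whitespace-separated; the actual script
    may use a different tokenizer.
    """
    return [text for text in texts if len(text.split()) >= min_tokens]


def combine_labels(aurora_labels, osdg_labels):
    """Merge the labels of both classifiers into one sorted list.

    Assumption: the combined column holds the union of both label
    sets; the actual script may combine them differently.
    """
    return sorted(set(aurora_labels) | set(osdg_labels))


if __name__ == "__main__":
    short_text = "too short to keep"
    long_text = " ".join(["word"] * 20)  # exactly 20 tokens
    print(len(keep_long_texts([short_text, long_text])))
    print(combine_labels([3, 13], [13, 7]))
```

A union keeps every SDG that either classifier assigns. If only labels confirmed by both classifiers are wanted, an intersection (`set(a) & set(b)`) would be the alternative.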

This folder contains files with all the links to news articles that were used for the research.

This folder contains all the data that was used for the research

This folder contains the saved HTML search results from Google News that were used for the research
