Text Mining for Sustainability: Detecting Corporate Greenwashing with The Sustainable Development Goals

This repository is part of the Master's thesis "Text Mining for Sustainability: Detecting Corporate Greenwashing with The Sustainable Development Goals" and hosts all the tools that were used in the research.

scripts

The following scripts were used for my thesis. They were run in the order listed below, since each script requires the output of the previous step. The PDFs are not stored in this repository but can be found on the companies' websites.

pdf_extractor.py: Extract paragraphs of text from PDF files.
gn_links.py: Collect all article URLs from a Google News page.
article_scraper.py: Scrape paragraphs of online news articles from a list of links.
filter_data.py: Keep only texts that are at least 20 tokens long.
aurora.py: Implementation of the Aurora Universities Network SDG classifier. Requires queries.py to work. This implementation drops the windowing constraints of the original classifier.
osdg.py: Label text with the OSDG classifier. Requires the OSDG Docker container to be running.
combine_columns.py: Combine the Aurora and OSDG output columns into one additional column.
sentiment.py: Add a column with a sentiment score from VADER.
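
To illustrate two of the steps listed above (the 20-token filter and the combination of the two classifier columns), here is a minimal sketch. It is not the code from filter_data.py or combine_columns.py. The whitespace tokenization, the column names (text, aurora, osdg, combined), the ";" label separator, and the choice to combine labels as a de-duplicated union are all assumptions made for this example; the actual scripts may do each of these differently.

```python
MIN_TOKENS = 20  # threshold taken from the description of filter_data.py


def keep_long_texts(rows, text_column="text", min_tokens=MIN_TOKENS):
    """Keep rows whose text has at least `min_tokens` tokens.

    Assumption: tokens are whitespace-separated; the real script may
    use a different tokenizer.
    """
    return [row for row in rows if len(row[text_column].split()) >= min_tokens]


def combine_labels(row, first="aurora", second="osdg", target="combined"):
    """Return a copy of `row` with an extra column merging two label columns.

    Assumption: labels are stored as ';'-separated strings and are merged
    as a union, keeping the order of first appearance.
    """
    labels = []
    for column in (first, second):
        for label in row[column].split(";"):
            label = label.strip()
            if label and label not in labels:
                labels.append(label)
    combined = dict(row)
    combined[target] = ";".join(labels)
    return combined


# Hypothetical sample rows, invented for this example only.
rows = [
    {"text": " ".join(["word"] * 20), "aurora": "7;13", "osdg": "13"},
    {"text": "too short", "aurora": "", "osdg": "3"},
]

kept = keep_long_texts(rows)                  # drops the second row
result = [combine_labels(row) for row in kept]
print(result[0]["combined"])                  # 7;13
```

Each function returns new data rather than modifying its input, so the steps can be chained in the same order as the scripts.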

links

This folder contains files with all the links to news articles that were used for the research.

classifier evaluation

Annotation_Guidelines.pdf: The annotation guidelines that were used for the evaluation task.
corpus.csv: The questions from the evaluation with gold labels.

csv

This folder contains all the data that was used for the research.

html

This folder contains the saved HTML search results from Google News that were used for the research.

