This repository contains code to reproduce the results in the paper "Delayed takedown of illegal content on social media makes moderation ineffective". The model is an extension of SimSoM: A Simulator of Social Media.
The repository is organized as follows:

- `data`: contains raw & derived datasets
- `example`: contains a minimal example to start using the SimSoM model
- `experiments`: experiment configurations, results, supplementary data, and .ipynb notebooks to produce the figures reported in the paper
- `libs`: contains the extended SimSoM model package that can be imported into scripts
- `workflow`: scripts to run experiments
We include two ways to set up the environment and install the model:

1.1. Using Make: run `make` from the project directory (`SimSoM`).

1.2. Using Conda: we use `conda`, a package manager, to manage the development environment. Please make sure you have conda or mamba installed on your machine.

1.2.1. Create the environment with the required packages: run `conda env create -n simsom -f environment.yml`

1.2.2. Install the `SimSoM` module:
- Activate the virtual environment: `conda activate simsom`
- Run `pip install -e ./libs/`
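In short, the Conda route amounts to the following commands, run from the project root (a sketch collecting the steps above):

```bash
# 1.2.1. create the environment with the required packages
conda env create -n simsom -f environment.yml

# 1.2.2. activate it and install the extended SimSoM package in editable mode
conda activate simsom
pip install -e ./libs/
```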
2. Run the notebooks in `experiments/figures` to visualize the experiment results reported in the paper.
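One way to open them is with Jupyter (assuming Jupyter is installed in your environment; it may not be included in `environment.yml`):

```bash
# launch Jupyter and browse to the figure notebooks
jupyter notebook experiments/figures/
```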
3. The steps to reproduce the results from scratch, rather than using the provided results in `experiments/results`, are outlined below. Warning: following these steps will overwrite the content of `experiments/results`. All scripts are run from the project root directory, `simsom_removal`.
3.1. Run the experiments:

3.1.1. Unzip the data file: `unzip data/data.zip -d .`

3.1.2. Automatically run all experiments to reproduce the results in the paper with two commands (see the consolidated commands below):
- Make the shell script executable: `chmod +x workflow/rules/run_experiment.sh`
- Run the shell script: `workflow/rules/run_experiments.sh`
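Putting the data and experiment steps together (a sketch; it assumes the runner script is the `workflow/rules/run_experiment.sh` file referenced by the chmod step):

```bash
# 3.1.1. unzip the data into the project root
unzip data/data.zip -d .

# 3.1.2. make the experiment runner executable and launch it
chmod +x workflow/rules/run_experiment.sh
./workflow/rules/run_experiment.sh
```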
This script does two things:
- Creates configuration folders for all experiments (see `experiments/config` for the results of this step)
- Runs the `run_exps.py` script with an argument specifying the experiment to run:
  - `vary_tau`: main results
  - `vary_group_size`: robustness check for varying group sizes
  - `vary_illegal_probability`: robustness check for varying illegal probabilities
  - `vary_network_type`: robustness check for varying network structures
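To run a single experiment group by hand, the call looks roughly like the sketch below (the script location and the argument form are assumptions; check the shell script above for the exact interface):

```bash
# hypothetical invocation: run only the main experiment (vary_tau);
# the path to run_exps.py and the positional argument are assumed
python workflow/scripts/run_exps.py vary_tau
```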
3.2. We are interested in the prevalence of illegal content and in engagement metrics such as reach and impressions. To aggregate these metrics, we parse the experiments' verbose tracking files by running:
- For reach and impressions: `python workflow/scripts/read_data_engagement.py --result_path experiments/<experiment_name> --out_path experiments/results/<experiment_name>`
- For the prevalence of illegal content: `python read_data_illegal_count.py --result_path experiments/<experiment_name> --out_path experiments/results/<experiment_name>`
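For example, for the main experiment (`vary_tau`), the calls would look like this (the location of `read_data_illegal_count.py` is assumed to match the engagement parser's):

```bash
# aggregate reach and impressions for the vary_tau experiment
python workflow/scripts/read_data_engagement.py \
    --result_path experiments/vary_tau \
    --out_path experiments/results/vary_tau

# aggregate the prevalence of illegal content (script location assumed)
python workflow/scripts/read_data_illegal_count.py \
    --result_path experiments/vary_tau \
    --out_path experiments/results/vary_tau
```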
See point 2 above to visualize the newly created results.
The empirical network is created from the Replication Data for "Right and left, partisanship predicts vulnerability to misinformation", where:
- `measures.tab` contains user information, i.e., each user's partisanship and misinformation score.
- `anonymized-friends.json` is the adjacency list.

We reconstruct the empirical network from these two files, resulting in `data/follower_network.gml`. The steps are specified in the script that creates the empirical network.
Check out `example` to get started.
- Example of the simulation and results: `example/run_simulation.ipynb`
- SimSoM was written and tested with Python >= 3.6
- The results in the paper are averages across multiple simulation runs. To reproduce them, we suggest running the simulations in parallel, for example on a cluster, since they require a lot of memory and CPU time.
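As a rough illustration, on a single multi-core machine the four experiment groups could be launched concurrently with GNU parallel (a sketch only; it reuses the assumed `run_exps.py` interface from above and does not account for per-run memory limits):

```bash
# hypothetical: run the four experiment groups in parallel, one process each
parallel --jobs 4 python workflow/scripts/run_exps.py {} ::: \
    vary_tau vary_group_size vary_illegal_probability vary_network_type
```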