Introduction

This repository contains the project developed for my Bachelor's thesis in Economics and Finance at the University of Bologna, where I aimed to detect the emergence of collusion in Prisoner Dilemma games played by two reinforcement learning algorithms using a Random Forest Classifier. If you find this topic particularly interesting, you can read the entire thesis here.

Project overview

The project is based on research from this paper which states that algorithmic pricing softwares powered by Reinforcement Learning can autonomously learn to collude. The main question addressed in this project is:

When and how is it possible to detect the beginning of tacit collusive behaviour between reinforcement algorithmic powered agents?

Problem definition

In the Prisoner's Dilemma game, players make one move at a time, and with each move, a time series of scores is created, indicating the optimal next move. These scores vary around 0, and depending on whether they are positive or negative, the RL algorithm will undertake cooperation or defection. At some point, the scores take values that remain strictly positive or negative, indicating the beginning of collusion

To visualize this, an example is provided below:

Every line refers to a current state

red (CC) both cooperate -> collusive state
black (DD) both defect
blue (DC) one defect and the other cooperates
green (CD) opposite of the blue line And the score (dQ) indicates what the next action will be:
positve -> next action defect
negative -> next action cooperate

The pattern shown in this image is the same in all experiments, where the red line departs from zero sooner than all the others, then after some thousand iterations, the other 3 lines take strictly positive or negative values.

Collusion is defined as the moment when both agents cooperate and will continue to cooperate for the next moves. This moment is visually obvious, as the red line becomes negative forever. To give a "date" to this event, the exact iteration can be identified by zooming in on the image, as shown below:

For this experiment is iteration 159290

Featues engineering

After identifying the critical dates, when the previous iteration is non-negative and the following is negative, the switch date (the date on which the red CC line becomes strictly negative) is labeled for all training and testng data

Features are created by calculating moving averages on different periods for every critical date, for states CC, DD, and DC:

100 interations before and after ml100, mu100
500 interations before and after ml500, mu500
ml1000, mu1000
ml3000, mu3000
ml5000, mu5000
ml8000, mu8000
ml30000, mu30000
ml50000, mu50000

Random forest classifier

A Random Forest Classifier (RFC) was trained and tested on the data generated by the Prisoner's Dilemma game. The classifier was trained on 2000 sessions (1000 per agent) and was tested on the same number of unknown observations. The most relevant features for the classifier were found to be the moving averages of the CC red line, as well as the longest averages of the DC blue line. The variable importance plot of this classifier is shown below:

The results of testing the RFC on the unseen data showed a high level of accuracy, with an Area Under the Curve (AUC) of 0.9985019 on the Receiver Operating Characteristic (ROC) curve, as shown below:

Conclusions

This project demonstrates that it is possible to detect the emergence of collusive behavior between Reinforcement Learning powered agents playing the repeated game of Prisoner's Dilemma with a high level of accuracy. This was achieved using a Machine Learning algorithm, a Random Forest Classifier. As pointed out by Calvano et al. “Today, the prevalent approach to tacit collusion is relatively lenient, in part because tacit collusion among human decision-makers is difficult to detect." This project provides a method for detecting collusion in the context of algorithmic agents and has the potential to inform regulatory approaches to collusion in digital markets.

Future work

This project is a proof-of-concept and future research could expand on this work by testing the classifier on different types of games or in more realistic settings. Additionally, further research could explore different machine learning algorithms or feature engineering techniques to improve the detection of collusion.

Note

The code and data used in this project can be found in the src and data folders of this repository. If you find this topic particularly interesting, you can read all my thesis here.

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
data		data
images		images
src		src
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Introduction

Project overview

Problem definition

Featues engineering

Random forest classifier

Conclusions

Future work

Note

About

Releases

Packages

Languages

manuelrech/Detecting-Collusive-Behavior-in-RL-Powered-Agents-Playing-the-Prisoner-s-Dilemma

Folders and files

Latest commit

History

Repository files navigation

Introduction

Project overview

Problem definition

Featues engineering

Random forest classifier

Conclusions

Future work

Note

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages