Skip to content

Detecting using Random Forest the emergence of collusion in Prisoner Dilemma games played by two reinforcement learning algorithms

Notifications You must be signed in to change notification settings

manuelrech/Detecting-Collusive-Behavior-in-RL-Powered-Agents-Playing-the-Prisoner-s-Dilemma

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

27 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Introduction

This repository contains the project developed for my Bachelor's thesis in Economics and Finance at the University of Bologna, where I aimed to detect the emergence of collusion in Prisoner Dilemma games played by two reinforcement learning algorithms using a Random Forest Classifier. If you find this topic particularly interesting, you can read the entire thesis here.

Project overview

The project is based on research from this paper which states that algorithmic pricing softwares powered by Reinforcement Learning can autonomously learn to collude. The main question addressed in this project is:

When and how is it possible to detect the beginning of tacit collusive behaviour between reinforcement algorithmic powered agents?

Problem definition

In the Prisoner's Dilemma game, players make one move at a time, and with each move, a time series of scores is created, indicating the optimal next move. These scores vary around 0, and depending on whether they are positive or negative, the RL algorithm will undertake cooperation or defection. At some point, the scores take values that remain strictly positive or negative, indicating the beginning of collusion

To visualize this, an example is provided below:

alt text

Every line refers to a current state

  • red (CC) both cooperate -> collusive state
  • black (DD) both defect
  • blue (DC) one defect and the other cooperates
  • green (CD) opposite of the blue line And the score (dQ) indicates what the next action will be:
  • positve -> next action defect
  • negative -> next action cooperate

The pattern shown in this image is the same in all experiments, where the red line departs from zero sooner than all the others, then after some thousand iterations, the other 3 lines take strictly positive or negative values.

Collusion is defined as the moment when both agents cooperate and will continue to cooperate for the next moves. This moment is visually obvious, as the red line becomes negative forever. To give a "date" to this event, the exact iteration can be identified by zooming in on the image, as shown below:

alt text

For this experiment is iteration 159290

Featues engineering

After identifying the critical dates, when the previous iteration is non-negative and the following is negative, the switch date (the date on which the red CC line becomes strictly negative) is labeled for all training and testng data

Features are created by calculating moving averages on different periods for every critical date, for states CC, DD, and DC:

  • 100 interations before and after ml100, mu100
  • 500 interations before and after ml500, mu500
  • ml1000, mu1000
  • ml3000, mu3000
  • ml5000, mu5000
  • ml8000, mu8000
  • ml30000, mu30000
  • ml50000, mu50000

Random forest classifier

A Random Forest Classifier (RFC) was trained and tested on the data generated by the Prisoner's Dilemma game. The classifier was trained on 2000 sessions (1000 per agent) and was tested on the same number of unknown observations. The most relevant features for the classifier were found to be the moving averages of the CC red line, as well as the longest averages of the DC blue line. The variable importance plot of this classifier is shown below:

alt text

The results of testing the RFC on the unseen data showed a high level of accuracy, with an Area Under the Curve (AUC) of 0.9985019 on the Receiver Operating Characteristic (ROC) curve, as shown below:

alt text

Conclusions

This project demonstrates that it is possible to detect the emergence of collusive behavior between Reinforcement Learning powered agents playing the repeated game of Prisoner's Dilemma with a high level of accuracy. This was achieved using a Machine Learning algorithm, a Random Forest Classifier. As pointed out by Calvano et al. “Today, the prevalent approach to tacit collusion is relatively lenient, in part because tacit collusion among human decision-makers is difficult to detect." This project provides a method for detecting collusion in the context of algorithmic agents and has the potential to inform regulatory approaches to collusion in digital markets.

Future work

This project is a proof-of-concept and future research could expand on this work by testing the classifier on different types of games or in more realistic settings. Additionally, further research could explore different machine learning algorithms or feature engineering techniques to improve the detection of collusion.

Note

The code and data used in this project can be found in the src and data folders of this repository. If you find this topic particularly interesting, you can read all my thesis here.

About

Detecting using Random Forest the emergence of collusion in Prisoner Dilemma games played by two reinforcement learning algorithms

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages