Play the prisoner's dilemma game
I wrote this in my first year as a simple exercise in agent-based modelling, and also to help me understand the special features of iterative, or iterated, prisoner's dilemmas. In a 'one-shot' prisoner's dilemma game, the dominant strategy is always to defect, or confess. This changes, however, if you're playing several times against the same opponent: they then have the opportunity to punish you if you cheat, reducing or eliminating the incentive to defect. It's only as you near the end of the game that cheating becomes more worthwhile, as your opponent has less opportunity to punish you. I'd been reading The Evolution of Cooperation by Robert Axelrod, in which he ran a competition to design computer programs to play in an iterated prisoner's dilemma tournament, demonstrating that when games are repeated, the best strategy is to co-operate, but to punish your opponent if they cheat. He goes on to show how this can explain why co-operation evolved rather than everyone acting selfishly.
I wanted to see what would happen if the 'agents' from Axelrod's competition played against humans. The format of the game is quite simple. There are a number of 'players', each with a different strategy. For example, one is the same as the 'tit-for-tat' agent which won Axelrod's competition. There are a number of others based on examples in the book, plus one that I designed myself (titfortatendgame) which tries to improve on tit-for-tat's performance. The tournament is in a round-robin format, where each player plays every other player once. Each game comprises approximately 12 rounds; the exact number varies slightly at random. This random element is there to eliminate end-game effects, where a player waits until the last round before defecting. The 'titfortatendgame' player tries to get around this by employing a similar random element to decide when to start defecting. Because the number of rounds varies from game to game, each player's score is based on the average score per round.
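The strategies above can be sketched in Python. This is a minimal illustration, not the original code: tit-for-tat follows its well-known rule, while `make_titfortatendgame` and `game_length` are my guesses at the random end-game mechanics described in the text (the exact cut-off rule and round-count distribution aren't stated).

```python
import random

def tit_for_tat(their_history):
    """Axelrod's winning strategy: co-operate on the first round,
    then copy whatever the opponent did last round."""
    return their_history[-1] if their_history else 'C'

def make_titfortatendgame():
    """Tit-for-tat that starts defecting from a randomly chosen round
    near the expected end of the game. The cut-off range here is a
    hypothetical stand-in for the original's random element."""
    cutoff = random.randint(10, 14)  # assumed: somewhere 'around' round 12
    def strategy(their_history):
        round_number = len(their_history) + 1
        if round_number >= cutoff:
            return 'D'
        return tit_for_tat(their_history)
    return strategy

def game_length(expected=12, spread=2):
    """Randomise the number of rounds around the expected length,
    so no player can know for certain which round is the last."""
    return expected + random.randint(-spread, spread)
```

Returning a fresh closure from `make_titfortatendgame` means each game gets its own random cut-off, so opponents can't learn a fixed defection round across games.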
Defecting can be considered equivalent to confessing in the classic prisoner's dilemma scenario, or reneging in a cartel.
The scores are allocated as follows:
- If both players defect, each receives the Punishment for mutual defection of 1 point.
- If both players co-operate, each receives the Reward for mutual co-operation of 3 points.
- If one player defects and the other co-operates, one receives the Temptation to defect of 5 points whereas the other receives nothing: the Sucker's payoff.
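The scoring rules above, together with the per-round averaging, can be expressed as a small lookup table and scoring function (a sketch; the function and variable names are my own):

```python
# Payoff matrix from the rules above: moves are 'C' (co-operate) or 'D' (defect).
PAYOFFS = {
    ('D', 'D'): (1, 1),  # Punishment for mutual defection
    ('C', 'C'): (3, 3),  # Reward for mutual co-operation
    ('D', 'C'): (5, 0),  # Temptation to defect / Sucker's payoff
    ('C', 'D'): (0, 5),  # Sucker's payoff / Temptation to defect
}

def score_game(moves_a, moves_b):
    """Return each player's average score per round, since games
    vary in length and raw totals wouldn't be comparable."""
    total_a = total_b = 0
    for a, b in zip(moves_a, moves_b):
        pa, pb = PAYOFFS[(a, b)]
        total_a += pa
        total_b += pb
    rounds = len(moves_a)
    return total_a / rounds, total_b / rounds
```

For example, a game of C,D against C,C scores 3+5=8 for the first player over 2 rounds, an average of 4.0, against 3+0=3, an average of 1.5, for the second.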
The average score for the game is shown next to each player's history of moves. The moves are colour-coded according to whether the player defected or co-operated.
The agents in the game have access to exactly the same information as the human playing. This means they only know that the game will be approximately 12 rounds, and they cannot see the other player's move before they make their own.
After each game you play, you will be shown who your opponent was, and the agents will then play against all of the other players. Of course, being computer programs, they can do this virtually instantly, so all of the games will be displayed at once. The league table of scores will then be updated and displayed.
Tips: Your aim is not to beat your opponent, but to achieve a higher average score than the other players. Often this will mean having an equal score to your opponent. The human has a big advantage over the agents, as they never vary their strategies: you can start to recognise who you are playing against and adjust accordingly. For example, if you're playing against random, there is no point in trying a rational strategy like tit-for-tat. The best thing is to just defect every time; you won't get punished for it. But if you're playing tit-for-tat, defecting early will lead to you being punished. You may end up scoring more than that player, but other players will score higher by co-operating.