2.18. Lecture 17: Evolution of cooperation

Before this class you should:

Read Think Complexity, Chapter 12

Before next class you should:

Make sure you have a copy of The Alignment Problem

Note taker: Alec McBurney

Today’s class was an introduction to the evolution of cooperation. These notes will also discuss the problem of altruism.

2.18.1. The Evolution of Trust Simulation

We started the class with another simulation by Nicky Case, The Evolution of Trust, which models a simple two-player game.

Rules:

Two players.
A coin-trading machine sits between the two players.
Players cannot communicate.
Player options: cooperate or cheat.
- Cooperate: Insert a coin (-1 coin).
- Cheat: Don’t insert a coin (-0 coins).
If a player cooperates then the other player gets 3 coins (+3 coins).
If a player cheats then the other player receives no coins (+0 coins).

Goal:

Have more coins than the other player by the end.

Best moves for 1 turn:

Other Player’s Move	Your Best Move	Other Player’s Net	Your Net
Cheat (-0 coins)	Cheat (-0 coins)	+0 coins	+0 coins
Cooperate (-1 coin)	Cheat (-0 coins)	-1 coin	+3 coins

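Cheating is the best choice in both scenarios. If the other player also cheats, you both end the round with the same number of coins; if the other player cooperates, you end the round 4 coins ahead of them (+3 for you, -1 for them). In either case you also end up with 1 more coin than you would have had by cooperating.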
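The one-round reasoning can be checked with a short sketch. This is an illustrative Python snippet, not code from the lecture or the simulation; the names `payoff`, `COOPERATE`, and `CHEAT` are made up for this example, and the coin values are taken from the rules listed earlier.

```python
# Hypothetical names; coin values come from the rules of the game:
# cooperating costs the mover 1 coin and gives the other player 3 coins.
COOPERATE, CHEAT = "cooperate", "cheat"
COST, GIFT = 1, 3

def payoff(my_move, their_move):
    """Return (my_net, their_net) in coins for a single round."""
    mine = (GIFT if their_move == COOPERATE else 0) - (COST if my_move == COOPERATE else 0)
    theirs = (GIFT if my_move == COOPERATE else 0) - (COST if their_move == COOPERATE else 0)
    return mine, theirs

# Print every combination so the two options can be compared side by side.
for their_move in (CHEAT, COOPERATE):
    for my_move in (CHEAT, COOPERATE):
        print(f"they {their_move:9} / I {my_move:9} ->", payoff(my_move, their_move))
```

Whatever the other player does, the first number (your own net) is one coin higher when you cheat, which is why cheating is the best move in a single round.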

Funnily enough, this leads to “the problem of nice” discussed in Think Complexity, Section 12.2. If the game had only one round, and every player followed this reasoning, they would all choose to cheat. But during lecture, that’s not what happened: most of us chose to cooperate. This seems to come down to plain altruism, where people accept a worse outcome for themselves in exchange for a better outcome for the other person.

2.18.1.1. Play Styles

Several play styles become possible once the game is played over multiple rounds:

Copycat: In this simulation, the Copycat begins by cooperating and then copies the other player’s previous move. This is also called “tit-for-tat”.
Always Cheat: This strategy is to only ever cheat. This reflects a natural selection mindset of every person for themselves and no cooperation.
Always Cooperate: This strategy involves only cooperating, focusing on mutual benefit and altruism.
Grudger: The grudger trusts the other player and cooperates but, once the other player cheats, the grudger will also cheat for the rest of the game.
Detective: The detective tries to work out the opponent’s play style. Its first four moves are: cooperate, cheat, cooperate, cooperate. If the opponent cheats back, the detective then plays like a Copycat; if not, it plays like an Always Cheat.

In a tournament between these 5 play styles with 10 rounds per match, the Copycat ends up with the highest total score.
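The tournament can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the simulation’s own code: one player per play style, every pair meets once, nobody plays against itself, and no mistakes occur. The function names are made up for this example, and the Detective opens with cooperate, cheat, cooperate, cooperate as in Nicky Case’s game.

```python
from itertools import combinations

C, D = "cooperate", "cheat"

def payoff(mine, theirs):
    """Coins gained by a player who moved `mine` against `theirs`."""
    return (3 if theirs == C else 0) - (1 if mine == C else 0)

# Each strategy sees its own history and the opponent's history.
def copycat(mine, theirs):
    return theirs[-1] if theirs else C

def always_cheat(mine, theirs):
    return D

def always_cooperate(mine, theirs):
    return C

def grudger(mine, theirs):
    return D if D in theirs else C

def detective(mine, theirs):
    opening = [C, D, C, C]
    if len(mine) < 4:
        return opening[len(mine)]
    # Assumption: "cheats back" means any cheat during the four opening rounds.
    return theirs[-1] if D in theirs[:4] else D

def play_match(p1, p2, rounds=10):
    h1, h2, s1, s2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = p1(h1, h2), p2(h2, h1)
        s1 += payoff(m1, m2)
        s2 += payoff(m2, m1)
        h1.append(m1)
        h2.append(m2)
    return s1, s2

def round_robin(players, rounds=10):
    totals = {p.__name__: 0 for p in players}
    for p1, p2 in combinations(players, 2):
        s1, s2 = play_match(p1, p2, rounds)
        totals[p1.__name__] += s1
        totals[p2.__name__] += s2
    return totals

totals = round_robin([copycat, always_cheat, always_cooperate, grudger, detective])
for name, score in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{name:17} {score}")
```

Under these assumptions the Copycat finishes first with 57 coins, ahead of the Grudger (46), the Detective and Always Cheat (45 each), and Always Cooperate (29).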

The same outcome happens with a second, repeated tournament format. Here tournaments are run one after another until only one play style remains. At the end of each tournament, the bottom 5 players are removed and replaced by copies of the top 5. Always Cheat comes close to winning this format, but Copycats cooperating with each other earn much larger scores by the end. The result flips, with Always Cheat winning, if the number of rounds per match is low enough.
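A rough sketch of the repeated tournament is given here. The notes do not record the starting population, so the mix of 15 Always Cooperates, 5 Always Cheats, and 5 Copycats is an assumption made for illustration, as are all function names. Ties are broken by list order, which may differ from the simulation.

```python
from collections import Counter

C, D = "cooperate", "cheat"

# Each style only needs the opponent's history in this reduced example.
STYLES = {
    "copycat": lambda theirs: theirs[-1] if theirs else C,
    "always_cheat": lambda theirs: D,
    "always_cooperate": lambda theirs: C,
}

def match(a, b, rounds):
    """Play one match between styles named `a` and `b`; return both scores."""
    ha, hb, sa, sb = [], [], 0, 0
    for _ in range(rounds):
        ma, mb = STYLES[a](hb), STYLES[b](ha)
        sa += (3 if mb == C else 0) - (1 if ma == C else 0)
        sb += (3 if ma == C else 0) - (1 if mb == C else 0)
        ha.append(ma)
        hb.append(mb)
    return sa, sb

def generation(pop, rounds, swap=5):
    """Everyone plays everyone once; the bottom `swap` are replaced by copies of the top `swap`."""
    scores = [0] * len(pop)
    for i in range(len(pop)):
        for j in range(i + 1, len(pop)):
            si, sj = match(pop[i], pop[j], rounds)
            scores[i] += si
            scores[j] += sj
    ranked = sorted(range(len(pop)), key=lambda i: scores[i])  # worst first
    survivors = [pop[i] for i in ranked[swap:]]
    return survivors + survivors[-swap:]

def evolve(rounds, generations=20):
    # Hypothetical starting mix; not taken from the lecture.
    pop = ["always_cooperate"] * 15 + ["always_cheat"] * 5 + ["copycat"] * 5
    for _ in range(generations):
        pop = generation(pop, rounds)
    return Counter(pop)

print("10 rounds per match:", dict(evolve(rounds=10)))
print(" 3 rounds per match:", dict(evolve(rounds=3)))
```

With this starting mix, 10 rounds per match ends with only Copycats, while 3 rounds per match ends with only Always Cheats. In both runs Always Cheat first grows by exploiting the Always Cooperates; what changes is whether the Copycats earn enough from each other to overtake it afterwards.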

2.18.1.2. Mistakes

An issue with the Copycat play style is that if two Copycats play each other and one accidentally cheats, they get stuck in an endless back and forth of cheating and cooperating.
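A small sketch makes the echo visible. The helper `copycat_vs_copycat` and its `mistake_round` parameter are made up for this illustration; the mistake is modelled as player A’s move being forced to a cheat in one round.

```python
C, D = "cooperate", "cheat"

def copycat(theirs):
    """Cooperate first, then copy the opponent's previous move."""
    return theirs[-1] if theirs else C

def copycat_vs_copycat(rounds, mistake_round):
    """Two Copycats play; player A accidentally cheats in `mistake_round` (0-based)."""
    a_hist, b_hist = [], []
    for r in range(rounds):
        a, b = copycat(b_hist), copycat(a_hist)
        if r == mistake_round:
            a = D  # the accidental cheat
        a_hist.append(a)
        b_hist.append(b)
    return a_hist, b_hist

a, b = copycat_vs_copycat(rounds=8, mistake_round=2)
for r, (ma, mb) in enumerate(zip(a, b)):
    print(r, ma, mb)
```

After the mistake in round 2, the two players take turns cheating and never get back to both cooperating in the same round.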

Some other play styles were found that were capable of dealing with this problem:

Copykitten: Similar to the Copycat, but only cheats back after being cheated twice in a row.
Simpleton: Starts by cooperating. If the other player cooperates, the Simpleton repeats its own last move; if the other player cheats, it does the opposite of its own last move.
Random: Exactly what it sounds like: it cheats or cooperates with a 50/50 chance, with no thought behind it.
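To see why the Copykitten copes with a single mistake, here is a minimal sketch of two Copykittens playing each other. The helper names are hypothetical, and the mistake is again modelled as one forced cheat by player A.

```python
C, D = "cooperate", "cheat"

def copykitten(theirs):
    """Cheat only if the other player cheated in both of the last two rounds."""
    return D if theirs[-2:] == [D, D] else C

def play_with_mistake(strategy, rounds, mistake_round):
    """Both players use `strategy`; player A accidentally cheats once."""
    a_hist, b_hist = [], []
    for r in range(rounds):
        a, b = strategy(b_hist), strategy(a_hist)
        if r == mistake_round:
            a = D  # the accidental cheat
        a_hist.append(a)
        b_hist.append(b)
    return a_hist, b_hist

a, b = play_with_mistake(copykitten, rounds=6, mistake_round=2)
print(a)
print(b)
```

Player B never sees two cheats in a row, so it keeps cooperating, and both players are back to mutual cooperation in the very next round.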

In a tournament like the earlier ones, but with occasional mistakes and a population made up mostly of Always Cooperates along with Copycats, Copykittens, Simpletons, and Randoms, the Simpleton actually comes out on top because it takes advantage of the Always Cooperates. If the Simpleton cheats by mistake, it never switches back to cooperating: the Always Cooperate keeps cooperating, so the Simpleton keeps repeating its last move.
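The exploit can be traced with a short sketch. The function names and the single forced mistake are assumptions for illustration; the coin values are the ones from the rules of the game.

```python
C, D = "cooperate", "cheat"

def simpleton(mine, theirs):
    """Start with cooperate; repeat own last move if they cooperated, else flip it."""
    if not mine:
        return C
    if theirs[-1] == C:
        return mine[-1]
    return D if mine[-1] == C else C

def simpleton_vs_always_cooperate(rounds, mistake_round):
    """Return (simpleton_moves, simpleton_score, cooperator_score)."""
    s_hist, c_hist, s_score, c_score = [], [], 0, 0
    for r in range(rounds):
        s = simpleton(s_hist, c_hist)
        c = C  # Always Cooperate
        if r == mistake_round:
            s = D  # the Simpleton's accidental cheat
        s_score += (3 if c == C else 0) - (1 if s == C else 0)
        c_score += (3 if s == C else 0) - (1 if c == C else 0)
        s_hist.append(s)
        c_hist.append(c)
    return s_hist, s_score, c_score

moves, s_score, c_score = simpleton_vs_always_cooperate(rounds=10, mistake_round=2)
print(moves)
print("Simpleton:", s_score, "Always Cooperate:", c_score)
```

After the mistake in round 2 the Simpleton cheats in every remaining round, finishing with 28 coins while the Always Cooperate finishes with -4.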

If the Always Cooperates are swapped out for Always Cheats, however, the Copykitten comes out on top, even though it never fully wipes out the Copycat.

2.18.1.3. Conclusions

When players know they will interact again in the future, trust can develop, which allows a relationship to flourish.
The game requires a non-zero-sum outcome for trust to develop, since trust depends on both players having something at stake and believing the other will reliably choose to cooperate.
Miscommunication can very easily break trust, but being forgiving can make up for a small amount of miscommunication.

2.18.2. Robert Axelrod and the Prisoner’s Dilemma

In the late 1970s Robert Axelrod, a political scientist at the University of Michigan, set up a tournament in which participants submitted computer programs representing strategies for playing the iterated Prisoner’s Dilemma.

The Prisoner’s Dilemma is very similar to one round of The Evolution of Trust simulation that was previously discussed. The game goes like this:

Two prisoners are placed in solitary confinement with no means of communicating with each other. The prosecutors lack the evidence to convict either prisoner on the main charge but can convict both on a lesser charge. The prosecutors offer each prisoner a choice:

If both prisoners betray (defect) each other then both will serve 2 years.
If one prisoner betrays the other while the other remains silent (cooperates) then two things will happen:
- the prisoner who betrayed will be set free.
- the prisoner who remained silent will serve 3 years.
If both prisoners remain silent then both will only serve 1 year.

The scenario is different, but the consequences of each choice are very similar. Betraying is always the better option for an individual prisoner: it is the only way to be set free, and if the other prisoner also betrays it means serving 2 years instead of 3.
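The dominance of betrayal can be checked mechanically. The sketch below uses the standard sentence lengths for this version of the dilemma (the betrayer goes free and the silent prisoner serves 3 years); the table name `YEARS` and the helper `best_reply` are made up for this example.

```python
SILENT, BETRAY = "silent", "betray"

# (my_choice, their_choice) -> (my_years, their_years)
YEARS = {
    (SILENT, SILENT): (1, 1),
    (SILENT, BETRAY): (3, 0),
    (BETRAY, SILENT): (0, 3),
    (BETRAY, BETRAY): (2, 2),
}

def best_reply(their_choice):
    """Return the choice that minimises my own sentence given the other's choice."""
    return min((SILENT, BETRAY), key=lambda mine: YEARS[(mine, their_choice)][0])

for their_choice in (SILENT, BETRAY):
    print(f"if the other prisoner chooses {their_choice}: best reply is {best_reply(their_choice)}")
```

Betraying is the best reply in both cases, even though both prisoners staying silent (1 year each) is better for the pair than both betraying (2 years each).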

Axelrod noticed that one strategy did surprisingly well: “tit-for-tat”, the same strategy as the Copycat seen earlier. It cooperates on the first round and then copies the other player’s previous move, so each player gets back what they give. Some may consider this to be the most ‘fair’ way to play. During his analysis he also identified several characteristics that the best strategies tended to share:

Nice: They cooperate on the first round and generally cooperate as often as they defect in subsequent rounds.
Retaliating: Strategies that retaliate when the opponent defects did much better than strategies that cooperate all the time.
Forgiving: Overly vindictive strategies ended up punishing themselves as well as their opponents.
Non-Envious: They rarely outscore their opponents; they succeed by scoring well enough against a wide variety of opponents.

2.18.2.1. The Problem of Altruism

The problem of altruism refers to the fact that people often choose the option that benefits someone else even when it leads to a less favourable outcome for themselves. One reason people may choose this is that helping someone feels good and makes one feel better about the world. Altruism appears to be innate, emerging from normal brain development, which suggests it is at least partly passed down genetically. But if animals are all in constant competition with each other to survive and reproduce, you’d expect altruism to have been left behind by evolution. In a society of both selfish and altruistic people, you’d expect the altruistic ones to suffer. This contradiction is the problem of altruism: why haven’t those genes died out yet?

Axelrod’s tournament offers a possible answer to why altruism hasn’t been selected out of the human population: the genes for altruism may be adaptive.

2.18.2.2. Conclusions

The simulations from this chapter lead to the following conclusions:

Starting from a scenario with 1 Always Cooperate, 1 Always Defect, and 1 “tit-for-tat”, the Always Defect may quickly dominate, but that leaves the defectors with fewer players to take advantage of. This makes the defectors vulnerable to invasion by nicer strategies. The more nice strategies there are, however, the more there is for defectors to exploit, so the defectors become dominant again. The result is an oscillation back and forth between nicer and more selfish strategies.

Summary:

Large populations of one strategy are easily invaded by the other.
The majority strategy oscillates between nice and selfish, with the average leaning nice.
Tit-for-tat (TFT) was strong in Axelrod’s tournament but is less so in evolving populations.
Only a small amount of retaliation is needed, not full TFT.
There is no one stable optimal strategy for an evolving population.

As simple as the agents in the program are, they still help us understand human nature a little better. It’s very possible that our inclination toward cooperation, retaliation, and forgiveness comes from the fact that brains with those traits were more likely to propagate, whereas less altruistic brains were less likely to.