2.19. Lecture 17: Evolution of cooperation

Before this class you should:

  • Read Think Complexity, Chapter 12

Before next class you should:

  • Make sure you have a copy of The Alignment Problem

2.19.1. Note Taker: Mouner Dabjan

2.19.2. Remarks before the lecture started:

1. This will be the final installment of this type of lecture. Starting Thursday, the course switches to an alternative mode: a guest lecture on Thursday from ChainML by Ethan Jackson, which the professor will also attend.

2. Next week Tuesday: initial round of debates. The professor sent an email to all participants and is expecting a reply.

3. Next week Thursday: project pitch sessions. The professor also sent an email about these and is expecting a response soon.

2.19.3. Overview:

Building on last week's material, we begin by asking: what is the prisoner's dilemma?

To answer this question, we have to look at the biological and philosophical sides:

  1. Biological: The prisoner's dilemma captures the conflict between the self-interested behaviour favoured by natural selection and animals' tendency to help each other.

  2. Philosophical: The prisoner’s dilemma highlights the nuanced moral philosophies of humans, suggesting that what we perceive as ‘good’ or ‘evil’ behaviors are not absolute but potentially influenced by a complex interplay of individual choices and environmental factors.

2.19.4. Game of Trust:

Introducing the Game of Trust. The class went over the first level of the game, where the professor explained the scenarios and students chose the best course of action. The game is played between two players, each of whom can either cooperate or cheat. If both players cooperate, each is rewarded with 2 coins. If one player cheats while the other cooperates, the cheater gets 3 coins and the cooperating player loses 1 coin. If both players cheat, both get 0 coins. The professor presented two scenarios in the first level: first, if the opposing player is cheating, would we cheat or cooperate; second, if the opposing player is cooperating, would we cheat or cooperate? The class chose to cheat in both scenarios, since cheating maximizes our coin payoff either way.
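
The following is a minimal Python sketch of the one-shot payoffs just described, showing why cheating is the better response whether the opponent cooperates or cheats. The payoff values come from the lecture; the table and helper function are illustrative, not part of the course materials.

PAYOFFS = {                      # (my move, their move) -> my coins
    ('cooperate', 'cooperate'): 2,
    ('cheat', 'cooperate'): 3,
    ('cooperate', 'cheat'): -1,
    ('cheat', 'cheat'): 0,
}

def best_response(their_move):
    """Return the move that maximizes my payoff against a known opponent move."""
    return max(['cooperate', 'cheat'],
               key=lambda my_move: PAYOFFS[(my_move, their_move)])

for their_move in ('cooperate', 'cheat'):
    print(their_move, '->', best_response(their_move))   # prints 'cheat' both times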

Breakout rooms were created and students were placed in them randomly to play the second level, a repeated game. This level is similar to the first, but students do not know in advance what the opposing player will choose. Each student played against 5 different opponents for anywhere between 3 and 7 rounds and kept their final score to compare after the session concluded. At the end of the level, we were introduced to the different characters in the game, each with a unique strategy: Copycat, Always Cooperate, Always Cheat, Grudger, and Detective. Copycat starts by cooperating and afterwards copies whatever the player did in the last round. Always Cheat always cheats. Always Cooperate always cooperates. Grudger starts by cooperating and keeps cooperating until the player cheats; once it detects cheating, it cheats forever. Detective opens with cooperate, cheat, cooperate, cooperate. If the player cheats back during this opening, it then acts like Copycat; if the player never cheats back, it acts like Always Cheat.
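
Below is a rough sketch of the five characters as Python functions that map the opponent's move history ('C' for cooperate, 'D' for cheat) to a next move. The function names and signatures are illustrative; they are not taken from the game's code.

def copycat(opp_history):
    # Cooperate first, then copy the opponent's last move.
    return 'C' if not opp_history else opp_history[-1]

def always_cheat(opp_history):
    return 'D'

def always_cooperate(opp_history):
    return 'C'

def grudger(opp_history):
    # Cooperate until the opponent cheats once, then cheat forever.
    return 'D' if 'D' in opp_history else 'C'

def detective(opp_history):
    # Probe with cooperate, cheat, cooperate, cooperate; then either mimic
    # Copycat (if the opponent ever cheated back during the opening) or
    # exploit them like Always Cheat (if they never did).
    opening = ['C', 'D', 'C', 'C']
    n = len(opp_history)
    if n < 4:
        return opening[n]
    return opp_history[-1] if 'D' in opp_history[:4] else 'D'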

The professor moved on to the next level, a single tournament, where the characters play each other to determine which strategy is ideal. The tournament takes the 5 characters described above and pairs them up in 10 matches of 10 rounds each, the winner being the character with the highest number of coins. The class voted for Copycat to win, believing it has the best strategy in this game, and at the end of the tournament the class was right: Copycat was the winner.

Moving on to the next level, the repeated tournament: after each round-robin, the 5 worst players are eliminated and the 5 best players are cloned. The starting population is 15 Always Cooperates, 5 Always Cheats, and 5 Copycats, echoing last week's lecture. This level is the one most directly tied to the lecture title, as it shows how repeated tournaments drive the evolution of play strategies toward the most effective approach. The class voted for Copycat to win. As the tournaments played out, we noticed that in rounds 1 to 3 the Always Cheats accumulate the most coins and the Always Cooperates do worst. By round 3 the Always Cooperates are wiped out and the Copycats become dominant, and as the rounds continue the Copycats win. The reason the Always Cheats were winning rounds 1 to 3 is that they were exploiting the Always Cooperates and accumulating coins from them. By round 3, however, there were no Always Cooperates left, and that is when the Copycats started outcompeting the Always Cheats.
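
The following is a compact sketch of these selection dynamics, using the Game of Trust payoffs and the starting population from the lecture (15 Always Cooperates, 5 Always Cheats, 5 Copycats). The helper names and the exact round-robin details are illustrative, not the game's implementation.

from collections import Counter

PAYOFFS = {('C', 'C'): (2, 2), ('D', 'C'): (3, -1),
           ('C', 'D'): (-1, 3), ('D', 'D'): (0, 0)}

STRATEGIES = {
    'Always Cooperate': lambda opp: 'C',
    'Always Cheat':     lambda opp: 'D',
    'Copycat':          lambda opp: 'C' if not opp else opp[-1],
}

def match(name1, name2, rounds=10):
    """Play one match and return both players' coin totals."""
    h1, h2, s1, s2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = STRATEGIES[name1](h2), STRATEGIES[name2](h1)
        p1, p2 = PAYOFFS[(m1, m2)]
        s1, s2 = s1 + p1, s2 + p2
        h1.append(m1)
        h2.append(m2)
    return s1, s2

def generation(population):
    """One round-robin, then eliminate the 5 worst and clone the 5 best."""
    scores = [0] * len(population)
    for i in range(len(population)):
        for j in range(i + 1, len(population)):
            s1, s2 = match(population[i], population[j])
            scores[i] += s1
            scores[j] += s2
    ranked = sorted(range(len(population)), key=lambda i: scores[i])
    survivors = [population[i] for i in ranked[5:]]    # drop the 5 lowest scorers
    clones = [population[i] for i in ranked[-5:]]      # copy the 5 highest scorers
    return survivors + clones

population = ['Always Cooperate'] * 15 + ['Always Cheat'] * 5 + ['Copycat'] * 5
for _ in range(10):
    population = generation(population)
print(Counter(population))   # Copycat should come to dominate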

At this point, we are assuming a world full of trust, which is far from reality. We moved on to the next level, The Evolution of Distrust, where we could modify the number of rounds played per match and noticed that the outcome depends on it: if more than 5 rounds are played, Copycat wins; otherwise Always Cheat wins. The number of rounds corresponds to the number of interactions the characters have with each other, and the more interactions there are, the more trust can develop, and vice versa. Next we adjusted the payoffs of the tournament. Copycat wins under the normal payoffs, but Always Cheat wins when the reward for mutual cooperation is decreased, and Always Cooperate wins when it is increased. We concluded that when the reward for cooperating is high, Always Cooperate can overtake the Copycats, whereas when the reward is low, Always Cheat exploits both the Copycats and the Always Cooperates. In short, the more imbalanced the payoffs, the more distrust there is.

Finally, we explored the possibility of making mistakes in the final level, Making Misteaks, which allows for miscommunication: occasionally a player's intended move is flipped. To deal with these mistakes, three new characters are introduced, each with its own strategy: Copykitten, Simpleton, and Random. Copykitten is similar to Copycat, but only cheats back after two consecutive cheats by the other player, treating a single cheat as a possible mistake. Simpleton starts by cooperating; if the player cooperates back, it repeats its own last move, and if the player cheats back, it does the opposite of its last move. Random plays cheat or cooperate with a 50/50 chance. A new tournament was launched with Copykitten, Simpleton, Random, Copycat, and Always Cooperate, with a 5% chance of mistakes, to see which strategy is best. The class chose Copykitten to win, but the winner was Simpleton. Another tournament was played with Always Cooperate swapped for Always Cheat; the class chose Copycat to win, but Copykitten won. At the end, we examined the effect of the miscommunication rate on the best strategy: with 1% to 9% miscommunication Copykitten wins, with 10% to 49% Always Cheat wins, and with 50% or more there are no clear winners. The explanation is that a little miscommunication leaves room for forgiveness, which is why Copykitten wins, whereas too much miscommunication turns into distrust, and that is where Always Cheat always wins.
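
Here is a rough sketch of the three noise-tolerant characters in the same style as the earlier strategy functions, together with a simple mistake mechanism. Again, the helpers are illustrative and not taken from the game.

import random

def copykitten(opp_history):
    # Cheat only after two consecutive cheats; a single cheat is forgiven
    # as a possible mistake.
    return 'D' if opp_history[-2:] == ['D', 'D'] else 'C'

def simpleton(my_history, opp_history):
    # Simpleton also needs its own history: start by cooperating, repeat the
    # last move if the opponent cooperated, switch moves if the opponent cheated.
    if not my_history:
        return 'C'
    if opp_history[-1] == 'C':
        return my_history[-1]
    return 'D' if my_history[-1] == 'C' else 'C'

def random_player(opp_history):
    # Cooperate or cheat with a 50/50 chance, regardless of history.
    return random.choice(['C', 'D'])

def with_mistakes(intended_move, p_mistake=0.05):
    """Flip the intended move with probability p_mistake (miscommunication)."""
    if random.random() < p_mistake:
        return 'D' if intended_move == 'C' else 'C'
    return intended_move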

2.19.5. Relationships:

This demo is based on Robert Axelrod's book The Evolution of Cooperation. Axelrod organised tournaments to test strategies for the iterated prisoner's dilemma: participants submitted strategies that played against one another over multiple tournaments, which allowed him to examine how each strategy behaved and performed.

He identified the characteristics of the winning strategies:

  1. Nice: Cooperate during the first round and generally cooperate as often as they defect in subsequent rounds.

  2. Retaliating: Strategies that cooperate all the time did not do as well as strategies that retaliate if the opponent defects.

  3. Forgiving: Strategies that were too vindictive tended to punish themselves as well as their opponents.

  4. Non-envious: They seldom outscore their opponents; they are successful because they do well enough against a variety of opponents.

2.19.6. Jupyter Notebook:

Evolution of Cooperation: Week 13 chapter:

  1. Reusing the Simulation and Instrument classes from the Chapter 11 notebook: this setup allows for the simulation of various cooperation strategies within a defined environment.

  2. The choose_dead and choose_replacements methods are from Thursday's class: they determine which agents die and how replacements are chosen, based on survival probabilities and reproduction weights.

Implementation of choose_dead and choose_replacements methods
def choose_dead(self, ps):
    """Choose which agents die in the next timestep.

    ps: probability of survival for each agent

    returns: indices of the chosen ones
    """
    n = len(self.agents)
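    # Note: this version applies a fixed ~10% chance of death per agent; the ps argument is not used here.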
    is_dead = np.random.random(n) > 0.9
    index_dead = np.nonzero(is_dead)[0]
    return index_dead

def choose_replacements(self, n, weights):
    """Choose which agents reproduce in the next timestep.

    n: number of choices
    weights: array of weights

    returns: sequence of Agent objects
    """
    agents = np.random.choice(self.agents, size=n, replace=True)
    replacements = [agent.copy() for agent in agents]
    return replacements

  3. The Agent class incorporates strategy by responding to actions based on historical interactions: it defines agent behaviours, including initialization, response strategy, score updating, and mutation.

Definition of the Agent class
class Agent:

    keys = [(None, None),
            (None, 'C'),
            (None, 'D'),
            ('C', 'C'),
            ('C', 'D'),
            ('D', 'C'),
            ('D', 'D')]

    def __init__(self, values, fitness=np.nan):
        """Initialize the agent.

        values: sequence of 'C' and 'D'
        """
        self.values = values
        self.responses = dict(zip(self.keys, values))
        self.fitness = fitness

    def reset(self):
        """Reset variables before a sequence of games.
        """
        self.hist = [None, None]
        self.score = 0

    def past_responses(self, num=2):
        """Select the given number of most recent responses.

        num: integer number of responses

        returns: sequence of 'C' and 'D'
        """
        return tuple(self.hist[-num:])

    def respond(self, other):
        """Choose a response based on the opponent's recent responses.

        other: Agent

        returns: 'C' or 'D'
        """
        key = other.past_responses()
        resp = self.responses[key]
        return resp

    def append(self, resp, pay):
        """Update based on the last response and payoff.

        resp: 'C' or 'D'
        pay: number
        """
        self.hist.append(resp)
        self.score += pay

    def copy(self, prob_mutate=0.05):
        """Make a copy of this agent.
        """
        if np.random.random() > prob_mutate:
            values = self.values
        else:
            values = self.mutate()
        return Agent(values, self.fitness)

    def mutate(self):
        """Makes a copy of this agent's values, with one mutation.

        returns: sequence of 'C' and 'D'
        """
        values = list(self.values)
        index = np.random.choice(len(values))
        values[index] = 'C' if values[index] == 'D' else 'D'
        return values
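
As a quick sanity check (assuming the Agent class above), the seven-character genome 'CCDCDCD' maps the keys listed in the class to tit-for-tat behaviour: the response depends only on the opponent's most recent move.

tft = Agent('CCDCDCD')
opponent = Agent('DDDDDDD')   # an always-defect genome

tft.reset()
opponent.reset()
print(tft.respond(opponent))  # 'C' -- cooperate on the first move
opponent.hist.append('D')
print(tft.respond(opponent))  # 'D' -- retaliate after a defection
opponent.hist.append('C')
print(tft.respond(opponent))  # 'C' -- forgive as soon as the opponent cooperates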

  4. The Tournament class simulates competitions between agents to assess fitness based on the Prisoner's Dilemma: it defines the competition framework, including scoring and agent matchups.

Definition of the Tournament class
class Tournament:

    payoffs = {('C', 'C'): (3, 3),
               ('C', 'D'): (0, 5),
               ('D', 'C'): (5, 0),
               ('D', 'D'): (1, 1)}

    num_rounds = 6

    def play(self, agent1, agent2):
        """Play a sequence of iterated PD rounds.

        agent1: Agent
        agent2: Agent

        returns: tuple of agent1's score, agent2's score
        """
        agent1.reset()
        agent2.reset()

        for i in range(self.num_rounds):
            resp1 = agent1.respond(agent2)
            resp2 = agent2.respond(agent1)

            pay1, pay2 = self.payoffs[resp1, resp2]

            agent1.append(resp1, pay1)
            agent2.append(resp2, pay2)

        return agent1.score, agent2.score

    def melee(self, agents, randomize=True):
        """Play each agent against two others.

        Assigns the average score from the two games to agent.fitness

        agents: sequence of Agents
        randomize: boolean, whether to shuffle the agents
        """
        if randomize:
            agents = np.random.permutation(agents)

        n = len(agents)
        i_row = np.arange(n)
        j_row = (i_row + 1) % n

        totals = np.zeros(n)

        for i, j in zip(i_row, j_row):
            agent1, agent2 = agents[i], agents[j]
            score1, score2 = self.play(agent1, agent2)
            totals[i] += score1
            totals[j] += score2

        for i in i_row:
            agents[i].fitness = totals[i] / self.num_rounds / 2
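
A short usage sketch, assuming the Agent and Tournament classes above: play the six iterated rounds between a tit-for-tat genome and an always-defect genome, then run a small melee to assign fitness. The specific genomes here are illustrative.

tft = Agent('CCDCDCD')        # cooperate first, then mirror the opponent
all_d = Agent('DDDDDDD')      # defect unconditionally

tournament = Tournament()
print(tournament.play(tft, all_d))    # (5, 10): tit for tat is exploited only in round 1

# melee pairs each agent with its two neighbours and stores a normalized fitness
agents = [Agent('CCDCDCD'), Agent('CCCCCCC'), Agent('DDDDDDD')]
tournament.melee(agents, randomize=False)
print([agent.fitness for agent in agents])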

2.19.7. Conclusion:

The lecture on “Evolution of Cooperation” explores the intricate mechanisms underlying cooperation and competition, not only within human societies but also in biological systems. This concept expands on the fundamental idea of the prisoner’s dilemma to examine the intricacies of moral philosophy and biological imperatives. The interactive “Game of Trust” exposes students to a range of strategies, including cooperation and deceit, and demonstrates the long-term consequences of these strategies in repeated interactions. The simulation demonstrates the importance of characteristics such as retaliation, forgiveness, and the absence of envy in promoting cooperative behaviour over a period of time. Robert Axelrod’s research on the evolution of cooperation through repeated tournaments offers a tangible illustration of these principles in practice, emphasizing the significance of kindness, retaliation, forgiveness, and lack of envy in effective strategies. The provided Jupyter Notebook provides a concrete illustration of these principles using code, allowing students to actively participate in the dynamics of cooperation and competition. This lecture elucidates the circumstances in which cooperation can develop and persist, while also stimulating critical analysis of the equilibrium between personal interests and the overall welfare of a group in different situations.