2.8. Lecture 7: Graphs

Before this class you should:

  • Read Think Complexity, Chapter 2

  • Read the Wikipedia page about graphs at https://en.wikipedia.org/wiki/Graph_(discrete_mathematics) and answer the following questions:

    1. What is a simple graph? For our discussion today, we will assume that all graphs are simple graphs. This is a common assumption for many graph algorithms – so common it is often unstated.

    2. What is a regular graph? What is a complete graph? Prove that a complete graph is regular.

    3. What is a path? What is a cycle?

    4. What is a forest? What is a tree? Note: a graph is connected if there is a path from every node to every other node.

Before next class you should:

Note taker: Adedoyin Adelabu

2.8.1. Introduction

The graphs discussed in this course represent systems made up of discrete, interconnected elements. The elements are represented by nodes (also called vertices; the terms are interchangeable), and the connections between them are called edges. The examples shown in the lecture pdf can be accessed through an interactive site at https://github.com/graphistry/pygraphistry.

2.8.2. Where you can see graph structures

Contact tracing during covid: A practical use of graph structures is contact tracing. In this scenario people are the nodes and the interactions between them are the edges. Through the study of graph traversals, graph algorithms can help evaluate the effectiveness of containment measures in scenarios like a pandemic.

Social media sites: A friendship with someone on Facebook is a bidirectional edge, because the other person must add you back before you are friends. A follow on X is a directed edge, because you can follow someone without them ever following you back.

Detective boards: In this course we will discuss homogeneous graphs, which represent only one type of entity. The other type is a heterogeneous graph, which can represent many types of entity. For example, a detective board can be full of pictures, fingerprints and notes; this would be considered a heterogeneous graph of the evidence relevant to the case.

2.8.3. Definitions

Graph types & properties:

At the most basic level, graphs are either directed or undirected. In a directed graph each edge points from the node where it originates to the node where it ends, e.g. A->B. In an undirected graph the edges show no direction between nodes, e.g. A-B.

Simple graphs are undirected graphs that have neither multiple edges nor loops. An example of multiple edges would be A-B and B-A in the same graph: here A and B are connected twice. An example of a loop would be a node that connects to itself, e.g. A-A.

Regular graphs are graphs where each node has the same number of neighbours. For example, in “A<->B<->C<->A”, the nodes A, B and C all have two neighbours.

A complete graph is a graph where each pair of nodes is joined by an edge. The three-node graph used as the regular graph example is also a complete graph. A complete graph must be regular: each node is connected to every other node, so every node has exactly n-1 neighbours (where n is the number of nodes).
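The claim that a complete graph is regular can be checked numerically. The following is a minimal sketch, assuming NetworkX is installed; the helper name is_regular is introduced here for illustration and is not part of the lecture code.

```python
import networkx as nx

def is_regular(G):
    """Return True if every node of G has the same degree (hypothetical helper)."""
    degrees = {d for _, d in G.degree()}
    return len(degrees) <= 1

n = 5
K = nx.complete_graph(n)  # every pair of the n nodes is joined by an edge

# In a complete graph each node is adjacent to the other n-1 nodes.
print(all(d == n - 1 for _, d in K.degree()))
print(is_regular(K))
```

A path graph such as nx.path_graph(3) is a useful counterexample: its end nodes have one neighbour and its middle node has two, so it is not regular.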

Another type of graph is a connected graph, which is a graph with a path from every node to every other node. Paths are explained in the next section.

Traversing graphs

There are multiple ways to describe going from node to node in a graph, and the simplest is a path. A path is a sequence of nodes with an edge between each consecutive pair. Typically, nodes cannot be repeated in a path. If the starting node and the ending node are the same, it is considered a closed path.

Other ways of going from node to node include:
  • Walk:
    • A traversal between nodes, like a path, but nodes may be repeated.

  • Cycle:
    • A closed path: it starts and ends at the same node, and every other node is visited exactly once.

  • Closed walk:
    • Like a cycle, but a node between the start and the end may be visited more than once.
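The definitions above can be turned into small checks on a sequence of nodes. This is a sketch under the definitions in these notes; the function names is_walk, is_path and is_cycle and the example graph are assumptions made for illustration.

```python
import networkx as nx

def is_walk(G, nodes):
    """Consecutive nodes must be joined by an edge; repeats are allowed."""
    return (len(nodes) > 0
            and all(v in G for v in nodes)
            and all(G.has_edge(u, v) for u, v in zip(nodes, nodes[1:])))

def is_path(G, nodes):
    """A walk in which no node is repeated."""
    return is_walk(G, nodes) and len(set(nodes)) == len(nodes)

def is_cycle(G, nodes):
    """A closed walk whose nodes, apart from the shared start/end, are distinct."""
    return (len(nodes) >= 4
            and nodes[0] == nodes[-1]
            and is_walk(G, nodes)
            and len(set(nodes[:-1])) == len(nodes) - 1)

# Example graph: a triangle A-B-C with an extra node D attached to C.
G = nx.Graph()
G.add_edges_from([('A', 'B'), ('B', 'C'), ('C', 'A'), ('C', 'D')])

print(is_walk(G, ['A', 'B', 'A', 'C']))   # A is repeated, so a walk but not a path
print(is_path(G, ['A', 'B', 'C', 'D']))
print(is_cycle(G, ['A', 'B', 'C', 'A']))
```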

Graph structures

More than one graph can be depicted in an image. A connected graph with no cycles is called a tree. When two or more trees are depicted together, the result is considered a forest; more generally, a forest is any graph with no cycles. Take nature for example: within each tree in a forest the leaves are connected to its branches, but the trees themselves are not connected to each other.
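NetworkX provides nx.is_tree and nx.is_forest for checking these properties. The sketch below uses three small example graphs chosen for illustration; they are not from the lecture.

```python
import networkx as nx

# One connected graph with no cycles: a tree.
tree = nx.Graph([('A', 'B'), ('A', 'C')])

# Two separate trees drawn together: a forest.
forest = nx.Graph([('A', 'B'), ('A', 'C'), ('D', 'E')])

# A triangle contains a cycle, so it is neither a tree nor a forest.
triangle = nx.Graph([('A', 'B'), ('B', 'C'), ('C', 'A')])

for name, G in [('tree', tree), ('forest', forest), ('triangle', triangle)]:
    print(name, nx.is_tree(G), nx.is_forest(G))
```

Note that a single tree also counts as a forest, while the two-tree forest is not a tree because it is not connected.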

2.8.4. Python Library for Graphs

We will be using a Python library called NetworkX.

The following are some features of this library:

  • Built in data structures for graphs like directed and undirected ones.

  • Standard graph algorithms.

  • Generators for graph models like Erdos-Renyi

  • Support for heterogeneous graphs with nodes that can be any type.

  • Edges can hold any type of data, such as distances or time series.

NetworkX has a newer version, version 3.2, but we are using NetworkX version 2 until everything is fully tested locally.

2.8.5. Chapter 2 notebook:

This section covers what we learned through the Jupyter notebook, which builds up towards the Erdos-Renyi graph model. Topics include directed, undirected and complete graphs, connectivity, and the probability of connectivity.

%matplotlib inline
%config InlineBackend.figure_format = 'svg'

import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import seaborn as sns

from utils import decorate, savefig

# I set the random seed so the notebook
# produces the same results every time.
np.random.seed(17)
(Allen Downey 2016)

Explaining the code:
  • %matplotlib inline is a special directive that embeds the plots inline in your notebook or code editor instead of opening them in a new window.

  • SVG is an image format used for better rendering of images.

  • The random seed initializes the random number generator.

  • Passing a fixed number to np.random.seed(x) ensures that if you rerun something you will get the same random output. It is best for reproducibility.
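The effect of the seed can be seen in a short experiment. This is a minimal sketch, assuming NumPy is installed; the variable names first and second are illustrative.

```python
import numpy as np

np.random.seed(17)
first = np.random.random(3)   # three random numbers after seeding

np.random.seed(17)            # reset the generator to the same state
second = np.random.random(3)  # the same three numbers again

print(np.array_equal(first, second))
```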

Creating Graphs using NetworkX
  • You can initialize a directed graph with nx.DiGraph().

  • DiGraph() on its own gives an empty directed graph.

  • Nodes can be added with G.add_node(). A node can be of any hashable type.

  • Add an edge with add_edge, passing the two nodes separated by a comma, e.g. G.add_edge('Alice', 'Bob').

  • Several nodes or edges can also be added at once from dictionaries or lists by using the .add_nodes_from() and .add_edges_from() functions.

  • Use .edges() to see your edges. Each edge is displayed as a tuple.

The order in which you place the names in add_edge is important. The first name is the director and the second name is the receiver: the director is the node where the edge starts, and the receiver is the node where the edge ends. For example, in G.add_edge(1, 3) the edge points from 1 to 3 (1->3).
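The calls described in this section can be combined into a short example. This is a sketch, assuming NetworkX is imported as nx; the node names other than 'Alice' and 'Bob' are made up for illustration.

```python
import networkx as nx

G = nx.DiGraph()                 # empty directed graph
G.add_node('Alice')              # add a single node
G.add_edge('Alice', 'Bob')       # edge from Alice to Bob; Bob is added automatically
G.add_edges_from([('Bob', 'Chuck'), ('Alice', 'Chuck')])  # several edges at once

print(list(G.nodes()))
print(list(G.edges()))           # each edge is a (start, end) tuple

# Direction matters: Alice -> Bob exists, Bob -> Alice does not.
print(G.has_edge('Alice', 'Bob'), G.has_edge('Bob', 'Alice'))
```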

Random Graphs

To simulate randomness, a helper function called flip is used. It acts like a biased coin flip and returns True with probability p. In the random_pairs function a coin is flipped for each possible edge: if the result is True the edge is added, and if it is False it is not. Running this gives a random graph.

def random_pairs(nodes, p):
    for edge in all_pairs(nodes):
        if flip(p):
            yield edge

If p is high, many random edges are added, and if p is low, fewer edges are added. An Erdos-Renyi graph needs two things: the number of nodes and the probability p. The number of edges depends on this probability; the closer p is to 1, the more edges are created.
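The random_pairs function relies on two helpers, flip and all_pairs, that are not shown in these notes. The sketch below fills them in under stated assumptions: the bodies of flip and all_pairs are one plausible implementation, and make_random_graph is a name introduced here for illustration.

```python
import networkx as nx
import numpy as np

def flip(p):
    """Assumed helper: return True with probability p."""
    return np.random.random() < p

def all_pairs(nodes):
    """Assumed helper: yield each unordered pair of distinct nodes once."""
    for i, u in enumerate(nodes):
        for j, v in enumerate(nodes):
            if i < j:
                yield u, v

def random_pairs(nodes, p):
    """Yield each possible edge with probability p."""
    for edge in all_pairs(nodes):
        if flip(p):
            yield edge

def make_random_graph(n, p):
    """Build an undirected random graph with n nodes and edge probability p."""
    G = nx.Graph()
    nodes = range(n)
    G.add_nodes_from(nodes)
    G.add_edges_from(random_pairs(nodes, p))
    return G

np.random.seed(17)
G = make_random_graph(10, 0.3)
print(G.number_of_nodes(), G.number_of_edges())
```

With p = 0 no edges are created, and with p = 1 every one of the n(n-1)/2 possible edges is created, which gives a complete graph.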

Connectivity

To check whether a graph is connected, start by finding all the nodes that can be reached from a given node. This can be done with a stack: repeatedly take a node off the stack and, if it has not been seen yet, mark it as seen and add its neighbours to the stack. A stack is a data structure that stores elements in a list on a first in, last out basis.

def reachable_nodes(G, start):
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(G.neighbors(node))
    return seen

Stack explanation: In the code above, the last element (the top of the stack) is taken off with the .pop() operation and becomes the current node. If the node has not been “seen”, it is added to the “seen” set and its neighbours are added to the top of the stack. Those neighbours are then checked against seen in turn, and the process repeats until the stack is empty.
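Once the reachable nodes are known, the connectivity check follows directly: the graph is connected if every node can be reached from an arbitrary start node. The sketch below assumes an undirected, non-empty graph; the name is_connected is chosen here for illustration (NetworkX also ships its own nx.is_connected).

```python
import networkx as nx

def reachable_nodes(G, start):
    """Return the set of nodes reachable from start, using a stack."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()              # take the node on top of the stack
        if node not in seen:
            seen.add(node)
            stack.extend(G.neighbors(node))
    return seen

def is_connected(G):
    """Assumes G is undirected and has at least one node."""
    start = next(iter(G))               # any node works as the starting point
    return len(reachable_nodes(G, start)) == len(G)

complete = nx.complete_graph(5)
print(is_connected(complete))

with_isolated = nx.complete_graph(5)
with_isolated.add_node('alone')         # a node with no edges
print(is_connected(with_isolated))
```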

Probability of connectivity

If the probability p is high, the graph is likely to be connected, and if p is low, it is likely not to be connected. According to theory, the special quantity is \(\frac{\log n}{n}\), where n is the number of nodes and log is the natural logarithm. The value 0.23 corresponds to n = 10, since \(\frac{\log 10}{10} \approx 0.23\). Around this value there is a transition between very little connectivity and a lot of connectivity. Critical points where behaviour changes will be seen a lot in this course.
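The transition can be explored by generating many random graphs for a given p and counting how many are connected. This is a rough sketch, assuming NetworkX's built-in nx.erdos_renyi_graph and nx.is_connected; the function name prob_connected, the number of trials and the seeds are illustrative choices.

```python
import math
import networkx as nx

def prob_connected(n, p, trials=100, seed=17):
    """Estimate the probability that a random graph with n nodes is connected."""
    count = 0
    for i in range(trials):
        # A different seed per trial gives different graphs but repeatable results.
        G = nx.erdos_renyi_graph(n, p, seed=seed + i)
        if nx.is_connected(G):
            count += 1
    return count / trials

n = 10
pstar = math.log(n) / n      # critical value from the theory
print(round(pstar, 2))       # 0.23

# Compare a value well below, at, and well above the critical value.
for p in [0.05, pstar, 0.6]:
    print(round(p, 2), prob_connected(n, p))
```

The estimates depend on the number of trials and the seeds, so they should be read as approximate.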