2.10. Lecture 9: Scale-free networks

Before this class you should:

  • Read Think Complexity, Chapter 4, and answer the following questions:

    1. The probability mass function (PMF) plotted in Figure 4.1 is a normalized version of what other kind of plot?

    2. What is the continuous analogue of the PMF?

Before next class you should:

  • Read Think Complexity, Chapter 5

Note Taker: Philip Liu

2.10.1. Topics Covered in this Lecture:

The lecture covered many topics:

  • Recapped the previous lecture

  • Learned about Ego networks

  • Began an example using the SNAP Facebook Circles Dataset (implemented and tested through Jupyter notebooks)

  • Looked at Heavy-tailed Distributions

  • Learned about the Barabási-Albert (BA) Generative Model

The other portion of the lecture looked at the notebook and the code contained within it (Chapter 4 Notebook - 5th Lab).

2.10.1.1. Learning to Discover Social Circles in Ego Networks:

Ego networks are focused on a specific person, the ‘ego’, and on the links that connect to that person. They consist of the ego, the people linked to the ego, and the connections among them, so the whole structure revolves around the main ‘ego’.

../_images/EgoNetwork.png

This image visualizes an example of an Ego Network in which the ego node (you) is connected to multiple different nodes (the types of people you’ve met), and shows how all of those nodes are interconnected.
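
As a quick illustration of extracting an ego network in code (the graph and the choice of ego node here are placeholders, not part of the lecture), networkx provides ego_graph:

import networkx as nx

# Placeholder graph; any graph can be substituted here.
G = nx.karate_club_graph()

# The ego network around node 0: the ego, its direct neighbours (radius=1),
# and the edges among them.
ego = nx.ego_graph(G, 0, radius=1)

print(len(ego), "nodes in the ego network of node 0")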

  • Focus is on the community surrounding the ‘ego’

  • 25% of the circles nest inside other circles (circles within circles)

  • 50% of the circles overlap (e.g. blue overlaps red)

  • 25% of the circles are not related at all

2.10.1.2. SNAP Facebook Circles Dataset:

This dataset consists of ‘circles’ (or ‘friends lists’) from Facebook. Facebook data was collected from survey participants using a Facebook app. The dataset includes node features (profiles), circles, and ego networks.

  • Search the dataset for triplets (collections of 3 nodes) and check whether the nodes are connected by either 2 or 3 links

  • If the 3 nodes are connected by 3 links, they form a closed triplet

  • If they are connected by only 2 links, they form an open triplet

  • The global clustering coefficient is the ratio of closed triplets to all triplets, open or closed, and measures the clustering that has occurred in the whole network

The dataset serves as a real, large network whose statistics can be compared against the models. The purpose of this example is to use the models learned in this lecture, along with those from previous lectures, to show a visual representation of the differences between the models.
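
A minimal sketch of how the triplet-based clustering measures can be computed with networkx; the file facebook_combined.txt is assumed to be the SNAP edge list downloaded into the working directory:

import networkx as nx

# Load the Facebook edge list (assumed to be SNAP's facebook_combined.txt).
fb = nx.read_edgelist('facebook_combined.txt', nodetype=int)

# Global clustering coefficient (transitivity): ratio of closed triplets
# (3 x triangles) to all connected triplets, open or closed.
print('global clustering coefficient:', nx.transitivity(fb))

# Mean of the local clustering coefficients, for comparison.
print('mean clustering coefficient:', nx.average_clustering(fb))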

2.10.1.3. Heavy-tailed Distributions:

Heavy-tailed distributions are a type of probability distribution whose tails are not exponentially bounded. What this means is that the largest values in the data set occur much more often than an exponentially bounded distribution would predict, so the tail ends of the distribution carry more weight.

../_images/HeavyTailDist.png

The image visualizes a heavy-tailed distribution, based on the SNAP Facebook Dataset, plotted on log-log axes.

When the degree distribution is plotted on log-log axes, its largest values fall approximately on a straight line, indicating a power law relationship. The degree, symbolized by ‘k’, is the number of edges connected to a node, and PMF(k) is the fraction of nodes with degree k; as the degree decreases, the probability mass function (PMF) value increases. PMF values are instrumental in computing the mean and standard deviation of the data. For a power law, PMF(k) ~ k^-alpha, and taking the log of both sides gives log PMF(k) ≈ -alpha log k + c, so for large degree values the relationship is approximately linear and the slope of the line is -alpha.
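A small sketch of how this plot can be produced (it assumes the Facebook graph fb loaded in the earlier snippet; any other graph object could be substituted):

from collections import Counter

import matplotlib.pyplot as plt

# Empirical PMF of the degrees: the fraction of nodes with each degree k.
degrees = [fb.degree(v) for v in fb]   # fb is the graph loaded earlier
counts = Counter(degrees)
n = len(degrees)
ks = sorted(counts)
pmf = [counts[k] / n for k in ks]

# On log-log axes a power law PMF(k) ~ k**(-alpha) appears as a straight
# line with slope -alpha, since log PMF(k) = -alpha * log k + c.
plt.plot(ks, pmf, marker='.', linestyle='none')
plt.xscale('log')
plt.yscale('log')
plt.xlabel('degree k')
plt.ylabel('PMF(k)')
plt.show()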

2.10.1.4. Barabási-Albert (BA) Generative Model:

Barabási and Albert proposed a generative model that generates “scale-free” graphs, i.e. graphs whose degree distribution follows a power law. It differs from the Watts-Strogatz model in two key ways. First, growth: the BA model doesn’t start with a fixed number of vertices but rather begins with a small graph and adds vertices incrementally. Second, preferential attachment: when a new edge forms, it is more likely to connect to a vertex that already possesses a substantial number of edges, reflecting a “rich get richer” phenomenon. This effect mirrors the growth patterns observed in various real-world networks.

The BA model starts by establishing a small graph and then adds vertices one at a time. Rather than connecting at random, each new vertex follows preferential attachment: its new edges are more likely to attach to nodes that already have a high number of connections. The code structure of the BA model can be examined in the Jupyter notebook. In the code below, the selection loop keeps drawing nodes from the repeated_nodes array until the desired number of distinct nodes, ‘k’, has been chosen; a node is selected only once even if it appears in the array many times, and repeated draws are discarded along the way.

Example of the BA model: start with k = 4 initial nodes and set the total number of nodes to n = 7. The 4 initial nodes are the targets for the next node to come in. When the 5th node (the new source) is added, it wires up to each of the 4 initial nodes. Each of the 4 targets enters the repeated_nodes array once, and the source is then added k (in this case 4) times, making it the most heavily represented node in the array. The next set of targets is then selected at random from repeated_nodes and connected to the following new node; each node that gains a new edge is added to the array again, along with the new source node. Repeat the steps until all n nodes are added to the model.

After learning about BA models, a graph can be formed using the example code below:

import random

import networkx as nx


def barabasi_albert_graph(n, k, seed=None):
    """Constructs a BA graph.

    n: number of nodes
    k: number of edges for each new node
    seed: random seed
    """
    if seed is not None:
        random.seed(seed)

    G = nx.empty_graph(k)
    targets = set(range(k))
    repeated_nodes = []

    for source in range(k, n):  # starts from k since the first k nodes are already in G

        # create k edges from the new node `source` to the existing targets
        G.add_edges_from(zip([source] * k, targets))

        repeated_nodes.extend(targets)       # add destinations for preferential attachment
        repeated_nodes.extend([source] * k)  # add the source k times

        # choose k distinct nodes from repeated_nodes as the next targets
        targets = _random_subset(repeated_nodes, k)

    return G
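
The code above relies on a helper, _random_subset, which implements the selection loop described earlier: it keeps drawing from repeated_nodes until k distinct targets have been found, discarding repeats. A minimal sketch of that helper, together with a usage example matching the n = 7, k = 4 walk-through (the seed value is arbitrary):

def _random_subset(repeated_nodes, k):
    """Select k distinct nodes from repeated_nodes.

    Nodes that appear many times in repeated_nodes are more likely to be
    chosen, which is what produces preferential attachment.
    """
    targets = set()
    while len(targets) < k:
        x = random.choice(repeated_nodes)
        targets.add(x)  # repeats are discarded because targets is a set
    return targets


# Usage example: 7 nodes in total, 4 edges for each new node.
G = barabasi_albert_graph(7, 4, seed=17)
print(sorted(dict(G.degree()).items()))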

This code block will produce graphs whose degree distributions show the power-law relation, in contrast to the WS graphs:

../_images/BAModel.png

2.10.1.5. Explanatory Models

../_images/ExplanatoryModels.png

The image visualizes the Explanatory Model for complex systems in general and shows how each element is linked.

A system exhibits observable behaviours, and to reason about them we need an abstraction of the system: a model whose elements correspond to elements of the system. The model exhibits behaviour which is analogous to the observations. All in all, the system exhibits observables; because the model is similar to the system, the model exhibits behaviour; and that behaviour is ultimately similar to the observables.

2.10.1.6. Too Many Explanations?

  • The small-world phenomenon is the principle that we are all linked by short chains of acquaintances

  • WS and BA models both exhibit elements of small-world behaviour

  • The WS model suggests that social networks are “small” because they include both strongly-connected clusters and “weak ties” that connect clusters

  • The BA model suggests that social networks are small because they include nodes with high degree that act as hubs, and that hubs grow, over time, due to preferential attachment

2.10.1.7. Summary:

  • The lecture’s first topic was comparing the WS model against the Facebook data

  • This led to the class learning that the WS model matched the data well until the standard deviation of the degree was compared

  • The BA model was introduced to compare with the WS model, since the BA model gives a better match for the standard deviation of the degree

  • The mean clustering coefficient, which measures the degree to which nodes cluster, of the BA model graph was not nearly as close to the Facebook data as the WS model’s was

  • None of the 3 models used in the example was perfect, as each model had outliers within its results, as seen in the table below (a rough sketch of how such a comparison can be reproduced follows the table):

../_images/SummaryTable.png
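
A rough sketch of how such a comparison can be reproduced with networkx. The graph parameters below (the rewiring probability, the seed, and the use of k // 2 edges per new BA node) are placeholder choices, not the values used in the lecture:

import numpy as np
import networkx as nx

def degree_stats(G):
    """Mean degree, standard deviation of the degree, and mean clustering."""
    degrees = [G.degree(v) for v in G]
    return np.mean(degrees), np.std(degrees), nx.average_clustering(G)

# SNAP Facebook edge list (assumed to be in the working directory).
fb = nx.read_edgelist('facebook_combined.txt', nodetype=int)
n = len(fb)
k = int(round(np.mean([fb.degree(v) for v in fb])))  # roughly match the mean degree

ws = nx.watts_strogatz_graph(n, k, p=0.05, seed=15)  # p=0.05 is a placeholder value
ba = nx.barabasi_albert_graph(n, k // 2, seed=15)    # each new node adds k // 2 edges

for name, G in [('Facebook', fb), ('WS', ws), ('BA', ba)]:
    print(name, degree_stats(G))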