Merge pull request #1 from networkx/main

Fetching upsteam changes to notebook-development branch
networkx · Oct 18, 2021 · 610e159 · 610e159
2 parents f14137e + 49cbbfa
commit 610e159
Show file tree

Hide file tree

Showing 23 changed files with 2,606 additions and 80 deletions.
diff --git a/.circleci/config.yml b/.circleci/config.yml
@@ -22,7 +22,7 @@ jobs:
           command: |
             python3 -m venv venv
             source venv/bin/activate
-            pip install --upgrade pip wheel setuptools
+            pip install --upgrade wheel setuptools pip
             pip install -r requirements.txt
 
       - run:

diff --git a/.github/workflows/notebooks.yml b/.github/workflows/notebooks.yml
@@ -37,7 +37,7 @@ jobs:
   
     - name: Install dependencies
       run: |
-        pip install --upgrade pip
+        pip install pip==21.1.1
         pip install -r requirements.txt
     - name: Test with nbval
       run: |

diff --git a/content/algorithms/assortativity/correlation.md b/content/algorithms/assortativity/correlation.md
@@ -0,0 +1,265 @@
+---
+jupytext:
+  notebook_metadata_filter: all
+  text_representation:
+    extension: .md
+    format_name: myst
+    format_version: 0.13
+    jupytext_version: 1.11.2
+kernelspec:
+  display_name: Python 3
+  language: python
+  name: python3
+language_info:
+  codemirror_mode:
+    name: ipython
+    version: 3
+  file_extension: .py
+  mimetype: text/x-python
+  name: python
+  nbconvert_exporter: python
+  pygments_lexer: ipython3
+  version: 3.8.5
+---
+
+# Node assortativity coefficients and correlation measures
+
+In this tutorial, we will go through the theory of [assortativity](https://en.wikipedia.org/wiki/Assortativity) and its measures.
+
+Specifically, we'll focus on assortativity measures available in NetworkX at [algorithms/assortativity/correlation.py](https://github.com/networkx/networkx/blob/main/networkx/algorithms/assortativity/correlation.py):
+* Attribute assortativity
+* Numeric assortativity
+* Degree assortativity
+
+as well as mixing matrices, which are closely releated to assortativity measures.
+
+## Assortativity
+
+Assortativity in a network refers to the tendency of nodes to connect with
+other 'similar' nodes over 'dissimilar' nodes.
+
+Here we say that two nodes are 'similar' with respect to a property if they have the same value of that property. Properties can be any structural properties like the degree of a node to other properties like weight, or capacity.
+
+Based on these properties we can have a different measure of assortativity for the network.
+On the other hand, we can also have disassortativity, in which case nodes tend
+to connect to dissimilar nodes over similar nodes.
+
+### Assortativity coefficients
+
+Let's say we have a network $N$, $N = (V, E)$  where $V$ is the set of nodes in the network and $E$ is the set of edges/directed edges in the network.
+In addition, $P(v)$ represents a property for each node $v$.
+
+#### Mixing matrix
+
+Let the property $P(v)$ take $P[0],P[1],...P[k-1]$ distinct values on the network,
+then the **mixing matrix** is matrix $M$ such that $M[i][j]$ represents the number of edges from
+nodes with property $P[i]$ to $P[j]$.
+We can normalize mixing matrix by diving by total number of ordered edges i.e.
+$ e = \frac{M}{|E|}$.
+
+Now define,
+
+$a[i]=$ proportion of edges $(u,v)$ such that $P(u)=P[i]$
+
+$$ a[i] = \sum\limits_{j}e[i][j] $$
+
+$b[i]=$ proportion of edges $(u,v)$ such that $P(v)=P[i]$
+
+$$ b[i] = \sum\limits_{j}e[j][i]$$
+
+in Python code it would look something like `a = e.sum(axis=0)` and `b = e.sum(axis=1)`
+
+Finally, let $\sigma_a$ and $\sigma_b$ represent the standard deviation of
+$\{\ P[i]\cdot a[i]\ |\ i \in 0...k-1\}$ and $\{ P[i]\cdot b[i]\ |\ i \in 0...k-1\}$
+respectively.
+
+Then we can define the assortativity coefficient for this property based on the
+Pearson correlation coefficient.
+
+#### Attribute Assortativity Coefficient
+
+Here the property $P(v)$ is a nominal property assigned to each node.
+As defined above we calculate the normalized mixing matrix $e$ and from that we
+define the attribute assortativity coefficient [^1] as below.
+
+From here onwards we will use subscript notation to denote indexing, for eg. $P_i = P[i]$ and $e_{ij} = e[i][j]$
+
+$$ r = \frac{\sum\limits_{i}e_{ii} - \sum\limits_{i}a_{i}b_{i}}{1-\sum\limits_{i}a_{i}b_{i}} = \frac{Trace(e) - ||e^2||}{1-||e^2||}$$
+
+It is implemented as `attribute_assortativity_coefficient`.
+
+#### Numeric Assortativity Coefficient
+
+Here the property $P(v)$ is a numerical property assigned to each
+node and the definition of the normalized mixing
+matrix $e$, $\sigma_a$, and $\sigma_b$ are same as above.
+From these we define numeric assortativity coefficient [^1] as below.
+
+$$ r = \frac{\sum\limits_{i,j}P_i P_j(e_{ij} -a_i b_j)}{\sigma_a\sigma_b} $$
+
+It is implemented as `numeric_assortativity_coefficient`.
+
+#### Degree Assortativity Coefficient
+
+When it comes to measuring degree assortativity for directed networks we have
+more options compared to assortativity w.r.t a property because we have 2 types
+of degrees, namely in-degree and out-degree.
+Based on the 2 types of degrees we can measure $2 \times 2 =4$ different types
+of degree assortativity [^2]:
+
+1. r(in,in) : Measures tendency of having a directed edge (u,v) such that, in-degree(u) = in-degree(v).
+2. r(in,out) : Measures tendency of having a directed edge (u,v) such that, in-degree(u) = out-degree(v).
+3. r(out,in) : Measures tendency of having a directed edge (u,v) such that, out-degree(u) = in-degree(v).
+4. r(out,out) : Measures tendency of having a directed edge (u,v) such that, out-degree(u) = out-degree(v).
+
+Note: If the network is undirected all the 4 types of degree assortativity are the same.
+
+To define the degree assortativity coefficient for all 4 types we need slight
+modification in the definition of $P[i]$ and $e$, and the definations of
+$\sigma_a$ and $\sigma_b$ remain the same.
+
+Let $x,y \in \{in,out\}$. The property $P(\cdot)$ takes distinct values from
+the union of the values taken by $x$-degree$(\cdot)$ and $y$-degree$(\cdot)$,
+and $e_{i,j}$ is the proportion of directed edges $(u,v)$ with $x$-degree$(u) = P_i$
+and $y$-degree$(v) = P_j$.
+
+$$ r(x,y) = \frac{\sum\limits_{i,j}P_i P_j(e_{ij} -a_i b_j)}{\sigma_a\sigma_b} $$
+
+It is implemented as `degree_assortativity_coefficient` and
+`degree_pearson_correlation_coefficient`. The latter function uses
+`scipy.stats.pearsonr` to calculate the assortativity coefficient which makes
+it potentally faster.
+
+## Example
+
+```{code-cell} ipython3
+%matplotlib inline
+import networkx as nx
+import matplotlib.pyplot as plt
+import pickle
+import copy
+import random
+import warnings
+warnings.filterwarnings("ignore")
+```
+
+Illustrating how value of assortativity changes
+
+```{code-cell} ipython3
+gname = "g2"
+# loading the graph
+G = nx.read_graphml(f"data/{gname}.graphml")
+with open(f"data/pos_{gname}", "rb") as fp:
+    pos = pickle.load(fp)
+```
+
+```{code-cell} ipython3
+fig, axes = plt.subplots(4, 2, figsize=(20, 20))
+
+# assign colors and labels to nodes based on their 'cluster' and 'num_prop' property
+node_colors = ["orange" if G.nodes[u]["cluster"] == "K5" else "cyan" for u in G.nodes]
+node_labels = {u: G.nodes[u]["num_prop"] for u in G.nodes}
+
+for i in range(8):
+    g = nx.read_graphml(f"data/{gname}_{i}.graphml")
+
+    # calculating the assortativity coefficients wrt different proeprties
+    cr = nx.attribute_assortativity_coefficient(g, "cluster")
+    r_in_out = nx.degree_assortativity_coefficient(g, x="in", y="out")
+    nr = nx.numeric_assortativity_coefficient(g, "num_prop")
+
+    # drawing the network
+    nx.draw_networkx_nodes(
+        g, pos=pos, node_size=300, ax=axes[i // 2][i % 2], node_color=node_colors
+    )
+    nx.draw_networkx_labels(g, pos=pos, labels=node_labels, ax=axes[i // 2][i % 2])
+    nx.draw_networkx_edges(g, pos=pos, ax=axes[i // 2][i % 2], edge_color="0.7")
+    axes[i // 2][i % 2].set_title(
+        f"Attribute assortativity coefficient = {cr:.3}\nNumeric assortativity coefficient = {nr:.3}\nr(in,out) = {r_in_out:.3}",
+        size=15,
+    )
+
+fig.tight_layout()
+```
+
+Nodes are colored by the `cluster` property and labeled by `num_prop` property.
+We can observe that the initial network on the left side is completely assortative
+and its complement on right side is completely disassortative.
+As we add edges between nodes of different (similar) attributes in the assortative
+(disassortative) network, the network tends to a non-assortative network and
+value of both the assortativity coefficients tends to $0$.
+
++++
+
+The parameter `nodes` in `attribute_assortativity_coefficient` and
+`numeric_assortativity_coefficient` specifies the nodes whose edges are to be
+considered in the mixing matrix calculation.
+That is to say, if $(u,v)$ is a directed edge then the edge $(u,v)$ will be
+used in mixing matrix calculation if $u$ is in `nodes`.
+For the undirected case, it's considered if atleast one of the $u,v$ in in `nodes`.
+
+The `nodes` parameter is interpreted differently in `degree_assortativity_coefficient` and
+`degree_pearson_correlation_coefficient`, where it specifies the nodes forming a subgraph
+whose edges are considered in the mixing matrix calculation.
+
+```{code-cell} ipython3
+# list of nodes to consider for the i'th network in the example
+# Note: passing 'None' means to consider all the nodes
+nodes_list = [
+    None,
+    [str(i) for i in range(3)],
+    [str(i) for i in range(4)],
+    [str(i) for i in range(5)],
+    [str(i) for i in range(4, 8)],
+    [str(i) for i in range(5, 10)],
+]
+fig, axes = plt.subplots(3, 2, figsize=(20, 16))
+
+
+def color_node(u, nodes):
+    """Utility function to give the color of a node based on its attribute"""
+    if u not in nodes:
+        return "0.85"
+    if G.nodes[u]["cluster"] == "K5":
+        return "orange"
+    else:
+        return "cyan"
+
+
+# adding a edge to show edge cases
+G.add_edge("4", "5")
+
+for nodes, ax in zip(nodes_list, axes.ravel()):
+    # calculating the value of assortativity
+    cr = nx.attribute_assortativity_coefficient(G, "cluster", nodes=nodes)
+    nr = nx.numeric_assortativity_coefficient(G, "num_prop", nodes=nodes)
+
+    # drawing network
+    ax.set_title(
+        f"Attribute assortativity coefficient: {cr:.3}\nNumeric assortativity coefficient: {nr:.3}\nNodes = {nodes}",
+        size=15,
+    )
+    if nodes is None:
+        nodes = [u for u in G.nodes()]
+    node_colors = [color_node(u, nodes) for u in G.nodes]
+    nx.draw_networkx_nodes(G, pos=pos, node_size=450, ax=ax, node_color=node_colors)
+    nx.draw_networkx_labels(G, pos, labels={u: u for u in G.nodes}, font_size=15, ax=ax)
+    nx.draw_networkx_edges(
+        G,
+        pos=pos,
+        edgelist=[(u, v) for u, v in G.edges if u in nodes],
+        ax=ax,
+        edge_color="0.3",
+    )
+fig.tight_layout()
+```
+
+In the above plots only the nodes which are considered are colored and rest are
+grayed out and only the edges which are considerd in the assortaivty calculation
+are drawn.
+
++++
+
+[^1]: M. E. J. Newman, Mixing patterns in networks <https://doi.org/10.1103/PhysRevE.67.026126>
+
+[^2]: Foster, J.G., Foster, D.V., Grassberger, P. & Paczuski, M. Edge direction and the structure of networks <https://doi.org/10.1073/pnas.0912671107>