Merge pull request #1 from networkx/main
Fetching upsteam changes to notebook-development branch
stef4k authored Oct 18, 2021
2 parents f14137e + 49cbbfa commit 610e159
Showing 23 changed files with 2,606 additions and 80 deletions.
2 changes: 1 addition & 1 deletion .circleci/config.yml
@@ -22,7 +22,7 @@ jobs:
command: |
python3 -m venv venv
source venv/bin/activate
-pip install --upgrade pip wheel setuptools
+pip install --upgrade wheel setuptools pip
pip install -r requirements.txt
- run:
2 changes: 1 addition & 1 deletion .github/workflows/notebooks.yml
@@ -37,7 +37,7 @@ jobs:
- name: Install dependencies
run: |
-pip install --upgrade pip
+pip install pip==21.1.1
pip install -r requirements.txt
- name: Test with nbval
run: |
265 changes: 265 additions & 0 deletions content/algorithms/assortativity/correlation.md
@@ -0,0 +1,265 @@
---
jupytext:
  notebook_metadata_filter: all
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.11.2
kernelspec:
  display_name: Python 3
  language: python
  name: python3
language_info:
  codemirror_mode:
    name: ipython
    version: 3
  file_extension: .py
  mimetype: text/x-python
  name: python
  nbconvert_exporter: python
  pygments_lexer: ipython3
  version: 3.8.5
---

# Node assortativity coefficients and correlation measures

In this tutorial, we will go through the theory of [assortativity](https://en.wikipedia.org/wiki/Assortativity) and its measures.

Specifically, we'll focus on assortativity measures available in NetworkX at [algorithms/assortativity/correlation.py](https://github.com/networkx/networkx/blob/main/networkx/algorithms/assortativity/correlation.py):
* Attribute assortativity
* Numeric assortativity
* Degree assortativity

as well as mixing matrices, which are closely related to assortativity measures.

## Assortativity

Assortativity in a network refers to the tendency of nodes to connect with
other 'similar' nodes over 'dissimilar' nodes.

Here we say that two nodes are 'similar' with respect to a property if they have the same value of that property. Properties can range from structural ones, such as the degree of a node, to other attributes such as weight or capacity.

Each such property gives rise to a different measure of assortativity for the network.
On the other hand, we can also have disassortativity, in which case nodes tend
to connect to dissimilar nodes over similar nodes.

### Assortativity coefficients

Let's say we have a network $N$, $N = (V, E)$ where $V$ is the set of nodes in the network and $E$ is the set of edges/directed edges in the network.
In addition, $P(v)$ represents a property for each node $v$.

#### Mixing matrix

Let the property $P(v)$ take $k$ distinct values $P[0], P[1], \ldots, P[k-1]$ on the network;
then the **mixing matrix** is the matrix $M$ such that $M[i][j]$ represents the number of edges from
nodes with property $P[i]$ to nodes with property $P[j]$.
We can normalize the mixing matrix by dividing by the total number of ordered edges, i.e.
$e = \frac{M}{|E|}$.

Now define,

$a[i]=$ proportion of edges $(u,v)$ such that $P(u)=P[i]$

$$ a[i] = \sum\limits_{j}e[i][j] $$

$b[i]=$ proportion of edges $(u,v)$ such that $P(v)=P[i]$

$$ b[i] = \sum\limits_{j}e[j][i]$$

In Python code, with `e` stored as a NumPy array, this would look something like `a = e.sum(axis=1)` (row sums) and `b = e.sum(axis=0)` (column sums).
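
As a concrete illustration of these definitions, the sketch below builds the mixing matrix for a small, made-up directed network and then computes $e$, $a$ and $b$ with NumPy. The edge list, the property values and all variable names are assumptions chosen for this example; they are not part of NetworkX.

```python
import numpy as np

# Hypothetical toy network: node -> nominal property value
prop = {0: "x", 1: "x", 2: "y", 3: "y"}
# Directed edges (u, v)
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]

# Map each distinct property value P[i] to its index i
values = sorted(set(prop.values()))
index = {p: i for i, p in enumerate(values)}

# M[i][j] counts the edges from nodes with value P[i] to nodes with value P[j]
M = np.zeros((len(values), len(values)))
for u, v in edges:
    M[index[prop[u]], index[prop[v]]] += 1

e = M / M.sum()    # normalized mixing matrix
a = e.sum(axis=1)  # a[i]: proportion of edges that start at a P[i] node
b = e.sum(axis=0)  # b[i]: proportion of edges that end at a P[i] node

print(M)
print(a, b)
```

In this toy network three of the five edges start at an `"x"` node and two end at one, so `a[0]` is 0.6 and `b[0]` is 0.4.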

Finally, let $\sigma_a$ and $\sigma_b$ represent the standard deviations of the
distributions $a$ and $b$ over the property values, i.e.
$\sigma_a^2 = \sum_i a[i]P[i]^2 - \left(\sum_i a[i]P[i]\right)^2$,
and similarly for $\sigma_b$ with $b$ in place of $a$.

Then we can define the assortativity coefficient for this property based on the
Pearson correlation coefficient.

#### Attribute Assortativity Coefficient

Here the property $P(v)$ is a nominal property assigned to each node.
As defined above we calculate the normalized mixing matrix $e$ and from that we
define the attribute assortativity coefficient [^1] as below.

From here onwards we will use subscript notation to denote indexing, e.g. $P_i = P[i]$ and $e_{ij} = e[i][j]$.

$$ r = \frac{\sum\limits_{i}e_{ii} - \sum\limits_{i}a_{i}b_{i}}{1-\sum\limits_{i}a_{i}b_{i}} = \frac{\operatorname{Tr}(e) - ||e^2||}{1-||e^2||}$$

where $||e^2||$ denotes the sum of all elements of the matrix $e^2$.

It is implemented as `attribute_assortativity_coefficient`.
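
A minimal sketch of how the formula relates to the NetworkX function, assuming a small made-up directed graph (the node labels and the `cluster` values are hypothetical). It computes $r$ from the normalized mixing matrix returned by `nx.attribute_mixing_matrix` and compares the result with `nx.attribute_assortativity_coefficient`.

```python
import networkx as nx
import numpy as np

# Hypothetical toy graph with a nominal "cluster" attribute on each node
G = nx.DiGraph([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])
nx.set_node_attributes(G, {0: "x", 1: "x", 2: "y", 3: "y"}, name="cluster")

# Normalized mixing matrix e (its entries sum to 1)
e = nx.attribute_mixing_matrix(G, "cluster", normalized=True)
a = e.sum(axis=1)  # row sums
b = e.sum(axis=0)  # column sums

# r = (sum_i e_ii - sum_i a_i b_i) / (1 - sum_i a_i b_i)
ab = (a * b).sum()
r_manual = (np.trace(e) - ab) / (1 - ab)

# sum_i a_i b_i equals the sum of all entries of the matrix product e @ e
ab_from_square = (e @ e).sum()

r_nx = nx.attribute_assortativity_coefficient(G, "cluster")
print(r_manual, r_nx)
```

The value of $r$ does not depend on the order in which the attribute values are assigned to rows and columns, so no explicit `mapping` is passed here.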

#### Numeric Assortativity Coefficient

Here the property $P(v)$ is a numerical property assigned to each
node, and the definitions of the normalized mixing
matrix $e$, $\sigma_a$, and $\sigma_b$ are the same as above.
From these we define the numeric assortativity coefficient [^1] as below.

$$ r = \frac{\sum\limits_{i,j}P_i P_j(e_{ij} -a_i b_j)}{\sigma_a\sigma_b} $$

It is implemented as `numeric_assortativity_coefficient`.
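
The formula can be evaluated by hand on a small example. The sketch below uses a made-up directed edge list and made-up numeric values, and it reads $\sigma_a$ and $\sigma_b$ as the standard deviations of the distributions $a$ and $b$ over the values $P_i$ (an assumption about the intended definition, following [^1]). Under that reading, $r$ coincides with the Pearson correlation of the property values at the two ends of each edge, which the last lines use as a cross-check.

```python
import numpy as np

# Hypothetical toy network: node -> numeric property value
prop = {0: 1, 1: 2, 2: 2, 3: 3, 4: 1}
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 3), (1, 3)]

# Distinct property values P[0..k-1] and their indices
values = sorted(set(prop.values()))
index = {p: i for i, p in enumerate(values)}
P = np.array(values, dtype=float)

# Normalized mixing matrix e
e = np.zeros((len(values), len(values)))
for u, v in edges:
    e[index[prop[u]], index[prop[v]]] += 1
e /= e.sum()

a = e.sum(axis=1)
b = e.sum(axis=0)

# Standard deviations of the distributions a and b over the values P
sigma_a = np.sqrt((a * P**2).sum() - (a * P).sum() ** 2)
sigma_b = np.sqrt((b * P**2).sum() - (b * P).sum() ** 2)

# r = sum_ij P_i P_j (e_ij - a_i b_j) / (sigma_a * sigma_b)
r = (np.outer(P, P) * (e - np.outer(a, b))).sum() / (sigma_a * sigma_b)

# Cross-check: Pearson correlation of the values at the two ends of each edge
x = [prop[u] for u, v in edges]
y = [prop[v] for u, v in edges]
r_pearson = np.corrcoef(x, y)[0, 1]
print(r, r_pearson)
```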

#### Degree Assortativity Coefficient

When it comes to measuring degree assortativity for directed networks, we have
more options than for assortativity with respect to a node property, because
there are two types of degree: in-degree and out-degree.
Based on these two types of degree we can measure $2 \times 2 = 4$ different types
of degree assortativity [^2]:

1. r(in,in) : Measures tendency of having a directed edge (u,v) such that, in-degree(u) = in-degree(v).
2. r(in,out) : Measures tendency of having a directed edge (u,v) such that, in-degree(u) = out-degree(v).
3. r(out,in) : Measures tendency of having a directed edge (u,v) such that, out-degree(u) = in-degree(v).
4. r(out,out) : Measures tendency of having a directed edge (u,v) such that, out-degree(u) = out-degree(v).

Note: If the network is undirected, all four types of degree assortativity are the same.
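
One way to make the four measures concrete is to read each $r(x,y)$ as the Pearson correlation between the $x$-degree of the source and the $y$-degree of the target, taken over all directed edges. The sketch below does this with NumPy only; the edge list and the helper name `degree_corr` are made up for illustration.

```python
import numpy as np

# Hypothetical directed edge list
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (3, 0), (3, 1)]

# Count in- and out-degrees from the edge list
in_deg, out_deg = {}, {}
for u, v in edges:
    out_deg[u] = out_deg.get(u, 0) + 1
    in_deg[v] = in_deg.get(v, 0) + 1
degree = {"in": in_deg, "out": out_deg}


def degree_corr(x, y):
    """Pearson correlation of x-degree(u) and y-degree(v) over edges (u, v)."""
    xs = [degree[x].get(u, 0) for u, v in edges]
    ys = [degree[y].get(v, 0) for u, v in edges]
    return np.corrcoef(xs, ys)[0, 1]


for x in ("in", "out"):
    for y in ("in", "out"):
        print(f"r({x},{y}) = {degree_corr(x, y):.3f}")
```

For this edge list the four values differ from one another, which is the point of distinguishing them in the directed case.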

To define the degree assortativity coefficient for all four types we need a slight
modification of the definitions of $P[i]$ and $e$; the definitions of
$\sigma_a$ and $\sigma_b$ remain the same.

Let $x,y \in \{in,out\}$. The property $P(\cdot)$ takes distinct values from
the union of the values taken by $x$-degree$(\cdot)$ and $y$-degree$(\cdot)$,
and $e_{i,j}$ is the proportion of directed edges $(u,v)$ with $x$-degree$(u) = P_i$
and $y$-degree$(v) = P_j$.

$$ r(x,y) = \frac{\sum\limits_{i,j}P_i P_j(e_{ij} -a_i b_j)}{\sigma_a\sigma_b} $$

It is implemented as `degree_assortativity_coefficient` and
`degree_pearson_correlation_coefficient`. The latter function uses
`scipy.stats.pearsonr` to calculate the assortativity coefficient, which makes
it potentially faster.
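
As a quick check that the two functions agree, the sketch below evaluates both on a small made-up directed graph for every combination of `x` and `y`. The graph and the `results` dictionary are assumptions for illustration, and `degree_pearson_correlation_coefficient` needs SciPy to be installed.

```python
import networkx as nx

# Hypothetical directed graph
G = nx.DiGraph([(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (3, 0), (3, 1)])

results = {}
for x in ("in", "out"):
    for y in ("in", "out"):
        # computed from the degree mixing matrix
        r_mix = nx.degree_assortativity_coefficient(G, x=x, y=y)
        # computed with scipy.stats.pearsonr (requires SciPy)
        r_pearson = nx.degree_pearson_correlation_coefficient(G, x=x, y=y)
        results[(x, y)] = (r_mix, r_pearson)
        print(f"r({x},{y}): {r_mix:.4f} (mixing matrix), {r_pearson:.4f} (pearsonr)")
```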

## Example

```{code-cell} ipython3
%matplotlib inline
import networkx as nx
import matplotlib.pyplot as plt
import pickle
import copy
import random
import warnings
warnings.filterwarnings("ignore")
```

The following example illustrates how the values of the assortativity coefficients change as edges are added to a network.

```{code-cell} ipython3
gname = "g2"
# loading the graph
G = nx.read_graphml(f"data/{gname}.graphml")
with open(f"data/pos_{gname}", "rb") as fp:
    pos = pickle.load(fp)
```

```{code-cell} ipython3
fig, axes = plt.subplots(4, 2, figsize=(20, 20))

# assign colors and labels to nodes based on their 'cluster' and 'num_prop' property
node_colors = ["orange" if G.nodes[u]["cluster"] == "K5" else "cyan" for u in G.nodes]
node_labels = {u: G.nodes[u]["num_prop"] for u in G.nodes}

for i in range(8):
    g = nx.read_graphml(f"data/{gname}_{i}.graphml")

    # calculating the assortativity coefficients wrt different properties
    cr = nx.attribute_assortativity_coefficient(g, "cluster")
    r_in_out = nx.degree_assortativity_coefficient(g, x="in", y="out")
    nr = nx.numeric_assortativity_coefficient(g, "num_prop")

    # drawing the network
    nx.draw_networkx_nodes(
        g, pos=pos, node_size=300, ax=axes[i // 2][i % 2], node_color=node_colors
    )
    nx.draw_networkx_labels(g, pos=pos, labels=node_labels, ax=axes[i // 2][i % 2])
    nx.draw_networkx_edges(g, pos=pos, ax=axes[i // 2][i % 2], edge_color="0.7")
    axes[i // 2][i % 2].set_title(
        f"Attribute assortativity coefficient = {cr:.3}\nNumeric assortativity coefficient = {nr:.3}\nr(in,out) = {r_in_out:.3}",
        size=15,
    )

fig.tight_layout()
```

Nodes are colored by the `cluster` property and labeled by the `num_prop` property.
We can observe that the initial network on the left side is completely assortative
and its complement on the right side is completely disassortative.
As we add edges between nodes of different (similar) attributes in the assortative
(disassortative) network, the network tends toward a non-assortative network and
the values of both assortativity coefficients tend to $0$.

+++

The parameter `nodes` in `attribute_assortativity_coefficient` and
`numeric_assortativity_coefficient` specifies the nodes whose edges are to be
considered in the mixing matrix calculation.
That is to say, if $(u,v)$ is a directed edge then the edge $(u,v)$ will be
used in the mixing matrix calculation if $u$ is in `nodes`.
For the undirected case, the edge is considered if at least one of $u, v$ is in `nodes`.
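
The directed case can be checked on a small made-up graph by looking at the unnormalized mixing matrix with and without `nodes`. This is only a sketch: the graph, the `cluster` values and the `mapping` used to fix the row and column order are assumptions for illustration.

```python
import networkx as nx

# Hypothetical toy graph with a nominal "cluster" attribute on each node
G = nx.DiGraph([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])
nx.set_node_attributes(G, {0: "x", 1: "x", 2: "y", 3: "y"}, name="cluster")
mapping = {"x": 0, "y": 1}  # fix the row/column order of the matrix

# Edge counts using every edge
M_all = nx.attribute_mixing_matrix(G, "cluster", mapping=mapping, normalized=False)

# Edge counts using only the edges that start at node 0 or node 1
M_sub = nx.attribute_mixing_matrix(
    G, "cluster", nodes=[0, 1], mapping=mapping, normalized=False
)

print(M_all)
print(M_sub)
```

With `nodes=[0, 1]` the edges `(2, 3)` and `(3, 0)` are dropped because their source nodes are not in `nodes`, so the second row of `M_sub` is all zeros.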

The `nodes` parameter is interpreted differently in `degree_assortativity_coefficient` and
`degree_pearson_correlation_coefficient`, where it specifies the nodes forming a subgraph
whose edges are considered in the mixing matrix calculation.

```{code-cell} ipython3
# list of nodes to consider for the i'th network in the example
# Note: passing 'None' means to consider all the nodes
nodes_list = [
    None,
    [str(i) for i in range(3)],
    [str(i) for i in range(4)],
    [str(i) for i in range(5)],
    [str(i) for i in range(4, 8)],
    [str(i) for i in range(5, 10)],
]
fig, axes = plt.subplots(3, 2, figsize=(20, 16))


def color_node(u, nodes):
    """Utility function to give the color of a node based on its attribute"""
    if u not in nodes:
        return "0.85"
    if G.nodes[u]["cluster"] == "K5":
        return "orange"
    else:
        return "cyan"


# adding an edge to show edge cases
G.add_edge("4", "5")

for nodes, ax in zip(nodes_list, axes.ravel()):
    # calculating the value of assortativity
    cr = nx.attribute_assortativity_coefficient(G, "cluster", nodes=nodes)
    nr = nx.numeric_assortativity_coefficient(G, "num_prop", nodes=nodes)

    # drawing network
    ax.set_title(
        f"Attribute assortativity coefficient: {cr:.3}\nNumeric assortativity coefficient: {nr:.3}\nNodes = {nodes}",
        size=15,
    )
    if nodes is None:
        nodes = [u for u in G.nodes()]
    node_colors = [color_node(u, nodes) for u in G.nodes]
    nx.draw_networkx_nodes(G, pos=pos, node_size=450, ax=ax, node_color=node_colors)
    nx.draw_networkx_labels(G, pos, labels={u: u for u in G.nodes}, font_size=15, ax=ax)
    nx.draw_networkx_edges(
        G,
        pos=pos,
        edgelist=[(u, v) for u, v in G.edges if u in nodes],
        ax=ax,
        edge_color="0.3",
    )

fig.tight_layout()
```

In the above plots, only the nodes that are considered are colored and the rest are
grayed out, and only the edges that are considered in the assortativity calculation
are drawn.

+++

[^1]: M. E. J. Newman, Mixing patterns in networks <https://doi.org/10.1103/PhysRevE.67.026126>

[^2]: Foster, J.G., Foster, D.V., Grassberger, P. & Paczuski, M. Edge direction and the structure of networks <https://doi.org/10.1073/pnas.0912671107>