Supplementary data for the paper On the Structural Properties of Social Networks and their Measurement-calibrated Synthetic Counterparts - M. Nagy, R. Molontay (2019)
@inproceedings{nagy2019structural,
title={On the Structural Properties of Social Networks and their Measurement-calibrated Synthetic Counterparts},
author={Nagy, Marcell and Molontay, Roland},
booktitle={2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)},
pages={584--588},
year={2019},
organization={IEEE}
}
The graphs are collected from the following sources:
- Network Repository,
- The Colorado Index of Complex Networks (ICON),
- KONECT - The Koblenz Network Collection
Domain | Description | Range of network size | Number of networks |
---|---|---|---|
Friendship | Online social networks of friendships (mostly Facebook) |
324-23,613 (median: 4,065) |
58 |
Communication | Retweet, email and reply networks |
96-33,696 (avg: 4,687 |
43 |
Collaboration | Co-authorship and collaboration networks (mostly scientific) |
86-21,363 (median: 553) |
19 |
The data folder contains the spreadsheets which contain the calculated metrics of the 120 real networks.
The calculated metrics are the following:
- Assortativity (the Pearson correlation coefficient of degree between pairs of linked nodes)
The assortativity coefficient is given bywhere the term
is the mass function of the distribution of the remaining degrees (degree of the nodes minus one) and j and k indicate the remaining degrees. Furthermore,
refers to the mass function of the joint probability distribution of the remaining degrees of the two vertices. Finally,
denotes the variance of the remaining degree distribution with mass function
i.e.
- Average clustering coefficient (the average local clustering coefficient. The local clustering coefficient of a node quantifies how close its neighbours are to being a clique)
The local clustering coefficient of vertex v is the fraction of pairs of neighbors of v that are connected over all pairs of neighbors of v. Formally:where
is the neighbourhood of the node v i.e. the vertices adjacent to v.
The average (local) clustering coefficient of a G graph is defined as:
where n is the size of the graph.
- Average degree (the mean of the degrees)
- Average path length (average number of steps along the shortest paths for all possible pairs of nodes)
A path is a sequence of edges which connect a sequence of vertices. The distancebetween the vertices u and v is the length (number of edges) of the shortest path connecting them. The
average path length of a graph G of size n is defined as:
- Density (measures how dense the graph is)
Graph density D is the ratio of the number of edges of the graph divided by the number of edges of a complete graph with the same number of vertices, i.e: - Global clustering coefficient (the number of closed triplets over the total number of triplets (both open and closed)),
- Four interval degree probabilities introduced in this paper (quantifies the degree distribtuion of the graph)
- Largest eigenvector centrality (eigenvector centrality is a measure of the influence of a node in a network),
- Maximum degree (the maximum of the degrees)
For a graph, the vector of eigenvector centralitiessatisfies the eigenvector equation
, where
is the largest eigenvalue of the graph's adjacency matrix
. In other words, for a connected undirected graph, the vector of eigenvector centralities is given by the (suitably normalized) eigenvector of corresponding to the largest eigenvalue of the adjacency matrix.
- Maximum vertex betweenness centrality (betweenness centrality is a measure of centrality in a graph based on shortest paths)
The betweenness centrality of a node v is given by the expression:where
is the total number of shortest paths from node s to node t and
is the number of those paths that pass through v.
- Maximum edge betweenness centrality (betweenness centrality is a measure of centrality in a graph based on shortest paths)
The edge betweenness centrality of an edge is the number of shortest paths between pairs of vertices that run along it. In other words this is analogous to the previously defined, but here we consider an edge instead of a node.
- Number of edges
- Number of nodes (size of the network)
- Pseudo diameter (diameter is the greatest distance between any pair of vertices. Pseudo diameter is an approximation of the diameter)
The correlation heatmaps of the metrics together with some analysis can be found here.
A detailed description of the dataset and the metrics can be found in Data-driven Analysis of Complex Networks and their Model-generated Counterparts