
Decentralised NAT Punch-Through Simulation #372

Closed

Conversation

emmacasolin
Contributor

Description

Decentralised NAT punch-through requires all nodes to be part of a connected network, such that each node can always be reached by following connections across the network, without the need for centralised seed nodes. We want to achieve this while minimising the number of active connections that must be maintained across the network. We can prototype this through simulations and visualisations, using tools such as ngraph (for creating graph structures) and d3 (for visualisation).

Issues Fixed

Tasks

  • 1. Simulate and visualise the node graph using ngraph/d3, allowing for rapid prototyping
  • 2. Determine the most efficient arrangement of connections to create a fully connected graph (preferably supported with logical proof)

@emmacasolin
Contributor Author

The script in test-graph.ts can be used to create a graph of nodes. It uses a modified version of the script from #326 (comment) to create a set number of node IDs, which are then put into an ngraph graph. You can also set the number of "closest nodes" each node should connect to.

The script outputs the graph as JSON; to visualise it, you can use https://observablehq.com/@d3/force-directed-graph#ForceGraph, which will display the JSON file as a force-directed graph.
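For reference, a minimal sketch of this kind of export (my own illustration, not the actual test-graph.ts; the connection rule here is a placeholder):

import createGraph from 'ngraph.graph';

// Build a toy graph: 256 nodes with a placeholder connection rule
const graph = createGraph();
for (let i = 0; i < 256; i++) graph.addNode(i);
for (let i = 0; i < 256; i++) graph.addLink(i, (i + 1) % 256);

// Serialise into the { nodes, links } shape the d3 ForceGraph notebook expects
const nodes: Array<{ id: number }> = [];
const links: Array<{ source: number; target: number }> = [];
graph.forEachNode((node) => {
  nodes.push({ id: node.id as number });
});
graph.forEachLink((link) => {
  links.push({ source: link.fromId as number, target: link.toId as number });
});
console.log(JSON.stringify({ nodes, links }, null, 2));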

@CMCDragonkai
Member

CMCDragonkai commented Apr 13, 2022

I can corroborate that top-K always results in clustering. Proven with:

import type { NodeId } from './src/nodes/types';
import { IdInternal } from '@matrixai/id';
import * as utils from './src/utils';
import * as nodesUtils from './src/nodes/utils';

type NodeGraph = Array<[number, NodeId, number, bigint]>;

// 1 byte node ids
function generateNodeIds(amount: number) {
  if (amount < 0 || amount > 256) throw new RangeError();
  const nodeIds: Array<NodeId> = Array.from(
    { length: amount },
    (_, i) => IdInternal.create<NodeId>(utils.bigInt2Bytes(BigInt(i), 1))
  );
  return nodeIds;
}

function calculateNodeGraph(nodeIds: Array<NodeId>, nodeId: NodeId): NodeGraph {
  // index, node ID, bucket index, distance
  const results: Array<[
    number, NodeId, number, bigint
  ]> = [];
  for (let i = 0; i < nodeIds.length; i++) {
    if (nodeId.equals(nodeIds[i])) {
      continue;
    }
    const bucketIndex = nodesUtils.bucketIndex(nodeId, nodeIds[i]);
    const distance = nodesUtils.nodeDistance(nodeId, nodeIds[i]);
    results.push([i, nodeIds[i], bucketIndex, distance]);
  }
  return results;
}

function closestNodes(nodeGraph: NodeGraph, limit: number): NodeGraph {
  const resultsSorted = [...nodeGraph].sort(([, , , distance1], [, , , distance2]) => {
    if (distance1 < distance2) return -1;
    if (distance1 > distance2) return 1;
    return 0;
  });
  const closestK = resultsSorted.slice(0, limit);
  return closestK;
}


async function main () {
  const visitedNodes = new Set<number>();
  const pendingNodes: Array<[number, NodeId]> = [];
  const nodeIds = generateNodeIds(256);

  const K = 128;

  const nodeGraph1 = calculateNodeGraph(nodeIds, nodeIds[77]);
  const closestK1 = closestNodes(nodeGraph1, K);

  for (const [index, nodeId] of closestK1) {
    pendingNodes.push([index, nodeId]);
  }

  while (pendingNodes.length > 0) {
    const [index, nodeId] = pendingNodes.shift() as [number, NodeId];
    // The pending array may contain duplicate entries; skip nodes already visited
    if (visitedNodes.has(index)) continue;
    visitedNodes.add(index);
    const nodeGraph = calculateNodeGraph(nodeIds, nodeId);
    const closestK = closestNodes(nodeGraph, K);
    for (const [index, nodeId] of closestK) {
      if (!visitedNodes.has(index)) pendingNodes.push([index, nodeId]);
    }
  }

  console.log(visitedNodes);
  console.log(visitedNodes.size);

}

main();

Adjust the K above to see how many connected nodes there are at the very end.

This assumes 1-byte NodeIds, which means 8 bits and 256 possible NodeIds.

At K = 128, you get 1 cluster of size 256. That means full connectivity.

At K < 128 you will get more than one cluster. The cluster sizes become 128, 64, 32, 16, 8, 4, 2, etc.

| Top-K | Cluster size |
| --- | --- |
| 256 | 256 |
| ... | 256 |
| 129 | 256 |
| 128 | 256 |
| 127 | 128 |
| ... | 128 |
| 64 | 128 |
| 63 | 64 |
| ... | 64 |
| 32 | 64 |
| 31 | 32 |
| ... | 32 |
| 16 | 32 |
| 15 | 16 |
| ... | 16 |
| 8 | 16 |
| 7 | 8 |
| ... | 8 |
| 4 | 8 |
| 3 | 4 |
| 2 | 4 |
| 1 | 2 |

Basically this means the top-K strategy can only ensure full connectivity if each node connects to at least half of all possible node IDs.
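A plausible explanation for the halving pattern (my own reasoning from the XOR metric, not something stated above): for any node x in an 8-bit ID space, the 127 other IDs that share x's top bit are all at XOR distance < 128, while the 128 IDs that differ in the top bit are all at distance >= 128. So any K <= 127 keeps every closest-K set entirely inside its own half of the ID space, and the same argument recurses within each half, producing exactly the 128, 64, 32, ... cluster sizes observed above.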

@CMCDragonkai
Member

Trying out the "bottom-K" strategy.

import type { NodeId } from './src/nodes/types';
import { IdInternal } from '@matrixai/id';
import * as utils from './src/utils';
import * as nodesUtils from './src/nodes/utils';

type NodeGraph = Array<[number, NodeId, number, bigint]>;

// 1 byte node ids
function generateNodeIds(amount: number) {
  if (amount < 0 || amount > 256) throw new RangeError();
  const nodeIds: Array<NodeId> = Array.from(
    { length: amount },
    (_, i) => IdInternal.create<NodeId>(utils.bigInt2Bytes(BigInt(i), 1))
  );
  return nodeIds;
}

function calculateNodeGraph(nodeIds: Array<NodeId>, nodeId: NodeId): NodeGraph {
  // index, node ID, bucket index, distance
  const results: Array<[
    number, NodeId, number, bigint
  ]> = [];
  for (let i = 0; i < nodeIds.length; i++) {
    if (nodeId.equals(nodeIds[i])) {
      continue;
    }
    const bucketIndex = nodesUtils.bucketIndex(nodeId, nodeIds[i]);
    const distance = nodesUtils.nodeDistance(nodeId, nodeIds[i]);
    results.push([i, nodeIds[i], bucketIndex, distance]);
  }
  return results;
}

function farthestNodes(nodeGraph: NodeGraph, limit: number): NodeGraph {
  // Same as closestNodes, but sorted by descending distance
  const resultsSorted = [...nodeGraph].sort(([, , , distance1], [, , , distance2]) => {
    if (distance1 < distance2) return 1;
    if (distance1 > distance2) return -1;
    return 0;
  });
  const farthestK = resultsSorted.slice(0, limit);
  return farthestK;
}

async function main () {
  const visitedNodes = new Set<number>();
  const pendingNodes = new Set<number>();
  const nodeIds = generateNodeIds(256);

  const K = 65;

  const nodeGraph1 = calculateNodeGraph(nodeIds, nodeIds[77]);
  const farthestK1 = farthestNodes(nodeGraph1, K);

  for (const [index] of farthestK1) {
    pendingNodes.add(index);
  }

  while (pendingNodes.size > 0) {
    const [index] = pendingNodes;
    pendingNodes.delete(index);

    visitedNodes.add(index);
    const nodeGraph = calculateNodeGraph(nodeIds, nodeIds[index]);
    const farthestK = farthestNodes(nodeGraph, K);
    for (const [index] of farthestK) {
      if (!visitedNodes.has(index)) pendingNodes.add(index);
    }
  }

  console.log(visitedNodes);
  console.log(visitedNodes.size);

}

main();

Here you only need a bottom-K of 65 to get full connectivity of all 256 nodes. However, bottom-K isn't well aligned with our Kademlia system.
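A plausible reading of why 65 is the threshold here (again my own reasoning, unverified): with 1-byte IDs, the 64 farthest nodes from any node x are exactly those at XOR distance >= 192, i.e. the IDs differing from x in both of the top two bits. A bottom-64 therefore only links each quarter of the ID space to its diagonally opposite quarter, pairing the quarters into two clusters of 128. The 65th-farthest node is at distance 191, which differs from x in the top bit but shares the second bit, so K = 65 adds one link between the two clusters per node and merges them.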

@CMCDragonkai
Member

The next thing to try out, @emmacasolin, would be a mix of top-K and bottom-K.

@tegefaulkes also suggested random-K, as in just choosing a random selection of node IDs.

Furthermore, this is all in the ideal case where every node has the complete node graph and all node IDs are utilised.

In production, nodes do not have the complete node graph, and not all node IDs are utilised; in fact, node IDs are "used" at random. So we can add these constraints on top after figuring out what maintains connectivity in the ideal situation.

@CMCDragonkai
Member

Tried out the random-K strategy and it seems to work REALLY NICELY!

import type { NodeId } from './src/nodes/types';
import { IdInternal } from '@matrixai/id';
import * as utils from './src/utils';
import * as nodesUtils from './src/nodes/utils';

type NodeGraph = Array<[number, NodeId, number, bigint]>;

// 1 byte node ids
function generateNodeIds(amount: number) {
  if (amount < 0 || amount > 256) throw new RangeError();
  const nodeIds: Array<NodeId> = Array.from(
    { length: amount },
    (_, i) => IdInternal.create<NodeId>(utils.bigInt2Bytes(BigInt(i), 1))
  );
  return nodeIds;
}

function calculateNodeGraph(nodeIds: Array<NodeId>, nodeId: NodeId): NodeGraph {
  // index, node ID, bucket index, distance
  const results: Array<[
    number, NodeId, number, bigint
  ]> = [];
  for (let i = 0; i < nodeIds.length; i++) {
    if (nodeId.equals(nodeIds[i])) {
      continue;
    }
    const bucketIndex = nodesUtils.bucketIndex(nodeId, nodeIds[i]);
    const distance = nodesUtils.nodeDistance(nodeId, nodeIds[i]);
    results.push([i, nodeIds[i], bucketIndex, distance]);
  }
  return results;
}

function randomNodes(nodeGraph: NodeGraph, limit: number): NodeGraph {
  const results: NodeGraph = [];
  const usedJs = new Set<number>();
  // Rejection-sample `limit` distinct entries (assumes limit <= nodeGraph.length)
  for (let i = 0; i < limit; i++) {
    let j;
    while (true) {
      j = Math.floor(Math.random() * nodeGraph.length);
      if (!usedJs.has(j)) break;
    }
    usedJs.add(j);
    results.push(nodeGraph[j]);
  }
  return results;
}

async function main () {
  const visitedNodes = new Set<number>();
  const pendingNodes = new Set<number>();
  const nodeIds = generateNodeIds(256);

  const K = 6;

  const nodeGraph1 = calculateNodeGraph(nodeIds, nodeIds[77]);
  const randomK1 = randomNodes(nodeGraph1, K);

  for (const [index] of randomK1) {
    pendingNodes.add(index);
  }

  while (pendingNodes.size > 0) {
    const [index] = pendingNodes;
    pendingNodes.delete(index);

    visitedNodes.add(index);
    const nodeGraph = calculateNodeGraph(nodeIds, nodeIds[index]);
    const randomK = randomNodes(nodeGraph, K);
    for (const [index] of randomK) {
      if (!visitedNodes.has(index)) pendingNodes.add(index);
    }
  }

  console.log(visitedNodes);
  console.log(visitedNodes.size);

}

main();

Even with just 6 random connections, we often get full connectivity of all 256 nodes.

This must be a statistical question: if every single person knew 6 random people in society, what is the probability that everybody knows everybody transitively? Someone has probably worked out a formula for this.
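(Adding some context of my own, not from the thread: this looks like the classic Erdős–Rényi connectivity threshold. For a random graph G(n, p), connectivity appears sharply around p = ln(n)/n, i.e. an average degree of about ln(n). With n = 256, ln(256) ≈ 5.55, so 6 random connections per node sits right at the threshold, which matches "often" but not always reaching full connectivity. The model here is closer to a random K-out digraph than to G(n, p), so treat this as a heuristic rather than an exact formula.)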

@CMCDragonkai
Member

Increasing the number of possible node IDs requires the random-K number to be larger to reduce the probability of clustering. With 2-byte node IDs, we now have 65536 possible node IDs. Here I find that a random-K of 10 is not enough to ensure full connectivity, but a random-K of 20 is quite enough.

import type { NodeId } from './src/nodes/types';
import { IdInternal } from '@matrixai/id';
import * as utils from './src/utils';
import * as nodesUtils from './src/nodes/utils';

type NodeGraph = Array<[number, NodeId, number, bigint]>;

// 2 byte node ids
function generateNodeIds(amount: number) {
  if (amount < 0 || amount > 65536) throw new RangeError();
  const nodeIds: Array<NodeId> = Array.from(
    { length: amount },
    (_, i) => IdInternal.create<NodeId>(utils.bigInt2Bytes(BigInt(i), 2))
  );
  return nodeIds;
}

function calculateNodeGraph(nodeIds: Array<NodeId>, nodeId: NodeId): NodeGraph {
  // index, node ID, bucket index, distance
  const results: Array<[
    number, NodeId, number, bigint
  ]> = [];
  for (let i = 0; i < nodeIds.length; i++) {
    if (nodeId.equals(nodeIds[i])) {
      continue;
    }
    const bucketIndex = nodesUtils.bucketIndex(nodeId, nodeIds[i]);
    const distance = nodesUtils.nodeDistance(nodeId, nodeIds[i]);
    results.push([i, nodeIds[i], bucketIndex, distance]);
  }
  return results;
}

function randomNodes(nodeIds: Array<NodeId>, limit: number, ownNodeId: NodeId): Array<[number, NodeId]> {
  const results: Array<[number, NodeId]> = [];
  const usedJs = new Set<number>();
  for (let i = 0; i < limit; i++) {
    let j;
    while (true) {
      j = Math.floor(Math.random() * nodeIds.length);
      if (nodeIds[j].equals(ownNodeId)) continue;
      if (!usedJs.has(j)) break;
    }
    usedJs.add(j);
    results.push([j, nodeIds[j]]);
  }
  return results;
}

async function main () {

  const visitedNodes = new Set<number>();
  const pendingNodes = new Set<number>();
  const nodeIds = generateNodeIds(65536);

  const K = 14;

  const randomK1 = randomNodes(nodeIds, K, nodeIds[77]);

  for (const [index] of randomK1) {
    pendingNodes.add(index);
  }

  while (pendingNodes.size > 0) {
    const [index] = pendingNodes;
    pendingNodes.delete(index);
    visitedNodes.add(index);
    const randomK = randomNodes(nodeIds, K, nodeIds[index]);
    for (const [index] of randomK) {
      if (!visitedNodes.has(index)) pendingNodes.add(index);
    }
  }

  console.log(visitedNodes.size);

}

main();

With 32-byte (256-bit) node IDs, this becomes even more significant. At this point simulation won't help.

We will need to work out the probability relationship analytically.
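(Continuing the ln(n) heuristic from above, my own extrapolation: for n = 65536, ln(n) ≈ 11.1, which is consistent with a random-K of 10 being borderline and 20 being comfortably enough. For a sparsely populated 256-bit ID space, what should matter is the number of live nodes rather than the size of the ID space, i.e. roughly K >= ln(number of live nodes) random connections among live nodes.)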

Some resources:

@emmacasolin
Contributor Author

Moving forward with the "random-K" approach, the first step is to run some simulations to determine how many connections we need per node for different densities of nodes. We know that in a real deployment the chance of every node ID being in use at one time is practically 0, so we need a solution that works for low densities of nodes but that can also scale as the Polykey network grows with more users.

Simulations

For all of these simulations, the node IDs are 1 byte (i.e. there are 256 possible node IDs). Each simulation was run 5 times and the results below are averages. The row headings are the number of nodes, and the column headings are the number of connections each node attempts to make. The data is the average number of nodes disconnected from the main cluster. Note that it's the number of attempted connections, since a node may try to connect to a node ID that has not been assigned to a node, in which case that connection won't be made. The average number of successful connections per node for each simulation is included in the full data at the bottom.

WIP of data so far:

| Nodes \ Conns | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 100 | - | 42 | 10 | 5 | 2 | 0 | - | - | - | - |

Full data from simulations:

For each simulation, I've calculated the average number of (outgoing) connections each node holds, as well as the rate of connectedness among the nodes ((number of nodes - disconnected nodes) / number of nodes).
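For example, the "100 Nodes, 2 Conns" runs below average 42.2 disconnected nodes, giving a connectedness of (100 - 42.2) / 100 ≈ 0.58.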

100 Nodes, 2 Conns

| Run No. | Total conns | Num disconnected nodes |
| --- | --- | --- |
| 1 | 75 | 36 |
| 2 | 68 | 46 |
| 3 | 87 | 39 |
| 4 | 79 | 36 |
| 5 | 72 | 54 |

Average conns per node = 0.8; Average connectedness = 0.58

100 Nodes, 3 Conns

| Run No. | Total conns | Num disconnected nodes |
| --- | --- | --- |
| 1 | 115 | 9 |
| 2 | 121 | 17 |
| 3 | 134 | 8 |
| 4 | 143 | 3 |
| 5 | 115 | 12 |

Average conns per node = 1.3; Average connectedness = 0.9

100 Nodes, 4 Conns

| Run No. | Total conns | Num disconnected nodes |
| --- | --- | --- |
| 1 | 167 | 2 |
| 2 | 132 | 5 |
| 3 | 150 | 7 |
| 4 | 159 | 7 |
| 5 | 148 | 3 |

Average conns per node = 1.5; Average connectedness = 0.95

100 Nodes, 5 Conns

| Run No. | Total conns | Num disconnected nodes |
| --- | --- | --- |
| 1 | 185 | 2 |
| 2 | 207 | 0 |
| 3 | 190 | 2 |
| 4 | 184 | 2 |
| 5 | 181 | 3 |

Average conns per node = 1.9; Average connectedness = 0.98

100 Nodes, 6 Conns

| Run No. | Total conns | Num disconnected nodes |
| --- | --- | --- |
| 1 | 241 | 0 |
| 2 | 247 | 0 |
| 3 | 249 | 0 |
| 4 | 231 | 0 |
| 5 | 233 | 1 |

Average conns per node = 2.4; Average connectedness = 1 (0.998)
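For reproducibility, here is a minimal sketch of how a run like the above could be implemented (my own reconstruction, not the actual simulation script; the density model is an assumption):

// N live nodes occupy random IDs out of 256 possible 1-byte IDs.
// Each node attempts CONNS connections to uniformly random IDs;
// attempts to unoccupied IDs fail, matching "attempted connections" above.
const ID_SPACE = 256;
const N = 100;
const CONNS = 2;

// Pick N distinct random IDs to be "live"
const live = new Set<number>();
while (live.size < N) live.add(Math.floor(Math.random() * ID_SPACE));
const ids = [...live];

// Attempt connections; record successful (undirected) edges
const adjacency = new Map<number, Set<number>>(
  ids.map((id) => [id, new Set<number>()] as [number, Set<number>]),
);
let totalConns = 0;
for (const id of ids) {
  for (let c = 0; c < CONNS; c++) {
    const target = Math.floor(Math.random() * ID_SPACE);
    if (target === id || !live.has(target)) continue; // attempted connection failed
    if (adjacency.get(id)!.has(target)) continue; // duplicate pick
    adjacency.get(id)!.add(target);
    adjacency.get(target)!.add(id);
    totalConns++;
  }
}

// Flood-fill to find each node's cluster size, then count the nodes
// that fall outside the largest cluster
function clusterSize(start: number): number {
  const seen = new Set([start]);
  const stack = [start];
  while (stack.length > 0) {
    const id = stack.pop()!;
    for (const peer of adjacency.get(id)!) {
      if (!seen.has(peer)) {
        seen.add(peer);
        stack.push(peer);
      }
    }
  }
  return seen.size;
}
const largest = Math.max(...ids.map(clusterSize));
console.log(`total conns: ${totalConns}, disconnected: ${N - largest}`);

Under this model, the expected number of successful connections for 100 nodes attempting 2 each is about 200 × 99/256 ≈ 77, which lines up with the 68 to 87 range in the first table above.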

@CMCDragonkai
Member

CMCDragonkai commented Nov 2, 2023

Part of the reason random nodes work so well is that the majority of nodes in a complete NG would be in the farthest bucket: 50% of the node ID space lands in the farthest bucket.

So most of the time you're getting farthest-bucket connections. Random-K means roughly 50% of that random-K will come from the farthest bucket, in this case bucket 7.

Here are a few of the times I rolled the random-K:

[
  [ 65, IdInternal(1) [Uint8Array] [ 65 ], 3, 12n ],
  [ 76, IdInternal(1) [Uint8Array] [ 76 ], 0, 1n ],
  [ 172, IdInternal(1) [Uint8Array] [ 172 ], 7, 225n ],
  [ 113, IdInternal(1) [Uint8Array] [ 113 ], 5, 60n ],
  [ 24, IdInternal(1) [Uint8Array] [ 24 ], 6, 85n ],
  [ 164, IdInternal(1) [Uint8Array] [ 164 ], 7, 233n ]
]
[
  [ 131, IdInternal(1) [Uint8Array] [ 131 ], 7, 206n ],
  [ 146, IdInternal(1) [Uint8Array] [ 146 ], 7, 223n ],
  [ 174, IdInternal(1) [Uint8Array] [ 174 ], 7, 227n ],
  [ 191, IdInternal(1) [Uint8Array] [ 191 ], 7, 242n ],
  [ 23, IdInternal(1) [Uint8Array] [ 23 ], 6, 90n ],
  [ 1, IdInternal(1) [Uint8Array] [ 1 ], 6, 76n ]
]

Half of all nodes in such a NG would be located in bucket index 7 at a high distance.

However, when most nodes ask for the 20 closest nodes to fill up their NG at the beginning, they would mostly fill it with the nodes closest to them and make connections to those.

Therefore selecting randomly here is not truly representative. It might be representative of the seed nodes, which get connections from all possible nodes first, but nodes by themselves aren't filling up their node graph in a uniform way.

So for random-K to work, would we argue that nodes shouldn't only be asking for closest nodes, but also for random nodes to fill up their NG, and that by asking for random nodes we would necessarily end up reaching farther nodes too?
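To make the 50% claim concrete, a quick sketch (my own, using the standard Kademlia bucket layout):

// For b-bit IDs, bucket i (0 <= i < b) relative to a given node contains
// 2^i of the 2^b - 1 other IDs, so a uniform random pick lands in bucket i
// with probability 2^i / (2^b - 1). For b = 8, bucket 7 holds 128/255 ≈ 50.2%.
const b = 8;
for (let i = 0; i < b; i++) {
  console.log(`bucket ${i}: ${((2 ** i / (2 ** b - 1)) * 100).toFixed(1)}%`);
}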

@CMCDragonkai
Member

Closing this as superseded by #618. The remaining code and comments here can still be used for the research in #365.
