- Concepts
- ⚠ Security Warning
- An overview of the repository tests
- Tutorial: Get started developing
- How-to
This is a repository for testing the behavior of applications operating against a mongodb replica-set backend experiencing replication lag. It was created to reliably reproduce problems experienced in a back-end at a previous company, and to prove out a solution in the form of the automatically-created causally consistent sessions described below.
Replication lag is typically a transient phenomenon caused by network instability. To reproduce the phenomenon stably, this repository features a docker container stack of mongodb servers configured as a replica-set, with a specifiable baseline network latency between the primary and one of the secondaries.
This repository assesses a client's behavior in the presence of replication lag by running db-tests: unit tests that hold a connection to the database and execute queries against it. The db state is reset between tests to keep it clean.
The unit of code fundamentally susceptible to replication lag is the write-sleep-read (WSR). This consists of a write to a document, on the primary; a sleep stage, in which something else happens; and then a read from the same document, from a secondary. A WSR is causal if the read reflects the write. If replication lag is great enough and the proper mongodb safeguards are not in place, then a WSR can read from a secondary before the effects of the write have been replicated, and the WSR is non-causal.
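The distinction can be sketched with a toy in-memory model (purely illustrative; the class and function names below are invented, and the real db-tests run against actual mongod instances):

```typescript
type Doc = { value: number };

// A toy replica set: writes land on the primary immediately and reach the
// secondary only after a fixed replication lag has elapsed.
class LaggyReplicaSet {
  private secondary = new Map<string, Doc>();
  private pending: Array<{ applyAt: number; key: string; doc: Doc }> = [];

  constructor(private lagMs: number) {}

  write(now: number, key: string, doc: Doc): void {
    // The replication event becomes visible on the secondary at now + lag.
    this.pending.push({ applyAt: now + this.lagMs, key, doc });
  }

  readFromSecondary(now: number, key: string): Doc | undefined {
    // Apply any replication events whose time has come, then read.
    this.pending = this.pending.filter((e) => {
      if (e.applyAt <= now) {
        this.secondary.set(e.key, e.doc);
        return false;
      }
      return true;
    });
    return this.secondary.get(key);
  }
}

// A WSR is causal iff the secondary read reflects the primary write.
function wsrIsCausal(lagMs: number, sleepMs: number): boolean {
  const rs = new LaggyReplicaSet(lagMs);
  rs.write(0, "doc", { value: 42 });
  const read = rs.readFromSecondary(sleepMs, "doc");
  return read?.value === 42;
}
```

With a 1000ms lag, a 50ms sleep yields a non-causal WSR, while a 2000ms sleep yields a causal one, mirroring the first two goals listed below.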
The laggy replica-set stack contains a service container with the added capability NET_ADMIN, which is necessary for simulating latency in the stack using Linux traffic control. Enabling this capability entails certain risks.
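For reference, granting this capability in a compose file looks roughly like the following (the service name and settings here are illustrative, not copied from the repository's docker-compose.yaml):

```yaml
services:
  proxy:
    image: xdsgs/repl-lag/proxy
    cap_add:
      - NET_ADMIN   # needed so the container can run `tc qdisc ... netem delay ...`
```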
- Establish that WSRs with little sleep can be non-causal in the presence of replication lag.
- Establish that WSRs with sleep can be causal, even in the presence of replication lag.
- (TODO) Establish that WSRs with little sleep but causally consistent sessions are causal in the presence of replication lag.
- Establish that we can use AsyncLocalStorage to store causally consistent sessions in the async context.
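The last goal can be sketched without a database at all: AsyncLocalStorage lets a session ride along in the async context so that helpers deep in the call tree can pick it up without it being threaded through arguments. In the sketch below the session type is a stand-in, not the mongodb driver's ClientSession:

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Stand-in for a mongodb ClientSession; only the shape matters here.
interface FakeSession { id: number }

const sessionStore = new AsyncLocalStorage<FakeSession>();

// Run a callback with a session bound to the async context.
function withSession<T>(session: FakeSession, fn: () => Promise<T>): Promise<T> {
  return sessionStore.run(session, fn);
}

// Anywhere in the call tree, pick up the ambient session (if any).
function currentSession(): FakeSession | undefined {
  return sessionStore.getStore();
}

async function queryHelper(): Promise<number | undefined> {
  await Promise.resolve(); // cross an async boundary; the context survives it
  return currentSession()?.id;
}
```

Code outside a withSession call sees no session at all, which is exactly the isolation the db-tests rely on.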
Take a look at the docker-compose.yaml file declaring the laggy replica-set.
Looking through that file you'll note that some of the services run custom images prefixed with xdsgs/repl-lag/. These need to be built and present on the host machine before the stack can be deployed. Two images need to be built:
- xdsgs/repl-lag/proxy. This image describes a proxy service with built-in traffic control. To build it, run:
  $ <repo>/dev build/image/proxy
- xdsgs/repl-lag/replset-init. This image is used to initialize the replica-set once the mongo servers have spun up. To build it, run:
  $ <repo>/dev build/image/replset-init
Spinning-up a laggy replica-set is as simple as running:
$ LATENCY_MS=<latency-ms> <repo>/dev start/stack/db
The larger the latency, the longer the replica-set takes to initialize. Note that sufficiently large latencies (on the order of ~2000ms) can break initialization entirely; thus, unfortunately, testable replication lag is capped at around 2s. An improved version of this repository would initialize the replica-set before adding network latency, but this is easier said than done.
Within an upper bound of about 5 minutes, the replica-set should be running (relatively) healthily. To verify this, run <repo>/dev status and expect output like:
$ ./dev status
{ "name": "xdsgs.repl-lag.db.proxy", "created": "About a minute ago", "state": "running", "status": "Up About a minute (healthy)" }
{ "name": "xdsgs.repl-lag.db.primary", "created": "About a minute ago", "state": "running", "status": "Up About a minute (healthy)" }
{ "name": "xdsgs.repl-lag.db.secondary-1", "created": "About a minute ago", "state": "running", "status": "Up About a minute (healthy)" }
{ "name": "xdsgs.repl-lag.db.secondary-0", "created": "About a minute ago", "state": "running", "status": "Up About a minute (healthy)" }
As a secondary means of verification, the terminal running the stack should be outputting logs, and will have (early in the initialization process) output a log line like:
xdsgs.repl-lag.db.replset-init | Replica set initialization complete
xdsgs.repl-lag.db.replset-init exited with code 0
The recommended way to develop is to use the vscode development container provided with this repository. It comes with several vscode extensions pre-configured in the container and attaches to the laggy replica-set proxy's network.
It is feasible to develop on the host machine; however, ports on some of the services in the laggy replica-set stack would need to be published to the host.
In the repository root, run:
$ npm install
The testing runtime that the client runs will need to be able to connect to the laggy replica-set proxy in order to run (the replica-set members are all listed). This connection info should be accessible to the runtime in the environment variable MONGO_URI, which takes the form of a mongo connection string.
Because the vscode development container is on the same docker bridge network as the laggy replica-set proxy, the docker DNS allows us to access it by its container name. Therefore, in that environment, the connection variable should be set as:
$ export MONGO_URI=mongodb://xdsgs.repl-lag.db.proxy:27017/?replicaSet=rs0
Once the client has been prepared, the tests can be run against the laggy replica-set with the command:
$ <repo>/dev start/db-tests
This will locate all the files in the src file tree with the suffix dbtest.ts and run them with ts-jest.
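A jest configuration that selects files this way might look like the following sketch; the repository's actual config may differ in its details:

```typescript
// jest.config.ts (illustrative sketch; the real config may differ)
// ts-jest compiles the *.dbtest.ts files on the fly; testMatch restricts
// discovery to files with the dbtest suffix under src/.
const config = {
  preset: "ts-jest",
  roots: ["<rootDir>/src"],
  testMatch: ["**/*.dbtest.ts"],
};

export default config;
```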
Tests can alternatively be run with a test-runner stack. This stack relies on a test-runner docker image, which must be built before the first run and rebuilt whenever the tests change:
$ <repo>/dev build/image/test-runner
Running the test-runner stack can be accomplished with the command:
$ <repo>/dev start/stack/test-runner
All the images can be built at once with the command:
$ <repo>/dev build/stack/test
The db stack and the test-runner stack do not need to be managed independently of each other. Assuming that all the constituent images are built, a full stack can be spun-up with the command:
$ LATENCY_MS=<latency-ms> <repo>/dev start/stack/test
The dev tool in the repository root is a convenience script for executing other development scripts in the <repo>/.devtools folder. The command:sub-command relationships for the dev tool directly mirror the file structure in the .devtools folder, so to execute a script <repo>/.devtools/my/script.sh one can run <repo>/dev my/script. It's a mild convenience.
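The heart of such a dispatcher can be sketched in a few lines of shell (this is a guess at the mechanism, not the repository's actual script):

```shell
# Map "dev <command>/<sub-command> args..." onto
# "<repo>/.devtools/<command>/<sub-command>.sh args...".
dispatch() {
  repo="$1"
  cmd="$2"
  shift 2
  "$repo/.devtools/$cmd.sh" "$@"
}
```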
There is also a help command, which takes as an argument the path to any other command in the .devtools folder (eliding the .sh suffix). So to obtain help with <repo>/.devtools/my/script.sh one would execute <repo>/dev help my/script.