Clustered Alarm System Example

Ryan Slominski edited this page May 19, 2022 · 26 revisions

Extend JAWS with EPICS alarms in a cluster of 12 containers.

Before beginning, make sure to navigate to the cluster example directory:

cd examples/cluster

Then launch the cluster with either Docker Compose or Docker Swarm. Finally, check connect status.

Docker Compose

Note: This example uses a single node with docker-compose. For true fault-tolerance and scalability you'll need to deploy the containers across multiple nodes using a container orchestration tool such as Docker Swarm or Kubernetes.

  1. Launch Zookeeper "ensemble" (3):
docker-compose -f zookeeper.yml up

Wait for them to come up!

  2. Launch Kafka nodes (3):
docker-compose -f kafka.yml up

Wait for them to come up!

  3. Launch alarm/support nodes:
docker-compose -f alarm.yml up

Wait for them to come up!

  4. Launch connect nodes (3):
docker-compose -f connect.yml up

Wait for them to come up!
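The "wait for them to come up" steps can be scripted rather than watched by eye. The following is only a sketch: wait_for is a hypothetical helper (not part of JAWS or Docker) that reruns a readiness check until it succeeds or gives up. Note that docker-compose up stays in the foreground unless you pass -d, so the check would run either from a second terminal or after a detached launch.

```shell
#!/bin/sh
# wait_for TRIES DELAY CMD...
# Rerun CMD until it exits 0, or give up after TRIES attempts.
# Hypothetical helper for illustration; not shipped with the example.
wait_for() {
    tries=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$tries" ]; do
        # Discard the check's output; only its exit status matters.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example (assumes a container named connect-1, as used later on this page):
#   docker-compose -f connect.yml up -d
#   wait_for 30 2 docker exec connect-1 /scripts/show-status.sh
```

The right readiness check depends on the service; the one in the comment only shows that the status script runs inside connect-1, which is an assumption about what "up" should mean here.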

Docker Swarm

Note: You can easily create a single node swarm with:

docker swarm init

Note: The same compose files (v3.2+) used above are used by the Docker Engine in Swarm mode. However, the ability to scale on demand requires more work to set up, as you would need to run each piece as a single scalable service. For example, instead of defining three separate fixed Kafka services, you would need to define a single dynamic Kafka service that could be scaled, similar to what is described here.

  1. Launch Zookeeper "ensemble" (3):
docker stack deploy -c zookeeper.yml alarms

Wait for them to come up!

  2. Launch Kafka nodes (3):
docker stack deploy -c kafka.yml alarms

Wait for them to come up!

  3. Launch alarm/support nodes:
docker stack deploy -c alarm.yml alarms

Wait for them to come up!

  4. Launch connect nodes (3):
docker stack deploy -c connect.yml alarms

Wait for them to come up!

Check connect status:

Note: If you used Swarm, the container names are conveniently scrambled: Swarm prefixes each one with the stack name and appends a slot number and task ID. You'll have to look them up with:

docker container ls

Further, the docker service ls command shows different names, which aren't container names. Despite docker-compose exec docs saying it works with service names, it really only works with container names.
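The lookup can be narrowed so you do not have to scan the full listing. This is a minimal sketch: first_match is a hypothetical helper, and it assumes the Swarm service is called connect-1 inside the alarms stack, so the container name would look like alarms_connect-1.1.<task id>.

```shell
#!/bin/sh
# first_match PATTERN
# Print the first line on stdin that contains PATTERN (fixed string, not a regex).
# Hypothetical helper for illustration.
first_match() {
    grep -F -- "$1" | head -n 1
}

# Example (needs a running stack; the service name connect-1 is an assumption):
#   name=$(docker container ls --format '{{.Names}}' | first_match connect-1)
#   docker exec -it "$name" bash
```

The --format '{{.Names}}' option makes docker container ls print only the names, one per line, which keeps the match from picking up text in other columns.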

Show Status

docker exec -it connect-1 bash
/scripts/show-status.sh

Test Fail-over

docker stop connect-3

The show-status.sh script should show that the task assigned to connect-3 (identified by IP address) is now in state UNASSIGNED. After the delay specified by scheduled.rebalance.max.delay.ms has elapsed (default 5 minutes), the task should be re-assigned and in state RUNNING, though on a different connect server.

If connect-3 happened to be the connector leader, it will have been moved to a new connect server as well (the leader runs the Connector, which has a ChannelManager that monitors the command channel for changes).
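Rather than rerunning the status script by hand during the rebalance delay, the check can be polled. This sketch assumes only what is stated above: that the script's output contains the words UNASSIGNED and RUNNING for task states. The exact output format is not known here, and tasks_settled is a hypothetical helper.

```shell
#!/bin/sh
# tasks_settled
# Read status text on stdin; succeed only if it mentions RUNNING and
# no task is UNASSIGNED. Empty input (e.g. a failed docker exec) fails.
# Hypothetical helper for illustration.
tasks_settled() {
    status=$(cat)
    case $status in
        *UNASSIGNED*) return 1 ;;
        *RUNNING*)    return 0 ;;
        *)            return 1 ;;
    esac
}

# Example: poll from a node that is still up (connect-1) every 10 seconds.
#   until docker exec connect-1 /scripts/show-status.sh | tasks_settled; do
#       sleep 10
#   done
```

With the default delay, expect the loop to run for about five minutes before the task shows up as RUNNING again.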