This repository was archived by the owner on Oct 9, 2019. It is now read-only.
25 changes: 15 additions & 10 deletions Dockerfile
@@ -1,8 +1,8 @@
FROM neo4j:3.4.6-enterprise
FROM neo4j:3.5.4-enterprise

RUN apk update && apk add --no-cache --quiet \
e2fsprogs \
curl \
RUN apk add --no-cache \
e2fsprogs \
curl \
zip \
unzip \
python py-pip && \
@@ -11,17 +11,22 @@ RUN apk update && apk add --no-cache --quiet \

# Install plugins
RUN mkdir -p /var/lib/neo4j/plugins
RUN curl -L -s https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/3.4.0.1/apoc-3.4.0.1-all.jar > /var/lib/neo4j/plugins/apoc-3.4.0.1-all.jar
RUN curl -L -s http://central.maven.org/maven2/mysql/mysql-connector-java/6.0.6/mysql-connector-java-6.0.6.jar > /var/lib/neo4j/plugins/mysql-connector-java-6.0.6.jar

COPY docker-entrypoint.sh /docker-entrypoint.sh
ENV NEO4J_APOC_VERSION=3.5.0.2

ADD https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/$NEO4J_APOC_VERSION/apoc-$NEO4J_APOC_VERSION-all.jar /var/lib/neo4j/plugins/apoc-$NEO4J_APOC_VERSION-all.jar
ADD http://central.maven.org/maven2/mysql/mysql-connector-java/6.0.6/mysql-connector-java-6.0.6.jar /var/lib/neo4j/plugins/mysql-connector-java-6.0.6.jar


ENV EXTENSION_SCRIPT=/ecs-extension.sh

COPY ecs-extension.sh ${EXTENSION_SCRIPT}
COPY init_db.sh /init_db.sh

# These were created earlier by the base image, but we don't need them since
# the entrypoint will configure Neo4j to use them if they exist.
RUN rm -rf /var/lib/neo4j/data /var/lib/neo4j/logs
RUN rm -rf ${NEO4J_HOME}/data/ ${NEO4J_HOME}/logs/ ${NEO4J_HOME}/metrics/

EXPOSE 5000 5001 6000 6001 7000

ENTRYPOINT ["/docker-entrypoint.sh"]
CMD ["neo4j"]
CMD ["start"]
19 changes: 15 additions & 4 deletions Makefile
@@ -1,16 +1,27 @@
COMMIT=$(shell git rev-parse HEAD)
COMMIT=taras-$(shell git rev-parse HEAD)
DATE=$(shell date +%Y-%m-%d-%H-%M)

.PHONY:
build:
@ echo "Building image..."
@ docker build -t neo .

# Use param REPO, e.g. REPO=xxxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/neo to specify ECR repository
# Use param REGION, e.g. REGION=us-east-1
# Use param NEO_ECR_REPO, e.g. NEO_ECR_REPO=xxxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/neo to specify ECR repository
# Use param NEO_AWS_REGION, e.g. NEO_AWS_REGION=us-east-1
.PHONY:
push_image: build
@ echo "Pushing image based on last commit $(COMMIT)"
@ $(shell aws ecr get-login --region $(NEO_AWS_REGION))
@ $(shell aws ecr get-login --region $(NEO_AWS_REGION) --no-include-email)
@ docker tag neo:latest $(NEO_ECR_REPO):$(COMMIT)
@ docker push $(NEO_ECR_REPO):$(COMMIT)
@ echo "Pushed image $(NEO_ECR_REPO):$(COMMIT)"


.PHONY: create_stack
create_stack:
awless --no-sync --color always \
create stack \
name=taras-neo-test1-$(DATE) \
capabilities=CAPABILITY_IAM \
template-file=./cloudformation.yml \
stack-file=./config.yml
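
For reference, a typical invocation of these targets might look like the sketch below (the registry URI, region, and resulting tag are placeholders, not values from this repo):

```sh
# Placeholders: substitute your own ECR repository URI and AWS region.
export NEO_ECR_REPO=123456789012.dkr.ecr.us-east-1.amazonaws.com/neo
export NEO_AWS_REGION=us-east-1

make push_image     # builds the image, logs in to ECR, then tags and pushes $NEO_ECR_REPO:<commit>
make create_stack   # creates a CloudFormation stack via awless from cloudformation.yml and config.yml
```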
99 changes: 69 additions & 30 deletions README.md
@@ -4,23 +4,24 @@ A setup for HA (High-Availability) deployment of a [Neo4j Enterprise](https://ne

You can obtain Neo4j from the [official website](https://neo4j.com/). Please contact [email protected] for Enterprise licensing.

### Includes:
## Includes

- Customizable CloudFormation template.
- Custom docker image on top [official Neo4j image](https://hub.docker.com/_/neo4j/). Current version - *Neo4j 3.4.6*
- Custom Docker image on top of the [official Neo4j image](https://hub.docker.com/_/neo4j/). Current version: *Neo4j 3.5.4*

## Features

* Automatic daily backups to S3 using a slave-only instance.
* Bootstrap a cluster from a backup snapshot.
* Autoscaling (based on Memory Utilization).
* CloudWatch alerts setup.
* Bootstrap a node with an existing data volume for quick startup.
* Automatically create users+credentials for read-only and read/write access.
- Automatic daily backups to S3 using a slave-only instance.
- Bootstrap a cluster from a backup snapshot.
- Autoscaling (based on Memory Utilization).
- CloudWatch alerts setup.
- Bootstrap a node with an existing data volume for quick startup.
- Automatically create users+credentials for read-only and read/write access.

## Prerequisites:
## Prerequisites

* Install [Docker](https://docs.docker.com/engine/installation/) to build the image.
* [AWS CLI](https://aws.amazon.com/cli) for uploading images to ECR.
- Install [Docker](https://docs.docker.com/engine/installation/) to build the image.
- [AWS CLI](https://aws.amazon.com/cli) for ECR authentication.

## How does it work?

@@ -32,12 +33,16 @@ It uses [Bolt](https://boltprotocol.org/) – a highly efficient, lightweight

Essentially it's a Neo4j cluster with a minimum of 2 nodes (use at least 3 for HA), which is split logically into 2 ECS clusters (yet it remains a single Neo4j cluster):

### A Read-Write cluster with one master node and multiple slaves:
### A Read-Write cluster with one master node and multiple slaves

- Fast synchronisation between the master and the other nodes.
- The Load Balancer keeps only the current master node in service, so slaves act as hot standbys in case of a failover.
- All nodes are eligible to become master. A re-election is quickly detected by the ELB.

### A Read-only cluster with one slave node:
### A Read-only cluster with one slave node (optional)

_This will generate additional costs, since a separate ELB is created for this node._

- Slower synchronisation.
- Can not become master.
- Can not accept write queries.
@@ -50,7 +55,7 @@ Essentially it's a Neo4j cluster with a minimum of 2 nodes (use at least 3 for H

Ports open:

```
```yaml
- HTTP(s): 7473, 7474
- Bolt: 7687
```
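
For illustration, once the stack is up a client can reach the cluster over Bolt through the ELB endpoint. A minimal sketch, assuming the `Domain` parameter resolves to the master ELB and the default credentials have already been changed (host and password are placeholders):

```sh
# Placeholder host and password; 7687 is the Bolt port listed above.
cypher-shell -a bolt://neo4j.example.com:7687 -u neo4j -p '<password>' \
  "MATCH (n) RETURN count(n);"
```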
@@ -62,38 +67,72 @@ Ports open:

2. Save environment variables for use in the Makefile (customize them first):

$ export NEO_ECR_REPO=<paste here ARN of your ECR repo>
$ export NEO_AWS_REGION=<your AWS region>
```sh
export NEO_ECR_REPO=<paste here ARN of your ECR repo>
export NEO_AWS_REGION=<your AWS region>
```

3. Build Docker image and push it to your ECR:

$ make build
$ make push_image
``` sh
make push_image
```

4. Feel free to modify `cloudformation.yml` in any way you like before spinning up infrastructure; however, most things are customizable via parameters.

5. [Create a Cloud Formation stack](https://console.aws.amazon.com/cloudformation/home#/stacks/new) using `cloudformation.yml`.
5. [Create a Cloud Formation stack](https://console.aws.amazon.com/cloudformation/home#/stacks/new) using `cloudformation.yml` with your parameters.

_If you want to set up a simpler (and cheaper) environment without the Slave-Only node (and all related resources), you can set `SlaveMode=ABSENT` and ignore the rest of the `Slave`-related parameters (except `SlaveSubnetID`: you still have to pick a subnet there, but it will be ignored as long as `SlaveMode=ABSENT`)._

**Parameters guide**

Parameter | Description
----------|----------
AcceptLicense | Must be set to `true` in order to use Neo4j
AdminUser | The default admin user; must be `neo4j` and can't be changed
ClusterInstanceType | EC2 instance type
DesiredCapacity | Number of desired Neo4j nodes (excluding the SlaveOnly one)
DockerECRARN | ARN of your private ECR repo
DockerImage | URL of your custom-built Neo4j image
Domain | The domain for your Neo4j cluster endpoint (http://<domain>:7474)
DomainHostedZone | Route53 hosted zone in which to register your DNS record
EBSSize | Size of the EBS volume for Neo4j data, in GB
EBSType | Type of EBS volume
GuestPassword | Password for the Neo4j read-only user
GuestUser | Name for the Neo4j read-only user
KeyName | SSH key to use for EC2 instance access
MaxSize | Max number of instances in the cluster
SubnetID | List of subnets in which to place Neo4j cluster nodes. *Only one instance per subnet is supported*
Mode | [Neo4j DB mode](https://neo4j.com/docs/operations-manual/current/reference/configuration-settings/#config_dbms.mode)
NodeSecurityGroups | List of additional Security Groups to apply to your EC2 instances
SlaveMode | [Neo4j DB mode](https://neo4j.com/docs/operations-manual/current/reference/configuration-settings/#config_dbms.mode) for the SlaveOnly instance, with the additional value `ABSENT` that can be used to create a Neo4j cluster without the SlaveOnly instance
SlaveOnlyDomain | The domain for your Neo4j SlaveOnly endpoint (http://<domain>:7474)
SlaveOnlyInstanceType | EC2 instance type for the slave-only node
SlaveSubnetID | Subnet ID for the slave-only instance. *Should be different from the main cluster subnets, but must be able to reach the other instances*. Even if `SlaveMode` is set to `ABSENT`, some value must be set here (it will be ignored in that case)
VpcId | AWS VPC ID to place your cluster in
SNSTopicArn | SNS topic ARN to send alerts to. If none is specified, a new one will be created
SnapshotPath | Path to the DB snapshot on S3 to restore data from on start (_<bucket_name>/hourly/neo4j-backup-<timestamp>.zip_)

During this step you will define all the resources you need and configure the Neo4j Docker image for ECS.
Please make sure to set 2 tags for your stack (on "Options" page):
Please consider tagging your stack (on "Options" page):

Name: <how you name your stack>
Environment: <your env name, e.g production>
```yaml
Name: <how you name your stack>
Environment: <your env name, e.g. production>
```
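
If you prefer the CLI over the console for this step, a minimal sketch (stack name, parameter and tag values are placeholders; add the remaining parameters from the table above as your environment requires):

```sh
# Placeholders throughout; the full parameter list depends on your setup.
aws cloudformation create-stack \
  --stack-name neo4j-production \
  --template-body file://cloudformation.yml \
  --capabilities CAPABILITY_IAM \
  --parameters \
      ParameterKey=AcceptLicense,ParameterValue=true \
      ParameterKey=DockerImage,ParameterValue="$NEO_ECR_REPO:<commit>" \
      ParameterKey=DesiredCapacity,ParameterValue=3 \
      ParameterKey=SlaveMode,ParameterValue=ABSENT \
  --tags Key=Name,Value=neo4j-production Key=Environment,Value=production
```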

## Upgrade version

Please see [detailed instructions](./UPGRADE_README.md) to upgrade using this CF template.


## Known Problems

* You can't restore server from a backup without a downtime. See [further instructions](https://neo4j.com/docs/operations-manual/current/backup/restore-backup/#backup-restore-ha-cluster).
* Autoscaling is hardcoded via RAM utilization (>70%). Feel free to modify for your own needs.
* Sometimes, rolling updates, that require nodes reboot, render them stuck for some time before rejoining cluster. Probaly a slower rolling update can help so that at each moment at least one node is already registered in main ELB as master.
- You can't restore a server from a backup without downtime. See [further instructions](https://neo4j.com/docs/operations-manual/current/backup/restore-backup/#backup-restore-ha-cluster).
- Autoscaling is hardcoded to RAM utilization (>70%). Feel free to modify it for your own needs.
- Sometimes rolling updates that require node reboots leave nodes stuck for some time before rejoining the cluster. A slower rolling update can probably help, so that at each moment at least one node is already registered in the main ELB as master.

## TODO

```
* Parametrize autoscaling.
* Allow disabling slave-only more for simplest 1-node setups.
```
- Parametrize autoscaling.
36 changes: 12 additions & 24 deletions UPGRADE_README.md
@@ -1,43 +1,39 @@
## Upgrade guide
# Upgrade guide

Information below is up-to-date with Neo4j 3.4.6.

### Patch version upgrades
## Patch version upgrades

If you upgrade between patch versions, you might use
[rolling upgrade](https://neo4j.com/docs/operations-manual/current/upgrade/causal-cluster/#cc-upgrade-rolling)
just by updating each instance separately. However, this is possible *only when a store format upgrade is not needed (see the release notes for the particular change)*.

In one step: build a new version of the Docker image (based on the newest official Neo4j Docker image) and use that image in CloudFormation (see step #1 in the Generic Upgrades section below).
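
A hedged sketch of that update with the AWS CLI (stack name and image tag are placeholders; only two carried-over parameters are shown, and every other existing parameter must also be listed with `UsePreviousValue=true`):

```sh
# Placeholders: stack name and image tag. Repeat UsePreviousValue=true for all
# remaining template parameters so they keep their current values.
aws cloudformation update-stack \
  --stack-name neo4j-production \
  --use-previous-template \
  --capabilities CAPABILITY_IAM \
  --parameters \
      ParameterKey=DockerImage,ParameterValue="$NEO_ECR_REPO:<new-commit>" \
      ParameterKey=AcceptLicense,UsePreviousValue=true \
      ParameterKey=Mode,UsePreviousValue=true
```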


### Generic upgrades
## Generic upgrades

(Based on [official Neo4j guide](https://neo4j.com/docs/operations-manual/current/upgrade/))

Moves between minor/major versions do not allow a zero-downtime database upgrade (as of 3.4.6).

You will want to use CloudFormation (CF) parameters to tweak the upgrade steps, as described below.

### Preconditions
## Preconditions

- Time of day with lowest graph load
- Client-side retry system or logging of all queries, in order to not lose write queries.
- Build the Neo4j Docker image with the new version and push it to ECR so ECS can use it.
- AWS console open.


### Migration
## Migration

1. Update CF stack with parameters:

DockerImage = <docker image address name:tag> 

This will roll updates through the cluster. If the master fails over to another node during this, clients might notice a window of a couple of seconds.



2. Update CF stack with parameters:
1. Update CF stack with parameters:

Mode = SINGLE
SlaveMode = SINGLE
@@ -53,48 +49,40 @@ You want to make use of CloudFormation (CF) parameters to tweak upgrade steps, a

Verify it's working properly. It's a good idea to run client tests, since this is the final upgraded DB.


3. Manual migrations (if needed).
1. Manual migrations (if needed).

If the upgrade process involves index recreation or other data migrations that need to be done manually, this is the right time to do it.


4. Update CF stack with parameters:
1. Update CF stack with parameters:

Mode = HA (but keep SlaveMode = SINGLE)
AllowUpgrade = False

Verify your single master is fine as an HA node. Again, expect downtime since the master reboots.


5. Change "Name" tags for all slave Neo4j data volumes.
1. Change "Name" tags for all slave Neo4j data volumes.

For example "neo4j-production-data" → "neo4j-production-data-old". We don't need the data volumes with the old DB on the slaves; with the name changed, nodes won't use the old volumes after reboot.


6. (Optional) Create slave copies of master's migrated volume.
1. (Optional) Create slave copies of master's migrated volume.

If the DB is big, this step is useful. Without it, slave nodes will start with a fresh DB and will need to catch up with the master online.

So you create a snapshot from the migrated Neo4j data volume and create volumes with the needed tags (Environment tag, Name tag) in all regions (see the CLI sketch at the end of this guide).

Still, during node boot, Neo4j might refuse to use your volume as too old; in that case it will simply discard the data and catch up with the master online.



7. Update CF stack with parameters: (N = 2 for example)
1. Update CF stack with parameters: (N = 2 for example)

MaxSize = N
DesiredCapacity = N

This will scale up the cluster and create new slaves, which will then catch up.


8. Update CF stack with parameters:
1. Update CF stack with parameters:

SlaveMode = HA

This will reboot the read-only slave as an HA member, hooking it up to the new volume.


Verify both the main cluster and the slave-only node are working properly, and run the tests again. Upgrade complete :)
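
For the optional volume-copy step above, a hedged sketch using the AWS CLI (volume ID, availability zone, and tag values are placeholders):

```sh
# Snapshot the migrated master data volume, then create a tagged copy for a slave.
# The volume ID, AZ, and tag values below are placeholders.
SNAP_ID=$(aws ec2 create-snapshot \
  --volume-id vol-0123456789abcdef0 \
  --description "neo4j migrated data" \
  --query SnapshotId --output text)
aws ec2 wait snapshot-completed --snapshot-ids "$SNAP_ID"
aws ec2 create-volume \
  --snapshot-id "$SNAP_ID" \
  --availability-zone us-east-1b \
  --tag-specifications 'ResourceType=volume,Tags=[{Key=Name,Value=neo4j-production-data},{Key=Environment,Value=production}]'
```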