Infrastructure

Domains

nextstrain.org and dev.nextstrain.org are hosted on Heroku.

data.nextstrain.org is an AWS CloudFronted S3 bucket, nextstrain-data.

staging.nextstrain.org is an AWS CloudFronted S3 bucket, nextstrain-staging.

login.nextstrain.org is used by our AWS Cognito user pool.

Heroku

The production Heroku app is nextstrain-server, which is part of a Heroku app pipeline of the same name. Deploys of master happen automatically after Travis CI tests are successful.

Environment variables

  • SESSION_SECRET must be set to a long, securely generated string. It protects the session data stored in browser cookies. Changing it will invalidate all existing sessions and forcibly log people out.

  • AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are tied to the nextstrain.org AWS IAM user. These credentials allow the backend web server limited access to private S3 buckets.

  • REDIS_URL is provided by the Heroku Redis add-on. It should not be modified directly. Our authentication handlers rewrite it at server start to use a secure TLS connection.

  • FETCH_CACHE is not currently used, but can be set to change the location of the on-disk cache used by (some) server fetch() calls. The default location is /tmp/fetch-cache.

  • PAPERTRAIL_API_TOKEN is used for logging through Papertrail.
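As a rough illustration of two of the variables above, here is a minimal Node.js sketch showing one way to generate a value for SESSION_SECRET and one way a REDIS_URL could be rewritten to use TLS. The function names generateSessionSecret and toSecureRedisUrl are hypothetical. The rewrite only swaps the URL scheme, which is an assumption; the actual handler may also adjust the port or TLS options.

```javascript
const crypto = require("crypto");

// Generate a value suitable for SESSION_SECRET: 48 random bytes,
// base64-encoded into a 64-character string.
function generateSessionSecret(bytes = 48) {
  return crypto.randomBytes(bytes).toString("base64");
}

// Rewrite a plaintext redis:// URL to the TLS scheme rediss://.
// Assumption: only the scheme changes here. URLs that already use
// rediss:// are returned unchanged.
function toSecureRedisUrl(redisUrl) {
  return redisUrl.replace(/^redis:\/\//, "rediss://");
}

// Example with a made-up URL (not a real credential or host).
console.log(toSecureRedisUrl("redis://:password@redis.example.com:6379"));
console.log(generateSessionSecret().length);
```

A generated secret would then be set once on the app (for example through the Heroku dashboard) rather than regenerated at each start, since changing it invalidates existing sessions.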

Redis add-on

The Heroku Redis add-on is attached to our nextstrain-server and nextstrain-dev apps. Redis is used to persistently store login sessions after authentication via AWS Cognito. A persistent data store is important for preserving sessions across deploys and regular dyno restarts.

The maintenance window is set to Friday at 22:00 UTC to Saturday at 02:00 UTC. This tries to optimize for being outside/on the fringes of business hours in relevant places around the world while being in US/Pacific business hours so the Seattle team can respond to any issues arising.
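To make the time-zone reasoning concrete, the following sketch converts the start and end of the window into local wall-clock times. The helper inZone is hypothetical, the date is an arbitrary Friday, and the zones other than US/Pacific are examples only, not a list taken from our usage data.

```javascript
// Format a Date as "Weekday HH:MM" in a given IANA time zone.
function inZone(date, timeZone) {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    weekday: "long",
    hour: "2-digit",
    minute: "2-digit",
    hourCycle: "h23",
  }).formatToParts(date);
  const get = (type) => parts.find((p) => p.type === type).value;
  return `${get("weekday")} ${get("hour")}:${get("minute")}`;
}

// 2020-01-03 is an arbitrary Friday chosen for illustration
// (winter, so US/Pacific is on standard time).
const windowStart = new Date(Date.UTC(2020, 0, 3, 22, 0));
const windowEnd = new Date(Date.UTC(2020, 0, 4, 2, 0));

// Example zones only; substitute whichever locations matter.
for (const zone of ["America/Los_Angeles", "Europe/London", "Asia/Tokyo"]) {
  console.log(`${zone}: ${inZone(windowStart, zone)} to ${inZone(windowEnd, zone)}`);
}
```

For Seattle this works out to Friday 14:00 to 18:00 during standard time and Friday 15:00 to 19:00 during daylight saving time.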

If our Redis instance reaches its maximum memory limit, existing keys will be evicted using the volatile-ttl policy to make space for new keys. This should preserve the most active logged-in sessions and avoid throwing errors if we hit the limit. If we start hitting the memory limit regularly, we should move up to the next add-on plan, but I don't expect this to happen anytime soon with current usage.
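The effect of volatile-ttl can be sketched with a simplified model: only keys that have an expiry are eviction candidates, and those closest to expiring go first. The sketch assumes that a session's TTL is refreshed whenever the session is used, which is what would make idle sessions the first to go. The key names and TTLs are made up, and real Redis approximates this ordering by sampling keys rather than sorting all of them.

```javascript
// Simplified model of Redis' volatile-ttl policy.
// Each key is { name, ttlSeconds }, with ttlSeconds === null meaning "no expiry".
function evictVolatileTtl(keys, count) {
  const candidates = keys
    .filter((k) => k.ttlSeconds !== null) // keys without a TTL are never evicted
    .sort((a, b) => a.ttlSeconds - b.ttlSeconds); // shortest remaining TTL first
  const evicted = new Set(candidates.slice(0, count).map((k) => k.name));
  return keys.filter((k) => !evicted.has(k.name));
}

// Hypothetical keys for illustration.
const keys = [
  { name: "sess:idle", ttlSeconds: 3600 },
  { name: "sess:active", ttlSeconds: 2592000 },
  { name: "config", ttlSeconds: null },
];

const remaining = evictVolatileTtl(keys, 1).map((k) => k.name);
console.log(remaining); // the idle session is the one evicted
```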

Logs

Server logs are available via the Papertrail web app (requires a Heroku login). The dev server does not have Papertrail enabled, but its logs can be viewed with the Heroku CLI via heroku logs --app=nextstrain-dev --tail.

Development server

A testing app, nextstrain-dev, is available at dev.nextstrain.org. Deploys to it are manual, via the dashboard or a git push to the Heroku remote, e.g. git push -f heroku-dev <branch>:master, where the heroku-dev remote is https://git.heroku.com/nextstrain-dev.git. Note that the dev server runs in production mode (NODE_ENV=production) and also uses the nextstrain.org AWS IAM user.

Review apps

We use Heroku Review Apps to create ephemeral apps for PRs to the GitHub repo. These are automatically created for PRs submitted by Nextstrain team members. To recreate an inactivated app, or create one for a PR from a fork, you can use the Heroku dashboard. (Make sure to review the code for security problems before creating such an app.)

It is not currently possible to log in to or out of these apps due to our AWS Cognito setup, so private datasets cannot be accessed.

Rolling back deployments

Normal Heroku deployments, which require Travis CI to pass and are subsequently built on Heroku, can take upwards of 10 minutes. Heroku allows us to immediately return to a previous version using heroku rollback --app=nextstrain-server vX, where X is the version number (available via the Heroku dashboard).

AWS

All resources are in the us-east-1 region. If you don't see them in the AWS Console, make sure to check the region you're looking at.

S3 buckets

nextstrain-data

Public. CloudFronted. Contains JSONs for our core builds, as well as the nextstrain.yml conda environment definition. Fetches by the server happen over unauthenticated HTTP.

nextstrain-staging

Public. CloudFronted. Contains JSONs for staging copies of our core builds. Fetches by the server happen over unauthenticated HTTP.

nextstrain-inrb

Private. Access controlled by IAM groups/policies. Fetches by the server happen via the S3 HTTP API using signed URLs.

EC2 instances

rethink.nextstrain.org hosts the lab's fauna instance, used to maintain data for the core builds.

Ephemeral instances are automatically managed by AWS Batch for nextstrain build --aws-batch jobs.

Cognito

A user pool called nextstrain.org provides authentication for Nextstrain logins. Cognito is integrated with the nextstrain.org server using the OAuth2 support from PassportJS in our authn/index.js file.
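As a rough picture of what the OAuth2 integration does at the start of a login, the sketch below builds the authorization URL that a browser would be redirected to on the Cognito hosted UI. It assumes the standard Cognito /oauth2/authorize endpoint on login.nextstrain.org and the authorization-code flow. The function name, client ID, callback URL, and scopes are placeholders for illustration, not values taken from authn/index.js.

```javascript
// Build the URL that starts an OAuth2 authorization-code login.
// clientId, redirectUri, and the scope list are placeholders.
function authorizationUrl({ clientId, redirectUri, state }) {
  const url = new URL("https://login.nextstrain.org/oauth2/authorize");
  url.search = new URLSearchParams({
    response_type: "code",
    client_id: clientId,
    redirect_uri: redirectUri,
    scope: "openid email",
    state,
  }).toString();
  return url.toString();
}

// Example with made-up values.
console.log(
  authorizationUrl({
    clientId: "EXAMPLE_CLIENT_ID",
    redirectUri: "https://example.org/callback",
    state: "random-opaque-value",
  })
);
```

In practice PassportJS constructs this redirect itself and then exchanges the returned code for tokens, so code like this is only useful for understanding or debugging the flow.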

We currently don't use Cognito's identity pools. It may be beneficial to use one in the future so we can get temporary AWS credentials specific to each Nextstrain user with the appropriate authorizations baked in (instead of using a server-wide set of credentials).

DNS

Nameservers for the nextstrain.org zone are hosted by DNSimple.

GitHub

nextstrain/nextstrain.org is the GitHub repo for the Nextstrain website.

Core and staging narratives are sourced from the nextstrain/narratives repo (the master and staging branches, respectively).

Travis CI

CI is run via Travis CI using our .travis.yml. Every commit to the master branch on GitHub, or to an open PR, triggers a CI build.