| title | description |
|---|---|
| Deployments | How systems are deployed at Artsy |

- Safe: Only successfully tested code should be deployed. Deploys should be graceful, i.e., they shouldn't impact the availability of the service.
- Repeatable: The result of a deploy should be consistent no matter the state of the developer's local environment, the particulars of a code change, the state of data, volume of traffic, etc.
- Visible: It should be transparent what deploy is happening when (and by whom...). An audit trail and instrumentation can be helpful in diagnosing production issues and interpreting monitoring results.
- Simple: Other things being equal, we prefer simple deployment operations whose impact can be easily understood by developers who may have to consider how to undo a change or its impact on other systems.
- Independent: Releases should be chunked and sequenced such that any one system can be updated independently of others.
- Continuous: Releases should not require manual involvement by engineers. Release increments should be small and tests, builds, migrations, and other safeguards should be sufficient to ensure most releases can be unattended.
- Reversible*: We should be able to roll back a deploy while keeping the state of the system stable and consistent. (This may also include associated changes such as to dependencies or configuration.)
- Consistent*: Ideally the deployment operation is the same whether a change includes just code, updated configuration, data migrations, or dependencies.
- Fast*: Other things being equal, we prefer deploys to be fast so we can deploy regularly and focus on other things.
*Not all of these goals are routinely achieved.
- Pull requests undergo the full test suite and any other linting or coverage steps imposed by the project.
- Once merged to the `main` branch and successfully tested/built, changes are automatically pushed to the container registry and deployed to staging by CI steps.
- Any necessary pre-deploy steps (such as migrations) or post-deploy steps occur automatically during the deploy.
- A `staging` branch is updated based on each deploy to the staging environment.
- Opening, reviewing, and merging a pull request from the `staging` to the `release` branch triggers a promotion from the staging to production environment (i.e., a release).
- Whenever possible, production releases should proceed automatically, typically after having been open for 24 hours.
- For visibility, announce any manual production releases in #dev.
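
Before opening or merging a release pull request, it can help to preview which commits the promotion would include. The following is a minimal sketch, not part of hokusai or horizon: `pending_release_commits` is a hypothetical helper name, and it assumes the `staging` and `release` branches described above exist in the clone.

```shell
#!/usr/bin/env bash
# Sketch: list the commits a staging -> release pull request would promote.
# Assumes the `staging` and `release` branch names described above.

# pending_release_commits [REMOTE]
# Prints one line per commit reachable from staging but not from release.
# With a REMOTE argument (e.g. "origin"), the remote-tracking branches
# are compared instead of the local ones.
pending_release_commits() {
  local prefix="${1:+$1/}"
  git log --oneline "${prefix}release..${prefix}staging"
}
```

For example, `git fetch origin && pending_release_commits origin` compares the remote-tracking branches; empty output suggests there is nothing new to release.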
We use the hokusai tool to drive this process via
CircleCI. The
artsy/artsy-hokusai-templates project provides solid starting
points for `.circleci/config.yml` and other configuration files. The
artsy/hokusai orb packages up the common steps for
convenient reuse. See
Convection's .circleci/config.yml for a
complete example.
We use horizon to visualize the status of our deployments and automatically open and [usually] merge production release PRs. See its instructions for defining a new project and configuring deployment steps, including Slack channels to notify prior to releases.
If a project should not be released for any reason (such as needing QA or known issues in staging),
create a deploy block to record the reason and timing.
The artsy/release orb defines a `block` step that will respect any unresolved deploy blocks and cause release builds to short-circuit.
If a project strays from these common practices, it's especially important to document the correct process in its README.
- See CircleCI's docs for general set-up steps.
- Artsy's project templates depend on write access to the repo (in order to push branches from CI steps). To create
  a read+write GitHub key for CircleCI (rather than the default read-only key):
  - Generate a key with a helpful label: `ssh-keygen -t rsa -b 4096 -m PEM -C "github_rw_key_for_circle"` (provide a blank passphrase).
  - Log into GitHub as the `artsyit` user and, in the project's settings, go to Deploy keys > Add deploy key. Give the key a descriptive name (like the label above) and paste in the contents of the public (`.pub`) key file.
  - Check the Allow write access box and click the Add key button to save the new key.
  - In the CircleCI project settings, go to Additional SSH Keys > Add SSH Key.
  - Enter `github.com` for Hostname and the contents of the private key file for Private Key, then click Add SSH Key to save.
  - If necessary, add the SSH key fingerprint to your `.circleci/config.yml` following the instructions here.
  - The default read-only key under Checkout SSH Keys is still required for Circle to check out the repo, so don't delete it.
- In the CircleCI project settings, navigate to the advanced tab. Generally speaking you should:
  - Enable the option to Build forked pull requests.
  - Disable the option to Pass secrets to builds from forked pull requests.
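
The key-generation step above can be scripted so that the passphrase prompt is skipped and the fingerprint is printed for later comparison. This is only a sketch: `generate_deploy_key` is a hypothetical helper name, the output directory is an assumption, and whether CircleCI shows the fingerprint in MD5 or SHA256 form should be checked against the project settings page.

```shell
#!/usr/bin/env bash
# Sketch: generate a read+write deploy key pair without interactive prompts.

# generate_deploy_key LABEL [DIR]
# Writes DIR/LABEL (private key) and DIR/LABEL.pub (public key), then prints
# the public key's MD5 fingerprint.
generate_deploy_key() {
  local label="$1"
  local dir="${2:-.}"
  local key="$dir/$label"
  # -N "" sets the blank passphrase; -m PEM matches the command in the steps above.
  ssh-keygen -q -t rsa -b 4096 -m PEM -C "$label" -N "" -f "$key" || return 1
  # Print the fingerprint (assumption: compare it with what CircleCI displays).
  ssh-keygen -l -E md5 -f "$key.pub"
}
```

Calling `generate_deploy_key github_rw_key_for_circle ~/tmp-keys` would produce the two files to paste into GitHub (the `.pub` file) and CircleCI (the private key file). Deleting the local copies afterwards is a sensible precaution.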
We prefer to use our CI pipeline to "roll forward" even for reverts or timely fixes.
When it's critical to avoid the delay of a full CI run or staging deploy, you can hot-fix/roll-back using Hokusai, as follows:
- List the project's production ECR images.

      $ hokusai registry images --filter-tags production
      Image Pushed At           | Image Tags
      ----------------------------------------------------------
      2022-01-27 12:15:21-05:00 | production, staging--2022-01-27--19-05-54, 9718ddb9334c3e9b2a0a0ffa5d744e1ca91d5cb3, production--2022-01-27--19-43-45
      2022-01-26 06:16:16-05:00 | staging--2022-01-26--11-55-47, production--2022-01-26--14-17-47, 84fd6dcd9b115482e2b1d2981c31f4c8bc97a015
      2022-01-25 11:02:42-05:00 | fcf109fa7db52c538755a4eac1b103ecf83dddce, staging--2022-01-25--16-57-31, production--2022-01-25--17-54-15
      ... 81 more images available

  The image currently running in production is the one that has the canonical `production` tag. In the example output above, it's the first one in the list. Presumably, this is the problematic image. The other images, which have `production--<timestamp>` tags but not the canonical `production` tag, were used in production in the past but no longer are. Presumably one of them is the last known good image.

- Identify the tag of the last known good image. This is the one you will revert to.

- Deploy the last known good image.

      hokusai production deploy <tag>

  That updates Kubernetes `Deployment` resources only. If other resources (e.g. `Ingress`) should be reverted as well, add the `--update-config` flag:

      hokusai production deploy <tag> --update-config

Please note that `hokusai production deploy ...` may not be a comprehensive rollback. Depending on your situation, additional steps (e.g. assets compilation, updating configuration) may be required to achieve a complete fix/roll-back.
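
When the image list is long, the candidates for a rollback can be filtered out of the listing mechanically. The sketch below assumes the output format shown in the example above (a ` | ` separator followed by comma-separated tags); `rollback_candidates` is a hypothetical helper, not a hokusai command. It prints the timestamped `production--` tag of every image that was once released but no longer carries the canonical `production` tag.

```shell
#!/usr/bin/env bash
# Sketch: list previously released images from `hokusai registry images` output.
# Reads the listing on stdin; the column layout is assumed from the example above.

rollback_candidates() {
  awk -F' [|] ' '
    NF == 2 {
      n = split($2, tags, /, */)
      current = 0
      released = ""
      for (i = 1; i <= n; i++) {
        t = tags[i]
        gsub(/^ +| +$/, "", t)                      # trim stray spaces
        if (t == "production") current = 1          # image running right now
        else if (t ~ /^production--/) released = t  # timestamped release tag
      }
      # Keep only images that were released before and are not current.
      if (released != "" && !current) print released
    }'
}
```

For example, `hokusai registry images --filter-tags production | rollback_candidates` would list candidates in the same order as the listing. The first entry is only the most recent previous release; whether it is actually known to be good still has to be judged by a person.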