
2021 10 23 (Saturday) Deployment

Mike Marcotte edited this page Oct 28, 2021 · 6 revisions

General Notes

This deployment mainly consists of the latest batch of work from Flexion. See the stories below.

It also commits the change that adds one more replica shard to each index in our Elasticsearch cluster. This will improve performance and resiliency.

We are performing this update after hours because we observe a low level of activity at that time; we expect it to conclude between 1am and 2am. As the deployment completes, we will notify any Court Staff who are logged in to save their work and log out.

Bugfixes

Feature Stories

Observations

While deploying in Court environments, we observed that the script that waits until reindexing is complete was getting confused by the additional replica. It appears that the stats API reports the total number of documents multiplied by the number of shard copies. By adding a replica, we increased that amount by 50%. We created a bug to track this, along with a fix that uses the count API instead.
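
The arithmetic behind that observation can be sketched as follows. This is only an illustration: `perCopyCount` is a hypothetical helper, and the replica counts (2 on the destination, 1 on the source) are assumptions inferred from the 50% increase and the figures in the timeline, not values read from the deploy scripts.

```javascript
// Sketch only: perCopyCount and the replica counts are assumptions,
// not part of the actual deployment scripts.

// Each document is stored once per shard copy: 1 primary + N replicas.
const perCopyCount = (totalDocsFromStats, numberOfReplicas) =>
  totalDocsFromStats / (1 + numberOfReplicas);

// efcms-case figures from the first prod index summary in the timeline.
const alpha = perCopyCount(3013935, 2); // destination, assumed 2 replicas
const beta = perCopyCount(2009290, 1); // source, assumed 1 replica

console.log({ alpha, beta, diff: Math.abs(alpha - beta) });
// prints { alpha: 1004645, beta: 1004645, diff: 0 }
```

Once the replica multiplier is divided out, both clusters report the same efcms-case count, which matches what the count-based script shows in the second summary.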

Timeline

  • 22:14 - Created the Pull Request
  • 22:15 - Ran script to set up boolean values in prod deploy table
$ ./scripts/update-deploy-string-to-boolean.sh prod
  • 22:17 - Ensured ES and DynamoDB tables were ready for a migration
  • 22:20 - Ran Docker to ECR script
$ ./docker-to-ecr.sh latest
  • 22:21 - Tests pass
  • 22:22 - Merged the PR (CircleCI Build)
  • 22:35 - Tests pass; deploy step starts
  • 22:40 - Observed deploy table looks correct, and migrate flag is true, source table: beta, destination table: alpha.
  • 23:00 - Deploy step completes
  • 23:00 - Migration starts. 🤞
  • 00:06 - Migration completes successfully
  • 02:58 - Reindexing appears to be complete, based on the earlier observations:
## prod Index Summary
┌─────────┬───────────────────────┬────────────┬───────────┬─────────┐
│ (index) │       indexName       │ countAlpha │ countBeta │  diff   │
├─────────┼───────────────────────┼────────────┼───────────┼─────────┤
│    0    │     'efcms-case'      │  3013935   │  2009290  │ 1004645 │
│    1    │ 'efcms-case-deadline' │   27384    │   18266   │  9118   │
│    2    │ 'efcms-docket-entry'  │  27667143  │ 18444764  │ 9222379 │
│    3    │    'efcms-message'    │   592368   │  394912   │ 197456  │
│    4    │     'efcms-user'      │   481410   │  320940   │ 160470  │
│    5    │   'efcms-work-item'   │  1587057   │  1058038  │ 529019  │
└─────────┴───────────────────────┴────────────┴───────────┴─────────┘

With the updated script:

┌─────────┬───────────────────────┬────────────┬───────────┬──────┐
│ (index) │       indexName       │ countAlpha │ countBeta │ diff │
├─────────┼───────────────────────┼────────────┼───────────┼──────┤
│    0    │     'efcms-case'      │  1004645   │  1004645  │  0   │
│    1    │ 'efcms-case-deadline' │    9128    │   9133    │  5   │
│    2    │ 'efcms-docket-entry'  │  9222381   │  9222382  │  1   │
│    3    │    'efcms-message'    │   197456   │  197456   │  0   │
│    4    │     'efcms-user'      │   160470   │  160470   │  0   │
│    5    │   'efcms-work-item'   │   529019   │  529019   │  0   │
└─────────┴───────────────────────┴────────────┴───────────┴──────┘
  • 03:02 - Manually continuing the deployment
  • 03:03 - Running script to figure out what the missing docket entry is:
$ node shared/admin-tools/elasticsearch/determine-difference-es-index.js prod beta efcms-docket-entry
  • 03:08 - Smoketests pass! Observed that USTC_ADMIN_USER is disabled.
  • 03:13 - Switch colors...
  • 03:16 - Disabled blue api custom domains east & west

Things are looking good. Investigating the docket entry and case deadlines that are missing from the destination cluster. 🤔

Conclusion

I’m having a hard time figuring out which document is missing: my query to calculate the delta keeps timing out because the docket entry index is so large.

$ node shared/admin-tools/elasticsearch/determine-difference-es-index.js prod beta efcms-docket-entry
efcms-search-prod-alpha
events.js:292
      throw er; // Unhandled 'error' event
      ^

Error: read ECONNRESET
    at TCP.onStreamRead (internal/stream_base_commons.js:209:20)
Emitted 'error' event on ClientRequest instance at:
    at Socket.socketErrorListener (_http_client.js:469:9)
    at Socket.emit (events.js:315:20)
    at Socket.EventEmitter.emit (domain.js:467:12)
    at emitErrorNT (internal/streams/destroy.js:106:8)
    at emitErrorCloseNT (internal/streams/destroy.js:74:3)
    at processTicksAndRejections (internal/process/task_queues.js:80:21) {
  errno: -54,
  code: 'ECONNRESET',
  syscall: 'read'
}
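
One possible way around a single oversized request is to page through the document ids of both clusters in sorted order and compare the two streams as they arrive, so each request stays small and memory stays bounded. The sketch below is not how `determine-difference-es-index.js` works; `fetchPage`, `allIds`, `diffSortedIds`, and `pagerFor` are hypothetical names. In a real run, `fetchPage` might wrap a sorted search paged with `search_after`, which is an assumption here.

```javascript
// Sketch only: fetchPage is a hypothetical callback. It is assumed to return
// up to `size` document ids, sorted ascending in the same order that the
// JavaScript `<` operator uses, that come strictly after `afterId`.

// Yield every id from one cluster, one small page at a time.
async function* allIds(fetchPage, size = 1000) {
  let afterId = null;
  while (true) {
    const page = await fetchPage(afterId, size);
    if (page.length === 0) return;
    yield* page;
    afterId = page[page.length - 1];
  }
}

// Walk two sorted id streams in lockstep and collect ids found in only one.
async function diffSortedIds(sourceIds, destinationIds) {
  const onlyInSource = [];
  const onlyInDestination = [];
  let s = await sourceIds.next();
  let d = await destinationIds.next();
  while (!s.done || !d.done) {
    if (d.done || (!s.done && s.value < d.value)) {
      onlyInSource.push(s.value);
      s = await sourceIds.next();
    } else if (s.done || d.value < s.value) {
      onlyInDestination.push(d.value);
      d = await destinationIds.next();
    } else {
      // Same id on both sides: advance both streams.
      s = await sourceIds.next();
      d = await destinationIds.next();
    }
  }
  return { onlyInSource, onlyInDestination };
}

// In-memory stand-in for a cluster, used only to exercise the sketch.
const pagerFor = (sortedIds) => async (afterId, size) => {
  const start = afterId === null ? 0 : sortedIds.indexOf(afterId) + 1;
  return sortedIds.slice(start, start + size);
};

diffSortedIds(
  allIds(pagerFor(['a', 'b', 'c', 'd']), 2),
  allIds(pagerFor(['a', 'c', 'd']), 2),
).then(console.log);
// prints { onlyInSource: [ 'b' ], onlyInDestination: [] }
```

The trade-off is many small requests instead of one large one, which takes longer overall but should be less likely to hit a connection reset on an index of this size.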

The case deadline records, however, are another example of https://github.com/flexion/ef-cms/issues/9009. The records don’t exist in DynamoDB (in either the source or the destination table). At some point these records should have been removed from the source cluster, yet they continue to linger. Something must be intermittently failing when deleting these records from the cluster (and perhaps when indexing them?). The fix put forth for 9009 so far was a significant refactor that deprecated the efcms-user-case index and stopped indexing unwanted records into the efcms-user index. It appears the underlying problem, where some delete requests fail, still persists.
