2021 10 23 (Saturday) Deployment

General Notes

This deployment mainly consists of the latest batch of work from Flexion. See the stories below.

It also commits the change that adds one more replica shard for each index in our Elasticsearch cluster. This should improve performance and resiliency.

We are performing this update after hours, expecting it to conclude between 1am and 2am, when we observe a low level of activity. We will notify any Court Staff who are logged in to save their work and log out as the deployment completes.

Bugfixes

Feature Stories

Observations

While deploying in Court environments, we observed that the script that waits until reindexing is complete was getting confused by the additional replica. It appears that the stats API counts each document once per shard copy, so its total is the number of documents multiplied by the number of copies (primary plus replicas). By adding a replica, we increased that amount by 50%. So we created a bug to track this, along with a fix that uses the count API instead.
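The 50% figure can be checked with a little arithmetic. This is a minimal sketch, not the deployed script: `statsTotalDocs` is a hypothetical helper, and it assumes the stats total counts every shard copy and that each index went from one replica to two.

```javascript
// Assumption (inferred from the numbers in this log): the stats total counts
// every shard copy, so total = uniqueDocs * (1 + replicas).
function statsTotalDocs(uniqueDocs, replicas) {
  return uniqueDocs * (1 + replicas);
}

// efcms-case holds 1004645 documents according to the count-based summary.
const uniqueDocs = 1004645;
const withOneReplica = statsTotalDocs(uniqueDocs, 1); // 2 copies of each document
const withTwoReplicas = statsTotalDocs(uniqueDocs, 2); // 3 copies of each document
const relativeIncrease = (withTwoReplicas - withOneReplica) / withOneReplica;

console.log(withOneReplica, withTwoReplicas, relativeIncrease);
```

Under these assumptions the two totals match the efcms-case row of the stats-based summary in the timeline (2009290 and 3013935), which is why a script comparing stats totals across clusters with different replica counts never sees them converge.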

Timeline

22:14 - Created the Pull Request
22:15 - Ran the script to set up boolean values in the prod deploy table

$ ./scripts/update-deploy-string-to-boolean.sh prod

22:17 - Ensured ES and DynamoDB tables were ready for a migration
22:20 - Ran Docker to ECR script

$ ./docker-to-ecr.sh latest

22:21 - Tests pass
22:22 - Merged the PR; CircleCI build starts
22:35 - Tests pass; deploy step starts
22:40 - Observed deploy table looks correct, and migrate flag is true, source table: beta, destination table: alpha.
23:00 - Deploy step completes
23:00 - Migration starts. 🤞
00:06 - Migration completes successfully
02:58 - Reindexing appears to be complete, based on the earlier observations:

## prod Index Summary
┌─────────┬───────────────────────┬────────────┬───────────┬─────────┐
│ (index) │       indexName       │ countAlpha │ countBeta │  diff   │
├─────────┼───────────────────────┼────────────┼───────────┼─────────┤
│    0    │     'efcms-case'      │  3013935   │  2009290  │ 1004645 │
│    1    │ 'efcms-case-deadline' │   27384    │   18266   │  9118   │
│    2    │ 'efcms-docket-entry'  │  27667143  │ 18444764  │ 9222379 │
│    3    │    'efcms-message'    │   592368   │  394912   │ 197456  │
│    4    │     'efcms-user'      │   481410   │  320940   │ 160470  │
│    5    │   'efcms-work-item'   │  1587057   │  1058038  │ 529019  │
└─────────┴───────────────────────┴────────────┴───────────┴─────────┘

With the updated script:

┌─────────┬───────────────────────┬────────────┬───────────┬──────┐
│ (index) │       indexName       │ countAlpha │ countBeta │ diff │
├─────────┼───────────────────────┼────────────┼───────────┼──────┤
│    0    │     'efcms-case'      │  1004645   │  1004645  │  0   │
│    1    │ 'efcms-case-deadline' │    9128    │   9133    │  5   │
│    2    │ 'efcms-docket-entry'  │  9222381   │  9222382  │  1   │
│    3    │    'efcms-message'    │   197456   │  197456   │  0   │
│    4    │     'efcms-user'      │   160470   │  160470   │  0   │
│    5    │   'efcms-work-item'   │   529019   │  529019   │  0   │
└─────────┴───────────────────────┴────────────┴───────────┴──────┘
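For reference, the comparison behind the count-based summary can be sketched as follows. This is an illustration only: `summarizeCounts` is a hypothetical name, the counts are copied from the table above, and the real script presumably fetches them from each cluster rather than hard-coding them.

```javascript
// Hypothetical sketch: build one summary row per index from document counts.
function summarizeCounts(countsAlpha, countsBeta) {
  return Object.keys(countsAlpha).map((indexName) => {
    const countAlpha = countsAlpha[indexName];
    const countBeta = countsBeta[indexName];
    return { indexName, countAlpha, countBeta, diff: Math.abs(countAlpha - countBeta) };
  });
}

// Counts copied from the summary produced by the updated script.
const countsAlpha = {
  'efcms-case': 1004645,
  'efcms-case-deadline': 9128,
  'efcms-docket-entry': 9222381,
  'efcms-message': 197456,
  'efcms-user': 160470,
  'efcms-work-item': 529019,
};
const countsBeta = {
  'efcms-case': 1004645,
  'efcms-case-deadline': 9133,
  'efcms-docket-entry': 9222382,
  'efcms-message': 197456,
  'efcms-user': 160470,
  'efcms-work-item': 529019,
};

const summary = summarizeCounts(countsAlpha, countsBeta);
// Indices whose counts still differ between the two clusters.
const mismatched = summary.filter((row) => row.diff !== 0).map((row) => row.indexName);

console.table(summary);
console.log(mismatched);
```

Only efcms-case-deadline (5) and efcms-docket-entry (1) still differ, and in both cases the destination (alpha) has fewer documents than the source (beta).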

03:02 - Manually continuing the deployment
03:03 - Running the script to identify the missing docket entry:

$ node shared/admin-tools/elasticsearch/determine-difference-es-index.js prod beta efcms-docket-entry

03:08 - Smoketests pass! Observed that USTC_ADMIN_USER is disabled.
03:13 - Switch colors...
03:16 - Disabled blue api custom domains east & west

Things are looking good. Investigating the docket entry and case deadlines that are missing from the destination cluster. 🤔

Conclusion

I’m having a hard time figuring out which document is missing, because my query to calculate the delta keeps timing out: the docket entry index is so large.

$ node shared/admin-tools/elasticsearch/determine-difference-es-index.js prod beta efcms-docket-entry
efcms-search-prod-alpha
events.js:292
      throw er; // Unhandled 'error' event
      ^

Error: read ECONNRESET
    at TCP.onStreamRead (internal/stream_base_commons.js:209:20)
Emitted 'error' event on ClientRequest instance at:
    at Socket.socketErrorListener (_http_client.js:469:9)
    at Socket.emit (events.js:315:20)
    at Socket.EventEmitter.emit (domain.js:467:12)
    at emitErrorNT (internal/streams/destroy.js:106:8)
    at emitErrorCloseNT (internal/streams/destroy.js:74:3)
    at processTicksAndRejections (internal/process/task_queues.js:80:21) {
  errno: -54,
  code: 'ECONNRESET',
  syscall: 'read'
}
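One possible way around this, sketched here under assumptions rather than taken from determine-difference-es-index.js, is to avoid a single huge query and instead page through both indices in the same sorted order (for example with scroll or search_after), comparing IDs as they stream in. `pagedIds`, `diffSortedIds`, and the `fetchPage` callback are hypothetical names; the sketch also assumes the IDs arrive in an ascending order that agrees with JavaScript's `<` comparison.

```javascript
// Hypothetical helper: yields IDs one at a time from pages of sorted IDs.
// fetchPage(cursor) is an assumed callback returning { ids, nextCursor },
// with nextCursor set to null after the last page.
async function* pagedIds(fetchPage) {
  let cursor = null;
  do {
    const { ids, nextCursor } = await fetchPage(cursor);
    yield* ids;
    cursor = nextCursor;
  } while (cursor != null);
}

// Merge-compare two ascending ID streams and collect the IDs found in only
// one of them. Memory use stays proportional to the number of differences.
async function diffSortedIds(source, dest) {
  const onlyInSource = [];
  const onlyInDest = [];
  let s = await source.next();
  let d = await dest.next();
  while (!s.done || !d.done) {
    if (d.done || (!s.done && s.value < d.value)) {
      onlyInSource.push(s.value); // dest has moved past this ID (or ended)
      s = await source.next();
    } else if (s.done || d.value < s.value) {
      onlyInDest.push(d.value); // source has moved past this ID (or ended)
      d = await dest.next();
    } else {
      s = await source.next(); // same ID in both streams
      d = await dest.next();
    }
  }
  return { onlyInSource, onlyInDest };
}
```

Because each request only asks for one page, no single query has to cover the whole docket entry index, and the comparison can stop and resume from a cursor if a connection is reset.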

However, the case deadline records are another example of https://github.com/flexion/ef-cms/issues/9009. The records don’t exist in DynamoDB (in either the source or the destination table). At some point, these records should have been removed from the source cluster, yet they linger. Something must be intermittently failing when deleting (and perhaps indexing?) these records in the cluster. The fix put forth for 9009 so far was a significant refactor that deprecated the efcms-user-case index and stopped indexing unwanted records into the efcms-user index. It appears the underlying problem, where some delete requests fail, still persists.

