Resurrect snapshot container #403

Merged — 2 commits merged into main from resurrect-snapshot-container on Feb 15, 2024

Conversation

LesnyRumcajs (Member)

Summary of changes
Changes introduced in this pull request:

  • resurrected the snapshot container with a few changes:
    • the base snapshot service image no longer has the aws-cli dependency; it is replaced by https://gallery.ecr.aws/aws-cli/aws-cli (the ECR flavour is used to bypass Docker Hub download limits)
    • no copies are made; the volume is mounted directly into the aws-cli container, which uses it to upload the snapshot (a sketch of this upload step follows below).

Peak disk usage will be 2x the snapshot size for a brief moment (during the snapshot export command), but apart from that, e.g. during the upload, it should stay at 1x the snapshot size.
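
To make the direct-mount upload concrete, here is a minimal sketch; the volume name, bucket, and snapshot file name are made-up placeholders rather than the actual service code, and the image is the ECR Public aws-cli image mentioned above:

# Hypothetical sketch only: forest_snapshots, forest-snapshots and the file name
# are placeholders. The exported snapshot is uploaded straight from the mounted
# volume by the aws-cli container, so no extra copy lands on the host.
snapshot_name = "forest_snapshot_calibnet.forest.car.zst"
system(
  "docker", "run", "--rm",
  "--volume", "forest_snapshots:/snapshots",                       # mount the snapshot volume directly
  "--env", "AWS_ACCESS_KEY_ID", "--env", "AWS_SECRET_ACCESS_KEY",  # forward credentials from the host
  "public.ecr.aws/aws-cli/aws-cli",                                # ECR flavour of the aws-cli image
  "s3", "cp", "/snapshots/#{snapshot_name}", "s3://forest-snapshots/#{snapshot_name}"
) || raise("snapshot upload failed")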

Reference issue to close (if applicable)

Closes

Other information and links

@LesnyRumcajs requested a review from a team as a code owner on February 9, 2024 12:01
@LesnyRumcajs requested review from hanabi1224 and elmattic and removed the request for a team on February 9, 2024 12:01

github-actions bot commented Feb 9, 2024

Forest: Snapshot Service Infrastructure Plan: success

Show Plan
data.external.sources_tar: Reading...
data.local_file.init: Reading...
data.local_file.init: Read complete after 0s [id=dc94e3ebf33fe45cb5e8946405ca9f6d06f4f21a]
data.digitalocean_ssh_keys.keys: Reading...
data.digitalocean_project.forest_project: Reading...
data.external.sources_tar: Read complete after 0s [id=-]
data.local_file.sources: Reading...
module.monitoring[0].newrelic_alert_policy.alert: Refreshing state... [id=1300187]
module.monitoring[0].newrelic_notification_channel.slack-channel[0]: Refreshing state... [id=4c3875ea-f2da-4bee-b8e3-14d2816cac32]
data.local_file.sources: Read complete after 0s [id=bb5129f3e24d23f393420ccd3ea738485eda581b]
data.digitalocean_ssh_keys.keys: Read complete after 1s [id=ssh_keys/8656000852680004208]
digitalocean_droplet.forest: Refreshing state... [id=399718313]
module.monitoring[0].newrelic_workflow.alerting-workflow-slack[0]: Refreshing state... [id=d7b883e9-c7eb-4a95-8308-d95874fa5dbe]
module.monitoring[0].newrelic_nrql_alert_condition.disk_space: Refreshing state... [id=1300187:5697050]
data.digitalocean_project.forest_project: Read complete after 6s [id=da5e6601-7fd9-4d02-951e-390f7feb3411]
digitalocean_project_resources.connect_forest_project: Refreshing state... [id=da5e6601-7fd9-4d02-951e-390f7feb3411]
digitalocean_firewall.forest-firewall: Refreshing state... [id=7de97678-c086-4a0a-a9bf-7a7155afc60f]

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  ~ update in-place
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # digitalocean_droplet.forest must be replaced
-/+ resource "digitalocean_droplet" "forest" {
      ~ created_at           = "2024-02-06T11:59:12Z" -> (known after apply)
      ~ disk                 = 200 -> (known after apply)
      ~ id                   = "399718313" -> (known after apply)
      ~ ipv4_address         = "164.92.197.157" -> (known after apply)
      ~ ipv4_address_private = "10.135.0.5" -> (known after apply)
      + ipv6_address         = (known after apply)
      ~ locked               = false -> (known after apply)
      ~ memory               = 16384 -> (known after apply)
        name                 = "prod-forest-snapshot"
      ~ price_hourly         = 0.125 -> (known after apply)
      ~ price_monthly        = 84 -> (known after apply)
      ~ private_networking   = true -> (known after apply)
      ~ status               = "active" -> (known after apply)
        tags                 = [
            "iac",
            "prod",
        ]
      ~ urn                  = "do:droplet:399718313" -> (known after apply)
      ~ user_data            = (sensitive value) # forces replacement
      ~ vcpus                = 4 -> (known after apply)
      ~ volume_ids           = [] -> (known after apply)
      ~ vpc_uuid             = "46a525ac-fd37-47ea-bb10-95c1db0055f7" -> (known after apply)
        # (9 unchanged attributes hidden)
    }

  # digitalocean_firewall.forest-firewall will be updated in-place
  ~ resource "digitalocean_firewall" "forest-firewall" {
      ~ droplet_ids     = [
          - 399718313,
        ] -> (known after apply)
        id              = "7de97678-c086-4a0a-a9bf-7a7155afc60f"
        name            = "prod-forest-snapshot"
        tags            = []
        # (3 unchanged attributes hidden)

        # (6 unchanged blocks hidden)
    }

  # digitalocean_project_resources.connect_forest_project will be updated in-place
  ~ resource "digitalocean_project_resources" "connect_forest_project" {
        id        = "da5e6601-7fd9-4d02-951e-390f7feb3411"
      ~ resources = [
          - "do:droplet:399718313",
        ] -> (known after apply)
        # (1 unchanged attribute hidden)
    }

Plan: 1 to add, 2 to change, 1 to destroy.

Changes to Outputs:
  ~ ip = [
      - "164.92.197.157",
      + (known after apply),
    ]

─────────────────────────────────────────────────────────────────────────────

Saved the plan to: /home/runner/work/forest-iac/forest-iac/tfplan

To perform exactly these actions, run the following command to apply:
    terraform apply "/home/runner/work/forest-iac/forest-iac/tfplan"

@LesnyRumcajs requested review from a team and ruseinov and removed the request for hanabi1224 and a team on February 13, 2024 10:13
@ruseinov

I'll be honest, the setup for the snapshot service looks pretty confusing: a mix of Ruby, bash, and bash embedded in Ruby. Perhaps we should improve this code? My eyes bleed reading it; it's not comprehensible at all. And we are talking about a really small amount of code.

If we want to keep using Ruby/Python or whatever other scripting language here, I think we should reconsider how bash is wrapped and come up with somewhat clearer abstractions, e.g. an object with a bunch of methods that do the bash work under the hood. That would be much nicer than interpolating a bunch of commands into a single string.

This stuff looks pretty unreadable:

add_timestamps_cmd = <<~CMD
  awk '{
  if ($0 !~ /^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{6}Z/)
    print strftime("[%Y-%m-%d %H:%M:%S]"), $0;
  else
    print $0;
  fflush();
  }'
CMD

I appreciate the awk knowledge, but I'd prefer a readable alternative. If it's not easier and more comprehensible to do it in Ruby, at least some comments to avoid having to decipher this would be much appreciated.

All of that said, if it's too much effort to invest, it would at least be nice to have this code properly documented.

@LesnyRumcajs (Member, Author)

@ruseinov The code to review is here: 2f263d0. You don't have to review the awk monstrosity at all.

The overall design of the service could definitely be improved, but that's not the point of this PR. The point is to improve how it is used, e.g. via Terraform.
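
For context, a readable pure-Ruby equivalent of the awk filter quoted above could look roughly like the sketch below (an illustration only, not code from this PR): it prefixes each log line with a local timestamp unless the line already starts with an ISO-8601 timestamp.

# Illustration only, not part of this PR: prepend a timestamp to log lines that
# don't already start with an ISO-8601 timestamp such as 2024-02-09T12:01:00.000000Z.
ISO8601_PREFIX = /\A\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6}Z/

ARGF.each_line do |line|
  print Time.now.strftime("[%Y-%m-%d %H:%M:%S] ") unless line.match?(ISO8601_PREFIX)
  print line
  $stdout.flush # keep output unbuffered, like awk's fflush()
end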

@ruseinov left a comment

Looks good to me. Let's add an issue to revisit the bash-awk-ruby madness and make something readable of it.

@samuelarogbonlo (Contributor) left a comment

LGTM

@LesnyRumcajs merged commit c0c6669 into main on Feb 15, 2024
10 checks passed
@LesnyRumcajs deleted the resurrect-snapshot-container branch on February 15, 2024 08:41