Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(crates-io): replicate entire bucket in fallback #468

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

MarcoIeni
Copy link
Member

@MarcoIeni MarcoIeni commented Aug 1, 2024

The crates-io bucket contain many directories:
Screenshot 2024-08-01 at 17 36 30

However the fallback contains just crates:
Screenshot 2024-08-01 at 17 36 10

I think we should replicate the entire content of the bucket, so I'm removing the filter 👍
What do you think? 👀

@MarcoIeni
Copy link
Member Author

MarcoIeni commented Aug 2, 2024

Staging plan:

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # aws_s3_bucket.static will be updated in-place
  ~ resource "aws_s3_bucket" "static" {
        id                          = "staging-crates-io"
        tags                        = {}
        # (11 unchanged attributes hidden)

      - lifecycle_rule {
          - abort_incomplete_multipart_upload_days = 1 -> null
          - enabled                                = true -> null
          - id                                     = "purge-old-rss-feeds" -> null
          - prefix                                 = "rss/" -> null
          - tags                                   = {} -> null

          - noncurrent_version_expiration {
              - days = 1 -> null
            }
        }

        # (7 unchanged blocks hidden)
    }

  # aws_s3_bucket_replication_configuration.static will be updated in-place
  ~ resource "aws_s3_bucket_replication_configuration" "static" {
        id     = "staging-crates-io"
        # (2 unchanged attributes hidden)

      ~ rule {
            id       = "crates"
            # (2 unchanged attributes hidden)

          - filter {
              - prefix = "crates/" -> null
            }

            # (2 unchanged blocks hidden)
        }
    }

  # fastly_service_compute.static will be updated in-place
  ~ resource "fastly_service_compute" "static" {
      ~ active_version = 94 -> (known after apply)
      ~ cloned_version = 94 -> (known after apply)
        id             = "liljrvY3Xt0CzNk0mpuLa7"
        name           = "static.staging.crates.io"
        # (4 unchanged attributes hidden)

      ~ package {
          ~ source_code_hash = "5cdd587ee607364c9517440d0c4f3bb1e2295e93dfacabb04fd2933c594bf09e0cc0c1dea81fcc0a99b2aea55f7f123856e767a9b9db91f3565fbb4d58a00479" -> "b2b550736a33231f268f0263853111e8de057026a3a193459d8202e0c4d7cd4044440c5da4e233906fb54bcae2615d65930f8f720642c69eb3978c6d14fcf52a"
            # (1 unchanged attribute hidden)
        }

        # (8 unchanged blocks hidden)
    }

Plan: 0 to add, 3 to change, 0 to destroy.
╷
│ Warning: Argument is deprecated
│
│   with aws_s3_bucket.index,
│   on s3-index.tf line 1, in resource "aws_s3_bucket" "index":
│    1: resource "aws_s3_bucket" "index" {
│
│ Use the aws_s3_bucket_versioning resource instead
│
│ (and 9 more similar warnings elsewhere)
╵

@MarcoIeni
Copy link
Member Author

MarcoIeni commented Aug 2, 2024

Production plan (before changing deployed_ref, so you won't see changes from this PR here):

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # fastly_service_compute.static will be updated in-place
  ~ resource "fastly_service_compute" "static" {
      ~ active_version = 38 -> (known after apply)
      ~ cloned_version = 38 -> (known after apply)
        id             = "gEfRWQihVaQqh6vsPlY0H1"
        name           = "static.crates.io"
        # (4 unchanged attributes hidden)

      ~ package {
          ~ source_code_hash = "095092655df6bfa42f810a2c2a29e49792c8e2056311997033d8a459b6c1b53611d52536259f3dd661cef7084ea9f8b0d391a9a02f2338c1ff421443664d21b4" -> "a8881886c8461a95ef94ea2e91023dc96dcea1bebec0cdd0785c2fa4e85a32d8f41f82f9f0b96d17a17698423d9256f99803d6b2ae8019a39fc2971aa97eb437"
            # (1 unchanged attribute hidden)
        }

        # (8 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.
╷
│ Warning: Argument is deprecated
│
│   with aws_s3_bucket.index,
│   on s3-index.tf line 1, in resource "aws_s3_bucket" "index":
│    1: resource "aws_s3_bucket" "index" {
│
│ Use the aws_s3_bucket_versioning resource instead
│
│ (and 9 more similar warnings elsewhere)
╵

@MarcoIeni MarcoIeni marked this pull request as ready for review August 2, 2024 08:54
@jdno
Copy link
Member

jdno commented Aug 5, 2024

The unstable hash for the Fastly function is something I'm chasing and fixing every few months. 😞 Could just be that a new Rust version changed the layout of the WASM file slightly. 🤷‍♂️ Would be great to eventually make sure the hash is stable again...

@jdno
Copy link
Member

jdno commented Aug 5, 2024

Regarding the filter, I'm not sure what the historic context is. @pietroalbini or @Mark-Simulacrum might know.

@MarcoIeni
Copy link
Member Author

I applied the other changes. The plan looks much better now:

Terraform will perform the following actions:

  # aws_s3_bucket_replication_configuration.static will be updated in-place
  ~ resource "aws_s3_bucket_replication_configuration" "static" {
        id     = "staging-crates-io"
        # (2 unchanged attributes hidden)

      ~ rule {
            id       = "crates"
            # (2 unchanged attributes hidden)

          - filter {
              - prefix = "crates/" -> null
            }

            # (2 unchanged blocks hidden)
        }
    }

Plan: 0 to add, 1 to change, 0 to destroy.

If I get the approval from Pietro or Mark, I'll apply and merge this 👍

@Mark-Simulacrum
Copy link
Member

Regarding the filter, I'm not sure what the historic context is. @pietroalbini or @Mark-Simulacrum might know.

I don't have particular context. However, I suspect that the crates.io frontend/site uses the other resources and can only use the primary bucket to do this today, and/or we deemed replicating the rest of the less critical files (which should be ~reproducible from the .crate files, I'd expect) needlessly expensive to replicate and store multiple times.

But @rust-lang/crates-io or Pietro likely have more current context on usage. I doubt that extra replication is that much of a problem.

@Turbo87
Copy link
Member

Turbo87 commented Aug 18, 2024

we deemed replicating the rest of the less critical files (which should be ~reproducible from the .crate files, I'd expect) needlessly expensive to replicate

yeah, the readmes can be extracted and rerendered from the crate files and the RSS files can be recreated from the database. the database dump obviously can also be recreated from the database.

the archive folder is fairly new and contains data that we explicitly removed from the database. replicating that probably makes sense.

replicating the crates folder is obviously the most important part :D

@MarcoIeni
Copy link
Member Author

the archive folder is fairly new and contains data that we explicitly removed from the database. replicating that probably makes sense.

Yeah, we should have replicated the archive folder, but we forgot. That's why I think replicating everything is the safer option here 👍

@pietroalbini
Copy link
Member

The reasoning why only the crates/ directory was replicated was to avoid replicating files that can be automatically re-created. TBH the storage costs are going to be negligible, so I'm not opposed to replicating everything.

@MarcoIeni MarcoIeni force-pushed the feat-crates-io-replicate-entire-bucket-in-fallback branch from 9d5a0f0 to 76551f5 Compare August 21, 2024 12:26
@MarcoIeni
Copy link
Member Author

MarcoIeni commented Aug 21, 2024

I got this error in the staging environment when running apply 😵


aws_s3_bucket_replication_configuration.static: Modifying... [id=staging-crates-io]
╷
│ Error: updating S3 replication configuration for bucket (staging-crates-io): InvalidRequest: DeleteMarkerReplication cannot be used for this version of Cross Region Replication configuration schema. Please refer to S3 Developer Guide for more information.
│ 	status code: 400, request id: 9NQ7D1PSAP9T14FZ, host id: Ek0sEgeO/MFJuiDX7OqDL60sFKDyjyuhLY0UOP9BowwU54IIibEidv2jyqYjspGLuHVguxHBLJ8=
│
│   with aws_s3_bucket_replication_configuration.static,
│   on s3-static.tf line 53, in resource "aws_s3_bucket_replication_configuration" "static":
│   53: resource "aws_s3_bucket_replication_configuration" "static" {
│

So maybe there's an issue with delete_marker_replication

I'll put this on hold until we update the terraform aws provider

@MarcoIeni
Copy link
Member Author

I retried now that terraform providers are at the latest version and I got the same error 😵
So this needs more investigation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants