Check if quota has been exceeded or object has too many versions #4487

Open
igorestevanjasinski opened this issue Dec 22, 2024 · 8 comments

@igorestevanjasinski

igorestevanjasinski commented Dec 22, 2024

Hello everyone,
I'm using Dell ECS as my S3 backend, and after some time I start to see this particular message for all my operations:
level=error ts=2024-12-22T04:41:59.49184793Z caller=flush.go:233 org_id=single-tenant msg="error performing op in flushQueue" op=1 block=e39622c4-3777-4c76-b659-178a5d728286 attempts=58 err="error copying block from local to remote backend: error writing object to s3 backend, object single-tenant/e39622c4-3777-4c76-b659-178a5d728286/data.parquet: Check if quota has been exceeded or object has too many versions"
level=info ts=2024-12-22T04:41:59.491907604Z caller=flush.go:391 org_id=single-tenant msg="retrying op in flushQueue" op=1 block=e39622c4-3777-4c76-b659-178a5d728286 backoff=2m0s
level=info ts=2024-12-22T04:41:59.491946293Z caller=flush.go:312 msg="flushing block" userid=single-tenant block=0014813c-fe0e-46bb-b62b-2d0fa1e44a6e
level=error ts=2024-12-22T04:41:59.574035199Z caller=flush.go:233 org_id=single-tenant msg="error performing op in flushQueue" op=1 block=65a133a1-c6f7-4096-8d9f-2c18f13cd4e2 attempts=417 err="error copying block from local to remote backend: error writing object to s3 backend, object single-tenant/65a133a1-c6f7-4096-8d9f-2c18f13cd4e2/data.parquet: Check if quota has been exceeded or object has too many versions"
level=info ts=2024-12-22T04:41:59.574077979Z caller=flush.go:391 org_id=single-tenant msg="retrying op in flushQueue" op=1 block=65a133a1-c6f7-4096-8d9f-2c18f13cd4e2 backoff=2m0s
level=info ts=2024-12-22T04:41:59.574109427Z caller=flush.go:312 msg="flushing block" userid=single-tenant block=e01c91eb-aac9-4da8-884c-eb8c117d609a
level=error ts=2024-12-22T04:41:59.680115917Z caller=flush.go:233 org_id=single-tenant msg="error performing op in flushQueue" op=1 block=45cad84c-3efa-4696-99af-160ea0b44510 attempts=219 err="error copying block from local to remote backend: error writing object to s3 backend, object single-tenant/45cad84c-3efa-4696-99af-160ea0b44510/data.parquet: Check if quota has been exceeded or object has too many versions"
level=info ts=2024-12-22T04:41:59.680169029Z caller=flush.go:391 org_id=single-tenant msg="retrying op in flushQueue" op=1 block=45cad84c-3efa-4696-99af-160ea0b44510 backoff=2m0s
level=info ts=2024-12-22T04:41:59.68021364Z caller=flush.go:312 msg="flushing block" userid=single-tenant block=2df5edb5-ed63-4a6a-bb19-9c879f9b71e3
level=error ts=2024-12-22T04:42:00.488218529Z caller=flush.go:233 org_id=single-tenant msg="error performing op in flushQueue" op=1 block=ab69dea3-49c3-4d40-aff7-721fe737be02 attempts=437 err="error copying block from local to remote backend: error writing object to s3 backend, object single-tenant/ab69dea3-49c3-4d40-aff7-721fe737be02/data.parquet: Check if quota has been exceeded or object has too many versions"
level=info ts=2024-12-22T04:42:00.488280096Z caller=flush.go:391 org_id=single-tenant msg="retrying op in flushQueue" op=1 block=ab69dea3-49c3-4d40-aff7-721fe737be02 backoff=2m0s
level=info ts=2024-12-22T04:42:00.488323419Z caller=flush.go:312 msg="flushing block" userid=single-tenant block=273b2452-3b61-4c24-bb29-efe574912469
level=error ts=2024-12-22T04:42:01.077140381Z caller=flush.go:233 org_id=single-tenant msg="error performing op in flushQueue" op=1 block=2df5edb5-ed63-4a6a-bb19-9c879f9b71e3 attempts=652 err="error copying block from local to remote backend: error writing object to s3 backend, object single-tenant/2df5edb5-ed63-4a6a-bb19-9c879f9b71e3/data.parquet: Check if quota has been exceeded or object has too many versions"
level=info ts=2024-12-22T04:42:01.07718453Z caller=flush.go:391 org_id=single-tenant msg="retrying op in flushQueue" op=1 block=2df5edb5-ed63-4a6a-bb19-9c879f9b71e3 backoff=2m0s
level=info ts=2024-12-22T04:42:01.077232484Z caller=flush.go:312 msg="flushing block" userid=single-tenant block=bdf3becb-b9bd-46f5-8aba-e5d43632c1dd
level=error ts=2024-12-22T04:42:01.078250022Z caller=flush.go:233 org_id=single-tenant msg="error performing op in flushQueue" op=1 block=e01c91eb-aac9-4da8-884c-eb8c117d609a attempts=339 err="error copying block from local to remote backend: error writing object to s3 backend, object single-tenant/e01c91eb-aac9-4da8-884c-eb8c117d609a/data.parquet: Check if quota has been exceeded or object has too many versions"
level=info ts=2024-12-22T04:42:01.078277919Z caller=flush.go:391 org_id=single-tenant msg="retrying op in flushQueue" op=1 block=e01c91eb-aac9-4da8-884c-eb8c117d609a backoff=2m0s
level=info ts=2024-12-22T04:42:01.078297462Z caller=flush.go:312 msg="flushing block" userid=single-tenant block=09c3480b-4e98-43e7-a6e3-d4bb54ba2bd7
level=error ts=2024-12-22T04:42:01.177086297Z caller=flush.go:233 org_id=single-tenant msg="error performing op in flushQueue" op=1 block=0014813c-fe0e-46bb-b62b-2d0fa1e44a6e attempts=209 err="error copying block from local to remote backend: error writing object to s3 backend, object single-tenant/0014813c-fe0e-46bb-b62b-2d0fa1e44a6e/data.parquet: Check if quota has been exceeded or object has too many versions"
level=info ts=2024-12-22T04:42:01.177146201Z caller=flush.go:391 org_id=single-tenant msg="retrying op in flushQueue" op=1 block=0014813c-fe0e-46bb-b62b-2d0fa1e44a6e backoff=2m0s
level=info ts=2024-12-22T04:42:01.177181411Z caller=flush.go:312 msg="flushing block" userid=single-tenant block=9d9751fc-ec23-44ba-af18-f8eb1b0ed141
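
To see which blocks are stuck and how many times each has been retried, ingester output like the above can be summarized with a short script. This is a minimal sketch and not part of Tempo: the function names `parse_logfmt` and `failed_flushes` are hypothetical, and it assumes one logfmt entry per line carrying the `level`, `block`, and `attempts` keys that appear in the log above.

```python
import shlex


def parse_logfmt(line):
    """Parse one logfmt line (key=value pairs, values optionally double-quoted)."""
    fields = {}
    try:
        tokens = shlex.split(line)
    except ValueError:
        # Unbalanced quotes: treat the line as unparseable and skip it.
        return fields
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def failed_flushes(lines):
    """Return {block_id: highest attempts value seen} for level=error entries."""
    worst = {}
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("level") != "error" or "block" not in fields:
            continue
        attempts = int(fields.get("attempts", 0))
        block = fields["block"]
        worst[block] = max(worst.get(block, 0), attempts)
    return worst
```

Calling `failed_flushes` on an open log file (or on `sys.stdin`) yields one entry per failing block, which makes it easier to tell whether every flush fails or only some blocks do.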

It is worth mentioning that the bucket isn't full and the quota hasn't been exceeded. After this error starts to show, I'm no longer able to look up traces, and I receive this message for all my searches:
failed to get trace with id: ff2c36c118818af306668fabc20c2dff Status: 500 Internal Server Error Body: error finding trace by id, blockID: 67b69e1d-17df-4b96-b7fe-7f8776751f14: error retrieving bloom bloom-0 (single-tenant, 67b69e1d-17df-4b96-b7fe-7f8776751f14): does not exist

with a different block ID and bloom each time.
Logs from S3

169.254.2.9 2024-12-26T17:59:59,719 [qtp1048437093-1577673-ac150cf7:190ae2c6dee:df61f:4af3-s3-100.126.33.186]  INFO  V4Signer.java (line 118) credential: a-user-plataforma-monitoramento-grafana-tempo-prod-3/20241226/us-east-1/s3/aws4_request, amz_expires: null, amz_signed_headers: content-type;host;x-amz-content-sha256;x-amz-date, amz_signature: 11821dc0263e24f53ebfc033d891418b363e19603d6579518c1668aad7223213, payloadHash: UNSIGNED-PAYLOAD, amz_date: 20241226T175959Z
169.254.2.9 2024-12-26T17:59:59,722 [qtp1048437093-1577673-ac150cf7:190ae2c6dee:df61f:4af3-s3-100.126.33.186] ERROR  ObjectControllerExceptionHelper.java (line 441) Method initiateMultiPartUpload failed due to exception
com.emc.storageos.data.object.exception.ObjectControllerException: directory server 172.21.12.182 returns error ERROR_OPERATION_NOT_ALLOWED, bucket a-ns1-plataforma-monitoramento-prod.plataforma-monitoramento-grafana-tempo-prod-3, requestId ac150cf7:190ae2c6dee:df61f:4af4
169.254.2.9 2024-12-26T17:59:59,723 [qtp1048437093-1577673-ac150cf7:190ae2c6dee:df61f:4af3-s3-100.126.33.186] ERROR  S3Exception.java (line 1733) got object access exception. RequestId ac150cf7:190ae2c6dee:df61f:4af3
com.emc.storageos.objcontrol.object.exception.ObjectAccessException: ERROR_METHOD_NOT_ALLOWED
Caused by: com.emc.storageos.data.object.exception.ObjectControllerException: directory server 172.21.12.182 returns error ERROR_OPERATION_NOT_ALLOWED, bucket a-ns1-plataforma-monitoramento-prod.plataforma-monitoramento-grafana-tempo-prod-3, requestId ac150cf7:190ae2c6dee:df61f:4af4
169.254.2.9 2024-12-26 17:59:59,723 ac150cf7:190ae2c6dee:df61f:4af3 172.21.12.247:9021 172.21.12.39:53460 a-user-plataforma-monitoramento-grafana-tempo-prod-3 MinIO%20(linux;%20amd64)%20minio-go/v7.0.70 POST a-ns1-plataforma-monitoramento-prod plataforma-monitoramento-grafana-tempo-prod-3 single-tenant%2F1d8ed62f-6bf9-48df-9ff7-60c1f7884101%2Fdata.parquet uploads= HTTP/1.1 403 4 - - 4 - - - 100.126.33.186 'authorization: AWS4-HMAC-SHA256 Credential=a-user-plataforma-monitoramento-grafana-tempo-prod-3/20241226/us-east-1/s3/aws4_request, SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date, Signature=11821dc0263e24f53ebfc033d891418b363e19603d6579518c1668aad7223213' 'x-amz-content-sha256: UNSIGNED-PAYLOAD' 'content-length: 0' 'x-amz-date: 20241226T175959Z' 'x-forwarded-proto: https' 'host: s3data.sicredi.net' 'content-type: application/octet-stream' 'connection: close' 'x-forwarded-for: 100.126.33.186' 'user-agent: MinIO (linux; amd64) minio-go/v7.0.70'

To Reproduce
Steps to reproduce the behavior:

  1. Start Tempo (SHA or version)
  2. Perform Operations (Read/Write/Others)

Expected behavior
Able to search and write traces to S3

Environment:

  • Infrastructure: Kubernetes
  • Deployment tool: Helm

Additional Context

`backend: 127.0.0.1:3100

tempo.yaml:

cache:
  caches:
  - memcached:
      consistent_hash: true
      host: 'grafana-tempo-memcached'
      service: memcached-client
      timeout: 500ms
    roles:
    - parquet-footer
    - bloom
    - frontend-search
compactor:
  compaction:
    block_retention: 168h
    compacted_block_retention: 1h
    compaction_cycle: 30s
    compaction_window: 1h
    max_block_bytes: 107374182400
    max_compaction_objects: 6000000
    max_time_per_tenant: 5m
    retention_concurrency: 10
    v2_in_buffer_bytes: 5242880
    v2_out_buffer_bytes: 20971520
    v2_prefetch_traces_count: 1000
  ring:
    kvstore:
      store: memberlist
distributor:
  receivers:
    jaeger:
      protocols:
        grpc:
          endpoint: 0.0.0.0:14250
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318
  ring:
    kvstore:
      store: memberlist
ingester:
  complete_block_timeout: 10m
  flush_check_period: 5s
  lifecycler:
    ring:
      kvstore:
        store: memberlist
      replication_factor: 3
    tokens_file_path: /var/tempo/tokens.json
  max_block_duration: 5m
  trace_idle_period: 5s
memberlist:
  abort_if_cluster_join_fails: false
  bind_addr: []
  bind_port: 7946
  cluster_label: 'grafana-tempo.tempo'
  gossip_interval: 1s
  gossip_nodes: 2
  gossip_to_dead_nodes_time: 30s
  join_members:
  - dns+grafana-tempo-gossip-ring:7946
  leave_timeout: 5s
  left_ingesters_timeout: 5m
  max_join_backoff: 1m
  max_join_retries: 10
  min_join_backoff: 1s
  node_name: ""
  packet_dial_timeout: 5s
  packet_write_timeout: 5s
  pull_push_interval: 30s
  randomize_node_name: true
  rejoin_interval: 0s
  retransmit_factor: 2
  stream_timeout: 10s
metrics_generator:
  metrics_ingestion_time_range_slack: 30s
  processor:
    service_graphs:
      dimensions: []
      histogram_buckets:
      - 0.1
      - 0.2
      max_items: 10000
      wait: 10s
      workers: 10
    span_metrics:
      dimensions: []
      histogram_buckets:
      - 0.002
      - 0.004
  registry:
    collection_interval: 15s
    external_labels: {}
    stale_duration: 15m
  ring:
    kvstore:
      store: memberlist
  storage:
    path: /var/tempo/wal
    remote_write:
    - url: ##############
    remote_write_add_org_id_header: false
    remote_write_flush_deadline: 1m
  traces_storage:
    path: /var/tempo/traces
multitenancy_enabled: false
overrides:
  max_bytes_per_trace: 50000000
  max_traces_per_user: 30000000
  metrics_generator_processors:
  - service-graphs
  - span-metrics
  per_tenant_override_config: /runtime-config/overrides.yaml
querier:
  frontend_worker:
    frontend_address: grafana-tempo-query-frontend-discovery:9095
  max_concurrent_queries: 20
  search:
    external_backend: null
    external_endpoints: []
    external_hedge_requests_at: 8s
    external_hedge_requests_up_to: 2
    prefer_self: 10
    query_timeout: 30s
  trace_by_id:
    query_timeout: 10s
query_frontend:
  max_outstanding_per_tenant: 2000
  max_retries: 2
  metrics:
    max_duration: 3h
  search:
    concurrent_jobs: 1000
    target_bytes_per_job: 104857600
  trace_by_id:
    query_shards: 100
server:
  grpc_server_max_recv_msg_size: 100000000
  grpc_server_max_send_msg_size: 100000000
  http_listen_port: 3100
  http_server_read_timeout: 30s
  http_server_write_timeout: 30s
  log_format: logfmt
  log_level: info
storage:
  trace:
    backend: s3
    block:
      parquet_dedicated_columns:
      - name: db.statement
        scope: span
        type: string
      - name: jdbc.query[0]]
        scope: span
        type: string
      - name: peer.service
        scope: span
        type: string
      - name: http.pth
        scope: span
        type: string
      - name: spring.kafka.listener.id
        scope: span
        type: string
      - name: net.host.ip
        scope: span
        type: string
      - name: host.name
        scope: resource
        type: string
      - name: process.command_line
        scope: resource
        type: string
      version: vParquet4
    blocklist_poll: 5m
    local:
      path: /var/tempo/traces
    pool:
      max_workers: 400
      queue_depth: 20000
    s3:
      access_key: ##############
      bucket: ##############
      endpoint: ##############
      secret_key: ##############
      tls_insecure_skip_verify: true
    wal:
      path: /var/tempo/wal
usage_report:
  reporting_enabled: true

BinaryData

Events:
`

@edgarkz
Contributor

edgarkz commented Dec 25, 2024

Hi,
You should raise this with your S3 provider and review your usage/settings.
It looks like you hit some preconfigured limits.

@igorestevanjasinski
Author

Hi, You should raise this with your S3 provider and review your usage/settings. It looks like you hit some preconfigured limits.

@edgarkz I did that, and there are no settings regarding usage or quota. It is also worth mentioning that this problem started when we updated from 2.4 to 2.6 and changed to vParquet4.

@joe-elliott
Member

error copying block from local to remote backend: error writing object to s3 backend, object single-tenant/45cad84c-3efa-4696-99af-160ea0b44510/data.parquet: Check if quota has been exceeded or object has too many versions

do you have versioning enabled for the data.parquet files? perhaps turn that off?

what quota has been exceeded?
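
Whether versioning is enabled on the bucket can also be checked from the client side. The following is a minimal sketch under stated assumptions: the S3 backend implements the `GetBucketVersioning` call, `boto3` is installed, and the helper names `make_client` and `versioning_status` are hypothetical. The endpoint, bucket, and credentials are whatever Tempo is configured with.

```python
def make_client(endpoint_url, access_key, secret_key):
    """Build an S3 client for a custom endpoint (boto3 is a third-party package)."""
    import boto3

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )


def versioning_status(client, bucket):
    """Return the bucket's versioning state.

    S3 reports 'Enabled' or 'Suspended' in the 'Status' key and omits the key
    when versioning was never turned on; 'Disabled' is this sketch's own label
    for that case, not a value returned by S3.
    """
    response = client.get_bucket_versioning(Bucket=bucket)
    return response.get("Status", "Disabled")
```

A result of `Enabled` or `Suspended` would mean object versions exist or have existed on the bucket, which is worth ruling out given the wording of the error.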

@igorestevanjasinski
Author

error copying block from local to remote backend: error writing object to s3 backend, object single-tenant/45cad84c-3efa-4696-99af-160ea0b44510/data.parquet: Check if quota has been exceeded or object has too many versions

do you have versioning enabled for the data.parquet files? perhaps turn that off?

what quota has been exceeded?

@joe-elliott In our S3 we don't have any versioning configured, and the quota has not been exceeded.
Is there any configuration regarding data.parquet file versioning in Tempo that I can modify to avoid this error?

@joe-elliott
Member

I'm unaware of anything occurring between 2.4 and 2.6 that would cause this issue. You need to work with your ECS team to better understand what this error means.

@joe-elliott In our S3 we don't have any versioning configured, and the quota has not been exceeded.

If you have no versioning and no quota, then this error is wrong and the problem is with ECS.

@igorestevanjasinski
Author

I don't know if it helps, but we saw these logs on the S3 side:

169.254.2.9 2024-12-26T17:59:59,719 [qtp1048437093-1577673-ac150cf7:190ae2c6dee:df61f:4af3-s3-100.126.33.186]  INFO  V4Signer.java (line 118) credential: a-user-plataforma-monitoramento-grafana-tempo-prod-3/20241226/us-east-1/s3/aws4_request, amz_expires: null, amz_signed_headers: content-type;host;x-amz-content-sha256;x-amz-date, amz_signature: 11821dc0263e24f53ebfc033d891418b363e19603d6579518c1668aad7223213, payloadHash: UNSIGNED-PAYLOAD, amz_date: 20241226T175959Z
169.254.2.9 2024-12-26T17:59:59,722 [qtp1048437093-1577673-ac150cf7:190ae2c6dee:df61f:4af3-s3-100.126.33.186] ERROR  ObjectControllerExceptionHelper.java (line 441) Method initiateMultiPartUpload failed due to exception
com.emc.storageos.data.object.exception.ObjectControllerException: directory server 172.21.12.182 returns error ERROR_OPERATION_NOT_ALLOWED, bucket a-ns1-plataforma-monitoramento-prod.plataforma-monitoramento-grafana-tempo-prod-3, requestId ac150cf7:190ae2c6dee:df61f:4af4
169.254.2.9 2024-12-26T17:59:59,723 [qtp1048437093-1577673-ac150cf7:190ae2c6dee:df61f:4af3-s3-100.126.33.186] ERROR  S3Exception.java (line 1733) got object access exception. RequestId ac150cf7:190ae2c6dee:df61f:4af3
com.emc.storageos.objcontrol.object.exception.ObjectAccessException: ERROR_METHOD_NOT_ALLOWED
Caused by: com.emc.storageos.data.object.exception.ObjectControllerException: directory server 172.21.12.182 returns error ERROR_OPERATION_NOT_ALLOWED, bucket a-ns1-plataforma-monitoramento-prod.plataforma-monitoramento-grafana-tempo-prod-3, requestId ac150cf7:190ae2c6dee:df61f:4af4
169.254.2.9 2024-12-26 17:59:59,723 ac150cf7:190ae2c6dee:df61f:4af3 172.21.12.247:9021 172.21.12.39:53460 a-user-plataforma-monitoramento-grafana-tempo-prod-3 MinIO%20(linux;%20amd64)%20minio-go/v7.0.70 POST a-ns1-plataforma-monitoramento-prod plataforma-monitoramento-grafana-tempo-prod-3 single-tenant%2F1d8ed62f-6bf9-48df-9ff7-60c1f7884101%2Fdata.parquet uploads= HTTP/1.1 403 4 - - 4 - - - 100.126.33.186 'authorization: AWS4-HMAC-SHA256 Credential=a-user-plataforma-monitoramento-grafana-tempo-prod-3/20241226/us-east-1/s3/aws4_request, SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date, Signature=11821dc0263e24f53ebfc033d891418b363e19603d6579518c1668aad7223213' 'x-amz-content-sha256: UNSIGNED-PAYLOAD' 'content-length: 0' 'x-amz-date: 20241226T175959Z' 'x-forwarded-proto: https' 'host: s3data.sicredi.net' 'content-type: application/octet-stream' 'connection: close' 'x-forwarded-for: 100.126.33.186' 'user-agent: MinIO (linux; amd64) minio-go/v7.0.70'

Is there any configuration that I can change in Tempo to avoid this error:
Method initiateMultiPartUpload failed due to exception com.emc.storageos.data.object.exception.ObjectControllerException: directory server 172.21.12.182 returns error ERROR_OPERATION_NOT_ALLOWED, bucket a-ns1-plataforma-monitoramento-prod.plataforma-monitoramento-grafana-tempo-prod-3, requestId ac150cf7:190ae2c6dee:df61f:4af4

@joe-elliott
Member

It seems like you have a permissions issue to work out.
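
The ECS access log above shows the 403 on a `POST ...?uploads=` request, which is the S3 call that initiates a multipart upload. One way to check whether that operation is rejected independently of Tempo is to issue it directly with the same credentials and bucket. This is a minimal sketch, not Tempo's code path: Tempo uses minio-go, whereas this uses a boto3-style client, and the function name `probe_multipart_upload` and the probe key are hypothetical.

```python
def probe_multipart_upload(client, bucket, key):
    """Try to initiate, then abort, a multipart upload.

    `client` is expected to behave like a boto3 S3 client, for example one
    created with boto3.client("s3", endpoint_url=...).
    Returns (True, None) if initiation succeeds, else (False, error_text).
    """
    try:
        upload = client.create_multipart_upload(Bucket=bucket, Key=key)
    except Exception as exc:  # botocore raises ClientError; kept broad for a sketch
        return False, str(exc)
    # Abort immediately so the probe leaves no dangling upload behind.
    client.abort_multipart_upload(
        Bucket=bucket, Key=key, UploadId=upload["UploadId"]
    )
    return True, None
```

If the probe fails with the same 403 on a bucket where Tempo has started failing, but succeeds on a fresh bucket, that points at bucket state or policy on the storage side rather than at Tempo's configuration.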

@igorestevanjasinski
Author

It seems like you have a permissions issue to work out.

Maybe, but it is strange behavior, because if we delete all the files from S3 and start empty, or even start in a new bucket, it works normally for a couple of days, and after that we start to see the message:
level=error ts=2024-12-22T04:42:01.177086297Z caller=flush.go:233 org_id=single-tenant msg="error performing op in flushQueue" op=1 block=0014813c-fe0e-46bb-b62b-2d0fa1e44a6e attempts=209 err="error copying block from local to remote backend: error writing object to s3 backend, object single-tenant/0014813c-fe0e-46bb-b62b-2d0fa1e44a6e/data.parquet: Check if quota has been exceeded or object has too many versions"
