Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BigQuery pricing unit for onDemandCost is tebibyte instead of terabyte #1

Open
dleser opened this issue Jan 11, 2023 · 3 comments
Open

Comments

@dleser
Copy link

dleser commented Jan 11, 2023

Many queries used in this repo are calculating the on-demand costs and billed bytes/MBs/GBs like in this example:

       SUM(COALESCE(totalBilledBytes, 0)) AS totalBilledBytes,
        ROUND(SUM(COALESCE(totalBilledBytes,
              0)) / 1000000, 2) AS totalMegabytesBilled,
        ROUND(SUM(COALESCE(totalBilledBytes,
              0)) / 1000000000, 2) AS totalGigabytesBilled,
        ROUND(SUM(COALESCE(totalBilledBytes,
              0)) / 1000000000000, 2) AS totalTerabytesBilled,
        ROUND(SAFE_DIVIDE(SUM(COALESCE(totalBilledBytes, 0)),
          1000000000000) * 5, 2) AS onDemandCost,

But the pricing unit used by Google to calculate the costs is tebibyte , not terabyte

  • Kibibyte (KiB) 1024¹ = 1,024
  • Mebibyte (MiB) 1024² = 1,048,576
  • Gibibyte (GiB) 1024³ = 1,073,741,824
  • Tebibyte (TiB) 1024⁴ = 1,099,511,627,776

I have used this query with the exported billing records table to check this:

select
  service.description as service_description, 
  sku.description as sku_description,
  usage.unit as usage_unit,
  usage.pricing_unit as usage_pricing_unit,
  count(*) as count,
  sum(cost) as cost_eur,
  sum(cost / currency_conversion_rate) as cost_usd,
  sum(usage.amount) as usage_amount,
  sum(usage.amount_in_pricing_units) as amount_in_pricing_units,
  sum(cost / currency_conversion_rate) / sum(usage.amount_in_pricing_units) as cost_per_pricing_unit_usd,
  sum(usage.amount) / sum(usage.amount_in_pricing_units) as amount_to_pricing_unit_conversion_rate,
  pow(1024, 4)  as tebibyte_in_bytes
from `<MY_BILLING_EXPORT_TABLE`
where 1=1
  and _partitiontime >= timestamp_sub(current_timestamp, interval 7 day)
  and service.description = "BigQuery"
group by service.description, sku.description, usage_unit, usage_pricing_unit
order by service.description, sku.description 

BTW: Awesome that you have collected these SQL statements here, this will really help me! :-)

@ericel79
Copy link

ericel79 commented Jan 20, 2023

Awesome collection indeed!

Also adding to @dleser I think there is one 0 too few in the cost calc of top_billed_queries_deduplicated making the cost 10 times higher than in reality?

@dleser
Copy link
Author

dleser commented Jan 20, 2023

@ericel79 , you are right.

ROUND(SAFE_DIVIDE(total_bytes_billed,
            102400000000) * 5, 2) AS cost,

(see https://github.com/doitintl/bigquery-optimization-queries/blob/main/information_schema/top_billed_queries_deduplicated.sql#L26 )

should be:

ROUND(total_bytes_billed / POW(1024, 4) * 5, 2) AS cost,

BTW: Please remind that the cost factor 5 USD/TB used here is only valid for on-demand pricing in some, but not all regions. E.g.:

  • europe-west3: 6.5 USD/TB
  • europe-west1: 6 USD/TB
  • but multi-region Europe (eu): 5 USD/TB

@ericel79
Copy link

ericel79 commented Jan 31, 2023

And.. whats your oppinion on filtering on AND NOT cache_hit as well if you want to calculate cost? - because hitting the cache doesn't cost anything. @dleser

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants