To explore this example with some demo data, head here.
Calculating the year-on-year change of a quantity typically requires a few steps:
- Aggregate the quantity by year (and by any other fields)
- Add a column for last years quantity (this will be null for the first year in the dataset)
- Calculate the change from last year to this year
A generic query template may then look like
select
*,
-- Calculate percentage change
(<year_total> - last_year_total) / last_year_total * 100 as perc_diff
from (
select
*,
-- Use a window function to get last year's total
lag(<year_total>) over (partition by <fields> order by <year_column> asc) last_year_total
from <table>
order by <fields>, <year_column> desc
)
where
table
- your source data tablefields
- one or more columns to split the year-on-year differences byyear_column
- the column containing the year of the aggregationyear_total
- the column containing the aggregated data
Using total Spotify streams as an example data source, let's identify:
table
- this is calledraw
fields
- this is just the single columnartist
year_column
- this is calledyear
year_total
- this issum_streams
Then the query looks like:
-- raw
select
date_trunc(day, year) year,
sum(streams) sum_streams,
artist
from spotify.spotify_daily_tracks
group by 1,3
year | sum_streams | artist |
---|---|---|
2020-01-01 | 115372043 | CJ |
2021-01-01 | 179284925 | CJ |
... | ... | ... |
-- with_perc_diff
select
*,
(sum_streams - last_year_total) / last_year_total * 100 as perc_diff
from (
select
*,
lag(sum_streams) over (partition by artist order by year asc) last_year_total
from raw
order by artist, year desc
)
year | sum_streams | artist | last_year_total | perc_diff |
---|---|---|---|---|
2020-01-01 | 115372043 | CJ | NULL | NULL |
2021-01-01 | 179284925 | CJ | 115372043 | 55.397200515899684 |
... | ... | ... | ... | .... |