Merged
13 changes: 13 additions & 0 deletions NEWS.md
@@ -1,5 +1,18 @@
ClimaAnalysis.jl Release Notes
===============================
v0.5.14
-------

## Split by seasons across time
You may want to split an `OutputVar` by season while keeping each year separate. This
differs from `split_by_season`, which ignores that seasons can come from different years.
Use `split_by_season_across_time` for this. For example, if an `OutputVar` contains times
corresponding to 2010-01-01, 2010-03-01, 2010-06-01, 2010-09-01, and 2010-12-01, then the
result of `split_by_season_across_time` is five `OutputVar`s, each corresponding to a
distinct date. Even though 2010-01-01 and 2010-12-01 both fall in DJF, they end up in two
separate `OutputVar`s, because they do not belong to the same season and year.
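
The season-and-year grouping described above can be illustrated with a minimal sketch that uses only the `Dates` standard library. The helper `season_key` is hypothetical and is not part of ClimaAnalysis; it assumes the convention that a DJF season is labeled by the year of its December.

```julia
import Dates

# Hypothetical helper (not part of ClimaAnalysis): label a date with its season
# and the year in which that season starts.
function season_key(date::Dates.DateTime)
    month, year = Dates.month(date), Dates.year(date)
    3 <= month <= 5 && return ("MAM", year)
    6 <= month <= 8 && return ("JJA", year)
    9 <= month <= 11 && return ("SON", year)
    # December starts DJF; January and February belong to the DJF that
    # started in the previous year.
    return ("DJF", month == 12 ? year : year - 1)
end

dates = [Dates.DateTime(2010, m, 1) for m in (1, 3, 6, 9, 12)]
keys_per_date = season_key.(dates)

# Grouping by season alone merges January and December into one DJF group,
# while grouping by season and year keeps them apart.
n_groups_by_season = length(unique(first.(keys_per_date)))
n_groups_by_season_and_year = length(unique(keys_per_date))
```

Under these assumptions, the five dates fall into four groups when only the season is considered and into five groups when the year is also part of the key.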

v0.5.13
-------

1 change: 1 addition & 0 deletions docs/src/api.md
@@ -68,6 +68,7 @@ Var.integrate_lonlat
Var.integrate_lat
Var.integrate_lon
Var.split_by_season(var::OutputVar)
Var.split_by_season_across_time(var::OutputVar)
Var.bias
Var.global_bias
Var.squared_error
88 changes: 64 additions & 24 deletions docs/src/var.md
@@ -197,35 +197,75 @@ the seasons are March to May, June to August, September to November, and Decembe
February. The order of the vector is MAM, JJA, SON, and DJF. If there are no dates found for
a season, then the `OutputVar` for that season will be an empty `OutputVar`.

```@julia split_by_season
julia> attribs = Dict("start_date" => "2024-1-1");

julia> time = [0.0, 5_184_000.0, 13_132_800.0]; # correspond to dates 2024-1-1, 2024-3-1, 2024-6-1

julia> dims = OrderedDict(["time" => time]);

julia> dim_attribs = OrderedDict(["time" => Dict("units" => "s")]); # unit is second

julia> data = [1.0, 2.0, 3.0];
```@setup split_by_season
import ClimaAnalysis
import OrderedCollections: OrderedDict

attribs = Dict("start_date" => "2024-1-1");
time = [0.0, 5_184_000.0, 13_132_800.0]; # correspond to dates 2024-1-1, 2024-3-1, 2024-6-1
dims = OrderedDict(["time" => time]);
dim_attribs = OrderedDict(["time" => Dict("units" => "s")]); # unit is second
data = [1.0, 2.0, 3.0];
var = ClimaAnalysis.OutputVar(attribs, dims, dim_attribs, data);
```

julia> var = ClimaAnalysis.OutputVar(attribs, dims, dim_attribs, data);
```@repl split_by_season
var.attributes
ClimaAnalysis.times(var) # correspond to dates 2024-1-1, 2024-3-1, 2024-6-1
var.data
MAM, JJA, SON, DJF = ClimaAnalysis.split_by_season(var);
ClimaAnalysis.isempty(SON) # empty OutputVar because no dates between September to November
[MAM.dims["time"], JJA.dims["time"], DJF.dims["time"]]
[MAM.data, JJA.data, DJF.data]
```

julia> MAM, JJA, SON, DJF = ClimaAnalysis.split_by_season(var);
### Split by season and year

julia> ClimaAnalysis.isempty(SON) # empty OutputVar because no dates between September to November
true
You may want to split an `OutputVar` by season while keeping each year separate. This
differs from `split_by_season`, which ignores that seasons can come from different years.
Use `split_by_season_across_time` for this. For example, if an `OutputVar` contains times
corresponding to 2010-01-01, 2010-03-01, 2010-06-01, 2010-09-01, and 2010-12-01, then the
result of `split_by_season_across_time` is five `OutputVar`s, each corresponding to a
distinct date. Even though 2010-01-01 and 2010-12-01 both fall in DJF, they end up in two
separate `OutputVar`s, because they do not belong to the same season and year.

julia> [MAM.dims["time"], JJA.dims["time"], DJF.dims["time"]]
3-element Vector{Vector{Float64}}:
[5.184e6]
[1.31328e7]
[0.0]
```@setup split_by_season_across_time
import ClimaAnalysis
import OrderedCollections: OrderedDict

lon = collect(range(-179.5, 179.5, 36))
lat = collect(range(-89.5, 89.5, 18))
time = [0.0]
push!(time, 5_097_600.0) # correspond to 2010-3-1
push!(time, 13_046_400.0) # correspond to 2010-6-1
push!(time, 20_995_200.0) # correspond to 2010-9-1
push!(time, 28_857_600.0) # correspond to 2010-12-1

data = reshape(
1.0:1.0:(length(lat) * length(time) * length(lon)),
(length(lat), length(time), length(lon)),
)
dims = OrderedDict(["lat" => lat, "time" => time, "lon" => lon])
attribs = Dict("long_name" => "hi", "start_date" => "2010-1-1")
dim_attribs = OrderedDict([
"lat" => Dict("units" => "deg"),
"time" => Dict("units" => "s"),
"lon" => Dict("units" => "deg"),
])
var = ClimaAnalysis.OutputVar(attribs, dims, dim_attribs, data)
```

julia> [MAM.data, JJA.data, DJF.data]
3-element Vector{Vector{Float64}}:
[2.0]
[3.0]
[1.0]
```@repl split_by_season_across_time
var.attributes["start_date"]
ClimaAnalysis.times(var) # dates from the first of January, March, June, September, and December
split_var = ClimaAnalysis.split_by_season_across_time(var);
length(split_var) # the dates span 5 seasons
ClimaAnalysis.times(split_var[1]) # correspond to 1/1 (middle of DJF)
ClimaAnalysis.times(split_var[2]) # correspond to 3/1 (start of MAM)
ClimaAnalysis.times(split_var[3]) # correspond to 6/1 (start of JJA)
ClimaAnalysis.times(split_var[4]) # correspond to 9/1 (start of SON)
ClimaAnalysis.times(split_var[5]) # correspond to 12/1 (start of DJF)
```

## Bias and squared error
91 changes: 91 additions & 0 deletions src/Utils.jl
@@ -4,6 +4,7 @@ export match_nc_filename,
squeeze, nearest_index, kwargs, seconds_to_prettystr, warp_string

import Dates
import OrderedCollections: OrderedDict

"""
match_nc_filename(filename::String)
@@ -304,6 +305,96 @@ function split_by_season(dates::AbstractArray{<:Dates.DateTime})
return (MAM, JJA, SON, DJF)
end

"""
split_by_season_across_time(dates::AbstractArray{<:Dates.DateTime})

Split `dates` into vectors representing seasons, arranged in chronological order. Each
vector corresponds to a single season and the ordering of the vectors is determined by the
dates of the season. The return type is a vector of vectors of dates.

If no dates are found for a particular season, then the vector will be empty. The first
vector is guaranteed to be non-empty.

This function differs from `split_by_season` as `split_by_season` splits dates into
different seasons and ignores that dates could come from seasons in different years. In
contrast, `split_by_season_across_time` splits dates into seasons for each year.

Examples
=========

```jldoctest
julia> import Dates

julia> dates = collect(Dates.DateTime(2010, i) for i in 1:12);

julia> split_by_season_across_time(dates)
5-element Vector{Vector{DateTime}}:
[DateTime("2010-01-01T00:00:00"), DateTime("2010-02-01T00:00:00")]
[DateTime("2010-03-01T00:00:00"), DateTime("2010-04-01T00:00:00"), DateTime("2010-05-01T00:00:00")]
[DateTime("2010-06-01T00:00:00"), DateTime("2010-07-01T00:00:00"), DateTime("2010-08-01T00:00:00")]
[DateTime("2010-09-01T00:00:00"), DateTime("2010-10-01T00:00:00"), DateTime("2010-11-01T00:00:00")]
[DateTime("2010-12-01T00:00:00")]
```
"""
function split_by_season_across_time(dates::AbstractArray{<:Dates.DateTime})
# Dates are not necessarily sorted
dates = sort(dates)

# Empty case
isempty(dates) && return Vector{eltype(dates)}[]

# Find the first date of the season that first(dates) belongs in
(first_season, first_year) = find_season_and_year(first(dates))
season_to_month = Dict("MAM" => 3, "JJA" => 6, "SON" => 9, "DJF" => 12)
first_date_of_season =
Dates.DateTime(first_year, season_to_month[first_season], 1)

# Create an ordered dict to map between season and year to vector of dates
season_and_year2dates = OrderedDict{
Tuple{typeof(first_season), typeof(first_year)},
Vector{eltype(dates)},
}()
# Need to iterate because some seasons can be empty and we want empty vectors for that
curr_date = first_date_of_season
while curr_date <= dates[end]
(season, year) = find_season_and_year(curr_date)
season_and_year2dates[(season, year)] = typeof(curr_date)[]
curr_date += Dates.Month(3) # season change every 3 months
end

# Add dates to the correct vectors in season_and_year2dates
for date in dates
(season, year) = find_season_and_year(date)
push!(season_and_year2dates[(season, year)], date)
end
return collect(values(season_and_year2dates))
end
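
The pre-allocation loop above steps from the start of the first season in increments of three months, so that seasons without any dates still get an empty vector. A minimal sketch of that enumeration with the `Dates` standard library follows. The two input dates (2010-01-01 and 2010-09-01) and the variable names are chosen for illustration only.

```julia
import Dates

# Start of the DJF season that contains 2010-01-01, using the convention that
# DJF is labeled by the year of its December.
first_season_start = Dates.DateTime(2009, 12, 1)
# Latest date in the illustrative input.
last_date = Dates.DateTime(2010, 9, 1)

# One entry per season between the first season start and the last date.
season_starts = collect(first_season_start:Dates.Month(3):last_date)
```

With these inputs the range contains four season starts (DJF, MAM, JJA, SON), so the result would hold four vectors, with the MAM and JJA vectors left empty.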

"""
find_season_and_year(date::Dates.DateTime)

Return a tuple of the season and year that `date` belongs to. The variable `season` is
a string and `year` is an integer.

The months of the seasons are March to May, June to August, September to
November, and December to February. If a date is in December to February, the
year is chosen to be the year that the season starts.
"""
function find_season_and_year(date::Dates.DateTime)
if Dates.Month(3) <= Dates.Month(date) <= Dates.Month(5)
return ("MAM", Dates.year(date))
elseif Dates.Month(6) <= Dates.Month(date) <= Dates.Month(8)
return ("JJA", Dates.year(date))
elseif Dates.Month(9) <= Dates.Month(date) <= Dates.Month(11)
return ("SON", Dates.year(date))
else
# ambiguous what year should be used, so we use the convention that
# it is the year of December
corrected_year =
Dates.month(date) == 12 ? Dates.year(date) : Dates.year(date) - 1
return ("DJF", corrected_year)
end
end

"""
_isequispaced(arr::Vector)

115 changes: 93 additions & 22 deletions src/Var.jl
@@ -14,6 +14,7 @@ import ..Utils:
seconds_to_prettystr,
squeeze,
split_by_season,
split_by_season_across_time,
time_to_date,
date_to_time,
_data_at_dim_vals,
@@ -48,6 +49,7 @@ export OutputVar,
integrate_lat,
isempty,
split_by_season,
split_by_season_across_time,
bias,
global_bias,
squared_error,
@@ -1288,50 +1290,119 @@ end

Return a vector of four `OutputVar`s split by season.

The months of the seasons are March to May, June to August, September to November, and
December to February. The order of the vector is MAM, JJA, SON, and DJF. If there are no
dates found for a season, then the `OutputVar` for that season will be an empty `OutputVar`.
The months of the seasons are March to May (MAM), June to August (JJA), September to
November (SON), and December to February (DJF). The order of the vector is MAM, JJA, SON,
and DJF. If there are no dates found for a season, then the `OutputVar` for that season will
be an empty `OutputVar`.

The function will use the start date in `var.attributes["start_date"]`. The unit of time is
expected to be second. Also, the interpolations will be inaccurate in time intervals
outside of their respective season for the returned `OutputVar`s.
expected to be second.

!!! note "Interpolating between seasons"
Interpolations will be inaccurate in time intervals outside of their respective season
for the returned `OutputVar`s. For example, if an `OutputVar` has the dates 2010-2-1,
2010-3-1, 2010-4-1, and 2011-2-1 after splitting by seasons, then any interpolation in
time between the dates 2010-4-1 and 2011-2-1 will be inaccurate.

This function differs from `split_by_season_across_time` as `split_by_season_across_time`
splits dates by season for each year.
"""
function split_by_season(var::OutputVar)
# Check time exists and unit is second
_check_time_dim(var::OutputVar)
start_date = Dates.DateTime(var.attributes["start_date"])

season_dates = split_by_season(time_to_date.(start_date, times(var)))
season_times =
(date_to_time.(start_date, season) for season in season_dates)

return _split_along_dim(var, time_name(var), season_times)
end

"""
split_by_season_across_time(var::OutputVar)

Split `var` into `OutputVar`s representing seasons, sorted in chronological order. Each
`OutputVar` corresponds to a single season, and the ordering of the `OutputVar`s is
determined by the dates of the season. The return type is a vector of `OutputVar`s.

The months of the seasons are March to May (MAM), June to August (JJA),
September to November (SON), and December to February (DJF). If there are no
dates found for a season, then the `OutputVar` for that season will be an empty
`OutputVar`. The first `OutputVar` is guaranteed to not be empty.

The function will use the start date in `var.attributes["start_date"]`. The unit of time is
expected to be second.

This function differs from `split_by_season` as `split_by_season` splits dates by
season and ignores that seasons can come from different years.
"""
function split_by_season_across_time(var::OutputVar)
_check_time_dim(var::OutputVar)
start_date = Dates.DateTime(var.attributes["start_date"])

seasons_across_year_dates =
split_by_season_across_time(time_to_date.(start_date, times(var)))
seasons_across_year_times = (
date_to_time.(start_date, season) for
season in seasons_across_year_dates
)

return _split_along_dim(var, time_name(var), seasons_across_year_times)
end

"""
_check_time_dim(var::OutputVar)

Check time dimension exists, unit for the time dimension is second, and a
start date is present.
"""
function _check_time_dim(var::OutputVar)
has_time(var) || error("Time is not a dimension in var")
dim_units(var, time_name(var)) == "s" ||
error("Unit for time is not second")
haskey(var.attributes, "start_date") ||
error("Start date is not found in var")
return nothing
end

# Check start date exists
haskey(var.attributes, "start_date") ?
start_date = Dates.DateTime(var.attributes["start_date"]) :
error("Start date is not found in var")
"""
_split_along_dim(var::OutputVar, dim_name, split_vectors)

season_dates = split_by_season(time_to_date.(start_date, times(var)))
season_times =
(date_to_time.(start_date, season) for season in season_dates)
Given `dim_name` in `var`, split the `OutputVar` by the values in `split_vectors`
and return a vector of `OutputVar`s.

# Split data according to seasons
season_data = (
For example, if `dim_name = "time"` and `split_vectors = [[0.0, 3.0], [2.0,
4.0]]`, the result is a vector of two `OutputVar`s, where the first `OutputVar`
has a time dimension of `[0.0, 3.0]` and the second `OutputVar` has a time
dimension of `[2.0, 4.0]`.

If a vector in `split_vectors` is empty, then an empty `OutputVar` is returned.
Additionally, no checks are performed on the values in the vectors in
`split_vectors`, as the nearest values in `var.dims[dim_name]` are used for
splitting.
"""
function _split_along_dim(var::OutputVar, dim_name, split_vectors)
# Split data by vectors in split_vectors
split_data = (
collect(
_data_at_dim_vals(
var.data,
times(var),
var.dim2index[time_name(var)],
season_time,
var.dims[dim_name],
var.dim2index[dim_name],
split,
),
) for season_time in season_times
) for split in split_vectors
)

# Construct an OutputVar for each season
return map(season_times, season_data) do time, data
if isempty(time)
return map(split_vectors, split_data) do split, data
if isempty(split)
dims = empty(var.dims)
data = similar(var.data, 0)
return OutputVar(dims, data)
end
ret_dims = deepcopy(var.dims)
ret_dims[time_name(var)] = time
ret_dims[dim_name] = split
remake(var, dims = ret_dims, data = data)
end
end
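
The splitting idea behind `_split_along_dim` can be sketched on a plain array, without any ClimaAnalysis types. This is an illustration under stated assumptions, not the package's implementation: the nearest-value lookup via `argmin` and all variable names are hypothetical stand-ins for what `_data_at_dim_vals` does internally.

```julia
# 3 latitudes by 4 times; column j holds the data at time_values[j].
data = collect(reshape(1.0:12.0, (3, 4)))
time_values = [0.0, 1.0, 2.0, 3.0]
time_axis = 2

# Each inner vector lists the dimension values that one piece should keep.
split_vectors = [[0.0, 3.0], [1.0, 2.0]]

pieces = map(split_vectors) do vals
    # Assumed lookup: pick the index of the nearest value along the dimension.
    idx = [argmin(abs.(time_values .- v)) for v in vals]
    # Slice the data along the chosen axis and materialize the view.
    collect(selectdim(data, time_axis, idx))
end
```

Each element of `pieces` keeps the full latitude axis and only the selected times, which mirrors how each returned `OutputVar` keeps every dimension except the one being split.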