
Commit 8924350

Add Var._dates_to_seconds
This commit helps with preprocessing observational data. It adds Var._dates_to_seconds to the constructor for reading NetCDF files so that dates in the time dimension are automatically converted to seconds. For testing, a sample of observational precipitation data is added.
1 parent 26bf2a1 commit 8924350


5 files changed (+227, -4 lines)


NEWS.md

+22
@@ -20,6 +20,28 @@ arrays are equispaced and span the entire range, then a periodic boundary condit
 for the longitude dimension and a flat boundary condition is added for the latitude
 dimension.
 
+### Preprocess dates and times
+There is now support for preprocessing dates and times. The constructor for reading NetCDF
+files now automatically converts dates to seconds in the time dimension. This is done
+because `ClimaAnalysis` does not support interpolating on dates, which means that functions
+that rely on the interpolants, such as `resampled_as`, would not work otherwise.
+
+Also, the constructor supports two additional parameters, `new_start_date` and `shift_by`.
+After converting from dates to seconds, the seconds are measured from `new_start_date`
+instead of from the first date. If the dates need preprocessing before this conversion,
+then `shift_by` can be used: it accepts a function that takes in and returns a
+`Dates.DateTime`, and this function is applied to each element of the time array.
+```julia
+# Shift the dates to the first day of the month, then convert them to seconds measured
+# from 2010-01-01
+shift_var = OutputVar(
+    "test.nc",
+    "pr",
+    new_start_date = "2010-01-01T00:00:00", # or Dates.DateTime(2010, 1, 1)
+    shift_by = Dates.firstdayofmonth,
+)
+```
+
 ## Bug fixes
 
 - Interpolation is not possible with dates. When dates are detected in any dimension, an
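To make the NEWS example concrete, here is a minimal sketch that uses only the `Dates` standard library rather than ClimaAnalysis. The mid-month timestamps are hypothetical, and computing seconds as elapsed milliseconds divided by 1000 is an assumption about the internal `date_to_time` helper; it is consistent with the tests in this commit, where dates one minute apart map to 60.0 seconds.

```julia
using Dates

# Hypothetical monthly observations, stamped mid-month
dates = [DateTime(2010, 1, 16, 12), DateTime(2010, 2, 15), DateTime(2010, 3, 16, 12)]

# shift_by = Dates.firstdayofmonth: each DateTime moves to midnight on the 1st of its month
shifted = Dates.firstdayofmonth.(dates)

# new_start_date = 2010-01-01: subtracting two DateTimes gives a Millisecond period,
# so dividing its integer value by 1000 yields elapsed seconds (assumed conversion)
start_date = DateTime(2010, 1, 1)
seconds = [Dates.value(d - start_date) / 1000 for d in shifted]
```

After the shift, every timestamp in a month collapses to the same instant, so the resulting times are 0, 31 and 59 days after 2010-01-01, expressed in seconds.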

docs/src/var.md

+25
@@ -80,6 +80,31 @@ and latitude dimensions, a periodic boundary condition and a flat boundary condi
 added, respectively, when the dimension array is equispaced and spans the entire range. For
 all other cases, extrapolating beyond the domain of the dimension will throw an error.
 
+## Preprocess dates and seconds
+When loading a NetCDF file, dates in the time dimension are automatically converted to
+seconds and a start date is added to the attributes of the `OutputVar`. This is done because
+`ClimaAnalysis` does not support interpolating on dates, which means that functions that rely
+on the interpolants, such as `resampled_as`, would not work otherwise.
+
+Two additional parameters are provided to help preprocess dates: `new_start_date` and
+`shift_by`. If `new_start_date` is provided, then dates in the time dimension will
+automatically be converted with reference to `new_start_date` rather than the first date
+found in the NetCDF file. The parameter `new_start_date` can be a `Dates.DateTime` object or
+a string that `Dates.DateTime` can parse with its default ISO 8601 format (see the
+[Dates](https://docs.julialang.org/en/v1/stdlib/Dates/) module). If additional preprocessing
+is needed, then one can provide `shift_by`, a function that takes in and returns a
+`Dates.DateTime`; it is applied to each date before the dates are converted to seconds.
+```julia
+# Shift the dates to the first day of the month, then convert them to seconds measured
+# from 2010-01-01
+obs_var = ClimaAnalysis.OutputVar(
+    "pr.nc",
+    "precip",
+    new_start_date = "2010-01-01T00:00:00", # or Dates.DateTime(2010, 1, 1)
+    shift_by = Dates.firstdayofmonth,
+)
+```
+
 ## Integration
 
 `OutputVar`s can be integrated with respect to longitude, latitude, or both using
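A string `new_start_date` is handed to the one-argument `Dates.DateTime` constructor (see `_dates_to_seconds` in `src/Var.jl`), which expects ISO 8601. A date written as "1/1/2010" therefore has to be parsed with an explicit format first. A minimal sketch with the `Dates` standard library; passing the parsed `DateTime` on as `new_start_date` is the assumed usage.

```julia
using Dates

# ISO 8601 strings are what the one-argument constructor parses by default,
# so this form can be passed directly as new_start_date
iso_start = DateTime("2010-01-01T00:00:00")

# "1/1/2010" is not ISO 8601; parse it with an explicit DateFormat first and
# pass the resulting DateTime as new_start_date instead, e.g.
# ClimaAnalysis.OutputVar("pr.nc", "precip"; new_start_date = us_start)
us_start = DateTime("1/1/2010", dateformat"m/d/yyyy")
```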

src/Var.jl

+94 -4
@@ -195,14 +195,41 @@ function OutputVar(dims, data)
 end
 
 """
-    OutputVar(path, short_name = nothing)
+    OutputVar(path,
+              short_name = nothing;
+              new_start_date = nothing,
+              shift_by = identity)
 
 Read the NetCDF file in `path` as an `OutputVar`.
 
 If `short_name` is `nothing`, automatically find the name.
-"""
-function OutputVar(path::String, short_name = nothing)
-    return read_var(path; short_name)
+
+Dates in the time dimension are automatically converted to seconds with respect to the first
+date in the time dimension array (after applying `shift_by`) or, if given, `new_start_date`.
+The parameter `new_start_date` can be a `Dates.DateTime` object or a string that
+`Dates.DateTime` can parse with its default ISO 8601 format, such as `"2010-01-01T00:00:00"`
+(see the [Dates](https://docs.julialang.org/en/v1/stdlib/Dates/) module). The start date is
+added to the attributes of the `OutputVar` under the key `"start_date"`. The parameter
+`shift_by` is a function that takes in a `Dates.DateTime` and returns a `Dates.DateTime`. It
+is applied to each element of the time array. Shifting the dates and converting to seconds is
+done in that order.
+"""
+function OutputVar(
+    path::String,
+    short_name = nothing;
+    new_start_date = nothing,
+    shift_by = identity,
+)
+    var = read_var(path; short_name)
+    # Check if it is possible to convert dates to seconds in the time dimension
+    if (has_time(var) && eltype(times(var)) <: Dates.DateTime)
+        var = _dates_to_seconds(
+            var;
+            new_start_date = new_start_date,
+            shift_by = shift_by,
+        )
+    end
+    return var
 end
 
 """
@@ -1299,6 +1326,69 @@ function global_rmse(sim::OutputVar, obs::OutputVar)
     squared_error_var = squared_error(sim, obs)
     return squared_error_var.attributes["global_rmse"]
 end
+
+"""
+    _dates_to_seconds(var::OutputVar;
+                      new_start_date = nothing,
+                      shift_by = identity)
+
+Convert dates in the time dimension to seconds with respect to the first date in the time
+dimension array or `new_start_date`.
+
+Dates in the time dimension are automatically converted to seconds with respect to the first
+date in the time dimension array (after applying `shift_by`) or, if given, `new_start_date`.
+The parameter `new_start_date` can be a `Dates.DateTime` object or a string that
+`Dates.DateTime` can parse with its default ISO 8601 format, such as `"2010-01-01T00:00:00"`
+(see the [Dates](https://docs.julialang.org/en/v1/stdlib/Dates/) module). The start date is
+added to the attributes of the `OutputVar` under the key `"start_date"`. The parameter
+`shift_by` is a function that takes in a `Dates.DateTime` and returns a `Dates.DateTime`. It
+is applied to each element of the time array. Shifting the dates and converting to seconds is
+done in that order.
+
+Note that this function only works for the time dimension and will not work for the date
+dimension.
+"""
+function _dates_to_seconds(
+    var::OutputVar;
+    new_start_date = nothing,
+    shift_by = identity,
+)
+    has_time(var) || error(
+        "Converting from dates to seconds is only supported for the time dimension",
+    )
+    eltype(times(var)) <: Dates.DateTime ||
+        error("Type of time dimension is not dates")
+
+    # Preprocess time_arr by shifting dates
+    time_arr = copy(times(var))
+    if !isnothing(shift_by)
+        time_arr .= shift_by.(time_arr)
+    end
+
+    # Convert from dates to seconds using the first date in the time dimension array as the
+    # start date or the new_start_date
+    start_date = isnothing(new_start_date) ? time_arr[begin] : new_start_date
+
+    # Handle the case if start_date is a DateTime or string; if it is the latter, then try
+    # to parse it as a DateTime
+    start_date isa AbstractString && (start_date = Dates.DateTime(start_date))
+    time_arr = map(date -> date_to_time(start_date, date), time_arr)
+
+    # Remake OutputVar
+    ret_attribs = deepcopy(var.attributes)
+    ret_attribs["start_date"] = string(start_date) # add start_date as an attribute
+    ret_dim_attribs = deepcopy(var.dim_attributes)
+    ret_dim_attribs[time_name(var)]["units"] = "s" # add unit
+    var_dims = deepcopy(var.dims)
+    ret_dims_generator = (
+        conventional_dim_name(dim_name) == "time" ? dim_name => time_arr :
+        dim_name => dim_data for (dim_name, dim_data) in var_dims
+    )
+    ret_dims = OrderedDict(ret_dims_generator...)
+    ret_data = copy(var.data)
+    return OutputVar(ret_attribs, ret_dims, ret_dim_attribs, ret_data)
+end
+
 """
     overload_binary_op(op)
 
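The steps in `_dates_to_seconds` can be traced on a plain vector of dates without building an `OutputVar`. The function below is a hypothetical stand-alone sketch (`dates_to_seconds_sketch` is not part of ClimaAnalysis), and it assumes that `date_to_time(start_date, date)` returns the elapsed seconds as a `Float64`, which is what the expected values in `test/test_Var.jl` suggest. The call reproduces the "shifting dates and new date together" test case.

```julia
using Dates

# Hypothetical stand-alone version of the steps in _dates_to_seconds, operating on a
# plain vector instead of an OutputVar
function dates_to_seconds_sketch(
    dates::AbstractVector{DateTime};
    new_start_date = nothing,
    shift_by = identity,
)
    # 1. Shift every date (identity leaves them unchanged)
    shifted = shift_by.(dates)
    # 2. Reference date: the first shifted date unless new_start_date is given
    start_date = isnothing(new_start_date) ? first(shifted) : new_start_date
    # 3. Strings are parsed with the default ISO 8601 format
    start_date isa AbstractString && (start_date = DateTime(start_date))
    # 4. DateTime differences are Millisecond periods; divide by 1000 to get seconds
    seconds = [Dates.value(d - start_date) / 1000 for d in shifted]
    # Return the start date as a string, like the "start_date" attribute
    return seconds, string(start_date)
end

dates = [DateTime(2020, 3, 1, 1, 1), DateTime(2020, 3, 1, 1, 2), DateTime(2020, 3, 1, 1, 3)]
secs, start = dates_to_seconds_sketch(
    dates;
    new_start_date = "2020-03-01T01:00:00",
    shift_by = t -> t + Minute(4),
)
```

Shifting by four minutes moves the dates to 01:05, 01:06 and 01:07, so measured from 01:00 they become 300, 360 and 420 seconds, matching the commit's test.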
test/sample_nc/test_pr.nc

17.1 KB
Binary file not shown.

test/test_Var.jl

+86
@@ -1349,3 +1349,89 @@ end
     var_units = ClimaAnalysis.set_units(var, "idk")
     @test ClimaAnalysis.units(var_units) == "idk"
 end
+
+@testset "Dates to seconds for vars" begin
+    # Test for no start date
+    time_arr = [
+        Dates.DateTime(2020, 3, 1, 1, 1),
+        Dates.DateTime(2020, 3, 1, 1, 2),
+        Dates.DateTime(2020, 3, 1, 1, 3),
+    ]
+    data = ones(length(time_arr))
+    dims = OrderedDict("time" => time_arr)
+    dim_attribs = OrderedDict("time" => Dict("blah" => "blah"))
+    attribs =
+        Dict("long_name" => "idk", "short_name" => "short", "units" => "kg")
+    var = ClimaAnalysis.OutputVar(attribs, dims, dim_attribs, data)
+    var_s = ClimaAnalysis.Var._dates_to_seconds(var)
+    @test ClimaAnalysis.times(var_s) == [0.0, 60.0, 120.0]
+    @test var_s.attributes["start_date"] == "2020-03-01T01:01:00"
+
+    # Test for a new start date
+    var_s = ClimaAnalysis.Var._dates_to_seconds(
+        var;
+        new_start_date = "2020-03-01T01:03:00",
+    )
+    @test ClimaAnalysis.times(var_s) == [-120.0, -60.0, 0.0]
+    @test var_s.attributes["start_date"] == "2020-03-01T01:03:00"
+
+    # Test for a new start date as a DateTime object
+    var_s = ClimaAnalysis.Var._dates_to_seconds(
+        var;
+        new_start_date = Dates.DateTime("2020-03-01T01:03:00"),
+    )
+    @test ClimaAnalysis.times(var_s) == [-120.0, -60.0, 0.0]
+    @test var_s.attributes["start_date"] == "2020-03-01T01:03:00"
+
+    # Test for shifting dates
+    var_s = ClimaAnalysis.Var._dates_to_seconds(
+        var,
+        shift_by = t -> t - Dates.Day(15),
+    )
+    @test ClimaAnalysis.times(var_s) == [0.0, 60.0, 120.0]
+    @test var_s.attributes["start_date"] == "2020-02-15T01:01:00"
+
+    # Test for shifting dates and new date together
+    var_s = ClimaAnalysis.Var._dates_to_seconds(
+        var;
+        new_start_date = "2020-03-01T01:00:00",
+        shift_by = t -> t + Dates.Minute(4),
+    )
+    @test ClimaAnalysis.times(var_s) == [300.0, 360.0, 420.0]
+    @test var_s.attributes["start_date"] == "2020-03-01T01:00:00"
+
+    # Test constructor for OutputVar that uses _dates_to_seconds
+    ncpath = joinpath(@__DIR__, "sample_nc/test_pr.nc")
+    file_var = ClimaAnalysis.OutputVar(
+        ncpath;
+        new_start_date = nothing,
+        shift_by = identity,
+    )
+    @test ClimaAnalysis.times(file_var) == [0.0, 1398902400.0]
+    @test file_var.attributes["start_date"] == "1979-01-01T00:00:00"
+
+    # Test for error handling
+    # Use date dimension instead of time dimension
+    date_arr = [
+        Dates.DateTime(2020, 3, 1, 1, 1),
+        Dates.DateTime(2020, 3, 1, 1, 2),
+        Dates.DateTime(2020, 3, 1, 1, 3),
+    ]
+    data = ones(length(date_arr))
+    dims = OrderedDict("date" => date_arr)
+    dim_attribs = OrderedDict("date" => Dict("blah" => "blah"))
+    attribs =
+        Dict("long_name" => "idk", "short_name" => "short", "units" => "kg")
+    var = ClimaAnalysis.OutputVar(attribs, dims, dim_attribs, data)
+    @test_throws ErrorException ClimaAnalysis.Var._dates_to_seconds(var)
+
+    # Cannot convert if the element type of time array is float
+    time_arr = [0.0, 60.0, 120.0]
+    data = ones(length(time_arr))
+    dims = OrderedDict("time" => time_arr)
+    dim_attribs = OrderedDict("time" => Dict("blah" => "blah"))
+    attribs =
+        Dict("long_name" => "idk", "short_name" => "short", "units" => "kg")
+    var = ClimaAnalysis.OutputVar(attribs, dims, dim_attribs, data)
+    @test_throws ErrorException ClimaAnalysis.Var._dates_to_seconds(var)
+end
