Skip to content

Commit c416390

Browse files
committed
Add Var.bias and Var.squared_error
Also, add Var.global_bias, Var.global_mse, and Var.global_rmse.
1 parent ea898f3 commit c416390

File tree

5 files changed

+576
-1
lines changed

5 files changed

+576
-1
lines changed

NEWS.md

+49
Original file line numberDiff line numberDiff line change
@@ -204,6 +204,55 @@ julia> [MAM.data, JJA.data, DJF.data]
204204
[1.0]
205205
```
206206

207+
### Bias and squared error
208+
Bias and squared error can be computed from simulation data and observational data in
209+
`OutputVar`s using `bias(sim, obs)` and `squared_error(sim, obs)`. The function `bias(sim,
210+
obs)` returns a `OutputVar` whose data is the bias (`sim.data - obs.data`) and computes the
211+
global bias of `data` in `sim` and `obs` over longitude and latitude. The result is stored
212+
in `var.attributes["global_bias"]`. The function `squared_error(sim, obs)` returns a
213+
`OutputVar` whose data is the squared error (`(sim.data - obs.data)^2`) and computes the
214+
global mean squared error (MSE) and the global root mean squared error (RMSE) of `data` in
215+
`sim` and `obs` over longitude and latitude. The result is stored in
216+
`var.attributes["global_mse"]` and `var.attributes["global_rmse"]`. Resampling is
217+
automatically done by resampling `obs` on `sim`. If you are only interested in computing
218+
global bias, MSE, or RMSE, you can use `global_bias(sim, obs)`, `global_mse(sim, obs)`, or
219+
`global_rmse(sim, obs)`.
220+
221+
As of now, these functions are implemented for `OutputVar`s with only the dimensions
222+
longitude and latitude. Furthermore, units must be supplied for data and dimensions in `sim`
223+
and `obs` and the units for longitude and latitude should be degrees.
224+
225+
Consider the following example, where we compute the bias and RMSE between our simulation
226+
and some observations stored in "ta\_1d\_average.nc".
227+
228+
```@julia bias_and_mse
229+
julia> obs_var = OutputVar("ta_1d_average.nc"); # load in observational data
230+
231+
julia> sim_var = get(simdir("simulation_output"), "ta"); # load in simulation data
232+
233+
julia> ClimaAnalysis.short_name(sim_var)
234+
"ta"
235+
236+
julia> bias_var = ClimaAnalysis.bias(sim_var, obs_var); # bias_var is a OutputVar that can be plotted
237+
238+
julia> global_bias(sim, obs)
239+
2.0
240+
241+
julia> units(bias_var)
242+
"K"
243+
244+
julia> se_var = ClimaAnalysis.squared_error(sim_var, obs_var); # can also be plotted
245+
246+
julia> global_mse(sim, obs)
247+
4.0
248+
249+
julia> global_rmse(sim, obs)
250+
2.0
251+
252+
julia> units(se_var)
253+
"K^2"
254+
```
255+
207256
## Bug fixes
208257

209258
- Increased the default value for `warp_string` to 72.

docs/src/api.md

+5
Original file line numberDiff line numberDiff line change
@@ -60,6 +60,11 @@ Var.integrate_lonlat
6060
Var.integrate_lat
6161
Var.integrate_lon
6262
Var.split_by_season(var::OutputVar)
63+
Var.bias
64+
Var.global_bias
65+
Var.squared_error
66+
Var.global_mse
67+
Var.global_rmse
6368
```
6469

6570
## Utilities

docs/src/var.md

+49
Original file line numberDiff line numberDiff line change
@@ -139,4 +139,53 @@ julia> [MAM.data, JJA.data, DJF.data]
139139
[2.0]
140140
[3.0]
141141
[1.0]
142+
```
143+
144+
## Bias and squared error
145+
Bias and squared error can be computed from simulation data and observational data in
146+
`OutputVar`s using `bias(sim, obs)` and `squared_error(sim, obs)`. The function `bias(sim,
147+
obs)` returns a `OutputVar` whose data is the bias (`sim.data - obs.data`) and computes the
148+
global bias of `data` in `sim` and `obs` over longitude and latitude. The result is stored
149+
in `var.attributes["global_bias"]`. The function `squared_error(sim, obs)` returns a
150+
`OutputVar` whose data is the squared error (`(sim.data - obs.data)^2`) and computes the
151+
global mean squared error (MSE) and the global root mean squared error (RMSE) of `data` in
152+
`sim` and `obs` over longitude and latitude. The result is stored in
153+
`var.attributes["global_mse"]` and `var.attributes["global_rmse"]`. Resampling is
154+
automatically done by resampling `obs` on `sim`. If you are only interested in computing
155+
global bias, MSE, or RMSE, you can use `global_bias(sim, obs)`, `global_mse(sim, obs)`, or
156+
`global_rmse(sim, obs)`.
157+
158+
As of now, these functions are implemented for `OutputVar`s with only the dimensions
159+
longitude and latitude. Furthermore, units must be supplied for data and dimensions in `sim`
160+
and `obs` and the units for longitude and latitude should be degrees.
161+
162+
Consider the following example, where we compute the bias and RMSE between our simulation
163+
and some observations stored in "ta\_1d\_average.nc".
164+
165+
```@julia bias_and_mse
166+
julia> obs_var = OutputVar("ta_1d_average.nc"); # load in observational data
167+
168+
julia> sim_var = get(simdir("simulation_output"), "ta"); # load in simulation data
169+
170+
julia> ClimaAnalysis.short_name(sim_var)
171+
"ta"
172+
173+
julia> bias_var = ClimaAnalysis.bias(sim_var, obs_var); # bias_var is a OutputVar that can be plotted
174+
175+
julia> global_bias(sim, obs)
176+
2.0
177+
178+
julia> units(bias_var)
179+
"K"
180+
181+
julia> se_var = ClimaAnalysis.squared_error(sim_var, obs_var); # can also be plotted
182+
183+
julia> global_mse(sim, obs)
184+
4.0
185+
186+
julia> global_rmse(sim, obs)
187+
2.0
188+
189+
julia> units(se_var)
190+
"K^2"
142191
```

src/Var.jl

+215-1
Original file line numberDiff line numberDiff line change
@@ -44,7 +44,12 @@ export OutputVar,
4444
integrate_lon,
4545
integrate_lat,
4646
isempty,
47-
split_by_season
47+
split_by_season,
48+
bias,
49+
global_bias,
50+
squared_error,
51+
global_mse,
52+
global_rmse
4853

4954
"""
5055
Representing an output variable
@@ -966,6 +971,215 @@ function split_by_season(var::OutputVar)
966971
end
967972
end
968973

974+
"""
975+
_check_sim_obs_units_consistent(sim::OutputVar, obs::OutputVar)
976+
977+
Check if the number of dimensions are two, the `data` in `sim` and `obs` is missing units or
978+
not, and if the units of data are the same in `sim` and `obs`.
979+
980+
This function does not check if the dimensions are longitude and latitude in `sim` and `obs`
981+
because `integrate_lonlat` (in `bias` and `squared_error`) handles that. The function also
982+
does not check if the units of dimensions in `sim` and `obs` are the same because
983+
`resampled_as` (in `bias` and `squared_error`) handles that.
984+
"""
985+
function _check_sim_obs_units_consistent(sim::OutputVar, obs::OutputVar)
986+
# Check number of dimensions
987+
sim_num_dims = length(sim.dims)
988+
obs_num_dims = length(obs.dims)
989+
((sim_num_dims != 2) || (obs_num_dims != 2)) && error(
990+
"There are not only two dimensions in sim ($sim_num_dims) or obs ($obs_num_dims).",
991+
)
992+
993+
# Check units for data is not missing
994+
sim_data_units = units(sim)
995+
obs_data_units = units(obs)
996+
sim_data_units == "" && error("Unit is missing in data for sim")
997+
obs_data_units == "" && error("Unit is missing in data for obs")
998+
999+
# Check if units of data match between sim and obs
1000+
sim_data_units == obs_data_units || error(
1001+
"Units do not match between the data in sim ($sim_data_units) and obs ($obs_data_units)",
1002+
)
1003+
return nothing
1004+
end
1005+
1006+
"""
1007+
bias(sim::OutputVar, obs::OutputVar)
1008+
1009+
Return a `OutputVar` whose data is the bias (`sim.data - obs.data`) and compute the global
1010+
bias of `data` in `sim` and `obs` over longitude and latitude. The result is stored in
1011+
`var.attributes["global_bias"]`.
1012+
1013+
This function is currently implemented for `OutputVar`s with only the dimensions longitude
1014+
and latitude. Units must be supplied for data and dimensions in `sim` and `obs`. The units
1015+
for longitude and latitude should be degrees. Resampling is done automatically by resampling
1016+
`obs` on `sim`. Attributes in `sim` and `obs` will be thrown away. The long name and short
1017+
name of the returned `OutputVar` will be updated to reflect that a bias is computed.
1018+
1019+
See also [`global_bias`](@ref), [`squared_error`](@ref), [`global_mse`](@ref),
1020+
[`global_rmse`](@ref).
1021+
"""
1022+
function bias(sim::OutputVar, obs::OutputVar)
1023+
_check_sim_obs_units_consistent(sim, obs)
1024+
1025+
# Resample obs on sim to ensure the size of data in sim and obs are the same and the
1026+
# dims are the same
1027+
obs_resampled = resampled_as(obs, sim)
1028+
1029+
# Compute bias
1030+
bias = sim - obs_resampled
1031+
1032+
# Do this because we do not want to store global bias as a string and unit could be Unitful
1033+
ret_attributes = Dict{keytype(bias.attributes), Any}(bias.attributes)
1034+
1035+
# Add units back for bias
1036+
ret_attributes["units"] = units(sim)
1037+
1038+
# Add short and long name
1039+
ret_attributes["short_name"] = "sim-obs"
1040+
ret_attributes["long_name"] = "SIM - OBS"
1041+
if !isempty(short_name(sim))
1042+
ret_attributes["short_name"] *= "_" * short_name(sim)
1043+
ret_attributes["long_name"] *= " " * short_name(sim)
1044+
end
1045+
1046+
# Compute global bias and store it as an attribute
1047+
integrated_bias = integrate_lonlat(bias).data
1048+
normalization =
1049+
integrate_lonlat(
1050+
OutputVar(
1051+
bias.attributes,
1052+
bias.dims,
1053+
bias.dim_attributes,
1054+
ones(size(bias.data)),
1055+
),
1056+
).data
1057+
# Do ./ instead of / because we are dividing between zero dimensional arrays
1058+
global_bias = integrated_bias ./ normalization
1059+
ret_attributes["global_bias"] = global_bias
1060+
return OutputVar(ret_attributes, bias.dims, bias.dim_attributes, bias.data)
1061+
end
1062+
1063+
"""
1064+
global_bias(sim::OutputVar, obs::OutputVar)
1065+
1066+
Return the global bias of `data` in `sim` and `obs` over longitude and latitude.
1067+
1068+
This function is currently only implemented for `OutputVar`s with only the dimensions
1069+
longitude and latitude. Units must be supplied for data and dimensions in `sim` and `obs`.
1070+
The units for longitude and latitude should be degrees. Resampling is done automatically by
1071+
resampling `obs` on `sim`.
1072+
1073+
See also [`bias`](@ref), [`squared_error`](@ref), [`global_mse`](@ref),
1074+
[`global_rmse`](@ref).
1075+
"""
1076+
function global_bias(sim::OutputVar, obs::OutputVar)
1077+
bias_var = bias(sim, obs)
1078+
return bias_var.attributes["global_bias"]
1079+
end
1080+
1081+
"""
1082+
squared_error(sim::OutputVar, obs::OutputVar)
1083+
1084+
Return a `OutputVar` whose data is the squared error (`(sim.data - obs.data)^2`) and compute
1085+
the global mean squared error (MSE) and global root mean squared error (RMSE) of `data` in
1086+
`sim` and `obs` over longitude and latitude. The result is stored in `var.attributes["mse"]`
1087+
and `var.attributes["rmse"]`.
1088+
1089+
This function is currently implemented for `OutputVar`s with only the dimensions longitude
1090+
and latitude. Units must be supplied for data and dimensions in `sim` and `obs`. The units
1091+
for longitude and latitude should be degrees. Resampling is done automatically by resampling
1092+
`obs` on `sim`. Attributes in `sim` and `obs` will be thrown away. The long name and short
1093+
name of the returned `OutputVar` will be updated to reflect that a squared error is computed.
1094+
1095+
See also [`global_mse`](@ref), [`global_rmse`](@ref), [`bias`](@ref), [`global_bias`](@ref).
1096+
"""
1097+
function squared_error(sim::OutputVar, obs::OutputVar)
1098+
_check_sim_obs_units_consistent(sim, obs)
1099+
1100+
# Resample obs on sim to ensure the size of data in sim and obs are the same and the
1101+
# dims are the same
1102+
obs_resampled = resampled_as(obs, sim)
1103+
1104+
# Compute squared error
1105+
# Do not use ^ since ^ is not defined between a OutputVar and Real
1106+
squared_error = (sim - obs_resampled) * (sim - obs_resampled)
1107+
1108+
# Do this because we do not want to store global mse and rmse as strings
1109+
ret_attributes = Dict{String, Any}(squared_error.attributes)
1110+
1111+
# Add units back for bias
1112+
# Always treat as a string since the string representation of something type Unitful is
1113+
# not always parseable as a Unitful.Unit (see:
1114+
# https://github.com/PainterQubits/Unitful.jl/issues/412)
1115+
ret_attributes["units"] = "($(units(sim)))^2"
1116+
1117+
# Add short and long name
1118+
ret_attributes["short_name"] = "(sim-obs)^2"
1119+
ret_attributes["long_name"] = "(SIM - OBS)^2"
1120+
if !isempty(short_name(sim))
1121+
ret_attributes["short_name"] *= "_" * short_name(sim)
1122+
ret_attributes["long_name"] *= " " * short_name(sim)
1123+
end
1124+
1125+
# Compute global mse and global rmse and store it as an attribute
1126+
integrated_squared_error = integrate_lonlat(squared_error).data
1127+
normalization =
1128+
integrate_lonlat(
1129+
OutputVar(
1130+
squared_error.attributes,
1131+
squared_error.dims,
1132+
squared_error.dim_attributes,
1133+
ones(size(squared_error.data)),
1134+
),
1135+
).data
1136+
# Do ./ instead of / because we are dividing between zero dimensional arrays
1137+
mse = integrated_squared_error ./ normalization
1138+
ret_attributes["global_mse"] = mse
1139+
ret_attributes["global_rmse"] = sqrt(mse)
1140+
return OutputVar(
1141+
ret_attributes,
1142+
squared_error.dims,
1143+
squared_error.dim_attributes,
1144+
squared_error.data,
1145+
)
1146+
end
1147+
1148+
"""
1149+
global_mse(sim::OutputVar, obs::OutputVar)
1150+
1151+
Return the global mean squared error (MSE) of `data` in `sim` and `obs` over longitude and
1152+
latitude.
1153+
1154+
This function is currently implemented for `OutputVar`s with only the dimensions longitude
1155+
and latitude. Units must be supplied for data and dimensions in `sim` and `obs`. The units
1156+
for longitude and latitude should be degrees. Resampling is done automatically by resampling
1157+
`obs` on `sim`.
1158+
1159+
See also [`squared_error`](@ref), [`global_rmse`](@ref), [`bias`](@ref), [`global_bias`](@ref).
1160+
"""
1161+
function global_mse(sim::OutputVar, obs::OutputVar)
1162+
squared_error_var = squared_error(sim, obs)
1163+
return squared_error_var.attributes["global_mse"]
1164+
end
1165+
1166+
"""
1167+
global_rmse(sim::OutputVar, obs::OutputVar)
1168+
1169+
Return the global root mean squared error (RMSE) of `data` in `sim` and `obs` over longitude
1170+
and latitude.
1171+
1172+
This function is currently implemented for `OutputVar`s with only the dimensions longitude
1173+
and latitude. Units must be supplied for data and dimensions in `sim` and `obs`. The units
1174+
for longitude and latitude should be degrees. Resampling is done automatically by resampling
1175+
`obs` on `sim`.
1176+
1177+
See also [`squared_error`](@ref), [`global_mse`](@ref), [`bias`](@ref), [`global_bias`](@ref).
1178+
"""
1179+
function global_rmse(sim::OutputVar, obs::OutputVar)
1180+
squared_error_var = squared_error(sim, obs)
1181+
return squared_error_var.attributes["global_rmse"]
1182+
end
9691183
"""
9701184
overload_binary_op(op)
9711185

0 commit comments

Comments
 (0)