Commit 26bf2a1

Fix identifying variables with _ in the name
The regular expression used by ClimaAnalysis did not correctly identify variables with an underscore in the short name, because the capturing group for the time interval was too greedy. The capturing group now matches only digits, `_`, `.`, and the unit letters m|M|d|s|y|h, which allows more general short names to be captured.
1 parent e42b996 commit 26bf2a1
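
The effect described in the commit message can be reproduced with a minimal sketch. The two patterns are the ones that appear in the `src/Utils.jl` hunk of this commit; the names `old_re`, `new_re`, and `split_nc_filename` are hypothetical and exist only for this illustration, they are not part of ClimaAnalysis.

```julia
# Pattern removed by this commit: the middle group accepts any letter, so it
# can absorb parts of a short name that contains underscores.
old_re = r"^(\w+?)_(?>([a-zA-Z0-9_\.]*)_)?(\w*)\.nc$"

# Pattern added by this commit: the middle group accepts only digits, the unit
# letters m, M, d, s, y, h, underscores, and dots.
new_re = r"^(\w+?)_((?:[0-9]|m|M|d|s|y|h|_|\.)*?)_?([a-zA-Z0-9]+)\.nc$"

# Hypothetical helper (not the library function): apply `re` to `filename` and
# return the captures as a tuple, mapping missing or empty captures to `nothing`.
function split_nc_filename(re::Regex, filename::AbstractString)
    m = match(re, filename)
    isnothing(m) && return nothing
    return Tuple(
        (isnothing(cap) || cap == "") ? nothing : String(cap) for cap in m.captures
    )
end

split_nc_filename(old_re, "toa_net_flux_1M_inst.nc")  # ("toa", "net_flux_1M", "inst")
split_nc_filename(new_re, "toa_net_flux_1M_inst.nc")  # ("toa_net_flux", "1M", "inst")
split_nc_filename(new_re, "hu_inst.nc")               # ("hu", nothing, "inst")
```

With the old pattern the lazy first group stops at the first underscore and the permissive middle group takes the rest of the short name. Restricting the middle group to time-interval characters forces the first group to extend until only a valid interval and a statistic remain.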

3 files changed, +47 -22 lines changed

NEWS.md

+6 -3
@@ -24,6 +24,9 @@ dimension.

 - Interpolation is not possible with dates. When dates are detected in any dimension, an
   interpolat will not be made.
+- Fix identifying variables with underscore in the short name (such as
+  `net_toa_flux`). ([#109](https://github.com/CliMA/ClimaAnalysis.jl/pull/109
+  "PR109"))

 v0.5.9
 ------
@@ -58,11 +61,11 @@ julia> reordered_var.dims |> keys |> collect
 ## Bug fixes

 - Fix models repeating in legend of box plots by not considering the models in `model_names`
-  when finding the best and worst models
+  when finding the best and worst models.
 - Fix legend from covering the box plot by adding the parameter `legend_text_width` which
-  control the number of characters on each line of the legend of the box plot
+  control the number of characters on each line of the legend of the box plot.
 - Use default marker size instead of a marker size of 20 when plotting other models beside
-  `CliMA` on the box plot
+  `CliMA` on the box plot.
 - Fix support for `""` in units.

 v0.5.8

src/Utils.jl

+28 -15
@@ -26,8 +26,8 @@ julia> match_nc_filename("ta_1d_average.nc")
 ```

 ```jldoctest
-julia> match_nc_filename("pfull_6.0min_max.nc")
-("pfull", "6.0min", "max")
+julia> match_nc_filename("pfull_6.0m_max.nc")
+("pfull", "6.0m", "max")
 ```

 ```jldoctest
@@ -39,23 +39,36 @@ function match_nc_filename(filename::String)
     # Let's unpack this regular expression to find files names like "orog_inst.nc" or
     # "ta_3.0h_average.nc" and extract information from there.

-    # ^ $: mean match the entire string
-    # (\w+?): the first capturing group, matching any word non greedily
-    # _: matches this literal character
-    # (?>([a-zA-Z0-9\.]*)_)?: an optional group (it doesn't always exist for _inst
-    #                         variables) ?> means that we don't want to capture the outside
-    #                         group the inside group is any combinations of letters/numbers,
-    #                         and the literal character ., followed by the _. We capture the
-    #                         combination of characters because that's the reduction
-    # (\w+): Again, any word
-    # \.nc: file extension has to be .nc
-    re = r"^(\w+?)_(?>([a-zA-Z0-9_\.]*)_)?(\w*)\.nc$"
+    # ^: Matches the beginning of the string
+
+    # (\w+?): Matches one or more word characters (letters, numbers, or underscore)
+    # non-greedily and captures it as the first group (variable name)
+
+    # _: Matches the underscore separating the variable name and the optional time
+    # resolution.
+
+    # ((?:[0-9]|m|M|d|s|y|_|\.)*?): Matches zero or more occurrences of the allowed
+    # characters (digits, time units, underscore, or dot) non-greedily and captures the
+    # entire time resolution string as the second group
+
+    # _?: Matches an optional underscore (to handle cases where there's no time resolution)
+
+    # ([a-zA-Z0-9]+): Matches one or more alphanumeric characters and captures it as the
+    # third group (statistic)
+
+    # \.nc: Matches the literal ".nc" file extension
+
+    # $: Matches the end of the string
+
+    re = r"^(\w+?)_((?:[0-9]|m|M|d|s|y|h|_|\.)*?)_?([a-zA-Z0-9]+)\.nc$"
     m = match(re, filename)
     if !isnothing(m)
         # m.captures returns `SubString`s (or nothing). We want to have actual `String`s (or
-        # nothing) so that we can assume we have `String`s everywhere.
+        # nothing) so that we can assume we have `String`s everywhere. We also take care of
+        # the case where the period is matched to an empty string and return nothing instead
         return Tuple(
-            isnothing(cap) ? nothing : String(cap) for cap in m.captures
+            (isnothing(cap) || cap == "") ? nothing : String(cap) for
+            cap in m.captures
         )
     else
         return nothing

test/test_Utils.jl

+13 -4
@@ -9,11 +9,20 @@ import Dates
     @test Utils.match_nc_filename("ta_1d_average.nc") ==
           Tuple(["ta", "1d", "average"])

-    @test Utils.match_nc_filename("ta_1m_40s_inst.nc") ==
-          Tuple(["ta", "1m_40s", "inst"])
+    @test Utils.match_nc_filename("ta_3.0h_average.nc") ==
+          Tuple(["ta", "3.0h", "average"])

-    @test Utils.match_nc_filename("pfull_6.0min_max.nc") ==
-          Tuple(["pfull", "6.0min", "max"])
+    @test Utils.match_nc_filename("toa_net_flux_1m_40s_inst.nc") ==
+          Tuple(["toa_net_flux", "1m_40s", "inst"])
+
+    @test Utils.match_nc_filename("toa_net_flux_1M_inst.nc") ==
+          Tuple(["toa_net_flux", "1M", "inst"])
+
+    @test Utils.match_nc_filename("p500_1M_inst.nc") ==
+          Tuple(["p500", "1M", "inst"])
+
+    @test Utils.match_nc_filename("pfull_6.0m_max.nc") ==
+          Tuple(["pfull", "6.0m", "max"])

     @test Utils.match_nc_filename("hu_inst.nc") ==
           Tuple(["hu", nothing, "inst"])
