mask out bad data from NEON eval files #4

wwieder · 2021-11-02T21:25:05Z

To mask out absurd measurements from NEON data, @ddurden recommended using the min and max thresholds below, which are used in Ameriflux data processing.

@negin513, it's not urgent, but can you bring these thresholds into the scripts that plot NEON observations?

Flags used for Ameriflux data
Rng$Min <- data.frame(
"FC" = -100, #[umol m-2 s-1]
"SC" = -100, #[umol m-2 s-1]
"NEE" = -100, #[umol m-2 s-1]
"LE" = -500, #[W m-2]
"H" = -500, #[W m-2]
"USTAR" = 0, #[m s-1]
"CO2" = 200, #[umol mol-1]
"H2O" = 0, #[mmol mol-1]
"WS_1_1_1" = 0, #[m s-1]
"WS_MAX_1_1_1" = 0, #[m s-1]
"WD_1_1_1" = -0.1, #[deg]
"T_SONIC" = -55.0 #[C]
)

Rng$Max <- data.frame(
"FC" = 100, #[umol m-2 s-1]
"SC" = 100, #[umol m-2 s-1]
"NEE" = 100, #[umol m-2 s-1]
"LE" = 1000, #[W m-2]
"H" = 1000, #[W m-2]
"USTAR" = 5, #[m s-1]
"CO2" = 800, #[umol mol-1]
"H2O" = 100, #[mmol mol-1]
"WS_1_1_1" = 50, #[m s-1]
"WS_MAX_1_1_1" = 50, #[m s-1]
"WD_1_1_1" = 360, #[deg]
"T_SONIC" = 45.0 #[C]
)

wwieder · 2021-11-19T20:25:42Z

@negin513 not critical, but did you ever try applying these masks to the plots of NEON data?

negin513 · 2021-11-19T20:35:01Z

Thanks @wwieder for the reminder. I actually did not see this before. I will work on applying these filters. I am wondering what the best way to do this would be. I think we eventually want these filters for both the Bokeh and matplotlib plots, so writing a function remove_outliers (or something like that) and calling it during pre-processing probably makes the most sense.
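
A minimal sketch of what such a remove_outliers function might look like with the fixed thresholds listed earlier in this thread. It assumes the observations are held in a pandas DataFrame whose column names match the Ameriflux variable names, which may not be true of the NEON eval files. The name VALID_RANGE and the sample values are hypothetical and only for illustration.

```python
import pandas as pd

# (min, max) pairs copied from the Ameriflux thresholds posted above.
# Assumption: DataFrame columns use these same variable names.
VALID_RANGE = {
    "FC": (-100, 100),         # [umol m-2 s-1]
    "SC": (-100, 100),         # [umol m-2 s-1]
    "NEE": (-100, 100),        # [umol m-2 s-1]
    "LE": (-500, 1000),        # [W m-2]
    "H": (-500, 1000),         # [W m-2]
    "USTAR": (0, 5),           # [m s-1]
    "CO2": (200, 800),         # [umol mol-1]
    "H2O": (0, 100),           # [mmol mol-1]
    "WS_1_1_1": (0, 50),       # [m s-1]
    "WS_MAX_1_1_1": (0, 50),   # [m s-1]
    "WD_1_1_1": (-0.1, 360),   # [deg]
    "T_SONIC": (-55.0, 45.0),  # [C]
}


def remove_outliers(df, valid_range=VALID_RANGE):
    """Return a copy of df with out-of-range values replaced by NaN."""
    out = df.copy()
    for var, (vmin, vmax) in valid_range.items():
        if var in out.columns:
            # between() is inclusive on both ends by default
            in_range = out[var].between(vmin, vmax)
            # where() keeps values where the mask is True, NaN elsewhere
            out[var] = out[var].where(in_range)
    return out


# Made-up sample values: one NEE spike and one LE value below the minimum
obs = pd.DataFrame({"NEE": [-3.2, 250.0, 1.1], "LE": [120.0, -900.0, 80.0]})
clean = remove_outliers(obs)
```

Replacing bad values with NaN rather than dropping rows keeps the time axis intact, so the same cleaned frame could be passed to either plotting backend.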

negin513 · 2021-11-19T20:43:15Z

What I originally had in mind for filtering the outliers was using the standard deviation instead of fixed values. I am not sure which method (fixed values for each variable vs. automatic outlier detection) works better and is easier.

For automatic outlier detection, there are other options available as well:

For example, we can filter out values that are more than ±3 standard deviations from the mean (this keeps approximately 99.7% of the data, assuming a Gaussian distribution).
Or we can use more advanced methods such as machine learning. In fact, there is a machine learning approach used specifically for outlier detection, called one-class (or unary) classification: https://en.wikipedia.org/wiki/One-class_classification

An example of using one-class classification for outlier detection: https://blogs.sap.com/2020/12/29/outlier-detection-with-one-class-classification-using-python-machine-learning-client-for-sap-hana/
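
The simpler ±3 standard deviation option mentioned above could be sketched roughly as follows. This assumes a single variable held in a pandas Series. The function name mask_sigma_outliers and the sample values are hypothetical.

```python
import pandas as pd


def mask_sigma_outliers(series, n_sigma=3.0):
    """Replace values more than n_sigma standard deviations from the mean with NaN."""
    mean = series.mean()  # NaN values are skipped by default
    std = series.std()    # sample standard deviation (ddof=1)
    keep = (series - mean).abs() <= n_sigma * std
    return series.where(keep)


# Twenty plausible values plus one obvious spike (made-up numbers)
nee = pd.Series([1.0, -2.0, 0.5, 1.5, -1.0] * 4 + [500.0])
cleaned = mask_sigma_outliers(nee)
```

One caveat with this approach: the spikes themselves inflate the mean and standard deviation, so a very large outlier can hide smaller ones, and a short series may not allow any point to exceed 3 sigma at all.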

wwieder · 2021-11-19T21:09:03Z

I like the idea of a remove_outliers function. At this stage I'd keep it simple and make it really obvious what we're doing. Using fixed values or the 3-sigma threshold will hopefully catch the bulk of the crazy spikes in the measurements.
