mask out bad data from NEON eval files #4

Open
wwieder opened this issue Nov 2, 2021 · 4 comments

Comments

@wwieder
Collaborator

wwieder commented Nov 2, 2021

To mask out absurd measurements in NEON data, @ddurden recommended using the following min and max thresholds, which are used in Ameriflux data processing.

@negin513, it's not urgent, but can you bring these thresholds into the scripts that plot NEON observations?

Flags used for Ameriflux data:

```r
# Plausible-range thresholds used in Ameriflux data processing
Rng <- list()  # container for the Min/Max tables

Rng$Min <- data.frame(
  "FC" = -100,          #[umol m-2 s-1]
  "SC" = -100,          #[umol m-2 s-1]
  "NEE" = -100,         #[umol m-2 s-1]
  "LE" = -500,          #[W m-2]
  "H" = -500,           #[W m-2]
  "USTAR" = 0,          #[m s-1]
  "CO2" = 200,          #[umol mol-1]
  "H2O" = 0,            #[mmol mol-1]
  "WS_1_1_1" = 0,       #[m s-1]
  "WS_MAX_1_1_1" = 0,   #[m s-1]
  "WD_1_1_1" = -0.1,    #[deg]
  "T_SONIC" = -55.0     #[C]
)

Rng$Max <- data.frame(
  "FC" = 100,           #[umol m-2 s-1]
  "SC" = 100,           #[umol m-2 s-1]
  "NEE" = 100,          #[umol m-2 s-1]
  "LE" = 1000,          #[W m-2]
  "H" = 1000,           #[W m-2]
  "USTAR" = 5,          #[m s-1]
  "CO2" = 800,          #[umol mol-1]
  "H2O" = 100,          #[mmol mol-1]
  "WS_1_1_1" = 50,      #[m s-1]
  "WS_MAX_1_1_1" = 50,  #[m s-1]
  "WD_1_1_1" = 360,     #[deg]
  "T_SONIC" = 45.0      #[C]
)
```
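A minimal Python sketch of the same thresholds, for use in the Python plotting scripts. The names `AMERIFLUX_RANGES` and `mask_out_of_range` are hypothetical, the data are assumed to be plain lists of floats (a pandas column could be handled the same way with `Series.between` and `Series.where`), and treating both bounds as inclusive is an assumption, not something stated in the thresholds above.

```python
import math

# Plausible-value ranges (min, max) copied from the Ameriflux thresholds above.
# AMERIFLUX_RANGES is a hypothetical name; both bounds are ASSUMED inclusive.
AMERIFLUX_RANGES = {
    "FC": (-100.0, 100.0),          # umol m-2 s-1
    "SC": (-100.0, 100.0),          # umol m-2 s-1
    "NEE": (-100.0, 100.0),         # umol m-2 s-1
    "LE": (-500.0, 1000.0),         # W m-2
    "H": (-500.0, 1000.0),          # W m-2
    "USTAR": (0.0, 5.0),            # m s-1
    "CO2": (200.0, 800.0),          # umol mol-1
    "H2O": (0.0, 100.0),            # mmol mol-1
    "WS_1_1_1": (0.0, 50.0),        # m s-1
    "WS_MAX_1_1_1": (0.0, 50.0),    # m s-1
    "WD_1_1_1": (-0.1, 360.0),      # deg
    "T_SONIC": (-55.0, 45.0),       # C
}


def mask_out_of_range(values, var):
    """Return a copy of `values` with entries outside the range for `var` set to NaN."""
    lo, hi = AMERIFLUX_RANGES[var]
    # Comparisons with NaN are always False, so existing NaNs stay NaN.
    return [v if lo <= v <= hi else math.nan for v in values]


cleaned_fc = mask_out_of_range([12.0, -250.0, 3.5], "FC")
```

Here `cleaned_fc` keeps 12.0 and 3.5 and replaces the -250.0 spike with NaN, which matplotlib and Bokeh both draw as a gap rather than a point.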

@wwieder
Collaborator Author

wwieder commented Nov 19, 2021

@negin513 not critical, but did you ever try applying these masks to the plots of NEON data?

@negin513
Collaborator

Thanks @wwieder for the reminder. I actually hadn't seen this before. I will work on applying these filters. I am wondering what the best way to do this would be. We eventually want these filters for both Bokeh and matplotlib plots, so writing a function remove_outliers (or something like that) and calling it during pre-processing probably makes the most sense.
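A sketch of what a single pre-processing hook could look like, so that Bokeh and matplotlib plots both consume the same cleaned data. The `remove_outliers` signature, the column-oriented dict-of-lists table, the `RANGES` subset and the `TIME` column are all hypothetical; in the real scripts the table would more likely be a pandas DataFrame.

```python
import math

# Hypothetical subset of the Ameriflux ranges (min, max); extend with the full list.
RANGES = {
    "FC": (-100.0, 100.0),    # umol m-2 s-1
    "LE": (-500.0, 1000.0),   # W m-2
    "USTAR": (0.0, 5.0),      # m s-1
}


def remove_outliers(table, ranges=RANGES):
    """Mask implausible values in a column-oriented table.

    `table` maps column names to lists of floats. Columns listed in `ranges`
    get out-of-range values replaced by NaN; other columns are copied as-is.
    The input table is not modified.
    """
    cleaned = {}
    for name, values in table.items():
        if name in ranges:
            lo, hi = ranges[name]
            cleaned[name] = [v if lo <= v <= hi else math.nan for v in values]
        else:
            cleaned[name] = list(values)
    return cleaned


# Called once during pre-processing, before any plotting backend sees the data.
raw = {"FC": [5.0, 250.0], "USTAR": [0.3, -1.0], "TIME": [0.0, 0.5]}
clean = remove_outliers(raw)
```

Returning a new table instead of editing in place keeps the raw observations available, for example to count how many points each threshold removed.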

@negin513
Collaborator

negin513 commented Nov 19, 2021

What I originally had in mind for filtering outliers was using the standard deviation (std) instead of fixed values. I am not sure which method (fixed thresholds for each variable vs. automatic outlier detection) works better and is easier to implement.

For automatic outlier detection, there are other options available as well:

  • For example, we can filter out values more than ±3 std from the mean (about 99.7% of values fall within ±3 std of the mean, assuming a Gaussian distribution).
  • Or we can use more advanced machine learning methods. In fact, there is a family of machine learning methods designed specifically for outlier detection, called one-class (or unary) classification: https://en.wikipedia.org/wiki/One-class_classification
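A minimal sketch of the ±3 std option from the first bullet, using only the standard library. `mask_n_sigma` is a hypothetical name; it uses the population standard deviation and ignores values that are already missing (NaN) when computing the statistics.

```python
import math
import statistics


def mask_n_sigma(values, n_sigma=3.0):
    """Replace values more than `n_sigma` standard deviations from the mean with NaN.

    Mean and population standard deviation are computed from the finite values
    only, so existing gaps do not poison the statistics. Non-finite inputs come
    back as NaN.
    """
    finite = [v for v in values if math.isfinite(v)]
    if len(finite) < 2:
        return list(values)  # not enough data to estimate a spread
    mean = statistics.fmean(finite)
    sd = statistics.pstdev(finite)
    if sd == 0:
        return list(values)  # constant series: nothing to flag
    return [
        v if math.isfinite(v) and abs(v - mean) <= n_sigma * sd else math.nan
        for v in values
    ]


series = [0.0] * 19 + [100.0]  # toy series with one spike
filtered = mask_n_sigma(series)
```

One caveat for this design: the spike itself inflates the std. With the population std, no point can sit more than √(n−1) std from the mean, so a 3σ cut cannot flag anything until a window holds at least 11 values. The method suits long records better than short windows.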

An example of using one-class classification for outlier detection: https://blogs.sap.com/2020/12/29/outlier-detection-with-one-class-classification-using-python-machine-learning-client-for-sap-hana/

@wwieder
Collaborator Author

wwieder commented Nov 19, 2021

I like the idea of a remove_outliers function. At this stage I'd keep it simple and make it really obvious what we're doing. Using fixed values or the 3-sigma threshold will hopefully catch the bulk of the crazy spikes in the measurements.


2 participants