[13pt] Update CatFIM site filtering for non-CONUS regions to pull non-forecast points #1356
Current vs Proposed Metadata API Call Methods

Current (single) API Call Method: A single API call to get metadata for all regions, only selecting forecast points.

Proposed (double) API Call Method:* Two API calls. The first call gets metadata for all regions, selecting only forecast points. The second call gets metadata for Alaska, Hawaii, and Puerto Rico (regardless of whether a site is a forecast point). Then a simple filtering step removes any duplicate points that were already pulled in the first call. Points without an NWS LID are also filtered out.

*Note: The double API call (minus the duplicate filtering) was our method of getting metadata up until recently. It brought in a lot of points but caused problems because of the duplicates. The proposed method adds filtering code to fix the duplicate-point issue.

Stats by State: Comparing Current and Proposed API Call Methods

**There are a lot of Alaska points with this filter method, so some additional filtering might be useful. It will be important to test whether this impacts runtime and model readability. An easy option would be to filter out the Alaska HUCs that are not included in the NWM processing. Alternatively, we could decide not to pull the non-forecast points for Alaska (but continue to pull them for Hawaii and Puerto Rico) if we find that it doesn't add enough sites to warrant the processing time.

The next step is to compare the number of CatFIM sites that this proposed API call + filtering method returns against the CatFIM results that are in production.
Yes. I agree that the double call with filtering is the better idea, unless we use a third option, which is to get WRDS to give us two new "selector" options, one for CONUS and one for "AK and others". A minor thing to keep track of with filtering on the CatFIM side is that, strangely enough, it looks like not all nws_lid records have the "state" value set, so we will not get as many records back from a "second" call to WRDS based on state. Yes, it is weird to think that not all records have a state value, and this needs to be re-confirmed. If it is true that some lid records are missing "state" values, then we are fundamentally trying to compensate for a data issue. However, early tests on this topic suggested the number of possible AK, HI and PR sites could be roughly double the number that came back from the second, state-based WRDS calls. More research is required.
PS.. it appears that the "state" field is unstable from WRDS, but the HUC field is stable. So we could look at HUC values to determine who is who.
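A rough sketch of what "look at HUC values to determine who is who" could mean in practice. It relies on the standard HUC2 region numbering (19 = Alaska, 20 = Hawaii, 21 = Caribbean, 01 through 18 = CONUS). The function name and the assumption that the HUC code arrives as a zero-padded string are illustrative, not taken from the CatFIM code.

```python
# HUC2 region prefixes for the non-CONUS areas discussed in this issue.
# Region 21 is the Caribbean region, which covers Puerto Rico and also the
# U.S. Virgin Islands, so "PR" here is a simplification.
NON_CONUS_HUC2 = {"19": "AK", "20": "HI", "21": "PR"}


def region_from_huc(huc):
    """Classify a site by the first two digits of its HUC code.

    Assumes `huc` is a zero-padded string such as "19020301".
    Returns "AK", "HI", "PR", "CONUS", or None if it cannot be classified.
    """
    if not huc:
        return None
    huc2 = str(huc)[:2]
    if huc2 in NON_CONUS_HUC2:
        return NON_CONUS_HUC2[huc2]
    if huc2.isdigit() and 1 <= int(huc2) <= 18:
        return "CONUS"
    return None
```

Classifying by HUC would sidestep missing "state" values, but only for sites that actually carry a HUC code.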
Evaluating instability in the metadata state column

Since the proposed filtering method would add a metadata call based on state, it is important to evaluate how reliable the state column(s) are. In the metadata list pulled in the get_metadata() function, there are four different 'state' columns: ['nws_data']['state'], ['usgs_data']['state'], ['nws_preferred']['state'], and ['usgs_preferred']['state']. I used the current

At a glance, the metadata shows that the ['usgs_data']['state'] column has a lot of 'None' values, whereas the others don't seem to have that many. Summarizing the metadata dataframe by number of 'None' values per column supports this observation. There's only one site in this metadata set where the nws_preferred_state value and the usgs_preferred_state value are both empty.

I then pulled the metadata for Alaska, Puerto Rico, and Hawaii using the state selector (and removing the 'is_forecast_point' filter that was previously present). This pulls a larger number of points, but only points that are properly connected to a state label. The metadata in this set has a larger number of 'None' values in the nws_data_state and usgs_data_state columns, but no records where the nws_preferred_state or the usgs_preferred_state is null.

Essentially, my thoughts are that the

Based on all that, I think it should be okay to pull in additional points for select regions using the 'state' columns.
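For anyone wanting to repeat the 'None' count, here is a minimal sketch that works directly on the metadata list rather than a dataframe. The four column paths come from the comment above; the function name and the assumption that each record is a nested dict are illustrative only.

```python
# The four 'state' columns named in the comment above, as (group, field) pairs.
STATE_COLUMNS = [
    ("nws_data", "state"),
    ("usgs_data", "state"),
    ("nws_preferred", "state"),
    ("usgs_preferred", "state"),
]


def count_missing_states(metadata_list):
    """Count how many records have None (or no entry) in each state column.

    Assumes each record is a dict of dicts, e.g. record["nws_data"]["state"].
    """
    counts = {f"{group}_{field}": 0 for group, field in STATE_COLUMNS}
    for record in metadata_list:
        for group, field in STATE_COLUMNS:
            # A missing or None group is treated the same as a None state.
            value = (record.get(group) or {}).get(field)
            if value is None:
                counts[f"{group}_{field}"] += 1
    return counts
```

Running this on both the forecast-point pull and the state-selector pull would give the per-column comparison described above.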
oh cool. One test that I did was to see if I could use the state field and pass in a large number of state codes. That failed. The API can only accept so many at one time. We know it can handle 3 states at once, but it can't handle 20 states listed. So.. if we do it by state or preferred state, we might have to make 50+ (or maybe 50/3) calls..
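To make the "50/3" arithmetic concrete, here is a small sketch of batching state codes into groups of three. The batch size of 3 is only the size known to work per the comment above; the API's real upper limit is not established here, and the function name is made up for illustration.

```python
def chunk_states(state_codes, size=3):
    """Split a list of state codes into consecutive batches of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [state_codes[i:i + size] for i in range(0, len(state_codes), size)]
```

With 50 state codes and a batch size of 3 this produces 17 batches, i.e. 17 calls instead of 50.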
Oh neat! Yeah, I don't think doing the CONUS calls by state would be necessary or efficient. I like the setup of the two API calls, one for all forecast points and one to pull more points for AK, HI, and PR.
yup.. that makes the most sense. Then just figure out how to drop the dup AK, HI and PR from the two lists :)
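One possible way to drop the duplicates, sketched under assumptions: each metadata record is taken to be a dict with the LID at record["identifiers"]["nws_lid"] (an assumed layout, not confirmed in this thread), and the function name is hypothetical. Records from the forecast-point call win over the same LID from the AK/HI/PR call, and records with no NWS LID are dropped, as in the proposed method.

```python
def merge_metadata(forecast_points, non_conus_points):
    """Merge two metadata lists, keeping the first record seen for each NWS LID.

    Records without an NWS LID are dropped. The key path
    record["identifiers"]["nws_lid"] is an assumption about the record layout.
    """
    merged = []
    seen_lids = set()
    # Forecast points come first, so they take priority over duplicates
    # returned by the second (AK, HI, PR) call.
    for record in list(forecast_points) + list(non_conus_points):
        lid = (record.get("identifiers") or {}).get("nws_lid")
        if not lid:
            continue  # no NWS LID: filter out
        if lid in seen_lids:
            continue  # already pulled in an earlier record
        seen_lids.add(lid)
        merged.append(record)
    return merged
```

Keeping a set of seen LIDs makes this a single pass over both lists, and input order is preserved in the output.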
There are many sites in non-CONUS regions (AK, PR, HI) where we would like to run CatFIM, but they are being excluded because they are not NWM forecast points.
Loosen the filters so that we pull non-forecast points in AK, PR, and HI from the WRDS API, and adjust site processing/filtering downstream to prevent duplicate LIDs.
Preliminary research notes from Rob: