Preprocessing to create SUMMA input #142

Open · wants to merge 8 commits into base: develop

Conversation

DavidChoi76 (Collaborator):

  • Add preprocessing Python and R code
    • Tested the code using 7 Jupyter notebooks on CyberGIS-Jupyter for Water
      (https://www.hydroshare.org/resource/303f64250a1d45b8b3c24031a4500781/):
      1. Step_1_Download_RawData_of_CoweetaSub18.ipynb
      2. Step_2_Delineate_watershed_and_extract_local_attributes_using_GRASS_GIS.ipynb
         (this notebook requires GRASS GIS plus several R packages and libraries, so Drew and I preconfigured those computational environments in CyberGIS-Jupyter for Water)
      3. Step_3_Create_Local_Attribute_CSV_file_with_GRUs_and_HRUs.ipynb
      4. Step_4_Create_Initial_Condition_of_SUMMA_Input.ipynb
      5. Step_5_Create_NetCDF_SUMMA_Input.ipynb
      6. Step_6_Create_3hour_Forcing_using_Metsim.ipynb
      7. Step_7_SUMMA_execution.ipynb
    • HydroShare resource with the SUMMA model input: a SUMMA model folder with default files, spatial data (DEM, land use, soil), and time-series data (climate and streamflow): https://www.hydroshare.org/resource/62d7a7be65e14680af9e53464925a980/
  • Delete the specworker folder: the HydroShare CUAHSI JupyterHub no longer uses specworker.

@arbennett (Member) left a comment:

It looks like there is a lot of functionality here for creating the various input files for SUMMA from "raw" data sources. Because of that, this is a rather large pull request and I think we'll need a couple of rounds on it. I just went through and looked at some of the general architecture and style, and I think that once we get those cleaned up it'll be easier to see how to make this "mesh" with the rest of the pysumma codebase a little more cleanly.

One thing that does concern me a bit is the inclusion of the SSURGO database. I would prefer not to maintain external datasets in this repo. That is something that users should download on their own.

@@ -8,6 +8,10 @@
from .force_file_list import ForcingList
from . import utils
from .calibration import Ostrich, OstrichParam
from . import hydroshare_utils
from .preprocessing import LocalAttributes, InitialCondition, ParamTrial
arbennett (Member):

I think it might be good to have these not only live in preprocessing but be more general classes which act as interfaces to these methods. Then we could do something like LocalAttributes.from_netcdf(filepath) or LocalAttributes.from_tif(filepath) which might end up being a bit cleaner. Let's also make sure these follow the conventions in the FileManager class, so we should update ParamTrial -> TrialParams
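
Something like the following, purely as a sketch of the suggested interface (the constructor names and the xarray/rioxarray readers here are assumptions, not existing pysumma API):

import xarray as xr
import rioxarray  # assumption: rasterio-backed GeoTIFF reader for xarray


class LocalAttributes:
    """General interface to SUMMA local attributes (sketch)."""

    def __init__(self, data):
        self.data = data

    @classmethod
    def from_netcdf(cls, filepath):
        # Construct directly from an existing local attributes file
        return cls(xr.open_dataset(filepath))

    @classmethod
    def from_tif(cls, filepath):
        # Construct from a GeoTIFF; the real preprocessing would go here
        raster = rioxarray.open_rasterio(filepath)
        return cls(raster.to_dataset(name='vegTypeIndex'))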

@@ -0,0 +1,152 @@
from osgeo import gdal
arbennett (Member):

Minor comment, but I think a better name would just be initial_condition.py rather than create_initial_condition.py so that we can have it be more general than just methods for making the initial conditions, but also for modifying them.

arbennett (Member):

Same goes for the create_local_attribute and create_param_trial

ssurgo_soil_db = pkg_resources.resource_filename(
    __name__, 'meta/ssurgo_soils_db.csv')

class InitialCondition(object):
arbennett (Member):

I think we need some documentation in here, especially for the more complex functions.
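
Even a short numpy-style docstring per class/method would go a long way. A placeholder sketch of the shape I mean (the wording below is illustrative, not actual documentation):

class InitialCondition(object):
    """Create and manipulate SUMMA initial condition (state) files.

    Notes
    -----
    Placeholder sketch only: a real docstring should spell out the input
    datasets (e.g. the bundled SSURGO table), the assumptions made about
    soil layering (nSoil), and the meaning of each variable written out.
    """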

arbennett (Member):

Also, could you explain exactly what assumptions/datasets we're using to create this here?

gru_mukey_depth = gru_mukey_1_col.merge(SSURGO_soil_depth, on='MUKEY')
gru_mukey_layer = gru_mukey_depth[['nSoil']]
gru_mukey_layer['nSoil'].unique().max()
gru_initial_cond_hru = gru_mukey_layer.assign(**{'scalarv': hru_ic["scalarv"], 'scalarSnowAlbedo': hru_ic["scalarSnowAlbedo"],
arbennett (Member):

There are a lot of really long lines in this PR; could you work to split them so that line lengths stay under 100 characters?
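
For instance, the assign call quoted above could be reflowed roughly as:

gru_initial_cond_hru = gru_mukey_layer.assign(
    scalarv=hru_ic['scalarv'],
    scalarSnowAlbedo=hru_ic['scalarSnowAlbedo'],
    # ...remaining state variables, one per line
)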

Comment on lines +94 to +121
if fdr.loc[index_r, index_c] == 1 and index_r-1 >= 0 and index_c+1 >= 0:
    downHRUindex.loc[index_r, index_c] = hruid.loc[index_r-1, index_c+1]
elif fdr.loc[index_r, index_c] == 2 and index_r-1 >= 0 and index_c >= 0:
    downHRUindex.loc[index_r, index_c] = hruid.loc[index_r-1, index_c]
elif fdr.loc[index_r, index_c] == 3 and index_r-1 >= 0 and index_c-1 >= 0:
    downHRUindex.loc[index_r, index_c] = hruid.loc[index_r-1, index_c-1]
elif fdr.loc[index_r, index_c] == 4 and index_r >= 0 and index_c-1 >= 0:
    downHRUindex.loc[index_r, index_c] = hruid.loc[index_r, index_c-1]
elif fdr.loc[index_r, index_c] == 5 and index_r+1 >= 0 and index_c-1 >= 0:
    downHRUindex.loc[index_r, index_c] = hruid.loc[index_r+1, index_c-1]
elif fdr.loc[index_r, index_c] == 6 and index_r+1 >= 0 and index_c >= 0:
    downHRUindex.loc[index_r, index_c] = hruid.loc[index_r+1, index_c]
elif fdr.loc[index_r, index_c] == 7 and index_r+1 >= 0 and index_c+1 >= 0:
    downHRUindex.loc[index_r, index_c] = hruid.loc[index_r+1, index_c+1]
elif fdr.loc[index_r, index_c] == 8 and index_r >= 0 and index_c+1 >= 0:
    downHRUindex.loc[index_r, index_c] = hruid.loc[index_r, index_c+1]
elif fdr.loc[index_r, index_c] < 0 and index_r >= 0 and index_c+1 >= 0:
    downHRUindex.loc[index_r, index_c] = -9999
else:
    downHRUindex.loc[index_r, index_c] = 0
arbennett (Member):

I have no idea what any of this means - could you provide either some comments about what these are or rename some of the variables/raw integers so that this is easier to understand?
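
One possible shape for a refactor, sketched under the assumption that these integers follow the GRASS r.watershed D8 flow-direction convention (the offsets below simply restate what the branches above hard-code):

# D8 flow-direction code -> (row offset, column offset), as implied above;
# direction names assume rows increase downward in the raster
D8_OFFSETS = {
    1: (-1, +1),  # northeast
    2: (-1, 0),   # north
    3: (-1, -1),  # northwest
    4: (0, -1),   # west
    5: (+1, -1),  # southwest
    6: (+1, 0),   # south
    7: (+1, +1),  # southeast
    8: (0, +1),   # east
}
NO_FLOW = -9999

code = fdr.loc[index_r, index_c]
if code < 0:
    downHRUindex.loc[index_r, index_c] = NO_FLOW
elif code in D8_OFFSETS:
    dr, dc = D8_OFFSETS[code]
    # (the original branches also guard the neighbor indices with >= 0 checks)
    downHRUindex.loc[index_r, index_c] = hruid.loc[index_r + dr, index_c + dc]
else:
    downHRUindex.loc[index_r, index_c] = 0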

Comment on lines +134 to +197
        if lulc_df['vegTypeIndex'].loc[index] == 11 and 12 and 13 and 14 and 15 and 16 and 17:
            lulc['vegTypeIndex'][index] = 1
        elif lulc_df['vegTypeIndex'].loc[index] == 22 and 24:
            lulc['vegTypeIndex'][index] = 4
        elif lulc_df['vegTypeIndex'].loc[index] == 21:
            lulc['vegTypeIndex'][index] = 5
        elif lulc_df['vegTypeIndex'].loc[index] == 23:
            lulc['vegTypeIndex'][index] = 7
        elif lulc_df['vegTypeIndex'].loc[index] == 32 and 33:
            lulc['vegTypeIndex'][index] = 9
        elif lulc_df['vegTypeIndex'].loc[index] == 41:
            lulc['vegTypeIndex'][index] = 11
        elif lulc_df['vegTypeIndex'].loc[index] == 42:
            lulc['vegTypeIndex'][index] = 13
        elif lulc_df['vegTypeIndex'].loc[index] == 43:
            lulc['vegTypeIndex'][index] = 15
        elif lulc_df['vegTypeIndex'].loc[index] == 51 and 52 and 53 and 54 and 72:
            lulc['vegTypeIndex'][index] = 16
        elif lulc_df['vegTypeIndex'].loc[index] == 31 and 62:
            lulc['vegTypeIndex'][index] = 17
        elif lulc_df['vegTypeIndex'].loc[index] == 61:
            lulc['vegTypeIndex'][index] = 18
        elif lulc_df['vegTypeIndex'].loc[index] == 82:
            lulc['vegTypeIndex'][index] = 20
        elif lulc_df['vegTypeIndex'].loc[index] == 81:
            lulc['vegTypeIndex'][index] = 21
        elif lulc_df['vegTypeIndex'].loc[index] == 84:
            lulc['vegTypeIndex'][index] = 22
        elif lulc_df['vegTypeIndex'].loc[index] == 71 and 74 and 75 and 76 and 77 and 83:
            lulc['vegTypeIndex'][index] = 23
        elif lulc_df['vegTypeIndex'].loc[index] == 91 and 92:
            lulc['vegTypeIndex'][index] = 24
        elif lulc_df['vegTypeIndex'].loc[index] == 73:
            lulc['vegTypeIndex'][index] = 27
        else:
            pass

elif option == "MODIFIED_IGBP_MODIS_NOAH":
    for index, raster in lulc_df.iterrows():
        if lulc_df['vegTypeIndex'].loc[index] == 42:
            lulc['vegTypeIndex'][index] = 2
        elif lulc_df['vegTypeIndex'].loc[index] == 41:
            lulc['vegTypeIndex'][index] = 4
        elif lulc_df['vegTypeIndex'].loc[index] == 43:
            lulc['vegTypeIndex'][index] = 5
        elif lulc_df['vegTypeIndex'].loc[index] == 61 and 62:
            lulc['vegTypeIndex'][index] = 11
        elif lulc_df['vegTypeIndex'].loc[index] == 21 and 22:
            lulc['vegTypeIndex'][index] = 12
        elif lulc_df['vegTypeIndex'].loc[index] == 11 and 12 and 13 and 14 and 16 and 17:
            lulc['vegTypeIndex'][index] = 13
        elif lulc_df['vegTypeIndex'].loc[index] == 23 and 24 and 31 and 32 and 33:
            lulc['vegTypeIndex'][index] = 14
        elif lulc_df['vegTypeIndex'].loc[index] == 91 and 92:
            lulc['vegTypeIndex'][index] = 15
        elif lulc_df['vegTypeIndex'].loc[index] == 51 and 52 and 53 and 54:
            lulc['vegTypeIndex'][index] = 17
        elif lulc_df['vegTypeIndex'].loc[index] == 84:
            lulc['vegTypeIndex'][index] = 19
        elif lulc_df['vegTypeIndex'].loc[index] == 71 and 72 and 73 and 74 and 75 and 76 and 77 and 81 and 82 and 83:
            lulc['vegTypeIndex'][index] = 20
        else:
            pass
return lulc
arbennett (Member):

As above, just having these big chains of elifs on raw integers is very mysterious. I have no idea what each of these stands for.
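
A table-driven sketch of what I mean, assuming the intent of each chain is "any of these land-cover codes maps to this vegTypeIndex" (note that x == 11 and 12 and 13 in Python only ever tests x == 11, so the branches don't currently do that):

# vegTypeIndex -> source land-cover codes, restating the first chain above
USGS_CLASSES = {
    1: [11, 12, 13, 14, 15, 16, 17],
    4: [22, 24],
    5: [21],
    7: [23],
    9: [32, 33],
    11: [41],
    13: [42],
    15: [43],
    16: [51, 52, 53, 54, 72],
    17: [31, 62],
    18: [61],
    20: [82],
    21: [81],
    22: [84],
    23: [71, 74, 75, 76, 77, 83],
    24: [91, 92],
    27: [73],
}
# Invert to source code -> vegTypeIndex and apply in one vectorized pass
code_to_class = {src: dst for dst, codes in USGS_CLASSES.items() for src in codes}
lulc['vegTypeIndex'] = (lulc_df['vegTypeIndex']
                        .map(code_to_class)
                        .fillna(0)        # unmapped codes stay 0, as before
                        .astype('int64'))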

lulc = pd.DataFrame(
    np.zeros((lulc_df.shape[0], lulc_df.shape[1])), columns=['vegTypeIndex'])
lulc = lulc.astype('int64')
if option == "USGS" and "USGS-RUC":
arbennett (Member):

I think the logic here is not what you want. The and "USGS-RUC" part doesn't actually do anything. I think you mean if option == "USGS" or option == "USGS-RUC", which could also be written option in ["USGS", "USGS-RUC"].
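
For reference, the corrected condition would be:

if option in ("USGS", "USGS-RUC"):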

pass
return lulc

def gru_local_attributes(self, gru_name, gru_num, dir_grassdata, hru_start_num, hru_area, contour_length=10, slopeTypeIndex=1, vegTypeIndex_option='USGS'):
arbennett (Member):

Again, a lot of very long lines here.
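
For example, the signature above could be wrapped as:

def gru_local_attributes(self, gru_name, gru_num, dir_grassdata,
                         hru_start_num, hru_area, contour_length=10,
                         slopeTypeIndex=1, vegTypeIndex_option='USGS'):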

"""

# Create a netCDF file
initial_cond = Dataset(name, "w", format="NETCDF3_CLASSIC")
arbennett (Member):

I'd prefer to use xarray here just to be internally consistent. Also, is there a reason you're using NETCDF3_CLASSIC? If it's not a hard constraint I'd rather use something either newer, or (even better IMO) agnostic to the version.
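
A minimal xarray sketch of what I mean (the variable and dimension names below are illustrative placeholders, not the actual SUMMA state variables):

import numpy as np
import xarray as xr

name = 'initial_cond.nc'  # illustrative output path
n_hru = 1                 # illustrative dimension size

initial_cond = xr.Dataset(
    {'scalarv': (('hru',), np.zeros(n_hru))},
    coords={'hru': np.arange(n_hru)},
)
# Let xarray choose the backend/format rather than pinning NETCDF3_CLASSIC...
initial_cond.to_netcdf(name)
# ...or pin it only if SUMMA genuinely requires netCDF-3:
# initial_cond.to_netcdf(name, format='NETCDF3_CLASSIC')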

@@ -0,0 +1,95 @@
library(rgrass7)
arbennett (Member):

In my first look through it's not entirely clear where these are used. Could you point me to where these are called? I'd prefer not to introduce too many external scripts.

arbennett (Member):

This goes for all of the scripts in preprocessing/meta/

arbennett (Member):

Also, could you just provide some sort of header comments on what these actually do?

@DaveCasson (Contributor) commented Nov 11, 2021:

Hi David and Andrew,
I wanted to point to https://github.com/CH-Earth/summaWorkflow_public to raise awareness of that development, which is led by Wouter Knoben. Andrew will know of it, as I see him on the publication ;) A new post-doc, Bart van Osnabrugge, will also be looking at scripting to combine efforts in SUMMA model development.
It would be worth a conversation about the respective efforts for building SUMMA input.
