Preprocessing to create SUMMA input #142

Open · wants to merge 8 commits into base: develop

Conversation

DavidChoi76 (Collaborator):

  • Add preprocessing Python and R code
    • Tested the code using 7 Jupyter notebooks on CyberGIS-Jupyter for Water
      (https://www.hydroshare.org/resource/303f64250a1d45b8b3c24031a4500781/):
      1. Step_1_Download_RawData_of_CoweetaSub18.ipynb
      2. Step_2_Delineate_watershed_and_extract_local_attributes_using_GRASS_GIS.ipynb
         (this notebook requires GRASS GIS plus several R packages and libraries, so Drew and I preconfigured those computational environments in CyberGIS-Jupyter for Water)
      3. Step_3_Create_Local_Attribute_CSV_file_with_GRUs_and_HRUs.ipynb
      4. Step_4_Create_Initial_Condition_of_SUMMA_Input.ipynb
      5. Step_5_Create_NetCDF_SUMMA_Input.ipynb
      6. Step_6_Create_3hour_Forcing_using_Metsim.ipynb
      7. Step_7_SUMMA_execution.ipynb
    • HydroShare resource with the SUMMA model input: a SUMMA model folder with default files, spatial data (DEM, land use, soil), and time-series data (climate and streamflow): https://www.hydroshare.org/resource/62d7a7be65e14680af9e53464925a980/
  • Delete the specworker folder: the HydroShare CUAHSI JupyterHub no longer uses specworker.

@arbennett (Member) left a comment:

It looks like there is a lot of functionality here for creating the various input files for SUMMA from "raw" data sources. Because of that, this is a rather large pull request and I think we'll need a couple of rounds on it. I just went through and looked at some of the general architecture and style, and I think that once we get those cleaned up it'll be easier to see how to make this "mesh" with the rest of the pysumma codebase a little more cleanly.

One thing that does concern me a bit is the inclusion of the SSURGO database. I would prefer not to maintain external datasets in this repo. That is something that users should download on their own.

@@ -8,6 +8,10 @@
from .force_file_list import ForcingList
from . import utils
from .calibration import Ostrich, OstrichParam
from . import hydroshare_utils
from .preprocessing import LocalAttributes, InitialCondition, ParamTrial
arbennett (Member):

I think it might be good to have these not only live in preprocessing but be more general classes which act as interfaces to these methods. Then we could do something like LocalAttributes.from_netcdf(filepath) or LocalAttributes.from_tif(filepath) which might end up being a bit cleaner. Let's also make sure these follow the conventions in the FileManager class, so we should update ParamTrial -> TrialParams
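
Something like the following, purely as a sketch of the suggested interface (the constructor names and the xarray/rioxarray readers here are assumptions, not existing pysumma API):

import xarray as xr
import rioxarray  # assumption: rasterio-backed GeoTIFF reader for xarray


class LocalAttributes:
    """General interface to SUMMA local attributes (sketch)."""

    def __init__(self, data):
        self.data = data

    @classmethod
    def from_netcdf(cls, filepath):
        # Construct directly from an existing local attributes file
        return cls(xr.open_dataset(filepath))

    @classmethod
    def from_tif(cls, filepath):
        # Construct from a GeoTIFF; the real preprocessing would go here
        raster = rioxarray.open_rasterio(filepath)
        return cls(raster.to_dataset(name='vegTypeIndex'))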

@@ -0,0 +1,152 @@
from osgeo import gdal
arbennett (Member):

Minor comment, but I think a better name would just be initial_condition.py rather than create_initial_condition.py so that we can have it be more general than just methods for making the initial conditions, but also for modifying them.

arbennett (Member):

Same goes for the create_local_attribute and create_param_trial

ssurgo_soil_db = pkg_resources.resource_filename(
    __name__, 'meta/ssurgo_soils_db.csv')

class InitialCondition(object):
arbennett (Member):

I think we need some documentation in here, especially for the more complex functions.
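
Even a short numpy-style docstring per class/method would go a long way. A placeholder sketch of the shape I mean (the wording below is illustrative, not actual documentation):

class InitialCondition(object):
    """Create and manipulate SUMMA initial condition (state) files.

    Notes
    -----
    Placeholder sketch only: a real docstring should spell out the input
    datasets (e.g. the bundled SSURGO table), the assumptions made about
    soil layering (nSoil), and the meaning of each variable written out.
    """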

arbennett (Member):

Also, could you explain exactly what assumptions/datasets we're using to create this here?

gru_mukey_depth = gru_mukey_1_col.merge(SSURGO_soil_depth, on='MUKEY')
gru_mukey_layer = gru_mukey_depth[['nSoil']]
gru_mukey_layer['nSoil'].unique().max()
gru_initial_cond_hru = gru_mukey_layer.assign(**{'scalarv': hru_ic["scalarv"], 'scalarSnowAlbedo': hru_ic["scalarSnowAlbedo"],
arbennett (Member):

There are a lot of really long lines in this PR; could you work to split them so that line lengths stay under 100 characters?
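
For instance, the assign call quoted above could be reflowed roughly as:

gru_initial_cond_hru = gru_mukey_layer.assign(
    scalarv=hru_ic['scalarv'],
    scalarSnowAlbedo=hru_ic['scalarSnowAlbedo'],
    # ...remaining state variables, one per line
)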

Comment on lines +94 to +121
if fdr.loc[index_r, index_c] == 1 and index_r-1 >= 0 and index_c+1 >= 0:
    downHRUindex.loc[index_r, index_c] = hruid.loc[index_r-1, index_c+1]
elif fdr.loc[index_r, index_c] == 2 and index_r-1 >= 0 and index_c >= 0:
    downHRUindex.loc[index_r, index_c] = hruid.loc[index_r-1, index_c]
elif fdr.loc[index_r, index_c] == 3 and index_r-1 >= 0 and index_c-1 >= 0:
    downHRUindex.loc[index_r, index_c] = hruid.loc[index_r-1, index_c-1]
elif fdr.loc[index_r, index_c] == 4 and index_r >= 0 and index_c-1 >= 0:
    downHRUindex.loc[index_r, index_c] = hruid.loc[index_r, index_c-1]
elif fdr.loc[index_r, index_c] == 5 and index_r+1 >= 0 and index_c-1 >= 0:
    downHRUindex.loc[index_r, index_c] = hruid.loc[index_r+1, index_c-1]
elif fdr.loc[index_r, index_c] == 6 and index_r+1 >= 0 and index_c >= 0:
    downHRUindex.loc[index_r, index_c] = hruid.loc[index_r+1, index_c]
elif fdr.loc[index_r, index_c] == 7 and index_r+1 >= 0 and index_c+1 >= 0:
    downHRUindex.loc[index_r, index_c] = hruid.loc[index_r+1, index_c+1]
elif fdr.loc[index_r, index_c] == 8 and index_r >= 0 and index_c+1 >= 0:
    downHRUindex.loc[index_r, index_c] = hruid.loc[index_r, index_c+1]
elif fdr.loc[index_r, index_c] < 0 and index_r >= 0 and index_c+1 >= 0:
    downHRUindex.loc[index_r, index_c] = -9999
else:
    downHRUindex.loc[index_r, index_c] = 0
arbennett (Member):

I have no idea what any of this means - could you provide either some comments about what these are or rename some of the variables/raw integers so that this is easier to understand?
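
One possible shape for a refactor, sketched under the assumption that these integers follow the GRASS r.watershed D8 flow-direction convention (the offsets below simply restate what the branches above hard-code):

# D8 flow-direction code -> (row offset, column offset), as implied above;
# direction names assume rows increase downward in the raster
D8_OFFSETS = {
    1: (-1, +1),  # northeast
    2: (-1, 0),   # north
    3: (-1, -1),  # northwest
    4: (0, -1),   # west
    5: (+1, -1),  # southwest
    6: (+1, 0),   # south
    7: (+1, +1),  # southeast
    8: (0, +1),   # east
}
NO_FLOW = -9999

code = fdr.loc[index_r, index_c]
if code < 0:
    downHRUindex.loc[index_r, index_c] = NO_FLOW
elif code in D8_OFFSETS:
    dr, dc = D8_OFFSETS[code]
    # (the original branches also guard the neighbor indices with >= 0 checks)
    downHRUindex.loc[index_r, index_c] = hruid.loc[index_r + dr, index_c + dc]
else:
    downHRUindex.loc[index_r, index_c] = 0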

Comment on lines +134 to +197
        if lulc_df['vegTypeIndex'].loc[index] == 11 and 12 and 13 and 14 and 15 and 16 and 17:
            lulc['vegTypeIndex'][index] = 1
        elif lulc_df['vegTypeIndex'].loc[index] == 22 and 24:
            lulc['vegTypeIndex'][index] = 4
        elif lulc_df['vegTypeIndex'].loc[index] == 21:
            lulc['vegTypeIndex'][index] = 5
        elif lulc_df['vegTypeIndex'].loc[index] == 23:
            lulc['vegTypeIndex'][index] = 7
        elif lulc_df['vegTypeIndex'].loc[index] == 32 and 33:
            lulc['vegTypeIndex'][index] = 9
        elif lulc_df['vegTypeIndex'].loc[index] == 41:
            lulc['vegTypeIndex'][index] = 11
        elif lulc_df['vegTypeIndex'].loc[index] == 42:
            lulc['vegTypeIndex'][index] = 13
        elif lulc_df['vegTypeIndex'].loc[index] == 43:
            lulc['vegTypeIndex'][index] = 15
        elif lulc_df['vegTypeIndex'].loc[index] == 51 and 52 and 53 and 54 and 72:
            lulc['vegTypeIndex'][index] = 16
        elif lulc_df['vegTypeIndex'].loc[index] == 31 and 62:
            lulc['vegTypeIndex'][index] = 17
        elif lulc_df['vegTypeIndex'].loc[index] == 61:
            lulc['vegTypeIndex'][index] = 18
        elif lulc_df['vegTypeIndex'].loc[index] == 82:
            lulc['vegTypeIndex'][index] = 20
        elif lulc_df['vegTypeIndex'].loc[index] == 81:
            lulc['vegTypeIndex'][index] = 21
        elif lulc_df['vegTypeIndex'].loc[index] == 84:
            lulc['vegTypeIndex'][index] = 22
        elif lulc_df['vegTypeIndex'].loc[index] == 71 and 74 and 75 and 76 and 77 and 83:
            lulc['vegTypeIndex'][index] = 23
        elif lulc_df['vegTypeIndex'].loc[index] == 91 and 92:
            lulc['vegTypeIndex'][index] = 24
        elif lulc_df['vegTypeIndex'].loc[index] == 73:
            lulc['vegTypeIndex'][index] = 27
        else:
            pass

elif option == "MODIFIED_IGBP_MODIS_NOAH":
    for index, raster in lulc_df.iterrows():
        if lulc_df['vegTypeIndex'].loc[index] == 42:
            lulc['vegTypeIndex'][index] = 2
        elif lulc_df['vegTypeIndex'].loc[index] == 41:
            lulc['vegTypeIndex'][index] = 4
        elif lulc_df['vegTypeIndex'].loc[index] == 43:
            lulc['vegTypeIndex'][index] = 5
        elif lulc_df['vegTypeIndex'].loc[index] == 61 and 62:
            lulc['vegTypeIndex'][index] = 11
        elif lulc_df['vegTypeIndex'].loc[index] == 21 and 22:
            lulc['vegTypeIndex'][index] = 12
        elif lulc_df['vegTypeIndex'].loc[index] == 11 and 12 and 13 and 14 and 16 and 17:
            lulc['vegTypeIndex'][index] = 13
        elif lulc_df['vegTypeIndex'].loc[index] == 23 and 24 and 31 and 32 and 33:
            lulc['vegTypeIndex'][index] = 14
        elif lulc_df['vegTypeIndex'].loc[index] == 91 and 92:
            lulc['vegTypeIndex'][index] = 15
        elif lulc_df['vegTypeIndex'].loc[index] == 51 and 52 and 53 and 54:
            lulc['vegTypeIndex'][index] = 17
        elif lulc_df['vegTypeIndex'].loc[index] == 84:
            lulc['vegTypeIndex'][index] = 19
        elif lulc_df['vegTypeIndex'].loc[index] == 71 and 72 and 73 and 74 and 75 and 76 and 77 and 81 and 82 and 83:
            lulc['vegTypeIndex'][index] = 20
        else:
            pass
return lulc
arbennett (Member):

As above, just having these big chains of elifs on raw integers is very mysterious. I have no idea what each of these stands for.
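
A table-driven sketch of what I mean, assuming the intent of each chain is "any of these land-cover codes maps to this vegTypeIndex" (note that x == 11 and 12 and 13 in Python only ever tests x == 11, so the branches don't currently do that):

# vegTypeIndex -> source land-cover codes, restating the first chain above
USGS_CLASSES = {
    1: [11, 12, 13, 14, 15, 16, 17],
    4: [22, 24],
    5: [21],
    7: [23],
    9: [32, 33],
    11: [41],
    13: [42],
    15: [43],
    16: [51, 52, 53, 54, 72],
    17: [31, 62],
    18: [61],
    20: [82],
    21: [81],
    22: [84],
    23: [71, 74, 75, 76, 77, 83],
    24: [91, 92],
    27: [73],
}
# Invert to source code -> vegTypeIndex and apply in one vectorized pass
code_to_class = {src: dst for dst, codes in USGS_CLASSES.items() for src in codes}
lulc['vegTypeIndex'] = (lulc_df['vegTypeIndex']
                        .map(code_to_class)
                        .fillna(0)        # unmapped codes stay 0, as before
                        .astype('int64'))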

lulc = pd.DataFrame(
    np.zeros((lulc_df.shape[0], lulc_df.shape[1])), columns=['vegTypeIndex'])
lulc = lulc.astype('int64')
if option == "USGS" and "USGS-RUC":
arbennett (Member):

I think the logic here is not what you want. The and "USGS-RUC" part doesn't actually do anything. I think you mean if option == "USGS" or option == "USGS-RUC", which could also be written option in ["USGS", "USGS-RUC"].
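
For reference, the corrected condition would be:

if option in ("USGS", "USGS-RUC"):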

pass
return lulc

def gru_local_attributes(self, gru_name, gru_num, dir_grassdata, hru_start_num, hru_area, contour_length=10, slopeTypeIndex=1, vegTypeIndex_option='USGS'):
arbennett (Member):

Again, a lot of very long lines here.
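
For example, the signature above could be wrapped as:

def gru_local_attributes(self, gru_name, gru_num, dir_grassdata,
                         hru_start_num, hru_area, contour_length=10,
                         slopeTypeIndex=1, vegTypeIndex_option='USGS'):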

"""

# Create a netCDF file
initial_cond = Dataset(name, "w", format="NETCDF3_CLASSIC")
arbennett (Member):

I'd prefer to use xarray here just to be internally consistent. Also, is there a reason you're using NETCDF3_CLASSIC? If it's not a hard constraint I'd rather use something either newer, or (even better IMO) agnostic to the version.
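
A minimal xarray sketch of what I mean (the variable and dimension names below are illustrative placeholders, not the actual SUMMA state variables):

import numpy as np
import xarray as xr

name = 'initial_cond.nc'  # illustrative output path
n_hru = 1                 # illustrative dimension size

initial_cond = xr.Dataset(
    {'scalarv': (('hru',), np.zeros(n_hru))},
    coords={'hru': np.arange(n_hru)},
)
# Let xarray choose the backend/format rather than pinning NETCDF3_CLASSIC...
initial_cond.to_netcdf(name)
# ...or pin it only if SUMMA genuinely requires netCDF-3:
# initial_cond.to_netcdf(name, format='NETCDF3_CLASSIC')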

@@ -0,0 +1,95 @@
library(rgrass7)
arbennett (Member):

In my first look through it's not entirely clear where these are used. Could you point me to where these are called? I'd prefer not to introduce too many external scripts.

arbennett (Member):

This goes for all of the scripts in preprocessing/meta/

arbennett (Member):

Also, could you just provide some sort of header comments on what these actually do?

@DaveCasson (Contributor) commented Nov 11, 2021:

Hi David and Andrew,
I wanted to point to https://github.com/CH-Earth/summaWorkflow_public to raise awareness of that development, which is led by Wouter Knoben. Andrew will know of it, as I see him on the publication ;) A new post-doc, Bart van Osnabrugge, will also be looking at scripting to combine efforts in SUMMA model development.
It would be worth a conversation about the respective efforts for building SUMMA input.
