refactor to use catchment id
Uses catchment id as the default rather than wb_id to reduce confusion. Now when a catchment is selected, its outflow nexus is determined and everything upstream of that nexus is subset, rather than just what is upstream of the inflow to that catchment.
JoshCu committed Aug 9, 2024
1 parent fe06490 commit 3fc4e69
Showing 18 changed files with 309 additions and 271 deletions.
46 changes: 23 additions & 23 deletions README.md
@@ -92,28 +92,28 @@ Once all the steps are finished, you can run NGIAB on the folder shown underneath
## Arguments
- `-h`, `--help`: Show the help message and exit.
- `-i INPUT_FILE`, `--input_file INPUT_FILE`: Path to a CSV or TXT file containing a list of waterbody IDs, lat/lon pairs, or gage IDs; or a single waterbody ID (e.g., `wb-5173`), a single lat/lon pair, or a single gage ID.
- `-l`, `--latlon`: Use latitude and longitude instead of waterbody IDs. When used with `-i`, the file should contain lat/lon pairs.
- `-g`, `--gage`: Use gage IDs instead of waterbody IDs. When used with `-i`, the file should contain gage IDs.
- `-s`, `--subset`: Subset the hydrofabric to the given waterbody IDs, locations, or gage IDs.
- `-f`, `--forcings`: Generate forcings for the given waterbody IDs, locations, or gage IDs.
- `-r`, `--realization`: Create a realization for the given waterbody IDs, locations, or gage IDs.
- `-i INPUT_FILE`, `--input_file INPUT_FILE`: Path to a CSV or TXT file containing a list of catchment IDs, lat/lon pairs, or gage IDs; or a single catchment ID (e.g., `cat-5173`), a single lat/lon pair, or a single gage ID.
- `-l`, `--latlon`: Use latitude and longitude instead of catchment IDs. When used with `-i`, the file should contain lat/lon pairs.
- `-g`, `--gage`: Use gage IDs instead of catchment IDs. When used with `-i`, the file should contain gage IDs.
- `-s`, `--subset`: Subset the hydrofabric to the given catchment IDs, locations, or gage IDs.
- `-f`, `--forcings`: Generate forcings for the given catchment IDs, locations, or gage IDs.
- `-r`, `--realization`: Create a realization for the given catchment IDs, locations, or gage IDs.
- `--start_date START_DATE`: Start date for forcings/realization (format YYYY-MM-DD).
- `--end_date END_DATE`: End date for forcings/realization (format YYYY-MM-DD).
- `-o OUTPUT_NAME`, `--output_name OUTPUT_NAME`: Name of the subset to be created (default is the first waterbody ID in the input file).
- `-o OUTPUT_NAME`, `--output_name OUTPUT_NAME`: Name of the subset to be created (default is the first catchment ID in the input file).
## Examples
`-l`, `-g`, `-s`, `-f`, `-r` can be combined like normal CLI flags. For example, to subset, generate forcings, and create a realization, you can use `-sfr` or `-s -f -r`.
1. Subset hydrofabric using waterbody IDs:
1. Subset hydrofabric using catchment IDs:
```
python -m ngiab_data_cli -i waterbody_ids.txt -s
python -m ngiab_data_cli -i catchment_ids.txt -s
```
2. Generate forcings using a single waterbody ID:
2. Generate forcings using a single catchment ID:
```
python -m ngiab_data_cli -i wb-5173 -f --start_date 2023-01-01 --end_date 2023-12-31
python -m ngiab_data_cli -i cat-5173 -f --start_date 2023-01-01 --end_date 2023-12-31
```
3. Create realization using lat/lon pairs from a CSV file:
Expand All @@ -138,22 +138,22 @@ Once all the steps are finished, you can run NGIAB on the folder shown underneat
## File Formats
### 1. Waterbody ID input:
- CSV file: A single column of waterbody IDs, or a column named 'wb_id', 'waterbody_id', or 'divide_id'.
- TXT file: One waterbody ID per line.
### 1. Catchment ID input:
- CSV file: A single column of catchment IDs, or a column named 'cat_id', 'catchment_id', or 'divide_id'.
- TXT file: One catchment ID per line.
Example CSV (waterbody_ids.csv):
Example CSV (catchment_ids.csv):
```
wb_id,soil_type
wb-5173,some
wb-5174,data
wb-5175,here
cat_id,soil_type
cat-5173,some
cat-5174,data
cat-5175,here
```
Or:
```
wb-5173
wb-5174
wb-5175
cat-5173
cat-5174
cat-5175
```
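The two input formats above can be handled with a small amount of parsing. The sketch below is a hypothetical helper (`read_catchment_ids` is not part of `ngiab_data_cli`) that accepts either a headed CSV with one of the named ID columns or a plain one-ID-per-line TXT file:

```python
import csv
from pathlib import Path

# Column names the README says may carry the catchment ID in a CSV.
ID_COLUMNS = {"cat_id", "catchment_id", "divide_id"}

def read_catchment_ids(path: str) -> list[str]:
    """Hypothetical reader for the CSV/TXT formats described above."""
    lines = Path(path).read_text().strip().splitlines()
    if "," in lines[0]:
        rows = list(csv.reader(lines))
        header = [h.strip() for h in rows[0]]
        # use the named ID column if one exists, otherwise the first column
        idx = next((i for i, h in enumerate(header) if h in ID_COLUMNS), 0)
        body = rows[1:] if header[idx] in ID_COLUMNS else rows
        return [row[idx].strip() for row in body if row]
    # single-column TXT (or headerless single-column CSV): one ID per line
    return [line.strip() for line in lines if line.strip()]
```

For the example CSV above this yields `['cat-5173', 'cat-5174', 'cat-5175']`, the same as the plain TXT variant.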
### 2. Lat/Lon input:
@@ -195,6 +195,6 @@ Or:
## Output
The script creates an output folder named after the first waterbody ID in the input file, the provided output name, or derived from the first lat/lon pair or gage ID. This folder will contain the results of the subsetting, forcings generation, and realization creation operations.
The script creates an output folder named after the first catchment ID in the input file, the provided output name, or derived from the first lat/lon pair or gage ID. This folder will contain the results of the subsetting, forcings generation, and realization creation operations.
</details>
14 changes: 7 additions & 7 deletions modules/data_processing/create_realization.py
@@ -130,7 +130,7 @@ def make_noahowp_config(


def configure_troute(
wb_id: str, config_dir: Path, start_time: datetime, end_time: datetime
cat_id: str, config_dir: Path, start_time: datetime, end_time: datetime
) -> int:
with open(file_paths.template_troute_config(), "r") as file:
troute = yaml.safe_load(file) # Use safe_load for loading
@@ -140,7 +140,7 @@ def configure_troute(
network_topology = troute["network_topology_parameters"]
supernetwork_params = network_topology["supernetwork_parameters"]

geo_file_path = f"/ngen/ngen/data/config/{wb_id}_subset.gpkg"
geo_file_path = f"/ngen/ngen/data/config/{cat_id}_subset.gpkg"
supernetwork_params["geo_file_path"] = geo_file_path

troute["compute_parameters"]["restart_parameters"]["start_datetime"] = start_time.strftime(
@@ -177,10 +177,10 @@ def make_ngen_realization_json(
json.dump(realization, file, indent=4)


def create_realization(wb_id: str, start_time: datetime, end_time: datetime):
def create_realization(cat_id: str, start_time: datetime, end_time: datetime):
# quick wrapper to get the cfe realization working
# without having to refactor this whole thing
paths = file_paths(wb_id)
paths = file_paths(cat_id)

# make cfe init config files
cfe_atts_path = paths.config_dir() / "cfe_noahowp_attributes.csv"
Expand All @@ -191,7 +191,7 @@ def create_realization(wb_id: str, start_time: datetime, end_time: datetime):
make_noahowp_config(paths.config_dir(), cfe_atts_path, start_time, end_time)

# make troute config files
num_timesteps = configure_troute(wb_id, paths.config_dir(), start_time, end_time)
num_timesteps = configure_troute(cat_id, paths.config_dir(), start_time, end_time)

# create the realization
make_ngen_realization_json(paths.config_dir(), start_time, end_time, num_timesteps)
Expand All @@ -200,9 +200,9 @@ def create_realization(wb_id: str, start_time: datetime, end_time: datetime):


if __name__ == "__main__":
wb_id = "wb-1643991"
cat_id = "cat-1643991"
start_time = datetime(2010, 1, 1, 0, 0, 0)
end_time = datetime(2010, 1, 2, 0, 0, 0)
# output_interval = 3600
# nts = 2592
create_realization(wb_id, start_time, end_time)
create_realization(cat_id, start_time, end_time)
10 changes: 5 additions & 5 deletions modules/data_processing/file_paths.py
@@ -11,15 +11,15 @@ class file_paths:

config_file = Path("~/.NGIAB_data_preprocess").expanduser()

def __init__(self, wb_id: str):
def __init__(self, cat_id: str):
"""
Initialize the file_paths class with a catchment ID.
The following functions require a catchment ID:
config_dir, forcings_dir, geopackage_path, cached_nc_file
Args:
wb_id (str): Water body ID.
cat_id (str): Catchment ID.
"""
self.wb_id = wb_id
self.cat_id = cat_id

@staticmethod
def get_working_dir() -> Path:
@@ -101,7 +101,7 @@ def template_noahowp_config() -> Path:
return file_paths.data_sources() / "noah-owp-modular-init.namelist.input"

def subset_dir(self) -> Path:
return file_paths.root_output_dir() / self.wb_id
return file_paths.root_output_dir() / self.cat_id

def config_dir(self) -> Path:
return file_paths.subset_dir(self) / "config"
Expand All @@ -110,7 +110,7 @@ def forcings_dir(self) -> Path:
return file_paths.subset_dir(self) / "forcings"

def geopackage_path(self) -> Path:
return self.config_dir() / f"{self.wb_id}_subset.gpkg"
return self.config_dir() / f"{self.cat_id}_subset.gpkg"

def cached_nc_file(self) -> Path:
return file_paths.subset_dir(self) / "merged_data.nc"
10 changes: 5 additions & 5 deletions modules/data_processing/forcings.py
@@ -190,8 +190,8 @@ def compute_zonal_stats(
)


def setup_directories(wb_id: str) -> file_paths:
forcing_paths = file_paths(wb_id)
def setup_directories(cat_id: str) -> file_paths:
forcing_paths = file_paths(cat_id)
for folder in ["by_catchment", "temp"]:
os.makedirs(forcing_paths.forcings_dir() / folder, exist_ok=True)
return forcing_paths
@@ -220,8 +220,8 @@ def create_forcings(start_time: str, end_time: str, output_folder_name: str) ->
# Example usage
start_time = "2010-01-01 00:00"
end_time = "2010-01-02 00:00"
output_folder_name = "wb-1643991"
# looks in output/wb-1643991/config for the geopackage wb-1643991_subset.gpkg
# puts forcings in output/wb-1643991/forcings
output_folder_name = "cat-1643991"
# looks in output/cat-1643991/config for the geopackage cat-1643991_subset.gpkg
# puts forcings in output/cat-1643991/forcings
logger.basicConfig(level=logging.DEBUG)
create_forcings(start_time, end_time, output_folder_name)
18 changes: 9 additions & 9 deletions modules/data_processing/gpkg_utils.py
@@ -121,22 +121,22 @@ def blob_to_centroid(blob: bytes) -> Point:
return Point(x, y)


def get_wbid_from_point(coords):
def get_catid_from_point(coords):
"""
Retrieves the watershed boundary ID (wbid) of the watershed that contains the given point.
Retrieves the catchment ID (catid) of the catchment that contains the given point.
Args:
coords (dict): A dictionary containing the latitude and longitude coordinates of the point.
Example: {"lat": 40.7128, "lng": -74.0060}
Returns:
int: The watershed boundary ID (wbid) of the watershed containing the point.
int: The catchment ID (catid) of the catchment containing the point.
Raises:
IndexError: If no watershed boundary is found for the given point.
"""
logger.info(f"Getting wbid for {coords}")
logger.info(f"Getting catid for {coords}")
q = file_paths.conus_hydrofabric()
d = {"col1": ["point"], "geometry": [Point(coords["lng"], coords["lat"])]}
point = gpd.GeoDataFrame(d, crs="EPSG:4326")
@@ -297,7 +297,7 @@ def get_table_crs(gpkg: str, table: str) -> str:
return crs


def get_wb_from_gage_id(gage_id: str, gpkg: Path = file_paths.conus_hydrofabric()) -> str:
def get_cat_from_gage_id(gage_id: str, gpkg: Path = file_paths.conus_hydrofabric()) -> str:
"""
Get the catchment ids associated with a gage id.
Expand All @@ -312,13 +312,13 @@ def get_wb_from_gage_id(gage_id: str, gpkg: Path = file_paths.conus_hydrofabric(
"""
gage_id = "".join([x for x in gage_id if x.isdigit()])
logger.info(f"Getting wbid for {gage_id}, in {gpkg}")
logger.info(f"Getting catid for {gage_id}, in {gpkg}")
with sqlite3.connect(gpkg) as con:
sql_query = f"SELECT id FROM hydrolocations WHERE hl_uri = 'Gages-{gage_id}'"
nex_id = con.execute(sql_query).fetchone()[0]
sql_query = f"SELECT id FROM network WHERE toid = '{nex_id}'"
wb_id = con.execute(sql_query).fetchall()
wb_ids = [str(x[0]) for x in wb_id]
cat_id = con.execute(sql_query).fetchall()
cat_ids = [str(x[0]) for x in cat_id]
if nex_id is None:
raise IndexError(f"No nexus found for gage ID {gage_id}")
return wb_ids
return cat_ids
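The gage lookup above chains two queries: `hydrolocations.hl_uri` resolves the gage to its nexus, and `network.toid` resolves the nexus to the catchments draining into it. The sketch below reproduces that chain against a toy in-memory database containing only the columns those queries touch (the real tables live inside the hydrofabric geopackage, and the sample IDs here are invented):

```python
import sqlite3

# Toy stand-in for the hydrofabric geopackage tables used by the lookup.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE hydrolocations (id TEXT, hl_uri TEXT)")
con.execute("CREATE TABLE network (id TEXT, toid TEXT)")
con.execute("INSERT INTO hydrolocations VALUES ('nex-10', 'Gages-01234567')")
con.executemany(
    "INSERT INTO network VALUES (?, ?)",
    [("cat-5173", "nex-10"), ("cat-5174", "nex-10"), ("cat-9999", "nex-99")],
)

gage_id = "01234567"
# step 1: gage -> nexus
nex_id = con.execute(
    "SELECT id FROM hydrolocations WHERE hl_uri = ?", (f"Gages-{gage_id}",)
).fetchone()[0]
# step 2: nexus -> catchments that drain into it
cat_ids = [
    row[0]
    for row in con.execute("SELECT id FROM network WHERE toid = ?", (nex_id,))
]
# cat_ids now holds the two catchments upstream of nex-10
```

Note the sketch uses parameterized queries rather than the f-string interpolation in the function above; that is a deliberate substitution, since it avoids SQL-injection and quoting issues for free.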
32 changes: 32 additions & 0 deletions modules/data_processing/graph_utils.py
@@ -84,6 +84,36 @@ def get_graph() -> ig.Graph:
return network_graph


def get_outlet_id(wb_or_cat_id: str) -> str:
"""
Retrieves the ID of the node downstream of the given node in the hydrological network.
Given a node name, this function identifies the downstream node in the network, effectively tracing the water flow
towards the outlet.
When finding the upstreams of a 'wb' waterbody or 'cat' catchment, what we actually want is everything upstream of that waterbody's outlet.
Args:
wb_or_cat_id (str): The waterbody or catchment ID of the node.
Returns:
str: The ID of the node downstream of the specified node.
"""
# all the waterbody and catchment IDs are the same, but the graph nodes are named wb-<id>
# remove everything that isn't a digit, then prepend wb- to get the graph node name
stem = "".join(filter(str.isdigit, wb_or_cat_id))
name = f"wb-{stem}"
graph = get_graph()
node_index = graph.vs.find(name=name).index
# this returns the current node, and every node downstream of it in order
downstream_node = graph.subcomponent(node_index, mode="OUT")
if len(downstream_node) >= 2:
# if there is more than one node in the list,
# then the second is the downstream node of the first
return graph.vs[downstream_node[1]]["name"]
return None
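The commit's key idea is this one-hop lookup: in the flow graph the node immediately downstream of a `wb-`/`cat-` node is its outflow nexus. A dict-based sketch of the same hop (with the hydrofabric graph reduced to an invented toy adjacency map, and `get_outlet_id_sketch` a hypothetical stand-in for the igraph-backed function above) makes the behaviour easy to see:

```python
from typing import Optional

# Toy flow network: each node maps to the single node it drains into.
DOWNSTREAM = {
    "wb-1": "nex-10",   # wb-1 drains into nexus nex-10
    "nex-10": "wb-2",
    "wb-2": "nex-20",   # nex-20 is this toy network's outlet
}

def get_outlet_id_sketch(wb_or_cat_id: str) -> Optional[str]:
    # wb- and cat- IDs share numeric stems, so normalise to the graph's wb- name
    stem = "".join(filter(str.isdigit, wb_or_cat_id))
    name = f"wb-{stem}"
    # the node immediately downstream is the catchment's outflow nexus
    return DOWNSTREAM.get(name)

print(get_outlet_id_sketch("cat-1"))  # nex-10
```

As in the real function, an ID with nothing downstream of it yields `None`.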


def get_upstream_ids(names: Union[str, List[str]]) -> Set[str]:
"""
Retrieves IDs of all nodes upstream of the given nodes in the hydrological network.
@@ -102,6 +132,8 @@ def get_upstream_ids(names: Union[str, List[str]]) -> Set[str]:
names = [names]
parent_ids = set()
for name in names:
if "wb" in name or "cat" in name:
name = get_outlet_id(name)
if name in parent_ids:
continue
node_index = graph.vs.find(name=name).index
4 changes: 2 additions & 2 deletions modules/data_processing/subset.py
@@ -66,10 +66,10 @@ def subset_parquet(ids: List[str], paths: file_paths) -> None:


def subset(
wb_ids: List[str], hydrofabric: str = file_paths.conus_hydrofabric(), subset_name: str = None
cat_ids: List[str], hydrofabric: str = file_paths.conus_hydrofabric(), subset_name: str = None
) -> str:

upstream_ids = get_upstream_ids(wb_ids)
upstream_ids = get_upstream_ids(cat_ids)

if not subset_name:
# if the name isn't provided, use the first upstream id
Expand Down
1 change: 1 addition & 0 deletions modules/map_app/__main__.py
@@ -80,6 +80,7 @@ def open_browser():

def set_logs_to_warning():
logging.getLogger("werkzeug").setLevel(logging.WARNING)
console_handler.setLevel(logging.DEBUG)


if __name__ == "__main__":
20 changes: 10 additions & 10 deletions modules/map_app/static/css/colors.css
@@ -1,12 +1,12 @@
/* colourblind safe taken from https://personal.sron.nl/~pault/ */
:root {
--selected-wb-outline: rgba(238, 51, 119, 0.7);
--selected-wb-fill: rgba(238, 51, 119, 0.316);
--selected-cat-outline: rgba(238, 51, 119, 0.7);
--selected-cat-fill: rgba(238, 51, 119, 0.316);

--upstream-wb-outline: rgba(238, 119, 51, 0.7);
--upstream-wb-fill: rgba(238, 119, 51, 0.278);
--upstream-cat-outline: rgba(238, 119, 51, 0.7);
--upstream-cat-fill: rgba(238, 119, 51, 0.278);

--flowline-to-wb-outline: rgba(0, 153, 136, 1);
--flowline-to-cat-outline: rgba(0, 153, 136, 1);
--flowline-to-nexus-outline: rgba(0, 119, 187, 1);

--nexus-outline: rgba(1, 1, 1, 0.5);
Expand All @@ -15,13 +15,13 @@
}

.high-contrast {
--selected-wb-outline: rgba(0, 68, 136, 1);
--selected-wb-fill: rgba(0, 68, 136, 0.316);
--selected-cat-outline: rgba(0, 68, 136, 1);
--selected-cat-fill: rgba(0, 68, 136, 0.316);

--upstream-wb-outline: rgba(221, 170, 51, 1);
--upstream-wb-fill: rgba(221, 170, 51, 0.278);
--upstream-cat-outline: rgba(221, 170, 51, 1);
--upstream-cat-fill: rgba(221, 170, 51, 0.278);

--flowline-to-wb-outline: #000000;
--flowline-to-cat-outline: #000000;
--flowline-to-nexus-outline: #BB5566;

--nexus-outline: rgba(1, 1, 1, 0.5);
10 changes: 5 additions & 5 deletions modules/map_app/static/css/legend.css
@@ -49,16 +49,16 @@
margin: 5px;
}

#legend_selected_wb_layer_icon {
background-color: var(--selected-wb-outline);
#legend_selected_cat_layer_icon {
background-color: var(--selected-cat-outline);
}

#legend_upstream_layer_icon {
background-color: var(--upstream-wb-outline);
background-color: var(--upstream-cat-outline);
}

#legend_to_wb_icon {
background-color: var(--flowline-to-wb-outline);
#legend_to_cat_icon {
background-color: var(--flowline-to-cat-outline);
}

#legend_to_nexus_icon {