Replace missing RWD source files #65

Closed
mmcfarland opened this issue Feb 15, 2017 · 5 comments

@mmcfarland

mmcfarland commented Feb 15, 2017

At least catchments 26965 and 27260 are missing from RWD_DATA (for example, Subwatershed_ALL/Subwatershed26965/subwatershed_26965dist.tif), both on staging and on the dev data stick that has been distributed. This is responsible for errors such as those reported in #26.

  • Is there a way to check which catchments are missing? There should be a directory for every catchment feature in gwmaster.shp.
  • Are more catchments missing?
  • Did a single zip fail to extract?
  • Did a single zip fail to get uploaded to the GDrive?
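
A minimal sketch of one way to answer the first question, under stated assumptions: the catchment IDs have already been read from gwmaster.shp (that step is left out because the ID attribute name is not given in this thread), and each catchment lives in a directory named Subwatershed<ID>, as the path above suggests. The function names are hypothetical.

```python
import os


def missing_catchment_dirs(expected_ids, dir_names, prefix="Subwatershed"):
    """Return the sorted IDs from expected_ids with no '<prefix><id>' entry in dir_names."""
    present = set(dir_names)
    return sorted(i for i in expected_ids if f"{prefix}{i}" not in present)


def missing_on_disk(expected_ids, subwatershed_root):
    """Run the same check against a real directory such as .../nhd/Subwatershed_ALL."""
    dir_names = [
        name for name in os.listdir(subwatershed_root)
        if os.path.isdir(os.path.join(subwatershed_root, name))
    ]
    return missing_catchment_dirs(expected_ids, dir_names)
```

The comparison uses exact directory names, so an ID such as 26965 is not mistaken for 126965.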

Source: https://drive.google.com/drive/u/1/folders/0B7V8il12WGQJM1JFWkN6bXpOX1k

Those are questions to pursue, but ultimately we need to restore any missing catchment tifs and cut an updated volume drive.

For testing, these two points fall within the known missing catchments:

42.307022535158495,-90.41748046874999
42.8115217450979,-90.52734374999999
@kdeloach

We discussed creating a data verification script that scans each directory and checks that the files we expect to be there actually exist. The README.txt file on the GDrive documents which files should exist in each directory.

Once we obtain a list of missing files using this script, we can determine if these files are present in the zip files on the GDrive.
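
A rough sketch of what such a verification script could look like. The file templates below are an assumption: they are copied from the sample directory listing later in this thread, not from README.txt, which remains the authoritative list. The function names are hypothetical.

```python
import os

# Hypothetical per-directory templates taken from one sample listing;
# replace with the files documented in README.txt.
EXPECTED_TEMPLATES = [
    "Full_watershed{id}.shp",
    "Simple_watershed{id}.shp",
    "subwatershed_{id}.shp",
    "subwatershed_{id}dist.tif",
    "subwatershed_{id}p.tif",
    "subwatershed_{id}src1.tif",
    "upcatchids.txt",
]


def expected_files(catchment_id):
    """Return the expected file paths for one catchment, relative to Subwatershed_ALL."""
    directory = f"Subwatershed{catchment_id}"
    return [
        os.path.join(directory, template.format(id=catchment_id))
        for template in EXPECTED_TEMPLATES
    ]


def find_missing_files(subwatershed_root, catchment_ids):
    """Return the relative paths of expected files that do not exist under subwatershed_root."""
    missing = []
    for catchment_id in catchment_ids:
        for relative_path in expected_files(catchment_id):
            if not os.path.isfile(os.path.join(subwatershed_root, relative_path)):
                missing.append(relative_path)
    return missing
```

Calling `find_missing_files("/opt/rwd-data/nhd/Subwatershed_ALL", ids)` would yield a list that can then be compared against the contents of the zip files.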

One challenge I found is that it's not straightforward to identify which zip file contains which subwatershed. The directories were grouped into zip files in lexicographic order rather than natural numeric order, so a zip file may contain subwatersheds 100, 1000, and 10000 instead of 100, 200, 300, etc.
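
To illustrate the difference between the two orderings, here is a small sketch using made-up directory names that follow the Subwatershed<ID> pattern:

```python
PREFIX = "Subwatershed"

# Made-up names for illustration only.
names = [
    "Subwatershed200",
    "Subwatershed10000",
    "Subwatershed100",
    "Subwatershed300",
    "Subwatershed1000",
]


def natural_key(name):
    """Sort key that compares the numeric ID instead of the raw string."""
    return int(name[len(PREFIX):])


# Plain string sort: compares character by character, so "1000" sorts before "200".
lexicographic = sorted(names)

# Natural numeric sort: compares the IDs as integers.
natural = sorted(names, key=natural_key)

print(lexicographic)
print(natural)
```

The string sort places 100, 1000, and 10000 next to each other, which matches the grouping observed in the zip files.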

@kdeloach

I was able to locate the zip file that a specific subwatershed belongs to by doing the following:

ls -1 /opt/rwd-data/nhd/Subwatershed_ALL/ > dirs.txt
grep -n "26965" dirs.txt

Take the line number reported by grep and divide by 1000. In this case the line number was 46299, so I guessed that subwatershed 26965 would be in MSsub46.zip. This was off by one: the actual answer is MSsub47.zip.
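
The same lookup can be sketched in Python. This assumes 1000 directories per zip and zip numbering that starts at 1, so the quotient is rounded up; that rule agrees with the single case above (line 46299, MSsub47.zip) but has not been checked at other boundaries. It also assumes a plain string sort matches the `ls -1` order, which can depend on locale. The function names are hypothetical.

```python
def zip_number_for_line(line_number, dirs_per_zip=1000):
    """Map a 1-based line number to a 1-based zip number, rounding up."""
    return (line_number - 1) // dirs_per_zip + 1


def guess_zip_name(dir_names, catchment_id, dirs_per_zip=1000):
    """Guess which MSsub zip holds a catchment from its position in the sorted listing."""
    ordered = sorted(dir_names)  # plain string sort, approximating `ls -1`
    # Exact-name lookup, so 26965 does not also match 126965 as a substring.
    line_number = ordered.index(f"Subwatershed{catchment_id}") + 1
    return f"MSsub{zip_number_for_line(line_number, dirs_per_zip)}.zip"
```

With the line number from above, `zip_number_for_line(46299)` gives 47.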

@mmcfarland

The missing catchment zip has been uploaded to the GDrive. Some additional notes:

I used the operational version, which includes the Simple_watershed* shapefiles. This should spare you from re-extracting files from Simple.zip, as you previously did for these files. However, it does not include the *ad8, *ord, *plen, and *tlen files, which are no longer used but are present in the other MSsub zip files and may still be in your data unless you deleted them.

@kdeloach kdeloach self-assigned this Feb 20, 2017
@kdeloach

I'm going to start pushing this to staging now.

kdeloach pushed a commit to WikiWatershed/model-my-watershed that referenced this issue Feb 20, 2017
This volume contains missing shapefiles from MSsub47.zip.

Connects
WikiWatershed/rapid-watershed-delineation#65
@kdeloach

For future reference, I have uploaded RWDProductionVolumeContents20170220.txt.tar.gz to the GDrive. This file contains the full list of RWD files that exist on staging/production as of today.

Here's a sample:

/opt/rwd-data/
└── nhd
    ├── checksums.txt
    ├── Main_Watershed
    │   ├── domainfromregions.dbf
    │   ├── domainfromregions.prj
    │   ├── domainfromregions.sbn
    │   ├── domainfromregions.sbx
    │   ├── domainfromregions.shp
    │   ├── domainfromregions.shx
    │   ├── gwgrid.tfw
    │   ├── gwgrid.tif
    │   ├── gwgrid.tif.aux.xml
    │   ├── gwgrid.tif.ovr
    │   ├── gwgrid.tif.xml
    │   ├── gwmaster.dbf
    │   ├── gwmaster.prj
    │   ├── gwmaster.shp
    │   ├── gwmaster.shx
    │   ├── masterid.txt
    │   └── regions.tif
    ├── README.txt
    ├── ReduceUnneeded.py
    └── Subwatershed_ALL
        ├── Subwatershed126965
        │   ├── Full_watershed126965.cpg
        │   ├── Full_watershed126965.dbf
        │   ├── Full_watershed126965.prj
        │   ├── Full_watershed126965.sbn
        │   ├── Full_watershed126965.sbx
        │   ├── Full_watershed126965.shp
        │   ├── Full_watershed126965.shp.xml
        │   ├── Full_watershed126965.shx
        │   ├── Simple_watershed126965.cpg
        │   ├── Simple_watershed126965.dbf
        │   ├── Simple_watershed126965.prj
        │   ├── Simple_watershed126965.sbn
        │   ├── Simple_watershed126965.sbx
        │   ├── Simple_watershed126965.shp
        │   ├── Simple_watershed126965.shp.xml
        │   ├── Simple_watershed126965.shx
        │   ├── subwatershed_126965.cpg
        │   ├── subwatershed_126965.dbf
        │   ├── subwatershed_126965dist.tif
        │   ├── subwatershed_126965.prj
        │   ├── subwatershed_126965p.tif
        │   ├── subwatershed_126965.shp
        │   ├── subwatershed_126965.shx
        │   ├── subwatershed_126965src1.tif
        │   └── upcatchids.txt
