Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read a polygon with a hole from a ShapeFile #80

Open
wants to merge 19 commits into
base: develop
Choose a base branch
from

Conversation

DGuidi
Copy link
Contributor

@DGuidi DGuidi commented Jan 20, 2022

To investigate #70

@DGuidi DGuidi self-assigned this Jan 20, 2022
@DGuidi
Copy link
Contributor Author

DGuidi commented Jan 20, 2022

from Shapefile specs, page 8-9

The order of vertices or orientation for a ring indicates which side of the ring is the interior of the polygon... Vertices of rings defining holes in polygons are in a counterclockwise direction. Vertices for a single, ringed polygon are, therefore, always in clockwise order.
The order of rings in the points array is not significant.

so basically testing for rings CW and CCW orientation is the correct way of doing things.
Maybe is this code that try to clean data that is inherently dirty?

…hat are not inside any shell, due mainly to invalid inputs with shell and holes not correctly oriented - as separate shells
@DGuidi DGuidi marked this pull request as ready for review January 21, 2022 09:27
@DGuidi
Copy link
Contributor Author

DGuidi commented Jan 21, 2022

The shell_bad_ccw.shp contains a single polygon, with:

  • a shell CCW-oriented (like a hole from ESRI specs)
  • a hole CW-oriented (like a shell from ESRI specs)

The idea of this PR is to let the ShapefileReader reads this kind of data as two separate shells, so a MultiPolygon is returned.

@FObermaier
Copy link
Member

The idea of this PR is to let the ShapefileReader reads this kind of data as two separate shells, so a MultiPolygonis returned.

Does one polygon contain the other? In that case the result would be invalid.

@DGuidi
Copy link
Contributor Author

DGuidi commented Jan 21, 2022

Does one polygon contain the other? In that case the result would be invalid.

Yep, exactly (made explicit in the unit test).

I see few different strategies:

  1. let the develop behavior: now a single polygon is read, and it's the hole (interpreted as a shell from NTS). the actual shell (interpreted as a hole from NTS) is discarded from the output because NTS doesn't find any shell that can be considered a "shell for this hole" (sorry for the poor explaination)
  2. accept the PR behavior: both polygons are read and returned, the invalid data is appended at the end of the multipolygon and the geometry is not valid. It's up to the consumer to fix the geometry applying the logic.
  3. throw an exception if the data is not valid (i.e.: unassigned data is found); this means that ShapefileReader.Read() returns false, because of this code and the reason must be inspected checking the windows messages (not good, but removing the catch is a breaking change).
  4. PUT YOUR IDEA HERE

I'm open to suggestions :)

@DGuidi DGuidi marked this pull request as draft January 21, 2022 13:17
@FObermaier
Copy link
Member

@DGuidi , I think we should

  1. Order the polygon rings by their area (as if they were Polygons without holes)
  2. Pick the largest as 1st shell
  3. Check the others if they are contained by any of the shells and if so, make them a hole of that shell, otherwise create a new shell (in that case we have a MultiPolygon).
  4. Repeat 3 until there are no more rings left
  5. Build the valid [Multi]Polygon.

I think the PR I pointed you to yesterday does it in a similar way.

@DGuidi
Copy link
Contributor Author

DGuidi commented Jan 21, 2022

I think the PR I pointed you to yesterday does it in a similar way

I will do a check, thanks
The code in the WIP PR looks similar to the code actually in the standard lib to me

@DGuidi
Copy link
Contributor Author

DGuidi commented Jan 24, 2022

I'm working on resolving an issue with this geom:
MULTIPOLYGON (((-124.134 -79.199, -124.141 -79.316, -124.164 -79.431, -124.202 -79.542, -124.254 -79.647, -124.319 -79.745, -124.396 -79.833, -124.484 -79.91, -124.582 -79.975, -124.687 -80.027, -124.798 -80.065, -124.913 -80.088, -125.03 -80.095, -125.147 -80.088, -125.262 -80.065, -125.373 -80.027, -125.478 -79.975, -125.576 -79.91, -125.664 -79.833, -125.741 -79.745, -125.806 -79.647, -125.858 -79.542, -125.896 -79.431, -125.919 -79.316, -125.926 -79.199, -125.919 -79.082, -125.896 -78.967, -125.858 -78.856, -125.806 -78.751, -125.741 -78.653, -125.664 -78.565, -125.576 -78.488, -125.478 -78.423, -125.373 -78.371, -125.262 -78.333, -125.147 -78.31, -125.03 -78.303, -124.913 -78.31, -124.798 -78.333, -124.687 -78.371, -124.582 -78.423, -124.484 -78.488, -124.396 -78.565, -124.319 -78.653, -124.254 -78.751, -124.202 -78.856, -124.164 -78.967, -124.141 -79.082, -124.134 -79.199),(-124.438 -79.199, -124.443 -79.122, -124.459 -79.046, -124.483 -78.973, -124.518 -78.903, -124.561 -78.839, -124.612 -78.781, -124.67 -78.73, -124.734 -78.687, -124.804 -78.652, -124.877 -78.628, -124.953 -78.612, -125.03 -78.607, -125.107 -78.612, -125.183 -78.628, -125.256 -78.652, -125.326 -78.687, -125.39 -78.73, -125.448 -78.781, -125.499 -78.839, -125.542 -78.903, -125.577 -78.973, -125.601 -79.046, -125.617 -79.122, -125.622 -79.199, -125.617 -79.276, -125.601 -79.352, -125.577 -79.425, -125.542 -79.495, -125.499 -79.559, -125.448 -79.617, -125.39 -79.668, -125.326 -79.711, -125.256 -79.746, -125.183 -79.77, -125.107 -79.786, -125.03 -79.791, -124.953 -79.786, -124.877 -79.77, -124.804 -79.746, -124.734 -79.711, -124.67 -79.668, -124.612 -79.617, -124.561 -79.559, -124.518 -79.495, -124.483 -79.425, -124.459 -79.352, -124.443 -79.276, -124.438 -79.199)),((-124.582 -79.199, -124.586 -79.257, -124.597 -79.315, -124.616 -79.371, -124.642 -79.423, -124.674 -79.472, -124.713 -79.516, -124.757 -79.555, -124.806 -79.587, -124.858 -79.613, -124.914 -79.632, -124.972 -79.643, -125.03 -79.647, -125.088 -79.643, -125.146 -79.632, -125.202 -79.613, -125.254 -79.587, -125.303 -79.555, -125.347 -79.516, -125.386 -79.472, -125.418 -79.423, -125.444 -79.371, -125.463 -79.315, -125.474 -79.257, -125.478 -79.199, -125.474 -79.141, -125.463 -79.083, -125.444 -79.027, -125.418 -78.975, -125.386 -78.926, -125.347 -78.882, -125.303 -78.843, -125.254 -78.811, -125.202 -78.785, -125.146 -78.766, -125.088 -78.755, -125.03 -78.751, -124.972 -78.755, -124.914 -78.766, -124.858 -78.785, -124.806 -78.811, -124.757 -78.843, -124.713 -78.882, -124.674 -78.926, -124.642 -78.975, -124.616 -79.027, -124.597 -79.083, -124.586 -79.141, -124.582 -79.199),(-124.896 -79.199, -124.897 -79.181, -124.9 -79.164, -124.906 -79.148, -124.914 -79.132, -124.923 -79.117, -124.935 -79.104, -124.948 -79.092, -124.963 -79.083, -124.979 -79.075, -124.995 -79.069, -125.012 -79.066, -125.03 -79.065, -125.048 -79.066, -125.065 -79.069, -125.081 -79.075, -125.097 -79.083, -125.112 -79.092, -125.125 -79.104, -125.137 -79.117, -125.146 -79.132, -125.154 -79.148, -125.16 -79.164, -125.163 -79.181, -125.164 -79.199, -125.163 -79.217, -125.16 -79.234, -125.154 -79.25, -125.146 -79.266, -125.137 -79.281, -125.125 -79.294, -125.112 -79.306, -125.097 -79.315, -125.081 -79.323, -125.065 -79.329, -125.048 -79.332, -125.03 -79.333, -125.012 -79.332, -124.995 -79.329, -124.979 -79.323, -124.963 -79.315, -124.948 -79.306, -124.935 -79.294, -124.923 -79.281, -124.914 -79.266, -124.906 -79.25, -124.9 -79.234, -124.897 -79.217, -124.896 -79.199)))
image

This multipolygon is valid (two separate polygons each with a hole) but using the suggested logic the data from the shapefile is considered a polygon. I'm working on it.

when a potential hole of a shell is found, we check also the shell holes, to see if any hole contains potential hole. If so, the potential hole can not be considered a hole for the shell, but a separate geometry.
@DGuidi
Copy link
Contributor Author

DGuidi commented Jan 25, 2022

@FObermaier I slightly changed the logic.
Now when a potential hole of a shell is found, we check also the shell holes, to see if any hole contains the potential hole. If so, the potential hole can not be considered a hole for the shell, but a separate geometry.
It works but I think I can miss few other edge cases.

@DGuidi DGuidi mentioned this pull request Jan 25, 2022
@DGuidi DGuidi marked this pull request as ready for review February 1, 2022 13:09
@DGuidi DGuidi added the help wanted Extra attention is needed label Feb 3, 2022
@DGuidi
Copy link
Contributor Author

DGuidi commented Feb 7, 2022

@FObermaier any advice on this?

@FObermaier
Copy link
Member

I actually don't quite know how to proceed. There are complaints that current ShapeFile implementation is (dreadfully) slow. That is why I pointed you to the WIP/redo everyting PR. If we do thorough topology test on the rings we read, I assume it gets even worse. Maybe we can enable the thorough test conditionally, if the user of the library really wants it?

@DGuidi
Copy link
Contributor Author

DGuidi commented Feb 7, 2022

I actually don't quite know how to proceed.

Actually, I have the same feeling: with the "new" polygon builder logic all the tests are green, but I fear that some behavior not covered by some test can be broken.

That is why I pointed you to the WIP/redo everyting PR

that shows the same problem, actually

If we do thorough topology test on the rings we read

the "new" polygon builder logic I think is similar to the older one, in the sense that I didn't expected to be too much slower. But actually this is just a feeling, I haven't tried no comparison tests

we can enable the thorough test conditionally

I can work on it: a flag can be helpful also to build some performance test

my2cents

@DGuidi DGuidi marked this pull request as draft February 7, 2022 08:40
TODO: flag needs to be handled differently, suggestions well accepted
@DGuidi
Copy link
Contributor Author

DGuidi commented Feb 7, 2022

@FObermaier added a "temporary" static property as a flag, I need to think a bit better about how to manage the entire "flag" thing... suggestions are welcome!

after a brief inspection: adding a flag property to the shapefiledatareader, means that the same flag shall be copied to the shapefilereader and then to the polygonhandler, that is created in a static factory method of the shapefile class, so we must ensure that any call to the factory method should also handle the flag... additionally, the Shapefile.CreateDataReader needs to accept another parameter, just to enable the new behavior.

viable, not so complicated at all, maybe acceptable for an "experiment", but I hope to find a better alternative

@DGuidi
Copy link
Contributor Author

DGuidi commented Feb 10, 2022

I didn't expected to be too much slower

but actually - based on some poor-man's performance tests - it's much (~3X) slower

    WRITE => elapsed: '00:00:00.4138680'
    ShapeFileDataReader
    flag DISABLED => elapsed: '00:00:00.4381579'
    flag ENABLED => elapsed: '00:00:01.3507803'
    ShapeReader
    flag DISABLED => elapsed: '00:00:00.4276830'
    flag ENABLED => elapsed: '00:00:01.2875299'

@DGuidi DGuidi marked this pull request as ready for review February 11, 2022 08:17
@DGuidi DGuidi added enhancement New feature or request and removed help wanted Extra attention is needed labels Feb 11, 2022
DGuidi and others added 8 commits February 11, 2022 09:26
Added 2 more variants to build the polygons.
* Sequential
  Assumes that polygons are always serialized `shell[, holes][, shell[, holes][, ...]]` and thus starts a new polygon when the current ring is not contained by the current polygon. This is slightly faster than the current default implementation.
* UsePolygonizer
  Throws all rings at the `NetTopologySuite.Operations.Polygonize.Polygonizer` and lets it build the polygon. This is really slow.
@DGuidi
Copy link
Contributor Author

DGuidi commented Feb 17, 2022

just a small note: tests marked as Explicit are always executed in my machine (using VS2019). this is why I used Ignore instead: now I understood that probably it's something related to my setup, and not a general problem.

@FObermaier
Copy link
Member

AFAIK Explicit will always run from within the UI (VisualStudio).

@FObermaier
Copy link
Member

In GDAL/OGR shapefile driver, all rings are converted to Polygons and afterwards sent to organizePolygons:
https://github.com/OSGeo/gdal/blob/363178015ae009fa1187d8c1e20b3587db43aa58/ogr/ogrgeometryfactory.cpp#L1455-L2144

I have not bisected the algorithm altogether, but it seems to be a bit of everything...

@DGuidi
Copy link
Contributor Author

DGuidi commented Feb 17, 2022

AFAIK Explicit will always run from within the UI (VisualStudio).

exactly what happens to me

@DGuidi
Copy link
Contributor Author

DGuidi commented Feb 17, 2022

"For faster computation, a polygon is considered to be inside another one if a single point of its external ring is included into the other one"

maybe we can refactor this check to speed up a bit the computation:

bool IsHoleContainedInShell(LinearRing shell, LinearRing hole) =>
   //shell.EnvelopeInternal.Contains(hole.EnvelopeInternal) &&
   PointLocation.IsInRing(hole.GetCoordinateN(0), shell.Coordinates);

@FObermaier
Copy link
Member

I'd assume that without the cheap envelope precondition test we'll see a performance degeneration.

@DGuidi
Copy link
Contributor Author

DGuidi commented Feb 17, 2022

I'd assume that without the cheap envelope precondition test we'll see a performance degeneration.

I just mean that IsHoleContainedInShell check should just use PointLocation.IsInRing

@FObermaier
Copy link
Member

I got that, but PointLocation.IsInRing is the expensive test (compared to shell.EnvelopeInternal.Contains(hole.EnvelopeInternal)). If we don't do the envelope test, the PointLocation.IsInRing test is performed more often which will lead to performance degeneration.

@DGuidi
Copy link
Contributor Author

DGuidi commented Feb 17, 2022

If we don't do the envelope test, the PointLocation.IsInRing test is performed more often which will lead to performance degeneration.

yep now I see your point. curiously, from performance tests I see performance improvements

@FObermaier
Copy link
Member

FObermaier commented Feb 17, 2022

yep now I see your point. curiously, from performance tests I see performance improvements

If you only have polygons with holes that might be true, but for multipolygons whose shell ring envelopes don't intersect, I doubt that.

@DGuidi DGuidi marked this pull request as draft March 21, 2022 13:45
@DGuidi DGuidi marked this pull request as ready for review March 21, 2022 13:45
@DGuidi
Copy link
Contributor Author

DGuidi commented Mar 22, 2022

can be this PR cancelled, considering the discussion in #81, and in particular this comment ? Maybe part of this work can be added to #69 , if worthy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants