Discussion:
[Community] Shapely: How to efficiently create many Polygons from a numpy array
Matthias Müller
2013-07-31 09:15:05 UTC
Permalink
Hello everybody,

is there an efficient way to create a large number of polygons from a
numpy array?

In my case I have one array "vericesXY" which contains the vertices of
2108 polygons:

'''
some code that calculates the vertices
and writes them to an array verticesXY
'''

print verticesXY.shape
(379440, 2)

Then I obtain each polygon's vertices by splitting verticesXY:
(all polygons happen to have the same number of corners)

verticesXYList = np.array_split(verticesXY, 2108)
polys = []
for item in verticesXYList:
polys.append(Polygon(item))


These are the timings:
- calculate vertices: 0.8 sec
- create polygons: 3.0 sec

So the current bottleneck is not the vertex computation buth the loop
that creates the objects.

My next idea was to avoid that loop and speed up instantiation by first
creating a multipolygon from the verticesXYList and then obtain the
individual polygons with

multiPoly = MultiPolygon(someNumpyArray)
polys = multipolygon.geoms

But there I got stuck with the MultiPolygon constructor for which I
can't generate an appropriate numpy array of the right form [1]. - What
gives me headaches are the mandatory empty elements that represent the
holes.


- Matthias

[1]: (taken from shapely source code documentation)
ob = MultiPolygon([
(( (0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0) ), []),
(( (2.0, 2.0), (2.0, 3.0), (3.0, 3.0), (3.0, 2.0) ), []),
...
])
Oliver Tonnhofer
2013-07-31 09:51:59 UTC
Permalink
Hi,
is there an efficient way to create a large number of polygons from a numpy array?
2108 polygons are not much. Do you have the speedups enabled?
http://toblerity.github.io/shapely/manual.html#performance



Gru?/Regards,
Oliver
Matthias Müller
2013-07-31 11:40:32 UTC
Permalink
Hi Oliver,

that was a good hint indeed.

However, with speedups activated, the script fails. Here's what happens:

speedups *enabled*
- all polygons evaluate to false (polygon.is_valid)
- cascaded_union fails

speedups *disabled*
- all polygons evaluate to true
- cascaded_union succeeds

Any idea what is going on here?

Thanks,
Matthias
Post by Oliver Tonnhofer
Hi,
is there an efficient way to create a large number of polygons from a numpy array?
2108 polygons are not much. Do you have the speedups enabled?
http://toblerity.github.io/shapely/manual.html#performance
Gru?/Regards,
Oliver
_______________________________________________
Community mailing list
Community at lists.gispython.org
http://lists.gispython.org/mailman/listinfo/community
Oliver Tonnhofer
2013-07-31 13:50:09 UTC
Permalink
Post by Matthias Müller
that was a good hint indeed.
speedups *enabled*
- all polygons evaluate to false (polygon.is_valid)
- cascaded_union fails
speedups *disabled*
- all polygons evaluate to true
- cascaded_union succeeds
Any idea what is going on here?
Strange. Are the speedups available? Does it work when you create Polygons from lists/tuples instead of NumPy?

Gru?/Regards,
Oliver
Matthias Müller
2013-07-31 14:29:52 UTC
Permalink
Post by Oliver Tonnhofer
Are the speedups available? Does it work when you create Polygons from lists/tuples instead of NumPy?
Speedups are available and Polygons are valid when I do it by hand in
IDLE [1].

So it looks like a debugging exercise, but I haven't a real idea what
the problem might be. Why could the speedups behave differently compared
to the "regular" mode?

Matthias
speedups.enable(); print "Speedups enabled"

Speedups enabled
Post by Oliver Tonnhofer
poly1 = Polygon(( (0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0) ))
poly1.is_valid
True
Post by Oliver Tonnhofer
poly2 = Polygon([ [0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0] ])
poly2.is_valid
True
Post by Oliver Tonnhofer
pArray = np.asarray([ [0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0] ])
poly3 = Polygon(pArray)
poly3.is_valid
True
Oliver Tonnhofer
2013-08-01 06:04:23 UTC
Permalink
Post by Oliver Tonnhofer
Are the speedups available? Does it work when you create Polygons from lists/tuples instead of NumPy?
Speedups are available and Polygons are valid when I do it by hand in IDLE [1].
So it looks like a debugging exercise, but I haven't a real idea what the problem might be. Why could the speedups behave differently compared to the "regular" mode?
The speedups are for the construction of new geometries and it makes direct access to the NumPy data[0].

What datatype do you use for the NumPy array? What platform are you on? GEOS version?
Post by Oliver Tonnhofer
pArray = np.asarray([ [0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0] ])
poly3 = Polygon(pArray)
poly3.is_valid
True
Do you have a minimal example where it doesn't work?


[0] https://github.com/Toblerity/Shapely/blob/master/shapely/speedups/_speedups.pyx#L47..L65


Gru?/Regards,
Oliver
Matthias Müller
2013-08-01 08:13:09 UTC
Permalink
Hi Oliver,
Post by Oliver Tonnhofer
What datatype do you use for the NumPy array?
float64
Post by Oliver Tonnhofer
What platform are you on? GEOS version?
Platform: Windows
Shapely Version: 1.8.17, 1.8.18
(I tested both and now operate on 1.8.18)
GEOS Version: 3.3.7 (x86) (shipped with the Windows installer)
Post by Oliver Tonnhofer
Do you have a minimal example where it doesn't work?
That is hard. I trapped into my code [1] and dumped a failing array [2].
However, when I reload that dump [3] and create the polygon, the polygon
creation works fine [4].
When you look at the bounds, you can see that they minx and miny are
completely off in the first case. I have no clue what is going on here;
it could be a shapely bug.


- Matthias


[1]: Trapping code

polys = []
cnt = 0
for item in polyVertices:
p = Polygon(item)
if not p.is_valid:
print "%i of %i" % (cnt, len(pV))
print p.is_valid
print item
print item.dtype
print p.bounds
item.dump("badarray.dmp")
cnt += 1


[2]: Output failing array

2105 of 2107
False
[[ 26.26609749 50.97121149]
[ 26.2496201 50.88475089]
[ -1.87877814 49.57952558]
[ -1.91442624 49.66615456]]
float64
(-41.96802492005479, -76.5401405224277, 26.26609749124117,
50.97121149470595)


[3]: Reload dumped array and test

ar = np.load("badarray.dmp")
print ar
p = Polygon(ar)
print p.is_valid
print ar.dtype
print p.bounds


[4]: Reload dump - output

[[ 26.26609749 50.97121149]
[ 26.2496201 50.88475089]
[ -1.87877814 49.57952558]
[ -1.91442624 49.66615456]]
True
float64
(-1.9144262399854206, 49.57952558069644, 26.26609749124117,
50.97121149470595)
end test - loading dump

Loading...