Discussion:
[Community] Shapely Performance
David Stuebe
2013-08-30 20:10:24 UTC
Permalink
Hi GisPython

I am new to using shapley.

I have a few polygons and I am interested in finding out about which of
some millions of points intersect those polygons.

I only need to create the polygons once - there are dozens at most, but
creating millions of point objects is killing my performance big time!

Here is some cprofile output from a test run:

159616551 function calls (159601695 primitive calls) in 307.189
seconds

Ordered by: internal time
List reduced from 1844 to 20 due to restriction <20>

ncalls tottime percall cumtime percall filename:lineno(function)
7518198 135.451 0.000 185.103 0.000 predicates.py:8(__call__)
1253033 22.242 0.000 51.047 0.000
point.py:170(geos_point_from_py)
7518198 16.698 0.000 16.698 0.000 base.py:24(geometry_type_name)
8011 16.586 0.002 302.903 0.038
shapely_intersects.py:26(shape_function)
7518198 14.399 0.000 201.687 0.000 base.py:447(intersects)
8771627 11.716 0.000 38.436 0.000 {hasattr}
15036396 11.607 0.000 45.146 0.000 topology.py:14(_validate)
8771241 7.834 0.000 11.235 0.000 collections.py:119(itervalues)
3759104 7.515 0.000 10.693 0.000 base.py:131(empty)
1253033 7.092 0.000 12.961 0.000 numeric.py:446(require)
1253033 5.791 0.000 63.158 0.000 point.py:105(_set_coords)
37590990 5.679 0.000 5.679 0.000 base.py:162(_get_geom)
7518198 5.659 0.000 23.459 0.000 base.py:242(geometryType)
1253033 4.999 0.000 4.999 0.000 __init__.py:501(cast)
8771283 3.401 0.000 3.401 0.000 collections.py:72(__iter__)
7518198 3.260 0.000 26.719 0.000 base.py:245(type)
3759104 3.178 0.000 3.178 0.000 base.py:124(_is_empty)
1253033 2.874 0.000 66.251 0.000 point.py:38(__init__)
1253033 2.835 0.000 23.231 0.000 coords.py:17(required)
1253036 2.689 0.000 2.689 0.000 {method 'copy' of
'numpy.ndarray' objects}

Would it be possible to pool the point objects and just change out the
lat/lon location of the point object before every call to intersects?

Here is a code except -

blen = len(block)

particle_position = block['loc'] # a (n,2) array of lat/lon

spillets_in_shapes = numpy.zeros((blen, slen),dtype='bool')

for i, pos in enumerate(particle_position):
p = Point(pos)
for j,shape in enumerate(shapes.itervalues()):
if shape.intersects(p):
spillets_in_shapes[i,j] = True


This code is called many times for each block of particles that I have to
process.

It seems most of my time is spent in initializing point objects and in
something called predicates.py?

Any suggestions for optimization?

David
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gispython.org/pipermail/community/attachments/20130830/f9cacfab/attachment.htm>
Sean Gillies
2013-08-31 04:10:34 UTC
Permalink
Hi David,

Yes, there is a fair amount of object overhead in Shapely. You may want to
consider a more optimized solution written as a C extension with no Shapely
classes at all. See "Subject 2.03: How do I find if a point lies within a
polygon?" in http://www.faqs.org/faqs/graphics/algorithms-faq/. FWIW,
there's an implementation of pnpoly in matplotlib:
http://matplotlib.org/1.2.0/api/nxutils_api.html.
Post by David Stuebe
Hi GisPython
I am new to using shapley.
I have a few polygons and I am interested in finding out about which of
some millions of points intersect those polygons.
I only need to create the polygons once - there are dozens at most, but
creating millions of point objects is killing my performance big time!
159616551 function calls (159601695 primitive calls) in 307.189
seconds
Ordered by: internal time
List reduced from 1844 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
7518198 135.451 0.000 185.103 0.000 predicates.py:8(__call__)
1253033 22.242 0.000 51.047 0.000
point.py:170(geos_point_from_py)
7518198 16.698 0.000 16.698 0.000
base.py:24(geometry_type_name)
8011 16.586 0.002 302.903 0.038
shapely_intersects.py:26(shape_function)
7518198 14.399 0.000 201.687 0.000 base.py:447(intersects)
8771627 11.716 0.000 38.436 0.000 {hasattr}
15036396 11.607 0.000 45.146 0.000 topology.py:14(_validate)
8771241 7.834 0.000 11.235 0.000
collections.py:119(itervalues)
3759104 7.515 0.000 10.693 0.000 base.py:131(empty)
1253033 7.092 0.000 12.961 0.000 numeric.py:446(require)
1253033 5.791 0.000 63.158 0.000 point.py:105(_set_coords)
37590990 5.679 0.000 5.679 0.000 base.py:162(_get_geom)
7518198 5.659 0.000 23.459 0.000 base.py:242(geometryType)
1253033 4.999 0.000 4.999 0.000 __init__.py:501(cast)
8771283 3.401 0.000 3.401 0.000 collections.py:72(__iter__)
7518198 3.260 0.000 26.719 0.000 base.py:245(type)
3759104 3.178 0.000 3.178 0.000 base.py:124(_is_empty)
1253033 2.874 0.000 66.251 0.000 point.py:38(__init__)
1253033 2.835 0.000 23.231 0.000 coords.py:17(required)
1253036 2.689 0.000 2.689 0.000 {method 'copy' of
'numpy.ndarray' objects}
Would it be possible to pool the point objects and just change out the
lat/lon location of the point object before every call to intersects?
Here is a code except -
blen = len(block)
particle_position = block['loc'] # a (n,2) array of lat/lon
spillets_in_shapes = numpy.zeros((blen, slen),dtype='bool')
p = Point(pos)
spillets_in_shapes[i,j] = True
This code is called many times for each block of particles that I have to
process.
It seems most of my time is spent in initializing point objects and in
something called predicates.py?
Any suggestions for optimization?
David
_______________________________________________
Community mailing list
Community at lists.gispython.org
http://lists.gispython.org/mailman/listinfo/community
--
Sean Gillies
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gispython.org/pipermail/community/attachments/20130830/4e62d69c/attachment.htm>
David Stuebe
2013-09-03 19:01:00 UTC
Permalink
Here is the output of the same function rewritten using an iterator on the
a MultiPoint object rather than creating each Point object:

Tue Sep 3 14:51:46 2013 restats

148634517 function calls (148605205 primitive calls) in 273.772
seconds

Ordered by: internal time
List reduced from 1869 to 40 due to restriction <40>

ncalls tottime percall cumtime percall filename:lineno(function)
7518198 137.685 0.000 187.778 0.000 predicates.py:8(__call__)
7518198 17.106 0.000 17.106 0.000 base.py:24(geometry_type_name)
1253033 15.604 0.000 18.043 0.000
point.py:170(geos_point_from_py)
7518198 14.405 0.000 204.416 0.000 base.py:447(intersects)
8011 12.041 0.002 268.612 0.034
shapely_intersects.py:28(shape_function)
15042841 11.606 0.000 45.573 0.000 topology.py:14(_validate)
8771251 7.613 0.000 10.888 0.000 collections.py:119(itervalues)
1253033 6.750 0.000 11.462 0.000 base.py:588(_get_geom_item)
37623215 5.981 0.000 5.981 0.000 base.py:162(_get_geom)
8778104 5.789 0.000 33.026 0.000 {hasattr}
7518198 5.584 0.000 23.819 0.000 base.py:242(geometryType)
6445 4.689 0.001 25.740 0.004
multipoint.py:131(geos_multipoint_from_py)
7518198 3.416 0.000 27.235 0.000 base.py:245(type)
8771313 3.275 0.000 3.275 0.000 collections.py:72(__iter__)
1259478 2.822 0.000 2.822 0.000 __init__.py:501(cast)
7524643 2.313 0.000 2.313 0.000
geos.py:185(errcheck_predicate)
7524643 2.237 0.000 2.237 0.000 impl.py:43(__getitem__)
2518960 1.739 0.000 1.756 0.000 base.py:131(empty)
1253033 1.615 0.000 1.831 0.000 point.py:38(__init__)
1253033 1.357 0.000 3.187 0.000
multipoint.py:55(shape_factory)
16038 1.267 0.000 2.065 0.000
oilmdl_reader.py:218(stream_record_blocks)
1259484 1.013 0.000 1.552 0.000 base.py:164(_set_geom)
1259478 1.012 0.000 12.509 0.000 base.py:596(__iter__)
1259476/1253032 0.885 0.000 2.103 0.000 base.py:141(__del__)
43347 0.881 0.000 0.881 0.000 {method 'reduce' of
'numpy.ufunc' objects}
6445 0.733 0.000 0.733 0.000 {sum}
1259478 0.636 0.000 2.432 0.000 coords.py:17(required)
16040 0.609 0.000 0.609 0.000 {method 'read' of 'file'
objects}
8028 0.428 0.000 1.224 0.000 util.py:113(extents_coroutine)
2577031/2576733 0.415 0.000 0.415 0.000 {len}
8010 0.309 0.000 0.309 0.000 grids.py:79(indexof)
6445 0.244 0.000 0.244 0.000 {method 'dot' of
'numpy.ndarray' objects}
1 0.218 0.218 0.218 0.218 parallel.py:1(<module>)
6446 0.120 0.000 0.343 0.000
polygon_counter_op.py:17(polygon_counter_op)
6446 0.110 0.000 0.353 0.000
polygon_mass_op.py:17(polygon_mass_op)
8011 0.103 0.000 1.230 0.000
bin_grid_mapper.py:20(grid_function)
32040 0.077 0.000 0.077 0.000
{numpy.core.multiarray.frombuffer}
28886 0.073 0.000 0.073 0.000 {abs}
6445 0.062 0.000 25.829 0.004 multipoint.py:28(__init__)
8011 0.060 0.000 0.795 0.000
shore_oil_stats_op.py:27(shore_oil_stats_op)
Post by David Stuebe
Hi GisPython
I am new to using shapley.
I have a few polygons and I am interested in finding out about which of
some millions of points intersect those polygons.
I only need to create the polygons once - there are dozens at most, but
creating millions of point objects is killing my performance big time!
159616551 function calls (159601695 primitive calls) in 307.189
seconds
Ordered by: internal time
List reduced from 1844 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
7518198 135.451 0.000 185.103 0.000 predicates.py:8(__call__)
1253033 22.242 0.000 51.047 0.000
point.py:170(geos_point_from_py)
7518198 16.698 0.000 16.698 0.000
base.py:24(geometry_type_name)
8011 16.586 0.002 302.903 0.038
shapely_intersects.py:26(shape_function)
7518198 14.399 0.000 201.687 0.000 base.py:447(intersects)
8771627 11.716 0.000 38.436 0.000 {hasattr}
15036396 11.607 0.000 45.146 0.000 topology.py:14(_validate)
8771241 7.834 0.000 11.235 0.000
collections.py:119(itervalues)
3759104 7.515 0.000 10.693 0.000 base.py:131(empty)
1253033 7.092 0.000 12.961 0.000 numeric.py:446(require)
1253033 5.791 0.000 63.158 0.000 point.py:105(_set_coords)
37590990 5.679 0.000 5.679 0.000 base.py:162(_get_geom)
7518198 5.659 0.000 23.459 0.000 base.py:242(geometryType)
1253033 4.999 0.000 4.999 0.000 __init__.py:501(cast)
8771283 3.401 0.000 3.401 0.000 collections.py:72(__iter__)
7518198 3.260 0.000 26.719 0.000 base.py:245(type)
3759104 3.178 0.000 3.178 0.000 base.py:124(_is_empty)
1253033 2.874 0.000 66.251 0.000 point.py:38(__init__)
1253033 2.835 0.000 23.231 0.000 coords.py:17(required)
1253036 2.689 0.000 2.689 0.000 {method 'copy' of
'numpy.ndarray' objects}
Would it be possible to pool the point objects and just change out the
lat/lon location of the point object before every call to intersects?
Here is a code except -
blen = len(block)
particle_position = block['loc'] # a (n,2) array of lat/lon
spillets_in_shapes = numpy.zeros((blen, slen),dtype='bool')
p = Point(pos)
spillets_in_shapes[i,j] = True
This code is called many times for each block of particles that I have to
process.
It seems most of my time is spent in initializing point objects and in
something called predicates.py?
Any suggestions for optimization?
David
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gispython.org/pipermail/community/attachments/20130903/ec3d8b64/attachment.htm>
Loading...