Discussion:
[Community] Humongous memory usage with jVectorMap converter.py
Jeff Cook
2012-08-14 04:18:36 UTC
Hello all

I am using jVectorMap's converter.py script (
https://github.com/bjornd/jvectormap/blob/db22821449ea6e1939f3f91070c2f6280ae99b51/converter/converter.py
) to process an 85MB Shapefile that includes all telephone area codes
in the United States. After a short while, memory usage hovers around
8G, 50% of my system memory. Once the script attempts to write to
disk, usage jumps to 14G+ and causes my system to start swapping out.

I am a relative newbie when it comes to GIS data, and I have never
used any Python libraries to deal with such data, so please forgive my
ignorance.

I am interested in making this run faster if there's a way to do so
reasonably. I am currently doing an extensive massif run to get a
reasonable memory profile, but initial runs seem to indicate most
memory is being consumed by C objects in libgeos. This is consistent
with the results from heapy, which pretty consistently show Python
objects only taking 10-12 MB of space in the program's early stages.
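
As a rough illustration of the comparison described above (Python-level allocations versus what the whole process holds), here is a minimal sketch using only the standard library. It assumes Python 3.4+ on a Unix system, so it uses tracemalloc rather than heapy; the `build_list` workload and both helper names are hypothetical stand-ins, not part of converter.py. A large gap between the two numbers would point at memory allocated outside the Python heap, for example by a C library.

```python
import resource
import sys
import tracemalloc


def peak_rss_bytes():
    """Peak resident set size of this process, in bytes (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    return peak if sys.platform == "darwin" else peak * 1024


def python_vs_process_memory(work):
    """Run work() and return (python_peak_bytes, process_peak_rss_bytes)."""
    tracemalloc.start()
    try:
        work()
        # get_traced_memory() returns (current, peak) for Python allocations.
        _, python_peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return python_peak, peak_rss_bytes()


def build_list():
    # Hypothetical stand-in workload; replace with the real conversion step.
    return [str(i) for i in range(100000)]


if __name__ == "__main__":
    py_peak, rss_peak = python_vs_process_memory(build_list)
    print("Python objects (peak): %d bytes" % py_peak)
    print("Whole process (peak RSS): %d bytes" % rss_peak)
```

Note that tracemalloc only sees allocations made through Python's allocator, so it plays the same role here that heapy plays in the measurements above.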

I was wondering if there was something relatively simple that could be
done to make the program release memory more reasonably. Some of my
reading leads me to believe this issue may lie deeper than the
surface-level Python code. I still have more investigation to do, but
I thought I should get a message posted here quickly since the list
will likely have better ideas than I.

Thanks
Jeff
Sean Gillies
2012-08-20 21:23:17 UTC
Hi Jeff,

Just back from vacation. I've never used that converter script, and am
not sure exactly how it works, but the way it builds up large lists of
data before writing out the paths seems unlikely to scale.
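
To make the scaling concern concrete, here is a minimal sketch of the alternative: produce and write one path at a time instead of collecting everything in a list first. This is not converter.py's actual code; the `(name, coords)` feature tuples, the path string format, and the function names are all hypothetical, chosen only to show the streaming pattern.

```python
import io


def iter_paths(features):
    """Yield one output entry per feature, without keeping earlier ones."""
    for name, coords in features:
        points = " ".join("%g,%g" % (x, y) for x, y in coords)
        yield '"%s": "M%sZ"' % (name, points)


def write_paths(features, out):
    """Write each entry as soon as it is produced; return how many were written."""
    count = 0
    out.write("{")
    for count, path in enumerate(iter_paths(features), start=1):
        if count > 1:
            out.write(",")
        out.write(path)
    out.write("}")
    return count


if __name__ == "__main__":
    # Hypothetical sample data; a real run would iterate over the shapefile layer.
    sample = [("801", [(0, 0), (1, 0), (1, 1)]), ("385", [(2, 2), (3, 2)])]
    buf = io.StringIO()
    write_paths(sample, buf)
    print(buf.getvalue())
```

Because `iter_paths` is a generator, only the entry currently being written is held in memory, so peak usage depends on the largest single feature rather than on the whole dataset.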

What versions of Shapely and GEOS are you using?
_______________________________________________
Community mailing list
Community at lists.gispython.org
http://lists.gispython.org/mailman/listinfo/community
--
Sean Gillies
Jeff Cook
2012-08-22 22:14:57 UTC
Thanks for your reply Sean.

I am using GEOS 3.3.4 and Shapely 1.2.15 on Python 2.7.3 on Arch Linux
x86_64 (kernel 3.4.8). The machine is an i7-2600K with 16GB of RAM.

As I said, I still have the massif output file if it would be of any
value (and of course, could generate a new one).
Sean Gillies
2012-08-29 16:51:37 UTC
Jeff,

I've run Shapely's tests under valgrind in the past, but not with a
very recent GEOS. The one place I did see leaks was in the GEOS WKT
and WKB readers and writers. I can't rule out new leaks in more
recent GEOS releases, but I think they are unlikely.

Since you're already using osgeo.ogr, you could remove Shapely from
the script and just use the OGR geometry methods to see if that helps.
I don't have a better idea at the moment.
--
Sean Gillies