Jeff Cook
2012-08-14 04:18:36 UTC
Hello all
I am using jVectorMap's converter.py script (
https://github.com/bjornd/jvectormap/blob/db22821449ea6e1939f3f91070c2f6280ae99b51/converter/converter.py
) to process an 85MB Shapefile that includes all telephone area codes
in the United States. After a short while, memory usage hovers around
8G, 50% of my system memory. Once the script attempts to write to
disk, usage jumps to 14G+ and causes my system to start swapping out.
I am a relative newbie when it comes to GIS data, and I have never
used any Python libraries to deal with such data, so please forgive my
ignorance.
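In case it helps frame the question: my working guess (an assumption on my part, not something I've confirmed in converter.py) is that the jump at write time comes from assembling the entire output in memory before a single write call. A streaming write, where each feature's path is emitted as soon as it's ready, would look roughly like this sketch (`write_paths_streaming` and the `(code, svg_path)` pairs are hypothetical stand-ins for whatever the converter actually produces):

```python
import json

def write_paths_streaming(features, out_path):
    """Write one JSON entry per feature incrementally, instead of
    building the whole output string in memory and writing it at once.

    `features` is any iterable of (code, svg_path) pairs -- a stand-in
    for the per-shape output the converter generates.
    """
    with open(out_path, "w") as out:
        out.write("{")
        first = True
        for code, path in features:
            if not first:
                out.write(",")
            # json.dumps on each small piece keeps peak memory at
            # roughly one feature's worth, not the whole file's.
            out.write("%s: %s" % (json.dumps(code), json.dumps(path)))
            first = False
        out.write("}")
```

If `features` is itself a generator, the peak memory for the write phase should stay near the size of a single feature.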
I am interested in making this run faster, or at least within my
machine's memory, if there's a reasonable way to do so. I am
currently doing an extensive run under Valgrind's massif tool to get
a proper memory profile, but initial runs seem to indicate most
memory is being consumed by C objects in libgeos. This is consistent
with the results from heapy, which pretty consistently show Python
objects taking only 10-12 MB of space in the program's early stages.
I was wondering if there is something relatively simple that could be
done to make the program release memory more promptly. Some of my
reading leads me to believe the issue may lie deeper than the
surface-level Python code. I still have more investigating to do, but
I thought I should get a message posted here quickly, since the list
will likely have better ideas than I do.
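For what it's worth, the pattern I've been experimenting with is processing features one at a time and dropping each reference as soon as it's consumed, so the GEOS-backed wrappers can be collected as I go rather than piling up until the end. A minimal sketch of the idea (the `handle` callable is a placeholder for the real geometry work, not a function from converter.py):

```python
import gc

def process_features(records, handle):
    """Yield results one at a time instead of keeping every processed
    geometry alive in a list until the final write. Each intermediate
    result is dropped as soon as the caller has consumed it."""
    for record in records:
        result = handle(record)   # stand-in for the per-shape geometry work
        yield result
        del result                # drop our reference immediately
    gc.collect()                  # one sweep at the end for any cycles

# Usage: consume lazily, so only one result is alive at a time.
# for path in process_features(records, shape_to_path):
#     out.write(path)
```

I don't know yet whether this helps when the memory is held by libgeos C allocations rather than Python objects, but it at least rules out the Python side holding everything alive.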
Thanks
Jeff