Further OSM/PostGIS work
It took more than two weeks to load osmosis’s dump into postgres. We need a faster development cycle than that, so we’re going to start with just the BC data, to whet our appetites, and to try out some other software that may be better. Hailey heard that there’s a new tool called imposm that’s multithreaded. Maybe we can try that.
First I had to remember how to create the database. Pretty easy:
# CREATE DATABASE osm_bc TEMPLATE template_postgis TABLESPACE osmspace;
imposm doesn’t seem to support subsetting to a bounding box, so I turned back to osmosis for that. There’s probably an easier way, but to get the bounding box, I used qgis, set the map projection to WGS84, enabled reprojection on the fly, and then brought in a layer with the BC political boundaries. Then I subsetted the planet.osm like this:
hiebert@windy:/home/data/gis/osm/pgimport_bc$ osmosis --read-xml enableDateParsing=no file=../planet-latest.osm --bounding-box bottom=48.15 top=60 left=-139.25 right=-114 --used-node idTrackerType=BitSet --write-xml bc-latest.osm
16-Sep-2011 11:56:54 AM org.openstreetmap.osmosis.core.Osmosis run
INFO: Osmosis Version 0.34
log4j:WARN No appenders could be found for logger (org.java.plugin.ObjectFactory).
log4j:WARN Please initialize the log4j system properly.
16-Sep-2011 11:56:54 AM org.openstreetmap.osmosis.core.Osmosis run
INFO: Preparing pipeline.
16-Sep-2011 11:56:54 AM org.openstreetmap.osmosis.core.Osmosis run
INFO: Launching pipeline execution.
16-Sep-2011 11:56:54 AM org.openstreetmap.osmosis.core.Osmosis run
INFO: Pipeline executing, waiting for completion.
16-Sep-2011 1:50:39 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Pipeline complete.
16-Sep-2011 1:50:39 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Total execution time: 6824958 milliseconds.
hiebert@windy:/home/data/gis/osm/pgimport_bc$ ls -lh
total 4.8G
-rw-rw-r-- 1 hiebert staff 4.5G Sep 16 13:50 bc-latest.osm
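The four bounding-box values above were read off by eye in qgis. If the boundary is available as coordinates, the same numbers could be computed instead. This is only a minimal sketch: the function names and the sample points are made up for illustration (the points were picked so the extent matches the values used above), and only the bottom/top/left/right argument names come from the osmosis command itself.

```python
from typing import Dict, Iterable, Tuple


def bounding_box(points: Iterable[Tuple[float, float]]) -> Dict[str, float]:
    """Return the extent of (lon, lat) points, keyed the way osmosis names them."""
    pts = list(points)
    if not pts:
        raise ValueError("need at least one point")
    lons = [lon for lon, _ in pts]
    lats = [lat for _, lat in pts]
    return {
        "bottom": min(lats),
        "top": max(lats),
        "left": min(lons),
        "right": max(lons),
    }


def osmosis_bbox_args(bbox: Dict[str, float]) -> str:
    """Format the extent as the --bounding-box task arguments."""
    order = ("bottom", "top", "left", "right")
    return "--bounding-box " + " ".join(f"{key}={bbox[key]}" for key in order)


# Hypothetical (lon, lat) points, NOT real boundary data.
sample = [(-139.25, 60.0), (-114.0, 48.15), (-123.1, 49.3)]
print(osmosis_bbox_args(bounding_box(sample)))
```

A plain min/max like this only makes sense in WGS84 lon/lat and for regions that don't cross the antimeridian, which is fine for BC.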
FWIW: someone’s wiki page about benchmarking osmosis says that the best thing you can do to speed it up is enableDateParsing=no. Everything else seems to be pretty minor. There doesn’t seem to be any way to use multiple cores unless you’re decompressing within the pipeline. After subsetting, I tried to use imposm to load into the database, but it doesn’t work at all. And the error messages are completely indecipherable to me (and I speak python!).
postgres@windy:/home/data4/gis/osm/pgimport_bc$ imposm --read --concurrency 2 --write --database osm_bc --user postgres --optimize bc-latest.osm
password for postgres at localhost:
[13:58:20] ## reading bc-latest.osm
Process CacheWriterProcess-2:
Traceback (most recent call last):
[13:58:20] coords: 24068k nodes: 481k ways: 3438k relations: 24k (estimated)
  File "/usr/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
    self.run()
  File "/usr/local/lib/python2.6/dist-packages/imposm/reader.py", line 117, in run
    cache = self.cache(mode='w', estimated_records=self.estimated_records)
  File "/usr/local/lib/python2.6/dist-packages/imposm/cache/osm.py", line 37, in coords_cache
    return self._x_cache(self.coords_fname, DeltaCoordsDB, mode, estimated_records)
  File "/usr/local/lib/python2.6/dist-packages/imposm/cache/osm.py", line 62, in _x_cache
    cache = x_class(x, mode, estimated_records=estimated_records)
  File "tc.pyx", line 393, in imposm.cache.tc.DeltaCoordsDB.__init__ (imposm/cache/tc.c:5291)
Process CacheWriterProcess-3:
Traceback (most recent call last):
  File "/usr/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
  File "tc.pyx", line 104, in imposm.cache.tc.BDB.__init__ (imposm/cache/tc.c:1263)
    self.run()
  File "/usr/local/lib/python2.6/dist-packages/imposm/reader.py", line 117, in run
    cache = self.cache(mode='w', estimated_records=self.estimated_records)
Process CacheWriterProcess-4:
  File "/usr/local/lib/python2.6/dist-packages/imposm/cache/osm.py", line 40, in nodes_cache
Traceback (most recent call last):
  File "/usr/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
    return self._x_cache(self.nodes_fname, NodeDB, mode, estimated_records)
  File "/usr/local/lib/python2.6/dist-packages/imposm/cache/osm.py", line 62, in _x_cache
    cache = x_class(x, mode, estimated_records=estimated_records)
  File "tc.pyx", line 104, in imposm.cache.tc.BDB.__init__ (imposm/cache/tc.c:1263)
IOError: 4
    self.run()
  File "/usr/local/lib/python2.6/dist-packages/imposm/reader.py", line 117, in run
    cache = self.cache(mode='w', estimated_records=self.estimated_records)
  File "/usr/local/lib/python2.6/dist-packages/imposm/cache/osm.py", line 43, in ways_cache
IOError: 4
    return self._x_cache(self.ways_fname, WayDB, mode, estimated_records)
  File "/usr/local/lib/python2.6/dist-packages/imposm/cache/osm.py", line 62, in _x_cache
    cache = x_class(x, mode, estimated_records=estimated_records)
  File "tc.pyx", line 104, in imposm.cache.tc.BDB.__init__ (imposm/cache/tc.c:1263)
Process CacheWriterProcess-5:
Traceback (most recent call last):
IOError: 4
  File "/usr/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
    self.run()
  File "/usr/local/lib/python2.6/dist-packages/imposm/reader.py", line 117, in run
    cache = self.cache(mode='w', estimated_records=self.estimated_records)
  File "/usr/local/lib/python2.6/dist-packages/imposm/cache/osm.py", line 53, in relations_cache
    return self._x_cache(self.relations_fname, RelationDB, mode, estimated_records)
  File "/usr/local/lib/python2.6/dist-packages/imposm/cache/osm.py", line 62, in _x_cache
    cache = x_class(x, mode, estimated_records=estimated_records)
  File "tc.pyx", line 104, in imposm.cache.tc.BDB.__init__ (imposm/cache/tc.c:1263)
IOError: 4
^CTraceback (most recent call last):
  File "/usr/local/bin/imposm", line 9, in <module>
    load_entry_point('imposm==2.3.2', 'console_scripts', 'imposm')()
  File "/usr/local/lib/python2.6/dist-packages/imposm/app.py", line 217, in main
Process ParserProgress-1:
    reader.read(arg)
  File "/usr/local/lib/python2.6/dist-packages/imposm/reader.py", line 88, in read
Traceback (most recent call last):
    parser.parse(filename)
  File "/usr/local/lib/python2.6/dist-packages/imposm/parser/simple.py", line 64, in parse
    return self.parse_xml_file(filename)
  File "/usr/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
  File "/usr/local/lib/python2.6/dist-packages/imposm/parser/simple.py", line 82, in parse_xml_file
    return self._parse(input, XMLMultiProcParser)
  File "/usr/local/lib/python2.6/dist-packages/imposm/parser/simple.py", line 132, in _parse
    callback(items)
  File "/usr/lib/python2.6/multiprocessing/queues.py", line 287, in put
    if not self._sem.acquire(block, timeout):
KeyboardInterrupt
    self.run()
  File "/usr/local/lib/python2.6/dist-packages/imposm/util.py", line 51, in run
    log_statement = self.queue.get()
  File "/usr/lib/python2.6/multiprocessing/queues.py", line 91, in get
    res = self._recv()
KeyboardInterrupt
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/usr/lib/python2.6/multiprocessing/util.py", line 269, in _exit_function
    p.join()
  File "/usr/lib/python2.6/multiprocessing/process.py", line 119, in join
    res = self._popen.wait(timeout)
  File "/usr/lib/python2.6/multiprocessing/forking.py", line 117, in wait
    return self.poll(0)
  File "/usr/lib/python2.6/multiprocessing/forking.py", line 106, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
Error in sys.exitfunc:
Traceback (most recent call last):
  File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/usr/lib/python2.6/multiprocessing/util.py", line 269, in _exit_function
    p.join()
  File "/usr/lib/python2.6/multiprocessing/process.py", line 119, in join
    res = self._popen.wait(timeout)
  File "/usr/lib/python2.6/multiprocessing/forking.py", line 117, in wait
    return self.poll(0)
  File "/usr/lib/python2.6/multiprocessing/forking.py", line 106, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
Back to osmosis. Only using a subset, it generates the dump lots faster (6 minutes).
hiebert@windy:/home/data/gis/osm/pgimport_bc$ JAVACMD_OPTIONS="-Xmx10g" osmosis --read-xml file="bc-latest.osm" --used-node idTrackerType=BitSet --write-pgsql-dump directory="./pgdump" enableBboxBuilder="no" enableLinestringBuilder="no" nodeLocationStoreType="InMemory"
16-Sep-2011 2:15:30 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Osmosis Version 0.34
log4j:WARN No appenders could be found for logger (org.java.plugin.ObjectFactory).
log4j:WARN Please initialize the log4j system properly.
16-Sep-2011 2:15:30 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Preparing pipeline.
16-Sep-2011 2:15:30 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Launching pipeline execution.
16-Sep-2011 2:15:30 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Pipeline executing, waiting for completion.
16-Sep-2011 2:21:18 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Pipeline complete.
16-Sep-2011 2:21:18 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Total execution time: 347656 milliseconds.
Then the resulting dump loads in less than an hour (praise Jesus).
osm_bc=# \i /usr/share/doc/osmosis/examples/pgsql_simple_schema_0.6_linestring.sql
addgeometrycolumn
--------------------------------------------------------
public.ways.linestring SRID:4326 TYPE:GEOMETRY DIMS:2
(1 row)
CREATE INDEX
osm_bc=# \i /usr/share/doc/osmosis/examples/pgsql_simple_schema_0.6_bbox.sql
addgeometrycolumn
--------------------------------------------------
public.ways.bbox SRID:4326 TYPE:GEOMETRY DIMS:2
(1 row)
CREATE INDEX
osm_bc=# \i /usr/share/doc/osmosis/examples/pgsql_simple_load_0.6.sql
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
psql:/usr/share/doc/osmosis/examples/pgsql_simple_load_0.6.sql:6: ERROR: index "idx_nodes_action" does not exist
DROP INDEX
DROP INDEX
psql:/usr/share/doc/osmosis/examples/pgsql_simple_load_0.6.sql:9: ERROR: index "idx_ways_action" does not exist
DROP INDEX
DROP INDEX
psql:/usr/share/doc/osmosis/examples/pgsql_simple_load_0.6.sql:12: ERROR: index "idx_relations_action" does not exist
DROP INDEX
DROP INDEX
DROP INDEX
dropgeometrycolumn
---------------------------------------
public.ways.bbox effectively removed.
(1 row)
dropgeometrycolumn
---------------------------------------------
public.ways.linestring effectively removed.
(1 row)
psql:/usr/share/doc/osmosis/examples/pgsql_simple_load_0.6.sql:57430381: NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "pk_nodes" for table "nodes"
ALTER TABLE
psql:/usr/share/doc/osmosis/examples/pgsql_simple_load_0.6.sql:57430382: NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "pk_ways" for table "ways"
ALTER TABLE
psql:/usr/share/doc/osmosis/examples/pgsql_simple_load_0.6.sql:57430383: NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "pk_way_nodes" for table "way_nodes"
ALTER TABLE
psql:/usr/share/doc/osmosis/examples/pgsql_simple_load_0.6.sql:57430384: NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "pk_relations" for table "relations"
ALTER TABLE
psql:/usr/share/doc/osmosis/examples/pgsql_simple_load_0.6.sql:57430385: ERROR: column "action" does not exist
CREATE INDEX
CREATE INDEX
psql:/usr/share/doc/osmosis/examples/pgsql_simple_load_0.6.sql:57430388: ERROR: column "action" does not exist
CREATE INDEX
CREATE INDEX
psql:/usr/share/doc/osmosis/examples/pgsql_simple_load_0.6.sql:57430391: ERROR: column "action" does not exist
CREATE INDEX
addgeometrycolumn
--------------------------------------------------
public.ways.bbox SRID:4326 TYPE:GEOMETRY DIMS:2
(1 row)
addgeometrycolumn
--------------------------------------------------------
public.ways.linestring SRID:4326 TYPE:GEOMETRY DIMS:2
(1 row)
UPDATE 1198178
psql:/usr/share/doc/osmosis/examples/pgsql_simple_load_0.6.sql:57430415: ERROR: syntax error at or near "CREATE"
LINE 10: CREATE INDEX idx_ways_bbox ON ways USING gist (bbox);
         ^
CREATE INDEX
psql:/usr/share/doc/osmosis/examples/pgsql_simple_load_0.6.sql:57430419: NOTICE: no notnull values, invalid stats
VACUUM
osm_bc=#
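The osmosis logs above report run time only in milliseconds, which is awkward to compare at a glance. Here is a rough sketch of pulling that number out of a log and converting it; the helper names are my own invention, and the only thing assumed about osmosis is the "Total execution time: N milliseconds" line exactly as it appears in the logs above.

```python
import re
from typing import Optional


def total_execution_ms(log_text: str) -> Optional[int]:
    """Extract the millisecond count from an osmosis log, or None if absent."""
    match = re.search(r"Total execution time: (\d+) milliseconds", log_text)
    return int(match.group(1)) if match else None


def format_duration(ms: int) -> str:
    """Render milliseconds as hours, minutes and whole seconds."""
    seconds, _ = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}h {minutes:02d}m {seconds:02d}s"


# The two timing lines copied from the logs above.
planet_subset = "INFO: Total execution time: 6824958 milliseconds."
bc_dump = "INFO: Total execution time: 347656 milliseconds."

for label, line in (("subset planet", planet_subset), ("write BC dump", bc_dump)):
    print(label, format_duration(total_execution_ms(line)))
```

That works out to a little under two hours for cutting BC out of the planet file, versus just under six minutes for writing the BC dump.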