Transcode an OSM PBF file to Parquet files with hive-style partitioning by type:
planet.osm.pbf
parquet/
  type=node/
    node_0000.zstd.parquet
    ...
  type=relation/
    relation_0000.zstd.parquet
    ...
  type=way/
    way_0000.zstd.parquet
    ...
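As a rough illustration of how the hive-style layout can be consumed, the sketch below groups output files by their `type=` partition using only the Python standard library. The function name `group_by_partition` and the `./parquet` path are assumptions made for this example; they are not part of osm-pbf-parquet.

```python
from collections import defaultdict
from pathlib import Path


def group_by_partition(root, key="type"):
    """Map each hive partition value (e.g. 'node') to its parquet files.

    Hypothetical helper for illustration; not provided by osm-pbf-parquet.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*.parquet")):
        # Hive-style directories are named `<key>=<value>`, e.g. `type=node`.
        for part in path.relative_to(root).parts[:-1]:
            name, sep, value = part.partition("=")
            if sep and name == key:
                groups[value].append(path)
    return dict(groups)


if __name__ == "__main__":
    # Assumes the transcoder was run with `--output ./parquet`.
    for osm_type, files in group_by_partition("./parquet").items():
        print(f"{osm_type}: {len(files)} file(s)")
```

Query engines that understand hive partitioning can typically expose `type` as a column directly, so a helper like this is mainly useful for quick inspection or custom pipelines.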
Download the latest version for your OS from the releases page.
Example for an x86_64 Linux system with a pre-compiled binary:
curl -L "https://github.com/brad-richardson/osm-pbf-parquet/releases/latest/download/osm-pbf-parquet-x86_64-unknown-linux-gnu.tar.gz" -o "osm-pbf-parquet.tar.gz"
tar -xzf osm-pbf-parquet.tar.gz
chmod +x osm-pbf-parquet
./osm-pbf-parquet --input your.osm.pbf --output ./parquet
Or compile and run locally:
git clone https://github.com/brad-richardson/osm-pbf-parquet.git
cd osm-pbf-parquet
cargo run --release -- --input your.osm.pbf --output ./parquet
- Install Rust
- Clone repo
git clone https://github.com/brad-richardson/osm-pbf-parquet.git
- Make changes
- Run against a PBF file (regional PBF extracts are available from Geofabrik) with
cargo run -- --input your.osm.pbf
- Test with
cd test && ./prepare.sh && python3 validate.py
osm-pbf-parquet prioritizes transcode speed over file size, file count, or preserving ordering. Here is a comparison against similar tools on the 2024-06-24 OSM planet PBF with a target file size of 500MB:
Tool | Time (wall) | Output size | File count |
---|---|---|---|
osm-pbf-parquet (zstd:3) | 30 minutes | 182GB | ~600 |
osm-pbf-parquet (zstd:9) | 60 minutes | 165GB | ~600 |
osm-parquetizer | 196 minutes | 285GB | 3 |
osm2orc | 385 minutes | 110GB | 1 |
Test system:
i5-9400 (6 CPU, 32GB memory)
Ubuntu 24.04
OpenJDK 17
Rust 1.79.0
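To make the trade-off in the comparison table easier to read, the following sketch expresses each tool's wall time and output size relative to the `zstd:3` run. The numbers are copied from the table; the helper name `relative_to` is only an illustrative assumption.

```python
# Wall-clock minutes and output size in GB, copied from the comparison table.
results = {
    "osm-pbf-parquet (zstd:3)": (30, 182),
    "osm-pbf-parquet (zstd:9)": (60, 165),
    "osm-parquetizer": (196, 285),
    "osm2orc": (385, 110),
}


def relative_to(baseline, results):
    """Return (time ratio, size ratio) of every tool against the baseline."""
    base_minutes, base_gb = results[baseline]
    return {
        name: (round(minutes / base_minutes, 1), round(gb / base_gb, 2))
        for name, (minutes, gb) in results.items()
    }


ratios = relative_to("osm-pbf-parquet (zstd:3)", results)
for name, (time_ratio, size_ratio) in ratios.items():
    print(f"{name}: {time_ratio}x time, {size_ratio}x size")
```

On these figures, osm2orc produces the smallest output (about 0.6x the size) but takes roughly 12.8x as long, which matches the stated priority of speed over file size.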
Distributed under the MIT License. See LICENSE for more information.