GeoETL v0.3.0: GeoParquet Support
GeoETL v0.3.0 adds full GeoParquet format support with production-ready performance.
Key highlights:
- Full read/write support
- 3,315 MB/min processing throughput
- Streaming architecture with minimal memory
- Handles 100M+ features efficiently
What's New
GeoParquet is now a fully supported format driver in GeoETL, joining CSV and GeoJSON.
Format: Apache Parquet with WKB-encoded geometries and GeoArrow types
Use cases:
- Large-scale geospatial data processing
- Cloud storage (smaller files)
- Analytics pipelines
- Data archival
Performance Results
Benchmarked with the Microsoft Buildings dataset (up to 129M features):
Processing Speed (1M features)
| Operation | Throughput | Duration |
|---|---|---|
| GeoJSON → GeoJSON | 300 MB/min | 23s |
| CSV → CSV | 3,211 MB/min | 1s |
| GeoParquet → GeoParquet | 3,315 MB/min | 1s |
| GeoJSON → GeoParquet | 3,804 MB/min | 2s |
| CSV → GeoParquet | 3,211 MB/min | 1s |
Key finding: GeoJSON → GeoParquet conversion achieves the highest throughput (3,804 MB/min), so migrating existing GeoJSON data to GeoParquet is fast.
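To see how throughput and duration relate, here is a small arithmetic sketch. It assumes throughput is measured against the input file size (the post does not say so explicitly) and takes the 114.13 MB GeoJSON input from the File Size section. The function name is illustrative only.

```python
def duration_seconds(size_mb: float, throughput_mb_per_min: float) -> float:
    """Seconds needed to process size_mb at the given throughput."""
    return size_mb / throughput_mb_per_min * 60.0


# Assumption: throughput is relative to the input file size.
geojson_mb = 114.13  # 1M-feature GeoJSON input, from the File Size section

roundtrip = duration_seconds(geojson_mb, 300)     # GeoJSON -> GeoJSON
to_parquet = duration_seconds(geojson_mb, 3804)   # GeoJSON -> GeoParquet

print(f"GeoJSON -> GeoJSON:    {roundtrip:.1f} s")
print(f"GeoJSON -> GeoParquet: {to_parquet:.1f} s")
print(f"Speedup from writing GeoParquet: {roundtrip / to_parquet:.1f}x")
```

Under that assumption the computed durations (about 22.8 s and 1.8 s) line up with the 23s and 2s reported in the table.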
File Size (1M features)
| Format | Size | vs GeoJSON |
|---|---|---|
| GeoJSON | 114.13 MB | baseline |
| CSV | 32.11 MB | 3.5x smaller |
| GeoParquet | 16.86 MB | 6.8x smaller |
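The "vs GeoJSON" column is simply the GeoJSON size divided by each format's size. A minimal sketch of that calculation, using only the sizes from the table (the helper name is hypothetical):

```python
sizes_mb = {"GeoJSON": 114.13, "CSV": 32.11, "GeoParquet": 16.86}


def size_ratios(sizes: dict, baseline: str = "GeoJSON") -> dict:
    """How many times smaller each format is than the baseline format."""
    base = sizes[baseline]
    return {name: base / size for name, size in sizes.items()}


ratios = size_ratios(sizes_mb)
for name, ratio in ratios.items():
    # Fraction of the baseline size that is saved by using this format
    saved = 1.0 - 1.0 / ratio
    print(f"{name}: {ratio:.2f}x smaller, {saved:.0%} space saved")
```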
Memory Usage
All conversions stay under 250 MB of peak memory regardless of dataset size, which confirms that the streaming architecture works at scale.
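Why peak memory can stay flat as inputs grow is easiest to see in a toy model. The sketch below is not GeoETL's implementation; it is a generic illustration of batch streaming, with every name (`batches`, `stream_convert`, `fake_features`) invented for the example. Only one batch is held in memory at a time, so memory depends on the batch size rather than on the number of features.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batches(records: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield fixed-size lists of records without materializing the whole input."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def stream_convert(source: Iterable[dict],
                   write_batch: Callable[[List[dict]], None],
                   batch_size: int = 8192) -> int:
    """Read, hand off, and drop one batch at a time; return the record count."""
    total = 0
    for batch in batches(source, batch_size):
        write_batch(batch)  # a real writer would encode and flush here
        total += len(batch)
    return total


def fake_features(n: int) -> Iterator[dict]:
    """Hypothetical lazy source standing in for a file reader."""
    for i in range(n):
        yield {"id": i, "wkt": f"POINT ({i} {i})"}


seen_sizes: List[int] = []
count = stream_convert(fake_features(100_000),
                       lambda batch: seen_sizes.append(len(batch)),
                       batch_size=1_000)
print(count, "records processed; largest batch held:", max(seen_sizes))
```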
Scalability Test (129M features)
| Format | Input Size | Processing Time | Peak Memory |
|---|---|---|---|
| GeoJSON | 14.5 GB | 50 minutes | 84 MB |
| GeoParquet | ~4 GB | ~2 minutes (projected) | <100 MB |
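As a rough cross-check of these numbers, the sketch below derives rates from the table. It assumes 1 GB = 1,000 MB and that the 1M-feature GeoParquet throughput (3,315 MB/min) holds at full scale; both are assumptions, not statements from the benchmark.

```python
MB_PER_GB = 1_000  # assumption: decimal gigabytes

# Measured GeoJSON run: 14.5 GB, 129M features, 50 minutes
geojson_rate = 14.5 * MB_PER_GB / 50            # MB per minute
features_per_sec = 129_000_000 / (50 * 60)

# Projection for ~4 GB of GeoParquet at the 1M-feature throughput
projected_min = 4 * MB_PER_GB / 3_315

print(f"GeoJSON at scale: {geojson_rate:.0f} MB/min, "
      f"{features_per_sec:,.0f} features/s")
print(f"GeoParquet projection: {projected_min:.1f} min")
```

The GeoJSON rate at 129M features (about 290 MB/min) is close to the 300 MB/min measured at 1M features, which suggests roughly linear scaling. The GeoParquet estimate comes out slightly above one minute, in the same range as the rounded ~2 minute projection in the table.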
Getting Started
Installation
Download GeoETL v0.3.0: GitHub Releases
Basic Usage
# GeoJSON to GeoParquet
geoetl-cli convert \
  --input data.geojson \
  --output data.parquet \
  --input-driver GeoJSON \
  --output-driver GeoParquet

# CSV to GeoParquet
geoetl-cli convert \
  --input data.csv \
  --output data.parquet \
  --input-driver CSV \
  --output-driver GeoParquet \
  --geometry-column WKT

# GeoParquet to GeoJSON
geoetl-cli convert \
  --input data.parquet \
  --output data.geojson \
  --input-driver GeoParquet \
  --output-driver GeoJSON
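For scripted or batch conversions, the same commands can be assembled programmatically. The helper below is a hypothetical wrapper, not part of GeoETL; it uses only the flags shown in the commands above and just builds the argument list, leaving execution to the caller.

```python
import shlex
from typing import List, Optional


def build_convert_command(input_path: str,
                          output_path: str,
                          input_driver: str,
                          output_driver: str,
                          geometry_column: Optional[str] = None) -> List[str]:
    """Assemble a geoetl-cli convert invocation as an argv list."""
    cmd = [
        "geoetl-cli", "convert",
        "--input", input_path,
        "--output", output_path,
        "--input-driver", input_driver,
        "--output-driver", output_driver,
    ]
    # The CSV example passes the name of the column holding WKT geometries
    if geometry_column is not None:
        cmd += ["--geometry-column", geometry_column]
    return cmd


cmd = build_convert_command("data.csv", "data.parquet",
                            "CSV", "GeoParquet", geometry_column="WKT")
print(shlex.join(cmd))
# To run it (requires geoetl-cli on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.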
Architecture
GeoETL's GeoParquet implementation:
- Streaming: Constant O(1) memory regardless of file size
- Native types: GeoArrow Point, LineString, Polygon, Multi*, etc.
- Standard encoding: WKB (Well-Known Binary)
- Metadata: CRS, bounding boxes, schema preservation
Technical details: Architecture ADR 004
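To make the WKB encoding concrete, here is a minimal sketch of how a 2D point is laid out in Well-Known Binary: a 1-byte byte-order flag, a 4-byte geometry type code (1 for Point), then the X and Y coordinates as 8-byte doubles. This illustrates the standard format only; it says nothing about how GeoETL encodes geometries internally, and the function names are invented for the example.

```python
import struct

WKB_LITTLE_ENDIAN = 1  # byte-order flag value for little-endian
WKB_POINT = 1          # geometry type code for a 2D Point


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (21 bytes)."""
    # '<' = little-endian, no padding; B = flag, I = uint32 type, dd = two float64
    return struct.pack("<BIdd", WKB_LITTLE_ENDIAN, WKB_POINT, x, y)


def wkb_to_point(data: bytes) -> tuple:
    """Decode a 2D WKB point, honoring the byte-order flag."""
    order = "<" if data[0] == WKB_LITTLE_ENDIAN else ">"
    geom_type, x, y = struct.unpack(order + "Idd", data[1:21])
    if geom_type != WKB_POINT:
        raise ValueError(f"not a 2D point, got type code {geom_type}")
    return (x, y)


wkb = point_to_wkb(-122.33, 47.61)
print(len(wkb), "bytes, header:", wkb[:5].hex())
print(wkb_to_point(wkb))
```

The fixed binary layout is part of why WKB geometries are more compact than their GeoJSON text equivalents.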
Documentation
- Supported Drivers Reference - Complete driver documentation
- Working with GeoParquet Tutorial
- Benchmark Results
- GeoParquet Specification
What's Next
- Next (v0.4.0): FlatGeobuf format support
See full Roadmap
Community
- GitHub Discussions - Ask questions
- GitHub Issues - Report bugs
- Documentation - Learn more
Download: GeoETL v0.3.0
