Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[0.7.0] - 2025-12-07
Added
- 29 New Spatial SQL Functions powered by GEOS:
  - Binary Predicates (10 functions)
    - ST_Intersects: True if geometries share any space
    - ST_Contains: True if first geometry contains second
    - ST_Within: True if first geometry is within second
    - ST_Overlaps: True if geometries overlap but neither contains the other
    - ST_Touches: True if geometries touch at boundary only
    - ST_Crosses: True if geometries cross each other
    - ST_Disjoint: True if geometries share no space
    - ST_Equals: True if geometries are spatially equal
    - ST_Covers: True if first geometry covers second
    - ST_CoveredBy: True if first geometry is covered by second
  - Unary Validators (5 functions)
    - ST_IsValid: True if geometry is valid
    - ST_IsEmpty: True if geometry is empty
    - ST_IsSimple: True if geometry has no self-intersection
    - ST_IsClosed: True if linestring is closed
    - ST_IsRing: True if linestring is closed and simple
  - Geometry Generators (6 functions)
    - ST_Envelope: Bounding box of geometry
    - ST_ConvexHull: Convex hull of geometry
    - ST_Boundary: Boundary of geometry
    - ST_PointOnSurface: Point guaranteed to be on surface
    - ST_Simplify: Douglas-Peucker simplification
    - ST_SimplifyPreserveTopology: Topology-preserving simplification
  - Set Operations (2 functions)
    - ST_Difference: First geometry minus second
    - ST_SymDifference: Area in either but not both
  - Accessors (6 functions)
    - ST_X: X coordinate of point
    - ST_Y: Y coordinate of point
    - ST_NumPoints: Count of points in geometry
    - ST_NumGeometries: Count of geometries in collection
    - ST_GeometryType: Type name (Point, Polygon, etc.)
    - ST_Dimension: Topological dimension (0, 1, or 2)
  - Geometry Operations (4 functions)
    - ST_Buffer: Create buffer around geometry
    - ST_Centroid: Centroid of geometry
    - ST_Intersection: Shared area between geometries
    - ST_Union: Combined area of geometries
- Shared GEOS Helpers Module
  - New geos_helpers.rs module for consistent GeoArrow-to-GEOS geometry conversion
  - Reduces code duplication across spatial UDFs by ~500 lines
Changed
- Refactored Spatial UDFs
  - All GEOS-based spatial UDFs now use shared helper module
  - Improved error messages with row-level context
  - Better null value propagation
[0.6.0] - 2025-12-07
Added
- ST_Area Spatial UDF (#spatial-udfs)
  - Calculate area of polygon geometries using GEOS
  - Returns 0 for Points and LineStrings (non-areal geometries)
  - Supports WKB binary, GeoArrow native types, and mixed geometry arrays
  - Proper null value propagation
- ST_Length Spatial UDF (#spatial-udfs)
  - Calculate length of LineStrings and perimeter of Polygons using GEOS
  - Returns 0 for Point geometries
  - Supports WKB binary, GeoArrow native types, and mixed geometry arrays
  - Proper null value propagation
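As a rough illustration of the ST_Area and ST_Length semantics listed above (zero for non-areal or point input, nulls propagated), here is a minimal Rust sketch. It does not use GEOS: the Geometry enum and the st_area and st_length helpers are hypothetical stand-ins written for this example, not the project's UDF code, and polygons are reduced to a single closed exterior ring.

```rust
/// Hypothetical, simplified geometry model for illustration only.
#[derive(Debug, Clone)]
pub enum Geometry {
    Point(f64, f64),
    LineString(Vec<(f64, f64)>),
    /// Exterior ring only; first and last vertex are expected to match.
    Polygon(Vec<(f64, f64)>),
}

/// Sum of straight segment lengths along a coordinate sequence.
fn path_length(coords: &[(f64, f64)]) -> f64 {
    coords
        .windows(2)
        .map(|w| ((w[1].0 - w[0].0).powi(2) + (w[1].1 - w[0].1).powi(2)).sqrt())
        .sum()
}

/// Area via the shoelace formula; 0 for Points and LineStrings.
/// A null input (None) yields a null output.
pub fn st_area(geom: Option<&Geometry>) -> Option<f64> {
    geom.map(|g| match g {
        Geometry::Polygon(ring) => {
            let twice_area: f64 = ring
                .windows(2)
                .map(|w| w[0].0 * w[1].1 - w[1].0 * w[0].1)
                .sum();
            twice_area.abs() / 2.0
        }
        _ => 0.0,
    })
}

/// Length of a LineString or perimeter of a Polygon; 0 for Points.
/// A null input (None) yields a null output.
pub fn st_length(geom: Option<&Geometry>) -> Option<f64> {
    geom.map(|g| match g {
        Geometry::Point(..) => 0.0,
        Geometry::LineString(coords) => path_length(coords),
        Geometry::Polygon(ring) => path_length(ring),
    })
}
```

Modeling the input as Option keeps null handling in one place: each function maps over the option, so a missing geometry never reaches the arithmetic.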
Changed
- Spatial UDF Documentation
  - Updated module documentation with new functions
  - Added ST_Area and ST_Length to registered UDF list
[0.5.0] - 2025-11-28
Added
- Spatial UDFs for SQL Queries (#5abd25d, #1545ef1)
  - ST_Distance: Calculate minimum distance between geometries using GEOS
    - Supports WKT strings, WKB binary, and GeoArrow native types
    - Optimized point-to-point distance using direct Euclidean calculation
    - Works with mixed geometry types from GeoJSON (Union arrays)
  - ST_Point / ST_MakePoint: Create point geometry from X, Y coordinates
  - ST_GeomFromText: Parse WKT strings to GeoArrow WKB format
  - ST_GeomFromWKB: Validate and tag WKB binary as GeoArrow geometry
- GeoArrow Native Type Support (#5abd25d)
  - New geoarrow_types.rs module with extension type utilities
  - Support for GeoArrow Struct (Point with x/y fields)
  - Support for GeoArrow FixedSizeList (interleaved coordinates)
  - Proper null value propagation across all geometry operations
- Static GEOS Linking (#855abb5)
  - Added geos-static feature flag for portable release binaries
  - Release binaries no longer require GEOS to be installed on target systems
  - CI uses dynamic linking for speed; releases use static linking for portability
Changed
- Architecture Refactoring
  - Separated construction functions from spatial operations
  - Added geozero dependency for geometry format conversion
  - geoetl-core now depends on geoetl-operations for spatial UDF registration
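The point-to-point fast path noted for ST_Distance in this release can be pictured with a short Rust sketch. This is only an assumption about its shape: the Point struct and point_distance function are hypothetical names for this example, and the real UDF operates on Arrow arrays rather than single values.

```rust
/// Hypothetical 2D point used only for this illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: f64,
    pub y: f64,
}

/// Direct Euclidean distance between two points, skipping any
/// general-purpose geometry engine. If either input is null (None),
/// the result is null as well.
pub fn point_distance(a: Option<Point>, b: Option<Point>) -> Option<f64> {
    match (a, b) {
        // hypot computes sqrt(dx^2 + dy^2) without intermediate overflow.
        (Some(a), Some(b)) => Some((a.x - b.x).hypot(a.y - b.y)),
        _ => None,
    }
}
```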
[0.4.2] - 2025-11-15
Fixed
- GeoParquet CRS Handling (#0e287dc)
  - Extract CRS from GeoParquet schema-level metadata
  - Properly read coordinate reference system information from all GeoParquet files
- Table Name Sanitization (#343355a)
  - Sanitize table names to handle special characters in filenames
  - Prevents SQL parsing errors when filenames contain spaces or special characters
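One plausible way to sanitize a filename-derived table name is sketched below in Rust. The exact rules used by the project are not stated here, so the sanitize_table_name function and its replacement policy (underscore for anything outside ASCII letters, digits, and underscore; leading underscore when the name is empty or starts with a digit) are assumptions for illustration.

```rust
/// Hypothetical helper: turn an arbitrary string into a name that is
/// safe to use as an unquoted SQL identifier. The project's actual
/// rules may differ.
pub fn sanitize_table_name(raw: &str) -> String {
    // Replace every character that is not an ASCII letter, digit, or
    // underscore with an underscore.
    let mut name: String = raw
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' { c } else { '_' })
        .collect();

    // Identifiers should not be empty or begin with a digit.
    if name.is_empty() || name.starts_with(|c: char| c.is_ascii_digit()) {
        name.insert(0, '_');
    }
    name
}
```

For example, under these assumed rules a file stem such as "my cities (2024)" becomes "my_cities__2024_", which parses as a plain identifier.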
[0.4.1] - 2025-11-10
Changed
- CI Optimization (#5b481f0)
  - Optimized CI pipeline by combining build, test, and coverage jobs
  - Added e2e test coverage flag
- Code Refactoring (#5ba905c)
  - Moved table name inference to format-specific implementations
  - Improved modularity and separation of concerns
- Documentation Improvements (#6dbd918)
  - Improved website with custom icons and accurate content
Dependencies
- Enhanced Dependabot configuration with comprehensive monitoring
[0.4.0] - 2025-11-08
Added
- SQL Query Support (#54e1ab5)
  - Added --sql flag to execute SQL queries on input datasets during conversion
  - Added --table-name flag to override auto-inferred table names
  - Automatic table name inference from input filenames (e.g., cities.csv → table "cities")
  - Full DataFusion SQL capabilities: WHERE, SELECT, JOIN, GROUP BY, ORDER BY, LIMIT
  - Support for filtering, column selection, aggregations, sorting, and limiting results
  - Enables data transformation workflows without intermediate files
- Comprehensive SQL Testing
  - Added 7 integration tests covering SQL query functionality:
    - SQL filtering with WHERE clauses
    - Column selection with SELECT
    - Aggregations with GROUP BY
    - Sorting with ORDER BY
    - Custom table name overrides
    - Invalid SQL query error handling
    - Multi-step filter and transform workflows
- SQL Documentation
  - Added 5 SQL query examples to convert.md (Examples 7-11)
  - Added 3 common SQL workflow examples
  - Documented table name inference behavior
  - Added data processing options section
- Release Blog Posts (#6aa66f6)
  - Added missing release blog posts for v0.1.2, v0.2.0, and v0.3.1
  - Created release blog post guidelines in docs/README_blog_release_post.md
Changed
- Documentation Restructure (#8526f07)
  - Established single source of truth for documentation in website
  - Removed promotional content and duplicated documentation
  - Added comprehensive community section (changelog, contributing, roadmap)
  - Reorganized drivers documentation with dedicated vector format pages
  - Added detailed getting started guides and tutorials
  - Created programs reference section with command documentation
  - Added FAQ and glossary for better discoverability
  - Improved troubleshooting guide
  - Total: 4,442 additions, 1,628 deletions across 38 files
- convert operation now accepts optional sql_query and table_name_override parameters
- initialize_context function now returns both SessionContext and inferred/custom table name
- All existing tests updated to include new optional SQL parameters
Breaking Changes
None - all new parameters are optional and backward compatible
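The table name behavior described in this release (infer from the input filename unless an override is given) might look roughly like the Rust sketch below. The resolve_table_name function is a hypothetical name for this example; only the cities.csv → "cities" mapping and the existence of an optional override come from the entries above.

```rust
use std::path::Path;

/// Hypothetical helper: pick the table name for a conversion.
/// An explicit override wins; otherwise the file stem of the input
/// path is used (e.g. "data/cities.csv" gives "cities").
/// Returns None when no name can be derived from the path.
pub fn resolve_table_name(input: &str, table_name_override: Option<&str>) -> Option<String> {
    if let Some(name) = table_name_override {
        return Some(name.to_string());
    }
    Path::new(input)
        .file_stem()
        .and_then(|stem| stem.to_str())
        .map(|stem| stem.to_string())
}
```

Because the override is an Option, callers that never pass it keep the inferred name, which matches the backward-compatibility note above.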
[0.3.1] - 2025-11-06
Added
- Shell Completions Support (#218975b)
  - Added completions subcommand to generate shell completion scripts
  - Support for 5 shells: bash, zsh, fish, powershell, and elvish
  - Enables tab completion for commands, subcommands, and options
  - Updated documentation with installation instructions and examples
- New Geospatial Format Scaffolding (#c0f4932)
  - Arrow IPC format module for zero-copy data exchange
  - GeoPackage format module for SQLite-based vector data
  - OpenStreetMap (OSM) format module for OSM PBF/XML data
  - Shapefile format module for ESRI Shapefile support
- GeoParquet Streaming I/O Enhancements (#9631d93)
  - Implemented statistics inference for improved performance
  - Enhanced streaming I/O capabilities
  - Reduced memory usage for large file processing
Changed
- Documentation Updates
  - Refactored GeoParquet ADR to follow Michael Nygard template (#40a0a7c)
  - Added shell completions documentation to README.md, QUICKREF.md, and doc site
  - Removed version-specific annotations from documentation
Dependencies
- Upgraded geoarrow from 0.5.0 to 0.6.2
- Added clap_complete 4.5.50 for shell completion generation
Removed
- Removed performance tests from the GeoParquet module in end-to-end tests
[0.3.0] - 2025-11-04
Added
- GeoParquet Format Support (ADR 004)
  - Implemented production-ready GeoParquet format with Apache Arrow and GeoArrow integration
  - Full read/write support with WKB (Well-Known Binary) geometry encoding
  - Streaming architecture with O(1) memory complexity
  - Native GeoArrow types: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon
  - Schema preservation and GeoParquet metadata handling (CRS, bbox)
  - DataFusion FileFormat and DataSink integration
- InsertOp::Overwrite Support
  - Added Overwrite mode for GeoJSON writer
  - Added Overwrite mode for CSV writer
  - Enables file replacement without manual deletion
- E2E Test Infrastructure for GeoParquet
  - Comprehensive E2E tests with natural-earth dataset
  - GeoParquet roundtrip conversion tests
  - Cross-format conversion tests (GeoJSON ↔ GeoParquet, CSV ↔ GeoParquet)
  - Test data: natural-earth_cities.parquet (15KB, 243 features)
- Comprehensive Documentation
  - New tutorials for working with GeoParquet and GeoJSON
  - New reference pages for supported drivers
  - Blog post announcing GeoParquet support
  - Updated documentation with GeoParquet features
- GeoParquet Benchmarks (bench/README.md)
  - Added comprehensive GeoParquet benchmark suite
  - Performance comparison tables for all three formats
  - Conversion benchmark commands and results
  - Analysis of compression, throughput, and memory efficiency
Changed
- Documentation Structure
  - Removed common-operations.md - consolidated into format-specific tutorials
  - Reorganized tutorial flow: Installation → First Conversion → Drivers → GeoJSON → CSV → GeoParquet → Troubleshooting
  - Added Reference section with Supported Drivers page
  - Updated all cross-references and navigation links
- Driver Count Accuracy
  - Updated documentation from "68+ drivers" to accurate "3 drivers (GeoJSON, CSV, GeoParquet)"
  - Added note about 68+ planned drivers via GDAL integration
  - Updated all driver count references throughout documentation
Dependencies
- Added parquet v53.3.0
- Added geoarrow v0.5.0
Breaking Changes
None
[0.2.0] - 2025-11-03
Added
- Streaming CSV Architecture (ADR 002)
  - Implemented production-ready CSV format support with inline WKT geometry conversion
  - O(1) memory complexity: processes large files in constant memory
  - Production-ready throughput
  - Automatic single-partition write enforcement for proper CSV formatting
  - Inline geometry ↔ WKT conversion during streaming (no buffering)
- Configurable Batch Size (#156)
  - Added --batch-size CLI parameter for performance tuning
  - Default: 8,192 features (conservative, memory-efficient)
  - Users can tune memory/speed tradeoff for their workload
- Configurable Partitioning Parameters
  - Added --read-partitions CLI parameter to control parallel reading
  - Added --write-partitions CLI parameter to control parallel writing
  - CSV and GeoJSON formats automatically override write partitions to 1 with warning
  - Enables future parallel processing optimizations for other formats
- GeoJSON Incremental Decoder (ADR 001)
  - Implemented state machine-based incremental JSON parsing
  - Handles incomplete JSON chunks across byte stream boundaries
  - Eliminates OOM errors on large files
  - Supports FeatureCollection and newline-delimited GeoJSON formats
- Comprehensive Benchmarking Infrastructure (bench/)
  - Real-time monitoring script with CPU, memory, disk I/O tracking
  - Automated benchmark suite with JSON result output
  - Data download scripts for Microsoft Buildings dataset
  - Systematic testing across multiple dataset sizes
  - Performance regression testing framework
- Architecture Decision Records (ADRs)
- Factory Pattern for Writers
  - Implemented WriterFactory trait for consistent writer creation
  - Added factory methods to CSV and GeoJSON formats
  - Increased test coverage for writer initialization
Changed
- GeoJSON Performance Optimization
  - Updated default batch_size for optimal performance
  - Applied to both SessionConfig and physical execution plan
- CSV Format Production-Ready
  - Production-ready throughput
  - Efficient memory usage
  - Recommended for performance-critical workloads
- Documentation Restructuring
  - Removed outdated implementation docs (superseded by ADRs)
  - Enhanced DataFusion integration guide
  - Clear separation: ADRs (architecture), bench/README (procedures), blog (public)
Fixed
- GeoJSON Schema Inference - Reduced memory usage from scanning entire file to sampling first 10 MB
- GeoJSON Reader OOM - Fixed out-of-memory errors on large files by implementing streaming decoder
- CSV Write Partitioning - Fixed invalid CSV output when write_partitions > 1 (now enforced to 1 with warning)
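To make the idea behind the streaming GeoJSON decoder more concrete, the following Rust sketch shows a small state machine that carries its parse state across chunk boundaries and emits each top-level JSON object as soon as it closes. It is a simplified illustration under stated assumptions, not the decoder described in ADR 001: ObjectSplitter is a hypothetical type, it only splits newline-delimited or concatenated objects (it does not unpack a FeatureCollection), and it takes string chunks, so it does not deal with a multi-byte UTF-8 character split across chunks.

```rust
/// Hypothetical incremental splitter: state persists between calls to
/// `feed`, so an object may start in one chunk and end in another.
#[derive(Default)]
pub struct ObjectSplitter {
    buf: String,     // text of the object currently being assembled
    depth: usize,    // current `{`/`}` nesting depth
    in_string: bool, // inside a JSON string literal
    escaped: bool,   // previous character was a backslash in a string
}

impl ObjectSplitter {
    /// Feed one chunk; returns every JSON object completed by it.
    pub fn feed(&mut self, chunk: &str) -> Vec<String> {
        let mut done = Vec::new();
        for c in chunk.chars() {
            // Between objects, skip everything until the next `{`.
            if self.depth == 0 && c != '{' {
                continue;
            }
            self.buf.push(c);

            if self.in_string {
                // Braces inside string literals must not affect depth.
                if self.escaped {
                    self.escaped = false;
                } else if c == '\\' {
                    self.escaped = true;
                } else if c == '"' {
                    self.in_string = false;
                }
                continue;
            }

            match c {
                '"' => self.in_string = true,
                '{' => self.depth += 1,
                '}' => {
                    self.depth -= 1;
                    if self.depth == 0 {
                        // Object complete: hand it off and reset the buffer.
                        done.push(std::mem::take(&mut self.buf));
                    }
                }
                _ => {}
            }
        }
        done
    }
}
```

Only the object currently in flight is buffered, so memory use tracks the largest single feature rather than the size of the whole file.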
Removed
- Outdated Documentation - Superseded by ADRs
[0.1.2] - 2025-11-01
Added
- Custom Error Types: Implemented comprehensive error handling system with GeoEtlError enum
  - Added specialized error types for IO, driver, format, conversion, validation, configuration, data processing, and geometry operations
  - Integrated error types across CLI, core, and operations crates
  - All error handling tests passing
- Automated Documentation Deployment: Integrated Cloudflare Pages deployment into release workflow
  - Documentation automatically deploys to production after GitHub release creation
  - Uses CircleCI with Wrangler CLI for deployment
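For readers unfamiliar with the pattern, an error enum covering the categories listed for GeoEtlError could be sketched in Rust as follows. Only the enum name and the category list come from the entry above; the variant names, their String payloads, the message wording, and the read_input helper are assumptions made for this example and may not match the project's definitions.

```rust
use std::fmt;
use std::io;

/// Sketch only: variant names and payloads are assumed, not taken
/// from the project's source.
#[derive(Debug)]
pub enum GeoEtlError {
    Io(io::Error),
    Driver(String),
    Format(String),
    Conversion(String),
    Validation(String),
    Configuration(String),
    DataProcessing(String),
    Geometry(String),
}

impl fmt::Display for GeoEtlError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            GeoEtlError::Io(e) => write!(f, "I/O error: {}", e),
            GeoEtlError::Driver(m) => write!(f, "driver error: {}", m),
            GeoEtlError::Format(m) => write!(f, "format error: {}", m),
            GeoEtlError::Conversion(m) => write!(f, "conversion error: {}", m),
            GeoEtlError::Validation(m) => write!(f, "validation error: {}", m),
            GeoEtlError::Configuration(m) => write!(f, "configuration error: {}", m),
            GeoEtlError::DataProcessing(m) => write!(f, "data processing error: {}", m),
            GeoEtlError::Geometry(m) => write!(f, "geometry error: {}", m),
        }
    }
}

impl std::error::Error for GeoEtlError {}

/// Lets `?` convert std I/O errors into the crate-wide error type.
impl From<io::Error> for GeoEtlError {
    fn from(e: io::Error) -> Self {
        GeoEtlError::Io(e)
    }
}

/// Hypothetical caller showing the `?` conversion in use.
pub fn read_input(path: &str) -> Result<String, GeoEtlError> {
    Ok(std::fs::read_to_string(path)?)
}
```

A single enum shared across crates lets each layer report failures in its own category while callers match on one type.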
Changed
- Documentation Reorganization:
  - Removed redundant user guide (content on website)
  - Updated all references to point to website
  - Moved format-specific documentation to package directories
Removed
- User guide markdown file - Superseded by documentation website
- Format documentation directory - Documentation moved to respective package directories
[0.1.0] - 2025-10-31
Added
- Initial Project Structure: Created Rust workspace with initial crates
- CLI Framework: Implemented robust CLI using clap
- Core Commands:
  - drivers: Lists all available vector format drivers and capabilities
  - convert: Initial implementation of format conversion
  - info: Placeholder for dataset metadata display
- Driver and Operations Logic: Driver registry and convert function
- Unit Tests: Initial unit tests for convert command handler
- Documentation: README, VISION, DEVELOPMENT, ADR documents
- CI/CD: CircleCI configuration and Makefile