STAC Metadata & Extensions: Gap Analysis and Implementation Plan #271

@nlebovits

Summary

This issue analyzes Portolan's STAC metadata generation by comparing the current implementation against STAC spec v1.1.0 and the relevant extensions. The findings inform implementation priorities for full STAC compliance.

Key findings:

  • stac_extensions array is never populated, which violates STAC best practice [1]
  • Rich metadata is extracted but not exposed through STAC extensions
  • Projection extension should be used for CRS metadata per spec [2]
  • Table extension should be used for GeoParquet schema per spec [3]
  • Partition metadata is a gap in the Table extension — upstream contribution opportunity

Research Methodology

Phase 1: Codebase Analysis (3 parallel agents)

| Agent | Scope | Method |
| --- | --- | --- |
| 1A: Flow Tracer | CLI → STAC output pipeline | Call graph analysis |
| 1B: Grep Auditor | All STAC field references | Pattern search |
| 1C: Dependency Analyst | Metadata extraction capabilities | pyproject.toml + API analysis |

Phase 2: Spec Research (3 parallel agents)

| Agent | Scope | Method |
| --- | --- | --- |
| 2A: Spec Literalist | Extension field inventories | Spec document analysis |
| 2B: Practitioner | Real-world usage, pystac API | Catalog + code analysis |
| 2C: Cross-Reference | Overlaps, deprecations, versions | Comparative analysis |

Additional Research

| Topic | Sources |
| --- | --- |
| Table Extension upstream | GitHub issues, PR #13 |
| Vector Extension status | GitHub repo |
| pystac API | pystac source |
| Real-world validation | Planetary Computer API, Overture Maps STAC |

Current State: Codebase Findings

Entity Construction

| Entity | Location | stac_extensions Set? |
| --- | --- | --- |
| Catalog | portolan_cli/catalog.py:L231-291 | ❌ No |
| Collection | portolan_cli/stac.py:L26-73 | ❌ No |
| Item | portolan_cli/stac.py:L76-119 | ❌ No |

Evidence: A grep for stac_extensions in portolan_cli/ returns no matches in production code. The field appears only in test fixtures (tests/fixtures/metadata/stac/valid/item_cog.json:L4-6).

STAC Version

| Location | Current Value | Should Be |
| --- | --- | --- |
| portolan_cli/stac.py:L20 | "1.0.0" | "1.1.0" |

Rationale: STAC v1.1.0 is the current spec (released 2024). Its unified bands array in common metadata supersedes eo:bands and raster:bands [4].

Metadata Extraction vs STAC Population

| Metadata | Extracted In | Currently Goes To | STAC Field (per spec) |
| --- | --- | --- | --- |
| CRS/EPSG | metadata/cog.py:L87-94, metadata/geoparquet.py:L143-156 | Internal crs attr | proj:code [5] |
| Bounding Box | metadata/cog.py:L97-102, metadata/geoparquet.py:L103-139 | Item.bbox ✓ | bbox (WGS84), proj:bbox (native) |
| Width/Height | metadata/cog.py:L110-111 | Internal width/height | proj:shape [5] |
| Resolution | metadata/cog.py:L105 | Internal resolution | raster:spatial_resolution [6] |
| Column Schema | metadata/geoparquet.py:L93, L226 | schema.json | table:columns [3] |
| Row Count | metadata/geoparquet.py:L96 | Internal feature_count | table:row_count [3] |
| Geometry Column | metadata/geoparquet.py:L99, L221 | Internal | table:primary_geometry [3] |
| Geometry Types | metadata/geoparquet.py:L109, L159-167 | geoparquet:geometry_type property | vector:geometry_types [7] |
| Band Data Type | metadata/cog.py:L113, L183 | raster:bands[].data_type | data_type (common metadata) [8] |
| Nodata | metadata/cog.py:L114, L184 | raster:bands[].nodata ⚠️ (first band only) | nodata (common metadata) [8] |
| SHA256 | dataset.py (compute_checksum) | versions.json | file:checksum [9] |
| File Size | Not captured | | file:size [9] |

CRS Transformation Gap

STAC Requirement: Item bbox and geometry MUST be WGS84 (EPSG:4326) per RFC 7946 [10].

Current behavior: portolan_cli/stac.py:L79-88 uses the bbox from extracted metadata directly, with no CRS check or transformation.

Risk: If source data is in a projected CRS (e.g., EPSG:32618), the Item bbox will hold projected coordinates rather than longitude/latitude and will be invalid.
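
As an illustration of the missing CRS check, here is a minimal sketch of a range guard that rejects a bbox which cannot be WGS84. The function name `bbox_looks_like_wgs84` is hypothetical and not part of Portolan. The check is only a heuristic: passing it does not prove the coordinates are longitude/latitude, and it does not replace the reprojection step.

```python
from typing import Sequence


def bbox_looks_like_wgs84(bbox: Sequence[float]) -> bool:
    """Heuristic range check for a 2D [west, south, east, north] bbox.

    Returns False when the values cannot be WGS84 degrees. A True result
    does not prove the bbox is WGS84, only that it is within range.
    """
    if len(bbox) != 4:
        return False
    west, south, east, north = bbox
    # west > east is allowed: RFC 7946 uses it for antimeridian crossings.
    lon_ok = -180.0 <= west <= 180.0 and -180.0 <= east <= 180.0
    lat_ok = -90.0 <= south <= north <= 90.0
    return lon_ok and lat_ok


# Hypothetical inputs: a lon/lat bbox and a UTM (EPSG:32618-style) bbox.
print(bbox_looks_like_wgs84([-75.2, 39.9, -75.0, 40.1]))
print(bbox_looks_like_wgs84([480000.0, 4420000.0, 500000.0, 4440000.0]))
```

A guard like this could fail fast before an Item is written. The actual conversion would still need a CRS-aware reprojection, for example pyproj's `Transformer.transform_bounds`.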


STAC Spec Requirements

Required Extensions for Portolan Use Cases

| Extension | Version | Scope | Required For | Spec Reference |
| --- | --- | --- | --- | --- |
| Projection | 2.0.0 | Item, Asset | CRS metadata, native bbox/shape | README |
| Table | 1.2.0 | Collection, Item | GeoParquet schema, row count | README |
| Raster | 2.0.0 | Asset, Band | Spatial resolution, sampling | README |
| File | 2.1.0 | Asset | Size, checksum | README |

Recommended Extensions

| Extension | Version | Maturity | Scope | Recommended For |
| --- | --- | --- | --- | --- |
| Vector | 0.1.0 | Proposal | Item, Asset | Geometry types |
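
One way to populate the stac_extensions array is to derive it from the prefixed fields actually present on an object. The sketch below assumes plain dicts rather than pystac objects. The helper name `collect_stac_extensions` is hypothetical, and the schema URLs assume the usual stac-extensions.github.io pattern combined with the versions listed above.

```python
from typing import Any, Dict, List

# Assumed schema URLs: standard stac-extensions.github.io pattern plus the
# extension versions named in this issue.
EXTENSION_SCHEMAS: Dict[str, str] = {
    "proj": "https://stac-extensions.github.io/projection/v2.0.0/schema.json",
    "table": "https://stac-extensions.github.io/table/v1.2.0/schema.json",
    "raster": "https://stac-extensions.github.io/raster/v2.0.0/schema.json",
    "file": "https://stac-extensions.github.io/file/v2.1.0/schema.json",
    "vector": "https://stac-extensions.github.io/vector/v0.1.0/schema.json",
}


def collect_stac_extensions(*field_maps: Dict[str, Any]) -> List[str]:
    """Return sorted schema URLs for every known prefix used in the dicts."""
    prefixes = {
        key.split(":", 1)[0]
        for fields in field_maps
        for key in fields
        if ":" in key
    }
    return sorted(EXTENSION_SCHEMAS[p] for p in prefixes if p in EXTENSION_SCHEMAS)


# Hypothetical Item properties and asset dict.
properties = {"datetime": "2024-01-01T00:00:00Z", "proj:code": "EPSG:32618"}
asset = {"href": "data.parquet", "file:size": 1024}
print(collect_stac_extensions(properties, asset))
```

Deriving the list from the fields keeps the declared extensions and the emitted fields from drifting apart. Unknown prefixes (such as a custom portolan: namespace) are ignored here.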

Extension Field Requirements (per specs)

Projection Extension [5]:

"At least one of the fields must be specified."

Applicable fields for Portolan:

  • proj:code — CRS identifier (e.g., "EPSG:4326")
  • proj:bbox — Bounding box in native CRS
  • proj:shape — Pixel dimensions [height, width]
  • proj:transform — Affine transform coefficients
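
To show how the fields above fit together, here is a minimal sketch that builds them from values the extractors already produce (CRS code, native bbox, width, height). The function name `build_projection_fields` and its parameters are hypothetical and not existing Portolan API.

```python
from typing import Any, Dict, Optional, Sequence


def build_projection_fields(
    code: Optional[str],
    native_bbox: Optional[Sequence[float]] = None,
    width: Optional[int] = None,
    height: Optional[int] = None,
    transform: Optional[Sequence[float]] = None,
) -> Dict[str, Any]:
    """Build proj:* fields, omitting any value that is not available."""
    fields: Dict[str, Any] = {}
    if code is not None:
        fields["proj:code"] = code
    if native_bbox is not None:
        fields["proj:bbox"] = list(native_bbox)
    if width is not None and height is not None:
        # proj:shape is ordered [height, width], not [width, height].
        fields["proj:shape"] = [height, width]
    if transform is not None:
        fields["proj:transform"] = list(transform)
    if not fields:
        # The extension requires at least one field to be specified.
        raise ValueError("no projection metadata available")
    return fields


print(build_projection_fields("EPSG:32618", width=512, height=256))
```

Raising on an empty result is one possible policy. Skipping the extension for that object would be another.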

Table Extension [3]:

"This extension applies to STAC Items and STAC Collections."

Applicable fields for Portolan:

  • table:columns — Column schema array
  • table:row_count — Number of rows
  • table:primary_geometry — Geometry column name
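
For reference, the three fields above can be assembled as plain JSON-compatible dicts. This is a sketch under stated assumptions: the extracted schema is a list of (name, type) pairs, and `build_table_fields` is a hypothetical helper name.

```python
from typing import Any, Dict, Optional, Sequence, Tuple


def build_table_fields(
    columns: Sequence[Tuple[str, str]],
    row_count: int,
    geometry_column: Optional[str] = None,
) -> Dict[str, Any]:
    """Build table:* fields from an extracted column schema and row count."""
    fields: Dict[str, Any] = {
        "table:columns": [{"name": name, "type": dtype} for name, dtype in columns],
        "table:row_count": row_count,
    }
    if geometry_column is not None:
        names = {name for name, _ in columns}
        if geometry_column not in names:
            raise ValueError(f"geometry column {geometry_column!r} not in schema")
        fields["table:primary_geometry"] = geometry_column
    return fields


# Hypothetical schema for a small GeoParquet file.
schema = [("id", "int64"), ("geometry", "binary")]
print(build_table_fields(schema, row_count=42, geometry_column="geometry"))
```

Checking that the geometry column exists in the schema catches mismatches between the two extracted values before they reach the catalog.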

Common Metadata (STAC v1.1.0) [4]:

"bands...is meant to be the successor of the bands fields in the eo and raster extension"

Portolan should use:

  • bands array (replaces raster:bands when targeting v1.1.0)
  • nodata, data_type, unit, statistics per band
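
A rough sketch of the raster:bands to bands migration follows. It assumes that, under STAC v1.1.0 with Raster v2.0.0, nodata, data_type, unit and statistics move into the band object unprefixed, while the remaining raster fields gain a raster: prefix. The helper name `migrate_raster_bands` is hypothetical.

```python
from typing import Any, Dict, List, Sequence

# Raster v1 band fields that stay raster-specific and gain a prefix in v2.
RASTER_PREFIXED = {
    "sampling",
    "bits_per_sample",
    "spatial_resolution",
    "scale",
    "offset",
    "histogram",
}


def migrate_raster_bands(raster_bands: Sequence[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert raster:bands entries into entries for the unified bands array."""
    bands: List[Dict[str, Any]] = []
    for old in raster_bands:
        new: Dict[str, Any] = {}
        for key, value in old.items():
            if key in RASTER_PREFIXED:
                new[f"raster:{key}"] = value
            else:
                # nodata, data_type, unit, statistics and other keys are kept as-is.
                new[key] = value
        bands.append(new)
    return bands


# Each band keeps its own nodata value rather than inheriting the first band's.
old_bands = [
    {"data_type": "uint8", "nodata": 0, "scale": 0.1},
    {"data_type": "uint8", "nodata": 255},
]
print(migrate_raster_bands(old_bands))
```

Because every band is converted independently, this shape also accommodates per-band nodata values.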

Decisions

| Decision | Choice | Rationale | Reference |
| --- | --- | --- | --- |
| STAC version | v1.1.0 | Current spec; unified bands | Changelog |
| Projection extension | ✅ Use | Required for CRS compliance | Spec |
| Table extension | ✅ Use | Required for tabular data | Spec |
| Vector extension | ✅ Use with caveat | Useful for geometry types; document Proposal status | Spec |
| File extension | ✅ Use | Size and checksum already available | Spec |
| CRS handling | Transform to WGS84 | STAC spec requires WGS84 bbox | Item Spec |
| Bbox transform | Use pyproj | Widely used Python library for CRS transforms | pyproj docs |
| Band metadata | Use bands array | v1.1.0 unified approach | Common Metadata |

Upstream Research Findings

Table Extension

| Topic | Status | Reference |
| --- | --- | --- |
| Nested column types | PR #13 addresses type system | PR #13 |
| Column statistics | PR #13 allows statistics on columns | PR #13 |
| Partition metadata | No existing issue | Issues search |

Partition metadata is a gap. Portolan should:

  1. Implement with portolan:partition_* properties initially
  2. Propose table:partitioning upstream after implementation experience
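
As a purely illustrative sketch of step 1, the code below detects partitions from file paths and emits custom properties. It assumes Hive-style key=value directory segments, which this issue does not specify. The property names `portolan:partition_keys` and `portolan:partition_values` are hypothetical placeholders for the portolan:partition_* namespace, as are the function names.

```python
from pathlib import PurePosixPath
from typing import Any, Dict, List, Sequence, Set


def parse_hive_partitions(paths: Sequence[str]) -> Dict[str, List[str]]:
    """Collect key=value directory segments (assumed Hive-style) from paths."""
    values: Dict[str, Set[str]] = {}
    for path in paths:
        # Skip the final component: it is the file name, not a partition.
        for segment in PurePosixPath(path).parts[:-1]:
            key, sep, value = segment.partition("=")
            if sep and key and value:
                values.setdefault(key, set()).add(value)
    return {key: sorted(vals) for key, vals in values.items()}


def partition_properties(paths: Sequence[str]) -> Dict[str, Any]:
    """Build hypothetical portolan:partition_* properties from file paths."""
    partitions = parse_hive_partitions(paths)
    return {
        "portolan:partition_keys": list(partitions),  # hypothetical name
        "portolan:partition_values": partitions,  # hypothetical name
    }


files = [
    "data/country=US/year=2023/part-0.parquet",
    "data/country=CA/year=2023/part-0.parquet",
]
print(partition_properties(files))
```

Keeping the parsing separate from the property naming would make it easier to rename the fields if a table:partitioning proposal is accepted upstream.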

Vector Extension

| Attribute | Value | Reference |
| --- | --- | --- |
| Maturity | Proposal (v0.1.0) | README |
| Created | Feb 2026 | First commit |
| Production usage | 1 (Copernicus Land) | GitHub code search |

Recommendation: Use vector:geometry_types with documented caveat about Proposal maturity. Provides value and contributes to extension adoption.

pystac API

Table extension is fully supported:

# portolan_cli/stac.py integration pattern
from pystac.extensions.table import TableExtension, Column

table_ext = TableExtension.ext(collection, add_if_missing=True)
table_ext.columns = [Column({"name": "id", "type": "int64"})]
table_ext.row_count = row_count
table_ext.primary_geometry = "geometry"

Reference: pystac/extensions/table.py


Implementation Plan

Wave 1: Foundation

Effort: Low | Dependencies: None | Blocks: All subsequent waves

| Task | File(s) | STAC Requirement |
| --- | --- | --- |
| Update stac_version to "1.1.0" | stac.py:L20 | Current spec |
| Implement stac_extensions array | stac.py, dataset.py | Best practice [1] |
| Add table:columns | stac.py, use metadata/geoparquet.py extraction | Table Extension [3] |
| Add table:row_count | stac.py, use metadata/geoparquet.py extraction | Table Extension [3] |
| Add table:primary_geometry | stac.py, use metadata/geoparquet.py extraction | Table Extension [3] |
| Add bbox WGS84 transformation | New: portolan_cli/crs.py | Item Spec [10] |
| Add proj:code, proj:bbox | stac.py, use existing CRS extraction | Projection Extension [5] |
| Fix per-band nodata | metadata/cog.py:L114 | Common Metadata [8] |

Wave 2: Extended Metadata

Effort: Moderate | Dependencies: Wave 1

| Task | File(s) | STAC Requirement |
| --- | --- | --- |
| Add proj:shape, proj:transform | stac.py, metadata/cog.py | Projection Extension [5] |
| Add vector:geometry_types | stac.py, metadata/geoparquet.py | Vector Extension [7] |
| Add raster:spatial_resolution | stac.py, metadata/cog.py | Raster Extension [6] |
| Migrate to unified bands array | stac.py, metadata/cog.py | Common Metadata v1.1.0 [4] |
| Add band/column statistics | metadata/cog.py, metadata/geoparquet.py | Common Metadata [8] |
| Aggregate Collection summaries | stac.py | Collection Spec [11] |
| Fix Collection temporal extent | stac.py | Collection Spec [11] |
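
For the Collection extent task, one possible approach is to aggregate over the Items. The sketch below assumes Items are plain dicts with a WGS84 bbox and an RFC 3339 properties.datetime. It ignores antimeridian-crossing bboxes and Items that only carry start_datetime/end_datetime. `aggregate_extent` is a hypothetical name.

```python
from datetime import datetime
from typing import Any, Dict, List, Sequence


def _parse_rfc3339(value: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def aggregate_extent(items: Sequence[Dict[str, Any]]) -> Dict[str, Any]:
    """Compute a Collection extent as the union of Item bboxes and datetimes."""
    if not items:
        raise ValueError("cannot compute an extent from zero items")
    bboxes = [item["bbox"] for item in items]
    union = [
        min(b[0] for b in bboxes),
        min(b[1] for b in bboxes),
        max(b[2] for b in bboxes),
        max(b[3] for b in bboxes),
    ]
    stamps: List[str] = []
    for item in items:
        stamp = item.get("properties", {}).get("datetime")
        if stamp:  # null datetimes are skipped in this sketch
            stamps.append(stamp)
    start = min(stamps, key=_parse_rfc3339) if stamps else None
    end = max(stamps, key=_parse_rfc3339) if stamps else None
    return {"spatial": {"bbox": [union]}, "temporal": {"interval": [[start, end]]}}


items = [
    {"bbox": [-75.2, 39.9, -75.0, 40.1], "properties": {"datetime": "2024-03-01T00:00:00Z"}},
    {"bbox": [-74.5, 40.0, -74.0, 40.8], "properties": {"datetime": "2023-11-15T12:00:00Z"}},
]
print(aggregate_extent(items))
```

Summaries could be aggregated in a similar pass, for example by collecting the distinct proj:code values across Items.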

Wave 3: File Metadata & Partitioning

Effort: Moderate | Dependencies: Wave 2, #232

| Task | File(s) | STAC Requirement |
| --- | --- | --- |
| Add file:size | stac.py, dataset.py | File Extension [9] |
| Add file:checksum (multihash) | stac.py, dataset.py | File Extension [9] |
| Add table:storage_options | stac.py, config integration | Table Extension [3] |
| Implement partition detection | dataset.py, new partition module | Portolan requirement |
| Add portolan:partition_* properties | stac.py | Custom (pre-upstream) |
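
The file:checksum field expects a multihash rather than a bare hex digest. Here is a minimal sketch of wrapping an existing SHA-256 hex digest (such as the one already stored in versions.json) in multihash form. The function names are hypothetical. The prefix bytes 0x12 (sha2-256) and 0x20 (32-byte length) come from the multihash format.

```python
import hashlib
from typing import Any, Dict


def multihash_from_sha256_hex(hex_digest: str) -> str:
    """Wrap a SHA-256 hex digest as a hex-encoded multihash."""
    digest = bytes.fromhex(hex_digest)
    if len(digest) != 32:
        raise ValueError("expected a 32-byte SHA-256 digest")
    # Multihash layout: <hash function code><digest length><digest bytes>.
    return (bytes([0x12, len(digest)]) + digest).hex()


def build_file_fields(data: bytes) -> Dict[str, Any]:
    """Build file:size and file:checksum for an in-memory payload."""
    return {
        "file:size": len(data),
        "file:checksum": multihash_from_sha256_hex(hashlib.sha256(data).hexdigest()),
    }


print(build_file_fields(b"hello"))
```

For files on disk, the size would come from the filesystem (for example `os.path.getsize`) and the digest from a streaming hash rather than from an in-memory payload.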

Wave 4: Upstream Contributions

Effort: Variable | Dependencies: Implementation experience

| Contribution | Target | Priority |
| --- | --- | --- |
| table:partitioning proposal | stac-extensions/table | High |
| Vector extension feedback | stac-extensions/vector | Medium |
| Type mapping documentation | stac-extensions/table (after PR #13) | Low |

Validation Against Real-World Catalogs

To check that our spec-compliant approach is also consumable in practice, we compared it against production catalogs:

| Catalog | Extensions Used | Validates Our Approach |
| --- | --- | --- |
| Planetary Computer | table | ✅ Table extension works in production |
| Overture Maps | storage, alternate-assets | ✅ Custom properties work for partition hints |

Note: Some catalogs omit Projection extension for GeoParquet. We choose spec compliance over pattern-matching — Projection extension provides machine-readable CRS that improves interoperability.


Related Issues

| Issue | Relationship |
| --- | --- |
| #232 | Partition metadata design (Wave 3 dependency) |
| #233 | Table extension integration details |
| #231 | Collection-level assets (foundation) |
| ADR-0031 | Collection-level assets for vector data |

References

Footnotes

  1. STAC Best Practices - Extensions

  2. Projection Extension README

  3. Table Extension README

  4. STAC Common Metadata - Bands

  5. Projection Extension - Fields

  6. Raster Extension - Fields

  7. Vector Extension - Fields

  8. Common Metadata - Data Values

  9. File Extension - Fields

  10. STAC Item Spec - bbox: "formatted according to RFC 7946, section 5"

  11. STAC Collection Spec - summaries
