STAC Metadata & Extensions: Gap Analysis and Implementation Plan
Summary
This document analyzes Portolan's STAC metadata generation, comparing the current implementation against STAC spec v1.1.0 and the relevant extensions. The findings inform implementation priorities for full STAC compliance.
Key findings:
- The stac_extensions array is never populated, which violates STAC best practice [1]
- Rich metadata is extracted but not exposed through STAC extensions
- The Projection extension should be used for CRS metadata, per spec [2]
- The Table extension should be used for GeoParquet schema, per spec [3]
- Partition metadata is a gap in the Table extension and an opportunity for an upstream contribution
Research Methodology
Phase 1: Codebase Analysis (3 parallel agents)
| Agent | Scope | Method |
|---|---|---|
| 1A: Flow Tracer | CLI → STAC output pipeline | Call graph analysis |
| 1B: Grep Auditor | All STAC field references | Pattern search |
| 1C: Dependency Analyst | Metadata extraction capabilities | pyproject.toml + API analysis |
Phase 2: Spec Research (3 parallel agents)
| Agent | Scope | Method |
|---|---|---|
| 2A: Spec Literalist | Extension field inventories | Spec document analysis |
| 2B: Practitioner | Real-world usage, pystac API | Catalog + code analysis |
| 2C: Cross-Reference | Overlaps, deprecations, versions | Comparative analysis |
Additional Research
Current State: Codebase Findings
Entity Construction
| Entity | Location | stac_extensions Set? |
|---|---|---|
| Catalog | portolan_cli/catalog.py:L231-291 | ❌ No |
| Collection | portolan_cli/stac.py:L26-73 | ❌ No |
| Item | portolan_cli/stac.py:L76-119 | ❌ No |
Evidence: A grep for stac_extensions in portolan_cli/ returns no matches in production code. The field appears only in test fixtures (tests/fixtures/metadata/stac/valid/item_cog.json:L4-6).
STAC Version
| Location | Current Value | Should Be |
|---|---|---|
| portolan_cli/stac.py:L20 | "1.0.0" | "1.1.0" |
Rationale: STAC v1.1.0 is the current spec (released 2024). Its unified bands array in common metadata supersedes eo:bands and raster:bands [4].
Metadata Extraction vs STAC Population
| Metadata | Extracted In | Currently Goes To | STAC Field (per spec) |
|---|---|---|---|
| CRS/EPSG | metadata/cog.py:L87-94, metadata/geoparquet.py:L143-156 | Internal crs attr | proj:code [5] |
| Bounding Box | metadata/cog.py:L97-102, metadata/geoparquet.py:L103-139 | Item.bbox ✓ | bbox (WGS84), proj:bbox (native) |
| Width/Height | metadata/cog.py:L110-111 | Internal width/height | proj:shape [5] |
| Resolution | metadata/cog.py:L105 | Internal resolution | raster:spatial_resolution [6] |
| Column Schema | metadata/geoparquet.py:L93, L226 | schema.json | table:columns [3] |
| Row Count | metadata/geoparquet.py:L96 | Internal feature_count | table:row_count [3] |
| Geometry Column | metadata/geoparquet.py:L99, L221 | Internal | table:primary_geometry [3] |
| Geometry Types | metadata/geoparquet.py:L109, L159-167 | geoparquet:geometry_type property | vector:geometry_types [7] |
| Band Data Type | metadata/cog.py:L113, L183 | raster:bands[].data_type ✓ | data_type (common metadata) [8] |
| Nodata | metadata/cog.py:L114, L184 | raster:bands[].nodata ⚠️ (first band only) | nodata (common metadata) [8] |
| SHA256 | dataset.py (compute_checksum) | versions.json | file:checksum [9] |
| File Size | Not captured | - | file:size [9] |
CRS Transformation Gap
STAC Requirement: Item bbox and geometry MUST be WGS84 (EPSG:4326) per RFC 7946 [10].
Current behavior: portolan_cli/stac.py:L79-88 uses the bbox from metadata directly, without a CRS check or transformation.
Risk: If the source data is in a projected CRS (e.g., EPSG:32618), the Item bbox will contain projected coordinates rather than longitude/latitude and will be invalid.
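As a minimal sketch of the missing CRS check, the function below tests whether a 2D bbox at least fits the longitude/latitude ranges. The function name is hypothetical and not part of Portolan. This is a necessary condition only: a projected bbox with small coordinates would still pass, so it can act as a guard but does not replace an actual transformation to WGS84.

```python
from typing import Sequence


def bbox_within_wgs84_range(bbox: Sequence[float]) -> bool:
    """Return True if [west, south, east, north] fits lon/lat value ranges.

    Hypothetical helper: a range check only, not proof that the bbox is WGS84.
    """
    if len(bbox) != 4:
        raise ValueError("expected a 2D bbox with four values")
    west, south, east, north = bbox
    # west may exceed east when a bbox crosses the antimeridian,
    # so longitudes are only range-checked, not ordered.
    lon_ok = -180.0 <= west <= 180.0 and -180.0 <= east <= 180.0
    lat_ok = -90.0 <= south <= north <= 90.0
    return lon_ok and lat_ok


# A UTM zone 18N bbox (meters) fails the check; a lon/lat bbox passes.
utm_bbox = [500000.0, 4400000.0, 510000.0, 4410000.0]
wgs84_bbox = [-75.2, 39.9, -75.0, 40.1]
print(bbox_within_wgs84_range(utm_bbox), bbox_within_wgs84_range(wgs84_bbox))
```

A failed check is a cheap signal that the bbox was copied from native-CRS metadata without reprojection.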
STAC Spec Requirements
Required Extensions for Portolan Use Cases
| Extension | Version | Scope | Required For | Spec Reference |
|---|---|---|---|---|
| Projection | 2.0.0 | Item, Asset | CRS metadata, native bbox/shape | README |
| Table | 1.2.0 | Collection, Item | GeoParquet schema, row count | README |
| Raster | 2.0.0 | Asset, Band | Spatial resolution, sampling | README |
| File | 2.1.0 | Asset | Size, checksum | README |
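One way to keep stac_extensions in sync with the fields actually written is to derive it from the field prefixes in use. The sketch below assumes the extension versions listed in the table above and the usual stac-extensions.github.io schema URL pattern; the function name and the prefix-based approach are illustrative, not Portolan's existing code.

```python
# Schema URIs for the extension versions listed above (assumed URL pattern).
EXTENSION_SCHEMAS = {
    "proj": "https://stac-extensions.github.io/projection/v2.0.0/schema.json",
    "table": "https://stac-extensions.github.io/table/v1.2.0/schema.json",
    "raster": "https://stac-extensions.github.io/raster/v2.0.0/schema.json",
    "file": "https://stac-extensions.github.io/file/v2.1.0/schema.json",
}


def collect_stac_extensions(*field_maps: dict) -> list:
    """Return sorted schema URIs for every known prefix used in the given dicts.

    Hypothetical helper: pass Item properties plus each asset's fields.
    """
    prefixes = set()
    for fields in field_maps:
        for key in fields:
            prefix, sep, _ = key.partition(":")
            # Only keys of the form "prefix:name" with a known prefix count.
            if sep and prefix in EXTENSION_SCHEMAS:
                prefixes.add(prefix)
    return sorted(EXTENSION_SCHEMAS[p] for p in prefixes)


properties = {"proj:code": "EPSG:32618", "datetime": "2024-01-01T00:00:00Z"}
asset = {"href": "data.tif", "file:size": 1024}
print(collect_stac_extensions(properties, asset))
```

Unknown prefixes, such as custom portolan: properties, are ignored here; whether those should map to a schema of their own is left open.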
Recommended Extensions
| Extension | Version | Maturity | Scope | Recommended For |
|---|---|---|---|---|
| Vector | 0.1.0 | Proposal | Item, Asset | Geometry types |
Extension Field Requirements (per specs)
Projection Extension [5]:
"At least one of the fields must be specified."
Applicable fields for Portolan:
- proj:code, the CRS identifier (e.g., "EPSG:4326")
- proj:bbox, the bounding box in the native CRS
- proj:shape, the pixel dimensions as [height, width]
- proj:transform, the affine transform coefficients
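The four Projection fields listed above can be assembled from values the COG extractor already holds. The sketch below is illustrative only: the function name and its parameters are hypothetical, and it assumes the affine coefficients arrive in row-major order as 6 or 9 numbers.

```python
from typing import Sequence


def projection_fields(
    epsg: int,
    native_bbox: Sequence[float],
    width: int,
    height: int,
    affine: Sequence[float],
) -> dict:
    """Build Projection extension fields from raster metadata (hypothetical helper)."""
    if len(affine) not in (6, 9):
        raise ValueError("affine transform must have 6 or 9 coefficients")
    return {
        "proj:code": f"EPSG:{epsg}",
        "proj:bbox": list(native_bbox),
        # The spec orders shape as [height, width], i.e. rows before columns.
        "proj:shape": [height, width],
        "proj:transform": list(affine),
    }


fields = projection_fields(
    epsg=32618,
    native_bbox=[500000.0, 4400000.0, 505000.0, 4410000.0],
    width=500,
    height=1000,
    affine=[10.0, 0.0, 500000.0, 0.0, -10.0, 4410000.0],
)
print(fields["proj:shape"])
```

The [height, width] ordering is the detail most easily inverted, since the extractor stores width and height in the opposite order.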
Table Extension [3]:
"This extension applies to STAC Items and STAC Collections."
Applicable fields for Portolan:
- table:columns, the column schema array
- table:row_count, the number of rows
- table:primary_geometry, the geometry column name
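For the Table fields listed above, a plain-dict sketch shows the target shape without depending on any library. The function name and the (name, type) tuple input are assumptions made for illustration; the real schema would come from the GeoParquet extraction.

```python
from typing import Sequence, Tuple


def table_fields(
    columns: Sequence[Tuple[str, str]],
    row_count: int,
    geometry_column: str,
) -> dict:
    """Build Table extension fields from a simple schema (hypothetical helper)."""
    names = [name for name, _ in columns]
    if geometry_column not in names:
        raise ValueError(f"geometry column {geometry_column!r} not in schema")
    return {
        # Each column object carries at least a name; type is optional in the spec.
        "table:columns": [{"name": name, "type": col_type} for name, col_type in columns],
        "table:row_count": row_count,
        "table:primary_geometry": geometry_column,
    }


fields = table_fields([("id", "int64"), ("geometry", "binary")], 42, "geometry")
print(fields["table:columns"])
```

Validating that the primary geometry column exists in the schema catches mismatches between the two extraction paths early.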
Common Metadata (STAC v1.1.0) [4]:
"bands...is meant to be the successor of the bands fields in the eo and raster extension"
Portolan should use:
- bands array (replaces raster:bands when targeting v1.1.0)
- nodata, data_type, unit, statistics per band
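A sketch of the unified bands array, built per band rather than from the first band only, might look as follows. The function name and the band_N naming scheme are hypothetical; the inputs stand in for whatever the COG extractor reports for each band.

```python
from typing import List, Optional, Sequence, Union

Number = Union[int, float]


def build_bands(
    data_types: Sequence[str],
    nodata_values: Sequence[Optional[Number]],
) -> List[dict]:
    """Build a STAC v1.1.0 bands array with per-band data_type and nodata."""
    if len(data_types) != len(nodata_values):
        raise ValueError("data_types and nodata_values must have equal length")
    bands = []
    for index, (data_type, nodata) in enumerate(zip(data_types, nodata_values), start=1):
        band = {"name": f"band_{index}", "data_type": data_type}
        # Omit nodata entirely when the band does not define one.
        if nodata is not None:
            band["nodata"] = nodata
        bands.append(band)
    return bands


print(build_bands(["uint8", "uint8", "float32"], [0, 255, None]))
```

Each band keeps its own nodata value, which addresses the first-band-only limitation noted in the extraction table.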
Decisions
| Decision | Choice | Rationale | Reference |
|---|---|---|---|
| STAC version | v1.1.0 | Current spec; unified bands | Changelog |
| Projection extension | ✅ Use | Required for CRS compliance | Spec |
| Table extension | ✅ Use | Required for tabular data | Spec |
| Vector extension | ✅ Use with caveat | Useful for geometry types; document Proposal status | Spec |
| File extension | ✅ Use | Checksum already computed; size not yet captured | Spec |
| CRS handling | Transform to WGS84 | STAC spec requires WGS84 bbox | Item Spec |
| Bbox transform | Use pyproj | De facto standard Python library for CRS transforms | pyproj docs |
| Band metadata | Use bands array | v1.1.0 unified approach | Common Metadata |
Upstream Research Findings
Table Extension
| Topic | Status | Reference |
|---|---|---|
| Nested column types | PR #13 addresses type system | PR #13 |
| Column statistics | PR #13 allows statistics on columns | PR #13 |
| Partition metadata | No existing issue | Issues search |
Partition metadata is a gap. Portolan should:
- Implement with portolan:partition_* properties initially
- Propose table:partitioning upstream after implementation experience
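To make the interim approach concrete, the sketch below detects Hive-style key=value directory segments and emits custom properties. Everything here is an assumption for illustration: the source does not state the partition layout, and the property names portolan:partition_keys and portolan:partition_values are hypothetical instances of the portolan:partition_* pattern.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def detect_hive_partitions(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Collect key=value directory segments, keeping keys in first-seen order."""
    values: Dict[str, set] = {}
    for path in paths:
        # Inspect directories only; the file name itself is never a partition.
        for segment in PurePosixPath(path).parent.parts:
            key, sep, value = segment.partition("=")
            if sep and key and value:
                values.setdefault(key, set()).add(value)
    return {key: sorted(vals) for key, vals in values.items()}


def partition_properties(paths: Iterable[str]) -> dict:
    """Map detected partitions to hypothetical portolan:partition_* properties."""
    partitions = detect_hive_partitions(paths)
    if not partitions:
        return {}
    return {
        "portolan:partition_keys": list(partitions),
        "portolan:partition_values": partitions,
    }


files = [
    "data/year=2023/month=01/part-0.parquet",
    "data/year=2024/month=01/part-0.parquet",
]
print(partition_properties(files))
```

Keeping the properties under a portolan: prefix leaves room to rename them once a table:partitioning proposal settles upstream.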
Vector Extension
| Attribute | Value | Reference |
|---|---|---|
| Maturity | Proposal (v0.1.0) | README |
| Created | Feb 2026 | First commit |
| Production usage | 1 (Copernicus Land) | GitHub code search |
Recommendation: Use vector:geometry_types with a documented caveat about its Proposal maturity. Doing so provides value now and contributes to adoption of the extension.
pystac API
Table extension is fully supported:
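```python
# portolan_cli/stac.py integration pattern
# `collection` (a pystac.Collection) and `row_count` (an int) come from the caller.
from pystac.extensions.table import Column, TableExtension

# Attach the Table extension; add_if_missing also registers its schema URI
table_ext = TableExtension.ext(collection, add_if_missing=True)
table_ext.columns = [Column({"name": "id", "type": "int64"})]
table_ext.row_count = row_count
table_ext.primary_geometry = "geometry"
```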
Reference: pystac/extensions/table.py
Implementation Plan
Wave 1: Foundation
Effort: Low | Dependencies: None | Blocks: All subsequent waves
| Task | File(s) | STAC Requirement |
|---|---|---|
| Update stac_version to "1.1.0" | stac.py:L20 | Current spec |
| Implement stac_extensions array | stac.py, dataset.py | Best practice [1] |
| Add table:columns | stac.py, use metadata/geoparquet.py extraction | Table Extension [3] |
| Add table:row_count | stac.py, use metadata/geoparquet.py extraction | Table Extension [3] |
| Add table:primary_geometry | stac.py, use metadata/geoparquet.py extraction | Table Extension [3] |
| Add bbox WGS84 transformation | New: portolan_cli/crs.py | Item Spec [10] |
| Add proj:code, proj:bbox | stac.py, use existing CRS extraction | Projection Extension [5] |
| Fix per-band nodata | metadata/cog.py:L114 | Common Metadata [8] |
Wave 2: Extended Metadata
Effort: Moderate | Dependencies: Wave 1
| Task | File(s) | STAC Requirement |
|---|---|---|
| Add proj:shape, proj:transform | stac.py, metadata/cog.py | Projection Extension [5] |
| Add vector:geometry_types | stac.py, metadata/geoparquet.py | Vector Extension [7] |
| Add raster:spatial_resolution | stac.py, metadata/cog.py | Raster Extension [6] |
| Migrate to unified bands array | stac.py, metadata/cog.py | Common Metadata v1.1.0 [4] |
| Add band/column statistics | metadata/cog.py, metadata/geoparquet.py | Common Metadata [8] |
| Aggregate Collection summaries | stac.py | Collection Spec [11] |
| Fix Collection temporal extent | stac.py | Collection Spec [11] |
Wave 3: File Metadata & Partitioning
Effort: Moderate | Dependencies: Wave 2, #232
| Task | File(s) | STAC Requirement |
|---|---|---|
| Add file:size | stac.py, dataset.py | File Extension [9] |
| Add file:checksum (multihash) | stac.py, dataset.py | File Extension [9] |
| Add table:storage_options | stac.py, config integration | Table Extension [3] |
| Implement partition detection | dataset.py, new partition module | Portolan requirement |
| Add portolan:partition_* properties | stac.py | Custom (pre-upstream) |
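The file:checksum field expects a multihash rather than a bare digest, so the SHA256 values already stored in versions.json need a prefix before they can be published. The sketch below shows that encoding for SHA-256 (function code 0x12, digest length 0x20); the function names are hypothetical, and it hashes in-memory bytes for brevity where a real implementation would likely stream large assets in chunks.

```python
import hashlib

# Multihash header for SHA-256: 0x12 = sha2-256, 0x20 = 32-byte digest length.
SHA256_MULTIHASH_PREFIX = "1220"


def sha256_hex_to_multihash(sha256_hex: str) -> str:
    """Prefix an existing SHA-256 hex digest so it becomes a multihash string."""
    if len(sha256_hex) != 64:
        raise ValueError("expected a 64-character SHA-256 hex digest")
    return SHA256_MULTIHASH_PREFIX + sha256_hex.lower()


def file_fields(data: bytes) -> dict:
    """Build File extension fields for an asset's content (hypothetical helper)."""
    digest = hashlib.sha256(data).hexdigest()
    return {
        "file:size": len(data),  # size in bytes
        "file:checksum": sha256_hex_to_multihash(digest),
    }


print(file_fields(b"example asset content"))
```

Because the conversion is a pure string prefix, existing checksums can be reused without rehashing any files.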
Wave 4: Upstream Contributions
Effort: Variable | Dependencies: Implementation experience
Validation Against Real-World Catalogs
To validate that our spec-compliant approach is also consumable in practice, we tested against production catalogs:
| Catalog | Extensions Used | Validates Our Approach |
|---|---|---|
| Planetary Computer | table | ✅ Table extension works in production |
| Overture Maps | storage, alternate-assets | ✅ Custom properties work for partition hints |
Note: Some catalogs omit the Projection extension for GeoParquet. We choose spec compliance over pattern-matching because the Projection extension provides machine-readable CRS metadata that improves interoperability.
Related Issues
| Issue | Relationship |
|---|---|
| #232 | Partition metadata design (Wave 3 dependency) |
| #233 | Table extension integration details |
| #231 | Collection-level assets (foundation) |
| ADR-0031 | Collection-level assets for vector data |
References
STAC Specifications
Extensions
Tools
Footnotes
1. STAC Best Practices - Extensions
2. Projection Extension README
3. Table Extension README
4. STAC Common Metadata - Bands
5. Projection Extension - Fields
6. Raster Extension - Fields
7. Vector Extension - Fields
8. Common Metadata - Data Values
9. File Extension - Fields
10. STAC Item Spec - bbox: "formatted according to RFC 7946, section 5"
11. STAC Collection Spec - summaries