Scicrop/infinitestack-ontology


InfiniteStack Ontology

An open ontology project for agriculture, analytics, and semantic feature governance.


Overview

InfiniteStack Ontology is an open initiative to build a practical ontology for the agricultural domain and its adjacent analytical layers, including climate, soil, crop, phytosanitary, operational, and productivity data.

The project is designed to help organizations move beyond disconnected data dictionaries and isolated feature engineering logic by providing a shared semantic model for:

  • agricultural entities
  • observations and variables
  • units and frequencies
  • analytical datasets
  • transformations and derived features
  • targets used in machine learning models
  • provenance, explainability, and governance

This ontology is being created with a pragmatic goal: to support real-world agricultural analytics, machine learning, data lakehouse architectures, semantic catalogs, and interoperable data products.


Why this project exists

In modern data platforms, especially in agriculture, a great deal of effort is spent on:

  • cleaning and harmonizing raw data
  • defining analytical variables
  • documenting how features were derived
  • explaining why certain variables were used in a model
  • tracking provenance across datasets, notebooks, pipelines, and models

Even when organizations adopt sound engineering patterns such as Medallion architectures, data catalogs, and feature stores, they often still lack a formal semantic layer capable of answering questions such as:

  • What exactly does this variable mean?
  • Is it a raw measurement, a derived feature, or an aggregated indicator?
  • Which business entity does it describe?
  • What is its expected unit and sampling frequency?
  • What is its analytical grain?
  • Which target variables is it relevant for?
  • What agronomic reasoning justifies its inclusion in a model?
  • Which source tables and transformations produced it?

InfiniteStack Ontology is intended to address that gap.


What this ontology is for

This project aims to provide a semantic foundation for:

1. Agricultural domain modeling

Represent core concepts such as:

  • farm
  • field plot
  • season
  • crop
  • sugarcane
  • pest
  • climate
  • irrigation
  • fertilization
  • biological control
  • productivity indicators

2. Observation semantics

Represent measurable or observable properties such as:

  • rainfall
  • thermal accumulation
  • GDD (growing degree days)
  • soil moisture
  • infestation index
  • TCH (tonnes of cane per hectare)
  • ATR (total recoverable sugar)
  • operational counts
  • climatic windows

3. Analytical feature semantics

Describe analytical variables with metadata such as:

  • domain category
  • subcategory
  • unit
  • data nature
  • expected sampling frequency
  • analytical grain
  • source tables
  • transformations
  • analytical role
  • candidate targets
  • agronomic justification
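As a rough illustration, the metadata listed above could be carried by a typed record in Python. The class name `AnalyticalVariableSpec`, its field names, and the enum members are hypothetical choices for this sketch, not a schema published by the project:

```python
from dataclasses import dataclass
from enum import Enum


class DataNature(Enum):
    # Illustrative controlled vocabulary (an assumption, not a published list)
    CONTINUOUS = "Continuous"
    DISCRETE = "Discrete"
    CATEGORICAL = "Categorical"


class SamplingFrequency(Enum):
    # Illustrative members; only "Daily" appears in this README
    DAILY = "Daily"
    MONTHLY = "Monthly"
    PER_CYCLE = "PerCycle"


@dataclass(frozen=True)
class AnalyticalVariableSpec:
    """Hypothetical record for one analytical variable and its semantic metadata."""
    variable_id: str
    domain_category: str
    subcategory: str
    unit: str
    data_nature: DataNature
    sampling_frequency: SamplingFrequency
    analytical_grain: tuple[str, ...]
    source_tables: tuple[str, ...] = ()
    transformations: tuple[str, ...] = ()
    analytical_role: str = ""
    candidate_targets: tuple[str, ...] = ()
    agronomic_justification: str = ""

    def undocumented_fields(self) -> list[str]:
        """Names of optional metadata fields that are still empty."""
        optional = ("source_tables", "transformations", "analytical_role",
                    "candidate_targets", "agronomic_justification")
        return [name for name in optional if not getattr(self, name)]


spec = AnalyticalVariableSpec(
    variable_id="RAINFALL_CYCLE",
    domain_category="Climate",
    subcategory="Rainfall",
    unit="mm",
    data_nature=DataNature.CONTINUOUS,
    sampling_frequency=SamplingFrequency.DAILY,
    analytical_grain=("FieldPlot", "Cycle", "Season"),
    transformations=("cycle accumulation",),
)
print(spec.undocumented_fields())
# ['source_tables', 'analytical_role', 'candidate_targets', 'agronomic_justification']
```

Keeping data nature and sampling frequency as enums rather than free text is one way to hold these qualifiers to a controlled vocabulary; the `undocumented_fields` helper is likewise only an assumed convenience for spotting documentation gaps.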

4. Explainable machine learning

Provide a semantic layer that helps explain:

  • why a feature exists
  • why it was engineered
  • why it was used for a certain model
  • what domain logic supports its relevance

5. Governance and interoperability

Support:

  • semantic data catalogs
  • feature registries
  • knowledge graphs
  • lineage-aware analytics
  • documentation generation
  • reusable domain vocabularies

Scope

The ontology is initially focused on agriculture and agricultural analytics, with particular attention to operational and analytical realities found in:

  • crop production
  • climate and agrometeorological data
  • phytosanitary analysis
  • productivity modeling
  • irrigation and management
  • data engineering for analytical pipelines
  • feature engineering for machine learning

While the ontology is open and domain-extensible, its first practical emphasis is on problems such as:

  • productivity prediction
  • pest pressure analysis
  • climatic aggregation
  • agronomic explainability
  • semantic feature documentation

Design principles

This project follows a few explicit design principles.

Practicality over academic isolation

The ontology is intended to be useful in production environments, not only as a theoretical exercise.

Reuse before reinvention

Where possible, the ontology should align with or reuse existing standards and initiatives from the semantic web, agriculture, and scientific measurement ecosystems.

Semantic clarity

A concept should have a clear meaning, not just a name.

Explainability

The ontology should help explain analytical assets, not just classify them.

Extensibility

It should be possible to expand the ontology across crops, geographies, data products, and model types.

Separation of concerns

The ontology should distinguish clearly between:

  • domain concepts
  • observed variables
  • analytical variables
  • targets
  • transformations
  • concrete observations
  • source structures

Conceptual foundation

The project is inspired by the intersection of:

  • semantic web principles
  • ontology engineering
  • agricultural domain modeling
  • analytical data architecture
  • explainable ML feature semantics

Conceptual distinction

The ontology intentionally goes beyond a simple taxonomy.

A taxonomy helps classify concepts into categories and subcategories.

An ontology additionally describes:

  • what entities are
  • how they relate
  • what attributes qualify them
  • what kinds of observations can be made about them
  • how analytical variables are derived and used

For example:

A taxonomy might say:

  • Climate

    • Rainfall
    • Temperature

An ontology might additionally say:

  • Rainfall is a climate-related observable property
  • It may be measured in millimeters
  • It may have a daily expected sampling frequency
  • It may be aggregated across a crop cycle
  • It may be relevant for sugarcane productivity models
  • It may be observed over a field plot
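The difference can be made concrete with a small, dependency-free sketch. The predicate names (`hasUnit`, `relevantFor`, and so on) and the object names are hypothetical stand-ins for the bullets above, not terms defined by the ontology:

```python
# A taxonomy only classifies: parent category -> child concepts.
taxonomy = {"Climate": ["Rainfall", "Temperature"]}

# An ontology-style description adds subject-predicate-object statements.
# Predicate and object names are hypothetical, mirroring the bullets above.
statements = {
    ("Rainfall", "isA", "ObservableProperty"),
    ("Rainfall", "hasDomain", "Climate"),
    ("Rainfall", "hasUnit", "mm"),
    ("Rainfall", "hasSamplingFrequency", "Daily"),
    ("Rainfall", "aggregatedOver", "CropCycle"),
    ("Rainfall", "relevantFor", "SugarcaneProductivityModel"),
    ("Rainfall", "observedOver", "FieldPlot"),
}


def objects(subject: str, predicate: str) -> set[str]:
    """All objects linked to `subject` through `predicate`."""
    return {o for s, p, o in statements if s == subject and p == predicate}


# The taxonomy answers "where does Rainfall sit?" ...
parent = next(cat for cat, children in taxonomy.items() if "Rainfall" in children)
print(parent)                          # Climate
# ... while the statements can also answer "in what unit?" or "relevant for what?"
print(objects("Rainfall", "hasUnit"))  # {'mm'}
```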

Relationship with existing initiatives

This project is informed by several important external efforts, including:

  • AGROVOC and Agrontology, for agricultural vocabulary and semantic grounding
  • Agronomy Ontology (AgrO), for agronomic domain concepts
  • Crop Ontology, for traits, variables, methods, and scales
  • SSN/SOSA, for observation, sensor, and observable property semantics
  • QUDT and OM, for quantities and units of measurement
  • ADAPT (from AgGateway), as an important industry reference for agricultural data interoperability

InfiniteStack Ontology does not attempt to replace these efforts. Instead, it seeks to assemble a practical semantic layer suitable for modern agricultural analytics and production-grade data platforms.


Core modeling idea

A central idea of this project is that an analytical feature should not be treated as a mere column name.

A feature should be semantically described in terms of:

  • what it means
  • what domain it belongs to
  • which entity it refers to
  • how it was obtained
  • at which analytical grain it is defined
  • for which targets it is relevant
  • what agronomic or operational logic justifies it

For example, a variable such as RAINFALL_CYCLE should be representable as:

  • a climate-related analytical variable
  • subcategory: rainfall
  • default unit: mm
  • data nature: continuous
  • expected base sampling frequency: daily
  • analytical grain: field plot x cycle x season
  • source tables: climate-related datasets
  • transformation: cycle accumulation
  • analytical role: water availability indicator
  • candidate targets: real productivity and pest-related outcomes

This is one of the main goals of the ontology.
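One possible RDF rendering of the RAINFALL_CYCLE description is sketched below as Turtle, generated with the standard library only. The namespace `https://example.org/infinitestack#`, the `ist:` prefix, and every property and individual name are placeholders assumed for illustration; the project has not published them:

```python
from collections import defaultdict

# Placeholder namespace; the project has not published one.
PREFIX = "@prefix ist: <https://example.org/infinitestack#> ."


def literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes, and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def to_turtle(subject: str, pairs: list[tuple[str, str]]) -> str:
    """Serialize one subject's (predicate, object) pairs as a Turtle block.

    Repeated predicates are merged with ',' and predicates are separated by ';'.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for predicate, obj in pairs:
        grouped[predicate].append(obj)
    lines = [f"{pred} {', '.join(objs)}" for pred, objs in grouped.items()]
    body = " ;\n    ".join(lines)
    return f"{PREFIX}\n\n{subject}\n    {body} .\n"


# Hypothetical property and individual names mirroring the description above.
pairs = [
    ("a", "ist:AnalyticalVariable"),
    ("ist:domainCategory", "ist:Climate"),
    ("ist:subCategory", "ist:Rainfall"),
    ("ist:defaultUnit", literal("mm")),
    ("ist:dataNature", "ist:Continuous"),
    ("ist:baseSamplingFrequency", "ist:Daily"),
    ("ist:analyticalGrain", "ist:FieldPlotCycleSeason"),
    ("ist:transformation", "ist:CycleAccumulation"),
    ("ist:analyticalRole", "ist:WaterAvailabilityIndicator"),
    ("ist:candidateTarget", "ist:RealProductivity"),
    ("ist:candidateTarget", "ist:PestOutcome"),
]
print(to_turtle("ist:RAINFALL_CYCLE", pairs))
```

The unit is written as a plain literal here for simplicity; if the units module is aligned with QUDT or OM, it would more likely point to a unit IRI from one of those vocabularies.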


Initial domain areas

The ontology is expected to include, at minimum, the following domain layers.

Agricultural business entities

  • Organization
  • Farm
  • FieldPlot
  • Season
  • Crop
  • Variety
  • ProductionEnvironment

Biophysical and agronomic domains

  • Climate
  • Soil
  • CropDevelopment
  • Irrigation
  • Fertilization
  • Phytosanitary
  • Pest
  • BiologicalControl
  • Productivity

Observations and variables

  • ObservableProperty
  • Observation
  • AnalyticalVariable
  • TargetVariable
  • Indicator
  • AggregatedMeasure

Data engineering and provenance

  • SourceSystem
  • SourceTable
  • SourceColumn
  • Transformation
  • AggregationRule
  • TimeWindow
  • AnalyticalDataset
  • FeatureSet

Qualification and governance

  • DataNature
  • SamplingFrequency
  • Unit
  • AnalyticalGrain
  • AnalyticalRole
  • Justification
  • QualityConstraint

Example use case

One of the driving use cases behind this project is the semantic description of variables used in machine learning datasets for agricultural prediction.

Imagine an analytical dataset with:

  • 26 explanatory variables
  • 2 target variables:

    • one for real productivity
    • one for pest pressure or an infestation index

The ontology should be able to explain:

  • what each variable means
  • how it was grouped semantically
  • which source systems it came from
  • what transformation generated it
  • its data nature (continuous, discrete, or categorical) and whether it is raw or derived
  • what analytical role it plays
  • which target it is a candidate feature for

This makes the ontology useful not only for semantic modeling, but also for:

  • documentation
  • model audits
  • feature governance
  • cross-team understanding
  • ML explainability support
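As an example of the audit use case, the sketch below checks a few variable descriptions against a dataset's declared targets and lists the candidate features per target. The targets `RealProductivity` and `BorerIndex` follow the names used in the feature example in this README; the other variables and all their metadata are invented for illustration, and only four of the 26 explanatory variables are shown:

```python
from collections import Counter

# Declared targets of the (hypothetical) dataset.
TARGETS = {"RealProductivity", "BorerIndex"}

# A few explanatory variables; names other than RAINFALL_CYCLE are illustrative.
variables = [
    {"id": "RAINFALL_CYCLE", "category": "Climate", "nature": "Continuous",
     "transformation": "cycle accumulation",
     "candidate_targets": ["RealProductivity", "BorerIndex"]},
    {"id": "GDD_CYCLE", "category": "Climate", "nature": "Continuous",
     "transformation": "cycle accumulation", "candidate_targets": ["RealProductivity"]},
    {"id": "BIOCONTROL_RELEASES", "category": "BiologicalControl", "nature": "Discrete",
     "transformation": "count per cycle", "candidate_targets": ["BorerIndex"]},
    # Deliberately flawed entry: misspelled target and no documented transformation.
    {"id": "VARIETY", "category": "Crop", "nature": "Categorical",
     "transformation": "", "candidate_targets": ["RealProdutivity"]},
]


def audit(variables, targets):
    """Return human-readable problems found in the variable descriptions."""
    problems = []
    for var in variables:
        unknown = set(var["candidate_targets"]) - targets
        if unknown:
            problems.append(f"{var['id']}: unknown target(s) {sorted(unknown)}")
        if not var["transformation"]:
            problems.append(f"{var['id']}: no transformation documented")
    return problems


def features_by_target(variables, targets):
    """Map each declared target to the variables that list it as a candidate."""
    return {t: sorted(v["id"] for v in variables if t in v["candidate_targets"])
            for t in sorted(targets)}


print(Counter(v["category"] for v in variables))
for problem in audit(variables, TARGETS):
    print(problem)
print(features_by_target(variables, TARGETS))
```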

Architecture philosophy

This project assumes that a useful semantic architecture should separate the following layers.

1. Domain ontology

Stable concepts from the agricultural world.

Examples:

  • FieldPlot
  • Crop
  • Sugarcane
  • Rainfall
  • Pest
  • BiologicalControl

2. Observation ontology

How measurements, observations, and results relate to entities.

Examples:

  • observed property
  • feature of interest
  • result time
  • observed value

3. Analytical ontology

How engineered variables and targets are represented.

Examples:

  • AnalyticalVariable
  • TargetVariable
  • FeatureSet
  • AnalyticalGrain
  • AnalyticalRole

4. Provenance and transformation ontology

How features were built.

Examples:

  • source table
  • transformation
  • aggregation rule
  • time window
  • derivation dependency

5. Governance and explainability ontology

Why the variable matters.

Examples:

  • agronomic justification
  • operational justification
  • candidate target
  • quality constraint
  • interpretability note

Technology choices

The project is expected to use semantic web standards and compatible serializations.

Representation standards

  • RDF
  • RDFS
  • OWL

Recommended serializations

  • Turtle for ontology authoring and human-readable version control
  • JSON-LD for application-friendly interchange and API integration
  • RDF/XML only when needed for compatibility

Query and reasoning

  • SPARQL for querying
  • OWL reasoners where applicable

Why OWL

OWL is a strong fit for this project because it allows us to define:

  • classes
  • object properties
  • data properties
  • individuals
  • subclass relations
  • equivalence
  • disjointness
  • constraints and logical structure
  • inference-friendly semantics

OWL is particularly valuable here because the project is not just trying to list terms; it is trying to formally represent knowledge about agricultural entities, observations, analytical variables, and their relationships.
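One kind of inference this enables is subclass reasoning. The pure-Python sketch below applies two RDFS entailment rules by hand, transitivity of `rdfs:subClassOf` (rule rdfs11) and propagation of `rdf:type` to superclasses (rule rdfs9), over a hypothetical hierarchy in which `CycleAccumulatedVariable` and its placement under `AggregatedMeasure` are assumptions. A real OWL reasoner covers far more, including equivalence, disjointness, and property restrictions:

```python
# Hypothetical class hierarchy as (subclass, superclass) pairs.
SUBCLASS_OF = {
    ("Sugarcane", "Crop"),
    ("CycleAccumulatedVariable", "AggregatedMeasure"),
    ("AggregatedMeasure", "AnalyticalVariable"),
}
# Asserted types as (individual, class) pairs.
TYPES = {("RAINFALL_CYCLE", "CycleAccumulatedVariable")}


def subclass_closure(pairs):
    """Transitive closure of subClassOf (RDFS rule rdfs11), iterated to a fixpoint."""
    closure = set(pairs)
    while True:
        derived = {(a, d) for a, b in closure for c, d in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived


def inferred_types(types, subclass_pairs):
    """Propagate rdf:type to every superclass (RDFS rule rdfs9)."""
    closure = subclass_closure(subclass_pairs)
    return set(types) | {(ind, sup) for ind, cls in types
                         for sub, sup in closure if sub == cls}


print(sorted(inferred_types(TYPES, SUBCLASS_OF)))
# [('RAINFALL_CYCLE', 'AggregatedMeasure'), ('RAINFALL_CYCLE', 'AnalyticalVariable'),
#  ('RAINFALL_CYCLE', 'CycleAccumulatedVariable')]
```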


Why JSON-LD and Turtle

This project benefits from two different practical representations.

Turtle

Best for:

  • authoring ontologies
  • human inspection
  • Git-based review
  • semantic modeling work

JSON-LD

Best for:

  • application integration
  • APIs
  • notebooks
  • semantic feature registries
  • graph-aware services

A likely pattern for this project is to maintain the ontology in Turtle and provide JSON-LD exports or companion artifacts.
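What distinguishes JSON-LD from plain JSON is its `@context`, which maps short keys to full IRIs. The sketch below expands one flat JSON-LD node into N-Triples by hand to show that mapping. The context, the `https://example.org/infinitestack#` namespace, and the `defaultUnit` property are assumptions, and a real pipeline would rely on a conforming JSON-LD processor (rdflib can parse JSON-LD, for example) rather than this simplified expansion:

```python
import json

# Hypothetical JSON-LD document with a minimal, flat @context.
DOC = json.loads("""
{
  "@context": {
    "ist": "https://example.org/infinitestack#",
    "label": "http://www.w3.org/2000/01/rdf-schema#label",
    "unit": "ist:defaultUnit"
  },
  "@id": "ist:RAINFALL_CYCLE",
  "@type": "ist:AnalyticalVariable",
  "label": "Cycle rainfall",
  "unit": "mm"
}
""")

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def expand(term: str, context: dict) -> str:
    """Expand a context term or a prefix:local CURIE into a full IRI."""
    term = context.get(term, term)
    prefix, sep, local = term.partition(":")
    if sep and prefix in context and isinstance(context[prefix], str):
        return context[prefix] + local
    return term


def to_ntriples(node: dict) -> list[str]:
    """Emit N-Triples for one flat node; non-@ values are assumed to be plain strings."""
    ctx = node["@context"]
    subject = f"<{expand(node['@id'], ctx)}>"
    lines = [f"{subject} <{RDF_TYPE}> <{expand(node['@type'], ctx)}> ."]
    for key, value in node.items():
        if key.startswith("@"):
            continue  # keywords are handled above or ignored
        # json.dumps yields a quoted, escaped string usable as a simple literal
        lines.append(f"{subject} <{expand(key, ctx)}> {json.dumps(value)} .")
    return lines


print("\n".join(to_ntriples(DOC)))
```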


Proposed repository goals

The repository is intended to evolve into a home for:

  • ontology source files
  • modular vocabulary definitions
  • JSON-LD examples
  • Turtle examples
  • domain examples
  • feature examples
  • machine-readable releases
  • documentation and diagrams
  • issue-driven semantic discussions

Proposed repository structure

infinitestack-ontology/
├─ README.md
├─ LICENSE
├─ CONTRIBUTING.md
├─ docs/
│  ├─ concepts/
│  ├─ diagrams/
│  ├─ examples/
│  └─ decisions/
├─ ontology/
│  ├─ core/
│  ├─ domain/
│  ├─ analytics/
│  ├─ provenance/
│  └─ governance/
├─ examples/
│  ├─ jsonld/
│  ├─ turtle/
│  └─ notebooks/
├─ schemas/
├─ tests/
└─ releases/

This is only a suggested structure and may evolve.
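If the suggested layout is adopted, it can be scaffolded with a few lines of `pathlib`. This is a convenience sketch that mirrors the tree above and nothing more; it writes a `.gitkeep` placeholder into each folder because Git does not track empty directories:

```python
import tempfile
from pathlib import Path

# Folders from the suggested tree above.
DIRECTORIES = [
    "docs/concepts", "docs/diagrams", "docs/examples", "docs/decisions",
    "ontology/core", "ontology/domain", "ontology/analytics",
    "ontology/provenance", "ontology/governance",
    "examples/jsonld", "examples/turtle", "examples/notebooks",
    "schemas", "tests", "releases",
]
TOP_LEVEL_FILES = ["README.md", "LICENSE", "CONTRIBUTING.md"]


def scaffold(root: Path) -> list[Path]:
    """Create the suggested layout under `root`, never overwriting existing files.

    Returns the files that were newly created.
    """
    created = []
    for rel in DIRECTORIES:
        folder = root / rel
        folder.mkdir(parents=True, exist_ok=True)
        keep = folder / ".gitkeep"  # Git does not track empty directories
        if not keep.exists():
            keep.touch()
            created.append(keep)
    for name in TOP_LEVEL_FILES:
        path = root / name
        if not path.exists():
            path.touch()
            created.append(path)
    return created


# Try it in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "infinitestack-ontology"
    first_run = scaffold(root)
    second_run = scaffold(root)  # idempotent: nothing new to create
    has_core = (root / "ontology" / "core" / ".gitkeep").is_file()
print(len(first_run), len(second_run), has_core)  # 18 0 True
```

The ontology folders follow the tree as drawn; if the `observation` and `units` modules described in the next section get folders of their own, `DIRECTORIES` would need to grow accordingly.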


Suggested ontology modules

To keep the project maintainable, it is recommended to modularize the ontology.

core

Foundational concepts and shared abstractions.

domain

Agricultural entities and domain-specific concepts.

observation

Observable properties, features of interest, observations, and measurement semantics.

analytics

Analytical variables, targets, feature sets, grains, roles, and model-facing constructs.

provenance

Sources, tables, columns, transformations, derivations, and lineage.

governance

Justifications, quality constraints, documentation metadata, and usage restrictions.

units

Units, quantities, and compatibility mappings, likely aligned with QUDT or OM.


Example semantic feature description

A feature in this ontology may eventually look conceptually like this:

{
  "variable_id": "RAINFALL_CYCLE",
  "label": "Cycle rainfall",
  "domain_category": "Climate",
  "sub_category": "Rainfall",
  "data_nature": "Continuous",
  "unit": "mm",
  "base_sampling_frequency": "Daily",
  "analytical_grain": "FieldPlot-Cycle-Season",
  "observed_entity": "FieldPlot",
  "source_tables": [
    "ClimateDailyReadings",
    "ClimateTrainingTable"
  ],
  "transformation": "Accumulated over cycle",
  "candidate_targets": [
    "RealProductivity",
    "BorerIndex"
  ],
  "analytical_role": "Water availability indicator",
  "agronomic_justification": "Water availability affects crop development and productivity"
}

This is not meant to be the final data model, but rather a conceptual example of the kind of semantic richness the project aims to support.
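Before descriptions like this one are loaded into a graph, a lightweight structural check can catch missing keys and typos. The sketch below rests on assumptions the project has not specified: it treats every key in the example as mandatory and restricts `data_nature` to continuous, discrete, or categorical:

```python
import json

# Keys and expected JSON types, taken from the conceptual example above.
# Treating all of them as mandatory is an assumption for this sketch.
REQUIRED = {
    "variable_id": str, "label": str, "domain_category": str,
    "sub_category": str, "data_nature": str, "unit": str,
    "base_sampling_frequency": str, "analytical_grain": str,
    "observed_entity": str, "source_tables": list,
    "transformation": str, "candidate_targets": list,
    "analytical_role": str, "agronomic_justification": str,
}
ALLOWED_DATA_NATURE = {"Continuous", "Discrete", "Categorical"}  # assumed vocabulary


def validate(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the description passed."""
    doc = json.loads(raw)
    if not isinstance(doc, dict):
        return ["top level must be a JSON object"]
    errors = [f"missing key: {key}" for key in REQUIRED if key not in doc]
    for key, expected in REQUIRED.items():
        if key in doc and not isinstance(doc[key], expected):
            errors.append(f"{key}: expected {expected.__name__}")
    for key in ("source_tables", "candidate_targets"):
        value = doc.get(key)
        if isinstance(value, list) and not all(isinstance(item, str) for item in value):
            errors.append(f"{key}: every entry must be a string")
    nature = doc.get("data_nature")
    if isinstance(nature, str) and nature not in ALLOWED_DATA_NATURE:
        errors.append(f"data_nature: {nature!r} is not in the vocabulary")
    return errors


problems = validate('{"variable_id": "RAINFALL_CYCLE", "data_nature": "Continous"}')
print(problems[-1])  # data_nature: 'Continous' is not in the vocabulary
```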


Intended users

This ontology may be useful for:

  • data engineers
  • ML engineers
  • agronomists
  • ontology engineers
  • knowledge graph practitioners
  • data governance teams
  • data scientists
  • product teams building agricultural data platforms
  • researchers working on semantic agricultural interoperability

What this project is not

To avoid confusion, this project is not intended to be:

  • a replacement for ETL or ELT pipelines
  • a replacement for SQL-based transformation logic
  • a full agricultural ERP model
  • a generic data warehouse design framework
  • a single universal ontology for all of agriculture
  • a substitute for feature stores or data catalogs

Instead, it is meant to be a semantic and ontological layer that complements those systems.


Current status

This project is at an early stage.

The current intent is to establish:

  • a clear conceptual scope
  • a foundational core model
  • a first set of domain concepts
  • a first set of analytical feature semantics
  • an open collaboration path

The ontology is expected to evolve incrementally and pragmatically.


Roadmap ideas

Possible milestones include:

Phase 1

  • define foundational classes and properties
  • publish first core module
  • publish first JSON-LD and Turtle examples

Phase 2

  • add agricultural domain modules
  • add observation semantics
  • add analytical feature semantics
  • add target variable semantics

Phase 3

  • add provenance and governance modules
  • align with existing external vocabularies
  • add example datasets and notebooks

Phase 4

  • publish reusable feature semantic templates
  • add SPARQL query examples
  • add validation and testing workflows
  • prepare stable releases

Contribution philosophy

This is intended to be an open project.

Contributions are welcome in areas such as:

  • ontology design
  • agricultural domain modeling
  • semantic alignment
  • JSON-LD examples
  • Turtle modeling
  • documentation
  • notebooks and visualization
  • practical use cases
  • issue reports and conceptual reviews

Good contributions will usually help improve one or more of the following:

  • semantic clarity
  • interoperability
  • practical usability
  • documentation quality
  • domain correctness

Suggested contribution workflow

  1. Open an issue describing the proposed concept, change, or module.
  2. Explain the domain need and expected semantic value.
  3. Propose class/property names and definitions.
  4. Discuss alignment with existing vocabularies.
  5. Add examples.
  6. Keep changes modular and documented.

Naming principles

As the ontology evolves, names should aim to be:

  • explicit
  • domain-meaningful
  • reusable
  • stable
  • not overly tied to one single internal dataset
  • understandable by both technical and domain users

For example, a semantically rich concept name such as CycleRainfall is often more reusable than a purely internal column name such as RAINFALL_CYCLE. At the same time, mappings to the actual source column names should be preserved where relevant.
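A naming policy like this can be checked mechanically. The sketch assumes UpperCamelCase for class-style concept names, matching names such as `FieldPlot` and `CycleRainfall` used in this README, and keeps a hypothetical concept-to-column mapping so source names are preserved; `GDD_CYCLE`, `TCH_REAL`, and the other entries are illustrative only:

```python
import re

# Assumed convention: UpperCamelCase words (acronyms spelled out or title-cased).
UPPER_CAMEL = re.compile(r"[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)*")

# Hypothetical mapping from reusable concept names to internal source columns.
CONCEPT_TO_COLUMNS = {
    "CycleRainfall": ["RAINFALL_CYCLE"],
    "CycleGrowingDegreeDays": ["GDD_CYCLE"],
    "realProductivity": ["TCH_REAL"],  # violates the casing convention
    "SoilMoisture": [],                # no source mapping recorded yet
}


def naming_issues(mapping: dict[str, list[str]]) -> list[str]:
    """Report concept names that break the convention or lack a column mapping."""
    issues = []
    for concept, columns in mapping.items():
        if not UPPER_CAMEL.fullmatch(concept):
            issues.append(f"{concept}: class names should be UpperCamelCase")
        if not columns:
            issues.append(f"{concept}: no source column mapping recorded")
    return issues


def concept_for_column(column: str, mapping: dict[str, list[str]]):
    """Reverse lookup: which concept, if any, a source column documents."""
    return next((c for c, cols in mapping.items() if column in cols), None)


for issue in naming_issues(CONCEPT_TO_COLUMNS):
    print(issue)
print(concept_for_column("RAINFALL_CYCLE", CONCEPT_TO_COLUMNS))  # CycleRainfall
```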


Long-term vision

The long-term vision for InfiniteStack Ontology is to help create an open semantic foundation for agricultural analytics that can support:

  • interoperable agricultural data products
  • explainable analytical variables
  • semantic feature registries
  • ontology-aware lakehouse governance
  • graph-based exploration of agricultural knowledge
  • model-facing semantic catalogs
  • reusable agricultural analytics standards

Related ideas this project may eventually support

  • semantic feature stores
  • graph-based explainability for ML features
  • ontology-backed data catalogs
  • ontology-aware documentation generators
  • linked data publication for agricultural analytics assets
  • alignment between agronomic expertise and machine learning pipelines

Getting started

A practical way to begin with this repository is:

  1. read the conceptual documentation
  2. inspect the core ontology files
  3. review example JSON-LD and Turtle artifacts
  4. load the ontology into a notebook using RDF libraries
  5. query it with SPARQL
  6. experiment with feature-level semantic descriptions
  7. propose additions through GitHub issues or pull requests

Example notebook workflow

A typical exploration workflow may involve:

  • loading JSON-LD with rdflib
  • converting it to Turtle for inspection
  • querying feature definitions via SPARQL
  • visualizing subgraphs for individual variables
  • using widgets to inspect semantic properties of selected features

This makes the ontology immediately usable for experimentation and validation.
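In a notebook, rdflib would typically handle these steps: `Graph.parse` reads JSON-LD in rdflib 6 and later, `Graph.serialize` writes Turtle, and `Graph.query` runs SPARQL. To show what a SPARQL basic graph pattern actually computes without any dependency, the sketch below matches triple patterns with `?`-prefixed variables against a small, hypothetical set of triples:

```python
Triple = tuple[str, str, str]

# Hypothetical feature descriptions as (subject, predicate, object) triples.
TRIPLES: set[Triple] = {
    ("RAINFALL_CYCLE", "rdf:type", "AnalyticalVariable"),
    ("RAINFALL_CYCLE", "domainCategory", "Climate"),
    ("RAINFALL_CYCLE", "candidateTarget", "RealProductivity"),
    ("GDD_CYCLE", "rdf:type", "AnalyticalVariable"),
    ("GDD_CYCLE", "domainCategory", "Climate"),
    ("GDD_CYCLE", "candidateTarget", "RealProductivity"),
    ("BIOCONTROL_RELEASES", "rdf:type", "AnalyticalVariable"),
    ("BIOCONTROL_RELEASES", "domainCategory", "BiologicalControl"),
    ("BIOCONTROL_RELEASES", "candidateTarget", "BorerIndex"),
}


def unify(pattern: Triple, triple: Triple, bindings: dict):
    """Extend `bindings` so `pattern` matches `triple`; return None on conflict."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def match(patterns: list[Triple], triples: set[Triple], bindings=None):
    """Yield every variable binding that satisfies all patterns (a join)."""
    if bindings is None:
        bindings = {}
    if not patterns:
        yield bindings
        return
    for triple in triples:
        extended = unify(patterns[0], triple, bindings)
        if extended is not None:
            yield from match(patterns[1:], triples, extended)


# Roughly: SELECT ?v WHERE { ?v a ist:AnalyticalVariable ;
#   ist:domainCategory ist:Climate ; ist:candidateTarget ist:RealProductivity }
# (the ist: prefix is a placeholder)
query = [
    ("?v", "rdf:type", "AnalyticalVariable"),
    ("?v", "domainCategory", "Climate"),
    ("?v", "candidateTarget", "RealProductivity"),
]
print(sorted(b["?v"] for b in match(query, TRIPLES)))  # ['GDD_CYCLE', 'RAINFALL_CYCLE']
```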


License

This repository should include an open license appropriate for semantic and documentation assets.

A permissive option such as MIT or Apache-2.0 may be suitable, depending on the desired governance model.


Acknowledgment of intent

This project starts from a practical need encountered in agricultural data and AI work: to build a semantic layer capable of explaining not only what data exists, but also why analytical variables exist, how they were derived, and why they matter.

That is the central motivation behind InfiniteStack Ontology.


Contributing

Contributions, discussions, and critiques are welcome.

This project is especially interested in collaboration across:

  • agriculture
  • data engineering
  • semantic web
  • ontology engineering
  • machine learning
  • analytics governance

Final note

If you work with agricultural data and have ever asked questions like:

  • Why was this feature created?
  • What does this variable really mean?
  • What is the semantic difference between a daily observation and a cycle-level analytical variable?
  • How can we make feature engineering more explainable and reusable?

then this project was created for exactly that kind of problem.
