An open ontology project for agriculture, analytics, and semantic feature governance.
InfiniteStack Ontology is an open initiative to build a practical ontology for the agricultural domain and its adjacent analytical layers, including climate, soil, crop, phytosanitary, operational, and productivity data.
The project is designed to help organizations move beyond disconnected data dictionaries and isolated feature engineering logic by providing a shared semantic model for:
- agricultural entities
- observations and variables
- units and frequencies
- analytical datasets
- transformations and derived features
- targets used in machine learning models
- provenance, explainability, and governance
This ontology is being created with a pragmatic goal: to support real-world agricultural analytics, machine learning, data lakehouse architectures, semantic catalogs, and interoperable data products.
In modern data platforms, especially in agriculture, a great deal of effort is spent on:
- cleaning and harmonizing raw data
- defining analytical variables
- documenting how features were derived
- explaining why certain variables were used in a model
- tracking provenance across datasets, notebooks, pipelines, and models
Even when organizations adopt sound engineering patterns such as Medallion architectures, data catalogs, and feature stores, they often still lack a formal semantic layer capable of answering questions such as:
- What exactly does this variable mean?
- Is it a raw measurement, a derived feature, or an aggregated indicator?
- Which business entity does it describe?
- What is its expected unit and sampling frequency?
- What is its analytical grain?
- Which target variables is it relevant for?
- What agronomic reasoning justifies its inclusion in a model?
- Which source tables and transformations produced it?
InfiniteStack Ontology is intended to address that gap.
This project aims to provide a semantic foundation for the capabilities below.
Represent core concepts such as:
- farm
- field plot
- season
- crop
- sugarcane
- pest
- climate
- irrigation
- fertilization
- biological control
- productivity indicators
Represent measurable or observable properties such as:
- rainfall
- thermal accumulation
- GDD
- soil moisture
- infestation index
- TCH
- ATR
- operational counts
- climatic windows
Describe analytical variables with metadata such as:
- domain category
- subcategory
- unit
- data nature
- expected sampling frequency
- analytical grain
- source tables
- transformations
- analytical role
- candidate targets
- agronomic justification
Provide a semantic layer that helps explain:
- why a feature exists
- why it was engineered
- why it was used for a certain model
- what domain logic supports its relevance
Support:
- semantic data catalogs
- feature registries
- knowledge graphs
- lineage-aware analytics
- documentation generation
- reusable domain vocabularies
The ontology is initially focused on agriculture and agricultural analytics, with particular attention to operational and analytical realities found in:
- crop production
- climate and agrometeorological data
- phytosanitary analysis
- productivity modeling
- irrigation and management
- data engineering for analytical pipelines
- feature engineering for machine learning
While the ontology is open and domain-extensible, its first practical emphasis is on problems such as:
- productivity prediction
- pest pressure analysis
- climatic aggregation
- agronomic explainability
- semantic feature documentation
This project follows a few explicit design principles.
The ontology is intended to be useful in production environments, not only as a theoretical exercise.
Where possible, the ontology should align with or reuse existing standards and initiatives from the semantic web, agriculture, and scientific measurement ecosystems.
A concept should have a clear meaning, not just a name.
The ontology should help explain analytical assets, not just classify them.
It should be possible to expand the ontology across crops, geographies, data products, and model types.
The ontology should distinguish clearly between:
- domain concepts
- observed variables
- analytical variables
- targets
- transformations
- concrete observations
- source structures
The project is inspired by the intersection of:
- semantic web principles
- ontology engineering
- agricultural domain modeling
- analytical data architecture
- explainable ML feature semantics
The ontology intentionally goes beyond a simple taxonomy.
A taxonomy helps classify concepts into categories and subcategories.
An ontology additionally describes:
- what entities are
- how they relate
- what attributes qualify them
- what kinds of observations can be made about them
- how analytical variables are derived and used
For example:
A taxonomy might say:
- Climate
  - Rainfall
  - Temperature
An ontology might additionally say:
- Rainfall is a climate-related observable property
- It may be measured in millimeters
- It may have a daily expected sampling frequency
- It may be aggregated across a crop cycle
- It may be relevant for sugarcane productivity models
- It may be observed over a field plot
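The contrast above can be sketched in a few lines of Python. This is a toy illustration that uses plain tuples instead of a real RDF library. The predicate names (`hasUnit`, `observedOver`, and so on) are hypothetical and do not come from any published vocabulary.

```python
# Taxonomy: only parent/child classification.
taxonomy = {"Climate": ["Rainfall", "Temperature"]}

# Ontology (toy form): subject-predicate-object statements.
# All predicate names below are illustrative assumptions.
triples = {
    ("Rainfall", "isA", "ObservableProperty"),
    ("Rainfall", "belongsToDomain", "Climate"),
    ("Rainfall", "hasUnit", "millimetre"),
    ("Rainfall", "hasExpectedSamplingFrequency", "Daily"),
    ("Rainfall", "canBeAggregatedOver", "CropCycle"),
    ("Rainfall", "isRelevantFor", "SugarcaneProductivityModel"),
    ("Rainfall", "observedOver", "FieldPlot"),
}


def describe(subject: str) -> dict[str, list[str]]:
    """Collect every statement about a subject, grouped by predicate."""
    description: dict[str, list[str]] = {}
    for s, p, o in sorted(triples):
        if s == subject:
            description.setdefault(p, []).append(o)
    return description


rainfall = describe("Rainfall")
print(rainfall["hasUnit"])  # ['millimetre']
print(len(rainfall))        # 7
```

The taxonomy can only say that Rainfall sits under Climate. The triple set can also answer questions about units, frequency, and relevance.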
This project is informed by several important external efforts, including:
- AGROVOC and Agrontology, for agricultural vocabulary and semantic grounding
- Agronomy Ontology (AgrO), for agronomic domain concepts
- Crop Ontology, for traits, variables, methods, and scales
- SSN/SOSA, for observation, sensor, and observable property semantics
- QUDT and OM, for quantities and units of measurement
- ADAPT, as an important market reference for agricultural interoperability
InfiniteStack Ontology does not attempt to replace these efforts. Instead, it seeks to assemble a practical semantic layer suitable for modern agricultural analytics and production-grade data platforms.
A central idea of this project is that an analytical feature should not be treated as a mere column name.
A feature should be semantically described in terms of:
- what it means
- what domain it belongs to
- which entity it refers to
- how it was obtained
- at which analytical grain it is defined
- for which targets it is relevant
- what agronomic or operational logic justifies it
For example, a variable such as RAINFALL_CYCLE should be representable as:
- a climate-related analytical variable
- subcategory: rainfall
- default unit: mm
- data nature: continuous
- expected base sampling frequency: daily
- analytical grain: field plot × cycle × season
- source tables: climate-related datasets
- transformation: cycle accumulation
- analytical role: water availability indicator
- candidate targets: real productivity, pest-related outcomes
This is one of the main goals of the ontology.
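A description like the RAINFALL_CYCLE example could be held in a small typed structure. The sketch below is one possible shape, not the project's data model. The class name, field names, and the `missing_fields` helper are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalyticalVariable:
    """Hypothetical container for the metadata listed above."""
    variable_id: str
    domain_category: str
    sub_category: str
    unit: str
    data_nature: str
    base_sampling_frequency: str
    analytical_grain: tuple[str, ...]
    source_tables: tuple[str, ...]
    transformation: str
    analytical_role: str
    candidate_targets: tuple[str, ...]

    def missing_fields(self) -> list[str]:
        """Return the names of fields left empty (a simple completeness check)."""
        return [name for name, value in vars(self).items() if not value]


rainfall_cycle = AnalyticalVariable(
    variable_id="RAINFALL_CYCLE",
    domain_category="Climate",
    sub_category="Rainfall",
    unit="mm",
    data_nature="Continuous",
    base_sampling_frequency="Daily",
    analytical_grain=("FieldPlot", "Cycle", "Season"),
    source_tables=(),  # left empty on purpose to show the check below
    transformation="CycleAccumulation",
    analytical_role="WaterAvailabilityIndicator",
    candidate_targets=("RealProductivity", "PestRelatedOutcome"),
)

print(rainfall_cycle.missing_fields())  # ['source_tables']
```

A check like this could flag features whose semantic description is incomplete before they enter a registry.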
The ontology is expected to include, at minimum, the following domain layers.
- Organization
- Farm
- FieldPlot
- Season
- Crop
- Variety
- ProductionEnvironment
- Climate
- Soil
- CropDevelopment
- Irrigation
- Fertilization
- Phytosanitary
- Pest
- BiologicalControl
- Productivity
- ObservableProperty
- Observation
- AnalyticalVariable
- TargetVariable
- Indicator
- AggregatedMeasure
- SourceSystem
- SourceTable
- SourceColumn
- Transformation
- AggregationRule
- TimeWindow
- AnalyticalDataset
- FeatureSet
- DataNature
- SamplingFrequency
- Unit
- AnalyticalGrain
- AnalyticalRole
- Justification
- QualityConstraint
One of the driving use cases behind this project is the semantic description of variables used in machine learning datasets for agricultural prediction.
Imagine an analytical dataset with:
- 26 explanatory variables
- 2 target variables
- one target for real productivity
- another target for pest pressure or infestation index
The ontology should be able to explain:
- what each variable means
- how it was grouped semantically
- which source systems it came from
- what transformation generated it
- whether it is continuous, discrete, categorical, or derived
- what analytical role it plays
- which target it is a candidate feature for
This makes the ontology useful not only for semantic modeling, but also for:
- documentation
- model audits
- feature governance
- cross-team understanding
- ML explainability support
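As a rough sketch of how such explanations might be queried, the snippet below indexes a small, hypothetical excerpt of a feature registry by target. Only `RAINFALL_CYCLE` and the two target names appear elsewhere in this README. The other variable ids and roles are invented for illustration.

```python
from collections import defaultdict

# Hypothetical registry excerpt; a real dataset would list every variable.
registry = [
    {"variable_id": "RAINFALL_CYCLE", "data_nature": "Continuous",
     "analytical_role": "Water availability indicator",
     "candidate_targets": ["RealProductivity", "BorerIndex"]},
    {"variable_id": "GDD_CYCLE", "data_nature": "Continuous",
     "analytical_role": "Thermal accumulation indicator",
     "candidate_targets": ["RealProductivity"]},
    {"variable_id": "BIOCONTROL_RELEASES", "data_nature": "Discrete",
     "analytical_role": "Biological control intensity",
     "candidate_targets": ["BorerIndex"]},
]


def features_by_target(entries):
    """Index variable ids by the targets they are candidate features for."""
    index = defaultdict(list)
    for entry in entries:
        for target in entry["candidate_targets"]:
            index[target].append(entry["variable_id"])
    return dict(index)


index = features_by_target(registry)
print(index["RealProductivity"])  # ['RAINFALL_CYCLE', 'GDD_CYCLE']
print(index["BorerIndex"])        # ['RAINFALL_CYCLE', 'BIOCONTROL_RELEASES']
```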
This project assumes that a useful semantic architecture should separate the following layers.
Stable concepts from the agricultural world.
Examples:
- FieldPlot
- Crop
- Sugarcane
- Rainfall
- Pest
- BiologicalControl
How measurements, observations, and results relate to entities.
Examples:
- observed property
- feature of interest
- result time
- observed value
How engineered variables and targets are represented.
Examples:
- AnalyticalVariable
- TargetVariable
- FeatureSet
- AnalyticalGrain
- AnalyticalRole
How features were built.
Examples:
- source table
- transformation
- aggregation rule
- time window
- derivation dependency
Why the variable matters.
Examples:
- agronomic justification
- operational justification
- candidate target
- quality constraint
- interpretability note
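To make the separation more concrete, the sketch below derives a cycle-level analytical variable from daily observations and records how it was built. It is only an illustration of the layering idea. The `Observation` class, the `accumulate_over_cycle` function, the plot identifiers, and the rainfall readings are all made up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """Observation layer: one measured value for one entity at one time."""
    feature_of_interest: str  # e.g. a field plot identifier
    observed_property: str
    result_time: date
    value: float


def accumulate_over_cycle(observations, plot_id, start, end):
    """Analytical and provenance layers: derive a cycle-level value
    and record the transformation that produced it."""
    used = [
        o for o in observations
        if o.feature_of_interest == plot_id and start <= o.result_time <= end
    ]
    return {
        "variable_id": "RAINFALL_CYCLE",
        "analytical_grain": {"field_plot": plot_id, "cycle": (start, end)},
        "value": sum(o.value for o in used),
        "provenance": {
            "transformation": "cycle accumulation",
            "time_window": (start.isoformat(), end.isoformat()),
            "observation_count": len(used),
        },
    }


# Invented daily rainfall readings in millimetres, for illustration only.
daily = [
    Observation("plot-001", "Rainfall", date(2024, 1, 1), 12.0),
    Observation("plot-001", "Rainfall", date(2024, 1, 2), 0.0),
    Observation("plot-001", "Rainfall", date(2024, 1, 3), 7.5),
    Observation("plot-002", "Rainfall", date(2024, 1, 1), 3.0),
]

result = accumulate_over_cycle(daily, "plot-001", date(2024, 1, 1), date(2024, 1, 3))
print(result["value"])                            # 19.5
print(result["provenance"]["observation_count"])  # 3
```

The daily readings and the cycle total describe the same property at different grains, which is why the two are kept in separate layers.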
The project is expected to use semantic web standards and compatible serializations.
- RDF
- RDFS
- OWL
- Turtle for ontology authoring and human-readable version control
- JSON-LD for application-friendly interchange and API integration
- RDF/XML only when needed for compatibility
- SPARQL for querying
- OWL reasoners where applicable
OWL is a strong fit for this project because it allows us to define:
- classes
- object properties
- data properties
- individuals
- subclass relations
- equivalence
- disjointness
- constraints and logical structure
- inference-friendly semantics
OWL is particularly valuable here because the project is not just trying to list terms; it is trying to formally represent knowledge about agricultural entities, observations, analytical variables, and their relationships.
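The value of subclass relations and inference can be illustrated with a toy example. The snippet below is not an OWL reasoner. It only follows single-parent subclass links in plain Python. The specific axioms (for example, that `AggregatedMeasure` is a subclass of `AnalyticalVariable`) and the `DomainConcept` class are assumptions, not statements the ontology has published.

```python
# Asserted subclass axioms (child -> parent), assumed for illustration.
SUBCLASS_OF = {
    "Sugarcane": "Crop",
    "Crop": "DomainConcept",
    "AggregatedMeasure": "AnalyticalVariable",
}

# Asserted types of individuals.
TYPE_OF = {"RAINFALL_CYCLE": "AggregatedMeasure"}


def superclasses(cls: str) -> list[str]:
    """Follow subclass links upward until no parent is asserted."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def inferred_types(individual: str) -> list[str]:
    """Return the asserted type followed by every inferred superclass."""
    asserted = TYPE_OF[individual]
    return [asserted, *superclasses(asserted)]


print(superclasses("Sugarcane"))         # ['Crop', 'DomainConcept']
print(inferred_types("RAINFALL_CYCLE"))  # ['AggregatedMeasure', 'AnalyticalVariable']
```

A real OWL reasoner would derive these entailments, and many richer ones, directly from the ontology's axioms.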
This project benefits from two different practical representations.
Turtle is best for:
- authoring ontologies
- human inspection
- Git-based review
- semantic modeling work
JSON-LD is best for:
- application integration
- APIs
- notebooks
- semantic feature registries
- graph-aware services
A likely pattern for this project is to maintain the ontology in Turtle and provide JSON-LD exports or companion artifacts.
The repository is intended to evolve into a home for:
- ontology source files
- modular vocabulary definitions
- JSON-LD examples
- Turtle examples
- domain examples
- feature examples
- machine-readable releases
- documentation and diagrams
- issue-driven semantic discussions
infinitestack-ontology/
├─ README.md
├─ LICENSE
├─ CONTRIBUTING.md
├─ docs/
│ ├─ concepts/
│ ├─ diagrams/
│ ├─ examples/
│ └─ decisions/
├─ ontology/
│ ├─ core/
│ ├─ domain/
│ ├─ analytics/
│ ├─ provenance/
│ └─ governance/
├─ examples/
│ ├─ jsonld/
│ ├─ turtle/
│ └─ notebooks/
├─ schemas/
├─ tests/
└─ releases/
This is only a suggested structure and may evolve.
To keep the project maintainable, it is recommended to modularize the ontology.
Foundational concepts and shared abstractions.
Agricultural entities and domain-specific concepts.
Observable properties, features of interest, observations, and measurement semantics.
Analytical variables, targets, feature sets, grains, roles, and model-facing constructs.
Sources, tables, columns, transformations, derivations, and lineage.
Justifications, quality constraints, documentation metadata, and usage restrictions.
Units, quantities, and compatibility mappings, likely aligned with QUDT or OM.
A feature in this ontology may eventually look conceptually like this:
```json
{
  "variable_id": "RAINFALL_CYCLE",
  "label": "Cycle rainfall",
  "domain_category": "Climate",
  "sub_category": "Rainfall",
  "data_nature": "Continuous",
  "unit": "mm",
  "base_sampling_frequency": "Daily",
  "analytical_grain": "FieldPlot-Cycle-Season",
  "observed_entity": "FieldPlot",
  "source_tables": [
    "ClimateDailyReadings",
    "ClimateTrainingTable"
  ],
  "transformation": "Accumulated over cycle",
  "candidate_targets": [
    "RealProductivity",
    "BorerIndex"
  ],
  "analytical_role": "Water availability indicator",
  "agronomic_justification": "Water availability affects crop development and productivity"
}
```

This is not meant to be the final data model, but rather a conceptual example of the kind of semantic richness the project aims to support.
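One way a flat record of this kind might be lifted into JSON-LD is to wrap it with an `@context`, an `@id`, and an `@type`. The sketch below uses only the standard library. The namespace `https://example.org/infinitestack#` is a placeholder, since the project has not published a base IRI, and the `to_jsonld` helper is hypothetical.

```python
import json

# Placeholder namespace, assumed for illustration only.
VOCAB = "https://example.org/infinitestack#"

record = {
    "variable_id": "RAINFALL_CYCLE",
    "label": "Cycle rainfall",
    "domain_category": "Climate",
    "unit": "mm",
    "observed_entity": "FieldPlot",
    "candidate_targets": ["RealProductivity", "BorerIndex"],
}


def to_jsonld(rec: dict) -> dict:
    """Wrap a flat record in a JSON-LD envelope with an inline @context."""
    body = {k: v for k, v in rec.items() if k != "variable_id"}
    return {
        "@context": {
            "@vocab": VOCAB,  # unprefixed keys resolve against this IRI
            "label": "http://www.w3.org/2000/01/rdf-schema#label",
        },
        "@id": VOCAB + rec["variable_id"],
        "@type": "AnalyticalVariable",
        **body,
    }


doc = to_jsonld(record)
print(json.dumps(doc, indent=2))
```

Here every value stays a plain string. A fuller model would probably turn values such as `FieldPlot` or `RealProductivity` into IRIs that point at classes or individuals in the ontology.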
This ontology may be useful for:
- data engineers
- ML engineers
- agronomists
- ontology engineers
- knowledge graph practitioners
- data governance teams
- data scientists
- product teams building agricultural data platforms
- researchers working on semantic agricultural interoperability
To avoid confusion, this project is not intended to be:
- a replacement for ETL or ELT pipelines
- a replacement for SQL-based transformation logic
- a full agricultural ERP model
- a generic data warehouse design framework
- a single universal ontology for all of agriculture
- a substitute for feature stores or data catalogs
Instead, it is meant to be a semantic and ontological layer that complements those systems.
This project is at an early stage.
The current intent is to establish:
- a clear conceptual scope
- a foundational core model
- a first set of domain concepts
- a first set of analytical feature semantics
- an open collaboration path
The ontology is expected to evolve incrementally and pragmatically.
Possible milestones include:
- define foundational classes and properties
- publish first core module
- publish first JSON-LD and Turtle examples
- add agricultural domain modules
- add observation semantics
- add analytical feature semantics
- add target variable semantics
- add provenance and governance modules
- align with existing external vocabularies
- add example datasets and notebooks
- publish reusable feature semantic templates
- add SPARQL query examples
- add validation and testing workflows
- prepare stable releases
This is intended to be an open project.
Contributions are welcome in areas such as:
- ontology design
- agricultural domain modeling
- semantic alignment
- JSON-LD examples
- Turtle modeling
- documentation
- notebooks and visualization
- practical use cases
- issue reports and conceptual reviews
Good contributions will usually help improve one or more of the following:
- semantic clarity
- interoperability
- practical usability
- documentation quality
- domain correctness
- Open an issue describing the proposed concept, change, or module.
- Explain the domain need and expected semantic value.
- Propose class/property names and definitions.
- Discuss alignment with existing vocabularies.
- Add examples.
- Keep changes modular and documented.
As the ontology evolves, names should aim to be:
- explicit
- domain-meaningful
- reusable
- stable
- not overly tied to one single internal dataset
- understandable by both technical and domain users
For example, a semantically rich concept such as CycleRainfall is often more reusable than a purely internal column name. At the same time, mappings to actual source column names should be preserved when relevant.
The long-term vision for InfiniteStack Ontology is to help create an open semantic foundation for agricultural analytics that can support:
- interoperable agricultural data products
- explainable analytical variables
- semantic feature registries
- ontology-aware lakehouse governance
- graph-based exploration of agricultural knowledge
- model-facing semantic catalogs
- reusable agricultural analytics standards
- semantic feature stores
- graph-based explainability for ML features
- ontology-backed data catalogs
- ontology-aware documentation generators
- linked data publication for agricultural analytics assets
- alignment between agronomic expertise and machine learning pipelines
A practical way to begin with this repository is:
- read the conceptual documentation
- inspect the core ontology files
- review example JSON-LD and Turtle artifacts
- load the ontology into a notebook using RDF libraries
- query it with SPARQL
- experiment with feature-level semantic descriptions
- propose additions through GitHub issues or pull requests
A typical exploration workflow may involve:
- loading JSON-LD with rdflib
- converting it to Turtle for inspection
- querying feature definitions via SPARQL
- visualizing subgraphs for individual variables
- using widgets to inspect semantic properties of selected features
This makes the ontology immediately usable for experimentation and validation.
This repository should include an open license appropriate for semantic and documentation assets.
A permissive option such as MIT or Apache-2.0 may be suitable, depending on the desired governance model.
This project starts from a practical need encountered in agricultural data and AI work: to build a semantic layer capable of explaining not only what data exists, but also why analytical variables exist, how they were derived, and why they matter.
That is the central motivation behind InfiniteStack Ontology.
Contributions, discussions, and critiques are welcome.
This project is especially interested in collaboration across:
- agriculture
- data engineering
- semantic web
- ontology engineering
- machine learning
- analytics governance
If you work with agricultural data and have ever asked questions like:
- Why was this feature created?
- What does this variable really mean?
- What is the semantic difference between a daily observation and a cycle-level analytical variable?
- How can we make feature engineering more explainable and reusable?
then this project was created for exactly that kind of problem.