Inspiration
Agriculture is one of the most data-rich industries in the world. Still, most farmers and agribusinesses rely more on instinct than on evidence when making decisions. The data exists: crop yields, climate patterns, pesticide use, and commodity prices. But it is scattered across different sources, difficult to query, and often impossible to interpret without technical expertise.
The inequality we set out to tackle is straightforward: large agribusinesses can afford teams of data scientists and analysts. Smallholder farmers in Spain or Colombia cannot. We wanted to help level that playing field.
The Denodo AI SDK challenge gave us the ideal opportunity to do just that. By combining a live virtual database that federates diverse agricultural data with LLM-powered SQL generation, we made it possible for anyone to query complex datasets using plain language.
What it does
AgroAdvisor is a single-page web application built around six specialized analysis modules, all powered by the same core reasoning pipeline.
- Consulta Libre (Free Query): lets users ask any free-form agricultural question and receive a data-backed answer with supporting charts.
- Asesor de Cultivos (Crop Advisor): recommends which crops to plant based on regional, soil, and climate conditions.
- Análisis de Pesticidas (Pesticide Analysis): examines the relationship between pesticide use and crop yield, identifying efficiency levels and diminishing-returns thresholds.
- Impacto Climático (Climate Impact): analyzes how long-term temperature and rainfall trends influence agricultural productivity.
- Inteligencia de Mercado (Market Intelligence): explores commodity price trends and highlights potential investment signals.
- Informe Regional (Regional Report): generates a comprehensive agricultural profile for any country.
Results can also be exported as a professionally formatted PDF report, making them easy to share and apply in real-world decision-making.
How we built it
Data Layer — Denodo VDP
All data is federated through five virtual views in a Denodo Virtual DataPort database:
- yield: historical crop output by country and year, including temperature, rainfall, and pesticide usage
- crop: ideal growing conditions for each crop type (N, P, K, pH, temperature, humidity, rainfall)
- prices: commodity price data by crop, country, and year
- water_usage: agricultural water consumption metrics by region and time period
- farm: structural agricultural data, including land use and farm-level indicators
The virtual layer allows us to query heterogeneous CSV sources with full SQL, without managing a physical database. This was critical in a 24-hour hackathon setting, where speed and flexibility mattered more than infrastructure.
The 4-Phase Reasoning Pipeline
The core of AgroAdvisor is a custom reasoning pipeline built on top of the Denodo AI SDK. Each user query goes through four sequential phases:
Phase 0 — Semantic Retrieval
ChromaDB vector search (using gemini-embedding-001 embeddings) ranks all available tables (yield, crop, prices, water_usage, farm) by semantic similarity to the user’s question. This is combined with a lightweight keyword-scoring function over our internal table catalogue.
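As a standalone sketch of this hybrid ranking (table descriptions and the blend weight are invented for illustration, and precomputed scores stand in for a live ChromaDB query):

```python
# Hybrid Phase-0 ranking: blend vector similarity with keyword scores.
# The catalogue text and the 0.7 weight are illustrative, not the real values.

TABLE_CATALOGUE = {
    "yield": "historical crop output temperature rainfall pesticide usage",
    "crop": "ideal growing conditions nitrogen phosphorus potassium ph humidity",
    "prices": "commodity price crop country year",
    "water_usage": "agricultural water consumption region period",
    "farm": "land use farm level structural indicators",
}

def keyword_score(question: str, description: str) -> float:
    """Fraction of question tokens that appear in the table description."""
    tokens = question.lower().split()
    hits = sum(1 for t in tokens if t in description)
    return hits / max(len(tokens), 1)

def rank_tables(question: str, vector_scores: dict[str, float],
                alpha: float = 0.7) -> list[str]:
    """Rank tables by a weighted blend of semantic and keyword relevance.

    vector_scores would come from a ChromaDB similarity query in the real
    pipeline; here they are passed in directly.
    """
    blended = {
        name: alpha * vector_scores.get(name, 0.0)
              + (1 - alpha) * keyword_score(question, desc)
        for name, desc in TABLE_CATALOGUE.items()
    }
    return sorted(blended, key=blended.get, reverse=True)
```

The blend keeps the keyword pass as a cheap tiebreaker when embedding scores are close.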
Phase 1 — Schema Discovery
A focused answerMetadataQuestion call retrieves the exact column names and data types for the top-ranked tables. This ensures that the SQL-generation model always operates with precise and up-to-date schema context.
Phase 2 — Data Extraction
An enriched instruction block containing the schema, vector context, and strict SQL rules is sent to answerDataQuestion. The model generates SQL, executes it against the Denodo VDP, and returns structured results.
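A minimal sketch of that call, assuming the AI SDK is running locally; the host, port, request fields, and section headers in the instruction block are placeholders, so check your SDK version's request schema before reusing them:

```python
import json
import urllib.request

AI_SDK_URL = "http://localhost:8008"  # local AI SDK host/port (assumption)

def build_data_question(question: str, schema_context: str,
                        sql_rules: str) -> dict:
    """Assemble the enriched instruction block for answerDataQuestion.

    The field name and section headers are illustrative, not the SDK's
    actual request schema.
    """
    return {
        "question": (
            "## SCHEMA\n" + schema_context
            + "\n## SQL RULES\n" + sql_rules
            + "\n## QUESTION\n" + question
        )
    }

def answer_data_question(question, schema_context, sql_rules):
    """POST the enriched question to the AI SDK (sketch; auth headers omitted)."""
    payload = json.dumps(
        build_data_question(question, schema_context, sql_rules)
    ).encode()
    req = urllib.request.Request(
        f"{AI_SDK_URL}/answerDataQuestion", data=payload,
        headers={"Content-Type": "application/json"}, method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)  # generated SQL plus execution results
```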
Phase 3 — Interpretation
A reasoning model (Gemini 2.5 Flash via deepQuery) receives the raw SQL output and synthesizes a final agronomist-style recommendation grounded in the retrieved data.
For multi-query endpoints such as Crop Advisor, which executes several Phase-2 queries in parallel, total latency can be expressed as:
\[ T_{\text{total}} = T_0 + T_1 + \max_{i}(T_{2,i}) + \max_{i}(T_{3,i}) \]
where \(T_0\) and \(T_1\) are sequential, and Phases 2 and 3 are executed concurrently using asyncio.gather.
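The concurrency pattern can be sketched with simulated phase durations (the sleep times are invented stand-ins for the real SDK and model calls):

```python
import asyncio
import time

async def run_subquery(name: str, t2: float, t3: float) -> str:
    """One sub-query: Phase-2 extraction, then its Phase-3 interpretation.

    The sleeps stand in for answerDataQuestion and the reasoning model;
    durations are invented for illustration.
    """
    await asyncio.sleep(t2)
    await asyncio.sleep(t3)
    return f"{name} done"

async def crop_advisor():
    # Phases 0 and 1 run sequentially before this point; the per-topic
    # sub-queries then execute concurrently via asyncio.gather.
    return await asyncio.gather(
        run_subquery("soil", 0.10, 0.05),
        run_subquery("climate", 0.12, 0.03),
        run_subquery("market", 0.06, 0.04),
    )

start = time.perf_counter()
results = asyncio.run(crop_advisor())
elapsed = time.perf_counter() - start
# elapsed is bounded by the slowest sub-query pipeline (~0.15 s here),
# not the 0.40 s the three pipelines would take sequentially
```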
Backend Chart Generation
After each SQL response, the backend inspects the result structure and selects the most appropriate visualization:
- If a year column is detected → line chart (one per numeric metric, up to 4)
- If a categorical column with many rows is detected → horizontal bar chart (up to 3 metrics)
- If a categorical column with \(n \leq 10\) rows and \(k \geq 4\) numeric metrics is detected → radar chart
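These heuristics can be sketched as a small selection function; the column-type detection is simplified here, but the thresholds mirror the rules above:

```python
def pick_chart(columns: dict[str, list]) -> str:
    """Select a chart type from the SQL result structure.

    `columns` maps column name -> values. Thresholds follow the rules
    described in the text; the fallback name is illustrative.
    """
    names = list(columns)
    n_rows = len(next(iter(columns.values()), []))
    numeric = [c for c in names
               if all(isinstance(v, (int, float)) for v in columns[c])]
    categorical = [c for c in names if c not in numeric]

    if any(c.lower() == "year" for c in names):
        return "line"              # one line chart per metric, up to 4
    if categorical and n_rows <= 10 and len(numeric) >= 4:
        return "radar"
    if categorical and n_rows > 10:
        return "horizontal_bar"    # up to 3 metrics
    return "table"                 # fallback: no chart rendered
```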
For radar charts, numeric metrics are normalized to a 0–100 scale:
\[ \hat{v}_{i,c} = \frac{v_{i,c}}{\max_j v_{j,c}} \times 100 \]
Charts are rendered server-side using matplotlib with the non-interactive Agg backend, saved as PNG files in static/charts/, and returned as URL paths. No client-side rendering is required.
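The normalization step above can be sketched in pure Python (the real backend then feeds these values to matplotlib's Agg backend; the function and field names here are illustrative):

```python
def normalize_for_radar(rows: list[dict], metrics: list[str]) -> list[dict]:
    """Scale each metric to a 0-100 range by its column maximum,
    so all axes of the radar chart share a comparable scale."""
    # Guard against an all-zero column to avoid division by zero.
    maxima = {m: max(r[m] for r in rows) or 1 for m in metrics}
    return [
        {**r, **{m: r[m] / maxima[m] * 100 for m in metrics}}
        for r in rows
    ]
```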
Prompt Engineering
The SYSTEM_INSTRUCTIONS string functions as a compact agricultural knowledge base and SQL governance layer. It includes:
- A glossary mapping Spanish agricultural terms to exact database column names (for example, rendir → hg_ha_yield, pesticidas → pesticides_tonnes)
- A translation layer for crop and country names (Spanish to the English values stored in the database)
- Few-shot SQL examples covering common analytical patterns across yield, crop, prices, water_usage, and farm
- A mandatory multi-table reasoning policy, encouraging the model to cross-reference production, climate, market, and structural data whenever relevant
- An expert decision rule: the system does not ask clarifying questions. It selects the most reasonable interpretation of the query, executes the appropriate analysis, and delivers a clear, data-backed recommendation.
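In AgroAdvisor this glossary lives as text inside the system prompt; a programmatic equivalent, using only the two mappings quoted above plus placeholder crop translations, might look like:

```python
# Spanish term -> exact database column. The first two pairs come from the
# system prompt described above; the crop translations are assumed examples.
TERM_GLOSSARY = {
    "rendir": "hg_ha_yield",
    "pesticidas": "pesticides_tonnes",
}

CROP_TRANSLATIONS = {"maíz": "Maize", "trigo": "Wheat"}  # placeholder values

def translate_terms(question: str) -> str:
    """Rewrite known Spanish terms so generated SQL uses exact column names.

    Naive substring replacement is enough for a sketch; a real version
    would match whole words to avoid clobbering longer terms.
    """
    out = question
    for es, target in {**TERM_GLOSSARY, **CROP_TRANSLATIONS}.items():
        out = out.replace(es, target)
    return out
```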
Challenges we ran into
Accomplishments that we're proud of
What we learned
Prompt position matters. Prepending a critical rule vs. appending it has a measurable effect on LLM compliance — the model pays more attention to instructions it reads first.
Virtual data federation is underrated for hackathons. Denodo VDP let us join heterogeneous CSV datasets, expose them over a single authenticated SQL endpoint, and query them from the AI SDK without managing indices, migrations or schemas. We went from raw CSVs to queryable views in under an hour.
Backend chart generation is more reliable than client-side. Column type detection, normalisation and chart-type selection are far easier to implement and debug in Python than in JavaScript, especially when the incoming data structure varies across queries.
Mandatory cross-table analysis produces better answers. Early versions of the system prompt allowed the model to answer from a single table. Adding an explicit multi-table workflow (query all relevant views, cross-reference findings) produced dramatically richer, more accurate recommendations.