# HopDatabase

This file documents the codebase structure, development workflows, and conventions for AI assistants working in this repository.

## Project Overview

HopDatabase is a full-stack application that:
- Scrapes hop variety data from 6 commercial hop suppliers (Python backend)
- Merges and normalizes the data into a unified JSON dataset
- Visualizes it via an interactive React web app deployed to GitHub Pages
Live site: https://kasperg3.github.io/HopDatabase
## Repository Structure

```
HopDatabase/
├── hop_database/              # Python package (scraping + data model)
│   ├── models/
│   │   └── hop_model.py       # HopEntry dataclass + analysis functions
│   ├── scrapers/              # One module per supplier
│   │   ├── yakima_chief.py
│   │   ├── yakima_valley_hops.py
│   │   ├── barth_haas.py
│   │   ├── hopsteiner.py      # Uses cached hopsteiner_raw_data.json
│   │   ├── crosby_hops.py
│   │   └── john_i_haas.py
│   ├── utils/
│   └── data/                  # Raw/cached data files
│       └── hopsteiner_raw_data.json
├── website/                   # React frontend
│   ├── public/
│   │   └── data/hops.json     # Final merged dataset (served statically)
│   └── src/
│       ├── components/        # 22 React components
│       ├── contexts/          # HopFilterContext (useReducer state)
│       ├── hooks/             # Custom React hooks
│       ├── services/          # HopDataService singleton
│       └── utils/             # Utilities, constants, presets
├── run_scrapers.py            # Main ETL orchestration script
├── setup.py                   # Python package config (entry point: hop-scraper)
├── requirements.txt           # Python deps: requests, beautifulsoup4
├── notebook.ipynb             # Jupyter analysis notebook
└── .github/workflows/
    ├── ci.yaml                # Runs scrapers monthly, uploads hops.json artifacts
    └── deploy.yml             # Builds + deploys React app to GitHub Pages
```
## Data Model

The central data structure is the `HopEntry` dataclass (abridged):

```python
@dataclass
class HopEntry:
    name: str
    country: str = ""
    source: str = ""
    href: str = ""
    alpha_from: Union[str, float]  # Alpha acid % range (lower bound)
    alpha_to: Union[str, float]    # Alpha acid % range (upper bound)
    beta_from: Union[str, float]   # Beta acid % range
    beta_to: Union[str, float]
    oil_from: Union[str, float]    # Oil content mL/100g range
    oil_to: Union[str, float]
    co_h_from: Union[str, float]   # Cohumulone % range
    co_h_to: Union[str, float]
    notes: List[str]
    standardized_aromas: Dict[str, float]  # Keys = STANDARD_AROMAS (0–5 scale)
    raw_aroma_data: Dict[str, float]       # Source-specific, pre-mapping
    storage: str
    additional_properties: Dict            # Hopsteiner extended fields
```

There are 9 standard aroma categories (all values 0–5 after scaling):
Citrus, Resin/Pine, Spice, Herbal, Grassy, Floral, Berry, Stone Fruit, Tropical Fruit
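The same nine categories, written out as a Python constant (names taken from the list above; the canonical `STANDARD_AROMAS` definition lives in `hop_model.py`):

```python
# The 9 standard aroma categories; canonical definition lives in hop_model.py.
STANDARD_AROMAS = [
    "Citrus", "Resin/Pine", "Spice", "Herbal", "Grassy",
    "Floral", "Berry", "Stone Fruit", "Tropical Fruit",
]
```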
Key methods:

- `set_standardized_aromas(source_name, source_aroma_data)` — maps source-specific aroma labels to standard categories using `AROMA_MAPPINGS`
- `get_brewing_purpose()` → `"Bittering"` / `"Aroma"` / `"Dual Purpose"`, based on alpha/oil thresholds
- `to_dict()` — serializes to a JSON-ready dict

**Serialization note:** `HopEntry.to_dict()` outputs `"aromas"` (not `"standardized_aromas"`) — the frontend reads `hop.aromas`.
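A minimal sketch of the `get_brewing_purpose()` classification logic described above; the threshold values (10.0% alpha, 1.5 mL/100g oil) are illustrative assumptions, not the actual numbers in `hop_model.py`:

```python
def get_brewing_purpose(avg_alpha: float, avg_oil: float) -> str:
    # NOTE: thresholds here are hypothetical; the real ones live in hop_model.py.
    ALPHA_THRESHOLD = 10.0  # % alpha acids
    OIL_THRESHOLD = 1.5     # mL/100g total oil
    high_alpha = avg_alpha >= ALPHA_THRESHOLD
    high_oil = avg_oil >= OIL_THRESHOLD
    if high_alpha and high_oil:
        return "Dual Purpose"
    if high_alpha:
        return "Bittering"
    return "Aroma"
```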
## Scrapers

Each scraper lives in `hop_database/scrapers/<name>.py` and exposes a single function:

```python
def scrape(save=False) -> List[HopEntry]:
    ...
```

- `save=False` is the norm; the orchestrator handles saving.
- Scrapers parse HTML with BeautifulSoup4, or in Hopsteiner's case, read from a cached JSON file (`hop_database/data/hopsteiner_raw_data.json`).
- After extracting raw aroma data, scrapers call `hop.set_standardized_aromas(source_key, aroma_dict)`.
Source keys used in `AROMA_MAPPINGS`:
| Scraper | Source key |
|---|---|
| yakima_chief | "yakima" |
| barth_haas | "barth" |
| hopsteiner | "hopsteiner" |
| crosby_hops | "crosby" |
| john_i_haas | (no aroma mapping defined yet) |
| yakima_valley_hops | (no aroma mapping defined yet) |
## ETL Pipeline (`run_scrapers.py`)

The `main()` function runs the full pipeline:

- **Extract** — runs all 6 scrapers (YCH US + EU are deduped by name before combining)
- **Scale aromas** — `scale_aroma_values_by_source()` normalizes each source's aroma values to 0–5, based on that source's overall maximum
- **Merge** — `merge_hops()` groups hops by normalized name, applies `MERGE_NAME_ALIASES` for known equivalents, then merges:
  - Brewing ranges: min of the `*_from` values / max of the `*_to` values across sources
  - Aromas: average across sources that have a non-zero value
  - Country: first non-empty value
  - Source, href, storage: joined with `/` or `|`
- **Save** — writes to `data/hops.json`, `data/combined.json`, and `website/public/data/hops.json`
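The merge rules above can be sketched as a simplified, dict-based illustration (field names match the serialized data format; the real implementation is `merge_hops()` in `run_scrapers.py` and operates on `HopEntry` objects):

```python
def merge_brewing_range(hops, key):
    """min of *_from / max of *_to across sources, skipping missing values."""
    froms = [h[f"{key}_from"] for h in hops if isinstance(h.get(f"{key}_from"), (int, float))]
    tos = [h[f"{key}_to"] for h in hops if isinstance(h.get(f"{key}_to"), (int, float))]
    return (min(froms) if froms else "", max(tos) if tos else "")

def merge_aromas(hops):
    """Average each category across the sources that report a non-zero value."""
    categories = {c for h in hops for c in h.get("aromas", {})}
    merged = {}
    for c in categories:
        vals = [h["aromas"][c] for h in hops if h.get("aromas", {}).get(c, 0) > 0]
        merged[c] = sum(vals) / len(vals) if vals else 0.0
    return merged
```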
Name normalization (`normalize_hop_name`):

- Lowercases, strips trademark symbols (`®`, `™`), removes content in parentheses, strips trailing 2–3 char suffixes (e.g. `-US`)
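A sketch of these normalization rules (the authoritative implementation is `normalize_hop_name` in `run_scrapers.py`):

```python
import re

def normalize_hop_name(name: str) -> str:
    name = name.lower()
    name = name.replace("®", "").replace("™", "")     # strip trademark symbols
    name = re.sub(r"\([^)]*\)", "", name)             # remove content in parentheses
    name = re.sub(r"-[a-z]{2,3}$", "", name.strip())  # strip trailing 2-3 char suffix, e.g. "-us"
    return name.strip()
```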
Known name aliases (`MERGE_NAME_ALIASES` in `run_scrapers.py`):

- Hallertauer variants → `"hallertauer mittelfrüh"`
- `"East kent goldings"` → `"East kent golding"`
- `"Fuggle"` → `"Fuggles"`
## Running the Pipeline Locally

To run the full pipeline locally:

```bash
pip install -r requirements.txt
python run_scrapers.py
```

Or via the installed entry point:

```bash
pip install -e .
hop-scraper
```

## Frontend

Key libraries:

| Library | Version | Purpose |
|---|---|---|
| React | 18.2.0 | UI framework |
| Mantine UI | 7.3.2 | Component library + theming |
| Plotly.js | 2.27.1 | Spider/radar charts |
| Recharts | 3.1.0 | Additional charts |
| @tabler/icons-react | 2.46.0 | Icons |
| CRACO | 7.1.0 | Webpack config override |
| gh-pages | 6.1.0 | GitHub Pages deployment |
Development commands:

```bash
cd website
npm install
npm start          # Dev server (hot reload)
npm test           # Run tests
npm run build      # Production build + bundle analysis
npm run build:prod # Production build without source maps
npm run deploy     # Build (no source maps) + push to gh-pages branch
```

Entry points:

- `src/index.js` → mounts React app
- `src/App.js` → wraps with Mantine `MantineProvider` + `HopFilterContext`
- `src/AppContent.js` → main layout, data loading, UI orchestration
Core components:

| Component | Purpose |
|---|---|
| `HopSelector.js` | Multi-select hop picker with all filter controls |
| `AromaFilters.js` | Click-cycle aroma filters (none → high → low) |
| `BrewingParameterFilters.js` | Range sliders for alpha, cohumulone, oil |
| `QuickStylePresets.js` | Quick-select brewing style presets |
| `StylePresetsModal.js` | Modal for browsing all style presets |
| `FilterSummary.js` | Displays active filter count/state |
| `LazySpiderChart.js` | Lazy-loaded wrapper for the radar chart |
| `SpiderChart.js` | Plotly.js radar chart for aroma comparison |
| `BrewingSummary.js` | Brewing stats and recommendations |
| `SelectedHops.js` | List display of selected hops |
| `BrewingParametersComparison.js` | Side-by-side parameter table |
State management:

- `src/contexts/HopFilterContext.js` — `useReducer`-based global store for all filter state (aroma filters, parameter ranges, thresholds, selected hops)
- `src/hooks/useHopFiltering.js` — filtering logic (standalone)
- `src/hooks/useHopFilteringWithContext.js` — filtering logic connected to the context
- `src/services/HopDataService.js` — singleton that fetches and caches `hops.json`

Utilities:

- `src/utils/hopUtils.js` — hop-specific calculations
- `src/utils/hopConstants.js` — shared constants
- `src/components/presets.js` — brewing style preset definitions
- `src/components/constants.js` — component-level constants
## Data Format

The frontend fetches `public/data/hops.json` at runtime via `HopDataService`. The JSON structure per hop:

```json
{
  "name": "Citra",
  "country": "USA",
  "source": "Yakima Chief / Barth Haas",
  "href": "https://...",
  "alpha_from": 11.0,
  "alpha_to": 13.0,
  "beta_from": 3.5,
  "beta_to": 4.5,
  "oil_from": 2.2,
  "oil_to": 2.8,
  "co_h_from": 22.0,
  "co_h_to": 24.0,
  "storage": "",
  "notes": ["tropical", "citrus"],
  "aromas": {
    "Citrus": 4.5,
    "Resin/Pine": 1.0,
    "Spice": 0.5,
    "Herbal": 0.0,
    "Grassy": 0.0,
    "Floral": 1.5,
    "Berry": 2.0,
    "Stone Fruit": 1.0,
    "Tropical Fruit": 4.0
  },
  "additional_properties": {},
  "brewing_stats": {
    "brewing_purpose": "Aroma",
    "avg_alpha": 12.0,
    "avg_beta": 4.0,
    "avg_oil": 2.5,
    "avg_cohumulone": 23.0,
    "alpha_classification": "High",
    "oil_classification": "High"
  }
}
```

## CI/CD

### `ci.yaml` (scraper automation)

- Triggers: scheduled monthly (1st of month, 00:00 UTC), tag pushes (`v*`), manual dispatch
- Steps: Python 3.10 → `pip install` → `python run_scrapers.py` → upload `hops.json` + `combined.json` as artifacts
- Release job: creates a GitHub release with the JSON data on tag pushes

### `deploy.yml` (frontend deployment)

- Triggers: push to `main`, pull requests
- Steps: Node.js 18 → `npm ci` → `npm run build:prod` → deploy to GitHub Pages
- Permissions: requires `pages: write` and `id-token: write`
## Adding a New Scraper

- Create `hop_database/scrapers/<supplier_name>.py`
- Implement `def scrape(save=False) -> List[HopEntry]:`
- Use `requests` + BeautifulSoup4 to parse the supplier's website
- Map aromas using `hop.set_standardized_aromas(source_key, aroma_dict)` — add a new entry to `AROMA_MAPPINGS` in `hop_model.py` if needed
- Import and call from `run_scrapers.py` → add to `combined_hop_entries`
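Since scrapers must not crash the whole pipeline (see backend conventions), a guarded invocation pattern like the following is useful. Note that `run_scraper_safely` is a hypothetical helper sketched here, not an existing function in the repo:

```python
def run_scraper_safely(scrape_fn, source_name):
    """Run one scraper; on any failure, log and return an empty list
    so the remaining sources still get processed."""
    try:
        return scrape_fn()
    except Exception as exc:  # network errors, changed HTML structure, etc.
        print(f"[warn] {source_name} scraper failed: {exc}")
        return []
```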
## Adding a Name Alias

Add to `MERGE_NAME_ALIASES` in `run_scrapers.py`:

```python
MERGE_NAME_ALIASES = {
    "normalized source name": "normalized canonical name",
}
```

Names are already lowercased before lookup — use lowercase keys.
## Adding Aroma Mappings

When adding a new source's aroma categories to `AROMA_MAPPINGS` in `hop_model.py`:

- Key: source-specific label (exact string from scraper data)
- Value: one of the 9 `STANDARD_AROMAS` strings
- Multiple source categories can map to the same standard category (max intensity wins)
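The mapping step can be illustrated as follows; the labels in the sample `AROMA_MAPPINGS` entry are invented for the example, and the real table lives in `hop_model.py`:

```python
# Illustrative mapping table: source-specific label -> standard category.
AROMA_MAPPINGS = {
    "yakima": {"citrus": "Citrus", "pine": "Resin/Pine", "resinous": "Resin/Pine"},
}

def map_source_aromas(source_key, raw_aromas):
    """Map source labels to standard categories; when several source
    labels hit the same standard category, the max intensity wins."""
    mapping = AROMA_MAPPINGS.get(source_key, {})
    out = {}
    for label, value in raw_aromas.items():
        category = mapping.get(label.lower())
        if category is not None:
            out[category] = max(out.get(category, 0.0), value)
    return out
```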
## Frontend Conventions

- Use Mantine UI components for layout and UI elements
- Aroma filter state cycles: none → high → low → none
- All global filter state lives in `HopFilterContext` — don't add local state for filters
- Charts are lazy-loaded; wrap new heavy components similarly with `React.lazy` + `Suspense`
- Data access always goes through `HopDataService` (singleton), never fetch directly
## Backend Conventions

- Python 3.8+ compatibility required
- Dependencies stay minimal — only add to `requirements.txt` if essential
- Scrapers must gracefully handle network failures (don't crash the whole pipeline)
- All aroma values must be 0–5 after `scale_aroma_values_by_source()` runs
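The 0–5 invariant comes from per-source rescaling. A simplified, dict-based sketch of what `scale_aroma_values_by_source()` does (the real function operates on `HopEntry` objects):

```python
def scale_aromas_for_source(entries):
    """Rescale one source's aroma values so that the source-wide
    maximum maps to 5.0 (simplified sketch)."""
    peak = max((v for e in entries for v in e["aromas"].values()), default=0.0)
    if peak <= 0:
        return entries
    for e in entries:
        e["aromas"] = {k: v * 5.0 / peak for k, v in e["aromas"].items()}
    return entries
```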
## Key Files

| File | Purpose |
|---|---|
| `hop_database/models/hop_model.py` | Data model, aroma mappings, serialization |
| `run_scrapers.py` | ETL orchestration, name normalization, merging |
| `website/public/data/hops.json` | Merged hop dataset consumed by frontend |
| `website/src/contexts/HopFilterContext.js` | Global React state |
| `website/src/services/HopDataService.js` | Data fetching singleton |
| `website/src/components/SpiderChart.js` | Aroma radar chart |
| `.github/workflows/ci.yaml` | Scraper automation |
| `.github/workflows/deploy.yml` | Frontend deployment |