Ilex paraguariensis contains numerous bioactive compounds and is a form of social interaction in many countries. Yerba mate beverages generally contain higher total polyphenol content (TPC) compared to other plant-based drinks, with hot water-extracted mate offering a higher amount of polyphenol intake, highlighting its importance as a source of antioxidants. For TPC quantification, the colorimetric method using smartphones combined with the Folin-Ciocalteu (FC) assay has been employed for a low-cost and rapid evaluation in different products. However, this approach relies on the FC assay, limiting this method’s accessibility due to the reagent costs and equipment involved.
The study of this repository is present at this link.
This repository provides tools for analyzing and predicting Total Polyphenol Content (TPC) in yerba mate aqueous extracts before a Folin-Ciocalteu assay using image processing and machine learning. The system uses RGB and HSV color features along with pH measurements to estimate phenolic compound concentrations.
- Clone the repository:
git clone https://github.com/GustavoSantiago113/YerbaMatePhenolicExtraction.git
cd YerbaMatePhenolicExtraction- Create and activate a virtual environment (recommended):
python -m venv env
source env/bin/activate # Linux/Mac
env\Scripts\activate # Windows- Install dependencies:
pip install -r requirements.txtColor Extraction colorsExtraction
Tool for extracting color features from yerba mate extract images obtained using a cellphone camera and the image capture chamber.
Usage:
- Place your images in the "images" folder
- Run the color extraction script:
python colorsExtraction/colorExtraction.py- Follow the GUI prompts:
- Select folder containing images
- Click on points of interest in each image
- Confirm selection using the button
- After iterating over all images of the foler, the results will be saved in colorsExtraction/color_points.csv
Data Analysis dataAnalysis
Contains scripts in R and Python for model training and evaluation.
Components:
-
data-analysis.R: R script for initial data analysis, feature selections with XGB-Boruta and Pearson Correlation and visualization.
-
modelsWithOptuna.py: Implements and optimizes multiple machine learning models using Optuna:
- Gradient Boosting
- XGBoost
- Elastic Net
- SVM
- Polynomial Regression
- A simple Neural Network
-
modelExport.py: Exports the best performing model (Elastic Net) for implementation.
-
Output files:
- model_metrics_optuna.csv: Performance metrics for all models.
- best_hyperparameters.txt: Optimized parameters for each model.
- model_comparisons_optuna.png: Visual comparison of model performances.
Implementation implementation.py
Interactive tool for predicting TPC from new images using the trained model.
Usage:
- Run the implementation script:
python implementation.py- Follow the interactive steps:
- Select an image when prompted
- Click on a point in the image to analyze
- Enter the pH value in the text box
- Click "Confirm" to get the TPC prediction
The implemented Elastic Net model achieves:
- RMSE: 128.53
- RRMSE: 40.66%
- KGE: 0.696
- R²: 0.612
- Input images should be well-lit photographs of yerba mate extract inside the cuvette within the image capture chamber.
- pH measurements.
- Supported image formats: .jpg, .jpeg, .png, .bmp