FoodVision is an end-to-end food classification system that classifies images of pizza, steak, and sushi in real time. The project implements a complete model deployment workflow and compares two state-of-the-art architectures: EfficientNet-B2 and Vision Transformer (ViT).
For detailed architecture information, please see Architecture.md.
- Accuracy: Achieve ≥95% classification accuracy on food images
- Speed: Maintain ≤0.03 seconds per inference for real-time performance (~30 FPS)
- Comparative analysis of CNN (EfficientNet-B2) vs Transformer (ViT) architectures
- Real-time inference optimization with performance benchmarking
- Interactive Gradio web application for model testing
- Automated model comparison and performance analysis
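
The speed target above (≤0.03 seconds per inference) can be checked with a small timing harness. The sketch below is a minimal, framework-agnostic illustration and not the project's actual benchmarking code: `benchmark` and `dummy_predict` are hypothetical names, and `dummy_predict` merely stands in for a real model's forward pass.

```python
import time
from statistics import mean


def benchmark(predict, sample, warmup=3, runs=20):
    """Return (mean seconds per call, frames per second) for predict(sample)."""
    # Warm-up calls are excluded from timing (caches, lazy initialisation).
    for _ in range(warmup):
        predict(sample)

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        timings.append(time.perf_counter() - start)

    latency = mean(timings)
    return latency, 1.0 / latency


def dummy_predict(image):
    # Hypothetical stand-in for a model call; replace with real inference.
    return sum(image) / len(image)


if __name__ == "__main__":
    latency, fps = benchmark(dummy_predict, list(range(1000)))
    print(f"{latency:.6f} s per inference, {fps:.0f} FPS")
    print("meets 0.03 s target:", latency <= 0.03)
```

When timing a real model, pass the model's prediction function and a preprocessed sample in place of the stand-ins; on a GPU, asynchronous execution may also need to be synchronized before reading the clock.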
my_project/
├── data/ # Dataset directory (train/test images)
├── models/ # Trained model checkpoints
├── utils/ # Reusable utility modules
├── app/ # Web application and UI components
│ ├── foodvision_app.py # Gradio web application
│ └── test_app.py # Application testing utilities
├── training/ # Model training scripts
│ └── train_models.py # Model training script
├── tests/ # Project testing utilities
│ └── test_setup.py # Project environment verification
├── run_project.py # Main project runner
├── requirements.txt # Project dependencies
├── README.md # Project overview and usage
├── Architecture.md # Architecture documentation
└── PROJECT_SUMMARY.md # Project summary
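
The `data/` directory in the tree is expected to hold `train/` and `test/` splits, each with `pizza/`, `steak/`, and `sushi/` subfolders. The following is a minimal sketch for verifying that layout before training; `missing_dirs` is a hypothetical helper and is not taken from the project's `tests/test_setup.py`.

```python
from pathlib import Path

SPLITS = ("train", "test")
CLASSES = ("pizza", "steak", "sushi")


def missing_dirs(data_root):
    """Return the expected class folders that do not exist under data_root."""
    root = Path(data_root)
    return [
        root / split / cls
        for split in SPLITS
        for cls in CLASSES
        if not (root / split / cls).is_dir()
    ]


if __name__ == "__main__":
    missing = missing_dirs("data")
    if missing:
        print("Missing folders:")
        for path in missing:
            print(f"  {path}")
    else:
        print("Dataset layout looks complete.")
```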
# Clone the repository
git clone <repository-url>
cd my_project
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

The project requires the pizza, steak, sushi dataset. If you have the zip file:
# Extract the dataset (if provided as zip)
unzip pizza_steak_sushi_20_percent.zip -d data/
# Or download the dataset separately to the data/ directory
# Ensure the structure is: data/train/pizza/, data/train/steak/, data/train/sushi/
# And: data/test/pizza/, data/test/steak/, data/test/sushi/

The project includes a unified script to manage different operations:
# Launch the FoodVision web application
python run_project.py ui
# Run model tests
python run_project.py test
# Train models (EfficientNet-B2 and ViT)
python run_project.py train
# Display project information and available actions
python run_project.py info

# Train both EfficientNet-B2 and Vision Transformer models
python training/train_models.py
# Launch the Gradio interface
python app/foodvision_app.py
# The app will be available at http://localhost:7860
# Verify the project environment and dependencies
python tests/test_setup.py
# Test the trained model
python app/test_app.py

| Metric | EfficientNet-B2 | Vision Transformer |
|---|---|---|
| Accuracy | 96.9% | 98.5% |
| Inference Time | 0.027s | 0.066s |
| Frames Per Second | 37 FPS | 15 FPS |
| Parameters | 7.7M | 85.8M |
| Model Size | 29 MB | 327 MB |
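
The FPS row follows from the inference times (FPS = 1 / seconds per inference), and both rows can be compared against the project's stated targets of ≥95% accuracy and ≤0.03 s per inference. The short sketch below reproduces that arithmetic from the table's figures; the names `RESULTS` and `summarize` are illustrative only.

```python
# Figures copied from the results table above.
RESULTS = {
    "EfficientNet-B2": {"accuracy": 96.9, "latency_s": 0.027},
    "Vision Transformer": {"accuracy": 98.5, "latency_s": 0.066},
}

# Targets from the project goals.
MIN_ACCURACY = 95.0   # percent
MAX_LATENCY_S = 0.03  # seconds per inference


def summarize(results):
    """Derive FPS and goal checks for each model."""
    summary = {}
    for name, r in results.items():
        summary[name] = {
            "fps": round(1.0 / r["latency_s"]),
            "meets_accuracy": r["accuracy"] >= MIN_ACCURACY,
            "meets_speed": r["latency_s"] <= MAX_LATENCY_S,
        }
    return summary


if __name__ == "__main__":
    for name, checks in summarize(RESULTS).items():
        print(name, checks)
```

On these figures, EfficientNet-B2 meets both targets, while the Vision Transformer meets the accuracy target but exceeds the 0.03 s latency budget.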




