A full-stack application for generating synthetic data using weak supervision, generative models, and AI agents.
```bash
# 1. Run installation script
./install.sh

# 2. Create .env file with your NVIDIA API key
nano .env
# Add: NVIDIA_API_KEY=nvapi-your-key-here
```

```bash
# Start all services (Frontend + Backend + Agent)
./start.sh

# Open in browser: http://localhost:8080
```

```bash
# Stop all services
./stop.sh
```

If you prefer to set up manually:
```bash
# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r backend/requirements.txt

# Create .env file
echo "NVIDIA_API_KEY=your_key_here" > .env
```

```bash
# Terminal 1: Start Flask API (Port 5000)
python backend/flask_api.py --port 5000

# Terminal 2: Start Orchestrator (Port 8000)
python backend/agents/serve_orchestrator.py

# Terminal 3: Start Frontend (Port 8080)
cd frontend
npm install
npm run dev
```

- Frontend (Port 8080): React + TypeScript + Vite
- Flask API (Port 5000): Direct tool access for labeling, training, generation
- Orchestrator (Port 8000): AI agent for natural language workflows
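Once all three services are running, the ports above can be probed without extra tooling. A minimal sketch (not part of the repo's scripts) using bash's built-in `/dev/tcp` pseudo-device, assuming the services bind to localhost on the default ports:

```shell
# Probe the default service ports; assumes localhost bindings.
port_open() {
  # Returns 0 if something is listening on 127.0.0.1:$1 (bash-only).
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

for port in 8080 5000 8000; do
  if port_open "$port"; then
    echo "port $port: up"
  else
    echo "port $port: down"
  fi
done
```

This avoids depending on `lsof` or `curl` being installed; it only tells you a listener exists, not that the service is healthy.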
Backend:
- Flask - RESTful API
- FastAPI - Agent orchestrator
- PyTorch - Deep learning models
- Snorkel - Weak supervision labeling
- LangChain + LangGraph - Agent orchestration
- NVIDIA NIMs - LLM services
Frontend:
- React + TypeScript
- Vite - Build tool
- TanStack Query - Data fetching
- Shadcn UI - Component library
- Tailwind CSS - Styling
- Backend API: See `backend/FLASK_API_README.md`
- Agent Setup: See `backend/AGENT_SETUP.md`
- Quick Reference: See `backend/API_QUICK_REFERENCE.md`
Create a .env file in the project root:
```bash
# Required for Agent Mode
NVIDIA_API_KEY=nvapi-xxxxxxxxxxxxx
```

Get your API key from NVIDIA Build.
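Before launching the services, it can be worth validating the file. A minimal sketch (an assumption, not part of the repo's own scripts) that checks the `nvapi-` key prefix shown above and exports the variables into the current shell:

```shell
# Sanity-check .env and export its variables before starting services.
# The nvapi- prefix matches the key format shown above.
check_and_load_env() {
  local env_file="${1:-.env}"
  [ -f "$env_file" ] || { echo "missing $env_file" >&2; return 1; }
  grep -q '^NVIDIA_API_KEY=nvapi-' "$env_file" || {
    echo "NVIDIA_API_KEY missing or malformed in $env_file" >&2
    return 1
  }
  set -a            # export every variable sourced below
  . "$env_file"
  set +a
}
```

Usage: `check_and_load_env .env && ./start.sh`.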
- Upload Dataset: CSV or Parquet files
- Data Labeling: Weak supervision with Snorkel
- Model Training: CVAE, CTGAN, VAE+GMM models
- Data Generation: Generate synthetic samples
- Export: Download as Parquet or CSV
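Before uploading, a quick local check of a CSV's shape can save a round trip. A small sketch (the function name and file are examples, not part of the app):

```shell
# Print a CSV's header and data-row count before uploading it.
# Assumes the file ends with a trailing newline.
csv_info() {
  local file="$1"
  echo "columns: $(head -n 1 "$file")"
  echo "rows: $(( $(wc -l < "$file") - 1 ))"
}
```

Usage: `csv_info mydata.csv`.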
- Natural Language: Describe your goal in plain English
- Automated Workflow: Agent orchestrates all steps
- Self-Correcting: Adjusts hyperparameters automatically
- Multi-Step Tasks: Handles complex workflows
```
HackUTDNVIDIA/
├── backend/
│   ├── agents/              # AI agent pipelines
│   ├── DataFoundry/         # Core data processing
│   ├── testing_data/        # Sample datasets
│   ├── output_data/         # Generated outputs
│   ├── flask_api.py         # Main API server
│   └── requirements.txt     # Python dependencies
├── frontend/
│   ├── src/
│   │   ├── components/      # React components
│   │   ├── pages/           # Page components
│   │   ├── lib/             # API client & utilities
│   │   └── hooks/           # Custom React hooks
│   └── package.json         # Node dependencies
├── logs/                    # Service logs
├── install.sh               # Installation script
├── start.sh                 # Start all services
├── stop.sh                  # Stop all services
└── .env                     # Environment variables (create this)
```
View logs for each service:
```bash
# Flask API
tail -f logs/flask.log

# Orchestrator
tail -f logs/orchestrator.log

# Frontend
tail -f logs/frontend.log
```

```bash
# Activate venv
source venv/bin/activate

# Test backend API
cd backend
python -m pytest

# Test agents
python -m agents.test_agents --test all
```

```bash
# Check what's using a port
lsof -ti:5000   # Flask
lsof -ti:8000   # Orchestrator
lsof -ti:8080   # Frontend

# Kill process on port
lsof -ti:5000 | xargs kill -9
```

- Make sure the virtual environment is activated
- Check that all dependencies are installed
- Verify the `.env` file exists with your API key
- Check the logs for specific errors
- Verify Flask is running on port 5000
- Check that CORS is enabled in Flask
- Verify `VITE_BACKEND_URL` in `frontend/.env` is set to `http://localhost:5000`
[Add your license here]
[Add contributors here]
- NVIDIA for NIM services
- Snorkel AI for weak supervision
- LangChain for agent framework