reyanp/Arctyx

DataFoundry - Synthetic Data Generation Platform

A full-stack application for generating synthetic data using weak supervision, generative models, and AI agents.

🚀 Quick Start

One-Time Setup

# 1. Run installation script
./install.sh

# 2. Create .env file with your NVIDIA API key
nano .env
# Add: NVIDIA_API_KEY=nvapi-your-key-here

Running the Application

# Start all services (Frontend + Backend + Agent)
./start.sh

# Open in browser: http://localhost:8080

Stopping the Application

# Stop all services
./stop.sh

📋 Manual Setup (Alternative)

If you prefer to set up manually:

Backend Setup

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r backend/requirements.txt

# Create .env file
echo "NVIDIA_API_KEY=your_key_here" > .env

# Terminal 1: Start Flask API (Port 5000)
python backend/flask_api.py --port 5000

# Terminal 2: Start Orchestrator (Port 8000)
python backend/agents/serve_orchestrator.py

Frontend Setup

# Terminal 3: Start Frontend (Port 8080)
cd frontend
npm install
npm run dev

🏗️ Architecture

Services

  • Frontend (Port 8080): React + TypeScript + Vite
  • Flask API (Port 5000): Direct tool access for labeling, training, generation
  • Orchestrator (Port 8000): AI agent for natural language workflows

Tech Stack

Backend:

  • Flask - RESTful API
  • FastAPI - Agent orchestrator
  • PyTorch - Deep learning models
  • Snorkel - Weak supervision labeling
  • LangChain + LangGraph - Agent orchestration
  • NVIDIA NIMs - LLM services

Frontend:

  • React + TypeScript
  • Vite - Build tool
  • TanStack Query - Data fetching
  • Shadcn UI - Component library
  • Tailwind CSS - Styling

📖 Documentation

  • Backend API: See backend/FLASK_API_README.md
  • Agent Setup: See backend/AGENT_SETUP.md
  • Quick Reference: See backend/API_QUICK_REFERENCE.md

🔑 Environment Variables

Create a .env file in the project root:

# Required for Agent Mode
NVIDIA_API_KEY=nvapi-xxxxxxxxxxxxx

Get your API key from NVIDIA Build
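
As a quick sanity check before starting the services, the .env file can be validated with a few lines of Python. This is a minimal sketch that uses only the standard library; the helper names `parse_env` and `missing_keys` are hypothetical and not part of the project, and the parser only handles plain KEY=VALUE lines.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str], required=("NVIDIA_API_KEY",)) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")  # project root, as described above
    env = parse_env(env_path.read_text()) if env_path.exists() else {}
    problems = missing_keys(env)
    print("OK" if not problems else f"Missing in .env: {', '.join(problems)}")
```

Run it from the project root. It only confirms that the key is present and non-empty; it does not verify the key against NVIDIA's services.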

🎯 Features

Manual Mode (Direct Control)

  • Upload Dataset: CSV or Parquet files
  • Data Labeling: Weak supervision with Snorkel
  • Model Training: CVAE, CTGAN, VAE+GMM models
  • Data Generation: Generate synthetic samples
  • Export: Download as Parquet or CSV
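
To illustrate the idea behind the Data Labeling step, here is a small pure-Python sketch of weak supervision: several noisy labeling functions vote on each row, and the votes are combined into one label. It is not Snorkel's API and not this project's labeling code. Snorkel fits a label model over the votes, whereas this sketch swaps in a plain majority vote. The labeling functions and the `text` column are made up for the example.

```python
from collections import Counter

ABSTAIN = -1  # a labeling function returns this when it has no opinion


# Hypothetical labeling functions for a made-up "text" column.
def lf_contains_great(row: dict) -> int:
    return 1 if "great" in row["text"].lower() else ABSTAIN


def lf_contains_refund(row: dict) -> int:
    return 0 if "refund" in row["text"].lower() else ABSTAIN


def lf_many_exclamations(row: dict) -> int:
    return 1 if row["text"].count("!") >= 2 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_great, lf_contains_refund, lf_many_exclamations]


def majority_label(row: dict) -> int:
    """Combine noisy votes; abstain when there are no votes or a tie."""
    votes = [v for v in (lf(row) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    for text in ["Great product!!", "I want a refund", "ok"]:
        print(text, "->", majority_label({"text": text}))
```

Rows that end up as ABSTAIN are left unlabeled here; how the project handles them is not specified in this README.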

Agent Mode (AI-Powered)

  • Natural Language: Describe your goal in plain English
  • Automated Workflow: Agent orchestrates all steps
  • Self-Correcting: Adjusts hyperparameters automatically
  • Multi-Step Tasks: Handles complex workflows

📁 Project Structure

HackUTDNVIDIA/
├── backend/
│   ├── agents/                # AI agent pipelines
│   ├── DataFoundry/           # Core data processing
│   ├── testing_data/          # Sample datasets
│   ├── output_data/           # Generated outputs
│   ├── flask_api.py           # Main API server
│   └── requirements.txt       # Python dependencies
├── frontend/
│   ├── src/
│   │   ├── components/        # React components
│   │   ├── pages/             # Page components
│   │   ├── lib/               # API client & utilities
│   │   └── hooks/             # Custom React hooks
│   └── package.json           # Node dependencies
├── logs/                      # Service logs
├── install.sh                 # Installation script
├── start.sh                   # Start all services
├── stop.sh                    # Stop all services
└── .env                       # Environment variables (create this)

🛠️ Development

Logs

View logs for each service:

# Flask API
tail -f logs/flask.log

# Orchestrator
tail -f logs/orchestrator.log

# Frontend
tail -f logs/frontend.log

Testing

# Activate venv
source venv/bin/activate

# Test backend API
cd backend
python -m pytest

# Test agents
python -m agents.test_agents --test all

🐛 Troubleshooting

Port Already in Use

# Check what's using a port
lsof -ti:5000  # Flask
lsof -ti:8000  # Orchestrator
lsof -ti:8080  # Frontend

# Kill the process on a port (kill -9 forces termination; try a plain kill first)
lsof -ti:5000 | xargs kill -9
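
Where lsof is not available, a rough equivalent of the port check can be written in Python. The sketch below tries a TCP connection to each of the three ports from the Services list; the function name `port_in_use` is hypothetical, and the check assumes the services listen on IPv4 localhost.

```python
import socket

# Ports taken from the Services list in this README.
SERVICE_PORTS = {"Frontend": 8080, "Flask API": 5000, "Orchestrator": 8000}


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for name, port in SERVICE_PORTS.items():
        state = "in use" if port_in_use(port) else "free"
        print(f"{name:<12} port {port}: {state}")
```

Unlike lsof, this does not report which process holds the port, and a server bound only to an IPv6 address would not be detected.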

Services Won't Start

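  1. Make sure the virtual environment is activated (source venv/bin/activate)
  2. Check that all dependencies are installed (pip install -r backend/requirements.txt, and npm install in frontend/)
  3. Verify that the .env file exists in the project root and contains NVIDIA_API_KEY
  4. Check the files in logs/ for specific errors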
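
Two of these checks are easy to script. The following sketch reports whether the current interpreter runs inside a virtual environment and prints the last few lines of each service log. The log paths come from the Logs section; the helper names are hypothetical.

```python
import sys
from collections import deque
from pathlib import Path

# Log paths as listed in the Logs section.
LOG_FILES = ["logs/flask.log", "logs/orchestrator.log", "logs/frontend.log"]


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv created by `python -m venv`."""
    return sys.prefix != sys.base_prefix


def last_lines(path: Path, count: int = 5) -> list[str]:
    """Return up to `count` trailing lines of a text file, or [] if it is missing."""
    if not path.is_file():
        return []
    with path.open(encoding="utf-8", errors="replace") as handle:
        # A bounded deque keeps only the final `count` lines while streaming.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    for name in LOG_FILES:
        print(f"--- {name}")
        for line in last_lines(Path(name)):
            print(line)
```

Run it from the project root with the same interpreter that starts the backend, otherwise the virtual environment check says nothing about the services themselves.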

Frontend Can't Connect to Backend

  1. Verify that Flask is running on port 5000
  2. Check that CORS is enabled in Flask
  3. Verify that VITE_BACKEND_URL in frontend/.env is set to http://localhost:5000
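
The first two checks can be approximated from the command line with a short Python script that sends a request carrying the frontend's Origin and inspects the Access-Control-Allow-Origin response header. This is a sketch under assumptions: the root path `/` is a guess (this README does not list the API routes), the helper names are hypothetical, and CORS headers may be attached only to some routes depending on how Flask is configured.

```python
import urllib.error
import urllib.request


def cors_allows(headers: dict[str, str], origin: str) -> bool:
    """True if Access-Control-Allow-Origin permits the given origin."""
    normalized = {key.lower(): value for key, value in headers.items()}
    allowed = normalized.get("access-control-allow-origin", "")
    return allowed == "*" or allowed == origin


def fetch_headers(url: str, origin: str, timeout: float = 3.0) -> dict[str, str]:
    """GET url with an Origin header and return the response headers."""
    request = urllib.request.Request(url, headers={"Origin": origin})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return dict(response.headers.items())
    except urllib.error.HTTPError as err:
        # An error status (e.g. 404) still carries headers, which is all we need.
        return dict(err.headers.items())


if __name__ == "__main__":
    backend = "http://localhost:5000/"  # root path is an assumption
    frontend = "http://localhost:8080"
    try:
        ok = cors_allows(fetch_headers(backend, frontend), frontend)
        print("CORS allows the frontend" if ok else "No matching CORS header")
    except OSError as err:  # covers URLError, refused connections, timeouts
        print(f"Flask not reachable on port 5000: {err}")
```

A "No matching CORS header" result on the root path is not conclusive; repeat the check against a route the frontend actually calls.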

📝 License

[Add your license here]

👥 Contributors

[Add contributors here]

🙏 Acknowledgments

  • NVIDIA for NIM services
  • Snorkel AI for weak supervision
  • LangChain for agent framework
