Inspiration
We were frustrated by the gap between understanding ML concepts and actually deploying production models. Building a typical ML pipeline means stitching together dozens of tools: Jupyter notebooks for prototyping, custom training scripts, cloud GPU provisioning, model hosting, and API deployment, each carrying significant DevOps overhead. With agentic AI and RAG pipelines making this space even more compelling, we set out to build a platform where anyone can visually design, train, and deploy ML workflows end to end without writing boilerplate infrastructure code: a drag-and-drop canvas for wiring data ingestion, chunking, fine-tuning, vector search, and LLM reasoning into a single deployable pipeline.
What it does
Forge is a visual ML pipeline builder that lets users design, train, and deploy machine learning workflows through a drag-and-drop node editor. Users can:
- Build pipelines visually by connecting 60+ node types, including text/image/audio/tabular inputs, chunking strategies, model architectures (CNNs, RNNs, transformers, Whisper, YOLO), LLM agents, and deployment endpoints
- Upload diverse data (PDFs, images, audio, CSVs) and have AI automatically recommend the best chunking/preprocessing strategy
- Fine-tune models on GPU with configurable hyperparameters, from SimCSE embeddings and LoRA adapters to CNN classifiers and object detectors
- Integrate agentic AI with streaming LLM chat, tool use, sub-agents, and RLHF/RLAIF feedback loops
- Deploy trained models as persistent API endpoints with semantic search (RAG) capabilities
- Monitor training in real-time via streaming logs
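Under the hood, the semantic search (RAG) step comes down to ranking stored chunk embeddings by similarity to a query embedding. A minimal sketch of that ranking in pure Python, using made-up 3-dimensional vectors in place of real sentence-transformer embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k chunk ids most similar to the query embedding."""
    ranked = sorted(chunks, key=lambda cid: cosine_similarity(query, chunks[cid]),
                    reverse=True)
    return ranked[:k]

# Toy 3-d embeddings; real ones would come from an embedding model
# and live in the vector database.
index = {
    "doc1": [1.0, 0.0, 0.0],
    "doc2": [0.9, 0.1, 0.0],
    "doc3": [0.0, 1.0, 0.0],
}
print(top_k([1.0, 0.05, 0.0], index))  # ['doc1', 'doc2']
```

In production the vector database does this ranking at scale; the sketch only shows the math being delegated.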
How we built it
We built Forge as a distributed monorepo with five core layers:
- Frontend: React + TypeScript + Vite, using React Flow for the node-based visual editor and Zustand for state management, styled with Tailwind CSS
- Backend API: Cloudflare Workers running Hono (a lightweight TypeScript framework), with Cloudflare KV for workflow/job metadata and Cloudflare R2 for model weight storage
- GPU Compute: Modal (serverless GPU platform) running FastAPI endpoints for training, inference, chunking, and semantic search, with PyTorch, HuggingFace Transformers, Sentence-Transformers, and PEFT for LoRA fine-tuning
- Vector Search: Actian VectorAI database for semantic embeddings powering RAG pipelines
- Pipeline Engine: A Python DAG executor that topologically sorts workflow nodes and executes them sequentially, with a code generator that dynamically produces training scripts based on node types
The key architectural insight was separating orchestration (Cloudflare Workers, fast, cheap, global) from compute (Modal, on-demand GPUs that scale to zero when idle).
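The topological sort at the heart of the pipeline engine can be sketched with Kahn's algorithm; the node names below are hypothetical stand-ins for Forge's actual node types:

```python
from collections import deque

def topo_order(nodes: list[str], edges: list[tuple[str, str]]) -> list[str]:
    """Kahn's algorithm: return nodes in dependency order, raising on cycles."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for src, dst in edges:
        indegree[dst] += 1
        children[src].append(dst)
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(nodes):
        raise ValueError("workflow graph contains a cycle")
    return order

# A toy pipeline: ingest -> chunk -> embed -> deploy
print(topo_order(
    ["deploy", "chunk", "ingest", "embed"],
    [("ingest", "chunk"), ("chunk", "embed"), ("embed", "deploy")],
))  # ['ingest', 'chunk', 'embed', 'deploy']
```

The executor then runs each node in this order, passing each node's output to its children.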
Challenges we ran into
- Browser timeouts: GPU training jobs take minutes, but browsers time out HTTP requests after roughly 30 seconds. We solved this by returning a job_id instantly and using Server-Sent Events (SSE) for real-time log streaming and status polling.
- CORS on error paths: Our error responses were missing CORS headers, causing cryptic "Load failed" errors in the browser. We had to make sure every response path, including 500s, included proper headers.
- File routing complexity: A single training request can contain text files (need chunking), images (need R2 storage), and config JSON, all in one FormData payload. We built a type-aware router in the Worker that inspects node types to figure out how each file should be processed.
- Cold starts on GPU: Modal functions cold-start when no warm instance exists. We optimized by pre-baking dependencies into container images and using Modal's keep-warm features strategically.
- Duck-typing across boundaries: Cloudflare Workers send files as blobs to Modal, but Modal expects FastAPI's UploadFile. We had to duck-type the file objects to bridge the gap.
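The job_id-plus-streaming fix for the first challenge can be sketched independently of the frameworks involved: the submit handler returns immediately, and a generator yields SSE-formatted frames as log lines arrive. The handler names and in-memory queue here are illustrative, not Forge's actual implementation:

```python
import queue
import uuid

# In-memory stand-in for the KV-backed job store.
JOBS: dict[str, queue.Queue] = {}

def submit_job() -> str:
    """Return a job_id instantly; the GPU worker pushes logs as it trains."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = queue.Queue()
    return job_id

def sse_events(job_id: str):
    """Yield SSE-formatted frames until the worker pushes a None sentinel."""
    logs = JOBS[job_id]
    while (line := logs.get()) is not None:
        yield f"data: {line}\n\n"

job = submit_job()
JOBS[job].put("epoch 1/3 loss=0.91")
JOBS[job].put(None)  # worker signals completion
for frame in sse_events(job):
    print(frame, end="")  # data: epoch 1/3 loss=0.91
```

In a real deployment the generator would back a streaming HTTP response, and the browser would consume it with `EventSource`.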
Accomplishments that we're proud of
- 60+ node types covering text, image, audio, tabular data, LLMs, and deployment, all composable via drag-and-drop
- Agentic AI chunking that analyzes file samples and automatically recommends the best preprocessing strategy
- Zero-idle-cost GPU training since Modal scales to zero when no jobs are running, so we only pay for actual compute time
- Automated tool generation for orchestrator-worker multi-agent architectures
- End-to-end in one platform, from raw data upload to deployed API endpoint, without leaving the canvas
- Real-time training logs streamed via SSE so users can watch their models train live
- Dynamic code generation where the backend generates Python training scripts on-the-fly based on the visual pipeline topology
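The last point, dynamic code generation, is essentially template assembly keyed on node type once the graph has been topologically sorted. A toy sketch, where the node types and emitted snippets are invented for illustration:

```python
# Hypothetical mapping from node type to the Python snippet it contributes.
SNIPPETS = {
    "csv_input": "data = load_csv(config['path'])",
    "cnn": "model = build_cnn(num_classes=config['classes'])",
    "train": "fit(model, data, epochs=config['epochs'])",
}

def generate_script(ordered_nodes: list[str]) -> str:
    """Concatenate snippets for nodes already in topological order."""
    lines = ["# auto-generated training script"]
    for node_type in ordered_nodes:
        lines.append(SNIPPETS[node_type])
    return "\n".join(lines)

print(generate_script(["csv_input", "cnn", "train"]))
```

The real generator also has to thread hyperparameters and file paths from each node's config into the emitted code, but the shape is the same.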
What we learned
- Serverless edge + serverless GPU is a powerful combination. Cloudflare Workers handle global low-latency API routing while Modal provides burst GPU capacity without infrastructure management
- Streaming is essential for ML UX. Users need real-time feedback during long-running training jobs, and SSE is far simpler than WebSockets for this one-directional use case
- File handling is the hardest part. Parsing PDFs, DOCX, HTML, CSV, and audio files reliably across a distributed system requires extensive error handling and format detection
- Visual pipeline builders need strong validation. Users can create invalid graphs (cycles, type mismatches, missing connections), so we built a DAG validator that catches errors before they hit the GPU
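Beyond cycle detection, the type-mismatch half of that validation can be sketched as checking each edge's output type against the destination's expected input type. The node kinds and port types below are hypothetical:

```python
# Hypothetical port types for a few node kinds.
OUTPUT_TYPE = {"pdf_input": "text", "chunker": "chunks", "embedder": "vectors"}
INPUT_TYPE = {"chunker": "text", "embedder": "chunks", "vector_store": "vectors"}

def validate_edges(edges: list[tuple[str, str]]) -> list[str]:
    """Return human-readable errors for unknown or type-mismatched edges."""
    errors = []
    for src, dst in edges:
        out_t = OUTPUT_TYPE.get(src)
        in_t = INPUT_TYPE.get(dst)
        if out_t is None or in_t is None:
            errors.append(f"{src} -> {dst}: unknown node kind")
        elif out_t != in_t:
            errors.append(f"{src} -> {dst}: {out_t} does not match expected {in_t}")
    return errors

print(validate_edges([("pdf_input", "chunker"), ("pdf_input", "embedder")]))
# the second edge is a type mismatch: text vs chunks
```

Running checks like this in the Worker before dispatching to Modal keeps invalid graphs from ever consuming GPU time.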
What's next for Forge
- Collaborative editing with real-time multiplayer workflow editing and conflict resolution
- Pipeline versioning with git-like version control for workflow definitions and trained models
- Auto-ML nodes that automatically search hyperparameter spaces and select optimal architectures
- Marketplace to share and discover community-built pipeline templates and fine-tuned models
- Edge deployment to export trained models to run on-device via ONNX/TensorRT
- Expanded agentic capabilities including multi-step reasoning chains, memory-augmented agents, and autonomous pipeline optimization via RLHF
Built with: React, TypeScript, Vite, React Flow, Zustand, Tailwind CSS, Cloudflare Workers, Hono, Cloudflare KV, Cloudflare R2, Modal, FastAPI, PyTorch, HuggingFace Transformers, Sentence-Transformers, PEFT (LoRA), OpenAI API, Actian VectorAI, Python, Node.js