Inspiration
We were frustrated by the gap between understanding ML concepts and actually deploying production models. Building a typical ML pipeline means stitching together dozens of tools: Jupyter notebooks for prototyping, custom training scripts, cloud GPU provisioning, model hosting, and API deployment, each carrying significant DevOps overhead. With agentic AI and RAG pipelines making this space even more compelling, we set out to build a platform where anyone can visually design, train, and deploy ML workflows end to end without writing boilerplate infrastructure code: a drag-and-drop canvas for wiring data ingestion, chunking, fine-tuning, vector search, and LLM reasoning into a single deployable pipeline.
What it does
Forge is a visual ML pipeline builder that lets users design, train, and deploy machine learning workflows through a drag-and-drop node editor. Users can:
- Build pipelines visually by connecting 60+ node types, including text/image/audio/tabular inputs, chunking strategies, model architectures (CNNs, RNNs, transformers, Whisper, YOLO), LLM agents, and deployment endpoints
- Upload diverse data (PDFs, images, audio, CSVs) and have AI automatically recommend the best chunking/preprocessing strategy
- Fine-tune models on GPU with configurable hyperparameters, from SimCSE embeddings and LoRA adapters to CNN classifiers and object detectors
- Integrate agentic AI with streaming LLM chat, tool use, sub-agents, and RLHF/RLAIF feedback loops
- Deploy trained models as persistent API endpoints with semantic search (RAG) capabilities
- Monitor training in real-time via streaming logs
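Under the hood, the semantic search (RAG) step comes down to ranking stored chunk embeddings by similarity to a query embedding. A minimal sketch of that ranking in pure Python, using made-up 3-dimensional vectors in place of real sentence-transformer embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k chunk ids most similar to the query embedding."""
    ranked = sorted(chunks, key=lambda cid: cosine_similarity(query, chunks[cid]),
                    reverse=True)
    return ranked[:k]

# Toy 3-d embeddings; real ones would come from an embedding model
# and live in the vector database.
index = {
    "doc1": [1.0, 0.0, 0.0],
    "doc2": [0.9, 0.1, 0.0],
    "doc3": [0.0, 1.0, 0.0],
}
print(top_k([1.0, 0.05, 0.0], index))  # ['doc1', 'doc2']
```

In production the vector database does this ranking at scale; the sketch only shows the math being delegated.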
How we built it
We built Forge as a distributed monorepo with five core layers:
- Frontend: React + TypeScript + Vite, using React Flow for the node-based visual editor and Zustand for state management, styled with Tailwind CSS
- Backend API: Cloudflare Workers running Hono (a lightweight TypeScript framework), with Cloudflare KV for workflow/job metadata and Cloudflare R2 for model weight storage
- GPU Compute: Modal (serverless GPU platform) running FastAPI endpoints for training, inference, chunking, and semantic search, with PyTorch, HuggingFace Transformers, Sentence-Transformers, and PEFT for LoRA fine-tuning
- Vector Search: Actian VectorAI database for semantic embeddings powering RAG pipelines
- Pipeline Engine: A Python DAG executor that topologically sorts workflow nodes and executes them sequentially, with a code generator that dynamically produces training scripts based on node types
The key architectural insight was separating orchestration (Cloudflare Workers, fast, cheap, global) from compute (Modal, on-demand GPUs that scale to zero when idle).
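The topological sort at the heart of the pipeline engine can be sketched with Kahn's algorithm; the node names below are hypothetical stand-ins for Forge's actual node types:

```python
from collections import deque

def topo_order(nodes: list[str], edges: list[tuple[str, str]]) -> list[str]:
    """Kahn's algorithm: return nodes in dependency order, raising on cycles."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for src, dst in edges:
        indegree[dst] += 1
        children[src].append(dst)
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(nodes):
        raise ValueError("workflow graph contains a cycle")
    return order

# A toy pipeline: ingest -> chunk -> embed -> deploy
print(topo_order(
    ["deploy", "chunk", "ingest", "embed"],
    [("ingest", "chunk"), ("chunk", "embed"), ("embed", "deploy")],
))  # ['ingest', 'chunk', 'embed', 'deploy']
```

The executor then runs each node in this order, passing each node's output to its children.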
Challenges we ran into
- Browser timeouts: GPU training jobs take minutes, but browsers time out HTTP requests after roughly 30 seconds. We solved this by returning a job_id instantly and using Server-Sent Events (SSE) for real-time log streaming and status polling.
- CORS on error paths: Our error responses were missing CORS headers, causing cryptic "Load failed" errors in the browser. We had to make sure every response path, including 500s, included proper headers.
- File routing complexity: A single training request can contain text files (need chunking), images (need R2 storage), and config JSON, all in one FormData payload. We built a type-aware router in the Worker that inspects node types to figure out how each file should be processed.
- Cold starts on GPU: Modal functions cold-start when no warm instance exists. We optimized by pre-baking dependencies into container images and using Modal's keep-warm features strategically.
- Duck-typing across boundaries: Cloudflare Workers send files as blobs to Modal, but Modal expects FastAPI's UploadFile. We had to duck-type the file objects to bridge the gap.
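The job_id-plus-streaming fix for the first challenge can be sketched independently of the frameworks involved: the submit handler returns immediately, and a generator yields SSE-formatted frames as log lines arrive. The handler names and in-memory queue here are illustrative, not Forge's actual implementation:

```python
import queue
import uuid

# In-memory stand-in for the KV-backed job store.
JOBS: dict[str, queue.Queue] = {}

def submit_job() -> str:
    """Return a job_id instantly; the GPU worker pushes logs as it trains."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = queue.Queue()
    return job_id

def sse_events(job_id: str):
    """Yield SSE-formatted frames until the worker pushes a None sentinel."""
    logs = JOBS[job_id]
    while (line := logs.get()) is not None:
        yield f"data: {line}\n\n"

job = submit_job()
JOBS[job].put("epoch 1/3 loss=0.91")
JOBS[job].put(None)  # worker signals completion
for frame in sse_events(job):
    print(frame, end="")  # data: epoch 1/3 loss=0.91
```

In a real deployment the generator would back a streaming HTTP response, and the browser would consume it with `EventSource`.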
Accomplishments that we're proud of
- 60+ node types covering text, image, audio, tabular data, LLMs, and deployment, all composable via drag-and-drop
- Agentic AI chunking that analyzes file samples and automatically recommends the best preprocessing strategy
- Zero-idle-cost GPU training since Modal scales to zero when no jobs are running, so we only pay for actual compute time
- Automated tool generation for orchestrator-worker multi-agent architectures
- End-to-end in one platform, from raw data upload to deployed API endpoint, without leaving the canvas
- Real-time training logs streamed via SSE so users can watch their models train live
- Dynamic code generation where the backend generates Python training scripts on-the-fly based on the visual pipeline topology
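The last point, dynamic code generation, is essentially template assembly keyed on node type once the graph has been topologically sorted. A toy sketch, where the node types and emitted snippets are invented for illustration:

```python
# Hypothetical mapping from node type to the Python snippet it contributes.
SNIPPETS = {
    "csv_input": "data = load_csv(config['path'])",
    "cnn": "model = build_cnn(num_classes=config['classes'])",
    "train": "fit(model, data, epochs=config['epochs'])",
}

def generate_script(ordered_nodes: list[str]) -> str:
    """Concatenate snippets for nodes already in topological order."""
    lines = ["# auto-generated training script"]
    for node_type in ordered_nodes:
        lines.append(SNIPPETS[node_type])
    return "\n".join(lines)

print(generate_script(["csv_input", "cnn", "train"]))
```

The real generator also has to thread hyperparameters and file paths from each node's config into the emitted code, but the shape is the same.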
What we learned
- Serverless edge + serverless GPU is a powerful combination. Cloudflare Workers handle global low-latency API routing while Modal provides burst GPU capacity without infrastructure management
- Streaming is essential for ML UX. Users need real-time feedback during long-running training jobs, and SSE is far simpler than WebSockets for this one-directional use case
- File handling is the hardest part. Parsing PDFs, DOCX, HTML, CSV, and audio files reliably across a distributed system requires extensive error handling and format detection
- Visual pipeline builders need strong validation. Users can create invalid graphs (cycles, type mismatches, missing connections), so we built a DAG validator that catches errors before they hit the GPU
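Beyond cycle detection, the type-mismatch half of that validation can be sketched as checking each edge's output type against the destination's expected input type. The node kinds and port types below are hypothetical:

```python
# Hypothetical port types for a few node kinds.
OUTPUT_TYPE = {"pdf_input": "text", "chunker": "chunks", "embedder": "vectors"}
INPUT_TYPE = {"chunker": "text", "embedder": "chunks", "vector_store": "vectors"}

def validate_edges(edges: list[tuple[str, str]]) -> list[str]:
    """Return human-readable errors for unknown or type-mismatched edges."""
    errors = []
    for src, dst in edges:
        out_t = OUTPUT_TYPE.get(src)
        in_t = INPUT_TYPE.get(dst)
        if out_t is None or in_t is None:
            errors.append(f"{src} -> {dst}: unknown node kind")
        elif out_t != in_t:
            errors.append(f"{src} -> {dst}: {out_t} does not match expected {in_t}")
    return errors

print(validate_edges([("pdf_input", "chunker"), ("pdf_input", "embedder")]))
# the second edge is a type mismatch: text vs chunks
```

Running checks like this in the Worker before dispatching to Modal keeps invalid graphs from ever consuming GPU time.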
What's next for Forge
- Collaborative editing with real-time multiplayer workflow editing and conflict resolution
- Pipeline versioning with git-like version control for workflow definitions and trained models
- Auto-ML nodes that automatically search hyperparameter spaces and select optimal architectures
- Marketplace to share and discover community-built pipeline templates and fine-tuned models
- Edge deployment to export trained models to run on-device via ONNX/TensorRT
- Expanded agentic capabilities including multi-step reasoning chains, memory-augmented agents, and autonomous pipeline optimization via RLHF
Built with: React, TypeScript, Vite, React Flow, Zustand, Tailwind CSS, Cloudflare Workers, Hono, Cloudflare KV, Cloudflare R2, Modal, FastAPI, PyTorch, HuggingFace Transformers, Sentence-Transformers, PEFT (LoRA), OpenAI API, Actian VectorAI, Python, Node.js