Inspiration
Access to AI training infrastructure remains a critical barrier for researchers, students, and communities in developing regions. We built DataForAll to democratize machine learning by creating a decentralized platform where anyone can contribute data, collaboratively train models, and access AI tools - regardless of their technical background or resources.
What it does
DataForAll is a collaborative AI training platform that features:
- Community-driven data contribution: Users upload datasets for missions like crop disease detection or environmental monitoring
- On-demand GPU training: Automatically provisions cloud GPU instances (Lambda Labs H100) to fine-tune models like SmolVLM on contributed datasets
- Real-time training monitoring: Live dashboards show training progress, metrics, and resource utilization
- Model sharing: Trained models are published to Hugging Face Hub for immediate use via REST API
- Gamified contributions: Leaderboards and mission-based challenges incentivize quality data contributions
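Since trained models are exposed over a REST API, a minimal sketch of a client call looks like the following. The endpoint shape, model id, and payload fields here are assumptions for illustration, not DataForAll's actual API contract; the request body is built but not sent.

```python
import json

# Sketch: assembling a JSON request for a vision-language inference
# endpoint. The model id "dataforall/smolvlm-crop-disease" and the
# payload layout are hypothetical examples.
def build_inference_request(model_id: str, image_url: str, question: str) -> dict:
    """Assemble the JSON body for a vision-language inference call."""
    return {
        "model": model_id,
        "inputs": {"image": image_url, "prompt": question},
    }

payload = build_inference_request(
    "dataforall/smolvlm-crop-disease",  # hypothetical repo/model id
    "https://example.com/leaf.jpg",
    "Does this leaf show signs of blight?",
)
body = json.dumps(payload)  # ready to POST to the inference endpoint
```

From here a real client would POST `body` to the platform's inference URL with any HTTP library.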
How we built it
We architected a full-stack distributed system:
- Frontend: React + Vite with Three.js for 3D visualizations, Framer Motion for animations, and real-time WebSocket updates
- Backend: FastAPI with async PostgreSQL (SQLAlchemy), JWT authentication, and S3-compatible object storage (Vultr)
- Infrastructure: Kubernetes cluster on Vultr for API deployment, container registry, and database replication
- GPU orchestration: dynamic provisioning of Lambda Labs H100 instances via REST API, configured over SSH (Paramiko)
- ML pipeline: PyTorch + Hugging Face ecosystem (Transformers, PEFT, Accelerate) for fine-tuning vision-language models with QLoRA
- Training workers: Dockerized GPU workers that pull datasets from S3, train models, stream logs via WebSocket, and push results to Hugging Face Hub
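The training worker's lifecycle (pull dataset, train, stream progress, publish) can be sketched as plain Python with the heavy steps stubbed out. The function names and phase strings are illustrative, not our real module layout; in the actual worker the stubs are boto3 S3 reads, a PyTorch + PEFT training loop, and a Hugging Face Hub upload.

```python
# Sketch of the GPU worker's job loop, with external calls stubbed.
def run_training_job(job: dict) -> dict:
    """Pull dataset -> fine-tune -> publish, reporting each phase."""
    phases = []

    def report(phase: str) -> None:
        # In the real worker this event is streamed over a WebSocket
        # to the live dashboard; here we just record it locally.
        phases.append(phase)

    report("download")   # boto3: fetch the mission dataset from S3
    report("train")      # PyTorch + PEFT: QLoRA fine-tuning loop
    report("publish")    # huggingface_hub: push weights to the Hub
    return {"job_id": job["id"], "phases": phases, "status": "succeeded"}

result = run_training_job({"id": "job-123"})  # hypothetical job record
```

Keeping the lifecycle in one function like this makes it straightforward to wrap each phase in retry and crash-reporting logic.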
Challenges we ran into
Our biggest challenge was GPU provisioning. We initially designed the system around Vultr Cloud GPUs, building HTTP-based orchestration between our Kubernetes cluster and on-demand GPU instances. However, Vultr's GPU plans required manual account approval via support ticket - incompatible with a hackathon timeline. We pivoted to Lambda Labs mid-development, which meant:
- Rewriting the provisioning layer (Lambda's API differs significantly from Vultr's)
- Implementing SSH-based configuration (Lambda doesn't support cloud-init/user-data like Vultr)
- Handling multi-region fallback logic when H100 capacity was unavailable
- Debugging SSL handshake issues between the Kubernetes cluster and ephemeral GPU workers

The migration cost us 8+ hours but taught us valuable lessons about cloud provider abstractions and resilient system design.
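The multi-region fallback mentioned above can be sketched as a small retry loop. The launcher is injected so the real HTTP call (Lambda's instance-launch endpoint) can be swapped in; the region names and instance type string below are examples, not guarantees of capacity, and the stub launcher stands in for the real API client.

```python
# Sketch: try regions in order until an instance launch succeeds.
class CapacityError(Exception):
    """Raised when a region has no instances of the requested type."""

def launch_with_fallback(launch, instance_type, regions):
    for region in regions:
        try:
            return launch(instance_type, region)
        except CapacityError:
            continue  # no capacity here; try the next region
    raise CapacityError(f"no capacity for {instance_type} in {regions}")

# Usage with a stub launcher that only has capacity in "us-south-1":
def fake_launch(instance_type, region):
    if region != "us-south-1":
        raise CapacityError(region)
    return {"instance_type": instance_type, "region": region}

instance = launch_with_fallback(
    fake_launch, "gpu_1x_h100_pcie", ["us-west-1", "us-south-1"]
)
```

Raising once every region is exhausted lets the orchestrator queue the job and retry later instead of failing silently.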
Accomplishments that we're proud of
- End-to-end automation: from data upload to trained model deployment, fully automated with zero manual intervention
- Real production deployment: running on Kubernetes with managed PostgreSQL, object storage, and container registry
- Successful fine-tuning: trained SmolVLM-256M on real crop disease datasets with QLoRA on H100 GPUs
- Resilient architecture: graceful handling of GPU provisioning failures, training crashes, and network interruptions
- Beautiful UX: interactive 3D globe, smooth animations, and real-time training visualizations
What we learned
- Cloud GPU availability is unpredictable; always have a fallback provider
- SSH-based configuration is more fragile than cloud-init but necessary on some platforms
- WebSocket connections require careful lifecycle management in distributed systems
- QLoRA enables fine-tuning large vision-language models on consumer GPUs (we tested locally on an RTX 4060 Mobile)
- Kubernetes adds complexity but pays dividends for multi-service orchestration
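The QLoRA point has simple arithmetic behind it: the frozen base model is quantized to 4 bits while only small 16-bit LoRA adapters are trained. The sketch below uses SmolVLM's 256M parameter count from the writeup; the rank-16 adapter sizing and layer/hidden dimensions are illustrative assumptions, not our exact configuration, and activation/optimizer memory is ignored.

```python
# Back-of-envelope memory math for QLoRA on a small VLM.
base_params = 256_000_000
base_mb_4bit = base_params * 0.5 / 1e6  # 4 bits = 0.5 bytes per weight

# Hypothetical LoRA adapters: rank 16 on 4 attention projections of a
# 30-layer, 768-dim model (illustrative numbers). Each adapter is a
# pair of low-rank matrices A (hidden x rank) and B (rank x hidden).
layers, hidden, rank, projections = 30, 768, 16, 4
lora_params = layers * projections * 2 * hidden * rank
lora_mb_fp16 = lora_params * 2 / 1e6    # fp16 = 2 bytes per weight

total_mb = base_mb_4bit + lora_mb_fp16  # weights fit in well under 1 GB
```

Even with training activations on top, this is why an 8 GB laptop GPU could handle local test runs.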
What's next for DataForAll
- Federated learning: Enable privacy-preserving training where data never leaves contributors' devices
- Model marketplace: Let users monetize their trained models or datasets
- Multi-modal support: Expand beyond vision to audio, time-series, and tabular data
- Community governance: Implement DAO-style voting for mission priorities and resource allocation
Built With
- docker
- huggingface
- jwt
- kubernetes
- lambda
- lambdalabs
- nextjs
- node.js
- postgresql
- python
- react
- sqlalchemy
- tailwind
- typescript
- vite
- vultr