Inspiration

During a summer internship on Google's TPU Architecture team, one of our team members observed how unpredictable periods of inactivity in AI workloads caused costly power transients that degraded processor performance and even risked brownouts. As AI training scales to hundred-thousand-GPU clusters, these synchronized power spikes destabilize electrical grids and waste energy through expensive buffering infrastructure. We realized there was an opportunity to predict and prevent these transients rather than just react to them.

What it does

SoftCap predicts power consumption patterns in GPU training workloads and proactively schedules lightweight secondary tasks to smooth out power spikes and dips before they happen. By learning the repeating cycle of compute-heavy phases and lower-power communication phases in AI training, our system forecasts when power transients will occur and fills those valleys with carefully timed workloads, flattening the overall power curve at the rack level.
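The valley-filling idea can be sketched in a few lines: given a short-horizon power forecast and a flat target level, the controller computes how much extra load a filler task should draw in each interval. This is an illustrative sketch, not SoftCap's actual API; the function name and units are assumptions.

```python
def filler_schedule(forecast_watts, target_watts):
    """For each forecast interval, return the extra load (in watts) that a
    filler task should draw so total rack power tracks a flat target.
    Illustrative sketch only; dips below the target get filled, peaks at
    or above it get no filler."""
    return [max(0.0, target_watts - p) for p in forecast_watts]
```

For example, with a 300 W target, `filler_schedule([300, 120, 310, 100], 300)` returns `[0.0, 180.0, 0.0, 200.0]`: the communication-phase valleys at 120 W and 100 W are topped up, while compute-phase peaks are left alone.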

How we built it

We built a three-GPU testbed of RTX 3070s and developed a full hardware-software stack. On the hardware side, we designed a custom power sensing board using a TI INA290 current amplifier for high-resolution telemetry beyond what NVIDIA's APIs provide. On the software side, we implemented an autocorrelation-based prediction algorithm that learns power patterns in real time, a scheduling system using NVIDIA's Multi-Process Service (MPS) to inject secondary workloads, and a Node.js dashboard with Prometheus for live monitoring. The entire control loop operates over Ethernet using UDP for low-latency communication.
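The autocorrelation step above can be sketched as follows: correlate the mean-removed power trace with itself, skip the central lobe around lag zero, and take the first strong peak as the dominant repeat period. This is a minimal sketch of the general technique, assuming a uniformly sampled trace; it is not SoftCap's actual implementation.

```python
import numpy as np

def estimate_period(power_trace):
    """Estimate the dominant repeat period of a power trace, in samples,
    via autocorrelation. Minimal sketch of the technique described."""
    x = np.asarray(power_trace, dtype=float)
    x = x - x.mean()
    # Full autocorrelation; keep non-negative lags and normalize lag 0 to 1.
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    ac /= ac[0]
    # Skip the central lobe: start searching after the first negative lag,
    # so small lags (which are always highly correlated) don't win.
    neg = np.where(ac < 0)[0]
    start = int(neg[0]) if len(neg) else 1
    return int(np.argmax(ac[start:]) + start)
```

On a synthetic trace that repeats every 20 samples, `estimate_period` recovers 20; in practice the noisier, not-perfectly-regular training traces mentioned below make this harder, which is one reason the lobe-skipping step matters.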

Challenges we ran into

Achieving accurate real-time prediction was difficult—training workloads are periodic but not perfectly regular, and GPU-to-GPU variability introduced noise. Synchronizing injected workloads to precisely overlap with predicted power dips required careful tuning. We also struggled with hardware integration, including designing reliable current sensing circuits and developing drivers for sub-millisecond telemetry streaming. Balancing prediction accuracy with computational overhead on the host CPU was an ongoing challenge throughout development.

Accomplishments that we're proud of

We achieved an 88.6% reduction in peak-to-peak power amplitude and a 96.1% reduction in power oscillation intensity on controlled GPUs compared to baseline. Our predictions accurately tracked actual power consumption with sub-second precision. We built a working end-to-end system, from custom PCB hardware to prediction algorithms to live monitoring, and demonstrated it on realistic emulations of GPU training. The entire platform costs under $50 per node in hardware.
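The peak-to-peak metric reported above can be computed as below. This is a sketch of the standard definition (max minus min of the trace, compared across baseline and controlled runs); whether SoftCap's 88.6% figure was computed exactly this way is an assumption.

```python
def peak_to_peak_reduction(baseline_watts, controlled_watts):
    """Percent reduction in peak-to-peak power amplitude of the controlled
    trace relative to the baseline trace. Illustrative sketch only."""
    pp_base = max(baseline_watts) - min(baseline_watts)
    pp_ctrl = max(controlled_watts) - min(controlled_watts)
    return 100.0 * (1.0 - pp_ctrl / pp_base)
```

For instance, flattening a trace that swings between 100 W and 500 W down to a 280-320 W band is a 90% peak-to-peak reduction.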

What we learned

We gained deep insight into power delivery challenges in modern datacenters, learned to design closed-loop control systems with real-time constraints, and developed skills in embedded hardware design, low-latency networking, and time-series prediction. We also learned about datacenter power quality standards (IEEE 519, TIA-942, IEC 61000) and how software-level scheduling can meaningfully impact physical infrastructure stress. Most importantly, we learned how to bridge multiple disciplines—power electronics, operating systems, machine learning, and networking—to solve a complex real-world problem.

What's next for SoftCap

We plan to integrate our custom hardware more tightly with the scheduling algorithm for sub-millisecond mitigation response times, which will require developing custom Ethernet drivers and embedded control systems. We want to replace artificial filler workloads with actually useful tasks like speculative inference or data preprocessing. We'll also build better emulation of distributed training with multi-node AllReduce patterns, extend SoftCap to multi-rack orchestration, and integrate with production cluster schedulers like Kubernetes. Our ultimate goal is to deploy SoftCap at datacenter scale and demonstrate its impact on real AI training infrastructure.
