BRACE: Biomechanical Real-time Adaptive Clinical Evaluation

Inspiration

Sports injuries are an epidemic hiding in plain sight. Every year, 200,000 ACL tears happen in the US alone, over 3.5 million youth athletes are treated for sports-related injuries, and concussions in contact sports remain dangerously underdiagnosed. The common thread is that most of these injuries are preventable. The warning signs are visible in how a person moves long before something tears, breaks, or gives out. But detecting those signs requires a biomechanics lab and a sports medicine assessment that can cost upwards of $300, putting preventive care out of reach for the vast majority of athletes. What if your own device could be that lab?

Our team members have personal experience with sports injuries: rolled ankles from rock climbing, knee pain from running, shoulder issues from overhead pressing, and more. The frustrating part isn't the injury itself; it's that you often know something feels off but have no way to quantify it. By the time you see a specialist, the damage is done. We were also inspired by how coaches make substitution decisions in team sports, largely by guessing who looks tired. In basketball and football, fatigue-related injuries spike in the fourth quarter and late in each half, precisely when coaches have the least objective data to work with. We wanted to replace that guesswork with real biomechanical data, tracking every player's fatigue and form degradation simultaneously from a single camera. We saw an opportunity to combine Google's Gemini AI for intelligent understanding of player identity and injury context, Actian's VectorAI DB for persistent cross-session movement embeddings and similarity search, and Figma Make for designing intuitive data visualization dashboards that make complex biomechanical data accessible to athletes and coaches who aren't data scientists.

What it does

BRACE is an AI-powered movement analysis platform that delivers biomechanical assessment in real time. For individuals, it provides real-time skeleton tracking at 30fps via webcam with over 12 biomechanical metrics including knee valgus, hip drop, trunk lean, angular velocity, and center of mass estimation. Users go through a conversational injury intake powered by a Gemini chat agent, where they describe past injuries in natural language like "I tore my ACL two years ago, my right shoulder clicks when I press overhead," and the system builds a structured injury profile that adapts their risk thresholds. A squat threshold that's safe for one athlete triggers a warning 35% earlier for someone with an ACL history. BRACE also detects fatigue before you feel it, using spectral median frequency borrowed from clinical EMG research to identify when form is degrading in real time. Users can upload workout videos and BRACE will analyze them against their personal injury profile, flagging high-risk segments on a risk-annotated timeline. ElevenLabs TTS delivers natural-sounding voice coaching alerts like "Your knees are caving inward — push them out over your toes." A workout dashboard tracks trends over time so users can see whether their ACL risk is improving or if fatigue onset is getting later.

For coaches and teams, BRACE tracks every person in the frame simultaneously, each with their own skeleton, fatigue curve, and injury risk profile. The basketball game analysis pipeline lets coaches upload game footage, and BRACE detects jersey numbers via Gemini Vision, clusters players by team using K-Means on jersey colors, tracks possession, and assigns per-player GREEN, YELLOW, or RED risk status with pull-from-game recommendations. A cross-camera identity system maintains each player's identity and fatigue history even when switching between field cameras.

BRACE also includes a dedicated concussion sideline analysis pipeline. An iPhone mounted on a tripod captures game footage at 240fps and streams head landmark data to the backend via WebSocket. A stateful LiveCollisionDetector monitors head linear velocity and rotational velocity in real time, and when a potential impact is detected, it triggers the phone to record a high-framerate clip. The backend analyzes the clip to compute peak linear velocity in m/s and peak rotational velocity in deg/s, classifies impact location (front, side, or rear) using facial keypoint visibility, and returns a risk level of HIGH, MODERATE, or LOW with actionable recommendations like "EVALUATE NOW" or "MONITOR." A coach notification system delivers reports via webhook or email with a persistent retry queue so no alert is lost during a game.

The platform runs natively on three surfaces beyond the browser. A native iOS app captures video directly from the iPhone camera, streams frames to the backend, and renders a live skeleton overlay with per-joint acceleration charts and real-time movement quality metrics. A Meta Quest 3 VR app built in Unity captures passthrough camera frames, streams them over the same binary WebSocket protocol, and renders 3D bounding boxes with floating stats panels in the athlete's physical space. The VR app supports multi-camera switching: it enumerates all available Quest cameras, lets the coach switch between them with a controller button press, and auto-cycles to the next camera if the current one produces black frames. All three native clients connect to the same backend at wss://ws.braceml.com, our production domain served through a Caddy reverse proxy with automatic HTTPS.

Under the hood, Actian's VectorAI DB stores persistent motion embeddings via gRPC, enabling cross-session person re-identification, movement similarity search, and activity classification, so BRACE remembers how you moved last week and can compare it to today. We used Figma Make to design the dashboard UI and data visualization layouts, ensuring that the dense biomechanical data (fatigue timelines, risk event markers, per-joint quality scores) is presented in a way that athletes and coaches can immediately act on without needing a sports science degree.

How we built it

The architecture flows from the browser to the backend and back in real time. The Next.js 15 frontend captures webcam frames as compressed JPEG images and transmits them as binary WebSocket frames at 30fps to a FastAPI backend running Python 3.12. On the backend, YOLO11-pose runs pose estimation, then each detected person's skeleton passes through a One-Euro Filter for jitter-free pose smoothing before being normalized into a Scale/Rotation/Position-invariant coordinate frame. The SRP normalization sets the pelvis midpoint \(o = \frac{1}{2}(\text{hip}_L + \text{hip}_R)\) as the origin, aligns the x-axis with the hip vector, derives the y-axis via Gram-Schmidt orthogonalization of the shoulder-pelvis direction, and scales all distances by hip width \(w_h\). Each joint \(p_j\) is transformed as \(x_j^* = \frac{(p_j - o) \cdot \hat{x}}{w_h}\), \(y_j^* = \frac{(p_j - o) \cdot \hat{y}}{w_h}\). This ensures identical movements produce identical 28-dimensional feature vectors (14 joints \(\times\) 2 coordinates) regardless of body size, camera angle, or position.
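A minimal NumPy sketch of that SRP normalization. The four-joint layout (rows 0/1 = left/right hip, rows 2/3 = left/right shoulder) is an illustrative assumption; the real pipeline operates on 14 joints:

```python
import numpy as np

def srp_normalize(joints: np.ndarray) -> np.ndarray:
    """Normalize 2D joints (J x 2) into a scale/rotation/position-invariant frame.

    Assumed layout: row 0 = left hip, 1 = right hip, 2 = left shoulder,
    3 = right shoulder. Returns a flattened feature vector.
    """
    origin = 0.5 * (joints[0] + joints[1])          # pelvis midpoint o
    x_axis = joints[1] - joints[0]                  # hip vector
    hip_width = np.linalg.norm(x_axis)
    x_hat = x_axis / hip_width
    # Gram-Schmidt: orthogonalize the shoulder-pelvis direction against x_hat
    t = 0.5 * (joints[2] + joints[3]) - origin
    y = t - (t @ x_hat) * x_hat
    y_hat = y / np.linalg.norm(y)
    # project every joint into the body frame and scale by hip width
    centered = joints - origin
    coords = np.stack([centered @ x_hat, centered @ y_hat], axis=1) / hip_width
    return coords.reshape(-1)
```

In this frame the two hips always land at \((\mp 0.5, 0)\), so the same movement maps to the same features no matter where the camera sits.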

Each person gets their own StreamingAnalyzer. Velocity-based motion segmentation identifies pauses between repetitions as valleys in the velocity curve using adaptive smoothing kernels and inverted-signal peak detection. Rather than comparing segments frame-by-frame, we use a phase-invariant spectral distance that compares frequency content via FFT power spectra, remaining invariant to timing variations: \(D(a, b) = \frac{\|\mu_a - \mu_b\|_2 + \|S_a - S_b\|_2}{\sqrt{d}}\) where \(\mu\) represents mean pose vectors, \(S\) represents DC-removed FFT spectra, and \(d\) is the feature dimensionality. Agglomerative clustering with average linkage groups segments at threshold 2.0, and adjacent same-cluster segments merge while tiny clusters absorb into nearest neighbors. Isolation Forest anomaly detection from scikit-learn operates on a 7-dimensional feature space (FPPA left/right, hip drop, trunk lean, BAI, curvature, jerk), retraining every 60 frames and scoring each frame from 0 to 1. UMAP dimensionality reduction provides a 2D visualization of the high-dimensional movement space so users can see their movement clusters in real time. The results stream back as JSON over the WebSocket, and the frontend renders a 60fps skeleton overlay on an HTML5 Canvas using receipt-time interpolation, achieving twice the rendering framerate of the camera input for buttery-smooth skeleton animation.
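A sketch of the phase-invariant spectral distance under stated assumptions: spectra are zero-padded to a common length and normalized by total power (details the formula above leaves open):

```python
import numpy as np

def spectral_distance(seg_a: np.ndarray, seg_b: np.ndarray) -> float:
    """Phase-invariant distance between two (frames x d) movement segments.

    Compares mean pose vectors and DC-removed FFT magnitude spectra;
    the power normalization and zero-padding are illustrative choices.
    """
    d = seg_a.shape[1]
    mu_a, mu_b = seg_a.mean(axis=0), seg_b.mean(axis=0)

    def spectrum(seg: np.ndarray, n_fft: int) -> np.ndarray:
        s = np.abs(np.fft.rfft(seg - seg.mean(axis=0), n=n_fft, axis=0))
        s[0] = 0.0                                   # remove the DC bin
        return s / (np.linalg.norm(s) + 1e-9)        # normalize total power

    n = max(len(seg_a), len(seg_b))                  # common FFT length
    S_a, S_b = spectrum(seg_a, n), spectrum(seg_b, n)
    return (np.linalg.norm(mu_a - mu_b) + np.linalg.norm(S_a - S_b)) / np.sqrt(d)
```

Because only FFT magnitudes are compared, two repetitions of the same movement that start at different points in the cycle score as near-identical, which is what makes the unsupervised clustering stable.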

Per-frame biomechanical analysis runs 12+ clinical metrics simultaneously. The Frontal Plane Projection Angle measures knee valgus as \(\theta_{\text{FPPA}} = \arctan\!\left(\frac{|d_\perp|}{\frac{1}{2}|h - a|}\right)\), a strong ACL tear predictor with activity-specific thresholds (squat: 12°/20°, running: 10°/18°). Hip drop tracks pelvic obliquity via \(\theta_{\text{drop}} = \arcsin\!\left(\frac{\Delta y_{\text{hips}}}{w_h}\right)\), indicating gluteal weakness. Trunk lean measures lateral deviation as \(\theta_{\text{trunk}} = \arccos\!\left(\frac{t \cdot \hat{v}}{|t|}\right)\) with exercise-specific thresholds ranging from 8°/15° for planks to 25°/40° for deadlifts. The Bilateral Asymmetry Index quantifies side-to-side imbalances: \(\text{BAI} = \frac{|L - R|}{\max(|L|, |R|)} \times 100\%\). Angular velocity flags ballistic movement risk: \(\omega_j = |\theta_j^{(t)} - \theta_j^{(t-1)}| \cdot f_s\). Center of mass sway uses Winter's (1990) anthropometric model with segment-weighted positions: \(c = \frac{\sum_s m_s \cdot r_s}{\sum_s m_s}\) where trunk mass accounts for 49.7% of body weight.
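As a worked example, the FPPA formula above can be evaluated directly from frontal-plane hip, knee, and ankle coordinates (a sketch, not the production implementation):

```python
import numpy as np

def fppa_deg(hip, knee, ankle) -> float:
    """Frontal Plane Projection Angle in degrees, per theta_FPPA above:
    arctan of the knee's perpendicular deviation from the hip-ankle line,
    over half the hip-ankle distance."""
    h, k, a = (np.asarray(p, dtype=float) for p in (hip, knee, ankle))
    line = a - h
    length = np.linalg.norm(line)
    # perpendicular distance of the knee from the hip-ankle line (2D cross product)
    d_perp = abs(line[0] * (k - h)[1] - line[1] * (k - h)[0]) / length
    return float(np.degrees(np.arctan(d_perp / (0.5 * length))))
```

A perfectly aligned leg scores 0°; the further the knee caves inward, the larger the angle, and the squat thresholds (12° warning, 20° alert) apply on top of this value.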

We built a pluggable dual-pipeline architecture with a Legacy backend using YOLO11-pose with 17 COCO keypoints in 2D, and an Advanced backend using YOLO11 detection plus RTMW3D with 133 keypoints, real 3D depth, BoT-SORT tracking via BoxMOT, and HybrIK for SMPL body model estimation. Switching between them requires only a single environment variable change. Fatigue detection uses an EWMA control chart with a smoothing constant of 0.2, where each observation is exponentially weighted against the running average and compared to an upper control limit derived from the process mean and standard deviation. CUSUM change-point detection accumulates small deviations over time and alarms when the cumulative sum exceeds four standard deviations, catching gradual form degradation that single-frame thresholds would miss. Movement smoothness is quantified by SPARC (Balasubramanian 2012), which measures the arc length of the normalized frequency spectrum. More negative values indicate jerkier, less controlled motion. LDLJ (Balasubramanian 2015) computes the log-dimensionless jerk integral, normalizing by movement duration and peak velocity to enable cross-exercise comparison. Kinematic chain sequencing uses Kendall's tau rank correlation scored 0 to 1 against the ideal hip-to-knee-to-ankle activation order. Sample entropy measures movement complexity and regularity, where higher values indicate erratic, fatigued motion. The composite fatigue score weights all of these:

$$ F = 0.25 \cdot \text{ROM} + 0.20 \cdot \text{EWMA} + 0.20 \cdot \text{CUSUM} + 0.15 \cdot \text{corr} + 0.08 \cdot \text{SPARC} + 0.07 \cdot \text{LDLJ} + 0.05 \cdot \text{spread} $$
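A minimal sketch of the EWMA and CUSUM components of that score. The smoothing constant 0.2 and the 4-sigma CUSUM alarm come from the text; the 3-sigma EWMA limit and the k=0.5 reference slack are illustrative assumptions:

```python
import math

class FatigueChart:
    """EWMA control chart plus a one-sided CUSUM for gradual drift."""

    def __init__(self, mean: float, sigma: float,
                 lam: float = 0.2, L: float = 3.0, k: float = 0.5, h: float = 4.0):
        self.mean, self.sigma = mean, sigma
        self.lam, self.L, self.k, self.h = lam, L, k, h
        self.z = mean          # EWMA state
        self.cusum = 0.0       # accumulated standardized deviation

    def update(self, x: float) -> tuple[bool, bool]:
        """Feed one observation; return (ewma_alarm, cusum_alarm)."""
        self.z = self.lam * x + (1 - self.lam) * self.z
        # steady-state EWMA upper control limit: mean + L*sigma*sqrt(lam/(2-lam))
        ucl = self.mean + self.L * self.sigma * math.sqrt(self.lam / (2 - self.lam))
        # CUSUM accumulates standardized deviations minus the slack k
        self.cusum = max(0.0, self.cusum + (x - self.mean) / self.sigma - self.k)
        return self.z > ucl, self.cusum > self.h
```

The EWMA catches sustained shifts in a metric like ROM, while the CUSUM fires on slow drifts that never cross a single-frame threshold, which is exactly the failure mode of naive per-frame alerting.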

The frontend uses a mixed state management pattern to avoid re-rendering at 60fps. React state handles low-frequency updates like phase changes and cluster counts throttled to about 4 per second, while refs hold the high-frequency subject data that the canvas reads directly. We discovered that video.currentTime updates in discrete 33ms jumps, causing skeleton jitter, so we switched to performance.now() receipt timestamps for smooth interpolation and built a DelayedVideoCanvas that buffers video frames and replays them delayed by EMA-smoothed round-trip time to align the skeleton with the displayed frame.
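The receipt-time interpolation idea is language-agnostic; a minimal Python sketch (the production version lives in the TypeScript frontend, and the function name here is ours):

```python
def interpolate_pose(p0: list[float], t0: float,
                     p1: list[float], t1: float,
                     t_render: float) -> list[float]:
    """Blend the two most recently received poses by where the render
    timestamp falls between their arrival times, clamped to [0, 1]."""
    alpha = min(max((t_render - t0) / (t1 - t0), 0.0), 1.0)
    return [(1 - alpha) * a + alpha * b for a, b in zip(p0, p1)]
```

Keying alpha off receipt timestamps rather than video.currentTime is what lets a 30fps analysis stream drive a smooth 60fps overlay.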

Gemini 2.0 Flash plays three distinct roles throughout the system. First, it powers the conversational injury intake agent, extracting structured injury profiles from natural language descriptions and mapping them to specific biomechanical threshold adjustments. Second, it handles real-time activity classification, identifying what exercise a person is performing so the system can load movement-specific risk thresholds and coaching cues. Third, Gemini Vision detects jersey numbers and team colors from cropped player images during basketball game analysis, providing the identity signal that solves player fragmentation across camera cuts. All Gemini calls are lazy-initialized, cached, and rate-limited to 2-second intervals to keep costs at roughly $0.00011 per call.

Actian's VectorAI DB serves as the persistent vector store via gRPC, handling three key functions: cross-session person re-identification by storing and matching CLIP-ReID appearance embeddings, movement similarity search that finds past movements most similar to what a user is currently doing, and vector-based activity classification using stored motion embedding clusters. The system degrades gracefully when VectorAI is unavailable, falling back to in-memory matching.
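The in-memory fallback can be as simple as a cosine-similarity scan over cached embeddings; a hedged sketch (the function and dictionary layout are ours, not the actual fallback code):

```python
import numpy as np

def best_match(query: np.ndarray, stored: dict) -> tuple:
    """Fallback re-identification when VectorAI DB is unreachable: return
    the (subject_id, similarity) of the closest stored appearance embedding
    by cosine similarity."""
    q = query / np.linalg.norm(query)
    best_id, best_sim = None, -1.0
    for subject_id, emb in stored.items():
        sim = float(q @ (emb / np.linalg.norm(emb)))
        if sim > best_sim:
            best_id, best_sim = subject_id, sim
    return best_id, best_sim
```

The trade-off is that the in-memory cache only survives for the current session, whereas VectorAI DB makes matches persist across sessions.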

The concussion pipeline runs as a separate FastAPI router that accepts 240fps video clips via POST or live 60fps landmark streams via WebSocket. The LiveCollisionDetector maintains a state machine with configurable thresholds. A velocity threshold of 7.0 m/s triggers collision detection, and the collision ends when velocity drops below 60% of the threshold for at least 120ms to avoid re-triggering. When a collision is detected on the live stream, the backend sends a control message instructing the iPhone to clip the high-framerate recording. The ConcussionAnalyzer then processes the clip to extract peak impact kinematics and classify risk level.
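The hysteresis logic described above can be sketched as a small state machine. The 7.0 m/s onset, 60% release fraction, and 120 ms hold come from the text; the method shape and return values are illustrative:

```python
class LiveCollisionDetectorSketch:
    """Hysteresis state machine for head-impact detection on the live stream."""

    ONSET = 7.0                  # m/s: velocity that starts a collision
    RELEASE = 0.6 * ONSET        # collision ends below 60% of onset...
    RELEASE_HOLD_S = 0.120       # ...sustained for at least 120 ms

    def __init__(self):
        self.in_collision = False
        self._below_since = None

    def update(self, velocity: float, t: float):
        """Feed one (velocity, timestamp-in-seconds) sample; return an event name or None."""
        if not self.in_collision:
            if velocity >= self.ONSET:
                self.in_collision = True
                self._below_since = None
                return "COLLISION_START"      # backend tells the phone to clip
        else:
            if velocity < self.RELEASE:
                if self._below_since is None:
                    self._below_since = t
                elif t - self._below_since >= self.RELEASE_HOLD_S:
                    self.in_collision = False
                    return "COLLISION_END"
            else:
                self._below_since = None      # a rebound resets the hold timer
        return None
```

The hold timer is what prevents a single noisy sample from ending the collision early and re-triggering a second clip for the same impact.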

All native clients, iOS, Quest VR, and the web frontend, share the same binary WebSocket protocol: an 8-byte little-endian timestamp followed by a JPEG frame. This unified protocol means the backend doesn't need to know what kind of device is sending frames. The iOS app is built in SwiftUI with AVFoundation for camera capture, and displays live FPS meters for capture rate, send rate, receive rate, and round-trip latency. The Quest 3 VR app is built in Unity with C# scripts handling passthrough camera permissions (HorizonOS-specific), frame capture at 1280x960 downscaled to 480p, brightness monitoring to detect black frames, and bounding box rendering via LineRenderer quads in world space.

The broader tech stack spans Next.js 15 with React 19, TypeScript, and Three.js on the web frontend, SwiftUI and AVFoundation for iOS, Unity and C# for Quest VR, FastAPI with Python 3.12 on the backend, YOLO11-pose, RTMW3D, BoT-SORT, and CLIP-ReID for ML and vision, ElevenLabs for TTS voice coaching, MongoDB 7.0 for persistence, TensorRT FP16 for GPU inference optimization, Caddy for reverse proxy and HTTPS termination at braceml.com, and Docker Compose with multi-profile hardware auto-detection for deployment. The project includes 626 tests across 32 test files.

Challenges we ran into

The hardest challenge was player identity fragmentation in basketball game analysis. A 70-second NBA clip produced over 30 subject IDs for only about 10 actual players. Every camera angle change resets the tracker, creating entirely new track IDs and duplicate identities for every visible player. We solved this by using Gemini Vision to detect jersey numbers and team colors from cropped player images, feeding that information back into the identity resolver, and building a merge mechanism that retroactively combines fragment subjects when jersey detection reveals duplicates. Short-lived fragments under one second are filtered from final results. This reduced 30+ fragmented IDs down to stable 10-player identities.

VR integration with the Meta Quest 3 presented a cascade of challenges. The full response payload of roughly 600KB per second per subject, multiplied by 5 subjects at 30fps, exceeded reasonable bandwidth for VR over Tailscale VPN. We built an adaptive response format where a ?client=vr query parameter triggers a 10x bandwidth reduction: unselected subjects receive only bounding box coordinates and a label, about 80 bytes each. We also had to handle coordinate system mismatches between computer vision conventions where y=0 is at the top and Unity's convention where y=0 is at the bottom. The 130-260ms end-to-end pipeline latency from camera capture through network transmission to GPU inference and back was acceptable for web overlays but too high for head-tracked 3D skeleton rendering, so we made the architectural decision to use bounding boxes with floating stats panels instead of skeleton overlays in VR.

Skeleton-video alignment was a persistent challenge throughout development. The skeleton overlay would visibly lag behind the video by the pipeline round-trip time. We solved this with the DelayedVideoCanvas that buffers video frames and replays them delayed to match the skeleton timing, combined with receipt-time interpolation at 60fps. Getting the z-index stacking right (video element, then delayed canvas, then skeleton canvas, then UI controls) while keeping Chrome's autoplay policy happy required careful CSS engineering, since the video element must remain visible or Chrome blocks autoplay entirely.

Supporting both NVIDIA GPU machines with CUDA 12.8 and TensorRT FP16 and Mac or CPU-only machines with Python 3.12 slim and CPU PyTorch from the same docker-compose.yml required a multi-profile system with hardware auto-detection via a shell script. Bind-mounting over 20 Python files for hot-reload during development while maintaining correct import paths across Docker and local development environments led to a try/except ImportError pattern throughout the codebase, where every cross-module import attempts the Docker flat path first and falls back to the local nested path.
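The dual-path import pattern looks like the snippet below. The Docker-side module name is hypothetical, and statistics.median stands in for a real helper so the fallback branch is demonstrable:

```python
try:
    # Docker image: modules are bind-mounted flat at the app root
    from docker_flat_metrics import median   # hypothetical flat-path module
except ImportError:
    # local development: fall back to the nested package layout
    # (statistics.median is a stand-in so this sketch actually runs)
    from statistics import median
```

Every cross-module import in the codebase follows this shape, so the same file works unmodified in both environments at the cost of some boilerplate.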

Getting the Meta Quest 3 passthrough camera to work was a challenge in itself. The HorizonOS camera permission system is different from standard Android, requiring horizonos.permission.HEADSET_CAMERA with a fallback to android.permission.CAMERA. Once we had camera access, we discovered that some cameras on the Quest would intermittently produce entirely black frames. We solved this with a brightness monitoring system that samples every frame and auto-cycles to the next available camera if black frames persist for more than 10 consecutive captures. The multi-camera switching also required careful state management to avoid sending stale frames during the transition.

Building the iOS concussion capture app required balancing framerate against reliability. AVFoundation can capture at 240fps, but streaming raw frames at that rate over a network is impractical. We solved this with a two-tier approach: the app continuously streams head landmark positions at 60fps for real-time collision monitoring, and only when the backend detects a potential impact does it instruct the phone to record a short high-framerate clip for detailed analysis. The upload queue persists to disk so clips aren't lost if network connectivity drops during a game.

What we learned

We learned that spectral analysis techniques from clinical research translate remarkably well to real-time movement assessment when adapted for streaming data with sliding windows. SPARC, LDLJ, sample entropy, and spectral median frequency were originally designed for offline analysis of EMG signals and robotic movements, but with careful windowing and incremental computation they run comfortably at 30fps. The key insight was that the phase-invariant spectral distance \(D(a,b) = \frac{\|\mu_a - \mu_b\|_2 + \|S_a - S_b\|_2}{\sqrt{d}}\) lets us compare movement segments without worrying about timing variations between repetitions, a property that's essential for unsupervised clustering but that naive frame-by-frame distance metrics don't provide.

Identity resolution turned out to be the hardest problem in multi-person tracking. Appearance-based re-identification alone fails completely at scene cuts because the tracker resets and loses all history. But combining appearance matching with domain-specific signals like jersey numbers makes the problem tractable. This taught us that the best ML systems often combine general-purpose models with domain-specific heuristics rather than trying to solve everything with a single model.

We also learned that browser APIs have surprising limitations that aren't documented well. The video.currentTime property doesn't update smoothly; it jumps in discrete steps that cause visible jitter in any overlay system. Chrome silently blocks autoplay if a video element has zero opacity, which broke our initial skeleton overlay approach. WebSocket binary frames need careful backpressure management to prevent the backend from overwhelming slow clients. These kinds of platform-level gotchas consumed significant debugging time.

Building for three native platforms simultaneously taught us the value of a unified protocol. Because the iOS app, Quest VR app, and web frontend all speak the same binary WebSocket format (8-byte timestamp + JPEG), the backend required zero platform-specific code. Every improvement to the analysis pipeline immediately benefited all clients. This also meant we could test the full pipeline from any device without maintaining separate backend branches.

Finally, we were surprised by how effective Gemini Vision is at reading jersey numbers from low-resolution cropped player images. We expected to need a custom-trained OCR model, but Gemini handled it well enough to be the primary identity signal for our basketball pipeline, which significantly reduced our development time for that feature.

What's next for BRACE

The next step is clinical validation: partnering with athletic training programs to validate our injury risk thresholds against real clinical outcome data. Our biomechanical metrics are based on published clinical research, but the thresholds we use for real-time alerting need to be validated in controlled studies with ground-truth injury outcomes. We also plan to open-source the streaming biomechanics pipeline so researchers and developers can build on it. Longer-term, we want to add longitudinal multi-week injury risk tracking with Gemini-generated progress reports, expand the concussion pipeline with validated clinical thresholds from NFL and NCAA impact data, and integrate with wearable IMU data from smartwatches to combine camera-based biomechanics with accelerometer signals for higher-fidelity fatigue detection.
