Inspiration

Molecular simulations are a cornerstone of modern drug discovery and therapeutics, but they are currently bottlenecked by power-hungry, general-purpose GPUs. We asked a fundamental question: "What if molecular physics were a feature of the silicon itself?" We wanted to stop simulating the math in software and start wiring it directly into gates, making protein folding and energy minimization as native to a microchip as basic addition.

What it does

Bio-Tensor MD is a hardware-accelerated "GPS for Molecules." You feed it a messy, high-energy atomic structure, and the chip's custom circuitry physically pulls the atoms into their most stable, "relaxed" equilibrium. It operates as a dedicated Molecular Dynamics (MD) minimization engine, calculating bonded forces (stretches, angles, and torsions) and non-bonded forces (Coulomb electrostatics and Lennard-Jones potentials) in parallel to perform gradient descent directly on the silicon.

How we built it

To fit complex molecular physics into a constrained ASIC footprint, we engineered a custom datapath using Q16.16 fixed-point arithmetic, entirely avoiding the massive overhead of floating-point cores.
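
As a rough illustration, the Q16.16 convention can be modeled in a few lines of Python. This is a software sketch, not the RTL; the function names are ours, but the shift-and-scale behavior is the essence of the format:

```python
# Software reference model of Q16.16 fixed-point arithmetic.
# 32-bit values with 16 integer bits and 16 fractional bits.
SCALE = 1 << 16  # one integer unit = 2**16 fixed-point LSBs

def to_q16(x: float) -> int:
    """Convert a float to Q16.16 (fractional bits below 2**-16 are lost)."""
    return int(x * SCALE)

def from_q16(q: int) -> float:
    """Convert a Q16.16 value back to a float."""
    return q / SCALE

def q16_mul(a: int, b: int) -> int:
    """Q16.16 multiply: take the full-width product, then shift out
    the extra 16 fractional bits."""
    return (a * b) >> 16
```

The shift in `q16_mul` is where precision silently disappears, which is exactly the bit-growth bookkeeping described in the challenges below.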

The architecture is driven by a strict three-phase Finite State Machine (FSM):

**Phase 1 (Bonded Forces):** A "Sliding Window" memory system feeds atomic coordinates to parallel Bond, Angle, and Dihedral cores like a conveyor belt, calculating localized geometry restraints.

**Phase 2 (Non-Bonded Forces):** An optimized nested-loop pipeline evaluates the O(N^2) atomic cloud, calculating distance-based Coulombic and Lennard-Jones forces.

**Phase 3 (Gradient Update):** The FSM acts as a physical optimizer, moving the atoms along the negative gradient of the potential energy. We implemented a hardware-level annealing schedule (learning-rate decay) to step the atoms closer to their local energy minima without overshooting.
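
The Phase 3 update can be sketched as a software model on a toy one-dimensional harmonic bond. The constants, step count, and the exact decay rule here are illustrative, not the hardware's:

```python
def minimize_bond(x, x0=1.0, k=2.0, lr=0.1, decay=0.99, steps=200):
    """Relax a 1-D harmonic bond E = 0.5*k*(x - x0)**2 by gradient
    descent with an annealing (learning-rate decay) schedule."""
    for _ in range(steps):
        grad = k * (x - x0)   # dE/dx
        x -= lr * grad        # step along the negative gradient
        lr *= decay           # annealing: shrink the step each iteration
    return x
```

Because the learning rate shrinks every iteration, late steps cannot overshoot the minimum the way a fixed step size would.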

Challenges we ran into

Pipeline Synchronization: Our physics cores have wildly different computational latencies: calculating a dihedral torsion angle takes much longer than a simple bond stretch. Designing a multi-phase FSM to orchestrate these parallel pipelines, and using shift registers to align the data so that Newton's third law (equal and opposite forces) was applied on exactly the right clock cycle, was a massive headache.

Streamlining the Non-Bonded Pipeline: For our Phase 2 O(N^2) non-bonded cloud, we couldn't afford to stall the engine waiting for the heavy Lennard-Jones and Coulomb math to finish. Restructuring this into a deep, fully pipelined architecture that outputs one complete force calculation every single clock cycle took significant effort.
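
For reference, the per-pair math that pipeline evaluates looks like this in a floating-point golden model. The function name, units, and default constants are illustrative, not our hardware's parameterization:

```python
def nonbonded_force(r, q1, q2, eps=1.0, sigma=1.0, ke=1.0):
    """Scalar non-bonded force along the pair axis (positive = repulsive):
    Lennard-Jones, F = 24*eps*(2*(sigma/r)**12 - (sigma/r)**6)/r,
    plus Coulomb, F = ke*q1*q2/r**2. Reduced units for illustration."""
    sr6 = (sigma / r) ** 6
    f_lj = 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r
    f_coul = ke * q1 * q2 / r ** 2
    return f_lj + f_coul
```

Every operation in that expression (divides, sixth powers, a reciprocal square) has to become pipeline stages in the hardware version, which is why keeping it at one result per cycle was painful.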

Fixed-Point Physics & Singularities: Translating chaotic 3D physics into Q16.16 fixed-point logic introduced severe edge cases. Our gradient descent step-size kept truncating to absolute zero (causing a "Deep Freeze"), and whenever atoms formed a perfectly straight line, our vector cross-products collapsed to zero, trapping the hardware in Gimbal Lock. We had to hardwire mathematical floors and pre-emptive orthogonal "kicks" to keep the molecules moving.
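
The "Deep Freeze" is easy to reproduce in a software model of the Q16.16 step: a small gradient times a small learning rate shifts down to exactly zero, and the hardwired floor clamps the result to one LSB instead. The values and helper name here are illustrative:

```python
MIN_STEP = 1  # floor of one Q16.16 LSB (2**-16); an illustrative choice

def q16_step(grad_q16: int, lr_q16: int) -> int:
    """Gradient-descent step in Q16.16. The naive multiply can collapse
    to 0 for small gradients (the "Deep Freeze"); clamp to a minimum LSB
    so a nonzero gradient always moves the atom."""
    step = (grad_q16 * lr_q16) >> 16  # fixed-point multiply, low bits lost
    if step == 0 and grad_q16 != 0:
        step = MIN_STEP if grad_q16 > 0 else -MIN_STEP
    return step
```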

Accomplishments that we're proud of

Bridging Disciplines in Silicon: We are incredibly proud of successfully merging biology, chemistry, and physics directly into hardware. Researching complex molecular interactions—like Lennard-Jones potentials, Coulombic electrostatics, and dihedral torsions—and figuring out how to translate theoretical chemistry equations into physical logic gates was a massive, rewarding leap outside our computer engineering comfort zone.

Advanced Fixed-Point Architecture: We engineered heavy operations like inverse square roots, 3D vector cross-products, and CORDIC atan2 approximations using only Q16.16 fixed-point logic.
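
For instance, a common way to build an inverse-square-root unit is the Newton-Raphson recurrence y ← y·(1.5 − 0.5·x·y²); here is a floating-point sketch of that recurrence. The seed strategy and iteration count are simplifications (real hardware would typically seed from a small lookup table), and treating this as our exact RTL algorithm is an assumption:

```python
def inv_sqrt(x: float, iters: int = 6) -> float:
    """Newton-Raphson iteration for 1/sqrt(x). Each pass roughly doubles
    the number of correct digits once the estimate is close."""
    # Crude, illustrative seed that converges for moderate x.
    y = 1.0 / x if x > 1.0 else 1.0
    for _ in range(iters):
        y = y * (1.5 - 0.5 * x * y * y)
    return y
```

Multiply-and-subtract only, no division inside the loop, which is what makes the recurrence attractive for a fixed-point datapath.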

Rigorous Verification to a Working Baseline: Translating chaotic molecular physics into digital logic meant we couldn't just guess if our pipelines worked. We built a massive suite of testbenches to meticulously verify every force calculation and FSM state, ultimately delivering a fully synthesizable baseline engine that actually minimizes protein geometries.

What we learned

Data Movement is the Real Boss Fight: We discovered that writing the math in Verilog is actually the easy part. Routing the exact 3D coordinates, atomic charges, and spring constants to the right physics core at the precise nanosecond is incredibly difficult. Managing that massive data flow across parallel pipelines taught us exactly why memory bandwidth is the ultimate bottleneck in chip design.

The Area vs. Frequency Tug-of-War: This was our first time building a project where we couldn't just throw more memory or processing power at a bug. We learned how to constantly balance logic depth against our maximum clock frequency, tearing down and redesigning heavy math blocks to achieve timing closure.

The Unforgiving Nature of Fixed-Point Math: We quickly learned that in hardware Q16.16 fixed-point arithmetic, you have to meticulously track bit growth, precision loss, and truncation at every step. Debugging our gradient descent "deep freeze" taught us firsthand how abstract numerical limits physically manifest in silicon.

What's next for Bio-Tensor MD

Full-Physics Engine & Implicit Solvation: To transition this into a fully functional ASIC capable of simulating protein settling, the architecture must be expanded. We plan to extend our Coulombic electrostatics to accurately model critical interactions like salt bridges. Furthermore, we will implement Generalized Born (GB) implicit solvation, which approximates the solvent as a continuum without simulating explicit water molecules. GB is a standard approach for accelerating folding simulations in hardware, allowing large net speedups while keeping the physics realistic.

Autonomous Settling for Drug Discovery: A simple constant-energy simulation will not settle; the protein will merely oscillate. We aim to build a hardware-native Langevin Thermostat, which mimics solvent viscosity and Brownian motion to create a "heat bath" that allows the protein to explore conformations and settle into a true free energy minimum. Achieving this rapid, autonomous settling in silicon unlocks the potential for ultra-fast drug discovery, allowing us to screen how thousands of small-molecule therapeutics physically dock into protein pockets in real time.
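
The thermostat update itself is compact; this software model (an Euler-Maruyama-style discretization with illustrative constants, on a harmonic well standing in for the protein's energy surface) shows the drag-plus-Brownian-kick rule we would map to hardware:

```python
import math
import random

def langevin_settle(x, v=0.0, x0=1.0, k=4.0, m=1.0,
                    gamma=2.0, kT=0.0, dt=0.01, steps=5000, seed=0):
    """Langevin dynamics on a harmonic well E = 0.5*k*(x - x0)**2.
    gamma*v is the solvent drag; the noise term is the Brownian kick
    whose strength is set by the temperature kT. All constants are
    illustrative, not hardware values."""
    rng = random.Random(seed)
    noise = math.sqrt(2.0 * gamma * kT / m * dt)
    for _ in range(steps):
        f = -k * (x - x0)                                # conservative force
        v += dt * (f / m - gamma * v) + noise * rng.gauss(0.0, 1.0)
        x += v * dt
    return x
```

At kT = 0 the drag alone damps the oscillation into the minimum; at finite kT the kicks let the system hop between conformations instead of freezing in the first basin it finds.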

Scaling for Plasma Membrane Simulations: Calculating interactions between every pair of atoms is computationally inefficient for systems larger than a few hundred atoms. To simulate massive, highly complex biological environments—such as a receptor protein interacting with a lipid plasma membrane—we will upgrade the memory architecture to support on-chip Cell Linked Lists and Verlet Lists. By only considering atoms within a specific cutoff radius, a Neighbour List reduces the computational complexity to O(N). This will allow us to tile the ASIC and process massive trans-membrane interactions without stalling the force pipelines.
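
A minimal software model of the cell-list idea (the geometry and helper name are ours): bin atoms into cutoff-sized cells, then compare each atom only against its own and the 26 adjacent cells, so the work scales roughly O(N) at fixed density instead of O(N^2).

```python
from collections import defaultdict
from itertools import product

def neighbor_pairs(coords, cutoff):
    """Return the set of index pairs (i, j), i < j, within `cutoff`,
    found via a cell linked list instead of an all-pairs sweep."""
    cells = defaultdict(list)
    for i, (x, y, z) in enumerate(coords):
        cells[(int(x // cutoff), int(y // cutoff), int(z // cutoff))].append(i)
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        # Only this cell and its 26 neighbors can contain atoms in range.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for i in members:
                for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                    if i < j:
                        xi, yi, zi = coords[i]
                        xj, yj, zj = coords[j]
                        d2 = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
                        if d2 <= cutoff ** 2:
                            pairs.add((i, j))
    return pairs
```

The hardware version would keep these bins in on-chip memory so each force pipeline only streams its own neighborhood.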

Built With

  • systemverilog
  • verilog