Traditional hedging models (like Black-Scholes-Merton) rely on restrictive assumptions: continuous trading, zero transaction costs, and constant volatility. In real markets, these assumptions fail, leading to significant hedging errors and transaction costs.
This project implements a "Deep Hedging" agent using Distributional Reinforcement Learning (TQC). The agent learns to hedge a European Call Option autonomously in a realistic environment featuring:
- Heston Stochastic Volatility: Simulating realistic market regimes and "volatility clustering."
- Transaction Costs: Penalizing excessive trading (forcing the agent to balance risk vs. cost).
- Discrete Rebalancing: The agent must make decisions at fixed intervals, unlike continuous-time theoretical models.
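The stochastic-volatility dynamics described above can be sketched with a simple Euler discretization of the Heston model. This is only an illustration; the function name and parameter values below are assumptions, not the project's actual defaults in `hedging_env.py`:

```python
import numpy as np

def simulate_heston(s0=100.0, v0=0.04, mu=0.0, kappa=2.0, theta=0.04,
                    xi=0.3, rho=-0.7, dt=1 / 252, n_steps=60,
                    n_paths=1000, seed=0):
    """Euler scheme with 'full truncation' to keep variance non-negative."""
    rng = np.random.default_rng(seed)
    s = np.full(n_paths, s0)   # spot prices
    v = np.full(n_paths, v0)   # instantaneous variances
    prices = [s.copy()]
    for _ in range(n_steps):
        z1 = rng.standard_normal(n_paths)
        # Correlated Brownian shocks (rho < 0 gives the leverage effect).
        z2 = rho * z1 + np.sqrt(1 - rho**2) * rng.standard_normal(n_paths)
        v_pos = np.maximum(v, 0.0)  # full truncation of negative variance
        s = s * np.exp((mu - 0.5 * v_pos) * dt + np.sqrt(v_pos * dt) * z1)
        v = v + kappa * (theta - v_pos) * dt + xi * np.sqrt(v_pos * dt) * z2
        prices.append(s.copy())
    return np.array(prices)  # shape: (n_steps + 1, n_paths)

paths = simulate_heston()
```

Mean reversion of the variance (`kappa`, `theta`) is what produces the volatility clustering the agent must learn to exploit.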
Standard RL algorithms (like PPO or DDPG) optimize the expected (mean) reward. However, risk management requires controlling the tails of the return distribution.
- We utilize TQC (Truncated Quantile Critics), a distributional variant of Soft Actor-Critic (SAC).
- The agent predicts a full distribution of future returns (Quantiles) rather than a single scalar value.
- This allows the agent to be explicitly risk-averse regarding "Black Swan" events.
Markets are path-dependent: a simple feed-forward network cannot capture momentum or the prevailing volatility regime.
- Input: Rolling window of market state (Price, Time-to-Maturity, Volatility, Current Holdings).
- Network: A Transformer Encoder (Self-Attention) processes the temporal history to detect market regimes before passing features to the policy network.
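The attention step at the heart of the feature extractor can be sketched in plain NumPy (single head, random stand-in weights; the actual `model.py` is a PyTorch module, so this is purely illustrative):

```python
import numpy as np

def self_attention(x, d_k=16, seed=0):
    """Scaled dot-product self-attention over a window of market states.
    x has shape (window, features); projection weights are random stand-ins."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    w_q, w_k, w_v = (rng.standard_normal((d, d_k)) / np.sqrt(d)
                     for _ in range(3))
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(d_k)  # (window, window) pairwise affinities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ v  # each timestep mixes in context from the whole window

# Rolling window: 30 timesteps x 4 features (price, TTM, vol, holdings).
window = np.random.default_rng(1).standard_normal((30, 4))
out = self_attention(window)
```

Because every timestep attends to every other, the extractor can condition on regime-level information (e.g. a volatility spike 20 steps ago) that a feed-forward net on the latest observation would miss.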
The agent was trained for 100,000 steps and benchmarked against a theoretical Black-Scholes Delta Hedger.
The agent achieved a highly stable P&L distribution, effectively neutralizing market risk while paying a minimal "insurance premium" (transaction costs).
Above: The P&L distribution is centered near zero with low variance, indicating successful hedging.
| Metric | Value | Interpretation |
|---|---|---|
| Mean P&L | -$4.57 | The average cost (mostly transaction fees) of maintaining the hedge. |
| Std Dev (Risk) | $14.19 | The residual risk (dispersion of hedged P&L) left in the portfolio. |
The agent learned to smooth out the "jagged" trading patterns of the theoretical model, reducing transaction costs by ignoring minor market noise.
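For reference, the Black-Scholes delta that the benchmark hedger rebalances toward is the standard closed-form N(d1); a minimal sketch with illustrative parameters (not the project's actual settings):

```python
import numpy as np
from scipy.stats import norm

def bs_call_delta(s, k, t, r, sigma):
    """Delta of a European call under Black-Scholes: N(d1)."""
    d1 = (np.log(s / k) + (r + 0.5 * sigma**2) * t) / (sigma * np.sqrt(t))
    return norm.cdf(d1)

# At-the-money call, 3 months to expiry, 20% vol, 1% rate.
delta = bs_call_delta(s=100.0, k=100.0, t=0.25, r=0.01, sigma=0.2)
```

Because this delta reacts to every price tick, a discrete-time hedger following it trades on noise; the learned policy instead tolerates small delta deviations when the expected risk reduction is smaller than the transaction cost.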
- Python 3.8+
- PyTorch with CUDA support (recommended)
Install the dependencies:

```bash
pip install gymnasium stable-baselines3 sb3-contrib torch numpy pandas matplotlib scipy
```

To train the TQC agent from scratch on the Heston environment:

```bash
python train.py
```

To load the trained agent and run the benchmark comparison against Black-Scholes:

```bash
python evaluate.py
```

```
├── assets/                  # Images for README
├── deep_hedging_agent.zip   # The trained TQC model
├── hedging_env.py           # Custom Gymnasium Environment (Heston + Friction)
├── model.py                 # Transformer Feature Extractor class
├── train.py                 # Training script
└── evaluate.py              # Benchmarking and Visualization script
```
- Deep Hedging - Buehler et al. (2018), JP Morgan AI Research.
- Heston Model - A stochastic volatility model used to price options.
- Stable Baselines3 - Reliable Reinforcement Learning implementations in PyTorch.
Created by Dhyan

