ml-alpha

Machine learning models for cross-sectional stock return prediction. The repository replicates and extends the feedforward neural network (NN5) benchmark from Gu, Kelly, and Xiu (2020), referred to below as GKX, and introduces a cross-sectional Transformer that learns stock-stock interactions via self-attention. It also experiments with the MSRR loss function from Kelly et al. (2025), which directly optimizes the portfolio Sharpe ratio rather than prediction error.
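To make "self-attention across stocks" concrete, here is a minimal sketch in plain Python. Each stock in one month's cross-section is treated as a token, and each stock's output is a softmax-weighted average of all stocks' vectors. The sketch uses a single head with identity query/key/value projections, which is a simplification for brevity and not the architecture in train_transformer.py; the function name is hypothetical.

```python
import math


def cross_sectional_attention(tokens):
    """Single-head self-attention over the stocks in one cross-section.

    tokens: list of N feature vectors (one per stock), each of length d.
    Assumption: identity query/key/value projections, so this shows only
    the mixing step, not a trained layer.
    Returns (outputs, weights); weights[i][j] is how much stock i attends to stock j.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    weights = []
    for q in tokens:
        # Scaled dot-product score of this stock against every stock.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        # Numerically stable softmax over the cross-section.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    # Each output is a weighted average of all stocks' vectors.
    outputs = [
        [sum(w[j] * tokens[j][c] for j in range(len(tokens))) for c in range(d)]
        for w in weights
    ]
    return outputs, weights


stocks = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.5]]  # three toy stocks, two features
out, attn = cross_sectional_attention(stocks)
```

In this toy cross-section the first stock places more weight on the similar second stock than on the dissimilar third one, which is the sense in which attention captures stock-stock interactions.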

Models

Model	Parameters	Features	Loss	Notes
NN5 (FFN)	~30K	937 (95 signals + 8 macro + 760 interactions + 74 dummies)	MSE	Replication of GKX (2020)
Cross-Sectional Transformer	~14K	169 per stock + 8 macro	MSE	Self-attention across stocks
MSRR Transformer	~14K	169 per stock + 8 macro	MSRR	Same architecture, portfolio-optimized loss
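The NN5 feature count follows from the breakdown in the table: 95 + 8 + 95 × 8 + 74 = 937. The sketch below assembles one stock-month vector with those dimensions and counts the dense-layer weights and biases of a 937-32-16-8-4-2-1 network, where 32-16-8-4-2 are the NN5 hidden widths reported in GKX (2020). The function names and the feature ordering are assumptions for illustration, and this README does not state whether train_nn.py uses exactly these widths.

```python
def build_nn5_features(signals, macro, sector_dummies):
    """Concatenate signals, macro predictors, signal x macro interactions, and dummies."""
    interactions = [s * m for s in signals for m in macro]
    return list(signals) + list(macro) + interactions + list(sector_dummies)


def dense_param_count(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Toy inputs with the dimensions from the table above.
signals = [0.0] * 95
macro = [0.0] * 8
dummies = [0] * 74
x = build_nn5_features(signals, macro, dummies)
print(len(x))  # 937

# NN5 hidden widths 32-16-8-4-2 (from GKX 2020), scalar output.
print(dense_param_count([937, 32, 16, 8, 4, 2, 1]))  # 30729
```

The dense-layer count of 30,729 is consistent with the ~30K figure in the table; normalization layers, if any, would add a small number of parameters on top.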

Results

MSE Models (2012-2019, L/S decile Sharpe)

Model	Avg OOS R²	Avg IC	Avg L/S %/mo	Avg Sharpe	Positive years
NN5 (FFN)	-0.55%	+0.014	+0.91%	+1.63	7/8
Transformer (MSE)	-0.08%	+0.021	+1.53%	+2.16	8/8
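For readers unfamiliar with the columns above, the following is a minimal sketch of three of the metrics in plain Python. The out-of-sample R² follows the GKX (2020) convention of a zero-forecast benchmark. The equal-weighted decile legs, the sqrt(12) annualization, and the function names are assumptions; the reports may weight or average differently.

```python
import math
import statistics


def oos_r2(actual, predicted):
    """Out-of-sample R^2 as in GKX (2020): the benchmark is a zero forecast,
    so the denominator is the sum of squared returns without demeaning."""
    sse = sum((r - p) ** 2 for r, p in zip(actual, predicted))
    sst = sum(r ** 2 for r in actual)
    return 1.0 - sse / sst


def long_short_decile_return(predicted, actual):
    """Return of the top prediction decile minus the bottom decile for one month.
    Assumption: equal weights within each leg and at least 10 stocks."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    k = len(order) // 10
    short_leg = statistics.fmean([actual[i] for i in order[:k]])
    long_leg = statistics.fmean([actual[i] for i in order[-k:]])
    return long_leg - short_leg


def annualized_sharpe(monthly_returns):
    """Mean over sample standard deviation of monthly returns, scaled by sqrt(12)."""
    mean = statistics.fmean(monthly_returns)
    return mean / statistics.stdev(monthly_returns) * math.sqrt(12)


# Toy month with 20 stocks whose realized returns line up with the predictions.
pred = [float(i) for i in range(20)]
real = [0.001 * i for i in range(20)]
ls = long_short_decile_return(pred, real)
```

A negative R² alongside a positive Sharpe ratio, as in the table, is possible because the decile sort depends only on the ranking of the predictions, not on their level.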

MSE vs MSRR Transformer (2016-2019)

Model	Portfolio	Avg Sharpe	Best year (Sharpe)	Worst year (Sharpe)
Transformer (MSE)	L/S decile sort	+1.81	2017 (+3.07)	2016 (+0.58)
Transformer (MSRR)	SDF (direct weights)	+2.05	2016 (+3.03)	2018 (+0.82)

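The MSRR loss directly optimizes the portfolio Sharpe ratio instead of return prediction accuracy. The architecture and data are identical across the two rows; only the loss function and the portfolio built from the outputs differ. The SDF (stochastic discount factor) portfolio uses the model outputs directly as portfolio weights. Because the Sharpe ratio is invariant to rescaling the weights, the overall scale of the outputs does not affect the reported Sharpe ratio.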
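A minimal sketch of the idea, assuming the MSRR objective takes the form of a regression of a constant 1 on the portfolio return, that is, the mean over months of (1 - Σ_i w_i R_i)², following Kelly et al. (2025). The exact loss, any regularization, and the data layout in train_transformer_msrr.py may differ, and the function names here are hypothetical.

```python
import statistics


def msrr_loss(weights_by_month, returns_by_month):
    """Mean squared gap between 1 and the portfolio return, month by month.

    weights_by_month[t][i]: model output for stock i in month t, used directly as a weight.
    returns_by_month[t][i]: next-month excess return of stock i.
    Returns (loss, list of monthly portfolio returns).
    """
    portfolio = [
        sum(w * r for w, r in zip(w_t, r_t))
        for w_t, r_t in zip(weights_by_month, returns_by_month)
    ]
    loss = statistics.fmean([(1.0 - p) ** 2 for p in portfolio])
    return loss, portfolio


def sharpe(returns):
    """Per-period Sharpe ratio (mean over sample standard deviation)."""
    return statistics.fmean(returns) / statistics.stdev(returns)


# Toy data: three months, two stocks.
weights = [[0.5, -0.2], [0.1, 0.4], [0.3, 0.3]]
rets = [[0.02, -0.01], [0.01, 0.03], [-0.01, 0.02]]
loss, port = msrr_loss(weights, rets)

# Scaling every weight by the same constant changes the loss but not the Sharpe ratio.
loss_scaled, port_scaled = msrr_loss([[10 * w for w in w_t] for w_t in weights], rets)
```

Comparing `sharpe(port)` with `sharpe(port_scaled)` illustrates the scale invariance noted above: the two values agree even though the losses do not.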

Factor Attribution (FF5 + Momentum, 2016-2019)

Portfolio	Annual Alpha	t-stat	Significant?
MSRR SDF	9.11%	5.34	Yes
MSRR L/S	10.83%	4.26	Yes
FFN L/S	5.96%	1.07	No

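The MSRR Transformer generates highly significant alpha that is not explained by the Fama-French five factors plus momentum: the t-statistics are 5.34 for the SDF portfolio and 4.26 for the L/S portfolio. Only 33% of the SDF portfolio return is attributable to these known risk factors. The FFN's alpha is not statistically significant (t = 1.07); its returns are largely explained by existing factors, particularly profitability (RMW).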
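The alpha and t-statistic in such a table typically come from a time-series regression of portfolio returns on an intercept and the factor returns. Below is a minimal sketch in plain Python using classical OLS standard errors and simple ×12 annualization. Both choices are assumptions, since this README does not say which estimator the reports use, and the data in the example is synthetic.

```python
import math


def ols_alpha(returns, factors):
    """Regress portfolio returns on an intercept plus factor returns.

    returns: list of T monthly excess returns.
    factors: list of T rows, each with K factor returns (e.g. FF5 + momentum, K = 6).
    Returns (monthly alpha, t-statistic) with classical OLS standard errors.
    """
    X = [[1.0] + list(row) for row in factors]
    T, P = len(X), len(X[0])
    # Augmented system [X'X | X'y | I]; reducing the left block to I yields
    # the coefficients in column P and (X'X)^-1 in the last P columns.
    A = [
        [sum(X[t][i] * X[t][j] for t in range(T)) for j in range(P)]
        + [sum(X[t][i] * returns[t] for t in range(T))]
        + [1.0 if i == j else 0.0 for j in range(P)]
        for i in range(P)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(P):
        pivot = max(range(col, P), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(P):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[i][P] for i in range(P)]
    inv_00 = A[0][P + 1]  # top-left entry of (X'X)^-1
    resid = [returns[t] - sum(b * x for b, x in zip(beta, X[t])) for t in range(T)]
    sigma2 = sum(e * e for e in resid) / (T - P)
    se_alpha = math.sqrt(sigma2 * inv_00)
    return beta[0], beta[0] / se_alpha


# Synthetic example: 48 months, two made-up factors, true monthly alpha of 0.8%.
months = range(48)
factors = [[0.03 * math.sin(t), 0.02 * math.cos(2 * t)] for t in months]
noise = [0.002 * (-1) ** t for t in months]
rets = [0.008 + 0.5 * f1 - 0.3 * f2 + e for (f1, f2), e in zip(factors, noise)]
alpha, t_stat = ols_alpha(rets, factors)
annual_alpha = 12 * alpha  # simple annualization, an assumption
```

With only 48 monthly observations in a 2016-2019 window, the t-statistic is sensitive to the choice of standard errors, so a production analysis would more likely rely on an established regression library than on this hand-rolled solver.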

Full 19-year results (2001-2019) for the FFN are in MSE_ind_1yr_report.md (average Sharpe +1.07).

Reports

MSE_ind_1yr_report.md — Full replication report for the NN5 FFN (2001-2019)
Transformer_report.md — Cross-sectional Transformer report (2012-2019) with head-to-head comparison
MSRR_Transformer_report.md — MSRR loss experiment (2016-2019), proof-of-concept

Code

train_nn.py — Data pipeline, FFN training, 8-experiment grid configurations
train_transformer.py — Cross-sectional Transformer with MSE loss
train_transformer_msrr.py — Cross-sectional Transformer with MSRR loss (Kelly et al. 2025)

Data (Not Included)

The data files are not included in this repository due to licensing restrictions. You need to obtain them separately:

Stock characteristics (95 signals) and returns — derived from CRSP/Compustat via the 94 signals defined in Green, Hand, and Zhang (2017) and extended in GKX (2020). Available with a WRDS subscription. The processed signal files should be placed in gkx_full/.
Sector mapping (gkx_full/sector_mapping.csv) — SIC 2-digit industry codes per PERMNO, from CRSP.
Welch-Goyal macroeconomic predictors (welch_goyal_2024.xlsx) — publicly available from Amit Goyal's website: http://www.hec.unil.ch/agoyal/

The expected data layout is documented in MSE_ind_1yr_report.md section 10.2.

Requirements

Python 3.10+
PyTorch 2.0+ (CUDA)
numpy, pandas, scipy, openpyxl
NVIDIA GPU with ≥16GB VRAM (tested on RTX 4080 SUPER)

Running

# FFN (NN5), MSE_ind_1yr configuration
python train_nn.py

# Cross-sectional Transformer (MSE loss)
python train_transformer.py

# Cross-sectional Transformer (MSRR loss)
python train_transformer_msrr.py

To change test years, hyperparameters, and other settings, edit Config, TransformerConfig, or MSRRConfig at the top of the corresponding file.

Citation

This work builds on:

Gu, S., Kelly, B., & Xiu, D. (2020). Empirical Asset Pricing via Machine Learning. The Review of Financial Studies, 33(5), 2223-2273. https://doi.org/10.1093/rfs/hhaa009

Kelly, B.T., Kuznetsov, B., Malamud, S., & Xu, T.A. (2025). Artificial Intelligence Asset Pricing Models. NBER Working Paper 33351.

If you use this code, please cite the original papers.

Disclaimer

This repository is for research and educational purposes only. It is not investment advice. Past performance does not guarantee future results. The authors of this repository are not affiliated with Gu, Kelly, or Xiu.
