nicholas-yap/S-P100FinancialAdvisorBot

S&P 100 Financial Advisor Bot

An AI-powered financial advisor that analyses S&P 100 stocks using deep learning, signal processing, and natural language processing to generate data-driven investment recommendations. Built as a dual-service architecture with a Node.js web frontend and a Python ML backend.

Features

  • Temporal Fusion Transformer (TFT): Multi-horizon stock price forecasting with interpretable attention, producing 20-day return predictions and BUY/SELL/HOLD signals
  • Variational Mode Decomposition (VMD): Decomposes price signals into 5 intrinsic mode functions to separate noise from trend
  • 75 Technical Indicators: Covering trend, momentum, volatility, volume, statistical, and VMD-derived features
  • FinBERT Sentiment Analysis: Scores financial headlines from Yahoo Finance and Google News RSS
  • Time-Varying Daily Sentiment: Daily FinBERT scores merged as a time-varying TFT feature (not static per-symbol) to capture day-to-day sentiment shifts
  • Historical Sentiment: Pre-computed FinBERT scores from Kaggle and financial news datasets for lookahead-free backtesting
  • Mean-Variance Optimisation (MVO): Markowitz portfolio construction with Ledoit-Wolf shrinkage (max Sharpe, min volatility, risk parity, efficient risk)
  • Prediction-Accuracy Backtesting: Walk-forward historical simulation with full UI for running, reviewing, and comparing backtest runs
  • SQLite Persistence: Cached predictions, backtest history, and walk-forward simulation results for fast retrieval and run comparison
  • Batch Prediction: Pre-compute predictions for all S&P 100 stocks via CLI script
  • Portfolio Recommender: Automated optimal portfolio construction with 6-month walk-forward simulated rebalancing and Chart.js-powered visualisations
  • Prediction Caching Fast-Path: Single-stock predictions are cached in SQLite for instant retrieval within a configurable freshness window
  • LLM Chat Agent: Ollama-powered conversational interface with ReAct-style tool calling for natural language stock analysis
  • Comprehensive Test Suite: 118 Python (pytest) + 12 Node.js (Jest) unit and integration tests
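The prediction caching fast-path listed above can be pictured as a freshness check against a SQLite table. The following is a minimal sketch only: the table schema, the function names (`open_cache`, `save_prediction`, `get_fresh_prediction`), and the 24-hour default window are assumptions made for illustration and are not taken from the project's `predictions_db.py`.

```python
import sqlite3
import time


def open_cache(path=":memory:"):
    """Open a SQLite database and create a small predictions table (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predictions ("
        "symbol TEXT PRIMARY KEY, signal TEXT, "
        "predicted_return REAL, created_at REAL)"
    )
    return conn


def save_prediction(conn, symbol, signal, predicted_return, now=None):
    """Insert or overwrite the cached prediction for one symbol."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO predictions VALUES (?, ?, ?, ?)",
        (symbol, signal, predicted_return, now),
    )
    conn.commit()


def get_fresh_prediction(conn, symbol, max_age_hours=24, now=None):
    """Return the cached row as a dict if it is inside the freshness window, else None."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT signal, predicted_return, created_at "
        "FROM predictions WHERE symbol = ?",
        (symbol,),
    ).fetchone()
    if row is None or now - row[2] > max_age_hours * 3600:
        return None  # miss or stale entry: the caller would run the full pipeline
    return {"symbol": symbol, "signal": row[0], "predicted_return": row[1]}
```

A caller would try `get_fresh_prediction` first and fall back to the full ML pipeline, followed by `save_prediction`, only when it returns `None`.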

Architecture

┌──────────────────────┐         ┌──────────────────────────────────┐
│   Browser (Web UI)   │         │   Python ML Engine (Port 8000)   │
│                      │         │                                  │
│  • Dashboard         │  HTTP   │  • Data Fetcher (yfinance)       │
│  • AI Chat           │────────▶│  • S&P 100 (Wikipedia scraping)  │
│  • Backtesting       │         │  • VMD Decomposition             │
│  • Portfolio Recom.  │         │  • 75 Technical Indicators       │
└──────────┬───────────┘         │  • FinBERT Sentiment (PoE)       │
           │                     │  • TFT Forecasting               │
┌──────────▼───────────┐         │  • MVO Portfolio Optimiser       │
│  Node.js (Port 3000) │────────▶│  • Prediction Backtest Engine    │
│                      │         │  • Walk-Forward Simulation Cache │
│  • Express API       │         │  • SQLite Persistence            │
│  • S&P 100 Scanner   │         └──────────────────────────────────┘
│  • ML Proxy          │
│  • Ollama LLM Agent  │
│  • Predictions Proxy │
└──────────────────────┘

Prerequisites

  • Node.js ≥ 18
  • Python ≥ 3.10
  • Ollama API Key: Required for the LLM chat features (ollama.com)

Setup

1. Clone the Repository

git clone https://github.com/nicholas-yap/financialAdvisorBot.git
cd financialAdvisorBot

2. Install Node.js Dependencies

npm install

3. Set Up the Python Virtual Environment

python3 -m venv venv
source venv/bin/activate        # macOS / Linux
# venv\Scripts\activate         # Windows
pip install -r ml_engine/requirements.txt

4. Configure Environment Variables

Create a .env file in the project root:

OLLAMA_API_KEY=your_ollama_api_key_here
ML_ENGINE_URL=http://localhost:8000
PORT=3000

| Variable | Default | Description |
| --- | --- | --- |
| OLLAMA_API_KEY | (none) | API key for the Ollama LLM cloud service |
| ML_ENGINE_URL | http://localhost:8000 | URL of the Python ML backend |
| PORT | 3000 | Port for the Node.js Express server |
| TRAINING_END | ~18 months ago | Training data cutoff date (YYYY-MM-DD) |
| VALIDATION_END | ~6 months ago | Validation data cutoff date (YYYY-MM-DD) |

5. (Optional) Download Historical News Dataset

For sentiment-aware backtesting, download a financial news dataset:

source venv/bin/activate
python -m ml_engine.data.download_news_dataset

Or place the Kaggle Massive Stock News Analysis DB CSV at ml_engine/data/kaggle/analyst_ratings_processed.csv, then pre-compute sentiment:

python -m ml_engine.data.precompute_sentiment
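To illustrate what pre-computed daily sentiment might look like once headlines have been scored, here is a rough sketch that averages per-headline scores by calendar day and aligns them to trading days. The function names, the score range of [-1, 1], and the neutral fill value of 0.0 are assumptions for illustration; the actual behaviour of `precompute_sentiment` is not shown in this README.

```python
from collections import defaultdict
from datetime import date


def daily_sentiment(scored_headlines):
    """Average headline scores per calendar day.

    scored_headlines: iterable of (date, score) pairs, with score assumed in [-1, 1].
    """
    buckets = defaultdict(list)
    for day, score in scored_headlines:
        buckets[day].append(score)
    return {day: sum(scores) / len(scores) for day, scores in buckets.items()}


def sentiment_feature(trading_days, daily_scores, neutral=0.0):
    """Align daily scores to trading days; days without headlines get a neutral value."""
    return [daily_scores.get(day, neutral) for day in trading_days]


headlines = [
    (date(2024, 1, 2), 0.5),
    (date(2024, 1, 2), -0.25),
    (date(2024, 1, 4), 1.0),
]
daily = daily_sentiment(headlines)
feature = sentiment_feature(
    [date(2024, 1, 2), date(2024, 1, 3), date(2024, 1, 4)], daily
)
```

Because each trading day only sees headlines dated on that day, a feature built this way does not leak future news into earlier rows, which is the property lookahead-free backtesting needs.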

Running the Application

Start the Python ML Engine

source venv/bin/activate
uvicorn ml_engine.main:app --port 8000 --reload

Start the Node.js Server

In a separate terminal:

node server.js

(Optional) Batch Predict All S&P 100 Stocks

Pre-compute predictions for fast dashboard loading:

source venv/bin/activate
python -m ml_engine.batch_predict

Available arguments:
  --symbols SYMBOLS [SYMBOLS ...]   Specific tickers to predict (default: all S&P 100)
  --retrain                         Force retrain the TFT model even if a cache exists
  --max-epochs MAX_EPOCHS           Max TFT training epochs (default: 15)
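
For readers scripting around the batch job, the three documented flags map naturally onto an `argparse` parser. This is a sketch reconstructed from the help text above, not the project's own parser; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags documented for ml_engine.batch_predict."""
    parser = argparse.ArgumentParser(prog="batch_predict")
    parser.add_argument(
        "--symbols",
        nargs="+",
        default=None,  # None is used here to mean "all S&P 100 symbols"
        help="Specific tickers to predict (default: all S&P 100)",
    )
    parser.add_argument(
        "--retrain",
        action="store_true",
        help="Force retrain the TFT model even if a cache exists",
    )
    parser.add_argument(
        "--max-epochs",
        type=int,
        default=15,
        help="Max TFT training epochs (default: 15)",
    )
    return parser


args = build_parser().parse_args(["--symbols", "AAPL", "MSFT", "--retrain"])
```

With these arguments, `args.symbols` holds the two tickers, `args.retrain` is set, and `args.max_epochs` keeps its documented default.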

Access the Application

| Page | URL |
| --- | --- |
| Dashboard | http://localhost:3000 |
| AI Chat | http://localhost:3000/chat.html |
| Backtesting | http://localhost:3000/backtest.html |
| Portfolio Recommender | http://localhost:3000/portfolio.html |

Testing

The project includes a comprehensive test suite with 130 tests across both Python and Node.js.

Python Tests (pytest: 118 tests)

# Activate the virtual environment first
source venv/bin/activate        # macOS / Linux
# venv\Scripts\activate         # Windows

# Run all Python tests
python -m pytest tests/ -v

# Run a specific test file
python -m pytest tests/test_sentiment.py -v

Test files:

| File | Coverage |
| --- | --- |
| test_config.py | Configuration defaults, env var overrides, device detection |
| test_vmd.py | VMD decomposition correctness, edge cases |
| test_technical.py | Technical indicator computation (75 indicators) |
| test_sentiment.py | FinBERT PoE sentiment scoring, mock inference |
| test_portfolio.py | MVO optimisation strategies, weight constraints |
| test_benchmarks.py | Passive strategy benchmarks (buy & hold, equal weight) |
| test_predictions_db.py | SQLite predictions CRUD, freshness checks |
| test_backtest_db.py | SQLite backtest history persistence |
| test_fetcher.py | yfinance data fetching, cache TTL, SP100 JSON loading |
| test_dataset.py | TFT dataset preparation, feature alignment, time-step indexing |
| test_tft_cache_key.py | TFT model cache key generation, hyperparameter inclusion |
| test_integration_pipeline.py | End-to-end ML pipeline (data → features → TFT → signal) |
| test_integration_fastapi.py | FastAPI endpoint integration tests |

Node.js Tests (Jest: 12 tests)

npm test

Covers S&P 100 JSON helpers and Express route integration (predict, scan, backtest, chat).

API Endpoints

Node.js (Port 3000)

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/analyze | GET | Quick S&P 100 scan using SMA₅₀ trend strategy |
| /api/predict/:symbol | POST | Full ML pipeline prediction for a single stock |
| /api/scan | POST | ML-enhanced multi-stock scan with TFT |
| /api/backtest | POST | Walk-forward prediction accuracy backtest |
| /api/backtest/history | GET | List all past backtest runs |
| /api/backtest/history/:runId | GET | Get a specific backtest run with predictions |
| /api/portfolio/recommend | POST | Portfolio recommendation with rebalancing simulation |
| /api/predictions | GET | All cached predictions (optional ?signal= filter) |
| /api/predictions/status | GET | Prediction database freshness info |
| /api/predictions/:symbol | GET | Single cached prediction by symbol |
| /api/chat | POST | LLM chat agent with tool calling |
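
As a hedged illustration of calling the Node.js endpoints from a script, the sketch below builds (but does not send) requests for `/api/predict/:symbol` and `/api/predictions` using only the Python standard library. The helper names and the empty JSON body for the POST are assumptions; the request and response schemas are not documented here.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://localhost:3000"  # matches PORT=3000 in the .env example


def predict_request(symbol, base_url=BASE_URL):
    """Build a POST request for /api/predict/:symbol (an empty JSON body is assumed)."""
    url = f"{base_url}/api/predict/{quote(symbol, safe='')}"
    return Request(
        url,
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predictions_request(signal=None, base_url=BASE_URL):
    """Build a GET request for /api/predictions with the optional ?signal= filter."""
    url = f"{base_url}/api/predictions"
    if signal:
        url += "?" + urlencode({"signal": signal})
    return Request(url, method="GET")
```

Once both services are running, either request can be sent with `urllib.request.urlopen(req)` and the response body decoded with `json.loads`.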

Python ML Engine (Port 8000)

| Endpoint | Method | Description |
| --- | --- | --- |
| /health | GET | Liveness check |
| /symbols | GET | Current S&P 100 symbol list (used by Node.js layer) |
| /analyze | POST | Multi-stock scan → BUY/SELL/HOLD signals |
| /predict/{symbol} | POST | Single-stock TFT forecast + signal (with DB cache fast-path) |
| /backtest | POST | Walk-forward prediction accuracy backtest |
| /backtest/history | GET | List all past backtest runs |
| /backtest/history/{run_id} | GET | Get a specific backtest run with predictions |
| /backtest/history/{run_id} | DELETE | Delete a backtest run |
| /portfolio/recommend | POST | Portfolio recommendation with walk-forward simulation |
| /predictions | GET | All cached predictions |
| /predictions/status | GET | Database freshness info |
| /predictions/{symbol} | GET | Single cached prediction |

ML Pipeline

The core analysis pipeline proceeds through six sequential stages:

  1. Data Acquisition: Fetches 10 years of OHLCV data per symbol via yfinance with cache-first Parquet storage
  2. Signal Decomposition: VMD with K=5 modes separates noise, swing patterns, and macro trend
  3. Technical Features: 75 indicators across 6 categories (trend, momentum, volatility, volume, statistical, VMD-derived); the first 50 warm-up rows are dropped to avoid indicator warm-up artifacts
  4. Sentiment Analysis: FinBERT scores financial headlines (live RSS or historical dataset), and the scores are merged as daily time-varying features
  5. TFT Forecasting: Temporal Fusion Transformer predicts 20-day returns → BUY/SELL/HOLD signals with confidence levels
  6. Portfolio Optimisation: MVO constructs efficient portfolios using TFT forecasts as expected returns
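
Stage 5 turns a predicted 20-day return into a discrete signal. A minimal sketch of such a mapping is shown below; the ±2% thresholds and the function name `to_signal` are purely illustrative assumptions, and the project's actual cut-offs and confidence calculation are not described in this README.

```python
def to_signal(predicted_return, buy_threshold=0.02, sell_threshold=-0.02):
    """Map a predicted 20-day return (as a fraction) to BUY, SELL, or HOLD.

    The thresholds are hypothetical defaults chosen for illustration only.
    """
    if predicted_return >= buy_threshold:
        return "BUY"
    if predicted_return <= sell_threshold:
        return "SELL"
    return "HOLD"


# Example: classify a handful of hypothetical forecasts.
forecasts = {"AAA": 0.05, "BBB": -0.03, "CCC": 0.004}
signals = {symbol: to_signal(ret) for symbol, ret in forecasts.items()}
```

Keeping the thresholds as parameters makes it easy to widen the HOLD band when forecasts are noisy.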

Project Structure

financialAdvisorBot/
├── server.js                        # Node.js Express server & LLM agent
├── package.json                     # Node.js dependencies
├── .env                             # Environment variables (API keys, ports)
├── public/                          # Frontend web UI
│   ├── index.html                   # Main dashboard
│   ├── chat.html                    # AI chat interface
│   ├── backtest.html                # Backtesting UI
│   ├── portfolio.html               # Portfolio recommender page
│   ├── app.js                       # Dashboard logic
│   ├── chat.js                      # Chat UI logic
│   ├── backtest.js                  # Backtesting UI logic
│   ├── portfolio.js                 # Portfolio recommender logic
│   └── style.css                    # Styling for all pages
├── tests/                           # Test suite
│   ├── conftest.py                  # Shared pytest fixtures
│   ├── server.test.js               # Jest tests for Express routes
│   ├── test_config.py               # Config & env var tests
│   ├── test_vmd.py                  # VMD decomposition tests
│   ├── test_technical.py            # Technical indicator tests
│   ├── test_sentiment.py            # FinBERT sentiment tests
│   ├── test_portfolio.py            # MVO optimisation tests
│   ├── test_benchmarks.py           # Benchmark strategy tests
│   ├── test_predictions_db.py       # Predictions DB tests
│   ├── test_backtest_db.py          # Backtest DB tests
│   ├── test_fetcher.py              # Data fetcher tests
│   ├── test_dataset.py              # TFT dataset preparation tests
│   ├── test_tft_cache_key.py        # TFT cache key generation tests
│   ├── test_integration_pipeline.py # End-to-end ML pipeline tests
│   └── test_integration_fastapi.py  # FastAPI endpoint tests
├── ml_engine/                       # Python ML backend
│   ├── main.py                      # FastAPI application & endpoints
│   ├── config.py                    # Centralised configuration
│   ├── batch_predict.py             # Batch S&P 100 prediction → SQLite
│   ├── requirements.txt             # Python dependencies
│   ├── data/
│   │   ├── fetcher.py               # yfinance data fetcher + Wikipedia S&P 100 scraper
│   │   ├── headlines.py             # Live RSS headline scraper
│   │   ├── historical_headlines.py  # Historical headline loader & sentiment cache
│   │   ├── download_news_dataset.py # Financial news dataset downloader
│   │   ├── precompute_sentiment.py  # Batch FinBERT sentiment pre-computation
│   │   ├── predictions_db.py        # SQLite DB for cached predictions
│   │   ├── backtest_db.py           # SQLite DB for backtest history
│   │   ├── simulation_db.py         # SQLite cache for walk-forward simulation results
│   │   ├── sp100_symbols.json       # Shared S&P 100 symbol list (auto-refreshed)
│   │   └── kaggle/                  # Historical headline datasets
│   ├── signals/
│   │   └── vmd.py                   # Variational Mode Decomposition
│   ├── features/
│   │   ├── technical.py             # 75 technical indicators
│   │   └── sentiment.py             # FinBERT sentiment
│   ├── models/
│   │   ├── tft.py                   # Temporal Fusion Transformer
│   │   ├── dataset.py               # TFT dataset preparation
│   │   └── checkpoints/             # Cached trained models
│   ├── optimizer/
│   │   └── portfolio.py             # Mean-Variance Optimisation
│   ├── backtest/
│   │   ├── engine.py                # Prediction accuracy backtesting
│   │   └── report.py                # Equity curves & heatmaps
│   └── benchmark/
│       └── benchmarks.py            # Passive strategy comparison
├── ARCHITECTURE.md                  # Detailed architecture documentation
└── project-requirements.txt         # Full project requirements specification

Technology Stack

| Component | Technology |
| --- | --- |
| Frontend | HTML / CSS / JavaScript |
| Charting | Chart.js 4.4.0 |
| API Server | Node.js + Express 5 |
| ML Backend | Python + FastAPI + Uvicorn |
| Deep Learning | PyTorch + PyTorch Forecasting (TFT) |
| Training Framework | PyTorch Lightning |
| Signal Processing | vmdpy (VMD) |
| Technical Analysis | ta (75 indicators) |
| Sentiment Analysis | HuggingFace Transformers (FinBERT) |
| Portfolio Optimisation | PyPortfolioOpt (MVO + HRP) |
| Market Data | Yahoo Finance (yfinance) |
| Historical News | Kaggle dataset, FelixDrinkall/financial-news-dataset |
| LLM Integration | Ollama (DeepSeek V3.1) |
| Persistence | SQLite (predictions + backtests) + Parquet (prices + sentiment) |
| Testing | pytest (Python) + Jest + Supertest (Node.js) |
| HTTP Client | Axios |

Configuration

Key parameters are centralised in ml_engine/config.py. Training and validation cutoff dates are computed dynamically relative to today and can be overridden via environment variables.

| Parameter | Default | Description |
| --- | --- | --- |
| LOOKBACK_YEARS | 10 | Historical data window |
| TRAINING_END | ~18 months ago | Training data cutoff (dynamic) |
| VALIDATION_END | ~6 months ago | Validation data cutoff (dynamic) |
| VMD_K | 5 | Number of VMD modes |
| TFT_FORECAST_HORIZON | 20 days | Prediction horizon (~1 month) |
| TFT_LOOKBACK_WINDOW | 30 days | Encoder context window |
| TFT_HIDDEN_SIZE | 32 | Hidden layer neurons |
| TFT_ATTENTION_HEADS | 2 | Multi-head attention |
| TFT_BATCH_SIZE | 2048 | Training batch size |
| TFT_MAX_EPOCHS | 30 | Training epochs (with early stopping) |
| MVO_MAX_WEIGHT | 10% | Maximum allocation per stock |
| BACKTEST_REBALANCE_FREQ | Monthly | Rebalancing / evaluation cadence |
| BACKTEST_INITIAL_CAPITAL | $100,000 | Starting capital |
| PORTFOLIO_CACHE_MAX_AGE_HOURS | 24 | Use cached predictions if fresher than this |
| SIMULATION_MONTHS | 6 | Walk-forward simulation window |
| SIMULATION_CACHE_MAX_AGE_HOURS | 24 | Re-run simulation if cache is older |
| FINBERT_MODEL_NAME | ProsusAI/finbert | Sentiment model |
| DEVICE | Auto-detect | cuda > mps > cpu |
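
The dynamic cutoff dates with environment overrides could be computed along the following lines. This is a sketch under stated assumptions: the helper names are hypothetical, and approximating 18 and 6 months as 548 and 183 days is an illustrative choice rather than the logic in `ml_engine/config.py`.

```python
import os
from datetime import date, timedelta


def _cutoff(env_name, days_back, today=None, environ=None):
    """Return the date from the environment if set, else today minus days_back."""
    environ = os.environ if environ is None else environ
    raw = environ.get(env_name)
    if raw:
        return date.fromisoformat(raw)  # expects YYYY-MM-DD
    today = date.today() if today is None else today
    return today - timedelta(days=days_back)


def training_end(today=None, environ=None):
    """Training cutoff: roughly 18 months back (548 days is an approximation)."""
    return _cutoff("TRAINING_END", 548, today, environ)


def validation_end(today=None, environ=None):
    """Validation cutoff: roughly 6 months back (183 days is an approximation)."""
    return _cutoff("VALIDATION_END", 183, today, environ)
```

Setting `TRAINING_END=2023-06-30` in `.env` would then pin the training cutoff, while leaving it unset lets the date roll forward with the calendar.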
