S&P 100 Financial Advisor Bot

An AI-powered financial advisor that analyses S&P 100 stocks using deep learning, signal processing, and natural language processing to generate data-driven investment recommendations. Built as a dual-service architecture with a Node.js web frontend and a Python ML backend.

Features

Temporal Fusion Transformer (TFT): Multi-horizon stock price forecasting with interpretable attention, producing 20-day return predictions and BUY/SELL/HOLD signals
Variational Mode Decomposition (VMD): Decomposes price signals into 5 intrinsic mode functions to separate noise from trend
75 Technical Indicators: Covering trend, momentum, volatility, volume, statistical, and VMD-derived features
FinBERT Sentiment Analysis: Scores financial headlines from Yahoo Finance and Google News RSS
Time-Varying Daily Sentiment: Daily FinBERT scores merged as a time-varying TFT feature (not static per-symbol) to capture day-to-day sentiment shifts
Historical Sentiment: Pre-computed FinBERT scores from Kaggle and financial news datasets for lookahead-free backtesting
Mean-Variance Optimisation (MVO): Markowitz portfolio construction with Ledoit-Wolf shrinkage (max Sharpe, min volatility, risk parity, efficient risk)
Prediction-Accuracy Backtesting: Walk-forward historical simulation with full UI for running, reviewing, and comparing backtest runs
SQLite Persistence: Cached predictions, backtest history, and walk-forward simulation results for fast retrieval and run comparison
Batch Prediction: Pre-compute predictions for all S&P 100 stocks via CLI script
Portfolio Recommender: Automated optimal portfolio construction with 6-month walk-forward simulated rebalancing and Chart.js-powered visualisations
Prediction Caching Fast-Path: Single-stock predictions are cached in SQLite for instant retrieval within a configurable freshness window
LLM Chat Agent: Ollama-powered conversational interface with ReAct-style tool calling for natural language stock analysis
Comprehensive Test Suite: 118 Python (pytest) + 12 Node.js (Jest) unit and integration tests
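
The prediction caching fast-path listed above can be pictured with a minimal sketch. It assumes a hypothetical `predictions` table and hypothetical helper names (`save_prediction`, `get_fresh_prediction`); the project's actual schema and functions in `ml_engine/data/predictions_db.py` may differ.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical schema: the project's real table layout is not shown in this README.
SCHEMA = """
CREATE TABLE IF NOT EXISTS predictions (
    symbol      TEXT PRIMARY KEY,
    signal      TEXT NOT NULL,
    pred_return REAL NOT NULL,
    created_at  TEXT NOT NULL  -- ISO-8601 UTC timestamp
)
"""


def save_prediction(conn, symbol, signal, pred_return, now):
    """Insert or overwrite the cached prediction for one symbol."""
    conn.execute(
        "INSERT OR REPLACE INTO predictions VALUES (?, ?, ?, ?)",
        (symbol, signal, pred_return, now.isoformat()),
    )


def get_fresh_prediction(conn, symbol, max_age_hours, now):
    """Return the cached row if it is newer than max_age_hours, else None."""
    row = conn.execute(
        "SELECT signal, pred_return, created_at FROM predictions WHERE symbol = ?",
        (symbol,),
    ).fetchone()
    if row is None:
        return None  # never predicted: caller runs the full pipeline
    created = datetime.fromisoformat(row[2])
    if now - created > timedelta(hours=max_age_hours):
        return None  # stale: caller re-runs the full pipeline
    return {"symbol": symbol, "signal": row[0], "pred_return": row[1]}


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
t0 = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
save_prediction(conn, "AAPL", "BUY", 0.031, t0)  # illustrative values only
```

A caller would try `get_fresh_prediction` first and fall back to the full ML pipeline only when it returns `None`.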

Architecture

┌──────────────────────┐         ┌──────────────────────────────────┐
│   Browser (Web UI)   │         │   Python ML Engine (Port 8000)   │
│                      │         │                                  │
│  • Dashboard         │  HTTP   │  • Data Fetcher (yfinance)       │
│  • AI Chat           │────────▶│  • S&P 100 (Wikipedia scraping)  │
│  • Backtesting       │         │  • VMD Decomposition             │
│  • Portfolio Recom.  │         │  • 75 Technical Indicators       │
└──────────┬───────────┘         │  • FinBERT Sentiment (PoE)       │
           │                     │  • TFT Forecasting               │
┌──────────▼───────────┐         │  • MVO Portfolio Optimiser       │
│  Node.js (Port 3000) │────────▶│  • Prediction Backtest Engine    │
│                      │         │  • Walk-Forward Simulation Cache │
│  • Express API       │         │  • SQLite Persistence            │
│  • S&P 100 Scanner   │         └──────────────────────────────────┘
│  • ML Proxy          │
│  • Ollama LLM Agent  │
│  • Predictions Proxy │
└──────────────────────┘

Prerequisites

Node.js ≥ 18
Python ≥ 3.10
Ollama API Key: Required for the LLM chat features (ollama.com)

Setup

1. Clone the Repository

git clone https://github.com/nicholas-yap/financialAdvisorBot.git
cd financialAdvisorBot

2. Install Node.js Dependencies

npm install

3. Set Up the Python Virtual Environment

python3 -m venv venv
source venv/bin/activate        # macOS / Linux
# venv\Scripts\activate         # Windows
pip install -r ml_engine/requirements.txt

4. Configure Environment Variables

Create a .env file in the project root:

OLLAMA_API_KEY=your_ollama_api_key_here
ML_ENGINE_URL=http://localhost:8000
PORT=3000

Variable	Default	Description
`OLLAMA_API_KEY`	(none)	API key for the Ollama LLM cloud service
`ML_ENGINE_URL`	`http://localhost:8000`	URL of the Python ML backend
`PORT`	`3000`	Port for the Node.js Express server
`TRAINING_END`	~18 months ago	Training data cutoff date (YYYY-MM-DD)
`VALIDATION_END`	~6 months ago	Validation data cutoff date (YYYY-MM-DD)
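
As a rough illustration of how the rolling `TRAINING_END` and `VALIDATION_END` defaults could be resolved, the sketch below steps back 18 and 6 calendar months from today unless the environment supplies an explicit date. The helper names `months_ago` and `resolve_cutoff` are hypothetical, and the day-of-month clamp is an assumption; `ml_engine/config.py` may compute the dates differently.

```python
import os
from datetime import date


def months_ago(today: date, months: int) -> date:
    """Step back a whole number of calendar months, clamping the day to 28."""
    total = today.year * 12 + (today.month - 1) - months
    year, month0 = divmod(total, 12)
    return date(year, month0 + 1, min(today.day, 28))


def resolve_cutoff(env_name: str, default_months: int, today: date, env=os.environ) -> date:
    """Use the YYYY-MM-DD value from the environment if set, else a rolling default."""
    raw = env.get(env_name)
    if raw:
        return date.fromisoformat(raw)
    return months_ago(today, default_months)


today = date(2025, 3, 31)  # fixed date so the example is reproducible
training_end = resolve_cutoff("TRAINING_END", 18, today, env={})
validation_end = resolve_cutoff("VALIDATION_END", 6, today, env={})
```

Passing `env` explicitly keeps the function easy to test; in normal use the default `os.environ` would pick up values loaded from `.env`.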

5. (Optional) Download Historical News Dataset

For sentiment-aware backtesting, download a financial news dataset:

source venv/bin/activate
python -m ml_engine.data.download_news_dataset

Or place the Kaggle Massive Stock News Analysis DB CSV at ml_engine/data/kaggle/analyst_ratings_processed.csv, then pre-compute sentiment:

python -m ml_engine.data.precompute_sentiment

Running the Application

Start the Python ML Engine

source venv/bin/activate
uvicorn ml_engine.main:app --port 8000 --reload

Start the Node.js Server

In a separate terminal:

node server.js

(Optional) Batch Predict All S&P 100 Stocks

Pre-compute predictions for fast dashboard loading:

source venv/bin/activate
python -m ml_engine.batch_predict

Available arguments:
  --symbols SYMBOLS [SYMBOLS ...]   Specific tickers to predict (default: all S&P 100)
  --retrain                         Force retrain the TFT model even if a cache exists
  --max-epochs MAX_EPOCHS           Max TFT training epochs (default: 15)

Access the Application

Page	URL
Dashboard	http://localhost:3000
AI Chat	http://localhost:3000/chat.html
Backtesting	http://localhost:3000/backtest.html
Portfolio Recommender	http://localhost:3000/portfolio.html

Testing

The project includes a comprehensive test suite with 130 tests across both Python and Node.js.

Python Tests (pytest: 118 tests)

# Activate the virtual environment first
source venv/bin/activate        # macOS / Linux
# venv\Scripts\activate         # Windows

# Run all Python tests
python -m pytest tests/ -v

# Run a specific test file
python -m pytest tests/test_sentiment.py -v

Test files:

File	Coverage
`test_config.py`	Configuration defaults, env var overrides, device detection
`test_vmd.py`	VMD decomposition correctness, edge cases
`test_technical.py`	Technical indicator computation (75 indicators)
`test_sentiment.py`	FinBERT PoE sentiment scoring, mock inference
`test_portfolio.py`	MVO optimisation strategies, weight constraints
`test_benchmarks.py`	Passive strategy benchmarks (buy & hold, equal weight)
`test_predictions_db.py`	SQLite predictions CRUD, freshness checks
`test_backtest_db.py`	SQLite backtest history persistence
`test_fetcher.py`	yfinance data fetching, cache TTL, SP100 JSON loading
`test_dataset.py`	TFT dataset preparation, feature alignment, time-step indexing
`test_tft_cache_key.py`	TFT model cache key generation, hyperparameter inclusion
`test_integration_pipeline.py`	End-to-end ML pipeline (data → features → TFT → signal)
`test_integration_fastapi.py`	FastAPI endpoint integration tests

Node.js Tests (Jest: 12 tests)

npm test

Covers S&P 100 JSON helpers and Express route integration (predict, scan, backtest, chat).

API Endpoints

Node.js (Port 3000)

Endpoint	Method	Description
`/api/analyze`	GET	Quick S&P 100 scan using SMA₅₀ trend strategy
`/api/predict/:symbol`	POST	Full ML pipeline prediction for a single stock
`/api/scan`	POST	ML-enhanced multi-stock scan with TFT
`/api/backtest`	POST	Walk-forward prediction accuracy backtest
`/api/backtest/history`	GET	List all past backtest runs
`/api/backtest/history/:runId`	GET	Get a specific backtest run with predictions
`/api/portfolio/recommend`	POST	Portfolio recommendation with rebalancing simulation
`/api/predictions`	GET	All cached predictions (optional `?signal=` filter)
`/api/predictions/status`	GET	Prediction database freshness info
`/api/predictions/:symbol`	GET	Single cached prediction by symbol
`/api/chat`	POST	LLM chat agent with tool calling

Python ML Engine (Port 8000)

Endpoint	Method	Description
`/health`	GET	Liveness check
`/symbols`	GET	Current S&P 100 symbol list (used by Node.js layer)
`/analyze`	POST	Multi-stock scan → BUY/SELL/HOLD signals
`/predict/{symbol}`	POST	Single-stock TFT forecast + signal (with DB cache fast-path)
`/backtest`	POST	Walk-forward prediction accuracy backtest
`/backtest/history`	GET	List all past backtest runs
`/backtest/history/{run_id}`	GET	Get a specific backtest run with predictions
`/backtest/history/{run_id}`	DELETE	Delete a backtest run
`/portfolio/recommend`	POST	Portfolio recommendation with walk-forward simulation
`/predictions`	GET	All cached predictions
`/predictions/status`	GET	Database freshness info
`/predictions/{symbol}`	GET	Single cached prediction
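
For readers who want to call the ML engine directly, here is a small sketch of building a request for `/predict/{symbol}` with the Python standard library. The request body schema is not documented in this README, so an empty JSON object is assumed, and `build_predict_request` is a hypothetical helper name.

```python
import json
from urllib.parse import quote
from urllib.request import Request

ML_ENGINE_URL = "http://localhost:8000"  # default from the environment variable table


def build_predict_request(symbol: str, base_url: str = ML_ENGINE_URL) -> Request:
    """Build (but do not send) a POST request for /predict/{symbol}."""
    url = f"{base_url.rstrip('/')}/predict/{quote(symbol.upper(), safe='')}"
    # Assumption: an empty JSON object is an acceptable request body.
    body = json.dumps({}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


req = build_predict_request("aapl")

# To send it against a running engine:
#   from urllib.request import urlopen
#   with urlopen(req, timeout=300) as resp:
#       result = json.load(resp)
```

The generous timeout in the commented usage reflects that an uncached prediction may trigger model training; the right value depends on your hardware.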

ML Pipeline

The core analysis pipeline proceeds through six sequential stages:

Data Acquisition: Fetches 10 years of OHLCV data per symbol via yfinance with cache-first Parquet storage
Signal Decomposition: VMD with K=5 modes separates noise, swing patterns, and macro trend
Technical Features: 75 indicators across 6 categories (trend, momentum, volatility, volume, statistical, VMD-derived), with the first 50 warm-up rows dropped so that values computed from incomplete rolling windows do not reach the model
Sentiment Analysis: FinBERT scores financial headlines (live RSS or historical dataset), and the scores are merged as daily time-varying features
TFT Forecasting: Temporal Fusion Transformer predicts 20-day returns → BUY/SELL/HOLD signals with confidence levels
Portfolio Optimisation: MVO constructs efficient portfolios using TFT forecasts as expected returns
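
The step from a 20-day return forecast to a BUY/SELL/HOLD signal can be sketched as a simple threshold rule. The ±2% cut-offs and the function name `to_signal` are assumptions made for illustration; this README does not state the thresholds or the confidence calculation the project actually uses.

```python
def to_signal(predicted_return: float, buy_above: float = 0.02, sell_below: float = -0.02) -> str:
    """Map a predicted 20-day return to a trading signal using fixed cut-offs."""
    if predicted_return >= buy_above:
        return "BUY"
    if predicted_return <= sell_below:
        return "SELL"
    return "HOLD"


# Illustrative forecast values only, not model output.
forecasts = {"AAPL": 0.034, "XOM": -0.051, "KO": 0.004}
signals = {symbol: to_signal(r) for symbol, r in forecasts.items()}
```

A symmetric dead band around zero keeps small, noisy forecasts from generating trades; widening it yields more HOLD signals.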

Project Structure

financialAdvisorBot/
├── server.js                        # Node.js Express server & LLM agent
├── package.json                     # Node.js dependencies
├── .env                             # Environment variables (API keys, ports)
├── public/                          # Frontend web UI
│   ├── index.html                   # Main dashboard
│   ├── chat.html                    # AI chat interface
│   ├── backtest.html                # Backtesting UI
│   ├── portfolio.html               # Portfolio recommender page
│   ├── app.js                       # Dashboard logic
│   ├── chat.js                      # Chat UI logic
│   ├── backtest.js                  # Backtesting UI logic
│   ├── portfolio.js                 # Portfolio recommender logic
│   └── style.css                    # Styling for all pages
├── tests/                           # Test suite
│   ├── conftest.py                  # Shared pytest fixtures
│   ├── server.test.js               # Jest tests for Express routes
│   ├── test_config.py               # Config & env var tests
│   ├── test_vmd.py                  # VMD decomposition tests
│   ├── test_technical.py            # Technical indicator tests
│   ├── test_sentiment.py            # FinBERT sentiment tests
│   ├── test_portfolio.py            # MVO optimisation tests
│   ├── test_benchmarks.py           # Benchmark strategy tests
│   ├── test_predictions_db.py       # Predictions DB tests
│   ├── test_backtest_db.py          # Backtest DB tests
│   ├── test_fetcher.py              # Data fetcher tests
│   ├── test_dataset.py              # TFT dataset preparation tests
│   ├── test_tft_cache_key.py        # TFT cache key generation tests
│   ├── test_integration_pipeline.py # End-to-end ML pipeline tests
│   └── test_integration_fastapi.py  # FastAPI endpoint tests
├── ml_engine/                       # Python ML backend
│   ├── main.py                      # FastAPI application & endpoints
│   ├── config.py                    # Centralised configuration
│   ├── batch_predict.py             # Batch S&P 100 prediction → SQLite
│   ├── requirements.txt             # Python dependencies
│   ├── data/
│   │   ├── fetcher.py               # yfinance data fetcher + Wikipedia S&P 100 scraper
│   │   ├── headlines.py             # Live RSS headline scraper
│   │   ├── historical_headlines.py  # Historical headline loader & sentiment cache
│   │   ├── download_news_dataset.py # Financial news dataset downloader
│   │   ├── precompute_sentiment.py  # Batch FinBERT sentiment pre-computation
│   │   ├── predictions_db.py        # SQLite DB for cached predictions
│   │   ├── backtest_db.py           # SQLite DB for backtest history
│   │   ├── simulation_db.py         # SQLite cache for walk-forward simulation results
│   │   ├── sp100_symbols.json       # Shared S&P 100 symbol list (auto-refreshed)
│   │   └── kaggle/                  # Historical headline datasets
│   ├── signals/
│   │   └── vmd.py                   # Variational Mode Decomposition
│   ├── features/
│   │   ├── technical.py             # 75 technical indicators
│   │   └── sentiment.py             # FinBERT sentiment
│   ├── models/
│   │   ├── tft.py                   # Temporal Fusion Transformer
│   │   ├── dataset.py               # TFT dataset preparation
│   │   └── checkpoints/             # Cached trained models
│   ├── optimizer/
│   │   └── portfolio.py             # Mean-Variance Optimisation
│   ├── backtest/
│   │   ├── engine.py                # Prediction accuracy backtesting
│   │   └── report.py                # Equity curves & heatmaps
│   └── benchmark/
│       └── benchmarks.py            # Passive strategy comparison
├── ARCHITECTURE.md                  # Detailed architecture documentation
└── project-requirements.txt         # Full project requirements specification

Technology Stack

Component	Technology
Frontend	HTML / CSS / JavaScript
Charting	Chart.js 4.4.0
API Server	Node.js + Express 5
ML Backend	Python + FastAPI + Uvicorn
Deep Learning	PyTorch + PyTorch Forecasting (TFT)
Training Framework	PyTorch Lightning
Signal Processing	vmdpy (VMD)
Technical Analysis	ta (75 indicators)
Sentiment Analysis	HuggingFace Transformers (FinBERT)
Portfolio Optimisation	PyPortfolioOpt (MVO + HRP)
Market Data	Yahoo Finance (yfinance)
Historical News	Kaggle dataset, FelixDrinkall/financial-news-dataset
LLM Integration	Ollama (DeepSeek V3.1)
Persistence	SQLite (predictions + backtests) + Parquet (prices + sentiment)
Testing	pytest (Python) + Jest + Supertest (Node.js)
HTTP Client	Axios

Configuration

Key parameters are centralised in ml_engine/config.py. Training and validation cutoff dates are computed dynamically relative to today and can be overridden via environment variables.

Parameter	Default	Description
`LOOKBACK_YEARS`	10	Historical data window
`TRAINING_END`	~18 months ago	Training data cutoff (dynamic)
`VALIDATION_END`	~6 months ago	Validation data cutoff (dynamic)
`VMD_K`	5	Number of VMD modes
`TFT_FORECAST_HORIZON`	20 days	Prediction horizon (~1 month)
`TFT_LOOKBACK_WINDOW`	30 days	Encoder context window
`TFT_HIDDEN_SIZE`	32	Hidden layer neurons
`TFT_ATTENTION_HEADS`	2	Number of attention heads
`TFT_BATCH_SIZE`	2048	Training batch size
`TFT_MAX_EPOCHS`	30	Training epochs (with early stopping)
`MVO_MAX_WEIGHT`	10%	Maximum allocation per stock
`BACKTEST_REBALANCE_FREQ`	Monthly	Rebalancing / evaluation cadence
`BACKTEST_INITIAL_CAPITAL`	$100,000	Starting capital
`PORTFOLIO_CACHE_MAX_AGE_HOURS`	24	Use cached predictions if fresher than this
`SIMULATION_MONTHS`	6	Walk-forward simulation window
`SIMULATION_CACHE_MAX_AGE_HOURS`	24	Re-run simulation if cache is older
`FINBERT_MODEL_NAME`	ProsusAI/finbert	Sentiment model
`DEVICE`	Auto-detect	cuda > mps > cpu
