An AI-powered financial advisor that analyses S&P 100 stocks using deep learning, signal processing, and natural language processing to generate data-driven investment recommendations. Built as a dual-service architecture with a Node.js web frontend and a Python ML backend.
- Temporal Fusion Transformer (TFT): Multi-horizon stock price forecasting with interpretable attention, producing 20-day return predictions and BUY/SELL/HOLD signals
- Variational Mode Decomposition (VMD): Decomposes price signals into 5 intrinsic mode functions to separate noise from trend
- 75 Technical Indicators: Covering trend, momentum, volatility, volume, statistical, and VMD-derived features
- FinBERT Sentiment Analysis: Scores financial headlines from Yahoo Finance and Google News RSS
- Time-Varying Daily Sentiment: Daily FinBERT scores merged as a time-varying TFT feature (not static per-symbol) to capture day-to-day sentiment shifts
- Historical Sentiment: Pre-computed FinBERT scores from Kaggle and financial news datasets for lookahead-free backtesting
- Mean-Variance Optimisation (MVO): Markowitz portfolio construction with Ledoit-Wolf shrinkage (max Sharpe, min volatility, risk parity, efficient risk)
- Prediction-Accuracy Backtesting: Walk-forward historical simulation with full UI for running, reviewing, and comparing backtest runs
- SQLite Persistence: Cached predictions, backtest history, and walk-forward simulation results for fast retrieval and run comparison
- Batch Prediction: Pre-compute predictions for all S&P 100 stocks via CLI script
- Portfolio Recommender: Automated optimal portfolio construction with 6-month walk-forward simulated rebalancing and Chart.js-powered visualisations
- Prediction Caching Fast-Path: Single-stock predictions are cached in SQLite for instant retrieval within a configurable freshness window
- LLM Chat Agent: Ollama-powered conversational interface with ReAct-style tool calling for natural language stock analysis
- Comprehensive Test Suite: 118 Python (pytest) + 12 Node.js (Jest) unit and integration tests
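
The prediction caching fast-path listed above can be pictured as a freshness check against SQLite. The following is a minimal sketch, not the project's actual code: the `predictions` table layout, the column names, and the `get_fresh_prediction` helper are hypothetical, and the 24-hour window is borrowed from the `PORTFOLIO_CACHE_MAX_AGE_HOURS` default on the assumption that the single-stock window behaves similarly.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

MAX_AGE_HOURS = 24  # assumed freshness window for this sketch


def get_fresh_prediction(conn, symbol, now=None, max_age_hours=MAX_AGE_HOURS):
    """Return the cached row for `symbol` if it is fresh enough, else None."""
    now = now or datetime.now(timezone.utc)
    row = conn.execute(
        "SELECT symbol, signal, predicted_return, created_at "
        "FROM predictions WHERE symbol = ?",
        (symbol,),
    ).fetchone()
    if row is None:
        return None  # never predicted: caller runs the full pipeline
    created_at = datetime.fromisoformat(row[3])
    if now - created_at > timedelta(hours=max_age_hours):
        return None  # stale: caller runs the full pipeline
    return {"symbol": row[0], "signal": row[1], "predicted_return": row[2]}


# Demo with an in-memory database and a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE predictions ("
    "symbol TEXT PRIMARY KEY, signal TEXT, predicted_return REAL, created_at TEXT)"
)
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
conn.execute(
    "INSERT INTO predictions VALUES (?, ?, ?, ?)",
    ("AAPL", "BUY", 0.031, (now - timedelta(hours=3)).isoformat()),
)
conn.execute(
    "INSERT INTO predictions VALUES (?, ?, ?, ?)",
    ("MSFT", "HOLD", 0.004, (now - timedelta(hours=30)).isoformat()),
)

fresh = get_fresh_prediction(conn, "AAPL", now=now)    # 3 hours old: served from cache
stale = get_fresh_prediction(conn, "MSFT", now=now)    # 30 hours old: treated as a miss
missing = get_fresh_prediction(conn, "TSLA", now=now)  # not in the table
```

Returning `None` for both stale and missing rows keeps the caller's logic to a single branch: anything that is not a fresh hit falls through to the full ML pipeline.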
┌──────────────────────┐         ┌──────────────────────────────────┐
│   Browser (Web UI)   │         │  Python ML Engine (Port 8000)    │
│                      │         │                                  │
│  • Dashboard         │  HTTP   │  • Data Fetcher (yfinance)       │
│  • AI Chat           │────────▶│  • S&P 100 (Wikipedia scraping)  │
│  • Backtesting       │         │  • VMD Decomposition             │
│  • Portfolio Recom.  │         │  • 75 Technical Indicators       │
└──────────┬───────────┘         │  • FinBERT Sentiment (PoE)       │
           │                     │  • TFT Forecasting               │
┌──────────▼───────────┐         │  • MVO Portfolio Optimiser       │
│  Node.js (Port 3000) │────────▶│  • Prediction Backtest Engine    │
│                      │         │  • Walk-Forward Simulation Cache │
│  • Express API       │         │  • SQLite Persistence            │
│  • S&P 100 Scanner   │         └──────────────────────────────────┘
│  • ML Proxy          │
│  • Ollama LLM Agent  │
│  • Predictions Proxy │
└──────────────────────┘
- Node.js ≥ 18
- Python ≥ 3.10
- Ollama API Key: Required for the LLM chat features (ollama.com)
git clone https://github.com/nicholas-yap/financialAdvisorBot.git
cd financialAdvisorBot
npm install
python3 -m venv venv
source venv/bin/activate # macOS / Linux
# venv\Scripts\activate # Windows
pip install -r ml_engine/requirements.txt

Create a .env file in the project root:
OLLAMA_API_KEY=your_ollama_api_key_here
ML_ENGINE_URL=http://localhost:8000
PORT=3000

| Variable | Default | Description |
|---|---|---|
| OLLAMA_API_KEY | (none) | API key for the Ollama LLM cloud service |
| ML_ENGINE_URL | http://localhost:8000 | URL of the Python ML backend |
| PORT | 3000 | Port for the Node.js Express server |
| TRAINING_END | ~18 months ago | Training data cutoff date (YYYY-MM-DD) |
| VALIDATION_END | ~6 months ago | Validation data cutoff date (YYYY-MM-DD) |
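
The dynamic defaults for `TRAINING_END` and `VALIDATION_END` can be illustrated with a short sketch. It assumes "~18 months ago" and "~6 months ago" mean calendar months counted back from today; how ml_engine/config.py actually computes them is not shown here, and the helper names `months_ago` and `resolve_cutoff` are hypothetical.

```python
import calendar
import os
from datetime import date


def months_ago(today, months):
    """Return the date `months` calendar months before `today`, clamping the day."""
    total = today.year * 12 + (today.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Clamp to the last valid day of the target month (e.g. 31 Mar -> 29 Feb).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(today.day, last_day))


def resolve_cutoff(env_name, default_months_back, today=None, env=os.environ):
    """Use the YYYY-MM-DD override from the environment if set, else a relative default."""
    today = today or date.today()
    raw = env.get(env_name)
    if raw:
        return date.fromisoformat(raw)
    return months_ago(today, default_months_back)


# Fixed "today" and explicit env dicts so the example is reproducible.
today = date(2025, 3, 31)
training_end = resolve_cutoff("TRAINING_END", 18, today, env={})
validation_end = resolve_cutoff(
    "VALIDATION_END", 6, today, env={"VALIDATION_END": "2024-10-01"}
)
print(training_end, validation_end)
```

With no override, `TRAINING_END` falls 18 months before the fixed date; with `VALIDATION_END=2024-10-01` set, the override wins regardless of today's date.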
For sentiment-aware backtesting, download a financial news dataset:
source venv/bin/activate
python -m ml_engine.data.download_news_dataset

Or place the Kaggle Massive Stock News Analysis DB CSV at ml_engine/data/kaggle/analyst_ratings_processed.csv, then pre-compute sentiment:

python -m ml_engine.data.precompute_sentiment

source venv/bin/activate
uvicorn ml_engine.main:app --port 8000 --reload

In a separate terminal:

node server.js

Pre-compute predictions for fast dashboard loading:
source venv/bin/activate
python -m ml_engine.batch_predict
Available arguments:
  --symbols SYMBOLS [SYMBOLS ...]  Specific tickers to predict (default: all S&P 100)
  --retrain                        Force retrain the TFT model even if a cache exists
  --max-epochs MAX_EPOCHS          Max TFT training epochs (default: 15)

| Page | URL |
|---|---|
| Dashboard | http://localhost:3000 |
| AI Chat | http://localhost:3000/chat.html |
| Backtesting | http://localhost:3000/backtest.html |
| Portfolio Recommender | http://localhost:3000/portfolio.html |
The project includes a comprehensive test suite with 130 tests across both Python and Node.js.
# Activate the virtual environment first
source venv/bin/activate # macOS / Linux
# venv\Scripts\activate # Windows
# Run all Python tests
python -m pytest tests/ -v
# Run a specific test file
python -m pytest tests/test_sentiment.py -v

Test files:

| File | Coverage |
|---|---|
| test_config.py | Configuration defaults, env var overrides, device detection |
| test_vmd.py | VMD decomposition correctness, edge cases |
| test_technical.py | Technical indicator computation (75 indicators) |
| test_sentiment.py | FinBERT PoE sentiment scoring, mock inference |
| test_portfolio.py | MVO optimisation strategies, weight constraints |
| test_benchmarks.py | Passive strategy benchmarks (buy & hold, equal weight) |
| test_predictions_db.py | SQLite predictions CRUD, freshness checks |
| test_backtest_db.py | SQLite backtest history persistence |
| test_fetcher.py | yfinance data fetching, cache TTL, SP100 JSON loading |
| test_dataset.py | TFT dataset preparation, feature alignment, time-step indexing |
| test_tft_cache_key.py | TFT model cache key generation, hyperparameter inclusion |
| test_integration_pipeline.py | End-to-end ML pipeline (data → features → TFT → signal) |
| test_integration_fastapi.py | FastAPI endpoint integration tests |
npm test

Covers S&P 100 JSON helpers and Express route integration (predict, scan, backtest, chat).
| Endpoint | Method | Description |
|---|---|---|
| /api/analyze | GET | Quick S&P 100 scan using SMA₅₀ trend strategy |
| /api/predict/:symbol | POST | Full ML pipeline prediction for a single stock |
| /api/scan | POST | ML-enhanced multi-stock scan with TFT |
| /api/backtest | POST | Walk-forward prediction accuracy backtest |
| /api/backtest/history | GET | List all past backtest runs |
| /api/backtest/history/:runId | GET | Get a specific backtest run with predictions |
| /api/portfolio/recommend | POST | Portfolio recommendation with rebalancing simulation |
| /api/predictions | GET | All cached predictions (optional ?signal= filter) |
| /api/predictions/status | GET | Prediction database freshness info |
| /api/predictions/:symbol | GET | Single cached prediction by symbol |
| /api/chat | POST | LLM chat agent with tool calling |

| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Liveness check |
| /symbols | GET | Current S&P 100 symbol list (used by Node.js layer) |
| /analyze | POST | Multi-stock scan → BUY/SELL/HOLD signals |
| /predict/{symbol} | POST | Single-stock TFT forecast + signal (with DB cache fast-path) |
| /backtest | POST | Walk-forward prediction accuracy backtest |
| /backtest/history | GET | List all past backtest runs |
| /backtest/history/{run_id} | GET | Get a specific backtest run with predictions |
| /backtest/history/{run_id} | DELETE | Delete a backtest run |
| /portfolio/recommend | POST | Portfolio recommendation with walk-forward simulation |
| /predictions | GET | All cached predictions |
| /predictions/status | GET | Database freshness info |
| /predictions/{symbol} | GET | Single cached prediction |
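
As an illustration of calling the ML engine directly, here is a minimal client sketch for `POST /predict/{symbol}` using only the Python standard library. The request and response body shapes are not documented in this README, so the sketch assumes the endpoint accepts an empty body and returns JSON; the helper names `build_predict_request` and `predict` are hypothetical.

```python
import json
import urllib.request

ML_ENGINE_URL = "http://localhost:8000"  # default from the .env example


def build_predict_request(symbol, base_url=ML_ENGINE_URL):
    """Build (but do not send) a POST request for a single-stock prediction."""
    url = f"{base_url.rstrip('/')}/predict/{symbol.upper()}"
    return urllib.request.Request(
        url, method="POST", headers={"Accept": "application/json"}
    )


def predict(symbol, base_url=ML_ENGINE_URL, timeout=300):
    """Send the request and return the parsed JSON body (shape is an assumption)."""
    request = build_predict_request(symbol, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no running server, so it is safe to inspect offline.
request = build_predict_request("aapl")
print(request.get_method(), request.full_url)
```

With the ML engine running on port 8000, `predict("AAPL")` would send the request. The generous timeout is a guess that allows for a cold prediction that has to train or load the TFT model rather than hit the cache.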
The core analysis pipeline proceeds through six sequential stages:
- Data Acquisition: Fetches 10 years of OHLCV data per symbol via yfinance with cache-first Parquet storage
- Signal Decomposition: VMD with K=5 modes separates noise, swing patterns, and macro trend
- Technical Features: 75 indicators across 6 categories (trend, momentum, volatility, volume, statistical, VMD-derived), with the first 50 warm-up rows dropped to avoid warm-up artifacts from incomplete indicator windows
- Sentiment Analysis: FinBERT scores financial headlines (live RSS or historical dataset), merged as daily time-varying features
- TFT Forecasting: Temporal Fusion Transformer predicts 20-day returns → BUY/SELL/HOLD signals with confidence levels
- Portfolio Optimisation: MVO constructs efficient portfolios using TFT forecasts as expected returns
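
The warm-up handling in the Technical Features stage can be sketched with a single rolling indicator. This is an illustrative example on synthetic prices, not the project's feature code: a 50-day simple moving average is undefined until 50 observations exist, so dropping the first 50 rows leaves only rows whose indicator value is fully populated. The `sma` helper is hypothetical.

```python
WARMUP_ROWS = 50  # number of leading rows dropped, per the pipeline description


def sma(values, window):
    """Simple moving average; None until a full window of data is available."""
    out = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # slide the window forward
        out.append(running / window if i >= window - 1 else None)
    return out


closes = [100.0 + i for i in range(120)]  # synthetic closing prices
sma50 = sma(closes, 50)

rows = list(zip(closes, sma50))
clean = rows[WARMUP_ROWS:]  # drop the warm-up rows before model training

print(len(rows), len(clean))
print(all(value is not None for _, value in clean))
```

Dropping a fixed number of rows is simpler than filtering on missing values per column, and it keeps every feature aligned to the same first date.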
financialAdvisorBot/
├── server.js # Node.js Express server & LLM agent
├── package.json # Node.js dependencies
├── .env # Environment variables (API keys, ports)
├── public/ # Frontend web UI
│ ├── index.html # Main dashboard
│ ├── chat.html # AI chat interface
│ ├── backtest.html # Backtesting UI
│ ├── portfolio.html # Portfolio recommender page
│ ├── app.js # Dashboard logic
│ ├── chat.js # Chat UI logic
│ ├── backtest.js # Backtesting UI logic
│ ├── portfolio.js # Portfolio recommender logic
│ └── style.css # Styling for all pages
├── tests/ # Test suite
│ ├── conftest.py # Shared pytest fixtures
│ ├── server.test.js # Jest tests for Express routes
│ ├── test_config.py # Config & env var tests
│ ├── test_vmd.py # VMD decomposition tests
│ ├── test_technical.py # Technical indicator tests
│ ├── test_sentiment.py # FinBERT sentiment tests
│ ├── test_portfolio.py # MVO optimisation tests
│ ├── test_benchmarks.py # Benchmark strategy tests
│ ├── test_predictions_db.py # Predictions DB tests
│ ├── test_backtest_db.py # Backtest DB tests
│ ├── test_fetcher.py # Data fetcher tests
│ ├── test_dataset.py # TFT dataset preparation tests
│ ├── test_tft_cache_key.py # TFT cache key generation tests
│ ├── test_integration_pipeline.py # End-to-end ML pipeline tests
│ └── test_integration_fastapi.py # FastAPI endpoint tests
├── ml_engine/ # Python ML backend
│ ├── main.py # FastAPI application & endpoints
│ ├── config.py # Centralised configuration
│ ├── batch_predict.py # Batch S&P 100 prediction → SQLite
│ ├── requirements.txt # Python dependencies
│ ├── data/
│ │ ├── fetcher.py # yfinance data fetcher + Wikipedia S&P 100 scraper
│ │ ├── headlines.py # Live RSS headline scraper
│ │ ├── historical_headlines.py # Historical headline loader & sentiment cache
│ │ ├── download_news_dataset.py # Financial news dataset downloader
│ │ ├── precompute_sentiment.py # Batch FinBERT sentiment pre-computation
│ │ ├── predictions_db.py # SQLite DB for cached predictions
│ │ ├── backtest_db.py # SQLite DB for backtest history
│ │ ├── simulation_db.py # SQLite cache for walk-forward simulation results
│ │ ├── sp100_symbols.json # Shared S&P 100 symbol list (auto-refreshed)
│ │ └── kaggle/ # Historical headline datasets
│ ├── signals/
│ │ └── vmd.py # Variational Mode Decomposition
│ ├── features/
│ │ ├── technical.py # 75 technical indicators
│ │ └── sentiment.py # FinBERT sentiment
│ ├── models/
│ │ ├── tft.py # Temporal Fusion Transformer
│ │ ├── dataset.py # TFT dataset preparation
│ │ └── checkpoints/ # Cached trained models
│ ├── optimizer/
│ │ └── portfolio.py # Mean-Variance Optimisation
│ ├── backtest/
│ │ ├── engine.py # Prediction accuracy backtesting
│ │ └── report.py # Equity curves & heatmaps
│ └── benchmark/
│ └── benchmarks.py # Passive strategy comparison
├── ARCHITECTURE.md # Detailed architecture documentation
└── project-requirements.txt # Full project requirements specification
| Component | Technology |
|---|---|
| Frontend | HTML / CSS / JavaScript |
| Charting | Chart.js 4.4.0 |
| API Server | Node.js + Express 5 |
| ML Backend | Python + FastAPI + Uvicorn |
| Deep Learning | PyTorch + PyTorch Forecasting (TFT) |
| Training Framework | PyTorch Lightning |
| Signal Processing | vmdpy (VMD) |
| Technical Analysis | ta (75 indicators) |
| Sentiment Analysis | HuggingFace Transformers (FinBERT) |
| Portfolio Optimisation | PyPortfolioOpt (MVO + HRP) |
| Market Data | Yahoo Finance (yfinance) |
| Historical News | Kaggle dataset, FelixDrinkall/financial-news-dataset |
| LLM Integration | Ollama (DeepSeek V3.1) |
| Persistence | SQLite (predictions + backtests) + Parquet (prices + sentiment) |
| Testing | pytest (Python) + Jest + Supertest (Node.js) |
| HTTP Client | Axios |
Key parameters are centralised in ml_engine/config.py. Training and validation cutoff dates are computed dynamically relative to today and can be overridden via environment variables.

| Parameter | Default | Description |
|---|---|---|
| LOOKBACK_YEARS | 10 | Historical data window |
| TRAINING_END | ~18 months ago | Training data cutoff (dynamic) |
| VALIDATION_END | ~6 months ago | Validation data cutoff (dynamic) |
| VMD_K | 5 | Number of VMD modes |
| TFT_FORECAST_HORIZON | 20 days | Prediction horizon (~1 month) |
| TFT_LOOKBACK_WINDOW | 30 days | Encoder context window |
| TFT_HIDDEN_SIZE | 32 | Hidden layer neurons |
| TFT_ATTENTION_HEADS | 2 | Multi-head attention |
| TFT_BATCH_SIZE | 2048 | Training batch size |
| TFT_MAX_EPOCHS | 30 | Training epochs (with early stopping) |
| MVO_MAX_WEIGHT | 10% | Maximum allocation per stock |
| BACKTEST_REBALANCE_FREQ | Monthly | Rebalancing / evaluation cadence |
| BACKTEST_INITIAL_CAPITAL | $100,000 | Starting capital |
| PORTFOLIO_CACHE_MAX_AGE_HOURS | 24 | Use cached predictions if fresher than this |
| SIMULATION_MONTHS | 6 | Walk-forward simulation window |
| SIMULATION_CACHE_MAX_AGE_HOURS | 24 | Re-run simulation if cache is older |
| FINBERT_MODEL_NAME | ProsusAI/finbert | Sentiment model |
| DEVICE | Auto-detect | cuda > mps > cpu |
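
To make the interplay of `SIMULATION_MONTHS` and the monthly `BACKTEST_REBALANCE_FREQ` concrete, the sketch below generates a walk-forward schedule: one rebalance on the first calendar day of each of the last six full months, with each position evaluated until the next rebalance. It is an assumption-laden illustration; the real engine may align to trading days or month ends, and `walk_forward_windows` is a hypothetical helper.

```python
from datetime import date

SIMULATION_MONTHS = 6  # default from the configuration table


def month_start(year_month_index):
    """Convert a running month index (year * 12 + month - 1) to the 1st of that month."""
    year, month_index = divmod(year_month_index, 12)
    return date(year, month_index + 1, 1)


def walk_forward_windows(as_of, months=SIMULATION_MONTHS):
    """Return (rebalance_date, evaluate_until) pairs for the last `months` full months.

    At each rebalance date only data before that date may be used for prediction;
    the resulting portfolio is then held and scored until `evaluate_until`.
    """
    current = as_of.year * 12 + (as_of.month - 1)
    boundaries = [month_start(current - k) for k in range(months, -1, -1)]
    return list(zip(boundaries[:-1], boundaries[1:]))


windows = walk_forward_windows(date(2025, 3, 15))
for rebalance_date, evaluate_until in windows:
    print(rebalance_date, "->", evaluate_until)
```

Keeping the prediction cutoff strictly before each rebalance date is what makes the simulation walk-forward: no window is scored with information from its own evaluation period.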