Multi-Touch Attribution System

Machine Learning & Mathematical Models for eCommerce Marketing ROI

📌 Executive Summary

Standard analytics platforms usually credit 100% of a sale to the last advertisement a customer clicked before purchasing. This "Last-Touch" model severely undervalues "discovery" channels (like TikTok Ads or Display) and over-credits "closing" channels (like Direct Traffic or Retargeting Emails).

This system applies data science and machine learning to map every step of the customer journey, distributing revenue credit across the marketing channels that actually contributed to the sale.

🚨 The Cost of Doing Nothing (Attribution Loss)

Applying ML-based attribution to this dataset yields a Misallocation Index of 42.5%: of the $509,237 total portfolio revenue pool, $216,318 is credited to the wrong channels under Last-Touch.

How is this calculated?

Absolute Disparity: For every channel, we compute the absolute dollar difference between the naive Last-Touch model and the predictive XGBoost + SHAP model.

Sum & Halve: We sum the absolute differences. Rerouting $1 from Facebook to TikTok creates a -$1 difference for Facebook and a +$1 difference for TikTok (a $2 absolute shift), so we divide the sum by 2 to avoid double-counting.

Index Percentage: The de-duplicated volume of misallocated dollars ($216k) is divided by the total portfolio revenue ($509k) to give the Misallocation Index.
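The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's evaluator.py: the function name and the per-channel figures are hypothetical, and it assumes both models distribute the same total revenue.

```python
def misallocation_index(last_touch, model):
    """Return (misallocated_dollars, index) from two per-channel revenue dicts.

    Assumes both attributions sum to the same total revenue.
    """
    channels = set(last_touch) | set(model)
    total = sum(last_touch.values())
    # Step 1: absolute dollar difference per channel.
    abs_diff = sum(
        abs(model.get(ch, 0.0) - last_touch.get(ch, 0.0)) for ch in channels
    )
    # Step 2: every moved dollar shows up twice (once lost, once gained).
    misallocated = abs_diff / 2
    # Step 3: express the moved dollars as a share of total revenue.
    return misallocated, misallocated / total


# Hypothetical figures, for illustration only.
last_touch = {"Facebook Ads": 500.0, "TikTok Ads": 100.0, "Email": 400.0}
model_based = {"Facebook Ads": 300.0, "TikTok Ads": 350.0, "Email": 350.0}

dollars, index = misallocation_index(last_touch, model_based)
print(f"${dollars:,.0f} misallocated ({index:.1%} of revenue)")
```

With the figures reported above, the same arithmetic gives $216,318 / $509,237 ≈ 42.5%.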

If marketing decisions remain reliant on traditional analytics:

The business will keep misjudging the source of over 40% of its revenue.
TikTok Ads drive 440% more top-of-funnel revenue than Last-Touch credits them with, yet they remain starved of budget.
Email and Display are heavily undervalued (+318% and +161% predictive lift, respectively), with their contribution credited instead to the channels that happen to close the sale.
Facebook Ads and Organic Search take excessive credit, often claiming roughly 60-70% more revenue than the ML model attributes to them.

These attribution estimates indicate where the next $100k in marketing budget should go to maximize ROAS.

🔬 Methodology Portfolio Walkthrough

The project builds up in mathematical complexity, comparing each model's attribution against the simpler baselines:

Heuristics (The Analytics Baseline)
- Applying industry-standard logic: First-Touch, Last-Touch, Linear, U-Shape.
- Used to understand the raw disparities between "Awareness" drivers and "Closing" drivers.
Absorbing Markov Chains (Graph Theory)
- Maps all customer traffic into a network of transition probabilities between channels.
- Calculates the "Removal Effect": if we turn off Facebook Ads, what percentage of total conversions would be lost across the network?
Cooperative Game Theory (Shapley Values)
- Evaluates all $2^N$ channel combinations (coalitions) to define the marginal contribution of each individual channel toward the shared revenue pool.
Data-Driven ML (Hyper-Tuned XGBoost + SHAP Values) 🚀
- A binary classification model (XGBoost) predicts conversion likelihood, capturing non-linear synergy between channels (e.g. Organic Search -> Email -> Sale).
- Handles the heavy class imbalance (95% non-conversions) with scale_pos_weight, and tunes hyperparameters with RandomizedSearchCV scored on F1.
- SHAP (SHapley Additive exPlanations) is applied over an engineered feature space (channel frequency, first-touch flags) to calculate per-user predictive lift for the users who converted.
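Of the four approaches above, the Markov "Removal Effect" is the easiest to show in a few lines. The sketch below is an illustration under stated assumptions, not the code in markov_model.py: it uses a first-order chain, treats a removed channel as a dead end, and every name and journey in it is hypothetical.

```python
from collections import defaultdict

START, CONV, NULL = "(start)", "(conversion)", "(null)"


def transition_probs(journeys):
    """Estimate first-order transition probabilities from (path, converted) pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for path, converted in journeys:
        states = [START, *path, CONV if converted else NULL]
        for src, dst in zip(states, states[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


def conversion_prob(probs, removed=None, sweeps=200):
    """Probability of reaching CONV from START.

    A removed channel keeps a value of 0.0, so traffic entering it never converts.
    """
    value = {state: 0.0 for state in probs}
    value[CONV], value[NULL] = 1.0, 0.0
    for _ in range(sweeps):  # repeated sweeps converge for an absorbing chain
        for src, dsts in probs.items():
            if src != removed:
                value[src] = sum(p * value[dst] for dst, p in dsts.items())
    return value[START]


# Hypothetical journeys, for illustration only.
journeys = [
    (["TikTok Ads", "Email"], True),
    (["TikTok Ads"], False),
    (["Facebook Ads"], True),
    (["Email"], False),
]
probs = transition_probs(journeys)
base = conversion_prob(probs)
removal_effect = {
    ch: 1 - conversion_prob(probs, removed=ch) / base
    for ch in probs
    if ch != START
}
print(base, removal_effect)
```

In this toy data the baseline conversion probability is 0.5. Removing Facebook Ads or Email loses half of all conversions, and removing TikTok Ads loses a quarter. Normalizing the removal effects so they sum to 1 is one common way to turn them into attribution shares.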

📈 Exploring The Visualized Insights

The pipeline automatically generates a presentation-ready set of static charts and interactive, web-based storytelling tools.

🌟 Interactive HTML Applications (Execution Outputs)

1a_user_journey_sankey.html & 1b_user_journey_sunburst.html: Interactive Plotly diagrams for exploring, level by level, how users flow through discovery and closing channels over time.
3b_markov_network_interactive.html: A physics-based PyVis network view. Channel nodes pull toward each other on screen according to their transition probabilities in the Markov matrix.

📊 Static Storytelling Charts

Marketing Attribution Baseline Comparison: A business-focused benchmark contrasting the ML and Markov models against standard Last-Touch.
Diverging ROI Shift Chart: A stakeholder-friendly horizontal diverging bar chart translating the Misallocation Index into the dollars overfunded and underfunded per channel.
Markov Removal Effects: Waterfall breakdown of the drop in total conversion probability when a channel node is removed from the network.
SHAP Value Beeswarm: Each dot is one user's predictive lift, showing whether a specific touchpoint pushed the model's conversion prediction up or down.
Kaplan-Meier Survival Curves & Hazard Ratios: Survival curves and hazard-ratio forest plots generated via lifelines, showing from each channel's coefficient whether it acts as an "Accelerator" (shortening the sales cycle) or a "Dragger" (lengthening it).
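For readers unfamiliar with the survival curves listed above, the Kaplan-Meier estimate can be written out by hand. The project generates its plots with lifelines; the pure-Python sketch below is only an illustration of the product-limit formula, with a hypothetical function name and made-up data. Here "survival" means the user has not yet purchased, and an unobserved event means the tracking window closed before any purchase (censoring).

```python
def kaplan_meier(durations, observed):
    """Product-limit estimate: (time, survival probability) at each event time."""
    at_risk = len(durations)
    survival, curve = 1.0, []
    for t in sorted(set(durations)):
        # Users who purchased exactly at time t.
        events = sum(1 for d, o in zip(durations, observed) if d == t and o)
        # Users leaving the risk set at time t (purchased or censored).
        leaving = sum(1 for d in durations if d == t)
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical days-to-purchase, for illustration only.
days = [1, 2, 2, 4, 5]
converted = [True, True, False, True, False]

for day, s in kaplan_meier(days, converted):
    print(f"day {day}: {s:.2f} of users have not purchased yet")
```

A channel whose curve drops faster than the others is one after which users purchase sooner, which is the "Accelerator" pattern described above.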

Key Finding Breakdowns:

Email is a Force-Multiplier: The ML model detects strong non-linear synergy. An email blast alone converts poorly, but combined with top-of-funnel discovery campaigns it multiplies conversion probability, so XGBoost assigns it substantial predictive value.
Direct is a Byproduct of Prior Marketing: Up to 26% of observed "Direct Traffic Sales" belong to the Paid and Organic campaigns that drove brand discovery before the user typed the URL manually to check out.

🛠️ Project Architecture & Best Practices

Structured following common Python machine learning project conventions:

colors.py: Centralized color mapping for every channel, keeping all charts visually consistent.
data_loader.py: Data ingestion, ordering of each user's touchpoints into a journey, and user-level aggregation.
ml_model.py / markov_model.py / shapley_model.py: Algorithm classes that abstract the math of each attribution method.
evaluator.py: Aligns the model outputs in a DataFrame and computes the dollar deviations between them.
visualizer.py & story_visualizer.py: Generators for the HTML/PNG assets used in the final slide decks.

To Run Locally

Clone the repository and execute the pipeline:

# Set up a Python virtual environment first (e.g. with uv or the built-in venv)
pip install -r requirements.txt

# Run the statistical validations (cross-validation, accuracy, F1, C-index)
python run_performance.py

# Run the attribution pipeline and the survival analysis, including visualizations
python run_pipeline.py
python run_survival.py

The scripts write all model deviations to data/processed/ and generate the presentation charts in reports/figures/.

Documentation Note: See docs/attribution_model_decisions.md for the full development journal, covering the statistical rationale, formulas, and codebase decisions.
