GitHub - steviecurran/ab-testing-toolkit: A/B testing toolkit

A/B Testing Toolkit: Comparing Two Means

In A/B testing, the goal is to determine whether an observed difference between groups is statistically significant and practically meaningful, as commonly required in product experiments and business decision-making.

In many real-world scenarios, small differences between groups can be statistically significant, particularly with large datasets. This tool helps distinguish meaningful effects from random variation.

The notebook (CI-2_means.ipynb) compares two groups using confidence intervals and hypothesis testing, avoiding manual calculations and statistical tables that are prone to error.

The notebook is interactive and includes options to:

Load a dataset from a .csv or .dat file, or enter the means and standard deviations directly
For the full-data option, plot a histogram showing the distributions of the two groups
Adjust the confidence level from the default of 95% to apply a more stringent test
Run a one- or two-tailed test
Change the sample size at which the test switches from the t- to the z-statistic (default n = 30)
Switch between equal (pooled) and unequal (Welch’s method) variances
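To illustrate what the variance and tail options change, here is a minimal sketch using only the Python standard library. The function names `standard_error_and_df` and `z_critical` are hypothetical and are not taken from the notebook. The sketch assumes summary statistics (standard deviations and sample sizes) as input and uses the normal approximation for the critical value.

```python
import math
from statistics import NormalDist


def standard_error_and_df(s1, n1, s2, n2, equal_var=True):
    """Return (standard error of the difference in means, degrees of freedom)."""
    if equal_var:
        # Pooled variance: weighted average of the two sample variances.
        df = n1 + n2 - 2
        pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
        se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    else:
        # Welch: keep the variances separate and use the
        # Welch-Satterthwaite approximation for the degrees of freedom.
        v1, v2 = s1**2 / n1, s2**2 / n2
        se = math.sqrt(v1 + v2)
        df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, df


def z_critical(confidence=0.95, tails=2):
    """Critical z value for a one- or two-tailed test (normal approximation)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / tails)


# Illustrative inputs only: unequal spreads and unequal sample sizes.
se_pooled, df_pooled = standard_error_and_df(2.0, 12, 5.0, 40, equal_var=True)
se_welch, df_welch = standard_error_and_df(2.0, 12, 5.0, 40, equal_var=False)
print(f"pooled: SE = {se_pooled:.3f}, df = {df_pooled}")
print(f"Welch:  SE = {se_welch:.3f}, df = {df_welch:.1f}")
print(f"z critical, 95% two-tailed: {z_critical(0.95, 2):.3f}")
print(f"z critical, 95% one-tailed: {z_critical(0.95, 1):.3f}")
```

When the two groups have the same standard deviation and the same size, both methods give the same standard error and the same degrees of freedom. They diverge as the variances or sample sizes become more unequal, which is when the choice between them matters.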

Interpretation

This toolkit enables rapid comparison of group means and supports statistical decision-making by quantifying uncertainty and significance.

Quick Start

Run the notebook using a simulated dataset or, to see a full workflow without interaction, change

    USE_DEFAULTS = False

to

    USE_DEFAULTS = True

Example 1: Raw data of a small sample

The file Mg_levels.dat contains the magnesium levels of a sample of people before and after taking a supplement. We wish to test the null hypothesis that the supplement does not increase the patients' magnesium levels.

At the 95% confidence level, we reject the null hypothesis and conclude that the supplement increases magnesium levels.

Using the confidence level dropdown, it is straightforward to show that the null hypothesis cannot be rejected at the 99% confidence level.

Note: This example is treated as a two-sample test. In practice, a paired test may be more appropriate for before/after measurements.
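The difference between the two treatments can be sketched as follows. The data values are hypothetical and are not the contents of Mg_levels.dat, and the helper names `paired_t` and `two_sample_t` are illustrative rather than part of the notebook. A paired test works on the per-subject differences, so variation between subjects cancels out.

```python
import math
from statistics import mean, stdev

# Hypothetical before/after values, NOT the contents of Mg_levels.dat.
before = [1.7, 1.9, 1.8, 2.0, 1.6, 1.8]
after = [1.9, 2.0, 2.0, 2.1, 1.8, 1.9]


def paired_t(before, after):
    """t statistic and degrees of freedom for the mean per-subject change."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / se, n - 1


def two_sample_t(x, y):
    """Welch t statistic treating the two samples as independent."""
    se = math.sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    return (mean(y) - mean(x)) / se


t_paired, df = paired_t(before, after)
t_unpaired = two_sample_t(before, after)
print(f"paired:     t = {t_paired:.2f}, df = {df}")
print(f"two-sample: t = {t_unpaired:.2f}")
```

For this made-up data the paired statistic is the larger of the two, because each subject's change is consistent even though baseline levels differ between subjects. Real data need not behave the same way.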

Example 2: Large sample test

We can use the medical data below to demonstrate the toolkit on a large sample where we input the summary data only.

For example, for the systolic blood pressure, men have been entered as Sample 1 and women as Sample 2.

Although the difference in means is small relative to the variability, the large sample size allows us to detect a statistically significant difference. This result remains statistically significant even at very high confidence levels (e.g. 99.9%).
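The effect of sample size can be sketched with a large-sample z-test on summary statistics. The numbers below are hypothetical and are not the values from the medical data, and `large_sample_z_test` is an illustrative helper, not a function from the notebook. The standardised effect size (Cohen's d, here with a simple average of the two variances) is added only to show how small the difference is relative to the spread.

```python
import math
from statistics import NormalDist


def large_sample_z_test(m1, s1, n1, m2, s2, n2, confidence=0.95):
    """Two-tailed z test and confidence interval for m1 - m2 from summary data."""
    diff = m1 - m2
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    # Standardised effect size: difference in units of the typical spread.
    cohens_d = diff / math.sqrt((s1**2 + s2**2) / 2)
    return z, p_value, ci, cohens_d


# Hypothetical summary statistics, not the values from the medical data.
z, p, ci, d = large_sample_z_test(m1=125.0, s1=15.0, n1=5000,
                                  m2=123.0, s2=15.0, n2=5000,
                                  confidence=0.999)
print(f"z = {z:.2f}, p = {p:.2e}")
print(f"99.9% CI for the difference: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"Cohen's d = {d:.2f}")
```

With these inputs the 99.9% interval excludes zero even though the difference is only a small fraction of a standard deviation, which is the situation described above: statistical significance driven by sample size rather than by a large effect.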

