This repository contains the scripts and notebook used to generate the results and plots for the associated paper, Evaluating the Impact of Mutant Schemata and Test Selection On Speeding Up Mutation Analysis. It includes utilities for data extraction, processing, statistical analysis, and figure generation.
- Create and activate a virtual environment:

  ```bash
  python -m venv .venv
  # Windows PowerShell
  .\.venv\Scripts\Activate.ps1
  # macOS/Linux
  source .venv/bin/activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

This section assumes you want to regenerate the analysis and figures from the paper.
- Download the data from the Zenodo repository: 10.5281/zenodo.17358408
- Extract all project zip files to a directory (e.g., `./data/`)
- Extract `Utils.zip` alongside `./data` (producing `./Utils/`)
- Your directory structure should look like:

```
./data/
├── antomology/
├── commons-cli/
├── commons-csv/
├── commons-dbutils/
├── commons-lang/
├── commons-net/
├── commons-validator/
├── jackson-core/
├── jackson-databind/
├── jackson-dataformat-xml/
├── jaxen/
├── jettison/
├── jra/
├── jterminal/
├── triangle-example/
└── XChart/

./Utils/
├── Skew
├── CTM_operator.xlsx
├── grep_ctm.txt
└── grep_real.txt
```
Each project directory contains subdirectories for different experimental configurations (Original, Schemata, Test Selection, etc.).
Extract and process the experimental results:

```bash
# Extract zip files and run post-processing on test-selection-based runs
python extract_zips.py "./data" --subsumption --test-summary

# Extract coverage information for whitelisted files
python process_xmls.py "./data" -o coverage_whitelisted_files.txt

# Extract build times
python extract_times.py -d "./data" -o "extracted_times_summary.txt"
```

Analyze test execution counts for the Original and Schemata configurations:
```bash
# Analyze Original configuration
python parse_test_results.py --directory "./data/commons-lang/Original"

# Analyze Schemata configuration
python parse_test_results.py --directory "./data/commons-lang/Schemata"
```

Run statistical tests to compare the different approaches:
```bash
# Perform Wilcoxon signed-rank tests
python wilcoxon_test_analysis.py

# Calculate effect sizes (Vargha-Delaney A)
python VD_A.py
```

Create the paper figures:
```bash
# Run the Jupyter notebook to generate plots
jupyter notebook plot_generation.ipynb
```

The notebook will export the figures in PGF/TikZ format to the `pgfs/` directory.
If you want to run new experiments (rather than analyzing existing data), use `hpc_caller.py`, which is designed for HPC clusters. Note that this script ships with the same example paths used in this README as defaults; modify them for your HPC setup.
Below is a quick overview of each script and how to run it.
- What it does: Finds `.zip` files (default: file names matching "test"), extracts each to a directory, and optionally runs:
  - build time analysis on extracted logs
  - uncovered mutants analysis
  - subsumption analysis (LittleDarwin results → SQLite)
  - test summary parsers to aggregate test outcomes
- Usage examples:

  ```bash
  # Basic: extract all matching zips under a directory, in parallel
  python extract_zips.py "D:/final results"

  # Sequential, keeping the extracted directories
  python extract_zips.py "D:/final results" --sequential --keep

  # With subsumption and test summaries
  python extract_zips.py "D:/final results" --subsumption --test-summary

  # Custom filters and workers
  python extract_zips.py "D:/final results" --regex "test|results" --max-workers 8

  # Custom tool paths
  python extract_zips.py "D:/final results" \
      --build-analyzer build_time_analyzer.py \
      --uncovered-script uncovered_mutants.py \
      --test-summary --test-summary-parser test_summary_parser.py --test-summary-parser-csv test_summary_parser_csv.py
  ```

- Key flags:
  - `--regex` (default: `test`)
  - `--build-analyzer` (default: `build_time_analyzer.py`)
  - `--uncovered-script` (default: `uncovered_mutants.py`)
  - `--subsumption`: enable subsumption analysis
  - `--test-summary`: enable test summary parsing
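The find-and-extract core of `extract_zips.py` can be sketched roughly as follows. This is an illustrative Python sketch, not the script's actual implementation; the optional post-processing passes are omitted, and the function names are hypothetical:

```python
import re
import zipfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def extract_one(zip_path: Path) -> Path:
    """Extract one archive next to itself: foo/test-run.zip -> foo/test-run/."""
    out_dir = zip_path.with_suffix("")
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
    return out_dir

def extract_all(root: str, pattern: str = "test", max_workers: int = 4) -> list[Path]:
    """Find all .zip files whose names match `pattern` and extract them in parallel."""
    zips = [p for p in Path(root).rglob("*.zip") if re.search(pattern, p.name)]
    # Extraction is I/O-bound, so threads are a reasonable parallelism model here.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(extract_one, zips))
```

Passing `max_workers=1` approximates the `--sequential` behavior.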
- What it does: Recursively scans a directory for `.txt` files, extracts "Total time" entries from Maven logs, normalizes them to seconds, and prints per-file and overall stats. This script is called from within `extract_zips.py`.
- Usage:

  ```bash
  python build_time_analyzer.py ./logs
  ```

- What it does: Traverses an extracted project to count uncovered mutants (via HTML reports) and, if available, runs `clover_reader.jar` on Clover DBs to export coverage metrics. This script is called from within `extract_zips.py`.
- Note: Requires `clover_reader.jar` present in the repository root (as referenced in the script) and Java on PATH.
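The Maven "Total time" normalization that `build_time_analyzer.py` performs can be sketched as below. This is illustrative only; the regex and the set of recognized units (`s`, `min`, `h`) are assumptions, and the real script may handle more log formats:

```python
import re

# Matches lines like "[INFO] Total time: 12.5 s" or "Total time: 01:23 min"
PATTERN = re.compile(r"Total time:\s*([\d:.]+)\s*(s|min|h)")

def to_seconds(value: str, unit: str) -> float:
    """Normalize a Maven 'Total time' value to seconds."""
    parts = [float(p) for p in value.split(":")]
    if unit == "s":
        return parts[0]
    if unit == "min":  # "01:23 min" -> 83 s, "1.5 min" -> 90 s
        return parts[0] * 60 + (parts[1] if len(parts) > 1 else 0.0)
    # "h" with an optional minutes component, e.g. "1:02 h"
    return parts[0] * 3600 + (parts[1] if len(parts) > 1 else 0.0) * 60

def total_times(log_text: str) -> list[float]:
    """All build durations found in a log, in seconds."""
    return [to_seconds(v, u) for v, u in PATTERN.findall(log_text)]
```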
- What they do: Parse test result directories and output summaries (text or CSV). Typically invoked via `extract_zips.py --test-summary`.
- Typical usage when called directly:

  ```bash
  python test_summary_parser.py <EXTRACTED_DIR> -o summary_tests.txt
  python test_summary_parser_csv.py <EXTRACTED_DIR> -o summary_tests.csv
  ```

- What it does: Recursively finds files ending with `_tests.txt` under a directory (e.g., `Original` or `Schemata`) and sums the `tests_run` counts. Applies per-project limits derived from our study configuration. This script is only for the `original` and `schemata` configurations; the data for the test-selection-based methods is already available in the database.
- Notes:
  - Supports filenames containing `-original` or `-schemata` to infer the project name; falls back to the parent directory name.
  - Applies per-project summary limits (e.g., `commons-lang: 313`, `jackson-databind: 656`, `xchart: 8`, `triangle-example: 2`, etc.).
- Usage:

  ```bash
  python parse_test_results.py --directory ./results/commons-lang/Original
  ```

- What it does: Clones the configured Java projects, sets the appropriate Java versions, runs the build/test pipelines, enforces compatible JUnit/Surefire/Compiler plugin versions, and collects Clover artifacts.
- Warning: Move the contents of `Utils/whitelists` and `Utils/excluded tests` to the root folder of your dataset directory.
- Run with parameters for your environment (paths to MediumDarwin and the dataset directory, Java homes). See the `main()` signature for the arguments.
- Note: This script requires explicit parameters; all paths must be provided when calling `main()`.
- What it does: Manages HPC job submissions for running experiments across multiple Java projects. Contains project configurations with hardcoded paths to test exclusion files.
- Note: The default paths in this script are examples and must be adapted to your environment.
- What it does: Parses CTM text outputs into structured mappings. `grep_ctm.txt` is the result of running `grep -r -E '.*Running compile time mutants.*' --include="*.out" > grep_ctm.txt` in the root directory of the extracted/generated data. The script extracts the databases from each run's zip file and places them in a `./dbs` subdirectory so that it can query the CTM information. Uses CLI arguments for the output file, data directory, and input file.
- Usage:

  ```bash
  python ctm_parser.py [-o output.txt] [-d data_directory] [-g grep_ctm.txt]
  ```

- What it does: Compares baseline and test selection Maven executions to diagnose performance bottlenecks and regressions. Runs both configurations, collects logs and Surefire reports, parses per-test execution times, scans for JVM forking patterns and JaCoCo agent usage, analyzes slow-test skew, and generates diagnostic reports with recommendations. Can optionally run a batched test selection experiment to confirm cold-JVM fragmentation issues.
- Features:
  - Parses Surefire XML and text reports for detailed test execution data
  - Detects cold-JVM fragmentation, coverage agent leakage, compile/invocation overhead, slow-test skew, and oversubscription issues
  - Generates cumulative coverage plots showing the test execution time distribution
  - Queries SQLite databases to analyze mutant coverage per test
  - Produces Markdown reports and JSON data dumps
- Usage:

  ```bash
  python run_compare_and_diagnose_ts.py \
      --project-dir /path/to/maven-project \
      --baseline-args "-DforkCount=1 -Dsurefire.reuseForks=true test" \
      --ts-args "-DforkCount=1 -Dsurefire.reuseForks=false test" \
      --target-class com.example.MyTest \
      --target-method testMethod \
      --db-path /path/to/mutant-database.db \
      --pre-clean \
      --batched-check \
      --out-dir ./diagnosis_output
  ```

  Note: Replace the example paths above with your local paths. If you do not have a database of mutant-test relationships, omit `--db-path`.
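The per-test timing data comes from Surefire's `TEST-*.xml` reports. The parsing step mentioned in the features list boils down to something like the sketch below; the `per_test_times` helper is hypothetical, not the script's actual API, and the real tool extracts more fields:

```python
import xml.etree.ElementTree as ET

def per_test_times(surefire_xml: str) -> dict[str, float]:
    """Map 'Class#method' -> execution time (s) from a Surefire TEST-*.xml report."""
    root = ET.fromstring(surefire_xml)
    return {
        f"{case.get('classname')}#{case.get('name')}": float(case.get("time", "0"))
        for case in root.iter("testcase")
    }

# Minimal example report in the Surefire XML shape
report = """<testsuite name="com.example.MyTest" tests="2">
  <testcase classname="com.example.MyTest" name="fast" time="0.012"/>
  <testcase classname="com.example.MyTest" name="slow" time="1.500"/>
</testsuite>"""
print(per_test_times(report))
# {'com.example.MyTest#fast': 0.012, 'com.example.MyTest#slow': 1.5}
```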
- What it does: Automates running `run_compare_and_diagnose_ts.py` across multiple Java projects with different configurations. Manages the JDK environment setup for each project and sequentially runs the test selection analysis, generating a separate report for each project in its own output directory.
- Features:
  - Configurable project entries with JDK paths, Maven arguments, and database locations
  - Automatic JDK environment setup per project
  - Individual output directories for each project to avoid conflicts
  - Progress reporting and error handling for batch operations
- Usage:

  ```bash
  # Edit the entries list in the script with your project configurations, then run:
  python run_skew_script_for_all.py
  ```

- Configuration: Each entry in the script should specify:
  - `jdk_home`: path to the JDK installation
  - `project_dir`: Maven project root directory
  - `baseline_args` and `ts_args`: Maven arguments for the baseline and test selection runs
  - `target_class` and `target_method`: analysis targets
  - `db_path`: SQLite database path for mutant-test relationships
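To make the configuration keys concrete, one entry and the command the driver builds for it might look like the sketch below. All paths and names here are placeholders, and `build_command` is an illustrative helper, not the actual structure of `run_skew_script_for_all.py`:

```python
import os

def build_command(entry: dict) -> list[str]:
    """Assemble the argv for one project's diagnosis run (illustrative only)."""
    return [
        "python", "run_compare_and_diagnose_ts.py",
        "--project-dir", entry["project_dir"],
        "--baseline-args", entry["baseline_args"],
        "--ts-args", entry["ts_args"],
        "--target-class", entry["target_class"],
        "--target-method", entry["target_method"],
        "--db-path", entry["db_path"],
        # One output directory per project avoids report collisions
        "--out-dir", f"./diagnosis_{os.path.basename(entry['project_dir'])}",
    ]

entry = {
    "jdk_home": "/opt/jdk-11",            # placeholder paths
    "project_dir": "/work/commons-lang",
    "baseline_args": "-DforkCount=1 -Dsurefire.reuseForks=true test",
    "ts_args": "-DforkCount=1 -Dsurefire.reuseForks=false test",
    "target_class": "org.example.SomeTest",
    "target_method": "someMethod",
    "db_path": "/work/dbs/commons-lang.db",
}
cmd = build_command(entry)  # the driver runs this with JAVA_HOME=entry["jdk_home"]
```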
- What it does: Searches XML coverage reports and extracts coverage metrics for a predefined set of whitelisted Java files from each project. These files represent key components that were specifically targeted for mutation-analysis coverage assessment in cases where a project was too large and the wall-time limit did not allow running the analysis on all files.
- Notes:
  - Processes both overall project coverage and file-specific coverage metrics
  - Covers the whitelisted files (e.g., `Conversion.java`, `SystemProperties.java`, `ObjectMapper.java`, etc.)
  - Outputs coverage percentages, lines of code, and element coverage counts
- Usage:

  ```bash
  python process_xmls.py <directory> [-o output.txt]
  ```

- What it does: Recursively searches for `summary_*.txt` files and extracts build, clean, and test times into a tab-separated summary file.
- Usage:
  ```bash
  python extract_times.py -d "./data" -o "extracted_times_summary.txt"
  ```

- What it does: Reads a `new_data_*.csv` file containing runs for multiple methods, computes per-run deltas vs. `original`, runs Wilcoxon signed-rank tests per project/method, prints summaries, and writes `wilcoxon_test_results.csv`.
- Usage:

  ```bash
  python wilcoxon_test_analysis.py
  ```

- What it does: Provides `VD_A` for pairwise effect sizes and `VD_A_DF` for DataFrame-based multi-group comparisons.
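For reference, the Vargha-Delaney A statistic estimates the probability that a value drawn from one group exceeds a value drawn from the other (0.5 means no difference). A minimal sketch; the `vd_a` helper below is illustrative and not the exact interface of `VD_A.py` (the Wilcoxon signed-rank test itself is typically delegated to `scipy.stats.wilcoxon`):

```python
from itertools import product

def vd_a(treatment, control):
    """Vargha-Delaney A = P(X > Y) + 0.5 * P(X == Y) over all cross-group pairs."""
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x, y in product(treatment, control))
    return wins / (len(treatment) * len(control))

# Example: run times (s) of the original configuration vs. a schemata-based run
original = [120.0, 95.0, 240.0, 60.0, 180.0]
schemata = [100.0, 90.0, 200.0, 58.0, 150.0]
print(vd_a(original, schemata))  # → 0.6 (original tends to be slower)
```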
- What it does: Loads the processed CSVs, produces the plots, and exports the figures to PGF/TikZ via `tikzplotlib` into `pgfs/`.

`plot_generation.ipynb` produces the paper figures. Ensure the dependencies are installed and the expected CSVs exist (e.g., `new_data.csv` and dated variants). The notebook reads the per-project run-time data from a CSV file; an example file is provided as `new_data_20250811.csv`. The other scripts rely on the data available in the Zenodo repository, which is not included in this repository because of its size.
This project is licensed under the MIT License. See LICENSE for details.
If you use this repository in academic work, please cite the associated paper. A BibTeX entry can be added here once available.