This repository contains the scripts and notebook used to generate the results and plots for the associated paper, Evaluating the Impact of Mutant Schemata and Test Selection On Speeding Up Mutation Analysis. It includes utilities for data extraction, processing, statistical analysis, and figure generation.
- Create and activate a virtual environment:

  ```bash
  python -m venv .venv
  # Windows PowerShell
  .\.venv\Scripts\Activate.ps1
  # macOS/Linux
  source .venv/bin/activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

This section assumes you want to regenerate the analysis and figures from the paper.
- Download the data from the Zenodo repository: 10.5281/zenodo.17358408
- Extract all project zip files to a directory (e.g., `./data/`)
- Extract `Utils.zip` alongside `./data` (producing `./Utils/`)
- Your directory structure should look like:

```
./data/
├── antomology/
├── commons-cli/
├── commons-csv/
├── commons-dbutils/
├── commons-lang/
├── commons-net/
├── commons-validator/
├── jackson-core/
├── jackson-databind/
├── jackson-dataformat-xml/
├── jaxen/
├── jettison/
├── jra/
├── jterminal/
├── triangle-example/
└── XChart/

./Utils/
├── Skew
├── CTM_operator.xlsx
├── grep_ctm.txt
└── grep_real.txt
```
Each project directory contains subdirectories for different experimental configurations (Original, Schemata, Test Selection, etc.).
Extract and process the experimental results:

```bash
# Extract zip files and run post-processing on test-selection-based runs
python extract_zips.py "./data" --subsumption --test-summary

# Extract coverage information for whitelisted files
python process_xmls.py "./data" -o coverage_whitelisted_files.txt

# Extract build times
python extract_times.py -d "./data" -o "extracted_times_summary.txt"
```

Analyze test execution counts for the Original and Schemata configurations:
```bash
# Analyze Original configuration
python parse_test_results.py --directory "./data/commons-lang/Original"

# Analyze Schemata configuration
python parse_test_results.py --directory "./data/commons-lang/Schemata"
```

Run statistical tests to compare the different approaches:
```bash
# Perform Wilcoxon signed-rank tests
python wilcoxon_test_analysis.py

# Calculate effect sizes (Vargha-Delaney A)
python VD_A.py
```

Create the paper figures:
```bash
# Run the Jupyter notebook to generate plots
jupyter notebook plot_generation.ipynb
```

The notebook will export the figures in PGF/TikZ format to the `pgfs/` directory.
If you want to run new experiments (rather than analyzing existing data), use `hpc_caller.py`, which is designed for HPC clusters. Note that this script ships with the same example paths used in this README as defaults; modify them for your HPC setup.
Below is a quick overview of each script and how to run it.
- What it does: Finds `.zip` files (default: file names matching "test"), extracts each to a directory, and optionally runs:
  - build time analysis on extracted logs
  - uncovered mutants analysis
  - subsumption analysis (LittleDarwin results → SQLite)
  - test summary parsers to aggregate test outcomes
- Usage examples:

  ```bash
  # Basic: extract all matching zips under a directory, in parallel
  python extract_zips.py "D:/final results"

  # Sequential, keeping the extracted directories
  python extract_zips.py "D:/final results" --sequential --keep

  # With subsumption and test summaries
  python extract_zips.py "D:/final results" --subsumption --test-summary

  # Custom filters and workers
  python extract_zips.py "D:/final results" --regex "test|results" --max-workers 8

  # Custom tool paths
  python extract_zips.py "D:/final results" \
      --build-analyzer build_time_analyzer.py \
      --uncovered-script uncovered_mutants.py \
      --test-summary --test-summary-parser test_summary_parser.py --test-summary-parser-csv test_summary_parser_csv.py
  ```

- Key flags:
  - `--regex` (default: `test`)
  - `--build-analyzer` (default: `build_time_analyzer.py`)
  - `--uncovered-script` (default: `uncovered_mutants.py`)
  - `--subsumption`: enable subsumption analysis
  - `--test-summary`: enable test summary parsing
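The find-and-extract core of `extract_zips.py` can be sketched roughly as follows. This is an illustrative Python sketch, not the script's actual implementation; the optional post-processing passes are omitted, and the function names are hypothetical:

```python
import re
import zipfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def extract_one(zip_path: Path) -> Path:
    """Extract one archive next to itself: foo/test-run.zip -> foo/test-run/."""
    out_dir = zip_path.with_suffix("")
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
    return out_dir

def extract_all(root: str, pattern: str = "test", max_workers: int = 4) -> list[Path]:
    """Find all .zip files whose names match `pattern` and extract them in parallel."""
    zips = [p for p in Path(root).rglob("*.zip") if re.search(pattern, p.name)]
    # Extraction is I/O-bound, so threads are a reasonable parallelism model here.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(extract_one, zips))
```

Passing `max_workers=1` approximates the `--sequential` behavior.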
- What it does: Recursively scans a directory for `.txt` files, extracts "Total time" entries from Maven logs, normalizes them to seconds, and prints per-file and overall stats. This script is called from within `extract_zips.py`.
- Usage:

  ```bash
  python build_time_analyzer.py ./logs
  ```

- What it does: Traverses an extracted project to count uncovered mutants (via HTML reports) and, if available, runs `clover_reader.jar` on Clover DBs to export coverage metrics. This script is called from within `extract_zips.py`.
- Note: Requires `clover_reader.jar` present in the repository root (as referenced in the script) and Java on PATH.
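The Maven "Total time" normalization that `build_time_analyzer.py` performs can be sketched as below. This is illustrative only; the regex and the set of recognized units (`s`, `min`, `h`) are assumptions, and the real script may handle more log formats:

```python
import re

# Matches lines like "[INFO] Total time: 12.5 s" or "Total time: 01:23 min"
PATTERN = re.compile(r"Total time:\s*([\d:.]+)\s*(s|min|h)")

def to_seconds(value: str, unit: str) -> float:
    """Normalize a Maven 'Total time' value to seconds."""
    parts = [float(p) for p in value.split(":")]
    if unit == "s":
        return parts[0]
    if unit == "min":  # "01:23 min" -> 83 s, "1.5 min" -> 90 s
        return parts[0] * 60 + (parts[1] if len(parts) > 1 else 0.0)
    # "h" with an optional minutes component, e.g. "1:02 h"
    return parts[0] * 3600 + (parts[1] if len(parts) > 1 else 0.0) * 60

def total_times(log_text: str) -> list[float]:
    """All build durations found in a log, in seconds."""
    return [to_seconds(v, u) for v, u in PATTERN.findall(log_text)]
```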
- What they do: Parse test result directories and output summaries (text or CSV). Typically invoked via `extract_zips.py --test-summary`.
- Typical usage when called directly:

  ```bash
  python test_summary_parser.py <EXTRACTED_DIR> -o summary_tests.txt
  python test_summary_parser_csv.py <EXTRACTED_DIR> -o summary_tests.csv
  ```

- What it does: Recursively finds files ending with `_tests.txt` under a directory (e.g., `Original` or `Schemata`) and sums the `tests_run` counts. Applies per-project limits derived from our study configuration. This script is only for the `original` and `schemata` configurations; the data for the test-selection-based methods is already available in the database.
- Notes:
  - Supports filenames containing `-original` or `-schemata` to infer the project name; falls back to the parent directory name.
  - Applies per-project summary limits (e.g., `commons-lang: 313`, `jackson-databind: 656`, `xchart: 8`, `triangle-example: 2`, etc.).
- Usage:

  ```bash
  python parse_test_results.py --directory ./results/commons-lang/Original
  ```

- What it does: Clones the configured Java projects, sets the appropriate Java versions, runs the build/test pipelines, enforces compatible JUnit/Surefire/Compiler plugin versions, and collects Clover artifacts.
- Warning: Move the contents of `Utils/whitelists` and `Utils/excluded tests` to the root folder of your dataset directory.
- Run with parameters for your environment (paths to MediumDarwin and the dataset directory, Java homes). See the `main()` signature for the arguments.
- Note: This script requires explicit parameters; all paths must be provided when calling `main()`.
- What it does: Manages HPC job submissions for running experiments across multiple Java projects. Contains project configurations with hardcoded paths to test exclusion files.
- Note: The default paths in this script are examples and must be adapted to your environment.
- What it does: Parses CTM text outputs into structured mappings. `grep_ctm.txt` is the result of running `grep -r -E '.*Running compile time mutants.*' --include="*.out" > grep_ctm.txt` in the root directory of the extracted/generated data. The script extracts the databases from each run's zip file and places them in a `./dbs` subdirectory so that it can query the CTM information. Uses CLI arguments for the output file, data directory, and input file.
- Usage:

  ```bash
  python ctm_parser.py [-o output.txt] [-d data_directory] [-g grep_ctm.txt]
  ```

- What it does: Compares baseline and test selection Maven executions to diagnose performance bottlenecks and regressions. Runs both configurations, collects logs and Surefire reports, parses per-test execution times, scans for JVM forking patterns and JaCoCo agent usage, analyzes slow-test skew, and generates diagnostic reports with recommendations. Can optionally run a batched test selection experiment to confirm cold-JVM fragmentation issues.
- Features:
  - Parses Surefire XML and text reports for detailed test execution data
  - Detects cold-JVM fragmentation, coverage agent leakage, compile/invocation overhead, slow-test skew, and oversubscription issues
  - Generates cumulative coverage plots showing the test execution time distribution
  - Queries SQLite databases to analyze mutant coverage per test
  - Produces Markdown reports and JSON data dumps
- Usage:

  ```bash
  python run_compare_and_diagnose_ts.py \
      --project-dir /path/to/maven-project \
      --baseline-args "-DforkCount=1 -Dsurefire.reuseForks=true test" \
      --ts-args "-DforkCount=1 -Dsurefire.reuseForks=false test" \
      --target-class com.example.MyTest \
      --target-method testMethod \
      --db-path /path/to/mutant-database.db \
      --pre-clean \
      --batched-check \
      --out-dir ./diagnosis_output
  ```

  Note: Replace the example paths above with your local paths. If you do not have a database of mutant-test relationships, omit `--db-path`.
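The per-test timing data comes from Surefire's `TEST-*.xml` reports. The parsing step mentioned in the features list boils down to something like the sketch below; the `per_test_times` helper is hypothetical, not the script's actual API, and the real tool extracts more fields:

```python
import xml.etree.ElementTree as ET

def per_test_times(surefire_xml: str) -> dict[str, float]:
    """Map 'Class#method' -> execution time (s) from a Surefire TEST-*.xml report."""
    root = ET.fromstring(surefire_xml)
    return {
        f"{case.get('classname')}#{case.get('name')}": float(case.get("time", "0"))
        for case in root.iter("testcase")
    }

# Minimal example report in the Surefire XML shape
report = """<testsuite name="com.example.MyTest" tests="2">
  <testcase classname="com.example.MyTest" name="fast" time="0.012"/>
  <testcase classname="com.example.MyTest" name="slow" time="1.500"/>
</testsuite>"""
print(per_test_times(report))
# {'com.example.MyTest#fast': 0.012, 'com.example.MyTest#slow': 1.5}
```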
- What it does: Automates running `run_compare_and_diagnose_ts.py` across multiple Java projects with different configurations. Manages the JDK environment setup for each project and sequentially runs the test selection analysis, generating a separate report for each project in its own output directory.
- Features:
  - Configurable project entries with JDK paths, Maven arguments, and database locations
  - Automatic JDK environment setup per project
  - Individual output directories for each project to avoid conflicts
  - Progress reporting and error handling for batch operations
- Usage:

  ```bash
  # Edit the entries list in the script with your project configurations, then run:
  python run_skew_script_for_all.py
  ```

- Configuration: Each entry in the script should specify:
  - `jdk_home`: path to the JDK installation
  - `project_dir`: Maven project root directory
  - `baseline_args` and `ts_args`: Maven arguments for the baseline and test selection runs
  - `target_class` and `target_method`: analysis targets
  - `db_path`: SQLite database path for mutant-test relationships
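To make the configuration keys concrete, one entry and the command the driver builds for it might look like the sketch below. All paths and names here are placeholders, and `build_command` is an illustrative helper, not the actual structure of `run_skew_script_for_all.py`:

```python
import os

def build_command(entry: dict) -> list[str]:
    """Assemble the argv for one project's diagnosis run (illustrative only)."""
    return [
        "python", "run_compare_and_diagnose_ts.py",
        "--project-dir", entry["project_dir"],
        "--baseline-args", entry["baseline_args"],
        "--ts-args", entry["ts_args"],
        "--target-class", entry["target_class"],
        "--target-method", entry["target_method"],
        "--db-path", entry["db_path"],
        # One output directory per project avoids report collisions
        "--out-dir", f"./diagnosis_{os.path.basename(entry['project_dir'])}",
    ]

entry = {
    "jdk_home": "/opt/jdk-11",            # placeholder paths
    "project_dir": "/work/commons-lang",
    "baseline_args": "-DforkCount=1 -Dsurefire.reuseForks=true test",
    "ts_args": "-DforkCount=1 -Dsurefire.reuseForks=false test",
    "target_class": "org.example.SomeTest",
    "target_method": "someMethod",
    "db_path": "/work/dbs/commons-lang.db",
}
cmd = build_command(entry)  # the driver runs this with JAVA_HOME=entry["jdk_home"]
```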
- What it does: Searches XML coverage reports and extracts coverage metrics for a predefined set of whitelisted Java files from each project. These files represent key components that were specifically targeted for mutation-analysis coverage assessment in cases where a project was too large and the wall-time limit did not allow running the analysis on all files.
- Notes:
  - Processes both overall project coverage and file-specific coverage metrics
  - Covers the whitelisted files (e.g., `Conversion.java`, `SystemProperties.java`, `ObjectMapper.java`, etc.)
  - Outputs coverage percentages, lines of code, and element coverage counts
- Usage:

  ```bash
  python process_xmls.py <directory> [-o output.txt]
  ```

- What it does: Recursively searches for `summary_*.txt` files and extracts build, clean, and test times into a tab-separated summary file.
- Usage:
  ```bash
  python extract_times.py -d "./data" -o "extracted_times_summary.txt"
  ```

- What it does: Reads a `new_data_*.csv` file containing runs for multiple methods, computes per-run deltas vs. `original`, runs Wilcoxon signed-rank tests per project/method, prints summaries, and writes `wilcoxon_test_results.csv`.
- Usage:

  ```bash
  python wilcoxon_test_analysis.py
  ```

- What it does: Provides `VD_A` for pairwise effect sizes and `VD_A_DF` for DataFrame-based multi-group comparisons.
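For reference, the Vargha-Delaney A statistic estimates the probability that a value drawn from one group exceeds a value drawn from the other (0.5 means no difference). A minimal sketch; the `vd_a` helper below is illustrative and not the exact interface of `VD_A.py` (the Wilcoxon signed-rank test itself is typically delegated to `scipy.stats.wilcoxon`):

```python
from itertools import product

def vd_a(treatment, control):
    """Vargha-Delaney A = P(X > Y) + 0.5 * P(X == Y) over all cross-group pairs."""
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x, y in product(treatment, control))
    return wins / (len(treatment) * len(control))

# Example: run times (s) of the original configuration vs. a schemata-based run
original = [120.0, 95.0, 240.0, 60.0, 180.0]
schemata = [100.0, 90.0, 200.0, 58.0, 150.0]
print(vd_a(original, schemata))  # → 0.6 (original tends to be slower)
```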
- What it does: Loads the processed CSVs, produces the plots, and exports the figures to PGF/TikZ via `tikzplotlib` into `pgfs/`.

`plot_generation.ipynb` produces the paper figures. Ensure the dependencies are installed and the expected CSVs exist (e.g., `new_data.csv` and dated variants). The notebook reads the per-project run-time data from a CSV file; an example file is provided as `new_data_20250811.csv`. The other scripts rely on the data available in the Zenodo repository, which is not included in this repository because of its size.
This project is licensed under the MIT License. See LICENSE for details.
If you use this repository in academic work, please cite the associated paper. A BibTeX entry can be added here once available.