This is the artifact for the ICSE 2026 paper: "WhyFlow: Interrogative Debugger for Sensemaking Taint Analysis" by Burak Yetiştiren, Hong Jin Kang, and Miryung Kim.
| Badge | Justification |
|---|---|
| Available | Archived on Zenodo with DOI 10.5281/zenodo.18250071 |
| Reusable | Docker environment, documented extension guide, reusable Soufflé query templates |
- Artifact DOI: 10.5281/zenodo.18250071
- GitHub Repository: https://github.com/UCLA-SEAL/WhyFlow
- Paper: ICSE 2026 (accepted), preprint at arXiv:2508.07198
- License: MIT (see LICENSE)
The statistical_tests/ and data/ directories contain anonymized user study data from 12 participants. The study was conducted under UCLA IRB approval. No personally identifiable information is included. Storage requirement: < 10 MB.
WhyFlow is an interrogative debugging tool for taint analysis that enables developers to ask why, why-not, and what-if questions about dataflows.
WhyFlow addresses the challenge of making sense of taint analysis results by providing:
- Interrogative Debugging: Ask questions about the existence or absence of specific dataflows
- Speculative Analysis: Explore the impact of different third-party library models and configurations
- Visual Sensemaking: Graph-based visualization with color-coded annotations for global connectivity reasoning
- Interactive Q&A Interface: Template-based queries with contextualized selections for sources, sinks, and APIs
- RAM: 4 GB minimum (8 GB recommended)
- Disk: 5 GB free space (for Docker image)
- CPU: Any modern x86_64 or ARM64 processor
The easiest way to run WhyFlow is with Docker.
Requirements: Docker 20.10+ (or Docker Desktop 4.x+)
# Clone the repository
git clone https://github.com/UCLA-SEAL/WhyFlow.git
cd WhyFlow
# Build the Docker image
docker build -t whyflow .
# Run WhyFlow
docker run -p 3000:3000 whyflow
Open your browser to http://localhost:3000
Note: The first build compiles Soufflé from source for cross-platform compatibility.
To confirm WhyFlow is running correctly:
- Open http://localhost:3000 in your browser
- You should see the WhyFlow interface with the Query Options panel
- The D-SRC dropdown should populate with source nodes (e.g., "(2) msg : HttpRequest...")
- Check Docker logs for:
=> App running at: http://localhost:3000/
# View container logs
docker logs $(docker ps -q --filter ancestor=whyflow)
For development or if you prefer not to use Docker:
- Install Meteor:
  curl https://install.meteor.com/ | sh
- Install Soufflé:
  # macOS
  brew install souffle-lang/souffle/souffle
  # Ubuntu/Debian - see https://souffle-lang.github.io/build
- Install dependencies:
  cd taint_debug_app/taint_debug
  meteor npm install
- Run WhyFlow:
  cd taint_debug_app/taint_debug
  meteor run
Open your browser to http://localhost:3000
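Whichever installation route is used, the app can take a while to start listening on port 3000. The following is a minimal sketch of a readiness check, not part of the WhyFlow artifact: the helper names `http_status` and `wait_for_app`, the retry count, and the delay are all assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # nothing is listening yet, or the connection failed


def wait_for_app(url="http://localhost:3000", attempts=30, delay=2.0, fetch=http_status):
    """Poll url until it answers with HTTP 200; return True on success, False otherwise."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # hypothetical back-off; tune to your machine
    return False
```

Calling `wait_for_app()` with its defaults polls http://localhost:3000 for up to about a minute. The `fetch` parameter exists only so the polling logic can be exercised without a running server.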
WhyFlow supports six interrogative query types:
| Query | Question |
|---|---|
| WhyFlow | Why is there a taint flow from source X to sink Y? |
| WhyNotFlow | Why is there no taint flow from source X to sink Y? |
| AffectedSinks | If we alter a third-party library's model, which sinks are affected? |
| DivergentSinks | Which third-party library model could influence multiple flows from the same source? |
| DivergentSources | Which third-party library model could influence multiple flows to the same sink? |
| GlobalImpact | Which third-party library model has the largest global influence? |
Sample Queries: See replication/Experiment-Reproduction.md for concrete example queries with specific source/sink IDs that you can execute.
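To make the why, why-not, and what-if questions concrete, here is a toy reachability sketch in Python. It is not WhyFlow's implementation (the repository ships Soufflé Datalog queries for that); the graph, node names, and function names below are hypothetical and only illustrate the idea of flows that depend on third-party library models.

```python
from collections import deque

# Toy dataflow graph (all node names are hypothetical).
# Each edge is (from, to, library_model): library_model is None for ordinary
# dataflow steps, or names the third-party model the step relies on.
EDGES = [
    ("src_request", "decode", None),
    ("decode", "lib.parse", None),
    ("lib.parse", "sink_exec", "lib.parse"),
    ("decode", "lib.copy", None),
    ("lib.copy", "sink_log", "lib.copy"),
    ("src_config", "sink_log", None),
]


def find_path(source, sink, enabled_models):
    """Breadth-first search over edges whose library model (if any) is enabled."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for a, b, model in EDGES:
            if a == node and b not in parents and (model is None or model in enabled_models):
                parents[b] = node
                queue.append(b)
    return None


def why_flow(source, sink, enabled_models):
    """'Why' answer: one witness path, or None if no flow exists."""
    return find_path(source, sink, enabled_models)


def why_not_flow(source, sink, enabled_models, all_models):
    """'Why-not' answer: disabled models that, if enabled alone, would create the flow."""
    if find_path(source, sink, enabled_models) is not None:
        return []  # the flow exists, so there is nothing to explain
    return sorted(
        m for m in all_models - enabled_models
        if find_path(source, sink, enabled_models | {m}) is not None
    )


def affected_sinks(source_nodes, sink_nodes, enabled_models, model):
    """'What-if' answer: sinks whose reachability changes when `model` is toggled."""
    toggled = enabled_models ^ {model}

    def reachable(models):
        return {
            s for s in sink_nodes
            if any(find_path(src, s, models) for src in source_nodes)
        }

    return sorted(reachable(enabled_models) ^ reachable(toggled))
```

In this toy graph, `why_not_flow("src_request", "sink_log", {"lib.parse"}, {"lib.parse", "lib.copy"})` points at the disabled `lib.copy` model as the reason the flow is absent, which mirrors the kind of answer a WhyNotFlow query is meant to surface.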
- Green nodes: Sources
- Red nodes: Sinks
- Orange nodes: Third-party API calls
- Blue nodes: Other intermediate nodes
- Solid edges: Active taint flows
- Dashed edges: Plausible flows (currently blocked)
Click on any node to view the corresponding source code location.
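For readers who want to reproduce the legend outside the web interface, the sketch below renders a small graph as Graphviz DOT text using the same color and edge conventions. The input layout (a dict of node kinds and a list of edge triples) and the function name `to_dot` are assumptions for illustration; WhyFlow's own rendering code may be organized quite differently.

```python
# Styling taken from the legend above; the data layout is an assumption.
NODE_COLORS = {
    "source": "green",   # sources
    "sink": "red",       # sinks
    "api": "orange",     # third-party API calls
    "other": "blue",     # other intermediate nodes
}
EDGE_STYLES = {
    "active": "solid",      # active taint flows
    "plausible": "dashed",  # plausible flows that are currently blocked
}


def to_dot(nodes, edges):
    """Render nodes {name: kind} and edges [(src, dst, status)] as Graphviz DOT text."""
    lines = ["digraph taint {"]
    for name, kind in nodes.items():
        lines.append(f'  "{name}" [style=filled, fillcolor={NODE_COLORS[kind]}];')
    for src, dst, status in edges:
        lines.append(f'  "{src}" -> "{dst}" [style={EDGE_STYLES[status]}];')
    lines.append("}")
    return "\n".join(lines)
```

The returned text can be saved to a `.dot` file and rendered with any Graphviz front end.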
WhyFlow/
├── paper.pdf # Accepted ICSE 2026 paper
├── LICENSE # MIT License
├── README.md # This file
├── Dockerfile # Docker container definition
├── taint_debug_app/ # Main WhyFlow application
│ ├── taint_debug/ # Meteor web application
│ ├── analysis_files/ # Analysis data and fact files
│ ├── app_souffle_queries/ # Soufflé Datalog query files
│ └── souffle_output/ # Generated query outputs
├── Subject_Prog_CodeQL_Taint/ # Subject program (Apache Dubbo) and CodeQL results
├── statistical_tests/ # User study statistical analysis
├── data/ # User study materials and plots
│ ├── whyflow.csv # WhyFlow accuracy results
│ ├── codeql.csv # CodeQL accuracy results
│ └── plots.ipynb # Jupyter notebook for figures
└── replication/ # Artifact evaluation resources
- Run CodeQL taint analysis on your target program
- Export results in JSON/CSV format
- Place results in Subject_Prog_CodeQL_Taint/
- Update paths in the application configuration
See replication/Extending-WhyFlow.md for detailed instructions.
Place Soufflé Datalog query files in taint_debug_app/app_souffle_queries/
If you use WhyFlow in your research, please cite our paper:
@inproceedings{yetistiren2026whyflow,
title={WhyFlow: Interrogative Debugger for Sensemaking Taint Analysis},
author={Yetiştiren, Burak and Kang, Hong Jin and Kim, Miryung},
booktitle={Proceedings of the 48th International Conference on Software Engineering},
year={2026},
organization={ACM}
}
This project is licensed under the MIT License - see the LICENSE file for details.
- Open an issue on GitHub
- Email: [email protected]
This work is supported by the National Science Foundation under grant numbers 2426162, 2106838, and 2106404, with additional support from Amazon and Samsung.