This guide explains how to use WhyFlow with your own codebase instead of the pre-analyzed Apache Dubbo dataset.
- CodeQL CLI installed (installation guide)
- Soufflé Datalog engine installed (installation guide)
- WhyFlow running (see Experiment-Reproduction.md)
# For Java projects
codeql database create my-project-db --language=java --source-root=/path/to/project
# For other languages, replace --language=java with the appropriate language

Use the provided CodeQL query or create your own:
cd Subject_Prog_CodeQL_Taint/codeql-custom-queries-java
# Run the untrusted data to external API query
codeql query run UntrustedDataToExternalAPI.ql \
--database=/path/to/my-project-db \
--output=results.bqrs
# Convert to JSON
codeql bqrs decode results.bqrs --format=json --output=results.json

Use the provided Python script to convert CodeQL results:
cd Subject_Prog_CodeQL_Taint
python3 utils.py --input results.json --output ../taint_debug_app/analysis_files/

This generates:
- nodes.facts - All dataflow nodes
- edges.facts - Dataflow edges
- sources.facts - Source nodes
- sinks.facts - Sink nodes
- library_flow.facts - Third-party API flows
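
Before running the Soufflé queries, it can help to confirm that the conversion step produced every file listed above. The following is a minimal sketch, not part of WhyFlow: the helper name `summarize_facts` is hypothetical, and it assumes each file is plain text with one fact per non-empty line.

```python
from pathlib import Path

# File names taken from the list of generated fact files.
EXPECTED_FACTS = [
    "nodes.facts",
    "edges.facts",
    "sources.facts",
    "sinks.facts",
    "library_flow.facts",
]


def summarize_facts(facts_dir):
    """Return {file name: row count}, with None for a missing file.

    Hypothetical helper; assumes one fact per non-empty line.
    """
    summary = {}
    for name in EXPECTED_FACTS:
        path = Path(facts_dir) / name
        if not path.is_file():
            summary[name] = None
            continue
        with path.open(encoding="utf-8") as handle:
            # Count only lines that contain something besides whitespace.
            summary[name] = sum(1 for line in handle if line.strip())
    return summary
```

Calling `summarize_facts("taint_debug_app/analysis_files")` returns a dictionary that can be printed or checked for `None` entries; an empty sources.facts or sinks.facts (a count of 0) usually means the CodeQL query found nothing to report for the project.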
cd taint_debug_app
# Run all template queries (create the output directory first if it is missing)
mkdir -p souffle_output
for query in app_souffle_queries/*.dl; do
  souffle -F analysis_files -D souffle_output "$query"
done

- Update the data path in taint_debug_app/taint_debug/server/main.js:
const DATA_PATH = '/path/to/your/analysis_files';
- Restart WhyFlow:
cd taint_debug_app/taint_debug
meteor reset # Clear old data
meteor run

WhyFlow's queries are Datalog rules that can be extended. See taint_debug_app/app_souffle_queries/ for examples.
Create branch_points.dl:
.decl edge(id: number, src: number, dst: number)
.input edge
.decl branch(n: number)
.output branch
branch(n) :-
edge(_, n, _),
c = count : edge(_, n, _),
c > 1.

This query identifies nodes with more than one outgoing edge, that is, nodes where taint flow diverges to multiple targets.
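
To make the rule's semantics concrete, here is a small Python sketch of the same computation on an in-memory edge list. It is an illustration only: the function name `branch_points` and the example graph are hypothetical, and the tuples are assumed to follow the edge(id, src, dst) layout declared in branch_points.dl.

```python
from collections import defaultdict


def branch_points(edges):
    """Return nodes with more than one distinct outgoing edge.

    `edges` is an iterable of (id, src, dst) tuples, matching the
    edge(id, src, dst) declaration in branch_points.dl.
    """
    outgoing = defaultdict(set)
    for edge_id, src, dst in edges:
        # Datalog relations are sets, so duplicate tuples count once.
        outgoing[src].add((edge_id, dst))
    return {src for src, targets in outgoing.items() if len(targets) > 1}


# Hypothetical example graph: node 1 fans out to 2 and 3; node 2 has one successor.
example_edges = [(10, 1, 2), (11, 1, 3), (12, 2, 4)]
print(sorted(branch_points(example_edges)))  # prints [1]
```

Because the count ranges over (id, dst) pairs, two edges with different ids that reach the same target also mark the node as a branch point; counting distinct dst values instead would exclude that case. Note also that `.input edge` makes Soufflé read a file named edge.facts by default. If your facts directory only contains edges.facts, the directive may need a file name parameter such as `.input edge(filename="edges.facts")`; check the template queries for the relation names and column layout they actually use.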
WhyFlow works with any CodeQL-supported language:
- Java
- JavaScript/TypeScript
- Python
- C/C++
- C#
- Go
- Ruby
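Use a taint analysis query written for the target language; the provided UntrustedDataToExternalAPI.ql lives in the Java query pack (codeql-custom-queries-java) and applies to Java databases only.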
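
The display names in the list above differ from the identifiers that `codeql database create` expects for `--language`. The sketch below maps one to the other and assembles the command line; the function name `database_create_command` is hypothetical, and the database name and source path are the placeholders used earlier in this guide.

```python
import shlex

# --language identifiers for the languages listed above.
CODEQL_LANGUAGE_FLAGS = {
    "Java": "java",
    "JavaScript/TypeScript": "javascript",
    "Python": "python",
    "C/C++": "cpp",
    "C#": "csharp",
    "Go": "go",
    "Ruby": "ruby",
}


def database_create_command(language, database, source_root):
    """Build the `codeql database create` command line as a list of arguments."""
    flag = CODEQL_LANGUAGE_FLAGS[language]  # raises KeyError for unlisted languages
    return [
        "codeql", "database", "create", database,
        f"--language={flag}",
        f"--source-root={source_root}",
    ]


cmd = database_create_command("C/C++", "my-project-db", "/path/to/project")
print(shlex.join(cmd))
# prints: codeql database create my-project-db --language=cpp --source-root=/path/to/project
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Compiled languages may additionally need a build command supplied through the `--command` option when CodeQL cannot detect the build automatically.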