Extending WhyFlow to New Programs

This guide explains how to use WhyFlow with your own codebase beyond the pre-analyzed Apache Dubbo dataset.

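Prerequisites

The steps below assume that the CodeQL CLI (codeql), Python 3, Soufflé (souffle), and Meteor (meteor) are installed and available on your PATH.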

Step 1: Create CodeQL Database

# For Java projects
codeql database create my-project-db --language=java --source-root=/path/to/project

# For other languages, replace --language=java with the appropriate language

Step 2: Run Taint Analysis Query

Use the provided CodeQL query or create your own:

cd Subject_Prog_CodeQL_Taint/codeql-custom-queries-java

# Run the untrusted data to external API query
codeql query run UntrustedDataToExternalAPI.ql \
    --database=/path/to/my-project-db \
    --output=results.bqrs

# Convert to JSON
codeql bqrs decode results.bqrs --format=json --output=results.json
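
Before converting the results, it can help to confirm that the decoded file actually contains rows. The following is a minimal sketch, assuming the usual layout of JSON produced by codeql bqrs decode: an object keyed by result-set name (for example #select) whose values hold a tuples list. The helper names summarize_results and summarize_file are illustrative and not part of WhyFlow.

```python
import json


def summarize_results(decoded):
    """Return {result_set_name: row_count} for decoded BQRS JSON.

    Assumes each result set is a dict with a "tuples" list; anything
    else is reported with a count of 0.
    """
    summary = {}
    for name, result_set in decoded.items():
        if isinstance(result_set, dict):
            summary[name] = len(result_set.get("tuples", []))
        else:
            summary[name] = 0
    return summary


def summarize_file(path="results.json"):
    """Load a decoded results file and summarize it."""
    with open(path, encoding="utf-8") as handle:
        return summarize_results(json.load(handle))
```

Calling summarize_file("results.json") after Step 2 returns a dictionary of row counts; a count of 0 for every result set suggests the query found no flows, in which case the later steps would produce empty facts.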

Step 3: Convert Results to Soufflé Facts

Use the provided Python script to convert CodeQL results:

cd Subject_Prog_CodeQL_Taint
python3 utils.py --input results.json --output ../taint_debug_app/analysis_files/

This generates:

  • nodes.facts - All dataflow nodes
  • edges.facts - Dataflow edges
  • sources.facts - Source nodes
  • sinks.facts - Sink nodes
  • library_flow.facts - Third-party API flows
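
A quick check that all five files were written can catch a failed conversion before Soufflé is run. The sketch below takes the file names from the list above and assumes one fact per line, which is Soufflé's default fact-file layout; check_facts is a hypothetical helper, not something the repository provides.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FACTS = [
    "nodes.facts",
    "edges.facts",
    "sources.facts",
    "sinks.facts",
    "library_flow.facts",
]


def check_facts(facts_dir):
    """Return {file_name: row_count}, with None for a missing file."""
    report = {}
    for name in EXPECTED_FACTS:
        path = Path(facts_dir) / name
        if not path.is_file():
            report[name] = None
            continue
        with path.open(encoding="utf-8") as handle:
            # Count non-empty lines; each one is a single fact.
            report[name] = sum(1 for line in handle if line.strip())
    return report
```

For example, check_facts("../taint_debug_app/analysis_files") returns a row count per file, with None marking any file that is missing.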

Step 4: Run Soufflé Queries

cd taint_debug_app

# Create the output directory, then run all template queries
mkdir -p souffle_output
for query in app_souffle_queries/*.dl; do
    souffle -F analysis_files -D souffle_output "$query"
done
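
The shell loop above keeps going when a query fails. Where stopping at the first failure is preferable, the same step can be sketched in Python. This uses only the -F and -D options shown above; the function names build_souffle_commands and run_all are illustrative, and the default directory names are the ones used in this guide.

```python
import subprocess
from pathlib import Path


def build_souffle_commands(query_dir, facts_dir, output_dir):
    """Return one souffle argument list per .dl file, in sorted order."""
    return [
        ["souffle", "-F", str(facts_dir), "-D", str(output_dir), str(query)]
        for query in sorted(Path(query_dir).glob("*.dl"))
    ]


def run_all(query_dir="app_souffle_queries", facts_dir="analysis_files",
            output_dir="souffle_output"):
    """Run every query, stopping at the first failure."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for command in build_souffle_commands(query_dir, facts_dir, output_dir):
        # check=True raises CalledProcessError if souffle exits non-zero.
        subprocess.run(command, check=True)
```

Run from inside taint_debug_app, run_all() mirrors the loop; separating the command construction from execution makes it easy to print the commands first as a dry run.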

Step 5: Load into WhyFlow

  1. Update the data path in taint_debug_app/taint_debug/server/main.js:
const DATA_PATH = '/path/to/your/analysis_files';
  2. Restart WhyFlow:
cd taint_debug_app/taint_debug
meteor reset  # Clear old data
meteor run

Adding Custom Template Queries

WhyFlow's queries are Datalog rules that can be extended. See taint_debug_app/app_souffle_queries/ for examples.

Example: Adding a "Branch Points" Query

Create branch_points.dl:

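// Dataflow edges, read from edges.facts in the directory passed with -F
// (a bare ".input edge" would look for edge.facts instead).
.decl edge(id: number, src: number, dst: number)
.input edge(filename="edges.facts")

// Nodes with more than one outgoing edge.
.decl branch(n: number)
.output branch

branch(n) :-
    edge(_, n, _),
    c = count : edge(_, n, _),
    c > 1.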

This query reports every node that has more than one outgoing edge, that is, every node where taint flow diverges to multiple targets.
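
To sanity-check the rule on a small graph before running it on real facts, the same logic can be written in a few lines of Python. This is a sketch for cross-checking only; branch_points and the example edges are hypothetical and assume the (id, src, dst) column order declared for the edge relation.

```python
from collections import Counter


def branch_points(edges):
    """Return the sorted source nodes that have more than one outgoing edge.

    `edges` is an iterable of (edge_id, src, dst) tuples, matching the
    columns declared for the edge relation.
    """
    # Deduplicate first: a Datalog relation is a set of tuples.
    outgoing = Counter(src for _, src, _ in set(edges))
    return sorted(node for node, count in outgoing.items() if count > 1)


# Hypothetical graph: node 1 flows to 2 and 3; node 2 flows to 4.
example_edges = [(0, 1, 2), (1, 1, 3), (2, 2, 4)]
print(branch_points(example_edges))  # [1]
```

Only node 1 is reported, because it is the only source that appears in more than one edge.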

Supported Languages

WhyFlow works with any CodeQL-supported language:

  • Java
  • JavaScript/TypeScript
  • Python
  • C/C++
  • C#
  • Go
  • Ruby

Use a taint analysis query written for the target language; the provided UntrustedDataToExternalAPI.ql is in codeql-custom-queries-java and targets Java.
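
When scripting Step 1 for several projects, the language names above have to be turned into the identifiers that the --language option expects. The mapping below is a sketch based on the identifiers commonly accepted by the CodeQL CLI; treat it as an assumption and confirm it against the CLI version you have installed. The function name database_create_command is illustrative.

```python
# Assumed CodeQL CLI identifiers for the languages listed above.
CODEQL_LANGUAGE_IDS = {
    "Java": "java",
    "JavaScript/TypeScript": "javascript",
    "Python": "python",
    "C/C++": "cpp",
    "C#": "csharp",
    "Go": "go",
    "Ruby": "ruby",
}


def database_create_command(language, database, source_root):
    """Build the Step 1 command line for the given language name."""
    language_id = CODEQL_LANGUAGE_IDS[language]  # KeyError if unlisted
    return [
        "codeql", "database", "create", database,
        f"--language={language_id}",
        f"--source-root={source_root}",
    ]
```

For instance, database_create_command("Java", "my-project-db", "/path/to/project") reproduces the command from Step 1 as an argument list suitable for subprocess.run.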