ChromeSecBench

This repository contains the replication package for the paper When AI Takes the Wheel: Security Analysis of Framework-Constrained Program Generation. It provides tools for generating Chrome extensions using various Large Language Models (LLMs) and analyzing their security properties.

Paper Abstract

In recent years, the AI wave has grown rapidly in software development. Even novice developers can now design and generate complex framework-constrained software systems based on their high-level requirements with the help of Large Language Models (LLMs). However, when LLMs gradually "take the wheel" of software development, developers may only check whether the program works. They often miss security problems hidden in how the generated programs are implemented.

In this work, we investigate the security properties of framework-constrained programs generated by state-of-the-art LLMs. We focus specifically on Chrome extensions due to their complex security model involving multiple privilege boundaries and isolated components. To achieve this, we built ChromeSecBench, a dataset with 140 prompts based on known vulnerable extensions. We used these prompts to instruct nine state-of-the-art LLMs to generate complete Chrome extensions, and then analyzed them for vulnerabilities across three dimensions: scenario types, model differences, and vulnerability categories. Our results show that LLMs produced vulnerable programs at alarmingly high rates (18%-50%), particularly in Authentication & Identity and Cookie Management scenarios (up to 83% and 78% respectively). Most vulnerabilities exposed sensitive browser data like cookies, history, or bookmarks to untrusted code. Interestingly, we found that advanced reasoning models performed worse, generating more vulnerabilities than simpler models. These findings highlight a critical gap between LLMs' coding skills and their ability to write secure framework-constrained programs.

Setup

Prerequisites

  • Python 3.7+
  • Required Python packages (install via pip install -r requirements.txt):
    • openai
    • vertexai

  Note: concurrent.futures and pathlib are part of the Python standard library and do not need to be installed separately.

API Key Configuration

Before running the tool, you need to set up API keys for the AI models you want to use:

  1. OpenAI API Key:

    • Edit code/generation/chats/chat_openai.py
    • Replace "Input your api key here" with your OpenAI API key
  2. Claude API Key:

    • Edit code/generation/chats/chat_claude.py
    • Replace "Input your api key here" with your Claude API key
  3. DeepSeek API Key:

    • Edit code/generation/chats/chat_deepseek.py
    • Replace "Input your api key here" with your DeepSeek API key
  4. Vertex AI Configuration:

    • Edit code/generation/chats/chat_vertex.py
    • Replace VERTEX_PROJECT_ID and VERTEX_LOCATION with your Google Cloud project ID and location
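Rather than committing keys into the chat files, a common safer pattern is to read them from environment variables. The helper below is a hypothetical sketch, not part of this repository; the placeholder string matches the one the chat files use, but the environment variable names are assumptions.

```python
# Hypothetical helper: load an API key from the environment instead of
# hardcoding it in chat_openai.py / chat_claude.py / chat_deepseek.py.
# The environment variable names are illustrative assumptions.
import os

def load_api_key(env_var: str, placeholder: str = "Input your api key here") -> str:
    """Return the key from the environment, failing loudly if it is unset."""
    key = os.environ.get(env_var, placeholder)
    if key == placeholder:
        raise RuntimeError(f"Set the {env_var} environment variable before running the tool.")
    return key

# Example: key = load_api_key("OPENAI_API_KEY")
```

This keeps credentials out of version control while leaving the chat files' call sites unchanged.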

Usage

The main script for generating extensions is generation_exts.py. Here's how to use it:

python code/generation/generation_exts.py <base_path> <output_path> [options]

Parameters

  • base_path: Directory containing the extension datasets (e.g., /path/to/datasets)
  • output_path: Directory where generated extensions will be saved

Options

  • --chat-type: AI model provider to use (choices: vertex, openai, deepseek, azure, claude; default: vertex)
  • --model-name: Specific model name to use (see model enums in respective chat files)
  • --workers: Number of worker processes for parallel generation (default: 2)
  • --temperature: Temperature for response generation (0-1, default: 1.0)
  • --attempts: Number of generation attempts per extension (default: 12)
  • --ext-id: Process only a specific extension ID (e.g., AUTH01)
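The options above can be mirrored in an argparse parser like the sketch below. This is an illustration of the documented interface, not the actual argument handling in generation_exts.py, which may differ.

```python
# Illustrative CLI parser matching the options documented above.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Generate Chrome extensions with LLMs")
    parser.add_argument("base_path", help="Directory containing the extension datasets")
    parser.add_argument("output_path", help="Directory for generated extensions")
    parser.add_argument("--chat-type", default="vertex",
                        choices=["vertex", "openai", "deepseek", "azure", "claude"],
                        help="AI model provider to use")
    parser.add_argument("--model-name", help="Specific model name (see chat file enums)")
    parser.add_argument("--workers", type=int, default=2,
                        help="Worker processes for parallel generation")
    parser.add_argument("--temperature", type=float, default=1.0,
                        help="Temperature for response generation (0-1)")
    parser.add_argument("--attempts", type=int, default=12,
                        help="Generation attempts per extension")
    parser.add_argument("--ext-id", help="Process only a specific extension ID")
    return parser
```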

Examples

Generate extensions using Vertex AI (default):

python code/generation/generation_exts.py datasets/ output/ --temperature 0.7

Generate extensions using Claude:

python code/generation/generation_exts.py datasets/ output/ --chat-type claude --model-name CLAUDE_35_SONNET

Generate a single extension with OpenAI:

python code/generation/generation_exts.py datasets/ output/ --chat-type openai --ext-id AUTH01

Security Analysis with CoCo

After generating extensions, you can analyze them for security vulnerabilities using CoCo, a tool that utilizes Coverage-guided, Concurrent Abstract Interpretation as described in our paper.

Setting Up CoCo

  1. Navigate to the evaluation directory and clone the CoCo repository:

    cd code/evaluation
    git clone https://github.com/Suuuuuzy/CoCo.git
  2. Set up the CoCo environment:

    cd CoCo
    chmod +x install.sh
    ./install.sh

Running Security Analysis

After setting up CoCo, you can use the runing_analysis.py script to analyze the generated extensions:

cd ../  # Return to the code/evaluation directory
python runing_analysis.py <base_path> --processes <num_processes>

Parameters

  • base_path: Directory containing the generated extensions (e.g., ../../output/)
  • --processes: Number of parallel processes to use (default: 8)

Example

python runing_analysis.py ../../output/AUTH01/GEMINI_15_PRO_002/ --processes 4

This will:

  1. Find all valid extension implementations in the specified directory
  2. Run CoCo analysis on each implementation in parallel
  3. Display progress information and time estimates
  4. Provide a summary of successful and failed analyses
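The parallel loop described in the steps above can be sketched as follows. This is a simplified stand-in, assuming a per-directory analysis function; the real runing_analysis.py invokes CoCo, whereas analyze() here is a hypothetical stub.

```python
# Minimal sketch of the parallel analysis loop: submit one job per
# implementation directory and collect successes/failures.
from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path

def analyze(impl_dir: str) -> bool:
    """Hypothetical stub for a single CoCo run; returns True on success."""
    return Path(impl_dir, "manifest.json").exists()

def run_all(impl_dirs, processes: int = 8):
    successes, failures = [], []
    with ProcessPoolExecutor(max_workers=processes) as pool:
        futures = {pool.submit(analyze, d): d for d in impl_dirs}
        for done in as_completed(futures):
            (successes if done.result() else failures).append(futures[done])
    return successes, failures
```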

The analysis results will be stored in each extension's directory under opgen_generated_files/used_time.txt.

Results

The results directory contains our experimental findings, including:

  • paper_results.json: A comprehensive collection of vulnerability data from our experiments, organized by model and extension scenario. This file includes information about which attempts produced vulnerable extensions and the specific vulnerable data flows identified in each case.

These results demonstrate that all evaluated LLMs produce vulnerable Chrome extensions at concerning rates (29-50% of scenarios), with privileged storage access being the most prevalent vulnerability type across all models.

Dataset Structure

The datasets directory contains multiple Chrome extension specifications, each in its own folder:

  • AUTH01-AUTH06: Authentication & Identity - Managing user credentials, authentication tokens, and identity verification
  • BHT01-BHT07: Bookmark/History/Tab Management - Improving browser functionality by managing tabs, bookmarks, and history
  • CM01-CM16: Content Manipulation - Modifying webpage content, injecting functionality, or enhancing web interfaces
  • COOK01-COOK09: Cookie Management - Handling browser cookies, reading, writing, or manipulating cookie data
  • DH01-DH22: Data Handling - Processing, transforming, or synchronizing data across platforms or services
  • DM01-DM14: Download Management - Handling file downloads within the browser
  • DT01-DT08: Developer Tools - Providing additional functionality for web developers
  • EC01-EC11: External Communication - Integrating extensions with external websites, services, or APIs
  • ESI01-ESI08: External System Integration - Connecting the browser with external hardware, software, or specialized services
  • WA01-WA12: Workflow Automation - Automating repetitive tasks, form filling, or business processes

Each extension folder contains:

  • description.txt: A brief description of the extension's functionality
  • manifest.json: The Chrome extension manifest file

The tool uses these files to generate complete implementations of the extensions.
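One way to turn those two files into a generation prompt is sketched below. The actual prompt template used by the tool is not shown in this README, so the wording here is an assumption for illustration only.

```python
# Hypothetical sketch: assemble a generation prompt from the two files in a
# dataset folder. The real template in the tool may differ.
import json
from pathlib import Path

def build_prompt(ext_dir: str) -> str:
    description = Path(ext_dir, "description.txt").read_text().strip()
    manifest = json.loads(Path(ext_dir, "manifest.json").read_text())
    return (
        "Implement a complete Chrome extension.\n"
        f"Functionality: {description}\n"
        f"Use exactly this manifest:\n{json.dumps(manifest, indent=2)}"
    )
```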

Output Structure

Generated extensions are saved in the specified output directory with the following structure:

output/
├── AUTH01/
│   └── MODEL_NAME/
│       ├── prompt.txt
│       ├── 1/
│       │   ├── response.txt
│       │   ├── background.js
│       │   ├── content_scripts.js
│       │   └── manifest.json
│       ├── 2/
│       │   └── ...
│       └── ...
├── AUTH02/
│   └── ...
└── ...

Each numbered folder (1, 2, etc.) represents a different generation attempt with the same parameters.

Notes

  • Higher temperature values (closer to 1.0) produce more diverse implementations
  • Lower temperature values (closer to 0.0) produce more deterministic implementations
  • The tool automatically skips already generated extensions to allow for resuming interrupted runs
  • For best results, use models with strong code generation capabilities
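The resume behaviour noted above (skipping already generated extensions) can be sketched as a check against the numbered attempt folders in the output structure. The exact check the tool performs is an assumption here.

```python
# Sketch of resume logic: yield only attempt numbers whose output folder
# (output/<EXT_ID>/<MODEL_NAME>/<n>/) does not exist yet.
from pathlib import Path

def pending_attempts(model_dir: str, attempts: int = 12):
    """Yield attempt numbers that still need to be generated."""
    for i in range(1, attempts + 1):
        if not Path(model_dir, str(i)).is_dir():
            yield i
```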
