ChromeSecBench

This repository contains the replication package for the paper When AI Takes the Wheel: Security Analysis of Framework-Constrained Program Generation. It provides tools for generating Chrome extensions using various Large Language Models (LLMs) and analyzing their security properties.

Paper Abstract

In recent years, the AI wave has grown rapidly in software development. Even novice developers can now design and generate complex framework-constrained software systems based on their high-level requirements with the help of Large Language Models (LLMs). However, when LLMs gradually "take the wheel" of software development, developers may only check whether the program works. They often miss security problems hidden in how the generated programs are implemented.

In this work, we investigate the security properties of framework-constrained programs generated by state-of-the-art LLMs. We focus specifically on Chrome extensions due to their complex security model involving multiple privilege boundaries and isolated components. To achieve this, we built ChromeSecBench, a dataset with 140 prompts based on known vulnerable extensions. We used these prompts to instruct nine state-of-the-art LLMs to generate complete Chrome extensions, and then analyzed them for vulnerabilities across three dimensions: scenario types, model differences, and vulnerability categories. Our results show that LLMs produced vulnerable programs at alarmingly high rates (18%-50%), particularly in Authentication & Identity and Cookie Management scenarios (up to 83% and 78% respectively). Most vulnerabilities exposed sensitive browser data like cookies, history, or bookmarks to untrusted code. Interestingly, we found that advanced reasoning models performed worse, generating more vulnerabilities than simpler models. These findings highlight a critical gap between LLMs' coding skills and their ability to write secure framework-constrained programs.

Setup

Prerequisites

  • Python 3.7+
  • Required Python packages (install via pip install -r requirements.txt):
    • openai
    • vertexai

  Note: concurrent.futures and pathlib are part of the Python standard library and do not need to be installed separately.

API Key Configuration

Before running the tool, you need to set up API keys for the AI models you want to use:

  1. OpenAI API Key:

    • Edit code/generation/chats/chat_openai.py
    • Replace "Input your api key here" with your OpenAI API key
  2. Claude API Key:

    • Edit code/generation/chats/chat_claude.py
    • Replace "Input your api key here" with your Claude API key
  3. DeepSeek API Key:

    • Edit code/generation/chats/chat_deepseek.py
    • Replace "Input your api key here" with your DeepSeek API key
  4. Vertex AI Configuration:

    • Edit code/generation/chats/chat_vertex.py
    • Replace VERTEX_PROJECT_ID and VERTEX_LOCATION with your Google Cloud project ID and location
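Rather than committing keys into the chat files, a common safer pattern is to read them from environment variables. The helper below is a hypothetical sketch, not part of this repository; the placeholder string matches the one the chat files use, but the environment variable names are assumptions.

```python
# Hypothetical helper: load an API key from the environment instead of
# hardcoding it in chat_openai.py / chat_claude.py / chat_deepseek.py.
# The environment variable names are illustrative assumptions.
import os

def load_api_key(env_var: str, placeholder: str = "Input your api key here") -> str:
    """Return the key from the environment, failing loudly if it is unset."""
    key = os.environ.get(env_var, placeholder)
    if key == placeholder:
        raise RuntimeError(f"Set the {env_var} environment variable before running the tool.")
    return key

# Example: key = load_api_key("OPENAI_API_KEY")
```

This keeps credentials out of version control while leaving the chat files' call sites unchanged.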

Usage

The main script for generating extensions is generation_exts.py. Here's how to use it:

python code/generation/generation_exts.py <base_path> <output_path> [options]

Parameters

  • base_path: Directory containing the extension datasets (e.g., /path/to/datasets)
  • output_path: Directory where generated extensions will be saved

Options

  • --chat-type: AI model provider to use (choices: vertex, openai, deepseek, azure, claude; default: vertex)
  • --model-name: Specific model name to use (see model enums in respective chat files)
  • --workers: Number of worker processes for parallel generation (default: 2)
  • --temperature: Temperature for response generation (0-1, default: 1.0)
  • --attempts: Number of generation attempts per extension (default: 12)
  • --ext-id: Process only a specific extension ID (e.g., AUTH01)
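The options above can be mirrored in an argparse parser like the sketch below. This is an illustration of the documented interface, not the actual argument handling in generation_exts.py, which may differ.

```python
# Illustrative CLI parser matching the options documented above.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Generate Chrome extensions with LLMs")
    parser.add_argument("base_path", help="Directory containing the extension datasets")
    parser.add_argument("output_path", help="Directory for generated extensions")
    parser.add_argument("--chat-type", default="vertex",
                        choices=["vertex", "openai", "deepseek", "azure", "claude"],
                        help="AI model provider to use")
    parser.add_argument("--model-name", help="Specific model name (see chat file enums)")
    parser.add_argument("--workers", type=int, default=2,
                        help="Worker processes for parallel generation")
    parser.add_argument("--temperature", type=float, default=1.0,
                        help="Temperature for response generation (0-1)")
    parser.add_argument("--attempts", type=int, default=12,
                        help="Generation attempts per extension")
    parser.add_argument("--ext-id", help="Process only a specific extension ID")
    return parser
```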

Examples

Generate extensions using Vertex AI (default):

python code/generation/generation_exts.py datasets/ output/ --temperature 0.7

Generate extensions using Claude:

python code/generation/generation_exts.py datasets/ output/ --chat-type claude --model-name CLAUDE_35_SONNET

Generate a single extension with OpenAI:

python code/generation/generation_exts.py datasets/ output/ --chat-type openai --ext-id AUTH01

Security Analysis with CoCo

After generating extensions, you can analyze them for security vulnerabilities using CoCo, a tool that utilizes Coverage-guided, Concurrent Abstract Interpretation as described in our paper.

Setting Up CoCo

  1. Navigate to the evaluation directory and clone the CoCo repository:

    cd code/evaluation
    git clone https://github.com/Suuuuuzy/CoCo.git
  2. Set up the CoCo environment:

    cd CoCo
    chmod +x install.sh
    ./install.sh

Running Security Analysis

After setting up CoCo, you can use the runing_analysis.py script to analyze the generated extensions:

cd ../  # Return to the code/evaluation directory
python runing_analysis.py <base_path> --processes <num_processes>

Parameters

  • base_path: Directory containing the generated extensions (e.g., ../../output/)
  • --processes: Number of parallel processes to use (default: 8)

Example

python runing_analysis.py ../../output/AUTH01/GEMINI_15_PRO_002/ --processes 4

This will:

  1. Find all valid extension implementations in the specified directory
  2. Run CoCo analysis on each implementation in parallel
  3. Display progress information and time estimates
  4. Provide a summary of successful and failed analyses
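The parallel loop described in the steps above can be sketched as follows. This is a simplified stand-in, assuming a per-directory analysis function; the real runing_analysis.py invokes CoCo, whereas analyze() here is a hypothetical stub.

```python
# Minimal sketch of the parallel analysis loop: submit one job per
# implementation directory and collect successes/failures.
from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path

def analyze(impl_dir: str) -> bool:
    """Hypothetical stub for a single CoCo run; returns True on success."""
    return Path(impl_dir, "manifest.json").exists()

def run_all(impl_dirs, processes: int = 8):
    successes, failures = [], []
    with ProcessPoolExecutor(max_workers=processes) as pool:
        futures = {pool.submit(analyze, d): d for d in impl_dirs}
        for done in as_completed(futures):
            (successes if done.result() else failures).append(futures[done])
    return successes, failures
```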

The analysis results will be stored in each extension's directory under opgen_generated_files/used_time.txt.

Results

The results directory contains our experimental findings, including:

  • paper_results.json: A comprehensive collection of vulnerability data from our experiments, organized by model and extension scenario. This file includes information about which attempts produced vulnerable extensions and the specific vulnerable data flows identified in each case.

These results demonstrate that all evaluated LLMs produce vulnerable Chrome extensions at concerning rates (29-50% of scenarios), with privileged storage access being the most prevalent vulnerability type across all models.

Dataset Structure

The datasets directory contains multiple Chrome extension specifications, each in its own folder:

  • AUTH01-AUTH06: Authentication & Identity - Managing user credentials, authentication tokens, and identity verification
  • BHT01-BHT07: Bookmark/History/Tab Management - Improving browser functionality by managing tabs, bookmarks, and history
  • CM01-CM16: Content Manipulation - Modifying webpage content, injecting functionality, or enhancing web interfaces
  • COOK01-COOK09: Cookie Management - Handling browser cookies, reading, writing, or manipulating cookie data
  • DH01-DH22: Data Handling - Processing, transforming, or synchronizing data across platforms or services
  • DM01-DM14: Download Management - Handling file downloads within the browser
  • DT01-DT08: Developer Tools - Providing additional functionality for web developers
  • EC01-EC11: External Communication - Integrating extensions with external websites, services, or APIs
  • ESI01-ESI08: External System Integration - Connecting the browser with external hardware, software, or specialized services
  • WA01-WA12: Workflow Automation - Automating repetitive tasks, form filling, or business processes

Each extension folder contains:

  • description.txt: A brief description of the extension's functionality
  • manifest.json: The Chrome extension manifest file

The tool uses these files to generate complete implementations of the extensions.
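One way to turn those two files into a generation prompt is sketched below. The actual prompt template used by the tool is not shown in this README, so the wording here is an assumption for illustration only.

```python
# Hypothetical sketch: assemble a generation prompt from the two files in a
# dataset folder. The real template in the tool may differ.
import json
from pathlib import Path

def build_prompt(ext_dir: str) -> str:
    description = Path(ext_dir, "description.txt").read_text().strip()
    manifest = json.loads(Path(ext_dir, "manifest.json").read_text())
    return (
        "Implement a complete Chrome extension.\n"
        f"Functionality: {description}\n"
        f"Use exactly this manifest:\n{json.dumps(manifest, indent=2)}"
    )
```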

Output Structure

Generated extensions are saved in the specified output directory with the following structure:

output/
├── AUTH01/
│   └── MODEL_NAME/
│       ├── prompt.txt
│       ├── 1/
│       │   ├── response.txt
│       │   ├── background.js
│       │   ├── content_scripts.js
│       │   └── manifest.json
│       ├── 2/
│       │   └── ...
│       └── ...
├── AUTH02/
│   └── ...
└── ...

Each numbered folder (1, 2, etc.) represents a different generation attempt with the same parameters.

Notes

  • Higher temperature values (closer to 1.0) produce more diverse implementations
  • Lower temperature values (closer to 0.0) produce more deterministic implementations
  • The tool automatically skips already generated extensions to allow for resuming interrupted runs
  • For best results, use models with strong code generation capabilities
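The resume behaviour noted above (skipping already generated extensions) can be sketched as a check against the numbered attempt folders in the output structure. The exact check the tool performs is an assumption here.

```python
# Sketch of resume logic: yield only attempt numbers whose output folder
# (output/<EXT_ID>/<MODEL_NAME>/<n>/) does not exist yet.
from pathlib import Path

def pending_attempts(model_dir: str, attempts: int = 12):
    """Yield attempt numbers that still need to be generated."""
    for i in range(1, attempts + 1):
        if not Path(model_dir, str(i)).is_dir():
            yield i
```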
