This repository contains the replication package for the paper *When AI Takes the Wheel: Security Analysis of Framework-Constrained Program Generation*. It provides tools for generating Chrome extensions using various Large Language Models (LLMs) and analyzing their security properties.
In recent years, AI has swept through software development. With the help of Large Language Models (LLMs), even novice developers can now design and generate complex framework-constrained software systems from high-level requirements. However, as LLMs gradually "take the wheel" of software development, developers may only check whether a program works, often missing security problems hidden in how the generated programs are implemented.
In this work, we investigate the security properties of framework-constrained programs generated by state-of-the-art LLMs. We focus specifically on Chrome extensions due to their complex security model involving multiple privilege boundaries and isolated components. To achieve this, we built ChromeSecBench, a dataset with 140 prompts based on known vulnerable extensions. We used these prompts to instruct nine state-of-the-art LLMs to generate complete Chrome extensions, and then analyzed them for vulnerabilities across three dimensions: scenario types, model differences, and vulnerability categories. Our results show that LLMs produced vulnerable programs at alarmingly high rates (18%-50%), particularly in Authentication & Identity and Cookie Management scenarios (up to 83% and 78% respectively). Most vulnerabilities exposed sensitive browser data like cookies, history, or bookmarks to untrusted code. Interestingly, we found that advanced reasoning models performed worse, generating more vulnerabilities than simpler models. These findings highlight a critical gap between LLMs' coding skills and their ability to write secure framework-constrained programs.
- Python 3.7+
- Required Python packages (install via `pip install -r requirements.txt`):
  - `openai`
  - `vertexai`
  - `concurrent.futures` and `pathlib` (these two ship with the Python standard library and need no separate installation)
Before running the tool, you need to set up API keys for the AI models you want to use:

- **OpenAI API Key**: Edit `code/generation/chats/chat_openai.py` and replace `"Input your api key here"` with your OpenAI API key.
- **Claude API Key**: Edit `code/generation/chats/chat_claude.py` and replace `"Input your api key here"` with your Claude API key.
- **DeepSeek API Key**: Edit `code/generation/chats/chat_deepseek.py` and replace `"Input your api key here"` with your DeepSeek API key.
- **Vertex AI Configuration**: Edit `code/generation/chats/chat_vertex.py` and replace `VERTEX_PROJECT_ID` and `VERTEX_LOCATION` with your Google Cloud project ID and location.
The main script for generating extensions is `generation_exts.py`. Here's how to use it:

```shell
python code/generation/generation_exts.py <base_path> <output_path> [options]
```

Positional arguments:

- `base_path`: Directory containing the extension datasets (e.g., `/path/to/datasets`)
- `output_path`: Directory where generated extensions will be saved

Options:

- `--chat-type`: AI model provider to use (choices: `vertex`, `openai`, `deepseek`, `azure`, `claude`; default: `vertex`)
- `--model-name`: Specific model name to use (see the model enums in the respective chat files)
- `--workers`: Number of worker processes for parallel generation (default: 2)
- `--temperature`: Temperature for response generation (0-1, default: 1.0)
- `--attempts`: Number of generation attempts per extension (default: 12)
- `--ext-id`: Process only a specific extension ID (e.g., `AUTH01`)
Generate extensions using Vertex AI (default):

```shell
python code/generation/generation_exts.py datasets/ output/ --temperature 0.7
```

Generate extensions using Claude:

```shell
python code/generation/generation_exts.py datasets/ output/ --chat-type claude --model-name CLAUDE_35_SONNET
```

Generate a single extension with OpenAI:

```shell
python code/generation/generation_exts.py datasets/ output/ --chat-type openai --ext-id AUTH01
```

After generating extensions, you can analyze them for security vulnerabilities using CoCo, a tool that utilizes Coverage-guided, Concurrent Abstract Interpretation as described in our paper.
1. Navigate to the evaluation directory and clone the CoCo repository:

   ```shell
   cd code/evaluation
   git clone https://github.com/Suuuuuzy/CoCo.git
   ```

2. Set up the CoCo environment:

   ```shell
   cd CoCo
   chmod +x install.sh
   ./install.sh
   ```
After setting up CoCo, you can use the `runing_analysis.py` script to analyze the generated extensions:

```shell
cd ../  # Return to the code/evaluation directory
python runing_analysis.py <base_path> --processes <num_processes>
```

- `base_path`: Directory containing the generated extensions (e.g., `../../output/`)
- `--processes`: Number of parallel processes to use (default: 8)
For example:

```shell
python runing_analysis.py ../../output/AUTH01/GEMINI_15_PRO_002/ --processes 4
```

This will:
- Find all valid extension implementations in the specified directory
- Run CoCo analysis on each implementation in parallel
- Display progress information and time estimates
- Provide a summary of successful and failed analyses
The analysis results will be stored in each extension's directory under `opgen_generated_files/used_time.txt`.
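To track how far a long analysis run has progressed, you can count which generation attempts already have this marker file. A minimal sketch, assuming the output layout documented in this README (`count_analyzed` is an illustrative helper, not part of the repository):

```python
# Count how many generation attempts have a completed CoCo analysis,
# using opgen_generated_files/used_time.txt as the completion marker.
from pathlib import Path

def count_analyzed(base_path):
    """Return (analyzed, total) attempt counts under base_path."""
    analyzed, total = 0, 0
    # Each attempt lives at <scenario>/<model>/<attempt>/manifest.json.
    for manifest in Path(base_path).glob("*/*/*/manifest.json"):
        total += 1
        marker = manifest.parent / "opgen_generated_files" / "used_time.txt"
        if marker.exists():
            analyzed += 1
    return analyzed, total
```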
The results directory contains our experimental findings, including:
`paper_results.json`: A comprehensive collection of vulnerability data from our experiments, organized by model and extension scenario. This file includes information about which attempts produced vulnerable extensions and the specific vulnerable data flows identified in each case.
These results demonstrate that all evaluated LLMs produce vulnerable Chrome extensions at concerning rates (29-50% of scenarios), with privileged storage access being the most prevalent vulnerability type across all models.
The datasets directory contains multiple Chrome extension specifications, each in its own folder:
- `AUTH01`-`AUTH06`: Authentication & Identity - Managing user credentials, authentication tokens, and identity verification
- `BHT01`-`BHT07`: Bookmark/History/Tab Management - Improving browser functionality by managing tabs, bookmarks, and history
- `CM01`-`CM16`: Content Manipulation - Modifying webpage content, injecting functionality, or enhancing web interfaces
- `COOK01`-`COOK09`: Cookie Management - Handling browser cookies, reading, writing, or manipulating cookie data
- `DH01`-`DH22`: Data Handling - Processing, transforming, or synchronizing data across platforms or services
- `DM01`-`DM14`: Download Management - Handling file downloads within the browser
- `DT01`-`DT08`: Developer Tools - Providing additional functionality for web developers
- `EC01`-`EC11`: External Communication - Integrating extensions with external websites, services, or APIs
- `ESI01`-`ESI08`: External System Integration - Connecting the browser with external hardware, software, or specialized services
- `WA01`-`WA12`: Workflow Automation - Automating repetitive tasks, form filling, or business processes
Each extension folder contains:
- `description.txt`: A brief description of the extension's functionality
- `manifest.json`: The Chrome extension manifest file
The tool uses these files to generate complete implementations of the extensions.
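As a rough illustration of that flow (this is a hypothetical sketch, not the repository's actual prompt builder; the function name and prompt wording are assumptions), a dataset folder's two files might be combined into a generation prompt like so:

```python
# Hypothetical sketch: combine a dataset folder's description.txt and
# manifest.json into a single generation prompt for an LLM.
import json
from pathlib import Path

def build_prompt(dataset_dir):
    dataset_dir = Path(dataset_dir)
    description = (dataset_dir / "description.txt").read_text()
    manifest = json.loads((dataset_dir / "manifest.json").read_text())
    return (
        "Implement a complete Chrome extension with this functionality:\n"
        f"{description}\n\n"
        "Use this manifest:\n"
        f"{json.dumps(manifest, indent=2)}"
    )
```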
Generated extensions are saved in the specified output directory with the following structure:
```
output/
├── AUTH01/
│   └── MODEL_NAME/
│       ├── prompt.txt
│       ├── 1/
│       │   ├── response.txt
│       │   ├── background.js
│       │   ├── content_scripts.js
│       │   └── manifest.json
│       ├── 2/
│       │   └── ...
│       └── ...
├── AUTH02/
│   └── ...
└── ...
```
Each numbered folder (1, 2, etc.) represents a different generation attempt with the same parameters.
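The tree above can be enumerated programmatically, e.g., to see which attempts produced a loadable extension. A minimal sketch, assuming the layout shown here (`list_attempts` is an illustrative helper; treating "has a `manifest.json`" as the completeness criterion is an assumption):

```python
# Enumerate generation attempts in the output tree and flag which ones
# include a manifest.json (i.e., could be loaded as a Chrome extension).
from pathlib import Path

def list_attempts(output_dir):
    """Map (scenario, model) -> sorted list of (attempt, has_manifest)."""
    attempts = {}
    # Attempt folders are numbered directories three levels deep.
    for attempt_dir in Path(output_dir).glob("*/*/[0-9]*"):
        if not attempt_dir.is_dir():
            continue
        scenario, model = attempt_dir.parts[-3], attempt_dir.parts[-2]
        has_manifest = (attempt_dir / "manifest.json").is_file()
        attempts.setdefault((scenario, model), []).append(
            (int(attempt_dir.name), has_manifest)
        )
    for key in attempts:
        attempts[key].sort()
    return attempts
```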
- Higher temperature values (closer to 1.0) produce more diverse implementations
- Lower temperature values (closer to 0.0) produce more deterministic implementations
- The tool automatically skips already generated extensions to allow for resuming interrupted runs
- For best results, use models with strong code generation capabilities
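The resume behaviour noted above amounts to a simple existence check per attempt. A sketch of that logic (using `response.txt` as the skip marker is an assumption; the actual script may use a different criterion):

```python
# Decide whether an attempt directory still needs generation; attempts
# with a saved response from a previous run are skipped on resume.
from pathlib import Path

def needs_generation(attempt_dir):
    """True if no previous run saved a response in this attempt folder."""
    return not (Path(attempt_dir) / "response.txt").exists()
```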