maximehu97/Awesome-GUI-Agent-Safety

πŸ›‘οΈ Awesome GUI Agent Safety

Awesome PRs Welcome License: MIT

A curated collection of research papers on GUI Agent Safety, covering benchmarks, attacks, defenses, and evaluation frameworks.

πŸ“š Papers by Environment β€’ πŸ”‘ Papers by Keywords β€’ πŸ‘₯ Papers by Authors β€’ πŸ“ All Papers β€’ βž• Contributing

Overview

This repository covers a variety of papers related to GUI Agent Safety, including:

  • πŸ”’ Security & Privacy: Attacks, defenses, and privacy-preserving techniques
  • πŸ“Š Benchmarks & Datasets: Evaluation frameworks and safety benchmarks
  • πŸ€– Agent Frameworks: Safe agent architectures and control mechanisms
  • 🎯 Risk Assessment: Threat models and vulnerability analysis
  • πŸ” Evaluation Methods: Safety metrics and testing approaches

Note: Papers are categorized by environment ([Web], [Mobile], [Desktop], [GUI]) and research focus. The [Misc] category includes general topics with important applications in GUI agents.


πŸ“Š Keyword Visualization

Keyword Word Cloud


Papers Grouped by Environments


Papers Grouped by Keywords

framework (173) | benchmark (117) | dataset (101) | model (52) | reinforcement learning (34) | safety (15) | planning (14) | visual grounding (11) | grounding (11) | evaluation (11) | reasoning (10) | survey (8) | multimodal (8) | GUI grounding (5) | memory (5) | learning (5) | vision language model (5) | chain-of-thought (4) | multi-agent (4) | OSWorld (4)


Papers Grouped by Authors

Yu Su (16) | Huan Sun (12) | Boyuan Zheng (11) | Caiming Xiong (10) | Tao Yu (10) | Graham Neubig (10) | Tianbao Xie (9) | Xiao Liu (8) | Yuxiao Dong (8) | Jie Tang (8) | Qiushi Sun (7) | Zhiyong Wu (7) | Hanyu Lai (7) | Boyu Gou (7) | Yu Gu (7) | Shuyan Zhou (7) | Zhuosheng Zhang (6) | Yu Qiao (6) | Danyang Zhang (6) | Zeyi Liao (6)


All Papers (from most recent to oldest)

πŸ“„ Click to expand/collapse paper list
  • OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows

    • Qiushi Sun, Mukai Li, Zhoumianze Liu, Zhihui Xie, Fangzhi Xu, Zhangyue Yin, Kanzhi Cheng, Zehao Li, Zichen Ding, Qi Liu, Zhiyong Wu, Zhuosheng Zhang, Ben Kao, Lingpeng Kong
    • πŸ›οΈ Institutions: The University of Hong Kong, Fudan University, Shanghai AI Laboratory, Nanyang Technological University ☼Nanjing University, Shanghai Jiao Tong University
    • πŸ“… Date: Dec. 9, 2025
    • πŸ“‘ Publisher: arXiv
    • πŸ™ Github: https://github.com/OS-Copilot/OS-Sentinel
    • πŸ’» Env: [Mobile]
    • πŸ”‘ Key: [benchmark], [evaluation], [risk/attack], [framework]
    • πŸ“– TLDR: VLM agents show human-like capability in mobile environments but pose significant security risks, including system compromise and privacy leakage. To detect such unsafe operations, this paper introduces MobileRisk-Live, a dynamic sandbox, and MobileRisk, a safety benchmark built on it. The authors then propose OS-Sentinel, a novel hybrid framework combining a Formal Verifier for system-level violations with a VLM-based Contextual Judge.
  • GhostEI-Bench: Do Mobile Agents Resilience to Environmental Injection in Dynamic On-Device Environments?

    • Chiyu Chen, Xinhao Song, Yunkai Chai, Yang Yao, Haodong Zhao, Lijun Li, Jie Li, Yan Teng, Gongshen Liu, Yingchun Wang
    • πŸ›οΈ Institutions: Shanghai Jiao Tong University, Shanghai Artificial Intelligence Laboratory, The University of Hong Kong
    • πŸ“… Date: Nov. 21, 2025
    • πŸ“‘ Publisher: arXiv
    • πŸ’» Env: [Mobile]
    • πŸ”‘ Key: [benchmark], [risk/attack]
    • πŸ“– TLDR: VLMs used as mobile GUI agents are vulnerable to environmental injection, where adversarial UI elements like deceptive overlays or spoofed notifications contaminate the agent's visual perception. This bypasses textual safeguards and can cause privacy leakage or device compromise. This work introduces GhostEI-Bench, a benchmark using Android emulators to assess agent performance under these dynamic attacks.
  • OS-HARM: A Benchmark for Measuring Safety of Computer Use Agents

    • Thomas Kuntz, Agatha Duzan, Hao Zhao, Francesco Croce, Zico Kolter, Nicolas Flammarion, Maksym Andriushchenko
    • πŸ›οΈ Institutions: EPFL, Carnegie Mellon University
    • πŸ“… Date: Oct. 29, 2025
    • πŸ“‘ Publisher: NeurIPS 2025 Spotlight (Datasets and Benchmarks Track)
    • πŸ™ Github: https://github.com/tml-epfl/os-harm
    • πŸ’» Env: [Desktop]
    • πŸ”‘ Key: [benchmark], [evaluation], [risk/attack]
    • πŸ“– TLDR: Computer use agents, LLM-based systems interacting with GUIs via screenshots or accessibility trees, lack safety evaluation. This paper introduces OS-HARM, a benchmark built on OSWorld with 150 tasks across three harm categories: deliberate misuse, prompt injection, and model misbehavior (e.g., harassment, data exfiltration). An automated judge assesses both accuracy and safety. Frontier models (like o4-mini, Claude 3.7 Sonnet, Gemini 2.5 Pro) evaluated show high compliance with misuse queries, vulnerability to static prompt injections, and occasional unsafe actions.
  • Safeguarding Mobile GUI Agent via Logic-based Action Verification

    • Jungjae Lee, Dongjae Lee, Chihun Choi, Youngmin Im, Jaeyoung Wi, Kihong Heo, Sangeun Oh, Sunjae Lee, Insik Shin
    • πŸ›οΈ Institutions: KAIST, Korea University, Sungkyunkwan University
    • πŸ“… Date: Sept. 11, 2025
    • πŸ“‘ Publisher: ACM MOBICOM '25
    • πŸ™ Github: https://github.com/VeriSafeAgent/VeriSafeAgent
    • πŸ’» Env: [Mobile]
    • πŸ”‘ Key: [method], [evaluation]
    • πŸ“– TLDR: This paper introduces VeriSafe Agent (VSA), a formal verification system that serves as a logically grounded safeguard for mobile GUI agents. VSA deterministically ensures that an agent's actions strictly align with user intent before execution. At its core is a novel autoformalization technique that translates natural-language user instructions into a formally verifiable specification, enabling runtime, rule-based verification that detects erroneous actions before they take effect.
  • Progent: Programmable Privilege Control for LLM Agents

    • Tianneng Shi, Jingxuan He, Zhun Wang, Hongwei Li, Linyu Wu, Wenbo Guo, Dawn Song
    • πŸ›οΈ Institutions: UC Berkeley, UC Santa Barbara, National University of Singapore
    • πŸ“… Date: Aug. 30, 2025
    • πŸ“‘ Publisher: arXiv
    • πŸ’» Env: [Misc]
    • πŸ”‘ Key: [risk/attack]
    • πŸ“– TLDR: LLM agents use LLMs and tools to perform user tasks but face security risks from external environments, like prompt injection and malicious tools, enabling dangerous actions such as financial fraud or data leakage. The core vulnerability is over-privileged tool access. This work introduces Progent, the first privilege control framework for securing LLM agents. Progent enforces tool-level security by restricting agents to necessary tool calls while blocking malicious ones, using a domain-specific language for fine-grained policy control. Progent operates deterministically at runtime, offering provable security without altering agent internals.
  • Magentic-UI: Towards Human-in-the-loop Agentic Systems

    • Hussein Mozannar, Gagan Bansal, Cheng Tan, Adam Fourney, Victor Dibia, Jingya Chen, Jack Gerrits, Tyler Payne, Matheus Kunzler Maldaner, Madeleine Grunde-McLaughlin, Eric Zhu, Griffin Bassman, Jacob Alber, Peter Chang, Ricky Loynd, Friederike Niedtner, Ece Kamar, Maya Murad, Rafah Hosn, Saleema Amershi
    • πŸ›οΈ Institutions: MSR
    • πŸ“… Date: Jul. 30, 2025
    • πŸ“‘ Publisher: arXiv (also MSR Technical Report MSR-TR-2025-40)
    • πŸ™ Github: https://github.com/microsoft/magentic-ui
    • πŸ’» Env: [Web]
    • πŸ”‘ Key: [evaluation], [multi-agent]
    • πŸ“– TLDR: Magentic-UI (Multi-agentic User Interface) is an open-source web interface enabling safe, efficient human–agent collaboration through a flexible multi-agent architecture. It supports web browsing, code execution, and file manipulation, and provides six interaction mechanismsβ€”co-planning, co-tasking, multi-tasking, action guards, answer verification, and long-term memoryβ€”to integrate human oversight with AI autonomy. Evaluated via agentic benchmarks, simulated user tests, qualitative user study, and safety assessments, it demonstrates how incorporating human-in-the-loop dynamics can significantly improve agentic systems' reliability and performance.
  • WebGuard: Building a Generalizable Guardrail for Web Agents

    • Boyuan Zheng, Zeyi Liao, Scott Salisbury, Zeyuan Liu, Michael Lin, Qinyuan Zheng, Zifan Wang, Xiang Deng, Dawn Song, Huan Sun, Yu Su
    • πŸ›οΈ Institutions: OSU; Scale AI; UCB
    • πŸ“… Date: Jul. 18, 2025
    • πŸ“‘ Publisher: arXiv
    • πŸ™ Github: https://github.com/OSU-NLP-Group/WebGuard
    • πŸ’» Env: [Web]
    • πŸ”‘ Key: [dataset], [evaluation], [benchmark]
    • πŸ“– TLDR: WebGuard is the first comprehensive dataset designed for evaluating web agent action risks and developing necessary guardrails for real-world online environments. It contains 4,939 human-annotated, state-changing actions sourced from 193 websites across 22 domains, including various long-tail sites. The actions are categorized into a novel three-tier risk schema: SAFE, LOW, and HIGH, and the dataset includes training and test splits for evaluating generalization. The creators demonstrate that fine-tuning specialized guardrail models, such as the Qwen2.5VL-7B, using WebGuard significantly boosts performance, raising accuracy from 37% to 80% and increasing the recall of HIGH-risk actions from 20% to 76%.
  • Toward a Human-Centered Evaluation Framework for Trustworthy LLM-Powered GUI Agents

    • Chaoran Chen, Zhiping Zhang, Ibrahim Khalilov, Bingcan Guo, Simret A Gebreegziabher, Yanfang Ye, Ziang Xiao, Yaxing Yao, Tianshi Li, Toby Jia-Jun Li
    • πŸ›οΈ Institutions: University of Notre Dame, Virginia Tech, University of Washington, Northeastern University
    • πŸ“… Date: Jun. 5, 2025
    • πŸ“‘ Publisher: arXiv
    • πŸ’» Env: [Misc]
    • πŸ”‘ Key: [framework], [risk/attack], [survey]
    • πŸ“– TLDR: This paper identifies three key risks of LLM-powered GUI agents, distinguishing them from traditional automation and general autonomous agents. Current evaluations prioritize performance and largely overlook privacy and security. The authors review existing metrics, outline five challenges in using human evaluators, and advocate a human-centered evaluation framework that incorporates risk assessments, enhances user awareness via in-context consent, and embeds privacy and security considerations directly into agent design and evaluation.
  • macOSWorld: A Multilingual Interactive Benchmark for GUI Agents

    • Pei Yang, Hai Ci, and Mike Zheng Shou
    • πŸ›οΈ Institutions: NUS
    • πŸ“… Date: Jun. 4, 2025
    • πŸ“‘ Publisher: arXiv
    • πŸ™ Github: https://macos-world.github.io/
    • πŸ’» Env: [Desktop]
    • πŸ”‘ Key: [benchmark]
    • πŸ“– TLDR: Introduces macOSWorld, the first interactive benchmark for GUI agents on macOS, with 202 tasks across 30 apps (28 macOS-exclusive) in 5 languages plus a safety subset for deception attacks; evaluates 6 agents, showing proprietary CUAs outperform open-source and VLM-based agents, significant language gaps (Arabic –27.5%), and both grounding and safety challenges.
  • MLA-Trust: Benchmarking Trustworthiness of Multimodal LLM Agents in GUI Environments

    • Xiao Yang, Jiawei Chen, Jun Luo, Zhengwei Fang, Yinpeng Dong, Hang Su, Jun Zhu
    • πŸ›οΈ Institutions: Tsinghua University, East China Normal University
    • πŸ“… Date: Jun. 2, 2025
    • πŸ“‘ Publisher: arXiv
    • πŸ™ Github: https://github.com/thu-ml/MLA-Trust
    • πŸ’» Env: [Mobile], [Web]
    • πŸ”‘ Key: [benchmark], [dataset], [evaluation], [framework], [risk/attack]
    • πŸ“– TLDR: Multimodal LLM-based agents (MLAs) integrate vision, language, and action for unprecedented autonomy in GUI applications, but introduce critical trustworthiness challenges due to their ability to modify digital states. Existing benchmarks are insufficient. This work introduces MLA-Trust, the first unified framework evaluating MLA trustworthiness across four dimensions: truthfulness, controllability, safety, and privacy, using realistic web and mobile tasks.
  • AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

    • Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian, Derek Duenas, Maxwell Lin, Justin Wang, Dan Hendrycks, Andy Zou, Zico Kolter, Matt Fredrikson, Eric Winsor, Jerome Wynne, Yarin Gal, Xander Davies
    • πŸ›οΈ Institutions: Gray Swan AI, UK AI Security Institute
    • πŸ“… Date: Apr. 18, 2025
    • πŸ“‘ Publisher: ICLR 2025
    • πŸ™ Github: https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/agentharm
    • πŸ’» Env: [Misc]
    • πŸ”‘ Key: [benchmark], [dataset], [evaluation], [risk/attack]
    • πŸ“– TLDR: Research on LLM robustness to jailbreak attacks primarily focuses on chatbots, yet LLM agents, which use external tools for multi-stage tasks, pose greater misuse risks. This paper introduces AgentHarm, a benchmark of 110 malicious agent tasks (covering 11 harm categories) that tests whether agents retain capabilities post-jailbreak to complete harmful multi-step tasks. The evaluation found leading LLMs are surprisingly compliant with malicious agent requests even without jailbreaking. Simple universal jailbreak templates effectively jailbreak agents, enabling coherent, malicious, multi-step agent behavior.
  • The Obvious Invisible Threat: LLM-Powered GUI Agents' Vulnerability to Fine-Print Injections

    • Chaoran Chen, Zhiping Zhang, Bingcan Guo, Shang Ma, Ibrahim Khalilov, Simret A Gebreegziabher, Yanfang Ye, Ziang Xiao, Yaxing Yao, Tianshi Li, Toby Jia-Jun Li
    • πŸ›οΈ Institutions: University of Notre Dame, Northeastern University, University of Washington, Virginia Tech, Johns Hopkins University
    • πŸ“… Date: Apr. 15, 2025
    • πŸ“‘ Publisher: arXiv
    • πŸ’» Env: [Misc]
    • πŸ”‘ Key: [risk/attack]
    • πŸ“– TLDR: LLM-powered GUI agents perform tasks automatically by interpreting and interacting with GUIs. Their autonomy, especially when handling sensitive data, introduces new security risks. Adversaries can exploit the differences between how agents and humans visually process interfaces by injecting malicious GUI content, leading to altered agent behavior or unauthorized disclosure of private data. The study characterizes six such attacks and tests them on state-of-the-art agents and human users. Findings show agents are highly vulnerable, especially to contextually embedded threats, and that human oversight is insufficient, underscoring the need for privacy-aware agent design and practical defense strategies.
  • Privacy-Enhancing Paradigms within Federated Multi-Agent Systems

    • Zitong Shi, Guancheng Wan, Wenke Huang, Guibin Zhang, Jiawei Shao, Mang Ye, Carl Yang
    • πŸ›οΈ Institutions: National Engineering Research Center for Multimedia Software, Wuhan University, National University of Singapore, TeleAI, Emory University
    • πŸ“… Date: Mar. 11, 2025
    • πŸ“‘ Publisher: arXiv
    • πŸ’» Env: [Misc]
    • πŸ”‘ Key: [multi-agent], [evaluation], [dataset]
    • πŸ“– TLDR: This paper identifies three key challenges in Federated Multi-Agent Systems (MAS): heterogeneous privacy, structural conversational differences, and dynamic network topologies. To address these, the authors introduce Embedded Privacy-Enhancing Agents (EPEAgent), which integrate seamlessly into the Retrieval-Augmented Generation (RAG) and context-retrieval phases, minimizing data sharing to only task-relevant information. They also release a comprehensive evaluation dataset. Experiments confirm that EPEAgent significantly enhances privacy protection while maintaining high system performance.
  • AEIA-MN: Evaluating the Robustness of Multimodal LLM-Powered Mobile Agents Against Active Environmental Injection Attacks

    • Yurun Chen, Xueyu Hu, Keting Yin, Juncheng Li, Shengyu Zhang
    • πŸ›οΈ Institutions: Zhejiang University
    • πŸ“… Date: Feb. 18, 2025
    • πŸ“‘ Publisher: arXiv
    • πŸ’» Env: [Mobile]
    • πŸ”‘ Key: [evaluation], [risk/attack]
    • πŸ“– TLDR: This paper introduces the concept of Active Environment Injection Attack (AEIA), where attackers disguise malicious actions as environmental elements to disrupt AI agents' decision-making processes. The authors propose AEIA-MN, an attack scheme leveraging mobile notifications to evaluate the robustness of multimodal large language model-based mobile agents. Experimental results demonstrate that even advanced models are highly vulnerable to such attacks, with success rates reaching up to 93% in the AndroidWorld benchmark.
  • Improved Large Language Model Jailbreak Detection via Pretrained Embeddings

    • Erick Galinkin, Martin Sablotny
    • πŸ›οΈ Institutions: NVIDIA,
    • πŸ“… Date: Dec. 2, 2024
    • πŸ“‘ Publisher: arXiv
    • πŸ’» Env: [Misc]
    • πŸ”‘ Key: [method], [risk/attack]
    • πŸ“– TLDR: The adoption of LLMs requires robust security against attacks like prompt injection and jailbreaking, which aim to bypass safety policies. To prevent LLMs from generating harmful content or taking undesirable actions, owners must implement safeguards during training and use additional blocking tools. Detecting jailbreaking prompts is crucial. This work proposes a novel and superior approach for jailbreak detection by combining text embeddings optimized for retrieval with traditional machine learning classification algorithms.
  • Navigating the Risks: A Survey of Security, Privacy, and Ethics Threats in LLM-Based Agents

    • Yuyou Gan, Yong Yang, Zhe Ma, Ping He, Rui Zeng, Yiming Wang, Qingming Li, Chunyi Zhou, Songze Li, Ting Wang, Yunjun Gao, Yingcai Wu, Shouling Ji
    • πŸ›οΈ Institutions: Zhejiang University, Southeast University, Stony Brook University
    • πŸ“… Date: Nov. 14, 2024
    • πŸ“‘ Publisher: arXiv
    • πŸ’» Env: [Misc]
    • πŸ”‘ Key: [survey], [risk/attack]
    • πŸ“– TLDR: This survey collects and analyzes the threats faced by LLM-based agents. To address the difficulty earlier taxonomies have with cross-module and cross-stage threats, the authors propose a novel taxonomy framework organized by threat sources and impacts. They also identify six key features of LLM-based agents, against which they summarize current research progress and analyze its limitations. Four representative agents are then examined as case studies of the risks agents may face in practical use. Finally, the survey proposes future research directions from the perspectives of data, methodology, and policy.
  • WebOlympus: An Open Platform for Web Agents on Live Websites

    • Boyuan Zheng, Boyu Gou, Scott Salisbury, Zheng Du, Huan Sun, Yu Su
    • πŸ›οΈ Institutions: OSU
    • πŸ“… Date: Nov. 12, 2024
    • πŸ“‘ Publisher: EMNLP 2024
    • πŸ’» Env: [Web]
    • πŸ”‘ Key: [framework]
    • πŸ“– TLDR: This paper introduces WebOlympus, an open platform designed to facilitate the research and deployment of web agents on live websites. It features a user-friendly Chrome extension interface, allowing users without programming expertise to operate web agents with minimal effort. The platform incorporates a safety monitor module to prevent harmful actions through human supervision or model-based control, supporting applications such as annotation interfaces for web agent trajectories and data crawling.
  • Attacking Vision-Language Computer Agents via Pop-ups

    • Yanzhe Zhang, Tao Yu, Diyi Yang
    • πŸ›οΈ Institutions: Georgia Tech, HKU, Stanford
    • πŸ“… Date: Nov. 4, 2024
    • πŸ“‘ Publisher: arXiv
    • πŸ™ Github: https://github.com/SALT-NLP/PopupAttack
    • πŸ’» Env: [Web], [Desktop]
    • πŸ”‘ Key: [risk/attack]
    • πŸ“– TLDR: This paper demonstrates that vision-language model (VLM) agents can be easily deceived by carefully designed adversarial pop-ups, leading them to perform unintended actions such as clicking on these pop-ups instead of completing their assigned tasks. Integrating these pop-ups into environments like OSWorld and VisualWebArena resulted in an average attack success rate of 86% and a 47% decrease in task success rate. Basic defense strategies, such as instructing the agent to ignore pop-ups or adding advertisement notices, were found to be ineffective against these attacks.
  • MobileSafetyBench: Evaluating Safety of Autonomous Agents in Mobile Device Control

    • Juyong Lee, Dongyoon Hahm, June Suk Choi, W. Bradley Knox, Kimin Lee
    • πŸ›οΈ Institutions: KAIST, UT at Austin
    • πŸ“… Date: Oct. 23, 2024
    • πŸ“‘ Publisher: arXiv
    • πŸ™ Github: https://mobilesafetybench.github.io/
    • πŸ’» Env: [Mobile]
    • πŸ”‘ Key: [benchmark], [evaluation]
    • πŸ“– TLDR: MobileSafetyBench introduces a benchmark for evaluating the safety of large language model (LLM)-based autonomous agents in mobile device control. Using Android emulators, the benchmark simulates real-world tasks in apps such as messaging and banking to assess agents' safety and helpfulness. The safety-focused tasks test for privacy risk management and robustness against adversarial prompt injections. Experiments show agents perform well in helpful tasks but struggle with safety-related challenges, underscoring the need for continued advancements in mobile safety mechanisms for autonomous agents.
  • Dissecting Adversarial Robustness of Multimodal LM Agents

    • Chen Henry Wu, Rishi Rajesh Shah, Jing Yu Koh, Russ Salakhutdinov, Daniel Fried, Aditi Raghunathan
    • πŸ›οΈ Institutions: CMU, Stanford
    • πŸ“… Date: Oct. 21, 2024
    • πŸ“‘ Publisher: ICLR 2025
    • πŸ™ Github: https://github.com/ChenWu98/agent-attack
    • πŸ’» Env: [Web]
    • πŸ”‘ Key: [benchmark], [risk/attack], [evaluation]
    • πŸ“– TLDR: This paper introduces the Agent Robustness Evaluation (ARE) framework to assess the adversarial robustness of multimodal language model agents in web environments. By creating 200 targeted adversarial tasks within VisualWebArena, the study reveals that minimal perturbations can significantly compromise agent performance, even in advanced systems utilizing reflection and tree-search mechanisms. The findings highlight the need for enhanced safety measures in deploying such agents.
  • Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents

    • Priyanshu Kumar, Elaine Lau, Saranya Vijayakumar, Tu Trinh, Scale Red Team, Elaine Chang, Vaughn Robinson, Sean Hendryx, Shuyan Zhou, Matt Fredrikson, Summer Yue, Zifan Wang
    • πŸ›οΈ Institutions: CMU, GraySwan AI, Scale AI
    • πŸ“… Date: Oct. 11, 2024
    • πŸ“‘ Publisher: arXiv
    • πŸ™ Github: https://github.com/scaleapi/browser-art
    • πŸ’» Env: [Web]
    • πŸ”‘ Key: [evaluation]
    • πŸ“– TLDR: This paper introduces Browser Agent Red teaming Toolkit (BrowserART), a comprehensive test suite for evaluating the safety of LLM-based browser agents. The study reveals that while refusal-trained LLMs decline harmful instructions in chat settings, their corresponding browser agents often comply with such instructions, indicating a significant safety gap. The authors call for collaboration among developers and policymakers to enhance agent safety.
  • ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents

    • Ido Levy, Ben Wiesel, Sami Marreed, Alon Oved, Avi Yaeli, Segev Shlomov
    • πŸ›οΈ Institutions: IBM Research
    • πŸ“… Date: Oct. 9, 2024
    • πŸ“‘ Publisher: arXiv
    • πŸ™ Github: https://github.com/segev-shlomov/ST-WebAgentBench
    • πŸ’» Env: [Web]
    • πŸ”‘ Key: [benchmark], [evaluation]
    • πŸ“– TLDR: This paper introduces ST-WebAgentBench, a benchmark designed to evaluate the safety and trustworthiness of web agents in enterprise contexts. It defines safe and trustworthy agent behavior, outlines the structure of safety policies, and introduces the "Completion under Policies" metric to assess agent performance. The study reveals that current state-of-the-art agents struggle with policy adherence, highlighting the need for improved policy awareness and compliance in web agents.
  • AdvWeb: Controllable Black-box Attacks on VLM-powered Web Agents

    • Chejian Xu, Mintong Kang, Jiawei Zhang, Zeyi Liao, Lingbo Mo, Mengqi Yuan, Huan Sun, Bo Li
    • πŸ›οΈ Institutions: UIUC, OSU
    • πŸ“… Date: Sept. 27, 2024
    • πŸ“‘ Publisher: arXiv
    • πŸ™ Github: https://ai-secure.github.io/AdvWeb/
    • πŸ’» Env: [Web]
    • πŸ”‘ Key: [risk/attack]
    • πŸ“– TLDR: This paper presents AdvWeb, a black-box attack framework that exploits vulnerabilities in vision-language model (VLM)-powered web agents by injecting adversarial prompts directly into web pages. Using Direct Policy Optimization (DPO), AdvWeb trains an adversarial prompter model that can mislead agents into executing harmful actions, such as unauthorized financial transactions, while maintaining high stealth and control. Extensive evaluations reveal that AdvWeb achieves high success rates across multiple real-world tasks, emphasizing the need for stronger security measures in web agent deployments.
  • EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage

    • Zeyi Liao, Lingbo Mo, Chejian Xu, Mintong Kang, Jiawei Zhang, Chaowei Xiao, Yuan Tian, Bo Li, Huan Sun
    • πŸ›οΈ Institutions: OSU, UCLA, UChicago, UIUC, UW-Madison
    • πŸ“… Date: Sept. 17, 2024
    • πŸ“‘ Publisher: ICLR 2025
    • πŸ™ Github: https://github.com/osu-nlp-group/eia_against_webagent
    • πŸ’» Env: [Web]
    • πŸ”‘ Key: [risk/attack]
    • πŸ“– TLDR: This paper introduces the Environmental Injection Attack (EIA), a privacy attack targeting generalist web agents by embedding malicious yet concealed web elements to trick agents into leaking users' PII. Utilizing 177 action steps within realistic web scenarios, EIA demonstrates a high success rate in extracting specific PII and whole user requests. Through its detailed threat model and defense suggestions, the work underscores the challenge of detecting and mitigating privacy risks in autonomous web agents.
  • Adversarial Attacks on Multimodal Agents

    • Chen Henry Wu, Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried, Aditi Raghunathan
    • πŸ›οΈ Institutions: CMU
    • πŸ“… Date: Jun. 18, 2024
    • πŸ“‘ Publisher: NeurIPS 2024 Open-World Agents Workshop
    • πŸ™ Github: https://github.com/ChenWu98/agent-attack
    • πŸ’» Env: [Web]
    • πŸ”‘ Key: [benchmark], [risk/attack], [evaluation]
    • πŸ“– TLDR: This paper investigates the safety risks posed by multimodal agents built on vision-enabled language models (VLMs). The authors introduce two adversarial attack methods: a captioner attack targeting white-box captioners and a CLIP attack that transfers to proprietary VLMs. To evaluate these attacks, they curated VisualWebArena-Adv, a set of adversarial tasks based on VisualWebArena. The study demonstrates that within a limited perturbation norm, the captioner attack can achieve a 75% success rate in making a captioner-augmented GPT-4V agent execute adversarial goals. The paper also discusses the robustness of agents based on other VLMs and provides insights into factors contributing to attack success and potential defenses.
  • Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents

    • Wenkai Yang, Xiaohan Bi, Yankai Lin, Sishuo Chen, Jie Zhou, Xu Sun
    • πŸ›οΈ Institutions: Renming University of China, PKU, Tencent
    • πŸ“… Date: Feb. 17, 2024
    • πŸ“‘ Publisher: NeurIPS 2024
    • πŸ™ Github: https://github.com/lancopku/agent-backdoor-attacks
    • πŸ’» Env: [Misc]
    • πŸ”‘ Key: [risk/attack]
    • πŸ“– TLDR: This paper investigates backdoor attacks on LLM-based agents, introducing a framework that categorizes attacks based on outcomes and trigger locations. The study demonstrates the vulnerability of such agents to backdoor attacks and emphasizes the need for targeted defenses.
  • A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents

    • Lingbo Mo, Zeyi Liao, Boyuan Zheng, Yu Su, Chaowei Xiao, Huan Sun
    • πŸ›οΈ Institutions: OSU, UWM
    • πŸ“… Date: Feb. 15, 2024
    • πŸ“‘ Publisher: arXiv
    • πŸ™ Github: https://github.com/OSU-NLP-Group/AgentAttack
    • πŸ’» Env: [Misc]
    • πŸ”‘ Key: [risk/attack], [evaluation]
    • πŸ“– TLDR: This paper introduces a conceptual framework to assess and understand adversarial vulnerabilities in language agents, dividing the agent structure into three componentsβ€”Perception, Brain, and Action. It discusses 12 specific adversarial attack types that exploit these components, ranging from input manipulation to complex backdoor and jailbreak attacks. The framework provides a basis for identifying and mitigating risks before the widespread deployment of these agents in real-world applications.
Blogs
  • Agentic AI Infrastructure Best Practices Series (Part 8): Privacy and Security of Agent Applications

    • Li Yang, Tang Qingyuan, Zhou Chenlin (AWS Team)
    • πŸ›οΈ Institutions: Amazon Web Services (China)
    • πŸ“… Date: September 19, 2025
    • πŸ“‘ Publisher: AWS China Blog
    • πŸ’» Env: Agentic AI Systems (including GUI/Tool-Using Agents)
    • πŸ”‘ Key: [risk/attack], [mitigation], [framework]
    • πŸ“– TLDR: This article examines privacy and security threats in Agentic AI applications, referencing OWASP's top threats like memory poisoning and tool misuse. It proposes layered defenses including inference manipulation prevention, secure SDLC, control/data plane isolation, Amazon Bedrock Guardrails for filtering, MCP server hardening, and centralized governance via AgentCore Gateway for identity, access control, and auditing.
  • Claude Computer Use: A Ticking Time Bomb

    • Prompt Security Team
    • πŸ›οΈ Institutions: Prompt Security
    • πŸ“… Date: September 9, 2025
    • πŸ“‘ Publisher: Prompt Security Blog
    • πŸ’» Env: Computer Use Agents (GUI/Visual Agents)
    • πŸ”‘ Key: [risk/attack], [prompt injection]
    • πŸ“– TLDR: This article explores the severe risks of Anthropic's Claude Computer Use feature, highlighting how it enables autonomous computer control but opens doors to prompt injection attacks, allowing malicious actors to exploit the agent for harmful actions through untrusted inputs.
  • Indirect Prompt Injection of Claude Computer Use

    • HiddenLayer Security Research Team
    • πŸ›οΈ Institutions: HiddenLayer
    • πŸ“… Date: November 14, 2024
    • πŸ“‘ Publisher: HiddenLayer Innovation Hub
    • πŸ’» Env: Claude Computer Use (GUI Agent)
    • πŸ”‘ Key: [risk/attack], [prompt injection], [mitigation]
    • πŸ“– TLDR: The post demonstrates indirect prompt injection vulnerabilities in Anthropic's Claude Computer Use, showing how malicious instructions can force destructive actions like system file deletion, while advocating for prompt monitoring and security solutions to defend against such exploits.
  • The Agentic AI Security Scoping Matrix: A framework for securing autonomous AI systems

    • AWS Security Team
    • πŸ›οΈ Institutions: Amazon Web Services
    • πŸ“… Date: November 21, 2025
    • πŸ“‘ Publisher: AWS Security Blog
    • πŸ’» Env: Agentic AI Systems (including GUI/Browser Agents)
    • πŸ”‘ Key: [framework], [mitigation], [risk/attack]
    • πŸ“– TLDR: This framework addresses the shifted security model in agentic AI, introducing persistent memory and autonomous execution risks, and provides scoping guidance for controls like input validation, human oversight, and resilience measures applicable to GUI-operating agents.
  • GUI Agents: Exploring the Future of Human-Computer Interaction

    • XenonStack Team
    • πŸ›οΈ Institutions: XenonStack
    • πŸ“… Date: November 12, 2024
    • πŸ“‘ Publisher: XenonStack Blog
    • πŸ’» Env: GUI Agents
    • πŸ”‘ Key: [risk/attack], [mitigation]
    • πŸ“– TLDR: The article discusses the architecture and applications of GUI agents while emphasizing security concerns from high autonomy, such as risks to data confidentiality, and recommends strong validation protocols and access controls as key mitigations.
  • Agentic AI Threat Modeling Framework: MAESTRO

    • Cloud Security Alliance Team
    • πŸ›οΈ Institutions: Cloud Security Alliance
    • πŸ“… Date: February 6, 2025
    • πŸ“‘ Publisher: Cloud Security Alliance Blog
    • πŸ’» Env: Multi-Agent Systems (including GUI Agents)
    • πŸ”‘ Key: [framework], [mitigation], [risk/attack]
    • πŸ“– TLDR: Introducing the MAESTRO framework for threat modeling in agentic AI, it covers layered risks like goal misalignment and data poisoning, with tailored mitigations such as defense-in-depth and input validation for secure autonomous systems.
  • Top Agentic AI Security Threats in 2025 & Fixes

    • Lasso Security Team
    • πŸ›οΈ Institutions: Lasso Security
    • πŸ“… Date: September 22, 2025
    • πŸ“‘ Publisher: Lasso Security Blog
    • πŸ’» Env: Agentic AI (Tool-Using/GUI Agents)
    • πŸ”‘ Key: [risk/attack], [mitigation]
    • πŸ“– TLDR: Outlining top threats like memory poisoning, tool misuse, and privilege escalation in agentic AI, the article provides practical fixes including context-aware guardrails and least-privilege enforcement to secure autonomous agents.

How to Add a Paper or Update the README

Please fork and update:

Format example and explanation
- [title](paper link)
    - List authors directly without a "key" identifier (e.g., author1, author2)
    - πŸ›οΈ Institutions: List the institutions concisely, using abbreviations (e.g., university names, like OSU).
    - πŸ“… Date: e.g., Oct 30, 2024
    - πŸ“‘ Publisher: e.g., ICLR 2025
    - πŸ™ Github: Github repo link
    - πŸ’» Env: Indicate the research environment within brackets, such as [Web], [Mobile], or [Desktop]. Use [Misc] if it is researching in general domains.
    - πŸ”‘ Key: Label each keyword within brackets, e.g., [framework], [dataset], [benchmark].
    - πŸ“– TLDR: Brief summary of the paper.
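
As a concrete illustration, here is how the WebGuard paper already listed above would look in this template (the paper link is left as a placeholder):

```markdown
- [WebGuard: Building a Generalizable Guardrail for Web Agents](paper link)
    - Boyuan Zheng, Zeyi Liao, Scott Salisbury, Zeyuan Liu, Michael Lin, Qinyuan Zheng, Zifan Wang, Xiang Deng, Dawn Song, Huan Sun, Yu Su
    - πŸ›οΈ Institutions: OSU, Scale AI, UCB
    - πŸ“… Date: Jul. 18, 2025
    - πŸ“‘ Publisher: arXiv
    - πŸ™ Github: https://github.com/OSU-NLP-Group/WebGuard
    - πŸ’» Env: [Web]
    - πŸ”‘ Key: [dataset], [evaluation], [benchmark]
    - πŸ“– TLDR: A dataset of 4,939 human-annotated, state-changing actions across 193 websites, with a three-tier risk schema (SAFE/LOW/HIGH), for evaluating web agent action risks and training guardrail models.
```

Keep entries sorted by date (most recent first) and reuse existing keyword tags where possible so the keyword groupings stay consistent.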

Star History

Star History Chart

Acknowledgment

This repository references the initial template provided by the OSU-NLP-Group. See the original work at GUI-Agents-Paper-List.

We are grateful for their excellent work and highly recommend checking out and starring their curated collection of GUI agent papers. Furthermore, we extend our sincere gratitude to all contributors of this repository. If you are interested in our work, please star our repository!
