Mansi Phute's personal website (Jekyll feed)
Last updated: 2026-03-03T15:55:57+00:00
https://mphute.github.io/feed.xml
Author: Mansi Phute ([email protected])

UNDREAM: Bridging Differentiable Rendering and Photorealistic Simulation for End-to-end Adversarial Attacks
Published: 2025-09-22
https://mphute.github.io/papers/undream

Deep learning models deployed in safety-critical applications like autonomous driving use simulations to test their robustness against adversarial attacks under realistic conditions. However, these simulations are non-differentiable, forcing researchers to craft attacks that do not integrate simulated environmental factors, reducing attack success. To address this limitation, we introduce UNDREAM, the first software framework that bridges the gap between photorealistic simulators and differentiable renderers to enable end-to-end optimization of adversarial perturbations on any 3D object. UNDREAM enables manipulation of the environment by offering complete control over weather, lighting, backgrounds, camera angles, trajectories, and realistic human and object movements, thereby allowing the creation of diverse scenes. We showcase a wide array of distinct, physically plausible adversarial objects that UNDREAM enables researchers to swiftly explore in different configurable environments. This combination of photorealistic simulation and differentiable optimization opens new avenues for advancing research on physical adversarial attacks.
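The end-to-end optimization the abstract describes can be caricatured in a few lines. Below is a toy, self-contained sketch (my own illustration, not UNDREAM's API): a fixed linear map stands in for the differentiable renderer, a linear score stands in for the victim detector, and gradients flow through the composition to optimize an adversarial texture whose perturbation stays inside a plausibility budget.

```python
import numpy as np

# Toy stand-ins (assumptions, not the framework's components):
# a linear "renderer" mapping a 4-D texture to an 8-D feature vector,
# and a linear "detector" score we want to drive below zero.
rng = np.random.default_rng(0)
R = rng.normal(size=(8, 4))      # hypothetical renderer Jacobian
w = rng.normal(size=8)           # hypothetical detector weights

def render(texture):             # differentiable scene -> features
    return R @ texture

def detector_score(feat):        # higher = "object detected"
    return float(w @ feat)

texture = np.zeros(4)            # adversarial texture parameters
eps, lr = 0.5, 0.1               # perturbation budget and step size
for _ in range(100):
    grad = R.T @ w                        # chain rule through the renderer
    texture = texture - lr * grad         # descend the detection score
    texture = np.clip(texture, -eps, eps) # keep the texture plausible

# The detection score ends strictly below its starting value of 0.
print(detector_score(render(texture)))
```

The key point mirrored here is that the renderer sits inside the gradient path: the attack updates scene parameters, not pixels.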

Author: Mansi Phute
VISOR++ - Transferrable Visual Input based Steering for Output Redirection in Large Vision Language Models
Published: 2025-09-01
https://mphute.github.io/papers/visor-plus

Vision Language Models (VLMs) are increasingly being used in a broad range of applications. While existing approaches for behavioral control or output redirection are easily detectable and often ineffective, activation-based steering vectors require invasive runtime access to model internals, incompatible with API-based services and closed-source deployments. We introduce VISOR (Visual Input based Steering for Output Redirection), a novel method that achieves sophisticated behavioral control through optimized visual inputs alone. It enables practical deployment across all VLM serving modalities while remaining imperceptible compared to explicit textual instructions. A single 150KB steering image matches, and often outperforms, steering-vector performance. Compared to system prompting, VISOR provides more robust bidirectional control while maintaining equivalent performance on 14,000 unrelated MMLU tasks, showing a maximum performance drop of 0.1% across different models and datasets. Beyond eliminating runtime overhead and model-access requirements, VISOR exposes a critical security vulnerability: adversaries can achieve sophisticated behavioral manipulation through visual channels alone, bypassing text-based defenses. Our work fundamentally re-imagines multimodal model control and highlights the urgent need for defenses against visual steering attacks.

Author: Ravi Balakrishnan
VISOR - Visual Input based Steering for Output Redirection in Large Vision Language Models
Published: 2025-08-01
https://mphute.github.io/papers/visor

Vision Language Models (VLMs) are increasingly being used in a broad range of applications. While existing approaches for behavioral control or output redirection are easily detectable and often ineffective, activation-based steering vectors require invasive runtime access to model internals, incompatible with API-based services and closed-source deployments. We introduce VISOR (Visual Input based Steering for Output Redirection), a novel method that achieves sophisticated behavioral control through optimized visual inputs alone. It enables practical deployment across all VLM serving modalities while remaining imperceptible compared to explicit textual instructions. A single 150KB steering image matches, and often outperforms, steering-vector performance. Compared to system prompting, VISOR provides more robust bidirectional control while maintaining equivalent performance on 14,000 unrelated MMLU tasks, showing a maximum performance drop of 0.1% across different models and datasets. Beyond eliminating runtime overhead and model-access requirements, VISOR exposes a critical security vulnerability: adversaries can achieve sophisticated behavioral manipulation through visual channels alone, bypassing text-based defenses. Our work fundamentally re-imagines multimodal model control and highlights the urgent need for defenses against visual steering attacks.

Author: Mansi Phute
ComplicitSplat: Downstream Models are Vulnerable to Blackbox Attacks by 3D Gaussian Splat Camouflages
Published: 2025-07-02
https://mphute.github.io/papers/complicitsplat

As 3D Gaussian Splatting (3DGS) gains rapid adoption in safety-critical tasks for efficient novel-view synthesis from static images, how might an adversary tamper with images to cause harm? We introduce ComplicitSplat, the first attack that exploits standard 3DGS shading methods to create viewpoint-specific camouflage (colors and textures that change with viewing angle) to embed adversarial content in scene objects that is visible only from specific viewpoints, without requiring access to model architecture or weights. Our extensive experiments show that ComplicitSplat generalizes to successfully attack a variety of popular detectors: single-stage, multi-stage, and transformer-based models, on both real-world captures of physical objects and synthetic scenes. To our knowledge, this is the first black-box attack on downstream object detectors using 3DGS, exposing a novel safety risk for applications like autonomous navigation and other mission-critical robotic systems.
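The viewpoint-specific camouflage exploits how 3DGS shades each Gaussian: a splat stores spherical-harmonic (SH) color coefficients that are evaluated per viewing direction. A minimal numpy sketch (my own illustration with common degree-1 SH constants and sign conventions; not the paper's code) shows a single splat that renders one color from the front and a different color from behind:

```python
import numpy as np

# Degree-0 and degree-1 real SH basis constants (conventions vary by
# implementation; these follow a common 3DGS-style layout).
Y00 = 0.28209479177387814
Y1 = 0.4886025119029199

def sh_color(coeffs, view_dir):
    """coeffs: (4, 3) SH coefficients per RGB channel; view_dir: unit vector."""
    x, y, z = view_dir
    basis = np.array([Y00, -Y1 * y, Y1 * z, -Y1 * x])
    # 3DGS-style shading adds a 0.5 offset before clamping to valid RGB.
    return np.clip(basis @ coeffs + 0.5, 0.0, 1.0)

# Pick degree-1 coefficients along z so the splat's color flips with viewpoint.
coeffs = np.zeros((4, 3))
coeffs[2] = [-1.0, 1.0, 1.0]

front = sh_color(coeffs, np.array([0.0, 0.0, 1.0]))   # cyan-ish from +z
back = sh_color(coeffs, np.array([0.0, 0.0, -1.0]))   # red from -z
```

Because this view dependence is a standard, legitimate part of the 3DGS shading model, such camouflage requires no access to the downstream detector's weights.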

Author: Matthew Hull
3D Gaussian Splat Vulnerabilities
Published: 2025-06-01
https://mphute.github.io/papers/gssplat

With 3D Gaussian Splatting (3DGS) being increasingly used in safety-critical applications, how can an adversary manipulate the scene to cause harm? We introduce CLOAK, the first attack that leverages view-dependent Gaussian appearances (colors and textures that change with viewing angle) to embed adversarial content visible only from specific viewpoints. We further demonstrate DAGGER, a targeted adversarial attack that directly perturbs 3D Gaussians without access to the underlying training data, deceiving multi-stage object detectors (e.g., Faster R-CNN) through established methods such as projected gradient descent. These attacks highlight underexplored vulnerabilities in 3DGS, introducing a new potential threat to robotic learning for autonomous navigation and other safety-critical 3DGS applications.
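The projected gradient descent (PGD) loop the abstract mentions can be sketched on a toy stand-in. Below, a linear detection logit over splat colors replaces the real detector pipeline (the shapes and the linear model are my own assumptions): each iteration takes a signed-gradient step to suppress the detection, then projects the perturbation back onto an l-infinity ball around the original colors.

```python
import numpy as np

rng = np.random.default_rng(1)
colors0 = rng.uniform(0.2, 0.8, size=(5, 3))   # original splat colors
w = rng.normal(size=(5, 3))                    # hypothetical detector weights

def logit(colors):                             # higher = "correctly detected"
    return float(np.sum(w * colors))

eps, alpha = 0.05, 0.01                        # perturbation budget, step size
colors = colors0.copy()
for _ in range(20):
    grad = w                                   # d(logit)/d(colors) for this toy model
    colors = colors - alpha * np.sign(grad)    # signed step against detection
    # PGD projection: perturbation stays within eps of the original colors
    colors = colors0 + np.clip(colors - colors0, -eps, eps)
    colors = np.clip(colors, 0.0, 1.0)         # keep colors in valid RGB range
```

In the real attack the gradient would come from backpropagating the detector's loss through the differentiable splat rasterizer rather than from a closed form.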

Author: Matthew Hull
Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety
Published: 2025-05-02
https://mphute.github.io/papers/interpret-safety-survey

As large language models (LLMs) see wider real-world use, understanding and mitigating their unsafe behaviors is critical. Interpretation techniques can reveal causes of unsafe outputs and guide safety, but such connections with safety are often overlooked in prior surveys. We present the first survey that bridges this gap, introducing a unified framework that connects safety-focused interpretation methods, the safety enhancements they inform, and the tools that operationalize them. Our novel taxonomy, organized by LLM workflow stages, summarizes nearly 70 works at their intersections. We conclude with open challenges and future directions. This timely survey helps researchers and practitioners navigate key advancements for safer, more interpretable LLMs.

Author: Seongmin Lee
RenderBender: A Survey on Adversarial Attacks Using Differentiable Rendering
Published: 2025-05-01
https://mphute.github.io/papers/renderbender

Differentiable rendering techniques like Gaussian Splatting and Neural Radiance Fields have become powerful tools for generating high-fidelity models of 3D objects and scenes. Their ability to produce physically plausible, differentiable models of scenes is a key ingredient for producing physically plausible adversarial attacks on DNNs. However, the adversarial machine learning community has yet to fully explore these capabilities, partly due to differing attack goals (e.g., misclassification, misdetection) and a wide range of possible scene manipulations used to achieve them (e.g., altering texture or mesh). This survey contributes a framework that unifies diverse goals and tasks, facilitating easy comparison of existing work, identifying research gaps, and highlighting future directions, ranging from expanding attack goals and tasks to account for new modalities, state-of-the-art models, tools, and pipelines, to underscoring the importance of studying real-world threats in complex scenes.

Author: Matthew Hull
Semi-Truths: A Large-Scale Dataset for Testing Robustness of AI-Generated Image Detectors
Published: 2024-09-21
https://mphute.github.io/papers/semi_truths

While text-to-image diffusion models have demonstrated impactful applications in art, design, and entertainment, these technologies also facilitate the spread of misinformation. Recent efforts have developed AI-generated image detectors claiming robustness against various augmentations, but their effectiveness remains unclear. Can these systems detect varying degrees of augmentation? Do they exhibit biases towards specific scenes or data distributions? To address these questions, we introduce Semi-Truths, featuring 27,635 real images, 245,360 masks, and 850,226 AI-augmented images with varying degrees of targeted and localized edits, created using diverse augmentation methods, diffusion models, and data distributions. Each augmented image includes detailed metadata for standardized, targeted evaluation of detector robustness. Our findings suggest that state-of-the-art detectors are sensitive to different degrees of edits, data distributions, and editing techniques, providing deeper insights into their functionality.

Author: Anisha Pal
LLM Attributor: Interactive Visual Attribution for LLM Generation
Published: 2024-03-10
https://mphute.github.io/papers/llm-attributor

Author: Seongmin Lee

Robust Principles: Architectural Design Principles for Adversarially Robust CNNs
Published: 2023-08-01
https://mphute.github.io/papers/robust_principles

We aim to unify existing works' diverging opinions on how architectural components affect the adversarial robustness of CNNs. To accomplish our goal, we synthesize a suite of three generalizable robust architectural design principles: (a) an optimal range for depth and width configurations, (b) preferring a convolutional over a patchify stem stage, and (c) robust residual block design through adopting squeeze-and-excitation blocks and non-parametric smooth activation functions. Through extensive experiments across a wide spectrum of dataset scales, adversarial training methods, model parameters, and network design spaces, our principles consistently and markedly improve AutoAttack accuracy: 1-3 percentage points (pp) on CIFAR-10 and CIFAR-100, and 4-9 pp on ImageNet.
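Principle (c) combines two concrete components. A minimal numpy sketch (illustrative shapes and reduction ratio, not the paper's exact configurations) shows a squeeze-and-excitation block using SiLU, a non-parametric smooth activation, in place of ReLU:

```python
import numpy as np

def silu(x):
    """Smooth, non-parametric activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feat, w1, w2):
    """Squeeze-and-excitation: feat (C, H, W), w1 (C/r, C), w2 (C, C/r)."""
    squeeze = feat.mean(axis=(1, 2))            # global average pool -> (C,)
    excite = sigmoid(w2 @ silu(w1 @ squeeze))   # per-channel gates in (0, 1)
    return feat * excite[:, None, None]         # reweight each channel

rng = np.random.default_rng(0)
C, r = 16, 4                                    # channels, reduction ratio
feat = rng.normal(size=(C, 8, 8))
w1 = rng.normal(size=(C // r, C)) * 0.1
w2 = rng.normal(size=(C, C // r)) * 0.1
out = se_block(feat, w1, w2)                    # same shape as feat
```

Because the gates lie strictly in (0, 1), the block can only attenuate channels, and the smooth activation keeps gradients well-behaved under adversarial training.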

Author: ShengYun Peng