GenAI systems introduce data security risks that traditional frameworks were not designed to address. This document identifies 21 critical data security risks (DSGAI01-DSGAI21) spanning the full GenAI lifecycle — from training-time poisoning and credential exposure to inference attacks, compliance failures, and model IP theft. Each risk includes how the attack unfolds, attacker capabilities, real-world scenarios, and a three-tier mitigation model (Foundational, Hardening, Advanced). An interactive web application accompanies this document to make the research actionable.
Interactive Web App: emmanuelgjr.github.io/DSGAI
Published Document: OWASP GenAI Data Security Guide
The context window in GenAI systems aggregates data from multiple trust domains — system prompts, user inputs, RAG results, tool outputs, and conversation history — into a single flat namespace with no internal access control. There is no mechanism to mark a context segment as "available for reasoning but not for direct output." This architectural fusion of control and data planes creates an entirely new class of data security risk.
Organizations deploying LLMs, RAG pipelines, AI agents, and fine-tuned models need a purpose-built framework to identify, prioritize, and mitigate these risks. That is what DSGAI provides.
| # | Risk | Category |
|---|---|---|
| DSGAI01 | Sensitive Data Leakage | Data Leakage |
| DSGAI02 | Agent Identity & Credential Exposure | Identity & Access |
| DSGAI03 | Shadow AI & Unsanctioned Data Flows | Governance & Lifecycle |
| DSGAI04 | Data, Model & Artifact Poisoning | Data Poisoning |
| DSGAI05 | Data Integrity & Validation Failures | Data Poisoning |
| DSGAI06 | Tool, Plugin & Agent Data Exchange Risks | Identity & Access |
| DSGAI07 | Data Governance, Lifecycle & Classification | Governance & Lifecycle |
| DSGAI08 | Non-Compliance & Regulatory Violations | Compliance & Regulatory |
| DSGAI09 | Multimodal Capture & Cross-Channel Leakage | Data Leakage |
| DSGAI10 | Synthetic Data, Anonymization & Transformation Pitfalls | Governance & Lifecycle |
| DSGAI11 | Cross-Context & Multi-User Conversation Bleed | Infrastructure |
| DSGAI12 | Unsafe Natural-Language Data Gateways | Infrastructure |
| DSGAI13 | Vector Store Platform Data Security | Infrastructure |
| DSGAI14 | Excessive Telemetry & Monitoring Leakage | Governance & Lifecycle |
| DSGAI15 | Over-Broad Context Windows & Prompt Over-Sharing | Compliance & Regulatory |
| DSGAI16 | Endpoint & Browser Assistant Overreach | Adversarial Attacks |
| DSGAI17 | Data Availability & Resilience Failures | Infrastructure |
| DSGAI18 | Inference & Data Reconstruction | Adversarial Attacks |
| DSGAI19 | Human-in-the-Loop & Labeler Overexposure | Governance & Lifecycle |
| DSGAI20 | Model Exfiltration & IP Replication | Adversarial Attacks |
| DSGAI21 | Disinformation & Integrity Attacks via Data Poisoning | Data Poisoning |
Each risk provides mitigations organized by implementation maturity:
Tier 1 — Foundational: Controls deployable within a single sprint. Policy enforcement, configuration hardening, access reviews, and operational hygiene that require no architectural changes.
Tier 2 — Hardening: Controls requiring architecture changes, new tooling, or cross-team coordination. DLP integration, advanced RAG hardening, cryptographic integrity pipelines.
Tier 3 — Advanced: Controls for mature security programs. Differential privacy, machine unlearning, red-team exercises, formal verification, and custom detection models.
The web app transforms the published document into a searchable, filterable, and actionable platform. Everything runs in the browser — no data leaves your machine.
For CISOs and Risk Managers
- Risk Assessment — 20-question interactive questionnaire that profiles your GenAI stack and generates a personalized risk report with severity scoring and recommended mitigations
- Policy Generator — Wizard that generates downloadable Acceptable Use, Data Classification, Model Governance, and Vendor Assessment policies customized to your organization
For Security Practitioners and SOC Teams
- Implementation Checklists — Actionable task lists for all 21 risks with real tool recommendations (AWS Macie, HashiCorp Vault, Lakera Guard, Splunk, etc.), effort estimates, and browser-persisted progress tracking
- Detection & Response Playbooks — SIEM queries for Splunk, Microsoft Sentinel, and Elastic; detection signatures; step-by-step response runbooks; and indicators of compromise
- Threat Model Templates — STRIDE-based analysis for every risk with assets, trust boundaries, data flows, and exportable JSON
For GenAI Developers and Architects
- Architecture Decision Records — 10 secure-by-default patterns with Python and JavaScript code snippets covering RAG security, agent credentials, vector store hardening, LLM-to-SQL safety, plugin sandboxing, and more
- Network Diagrams — Animated infrastructure topology views showing how each attack unfolds across cloud, server, database, and agent components
- Attack Paths — Visual step-by-step attack flow diagrams with vector breakdowns
Each of the 21 entries follows a consistent format:
- How the attack unfolds — GenAI-specific attack narrative
- Attacker capabilities — Who can exploit this and what they need
- Illustrative scenario — Grounded in documented incidents or current research
- Impact — Confidentiality, regulatory, operational, and downstream consequences
- Mitigations — Three tiers with scope annotations (Buy / Build / Both)
- Known CVEs — Referenced where applicable
The document introduces AI-DSPM as an extension of traditional DSPM with 13 capability areas tailored to GenAI:
- GenAI data asset discovery & inventory
- Data classification, labeling & policy binding
- Data flow mapping, lineage & GenAI bill of materials
- Access governance & entitlement posture (including agents)
- Prompt, RAG, and output-layer DLP controls
- Vector store & embedding security posture
- Data integrity, poisoning & tamper detection
- Observability, telemetry & log-retention posture
- Third-party, plugin/tool, and connector governance
- Lifecycle management, erasure & compliance readiness
- Training governance & privacy-enhancing fine-tuning
- Resilience posture for GenAI data dependencies
- Human and "Shadow AI" controls
Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
You are free to share and adapt this material for any purpose, including commercial, provided you give appropriate credit and distribute contributions under the same license.
Contributions are welcome. Open an issue or submit a pull request.