Đề tài 13: Xây dựng CI/CD pipeline bảo mật (DevSecOps) — tích hợp SAST, DAST, IaC scanning, container image scanning (Trivy/Anchore)
Mô tả ngắn:
Xây dựng một CI/CD pipeline thực tế cho ứng dụng mẫu (web service + IaC) với tích hợp kiểm thử bảo mật toàn diện: SAST (code scanning), DAST (runtime scan), IaC scanning (Checkov/tfsec), container image scanning (Trivy/Anchore), SBOM generation/verification, ký image, và policy‑gating. Pipeline phải tự động hóa (pre‑commit → PR → build → test → deploy), có báo cáo, triage flow và tích hợp DevSecOps (shift‑left).
- Sai sót bảo mật dễ chui vào production qua CI/CD: secrets trong code, vuln trong dependency, misconfig IaC, insecure container images.
- Các tổ chức cần pipeline tự động block các thay đổi kém an toàn, đồng thời giữ trải nghiệm developer (fast feedback).
- Mục tiêu: thực hiện end‑to‑end DevSecOps pipeline reproducible cho lab/course, hướng tới nghề Cloud Security Engineer / DevSecOps.
- Thiết kế repo mẫu chứa: ứng dụng (e.g., simple web app), Dockerfile, Terraform/Ansible IaC.
- Triển khai CI pipeline (GitHub Actions / GitLab CI / Jenkins) tích hợp: pre‑commit checks, SAST (semgrep/sonar), dependency scan (Dependabot/Snyk), IaC scan (Checkov/tfsec), container build, image scan (Trivy/Anchore), SBOM (syft), image signing (cosign), DAST (OWASP ZAP) trên staging.
- Định nghĩa gating policy: threshold vulnerabilities, block PR merge, auto PR fix for IaC where safe.
- Tích hợp triage: create ticket (JIRA) hoặc create issue with remediation hints.
- Đánh giá: pipeline latency, false positive rate, developer friction, security improvement (pre-prod vuln reduction).
- VCS / CI: GitHub + GitHub Actions (recommended) or GitLab CI / Jenkins.
- App stack: sample web app (Node/Python/Go) + Dockerfile + Terraform.
- SAST: semgrep, Bandit (Python), ESLint (JS), Gosec (Go), SonarQube/SonarCloud (optional).
- Dependency scanning: Dependabot (GitHub), Snyk, GitHub Code Scanning.
- IaC scanning: Checkov, tfsec, Conftest/OPA Rego for policies.
- Container scanning: Trivy (quick), Anchore Engine / Clair (policy/gates).
- SBOM & signing: Syft (generate SBOM), Cosign / Notary v2 (sign & verify images).
- DAST: OWASP ZAP (automation mode) or w3af.
- Policy & Admission: OPA Gatekeeper (Kubernetes admission), Notary verification in deployment.
- Registry: GitHub Container Registry / Harbor / Docker Registry with vulnerability scanning.
- Ticketing / ChatOps: JIRA / GitHub Issues, Slack for notifications.
- IaC provisioning: Terraform for infra, Ansible for config.
- Secrets management: HashiCorp Vault / GitHub Secrets / Azure Key Vault.
- Observability: ELK / Grafana for pipeline metrics & vulnerability dashboards.
- T1 — Sensitive secrets checked into repo (API keys, DB passwords). Detect via pre‑commit (gitleaks) and fail build.
- T2 — Vulnerable dependency introduced (high CVE). Block merge if severity threshold exceeded.
- T3 — IaC misconfiguration merged (public storage, open SG). Block merge or create auto-fix PR.
- T4 — Insecure container image published (unpatched base). Prevent push to prod registry.
- T5 — Runtime exploit discovered in staging: DAST finds remote RCE → fail promotion to prod.
Suggested stages: Pre-commit → PR (CI) → Build → Image Scan → SBOM & Sign → Push to Registry (staging) → DAST in staging → Policy Gate → Deploy to Prod (if gated).
-
Pre‑commit hooks (local)
pre-commitconfig: runsemgrep,gitleaks,tfsecfor changed files, linters. Provide quick feedback to devs.
-
PR / CI checks (on push / PR)
- Checkout code; install deps.
- Run SAST (semgrep rules + language-specific analyzers).
- Run dependency scan (e.g.,
npm audit,pip-audit, Snyk test). - Run IaC static scan (Checkov/tfsec) against Terraform/Ansible.
- Run unit tests & code quality metrics.
- Report results as GitHub Checks with annotations (files/lines). Block merge on failure for critical checks.
-
Build & Container image stage
- Build image (Docker BuildKit). Tag with commit SHA.
- Generate SBOM:
syft <image> -o json > sbom.json. - Scan image with Trivy/Anchore: if vulnerabilities > threshold, fail.
- Sign image with cosign:
cosign sign --key ... ghcr.io/org/image:sha. - Push signed image to registry (only if signed & scan OK).
-
Staging & DAST
- Deploy to isolated staging namespace via Terraform/Kubernetes manifests.
- Run OWASP ZAP scan automation against staging endpoint → gather findings.
- If DAST finds critical issues, fail promotion and create issue with context.
-
Policy Gate (pre-prod → prod)
- Verify SBOM, signature, and registry policy (admission webhook to check cosign signature + policy via OPA Gatekeeper).
- Enforce runtime policies (no privileged containers, read-only filesystem, drop capabilities) via PodSecurity and OPA.
- If all gates pass, promote deployment to production.
name: CI-Secure
on: [push, pull_request]
jobs:
prechecks:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Run Semgrep
uses: returntocorp/semgrep-action@v1
- name: Run gitleaks
uses: zricethezav/gitleaks-action@v1
- name: Run Checkov (IaC)
uses: bridgecrewio/checkov-action@v10
build_and_scan:
needs: prechecks
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Build container
run: docker build -t ghcr.io/${{ github.repository }}:${{ github.sha }} .
- name: Generate SBOM with Syft
run: syft ghcr.io/${{ github.repository }}:${{ github.sha }} -o json > sbom.json
- name: Scan image with Trivy
run: trivy image --severity CRITICAL,HIGH --exit-code 1 ghcr.io/${{ github.repository }}:${{ github.sha }}
- name: Sign image with Cosign
run: cosign sign --key ${{ secrets.COSIGN_KEY }} ghcr.io/${{ github.repository }}:${{ github.sha }}
- name: Push to GHCR
run: docker push ghcr.io/${{ github.repository }}:${{ github.sha }}(Extend with caching, buildx, and artifact upload as needed.)
- Use Checkov/tfsec in CI for Terraform. Use Conftest with Rego to implement org‑specific rules (e.g., disallow public S3, require tags, restrict instance types).
- Example Rego snippet (deny public S3 in terraform plan):
package terraform.s3
deny[msg] {
resource := input.resource_changes[_]
resource.type == "aws_s3_bucket"
resource.change.after.acl == "public-read"
msg = sprintf("S3 %s is public", [resource.address])
}- Use Gatekeeper on K8s to block deployments that violate PodSecurity / OPA policies.
- Static scanning: Trivy CLI during CI to detect CVEs in OS packages and application dependencies.
- Policy scanning: Anchore can apply policy bundles and provide detailed reports.
- SBOM: syft to create SBOM (SPDX/CycloneDX), store SBOM artifact in registry or artifact store. SBOM used for vulnerability triage & SCA.
- Image signing & verification: cosign for supply chain signing; verify signature in admission webhook before runtime.
- Runtime protections: Falco for runtime detection, image vulnerability blocking via admission controller, PodSecurity.
- Registry hardening: use Harbor or GHCR with vulnerability scanning and immutable tags for prod images.
- Automate ZAP in CI or post‑deploy stage: use ZAP baseline & active scan against staging URL.
- Parse ZAP report (JSON) and convert to GitHub annotations or create issue for critical findings.
- For authenticated scans, provision test accounts / test tokens with limited privileges.
- Limit DAST scope to non‑destructive tests and ensure staging is isolated.
- On failed checks, create annotated GitHub Check runs + COMMENTS on PR with details and remediation hints (e.g., semgrep rule, vulnerable dependency upgrade path).
- Critical failures automatically create JIRA issue (with trace, SBOM) and assign to owner.
- For IaC issues, optionally create auto-fix PR (use Bridgecrew/Checkov auto-fix or scripted terraform patch) — require manual review.
- Maintain suppression/ALLOWLIST process: require documented exception ticket with TTL.
- Security metrics: number of critical/high vuln prevented from reaching prod; average vulnerabilities per image over time; % PRs blocked due to security checks.
- Operational metrics: pipeline latency (time from PR → build → gated), false-positive rate (FP per check), time-to-fix (median), developer satisfaction (survey).
- Compliance metrics: SBOM coverage %, images signed %, IaC compliance % (CIS/Terraform checks).
- Quality metrics: mobile test coverage, unit tests passing rate.
- Unit tests & integration tests for app.
- Simulate injection of secrets & ensure gitleaks detects.
- Add synthetic vulnerable dependency to check enforcement (e.g., pin a package with known CVE).
- Inject IaC misconfig and observe Checkov blocking PR.
- Build image with known CVE base and ensure Trivy fails the CI.
- Run DAST against staging with a test RCE endpoint to verify detection.
- Validate admission webhook denies unsigned/unverified images.
- Do NOT store production secrets in CI logs; use secret masking and secure storage for keys.
- Run DAST only in isolated staging; avoid attacking external services.
- Ensure SBOMs that include PII are sanitized before sharing.
- Manage cosign keys in secure vault (not in plain secrets).
- GitHub repo: sample app, Terraform/Ansible, pipeline workflows (GitHub Actions), pre-commit config.
- Scripts & Dockerfiles: image build & SBOM generation scripts.
- CI run artifacts: sample reports from semgrep, Checkov, Trivy, ZAP.
- Policy definitions: OPA Rego / Checkov policies, Gatekeeper manifests.
- Demo video (5–10 phút) showing pipeline run, blocking & remediation.
- Technical report (15–25 trang) with evaluation metrics, developer impact analysis, lessons learned.
- Playbooks: triage flow, false-positive handling, exception process.
- Pipeline completeness (25%): pre-commit → CI → build → scan → deploy stages implemented.
- Security coverage (25%): SAST + IaC + Container + DAST integrated and demonstrated.
- Automation & triage (20%): issue creation, auto-fix PRs or remediation playbooks, suppression process.
- Testing & metrics (15%): test plan executed, metrics collected & analyzed.
- Documentation & reproducibility (15%): README, runbook, scripts, demo.
- Tuần 1–2: Project planning, choose sample app & infra, baseline CI.
- Tuần 3–4: Implement pre-commit hooks (semgrep, gitleaks, tfsec).
- Tuần 5–6: Integrate SAST & dependency scanning in CI.
- Tuần 7–8: Implement container build, SBOM generation, Trivy/Anchore scanning.
- Tuần 9: Image signing with cosign & push to registry.
- Tuần 10: Deploy staging & integrate OWASP ZAP DAST.
- Tuần 11: Set up admission policies (OPA Gatekeeper) & verify gating.
- Tuần 12: Automation: issue creation & triage flows, auto-fix PR for IaC.
- Tuần 13: End-to-end testing, metrics collection.
- Tuần 14: Finalize report, demo and presentation.
Run semgrep locally:
semgrep --config=autoRun Checkov on Terraform:
checkov -d ./terraform --skip-check CKV_AWS_20Generate SBOM with syft:
syft ghcr.io/org/image:tag -o spdx-json > sbom.spdx.jsonTrivy scan image:
trivy image --severity HIGH,CRITICAL --exit-code 1 ghcr.io/org/image:tagSign image with cosign:
cosign sign --key cosign.key ghcr.io/org/image:tag
# verify
cosign verify --key cosign.pub ghcr.io/org/image:tagRun OWASP ZAP baseline scan (Docker):
docker run --rm -v $(pwd):/zap/wrk/:Z owasp/zap2docker-stable zap-baseline.py -t https://staging.example.com -r zap_report.html- Integrate in-toto for supply chain provenance & SLSA levels.
- Use graph DB to correlate SBOM components to vuln databases and produce impact analysis.
- Add automated patching pipelines for low‑risk vulns (create PR, run tests, merge with approvals).
- Implement feature flags to decouple security gating for fast rollback.
- Explore reproducible builds and reproducible SBOMs.
- Provide sandbox registry & test accounts; rotate test keys after course.
- Enforce non-destructive DAST scope & preapproval for test endpoints.
- Encourage reproducibility: pin versions and containerize tools.