Offensive Development

UEFI Image Acquisition: Navigating Windows’ Hidden Firmware Landscape

ch3rn0byl — Tue, 10 Dec 2024 00:41:53 +0000

The Unified Extensible Firmware Interface, UEFI for short, serves as the brain of your system’s most fundamental operations. It’s the first code that runs when you power on your machine, guiding the boot process and bridging the gap between your hardware and operating system. Despite its critical role, this component remains a mystery for many: its complexity and low-level operation often deter even the most seasoned security researchers from exploring it. But what if you could take a peek behind the curtain? What if you could gain access to your device’s UEFI image, analyze its contents, and uncover what lies beneath? Imagine harnessing it into a ~~killer implant~~ feature-rich loader!

Dumping firmware isn’t just about curiosity; it’s a gateway to understanding and securing your system on a deeper level. Whether you’re a security researcher, a firmware enthusiast, or someone who loves to tinker with low-level components, dumping firmware opens up a world of possibilities!

In this blog post, I’ll walk you through the process of dumping UEFI firmware from a live system via software. We’ll explore various registers, consult the Intel Software Development Manual, and follow best practices to safely and effectively extract this image yourself, so you, too, can start diving into the heart of your system. Let’s get started.

Publicly available tools like CHIPSEC can obtain the UEFI image. It’s a powerful tool, no doubt, but when it comes to compromising the UEFI image of ~~your~~ my machine, simply loading it won’t cut it…hehehe

What’s Needed

This project will take place on an Intel 12th Generation i7-1270PE, which is based on the 600 series chipset. The Platform Controller Hub datasheet for this chipset family can be found here.

Realistically you can get by from referencing my data but you mayyyy run into a chipset using a particular offset differently. Would be ideal just to reference just in case 🙂

Weapons of choice:

Visual Studio 2022 with the WDK installed for Windows 11 (preferably)
A signed driver (feeling adventurous)
Windows Kernel programming experience (This is not a how-to for kernel programming)
Willingness to endure pain

This project applies only to Intel-based systems. If you have an AMD system, you will need to work out how this translates to AMD!

Acquiring the Firmware

Platform Controller Hub (PCH)

The Platform Controller Hub (PCH) is a critical component in modern computers, acting as a central hub for data transfers between the CPU, RAM, and peripherals. It manages essential I/O devices including USB ports, SATA controllers, and PCIe slots, while also supporting advanced features such as Rapid Storage Technology, Smart Sound Technology, Intel’s Management Engine, etc. The PCH’s extensive control over system components makes it an attractive target for vulnerability research, bootkit development, and achieving extreme persistence like your favorite Nation State actors.

Security researchers have demonstrated the PCH’s critical role in system security. For example, Satoshi’s SmmExploit showed how UEFI vulnerabilities can enable attackers to tamper with critical system components, successfully compromising an endpoint with Virtualization Based Security (VBS) and Hypervisor-enforced Code Integrity (HVCI) enabled…Wild!

While not directly PCH-related but demonstrating innovative persistence techniques, researchers have also explored persistence mechanisms leveraging PCIe-connected components, such as GPUs. Notable examples include Jellyfish, a Linux-based GPU rootkit leveraging OpenCL’s API to establish persistence within GPU memory, a component that can be reached via PCIe slots. Similarly, Smelly of Vx Underground developed a Proof-of-Concept GPU memory rootkit for Nvidia hardware, leveraging Nvidia’s API to manage logic and interactions over the PCIe bus.

These three examples highlight the PCH’s susceptibility to exploitation and its importance in maintaining system integrity.

The PCH is essential because it provides access to the UEFI image, which resides in flash memory connected via Serial Peripheral Interface (SPI). The SPI0 interface serves two purposes: managing flash memory operations and supporting Trusted Platform Module (TPM) functionality. The SPI0 interface for flash is further divided into six distinct regions, as shown in Table 81: SPI0 Flash Regions:

Serial Peripheral Interface (SPI)

The PCH exposes registers and control bits within its PCI configuration space, enabling software to interact with the SPI controller. Accessing this address space involves calculating values for the bus, device, function, and offset, using a simple and straightforward algorithm as shown below:

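DWORD GetPciValue(USHORT Bus, USHORT Device, USHORT Function, USHORT Offset)
{
    // Bit 31 enables the configuration cycle; the register offset must be DWORD-aligned
    return 0x80000000 | ((Bus & 0xff) << 16) | ((Device & 0x1f) << 11) |
           ((Function & 0x07) << 8) | (Offset & 0xfc);
}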

Think of this as plugging in the address of your Tinder date on a Friday night. You plug in the necessary values and voila! You know exactly where to meet your date. Well, same here. You plug in the bus, device, function, and offset (if applicable), and now you know where to read and write to! Hopefully, the SPI controller doesn’t ghost you, a technique I’m all too familiar with!

The resulting PCI value is typically used with x86 assembly IN and OUT instructions to perform low-level register access. For example:

mov eax, PCI_VALUE ; The value from GetPciValue
mov dx, 0xcf8      ; PCI_CONFIG_ADDRESS
out dx, eax        ; Write the calculated PCI address
mov dx, 0xcfc      ; PCI_DATA_PORT
in eax, dx         ; Read the data from the register

By plugging the address into these instructions, you can access the precise register you’re targeting. Just don’t swipe left on the bus, device, or function; otherwise, you’ll end up in the wrong neighborhood with an angry date!

The SPI interface is a key component, as it controls access to the SPI flash memory where the UEFI image is stored. On Intel 600 and 700 series chipsets, the SPI controller is located at Bus 0, Device 31, Function 5. This address maps to the SPI Configuration Registers block, which includes all the registers and control bits necessary for interacting with the SPI controller, as shown below:

Some third-party vendors attempt to obscure the SPI interface by masking the Device ID and Vendor ID at offset 0x00000000 so that they read back as -1 (0xFFFFFFFF). Nice try, but this “security measure” is more of a speed bump than a roadblock. You can still identify the SPI controller’s physical base address by querying the Base Address Register 0 (BAR0) at offset 0x00000010.

By reading BAR0 anyways, you effectively bypass this attempt at obfuscation, gaining access to the SPI interface. This underscores the classic phrase: security through obscurity. Am I right, or am I right??
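To make the address math concrete, here is a minimal sketch that computes the value written to port 0xcf8 for the SPI controller's BAR0. The helper name MakePciConfigAddress and the explicit field masks are my own additions for illustration; the bus, device, function, and offset values come from the text above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the original project): builds the value for
// the legacy PCI configuration address port (0xCF8).
constexpr std::uint32_t MakePciConfigAddress(std::uint32_t bus, std::uint32_t device,
                                             std::uint32_t function, std::uint32_t offset)
{
    return 0x80000000u                   // bit 31: enable configuration access
         | ((bus & 0xFFu) << 16)         // bits 23:16: bus number
         | ((device & 0x1Fu) << 11)      // bits 15:11: device number
         | ((function & 0x07u) << 8)     // bits 10:8: function number
         | (offset & 0xFCu);             // bits 7:2: DWORD-aligned register offset
}

// SPI controller at bus 0, device 31, function 5; BAR0 lives at offset 0x10.
static_assert(MakePciConfigAddress(0, 31, 5, 0x10) == 0x8000FD10u,
              "31 << 11 = 0xF800, 5 << 8 = 0x500, plus offset 0x10");
```

Writing 0x8000FD10 to 0xcf8 and then reading 0xcfc returns BAR0, regardless of whether the Vendor ID and Device ID at offset 0 (address 0x8000FD00) have been masked.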

BAR0 points to the SPI controller’s Memory-Mapped I/O (MMIO) region, which holds the SPI memory-mapped registers. The layout of the BAR0 register itself is represented below:

union BIOS_SPI_BAR0
{
	struct
	{
		ULONG MemorySpace : 1;
		ULONG Type : 2;
		ULONG Prefetchable : 1;
		ULONG MemorySize : 8;
		ULONG MemoryBar : 20;
	};

	ULONG AsULong;
};
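As a quick sanity check on that layout, the sketch below decodes a raw BAR0 value with plain masks instead of bitfields. The raw value 0xFE010000 is a made-up example, not a value read from my machine, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers mirroring the BIOS_SPI_BAR0 fields with explicit masks.
constexpr std::uint32_t SpiBarToPhysicalBase(std::uint32_t rawBar0)
{
    // Drop bits 11:0 (MemorySpace, Type, Prefetchable, MemorySize) and keep
    // bits 31:12 (MemoryBar), which is the same as MemoryBar << 12.
    return rawBar0 & 0xFFFFF000u;
}

constexpr bool IsMemorySpaceBar(std::uint32_t rawBar0)
{
    // Bit 0 clear means the BAR describes memory space rather than I/O space.
    return (rawBar0 & 0x1u) == 0;
}

// Example with an assumed raw value: MemoryBar = 0xFE010, base = 0xFE010000.
static_assert(SpiBarToPhysicalBase(0xFE010000u) == 0xFE010000u, "base is 4KB-aligned");
static_assert((0xFE010000u >> 12) == 0xFE010u, "MemoryBar field");
```

The mask version and the bitfield version should agree; I find the masks easier to eyeball when comparing against a raw register dump.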

The SPI memory-mapped registers are shown below:

This region is essential for interacting with the SPI controller, as it holds all the necessary information to perform operations, including extracting the UEFI image.

Although this blog post focuses on UEFI, the SPI bus offers the opportunity to experiment with other peripherals. By exploring these devices, you can ~~tamper~~ ~~modify~~ analyze their firmware or memory spaces — opening a fascinating avenue for research and learning! 🙂

SPI Registers

BIOS Flash Primary Region

The BIOS Flash Primary Region member is a ULONG that represents the core memory segment of flash storage where the main firmware, including the UEFI image, resides. This region is critical because it contains essential system code and data required for the initial boot sequence, hardware initialization, and system management before the operating system fully loads.

The C++ representation of the BIOS Flash Primary Region register is shown below:

union BIOS_BFPREG
{
	struct
	{
		ULONG BiosFlashPrimaryRegionBase : 15;  // Base address of the primary region
		ULONG Reserved : 1;                     // Reserved
		ULONG BiosFlashPrimaryRegionLimit : 15; // Limit of the primary region
		ULONG Reserved2 : 1;                    // Reserved
	};

	ULONG AsULong;
};

This register provides the base address and limit of the BIOS flash primary region, defining its location and size within the SPI flash memory. The base address specifies where the primary region begins, while the limit determines its end. Together, these fields enable precise location and access to firmware components stored in this region.
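Here is a small worked example of turning a raw BFPREG value into byte addresses. The raw value 0x1FFF0600 is hypothetical and the DecodeBfpreg helper is my own naming; the bit positions follow the BIOS_BFPREG union above.

```cpp
#include <cassert>
#include <cstdint>

struct FlashRange
{
    std::uint32_t base; // first byte of the region
    std::uint32_t end;  // one past the last byte of the region
    std::uint32_t size; // end - base
};

// Hypothetical helper: both fields are expressed in 4KB units.
constexpr FlashRange DecodeBfpreg(std::uint32_t raw)
{
    const std::uint32_t baseField  = raw & 0x7FFFu;          // bits 14:0
    const std::uint32_t limitField = (raw >> 16) & 0x7FFFu;  // bits 30:16
    const std::uint32_t base = baseField << 12;
    const std::uint32_t end  = ((limitField << 12) | 0xFFFu) + 1;
    return FlashRange{ base, end, end - base };
}

// Assumed example: base field 0x0600, limit field 0x1FFF.
static_assert(DecodeBfpreg(0x1FFF0600u).base == 0x00600000u, "base");
static_assert(DecodeBfpreg(0x1FFF0600u).end  == 0x02000000u, "exclusive end");
static_assert(DecodeBfpreg(0x1FFF0600u).size == 0x01A00000u, "26 MiB region");
```

Note that the project code later in this post sizes its buffer from the limit alone, so it covers everything from flash offset 0 up to the end of the primary region.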

Reading this register, and therefore calculating the base address of the primary region and the size of the image, first requires locating the SPI MMIO region through BAR0, specifically its MemorySize and MemoryBar members. The MemorySize bits are hardwired to zero, which indicates that MemoryBar is aligned to a 4KB boundary and that the region spans 4KB of memory.

This information is important for operations like reading or modifying firmware, as it ensures precise targeting of the desired flash memory segment. Understanding and working with this register forms the foundation for tasks such as UEFI image extraction and comprehensive firmware analysis.

Hardware Sequencing Flash Status and Control

The Hardware Sequencing Flash Status and Control (HSFSTS) member is a ULONG that uses individual bits to represent various control and status flags. Below is the C++ representation for reference:

union BIOS_HSFSTS_CTL
{
	struct
	{
		ULONG FlashCycleDone : 1;
		ULONG FlashCycleError : 1;
		ULONG AccessErrorLog : 1;
		ULONG Reserved : 2;
		ULONG SpiCycleInProgress : 1;
		ULONG Reserved2 : 5;
		ULONG WriteStatusDisable : 1;
		ULONG PRR34_LOCKDN : 1;
		ULONG FDOPSS : 1;
		ULONG FDV : 1;
		ULONG FLOCKDN : 1;
		ULONG FGO : 1;
		ULONG FCYCLE : 4;
		ULONG WET : 1;
		ULONG Reserved3 : 2;
		ULONG FDBC : 6;
		ULONG Reserved4 : 1;
		ULONG FSMIE : 1;
	};

	ULONG AsULong;
};

The HSFSTS register serves as the main communication channel with the SPI controller, enabling control over all flash memory operations. Several members of the HSFSTS register provide real-time feedback on operation status. For instance, the FlashCycleDone member is set by the SPI controller hardware when a flash cycle completes, making it a key indicator to monitor during operations. In cases of errors, the FlashCycleError and AccessErrorLog members are set, signaling issues such as attempting to access restricted memory regions without proper permissions.

Control over flash operations is managed primarily through the FCYCLE and FGO bits. While my focus for this blog post is extracting the UEFI image, which only requires read operations, the FCYCLE member can initiate several types of flash cycles, including:

Read (for data extraction)
Write (for data modifications)
4K Block Erase and 64K Sector Erase (for clearing specific memory regions)
Read Serial Flash Discoverable Parameters (SFDP)
Read JEDEC ID (for device identification)
Write Status and Read Status (for status configuration)
RPMC Op1 and RPMC Op2 (for RPMC operations)

To start a flash operation, the desired type is set in the FCYCLE member, and the FGO bit is set to begin the flash operation. Monitoring the status flags — FlashCycleDone, FlashCycleError, and AccessErrorLog — is important for effective error handling and troubleshooting.

For example, after initiating a cycle, check FlashCycleDone to confirm that it completed. If FlashCycleError is set, review the permissions and command parameters, clear the error flag, and retry. If AccessErrorLog is set, ensure the targeted memory region has the appropriate access level and adjust the address if necessary.
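The FCYCLE and FGO fields can also be composed into a single control word, which is handy when comparing against what you see in a register viewer. This is a sketch under assumptions: the FCYCLE encodings below are how I recall them from Intel's PCH datasheets and should be verified against the datasheet for your chipset, and BuildCycleCommand is a hypothetical helper, not part of the project.

```cpp
#include <cassert>
#include <cstdint>

// Assumed FCYCLE encodings; verify against your chipset's datasheet.
enum class FlashCycle : std::uint32_t
{
    Read = 0x0, Write = 0x2, BlockErase4K = 0x3, SectorErase64K = 0x4,
    ReadSfdp = 0x5, ReadJedecId = 0x6, WriteStatus = 0x7, ReadStatus = 0x8,
    RpmcOp1 = 0x9, RpmcOp2 = 0xA
};

constexpr std::uint32_t kFgoBit = 1u << 16; // FGO: start the cycle

// Hypothetical helper: returns the FDBC/FCYCLE/FGO portion of HSFSTS_CTL.
// Returns 0 (no cycle started) when byteCount is outside 1..64.
constexpr std::uint32_t BuildCycleCommand(FlashCycle cycle, std::uint32_t byteCount)
{
    if (byteCount < 1 || byteCount > 64)
    {
        return 0;
    }
    return ((byteCount - 1) << 24)                      // FDBC (bits 29:24) holds count - 1
         | (static_cast<std::uint32_t>(cycle) << 17)    // FCYCLE (bits 20:17)
         | kFgoBit;
}

static_assert(BuildCycleCommand(FlashCycle::Read, 64) == 0x3F010000u, "64-byte read");
static_assert(BuildCycleCommand(FlashCycle::ReadJedecId, 4) == 0x030D0000u, "4-byte JEDEC ID");
```

Real code would normally merge this word with the bits already in the register rather than overwrite them blindly.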

For reading data from flash memory, setting FCYCLE to Read enables the necessary flash operations. During the process, monitoring FlashCycleDone ensures that each segment completes as expected. Since each read cycle retrieves up to a maximum of 64 bytes, extracting the full UEFI image requires a sequential series of commands to cover the entire region. Watching these operations in real-time with tools like RWEverything, as the SPI controller iterates through the flash cycle commands, is very interesting to see! Who would have thought a simple ULONG could have so much influence?

Flash Address

The Flash Linear Address specifies the starting point for the SPI controller’s read operations. This address determines where the controller begins accessing data within the SPI flash memory. If the target region lacks the necessary access permissions, the I/O operation will fail, resulting in errors in FlashCycleError or AccessErrorLog.

When extracting UEFI images, the Flash Linear Address corresponds to the specific region being iterated through in the flash memory. Each flash region is defined by its base address and limit, as outlined in the SPI configuration. By calculating and setting the correct linear address, you can target the desired region and ensure accurate data retrieval during the extraction process.

Flash Data

The Flash Data array, consisting of 16 DWORDs, stores the data retrieved by the SPI controller during a read operation. Each read cycle retrieves up to 64 bytes of data, as determined by the size specified in the Flash Data Byte Count member of the HSFSTS register. Unlike a typical file download where data is transferred in a single continuous stream, retrieving flash data requires sequential commands. Each command directs the SPI controller to read the next chunk of data until the entire UEFI image is retrieved.
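To get a feel for how many commands that is, here is a back-of-the-envelope sketch. The helper names are mine, and the 32 MiB region size is just an example figure.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kMaxBytesPerCycle = 64; // 16 DWORDs of Flash Data

// Hypothetical helper: number of read cycles needed to cover regionBytes.
constexpr std::uint32_t CyclesNeeded(std::uint32_t regionBytes)
{
    return regionBytes / kMaxBytesPerCycle
         + ((regionBytes % kMaxBytesPerCycle) != 0 ? 1u : 0u);
}

// Hypothetical helper: FDBC value for the next chunk (the field stores count - 1).
// Precondition: remainingBytes > 0.
constexpr std::uint32_t FdbcForChunk(std::uint32_t remainingBytes)
{
    return (remainingBytes < kMaxBytesPerCycle ? remainingBytes : kMaxBytesPerCycle) - 1;
}

// A 32 MiB flash part takes 524,288 separate read cycles.
static_assert(CyclesNeeded(0x02000000u) == 524288u, "32 MiB / 64 bytes");
static_assert(FdbcForChunk(64) == 63u, "full chunk");
static_assert(FdbcForChunk(36) == 35u, "short final chunk");
```

Because flash regions are 4KB-aligned, the short-chunk case never actually triggers when dumping whole regions, but it matters if you read arbitrary ranges.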

Watching the SPI controller process data 64 bytes at a time with tools like RWEverything is such a cool thing to watch — like watching someone fill a gallon of water using one cup at a time!

Error Registers

The error registers include three key status bits: Flash Cycle Done, Flash Cycle Error, and Access Error Log. These bits, the lower three bits of the HSFSTS register, are critical for monitoring the success of a flash operation.

Flash Cycle Done: This bit indicates that the flash cycle has completed. It should be checked after each operation, together with the two error bits, to confirm proper execution.
Flash Cycle Error: If set, this bit signals that an error occurred during the operation, such as invalid parameters or permission issues.
Access Error Log: This bit logs access violations, which often occur when attempting to interact with restricted or protected regions of flash memory.

If Flash Cycle Error or Access Error Log is set, something went wrong during the operation, whereas Flash Cycle Done on its own signals a clean completion. Proper error handling involves diagnosing the issue, clearing the error flags, and retrying the operation with corrected parameters or permissions.
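One way to express that decision logic is a small classifier over the raw register value. The bit positions follow the BIOS_HSFSTS_CTL layout shown earlier; the enum and function names are hypothetical, and the write-1-to-clear behavior noted in the comment is my understanding of the hardware that you should confirm in the datasheet.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kStatusDone        = 1u << 0; // Flash Cycle Done
constexpr std::uint32_t kStatusCycleError  = 1u << 1; // Flash Cycle Error
constexpr std::uint32_t kStatusAccessError = 1u << 2; // Access Error Log

// Assumption: these three bits are write-1-to-clear, so writing this mask
// back to the register would reset them before the next cycle.
constexpr std::uint32_t kStatusClearMask =
    kStatusDone | kStatusCycleError | kStatusAccessError;

enum class CycleResult { InProgress, Ok, CycleError, AccessError };

// Hypothetical helper: error bits take priority over the done bit.
constexpr CycleResult ClassifyStatus(std::uint32_t hsfsts)
{
    if (hsfsts & kStatusAccessError) return CycleResult::AccessError;
    if (hsfsts & kStatusCycleError)  return CycleResult::CycleError;
    if (hsfsts & kStatusDone)        return CycleResult::Ok;
    return CycleResult::InProgress;
}

static_assert(ClassifyStatus(0x1u) == CycleResult::Ok, "done, no errors");
static_assert(ClassifyStatus(0x3u) == CycleResult::CycleError, "done with error");
```

Checking the error bits first matters because a failed cycle can still report itself as done.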

Pulling it all together

The goal of this project is to extract the UEFI image via SPI interface; however, the same method can be repurposed to interact with other peripherals on the bus, such as leveraging GPU memory as mentioned earlier. This process begins by resolving the SPI base address, which is calculated using the BAR0 register. According to Intel’s 700 series datasheet, the base address aligns to a 4KB boundary and is mapped into the process’ address space for further interaction:

DWORD dwSpiBaseAddress = bar0.MemoryBar << 12;
DWORD dwSpiSize = 1 << 12;

At this point, we have a pointer to the SPI Memory-Mapped structure, as previously discussed.

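The next step involves calculating the size of the UEFI region. This is done by taking the value from the BiosFlashPrimaryRegionLimit member, shifting it left by 12 bits to convert it from 4KB units to a byte address, setting the low 12 bits so the address points at the last byte of that final 4KB block, and adding 1 to account for the entire region. The calculation is implemented below: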

dwUefiRegionSize = pSpiFlashMemory->BiosFlashPrimaryRegion.BiosFlashPrimaryRegionLimit;
dwUefiRegionSize = ((dwUefiRegionSize << 12) | 0xfff) + 1;

pUefiRegionAsBytes = std::make_unique<BYTE[]>(dwUefiRegionSize);

Once the size is calculated, the program iterates through the six defined flash regions to validate their boundaries and extract data. Each region is checked to ensure the base address is not larger than the limit, as this would indicate an invalid region:

for (DWORD dwRegionIndex = 0; dwRegionIndex < ((sizeof(pSpiFlashMemory->FlashRegion) / sizeof(pSpiFlashMemory->FlashRegion[0])) << 1); dwRegionIndex++)
{
    UefiTypes::BIOS_FREG BiosFreg = pSpiFlashMemory->FlashRegion[dwRegionIndex];
    DWORD dwRegionBase = BiosFreg.RegionBase << 12;
    DWORD dwRegionLimit = ((BiosFreg.RegionLimit << 12) | 0xfff) + 1;

    if (dwRegionBase > dwRegionLimit)
    {
        continue;
    }
  --snipped--

For valid regions, the SPI controller begins reading data in 64-byte increments. The FlashCycleDone flag is monitored to ensure each cycle completes successfully, and the data is copied from the SPI flash memory into the allocated buffer:

  --snipped--
    pSpiFlashMemory->FlashAddress.FlashLinearAddress = dwRegionBase;

    do
    {
        pSpiFlashMemory->HardwareSequencingFlashStatusAndControl.FDBC = 64 - 1;
        pSpiFlashMemory->HardwareSequencingFlashStatusAndControl.FCYCLE = static_cast<DWORD>(UefiTypes::HSFSTS_CYCLE::Read);
        pSpiFlashMemory->HardwareSequencingFlashStatusAndControl.FGO = 1;

        while (!pSpiFlashMemory->HardwareSequencingFlashStatusAndControl.FlashCycleDone);

        if (!pSpiFlashMemory->HardwareSequencingFlashStatusAndControl.FlashCycleError &&
            !pSpiFlashMemory->HardwareSequencingFlashStatusAndControl.AccessErrorLog)
        {
            for (DWORD dwIndex = 0; dwIndex < sizeof(pSpiFlashMemory->FlashData) / sizeof(pSpiFlashMemory->FlashData[0]); dwIndex++)
            {
                RtlCopyMemory(
                    &pUefiRegionAsBytes[(pSpiFlashMemory->FlashAddress.FlashLinearAddress) + (dwIndex * sizeof(DWORD))],
                    &pSpiFlashMemory->FlashData[dwIndex],
                    sizeof(DWORD)
                );
            }
        }

        pSpiFlashMemory->FlashAddress.FlashLinearAddress += 64;
    } while (pSpiFlashMemory->FlashAddress.FlashLinearAddress < dwRegionLimit);
}

This process repeats for all valid regions until the entire UEFI image is extracted. By ensuring proper validation and monitoring error flags such as FlashCycleError and AccessErrorLog, the program can reliably interact with the SPI controller and retrieve the firmware data.
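The loop above spins on FlashCycleDone indefinitely. If you would rather bail out when the controller never answers, a bounded poll is one option. The sketch below is not the project's code: WaitForFlashCycle, PollResult, and the idea of passing the register read as a callable are all my own illustration, which also makes the logic testable without real hardware.

```cpp
#include <cassert>
#include <cstdint>

enum class PollResult { Done, Error, TimedOut };

constexpr std::uint32_t kFlashCycleDone  = 1u << 0;
constexpr std::uint32_t kFlashCycleError = 1u << 1;
constexpr std::uint32_t kAccessErrorLog  = 1u << 2;

// readStatus is any callable returning the current 32-bit HSFSTS_CTL value,
// for example a lambda that performs a volatile read of the mapped register.
template <typename ReadStatusFn>
PollResult WaitForFlashCycle(ReadStatusFn&& readStatus, std::uint32_t maxPolls)
{
    for (std::uint32_t poll = 0; poll < maxPolls; ++poll)
    {
        const std::uint32_t status = readStatus();
        if (status & (kFlashCycleError | kAccessErrorLog))
        {
            return PollResult::Error;   // error bits win over the done bit
        }
        if (status & kFlashCycleDone)
        {
            return PollResult::Done;
        }
    }
    return PollResult::TimedOut;        // controller never reported completion
}
```

On real hardware the callable would read the mapped register through a volatile pointer, and maxPolls would be tuned (or replaced with a timer) for the platform.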

With the UEFI image successfully extracted, the next steps could involve analyzing or modifying the firmware for research purposes. Well, there you have it! You have successfully extracted the UEFI image of ~~their~~ your endpoint 🙂

The full project can be found here.

Risks of Software-Based UEFI Acquisition

Sophisticated attackers who have compromised System Management Mode (SMM) can manipulate FDATAx registers by taking control of the System Management Interrupt (SMI) handler, allowing the malicious actor to present a falsified “clean” UEFI image, effectively concealing malicious modifications. Instead of revealing the actual infected firmware, the compromised system serves a sanitized image that appears legitimate.

This creates a false sense of security, as a forensic analyst will see only what the malicious actor wants them to see. Meanwhile, the bootkit continues to operate undetected, and the investigation concludes there are no threats present. Therefore, analysts should treat all software-based UEFI acquisitions with healthy skepticism and consider them untrusted unless verified through a hardware-based acquisition method.

While this analysis reflects security research findings, organizations may have different Standard Operating Procedures (SOPs) for UEFI acquisition based on their specific risk models and requirements. Readers should consult their organization’s forensic protocols…or just wing it and perfect the art of apologizing later 😉

References

Matrosov, Alex, et al. “Chapter 19: BIOS/UEFI Forensics: Firmware Acquisition and Analysis Approaches.” Rootkits and Bootkits, No Starch Press, 2019, pp. 363–374.

Tanda, Satoshi. “Debugging System with DCI and Windbg.” Debugging System with DCI and Windbg, Mar. 2021, standa-note.blogspot.com/2021/03/debugging-system-with-dci-and-windbg.html.

LucaBongiorni. “Lucabongiorni/Jellyfish: GPU Rootkit Poc by Team Jellyfish.” GitHub, 2015, github.com/LucaBongiorni/jellyfish.

Smelly. “VXUG-Papers/Gpumemoryabuse.Cpp at Main · Vxunderground/VXUG-Papers.” GitHub, 2021, github.com/vxunderground/VXUG-Papers/blob/main/GpuMemoryAbuse.cpp.

“Intel® 600 Series Chipset Family Platform Controller Hub (PCH) Datasheet, Volume 1 of 2.” Intel 600 Series Chipset Family Platform Controller Hub (PCH) Datasheet, Volume 1 and 2, Intel, May 2022, www.intel.com/content/www/us/en/content-details/648364/intel-600-series-chipset-family-platform-controller-hub-pch-datasheet-volume-1-of-2.html.

Shoutouts

I’d like to do a quick shoutout to @standa_t and @matrosov for answering some questions I had at the time when I was working on this project.

From vulnerability to insight: Root cause analysis of CVE-2023-28218

ch3rn0byl — Tue, 18 Jul 2023 04:59:49 +0000

On Patch Tuesday for April 2023, one of the vulnerabilities patched was a local privilege escalation within afd.sys (CVE-2023-28218). The Ancillary Function Driver (AFD.sys) component manages all networking from user-mode to kernel-mode and is a very convoluted component. This vulnerability was actually going to be used as an entry in TianfuCup by Ezrak1e, but in an unfortunate turn of events, this vulnerability was reported to Microsoft by Junoh Lee of Theori. ~~Damn it~~ Nice work, Junoh!

Before we move on to the analysis, please rise and pay respects to Ezrak1e’s sad story:

Continuing my sad story, another exploit I prapared for tfc got fixed today cve-2023-28218 reported by Junoh Lee
A double fetch+integer overflow can allow the size of memmove 0xffffffff. Since src is user space, it can actually copy any size pic.twitter.com/o4ba0F21p7
— Ezrak1e (@ezrak1e) April 12, 2023

Root Cause Analysis

A root cause analysis must be done to identify how the vulnerability was remediated so that we can figure out how to exploit it. This can be done by first bindiffing the March and April afd.sys images.

As an aside, the vulnerable and patched versions of AFD may be downloaded below.

I’m always drawn to functions related to memory operations that have changed, so AfdCopyCMSGBuffer naturally piqued my interest. Several other functions were refactored, but the most notable changes were observed in AfdCopyCMSGBuffer and AfdComputeCMSGLength. It appears that AfdCopyCMSGBuffer received the bulk of the remediation efforts, among a few others. My usual approach is to thoroughly reverse-engineer these functions and refine the disassembler output because it makes it easier for me to paint a better picture (I can’t stay between the lines no matter how hard I try.)

Patch Diffing

So what exactly is happening? The only way to find out is by patch-diffing these functions to see what part of AfdCopyCMSGBuffer has been remediated.

In my opinion, it is easiest to spot the bug(s) by looking at the decompiler side by side.

As an aside, I highly recommend that you not rely solely on decompiler output, as you may miss crucial details in regard to exploitation. The devil is in the details, hehe.

For the sake of this blog post, this function was reverse-engineered to discover what the prototype is for AfdCopyCMSGBuffer:

NTSTATUS AfdCopyCMSGBuffer(
    _In_ PVOID SmtxBuffer,
    _In_ PVOID UmPointerWithControlledData,
    _In_ DWORD SizeOfUmPointer
);

One of the differences in the change is the function now takes an additional parameter: the size of the CMSGBuffer. The CMSGBuffer is the first buffer that contains a message. The prototype now looks like this:

NTSTATUS AfdCopyCMSGBuffer(
    _In_ PVOID SmtxBuffer,
    _In_ DWORD SizeOfCMSGBuffer,
    _In_ PVOID UmPointerWithControlledData,
    _In_ DWORD SizeOfUmPointer
);

Microsoft developers have taken a proactive approach by validating the variables involved in arithmetic operations. This ensures the appropriate NTSTATUS value is returned in the event of an integer overflow. It’s important to note the specifics of the data are not yet known as this observation comes from the initial analysis. Let’s get into what kind of data is being parsed and how to reach this vulnerable code path.

Reverse Engineering

There aren’t many cross-references to AfdCopyCMSGBuffer, as shown below:

Each of these functions was analyzed by following the cross-references to reach the parent function and it boiled down to these three functions: AfdRioFastIo, AfdFastIoDeviceControl, and AfdSendMessageDispatch. The challenge lies in determining the more practical (read: easiest) approach to reach this path. Can I use CreateFile? Can I use the WinSock API? If I use the CreateFile API, I would not be able to set what I need to interact with AFD. I can use NtCreateFile, but I would need to know how to correctly set the Extended Attributes, which is undocumented (mostly); however, WinSock would be the easiest to work with…

Several years ago, a good friend of mine and I worked together on a project that involved working with raw sockets at the lowest level, which included reverse engineering the AFD component. One of the issues we ran into was static analysis for Afd’s dispatch table. The dispatch table is shown below:

NTSTATUS AfdDispatchDeviceControl(
    _In_ PDEVICE_OBJECT DeviceObject,
    _In_ PIRP IrpsyDaisy
)
{
    --snipped--
    IoControlCode = IoStackLocation->Parameters.DeviceIoControl.IoControlCode;
    
    offset = (IoControlCode >> 2) & 0x3ff;
    
    if (offset < 0x49 &&
        AfdIoctlTable[offset] == IoControlCode &&
        (IoStackLocation->MinorFunction = (IoControlCode >> 2)) &&
        (fnAfdIoctlFunction = AfdIrpCallDispatch[offset]) != nullptr)
    {
        return fnAfdIoctlFunction(IrpsyDaisy, IoStackLocation);
    }
    --snipped--
}

Retrieving these functions is challenging due to the indirect call being made, especially when faced with a maximum of 73 (0x49) potential IOCTLs. It is necessary to calculate an offset from each IOCTL to determine the AFD method about to be used, which is then used as an index for the AfdIrpCallDispatch array. Doing this for 73 IOCTLs can be painfully tedious, so I created a native WinDbg plugin to help ease retrieving this information. This plugin can extract information based on offset, symbol, IOCTL, or even provide a complete dump of the AfdIrpCallDispatch table, including the AfdDispatchImmediateIrp methods.

Now the question remains: would my plugin succeed in identifying any cross-referenced functions? Turns out yes — absolutely!

0: kd> !afdext /s afd!AfdSendMessageDispatch
  IOCTL  | OFFSET | SYMBOL
000120d3 |   34   | afd!AfdSendMessageDispatch

AfdSendMessageDispatch leads to the vulnerable AfdCopyCMSGBuffer function by using IOCTL 0x120d3. It is unknown at this point what data is required to hit the vulnerable function, so I placed a breakpoint on AfdSendMessageDispatch with the intention of examining the data a real process passes to it, while I pushed through with initial buffers containing identifiable data. No normal process hit the breakpoint through legitimate use during my testing. My own test did hit the breakpoint, but it failed after stepping through a few checks…but why?

3: kd> g
Breakpoint 0 hit
afd!AfdSendMessageDispatch:
fffff800`36977180 4c8bdc          mov     r11,rsp

The relevant part of where we are failing in AfdSendMessageDispatch is shown below:

3: kd> 
afd!AfdSendMessageDispatch+0x31:
fffff800`369771b1 b8d1af0000      mov     eax,0AFD1h
3: kd> 
afd!AfdSendMessageDispatch+0x36:
fffff800`369771b6 663903          cmp     word ptr [rbx],ax
3: kd> dw @rbx l1
ffff998e`51fa5b70  afd0

The failure is coming from our process containing the afd0 tag which now poses the question of what these tags mean and where they come from. The number 0xafd1 indicates an endpoint representing a datagram socket. The numbers 0xafd0, 0xafd2, 0xafd4, and 0xafd6 indicate endpoints representing TCP sockets in various states (CodeMachine, 2023). Interesting.

CodeMachine contributes a one-liner in WinDbg that displays the AFD endpoints in a debugging session. By reverse engineering and building upon that one-liner, I enhanced its capabilities. My refined version not only retrieves the AFD identifier but also pinpoints the specific process that owns the AFD endpoint, as well as provides details into the endpoint’s type and its current operation state:

3: kd> !afdext_endpoints
Information for AFD Endpoint ffffab03b392ea10:
  AFD Identifier: afd1 (Datagram endpoint in any state)
  State: Connected
  Owning process: CVE-2023-28218
  Endpoint type: tcpip!UdpTlProviderMessageDispatch

Information for AFD Endpoint ffffab03b392d620:
  AFD Identifier: afd1 (Datagram endpoint in any state)
  State: Connected
  Owning process: svchost.exe
  Endpoint type: tcpip!UdpTlProviderMessageDispatch

Information for AFD Endpoint ffffab03b392f320:
  AFD Identifier: afd2 (VC endpoint after connect/accept)
  State: Cleaning
  Owning process: OneDrive.exe
  Endpoint type: tcpip!TcpTlProviderEndpointDispatch

Now it is understood why the exploit was initially failing: TCP sockets were being used. To pass this check, simply switching the socket type from TCP to UDP should work.
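The tag check can be summarized in a few lines. This is only a sketch of the mapping described above (per the CodeMachine reference), with hypothetical names; it is not code lifted from afd.sys.

```cpp
#include <cassert>
#include <cstdint>

enum class AfdEndpointKind { Datagram, Tcp, Unknown };

// Maps the 16-bit tag at the start of an AFD endpoint to the kinds named above.
constexpr AfdEndpointKind ClassifyAfdTag(std::uint16_t tag)
{
    switch (tag)
    {
    case 0xAFD1:
        return AfdEndpointKind::Datagram; // the only tag AfdSendMessageDispatch accepts here
    case 0xAFD0:
    case 0xAFD2:
    case 0xAFD4:
    case 0xAFD6:
        return AfdEndpointKind::Tcp;      // TCP endpoints in various states
    default:
        return AfdEndpointKind::Unknown;
    }
}

// The failing run carried tag afd0, which is why the cmp against 0AFD1h failed.
static_assert(ClassifyAfdTag(0xAFD0) == AfdEndpointKind::Tcp, "initial attempt");
static_assert(ClassifyAfdTag(0xAFD1) == AfdEndpointKind::Datagram, "UDP socket");
```

In practice this means creating the socket as a datagram (UDP) socket before sending the IOCTL.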

AfdExtractAfdSendMsgInfo

The next function to tackle is AfdExtractAfdSendMsgInfo. The type of data that needs to be set is still unknown at this point but will make itself known as we step through, coercing code execution into reaching the vulnerable function.

NTSTATUS AfdExtractAfdSendMsgInfo(
    _In_ PIRP Irp,
    _In_ PIO_STACK_LOCATION IoStackLocation,
    _Inout_ PVOID AllocatedSMTXBuffer,
    _Inout_ PDWORD Something // Change me when you get it
)
{
    NTSTATUS Status;
    --snipped--
    DWORD InputBufferLength = IoStackLocation->Parameters.DeviceIoControl.InputBufferLength;
    
    BOOL bIs32Bit = IoIs32bitProcess(Irp);
    if (!bIs32Bit)
    {
        // logic for x64 processes
    }
    
    if (InputBufferLength < 0x24)
    {
        Status = STATUS_INVALID_PARAMETER;
        goto end;
    }
    --snipped--
}

A call is made to IoIs32bitProcess, which checks whether the requestor is a 32-bit process. To spare you the painstaking details, I will jump straight to the decision: I opted for a 32-bit process.

As an aside, reach out to me via Twitter, email, or Discord if you would like to go over these painful details.

We also have to ensure the input buffer length is at least 36 (0x24) bytes, otherwise the function will return with the STATUS_INVALID_PARAMETER error. This tells me I am most likely working with nine DWORDs at a minimum.

NTSTATUS AfdExtractAfdSendMsgInfo(
    _In_ PIRP Irp,
    _In_ PIO_STACK_LOCATION IoStackLocation,
    _Inout_ PVOID AllocatedSMTXBuffer,
    _Inout_ PDWORD Something // Change me when you get it
)
{
    DWORD unknown;
    DWORD unknown2;
    DWORD unknown3;
    
    --snipped--
    PVOID InputBuffer = IoStackLocation->Parameters.DeviceIoControl.Type3InputBuffer;
    
    if (Irp->RequestorMode)
    {
        if ((InputBuffer & 3) != NULL)
        {
            ExRaiseDatatypeMisalignment();
        }
        
        if (InputBuffer > MM_USER_PROBE_ADDRESS)
        {
            InputBuffer = MM_USER_PROBE_ADDRESS;
            *InputBuffer = NULL;
        }
    }
    
    unknown = *(InputBuffer + 0x1c);
    if (unknown)
    {
        unknown2 = *(InputBuffer + 0x18);
        Status = AfdComputeCMSGLength(unknown2, unknown, &unknown3);
        --snipped--
    }
    --snipped--
}

If the request comes from user mode, AfdExtractAfdSendMsgInfo ensures the input buffer is aligned on a 4-byte boundary and lies within the bounds of the user-mode address space.

Identifiable data is now beginning to get referenced from our input buffer.

AfdComputeCMSGLength

After reverse engineering the AfdComputeCMSGLength function, it was determined that it requires three parameters: a pointer to the data, the size of that data, and an output variable. This function’s primary role is to calculate the number of bytes needed for subsequent memory allocations. This calculation is achieved by iterating through the array as DWORDs and summing up the referenced values, factoring in any necessary padding.

The prototype of AfdComputeCMSGLength is shown below:

NTSTATUS AfdComputeCMSGLength(
    _In_ PVOID UmPointerWithControlledData,
    _In_ DWORD SizeOfUmPointer,
    _Out_ PDWORD FixedSize
);

The next call after the successful completion of AfdComputeCMSGLength is AfdAllocateMdlChain32.

AfdAllocateMdlChain32

AfdAllocateMdlChain32 is responsible for ensuring the supplied user-mode buffers are locked into place so that the kernel can safely access them without worrying about incurring any page faults.

For this function to succeed, the supplied buffers must pass several checks, including alignment on a 4-byte boundary. These checks are enforced only when the caller is a user-mode process, as seen below:

NTSTATUS AfdAllocateMdlChain32(...)
{
    --snipped--
    if (Irp->RequestorMode)
    {
        if (!UmPointerWithControlledData || SizeOfUmPointer - 1 > 0x1ffffffe)
            ExRaiseStatus(STATUS_INVALID_PARAMETER);
            
        NumberOfBytes = SizeOfUmPointer * 8;
        if (NumberOfBytes)
        {
            if ((UmPointerWithControlledData & 3) != NULL)
                ExRaiseDatatypeMisalignment();
                
            if (UmPointerWithControlledData + NumberOfBytes > MmUserProbeAddress || 
                UmPointerWithControlledData + NumberOfBytes < UmPointerWithControlledData)
                *MmUserProbeAddress = NULL;
        }
    }
    --snipped--
}

If the user-mode buffer is NULL, not aligned on a 4-byte boundary, extends past MmUserProbeAddress, or ends at an address lower than where it started due to integer wrapping, an exception will be raised.
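
To make the wrap-around case concrete, here is a minimal user-space sketch of the same tests. It is a model, not the driver's code: `probe_ok` and `PROBE_LIMIT` are hypothetical names, the limit is an illustrative value standing in for MmUserProbeAddress, and the function returns false where the kernel would raise an exception.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for MmUserProbeAddress (assumption, not read from a kernel). */
#define PROBE_LIMIT 0x00007FFFFFFF0000ULL

/* Returns true when an array of `count` 8-byte elements starting at `base`
 * passes the same tests as the pseudocode above. */
static bool probe_ok(uint64_t base, uint32_t count)
{
    /* count - 1 wraps to 0xffffffff when count is 0, so a single compare
     * rejects both an empty array and an oversized one. */
    if (base == 0 || (uint32_t)(count - 1) > 0x1ffffffeU)
        return false;

    uint64_t bytes = (uint64_t)count * 8;

    if ((base & 3) != 0)                  /* must sit on a 4-byte boundary */
        return false;

    uint64_t end = base + bytes;          /* unsigned add wraps on overflow */
    if (end > PROBE_LIMIT || end < base)  /* past the limit, or wrapped around */
        return false;

    return true;
}
```

Under these assumptions, a base of 0xFFFFFFFFFFFFFFF0 with four elements is rejected because the end address wraps around to a value below the base.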

IOCTL 0x120d3 uses the METHOD_NEITHER buffer type, which carries risk because no validation is performed on the user-mode buffer before the kernel accesses it directly. AfdAllocateMdlChain32 uses IoAllocateMdl and MmProbeAndLockPages to ensure the user-mode buffers are resident and can be safely accessed by the kernel without incurring any page faults.

NTSTATUS AfdAllocateMdlChain32(...)
{
    --snipped--
    while (NumberOfBytes)
    {
        Length = UmPointerWithControlledData->Size;
        if (Length)
        {
            --snipped--
            Mdl = IoAllocateMdl(UmPointerWithControlledData->VirtualAddress, Length, FALSE, TRUE, FALSE);
            if (!Mdl)
            {
                Status = STATUS_INSUFFICIENT_RESOURCES;
                break;
            }
            
            MmProbeAndLockPages(Mdl, Irp->RequestorMode, Operation);
            --snipped--
        }
        --snipped--
    }
    --snipped--
}

The prototype of AfdAllocateMdlChain32 is shown below:

NTSTATUS AfdAllocateMdlChain32(
    _In_ PIRP IrpsyDaisy,
    _In_ PVOID UmPointerWithControlledData,
    _In_ DWORD SizeOfUmPointer,
    _In_ LOCK_OPERATION Operation,
    _Out_ PDWORD NumberOfBytesProcessed,
    _Out_ PDWORD unknown
);

Once the user-mode buffers are locked in, a series of additional checks take place before reaching the next call to AfdBuildSendMsgTracker.

AfdBuildSendMsgTracker

The AfdBuildSendMsgTracker function is responsible for memory allocations. Through reverse engineering, it was discovered the parameters of this function rely on the third and fourth DWORDs from the input buffer. The first parameter defines the length of a string, while the second indicates the number of elements in an array. The data type of this array is currently unknown.

AfdBuildSendMsgTracker performs size validation on these parameters ensuring the CMSG data does not exceed 134 (0x86) in size and the secondary array does not exceed 65,536 (0x10000) items. If both of these conditions are met, the lengths are adjusted to align with the closest 8-byte boundary. These adjusted lengths are then combined with an additional 80 bytes and then used as the size parameter for ExAllocatePoolWithQuotaTag:

PVOID AfdBuildSendMsgTracker(...)
{
    if (SendMessageLength > 0x86 || SomeArrayLength > 0x10000)
        return nullptr;
        
    UINT64 Value = (SendMessageLength + 7) & 0xfffffffffffffff8;
    UINT64 Value2 = (SomeArrayLength + 7) & 0xfffffffffffffff8;
    
    PVOID Allocated = ExAllocatePoolWithQuotaTag(0x210, Value + Value2 + 80, 'MdfA');
    memset(Allocated, 0, Value + Value2 + 80);
    --snipped--
    return Allocated;
}
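
The size calculation is easy to check by hand. The sketch below reproduces only the arithmetic from the pseudocode above; `align_up8`, `smtx_alloc_size`, and `SMTX_HEADER_SIZE` are hypothetical names, and returning 0 on a failed validation is a convention chosen for this example.

```c
#include <stdint.h>

#define SMTX_HEADER_SIZE 80u  /* fixed overhead added to both rounded lengths */

/* Round a length up to the next multiple of 8. */
static uint64_t align_up8(uint64_t n)
{
    return (n + 7) & ~(uint64_t)7;
}

/* Returns the size that would be requested from the pool,
 * or 0 when either length fails validation. */
static uint64_t smtx_alloc_size(uint32_t send_msg_len, uint32_t array_len)
{
    if (send_msg_len > 0x86 || array_len > 0x10000)
        return 0;

    return align_up8(send_msg_len) + align_up8(array_len) + SMTX_HEADER_SIZE;
}
```

With the largest accepted lengths, 0x86 and 0x10000, the result is 136 + 65,536 + 80 = 65,752 bytes.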

Once the buffer is created, an ‘SMTX’ header is written at the beginning of the buffer.

The prototype for AfdBuildSendMsgTracker is shown below:

PVOID AfdBuildSendMsgTracker(
    _In_ DWORD SendMessageLength, 
    _In_ DWORD SomeArrayLength
);

An additional series of checks are done against the AFD endpoint structure but like before, the proof of concept in its current state satisfied these checks and allowed me to continue on to the next call…memcpy.

2: kd> 
afd!AfdExtractAfdSendMsgInfo+0x207:
fffff807`101a757b e84016fdff      call    afd!memcpy (fffff807`10178bc0)
2: kd> r @rcx, @rdx, @r8
rcx=ffffd98a8b24b610 rdx=00000000014b4d60 r8=0000000000000086
2: kd> db @rdx l@R8
00000000`014b4d60  63 00 68 00 33 00 72 00-6e 00 30 00 62 00 79 00  c.h.3.r.n.0.b.y.
00000000`014b4d70  6c 00 20 00 3b 00 29 00-00 00 00 00 00 00 00 00  l. .;.).........
00000000`014b4d80  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000000`014b4d90  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000000`014b4da0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000000`014b4db0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000000`014b4dc0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000000`014b4dd0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
00000000`014b4de0  00 00 00 00 00 00                                ......

The send message buffer is then copied into the pool that was previously allocated in AfdBuildSendMsgTracker. After this call to memcpy, an additional series of checks that we can pass take place and we finally land at the vulnerable function!

0: kd> g
Breakpoint 0 hit
afd!AfdExtractAfdSendMsgInfo+0x2a3:
fffff807`101a7617 e81490fdff      call    afd!AfdCopyCMSGBuffer (fffff807`10180630)

AfdCopyCMSGBuffer

AfdCopyCMSGBuffer is where the vulnerabilities are located, so it is especially important to understand what is going on. The prototype for this method is shown below:

NTSTATUS AfdCopyCMSGBuffer(
    _In_ PVOID SmtxBuffer,
    _In_ PVOID UmPointerWithControlledData,
    _In_ DWORD SizeOfUmPointer
);

Through reverse engineering, it was determined the second parameter of this function points to an array of data that can be translated to the following structure:

typedef struct _SECONDARY_MESSAGE_BUFFER
{
    DWORD Length;
    PDWORD Buffer;
} SECONDARY_MESSAGE_BUFFER, * PSECONDARY_MESSAGE_BUFFER;

NTSTATUS AfdCopyCMSGBuffer(...)
{
    DWORD SizeOfUmPointerTemp = SizeOfUmPointer;
    
    if (SizeOfUmPointer < 0x10)
        return STATUS_SUCCESS;
    --snipped--
    return STATUS_INVALID_PARAMETER;
}

The initial check in AfdCopyCMSGBuffer verifies that SizeOfUmPointer is at least 16 (0x10) bytes. If not, STATUS_SUCCESS is returned.

The next section of code begins parsing UmPointerWithControlledData and populating SmtxBuffer. This is where things get really interesting!

NTSTATUS AfdCopyCMSGBuffer(...)
{
    --snipped--
    while (true)
    {
        Length = UmPointerWithControlledData->Length;
        if (Length < 0xc)
            break;
            
        TempLength = -1;
        Temp = Length - 12;
        if (Temp + 16 >= Temp)
            TempLength = (Length - 12) + 16;
            
        *SmtxBuffer = TempLength;
        
        if (Temp + 16 < Temp)
            break;
            
        Rounded = (TempLength + 7) & 0xfffffff8;
        if (SizeOfUmPointer < Rounded)
            return STATUS_SUCCESS;
            
        *(SmtxBuffer + 8) = UmPointerWithControlledData->Buffer;
        SizeOfUmPointer -= Rounded;
        *(SmtxBuffer + 12) = UmPointerWithControlledData->Length;
        memcpy(SmtxBuffer + 16, UmPointerWithControlledData->Buffer, Length - 12);
        
        if ((UmPointerWithControlledData + ((Length + 3) & 0xfffffffc)) < UmPointerWithControlledData)
            break;
            
        UmPointerWithControlledData = UmPointerWithControlledData + ((Length + 3) & 0xfffffffc);
        SmtxBuffer += (TempLength + 7) & 0xfffffffffffffff8;
        if (SizeOfUmPointer < 0x10)
            return STATUS_SUCCESS;        
    }
    
    return STATUS_INVALID_PARAMETER;
}

This is a good chunk of code going on here so for clarity’s sake, I will summarize what is going on in this function.

In AfdCopyCMSGBuffer, user-mode data is parsed within a loop. Each array element consists of the length of the buffer and the buffer itself. This function ensures each buffer’s length is at least 12 (0x0c) bytes. Additional checks are meant to prevent integer overflows in the length calculation, and the adjusted length is rounded up to an 8-byte boundary before being compared against the remaining size.

The kernel buffer is updated to include both the user-mode buffer and length. The data is subsequently copied into the kernel buffer starting at offset 16. The user-mode and kernel-mode pointers are then advanced on 4-byte and 8-byte boundaries, respectively, to continue parsing.

Should any of these checks fail, the appropriate NTSTATUS value is returned; otherwise, code execution continues on to do whatever an AFD man does.

Exploitation

We now know what and where the vulnerability is based on analyzing both patched and vulnerable versions of AfdCopyCMSGBuffer.

The first issue we have is a double fetch that occurs here:

NTSTATUS AfdCopyCMSGBuffer(...)
{
    while (true)
    {
        Length = UmPointerWithControlledData->Length;
        if (Length < 0xc)
            break;
       --snipped--      
    }

    return STATUS_INVALID_PARAMETER;
}

The kernel references data in the user-mode buffer multiple times (the while loop) after all bounds checking has taken place. An additional check ensures the length is at least 12 (0x0c); if not, parsing stops and STATUS_INVALID_PARAMETER is returned. Double fetches are a type of race condition where an attacker can modify the data within the time window between the kernel’s fetches of the user-mode data.

The idea here is to supply multiple secondary arrays to coerce the while loop to continuously “fetch” data from the user-mode buffer, effectively expanding the time window and making it easier to win the race. This can be accomplished by creating a separate “attacker” thread in the PoC whose only responsibility is to mutate the length field to contain large integers, leading to the next vulnerability: an integer overflow.
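
The double-fetch pattern itself can be modeled without any kernel code. The sketch below is a deterministic user-space illustration, not the PoC: `struct record`, `race_hook`, `grow_length`, and both functions are hypothetical, and the hook is called at a fixed point to stand in for a racing thread that would, in reality, only sometimes land inside the window.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a record living in caller-controlled memory. */
struct record {
    volatile uint32_t length;   /* volatile: every read goes back to memory */
};

/* Callback that plays the role of the racing thread. */
typedef void (*race_hook)(struct record *);

/* Example hook: overwrite the length with a larger value. */
static void grow_length(struct record *rec)
{
    rec->length = 0x4141;
}

/* The first read is used for validation and sizing, the second read is
 * trusted for the copy. Returns 1 if the second read exceeds the first. */
static int double_fetch_mismatch(struct record *rec, race_hook hook)
{
    uint32_t sized_for = rec->length;   /* first fetch */

    if (hook)
        hook(rec);                      /* window in which a writer may run */

    uint32_t copy_len = rec->length;    /* second fetch, not re-checked */
    return copy_len > sized_for;
}

/* Same flow with a single fetch: one captured value feeds both steps. */
static int single_fetch_mismatch(struct record *rec, race_hook hook)
{
    uint32_t captured = rec->length;    /* only fetch */

    if (hook)
        hook(rec);                      /* later writes no longer matter */

    uint32_t sized_for = captured;
    uint32_t copy_len = captured;
    return copy_len > sized_for;
}
```

Calling `double_fetch_mismatch` with `grow_length` as the hook reports a mismatch, while `single_fetch_mismatch` does not, because the value is captured once and reused.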

For testing purposes, an attacker thread was created to modify the original size of 0x1c to 0x4141.

0: kd> bp @rip ".if (@ecx != 23) {} .else {gc}"
2: kd> g
afd!AfdCopyCMSGBuffer+0x52:
fffff80d`cc740682 83e1f8          and     ecx,0FFFFFFF8h
2: kd> r @ecx
ecx=414c

The next vulnerability exploited is an integer overflow that occurs in the following code block:

fffff80d`cc740658 448b33       mov     r14d, dword ptr [rbx]
fffff80d`cc74065b 4183fe0c     cmp     r14d, 0Ch
fffff80d`cc74065f 0f8288000000 jb      afd!AfdCopyCMSGBuffer+0xbd (fffff80dcc7406ed)
fffff80d`cc740665 418d46f4     lea     eax, [r14-0Ch]
fffff80d`cc740669 4883cdff     or      rbp, 0FFFFFFFFFFFFFFFFh
fffff80d`cc74066d 8bc8         mov     ecx, eax
fffff80d`cc74066f 4883c010     add     rax, 10h
fffff80d`cc740673 483bc1       cmp     rax, rcx
fffff80d`cc740676 480f43e8     cmovae  rbp, rax
fffff80d`cc74067a 48892e       mov     qword ptr [rsi], rbp
fffff80d`cc74067d 726e         jb      afd!AfdCopyCMSGBuffer+0xbd (fffff80dcc7406ed)
fffff80d`cc74067f 8d4d07       lea     ecx, [rbp+7]
fffff80d`cc740682 83e1f8       and     ecx, 0FFFFFFF8h
fffff80d`cc740685 3bf9         cmp     edi, ecx
fffff80d`cc740687 7246         jb      afd!AfdCopyCMSGBuffer+0x9f (fffff80dcc7406cf)

If the attacker thread modifies the length to become UINT_MAX after all the previous checks have taken place, the only check that would be done is validating the length is at least 12 (0x0c).

12 (0x0c) is subtracted from the length, which now becomes 0xfffffff3. When 16 (0x10) is added to this value, it now becomes 0x100000003.

The issue lies in the lea instruction, as the value gets truncated from a 64-bit register (RBP) to a 32-bit register (ECX). After seven is added, what is supposed to be 0x10000000a becomes 0x0a. The AND instruction then masks 0x0a to an 8-byte boundary, leaving 8.

The final check ensures the number of elements in the secondary array is not less than the calculated value. This statement can be seen as “if 0x80 < 8”. Because 0x80 is clearly larger than 8, this validation is satisfied and code execution is allowed to proceed to the next vulnerability: a heap-based buffer overflow.
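
The sketch below replays the arithmetic from the listing in plain C so each intermediate value can be checked. `rounded_after_truncation` is a hypothetical helper name, and the mapping of each line to an instruction is an interpretation of the disassembly rather than recovered source.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the 32-bit value that ends up being compared against the remaining
 * size. If sum64_out is not NULL, it receives the 64-bit sum before truncation. */
static uint32_t rounded_after_truncation(uint32_t length, uint64_t *sum64_out)
{
    uint32_t temp = length - 12;            /* lea eax, [r14-0Ch]  : 32-bit result */
    uint64_t sum64 = (uint64_t)temp + 16;   /* add rax, 10h        : 64-bit result */

    if (sum64_out)
        *sum64_out = sum64;

    uint32_t low = (uint32_t)(sum64 + 7);   /* lea ecx, [rbp+7]    : upper 32 bits dropped */
    return low & 0xFFFFFFF8u;               /* and ecx, 0FFFFFFF8h */
}
```

For a length of 0xffffffff, the 64-bit sum is 0x100000003 while the value that reaches the comparison is 8. For the test value 0x4141, the result is 0x4148, which is the 0x414c seen in ECX above once the AND has been applied.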

Lastly, a heap-based buffer overflow occurs because memcpy is allowed to copy 0xfffffff3 bytes into the Smtx buffer, which consists of only 80 bytes.

3: kd> tc
afd!AfdCopyCMSGBuffer+0x73:
fffff80d`cc7406a3 e81885ffff      call    afd!memcpy (fffff80d`cc738bc0)
3: kd> r @rcx, @rdx, @r8
rcx=ffffbd834781af90 rdx=00000000009c03c4 r8=00000000fffffff3
3: kd> db @rdx
00000000`009c03c4  18 00 00 00 18 00 00 00-18 00 00 00 18 00 00 00  ................
00000000`009c03d4  18 00 00 00 18 00 00 00-18 00 00 00 18 00 00 00  ................
00000000`009c03e4  18 00 00 00 18 00 00 00-18 00 00 00 18 00 00 00  ................
00000000`009c03f4  18 00 00 00 18 00 00 00-18 00 00 00 18 00 00 00  ................
00000000`009c0404  18 00 00 00 18 00 00 00-18 00 00 00 18 00 00 00  ................
00000000`009c0414  18 00 00 00 18 00 00 00-18 00 00 00 18 00 00 00  ................
00000000`009c0424  18 00 00 00 18 00 00 00-18 00 00 00 18 00 00 00  ................
00000000`009c0434  18 00 00 00 18 00 00 00-18 00 00 00 18 00 00 00  ................
3: kd> !pool @rcx
Pool page ffffbd834781af90 region is Special pool
*ffffbd834781a000 size:   80 data: ffffbd834781af80 (NonPaged) *Afd 
		Pooltag Afd  : AFD objects, Binary : afd.sys

Alas, the vulnerability has been exploited!

*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced.  This cannot be protected by try-except.
Typically the address is just plain bad or it is pointing at freed memory.
Arguments:
Arg1: ffffba0fb87d3000, memory referenced.
Arg2: 0000000000000002, X64: bit 0 set if the fault was due to a not-present PTE.
	bit 1 is set if the fault was due to a write, clear if a read.
	bit 3 is set if the processor decided the fault was due to a corrupted PTE.
	bit 4 is set if the fault was due to attempted execute of a no-execute PTE.
	- ARM64: bit 1 is set if the fault was due to a write, clear if a read.
	bit 3 is set if the fault was due to attempted execute of a no-execute PTE.
Arg3: fffff80e5cf98da7, If non-zero, the instruction address which referenced the bad memory
	address.
Arg4: 0000000000000002, (reserved)

Debugging Details:
------------------
KEY_VALUES_STRING: 1

    Key  : AV.Type
    Value: Write

    Key  : WER.OS.Branch
    Value: vb_release

    Key  : WER.OS.Version
    Value: 10.0.19041.1


BUGCHECK_CODE:  50

BUGCHECK_P1: ffffba0fb87d3000

BUGCHECK_P2: 2

BUGCHECK_P3: fffff80e5cf98da7

BUGCHECK_P4: 2

READ_ADDRESS:  ffffba0fb87d3000 Special pool

MM_INTERNAL_CODE:  2

IMAGE_NAME:  afd.sys

MODULE_NAME: afd

FAULTING_MODULE: fffff80e5cf90000 afd

PROCESS_NAME:  CVE-2023-28218.exe

TRAP_FRAME:  ffffe58597dc1d30 -- (.trap 0xffffe58597dc1d30)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=ffffba0fb87d2e60 rbx=0000000000000000 rcx=ffffba0fb87d3030
rdx=000045f048276d0c rsi=0000000000000000 rdi=0000000000000000
rip=fffff80e5cf98da7 rsp=ffffe58597dc1ec8 rbp=0000000100000002
 r8=0000000000000022  r9=0000000003fffff8 r10=ffffe58597dc24c0
r11=0000000100a49b5e r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl nz na pe nc
afd!memcpy+0x1e7:
fffff80e`5cf98da7 0f2b51d0        movntps xmmword ptr [rcx-30h],xmm2 ds:ffffba0f`b87d3000=????????????????????????????????
Resetting default scope

STACK_TEXT:  
ffffe585`97dc12d8 fffff803`4272a2c2     : ffffe585`97dc1440 fffff803`42593ab0 fffff80e`5cf90000 00000000`00000000 : nt!DbgBreakPointWithStatus
ffffe585`97dc12e0 fffff803`427298a6     : fffff80e`00000003 ffffe585`97dc1440 fffff803`42626490 ffffe585`97dc1990 : nt!KiBugCheckDebugBreak+0x12
ffffe585`97dc1340 fffff803`4260f217     : 00000000`00000000 00000000`00000000 ffffba0f`b87d3000 ffffba0f`b87d3000 : nt!KeBugCheck2+0x946
ffffe585`97dc1a50 fffff803`426894bf     : 00000000`00000050 ffffba0f`b87d3000 00000000`00000002 ffffe585`97dc1d30 : nt!KeBugCheckEx+0x107
ffffe585`97dc1a90 fffff803`424b9d70     : 00000000`00000000 00000000`00000002 ffffe585`97dc1db0 00000000`00000000 : nt!MiSystemFault+0x1b171f
ffffe585`97dc1b90 fffff803`4261ebd8     : ffffe585`00000000 00000000`00000000 ffffe585`97dc3000 00000000`00000000 : nt!MmAccessFault+0x400
ffffe585`97dc1d30 fffff80e`5cf98da7     : fffff80e`5cfa06a8 fffff80e`5cfc4578 ffffba0f`b9f61540 00000000`00000000 : nt!KiPageFault+0x358
ffffe585`97dc1ec8 fffff80e`5cfa06a8     : fffff80e`5cfc4578 ffffba0f`b9f61540 00000000`00000000 ffffba0f`b9f61540 : afd!memcpy+0x1e7
ffffe585`97dc1ed0 fffff80e`5cfc4593     : 00000000`00000000 ffffe585`97dc2540 00000000`00a405e8 ffffba0f`b8722e40 : afd!AfdCopyCMSGBuffer+0x78
ffffe585`97dc1f00 fffff803`428ba908     : 00000000`00000000 00000000`00000000 ffffe585`97dc2540 ffffba0f`b9f61540 : afd!AfdFastIoDeviceControl+0x1513
ffffe585`97dc22a0 fffff803`428ba1c6     : ffff8a02`ff46a134 00000000`00000108 00000000`00000001 00000000`00000000 : nt!IopXxxControlFile+0x728
ffffe585`97dc23e0 fffff803`426228f5     : 00000000`00000000 fffff803`429027fe 00000000`00000000 00000000`0083eb30 : nt!NtDeviceIoControlFile+0x56
ffffe585`97dc2450 00000000`77171cfc     : 00000000`77171933 00000023`771f2b4c 00007ff9`564a0023 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x25
00000000`0083ec48 00000000`77171933     : 00000023`771f2b4c 00007ff9`564a0023 00000000`00000000 00000000`0093fdfc : wow64cpu!CpupSyscallStub+0xc
00000000`0083ec50 00000000`771711b9     : 00000000`0093fb30 00007ff9`564a39b4 00000000`0083ed20 00007ff9`564a3aaf : wow64cpu!DeviceIoctlFileFault+0x31
00000000`0083ed00 00007ff9`564a38c9     : 00000000`00717000 00000000`00450100 00000000`00000000 00000000`0083f560 : wow64cpu!BTCpuSimulate+0x9
00000000`0083ed40 00007ff9`564a32bd     : 00000000`00000000 00000000`00972338 00000000`00000000 00000000`00000000 : wow64!RunCpuSimulation+0xd
00000000`0083ed70 00007ff9`57b039e7     : 00007ff9`57b55a10 00007ff9`57b55a10 00007ff9`57b55900 00000000`00000010 : wow64!Wow64LdrpInitialize+0x12d
00000000`0083f020 00007ff9`57aa4deb     : 00000000`00000001 00000000`00000000 00000000`00000000 00000000`00000001 : ntdll!LdrpInitializeProcess+0x1ae7
00000000`0083f440 00007ff9`57aa4c73     : 00000000`00000000 00007ff9`57a30000 00000000`00000000 00000000`00718000 : ntdll!LdrpInitialize+0x15f
00000000`0083f4e0 00007ff9`57aa4c1e     : 00000000`0083f560 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!LdrpInitialize+0x3b
00000000`0083f510 00000000`00000000     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!LdrInitializeThunk+0xe


SYMBOL_NAME:  afd!memcpy+1e7

STACK_COMMAND:  .cxr; .ecxr ; kb

BUCKET_ID_FUNC_OFFSET:  1e7

FAILURE_BUCKET_ID:  AV_VRF_W_(null)_afd!memcpy

OS_VERSION:  10.0.19041.1

BUILDLAB_STR:  vb_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

FAILURE_ID_HASH:  {6020e241-9055-4260-9558-49821a9067f3}

Followup:     MachineOwner
---------

What’s the issue?

The issue lies in the multiple “fetches” from the user-mode buffer while the data is being processed. The code assumes the data cannot be modified after it has been checked and while the CMSGBuffer is being processed. This assumption enables an attacker to exploit a race condition, allowing them to change a buffer’s length to UINT_MAX despite bounds checking having already taken place. Successful exploitation of this race condition leads to a heap-based buffer overflow, which could ultimately result in privilege escalation (though the current proof of concept only causes a denial of service).

Detection

How can something like this be detected? The answer is it’s very difficult (read: impossible).

Throughout testing, there weren’t any 32-bit processes hitting this code path. It is important to note that just because no 32-bit processes hit this vulnerable code path on my machine doesn’t mean none will on other machines. This is the classic “It works on my machine” scenario.

If this code path doesn’t normally get hit, then any hit could be a possible indicator. False positives in this case could be better than nothing, I suppose, but that raises the question: how do I know where to check? Yo…where’s Rod Serling at??

Thoughts

It is unclear to me why the remediation was to apply ntintsafe functions to the variables referenced. What would make more sense to me is changing the IOCTL to something other than METHOD_NEITHER so that the I/O manager handles the buffering, but seeing how user mode provides buffers within buffers in the first place, this remediation was probably chosen to avoid performance overhead.
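
For illustration, checked arithmetic of the kind ntintsafe provides might look like the sketch below. It is a portable stand-in, not the patched driver code: `checked_add_u32` and `safe_rounded_length` are hypothetical names, and the real fix may be structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Adds a and b. Returns false and leaves *out untouched when the sum
 * does not fit in 32 bits. */
static bool checked_add_u32(uint32_t a, uint32_t b, uint32_t *out)
{
    if (a > UINT32_MAX - b)
        return false;

    *out = a + b;
    return true;
}

/* Length computation with every step checked at 32-bit width, so a value
 * such as 0xffffffff is rejected instead of silently truncated. */
static bool safe_rounded_length(uint32_t length, uint32_t *rounded)
{
    uint32_t total;

    if (length < 12)
        return false;
    if (!checked_add_u32(length - 12, 16, &total))   /* header adjustment */
        return false;
    if (!checked_add_u32(total, 7, &total))          /* room for rounding up */
        return false;

    *rounded = total & 0xFFFFFFF8u;
    return true;
}
```

Checked arithmetic like this stops the truncation, but it does not by itself remove the repeated reads of user-mode memory.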

Downloads:

march_afd.sys 10.0.22621.1344

april_afd.sys 10.0.22621.1555

References:

M. (2023, March 8). Windows Ancillary Function Driver for WinSock Elevation of Privilege Vulnerability. MSRC Security Updates. Retrieved July 29, 2023, from https://msrc.microsoft.com/update-guide/vulnerability/CVE-2023-28218

C. (n.d.). Finding AFD Endpoints. CodeMachine. Retrieved July 29, 2023, from https://www.codemachine.com/articles/find_afd_endpoints.html