DataDome VM Analysis

Summary

This repository documents the first public version of DataDome's in-browser JavaScript virtual machine (VM) used in their CAPTCHA/interstitial flow. This analysis covers:

Bytecode loading and decoding mechanisms
VM memory layout and architecture
A proof-of-concept disassembler
Control-flow analysis notes

Note: This repository covers only one (static) VM version and is intended for security research and analysis purposes. It does not include dynamic solvers or production solver implementations.

Background

On January 14, 2026, DataDome began shipping a new VM-based component in their client tag.

Deobfuscation

The VM code has been extracted from the captcha challenge into vm.js (available in this repository).

The first step was deobfuscating the script. The obfuscation is straightforward: evaluate each variable and replace it with its actual value. A deobfuscation script is available in deobf.js.
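To make the "evaluate and replace" idea concrete, here is a toy sketch. It is not the code in deobf.js: the function name `inlineConstants` and the regex-based approach are assumptions for illustration, and it only handles simple `var name = literal;` declarations. A real deobfuscator would more likely work on an AST.

```javascript
// Hypothetical toy: inline simple string/number constants into their uses.
// Not taken from deobf.js; regexes are fragile and only good for a demo.
function inlineConstants(source) {
    const constants = new Map();
    // Collect `var name = "string";` or `var name = 123;` declarations and drop them.
    const declaration = /var\s+([A-Za-z_$][\w$]*)\s*=\s*("(?:[^"\\]|\\.)*"|\d+)\s*;\n?/g;
    const body = source.replace(declaration, function (match, name, literal) {
        constants.set(name, literal);
        return "";
    });
    // Replace every remaining identifier that names a collected constant.
    return body.replace(/[A-Za-z_$][\w$]*/g, function (name) {
        return constants.has(name) ? constants.get(name) : name;
    });
}

const demoInput = 'var a = "navigator";\nvar b = 2;\nconsole.log(window[a], b + 1);';
const demoOutput = inlineConstants(demoInput);
console.log(demoOutput); // console.log(window["navigator"], 2 + 1);
```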

Initial Analysis

Running the deobfuscated code (out.js) in DevTools reveals the VM's expected output: a JSON object containing two numbers and a string. Now let's dive into the actual VM implementation.

Bytecode Decoding

At the start of the Q.exports function, we can see how the bytecode is decoded:

The input string is base64 decoded
An array of length 129,263 is created
Each index is checked against a specific range:
- If the index falls within the range, the value is decoded
- Otherwise, a random number is returned (using B(), a pseudo-random number generator)

The result, D, holds the decoded bytecode surrounded by random "noise".
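The steps above can be sketched roughly as follows. This is a simplified illustration, not the code from vm.js: `decodeBytecode`, `makeNoise`, the range start, and the one-byte-per-slot decoding are all assumptions, and the small LCG only stands in for the real B(). It uses Node's `Buffer` for the base64 step.

```javascript
// Hypothetical sketch: names and constants here are not taken from vm.js.
function makeNoise(seed) {
    // Simple LCG standing in for B(); the real generator is different.
    let state = seed >>> 0;
    return function () {
        state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
        return state;
    };
}

function decodeBytecode(encoded, length, rangeStart, noise) {
    const raw = Buffer.from(encoded, "base64");  // step 1: base64 decode
    const memory = new Array(length);            // step 2: fixed-size array
    for (let i = 0; i < length; i++) {
        const offset = i - rangeStart;
        if (offset >= 0 && offset < raw.length) {
            memory[i] = raw[offset];             // inside the range: real bytecode
        } else {
            memory[i] = noise();                 // outside the range: random filler
        }
    }
    return memory;
}

// Tiny demo: three real bytes placed at index 4 of a 16-slot array.
const encodedDemo = Buffer.from([0x88, 0x01, 0x00]).toString("base64");
const D = decodeBytecode(encodedDemo, 16, 4, makeNoise(1));
console.log(D.slice(4, 7)); // the three real bytes
```

The point of the noise is that a memory dump does not show where the bytecode begins or ends unless you know the range check.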

VM Architecture

Scrolling down reveals the VM entry point: a function with two parameters, A (the bytecode) and Q (an empty dictionary used for error handling).

Memory Layout

The most interesting aspect of this VM is its architecture: everything lives in a single array (A). This array contains:

The stack
Registers
Opcodes
The bytecode itself
The instruction pointer

This design mirrors real computer architecture with distinct memory regions. The next step is to map out each offset to understand what's stored where:

var stack_pointer = 4593
var instruction_pointer = 4635
var frame_base_pointer = 4674
var last_result = 4633
var exit_flag = 4656
var current_opcode_handler = 4685
var current_opcode_id = 4675
var stack_offset = 124482
var vm_start = 5258

With these offsets mapped, the VM structure becomes clear.
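As a small illustration of what "everything lives in one array" means in practice, here is a sketch of push and pop built on the offsets above. The helper names `push` and `pop` are made up for this example, and it assumes the stack pointer is initialised to `stack_offset`; the indexing idioms match the ones used by the opcode handlers shown later.

```javascript
// Offsets copied from the list above; push/pop are hypothetical helper names.
const stack_pointer = 4593;
const stack_offset = 124482;

const A = new Array(129263).fill(0);
A[stack_pointer] = stack_offset; // assumption: the stack starts at stack_offset

function push(value) {
    A[A[stack_pointer]++] = value; // write at the pointer, then bump it
}

function pop() {
    return A[--A[stack_pointer]];  // move the pointer back, then read
}

push(10);
push("navigator");
console.log(pop()); // navigator
console.log(pop()); // 10
console.log(A[stack_pointer] === stack_offset); // true
```

Note that the stack pointer is itself just another slot in A, so a single array access both reads the pointer and locates the stack slot.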

Helper Functions

The VM begins with a collection of helper functions that handle:

Reading typed values from the stack
Moving data between the stack and "registers"
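A minimal sketch of what such helpers might look like is below. The names `readUint8` and `readUint16` do appear in the handlers later in this README, but these bodies are my own guess: in particular, reading relative to `vm_start` and the big-endian byte order for 16-bit values are assumptions.

```javascript
// Offsets from the memory layout above; the function bodies are assumptions.
const instruction_pointer = 4635;
const vm_start = 5258;

const A = new Array(129263).fill(0);

function readUint8() {
    // Read one byte of bytecode and advance the instruction pointer.
    return A[vm_start + A[instruction_pointer]++] & 0xff;
}

function readUint16() {
    // Assumed big-endian: high byte first.
    const high = readUint8();
    const low = readUint8();
    return (high << 8) | low;
}

// Demo with three bytes of fake bytecode.
A[vm_start + 0] = 0x01;
A[vm_start + 1] = 0x02;
A[vm_start + 2] = 0x03;
A[instruction_pointer] = 0;

const first = readUint8();   // 1
const second = readUint16(); // 0x0203
console.log(first, second, A[instruction_pointer]);
```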

VM Initialization

After the helper functions, the VM initializes core values:

All pointers (stack, instruction, frame base)
Exit flag
Last result register

Below the initialization are all the instruction handlers.

The Dispatcher Loop

The dispatcher is the main VM loop that runs until `exit_flag` is set:

I represents the current instruction
P is the actual offset into the array (accounting for obfuscation)
The loop sets the current instruction to current_opcode_handler and updates current_opcode_id
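To show how the pieces fit together, here is a toy dispatcher built on the same offsets. It is a simplified sketch, not the real loop: the three opcodes (1 = PUSH8, 2 = ADD, 3 = HALT), the `run` helper, and the way HALT writes `last_result` are all invented for the example, and it skips the offset obfuscation mentioned above. The `fetch()` body follows the one documented later in this README, with 4783 as the handler table base.

```javascript
// Offsets from the memory layout; opcode numbers 1..3 are hypothetical.
const stack_pointer = 4593, instruction_pointer = 4635, last_result = 4633;
const exit_flag = 4656, current_opcode_handler = 4685, current_opcode_id = 4675;
const stack_offset = 124482, vm_start = 5258, opcode_base = 4783;

const A = new Array(129263).fill(0);

function fetch() {
    const ip = A[instruction_pointer];
    const opcode = A[vm_start + ip];
    A[instruction_pointer] = ip + 1;
    A[current_opcode_handler] = A[opcode_base + opcode];
    A[current_opcode_id] = opcode;
}

A[opcode_base + 1] = function () {  // PUSH8: push the next bytecode byte
    A[A[stack_pointer]++] = A[vm_start + A[instruction_pointer]++];
    fetch();
};
A[opcode_base + 2] = function () {  // ADD: pop two values, push their sum
    const right = A[--A[stack_pointer]];
    const left = A[--A[stack_pointer]];
    A[A[stack_pointer]++] = left + right;
    fetch();
};
A[opcode_base + 3] = function () {  // HALT: store the result and stop
    A[last_result] = A[--A[stack_pointer]];
    A[exit_flag] = 1;
};

function run(program) {
    program.forEach(function (byte, i) { A[vm_start + i] = byte; });
    A[stack_pointer] = stack_offset;
    A[instruction_pointer] = 0;
    A[exit_flag] = 0;
    fetch();                          // prime the first handler
    while (!A[exit_flag]) {
        A[current_opcode_handler]();  // each handler fetches the next one
    }
    return A[last_result];
}

const result = run([1, 2, 1, 3, 2, 3]); // PUSH8 2, PUSH8 3, ADD, HALT
console.log(result); // 5
```

The loop body is a single indirect call because every handler is responsible for fetching its successor.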

Opcode Implementation

How Opcodes Work

A basic opcode handler does the following:

Fetches an immediate value from the bytecode
Retrieves the top value from the stack
Performs an operation (e.g., %= or ^=)
Calls the fetch() function at the end
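Those four steps look roughly like this in code. This is an illustrative handler, not one copied from vm.js: the name `xorImmediate`, the stubbed `fetch()`, and the `readUint8` body are assumptions.

```javascript
// Offsets from the memory layout; the handler itself is a hypothetical example.
const stack_pointer = 4593, instruction_pointer = 4635;
const vm_start = 5258, stack_offset = 124482;

const A = new Array(129263).fill(0);

let fetchCalls = 0;
function fetch() {
    fetchCalls++;  // stub: the real fetch() selects the next handler
}

function readUint8() {
    return A[vm_start + A[instruction_pointer]++] & 0xff;
}

function xorImmediate() {
    const immediate = readUint8();     // 1. fetch an immediate from the bytecode
    const top = A[stack_pointer] - 1;  // 2. locate the top of the stack
    A[top] ^= immediate;               // 3. apply the operation in place
    fetch();                           // 4. hand over to the next instruction
}

// Demo: stack holds 12, bytecode holds the immediate 10.
A[stack_pointer] = stack_offset;
A[A[stack_pointer]++] = 12;
A[vm_start] = 10;
A[instruction_pointer] = 0;
xorImmediate();
console.log(A[A[stack_pointer] - 1]); // 6
```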

Interesting Opcodes

Opcode 4919: Function/Closure Creation

One of the most complex opcodes creates closures/functions:

A[4919] = function () {
    var Q = readUint8();  // Number of expected arguments
    var B = [];
    for (var E = readUint8(), D = 0; D < E; D++) {
        var g = readUint8();
        var a = A[A[frame_base_pointer] + g];
        B.push(a);  // Capture variables from current scope
    }
    var h = A[instruction_pointer] + 3;  // Save address of function body
    A[A[stack_pointer]++] = function (E) {
        // Set up new stack frame when called
        var e = A[stack_pointer] - E;
        while (E < Q) {
            A[e + E++] = undefined;  // Fill missing arguments with undefined
        }
        A[stack_pointer] = e + Q;
        for (var D = 0; D < B.length; D++) {
            var g = B[D];
            A[A[stack_pointer]++] = g;  // Push captured variables
        }
        A[e - 2] = A[frame_base_pointer];  // Save old frame pointer
        A[e - 1] = A[instruction_pointer];  // Save return address
        A[frame_base_pointer] = e;
        A[instruction_pointer] = h;  // Jump to function body
    };
    fetch();
};

This opcode:

Reads the expected argument count
Captures variables from the current scope (closure)
Creates a function that sets up a new stack frame with proper calling conventions
Handles missing arguments by filling with undefined
Saves the return address and frame pointer for proper returns

Opcode 5003: Dynamic Function Call

This opcode creates a wrapper for function calls that handles both regular and constructor calls:

A[5003] = function () {
    var Q = A[--A[stack_pointer]];  // POP function
    var B = A[--A[stack_pointer]];  // POP 'this' context

    function E(e) {  // e = argument count
        var D = A[stack_pointer];
        var g = A.slice(D - e, D);  // Get arguments from stack
        if (this instanceof E) {
            // Constructor call (new E(...))
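            g.unshift(null);  // bind() expects the 'this' value first
            var h = Function.prototype.bind.apply(Q, g);
            A[stack_pointer] -= e;
            var a;  // Result of the constructor call (or the error message)
            try {
                a = new h();
            } catch (A) {
                a = A.message;
            }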
            A[A[stack_pointer]++] = a;
        } else {
            // Regular function call
            var t;
            try {
                t = Q.apply(B, g);
            } catch (A) {
                t = A.message;
            }
            A[stack_pointer] -= e + 2;
            A[A[stack_pointer]++] = t;
        }
    }

    A[A[stack_pointer]++] = E;
    fetch();
};

This opcode:

Pops the function and context from the stack
Creates a wrapper that can be called with arguments
Detects whether it's a constructor call (new) or regular call
Applies the function with proper context and error handling
Pushes the result back onto the stack
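Two plain JavaScript tricks carry this handler, and they are easier to see in isolation. The sketch below is standalone and uses made-up names (`Point`, `Bound`, `wrapper`): it shows how `Function.prototype.bind.apply` lets you call `new` with an argument array, and how `this instanceof` tells a constructor call from a regular call.

```javascript
// Standalone illustration; Point, Bound and wrapper are hypothetical names.
function Point(x, y) {
    this.x = x;
    this.y = y;
}

// Trick 1: `new` with an argument array.
// bind() takes the 'this' value first, so null is prepended; `new` ignores it.
const args = [3, 4];
const Bound = Function.prototype.bind.apply(Point, [null].concat(args));
const p = new Bound();
console.log(p instanceof Point, p.x, p.y); // true 3 4

// Trick 2: detect whether a function was invoked with `new`.
let lastMode = null;
function wrapper() {
    lastMode = this instanceof wrapper ? "construct" : "call";
}

wrapper();
const plainMode = lastMode;      // "call"
new wrapper();
const constructMode = lastMode;  // "construct"
console.log(plainMode, constructMode);
```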

Opcode 4961: Object Literal Construction

A[4961] = function () {
    var Q = {};
    for (var E = readUint16(), e = 0; e < E; e++) {
        var D = A[--A[stack_pointer]];  // First POP
        var g = A[--A[stack_pointer]];  // Second POP
        Q[D] = g;
    }
    A[A[stack_pointer]++] = Q;
    fetch();
};

This opcode builds object literals by:

Reading the number of property pairs from bytecode
Popping pairs from the stack (first pop becomes the key)
Constructing an object: object[firstPop] = secondPop
Pushing the resulting object onto the stack
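The pop order matters: because the first pop is the key, whatever produced the bytecode has to push each value before its key. A small sketch using an ordinary array as the stack (the function name `buildObject` and the demo data are made up):

```javascript
// Hypothetical standalone version of the pop order used by opcode 4961.
function buildObject(stack, pairCount) {
    const object = {};
    for (let i = 0; i < pairCount; i++) {
        const key = stack.pop();    // first pop: the key
        const value = stack.pop();  // second pop: the value
        object[key] = value;
    }
    stack.push(object);
}

// Values are pushed before their keys so the key ends up on top.
const demoStack = [1, "a", 2, "b"];
buildObject(demoStack, 2);
console.log(JSON.stringify(demoStack[0])); // {"b":2,"a":1}
```

The pairs come off the stack in reverse order, so the last pair pushed is the first property assigned.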

The Fetch Function

Each instruction concludes by calling fetch(), which prepares the next instruction:

function fetch() {
    var Q = A[instruction_pointer];
    var B = A[vm_start + Q];
    A[instruction_pointer] = Q + 1;
    var E = A[4783 + B];
    A[current_opcode_handler] = E;
    A[current_opcode_id] = B;
}

This function:

Reads the instruction pointer
Fetches the next opcode from bytecode
Increments the instruction pointer
Looks up the opcode handler
Updates current_opcode_handler and current_opcode_id

This mirrors the dispatcher logic, creating a fetch-decode-execute cycle typical of VM architectures.

Disassembler Implementation

To aid in analysis, a proof-of-concept disassembler (disasm.js) was developed to convert the VM's bytecode into human-readable assembly.

Approach

The disassembler operates in two passes:

Pass 1: Label Discovery

The first pass scans through the bytecode to identify all jump targets. This includes:

Forward and backward jumps (JMP_FWD, JMP_BACK)
Conditional jumps (JZ, JNZ_KEEP, JZ_KEEP)
Closure boundaries (function bodies and their end points)

Each target address is marked with a label (e.g., L_0042) to make control flow easier to follow.
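A rough sketch of this pass is below. It is not the code from disasm.js: the instruction objects with `mnemonic` and `targets` fields and the function name `discoverLabels` are assumptions, and it presumes the bytecode has already been split into instructions with absolute target addresses.

```javascript
// Hypothetical first pass; the instruction shape is invented for this example.
const LABEL_SOURCES = new Set([
    "JMP_FWD", "JMP_BACK", "JZ", "JNZ_KEEP", "JZ_KEEP", "CLOSURE"
]);

function discoverLabels(instructions) {
    const labels = new Map();
    for (const instruction of instructions) {
        if (!LABEL_SOURCES.has(instruction.mnemonic)) continue;
        for (const target of instruction.targets) {
            // Label format used in the output: L_ plus a 4-digit hex address.
            labels.set(target, "L_" + target.toString(16).padStart(4, "0"));
        }
    }
    return labels;
}

const labels = discoverLabels([
    { address: 0x00, mnemonic: "CLOSURE", targets: [0x06, 0xa3f] },
    { address: 0x10, mnemonic: "JZ", targets: [0x42] },
    { address: 0x20, mnemonic: "ADD", targets: [] }
]);
console.log(labels.get(0x42)); // L_0042
```

Collecting labels before printing means a backward jump and a forward jump to the same address share one label.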

Pass 2: Disassembly

The second pass converts each instruction into assembly-like output:

000042:  fa 00 0a           PUSH_IMM 10
000045:  19 00 19           PUSH_REG 25
000048:  eb                 ADD

Each line includes:

Address: Hexadecimal offset in the bytecode
Raw bytes: The actual bytes making up the instruction (useful for verification)
Opcode: Mnemonic name of the instruction
Arguments: Decoded operands (register numbers, immediates, jump targets)
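For reference, a line in that layout can be produced with a few lines of string padding. The function name `formatLine` is made up, and the column widths (6 hex digits for the address, 19 characters for the raw bytes) are simply read off the sample output above rather than taken from disasm.js.

```javascript
// Hypothetical formatter reproducing the column layout of the sample output.
function formatLine(address, bytes, mnemonic, operands) {
    const addressText = address.toString(16).padStart(6, "0");
    const rawText = bytes
        .map(function (byte) { return byte.toString(16).padStart(2, "0"); })
        .join(" ");
    const operandText = operands.length ? " " + operands.join(", ") : "";
    return addressText + ":  " + rawText.padEnd(19) + mnemonic + operandText;
}

console.log(formatLine(0x42, [0xfa, 0x00, 0x0a], "PUSH_IMM", [10]));
console.log(formatLine(0x48, [0xeb], "ADD", []));
```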

Value Decoding

One of the more complex aspects is decoding immediate values embedded in the bytecode. The VM uses type markers to indicate how to interpret the following bytes:

Simple types (no additional data):

0x28 → true
0x7D → false
0x4C → null
0x3D → undefined

Small integers (0-127): Encoded with high bit set

0x85 → 5 (0x80 | 5)

Strings: XOR-encoded and null-terminated

ASCII strings: marker 0x67, XOR key starts at 183
UTF-8 strings: marker 0x27, XOR key starts at 46

Numeric types:

8-bit signed: 0x6F + 1 byte
16-bit signed: 0x61 + 2 bytes (big-endian)
24-bit signed: 0x65 + 3 bytes (big-endian)
32-bit signed: 0x54 + 4 bytes (big-endian)
IEEE 754 double: 0x05 + 8 bytes

The XOR encoding for strings is straightforward but prevents casual inspection:

let str = '';
let xorKey = 183;  // Initial key for ASCII strings
let ch;
while ((ch = readByte() ^ (xorKey++ & 0xFF)) !== 0) {
    str += String.fromCharCode(ch);
}
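Putting the marker table together, a decoder for a subset of these types could look like the sketch below. The markers and XOR key come from the lists above; everything else is an assumption, including the function names `decodeValue` and `encodeAscii`, the `{ value, next }` return shape, and reading from a plain byte array. The 24-bit, 32-bit, double and UTF-8 cases are left out to keep it short.

```javascript
// Hypothetical decoder for a subset of the value markers listed above.
function decodeValue(bytes, position) {
    const marker = bytes[position++];
    if (marker & 0x80) {
        return { value: marker & 0x7f, next: position };  // small integer
    }
    switch (marker) {
        case 0x28: return { value: true, next: position };
        case 0x7d: return { value: false, next: position };
        case 0x4c: return { value: null, next: position };
        case 0x3d: return { value: undefined, next: position };
        case 0x6f:  // 8-bit signed: sign-extend one byte
            return { value: (bytes[position] << 24) >> 24, next: position + 1 };
        case 0x61: {  // 16-bit signed, big-endian
            const word = (bytes[position] << 8) | bytes[position + 1];
            return { value: (word << 16) >> 16, next: position + 2 };
        }
        case 0x67: {  // ASCII string, XOR key starts at 183
            let key = 183, text = "", ch;
            while ((ch = bytes[position++] ^ (key++ & 0xff)) !== 0) {
                text += String.fromCharCode(ch);
            }
            return { value: text, next: position };
        }
        default:
            throw new Error("unhandled marker 0x" + marker.toString(16));
    }
}

// Inverse of the ASCII case, only used to build test input.
function encodeAscii(text) {
    const out = [0x67];
    let key = 183;
    for (const ch of text) out.push(ch.charCodeAt(0) ^ (key++ & 0xff));
    out.push(key & 0xff);  // this byte XORs to 0, the terminator
    return out;
}

console.log(decodeValue([0x85], 0).value);                // 5
console.log(decodeValue(encodeAscii("window"), 0).value); // window
```

Returning the next position alongside the value lets a caller decode several immediates in a row without tracking lengths separately.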

Special Opcodes

Some opcodes require custom handling:

CLOSURE (opcode 136): Creates functions/closures with captured variables

Format: CLOSURE locals, capture_count, [capture_indices...], skip_offset

The skip offset points past the function body, allowing the VM to skip over the function definition during linear execution.

PUSH_MULTI_IMM (opcode 96): Pushes multiple values at once

Format: PUSH_MULTI_IMM count, val1, val2, ...

Usage

# Disassemble from file
node disasm.js bytecode.txt

Example Output

; DataDome VM Disassembly
; Bytecode size: 5428 bytes
; VM Constants: VM_START=5258, OPCODE_BASE=4783

000000:  88 01 00 04        CLOSURE locals=1, captures=[0, 4], body=L_0006, end=L_0a3f
L_0006:
000006:  fa 67 ...          PUSH_IMM "window"
00001f:  2b                 PUSH_WINDOW
000020:  02                 SET
000021:  fa 67 ...          PUSH_IMM "navigator"
00003a:  19 00 00           PUSH_REG 0
00003d:  fa 67 ...          PUSH_IMM "navigator"
000056:  ee                 GET
000057:  02                 SET
...

This output format makes it possible to:

Trace execution flow by following jump labels
Identify function boundaries via CLOSURE opcodes
See exactly what values are being pushed and manipulated
Cross-reference with the actual VM implementation

Notes

I'm still pretty new to VMs so take everything with a grain of salt. AI has been used to help document the code and write parts of this readme (because writing docs is painful).

Disclaimer

This is purely for educational/security research purposes. No solvers or bypasses included - just documenting how the VM works because it's genuinely interesting.

DataDome: if you're reading this, hello!!! This is just me trying to get a scholarship 🙏. Please don't sue me, I'm broke asf. If you have any issues with this repo just let me know and we can talk about it :)

sorry for the people that thought I'm going to talk about the inside of the vm... that's not happening

DO NOT DM ME AND ASK FOR A DATADOME API, I WILL NOT HELP YOU
