Aravind Machiry @ PurS3 Lab
Assistant Professor at Purdue University
https://machiry.github.io/feed.xml (updated 2026-02-04T04:49:33-08:00)

Trivial Suggestions: Doing Effective Related Work (Part 1)
2021-05-27
https://machiry.github.io/posts/2021/05/related-work-survey-part1-7

When I was a graduate student, I faced several difficulties in writing related work and organizing references.

This “Trivial Suggestions” post is the first of a two-part series on managing citations and writing the related work section.

Handling Citations

First, install a citation manager.

There are many citation managers, such as Zotero, Mendeley, and RefWorks.

I use Zotero and am happy with it. The Chrome plugin is amazing. You can add a single article, or a whole list of Google Scholar results, to your library with a single click.

Here is a decent tutorial on using Zotero: https://www.youtube.com/watch?v=Hm0TboOcAuM

Also, check out EndNote Click (thanks to Keerthi), which uses your university account to download PDFs that need special access.

Advantages of using citation managers:

  • Organization: You can categorize all the works at a high level (e.g., into folders).

For instance, I can organize all the works related to bootloaders based on the type of problem they focus on, e.g., Bootloader (topmost level) -> Defenses, Attacks, etc.

  • Interoperability: Citations can be exported to various formats without reformatting the works. With a single click, you can dump all your related works into the desired format and put them into your proposal, paper, etc.

Before searching for related work, have answers to at least the following questions:

  • What is the problem you are trying to solve?

E.g., “Finding vulnerabilities in bootloaders”, “Helping students learn better programming”, “Automatically understanding human emotions from their voice”, etc.

Once you know the problem:

– Find techniques (hopefully, other than yours) people have used to solve it.

– Find works that show that the problem is important.

  • What techniques are you trying to use to solve the problem?

E.g., “Static analysis”, “Fuzzing”, etc.

Once you have figured out the technique:

– Find other problems which most commonly use the technique.

– Find works that introduced the technique.

How far back (chronologically) should we go to consider a work to be relevant?

This depends on the specific field and how active the area of research is. E.g., in machine learning, an ultra-active area of research, anything older than three years may be considered irrelevant.

For systems security, I suggest five years. However, this again depends on the specific problem and the approach you are trying to use. Maybe you are using a very old approach (say, ten years old) for a new problem. In that case, even though the work is old, you should cite the paper that proposed the approach.

In Part 2, we will see how to write a good related work section.

Setting up DLXOS (ECE 469) on Cent OS 7
2021-01-19
https://machiry.github.io/posts/2021/01/setting-up-dlxos-centos-7

This post describes the steps to set up DLXOS, which is needed for the Purdue ECE 469 labs, on a CentOS 7 VirtualBox VM.

There are three steps: setting up the VM, installing dependencies, and setting up the DLXOS tools.

VM Setup

  1. Download the CentOS 7 VirtualBox image.
  2. Extract the downloaded archive and import the .ova VM.

    You may need to change the network adapter to NAT.

Log in to the VM with username: centos and password: centos

The following steps have to be run inside the VM.

Install Dependencies

sudo yum install glibc.i686
sudo yum install libstdc++.so.5

Setting up DLXOS tools

mkdir ~/ee469
cd ~/ee469
scp [your-account]@ecegrid.ecn.purdue.edu:~ee469/labs/common/dlxos_new.tar.gz .
tar -xvzf dlxos_new.tar.gz

Setting up the PATH

gedit ~/.bashrc
# at the end of the file, append the following line
export PATH=~/ee469/dlxos_new/bin:$PATH
# save and close gedit, then open a new terminal (or run: source ~/.bashrc)

That’s it. You are all set.

To test, open a new terminal and run dlxsim; you should see a help message.

Kernel Debugging Ubuntu 16.04
2020-08-24
https://machiry.github.io/posts/2020/08/kernel-debugging-ubuntu16.04

All the following steps were tested on Ubuntu 16.04.

There are three steps: building the QEMU image, building and installing the kernel, and debugging.

Building qemu image

Install qemu

sudo apt-get install qemu-kvm qemu virt-manager virt-viewer libvirt-bin

Create a qcow disk

Here, we will create a virtual disk on which Ubuntu will be installed.

Why qcow2 and not a regular image? Because we can increase the size of a qcow2 image later, whereas increasing the size of a regular image is tricky.

qemu-img create -f qcow2 ubuntu16.04.qcow 40G

Download the Ubuntu image

wget https://releases.ubuntu.com/16.04/ubuntu-16.04.7-desktop-amd64.iso

Installing Ubuntu 16.04 on the disk

qemu-system-x86_64 -hda ubuntu16.04.qcow -boot d -cdrom ./ubuntu-16.04.7-desktop-amd64.iso -device virtio-net,netdev=vmnic -netdev user,id=vmnic -m 4G

This opens a window; follow the on-screen instructions to complete the installation.

Build and install kernel

Follow these instructions on the host machine.

Clone the kernel sources

git clone git://kernel.ubuntu.com/ubuntu/ubuntu-xenial.git
cd ubuntu-xenial
# check out the required kernel
git checkout tags/ubuntu-hwe-4.15.0-112.113_16.04.1

Configure

Get the default config

Copy the config from the QEMU VM. You can find it at the following path on the VM:

/boot/config-`uname -r`

Let's say you copied the config out of the VM onto the host as ubuntu16.04config.

Now copy it as .config into the ubuntu-xenial folder (i.e., the folder where we checked out the kernel sources):

cp ubuntu16.04config <path to ubuntu-xenial>/.config

Modify the config (Optional)

make menuconfig

This will open a window where you can enable or disable additional kernel configuration options.

Building kernel

	chmod a+x debian/scripts/*
	chmod a+x debian/scripts/misc/*
	cp debian/scripts/retpoline-extract-one scripts/ubuntu-retpoline-extract-one
	make deb-pkg

Installing the kernel on the guest

Copy all the *.deb files from the host to the guest.

Then run the following commands inside the guest VM to install the built kernel:

sudo dpkg -i linux-image-<..>.deb
sudo dpkg -i linux-headers-<..>.deb

Debugging Guest VM

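First, run the QEMU VM and make QEMU wait for the debugger using the following command:

qemu-system-x86_64 -s -S -hda ubuntu16.04.qcow -device virtio-net,netdev=vmnic -netdev user,id=vmnic -m 4G -enable-kvm

Here, -s starts a gdb server on TCP port 1234 and -S halts the CPU at startup, so QEMU waits until the debugger attaches. Note that QEMU accepts -append only together with -kernel, so kernel command-line options such as console=ttyS0 have to be set in the guest's GRUB configuration when booting from the disk image.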

Now, in another terminal window:

# Go to the folder where we built the kernel
cd <path to ubuntu-xenial>
# Start gdb
gdb
> file vmlinux
> target remote :1234
# You are now inside the debugger, with the guest stopped and waiting.

That's it! You can use the regular gdb commands from now on.

References

[1] https://wiki.gentoo.org/wiki/QEMU/Options

[2] https://help.ubuntu.com/community/Kernel/Compile#Alternate_Build_Method_.28B.29:_The_Old-Fashioned_Debian_Way

Making Kernel Drivers Great Again
2017-09-23
https://machiry.github.io/posts/2017/09/making-kernel-drivers-great-again

Project: MKDGA

Kernel drivers were once good. A few years ago (circa 2008), security issues in the Linux kernel were mostly in the non-driver components. Most of us thought the Linux kernel was getting better with respect to security.

In 2010, Android became popular. Hundreds of vendors quickly started producing Android-compliant devices. Competition between the vendors became fierce, and time to market became an important factor in capturing the growing market. Android uses the Linux kernel as its core, and vendors write drivers to support their hardware. However, because of this time-to-market pressure, these drivers were not properly vetted, resulting in drivers becoming the bug-prone components of the Android kernel [1]. If you take a look at the CVEs [2], most of these bugs are embarrassing; it is incredible that such code even exists.

I want to solve this problem and make Linux kernel drivers great again. My grand plan:

1) Develop a precise static analysis technique that can find easy bugs.

Before actually developing yet another static bug-finding tool, I wanted to check how the existing tools perform on Android kernel drivers. The results are not good: they produce a huge number of warnings, and sometimes even code as simple as the snippet below raises multiple warnings.

char buf[100];
strcpy(buf, "Hello");

Although I understand that one should never use strcpy, the above code is fine. We need a tool that can spot easy bugs with a low false-positive rate (< 20%). By easy, I mean memory corruption vulnerabilities that can be triggered by user data. In program analysis lingo, these are called taint-based vulnerabilities. Together with a few amazing people from UC Santa Barbara, I developed a tool called DR.CHECKER (published at USENIX Security 2017), which tries to achieve exactly this in a completely automated way. Furthermore, it has an amazing UI, where you can see exactly how user data could cause a reported vulnerability.
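To make the false-positive point concrete, here is a minimal Python sketch of a check that would stay quiet on the snippet above. It is not how DR.CHECKER works; the function name strcpy_warning and its three parameters are invented for illustration, and a real tool would have to compute them from the code. The idea is to warn only when the source is user-controlled or is a constant that provably does not fit.

```python
from typing import Optional


def strcpy_warning(dest_size: int, src_literal: Optional[str], tainted: bool) -> Optional[str]:
    """Return a warning for strcpy(dest, src), or None if the call looks fine.

    dest_size   -- size in bytes of the destination buffer
    src_literal -- the source string if it is a compile-time constant, else None
    tainted     -- True if the source may be controlled by the user
    (All three are hypothetical inputs an analysis would have to compute.)
    """
    if src_literal is not None:
        needed = len(src_literal.encode()) + 1  # +1 for the terminating NUL byte
        if needed > dest_size:
            return f"constant source needs {needed} bytes, buffer has {dest_size}"
        return None  # constant source that fits: nothing to report
    if tainted:
        return "user-controlled source copied into a fixed-size buffer"
    return None  # unknown but untainted source: stay quiet to limit false positives
```

For the snippet above, strcpy_warning(100, "Hello", False) returns None, because "Hello" needs only 6 bytes. Staying quiet on unknown but untainted sources trades some missed bugs for fewer false alarms, which is the trade-off argued for here.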

Refer to https://github.com/ucsb-seclab/dr_checker for the usage guide.

2) Develop a smart fuzzer customized for the drivers.

While looking up existing work on fuzzing the Linux kernel, I found syzkaller [3] by Google, which truly is a masterpiece and the gold standard for fuzzing Linux kernel syscalls. However, one problem with it is that it requires a specification of the driver interface, such as the device name, the possible ioctl command IDs, and the corresponding structures. Although this information can be easily specified by the driver developers, it is a non-trivial task for a security analyst. We developed a technique called DIFUZE (to be published at CCS 2017), which retrieves the driver interface in an automated way. The recovered interfaces can be used with syzkaller (recommended) or with our simple fuzzer, MangoFuzz, to fuzz the drivers.
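As a rough illustration of what a "driver interface" consists of, the Python sketch below describes one ioctl command of a made-up driver. This is neither DIFUZE's output format nor syzkaller's description language; the device path, the command name, and the argument layout are all hypothetical. The only real-world piece is the command-number encoding, which follows the generic Linux layout from include/uapi/asm-generic/ioctl.h (used on x86 and ARM; some architectures differ).

```python
import struct
from dataclasses import dataclass

# Bit positions of the fields inside an ioctl command number (generic Linux encoding).
_IOC_NRSHIFT, _IOC_TYPESHIFT, _IOC_SIZESHIFT, _IOC_DIRSHIFT = 0, 8, 16, 30
_IOC_WRITE = 1  # direction bit: user space writes data to the driver


def iow(magic: str, nr: int, size: int) -> int:
    """Python equivalent of the kernel's _IOW(magic, nr, type) macro."""
    return (
        (_IOC_WRITE << _IOC_DIRSHIFT)
        | (size << _IOC_SIZESHIFT)
        | (ord(magic) << _IOC_TYPESHIFT)
        | (nr << _IOC_NRSHIFT)
    )


@dataclass(frozen=True)
class IoctlSpec:
    name: str        # symbolic name of the command
    cmd: int         # numeric command ID passed to ioctl()
    arg_format: str  # struct-module layout of the argument structure


# Hypothetical driver: device path, command name, and layout are invented.
DEVICE = "/dev/example_drv"
SET_CONFIG = IoctlSpec(
    name="EXAMPLE_SET_CONFIG",
    cmd=iow("k", 1, struct.calcsize("<II")),  # argument: two 32-bit unsigned ints
    arg_format="<II",
)


def build_argument(spec: IoctlSpec, *fields: int) -> bytes:
    """Pack field values into the byte layout the driver expects."""
    return struct.pack(spec.arg_format, *fields)
```

A fuzzer that knows DEVICE, SET_CONFIG.cmd, and the argument layout can open the right device node and generate well-formed but unusual arguments, for example build_argument(SET_CONFIG, 0xFFFFFFFF, 0), instead of guessing command numbers blindly.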

3) Develop a website where people can submit their kernel.tar.gz and get back a self-contained Docker image customized to analyze the kernel sources, both statically and dynamically, with a single command: run.py.

I registered the domain drchecker.io to integrate DR.CHECKER and DIFUZE into a self-contained Docker image for analysts to use.

I will be working on this whenever I find free time. Any additional help is greatly appreciated. Please do not hesitate to contact me for any details.

References:

[1] https://events.linuxfoundation.org/sites/events/files/slides/Android-%20protecting%20the%20kernel.pdf

[2] https://source.android.com/security/bulletin/

[3] https://github.com/google/syzkaller

The need for Extensible and configurable Static Taint Tracking for C/C++
2017-05-31
https://machiry.github.io/posts/2017/05/static-taint-tracking

Update: There is now an open-source, extensible framework: https://phasar.org/

Taint tracking, as the name implies, is a technique that tracks the “taint” of data throughout the program. The taint of the data is usually a binary attribute and, as such, can have the Boolean values true/false or 1/0. There are other possible representations of taint, which we ignore for simplicity. Most often, taint is used to indicate whether or not the data is “controlled” by the user. Refer to [1] for a comprehensive treatment of taint tracking.

One of the most common use cases of taint tracking is detecting input validation vulnerabilities, i.e., checking whether tainted data can reach a program point (or sensitive function) that expects untainted data. For example, using a tainted string as the source string in a strcpy call can lead to an overflow of the destination buffer.

Depending on the method of tracking, taint tracking techniques are classified as dynamic or static.

In the case of dynamic taint tracking, the program is instrumented with taint propagation instructions, along with checks to make sure that tainted data does not reach sensitive functions. Dynamic taint tracking is the popular choice for taint tracking. As such, there are many tools available to perform dynamic taint tracking on binaries [3, 4], C/C++ using LLVM [5], Java [6], etc. However, dynamic taint tracking suffers from the same disadvantages as any dynamic analysis technique, such as input generation and speed. Refer to [2] for more details about the disadvantages of dynamic analysis techniques.
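The two ingredients of dynamic taint tracking, propagation at run time and a check at the sensitive function, can be sketched in a few lines of Python. This is a toy, not how tools such as libdft or DataFlowSanitizer are implemented: the class Tainted, the exception TaintError, and the function sensitive_copy are made-up names, and only string concatenation propagates taint here (slicing, formatting, and other operations silently drop it).

```python
class Tainted(str):
    """A string whose contents are marked as user-controlled."""

    def __add__(self, other):
        # Propagation rule: tainted + anything is tainted.
        return Tainted(str.__add__(self, other))

    def __radd__(self, other):
        # Propagation rule: anything + tainted is tainted.
        return Tainted(str.__add__(other, self))


class TaintError(Exception):
    """Raised when tainted data reaches a sensitive function."""


def sensitive_copy(dest_size: int, src: str) -> bytes:
    """Stand-in for a sensitive function such as strcpy."""
    if isinstance(src, Tainted):  # the check inserted in front of the sensitive function
        raise TaintError("tainted data reached sensitive_copy")
    data = src.encode() + b"\0"  # NUL-terminated copy, as strcpy would produce
    if len(data) > dest_size:
        raise ValueError("source does not fit into the destination")
    return data
```

With this in place, sensitive_copy(16, "Hello") succeeds, while sensitive_copy(16, "id=" + Tainted("42")) raises TaintError. The sketch also shows the main limitation mentioned above: the check only fires for inputs that are actually executed.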

In the case of static taint tracking, however, standard data-flow techniques are used to propagate taint, and warnings are raised when tainted data may reach a sensitive function. Static taint tracking is not popular. There are only a few tools available, for Java, binaries, the Web, etc.
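To illustrate the static variant, here is a minimal Python sketch of a flow-insensitive "may be tainted" analysis over a toy program representation. It is deliberately much simpler than what an LLVM-based tool would do (no pointers, no control flow, no function summaries), and the program encoding, the function find_taint_warnings, and the EXAMPLE program are all assumptions made for illustration.

```python
def find_taint_warnings(program, sources, sinks):
    """Flow-insensitive may-taint analysis over a toy three-address program.

    program -- list of ("assign", target, operands) and ("call", function, arguments)
    sources -- names of variables that are tainted from the start (user input)
    sinks   -- names of sensitive functions
    Returns a sorted list of (function, argument) pairs where a possibly
    tainted variable is passed to a sensitive function.
    """
    tainted = set(sources)
    changed = True
    while changed:  # iterate to a fixed point, so statement order does not matter
        changed = False
        for kind, name, operands in program:
            if kind == "assign" and name not in tainted and tainted & set(operands):
                tainted.add(name)  # target becomes tainted if any operand is tainted
                changed = True
    warnings = {
        (name, arg)
        for kind, name, operands in program
        if kind == "call" and name in sinks
        for arg in operands
        if arg in tainted
    }
    return sorted(warnings)


# Toy program; the variable names are invented for illustration.
EXAMPLE = [
    ("assign", "len", ["user_len"]),         # len = user_len
    ("assign", "n", ["len", "header"]),      # n = len + header
    ("assign", "msg", ["greeting"]),         # msg = greeting (a constant)
    ("call", "memcpy", ["dst", "src", "n"]),
    ("call", "strcpy", ["buf", "msg"]),
]
```

Calling find_taint_warnings(EXAMPLE, {"user_len"}, {"memcpy", "strcpy"}) returns [("memcpy", "n")]: the user-controlled length reaches memcpy through two assignments, while the strcpy of a constant stays silent. No input is needed to get this result, which is the main attraction of the static approach; the price is that "may be tainted" can over-approximate.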

One interesting thing to note here is that there is no usable static taint tracking tool available for C/C++. A few works try to achieve this, but they are either discontinued [7, 8] or not extensible [7]. One work that comes close is by Marcelo [9], which modifies the Clang Static Analyzer to perform taint tracking. But the Clang Static Analyzer has disadvantages: it cannot analyze more than one source file, and it does not have access to the LLVM analyses, which are helpful for doing interesting stuff.

The need of the hour is static taint tracking as an LLVM pass. It is sad to see that a decades-old technique is not available for the languages for which it is most applicable.

The lack of extensible and configurable static taint tracking is an open opportunity ignored by academia. Is anyone willing to take up static taint tracking for C/C++ using LLVM as their project? I am with you and can help you in all stages of the project.

Good to know: The compilation flag -gsrc to clang produces a bitcode file with accurate source line information.

Cheers.

[1] All You Ever Wanted to know about Dynamic Taint Tracking: https://users.ece.cmu.edu/~aavgerin/papers/Oakland10.pdf

[2] Table 1 of the pdf: https://link.springer.com/chapter/10.1007/978-3-319-11933-5_13

[3] libdft: http://www.cs.columbia.edu/~vpk/research/libdft/

[4] Google: “Dynamic Taint Tracking for binaries.”

[5] DataFlowSanitizer: http://clang.llvm.org/docs/DataFlowSanitizer.html

[6] Google: “dynamic taint tracking for java”

[7] Context sensitive static taint tracking: https://ece.uwaterloo.ca/~xnoumbis/noumbissi-thesis.pdf

[8] https://github.com/dceara/tanalysis/tree/master/tanalysis

[9] https://www.researchgate.net/publication/312938554_An_User_Configurable_Clang_Static_Analyzer_Taint_Checker
