<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://tinghaoxie.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://tinghaoxie.com/" rel="alternate" type="text/html" /><updated>2025-09-30T14:22:19-07:00</updated><id>https://tinghaoxie.com/feed.xml</id><title type="html">Tinghao</title><subtitle>personal description</subtitle><author><name>Tinghao Xie&lt;br/&gt;谢廷浩</name><email>thx@princeton.edu</email></author><entry><title type="html">ENDC: Ensemble of Narrow DNN Chains</title><link href="https://tinghaoxie.com/posts/2021/12/ENDC/" rel="alternate" type="text/html" title="ENDC: Ensemble of Narrow DNN Chains" /><published>2021-12-21T00:00:00-08:00</published><updated>2021-12-21T00:00:00-08:00</updated><id>https://tinghaoxie.com/posts/2021/12/post</id><content type="html" xml:base="https://tinghaoxie.com/posts/2021/12/ENDC/"><![CDATA[<blockquote>
  <p>Our <strong>paper</strong> available at: <em><a href="/files/Ensemble-of-Narrow-DNN-Chains.pdf">“Ensemble of Narrow DNN Chains”</a></em> (my Machine Learning course essay at Oxford).</p>
</blockquote>

<blockquote>
  <p>Our <strong>code</strong> is publicly available at <a href="https://github.com/vtu81/ENDC">https://github.com/vtu81/ENDC</a>.</p>
</blockquote>

<p>We propose the <strong>Ensemble of Narrow DNN Chains (ENDC)</strong> framework:</p>

<ol>
  <li>first train narrow DNN chains that perform well on one-vs-all binary classification tasks,</li>
  <li>then aggregate them by voting to predict on the multiclass classification task.</li>
</ol>
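<p>The two steps above can be sketched as follows; the stub “chains” here are hypothetical stand-ins for the trained narrow DNN chains, each scoring how likely the input is to belong to its own class (this is not the actual ENDC code):</p>

```python
def endc_predict(binary_chains, x):
    """Aggregate one-vs-all chains: each chain scores "x belongs to my class";
    the class whose chain is most confident wins the vote."""
    scores = [chain(x) for chain in binary_chains]
    return max(range(len(scores)), key=lambda k: scores[k])

# Toy stand-in chains (NOT real narrow DNNs): chain k fires when the
# input's mean intensity is close to k.
chains = [lambda x, k=k: -abs(sum(x) / len(x) - k) for k in range(10)]
print(endc_predict(chains, [3.0] * 784))  # -> 3
```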

<p>Our ensemble framework could:</p>
<ul>
  <li>utilize the abstract interpretability of DNNs,</li>
  <li>outperform traditional ML significantly on CIFAR-10,</li>
  <li>while being <strong>2-4 orders of magnitude smaller</strong> than normal DNNs and <strong>6+ times smaller</strong> than traditional ML models,</li>
  <li>furthermore, be compatible with full parallelism in both the training and deployment stages.</li>
</ul>

<p>Our empirical study shows that a narrow DNN chain could learn binary classification well. Moreover, our experiments on three datasets (MNIST, Fashion-MNIST, and CIFAR-10) confirm the potential power of ENDC. <strong>Compared with traditional ML models, ENDC, with the fewest parameters, could achieve similar accuracy on MNIST and Fashion-MNIST, and significantly better accuracy on CIFAR-10.</strong></p>

<!-- Thanks to non-convexity, even very narrow DNN (with only 1 or 2 channels) could perform well in some abstract binary classification tasks.

> So what if we aggregate a lot of 1(or 2)-channel DNN chains to handle multi-classification tasks (e.g. MNIST, Fashion-MNIST, CIFAR-10)? Let's see. -->

<p><img src="/images/ENDC_workflow.png" alt="" /></p>

<h2 id="results">Results</h2>

<h3 id="overall-accuracy">Overall Accuracy</h3>

<table>
  <thead>
    <tr>
      <th>Dataset</th>
      <th>Accuracy</th>
      <th>Arch</th>
      <th>#Param</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>MNIST</strong></td>
      <td>93.40%</td>
      <td>1-channel</td>
      <td>1300</td>
    </tr>
    <tr>
      <td><strong>Fashion-MNIST</strong></td>
      <td>80.39%</td>
      <td>1-channel</td>
      <td>1300</td>
    </tr>
    <tr>
      <td><strong>CIFAR-10</strong></td>
      <td>47.72%</td>
      <td>2-channel</td>
      <td>4930</td>
    </tr>
  </tbody>
</table>

<ul>
  <li><strong>Each binary classifier has fewer parameters than its input has entries (130 &lt; 28x28 for MNIST and Fashion-MNIST, 493 &lt; 3x32x32 for CIFAR-10)!</strong></li>
</ul>
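<p>For intuition on these parameter counts, here is a generic helper that tallies the parameters of a chain of convolutional layers. The layer shapes below are made-up examples; the real ENDC architectures are described in the paper:</p>

```python
def conv_params(in_ch, out_ch, k):
    """Parameter count of a k x k convolution layer with bias terms."""
    return out_ch * (in_ch * k * k + 1)

def chain_params(layers):
    """layers: list of (in_channels, out_channels, kernel_size) tuples."""
    return sum(conv_params(i, o, k) for i, o, k in layers)

# Hypothetical 1-channel chain (NOT the paper's architecture): three
# 3x3 conv layers, each with a single input and output channel.
toy_chain = [(1, 1, 3)] * 3
print(chain_params(toy_chain))  # 3 * (1*1*3*3 + 1) -> 30
```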

<h3 id="comparison">Comparison</h3>

<p>We compare ENDC with traditional ML models:</p>
<ul>
  <li>Logistic Regression (LR)</li>
  <li>Support Vector Classifier (SVC)</li>
</ul>

<p>and normal DNNs. Their results are referenced from the internet; see our paper for sources and details.</p>

<p><strong>MNIST</strong></p>

<table>
  <thead>
    <tr>
      <th>Method</th>
      <th>Accuracy (%)</th>
      <th># Param</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>ENDC (ours)</strong></td>
      <td><strong>93.4</strong></td>
      <td><strong>1.3K</strong></td>
    </tr>
    <tr>
      <td>LR</td>
      <td>91.7</td>
      <td>7.7K+</td>
    </tr>
    <tr>
      <td>SVC</td>
      <td>97.8</td>
      <td>7.7K+</td>
    </tr>
    <tr>
      <td>Normal DNN (LeNet)</td>
      <td>99.3</td>
      <td>0.41M</td>
    </tr>
  </tbody>
</table>

<p><strong>Fashion-MNIST</strong></p>

<table>
  <thead>
    <tr>
      <th>Method</th>
      <th>Accuracy (%)</th>
      <th># Param</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>ENDC (ours)</strong></td>
      <td><strong>80.4</strong></td>
      <td><strong>1.3K</strong></td>
    </tr>
    <tr>
      <td>LR</td>
      <td>84.2</td>
      <td>7.7K+</td>
    </tr>
    <tr>
      <td>SVC</td>
      <td>89.7</td>
      <td>7.7K+</td>
    </tr>
    <tr>
      <td>Normal DNN (VGG-16)</td>
      <td>93.5</td>
      <td>26M</td>
    </tr>
  </tbody>
</table>

<p><strong>CIFAR-10</strong></p>

<table>
  <thead>
    <tr>
      <th>Method</th>
      <th>Accuracy (%)</th>
      <th># Param</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>ENDC (ours)</strong></td>
      <td><strong>47.7</strong></td>
      <td><strong>4.9K</strong></td>
    </tr>
    <tr>
      <td>LR</td>
      <td>39.9</td>
      <td>30.0K+</td>
    </tr>
    <tr>
      <td>SVC (PCA)</td>
      <td>40.2</td>
      <td>0.44M+</td>
    </tr>
    <tr>
      <td>Normal DNN (VGG-16-BN)</td>
      <td>93.9</td>
      <td>15M</td>
    </tr>
  </tbody>
</table>

<h3 id="per-class-accuracy">Per-class Accuracy</h3>

<table>
  <thead>
    <tr>
      <th>Dataset</th>
      <th>#0 (%)</th>
      <th>#1 (%)</th>
      <th>#2 (%)</th>
      <th>#3 (%)</th>
      <th>#4 (%)</th>
      <th>#5 (%)</th>
      <th>#6 (%)</th>
      <th>#7 (%)</th>
      <th>#8 (%)</th>
      <th>#9 (%)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>MNIST</strong></td>
      <td>97.04</td>
      <td>97.53</td>
      <td>96.51</td>
      <td>88.91</td>
      <td>95.52</td>
      <td>92.38</td>
      <td>90.29</td>
      <td>94.55</td>
      <td>88.71</td>
      <td>91.67</td>
    </tr>
    <tr>
      <td><strong>Fashion-MNIST</strong></td>
      <td>80.60</td>
      <td>92.90</td>
      <td>77.60</td>
      <td>77.60</td>
      <td>75.50</td>
      <td>92.30</td>
      <td>40.70</td>
      <td>81.30</td>
      <td>90.00</td>
      <td>95.50</td>
    </tr>
    <tr>
      <td><strong>CIFAR-10</strong></td>
      <td>48.90</td>
      <td>55.70</td>
      <td>43.50</td>
      <td>31.80</td>
      <td>41.00</td>
      <td>45.40</td>
      <td>61.90</td>
      <td>42.00</td>
      <td>49.90</td>
      <td>57.10</td>
    </tr>
  </tbody>
</table>]]></content><author><name>Tinghao Xie&lt;br/&gt;谢廷浩</name><email>thx@princeton.edu</email></author><category term="Deep Learning" /><category term="Ensemble" /><category term="Narrow DNN" /><category term="MNIST" /><category term="Fashion-MNIST" /><category term="CIFAR-10" /><summary type="html"><![CDATA[Use an ensemble of very narrow (1/2-channel wide) DNNs to classify MNIST, Fashion-MNIST and CIFAR-10.]]></summary></entry><entry><title type="html">Backdoor Trigger Restoration</title><link href="https://tinghaoxie.com/posts/2021/12/Backdoor-Trigger-Restoration/" rel="alternate" type="text/html" title="Backdoor Trigger Restoration" /><published>2021-12-02T00:00:00-08:00</published><updated>2021-12-02T00:00:00-08:00</updated><id>https://tinghaoxie.com/posts/2021/12/post</id><content type="html" xml:base="https://tinghaoxie.com/posts/2021/12/Backdoor-Trigger-Restoration/"><![CDATA[<blockquote>
  <p>This is my on-going project, shown here only for demonstration, advised by Prof. <a href="https://alps-lab.github.io/about/">Ting Wang</a> at PSU.</p>
</blockquote>

<h2 id="introduction">Introduction</h2>

<p><strong>This project diverged from <a href="/posts/2021/12/Backdoor-Certification/">Backdoor Certification</a>; you may want to read that first.</strong></p>

<p>Backdoors within DNN models are dangerous, and an important line of work focuses on detecting these potential backdoors. Some of these detection methods (<em>e.g.</em> <a href="https://ieeexplore.ieee.org/abstract/document/8835365/">Neural Cleanse</a>) first reverse engineer (restore) the potential backdoor trigger, then utilize anomaly detection to tell if there is indeed a backdoor.</p>

<p>We propose an efficient heuristic algorithm that focuses on <strong>restoring the potential backdoor trigger</strong> in a given DNN. Our algorithm requires NO or very few clean inputs, while supporting both <em>perturbation triggers</em> (add a pattern to an image) and <em>patch triggers</em> (stamp a pattern onto an image). Our restored triggers achieve a high attack success rate (ASR) and match the real trigger well.</p>

<h2 id="method">Method</h2>

<p>Intuitively, for a batch of $N$ inputs, searching for the potential backdoor trigger is similar to the following optimization:</p>

\[\text{trigger} = \text{argmin}_{r} \sum_{i=1}^N \Big(f_{source}(x_i + r) - f_{target}(x_i + r)\Big)\]

<p>Nevertheless, directly optimizing this objective by stochastic gradient descent is empirically difficult. As shown in the three figures below, the gradient information (orange) could be quite noisy:</p>

<p><img style="width: 30%" src="/images/backdoor_restore_demo1.png" />
<img style="width: 30%" src="/images/backdoor_restore_demo2.png" />
<img style="width: 30%" src="/images/backdoor_restore_demo3.png" /></p>

<p>Recall that CROWN relaxes the NN into a linear function; as shown in the figures above, we may view the CROWN weight for each input dimension (blue) as an “<em>approximate</em> gradient” in a certain vicinity. This “<em>approximate</em> gradient” is usually less noisy.</p>

<p>So we simply replace the exact gradients with the “<em>approximate</em> gradients”:</p>

\[\mathbf r_{t+1} = \mathbf r_t - \text{lr} * \sum_{i=1}^N \nabla_{\mathbf x,approx} f(\mathbf x_i + \mathbf r_t)\]

<p>This makes the optimization (restoring or searching for triggers) much easier, and our experiments have confirmed this.</p>
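<p>A minimal sketch of the update rule above, with the exact gradient swapped for an approximate one. Here <code>approx_grad</code> is a hypothetical stand-in for the CROWN-derived linear weights, and the quadratic toy objective is illustrative only (this is not the project’s actual code):</p>

```python
def restore_trigger(approx_grad, xs, r0, lr=0.1, steps=200):
    """Gradient-descent-style trigger search using "approximate gradients".

    approx_grad(z): stand-in for the CROWN linear weights of the
    source-minus-target objective at the perturbed input z.
    """
    r = list(r0)
    for _ in range(steps):
        total = [0.0] * len(r)
        for x in xs:
            g = approx_grad([xi + ri for xi, ri in zip(x, r)])
            total = [ti + gi for ti, gi in zip(total, g)]
        r = [ri - lr * gi for ri, gi in zip(r, total)]
    return r

# Toy demo: pretend the relaxed objective is ||z - t||^2, whose gradient
# is 2*(z - t); t plays the role of the "true trigger".
t = [1.0, 2.0]
grad = lambda z: [2 * (zi - ti) for zi, ti in zip(z, t)]
r = restore_trigger(grad, xs=[[0.0, 0.0]], r0=[0.0, 0.0])
print([round(ri, 3) for ri in r])  # -> [1.0, 2.0]
```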

<h2 id="results">Results</h2>

<p>Some restoration results:</p>

<p><img src="/images/backdoor_restore_results.png" alt="" /></p>

<ul>
  <li><u>I’m still refining both the idea and experiments.</u></li>
</ul>]]></content><author><name>Tinghao Xie&lt;br/&gt;谢廷浩</name><email>thx@princeton.edu</email></author><category term="Adversarial Machine Learning" /><category term="Backdoor Attack" /><category term="Backdoor Detection" /><summary type="html"><![CDATA[Towards faithful backdoor trigger restoration.]]></summary></entry><entry><title type="html">Backdoor Certification</title><link href="https://tinghaoxie.com/posts/2021/12/Backdoor-Certification/" rel="alternate" type="text/html" title="Backdoor Certification" /><published>2021-12-01T00:00:00-08:00</published><updated>2021-12-01T00:00:00-08:00</updated><id>https://tinghaoxie.com/posts/2021/12/post</id><content type="html" xml:base="https://tinghaoxie.com/posts/2021/12/Backdoor-Certification/"><![CDATA[<blockquote>
  <p>This is my on-going project, shown here only for demonstration, advised by Prof. <a href="https://alps-lab.github.io/about/">Ting Wang</a> at PSU.</p>
</blockquote>

<h2 id="introduction">Introduction</h2>

<p>In the field of DNN security, adversarial attacks and backdoor attacks are two typical threats.</p>
<ul>
  <li><em>Adversarial Attack</em>: For a given input, the attacker adds an imperceptible noise (perturbation), leading the DNN to misclassify the perturbed input. The adversarial perturbation is input-specific, and is usually obtained via PGD.</li>
  <li><em>Backdoor Attack</em>: The attacker stamps a trigger pattern onto inputs, leading the DNN to misclassify all stamped inputs. There are a variety of trigger types and implantation strategies; backdoors are usually injected via data poisoning at the training stage.</li>
</ul>

<p><strong>Certified robustness</strong> has been widely discussed as a way to end the arms race between <strong>adversarial</strong> attacks and defenses. We aim to take the first step toward introducing certification to stop the arms race between <strong>backdoor</strong> attacks and defenses.</p>

<h2 id="method">Method</h2>

<p>We first formulate the backdoor certification problem. The statement that no (perturbation-)backdoor exists within a norm ball $S$ can be expressed as the inequality:</p>

\[\min_{r\in S}\max_i f_{source}(x_i + r) - f_{target}(x_i + r) &gt; 0\]

<p>We base our work on an existing NN verifier, <a href="https://arxiv.org/abs/1811.00866">CROWN</a> (LiRPA). As shown in the following figure, CROWN would relax the non-convex NN function $f$ into a linear function $\underline f$ <em>w.r.t.</em> the input dimensions, where $f(x + r) \ge \underline f(x + r)$ for any $r\in S$.</p>

<p><img style="width: 100%" src="/images/CROWN.png" /></p>

<p>We use the lower bound linear function for certifying backdoor:</p>

\[\min_{r\in S}\max_i \underline f_{source}(x_i + r) - \overline f_{target}(x_i + r) &gt; 0\]

<p>Notice that the second inequality naturally yields a sufficient condition for the first. The following figure shows our backdoor certification process:</p>

<p><img style="width: 100%" src="/images/cert_backdoor_workflow.png" /></p>

<p>Each solid line corresponds to the linear relaxation $\underline f_{source}(x_i + r) - \overline f_{target}(x_i + r)$ of the NN given input $x_i$. After grouping the inputs, we are able to give a certification like: <strong><em>There is no perturbation trigger $r \in S$ that would lead to $\rho\%$ inputs being misclassified</em></strong>.</p>
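<p>Given each input’s linear relaxation $\mathbf w_i^\top \mathbf r + b_i$, minimizing over the ball $\|\mathbf r\|_\infty \le \epsilon$ has the closed form $b_i - \epsilon\|\mathbf w_i\|_1$. A sketch of the resulting certification check follows; the relaxation coefficients below are illustrative, not from a real CROWN run:</p>

```python
def certified_fraction(relaxations, eps):
    """relaxations: list of (w, b) pairs, one per input x_i, such that
    f_src(x_i + r) - f_tgt(x_i + r) >= w . r + b for all ||r||_inf <= eps.
    Returns the fraction of inputs certified safe against any such trigger."""
    certified = 0
    for w, b in relaxations:
        worst = b - eps * sum(abs(wi) for wi in w)  # closed-form minimum
        if worst > 0:  # even the worst-case trigger cannot flip this input
            certified += 1
    return certified / len(relaxations)

# Illustrative linear relaxations for three inputs:
rel = [([0.5, -0.5], 2.0), ([3.0, 1.0], 1.0), ([0.1, 0.1], 0.5)]
print(certified_fraction(rel, eps=1.0))  # -> 0.6666666666666666
```

<p>If a fraction $q$ of the inputs is certified this way, then no single trigger in the ball can ever misclassify more than a $1-q$ fraction of them.</p>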

<p>We could further introduce optimization and branch-and-bound to tighten the bounds.</p>

<h2 id="results">Results</h2>

<p>A metric for certified adversarial robustness is the $\textit{adversarial-attack-free radius}$, under which it is impossible to perform an adversarial attack. Likewise, we extend the metric to the $\textit{backdoor-free radius}$, under which it is impossible to perform a backdoor attack.</p>

<p>Obviously:
\(\textit{adversarial-attack-free radius} \le \textit{backdoor-free radius}\)
and our initial experiment results show that for the same NN, there could be a $&gt;15\%$ gap between the two radii.</p>

<ul>
  <li><u>I am still refining the experiments.</u></li>
</ul>]]></content><author><name>Tinghao Xie&lt;br/&gt;谢廷浩</name><email>thx@princeton.edu</email></author><category term="Adversarial Machine Learning" /><category term="Certified Robustness" /><category term="Backdoor Attack" /><category term="Neural Network Verification" /><summary type="html"><![CDATA[Certified robustness for backdoor attack.]]></summary></entry><entry><title type="html">Naive VQA: Implementations of a Strong VQA Baseline</title><link href="https://tinghaoxie.com/posts/2021/07/NaiveVQA/" rel="alternate" type="text/html" title="Naive VQA: Implementations of a Strong VQA Baseline" /><published>2021-07-17T00:00:00-07:00</published><updated>2021-07-17T00:00:00-07:00</updated><id>https://tinghaoxie.com/posts/2021/07/post</id><content type="html" xml:base="https://tinghaoxie.com/posts/2021/07/NaiveVQA/"><![CDATA[<blockquote>
  <p>What’s VQA?</p>
</blockquote>

<p><strong>Visual Question Answering (VQA)</strong> is a type of task where, given an image and a question about the image, a model is expected to give a correct answer.</p>

<p>For example, the <strong>visual</strong> input looks like this:</p>

<p><img src="/images/NaiveVQA_image_demo1.jpg" alt="" /></p>

<p>The <strong>question</strong> is: <em>What color is the girl’s necklace?</em></p>

<p>Our model would generate the <strong>answer</strong> ‘white’.</p>

<blockquote>
  <p>What’s MindSpore?</p>
</blockquote>

<p><a href="https://www.mindspore.cn/en">MindSpore</a> is a new AI framework developed by Huawei.</p>

<hr />

<h1 id="naivevqa-mindspore--pytorch-implementations-of-a-strong-vqa-baseline">NaiveVQA: MindSpore &amp; PyTorch Implementations of a Strong VQA Baseline</h1>

<p><img src="https://visitor-badge.laobi.icu/badge?page_id=vtu.NaiveVQA" alt="" /></p>

<p><a href="https://github.com/vtu81/NaiveVQA">This repository</a> contains a naive VQA model, which is our final project (<strong>MindSpore</strong> implementation) for the course DL4NLP at ZJU. It’s a reimplementation of the paper <a href="https://arxiv.org/abs/1704.03162">Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering</a>.</p>

<blockquote>
  <p>Check out the <code class="language-plaintext highlighter-rouge">pytorch</code> branch for our <strong>PyTorch</strong> implementation.</p>
</blockquote>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git checkout pytorch
</code></pre></div></div>

<h2 id="performance">Performance</h2>

<table>
  <thead>
    <tr>
      <th>Framework</th>
      <th>Y/N</th>
      <th>Num</th>
      <th>Other</th>
      <th>All</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>MindSpore</strong></td>
      <td>62.2</td>
      <td>7.5</td>
      <td>2.4</td>
      <td><strong>25.8</strong></td>
    </tr>
    <tr>
      <td><strong>PyTorch</strong></td>
      <td>66.3</td>
      <td>24.5</td>
      <td>25.0</td>
      <td><strong>40.6</strong></td>
    </tr>
  </tbody>
</table>

<ul>
  <li>
    <p>Per Question Type Accuracy (<strong>MindSpore</strong>)
<img src="/images/NaiveVQA_ms_result_per_question_type.png" alt="" /></p>
  </li>
  <li>
    <p>Per Question Type Accuracy (<strong>PyTorch</strong>)
<img src="/images/NaiveVQA_pt_result_per_question_type.png" alt="" /></p>
  </li>
</ul>

<h2 id="file-directory">File Directory</h2>

<ul>
  <li><code class="language-plaintext highlighter-rouge">data/</code>
    <ul>
      <li><code class="language-plaintext highlighter-rouge">annotations/</code> – annotations data (ignored)</li>
      <li><code class="language-plaintext highlighter-rouge">images/</code> – images data (ignored)</li>
      <li><code class="language-plaintext highlighter-rouge">questions/</code> – questions data (ignored)</li>
      <li><code class="language-plaintext highlighter-rouge">results/</code> – contains evaluation results when you evaluate a model with <code class="language-plaintext highlighter-rouge">./evaluate.ipynb</code></li>
      <li><code class="language-plaintext highlighter-rouge">clean.py</code> – a script to clean up <code class="language-plaintext highlighter-rouge">train.json</code> in both <code class="language-plaintext highlighter-rouge">data/annotations/</code> and <code class="language-plaintext highlighter-rouge">data/questions/</code></li>
      <li><code class="language-plaintext highlighter-rouge">align.py</code> – a script to sort and align up the annotations and questions</li>
    </ul>
  </li>
  <li><code class="language-plaintext highlighter-rouge">resnet/</code> – resnet directory, cloned from <a href="https://github.com/Cyanogenoid/pytorch-resnet/tree/9332392b01317d57e92f81e00933c48f423ff503">pytorch-resnet</a></li>
  <li><code class="language-plaintext highlighter-rouge">logs/</code> – should contain saved <code class="language-plaintext highlighter-rouge">.pth</code> model files</li>
  <li><code class="language-plaintext highlighter-rouge">config.py</code> – global configure file</li>
  <li><code class="language-plaintext highlighter-rouge">train.py</code> – training</li>
  <li><code class="language-plaintext highlighter-rouge">view-log.py</code> – a tool for visualizing an accuracy-vs-epoch figure</li>
  <li><code class="language-plaintext highlighter-rouge">val_acc.png</code> – a demo of the accuracy-vs-epoch figure</li>
  <li><code class="language-plaintext highlighter-rouge">model.py</code> – the major model</li>
  <li><code class="language-plaintext highlighter-rouge">preprocess-image.py</code> – preprocess the images, using ResNet152 to extract features for further usages</li>
  <li><code class="language-plaintext highlighter-rouge">preprocess-image-test.py</code> – to extract images in the test set</li>
  <li><code class="language-plaintext highlighter-rouge">preprocess-vocab.py</code> – preprocess the questions and annotations to get their vocabularies for further usages</li>
  <li><code class="language-plaintext highlighter-rouge">data.py</code> – dataset, dataloader and data processing code</li>
  <li><code class="language-plaintext highlighter-rouge">utils.py</code> – helper code</li>
  <li><code class="language-plaintext highlighter-rouge">evaluate.ipynb</code> – evaluate a model and visualize the result</li>
  <li><code class="language-plaintext highlighter-rouge">cover_rate.ipynb</code> – calculate the selected answers’ coverage</li>
  <li><code class="language-plaintext highlighter-rouge">assets/</code></li>
  <li><code class="language-plaintext highlighter-rouge">PythonHelperTools/</code> (currently not used)
    <ul>
      <li><code class="language-plaintext highlighter-rouge">vqaDemo.py</code> – a demo for VQA dataset APIs</li>
      <li><code class="language-plaintext highlighter-rouge">vqaTools/</code></li>
    </ul>
  </li>
  <li><code class="language-plaintext highlighter-rouge">PythonEvaluationTools/</code> (currently not used)
    <ul>
      <li><code class="language-plaintext highlighter-rouge">vqaEvalDemo.py</code> – a demo for VQA evaluation</li>
      <li><code class="language-plaintext highlighter-rouge">vaqEvaluation/</code></li>
    </ul>
  </li>
  <li><code class="language-plaintext highlighter-rouge">README.md</code></li>
</ul>

<h2 id="prerequisite">Prerequisite</h2>

<ul>
  <li>Free disk space of at least 60GB</li>
  <li>Nvidia GPU / Ascend Platform</li>
</ul>

<blockquote>
  <p><strong>Notice</strong>: We have successfully tested our code with <strong>MindSpore 1.2.1</strong> on an <strong>Nvidia RTX 2080 Ti</strong>. We therefore strongly suggest you use the MindSpore 1.2.1 GPU version. Since MindSpore is not yet stable, any version other than 1.2.1 might cause failures.</p>
</blockquote>

<blockquote>
  <p>Also, due to some incompatibilities among different versions of MindSpore, we still cannot manage to run the code on Ascend for now. Fortunately, people are more likely to have an Nvidia GPU than an Ascend chip :)</p>
</blockquote>

<h2 id="quick-begin">Quick Begin</h2>

<h3 id="get-and-prepare-the-dataset">Get and Prepare the Dataset</h3>

<p>Get our VQA dataset (a small subset of VQA 2.0) from <a href="https://drive.google.com/open?id=1_VvBqqxPW_5HQxE6alZ7_-SGwbEt2_zn">here</a>. Unzip the file and move the subdirectories</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">annotations/</code></li>
  <li><code class="language-plaintext highlighter-rouge">images/</code></li>
  <li><code class="language-plaintext highlighter-rouge">questions/</code></li>
</ul>

<p>into the repository directory <code class="language-plaintext highlighter-rouge">data/</code>.</p>

<p>Prepare your dataset with:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Only run the following command once!</span>

<span class="nb">cd </span>data

<span class="c"># Save the original json files</span>
<span class="nb">cp </span>annotations/train.json annotations/train_backup.json
<span class="nb">cp </span>questions/train.json questions/train_backup.json
<span class="nb">cp </span>annotations/val.json annotations/val_backup.json
<span class="nb">cp </span>questions/val.json questions/val_backup.json
<span class="nb">cp </span>annotations/test.json annotations/test_backup.json
<span class="nb">cp </span>questions/test.json questions/test_backup.json

python clean.py <span class="c"># run the clean up script</span>
<span class="nb">mv </span>annotations/train_cleaned.json annotations/train.json
<span class="nb">mv </span>questions/train_cleaned.json questions/train.json

python align.py <span class="c"># run the aligning script</span>
<span class="nb">mv </span>annotations/train_cleaned.json annotations/train.json
<span class="nb">mv </span>annotations/val_cleaned.json annotations/val.json
<span class="nb">mv </span>annotations/test_cleaned.json annotations/test.json

<span class="nb">mv </span>questions/train_cleaned.json questions/train.json
<span class="nb">mv </span>questions/val_cleaned.json questions/val.json
<span class="nb">mv </span>questions/test_cleaned.json questions/test.json
</code></pre></div></div>

<p>The scripts above will</p>

<ul>
  <li>clean up your dataset (some image ids are referenced in the annotation &amp; question files while the images themselves don’t exist!)</li>
  <li>align the questions’ ids for convenience during training</li>
</ul>
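<p>A minimal sketch of the clean-up step, assuming each annotation entry carries an <code>image_id</code> field and images are named <code>&lt;id&gt;.jpg</code>; the actual field names and layout in <code>clean.py</code> may differ:</p>

```python
import json
import os

def clean_split(ann_path, img_dir, out_path):
    """Drop annotation entries whose referenced image file does not exist."""
    with open(ann_path) as f:
        data = json.load(f)
    keep = [a for a in data["annotations"]
            if os.path.exists(os.path.join(img_dir, "%s.jpg" % a["image_id"]))]
    data["annotations"] = keep
    with open(out_path, "w") as f:
        json.dump(data, f)
    return len(keep)  # number of surviving entries
```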

<h3 id="preprocess-images">Preprocess Images</h3>

<blockquote>
  <p>You actually don’t have to preprocess the images yourself. We have prepared the preprocessed features file for you; feel free to download it through <a href="https://e-share.obs-website.cn-north-1.myhuaweicloud.com?token=jA7oiibO5h2G1jmMINVC+oum3Lfah+Ut5bGgDFeTu5sI4zchCijmfATwP8KLRi9T5n7q00BW/bs2ugmV6RsBjmPdOUWaQEBJ0Fm0ND9DIrBCRfNNYmqbIH+Q2J0VgDY70KEHNOK3GW+0179M5NphG9YUSz9+JT3f4G3Jx4MLo6zky+l2nB6VdYLBxGspSx98Iq566+3aRL7NFJ/KbSRtUesX9iHSFJaFyBNNeyflZyzTQOmvs+xK17NWIeeJ7zdTuk/ojRn157m0m8uNzKg8+KQawvp53i/4y6kZ1qMh/ryBfjHsKIP18vz6OD0htixD66E/lr450IxpQHzqWp35Lixr8pptgrtBE4aWkcsvjTpupOfZdnqSzLY91QzCqU2578RDctILAb8mpvURWd7im2yUZUexBCsdCzp4HHUL1H3+C6UCTPe7XMDtz4yWhsZFATstbIHs6opMs3Ktp5/6HfA976nJJeJZnjLQp8NxwTVAoPUsckIxwFplhCIkpE38IrBq6mndpEP8G0VHLIKzYfDn6pS83JNzl4EPxknKkNL22OyWAge3ZC+Gh1mqrvCq">here</a> (the passcode is ‘dl4nlp’). You should download the <code class="language-plaintext highlighter-rouge">resnet-14x14.h5</code> (42GB) file and place it at the repository root directory. Once you’ve done that, skip this section!</p>
</blockquote>

<p><strong>Preprocess the images</strong> with:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python preprocess-images.py
</code></pre></div></div>

<ul>
  <li>If you want to accelerate it, tune up <code class="language-plaintext highlighter-rouge">preprocess_batch_size</code> in <code class="language-plaintext highlighter-rouge">config.py</code></li>
  <li>If you run out of CUDA memory, tune down <code class="language-plaintext highlighter-rouge">preprocess_batch_size</code> in <code class="language-plaintext highlighter-rouge">config.py</code></li>
</ul>

<p>The output should be <code class="language-plaintext highlighter-rouge">./resnet-14x14.h5</code>.</p>

<h3 id="preprocess-vocabulary">Preprocess Vocabulary</h3>

<blockquote>
  <p>The vocabulary only depends on the <strong>train</strong> set, as well as the <code class="language-plaintext highlighter-rouge">config.max_answers</code> (the number of selected candidate answers) you choose.</p>
</blockquote>
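<p>The dependence on <code>config.max_answers</code> can be illustrated with a toy vocabulary builder that keeps only the most frequent answers; the details here are assumptions, not the repository’s actual code:</p>

```python
from collections import Counter

def build_answer_vocab(answers, max_answers):
    """Keep the max_answers most frequent answers; map each to an index."""
    counts = Counter(answers)
    top = [a for a, _ in counts.most_common(max_answers)]
    return {a: i for i, a in enumerate(top)}

vocab = build_answer_vocab(["white", "red", "white", "2", "white", "red"], 2)
print(vocab)  # -> {'white': 0, 'red': 1}
```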

<p><strong>Preprocess the questions and annotations</strong> to get their vocabularies with:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python preprocess-vocab.py
</code></pre></div></div>

<p>The output should be <code class="language-plaintext highlighter-rouge">./vocab.json</code>.</p>

<h3 id="train">Train</h3>

<p>Now, you can <strong>train the model</strong> with:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python train.py
</code></pre></div></div>

<p>During training, a <code class="language-plaintext highlighter-rouge">.ckpt</code> file and a <code class="language-plaintext highlighter-rouge">.json</code> file will be saved under <code class="language-plaintext highlighter-rouge">./logs</code>. The <code class="language-plaintext highlighter-rouge">.ckpt</code> file contains the parameters of your model and can be reloaded. The <code class="language-plaintext highlighter-rouge">.json</code> file contains training metainfo records.</p>

<p><strong>View the training process</strong> with:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python view-log.py &lt;path to .json train record&gt;
</code></pre></div></div>

<p>The output <code class="language-plaintext highlighter-rouge">val_acc.png</code> should look like these:</p>

<p><img src="/images/NaiveVQA_val_acc1.png" alt="" /></p>

<p>(a real training run of the PyTorch implementation)</p>

<p><img src="/images/NaiveVQA_val_acc2.png" alt="" /></p>

<p>(a real training run of the MindSpore implementation)</p>

<blockquote>
  <p>To continue training from a pretrained model, set the correct <code class="language-plaintext highlighter-rouge">pretrained_model_path</code> and set <code class="language-plaintext highlighter-rouge">pretrained</code> to True in <code class="language-plaintext highlighter-rouge">config.py</code>.</p>
</blockquote>

<h2 id="test-your-model">Test Your Model</h2>

<p>Likewise, you need to preprocess the test set’s images before testing. Run</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python preprocess-images-test.py
</code></pre></div></div>

<p>to extract features from <code class="language-plaintext highlighter-rouge">test/images</code>. The output should be <code class="language-plaintext highlighter-rouge">./resnet-14x14-test.h5</code>.</p>

<blockquote>
  <p>Likewise, we have prepared the <code class="language-plaintext highlighter-rouge">resnet-14x14-test.h5</code> for you. Download it <a href="https://e-share.obs-website.cn-north-1.myhuaweicloud.com?token=jA7oiibO5h2G1jmMINVC+oum3Lfah+Ut5bGgDFeTu5sI4zchCijmfATwP8KLRi9T5n7q00BW/bs2ugmV6RsBjmPdOUWaQEBJ0Fm0ND9DIrBCRfNNYmqbIH+Q2J0VgDY70KEHNOK3GW+0179M5NphG9YUSz9+JT3f4G3Jx4MLo6zky+l2nB6VdYLBxGspSx98Iq566+3aRL7NFJ/KbSRtUesX9iHSFJaFyBNNeyflZyzTQOmvs+xK17NWIeeJ7zdTuk/ojRn157m0m8uNzKg8+KQawvp53i/4y6kZ1qMh/ryBfjHsKIP18vz6OD0htixD66E/lr450IxpQHzqWp35Lixr8pptgrtBE4aWkcsvjTpupOfZdnqSzLY91QzCqU2578RDctILAb8mpvURWd7im2yUZUexBCsdCzp4HHUL1H3+C6UCTPe7XMDtz4yWhsZFATstbIHs6opMs3Ktp5/6HfA976nJJeJZnjLQp8NxwTVAoPUsckIxwFplhCIkpE38IrBq6mndpEP8G0VHLIKzYfDn6pS83JNzl4EPxknKkNL22OyWAge3ZC+Gh1mqrvCq">here</a> (the passcode is ‘dl4nlp’)</p>
</blockquote>

<p>We provide <code class="language-plaintext highlighter-rouge">evaluate.ipynb</code> to test/evaluate the model. Open the notebook and set the correct <code class="language-plaintext highlighter-rouge">eval_config</code>, and you’re good to go! Run the cells one by one, and you should be able to <strong>visualize the performance</strong> of your trained model.</p>

<h2 id="more-things">More Things</h2>

<ul>
  <li>To calculate the selected answers’ cover rate (determined by <code class="language-plaintext highlighter-rouge">config.max_answers</code>), check <code class="language-plaintext highlighter-rouge">cover_rate.ipynb</code>.</li>
</ul>
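<p>A toy version of the cover-rate computation, i.e. the fraction of ground-truth answer occurrences covered by the top-<code>max_answers</code> candidate answers (not the notebook’s actual code):</p>

```python
from collections import Counter

def cover_rate(answers, max_answers):
    """Fraction of answer occurrences covered by the most frequent answers."""
    counts = Counter(answers)
    covered = sum(c for _, c in counts.most_common(max_answers))
    return covered / len(answers)

print(cover_rate(["a", "a", "b", "c"], 1))  # 'a' covers 2 of 4 -> 0.5
```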

<h2 id="acknowledgement">Acknowledgement</h2>

<p>The current version of the code is translated from the <code class="language-plaintext highlighter-rouge">pytorch</code> branch, where some code is borrowed from the repository <a href="https://github.com/Cyanogenoid/pytorch-vqa">pytorch-vqa</a>.</p>

<blockquote>
  <p>Authors: <a href="https://github.com/Luke-Skycrawler">Haoyang Shi</a>, <a href="http://vtu.life">Tinghao Xie</a></p>
</blockquote>

<p><a href="https://github.com/Luke-Skycrawler/rcc">This repository</a> contains our course project for <em>Compiler Principle</em> at ZJU.</p>

<h3 id="differences-with-c">Differences with C</h3>

<ul>
  <li>Type system: char, int, double, and n-dimensional array types; pointer and struct types are not supported in this version.</li>
  <li>No controlled jumps, gotos, or labels, i.e. <code class="language-plaintext highlighter-rouge">break</code>, <code class="language-plaintext highlighter-rouge">continue</code>, and <code class="language-plaintext highlighter-rouge">switch</code> statements are not supported.</li>
  <li>Preprocessor macros are not supported.</li>
  <li><code class="language-plaintext highlighter-rouge">scanf</code> and <code class="language-plaintext highlighter-rouge">printf</code> are automatically declared and linked against libc at runtime.</li>
  <li>The calling convention of <code class="language-plaintext highlighter-rouge">scanf</code> is modified: e.g., use <code class="language-plaintext highlighter-rouge">scanf("%d",i)</code> to read a value into variable <code class="language-plaintext highlighter-rouge">i</code>, dropping the <code class="language-plaintext highlighter-rouge">&amp;</code> symbol.</li>
  <li>The <code class="language-plaintext highlighter-rouge">for</code> loop is replaced by a Pascal-like <code class="language-plaintext highlighter-rouge">for(i: 0 to n){}</code>, where <code class="language-plaintext highlighter-rouge">i</code> is visible only within the scope of the loop.</li>
  <li>Unary operators are not supported.</li>
</ul>

<p>Try out the test samples to get a better understanding of the grammar.</p>
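<p>For instance, a hypothetical RCC program combining several of the conventions above (the <code class="language-plaintext highlighter-rouge">&amp;</code>-free <code class="language-plaintext highlighter-rouge">scanf</code> call and the Pascal-like <code class="language-plaintext highlighter-rouge">for</code> loop) might look like the sketch below; the test samples in the repository remain the authoritative reference for the grammar:</p>

```
int main() {
    int n;
    int sum;
    scanf("%d", n);      /* RCC convention: no & before the variable */
    sum = 0;
    for (i : 0 to n) {   /* Pascal-like loop; i is visible only inside the loop */
        sum = sum + i;
    }
    printf("%d\n", sum);
    return 0;
}
```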

<h3 id="prerequsite">Prerequisites</h3>

<ul>
  <li>flex 2.5+</li>
  <li>bison 3.0+</li>
  <li>clang 7.0+</li>
  <li>llvm 7.0+</li>
</ul>

<p>All of these are easily accessible via apt and other package managers.</p>

<p>It has been successfully tested with</p>
<ul>
  <li>flex 2.6.4 + bison 3.0.4 + llvm-12 on Ubuntu 18.04 (x86_64)</li>
  <li>flex 2.5.35 + bison 3.7.6 + llvm-12 on macOS (x86_64)</li>
</ul>

<h3 id="install">Install</h3>

<p>Clean the directory with:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make clean
</code></pre></div></div>

<p>Install with:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make
</code></pre></div></div>

<p>If you want to install with a specific version of bison, install with:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make <span class="nv">BISON</span><span class="o">=[</span>YOUR-BISON-PATH]
</code></pre></div></div>

<p>If you are installing RCC with LLVM 12 on macOS, install with:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make <span class="nv">DEFINE</span><span class="o">=</span><span class="s1">'-D MACOS'</span>
</code></pre></div></div>

<h3 id="usage">Usage</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./rcc src_file
./a.out
</code></pre></div></div>
<p>The generated ELF object file and executable are named <code class="language-plaintext highlighter-rouge">output.o</code> and <code class="language-plaintext highlighter-rouge">a.out</code> respectively by default.</p>

<p>An <strong>enc</strong>rypted (<strong>enc</strong>lave-based) <strong>he</strong>terogeneous <strong>ca</strong>lculation <strong>p</strong>rotocol based on Nvidia CUDA and Intel SGX, with a simple sample of matrix multiplication using CUBLAS, designed and implemented by <a href="http://vtu.life">Tinghao Xie</a>, <a href="https://github.com/Luke-Skycrawler">Haoyang Shi</a>, <a href="https://github.com/zjulzhhh">Zihang Li</a>.</p>

<h3 id="enchecap-illustration">Enchecap illustration:</h3>

<p><img src="/images/Enchecap_demo.png" alt="demo" /></p>

<h3 id="enchecap-illustration-with-protected-and-trusted-regions">Enchecap illustration (with <strong>protected</strong> and <strong>trusted</strong> regions):</h3>

<p><img src="/images/Enchecap_demo_box.png" alt="demo" /></p>

<h3 id="enchecap-performance">Enchecap performance:</h3>

<p><img src="/images/Enchecap_performance_0.png" alt="performance" /></p>

<hr />

<p>To <strong>build</strong> the project, you’ll need to install and configure:</p>
<ul>
  <li>SGX SDK</li>
  <li>CUDA Toolkit</li>
  <li>CUDA Samples</li>
</ul>

<p>Then set your <code class="language-plaintext highlighter-rouge">CUDA_PATH</code> and <code class="language-plaintext highlighter-rouge">INCLUDES</code> in the Makefile, and make sure your SGX environment is activated by:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">source</span> /PATH_OF_SGXSDK/environment
</code></pre></div></div>

<p>(check SGX SDK official <a href="https://01.org/intel-software-guard-extensions">site</a> for more details)</p>

<p>Then build with:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make <span class="c"># SGX hardware mode</span>
</code></pre></div></div>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make <span class="nv">SGX_MODE</span><span class="o">=</span>SIM  <span class="c"># SGX simulation mode</span>
</code></pre></div></div>

<p>(check README_SGX.txt for more details)</p>

<blockquote>
  <p>Your Linux OS version might be limited by the SGX SDK; check https://01.org/intel-software-guard-extensions for more details. We are using Ubuntu 18.04 x86_64 and cannot guarantee it works on other platforms. We compile with gcc 7.5.0 and nvcc v11.1, which impose far less strict limitations than Intel SGX.</p>
</blockquote>

<hr />

<p>To <strong>run</strong> the project, you’ll need to install and configure correctly:</p>
<ul>
  <li>SGX PSW</li>
  <li>SGX driver, if you built in hardware mode and your CPU &amp; BIOS support SGX</li>
  <li>CUDA Driver (of course you must have an Nvidia GPU)</li>
</ul>

<p>Run with:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./app
</code></pre></div></div>

<h2 id="todo">TODO</h2>

<p><strong>Notice</strong>: We have not yet implemented the user-server code in the library/sample, since it is similar to the host-device part of our protocol. For now, we implement only the host-device part. In this repository, we show how to wrap <code class="language-plaintext highlighter-rouge">cudaMemcpy()</code> into <code class="language-plaintext highlighter-rouge">secureCudaMemcpy()</code>, performing implicit en/decryption for convenient secure deployment.</p>
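<p>To make the wrapping pattern concrete, here is a minimal, self-contained C sketch — not the project’s actual implementation. The name <code class="language-plaintext highlighter-rouge">secure_memcpy_to_device</code>, the XOR stand-in cipher, and the fixed-size staging buffer are all illustrative assumptions; the real <code class="language-plaintext highlighter-rouge">secureCudaMemcpy()</code> calls <code class="language-plaintext highlighter-rouge">cudaMemcpy()</code> and performs the en/decryption with the project’s RSA routines inside the enclave and on the GPU.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for the project's cipher; the real code uses (naive) RSA. */
static void toy_cipher(unsigned char *buf, size_t n, unsigned char key) {
    for (size_t i = 0; i < n; i++)
        buf[i] ^= key; /* XOR is symmetric: applying it twice restores the plaintext */
}

/* Hypothetical wrapper sketching secureCudaMemcpy() for the host-to-device
 * direction; plain memcpy stands in for cudaMemcpy here. */
static void secure_memcpy_to_device(void *dst, const void *src, size_t n) {
    unsigned char staging[256];                /* illustrative enclave staging buffer */
    assert(n <= sizeof staging);
    memcpy(staging, src, n);                   /* plaintext stays inside the enclave */
    toy_cipher(staging, n, 0x5A);              /* enclave side: encrypt before leaving */
    memcpy(dst, staging, n);                   /* only ciphertext crosses the bus */
    toy_cipher((unsigned char *)dst, n, 0x5A); /* GPU side: decrypt on arrival */
}
```

<p>The caller keeps the familiar <code class="language-plaintext highlighter-rouge">cudaMemcpy</code>-style signature, while encryption and decryption happen implicitly inside the wrapper.</p>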

<h3 id="phase-i-initialization">Phase I: Initialization</h3>
<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Create an enclave</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Enclave generates its own keys (key generation is still a stub for now), then broadcasts its public key to the user &amp; device</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />GPU generates its own keys (key generation is still a stub for now), then broadcasts its public key to the host &amp; user</li>
</ul>

<h3 id="phase-ii-calculation">Phase II: Calculation</h3>
<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />En/Decrypt in enclave (decrypt with SGX’s private key, encrypt with GPU’s public key)</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />En/Decrypt on GPU (decrypt with GPU’s private key, encrypt with SGX’s public key)</li>
</ul>

<h3 id="future-work">Future Work</h3>
<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />The GPU’s and SGX’s keys are both hard-coded currently; this needs fixing</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />The current RSA en/decryption algorithm is still extremely naive (further work includes regrouping, big-number support, …)</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Add the user-server part into the sample, including
    <ul>
      <li>Remote attestation with Intel SGX</li>
      <li>Broadcast the user’s public key to the enclave and GPU, meanwhile recording their public keys</li>
      <li>Send encrypted data to the server</li>
      <li>Receive encrypted results from the server</li>
    </ul>
  </li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Integration with real industrial CUDA-based workloads (like PyTorch)</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Integration with a real trusted GPU (far from our reach now)</li>
</ul>]]></content><author><name>Tinghao Xie&lt;br/&gt;谢廷浩</name><email>thx@princeton.edu</email></author><category term="System Security" /><category term="CUDA" /><category term="SGX" /><summary type="html"><![CDATA[An (onbuilding) **enc**rypted (**enc**lave-based) **he**terogeneous **ca**lculation **p**rotocol based on Nvidia CUDA and Intel SGX.]]></summary></entry><entry><title type="html">Tron: a 3D WebGL Engine with a Flying Game Demo</title><link href="https://tinghaoxie.com/posts/2021/02/Tron/" rel="alternate" type="text/html" title="Tron: a 3D WebGL Engine with a Flying Game Demo" /><published>2021-02-04T00:00:00-08:00</published><updated>2021-02-04T00:00:00-08:00</updated><id>https://tinghaoxie.com/posts/2021/02/post</id><content type="html" xml:base="https://tinghaoxie.com/posts/2021/02/Tron/"><![CDATA[<p><img src="https://visitor-badge.laobi.icu/badge?page_id=ShawHaines.Tron" alt="visitors" /></p>

<p>A group project in the Computer Graphics course, including a simple but fully-featured 3D engine based on native WebGL and a wonderful flying game demo, available live <a href="http://code.vtu.life/Tron">here</a>. Feel free to check out the <a href="https://github.com/ShawHaines/Tron">source code</a> on GitHub.</p>

<video controls="" autoplay="" name="media" style="width: 80%"><source src="/files/Tron_overview.mp4" type="video/mp4" /></video>

<p>A screenshot in navigation mode:</p>

<p><img src="/images/Tron_demo.png" alt="demo" /></p>]]></content><author><name>Tinghao Xie&lt;br/&gt;谢廷浩</name><email>thx@princeton.edu</email></author><category term="WebGL" /><category term="Computer Graphics" /><category term="3D Engine" /><category term="Game" /><summary type="html"><![CDATA[A group project in Computer Graphics course, including a simple but fully-featured 3D engine based on native WebGL and a wonderful flying game demo.]]></summary></entry></feed>