<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://tinghaoxie.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://tinghaoxie.com/" rel="alternate" type="text/html" /><updated>2025-09-30T14:22:19-07:00</updated><id>https://tinghaoxie.com/feed.xml</id><title type="html">Tinghao</title><subtitle>personal description</subtitle><author><name>Tinghao Xie&lt;br/&gt;谢廷浩</name><email>thx@princeton.edu</email></author><entry><title type="html">ENDC: Ensemble of Narrow DNN Chains</title><link href="https://tinghaoxie.com/posts/2021/12/ENDC/" rel="alternate" type="text/html" title="ENDC: Ensemble of Narrow DNN Chains" /><published>2021-12-21T00:00:00-08:00</published><updated>2021-12-21T00:00:00-08:00</updated><id>https://tinghaoxie.com/posts/2021/12/post</id><content type="html" xml:base="https://tinghaoxie.com/posts/2021/12/ENDC/"><![CDATA[<blockquote>
  <p>Our <strong>paper</strong> available at: <em><a href="/files/Ensemble-of-Narrow-DNN-Chains.pdf">“Ensemble of Narrow DNN Chains”</a></em> (my Machine Learning course essay at Oxford).</p>
</blockquote>

<blockquote>
  <p>Our <strong>code</strong> is publicly available at <a href="https://github.com/vtu81/ENDC">https://github.com/vtu81/ENDC</a>.</p>
</blockquote>

<p>We propose the <strong>Ensemble of Narrow DNN Chains (ENDC)</strong> framework:</p>

<ol>
  <li>first train such narrow DNN chains that perform well on one-vs-all binary classification tasks,</li>
  <li>then aggregate them together by voting to predict for the multiclassification task.</li>
</ol>

<p>Our ensemble framework could:</p>
<ul>
  <li>utilize the abstract interpretability of DNNs,</li>
  <li>outperform traditional ML significantly on CIFAR-10,</li>
  <li>while being <strong>2-4 orders of magnitude smaller</strong> than normal DNN and <strong>6+ times smaller</strong> than traditional ML models,</li>
  <li>furthermore compatible with full parallelism in both the training and deployment stage.</li>
</ul>

<p>Our empirical study shows that a narrow DNN chain could learn binary classifications well. Moreover, our experiments on three MNIST, Fashion-MNIST, CIFAR-10 confirm the potential power of ENDC. <strong>Compared with traditional ML models, ENDC, with the smallest parameter number, could achieve similar accuracy on MNIST and Fashion-MNIST, and significantly better accuracy on CIFAR-10.</strong></p>

<!-- Thanks to non-convexity, even very narrow DNN (with only 1 or 2 channels) could perform well in some abstract binary classification tasks.

> So what if we aggregate a lot of 1(or 2)-channel DNN chains to handle multi-classification tasks (e.g. MNIST, Fashion-MNIST, CIFAR-10)? Let's see. -->

<p><img src="/images/ENDC_workflow.png" alt="" /></p>

<h2 id="results">Results</h2>

<h3 id="overall-accuracy">Overall Accuracy</h3>

<table>
  <thead>
    <tr>
      <th>Dataset</th>
      <th>Accuracy</th>
      <th>Arch</th>
      <th>#Param</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>MNIST</strong></td>
      <td>93.40%</td>
      <td>1-channel</td>
      <td>1300</td>
    </tr>
    <tr>
      <td><strong>Fashion-MNIST</strong></td>
      <td>80.39%</td>
      <td>1-channel</td>
      <td>1300</td>
    </tr>
    <tr>
      <td><strong>CIFAR-10</strong></td>
      <td>47.72%</td>
      <td>2-channel</td>
      <td>4930</td>
    </tr>
  </tbody>
</table>

<ul>
  <li><strong>Each binary classifier’s parameter number is even smaller than the input entry (130 &lt; 28x28 for MNIST and Fashion-MNIST, 493 &lt; 3x32x32 for CIFAR-10)!</strong></li>
</ul>

<h3 id="comparison">Comparison</h3>

<p>We compare ENDC with traditional ML models:</p>
<ul>
  <li>Logistic Regression (LR)</li>
  <li>Support Vector Classifier (SVC)</li>
</ul>

<p>and normal DNNs. Their results are referenced from internet, see our paper for sources and details.</p>

<p><strong>MNIST</strong></p>

<table>
  <thead>
    <tr>
      <th>Method</th>
      <th>Accuracy (%)</th>
      <th># Param</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>ENDC (ours)</strong></td>
      <td><strong>93.4</strong></td>
      <td><strong>1.3K</strong></td>
    </tr>
    <tr>
      <td>LR</td>
      <td>91.7</td>
      <td>7.7K+</td>
    </tr>
    <tr>
      <td>SVC</td>
      <td>97.8</td>
      <td>7.7K+</td>
    </tr>
    <tr>
      <td>Normal DNN (LeNet)</td>
      <td>99.3</td>
      <td>0.41M</td>
    </tr>
  </tbody>
</table>

<p><strong>Fashion-MNIST</strong></p>

<table>
  <thead>
    <tr>
      <th>Method</th>
      <th>Accuracy (%)</th>
      <th># Param</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>ENDC (ours)</strong></td>
      <td><strong>80.4</strong></td>
      <td><strong>1.3K</strong></td>
    </tr>
    <tr>
      <td>LR</td>
      <td>84.2</td>
      <td>7.7K+</td>
    </tr>
    <tr>
      <td>SVC</td>
      <td>89.7</td>
      <td>7.7K+</td>
    </tr>
    <tr>
      <td>Normal DNN (VGG-16)</td>
      <td>93.5</td>
      <td>26M</td>
    </tr>
  </tbody>
</table>

<p><strong>CIFAR-10</strong></p>

<table>
  <thead>
    <tr>
      <th>Method</th>
      <th>Accuracy (%)</th>
      <th># Param</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>ENDC (ours)</strong></td>
      <td><strong>47.7</strong></td>
      <td><strong>4.8K</strong></td>
    </tr>
    <tr>
      <td>LR</td>
      <td>39.9</td>
      <td>30.0K+</td>
    </tr>
    <tr>
      <td>SVC (PCA)</td>
      <td>40.2</td>
      <td>0.44M+</td>
    </tr>
    <tr>
      <td>Normal DNN (VGG-16-BN)</td>
      <td>93.9</td>
      <td>15M</td>
    </tr>
  </tbody>
</table>

<h3 id="per-class-accuracy">Per-class Accuracy</h3>

<table>
  <thead>
    <tr>
      <th>Dataset</th>
      <th>#0 (%)</th>
      <th>#1 (%)</th>
      <th>#2 (%)</th>
      <th>#3 (%)</th>
      <th>#4 (%)</th>
      <th>#5 (%)</th>
      <th>#6 (%)</th>
      <th>#7 (%)</th>
      <th>#8 (%)</th>
      <th>#9 (%)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>MNIST</strong></td>
      <td>97.04</td>
      <td>97.53</td>
      <td>96.51</td>
      <td>88.91</td>
      <td>95.52</td>
      <td>92.38</td>
      <td>90.29</td>
      <td>94.55</td>
      <td>88.71</td>
      <td>91.67</td>
    </tr>
    <tr>
      <td><strong>Fashion-MNIST</strong></td>
      <td>80.60</td>
      <td>92.90</td>
      <td>77.60</td>
      <td>77.60</td>
      <td>75.50</td>
      <td>92.30</td>
      <td>40.70</td>
      <td>81.30</td>
      <td>90.00</td>
      <td>95.50</td>
    </tr>
    <tr>
      <td><strong>CIFAR-10</strong></td>
      <td>48.90</td>
      <td>55.70</td>
      <td>43.50</td>
      <td>31.80</td>
      <td>41.00</td>
      <td>45.40</td>
      <td>61.90</td>
      <td>42.00</td>
      <td>49.90</td>
      <td>57.10</td>
    </tr>
  </tbody>
</table>]]></content><author><name>Tinghao Xie&lt;br/&gt;谢廷浩</name><email>thx@princeton.edu</email></author><category term="Deep Learning" /><category term="Ensemble" /><category term="Narrow DNN" /><category term="MNIST" /><category term="Fashion-MNIST" /><category term="CIFAR-10" /><summary type="html"><![CDATA[Use an ensemble of very narrow (1/2-channel wide) DNNs to classify MNIST, Fashion-MNIST and CIFAR-10.]]></summary></entry><entry><title type="html">Backdoor Trigger Restoration</title><link href="https://tinghaoxie.com/posts/2021/12/Backdoor-Trigger-Restoration/" rel="alternate" type="text/html" title="Backdoor Trigger Restoration" /><published>2021-12-02T00:00:00-08:00</published><updated>2021-12-02T00:00:00-08:00</updated><id>https://tinghaoxie.com/posts/2021/12/post</id><content type="html" xml:base="https://tinghaoxie.com/posts/2021/12/Backdoor-Trigger-Restoration/"><![CDATA[<blockquote>
  <p>This is my on-going project only for demonstration, advised by Prof. <a href="https://alps-lab.github.io/about/">Ting Wang</a> at PSU.</p>
</blockquote>

<h2 id="introduction">Introduction</h2>

<p><strong>This is a project diverged from <a href="/posts/2021/12/Backdoor-Certification/">Backdoor Certification</a>, you may first want to read that.</strong></p>

<p>Backdoors within DNN models are dangerous, and an important line of work focus on detecting these potential backdoors. Some of these detection methods (<em>e.g.</em> <a href="https://ieeexplore.ieee.org/abstract/document/8835365/">Neural Cleanse</a>) first reverse engineer (restore) the potential backdoor, then utilize anomaly detaction to tell if there is indeed a backdoor.</p>

<p>We propose an efficient heuristic algorithm that focuses on <strong>restoring the potential backdoor trigger</strong> in a given DNN. Our algorithm requires NO or very few clean inputs, while supporting both <em>perturbation triggers</em> (add the pattern to an image) and <em>patch triggers</em> (stamp a pattern onto an image). Our restored triggers reach high ASR and match the real trigger well.</p>

<h2 id="method">Method</h2>

<p>Intuitively, for a batch of $N$ inputs, searching for the potential backdoor trigger is similar to the following optimization:</p>

\[\text{trigger} = \text{argmin}_{r} \sum_{i=1}^N \Big(f_{source}(x_i + r) - f_{target}(x_i + r)\Big)\]

<p>Nevertheless, directly optimizing the equation by Stochastic Gradient Descent is empirically difficult. As shown in the three following figures, the gradient information (orange) could be quite noisy:</p>

<p><img style="width: 30%" src="/images/backdoor_restore_demo1.png" />
<img style="width: 30%" src="/images/backdoor_restore_demo2.png" />
<img style="width: 30%" src="/images/backdoor_restore_demo3.png" /></p>

<p>Remember that CROWN relaxes NN to linear function, and as shown in the figures above, we may view the CROWN weight for each input dimension (blue) as an “<em>approximate</em> gradient” in a certain vicinity. And this “<em>approximate</em> gradient” is usually less noisy.</p>

<p>So we simply replace the exact gradients with the “<em>approximate</em> gradients”:</p>

\[\mathbf r_{t+1} = \mathbf r_t - \text{lr} * \sum_{i=1}^N \nabla_{\mathbf x,approx} f(\mathbf x_i + \mathbf r_t)\]

<p>This makes the optimization (restoring or searching for triggers) much easier, and our experiments have confirmed this.</p>

<h2 id="results">Results</h2>

<p>Some restoration results:</p>

<p><img src="/images/backdoor_restore_results.png" alt="" /></p>

<ul>
  <li><u>I’m still refining both the idea and experiments.</u></li>
</ul>]]></content><author><name>Tinghao Xie&lt;br/&gt;谢廷浩</name><email>thx@princeton.edu</email></author><category term="Adversarial Machine Learning" /><category term="Backdoor Attack" /><category term="Backdoor Detection" /><summary type="html"><![CDATA[Towards faithful backdoor trigger restoration.]]></summary></entry><entry><title type="html">Backdoor Certification</title><link href="https://tinghaoxie.com/posts/2021/12/Backdoor-Certification/" rel="alternate" type="text/html" title="Backdoor Certification" /><published>2021-12-01T00:00:00-08:00</published><updated>2021-12-01T00:00:00-08:00</updated><id>https://tinghaoxie.com/posts/2021/12/post</id><content type="html" xml:base="https://tinghaoxie.com/posts/2021/12/Backdoor-Certification/"><![CDATA[<blockquote>
  <p>This is my on-going project only for demonstration, advised by Prof. <a href="https://alps-lab.github.io/about/">Ting Wang</a> at PSU.</p>
</blockquote>

<h2 id="introduction">Introduction</h2>

<p>In the field of DNN security, adversarial attacks and backdoor attacks are the typical ones.</p>
<ul>
  <li><em>Adversarial Attack</em>: For a given input, the attacker adds an imperceptible noise(perturbation), leading to the DNN misclassifying the perturbed input; The adversarial perturbation is input-spcific, and usually obtained via PGD.</li>
  <li><em>Backdoor Attack</em>: For all inputs, the attacker stamped a trigger pattern to them, leading to the DNN misclassifying all the stamped input; There are a variety of trigger types and implantation strategies, and backdoors are usually injected via data poisoning at training stage.</li>
</ul>

<p><strong>Certified robustness</strong> has been widely discussed, to end the arm race between <strong>adversarial</strong> attacks and defenses. We aim at taking the first step by introducing certification to stop the arm race of <strong>backdoor</strong> attacks and defenses.</p>

<h2 id="method">Method</h2>

<p>We first formulate the backdoor certification problem; No (perturbation-)backdoor exists in a norm ball $S$ can be expressed as the inequation:</p>

\[\min_{r\in S}\max_i f_{source}(x_i + r) - f_{target}(x_i + r) &gt; 0\]

<p>We base our work on an existing NN verifier, <a href="https://arxiv.org/abs/1811.00866">CROWN</a> (LiRPA). As shown in the following figure, CROWN would relax the non-convex NN function $f$ into a linear function $\underline f$ <em>w.r.t.</em> the input dimensions, where $f(x + r) \ge \underline f(x + r)$ for any $r\in S$.</p>

<p><img style="width: 100%" src="/images/CROWN.png" /></p>

<p>We use the lower bound linear function for certifying backdoor:</p>

\[\min_{r\in S}\max_i \underline f_{source}(x_i + r) - \overline f_{target}(x_i + r) &gt; 0\]

<p>Notice that (2) naturally yields a sufficient condition for (1). The following figure shows our backdoor certification process:</p>

<p><img style="width: 100%" src="/images/cert_backdoor_workflow.png" /></p>

<p>Each solid line corresponds to the linear relaxation $\underline f_{source}(x_i + r) - \overline f_{target}(x_i + r)$ of the NN given input $x_i$. After grouping the inputs, we are able to give a certification like: <strong><em>There is no perturbation trigger $r \in S$ that would lead to $\rho\%$ inputs being misclassified</em></strong>.</p>

<p>We could further introduce optimization, Bound and Branch to tighten the bounds.</p>

<h2 id="results">Results</h2>

<p>A metric for certified adversarial robustness is the $\textit{adversarial-attack-free radius}$, under which it’s impossible to perform adversarial attack. Likewise, we extend the metric to $\textit{backdoor-free radius}$ under which it’s impossible to perform backdoor attack.</p>

<p>Obviously:
\(\textit{adversarial-attack-free radius} \le \textit{backdoor-free radius}\)
and our initial experiment results show that for the same NN, there could be $&gt;15\%$ improvement/gap between the two radius.</p>

<ul>
  <li><u>I am still refinining the experiments.</u></li>
</ul>]]></content><author><name>Tinghao Xie&lt;br/&gt;谢廷浩</name><email>thx@princeton.edu</email></author><category term="Adversarial Machine Learning" /><category term="Certified Robustness" /><category term="Backdoor Attack" /><category term="Neural Network Verification" /><summary type="html"><![CDATA[Certified robustness for backdoor attack.]]></summary></entry><entry><title type="html">Naive VQA: Implementations of a Strong VQA Baseline</title><link href="https://tinghaoxie.com/posts/2021/07/NaiveVQA/" rel="alternate" type="text/html" title="Naive VQA: Implementations of a Strong VQA Baseline" /><published>2021-07-17T00:00:00-07:00</published><updated>2021-07-17T00:00:00-07:00</updated><id>https://tinghaoxie.com/posts/2021/07/post</id><content type="html" xml:base="https://tinghaoxie.com/posts/2021/07/NaiveVQA/"><![CDATA[<blockquote>
  <p>What’s VQA?</p>
</blockquote>

<p><strong>Visual Qustion Answering (VQA)</strong> is a type of tasks, where given an image and a question about the image, a model is expected to give a correct answer.</p>

<p>For example, a <strong>visual</strong> image looks like this:</p>

<p><img src="/images/NaiveVQA_image_demo1.jpg" alt="" /></p>

<p>The <strong>question</strong> is: <em>What color is the girl’s necklace?</em></p>

<p>Our model would generate the <strong>answer</strong> ‘white’.</p>

<blockquote>
  <p>What’s MindSpore?</p>
</blockquote>

<p><a href="https://www.mindspore.cn/en">MindSpore</a> is a new AI framework developed by Huawei.</p>

<hr />

<h1 id="naivevqa-mindspore--pytorch-implementations-of-a-strong-vqa-baseline">NaiveVQA: MindSpore &amp; PyTorch Implementations of a Strong VQA Baseline</h1>

<p><img src="https://visitor-badge.laobi.icu/badge?page_id=vtu.NaiveVQA" alt="" /></p>

<p><a href="https://github.com/vtu81/NaiveVQA">This repository</a> contains a naive VQA model, which is our final project (<strong>mindspore</strong> implementation) for course DL4NLP at ZJU. It’s a reimplementation of the paper <a href="https://arxiv.org/abs/1704.03162">Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering</a>.</p>

<blockquote>
  <p>Checkout branch <code class="language-plaintext highlighter-rouge">pytorch</code> for our <strong>pytorch</strong> implementation.</p>
</blockquote>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git checkout pytorch
</code></pre></div></div>

<h2 id="performance">Performance</h2>

<table>
  <thead>
    <tr>
      <th>Framework</th>
      <th>Y/N</th>
      <th>Num</th>
      <th>Other</th>
      <th>All</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>MindSpore</strong></td>
      <td>62.2</td>
      <td>7.5</td>
      <td>2.4</td>
      <td><strong>25.8</strong></td>
    </tr>
    <tr>
      <td><strong>PyTorch</strong></td>
      <td>66.3</td>
      <td>24.5</td>
      <td>25.0</td>
      <td><strong>40.6</strong></td>
    </tr>
  </tbody>
</table>

<ul>
  <li>
    <p>Per Question Type Accuracy (<strong>MindSpore</strong>)
<img src="/images/NaiveVQA_ms_result_per_question_type.png" alt="" /></p>
  </li>
  <li>
    <p>Per Question Type Accuracy (<strong>PyTorch</strong>)
<img src="/images/NaiveVQA_pt_result_per_question_type.png" alt="" /></p>
  </li>
</ul>

<h2 id="file-directory">File Directory</h2>

<ul>
  <li><code class="language-plaintext highlighter-rouge">data/</code>
    <ul>
      <li><code class="language-plaintext highlighter-rouge">annotations/</code> – annotations data (ignored)</li>
      <li><code class="language-plaintext highlighter-rouge">images/</code> – images data (ignored)</li>
      <li><code class="language-plaintext highlighter-rouge">questions/</code> – questions data (ignored)</li>
      <li><code class="language-plaintext highlighter-rouge">results/</code> – contains evaluation results when you evaluate a model with <code class="language-plaintext highlighter-rouge">./evaluate.ipynb</code></li>
      <li><code class="language-plaintext highlighter-rouge">clean.py</code> – a script to clean up <code class="language-plaintext highlighter-rouge">train.json</code> in both <code class="language-plaintext highlighter-rouge">data/annotations/</code> and <code class="language-plaintext highlighter-rouge">data/questions/</code></li>
      <li><code class="language-plaintext highlighter-rouge">align.py</code> – a script to sort and align up the annotations and questions</li>
    </ul>
  </li>
  <li><code class="language-plaintext highlighter-rouge">resnet/</code> – resnet directory, cloned from <a href="https://github.com/Cyanogenoid/pytorch-resnet/tree/9332392b01317d57e92f81e00933c48f423ff503">pytorch-resnet</a></li>
  <li><code class="language-plaintext highlighter-rouge">logs/</code> – should contain saved <code class="language-plaintext highlighter-rouge">.pth</code> model files</li>
  <li><code class="language-plaintext highlighter-rouge">config.py</code> – global configure file</li>
  <li><code class="language-plaintext highlighter-rouge">train.py</code> – training</li>
  <li><code class="language-plaintext highlighter-rouge">view-log.py</code> – a tool for visualizing an accuracy\epoch figure</li>
  <li><code class="language-plaintext highlighter-rouge">val_acc.png</code> – a demo for the accuracy\epoch figure</li>
  <li><code class="language-plaintext highlighter-rouge">model.py</code> – the major model</li>
  <li><code class="language-plaintext highlighter-rouge">preprocess-image.py</code> – preprocess the images, using ResNet152 to extract features for further usages</li>
  <li><code class="language-plaintext highlighter-rouge">preprocess-image-test.py</code> – to extract images in the test set</li>
  <li><code class="language-plaintext highlighter-rouge">preprocess-vocab.py</code> – preprocess the questions and annotations to get their vocabularies for further usages</li>
  <li><code class="language-plaintext highlighter-rouge">data.py</code> – dataset, dataloader and data processing code</li>
  <li><code class="language-plaintext highlighter-rouge">utils.py</code> – helper code</li>
  <li><code class="language-plaintext highlighter-rouge">evaluate.ipynb</code> – evaluate a model and visualize the result</li>
  <li><code class="language-plaintext highlighter-rouge">cover_rate.ipynb</code> – calculate the selected answers’ coverage</li>
  <li><code class="language-plaintext highlighter-rouge">assets/</code></li>
  <li><code class="language-plaintext highlighter-rouge">PythonHelperTools/</code> (currently not used)
    <ul>
      <li><code class="language-plaintext highlighter-rouge">vqaDemo.py</code> – a demo for VQA dataset APIs</li>
      <li><code class="language-plaintext highlighter-rouge">vqaTools/</code></li>
    </ul>
  </li>
  <li><code class="language-plaintext highlighter-rouge">PythonEvaluationTools/</code> (currently not used)
    <ul>
      <li><code class="language-plaintext highlighter-rouge">vqaEvalDemo.py</code> – a demo for VQA evaluation</li>
      <li><code class="language-plaintext highlighter-rouge">vaqEvaluation/</code></li>
    </ul>
  </li>
  <li><code class="language-plaintext highlighter-rouge">README.md</code></li>
</ul>

<h2 id="prerequisite">Prerequisite</h2>

<ul>
  <li>Free disk space of at least 60GB</li>
  <li>Nvidia GPU / Ascend Platform</li>
</ul>

<blockquote>
  <p><strong>Notice</strong>: We have successfully tested our code with <strong>MindSpore 1.2.1</strong> on <strong>Nvidia RTX 2080ti</strong>. Thus we strongly suggest you use MindSpore 1.2.1 GPU version. Since MindSpore is definitely not stable, any version different from 1.2.1 might cause failures.</p>
</blockquote>

<blockquote>
  <p>Also, due to some incompatibility among different versions of MindSpore, we still can’t manage to run the code on Ascend now. Fortunately, people are more possible to have an Nvidia GPU rather than an Ascend chip :)</p>
</blockquote>

<h2 id="quick-begin">Quick Begin</h2>

<h3 id="get-and-prepare-the-dataset">Get and Prepare the Dataset</h3>

<p>Get our VQA dataset (a small subset of VQA 2.0) from <a href="https://drive.google.com/open?id=1_VvBqqxPW_5HQxE6alZ7_-SGwbEt2_zn">here</a>. Unzip the file and move the subdirectories</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">annotations/</code></li>
  <li><code class="language-plaintext highlighter-rouge">images/</code></li>
  <li><code class="language-plaintext highlighter-rouge">questions/</code></li>
</ul>

<p>into the repository directory <code class="language-plaintext highlighter-rouge">data/</code>.</p>

<p>Prepare your dataset with:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Only run the following command once!</span>

<span class="nb">cd </span>data

<span class="c"># Save the original json files</span>
<span class="nb">cp </span>annotations/train.json annotations/train_backup.json
<span class="nb">cp </span>questions/train.json questions/train_backup.json
<span class="nb">cp </span>annotations/val.json annotations/val_backup.json
<span class="nb">cp </span>questions/val.json questions/val_backup.json
<span class="nb">cp </span>annotations/test.json annotations/test_backup.json
<span class="nb">cp </span>questions/test.json questions/test_backup.json

python clean.py <span class="c"># run the clean up script</span>
<span class="nb">mv </span>annotations/train_cleaned.json annotations/train.json
<span class="nb">mv </span>questions/train_cleaned.json questions/train.json

python align.py <span class="c"># run the aligning script</span>
<span class="nb">mv </span>annotations/train_cleaned.json annotations/train.json
<span class="nb">mv </span>annotations/val_cleaned.json annotations/val.json
<span class="nb">mv </span>annotations/test_cleaned.json annotations/test.json

<span class="nb">mv </span>questions/train_cleaned.json questions/train.json
<span class="nb">mv </span>questions/val_cleaned.json questions/val.json
<span class="nb">mv </span>questions/test_cleaned.json questions/test.json
</code></pre></div></div>

<p>The scripts upon would</p>

<ul>
  <li>clean up your dataset (there are some images whose ids are referenced in the annotation &amp; question files, while the images themselves don’t exist!)</li>
  <li>align the questions’ ids for convenience while training</li>
</ul>

<h3 id="preprocess-images">Preprocess Images</h3>

<blockquote>
  <p>You actually don’t have to preprocess the images yourself. We have prepared the prerocessed features file for you, feel free to download it through <a href="https://e-share.obs-website.cn-north-1.myhuaweicloud.com?token=jA7oiibO5h2G1jmMINVC+oum3Lfah+Ut5bGgDFeTu5sI4zchCijmfATwP8KLRi9T5n7q00BW/bs2ugmV6RsBjmPdOUWaQEBJ0Fm0ND9DIrBCRfNNYmqbIH+Q2J0VgDY70KEHNOK3GW+0179M5NphG9YUSz9+JT3f4G3Jx4MLo6zky+l2nB6VdYLBxGspSx98Iq566+3aRL7NFJ/KbSRtUesX9iHSFJaFyBNNeyflZyzTQOmvs+xK17NWIeeJ7zdTuk/ojRn157m0m8uNzKg8+KQawvp53i/4y6kZ1qMh/ryBfjHsKIP18vz6OD0htixD66E/lr450IxpQHzqWp35Lixr8pptgrtBE4aWkcsvjTpupOfZdnqSzLY91QzCqU2578RDctILAb8mpvURWd7im2yUZUexBCsdCzp4HHUL1H3+C6UCTPe7XMDtz4yWhsZFATstbIHs6opMs3Ktp5/6HfA976nJJeJZnjLQp8NxwTVAoPUsckIxwFplhCIkpE38IrBq6mndpEP8G0VHLIKzYfDn6pS83JNzl4EPxknKkNL22OyWAge3ZC+Gh1mqrvCq">here</a> (the passcode is ‘dl4nlp’). You should download the <code class="language-plaintext highlighter-rouge">resnet-14x14.h5</code> (42GB) file and place it at the repository root directory. Once you’ve done that, skip this chapter!</p>
</blockquote>

<p><strong>Preprocess the images</strong> with:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python preprocess-images.py
</code></pre></div></div>

<ul>
  <li>If you want to accelerate it, tune up <code class="language-plaintext highlighter-rouge">preprocess_batch_size</code> at <code class="language-plaintext highlighter-rouge">config.json</code></li>
  <li>If you run out of CUDA memory, tune down <code class="language-plaintext highlighter-rouge">preprocess_batch_size</code> ata <code class="language-plaintext highlighter-rouge">config.json</code></li>
</ul>

<p>The output should be <code class="language-plaintext highlighter-rouge">./resnet-14x14.h5</code>.</p>

<h3 id="preprocess-vocabulary">Preprocess Vocabulary</h3>

<blockquote>
  <p>The vocabulary only depends on the <strong>train</strong> set, as well as the <code class="language-plaintext highlighter-rouge">config.max_answers</code> (the number of selected candidate answers) you choose.</p>
</blockquote>

<p><strong>Preprocess the questions and annotations</strong> to get their vocabularies with:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python preprocess-vocab.py
</code></pre></div></div>

<p>The output should be <code class="language-plaintext highlighter-rouge">./vocab.json</code>.</p>

<h3 id="train">Train</h3>

<p>Now, you can <strong>train the model</strong> with:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python train.py
</code></pre></div></div>

<p>During training, a ‘.ckpt’ file and a ‘.json’ file would be saved under <code class="language-plaintext highlighter-rouge">./logs</code>. The <code class="language-plaintext highlighter-rouge">.ckpt</code> file contains the parameters of your model and can be reloaded. The <code class="language-plaintext highlighter-rouge">.json</code> file contains training metainfo records.</p>

<p><strong>View the training process</strong> with:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python view-log.py &lt;path to .json train record&gt;
</code></pre></div></div>

<p>The output <code class="language-plaintext highlighter-rouge">val_acc.png</code> should look like these:</p>

<p><img src="/images/NaiveVQA_val_acc1.png" alt="" /></p>

<p>(a real train of PyTorch implementation)</p>

<p><img src="/images/NaiveVQA_val_acc2.png" alt="" /></p>

<p>(a real train of MindSpore implementation)</p>

<blockquote>
  <p>To continue training from a pretrained model, set the correct <code class="language-plaintext highlighter-rouge">pretrained_model_path</code> and the <code class="language-plaintext highlighter-rouge">pretrained</code> to True in <code class="language-plaintext highlighter-rouge">config.py</code>.</p>
</blockquote>

<h2 id="test-your-model">Test Your Model</h2>

<p>Likewise, you need to preprocess the test set’s images before testing. Run</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python preprocess-images-test.py
</code></pre></div></div>

<p>to extract features from <code class="language-plaintext highlighter-rouge">test/images</code>. The output should be <code class="language-plaintext highlighter-rouge">./resnet-14x14-test.h5</code>.</p>

<blockquote>
  <p>Likewise, we have prepared the <code class="language-plaintext highlighter-rouge">resnet-14x14-test.h5</code> for you. Download it <a href="https://e-share.obs-website.cn-north-1.myhuaweicloud.com?token=jA7oiibO5h2G1jmMINVC+oum3Lfah+Ut5bGgDFeTu5sI4zchCijmfATwP8KLRi9T5n7q00BW/bs2ugmV6RsBjmPdOUWaQEBJ0Fm0ND9DIrBCRfNNYmqbIH+Q2J0VgDY70KEHNOK3GW+0179M5NphG9YUSz9+JT3f4G3Jx4MLo6zky+l2nB6VdYLBxGspSx98Iq566+3aRL7NFJ/KbSRtUesX9iHSFJaFyBNNeyflZyzTQOmvs+xK17NWIeeJ7zdTuk/ojRn157m0m8uNzKg8+KQawvp53i/4y6kZ1qMh/ryBfjHsKIP18vz6OD0htixD66E/lr450IxpQHzqWp35Lixr8pptgrtBE4aWkcsvjTpupOfZdnqSzLY91QzCqU2578RDctILAb8mpvURWd7im2yUZUexBCsdCzp4HHUL1H3+C6UCTPe7XMDtz4yWhsZFATstbIHs6opMs3Ktp5/6HfA976nJJeJZnjLQp8NxwTVAoPUsckIxwFplhCIkpE38IrBq6mndpEP8G0VHLIKzYfDn6pS83JNzl4EPxknKkNL22OyWAge3ZC+Gh1mqrvCq">here</a> (the passcode is ‘dl4nlp’)</p>
</blockquote>

<p>We provide <code class="language-plaintext highlighter-rouge">evaluatie.ipynb</code> to test/evaluate the model. Open the notebook, and set the correct <code class="language-plaintext highlighter-rouge">eval_config</code>, you’re good to go! Just run the following cell one by one, you should be able to <strong>visualize the performance</strong> of your trained model.</p>

<h2 id="more-things">More Things</h2>

<ul>
  <li>To calculate the selected answers’ cover rate (determined by <code class="language-plaintext highlighter-rouge">config.max_answers</code>), check <code class="language-plaintext highlighter-rouge">cover_rate.ipynb</code>.</li>
</ul>

<h2 id="acknowledgement">Acknowledgement</h2>

<p>The current version of codes are translated from <code class="language-plaintext highlighter-rouge">pytorch</code> branch, where some codes are borrowed from repository <a href="https://github.com/Cyanogenoid/pytorch-vqa">pytorch-vqa</a>.</p>]]></content><author><name>Tinghao Xie&lt;br/&gt;谢廷浩</name><email>thx@princeton.edu</email></author><category term="Deep Learning" /><category term="MindSpore" /><category term="PyTorch" /><category term="VQA" /><category term="NLP" /><category term="Machine Learning" /><summary type="html"><![CDATA[A Visual Question Answering model implemented in **MindSpore** and **PyTorch**. The model is a reimplementation of the paper *Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering*. It's our final project for course DL4NLP at ZJU.**Question**: *What color is the girl's necklace?***Answer**: 'white']]></summary></entry><entry><title type="html">RCC (C-like Compiler, R for either remarkable or retarded)</title><link href="https://tinghaoxie.com/posts/2021/06/RCC/" rel="alternate" type="text/html" title="RCC (C-like Compiler, R for either remarkable or retarded)" /><published>2021-06-09T00:00:00-07:00</published><updated>2021-06-09T00:00:00-07:00</updated><id>https://tinghaoxie.com/posts/2021/06/post</id><content type="html" xml:base="https://tinghaoxie.com/posts/2021/06/RCC/"><![CDATA[<p><img src="https://visitor-badge.laobi.icu/badge?page_id=Luke-Skycrawler.rcc" alt="visitors" /></p>

<blockquote>
  <p>Authors: <a href="https://github.com/Luke-Skycrawler">Haoyang Shi</a>, <a href="http://vtu.life">Tinghao Xie</a></p>
</blockquote>

<p><a href="https://github.com/Luke-Skycrawler/rcc">This repository</a> contains our course project for <em>Compiler Principle</em> at ZJU.</p>

<h3 id="differences-with-c">Differences with C</h3>

<ul>
  <li>type system: char, int, double and n-dimensional array type; Pointers and struct type is not supported in this version.</li>
  <li>no controled jumps, gotos and labels , i.e. break, continue and switch statements are not supported.</li>
  <li>pre-compile MARCO not supported</li>
  <li><code class="language-plaintext highlighter-rouge">scanf</code> and <code class="language-plaintext highlighter-rouge">printf</code> are automaticly declared and linked with libc in runtime</li>
  <li>calling convention of <code class="language-plaintext highlighter-rouge">scanf</code> modified. e.g. you shall use <code class="language-plaintext highlighter-rouge">scanf("%d",i)</code> to read the value into variable i and drop the <code class="language-plaintext highlighter-rouge">&amp;</code> symbol.</li>
  <li><code class="language-plaintext highlighter-rouge">for</code> loop snippet is switched to pascal-like <code class="language-plaintext highlighter-rouge">for(i: 0 to n){}</code>, where i is only seen within the scope of this loop</li>
  <li>unary operators not supported</li>
</ul>

<p>try out the test samples to get a better understanding of the gramma.</p>

<h3 id="prerequsite">Prerequsite</h3>

<ul>
  <li>flex 2.5+</li>
  <li>bison 3.0+</li>
  <li>clang 7.0+</li>
  <li>llvm 7.0+</li>
</ul>

<p>which is easily accessible via apt and other package managers.</p>

<p>It has been successfully tested with</p>
<ul>
  <li>flex 2.6.4 + bison 3.0.4 + llvm-12 on Ubuntu 18.04 (x86_64)</li>
  <li>flex 2.5.35 + bison 3.7.6 + llvm-12 on MacOS (x86_64)</li>
</ul>

<h3 id="install">Install</h3>

<p>Clean the directory with:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make clean
</code></pre></div></div>

<p>Install with:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make
</code></pre></div></div>

<p>If you want to install with a specific version of bison, install with:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make <span class="nv">BISON</span><span class="o">=[</span>YOUR-BISON-PATH]
</code></pre></div></div>

<p>If you are installing RCC with LLVM12 on MacOS, install with:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make <span class="nv">DEFINE</span><span class="o">=</span><span class="s1">'-D MACOS'</span>
</code></pre></div></div>

<h3 id="usage">Usage</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./rcc src_file
./a.out
</code></pre></div></div>
<p>The generated ELF object file and executable are named output.o and a.out respectively by default.</p>]]></content><author><name>Tinghao Xie&lt;br/&gt;谢廷浩</name><email>thx@princeton.edu</email></author><category term="Compiler" /><category term="System" /><summary type="html"><![CDATA[A C-like Compiler, R for either remarkable or retarded, implemented with **Flex**, **Bison** and **LLVM**.]]></summary></entry><entry><title type="html">Enchecap</title><link href="https://tinghaoxie.com/posts/2021/03/Enchecap/" rel="alternate" type="text/html" title="Enchecap" /><published>2021-03-17T00:00:00-07:00</published><updated>2021-03-17T00:00:00-07:00</updated><id>https://tinghaoxie.com/posts/2021/03/post</id><content type="html" xml:base="https://tinghaoxie.com/posts/2021/03/Enchecap/"><![CDATA[<p><img src="https://visitor-badge.laobi.icu/badge?page_id=vtu81.Enchecap" alt="" /></p>

<p>An <strong>enc</strong>rypted (<strong>enc</strong>lave-based) <strong>he</strong>terogeneous <strong>ca</strong>lculation <strong>p</strong>rotocol based on Nvidia CUDA and Intel SGX, with a simple sample of matrix multiplication using CUBLAS, designed and implemented by <a href="http://vtu.life">Tinghao Xie</a>, <a href="https://github.com/Luke-Skycrawler">Haoyang Shi</a>, <a href="https://github.com/zjulzhhh">Zihang Li</a>.</p>

<h3 id="enchecap-illustration">Enchecap illustration:</h3>

<p><img src="/images/Enchecap_demo.png" alt="demo" /></p>

<h3 id="enchecap-illustration-with-protected-and-trusted-regions">Enchecap illustration (with <strong>protected</strong> and <strong>trusted</strong> regions):</h3>

<p><img src="/images/Enchecap_demo_box.png" alt="demo" /></p>

<h3 id="enchecap-performance">Enchecap performance:</h3>

<p><img src="/images/Enchecap_performance_0.png" alt="performance" /></p>

<hr />

<p>To <strong>build</strong> the project, you’ll need to install and configure:</p>
<ul>
  <li>SGX SDK</li>
  <li>CUDA Toolkit</li>
  <li>CUDA Samples</li>
</ul>

<p>, then set your <code class="language-plaintext highlighter-rouge">CUDA_PATH</code> and <code class="language-plaintext highlighter-rouge">INCLUDES</code> in Makefile, and make sure your SGX environment activated by</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">source</span> /PATH_OF_SGXSDK/environment
</code></pre></div></div>

<p>(check SGX SDK official <a href="https://01.org/intel-software-guard-extensions">site</a> for more details)</p>

<p>Then build with:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make <span class="c"># SGX hardware mode</span>
</code></pre></div></div>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make <span class="nv">SGX_MODE</span><span class="o">=</span>SIM  <span class="c"># SGX simulation mode</span>
</code></pre></div></div>

<p>(check README_SGX.txt for more details)</p>

<blockquote>
  <p>Your linux OS version might be limited by SGX SDK, check https://01.org/intel-software-guard-extensions for more details. We’re using Ubuntu 18.04 x86_64, and cannot guarantee it work successfully on other platforms. We are also compiling with gcc version 7.5.0 and nvcc v11.1, which do not pose such strict limitations compared to Intel SGX.</p>
</blockquote>

<hr />

<p>To <strong>run</strong> the project, you’ll need to install and configure correctly:</p>
<ul>
  <li>SGX PSW</li>
  <li>SGX driver, if you build it in hardware mode and that your CPU &amp; BIOS support SGX</li>
  <li>CUDA Driver (of course you must have an Nvidia GPU)</li>
</ul>

<p>Run with:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./app
</code></pre></div></div>

<h2 id="todo">TODO</h2>

<p><strong>Notice</strong>: We have not implemented the user-server code into the library/sample now, since it’s similar to the host-device part of our protocol. For now, we just implement the host-device part. In this repository, we show how to wrap up the <code class="language-plaintext highlighter-rouge">cudaMemcpy()</code> into <code class="language-plaintext highlighter-rouge">secureCudaMemcpy()</code>, doing implicit en/decryption for handy secure deployment.</p>

<h3 id="phase-i-initialization">Phase I: Initialization</h3>
<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Create an enclave</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Enclave generates its own keys (generation is yet an empty shell now), then broadcasts its public key to user &amp; device</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />GPU generates its own keys (generation is yet an empty shell now), then broadcasts its public key to host &amp; user</li>
</ul>

<h3 id="phase-ii-calculation">Phase II: Calculation</h3>
<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />En/Decrypt in enclave (decrypt with SGX’s private key, encrypt with GPU’s public key)</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />En/Decrypt on GPU (decrypt with GPU’s private key, encrypt with SGX’s public key)</li>
</ul>

<h3 id="future-work">Future Work</h3>
<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />The GPU’s and SGX’s keys are both simply welded in the code currently, need FIX</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />The current RSA en/decrypt algorithm is yet extremely naive! (further works include regrouping, big number supports…)</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Add the user-server part into the sample, including
    <ul>
      <li>Remote attestation with Intel SGX</li>
      <li>Broadcast his/her public key to the enclave and GPU, meanwhile record their public keys</li>
      <li>Send encrypted data to the server</li>
      <li>Receive encrypted results from the server</li>
    </ul>
  </li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Intergration with real industrial work based on CUDA (like PyTorch)</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Intergration with a real trusted GPU (far from our reach now)</li>
</ul>]]></content><author><name>Tinghao Xie&lt;br/&gt;谢廷浩</name><email>thx@princeton.edu</email></author><category term="System Security" /><category term="CUDA" /><category term="SGX" /><summary type="html"><![CDATA[An (onbuilding) **enc**rypted (**enc**lave-based) **he**terogeneous **ca**lculation **p**rotocol based on Nvidia CUDA and Intel SGX.]]></summary></entry><entry><title type="html">Tron: a 3D WebGL Engine with a Flying Game Demo</title><link href="https://tinghaoxie.com/posts/2021/02/Tron/" rel="alternate" type="text/html" title="Tron: a 3D WebGL Engine with a Flying Game Demo" /><published>2021-02-04T00:00:00-08:00</published><updated>2021-02-04T00:00:00-08:00</updated><id>https://tinghaoxie.com/posts/2021/02/post</id><content type="html" xml:base="https://tinghaoxie.com/posts/2021/02/Tron/"><![CDATA[<p><img src="https://visitor-badge.laobi.icu/badge?page_id=ShawHaines.Tron" alt="visitors" /></p>

<p>A group project in Computer Graphics course, including a simple but fully-featured 3D engine based on native WebGL and a wonderful flying game demo, live available <a href="http://code.vtu.life/Tron">here</a>. Feel free to check out the <a href="https://github.com/ShawHaines/Tron">source code</a> at GitHub.</p>

<video controls="" autoplay="" name="media" style="width: 80%"><source src="/files/Tron_overview.mp4" type="video/mp4" /></video>

<p>A screenshot in navigation mode:</p>

<p><img src="/images/Tron_demo.png" alt="demo" /></p>]]></content><author><name>Tinghao Xie&lt;br/&gt;谢廷浩</name><email>thx@princeton.edu</email></author><category term="WebGL" /><category term="Computer Graphics" /><category term="3D Engine" /><category term="Game" /><summary type="html"><![CDATA[A group project in Computer Graphics course, including a simple but fully-featured 3D engine based on native WebGL and a wonderful flying game demo.]]></summary></entry></feed>