<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://2003pro.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://2003pro.github.io/" rel="alternate" type="text/html" /><updated>2025-06-06T00:43:10+08:00</updated><id>https://2003pro.github.io/feed.xml</id><title type="html">Jipeng ZHANG</title><subtitle>Ph.D. student at Hong Kong University of Science and Technology</subtitle><author><name>Jipeng ZHANG</name><email>zhangjipeng20@outlook.com</email></author><entry><title type="html">LLM for beginners</title><link href="https://2003pro.github.io/posts/2024/02/llm-for-beginners/" rel="alternate" type="text/html" title="LLM for beginners" /><published>2024-02-14T00:00:00+08:00</published><updated>2024-02-14T00:00:00+08:00</updated><id>https://2003pro.github.io/posts/2024/02/blog-post-1</id><content type="html" xml:base="https://2003pro.github.io/posts/2024/02/llm-for-beginners/"><![CDATA[<p>This post has two parts, meant to help beginners get quick insight into large language models (LLMs).</p>

<ol>
  <li>A collection of core papers on large language models, aimed at helping newcomers grasp the key ideas behind LLM techniques.</li>
  <li>Reference materials on optimization, machine learning, and pre-LLM NLP techniques.</li>
</ol>

<h2 id="1-kernel-paper-collection">1. Core Paper Collection</h2>

<hr />

<h3 id="11-distributed-word-representation">1.1 Distributed Word Representation</h3>
<hr />
<ul>
  <li><a href="http://arxiv.org/pdf/1301.3781.pdf">Efficient Estimation of Word Representations in Vector Space</a></li>
  <li><a href="http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf">Distributed Representations of Words and Phrases and their Compositionality</a></li>
  <li><a href="http://nlp.stanford.edu/pubs/glove.pdf">GloVe: Global Vectors for Word Representation</a></li>
</ul>
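<p>The common thread in these papers: each word is mapped to a dense vector so that related words end up close together in the vector space. A toy illustration with cosine similarity (made-up 4-dimensional vectors; real word2vec/GloVe embeddings have 100-300 dimensions):</p>

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 4-d embeddings, invented for illustration only.
embeddings = {
    "king":  [0.8, 0.6, 0.1, 0.2],
    "queen": [0.7, 0.7, 0.1, 0.3],
    "apple": [0.1, 0.2, 0.9, 0.8],
}

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # low
```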

<h3 id="12-contextual-word-representations-and-mlm-pretraining">1.2 Contextual Word Representations and MLM Pretraining</h3>
<hr />
<ul>
  <li><a href="https://arxiv.org/pdf/1810.04805.pdf">BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</a></li>
  <li><a href="https://arxiv.org/pdf/1802.05365.pdf">ELMo: Deep contextualized word representations</a></li>
  <li><a href="https://arxiv.org/pdf/1902.06006.pdf">Contextual Word Representations: A Contextual Introduction</a></li>
  <li><a href="http://jalammar.github.io/illustrated-bert/">The Illustrated BERT, ELMo, and co.</a></li>
  <li><a href="https://web.stanford.edu/~jurafsky/slpdraft/11.pdf">Jurafsky and Martin Chapter 11 (Fine-Tuning and Masked Language Models)</a></li>
</ul>
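<p>For intuition about the masked-language-model objective behind BERT: a fraction of input tokens is corrupted and the model is trained to recover the originals. Below is a minimal sketch of the 80/10/10 corruption rule described in the BERT paper; the tiny vocabulary and token list are invented for illustration:</p>

```python
import random

MASK = "[MASK]"
VOCAB = ["cat", "dog", "sat", "mat", "the", "on"]  # toy vocabulary

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """BERT-style corruption: of the selected positions, 80% become [MASK],
    10% a random token, 10% stay unchanged. Returns (inputs, labels);
    labels are None where no prediction is required."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)          # model must predict the original token
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK)
            elif r < 0.9:
                inputs.append(rng.choice(VOCAB))
            else:
                inputs.append(tok)
        else:
            labels.append(None)
            inputs.append(tok)
    return inputs, labels

inputs, labels = mask_tokens(["the", "cat", "sat", "on", "the", "mat"])
```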

<h3 id="13-generative-pretraining">1.3 Generative Pretraining</h3>
<hr />
<ul>
  <li><a href="https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf">GPT-2: Language Models are Unsupervised Multitask Learners</a></li>
  <li><a href="https://arxiv.org/abs/2005.14165">GPT-3: Language Models are Few-Shot Learners</a></li>
  <li><a href="https://arxiv.org/pdf/2302.13971.pdf">LLaMA: Open and Efficient Foundation Language Models</a></li>
</ul>
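<p>The GPT-style models in this section generate text autoregressively: at each step, logits are turned into a probability distribution and the next token is sampled. A minimal sketch of temperature sampling (toy logits and vocabulary, not any particular model's API):</p>

```python
import math
import random

def sample_next_token(logits, temperature=1.0, seed=None):
    """Softmax over logits / temperature, then sample an index.
    Lower temperature sharpens the distribution toward the argmax."""
    rng = random.Random(seed)
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

# Toy vocabulary and logits for one decoding step.
vocab = ["the", "cat", "sat"]
print(vocab[sample_next_token([2.0, 1.0, 0.1], temperature=0.7, seed=0)])
```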

<h3 id="14-instruction-tuning-and-alignment">1.4 Instruction Tuning and Alignment</h3>
<hr />
<ul>
  <li><a href="https://openai.com/research/instruction-following">InstructGPT: Aligning language models to follow instructions</a></li>
  <li><a href="https://arxiv.org/abs/2210.11416">Scaling Instruction-Finetuned Language Models</a></li>
  <li><a href="https://arxiv.org/abs/2212.10560">Self-Instruct: Aligning Language Models with Self-Generated Instructions</a></li>
  <li><a href="https://crfm.stanford.edu/2023/03/13/alpaca.html">Alpaca: A Strong, Replicable Instruction-Following Model</a></li>
  <li><a href="https://lmsys.org/blog/2023-03-30-vicuna/">Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality</a></li>
  <li><a href="https://arxiv.org/abs/2305.18290">Direct Preference Optimization: Your Language Model is Secretly a Reward Model</a></li>
</ul>
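<p>Of these, DPO is simple enough to sketch in a few lines: it trains directly on preference pairs with a sigmoid loss over log-probability margins, skipping RLHF's separate reward model. A minimal single-pair version, assuming the summed log-probabilities of each response under the policy and reference models are already computed:</p>

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.
    Inputs are summed log-probabilities of whole responses."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), via log1p for stability
    return math.log1p(math.exp(-margin))

# Policy favours the chosen response more than the reference does -> loss
# drops below the log(2) it takes at a zero margin.
print(dpo_loss(-10.0, -14.0, -12.0, -12.0))
```

In a real training loop this scalar would be averaged over a batch and backpropagated through the policy's log-probabilities; the reference model stays frozen.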

<h3 id="15-efficient-finetuning-techniques">1.5 Efficient Finetuning Techniques</h3>
<hr />
<ul>
  <li><a href="https://arxiv.org/abs/1902.00751">Parameter-Efficient Transfer Learning for NLP</a></li>
  <li><a href="https://arxiv.org/abs/2106.09685">LoRA: Low-Rank Adaptation of Large Language Models</a></li>
  <li><a href="https://arxiv.org/pdf/2305.14314.pdf">QLoRA: Efficient Finetuning of Quantized LLMs</a></li>
</ul>
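<p>The core idea of LoRA: freeze the pretrained weight matrix W and learn only a low-rank update BA, which cuts the trainable parameter count dramatically. A pure-Python sketch with toy shapes (real implementations operate on GPU tensors inside each attention/MLP layer):</p>

```python
def matvec(M, x):
    """Matrix-vector product for nested-list matrices."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=16, r=2):
    """y = W x + (alpha / r) * B (A x).
    W (d_out x d_in) is frozen; only A (r x d_in) and B (d_out x r) train."""
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))     # rank-r update, cheap to store
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, low_rank)]

# Toy shapes: d_in = 3, d_out = 2, rank r = 2.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
A = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0]]    # initialised small
B = [[0.0, 0.0], [0.0, 0.0]]              # B starts at zero: update is a no-op
x = [1.0, 2.0, 3.0]
print(lora_forward(W, A, B, x))  # equals W x while B is all zeros
```

Initialising B to zero (as the LoRA paper does) means finetuning starts exactly from the pretrained model's behaviour.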

<h3 id="16-acceleration-and-efficiency">1.6 Acceleration and Efficiency</h3>
<hr />
<ul>
  <li><a href="https://arxiv.org/abs/2205.14135">FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness</a></li>
  <li><a href="https://www.microsoft.com/en-us/research/blog/zero-deepspeed-new-system-optimizations-enable-training-models-with-over-100-billion-parameters/">ZeRO &amp; DeepSpeed: New system optimizations enable training models with over 100 billion parameters</a></li>
  <li><a href="https://arxiv.org/pdf/1604.06174.pdf">Gradient Checkpoint: Training Deep Nets with Sublinear Memory Cost</a></li>
  <li><a href="https://intuitiveshorts.substack.com/p/short-7-what-is-gradient-accumulation?utm_source=profile&amp;utm_medium=reader2">What is Gradient Accumulation?</a></li>
</ul>
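<p>Gradient accumulation, the last item above, simulates a large batch on limited memory: sum gradients over several micro-batches, then take one optimizer step. A toy scalar example minimising (w - t)^2, standing in for the usual PyTorch pattern of several backward() calls followed by one optimizer.step():</p>

```python
def train_with_accumulation(samples, w=0.0, lr=0.1, accum_steps=4):
    """Accumulate per-sample gradients of (w - t)^2 over `accum_steps`
    micro-batches, then apply one averaged update. Mathematically this
    matches a single large batch, with the memory footprint of a small one."""
    grad_sum, seen = 0.0, 0
    for t in samples:
        grad_sum += 2.0 * (w - t)              # like loss.backward() adding into .grad
        seen += 1
        if seen == accum_steps:
            w -= lr * grad_sum / accum_steps   # one optimizer.step()
            grad_sum, seen = 0.0, 0            # optimizer.zero_grad()
    return w

print(train_with_accumulation([1.0, 1.0, 1.0, 1.0]))  # -> 0.2
```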

<h3 id="17-deployment-and-speed-up-inference">1.7 Deployment and Speed-Up Inference</h3>
<hr />
<ul>
  <li><a href="https://arxiv.org/pdf/2309.06180.pdf">vLLM: Efficient Memory Management for Large Language Model Serving with PagedAttention</a></li>
  <li><a href="https://arxiv.org/pdf/2211.17192.pdf">Fast Inference from Transformers via Speculative Decoding</a></li>
</ul>
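<p>In speculative decoding, a cheap draft model proposes several tokens and the large target model verifies them in a single pass. The paper uses a probabilistic accept/reject rule over the two models' distributions; the sketch below shows only the simpler greedy-verification variant, with hypothetical <code>target_next</code>/<code>draft_next</code> functions standing in for the two models:</p>

```python
def speculative_decode(target_next, draft_next, prefix, k=4, max_new=8):
    """Greedy-verification sketch: the draft proposes k tokens, the target
    checks them (in practice in one batched forward pass) and keeps the
    longest matching prefix plus one corrected token. Output is identical
    to decoding greedily with the target alone, just in fewer target calls."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        # Draft proposes k tokens autoregressively (cheap model).
        proposal = []
        for _ in range(k):
            proposal.append(draft_next(out + proposal))
        # Target verifies each proposed position.
        for tok in proposal:
            expected = target_next(out)
            out.append(expected)       # the target's token is always kept
            if expected != tok:        # first mismatch: discard the rest
                break
            if len(out) - len(prefix) >= max_new:
                break
    return out

# Toy "models" over integer tokens: the target counts up by 1; the draft
# agrees except after even tokens, where it skips ahead.
target = lambda seq: seq[-1] + 1
draft = lambda seq: seq[-1] + (1 if seq[-1] % 2 else 2)
print(speculative_decode(target, draft, [0], k=4, max_new=4))
```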

<h2 id="2-reference-for-basic-ml-techniques">2. References for Basic ML Techniques</h2>

<h3 id="21-optimization-and-neural-network-basics">2.1 Optimization and Neural Network Basics</h3>
<hr />

<p><a href="https://web.stanford.edu/class/cs224n/readings/cs224n-2019-notes03-neuralnets.pdf">Stanford SLP book notes on Neural Networks, Backpropagation</a></p>

<p><a href="https://www.youtube.com/playlist?list=PLlMkM4tgfjnJ3I-dbhO9JTw7gNty6o_2m">HKUST Prof. Kim’s PyTorchZeroToAll Tutorial</a></p>

<p><a href="https://www.deeplearningbook.org/contents/guidelines.html">Deep Learning Practical Methodology</a></p>

<hr />

<h3 id="22-language-model-and-neural-network-architectures">2.2 Language Model and Neural Network Architectures</h3>
<hr />

<p><a href="https://web.stanford.edu/class/cs224n/readings/cs224n-2019-notes05-LM_RNN.pdf">Stanford CS224N notes on Language Models, RNN, GRU and LSTM </a></p>

<p><a href="https://web.stanford.edu/class/cs224n/readings/cs224n-self-attention-transformers-2023_draft.pdf">Stanford CS224N notes on Self-Attention &amp; Transformers  </a></p>

<p><a href="https://nlp.seas.harvard.edu/2018/04/03/attention.html">The Annotated Transformer</a></p>

<p><a href="https://jalammar.github.io/illustrated-transformer/">The Illustrated Transformer</a></p>
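<p>To complement the Transformer reads above, here is a minimal pure-Python sketch of scaled dot-product attention for a single query vector (toy 2-dimensional vectors; real models batch this over many queries, heads, and layers):</p>

```python
import math

def attention(query, keys, values):
    """Scaled dot-product attention for one query:
    weights = softmax(q . k_i / sqrt(d)), output = sum_i weights_i * v_i."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(dim_v)]

q = [1.0, 0.0]
K = [[1.0, 0.0], [0.0, 1.0]]   # the first key points along the query direction
V = [[1.0, 10.0], [2.0, 20.0]]
print(attention(q, K, V))       # output leans toward the first value row
```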

<hr />

<h3 id="23-word-vectors-and-tokenizers">2.3 Word Vectors and Tokenizers</h3>
<hr />

<p><a href="https://web.stanford.edu/class/cs224n/readings/cs224n_winter2023_lecture1_notes_draft.pdf">Stanford CS224N notes on Word Vectors </a></p>

<p><a href="https://huggingface.co/docs/transformers/tokenizer_summary#wordpiece">Huggingface Tokenizer’s Summary</a></p>
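<p>The tokenizer summaries above centre on subword algorithms such as byte-pair encoding (BPE). The toy trainer below (a simplified sketch, not Huggingface's implementation) shows the core loop: repeatedly merge the most frequent adjacent symbol pair across the corpus:</p>

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Toy BPE trainer: each word starts as a list of characters;
    repeatedly merge the most frequent adjacent symbol pair."""
    corpus = [list(w) for w in words]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word in corpus:
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = best[0] + best[1]
        for i, word in enumerate(corpus):
            out, j = [], 0
            while j < len(word):
                if j + 1 < len(word) and (word[j], word[j + 1]) == best:
                    out.append(merged)   # replace the pair with its merge
                    j += 2
                else:
                    out.append(word[j])
                    j += 1
            corpus[i] = out
    return merges, corpus

merges, corpus = bpe_merges(["lower", "lowest", "low"], num_merges=2)
print(merges)   # learned merge rules
print(corpus)   # "low" becomes a single symbol after two merges
```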

<p><a href="https://zhuanlan.zhihu.com/p/360290118">Tokenizers’ Chinese Summary on Zhihu</a></p>]]></content><author><name>Jipeng ZHANG</name><email>zhangjipeng20@outlook.com</email></author><category term="large language models" /><category term="paper collection" /><summary type="html"><![CDATA[The following are two parts for helping the beginners in large language models (LLM) to get quick insight about it.]]></summary></entry><entry><title type="html">Blog Post number 2</title><link href="https://2003pro.github.io/posts/2013/08/blog-post-2/" rel="alternate" type="text/html" title="Blog Post number 2" /><published>2013-08-14T00:00:00+08:00</published><updated>2013-08-14T00:00:00+08:00</updated><id>https://2003pro.github.io/posts/2013/08/blog-post-2</id><content type="html" xml:base="https://2003pro.github.io/posts/2013/08/blog-post-2/"><![CDATA[<p>This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.</p>

<h1 id="headings-are-cool">Headings are cool</h1>

<h1 id="you-can-have-many-headings">You can have many headings</h1>

<h2 id="arent-headings-cool">Aren’t headings cool?</h2>]]></content><author><name>Jipeng ZHANG</name><email>zhangjipeng20@outlook.com</email></author><category term="cool posts" /><category term="category1" /><category term="category2" /><summary type="html"><![CDATA[This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.]]></summary></entry></feed>