CREDIT: This list originates from https://arxiv.org/abs/2412.12121v1 and was shared on x.com by @skillissue99; I converted it to markdown for easy access. All credit goes to @skillissue99 and the authors of the arXiv paper.
Top 40 papers: “… AI research continues to focus on large language models (LLMs) but is increasingly exploring alternatives to the transformer architecture, including diffusion and state space models.
While Natural Language Processing continues to dominate the top papers, its share of papers within the top-40 has successively decreased over the course of our previous reports.
Computer vision and more general machine learning are more strongly represented in turn; this may indicate a shift towards broader and more diversified (multimodal) architectures.
While the use of LLMs to assist in writing is on the rise overall, our analysis of the top-40 papers reveals that their use remains surprisingly low in this subset, suggesting a potential inverse correlation between the use of LLMs and the overall quality of research conducted.”
Top 40 Natural Language Learning & Generation arXiv papers from 2023/01/01 to 2024/09/30
| No | Title | Cat. | Link | Week | Cit. | z-score | vs. 1/24 | vs. 9/23 | vs. 6/23 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Qwen2 Technical Report | CL | http://arxiv.org/abs/2407.10671 | 24/07/14-24/07/20 | 320 | 38.4 | - | - | - |
| 2 | The Llama 3 Herd of Models | AI | http://arxiv.org/abs/2407.21783 | 24/07/28-24/08/03 | 1192 | 35.4 | - | - | - |
| 3 | GPT-4 Technical Report | CL | http://arxiv.org/abs/2303.08774 | 23/03/12-23/03/18 | 8670 | 35.3 | ↓2 | ↓2 | ↓1 |
| 4 | Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena | CL | http://arxiv.org/abs/2306.05685 | 23/06/04-23/06/10 | 2551 | 34.5 | → | ↑1 | ↑7 |
| 5 | Llama 2: Open Foundation and Fine-Tuned Chat Models | CL | http://arxiv.org/abs/2307.09288 | 23/07/16-23/07/22 | 8402 | 34.3 | ↓3 | ↓3 | - |
| 6 | Mamba: Linear-Time Sequence Modeling with Selective State Spaces | LG | http://arxiv.org/abs/2312.00752 | 23/11/26-23/12/02 | 1280 | 33.6 | ↑3 | - | - |
| 7 | Direct Preference Optimization: Your Language Model is Secretly a Reward Model | LG | http://arxiv.org/abs/2305.18290 | 23/05/28-23/06/03 | 2025 | 33.6 | ↑7 | ↑15 | - |
| 8 | LLaMA: Open and Efficient Foundation Language Models | CL | http://arxiv.org/abs/2302.13971 | 23/02/26-23/03/04 | 9170 | 33.5 | ↓6 | ↓6 | - |
| 9 | Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone | CL | http://arxiv.org/abs/2404.14219 | 24/04/21-24/04/27 | 479 | 31.7 | - | - | - |
| 10 | Retrieval-Augmented Generation for Large Language Models: A Survey | CL | http://arxiv.org/abs/2312.10997 | 23/12/17-23/12/23 | 777 | 31.3 | ↑N | - | - |
| 11 | YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information | CV | http://arxiv.org/abs/2402.13616 | 24/02/18-24/02/24 | 404 | 30.3 | - | - | - |
| 12 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | CV | http://arxiv.org/abs/2301.12597 | 23/01/29-23/02/04 | 2977 | 29.6 | ↓5 | ↓6 | ↓6 |
| 13 | Segment Anything | CV | http://arxiv.org/abs/2304.02643 | 23/04/02-23/04/08 | 4496 | 29.6 | ↓5 | ↓6 | ↓3 |
| 14 | 3D Gaussian Splatting for Real-Time Radiance Field Rendering | GR | http://arxiv.org/abs/2308.04079 | 23/08/06-23/08/12 | 1811 | 29.5 | ↑19 | - | - |
| 15 | Sparks of Artificial General Intelligence: Early experiments with GPT-4 | CL | http://arxiv.org/abs/2303.12712 | 23/03/19-23/03/25 | 2494 | 29.3 | ↓10 | ↓11 | ↓12 |
| 16 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | CL | http://arxiv.org/abs/2403.05530 | 24/03/03-24/03/09 | 682 | 29.0 | - | - | - |
| 17 | Improved Baselines with Visual Instruction Tuning | CV | http://arxiv.org/abs/2310.03744 | 23/10/01-23/10/07 | 1503 | 28.0 | ↑N | - | - |
| 18 | QLoRA: Efficient Finetuning of Quantized LLMs | LG | http://arxiv.org/abs/2305.14314 | 23/05/21-23/05/27 | 1648 | 27.6 | ↓8 | ↓7 | ↓9 |
| 19 | Code Llama: Open Foundation Models for Code | CL | http://arxiv.org/abs/2308.12950 | 23/08/20-23/08/26 | 1349 | 27.3 | ↓8 | - | - |
| 20 | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | CV | http://arxiv.org/abs/2409.12191 | 24/09/15-24/09/21 | 88 | 27.0 | - | - | - |
| 21 | Mistral 7B | CL | http://arxiv.org/abs/2310.06825 | 23/10/08-23/10/14 | 1323 | 26.4 | ↑3 | - | - |
| 22 | Mixtral of Experts | LG | http://arxiv.org/abs/2401.04088 | 24/01/07-24/01/13 | 648 | 25.4 | ↑1 | - | - |
| 23 | Adding Conditional Control to Text-to-Image Diffusion Models | CV | http://arxiv.org/abs/2302.05543 | 23/02/05-23/02/11 | 2738 | 25.4 | ↓11 | ↓8 | - |
| 24 | ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools | CL | http://arxiv.org/abs/2406.12793 | 24/06/16-24/06/22 | 150 | 25.3 | - | - | - |
| 25 | Efficient Memory Management for Large Language Model Serving with PagedAttention | LG | http://arxiv.org/abs/2309.06180 | 23/09/10-23/09/16 | 971 | 24.4 | → | - | - |
| 26 | Visual Instruction Tuning | CV | http://arxiv.org/abs/2304.08485 | 23/04/16-23/04/22 | 2011 | 24.2 | ↓11 | ↓14 | ↓19 |
| 27 | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | CV | http://arxiv.org/abs/2305.06500 | 23/05/07-23/05/13 | 1409 | 23.8 | ↓8 | ↓6 | ↑6 |
| 28 | Qwen Technical Report | CL | http://arxiv.org/abs/2309.16609 | 23/09/24-23/09/30 | 960 | 23.1 | ↑N | - | - |
| 29 | DeepSeek-Coder: When the Large Language Model Meets Programming – The Rise of Code Intelligence | SE | http://arxiv.org/abs/2401.14196 | 24/01/21-24/01/27 | 356 | 21.6 | ↑N | - | - |
| 30 | DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | CL | http://arxiv.org/abs/2405.04434 | 24/05/05-24/05/11 | 152 | 21.3 | - | - | - |
| 31 | Baichuan 2: Open Large-scale Language Models | CL | http://arxiv.org/abs/2309.10305 | 23/09/17-23/09/23 | 552 | 21.1 | ↓14 | ↑6 | - |
| 32 | Universal and Transferable Adversarial Attacks on Aligned Language Models | CL | http://arxiv.org/abs/2307.15043 | 23/07/23-23/07/29 | 881 | 20.0 | ↑N | - | - |
| 33 | PaLM-E: An Embodied Multimodal Language Model | LG | http://arxiv.org/abs/2303.03378 | 23/03/05-23/03/11 | 1214 | 19.9 | ↓17 | ↓23 | ↓28 |
| 34 | SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis | CV | http://arxiv.org/abs/2307.01952 | 23/07/02-23/07/08 | 1224 | 19.2 | ↓4 | - | - |
| 35 | Tree of Thoughts: Deliberate Problem Solving with Large Language Models | CL | http://arxiv.org/abs/2305.10601 | 23/05/14-23/05/20 | 1140 | 18.8 | ↓13 | ↓21 | ↓22 |
| 36 | Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators | LG | http://arxiv.org/abs/2404.04475 | 24/03/31-24/04/06 | 175 | 18.3 | - | - | - |
| 37 | Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model | AI | http://arxiv.org/abs/2408.11039 | 24/08/18-24/08/24 | 36 | 18.0 | - | - | - |
| 38 | KAN: Kolmogorov-Arnold Networks | LG | http://arxiv.org/abs/2404.19756 | 24/04/28-24/05/04 | 181 | 18.0 | - | - | - |
| 39 | Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model | CV | http://arxiv.org/abs/2401.09417 | 24/01/14-24/01/20 | 385 | 17.9 | ↑N | - | - |
| 40 | Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets | CV | http://arxiv.org/abs/2311.15127 | 23/11/19-23/11/25 | 499 | 16.6 | ↑N | - | - |