A curated, structured local archive of frontier LLM / multimodal / medical-vertical model documentation — papers, system cards, model cards, and official blog posts — organized by year / company.
- Systematically archives major model releases from January 2025 to April 2026 across LLM, multimodal, and medical-vertical domains.
- Downloads official papers, system cards, and model cards as local PDFs; exports web-only blog pages to PDF via a headless browser (see the sketch below).
- Provides a single searchable Markdown index sorted in reverse chronological order.
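The fetch step is not pinned down beyond "headless browser", so the following is only a minimal sketch of how it could work, assuming Python with requests for direct PDF links and Playwright's Chromium for web-only pages; the library choices are assumptions, and the URLs/paths are just examples taken from the tables below.

```python
# Hypothetical fetch step: download direct PDFs, print web-only pages to PDF.
# Assumes `requests` and `playwright` are installed (`playwright install chromium`);
# these tool choices are illustrative, not the archive's documented tooling.
from pathlib import Path

import requests
from playwright.sync_api import sync_playwright


def save_pdf(url: str, dest: Path) -> None:
    """Download a direct PDF link (papers, system cards, model cards)."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    resp = requests.get(url, timeout=60)
    resp.raise_for_status()
    dest.write_bytes(resp.content)


def export_page_to_pdf(url: str, dest: Path) -> None:
    """Render a web-only blog/announcement page to PDF via headless Chromium."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch()  # PDF export requires Chromium headless
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        page.pdf(path=str(dest), format="A4")
        browser.close()


if __name__ == "__main__":
    save_pdf(
        "https://arxiv.org/pdf/2509.01322",
        Path("2025/meituan/2025-09_longcat-flash.pdf"),
    )
    export_page_to_pdf(
        "https://www.minimax.io/news/minimax-m2",
        Path("2025/minimax/2025-10_minimax-m2.pdf"),
    )
```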
Legend (Camp Colors): OpenAI · Anthropic · Google · China-based Labs · Other Global
Impact Highlight: nodes with ★ are ecosystem-shaping releases (community discussion, benchmark influence, or deployment adoption).
2026: Zhipu AI · Google · OpenAI · MiniMax · Meituan · NVIDIA · Microsoft · InternLM · Anthropic · InclusionAI (Ant Group) · Snowflake · ByteDance · Moonshot AI
2025: StepFun · Zhipu AI · MiniMax · Meituan · Allen AI · Alibaba · Google · NVIDIA · xAI · OpenAI · Anthropic · InternLM · Quark · Moonshot AI · ByteDance · Tencent · Meta · DeepSeek
- Alibaba / Qwen: 2025/alibaba_qwen/
- Allen AI: 2025/allenai/
- Anthropic: 2025/anthropic/, 2026/anthropic/
- ByteDance: 2025/bytedance/, 2026/bytedance/
- DeepSeek: 2025/deepseek/
- Google: 2025/google/, 2026/google/
- InclusionAI (Ant Group): 2026/inclusionai/
- InternLM: 2025/internlm/, 2026/internlm/
- Meituan: 2025/meituan/, 2026/meituan/
- Meta: 2025/meta/
- Microsoft: 2026/microsoft/
- MiniMax: 2025/minimax/, 2026/minimax/
- Moonshot AI: 2025/moonshot/, 2026/moonshot/
- NVIDIA: 2025/nvidia/, 2026/nvidia/
- OpenAI: 2025/openai/, 2026/openai/
- Quark (Alibaba): 2025/quark/
- Snowflake: 2026/snowflake/
- StepFun: 2025/stepfun/
- Tencent: 2025/tencent/
- Zhipu AI: 2025/zhipu/, 2026/zhipu/
- xAI: 2025/xai/
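The local files follow a consistent `YYYY-MM_model-slug.pdf` naming convention inside `year/company/` directories, which is enough to rebuild the reverse-chronological skeleton of the index tables below. A minimal sketch of that rebuild, assuming Python; the helper names are hypothetical, and organization display names plus highlight text would still come from a separate metadata source.

```python
# Hypothetical index rebuild: walk year/company directories and emit
# reverse-chronological Markdown rows. Relies only on the observed
# `YYYY-MM_model-slug.pdf` naming convention; highlights are not derivable
# from file names and would need separate metadata.
from pathlib import Path


def collect_rows(root: Path) -> list[tuple[str, str, str, str]]:
    rows = []
    for pdf in root.glob("20*/*/*.pdf"):
        year_month, _, slug = pdf.stem.partition("_")  # e.g. "2026-04", "glm-5v-turbo"
        org_dir = pdf.parent.name                      # e.g. "zhipu"
        rows.append((year_month, org_dir, slug, str(pdf)))
    # Reverse chronological: newest release date first.
    return sorted(rows, key=lambda r: r[0], reverse=True)


def to_markdown(rows: list[tuple[str, str, str, str]]) -> str:
    lines = [
        "| Release Date | Organization | Model | Local File |",
        "|---|---|---|---|",
    ]
    for date, org, slug, path in rows:
        lines.append(f"| {date} | {org} | {slug} | {path} |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_markdown(collect_rows(Path("."))))
```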
2026 (25 models)
| Release Date | Organization | Model | Core Highlights (from PDF) | Official Link | Local File |
|---|---|---|---|---|---|
| 2026-04 | Zhipu AI | GLM-5V-Turbo | Delivers strong performance at a smaller model size on multimodal coding, agentic tasks, and pure-text coding; 30+ task joint reinforcement learning jointly optimizes the model across task types spanning STEM, grounding, video, GUI agents, and coding agents, yielding more robust gains in perception, reasoning, and agentic execution. | https://docs.z.ai/guides/vlm/glm-5v-turbo | 2026/zhipu/2026-04_glm-5v-turbo.pdf |
| 2026-04 | Google | Gemma 4 | Featuring both dense and Mixture-of-Experts (MoE) architectures, Gemma 4 models are well-suited for text generation, reasoning, agentic workflows, coding, and multimodal understanding. | https://ai.google.dev/gemma/docs/core/model_card_4?utm_source=deepmind.google&utm_medium=referral&utm_campaign=gdm&utm_content | 2026/google/2026-04_gemma-4.pdf |
| 2026-03 | OpenAI | GPT-5.4 Thinking | Frontier reasoning model that unifies recent gains in coding, agentic workflows, and deep web research, while adding high-capability cybersecurity mitigations and stronger chain-of-thought monitoring. | https://deploymentsafety.openai.com/gpt-5-4-thinking/gpt-5-4-thinking.pdf | 2026/openai/2026-03_gpt-5.4-thinking.pdf |
| 2026-03 | OpenAI | GPT-5.3 Instant | General-purpose GPT-5 update tuned for richer web-grounded answers, smoother follow-up behavior, fewer dead ends and caveats, and improved everyday conversational usefulness. | https://deploymentsafety.openai.com/gpt-5-3-instant/gpt-5-3-instant.pdf | 2026/openai/2026-03_gpt-5.3-instant.pdf |
| 2026-03 | Google | Gemini 3.1 Flash-Lite | Based on Gemini 3 Pro; evaluated across a range of benchmarks covering speed, reasoning, multimodal capabilities, factuality, agentic tool use, multilingual performance, coding, and long context. | https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-1-Flash-Lite-Model-Card.pdf | 2026/google/2026-03_gemini-3.1-flash-lite.pdf |
| 2026-03 | Google | Gemini 3.1 Flash Live | Real-time multimodal model with native audio input/output, 128K context, and evaluation emphasis on low-latency voice and video interactions, conversational audio understanding, and multi-step function use. | https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-1-Flash-Live-Model-Card.pdf | 2026/google/2026-03_gemini-3.1-flash-live.pdf |
| 2026-03 | MiniMax | MiniMax M2.7 | MiniMax's first model to participate deeply in its own evolution; capable of building complex agent harnesses and completing highly elaborate productivity tasks, leveraging capabilities such as Agent Teams, complex Skills, and dynamic tool search. | https://www.minimax.io/news/minimax-m27-en | 2026/minimax/2026-03_minimax-m2.7.pdf |
| 2026-03 | Meituan | LongCat-Next | Native multimodal model that processes text, vision, and audio under a single autoregressive objective with minimal modality-specific design; introduces Discrete Native Autoregressive (DiNA), a unified framework representing multimodal information in a shared discrete space for consistent, principled autoregressive modeling across modalities. | https://arxiv.org/pdf/2603.27538 | 2026/meituan/2026-03_longcat-next.pdf |
| 2026-03 | Meituan | LongCat-Flash-Prover | Flagship 560-billion-parameter open-source Mixture-of-Experts (MoE) model that advances native formal reasoning in Lean4 through agentic tool-integrated reasoning (TIR); training starts from a checkpoint derived from the LongCat Mid-train Base model, an early-stage version of LongCat-Flash-Thinking-2601. | https://arxiv.org/pdf/2603.21065 | 2026/meituan/2026-03_longcat-flash-prover.pdf |
| 2026-03 | NVIDIA | Nemotron 3 Super | Open, efficient 120-billion-parameter (12 billion active) hybrid Mamba-Attention Mixture-of-Experts model for agentic reasoning; the technical report covers its pre-training, post-training, and quantization. | https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-Super-Technical-Report.pdf | 2026/nvidia/2026-03_nemotron-3-super.pdf |
| 2026-03 | Microsoft | Phi-4-reasoning-vision-15B | Compact open-weight multimodal reasoning model that balances reasoning power, efficiency, and training-data needs; broadly capable across vision-language tasks and excels at math and science reasoning and user-interface understanding. | https://arxiv.org/pdf/2603.03975 | 2026/microsoft/2026-03_phi-4-reasoning-vision-15b.pdf |
| 2026-03 | InternLM | Intern-S1-Pro | Intern-S1 is a multimodal Mixture-of-Experts (MoE) model with 28 billion activated parameters and 241 billion total parameters, continually pre-trained on 5T tokens, including over 2.5T tokens from scientific domains; a key pre-training challenge is preparing large-scale data for low-resource but high-value science domains. | https://arxiv.org/pdf/2508.15763 | 2026/internlm/2026-03_intern-s1-pro.pdf |
| 2026-03 | Anthropic | Claude Opus 4.6 | Across agentic coding, computer use, tool use, search, and finance, Opus 4.6 is an industry-leading model, often by a wide margin; available on claude.ai, the API, and all major cloud platforms. | https://www.anthropic.com/news/claude-opus-4-6 | 2026/anthropic/2026-03_claude-opus-4.6.pdf |
| 2026-02 | Zhipu AI | GLM-5 | Next-generation foundation model designed for agentic engineering; adopts DSA (DeepSeek Sparse Attention) on top of MoE 744B/40B with async RL to strengthen reasoning, coding, and agent capabilities. | https://docs.z.ai/guides/llm/glm-5 | 2026/zhipu/2026-02_glm-5.pdf |
| 2026-02 | OpenAI | GPT-5.3-Codex | OpenAI's most capable agentic coding model to date, combining the frontier coding performance of GPT-5.2-Codex with the reasoning and professional knowledge of GPT-5.2; as with GPT-5.1-Codex-Max, it is not intended for conversational use. | https://deploymentsafety.openai.com/gpt-5-3-codex/gpt-5-3-codex.pdf | 2026/openai/2026-02_gpt-5.3-codex.pdf |
| 2026-02 | MiniMax | MiniMax M2.5 | Extensively RL-trained frontier model; SOTA in coding (80.2% SWE-Bench Verified), agentic tool use, and search; 37% faster than M2.1 at 100 tok/s with costs as low as $1/hour continuous operation. | https://www.minimax.io/news/minimax-m25 | 2026/minimax/2026-02_minimax-m2.5.pdf |
| 2026-02 | InclusionAI (Ant Group) | Ling 2.5 | 1T total / 63B active parameters with hybrid linear attention; supports up to 1M context via YaRN, features composite reward RL for efficiency-performance balance, and is compatible with mainstream agent platforms. | https://github.com/inclusionAI/Ling-V2.5 | 2026/inclusionai/2026-02_ling-2.5.pdf |
| 2026-02 | Google | Gemini 3.1 Pro | Advanced sparse-MoE multimodal reasoning model with 1M context, stronger agentic coding and long-context performance than Gemini 3 Pro, and published safety assessments under Google DeepMind's Frontier Safety Framework. | https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-1-Pro-Model-Card.pdf | 2026/google/2026-02_gemini-3.1-pro.pdf |
| 2026-02 | Google | Gemini 3.1 Flash Image | Based on Gemini 3 Flash; can comprehend input from different information sources, including text, images, audio, and video. | https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-1-Flash-Image-Model-Card.pdf | 2026/google/2026-02_gemini-3.1-flash-image.pdf |
| 2026-02 | Snowflake | Arctic-AWM | Recent advances in large language models (LLMs) have empowered autonomous agents to perform complex tasks requiring multi-turn interactions with tools and environments; scaling such agent training, however, is limited by the lack of diverse and reliable environments. | https://arxiv.org/pdf/2602.10090 | 2026/snowflake/2026-02_arctic-awm.pdf |
| 2026-02 | ByteDance | MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs | MedXIAOHE is a medical vision-language foundation model designed to advance general-purpose medical understanding and reasoning in real-world clinical applications; it achieves state-of-the-art performance across diverse medical benchmarks and surpasses leading closed-source multimodal systems on multiple capabilities. | https://arxiv.org/pdf/2602.12705 | 2026/bytedance/2026-02_medxiaohe-a-comprehensive-recipe-for-building-medical-mllms.pdf |
| 2026-01 | Zhipu AI | GLM-4.7-Flash | Production-ready performance built for enterprise workloads; an all-in-one API covers text generation, image generation, document embeddings, NER, summarization, image classification, and more. | https://huggingface.co/zai-org/GLM-4.7-Flash | 2026/zhipu/2026-01_glm-4.7-flash.pdf |
| 2026-01 | Google | MedGemma 1.5 | Described as the first public release of an open multimodal large language model that can interpret high-dimensional medical data while retaining the ability to interpret general 2D data and text; MedGemma 1.5 4B improves over MedGemma 1 4B on text-based tasks, including medical reasoning (MedQA) and electronic health record information retrieval (EHRQA). | https://research.google/blog/next-generation-medical-image-interpretation-with-medgemma-15-and-medical-speech-to-text-with-medasr/ | 2026/google/2026-01_medgemma-1.5.pdf |
| 2026-01 | Meituan | LongCat-Flash-Thinking-2601 | Open-source Mixture-of-Experts (MoE) reasoning model with 560B total parameters and an average of 27B activated parameters per token, featuring strong agentic reasoning capability. | https://arxiv.org/pdf/2601.16725 | 2026/meituan/2026-01_longcat-flash-thinking-2601.pdf |
| 2026-01 | Moonshot AI | Kimi K2.5 | Open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base; excels in visual knowledge, cross-modal reasoning, and agentic tool use grounded in visual inputs. | https://github.com/MoonshotAI/Kimi-K2.5 | 2026/moonshot/2026-01_kimi-k2.5.pdf |
2025 (58 models)
| Release Date | Organization | Model | Core Highlights (from PDF) | Official Link | Local File |
|---|---|---|---|---|---|
| 2025-12 | StepFun | Step-DeepResearch | Cost-effective, end-to-end Deep Research agent model; the report situates it against agents like Kimi-Researcher, which supports long-horizon multi-turn search reasoning through end-to-end agentic RL training with context management mechanisms and asynchronous rollout systems. | https://arxiv.org/pdf/2512.20491 | 2025/stepfun/2025-12_step-deepresearch.pdf |
| 2025-12 | Zhipu AI | GLM-4.7 | Delivers significant upgrades in layout and aesthetics for office creation, reducing the time developers spend on style fine-tuning; in scenarios requiring cameras, real-time input, and interactive controls, GLM-4.7 demonstrates superior system-level comprehension. | https://docs.z.ai/guides/llm/glm-4.7 | 2025/zhipu/2025-12_glm-4.7.pdf |
| 2025-12 | MiniMax | MiniMax M2.1 | Significantly enhanced multi-language programming, built for real-world complex tasks; released (2025.12.23) as an update to the model component to help more enterprises and individuals find more AI-native ways of working. | https://www.minimax.io/news/minimax-m21 | 2025/minimax/2025-12_minimax-m2.1.pdf |
| 2025-12 | Meituan | LongCat-Image | Releases multiple model versions for text-to-image and image editing, including mid-training and post-training checkpoints, along with the entire training toolchain; beyond generation, LongCat-Image excels at image editing, achieving SOTA results on standard benchmarks with superior editing consistency compared to other open-source work. | https://arxiv.org/pdf/2512.07584 | 2025/meituan/2025-12_longcat-image.pdf |
| 2025-12 | Allen AI | OLMo 3 | Flagship model Olmo 3.1 Think 32B is the strongest fully open thinking model released to date; Olmo 3 Base is post-trained into three variants, including Olmo 3 Think, which performs extended reasoning by generating a structured thinking trace before a final answer. | https://arxiv.org/pdf/2512.13961 | 2025/allenai/2025-12_olmo-3.pdf |
| 2025-12 | Alibaba | Qwen-Image | Qwen-Image-Layered is an end-to-end diffusion model that directly decomposes a single RGB image into multiple semantically disentangled RGBA layers; unlike prior methods that split images into foreground and background [18, 45], its VLD-MMDiT (Variable Layers Decomposition MMDiT) supports decomposition into a variable number of layers and is compatible with multi-task training. | https://arxiv.org/pdf/2512.15603 | 2025/alibaba_qwen/2025-12_qwen-image.pdf |
| 2025-12 | Google | Gemini 3 Flash | Built on the Gemini 3 Pro reasoning foundation, with thinking levels to control the mix of quality, cost, and latency. | https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-Flash-Model-Card.pdf | 2025/google/2025-12_gemini-3-flash.pdf |
| 2025-11 | Alibaba | Qwen3-VL | Most capable vision-language model in the Qwen series; natively supports interleaved contexts up to 256K tokens, seamlessly integrating text, images, and video for multimodal reasoning. | https://arxiv.org/pdf/2511.21631 | 2025/alibaba_qwen/2025-11_qwen3-vl.pdf |
| 2025-11 | NVIDIA | Nemotron 3 Nano 4B | Validates the Nemotron Elastic approach by training elastic variants of the Nemotron Nano V2 12B reasoning model, producing homogeneous and heterogeneous 9B configurations plus a 6B variant from a single training run; reported accuracy is the average across MATH-500, AIME-2024, AIME-2025, GPQA, LiveCodeBench v5, and MMLU-Pro. | https://arxiv.org/pdf/2511.16664 | 2025/nvidia/2025-11_nemotron-3-nano-4b.pdf |
| 2025-11 | xAI | Grok 4.1 Fast | Grok 4.1 Fast is available in the xAI Enterprise API (Nov 19, 2025); new agentic server-side tools, including web_search, x_search, and code_execution, are generally available (Oct 15, 2025). | https://docs.x.ai/docs/release-notes | 2025/xai/2025-11_grok-4.1-fast.pdf |
| 2025-11 | OpenAI | GPT-5.1-Codex-Max | OpenAI's new frontier agentic coding model; the system card covers baseline model safety evaluations, including disallowed-content evaluations. | https://cdn.openai.com/pdf/2a7d98b1-57e5-4147-8d0e-683894d782ae/5p1_codex_max_card_03.pdf | 2025/openai/2025-11_gpt-5.1-codex-max.pdf |
| 2025-11 | Anthropic | Claude Haiku 4.5 | A leap forward for agentic coding, particularly for sub-agent orchestration and computer use tasks; available everywhere today. | https://www.anthropic.com/news/claude-haiku-4-5 | 2025/anthropic/2025-11_claude-haiku-4.5.pdf |
| 2025-11 | Anthropic | Claude Sonnet 4.5 | Described as the best coding model in the world and Anthropic's most powerful model to date, with improved capabilities across a broad range of evaluations, including reasoning and math. | https://www.anthropic.com/news/claude-sonnet-4-5 | 2025/anthropic/2025-11_claude-sonnet-4.5.pdf |
| 2025-11 | Google | Gemini 3 Pro Image | Google's most advanced model for image generation; based on Gemini 3 Pro, it can comprehend vast datasets and challenging problems from different information sources, including text and images. | https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-Pro-Image-Model-Card.pdf | 2025/google/2025-11_gemini-3-pro-image.pdf |
| 2025-11 | Google | Gemini 3 Pro | Google's most advanced model for complex tasks; comprehends vast datasets and challenging problems from different information sources, including text, audio, images, video, and entire code repositories, and is trained with reinforcement learning techniques that leverage multi-step reasoning, problem-solving, and theorem-proving data. | https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-Pro-Model-Card.pdf | 2025/google/2025-11_gemini-3-pro.pdf |
| 2025-10 | Meituan | LongCat-Flash-Omni | Open-source omni-modal 560B model (27B activated) optimized for low-latency real-time audio-visual interaction; uses curriculum-inspired progressive multimodal training with modality-decoupled parallelism sustaining over 90% of text-only training throughput. | https://arxiv.org/pdf/2511.00279 | 2025/meituan/2025-10_longcat-flash-omni.pdf |
| 2025-10 | MiniMax | MiniMax M2 | Announced 2025.10.27 as "MiniMax M2 & Agent: Ingenious in Simplicity". | https://www.minimax.io/news/minimax-m2 | 2025/minimax/2025-10_minimax-m2.pdf |
| 2025-10 | Meituan | LongCat-Video | Foundational video generation model with 13.6B parameters, delivering strong performance across general video generation tasks and particularly excelling at efficient, high-quality long video generation. | https://arxiv.org/pdf/2510.22200 | 2025/meituan/2025-10_longcat-video.pdf |
| 2025-10 | InternLM | Intern-S1 | Multimodal Mixture-of-Experts (MoE) model with 28 billion activated parameters and 241 billion total parameters, continually pre-trained on 5T tokens, including over 2.5T tokens from scientific domains; a key pre-training challenge is preparing large-scale data for low-resource but high-value science domains. | https://arxiv.org/pdf/2508.15763 | 2025/internlm/2025-10_intern-s1.pdf |
| 2025-10 | Google | Gemini 2.5 Computer Use | Based on Gemini 2.5 Pro, so it relies on the ethics and safety evaluations reported for Gemini 2.5 Pro; because usage is restricted to the Gemini 2.5 Computer Use tool, capabilities are limited to browser and mobile UI controls, placing it out of scope for a Frontier Safety Framework assessment. | https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-2-5-Computer-Use-Model-Card.pdf | 2025/google/2025-10_gemini-2.5-computer-use.pdf |
| 2025-09 | Meituan | LongCat-Flash | 560B MoE language model designed for computational efficiency and agentic capabilities; introduces Zero-computation Experts and novel routing for scalable inference. | https://arxiv.org/pdf/2509.01322 | 2025/meituan/2025-09_longcat-flash.pdf |
| 2025-09 | Meituan | LongCat-Flash-Thinking | Efficient 560B MoE reasoning model built on LongCat-Flash; cultivated through long CoT data cold-start and curriculum RL for formal and agentic reasoning. | https://arxiv.org/pdf/2509.18883 | 2025/meituan/2025-09_longcat-flash-thinking.pdf |
| 2025-09 | Alibaba | Qwen3-Omni | Single multimodal model that, for the first time, maintains state-of-the-art performance across text, image, audio, and video without any degradation relative to single-modal counterparts; supports a wide range of tasks, including voice dialogue, video dialogue, and video reasoning. | https://arxiv.org/pdf/2509.17765 | 2025/alibaba_qwen/2025-09_qwen3-omni.pdf |
| 2025-09 | Google | Gemini 2.5 Flash-Lite | Addition to the hybrid reasoning model family, giving developers the ability to turn the model's thinking on or off; improves on 2.0 Flash-Lite, with strong results on coding, math, science, and reasoning benchmarks. | https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-2-5-Flash-Lite-Model-Card.pdf | 2025/google/2025-09_gemini-2.5-flash-lite.pdf |
| 2025-09 | Google | Gemini 2.5 Flash and Gemini 2.5 Flash Image | Gemini 2.5 Flash is Google's first fully hybrid reasoning model, giving developers the ability to turn a model's thinking on or off; the model card also covers image and audio as additional outputs of Gemini 2.5 Flash, with modality-specific information noted in line. | https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-2-5-Flash-Model-Card.pdf | 2025/google/2025-09_gemini-2.5-flash-and-gemini-2.5-flash-image.pdf |
| 2025-09 | Alibaba | Qwen3.5 | At 32K/256K context lengths, the decoding throughput of Qwen3.5-397B-A17B is 8.6x/19.0x that of Qwen3-Max with comparable performance, and 3.5x/7.2x that of Qwen3-235B-A22B. | https://qwen.ai/blog?id=qwen3.5 | 2025/alibaba_qwen/2025-09_qwen3.5.pdf |
| 2025-09 | Alibaba | Qwen3-Next | Qwen3-Next-80B-A3B-Instruct significantly outperforms Qwen3-30B-A3B-Instruct-2507 and Qwen3-32B (non-thinking), nearly matching the flagship Qwen3-235B-A22B-Instruct-2507; the base model is comparable to (or slightly better than) the dense Qwen3-32B while using less than 10% of its training cost in GPU hours. | https://qwen.ai/blog?id=qwen3-next | 2025/alibaba_qwen/2025-09_qwen3-next.pdf |
| 2025-09 | Alibaba | Qwen3-Max | On Tau2-Bench, a rigorous evaluation of agent tool-calling proficiency, Qwen3-Max-Instruct delivers a breakthrough score of 74.8, surpassing both Claude Opus 4 and DeepSeek V3.1; Qwen3-Max-Thinking, still under active training, is already demonstrating remarkable potential. | https://qwen.ai/blog?id=qwen3-max | 2025/alibaba_qwen/2025-09_qwen3-max.pdf |
| 2025-08 | OpenAI | GPT-5 | Unified system card covering multi-model routing architecture and comprehensive safety evaluations across the GPT-5 model family including reasoning and tool-use capabilities. | https://cdn.openai.com/gpt-5-system-card.pdf | 2025/openai/2025-08_gpt-5.pdf |
| 2025-08 | OpenAI | gpt-oss-120b/20b | Apache 2.0 open-weight MoE models (120B and 20B); model card covers architecture, quantization, and post-training for reasoning and tool use. | https://deploymentsafety.openai.com/gpt-oss | 2025/openai/2025-08_gpt-oss-120b-20b.pdf |
| 2025-08 | Google | Gemma 3 270M | Notable for its low power consumption; for example, it powers a Bedtime Story Generator web app using Transformers.js. | https://developers.googleblog.com/en/introducing-gemma-3-270m/ | 2025/google/2025-08_gemma-3-270m.pdf |
| 2025-08 | InternLM | Intern-S1-mini | Lightweight open-source multimodal reasoning model based on the same techniques as Intern-S1; built on an 8B dense language model (Qwen3) and a 0.3B vision encoder (InternViT), further pretrained on 5 trillion tokens of multimodal data, including over 2.5 trillion scientific-domain tokens. | https://huggingface.co/internlm/Intern-S1-mini | 2025/internlm/2025-08_intern-s1-mini.pdf |
| 2025-08 | Anthropic | Claude Opus 4.5 | For instance, the open-source model Search-R1, when paired with the BM25 retriever, achieves 3.86% accuracy, whereas GPT-5 achieves 55.9%; integrating GPT-5 with the Qwen3-Embedding-8B retriever further raises accuracy to 70.1% with fewer search calls. | https://arxiv.org/pdf/2508.06600 | 2025/anthropic/2025-08_claude-opus-4.5.pdf |
| 2025-08 | Quark (Alibaba) | QuarkMed Medical Foundation Model | QuarkMed is a medical foundation model designed to meet medical-domain demands; unlike general-domain text, medical language is characterized by a highly specialized vocabulary, complex clinical concepts, and nuanced syntax that is often ambiguous and context-dependent. | https://arxiv.org/pdf/2508.11894 | 2025/quark/2025-08_quarkmed-medical-foundation-model.pdf |
| 2025-08 | Google | Gemma 3 | Gemma 3 has a large 128K context window and multilingual support in over 140 languages, and is available in more sizes than previous versions. | https://ai.google.dev/gemma/docs/core/model_card_3 | 2025/google/2025-08_gemma-3.pdf |
| 2025-08 | Google | Gemini 2.5 Deep Think | Enhanced reasoning model in the Gemini 2.5 family that uses parallel thinking and reinforcement learning to test multiple hypotheses at once; its IMO 2025 results are computed as pass@1, while the other results from matharena.ai are best-of-32. | https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-2-5-Deep-Think-Model-Card.pdf | 2025/google/2025-08_gemini-2.5-deep-think.pdf |
| 2025-08 | Anthropic | Claude Opus 4.1 | (No highlights extracted; the archived page export captured a bot-verification screen.) | https://www.anthropic.com/news/claude-opus-4-1 | 2025/anthropic/2025-08_claude-opus-4.1.pdf |
| 2025-07 | xAI | Grok 4 | The archived xAI release-notes export covers later updates (Grok Voice Agent API general availability, Dec 2025; Grok 4.1 Fast in the Enterprise API, Nov 2025) rather than Grok 4 itself. | https://docs.x.ai/docs/release-notes | 2025/xai/2025-07_grok-4.pdf |
| 2025-07 | Moonshot AI | Kimi K2: Open Agentic Intelligence | Mixture-of-Experts (MoE) large language model with 32 billion activated parameters and 1 trillion total parameters; post-training transforms pre-trained priors into actionable behaviors, since agentic capabilities such as multi-step reasoning, long-term planning, and tool use are rare in natural data and costly to scale. | https://arxiv.org/pdf/2507.20534 | 2025/moonshot/2025-07_kimi-k2-open-agentic-intelligence.pdf |
| 2025-07 | Alibaba | Qwen3-Coder | The code version of Qwen3, the large language model series developed by the Qwen team; the repository covers chat and fill-in-the-middle usage along with agentic use-case examples. | https://github.com/QwenLM/Qwen3-Coder | 2025/alibaba_qwen/2025-07_qwen3-coder.pdf |
| 2025-07 | Zhipu AI | GLM-4.5 | GLM-4.5 and GLM-4.5-Air are optimized for tool invocation, web browsing, software engineering, and front-end development; on benchmarks such as SWE-Bench Verified, the series lies on the Pareto frontier of performance versus parameters, delivering optimal performance at its scale. | https://docs.z.ai/guides/llm/glm-4.5 | 2025/zhipu/2025-07_glm-4.5.pdf |
| 2025-06 | Google | Gemma 3n | Capable of multimodal input (text, image, video, and audio) and text output, with open weights for pre-trained and instruction-tuned variants; features efficient parameter management technology. | https://ai.google.dev/gemma/docs/gemma-3n/model_card | 2025/google/2025-06_gemma-3n.pdf |
| 2025-06 | Google | Gemini 2.5 Pro | Google's most advanced model for complex tasks; comprehends vast datasets and challenging problems from different information sources, including text, audio, images, video, and even entire code repositories; the model card's naming convention reflects general availability and differentiates between Gemini 2.5 Pro versions. | https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-2-5-Pro-Model-Card.pdf | 2025/google/2025-06_gemini-2.5-pro.pdf |
| 2025-05 | ByteDance | Seed1.5-VL | Vision-language foundation model (MoE 20B active / 532M vision encoder) designed for general-purpose multimodal understanding and reasoning with enhanced visual capabilities. | https://arxiv.org/pdf/2505.07062 | 2025/bytedance/2025-05_seed1.5-vl.pdf |
| 2025-05 | Tencent | Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought | Large hybrid Transformer-Mamba Mixture-of-Experts (MoE) model featuring an adaptive long-short chain-of-thought (CoT) mechanism that dynamically switches between rapid responses for simple queries and deep "thinking" modes for complex problems, optimizing computational resources. | https://arxiv.org/pdf/2505.15431 | 2025/tencent/2025-05_hunyuan-turbos-advancing-large-language-models-through-mamba-transformer-synergy-and-adaptive-chain-of-thought.pdf |
| 2025-04 | OpenAI | o3 / o4-mini | Reasoning models combining state-of-the-art reasoning with full tool capabilities — web browsing, Python, image analysis, image generation, canvas, automations, file search, and memory. | https://cdn.openai.com/pdf/2221c875-02dc-4789-800b-e7758f3722c1/o3-and-o4-mini-system-card.pdf | 2025/openai/2025-04_o3-o4-mini.pdf |
| 2025-04 | Meta | Llama 4 Scout/Maverick | First natively multimodal models in the Llama 4 herd; Scout features 10M token context with MoE architecture, Maverick optimized for quality and speed, both distilled from Llama 4 Behemoth. | https://ai.meta.com/blog/llama-4-multimodal-intelligence/ | 2025/meta/2025-04_llama-4-scout-maverick.pdf |
| 2025-04 | Google | Gemini 2.0 Flash-Lite | Google's most cost-efficient model, striking a balance between efficiency and quality for low-cost workflows; each model in the 2.0 family is designed and calibrated to balance quality and performance for its specific downstream applications. | https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-2-0-Flash-Lite-Model-Card.pdf | 2025/google/2025-04_gemini-2.0-flash-lite.pdf |
| 2025-04 | Google | Gemini 2.0 Flash | Offers enhanced multimodal understanding, enabling reasoning across images, video, audio, and text; improves on Gemini 1.5 Flash with enhanced quality at similar speeds. | https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-2-0-Flash-Model-Card.pdf | 2025/google/2025-04_gemini-2.0-flash.pdf |
| 2025-04 | Alibaba | Qwen3 | A key innovation is the integration of thinking mode (for complex, multi-step reasoning) and non-thinking mode (for rapid, context-driven responses) into a unified framework; thanks to the model architecture, the inference costs and per-trillion-token training costs of Qwen3-235B-A22B-Base are much lower than those of Qwen2.5-72B-Base. | https://raw.githubusercontent.com/QwenLM/Qwen3/main/Qwen3_Technical_Report.pdf | 2025/alibaba_qwen/2025-04_qwen3.pdf |
| 2025-03 | Alibaba | Qwen2.5-Omni | End-to-end multimodal model designed to perceive diverse modalities, including text, images, audio, and video, while simultaneously generating text and natural speech responses in a streaming manner. | https://github.com/QwenLM/Qwen2.5-Omni/raw/main/assets/Qwen2.5_Omni.pdf | 2025/alibaba_qwen/2025-03_qwen2.5-omni.pdf |
| 2025-02 | Google | Gemma 2 | The MoE architecture (26B A4B): the 26B is a Mixture-of-Experts model, which is why its baseline memory requirement is much closer to a dense 26B model than to a 4B model. | https://storage.googleapis.com/deepmind-media/gemma/gemma-2-report.pdf | 2025/google/2025-02_gemma-2.pdf |
| 2025-02 | Google | Gemma 1 | Enhanced coding and agentic capabilities: notable improvements on coding benchmarks alongside built-in function-calling support, powering highly capable autonomous agents; Gemma 4 introduces built-in support for the system role, enabling more structured and controllable conversations. | https://ai.google.dev/gemma/docs/model_card | 2025/google/2025-02_gemma-1.pdf |
| 2025-01 | Alibaba | Qwen2.5-1M | Compared to the previous 128K version, the Qwen2.5-1M series has significantly enhanced long-context capabilities through long-context pre-training and post-training; Qwen2.5-14B-Instruct-1M significantly outperforms GPT-4o-mini on long-context tasks while supporting contexts eight times longer. | https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2.5-1M/Qwen2_5_1M_Technical_Report.pdf | 2025/alibaba_qwen/2025-01_qwen2.5-1m.pdf |
| 2025-01 | Alibaba | Qwen2.5-Max | Available in Qwen Chat for direct chat, artifacts, search, and more; the Qwen2.5-Max API (model name qwen-max-2025-01-25) is also available. | https://qwen.ai/blog?id=qwen2.5-max | 2025/alibaba_qwen/2025-01_qwen2.5-max.pdf |
| 2025-01 | Alibaba | Qwen2.5-VL | Acts as a visual agent that can reason and dynamically direct tools, enabling computer and phone use; proficient at recognizing common objects such as flowers, birds, fish, and insects, and highly capable of analyzing text, charts, icons, graphics, and layouts within images. | https://qwen.ai/blog?id=qwen2.5-vl | 2025/alibaba_qwen/2025-01_qwen2.5-vl.pdf |
| 2025-01 | InternLM | InternLM3 | Trained on only 4 trillion high-quality tokens, saving more than 75% of the training cost compared to other LLMs of similar scale; open-sources an 8-billion-parameter instruction model, InternLM3-8B-Instruct, designed for general-purpose usage and advanced reasoning. | https://internlm.readthedocs.io/en/latest/model_card/InternLM3.html | 2025/internlm/2025-01_internlm3.pdf |
| 2025-01 | DeepSeek | DeepSeek-R1 | DeepSeek-R1-Zero, trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities; DeepSeek-R1 further incorporates multi-stage training and cold-start data before RL to enhance reasoning performance. | https://github.com/deepseek-ai/DeepSeek-R1/raw/main/DeepSeek_R1.pdf | 2025/deepseek/2025-01_deepseek-r1.pdf |