joe1chief/awesome-llm-tech-reports


Awesome LLM Technical Reports (2025-01 ~ 2026-04)

A curated, structured local archive of frontier LLM / multimodal / medical-vertical model documentation — papers, system cards, model cards, and official blog posts — organized by year / company.



Table of Contents

  • Project Scope
  • Release Timeline
  • Monthly Density Snapshot
  • Company Quick Links
  • Company Directory Index
  • Model Index (Folded by Year)

Project Scope

  • Systematically archives major model releases from January 2025 to April 2026 across LLM, multimodal, and medical-vertical domains.
  • Downloads official papers, system cards, and model cards as local PDFs; exports web-only blog pages to PDF via a headless browser.
  • Provides a single searchable Markdown index sorted in reverse chronological order.
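The index-building step above can be sketched as a small script. This is a minimal sketch, not the repository's actual tooling; `build_index` and the sample paths are illustrative, assuming only the `<year>/<company>/<YYYY-MM>_<slug>.pdf` naming convention used throughout this archive:

```python
import re

# Archive convention assumed here: <year>/<company>/<YYYY-MM>_<model-slug>.pdf
PATTERN = re.compile(r"^\d{4}/(?P<org>[^/]+)/(?P<date>\d{4}-\d{2})_(?P<slug>[^/]+)\.pdf$")

def build_index(local_files):
    """Emit Markdown table rows for the archive, newest release first."""
    rows = []
    for path in local_files:
        m = PATTERN.match(path)
        if m is None:
            continue  # skip files that do not follow the naming convention
        rows.append((m["date"], m["org"], m["slug"], path))
    # Reverse chronological order: sort on the YYYY-MM prefix, descending
    rows.sort(key=lambda r: r[0], reverse=True)
    return [f"| {date} | {org} | {slug} | {path} |" for date, org, slug, path in rows]
```

Running a walker over the `2025/` and `2026/` folders and feeding the paths to `build_index` yields rows ready to paste into a Markdown index table.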

Release Timeline

Legend (Camp Colors): OpenAI · Anthropic · Google · China-based Labs · Other Global
Impact Highlight: highlighted nodes mark ecosystem-shaping releases (community discussion, benchmark influence, or deployment adoption).


Monthly Density Snapshot


Bubble size follows the release count from the model index table.
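The per-month release counts behind the snapshot can be recomputed from the local file paths. A minimal sketch under the same `<YYYY-MM>_<slug>.pdf` filename assumption; `monthly_density` is an illustrative name, not repository tooling:

```python
import re
from collections import Counter

def monthly_density(local_files):
    """Count archived releases per YYYY-MM month; bubble sizes follow these counts."""
    months = []
    for path in local_files:
        # Pull the YYYY-MM release prefix out of each archived filename
        m = re.search(r"/(\d{4}-\d{2})_", path)
        if m:
            months.append(m.group(1))
    return Counter(months)
```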

Company Quick Links

2026: Zhipu AI · Google · OpenAI · MiniMax · Meituan · NVIDIA · Microsoft · InternLM · Anthropic · InclusionAI (Ant Group) · Snowflake · ByteDance · Moonshot AI

2025: StepFun · Zhipu AI · MiniMax · Meituan · Allen AI · Alibaba · Google · NVIDIA · xAI · OpenAI · Anthropic · InternLM · Quark · Moonshot AI · ByteDance · Tencent · Meta · DeepSeek

Company Directory Index

  • Alibaba / Qwen: 2025/alibaba_qwen/
  • Allen AI: 2025/allenai/
  • Anthropic: 2025/anthropic/, 2026/anthropic/
  • ByteDance: 2025/bytedance/, 2026/bytedance/
  • DeepSeek: 2025/deepseek/
  • Google: 2025/google/, 2026/google/
  • InclusionAI (Ant Group): 2026/inclusionai/
  • InternLM: 2025/internlm/, 2026/internlm/
  • Meituan: 2025/meituan/, 2026/meituan/
  • Meta: 2025/meta/
  • Microsoft: 2026/microsoft/
  • MiniMax: 2025/minimax/, 2026/minimax/
  • Moonshot AI: 2025/moonshot/, 2026/moonshot/
  • NVIDIA: 2025/nvidia/, 2026/nvidia/
  • OpenAI: 2025/openai/, 2026/openai/
  • Quark (Alibaba): 2025/quark/
  • Snowflake: 2026/snowflake/
  • StepFun: 2025/stepfun/
  • Tencent: 2025/tencent/
  • Zhipu AI: 2025/zhipu/, 2026/zhipu/
  • xAI: 2025/xai/

Model Index (Folded by Year)

2026 (25 models)
| Release Date | Organization | Model | Core Highlights (from PDF) | Official Link | Local File |
|---|---|---|---|---|---|
| 2026-04 | Zhipu AI | GLM-5V-Turbo | Across multimodal coding and agentic tasks, as well as pure-text coding, GLM-5V-Turbo delivers strong performance with a smaller model size. During RL, the model is jointly optimized across 30+ task types, spanning STEM, grounding, video, GUI agents, and coding agents, yielding more robust gains in perception, reasoning, and agentic execution. | https://docs.z.ai/guides/vlm/glm-5v-turbo | 2026/zhipu/2026-04_glm-5v-turbo.pdf |
| 2026-04 | Google | Gemma 4 | Featuring both dense and Mixture-of-Experts (MoE) architectures, Gemma 4 models are well suited for text generation, coding, reasoning, agentic workflows, and multimodal understanding. | https://ai.google.dev/gemma/docs/core/model_card_4?utm_source=deepmind.google&utm_medium=referral&utm_campaign=gdm&utm_content | 2026/google/2026-04_gemma-4.pdf |
| 2026-03 | OpenAI | GPT-5.4 Thinking | Frontier reasoning model that unifies recent gains in coding, agentic workflows, and deep web research, while adding high-capability cybersecurity mitigations and stronger chain-of-thought monitoring. | https://deploymentsafety.openai.com/gpt-5-4-thinking/gpt-5-4-thinking.pdf | 2026/openai/2026-03_gpt-5.4-thinking.pdf |
| 2026-03 | OpenAI | GPT-5.3 Instant | General-purpose GPT-5 update tuned for richer web-grounded answers, smoother follow-up behavior, fewer dead ends and caveats, and improved everyday conversational usefulness. | https://deploymentsafety.openai.com/gpt-5-3-instant/gpt-5-3-instant.pdf | 2026/openai/2026-03_gpt-5.3-instant.pdf |
| 2026-03 | Google | Gemini 3.1 Flash-Lite | Evaluated across a range of benchmarks covering speed, reasoning, multimodal capabilities, factuality, agentic tool use, multilingual performance, coding, and long context. Model dependencies: based on Gemini 3 Pro. | https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-1-Flash-Lite-Model-Card.pdf | 2026/google/2026-03_gemini-3.1-flash-lite.pdf |
| 2026-03 | Google | Gemini 3.1 Flash Live | Real-time multimodal model with native audio input/output, 128K context, and evaluation emphasis on low-latency voice and video interactions, conversational audio understanding, and multi-step function use. | https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-1-Flash-Live-Model-Card.pdf | 2026/google/2026-03_gemini-3.1-flash-live.pdf |
| 2026-03 | MiniMax | MiniMax M2.7 | M2.7 is our first model deeply participating in its own evolution. It is capable of building complex agent harnesses and completing highly elaborate productivity tasks, leveraging capabilities such as Agent Teams, complex Skills, and dynamic tool search. | https://www.minimax.io/news/minimax-m27-en | 2026/minimax/2026-03_minimax-m2.7.pdf |
| 2026-03 | Meituan | LongCat-Next | A native multimodal model that processes text, vision, and audio under a single autoregressive objective with minimal modality-specific design. Introduces Discrete Native Autoregressive (DiNA), a unified framework that represents multimodal information within a shared discrete space, enabling consistent and principled autoregressive modeling across modalities. | https://arxiv.org/pdf/2603.27538 | 2026/meituan/2026-03_longcat-next.pdf |
| 2026-03 | Meituan | LongCat-Flash-Prover | A flagship 560-billion-parameter open-source Mixture-of-Experts (MoE) model that advances native formal reasoning in Lean4 through agentic tool-integrated reasoning (TIR). Training begins from a checkpoint derived from the LongCat Mid-train Base model, an early-stage version of LongCat-Flash-Thinking-2601. | https://arxiv.org/pdf/2603.21065 | 2026/meituan/2026-03_longcat-flash-prover.pdf |
| 2026-03 | NVIDIA | Nemotron 3 Super | Describes the pre-training, post-training, and quantization of Nemotron 3 Super, a 120-billion-parameter (12 billion active) hybrid Mamba-Attention Mixture-of-Experts model: an open, efficient MoE hybrid Mamba-Transformer model for agentic reasoning. | https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-Super-Technical-Report.pdf | 2026/nvidia/2026-03_nemotron-3-super.pdf |
| 2026-03 | Microsoft | Phi-4-reasoning-vision-15B | A compact open-weight multimodal reasoning model that balances reasoning power, efficiency, and training-data needs. A broadly capable model that allows natural interaction across a wide array of vision-language tasks, excelling at math and science reasoning and at understanding user interfaces. | https://arxiv.org/pdf/2603.03975 | 2026/microsoft/2026-03_phi-4-reasoning-vision-15b.pdf |
| 2026-03 | InternLM | Intern-S1-Pro | Intern-S1 is a multimodal Mixture-of-Experts (MoE) model with 28 billion activated parameters and 241 billion total parameters, continually pre-trained on 5T tokens, including over 2.5T tokens from scientific domains. In pre-training, the key challenge is preparing large-scale data for low-resource but high-value science domains. | https://arxiv.org/pdf/2508.15763 | 2026/internlm/2026-03_intern-s1-pro.pdf |
| 2026-03 | Anthropic | Claude Opus 4.6 | Across agentic coding, computer use, tool use, search, and finance, Opus 4.6 is an industry-leading model, often by a wide margin. Claude Opus 4.6 is available today on claude.ai, the API, and all major cloud platforms. | https://www.anthropic.com/news/claude-opus-4-6 | 2026/anthropic/2026-03_claude-opus-4.6.pdf |
| 2026-02 | Zhipu AI | GLM-5 | Next-generation foundation model designed for agentic engineering; adopts DSA (DeepSeek Sparse Attention) on top of a 744B/40B MoE with async RL to strengthen reasoning, coding, and agent capabilities. | https://docs.z.ai/guides/llm/glm-5 | 2026/zhipu/2026-02_glm-5.pdf |
| 2026-02 | OpenAI | GPT-5.3-Codex | The most capable agentic coding model to date, combining the frontier coding performance of GPT-5.2-Codex with the reasoning and professional-knowledge capabilities of GPT-5.2. As explained in the GPT-5.1-Codex-Max system card, the model is not intended for conversational use. | https://deploymentsafety.openai.com/gpt-5-3-codex/gpt-5-3-codex.pdf | 2026/openai/2026-02_gpt-5.3-codex.pdf |
| 2026-02 | MiniMax | MiniMax M2.5 | Extensively RL-trained frontier model; SOTA in coding (80.2% SWE-Bench Verified), agentic tool use, and search; 37% faster than M2.1 at 100 tok/s, with costs as low as $1/hour of continuous operation. | https://www.minimax.io/news/minimax-m25 | 2026/minimax/2026-02_minimax-m2.5.pdf |
| 2026-02 | InclusionAI (Ant Group) | Ling 2.5 | 1T total / 63B active parameters with hybrid linear attention; supports up to 1M context via YaRN, features composite-reward RL for an efficiency-performance balance, and is compatible with mainstream agent platforms. | https://github.com/inclusionAI/Ling-V2.5 | 2026/inclusionai/2026-02_ling-2.5.pdf |
| 2026-02 | Google | Gemini 3.1 Pro | Advanced sparse-MoE multimodal reasoning model with 1M context, stronger agentic coding and long-context performance than Gemini 3 Pro, and published safety assessments under Google DeepMind's Frontier Safety Framework. | https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-1-Pro-Model-Card.pdf | 2026/google/2026-02_gemini-3.1-pro.pdf |
| 2026-02 | Google | Gemini 3.1 Flash Image | Can comprehend input from different information sources, including text, images, audio, and video. Training dataset / model dependencies: based on Gemini 3 Flash. | https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-1-Flash-Image-Model-Card.pdf | 2026/google/2026-02_gemini-3.1-flash-image.pdf |
| 2026-02 | Snowflake | Arctic-AWM | Recent advances in large language models (LLMs) have empowered autonomous agents to perform complex tasks requiring multi-turn interactions with tools and environments; however, scaling such agent training is limited by the lack of diverse and reliable environments. | https://arxiv.org/pdf/2602.10090 | 2026/snowflake/2026-02_arctic-awm.pdf |
| 2026-02 | ByteDance | MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs | MedXIAOHE is a medical vision-language foundation model designed to advance general-purpose medical understanding and reasoning in real-world clinical applications. It achieves state-of-the-art performance across diverse medical benchmarks and surpasses leading closed-source multimodal systems on multiple capabilities. | https://arxiv.org/pdf/2602.12705 | 2026/bytedance/2026-02_medxiaohe-a-comprehensive-recipe-for-building-medical-mllms.pdf |
| 2026-01 | Zhipu AI | GLM-4.7-Flash | Production-ready performance built for enterprise workloads. Key feature: an all-in-one API covering text generation, image generation, document embeddings, NER, summarization, image classification, and more. | https://huggingface.co/zai-org/GLM-4.7-Flash | 2026/zhipu/2026-01_glm-4.7-flash.pdf |
| 2026-01 | Google | MedGemma 1.5 | To our knowledge, the first public release of an open multimodal large language model that can interpret high-dimensional medical data while retaining the ability to interpret general 2D data and text. MedGemma 1.5 4B improves over MedGemma 1 4B on text-based tasks, including medical reasoning (MedQA) and electronic health record information retrieval (EHRQA). | https://research.google/blog/next-generation-medical-image-interpretation-with-medgemma-15-and-medical-speech-to-text-with-medasr/ | 2026/google/2026-01_medgemma-1.5.pdf |
| 2026-01 | Meituan | LongCat-Flash-Thinking-2601 | An open-source Mixture-of-Experts (MoE) reasoning model with 560B total parameters and 27B activated parameters on average per token, featuring strong agentic reasoning capability. | https://arxiv.org/pdf/2601.16725 | 2026/meituan/2026-01_longcat-flash-thinking-2601.pdf |
| 2026-01 | Moonshot AI | Kimi K2.5 | An open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base. Pre-trained on vision-language tokens, K2.5 excels in visual knowledge, cross-modal reasoning, and agentic tool use grounded in visual inputs. | https://github.com/MoonshotAI/Kimi-K2.5 | 2026/moonshot/2026-01_kimi-k2.5.pdf |
2025 (58 models)
Release Date Organization Model Core Highlights (from PDF) Official Link Local File
2025-12 StepFun Step-DeepResearch To address this, we introduce Step-Deep Research, a cost-effective, end-to-end Deep Research agent model Kimi-Researcher [3] supports long-horizon multi-turn search reasoning through end-to-end agentic RL training, employing context management mechanisms and asynchronous rollout systems https://arxiv.org/pdf/2512.20491 2025/stepfun/2025-12_step-deepresearch.pdf
2025-12 Zhipu AI GLM-4.7 This reduces the time developers spend on style “fine-tuning.” GLM-4.7 delivers significant upgrades in layout and aesthetics for office creation Multimodal Interaction and Real-Time Application Development In scenarios requiring cameras, real-time input, and interactive controls, GLM-4.7 demonstrates superior system-level comprehension https://docs.z.ai/guides/llm/glm-4.7 2025/zhipu/2025-12_glm-4.7.pdf
2025-12 MiniMax MiniMax M2.1 2025.12.23 MiniMax M2.1: Significantly Enhanced Multi-Language Programming, Built for Real-World Complex Tasks Access API Coding Plan Try Agent Now MiniMax has been continuously transforming itself in a more AI-native way Today we are releasing updates to the model component, namely MiniMax M2.1, hoping to help more enterprises and individuals find more AI- native ways of working (and living) sooner https://www.minimax.io/news/minimax-m21 2025/minimax/2025-12_minimax-m2.1.pdf
2025-12 Meituan LongCat-Image We are releasing not only multiple model versions for text-to-image and image editing, including checkpoints after mid-training and post-training stages, but also the entire toolchain of training procedure Beyond generation, LongCat-Image also excels in image editing, achieving SOTA results on standard benchmarks with superior editing consistency compared to other open-source works https://arxiv.org/pdf/2512.07584 2025/meituan/2025-12_longcat-image.pdf
2025-12 Allen AI OLMo 3 Our flagship model,Olmo 3.1 Think32B, is the strongest fully-open thinking model released to-date 2.2 Post-training We post-train Olmo 3 Baseinto three model variants: • Olmo 3 Think(Section §4) is trained to perform extended reasoning by generating a structured thinking trace before a final answer https://arxiv.org/pdf/2512.13961 2025/allenai/2025-12_olmo-3.pdf
2025-12 Alibaba Qwen-Image Based on this insight, we introduce Qwen-Image- Layered, an end-to-end diffusion model that directly de- composes a single RGB image into multiple semantically disentangled RGBA layers • 2) Unlike prior methods that decompose images into fore- ground and background [18, 45], we propose a VLD- MMDi T (Variable Layers Decomposition MMDi T), which supports decomposition into a variable number of layers and is compatible with multi-task training https://arxiv.org/pdf/2512.15603 2025/alibaba_qwen/2025-12_qwen-image.pdf
2025-12 Google Gemini 3 Flash Gemini 3 Flash is built off of the Gemini 3 Pro reasoning foundation with thinking levels to control the mix of quality, cost and latency Model dependencies: Gemini 3 Flash is based on Gemini 3 Pro https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-Flash-Model-Card.pdf 2025/google/2025-12_gemini-3-flash.pdf
2025-11 Alibaba Qwen3-VL Most capable vision-language model in the Qwen series; natively supports interleaved contexts up to 256K tokens, seamlessly integrating text, images, and video for multimodal reasoning. https://arxiv.org/pdf/2511.21631 2025/alibaba_qwen/2025-11_qwen3-vl.pdf
2025-11 NVIDIA Nemotron 3 Nano 4B The accuracy shown is the average across all benchmarks: MATH-500, AIME-2024, AIME-2025, GPQA, Live Code Bench v5, and MMLU-Pro.Right: Scaling analysis comparing Nemotron Elastic and Minitron-SSM as model family size grows We validate our approach by training elastic vari- ants of Nemotron Nano V2 12B reasoning model [14], producing both homogeneous and heterogeneous 9B configurations plus a 6B variant, all from a single training run https://arxiv.org/pdf/2511.16664 2025/nvidia/2025-11_nemotron-3-nano-4b.pdf
2025-11 xAI Grok 4.1 Fast November 2025 Copy for LLM View as Markdown Nov 19 Grok 4.1 Fast is available in Enterprise API You can now use Grok 4.1 Fast in the x AI Enterprise API October 2025 Copy for LLM View as Markdown Oct 15 Tools are now generally available New agentic server-side tools including web_search , x_search and code_execution are available https://docs.x.ai/docs/release-notes 2025/xai/2025-11_grok-4.1-fast.pdf
2025-11 OpenAI GPT-5.1-Codex-Max 26 2 1 Introduction GPT-5.1-Codex-Max is our new frontier agentic coding model 3 2 Baseline model safety evaluations 3 2.1 Disallowed content evaluations https://cdn.openai.com/pdf/2a7d98b1-57e5-4147-8d0e-683894d782ae/5p1_codex_max_card_03.pdf 2025/openai/2025-11_gpt-5.1-codex-max.pdf
2025-11 Anthropic Claude Haiku 4.5 Guy Gur-Ari Co-Founder Claude Haiku 4.5 is a leap forward for agentic coding , particularly for sub-agent orchestration and computer use tasks Claude Haiku 4.5 is available everywhere today https://www.anthropic.com/news/claude-haiku-4-5 2025/anthropic/2025-11_claude-haiku-4.5.pdf
2025-11 Anthropic Claude Sonnet 4.5 Claude Sonnet 4.5 is the best coding model in the world The model also shows improved capabilities on a broad range of evaluations including reasoning and math: Claude Sonnet 4.5 is our most powerful model to date https://www.anthropic.com/news/claude-sonnet-4-5 2025/anthropic/2025-11_claude-sonnet-4.5.pdf
2025-11 Google Gemini 3 Pro Image Gemini 3 Pro Image is now Google’s most advanced model for image generation and can comprehend vast datasets, challenging problems from different information sources, including text and images Model dependencies: Gemini 3 Pro Image is based on Gemini 3 Pro https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-Pro-Image-Model-Card.pdf 2025/google/2025-11_gemini-3-pro-image.pdf
2025-11 Google Gemini 3 Pro Gemini 3 Pro is now Google’s most advanced model for complex tasks, and can comprehend vast datasets and challenging problems from different information sources, including text, audio, images, video, and entire code repositories Gemini 3 Pro is trained using reinforcement learning techniques that can leverage multi-step reasoning, problem-solving and theorem-proving data https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-Pro-Model-Card.pdf 2025/google/2025-11_gemini-3-pro.pdf
2025-10 Meituan LongCat-Flash-Omni Open-source omni-modal 560B model (27B activated) optimized for low-latency real-time audio-visual interaction; uses curriculum-inspired progressive multimodal training with modality-decoupled parallelism sustaining over 90% of text-only training throughput. https://arxiv.org/pdf/2511.00279 2025/meituan/2025-10_longcat-flash-omni.pdf
2025-10 MiniMax MiniMax M2 2025.10.27 MiniMax M2 & Agent: Ingenious in Sim plicity Access API Coding Plan Try Agent Now From Day 1 of our founding, we have been committed to the vision of " https://www.minimax.io/news/minimax-m2 2025/minimax/2025-10_minimax-m2.pdf
2025-10 Meituan LongCat-Video Toward this end, we introduce LongCat-Video, a foundational video generation model with 13.6B parameters, delivering strong performance across multiple video generation tasks In this report, we introduce LongCat-Video, a foundational video generation model with 13.6B parameters that delivers strong performance across general video generation tasks, particularly excelling in efficient, high-quality long video generation https://arxiv.org/pdf/2510.22200 2025/meituan/2025-10_longcat-video.pdf
2025-10 InternLM Intern-S1 Intern- S1 is a multimodal Mixture-of-Experts (MoE) model with 28 billion activated parameters and 241 billion total parameters, continually pre-trained on 5T to- kens, including over 2.5T tokens from scientific domains 2 Intern-S1 Technical Report In the pre-training stage, the key challenge is to prepare large-scale pre-training data for those low-resource but high-value science domains https://arxiv.org/pdf/2508.15763 2025/internlm/2025-10_intern-s1.pdf
2025-10 Google Gemini 2.5 Computer Use Ethics and Safety Evaluation Approach: As the Gemini 2.5 Computer Use Model is based off of Gemini 2.5 Pro, we rely on Ethics & Safety evaluations reported for Gemini 2.5 Pro Frontier Safety Assessment: Because model usage is restricted to the Gemini 2.5 Computer Use tool, the scope of capabilities is limited to browser and mobile user interface controls; it is therefore not in scope for a Frontier Safety Framework assessment https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-2-5-Computer-Use-Model-Card.pdf 2025/google/2025-10_gemini-2.5-computer-use.pdf
2025-09 Meituan LongCat-Flash 560B MoE language model designed for computational efficiency and agentic capabilities; introduces Zero-computation Experts and novel routing for scalable inference. https://arxiv.org/pdf/2509.01322 2025/meituan/2025-09_longcat-flash.pdf
2025-09 Meituan LongCat-Flash-Thinking Efficient 560B MoE reasoning model built on LongCat-Flash; cultivated through long CoT data cold-start and curriculum RL for formal and agentic reasoning. https://arxiv.org/pdf/2509.18883 2025/meituan/2025-09_longcat-flash-thinking.pdf
2025-09 Alibaba Qwen3-Omni We present Qwen3-Omni, a single multimodal model that for the first time maintains state-of-the-art performance across text, image, audio, and video without any degra- dation relative to single-modal counterparts Based on these features, Qwen3-Omni supports a wide range of tasks, including but not limited to voice dialogue, video dialogue, and video reasoning https://arxiv.org/pdf/2509.17765 2025/alibaba_qwen/2025-09_qwen3-omni.pdf
2025-09 Google Gemini 2.5 Flash-Lite Gemini 2.5 Flash-Lite is an addition to our hybrid reasoning model family, giving developers the ability to turn a model's thinking on or off This model offers improved performance compared to 2.0 Flash-Lite, with strong results in coding, math, science, and reasoning benchmarks https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-2-5-Flash-Lite-Model-Card.pdf 2025/google/2025-09_gemini-2.5-flash-lite.pdf
2025-09 Google Gemini 2.5 Flash and Gemini 2.5 Flash Image image and audio) as additional outputs of Gemini 2.5 Flash; information specific to these modalities is specified in line (i.e Gemini 2.5 Flash is Google’s first fully hybrid reasoning model, giving developers the ability to turn a model’s thinking on or off https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-2-5-Flash-Model-Card.pdf 2025/google/2025-09_gemini-2.5-flash-and-gemini-2.5-flash-image.pdf
2025-09 Alibaba Qwen3.5 Under the 32k/256k context length, the decoding throughput of Qwen3.5-397B-A17B is 8.6x/19.0x that of Qwen3-Max, and the performance is comparable The decoding throughput of Qwen3.5-397B-A17B is 3.5x/7.2 times that of Qwen3-235B-A22B https://qwen.ai/blog?id=qwen3.5 2025/alibaba_qwen/2025-09_qwen3.5.pdf
2025-09 Alibaba Qwen3-Next Post-training Instruct Model Performance Qwen3-Next-80B-A3B-Instruct significantly outperforms Qwen3-30B-A3B-Instruct-2507 and Qwen3- 32B-Non-thinking, and achieves results nearly matching our flagship Qwen3-235B-A22B- Instruct-2507 This base model achieves performance comparable to (or even slightly better than) the dense Qwen3-32B model, while using less than 10% of its training cost (GPU hours) https://qwen.ai/blog?id=qwen3-next 2025/alibaba_qwen/2025-09_qwen3-next.pdf
2025-09 Alibaba Qwen3-Max Meanwhile, Qwen3-Max-Thinking — still under active training — is already demonstrating remarkable potential Moreover, on Tau2-Bench — a rigorous evaluation of agent tool-calling proficiency — Qwen3-Max-Instruct delivers a breakthrough score of 74.8, surpassing both Claude Opus 4 and Deep Seek V3.1 https://qwen.ai/blog?id=qwen3-max 2025/alibaba_qwen/2025-09_qwen3-max.pdf
2025-08 OpenAI GPT-5 Unified system card covering multi-model routing architecture and comprehensive safety evaluations across the GPT-5 model family including reasoning and tool-use capabilities. https://cdn.openai.com/gpt-5-system-card.pdf 2025/openai/2025-08_gpt-5.pdf
2025-08 OpenAI gpt-oss-120b/20b Apache 2.0 open-weight MoE models (120B and 20B); model card covers architecture, quantization, and post-training for reasoning and tool use. https://deploymentsafety.openai.com/gpt-oss 2025/openai/2025-08_gpt-oss-120b-20b.pdf
2025-08 Google Gemma 3 270M Gemma 3 270M is its low power consumption For example, check out this Bedtime Story Generator web app : Link to Youtube Video (visible only when JS is disabled) Gemma 3 270M used to power a Bedtime Story Generator web app using Transformers.js https://developers.googleblog.com/en/introducing-gemma-3-270m/ 2025/google/2025-08_gemma-3-270m.pdf
2025-08 InternLM Intern-S1-mini We introduce Intern-S1-mini, a lightweight open-source multimodal reasoning model based on the same techniques as Intern-S1 Built upon an 8B dense language model (Qwen3) and a 0.3B Vision encoder (Intern Vi T), Intern-S1-mini has been further pretrained on 5 trillion tokens of multimodal data, including over 2.5 trillion scientific-domain tokens https://huggingface.co/internlm/Intern-S1-mini 2025/internlm/2025-08_intern-s1-mini.pdf
2025-08 Anthropic Claude Opus 4.5 For instance, the open-source model Search-R1, when paired with the BM25 retriever, achieves 3.86% accuracy, whereas the GPT-5 achieves 55.9% Integrating the GPT-5 with the Qwen3-Embedding-8B retriever further enhances its accuracy to 70.1% with fewer search calls https://arxiv.org/pdf/2508.06600 2025/anthropic/2025-08_claude-opus-4.5.pdf
2025-08 Quark (Alibaba) QuarkMed Medical Foundation Model This report introduces Quark Med, a medical foundation model designed to meet these demands Unlike general-domain text, medical language is characterized by a highly specialized vocabulary, complex clinical concepts, and a nuanced syntax that is often ambiguous and context-dependent https://arxiv.org/pdf/2508.11894 2025/quark/2025-08_quarkmed-medical-foundation-model.pdf
2025-08 Google Gemma 3 Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions https://ai.google.dev/gemma/docs/core/model_card_3 2025/google/2025-08_gemma-3.pdf
2025-08 Google Gemini 2.5 Deep Think Gemini 2.5 Deep Think is an enhanced reasoning model that is part of our Gemini 2.5 family that uses parallel thinking and reinforcement learning to test multiple hypotheses at once Gemini 2.5 Deep Think IMO 2025 results are computed as pass@1 while all the other results coming from matharena.ai are best of 32 https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-2-5-Deep-Think-Model-Card.pdf 2025/google/2025-08_gemini-2.5-deep-think.pdf
2025-08 Anthropic Claude Opus 4.1 This page is displayed while the website verifies you are not a bot Check your network settings: 1 https://www.anthropic.com/news/claude-opus-4-1 2025/anthropic/2025-08_claude-opus-4.1.pdf
2025-07 xAI Grok 4 December 2025 Copy for LLM View as Markdown Dec 16 Grok Voice Agent API is released Grok Voice Agent API is generally available November 2025 Copy for LLM View as Markdown Nov 19 Grok 4.1 Fast is available in Enterprise API You can now use Grok 4.1 Fast in the x AI Enterprise API https://docs.x.ai/docs/release-notes 2025/xai/2025-07_grok-4.pdf
2025-07 Moonshot AI Kimi K2: Open Agentic Intelligence Post-training must transform those priors into actionable behaviors, yet agentic capabilities such as multi-step reasoning, long-term planning, and tool use are rare in natural data and costly to scale We introduce Kimi K2, a Mixture-of-Experts (MoE) large language model with 32 billion activated parameters and 1 trillion total parameters https://arxiv.org/pdf/2507.20534 2025/moonshot/2025-07_kimi-k2-open-agentic-intelligence.pdf
2025-07 Alibaba Qwen3-Coder Qwen3-Coder is the code version of Qwen3, the large language model series developed by Qwen team Table of Contents Introduction Key Features Basic Information Quick Start 👉🏻 Chat with Qwen3-Coder Fill in the middle with Qwen3-Coder Use Cases Example: Releasing a Website Example: Desktop Tidy Example: Zombies vs https://github.com/QwenLM/Qwen3-Coder 2025/alibaba_qwen/2025-07_qwen3-coder.pdf
2025-07 Zhipu AI GLM-4.5 GLM-4.5 and GLM-4.5-Air are optimized for tool invocation, web browsing, software engineering, and front-end development On charts such as SWE-Bench Verified, the GLM-4.5 series lies on the Pareto frontier for performance-to-parameter ratio, demonstrating that at the same scale, the GLM-4.5 series delivers optimal performance https://docs.z.ai/guides/llm/glm-4.5 2025/zhipu/2025-07_glm-4.5.pdf
2025-06 Google Gemma 3N They are capable of multimodal input, handling text, image, video, and audio input, and generating text outputs, with open weights for pre-trained and instruction-tuned variants For more information on Gemma 3n's efficient parameter management technology, see the Gemma 3n page https://ai.google.dev/gemma/docs/gemma-3n/model_card 2025/google/2025-06_gemma-3n.pdf
2025-06 Google Gemini 2.5 Pro As Google’s most advanced model for complex tasks, Gemini 2.5 Pro can comprehend vast datasets and challenging problems from different information sources, including text, audio, images, video, and even entire code repositories 1 We’ve updated the naming convention throughout this model card to reflect that Gemini 2.5 Pro is generally available and to clearly differentiate between different Gemini 2.5 Pro versions https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-2-5-Pro-Model-Card.pdf 2025/google/2025-06_gemini-2.5-pro.pdf
2025-05 ByteDance Seed1.5-VL Vision-language foundation model (MoE 20B active / 532M vision encoder) designed for general-purpose multimodal understanding and reasoning with enhanced visual capabilities. https://arxiv.org/pdf/2505.07062 2025/bytedance/2025-05_seed1.5-vl.pdf
2025-05 Tencent Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought Hunyuan-Turbo S features an adaptive long-short chain-of-thought (Co T) mechanism, dynamically switching between rapid responses for simple queries and deep ”thinking” modes for complex problems, optimizing com- putational resources Aiming to further push these boundaries, we introduce Hunyuan-Turbo S, a large hybrid Transformer-Mamba Mixture of Experts (MoE) model https://arxiv.org/pdf/2505.15431 2025/tencent/2025-05_hunyuan-turbos-advancing-large-language-models-through-mamba-transformer-synergy-and-adaptive-chain-of-thought.pdf
2025-04 OpenAI o3 / o4-mini Reasoning models combining state-of-the-art reasoning with full tool capabilities — web browsing, Python, image analysis, image generation, canvas, automations, file search, and memory. https://cdn.openai.com/pdf/2221c875-02dc-4789-800b-e7758f3722c1/o3-and-o4-mini-system-card.pdf 2025/openai/2025-04_o3-o4-mini.pdf
2025-04 Meta Llama 4 Scout/Maverick First natively multimodal models in the Llama 4 herd; Scout features 10M token context with MoE architecture, Maverick optimized for quality and speed, both distilled from Llama 4 Behemoth. https://ai.meta.com/blog/llama-4-multimodal-intelligence/ 2025/meta/2025-04_llama-4-scout-maverick.pdf
2025-04 Google Gemini 2.0 Flash-Lite Gemini 2.0 Flash-Lite is Google's most cost-efficient model, striking a balance between efficiency and quality for low-cost workflows. Each model within the 2.0 family, including Gemini 2.0 Flash-Lite, is carefully designed and calibrated to achieve an optimal balance between quality and performance for its specific downstream applications. https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-2-0-Flash-Lite-Model-Card.pdf 2025/google/2025-04_gemini-2.0-flash-lite.pdf
2025-04 Google Gemini 2.0 Flash Gemini 2.0 Flash offers enhanced multimodal understanding, enabling reasoning across images, video, audio, and text. It improves upon the Gemini 1.5 Flash model and offers enhanced quality at similar speeds. https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-2-0-Flash-Model-Card.pdf 2025/google/2025-04_gemini-2.0-flash.pdf
2025-04 Alibaba Qwen3 A key innovation in Qwen3 is the integration of a thinking mode (for complex, multi-step reasoning) and a non-thinking mode (for rapid, context-driven responses) into a unified framework. Meanwhile, owing to the model architecture, the per-trillion-token inference and training costs of Qwen3-235B-A22B-Base are much lower than those of Qwen2.5-72B-Base. https://raw.githubusercontent.com/QwenLM/Qwen3/main/Qwen3_Technical_Report.pdf 2025/alibaba_qwen/2025-04_qwen3.pdf
2025-03 Alibaba Qwen2.5-Omni Qwen2.5-Omni is an end-to-end multimodal model designed to perceive diverse modalities, including text, images, audio, and video, while simultaneously generating text and natural speech responses in a streaming manner. https://github.com/QwenLM/Qwen2.5-Omni/raw/main/assets/Qwen2.5_Omni.pdf 2025/alibaba_qwen/2025-03_qwen2.5-omni.pdf
2025-02 Google Gemma 2 The 26B (A4B) variant described here is a Mixture of Experts model, which is why its baseline memory requirement is much closer to a dense 26B model than to a 4B model. https://storage.googleapis.com/deepmind-media/gemma/gemma-2-report.pdf 2025/google/2025-02_gemma-2.pdf
2025-02 Google Gemma 1 Enhanced coding and agentic capabilities: achieves notable improvements on coding benchmarks alongside built-in function-calling support, powering highly capable autonomous agents. Native system prompt support: built-in support for the system role enables more structured and controllable conversations. https://ai.google.dev/gemma/docs/model_card 2025/google/2025-02_gemma-1.pdf
2025-01 Alibaba Qwen2.5-1M Compared to the previous 128K version, the Qwen2.5-1M series has significantly enhanced long-context capabilities through long-context pre-training and post-training. Specifically, the Qwen2.5-14B-Instruct-1M model significantly outperforms GPT-4o-mini on long-context tasks and supports contexts eight times longer. https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2.5-1M/Qwen2_5_1M_Technical_Report.pdf 2025/alibaba_qwen/2025-01_qwen2.5-1m.pdf
2025-01 Alibaba Qwen2.5-Max Qwen2.5-Max is available in Qwen Chat, where you can chat with the model directly, or play with artifacts, search, and more. The API for Qwen2.5-Max (model name qwen-max-2025-01-25) is also available. https://qwen.ai/blog?id=qwen2.5-max 2025/alibaba_qwen/2025-01_qwen2.5-max.pdf
2025-01 Alibaba Qwen2.5-VL Being agentic: Qwen2.5-VL acts directly as a visual agent that can reason and dynamically direct tools, and is capable of computer use and phone use. Qwen2.5-VL is not only proficient in recognizing common objects such as flowers, birds, fish, and insects, but is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images. https://qwen.ai/blog?id=qwen2.5-vl 2025/alibaba_qwen/2025-01_qwen2.5-vl.pdf
2025-01 InternLM InternLM3 Remarkably, InternLM3 is trained on only 4 trillion high-quality tokens, saving more than 75% of the training cost compared to other LLMs of similar scale. InternLM3 has open-sourced an 8-billion-parameter instruction model, InternLM3-8B-Instruct, designed for general-purpose usage and advanced reasoning. https://internlm.readthedocs.io/en/latest/model_card/InternLM3.html 2025/internlm/2025-01_internlm3.pdf
2025-01 DeepSeek DeepSeek-R1 DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. To address its remaining issues and further enhance reasoning performance, DeepSeek-R1 incorporates multi-stage training and cold-start data before RL. https://github.com/deepseek-ai/DeepSeek-R1/raw/main/DeepSeek_R1.pdf 2025/deepseek/2025-01_deepseek-r1.pdf
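Every row above follows the same flattened layout: release month, company, model, summary, source URL, local PDF path. A minimal sketch for parsing such rows into records, assuming exactly that layout (the `parse_row` helper and its regex are illustrative, not part of this repository):

```python
import re

# Matches the flattened index-row layout used above:
# "YYYY-MM <company, model, summary...> <source URL> <local .pdf path>"
ROW_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2})\s+"   # release month
    r"(?P<summary>.*?)\s+"         # company, model, and description (left unsplit)
    r"(?P<url>https?://\S+)\s+"    # official source URL
    r"(?P<path>\S+\.pdf)$"         # local archive path
)

def parse_row(row: str) -> dict:
    """Split one flattened index row into named fields."""
    m = ROW_RE.match(row.strip())
    if m is None:
        raise ValueError(f"row does not match the expected layout: {row[:60]!r}")
    return m.groupdict()

record = parse_row(
    "2025-01 DeepSeek DeepSeek-R1 Reasoning via large-scale RL "
    "https://github.com/deepseek-ai/DeepSeek-R1/raw/main/DeepSeek_R1.pdf "
    "2025/deepseek/2025-01_deepseek-r1.pdf"
)
print(record["date"], record["path"])
# -> 2025-01 2025/deepseek/2025-01_deepseek-r1.pdf
```

Because the rows are already in reverse chronological order, sorting parsed records by `date` (descending) reproduces the index ordering.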

Star History

Star History Chart
