# Inference models

https://lambda.ai/inference-models (last updated 2026-03-12)

Lambda's catalog of model cards for the LLMs that matter. Search by model name to get architecture breakdowns, hardware requirements, deployment guides, and throughput benchmarks on NVIDIA GPUs.

## nvidia/NVIDIA-Nemotron-3-Super-120B-A12B

https://lambda.ai/inference-models/nvidia/nvidia-nemotron-3-super-120b-a12b (updated 2026-03-12)

### TL;DR: token throughput (vLLM)

| Hardware | Gen. throughput | TTFT | ITL |
| --- | --- | --- | --- |
| 2× NVIDIA B200 GPUs (NVFP4) | 2,057 tok/s | 4,040 ms | 12 ms |
| 1× NVIDIA B200 GPU (NVFP4) | 1,517 tok/s | 4,455 ms | 16 ms |
| 2× NVIDIA B200 GPUs (FP8) | 1,847 tok/s | 3,948 ms | 13 ms |
| 2× NVIDIA H100 GPUs (FP8) | 1,116 tok/s | 4,557 ms | 24 ms |
| 4× NVIDIA A100 GPUs (BF16) | 553 tok/s | 6,694 ms | 51 ms |
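TTFT (time to first token) and ITL (inter-token latency) can be measured from any OpenAI-compatible endpoint, which is what vLLM exposes when serving a model. Below is a minimal sketch, assuming a server is already running locally; the URL, port, and served model name are assumptions that must match your launch command.

```python
# Sketch: measure TTFT and ITL from a streaming chat completion.
# Assumes an OpenAI-compatible server (e.g. vLLM) at localhost:8000.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

start = time.perf_counter()
timestamps = []
stream = client.chat.completions.create(
    model="nvidia/NVIDIA-Nemotron-3-Super-120B-A12B",  # must match the served name
    messages=[{"role": "user", "content": "Summarize KV caching in one paragraph."}],
    max_tokens=256,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        timestamps.append(time.perf_counter())  # arrival time of each token chunk

ttft = timestamps[0] - start                                 # time to first token
itls = [b - a for a, b in zip(timestamps, timestamps[1:])]   # gaps between tokens
print(f"TTFT: {ttft * 1000:.0f} ms")
print(f"mean ITL: {1000 * sum(itls) / len(itls):.1f} ms")
```

Single-request numbers measured this way will differ from the table above, which reports aggregate figures under concurrent load.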
## allenai/Olmo-Hybrid-Instruct-DPO-7B

https://lambda.ai/inference-models/allenai/olmo-hybrid-instruct-dpo-7b (updated 2026-03-06)

### TL;DR: token throughput (vLLM)

| Hardware | Gen. throughput | TTFT | ITL |
| --- | --- | --- | --- |
| 1× NVIDIA B200 GPU | 1,765 tok/s | 4,424 ms | 14 ms |
| 1× NVIDIA H100 GPU | 1,066 tok/s | 4,665 ms | 25 ms |
| 1× NVIDIA A100 GPU | 551 tok/s | 7,191 ms | 51 ms |
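A 7B model fits on a single GPU, so the simplest way to try it is vLLM's offline Python API rather than a server. A minimal sketch, taking the model ID from this card and assuming vLLM supports it:

```python
# Sketch: offline single-GPU inference with vLLM's Python API.
from vllm import LLM, SamplingParams

llm = LLM(model="allenai/Olmo-Hybrid-Instruct-DPO-7B")  # downloads weights on first run
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain DPO in two sentences."], params)
for out in outputs:
    print(out.outputs[0].text)
```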
## Qwen/Qwen3.5-122B-A10B

https://lambda.ai/inference-models/qwen/qwen3.5-122b-a10b (updated 2026-02-26)

### TL;DR: token throughput

#### SGLang

| Hardware | Gen. throughput | TTFT | ITL |
| --- | --- | --- | --- |
| 4× NVIDIA B200 GPUs | 2,197 tok/s | 1,156 ms | 13 ms |
| 8× NVIDIA H100 GPUs | 1,585 tok/s | 2,613 ms | 18 ms |
| 8× NVIDIA A100 GPUs | 930 tok/s | 4,602 ms | 30 ms |

#### vLLM

| Hardware | Gen. throughput | TTFT | ITL |
| --- | --- | --- | --- |
| 4× NVIDIA B200 GPUs | 1,817 tok/s | 4,904 ms | 13 ms |
| 8× NVIDIA H100 GPUs | 1,843 tok/s | 1,060 ms | 16 ms |
| 8× NVIDIA A100 GPUs | 744 tok/s | 7,612 ms | 35 ms |
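Both SGLang and vLLM expose OpenAI-compatible HTTP endpoints, so one client can drive either side of an A/B comparison like the one above. A sketch, assuming the engines' usual default ports (30000 for SGLang, 8000 for vLLM; adjust to your launch flags):

```python
# Sketch: one client for both engines via their OpenAI-compatible APIs.
import requests

def complete(base_url: str, model: str, prompt: str) -> str:
    resp = requests.post(
        f"{base_url}/v1/chat/completions",
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 200,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

for engine, url in [("sglang", "http://localhost:30000"),
                    ("vllm", "http://localhost:8000")]:
    print(engine, complete(url, "Qwen/Qwen3.5-122B-A10B", "Hello!")[:80])
```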
## Qwen/Qwen3-Coder-Next

https://lambda.ai/inference-models/qwen/qwen3-coder-next (updated 2026-02-26)

### TL;DR: token throughput

#### SGLang

| Hardware | Gen. throughput | TTFT | ITL |
| --- | --- | --- | --- |
| 2× NVIDIA B200 GPUs | 1,877 tok/s | 1,330 ms | 16 ms |
| 4× NVIDIA H100 GPUs | 1,810 tok/s | 1,960 ms | 16 ms |
| 4× NVIDIA A100 GPUs | 1,069 tok/s | 3,969 ms | 26 ms |

#### vLLM

| Hardware | Gen. throughput | TTFT | ITL |
| --- | --- | --- | --- |
| 2× NVIDIA B200 GPUs | 1,721 tok/s | 4,602 ms | 14 ms |
| 4× NVIDIA H100 GPUs | 2,180 tok/s | 933 ms | 14 ms |
| 4× NVIDIA A100 GPUs | 851 tok/s | 6,997 ms | 31 ms |
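For a coding model like this one, a low-temperature chat completion is the typical request shape. A sketch against a local server; the URL and served model name are assumptions:

```python
# Sketch: code-generation request over an OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="Qwen/Qwen3-Coder-Next",  # must match the served name
    messages=[{
        "role": "user",
        "content": "Write a Python function that parses throughput strings "
                   "like '1,877 tok/s' into floats. Return only code.",
    }],
    max_tokens=300,
    temperature=0.2,  # low temperature for deterministic-ish code output
)
print(resp.choices[0].message.content)
```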
## Nanbeige/Nanbeige4.1-3B

https://lambda.ai/inference-models/nanbeige/nanbeige4.1-3b (updated 2026-02-26)

### TL;DR: token throughput

#### SGLang

| Hardware | Gen. throughput | TTFT | ITL |
| --- | --- | --- | --- |
| 1× NVIDIA B200 GPU | 4,547 tok/s | 766 ms | 6 ms |
| 1× NVIDIA H100 GPU | 2,381 tok/s | 1,619 ms | 12 ms |
| 1× NVIDIA A100 GPU | 1,174 tok/s | 3,830 ms | 29 ms |

#### vLLM

| Hardware | Gen. throughput | TTFT | ITL |
| --- | --- | --- | --- |
| 1× NVIDIA B200 GPU | 4,806 tok/s | 526 ms | 6 ms |
| 1× NVIDIA H100 GPU | 2,472 tok/s | 822 ms | 12 ms |
| 1× NVIDIA A100 GPU | 1,050 tok/s | 1,480 ms | 29 ms |
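The throughput and ITL columns are linked: a single stream decodes at roughly 1000/ITL tokens per second, so aggregate generation throughput divided by that rate approximates how many concurrent streams the benchmark sustained. A back-of-the-envelope sketch, not part of Lambda's methodology:

```python
# Sketch: concurrency implied by aggregate throughput vs. per-stream ITL.
def implied_concurrency(gen_tps: float, itl_ms: float) -> float:
    per_stream_tps = 1000.0 / itl_ms  # one token every ITL milliseconds
    return gen_tps / per_stream_tps

# Nanbeige4.1-3B on 1x B200 (vLLM row): 4,806 tok/s aggregate at 6 ms ITL
print(implied_concurrency(4806, 6))   # ~28.8 concurrent streams
```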
## Qwen/Qwen3.5-397B-A17B

https://lambda.ai/inference-models/qwen/qwen3.5-397b-a17b (updated 2026-02-26)

### TL;DR: token throughput

#### SGLang

| Hardware | Gen. throughput | TTFT | ITL |
| --- | --- | --- | --- |
| 8× NVIDIA B200 GPUs | 1,269 tok/s | 1,943 ms | 23 ms |

#### vLLM

| Hardware | Gen. throughput | TTFT | ITL |
| --- | --- | --- | --- |
| 8× NVIDIA B200 GPUs | 1,268 tok/s | 5,024 ms | 20 ms |
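Why only an 8× B200 configuration appears here: weight memory alone rules out smaller boxes for a 397B-parameter model. A rough sizing sketch; the bytes-per-parameter values, the ~180 GB usable HBM per B200, and the 1.3× overhead factor are illustrative assumptions, and KV cache and activations need headroom on top:

```python
# Sketch: back-of-the-envelope minimum GPU count from weight memory.
import math

def min_gpus(params_b: float, bytes_per_param: float,
             gpu_gb: float = 180.0, overhead: float = 1.3) -> int:
    weight_gb = params_b * bytes_per_param  # e.g. 397B at FP8 ~= 397 GB
    return math.ceil(weight_gb * overhead / gpu_gb)

print(min_gpus(397, 1.0))  # FP8  -> 3 GPUs for weights alone
print(min_gpus(397, 2.0))  # BF16 -> 6; 8 leaves room for long-context KV cache
```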
## zai-org/GLM-5

https://lambda.ai/inference-models/zai-org/glm-5 (updated 2026-02-18)

### TL;DR: token throughput (SGLang)

| Hardware configuration | Generation throughput (tok/s) | Total throughput (tok/s) | TTFT (ms) | ITL (ms) | Prompts | Tokens in | Tokens out | Parallel requests |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NVIDIA HGX B200 | 700 | 6,300 | 1,662 | 103 | 256 | 4,194,304 | 524,288 | 32 |
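The two throughput columns relate through the benchmark's token mix: total throughput counts prompt and generated tokens, generation throughput only generated ones, so their ratio is (tokens in + tokens out) / tokens out. Checking that against the GLM-5 row:

```python
# Worked check: total throughput = generation throughput x (in + out) / out.
tokens_in, tokens_out = 4_194_304, 524_288   # 8:1 input:output mix
gen_tps = 700
total_tps = gen_tps * (tokens_in + tokens_out) / tokens_out
print(total_tps)  # 6300.0, matching the table
```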
<div class="hs-featured-image-wrapper"> <a href="proxy.php?url=https://lambda.ai/inference-models/zai-org/glm-4.7-flash" title="" class="hs-featured-image-link"> <img src="proxy.php?url=https://lambda.ai/hubfs/web-static/images/llm-pages/llm-how-to-deploy-glm-4-7-flash-on-lambda-1771375310274.png" alt="How to deploy GLM-4.7-Flash on Lambda featured image" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"> </a> </div> <h2>TL;DR: token throughput (SGLang)</h2> <div class="table-wrapper"> <table> <thead> <tr> <th>Hardware configuration</th> <th>Generation throughput (tok/s)</th> <th>Total throughput (tok/s)</th> <th>TTFT (ms)</th> <th>ITL (ms)</th> <th>Prompts</th> <th>Tokens in</th> <th>Tokens out</th> <th>Parallel requests</th> </tr> </thead> <tbody> <tr> <td>1× NVIDIA Blackwell B200 GPU</td> <td>902.74</td> <td>8,124.65</td> <td>6,170.78</td> <td>30.61</td> <td>256</td> <td>2,097,152</td> <td>262,144</td> <td>32</td> </tr> <tr> <td>1× NVIDIA H100 GPU</td> <td>660.67</td> <td>5,946.05</td> <td>20,087.41</td> <td>27.24</td> <td>256</td> <td>2,097,152</td> <td>262,144</td> <td>32</td> </tr> </tbody> </table> </div> <div class="hs-featured-image-wrapper"> <a href="proxy.php?url=https://lambda.ai/inference-models/zai-org/glm-4.7-flash" title="" class="hs-featured-image-link"> <img src="proxy.php?url=https://lambda.ai/hubfs/web-static/images/llm-pages/llm-how-to-deploy-glm-4-7-flash-on-lambda-1771375310274.png" alt="How to deploy GLM-4.7-Flash on Lambda featured image" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"> </a> </div> <h2>TL;DR: token throughput (SGLang)</h2> <div class="table-wrapper"> <table> <thead> <tr> <th>Hardware configuration</th> <th>Generation throughput (tok/s)</th> <th>Total throughput (tok/s)</th> <th>TTFT (ms)</th> <th>ITL (ms)</th> <th>Prompts</th> <th>Tokens in</th> <th>Tokens out</th> <th>Parallel requests</th> </tr> </thead> <tbody> <tr> <td>1× NVIDIA Blackwell B200 GPU</td> <td>902.74</td> <td>8,124.65</td> <td>6,170.78</td> <td>30.61</td> <td>256</td> <td>2,097,152</td> <td>262,144</td> <td>32</td> </tr> <tr> <td>1× NVIDIA H100 GPU</td> <td>660.67</td> <td>5,946.05</td> <td>20,087.41</td> <td>27.24</td> <td>256</td> <td>2,097,152</td> <td>262,144</td> <td>32</td> </tr> </tbody> </table> </div> <img src="proxy.php?url=https://track.hubspot.com/__ptq.gif?a=21998649&amp;k=14&amp;r=https%3A%2F%2Flambda.ai%2Finference-models%2Fzai-org%2Fglm-4.7-flash&amp;bu=https%253A%252F%252Flambda.ai%252Finference-models&amp;bvt=rss" alt="" width="1" height="1" style="min-height:1px!important;width:1px!important;border-width:0!important;margin-top:0!important;margin-bottom:0!important;margin-right:0!important;margin-left:0!important;padding-top:0!important;padding-bottom:0!important;padding-right:0!important;padding-left:0!important; "> Wed, 18 Feb 2026 00:49:35 GMT https://lambda.ai/inference-models/zai-org/glm-4.7-flash 2026-02-18T00:49:35Z Lambda arcee-ai/Trinity-Large-Preview https://lambda.ai/inference-models/arcee-ai/trinity-large-preview <div class="hs-featured-image-wrapper"> <a href="proxy.php?url=https://lambda.ai/inference-models/arcee-ai/trinity-large-preview" title="" class="hs-featured-image-link"> <img src="proxy.php?url=https://lambda.ai/hubfs/web-static/images/llm-pages/llm-how-to-deploy-trinity-large-preview-on-lambda-1771375312470.png" alt="How to deploy Trinity Large Preview on Lambda featured image" class="hs-featured-image" style="width:auto 
## arcee-ai/Trinity-Large-Preview

https://lambda.ai/inference-models/arcee-ai/trinity-large-preview (updated 2026-02-18)

### TL;DR: token throughput (SGLang)

| Hardware configuration | Generation throughput (tok/s) | Total throughput (tok/s) | TTFT (ms) | ITL (ms) | Prompts | Tokens in | Tokens out | Parallel requests |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NVIDIA HGX B200 | 1,735 | 15,611 | 1,850 | 17 | 256 | 2,097,152 | 262,144 | 32 |
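The same columns imply a sustained request rate: total wall time is about tokens out divided by generation throughput, and prompts divided by that gives completed requests per second. An approximation that ignores warm-up, not a reported metric:

```python
# Worked estimate: sustained request rate implied by the Trinity row.
prompts, tokens_out, gen_tps = 256, 262_144, 1_735
wall_s = tokens_out / gen_tps   # ~151 s of steady decode
print(prompts / wall_s)         # ~1.7 completed requests per second
```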
<a href="proxy.php?url=https://lambda.ai/inference-models/minimaxai/minimax-m2.5" title="" class="hs-featured-image-link"> <img src="proxy.php?url=https://lambda.ai/hubfs/web-static/images/llm-pages/llm-how-to-deploy-minimax-m2-5-on-lambda-1770998118656.png" alt="How to deploy MiniMax M2.5 on Lambda featured image" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"> </a> </div> <h2>TL;DR: token throughput (SGLang)</h2> <div class="table-wrapper"> <table> <thead> <tr> <th>Hardware configuration</th> <th>Generation throughput (tok/s)</th> <th>Total throughput (tok/s)</th> <th>TTFT (ms)</th> <th>ITL (ms)</th> <th>Prompts</th> <th>Tokens in</th> <th>Tokens out</th> <th>Parallel requests</th> </tr> </thead> <tbody> <tr> <td>2× NVIDIA B200 GPU</td> <td>896</td> <td>8,062</td> <td>3,091</td> <td>36</td> <td>512</td> <td>4,194,304</td> <td>524,288</td> <td>32</td> </tr> <tr> <td>4× NVIDIA H100 GPU</td> <td>849</td> <td>7,644</td> <td>13,131</td> <td>27</td> <td>512</td> <td>4,194,304</td> <td>524,288</td> <td>32</td> </tr> </tbody> </table> </div> <img src="proxy.php?url=https://track.hubspot.com/__ptq.gif?a=21998649&amp;k=14&amp;r=https%3A%2F%2Flambda.ai%2Finference-models%2Fminimaxai%2Fminimax-m2.5&amp;bu=https%253A%252F%252Flambda.ai%252Finference-models&amp;bvt=rss" alt="" width="1" height="1" style="min-height:1px!important;width:1px!important;border-width:0!important;margin-top:0!important;margin-bottom:0!important;margin-right:0!important;margin-left:0!important;padding-top:0!important;padding-bottom:0!important;padding-right:0!important;padding-left:0!important; "> Fri, 13 Feb 2026 16:03:16 GMT https://lambda.ai/inference-models/minimaxai/minimax-m2.5 2026-02-13T16:03:16Z Lambda