Torch-TensorRT#
Torch-TensorRT compiles PyTorch models for NVIDIA GPUs using TensorRT,
delivering significant inference speedups with minimal code changes. It supports
just-in-time compilation via torch.compile and ahead-of-time export via
torch.export, integrating seamlessly with the PyTorch ecosystem.
New to Torch-TensorRT? Start with Installation, then try the quick start guide below.
Ready to optimize a model? See TensorRT Backend for torch.compile for the just-in-time path,
or Compiling Exported Programs with Torch-TensorRT for ahead-of-time export with torch.export.
Quick Start#
Option 1: torch.compile#
You can use Torch-TensorRT anywhere you use torch.compile:
import torch
import torch_tensorrt
model = MyModel().eval().cuda() # define your model here
x = torch.randn((1, 3, 224, 224)).cuda() # define what the inputs to the model will look like
optimized_model = torch.compile(model, backend="tensorrt")
optimized_model(x) # compiled on first run
optimized_model(x) # this will be fast!
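The tensorrt backend also accepts compile-time options passed through torch.compile. A minimal sketch, assuming the dynamo backend's documented option names (enabled_precisions, min_block_size); check your installed version's docs for the full list:
import torch
import torch_tensorrt
model = MyModel().eval().cuda() # define your model here
optimized_model = torch.compile(
    model,
    backend="tensorrt",
    options={
        "enabled_precisions": {torch.float, torch.half},  # also allow FP16 kernels
        "min_block_size": 5,  # minimum number of ops per TensorRT subgraph
    },
)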
Option 2: Export#
If you want to optimize your model ahead-of-time and/or deploy in a C++ environment, Torch-TensorRT provides an export-style workflow that serializes an optimized module. This module can be deployed in PyTorch or with libtorch (i.e. without a Python dependency).
Step 1: Optimize + serialize#
import torch
import torch_tensorrt
model = MyModel().eval().cuda() # define your model here
inputs = [torch.randn((1, 3, 224, 224)).cuda()] # define a list of representative inputs here
trt_gm = torch_tensorrt.compile(model, ir="dynamo", inputs=inputs)
torch_tensorrt.save(trt_gm, "trt.ep", arg_inputs=inputs) # PyTorch only supports Python runtime for an ExportedProgram. For C++ deployment, use a TorchScript file
torch_tensorrt.save(trt_gm, "trt.ts", output_format="torchscript", arg_inputs=inputs)
Step 2: Deploy#
Deployment in Python:#
import torch
import torch_tensorrt
inputs = [torch.randn((1, 3, 224, 224)).cuda()] # your inputs go here
# You can run this in a new python session!
model = torch.export.load("trt.ep").module()
# model = torch_tensorrt.load("trt.ep").module() # this also works
model(*inputs)
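The TorchScript artifact from Step 1 can also be loaded back in Python, which is a convenient way to validate the exact module you plan to ship to C++. A short sketch:
import torch
import torch_tensorrt  # importing registers the TensorRT runtime ops needed to deserialize
inputs = [torch.randn((1, 3, 224, 224)).cuda()]
ts_model = torch.jit.load("trt.ts")
ts_model(*inputs)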
Deployment in C++:#
#include "torch/script.h"
#include "torch_tensorrt/torch_tensorrt.h"
int main() {
    auto trt_mod = torch::jit::load("trt.ts");
    // Example input matching the compile-time shape; substitute your real data
    auto input_tensor = torch::randn({1, 3, 224, 224}, torch::kCUDA);
    auto results = trt_mod.forward({input_tensor});
    return 0;
}
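Note that running a serialized program this way requires linking the Torch-TensorRT runtime components alongside libtorch; in recent releases these ship as a separate runtime library (e.g. libtorchtrt_runtime.so), though the exact name and location depend on your install.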