Torch-TensorRT v1.2.0
PyTorch 1.12, Collections based I/O, FX Frontend, torchtrtc custom op support, CMake build system and Community Windows Support
Torch-TensorRT 1.2.0 targets PyTorch 1.12, CUDA 11.6, cuDNN 8.4 and TensorRT 8.4. This release focuses on a couple of key new APIs to handle function I/O that uses collection types, which should enable whole new model classes to be compiled by Torch-TensorRT without source code modification. It also introduces the "FX Frontend", a new frontend for Torch-TensorRT which leverages FX, a high-level IR built into PyTorch with extensive Python APIs. For use cases which do not need to run outside of Python, this may be a strong option to try, as it is easily extensible in a familiar development environment. In Torch-TensorRT 1.2.0, the FX frontend should be considered beta-level in stability.
`torchtrtc` has received improvements which target the ability to handle operators outside of the core PyTorch op set. This includes custom operators from libraries such as `torchvision` and `torchtext`. Similarly, users can provide custom converters to `torchtrtc` to extend the compiler's support from the command line, instead of having to write an application to do so. Finally, Torch-TensorRT introduces community-supported Windows and CMake support.

New Dependencies
nvidia-tensorrt
For previous versions of Torch-TensorRT, users had to install TensorRT via a system package manager and modify their `LD_LIBRARY_PATH` in order to set up Torch-TensorRT. Now users should install the TensorRT Python API as part of the installation procedure. Installing the TensorRT pip package allows Torch-TensorRT to automatically load the TensorRT libraries without any modification to environment variables, and it is also a necessary dependency for the FX frontend. This can be done via the following steps:
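A minimal sketch of those steps, assuming installation through NVIDIA's pip index (the exact version pin is an assumption based on this release targeting TensorRT 8.4):

```sh
# Make NVIDIA's package index visible to pip, then install the
# TensorRT Python package. The 8.4.x pin below is an assumption
# based on this release targeting TensorRT 8.4.
pip install nvidia-pyindex
pip install "nvidia-tensorrt==8.4.*"
```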
torchvision
Some FX frontend converters are designed to target operators from 3rd party libraries like torchvision. As such, you must have torchvision installed in order to use them. However, this dependency is optional for cases where you do not need this support.
Jetson
Starting from this release, we will be distributing precompiled binaries of our NGC release branches for aarch64 (as well as x86_64), beginning with ngc/22.07. These releases are designed to be paired with NVIDIA-distributed builds of PyTorch, including the NGC containers and Jetson builds, and are equivalent to the prepackaged distribution of Torch-TensorRT that comes in the containers. They represent the state of the master branch at the time of branch cutting, so they may lag in features by a month or so. These releases will come separately from minor version releases like this one. Therefore, going forward, these NGC releases should be the primary release channel used on Jetson (including for building from source).
NOTE: NGC PyTorch builds are not identical to builds you might install through normal channels like pytorch.org, and in the past this has caused portability issues between pytorch.org builds and NGC builds. Therefore, in workflows such as exporting a TorchScript module on an x86 machine and then compiling on Jetson, we strongly recommend using the NGC container release on x86 for your host machine operations. More information about Jetson support can be found alongside the 22.07 release (https://github.com/pytorch/TensorRT/releases/tag/v1.2.0a0.nv22.07).
Collections based I/O [Experimental]
Torch-TensorRT has previously operated under the assumption that `nn.Module` forward functions can be trivially reduced to the form `forward([Tensor]) -> [Tensor]`. Typically this implies functions of the form `forward(Tensor, Tensor, ..., Tensor) -> (Tensor, Tensor, ..., Tensor)`. However, as model complexity increases, grouping inputs may make it easier to manage many inputs. Therefore, function signatures similar to `forward([Tensor], (Tensor, Tensor)) -> [Tensor]` or `forward((Tensor, Tensor)) -> (Tensor, (Tensor, Tensor))` may become more common. In Torch-TensorRT 1.2.0, more of these kinds of use cases are supported using the new experimental `input_signature` compile spec API. This API allows users to group Input specs similar to how they might group the input Tensors they would use to call the original module's forward function. This informs Torch-TensorRT how to map a Tensor input from its location in a group to the engine, and from the engine back into its grouping returned to the user.

To make this concrete, consider the following standard case:
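A sketch of what that standard case might look like (the module and shapes here are illustrative, not taken from the release notes):

```python
import torch
import torch.nn as nn
import torch_tensorrt

# Hypothetical module taking two explicit Tensor inputs
class StandardTensorInput(nn.Module):
    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return x + y

module = torch.jit.script(StandardTensorInput().eval().cuda())

# Existing list-based API: one Input spec per positional Tensor argument
trt_module = torch_tensorrt.compile(
    module,
    inputs=[
        torch_tensorrt.Input(shape=[1, 3]),
        torch_tensorrt.Input(shape=[1, 3]),
    ],
    enabled_precisions={torch.float},
)

out = trt_module(torch.randn(1, 3).cuda(), torch.randn(1, 3).cuda())
```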
Here a user has defined two explicit Tensor inputs and used the existing list-based API to define the input specs.
With Torch-TensorRT, the following use cases are now possible using the new `input_signature` API:
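For instance, a sketch of compiling a module whose forward takes a tuple of Tensors (the exact nesting expected by `input_signature` is an assumption based on the description below):

```python
from typing import Tuple

import torch
import torch.nn as nn
import torch_tensorrt

# Hypothetical module taking a single Tuple of Tensors
class TupleInput(nn.Module):
    def forward(self, z: Tuple[torch.Tensor, torch.Tensor]) -> torch.Tensor:
        return z[0] + z[1]

module = torch.jit.script(TupleInput().eval().cuda())

# Group the Input specs the same way the Tensors would be grouped
# when calling forward: one argument, which is a tuple of two specs.
trt_module = torch_tensorrt.compile(
    module,
    input_signature=(
        (torch_tensorrt.Input(shape=[1, 3]), torch_tensorrt.Input(shape=[1, 3])),
    ),
)

a, b = torch.randn(1, 3).cuda(), torch.randn(1, 3).cuda()
out = trt_module((a, b))
```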
Note how the input specs (in this case just example tensors) are provided to the compiler. The `input_signature` argument expects a `Tuple[Union[torch.Tensor, torch_tensorrt.Input, List, Tuple]]` grouped in a format representative of how the function would be called; in these cases it is just a list or tuple of specs. More advanced cases are supported as well:
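For example, mixing a list and a tuple of inputs might look like the following (again a sketch; the grouping format is assumed to mirror the call signature):

```python
from typing import List, Tuple

import torch
import torch.nn as nn
import torch_tensorrt

# Hypothetical module mixing a List and a Tuple of Tensors
class ListTupleInput(nn.Module):
    def forward(
        self, xs: List[torch.Tensor], yz: Tuple[torch.Tensor, torch.Tensor]
    ) -> torch.Tensor:
        return xs[0] + xs[1] + yz[0] + yz[1]

module = torch.jit.script(ListTupleInput().eval().cuda())

# Specs are grouped exactly like the call arguments:
# a list of two specs, then a tuple of two specs.
trt_module = torch_tensorrt.compile(
    module,
    input_signature=(
        [torch_tensorrt.Input(shape=[1, 3]), torch_tensorrt.Input(shape=[1, 3])],
        (torch_tensorrt.Input(shape=[1, 3]), torch_tensorrt.Input(shape=[1, 3])),
    ),
)
```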
These features are also supported in the C++ API.
Currently this feature should be considered experimental; the APIs may be subject to change or be folded into existing APIs. There are also limitations introduced by using this feature, including the following:

- Certain collection types (e.g. `Dict`, `namedtuple`) are not supported
- `require_full_compilation` cannot be used while using this feature

These limitations will be addressed in subsequent versions.
Adding FX frontend to Torch-TensorRT [Beta]
This release adds FX as one of the supported IRs for converting torch models to TensorRT, through the new FX frontend. At a high level, this path transforms the model into (or directly consumes) an FX graph and, similar to the TorchScript frontend, converts the graph to TensorRT through the use of a library of converters. The key difference is that it is implemented purely in Python. The role of this FX frontend is to supplement the TorchScript lowering path and to provide users better ease of use and easier extensibility in use cases where removing Python as a dependency is not strictly necessary. Detailed user instructions can be found in the documentation.
The FX path examples are located under `//examples/fx`, and the FX path unit tests are located under `//py/torch_tensorrt/fx/tests`.
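A minimal sketch of selecting the FX frontend through the top-level compile API (the `resnet18` model and input shape are illustrative):

```python
import torch
import torch_tensorrt
import torchvision.models as models

model = models.resnet18(pretrained=True).eval().cuda()
inputs = [torch.randn(1, 3, 224, 224).cuda()]

# ir="fx" routes compilation through the FX frontend instead of
# the default TorchScript path.
trt_model = torch_tensorrt.compile(
    model,
    ir="fx",
    inputs=inputs,
    enabled_precisions={torch.float},
)

out = trt_model(*inputs)
```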
Custom operators and converters in Torch-TensorRT
While both the C++ API and Python API provide systems to include and convert custom operators in your model (for instance, those implemented in `torchvision`), `torchtrtc` has been limited to the core opset. In Torch-TensorRT 1.2.0, two new flags have been added to `torchtrtc`. These arguments accept paths to `.so` or DLL files which define custom operators for PyTorch or custom converters for Torch-TensorRT. These files will get `dlopen`'d at runtime to extend the op and converter libraries. For example:
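A hedged sketch of such an invocation (the flag spellings and library names below are assumptions based on the description above; consult `torchtrtc --help` for the exact names):

```sh
# Compile a TorchScript module while loading a library of custom
# PyTorch ops and a library of custom Torch-TensorRT converters.
# Flag names here are assumptions, not confirmed by the release notes.
torchtrtc model.ts trt_model.ts "(1,3,224,224)" \
    --custom-torch-ops=libcustom_ops.so \
    --custom-converters=libcustom_converters.so
```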
Community CMake and Windows support
Thanks to the great work of @gcuendet and others, CMake and, consequently, Windows support has been added to the project! Users on Linux and Windows can now build the C++ API using this system, and by using `torch_tensorrt_runtime.dll` they can add support for executing Torch-TensorRT programs on Windows in both Python and C++. Detailed information on how to use this build system can be found here: https://pytorch.org/TensorRT/getting_started/installation.html

Bazel will continue to be the primary build system for the project, and all testing and distributed builds will be built and run with Bazel (including future official Windows support), so users should consider it still the canonical version of Torch-TensorRT. However, we aim to ensure as best we can that the CMake system is able to build the project properly, including on Windows. Contributions to continue growing support for this build system and for Windows as a platform are definitely welcome.
Known Limitations
- Certain collection types (e.g. `Dict`, `namedtuple`) are not supported
- `require_full_compilation` cannot be used while using collection-based I/O

Dependencies
=================================

- PyTorch 1.12
- CUDA 11.6
- cuDNN 8.4
- TensorRT 8.4
Operators Supported (TorchScript)
Operators Currently Supported Through Converters
Operators Currently Supported Through Evaluators
What's Changed
New Contributors
Full Changelog: v1.1.1...v1.2.0