-
❓ Question
Hi, I saw that one of the lowering passes TRTorch has lowers linear to mm + add. I'm wondering what the reason behind this is. Does TensorRT provide better performance with a matmul layer + elementwise sum layer than with a fully connected layer? Or does breaking it down help the fusion process in TensorRT?
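For context, the lowering in question relies on the identity linear(x) = x @ W.T + b. A minimal PyTorch sketch checking it (the shapes here are illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 16)              # batch of 4, 16 input features
w = torch.randn(8, 16)              # nn.Linear stores weight as (out_features, in_features)
b = torch.randn(8)

y_linear = F.linear(x, w, b)        # what aten::linear computes
y_lowered = torch.mm(x, w.t()) + b  # the mm + add form the pass rewrites to

assert torch.allclose(y_linear, y_lowered, atol=1e-6)
```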
Replies: 2 comments
-
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html#optimize-layer
It says: "New development is encouraged to use MatrixMultiply in preference to FullyConnected layers for consistency of interface. Matrix multiplication is generally significantly faster in FP16 Tensor Cores compared to FP32."
Fully connected layers are expressed as convolutions, so I'm not sure there would be any perf difference for the layer dimensions typically used in final classification layers.
One reason to do this in TRTorch is that we usually keep our converter library light. Any operation that can be expressed as a composition of existing converters is handled that way, unless we know for sure there would be a big performance penalty.
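To make the convolution point concrete, here is a minimal sketch (shapes illustrative, not TRTorch code) reproducing a linear layer as a 1x1 convolution:

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 16)   # batch of 4, 16 input features
w = torch.randn(8, 16)   # (out_features, in_features)
b = torch.randn(8)

y_linear = F.linear(x, w, b)

# FullyConnected as a convolution: treat features as channels over a
# 1x1 spatial extent and use the linear weights as 1x1 kernels.
y_conv = F.conv2d(x.view(4, 16, 1, 1), w.view(8, 16, 1, 1), b).view(4, 8)

assert torch.allclose(y_linear, y_conv, atol=1e-5)
```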
-
Going to move this thread to discussions 😄