Dry-Run Flow RFC #1369
mfeliz-cruise started this conversation in RFCs
Replies: 2 comments
- cc: @ncomly-nvidia
- We should make sure that the created methods can be linked to the blocks that produce them in partitioning (i.e. …)
Torch-TensorRT Dry Run Flow
TL;DR
Proposes adding a flow to Torch-TensorRT that embeds TensorRT-convertible partition segments in the output module as methods, with corresponding method calls in the top-level graph. This enables inspection of the partition without running conversion and, combined with #621, useful debugging features and workflows.
Original discussion here
Goal(s)
Use Cases
Proposed APIs / UX
Add an option `dry_run` (`no_conversion` in the prototype) to the Torch-TensorRT compile APIs to enable the flow. Setting this option triggers the dry-run flow in Torch-TensorRT, which produces a PyTorch module with a method created for each TensorRT-convertible segment and a corresponding `aten::CallMethod` op inserted in the top-level graph; a usage sketch is shown below.
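A minimal sketch of the proposed usage through the Python compile API. The `dry_run` option is what this RFC proposes and does not exist in any released Torch-TensorRT; everything else is the existing API.

```python
import torch
import torchvision.models as models
import torch_tensorrt

model = models.resnet50().eval()
example_inputs = [torch.randn(1, 3, 224, 224)]

# `dry_run` is hypothetical -- it is the option proposed by this RFC. With it
# set, no TensorRT engines are built; the returned TorchScript module instead
# contains one method per TensorRT-convertible segment, plus calls to those
# methods in the top-level graph.
dry_run_module = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input(example_inputs[0].shape)],
    dry_run=True,
)
```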
Example Workflow
Example output from the prototype unit test `ComputeResnet50NoConvertFallbackGraphCorrectly` shows the usage: a method is created for each TensorRT-convertible segment, and a corresponding `aten::CallMethod` op is inserted in the top-level `forward` method's graph. This can then be used to inspect the current convertibility/partition of the input model without needing to run conversion. If #621 is implemented, the user could run Torch-TensorRT again on the individual TensorRT-convertible methods for debugging, or to compile different engines with different settings. A conceptual sketch of the output structure follows.
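To make the output structure concrete, here is an eager-mode sketch of what the dry-run result looks like for a model with one unsupported op between two convertible segments. The method names are illustrative only; the real flow performs this transformation at the TorchScript IR level rather than in Python.

```python
import torch

class DryRunOutput(torch.nn.Module):
    # Each TensorRT-convertible segment becomes its own method.
    def trt_segment_0(self, x):
        return torch.relu(x)

    def trt_segment_1(self, x):
        return torch.tanh(x)

    def forward(self, x):
        x = self.trt_segment_0(x)  # surfaces as a CallMethod op in the IR
        x = x * 2.0                # stand-in for a non-convertible op
        return self.trt_segment_1(x)

# Scripting shows the CallMethod nodes that mark the segment boundaries.
scripted = torch.jit.script(DryRunOutput())
print(scripted.forward.graph)
```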
Limitations
The prompt here in the template is unclear to me: "More important that what will this feature do is what can't this feature do. Clearly describe how a user can tell if a user"
Currently the prototype does not include an automated method of generating inputs for each generated TensorRT-convertible method for use in subsequent conversion attempts. Today this must be done manually, which may make the #621-dependent use cases difficult to exercise. This could potentially be addressed by running Torch-TensorRT's shape propagation with the original inputs to create inputs for each generated method, or by using an external shape-propagation tool to identify the shapes at the input of each method call in the dry-run output; a sketch of one such approach follows.
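One possible approach, sketched below, leans on PyTorch's complete-shape-analysis JIT pass. This pass is a private, unstable API whose availability varies across PyTorch versions, so treat this strictly as an illustration; `dry_run_module` and `example_inputs` are carried over from the earlier usage sketch.

```python
import torch

graph = dry_run_module.forward.graph

# Private PyTorch pass (not a stable API): propagates concrete tensor shapes
# from the given example inputs through the graph.
torch._C._jit_pass_complete_shape_analysis(graph, tuple(example_inputs), False)

# Method calls appear as prim::CallMethod nodes in the TorchScript IR. Read
# the propagated tensor types at each call site to derive per-method inputs.
for node in graph.findAllNodes("prim::CallMethod"):
    shapes = []
    for value in list(node.inputs())[1:]:  # input 0 is `self`
        t = value.type()
        if isinstance(t, torch._C.TensorType):
            shapes.append(t.sizes())  # may be None if propagation failed
    print(node.s("name"), shapes)
```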
Internal Implementation
Design
After partitioning, where the standard flow would call `ConvertBlockToEngine` and `AddEngineToGraph` for each TensorRT-targeted segment, we instead call `AddSegmentedBlockToGraphAsFunction`, which creates a method from the segment graph and embeds a corresponding `aten::CallMethod` op in the top-level graph. The created methods and their call sites remain visible from Python, as sketched below.
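A small sketch of inspecting the result from Python, assuming `dry_run_module` is the module produced by the flow; method calls surface as prim::CallMethod nodes in the TorchScript IR.

```python
graph = dry_run_module.forward.graph

# Each CallMethod node in the top-level graph corresponds to one
# TensorRT-convertible segment that was embedded as a method.
for node in graph.findAllNodes("prim::CallMethod"):
    name = node.s("name")
    print(f"segment method: {name}")
    # The segment's standalone graph, built from its SegmentedBlock.
    print(getattr(dry_run_module, name).graph)
```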
Extensions Required to Core API implementations
See `AddSegmentedBlockToGraphAsFunction`.
Data Structures
No new data structures are required.
Implementation Phases
Prototype
#1360
MVP
(<TARGET RELEASE VERSION>)
- M: The functionality in the prototype which embeds the TensorRT-targeted partitions in the output PyTorch module as methods and corresponding method calls in the graph. Enables partition-inspection workflows.
Extension Phase 1
(<TARGET RELEASE VERSION>)
- M: (#621) Enables use cases dependent on the ability to individually convert a method other than `forward`. Benefits debugging through isolation of issues to individual engines and gives finer per-engine control of conversion settings; a hypothetical sketch follows below.
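A hypothetical sketch of that workflow. Compiling a method other than `forward` is exactly what #621 proposes, so the `method_name` parameter below is illustrative only and exists nowhere today; the input shapes are assumed to have been determined manually (see Limitations).

```python
import torch
import torch_tensorrt

# Hypothetical: recompile a single generated segment method with its own
# per-engine settings. `method_name` is an illustrative parameter for #621.
segment_engine = torch_tensorrt.compile(
    dry_run_module,
    method_name="trt_segment_0",                     # hypothetical (#621)
    inputs=[torch_tensorrt.Input((1, 64, 56, 56))],  # manually determined
    enabled_precisions={torch.half},                 # per-engine setting
)
```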
Extension Phase 2
(<TARGET RELEASE VERSION>)
- S: Remove the GPU dependency from the dry-run flow, allowing inspection of convertibility/partition on host machines without access to a GPU.