PaddleMIX ppdiffusers Stable Diffusion 3 inference optimize #681
Open: chang-wenbin wants to merge 59 commits into PaddlePaddle:develop from chang-wenbin:SD3_PaddleMIX_819
59 commits
a6631e7 optimize SD3 (chang-wenbin)
b0ea9ef optimize SD3 transformer_SD3 (chang-wenbin)
f06a61a optimize SD3 transformer_SD3 (chang-wenbin)
dcff90c update SD3 (chang-wenbin)
15c5e44 uodate triton &sim_SD3 (chang-wenbin)
ab73a63 modify temb_silu && modify nvtx (chang-wenbin)
ed2b7b1 modify linear from fused_linear (chang-wenbin)
f4330d3 modify simplified_sd3 (chang-wenbin)
cc1af0f add split_concat triton kernel (chang-wenbin)
70e6b6e modify split_concat triton kernel (chang-wenbin)
9543b11 update (chang-wenbin)
357b75a update transformer_sd3 (chang-wenbin)
f54bf84 update transformer_sd3 (chang-wenbin)
3245b2f update triton & simplified_sd3 (chang-wenbin)
5516df6 update simplified_sd3 (chang-wenbin)
874d5d7 update simplified_sd3 (chang-wenbin)
111f4cd delete context_pre_only=False (chang-wenbin)
18777b6 modify triton_optimize (chang-wenbin)
7a288e4 modify triton_optimize (chang-wenbin)
840b153 modify triton_optimize (chang-wenbin)
95c9e47 modify triton_fuse & Modifying performance issues affected by CUDA sy… (chang-wenbin)
84a9e7a modify transformer_sd3 if optimize_prigin (chang-wenbin)
9dd918d update vae triton_split (chang-wenbin)
3a0b7e1 vae T5 d2s & transformer forward d2s (chang-wenbin)
6d02d79 update demo (chang-wenbin)
5d81b44 update five model d2s (chang-wenbin)
4bab118 update SD3 clip T5 vae (chang-wenbin)
5a14a0f update clip (chang-wenbin)
cd2ef01 uodate T5 (chang-wenbin)
624168c uodate T5 (chang-wenbin)
b009b9f update scheduling_flow_match_euler_discrete (chang-wenbin)
8caa10a update normalization (chang-wenbin)
377629a update normalization (chang-wenbin)
6863054 Merge remote-tracking branch 'upstream/develop' into SD3_PaddleMIX_819 (chang-wenbin)
15fda4e update SD3 (chang-wenbin)
cb993c5 merge develop (chang-wenbin)
0e90eaf update cutlass gemm&fast_gelu (chang-wenbin)
c5bb81f update per-mmdit (chang-wenbin)
2c8cc85 merge develop (chang-wenbin)
499752a update triton op split_concat (chang-wenbin)
1084f4a update embeddings (chang-wenbin)
e3a5d7c merge (chang-wenbin)
fa84559 recovery (chang-wenbin)
27c62f9 recovery (chang-wenbin)
951f7a6 merge (chang-wenbin)
9515323 update normalization (chang-wenbin)
d61e4cb update dtype (chang-wenbin)
d961a4a add SD3 doc (chang-wenbin)
ac1e139 merge develop (chang-wenbin)
48c66a6 update SD3 doc (chang-wenbin)
24c3c9e add 'del transformer_blocks' (chang-wenbin)
422f33b update SD3 (chang-wenbin)
c43d84f update SD3 (chang-wenbin)
9d03624 update Notes (chang-wenbin)
ded06bf add Notes (chang-wenbin)
d845da2 update demo (chang-wenbin)
db6aad1 update doc (chang-wenbin)
3527954 update SD3 (chang-wenbin)
e7848a3 merge zkk (chang-wenbin)
@@ -11,6 +11,7 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+import os
 from typing import Dict, Optional, Tuple, Union

 import paddle
@@ -88,6 +89,9 @@ def __init__(
         use_quant_conv: bool = True,
         use_post_quant_conv: bool = True,
     ):
+        # NOTE:(changwenbin,zhoukangkang) SD3 vae use memory_efficient_attention op which is not well supported by Paddle-TensorRT
+        # so set USE_PPXFORMERS=False to avoid using memory_efficient_attention op.
+        os.environ["USE_PPXFORMERS"] = "False"
Review comment on the line above: Please explain here why we need to set this to False.
         super().__init__()
         # if down_block_out_channels not given, we will use block_out_channels
         _down_block_out_channels = block_out_channels if down_block_out_channels is None else down_block_out_channels
@@ -116,6 +120,8 @@ def __init__(
             norm_num_groups=norm_num_groups,
             act_fn=act_fn,
         )
+        del os.environ["USE_PPXFORMERS"]
+        # NOTE:(changwenbin,zhoukangkang) del set USE_PPXFORMERS=False to Restore Defaults

         self.quant_conv = nn.Conv2D(2 * latent_channels, 2 * latent_channels, 1) if use_quant_conv else None
         self.post_quant_conv = nn.Conv2D(latent_channels, latent_channels, 1) if use_post_quant_conv else None
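One possible concern with the set-then-`del` pattern in the hunks above: the `del` removes `USE_PPXFORMERS` unconditionally, so a value the caller had exported before constructing the model would be lost, and an exception raised between the two statements would leave the override in place. The sketch below shows one way to scope such an override so the previous state comes back either way. It uses only the Python standard library; `temporary_env` is a hypothetical helper name, not something that exists in ppdiffusers, and this is an illustration rather than what the PR does.

```python
import os
from contextlib import contextmanager
from typing import Iterator, Optional


@contextmanager
def temporary_env(name: str, value: str) -> Iterator[None]:
    """Set os.environ[name] to value inside the block, then restore the prior state.

    Hypothetical helper for illustration; not part of ppdiffusers.
    """
    previous: Optional[str] = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if previous is None:
            # The variable was unset before the block, so unset it again.
            os.environ.pop(name, None)
        else:
            # The variable had a value before the block, so put it back.
            os.environ[name] = previous


# Usage: a value exported earlier by the caller survives the block.
os.environ["USE_PPXFORMERS"] = "True"
with temporary_env("USE_PPXFORMERS", "False"):
    inside = os.environ["USE_PPXFORMERS"]
after = os.environ["USE_PPXFORMERS"]
```

In the constructor, the sub-module construction between the assignment and the `del` would sit inside the `with` block. Whether that is worth the extra helper depends on whether callers are expected to set `USE_PPXFORMERS` themselves.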
Review comment: Add a sentence here asking users to use a PaddleNLP version from after September 6, 2024, because on that day we fixed a bug in PaddleNLP.
https://github.com/PaddlePaddle/PaddleNLP/pull/9016/files