Add triton to paddlemix #657

Open · wants to merge 21 commits into base: release/2.0
14 changes: 12 additions & 2 deletions README.md
@@ -19,9 +19,11 @@ PaddleMIX is a multimodal large-model development suite built on PaddlePaddle, aggregating image, …

## 最新进展

📚 *PaddleMIX 2.0, the PaddlePaddle multimodal large-model development suite, is released*: full coverage of image, text, audio, and video scenarios, with multimodal capabilities efficiently powering industry innovation. It supports ultra-large-scale training and covers image-text pretraining, text-to-image generation, and cross-modal vision tasks, spanning industry scenarios such as finance, education, e-commerce, and healthcare. Join the livestream on Thursday, August 8 at 20:00 for the latest multimodal model architectures, a deep dive into the PaddleMIX high-performance model library, and a hands-on walkthrough of the full LLaVA training and inference pipeline. [Registration link](https://www.wjx.top/vm/wKqysjx.aspx?udsid=449688)

**2024.07.25 PaddleMIX v2.0 released**
- * Multimodal understanding: added QwenVL-vl, LLaVA, etc.; added an Auto module to unify the SFT training workflow; added the mixtoken training strategy, raising SFT throughput 5.6x.
- * Multimodal generation: released [PPDiffusers 0.24.1](./ppdiffusers/README.md) with video generation support and LCM added to the text-to-image models. Added peft and accelerate backends. Provides a ComfyUI plugin developed on PaddlePaddle.
+ * Multimodal understanding: added the LLaVA series, Qwen-VL, etc.; added an Auto module to unify the SFT training workflow; added the mixtoken training strategy, raising SFT throughput 5.6x.
+ * Multimodal generation: released [PPDiffusers 0.24.1](./ppdiffusers/README.md) with video generation support and LCM added to the text-to-image models. Added PaddlePaddle-native peft and accelerate backends. Provides a ComfyUI plugin developed on PaddlePaddle.
* Multimodal data processing toolbox [DataCopilot](./paddlemix/datacopilot/): supports custom data structures, data conversion, and offline format checking; provides basic statistics and data visualization features.

**2023.10.7 PaddleMIX v1.0 released**

@@ -171,6 +173,14 @@ pip install -e .
</table>

For more model capabilities, see the [model capability matrix](./paddlemix/examples/README.md).

## Community

- Scan the QR code in WeChat and fill out the questionnaire to join the discussion group and talk with community developers and the official team.
<div align="center">
<img src="https://github.com/user-attachments/assets/ecf292da-9ac6-41cb-84b6-df726ef4522d" width="300" height="300" />
</div>

## License

This project is released under the Apache 2.0 license.
34 changes: 34 additions & 0 deletions deploy/README.md
@@ -74,3 +74,37 @@ python export.py \
--prompt "bus"

```

## 3. Inference Benchmark

> Note: test environment:
> Paddle 3.0,
> PaddleMIX release/2.0,
> PaddleNLP 2.7.2,
> single A100 80G.

### 3.1 Benchmark command

Append `--benchmark` to the run command in the corresponding model directory under `deploy`.
For example, the benchmark command for GroundingDino is:

```bash
cd deploy/groundingdino
python predict.py \
--text_encoder_type GroundingDino/groundingdino-swint-ogc \
--model_path output_groundingdino/GroundingDino/groundingdino-swint-ogc \
--input_image https://bj.bcebos.com/v1/paddlenlp/models/community/GroundingDino/000000004505.jpg \
--output_dir ./groundingdino_predict_output \
--prompt "bus" \
--benchmark True
```

### 3.2 A100 performance data

|Model|Image resolution|Dtype|Paddle Deploy latency|
|-|-|-|-|
|qwen-vl-7b|448*448|fp16|669.8 ms|
|llava-1.5-7b|336*336|fp16|981.2 ms|
|llava-1.6-7b|336*336|fp16|778.7 ms|
|groundingDino/groundingdino-swint-ogc|800*1193|fp32|100 ms|
|Sam/SamVitH-1024|1024*1024|fp32|121 ms|
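
These latencies come from the `--benchmark` loop added to each model's `predict.py`: run inference 20 times, treat the first 10 runs as warmup, and average the rest. A standalone sketch of that pattern (the `benchmark` helper name and its defaults are ours, for illustration; they are not repo code):

```python
import time

def benchmark(fn, warmup: int = 10, iters: int = 10) -> float:
    """Average wall-clock seconds per call, after `warmup` untimed calls."""
    for _ in range(warmup):  # let kernels compile and caches fill
        fn()
    total = 0.0
    for _ in range(iters):
        start = time.time()
        fn()
        total += time.time() - start
    return total / iters

# Usage with any predictor object:
#   avg = benchmark(lambda: predictor.run(image_pil, prompt))
#   print(f"Time: {avg * 1000:.1f} ms")
avg = benchmark(lambda: sum(range(10000)))
print(f"Time: {avg * 1000:.4f} ms")
```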
32 changes: 32 additions & 0 deletions deploy/README_en.md
@@ -74,3 +74,35 @@ Will be exported to the following directory, including `model_state.pdiparams`,

```

## 3. Benchmark

> Note: test environment:
> Paddle 3.0,
> PaddleMIX release/2.0,
> PaddleNLP 2.7.2,
> single A100 80G.

### 3.1 Benchmark command

Append `--benchmark` to the run command in the corresponding model directory under `deploy` to obtain the model's running time.
For example, the GroundingDino benchmark:

```bash
cd deploy/groundingdino
python predict.py \
--text_encoder_type GroundingDino/groundingdino-swint-ogc \
--model_path output_groundingdino/GroundingDino/groundingdino-swint-ogc \
--input_image https://bj.bcebos.com/v1/paddlenlp/models/community/GroundingDino/000000004505.jpg \
--output_dir ./groundingdino_predict_output \
--prompt "bus" \
--benchmark True
```

### 3.2 A100 performance data

|Model|Image size|Dtype|Paddle Deploy latency|
|-|-|-|-|
|qwen-vl-7b|448*448|fp16|669.8 ms|
|llava-1.5-7b|336*336|fp16|981.2 ms|
|llava-1.6-7b|336*336|fp16|778.7 ms|
|groundingDino/groundingdino-swint-ogc|800*1193|fp32|100 ms|
|Sam/SamVitH-1024|1024*1024|fp32|121 ms|
17 changes: 17 additions & 0 deletions deploy/groundingdino/predict.py
@@ -229,6 +229,19 @@ def main(model_args, data_args):
image_pil = Image.open(data_args.input_image).convert("RGB")
else:
image_pil = Image.open(requests.get(url, stream=True).raw).convert("RGB")

if model_args.benchmark:
    import time

    total = 0.0
    for i in range(20):  # 10 warmup iterations, then 10 timed iterations
        start = time.time()
        boxes_filt, pred_phrases = predictor.run(image_pil, data_args.prompt)
        if i >= 10:  # time only the last 10 runs; the average below divides by 10
            total += time.time() - start

    print("Time:", total / 10)

boxes_filt, pred_phrases = predictor.run(image_pil, data_args.prompt)

@@ -294,6 +307,10 @@ class ModelArguments:
default="GPU",
metadata={"help": "Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU."},
)
benchmark: bool = field(
    default=False,
    metadata={"help": "Whether to run the benchmark timing loop."},
)
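
In the repo this field is parsed by PaddleNLP's argument parser, which is why the commands above pass `--benchmark True` rather than a bare flag. A minimal standalone sketch of the same dataclass-plus-CLI pattern using only the standard library (the `str2bool` helper is our illustrative assumption, not repo code):

```python
import argparse
from dataclasses import dataclass, field

def str2bool(v: str) -> bool:
    # argparse's type=bool would treat any non-empty string, "False" included, as True
    return v.lower() in ("1", "true", "yes")

@dataclass
class ModelArguments:
    benchmark: bool = field(default=False, metadata={"help": "run timing loop"})

def parse(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--benchmark", type=str2bool, default=ModelArguments.benchmark)
    ns = parser.parse_args(argv)
    return ModelArguments(benchmark=ns.benchmark)

args = parse(["--benchmark", "True"])
print(args.benchmark)  # True
```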


if __name__ == "__main__":
14 changes: 12 additions & 2 deletions deploy/llava/README.md
@@ -20,13 +20,21 @@
* Installing the `paddlenlp_ops` dependency

```bash
git clone https://github.com/PaddlePaddle/PaddleNLP.git
cd PaddleNLP
git reset --hard 498f70988431be278dac618411fbfb0287853cd9
git submodule update --init --recursive
pip install -e .
cd csrc
python setup_cuda.py install
```
* If installation fails on a V100, comment out the following statements in `/PaddleNLP/csrc/generation/quant_int8.cu`:

```cpp
// template<>
// __forceinline__ __device__ __nv_bfloat16 add_mul<__nv_bfloat16>(__nv_bfloat16 a, __nv_bfloat16 b, __nv_bfloat16 c) {
//     return __hmul(__hadd(a, b), c);
// }
```

* `fused_ln` requires the custom ops under `/PaddleNLP/model_zoo/gpt-3/external_ops`; install them with `python setup.py install`.

@@ -38,6 +46,7 @@ python setup_cuda.py install

```bash
#!/bin/bash
export PYTHONPATH=/path/to/PaddleNLP/:/path/to/PaddleMIX
python deploy/llava/export_model.py \
--model_name_or_path "paddlemix/llava/llava-v1.5-7b" \
--save_path "./llava_static" \
```

@@ -49,6 +58,7 @@

```bash
#!/bin/bash
export PYTHONPATH=/path/to/PaddleNLP/:/path/to/PaddleMIX
python deploy/llava/export_model.py \
--model_name_or_path "paddlemix/llava/llava-v1.5-7b" \
--save_path "./llava_static" \
```
2 changes: 1 addition & 1 deletion deploy/llava/run_static_predict.py
@@ -40,7 +40,7 @@ def __init__(self, args):
self.args = args
self.config = AutoConfigMIX.from_pretrained(args.model_name_or_path)

- self.tokenizer = AutoTokenizerMIX.from_pretrained(args.model_name_or_path, use_fast=False)
+ self.tokenizer = AutoTokenizerMIX.from_pretrained(args.model_name_or_path)
self.processor, _ = AutoProcessorMIX.from_pretrained(args.model_name_or_path, eval="eval")

self.first_predictor = self.create_predictor(args.first_model_path)
15 changes: 12 additions & 3 deletions deploy/qwen_vl/README.md
@@ -15,12 +15,21 @@
* Installing the `paddlenlp_ops` dependency

```bash
git clone https://github.com/PaddlePaddle/PaddleNLP.git
cd PaddleNLP
git reset --hard 498f70988431be278dac618411fbfb0287853cd9
git submodule update --init --recursive
pip install -e .
cd csrc
python setup_cuda.py install
```
* If installation fails on a V100, comment out the following statements in `/PaddleNLP/csrc/generation/quant_int8.cu`:

```cpp
// template<>
// __forceinline__ __device__ __nv_bfloat16 add_mul<__nv_bfloat16>(__nv_bfloat16 a, __nv_bfloat16 b, __nv_bfloat16 c) {
//     return __hmul(__hadd(a, b), c);
// }
```

* `fused_ln` requires the custom ops under `/PaddleNLP/model_zoo/gpt-3/external_ops`; install them with `python setup.py install`.

@@ -51,9 +60,9 @@ python deploy/qwen_vl/export_image_encoder.py \
#!/bin/bash

export CUDA_VISIBLE_DEVICES=0
- export PYTHONPATH=/path/to/PaddleNLP/:/path/to/PaddleMIX:/path/to/PaddleNLP/llm
+ export PYTHONPATH=../../PaddleNLP/:../../PaddleNLP/llm

- python ppredict/export_model.py \
+ python predict/export_model.py \
--model_name_or_path "qwen-vl/qwen-vl-7b-static" \
--output_path ./checkpoints/encode_text/ \
--dtype float16 \
4 changes: 2 additions & 2 deletions deploy/qwen_vl/export_image_encoder.py
@@ -20,11 +20,11 @@

import paddle

- from paddlemix import QWenLMHeadModel
+ from paddlemix.auto import AutoConfigMIX, AutoModelMIX


def export(args):
- model = QWenLMHeadModel.from_pretrained(args.model_name_or_path, dtype="float16")
+ model = AutoModelMIX.from_pretrained(args.model_name_or_path, dtype="float16")
model.eval()

# convert to static graph with specific input description
17 changes: 17 additions & 0 deletions deploy/sam/predict.py
@@ -295,6 +295,10 @@ class ModelArguments:
default=True,
metadata={"help": "save visual image."},
)
benchmark: bool = field(
    default=False,
    metadata={"help": "Whether to run the benchmark timing loop."},
)


def main(model_args, data_args):
@@ -317,6 +321,19 @@ def main(model_args, data_args):
auto_tune(model_args, [image_pil], tune_img_nums)

predictor = Predictor(model_args)

if model_args.benchmark:
    import time

    total = 0.0
    for i in range(20):  # 10 warmup iterations, then 10 timed iterations
        start = time.time()
        seg_masks = predictor.run(image_pil, {"points": data_args.points_prompt, "boxs": data_args.box_prompt})
        if i >= 10:  # time only the last 10 runs; the average below divides by 10
            total += time.time() - start

    print("Time:", total / 10)

seg_masks = predictor.run(image_pil, {"points": data_args.points_prompt, "boxs": data_args.box_prompt})

4 changes: 2 additions & 2 deletions docs/CHANGELOG.md
@@ -6,9 +6,9 @@

#### Multimodal understanding

- 1. New models: Qwen-vl; LLaVA: v1.5-7b, v1.5-13b, v1.6-7b; CogAgent; CogVLM
+ 1. New models: LLaVA: v1.5-7b, v1.5-13b, v1.6-7b; CogAgent; CogVLM; Qwen-VL; InternLM-XComposer2
2. Dataset enhancements: added the chatml_dataset image-text dialogue data loading scheme, supporting custom chat_template files and mixed datasets
- 3. Toolchain upgrades: added an Auto module to unify the SFT training workflow, compatible with full-parameter and LoRA training; added the mixtoken training strategy, raising SFT throughput 5.6x; supports QwenVL and LLaVA inference deployment, with 2.38x the inference performance of torch
+ 3. Toolchain upgrades: added an Auto module to unify the SFT training workflow, compatible with full-parameter and LoRA training; added the mixtoken training strategy, raising SFT throughput 5.6x; supports Qwen-VL and LLaVA inference deployment, with 2.38x the inference performance of torch

#### Multimodal generation

1 change: 1 addition & 0 deletions paddlemix/__init__.py
@@ -17,3 +17,4 @@
from .models import *
from .optimization import *
from .processors import *
from .triton_ops import *
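
This wildcard import re-exports the new Triton ops at the `paddlemix` package level. Note that `from module import *` honors the module's `__all__` list, so only names declared there leak into the package namespace. A self-contained sketch of that mechanism (the in-memory module and `fused_add` are invented for illustration, not real PaddleMIX ops):

```python
import types

# Build a throwaway module shaped like a paddlemix.triton_ops submodule
triton_ops = types.ModuleType("triton_ops")
exec(
    "__all__ = ['fused_add']\n"
    "def fused_add(a, b):\n"
    "    return a + b\n"
    "def _private_helper():\n"
    "    pass\n",
    triton_ops.__dict__,
)

# What `from .triton_ops import *` does under the hood:
namespace = {}
for name in getattr(triton_ops, "__all__", [n for n in vars(triton_ops) if not n.startswith("_")]):
    namespace[name] = getattr(triton_ops, name)

print(sorted(namespace))  # ['fused_add']; _private_helper is not re-exported
```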
9 changes: 7 additions & 2 deletions paddlemix/auto/modeling.py
@@ -32,6 +32,7 @@
from paddlenlp.utils.import_utils import import_module
from paddlenlp.utils.log import logger

from paddlemix.utils.env import MODEL_HOME as PPMIX_MODEL_HOME
from .configuration import get_configurations

__all__ = [
@@ -72,7 +73,8 @@ def resolve_cache_dir(from_hf_hub: bool, from_aistudio: bool, cache_dir: Optiona
return None
if from_hf_hub:
return PPNLP_HF_CACHE_HOME
- return PPNLP_MODEL_HOME
+ return PPMIX_MODEL_HOME
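
The patch makes the final fallback PaddleMIX's own model home instead of PaddleNLP's. A standalone sketch of the resolver's precedence as it appears in this hunk: explicit `cache_dir` first, then the HF cache, then the PaddleMIX home. The `return None` branch's condition is not visible in the hunk and is assumed here to be the AI Studio case, and the paths are illustrative placeholders:

```python
import os
from typing import Optional

# Illustrative placeholders for the real PPNLP_HF_CACHE_HOME / MODEL_HOME constants
PPNLP_HF_CACHE_HOME = os.path.expanduser("~/.cache/huggingface")
PPMIX_MODEL_HOME = os.path.expanduser("~/.paddlemix/models")

def resolve_cache_dir(from_hf_hub: bool, from_aistudio: bool, cache_dir: Optional[str] = None) -> Optional[str]:
    if cache_dir is not None:
        return cache_dir               # an explicit user choice always wins
    if from_aistudio:
        return None                    # assumed: AI Studio downloads manage their own location
    if from_hf_hub:
        return PPNLP_HF_CACHE_HOME     # reuse the Hugging Face cache
    return PPMIX_MODEL_HOME            # default: PaddleMIX's model home

print(resolve_cache_dir(False, False))  # prints the PaddleMIX model home path
```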


def get_model_mapping():
@@ -177,6 +179,8 @@ def _from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs):
from_aistudio = kwargs.get("from_aistudio", False)
subfolder = kwargs.get("subfolder", "")
cache_dir = resolve_cache_dir(from_hf_hub, from_aistudio, cache_dir)
kwargs["cache_dir"] = cache_dir


if from_hf_hub:
if hf_file_exists(repo_id=pretrained_model_name_or_path, filename=cls.model_config_file):
@@ -226,6 +230,7 @@ def _from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs):
[COMMUNITY_MODEL_PREFIX, pretrained_model_name_or_path, cls.legacy_model_config_file]
)
cache_dir = os.path.join(cache_dir, pretrained_model_name_or_path, subfolder)

try:
if url_file_exists(standard_community_url):
resolved_vocab_file = get_path_from_url_with_filelock(standard_community_url, cache_dir)
@@ -243,7 +248,7 @@
"- or a correct model-identifier of community-contributed pretrained models,\n"
"- or the correct path to a directory containing relevant modeling files(model_weights and model_config).\n"
)

if os.path.exists(resolved_vocab_file):
model_class = cls._get_model_class_from_config(pretrained_model_name_or_path, resolved_vocab_file)
logger.info(f"We are using {model_class} to load '{pretrained_model_name_or_path}'.")
2 changes: 1 addition & 1 deletion paddlemix/examples/eva02/run_eva02_finetune_dist.py
@@ -190,7 +190,7 @@ class FinetuneArguments(TrainingArguments):
)
fp16: bool = field(
default=False,
- metadata={"help": "Whether to use fp16 (mixed) precision instead of 32-bit"},
+ metadata={"help": "Whether to Use fp16 (mixed) precision instead of 32-bit"},
)
fp16_opt_level: str = field(
default="O1",
2 changes: 1 addition & 1 deletion paddlemix/examples/eva02/run_eva02_pretrain_dist.py
@@ -202,7 +202,7 @@ class PretrainArguments(TrainingArguments):
)
fp16: bool = field(
default=False,
metadata={"help": "Whether to use fp16 (mixed) precision instead of 32-bit"},
metadata={"help": "Whether to Use fp16 (mixed) precision instead of 32-bit"},
)
fp16_opt_level: str = field(
default="O1",
1 change: 1 addition & 0 deletions paddlemix/models/llava/configuration.py
@@ -20,3 +20,4 @@
class LlavaConfig(LlamaConfig):
model_type = "llava"
mm_patch_merge_type = "spatial_unpad"
use_cachekv_int8 = None
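
Declaring `use_cachekv_int8` as a class attribute gives every config instance a default that deployment code can read without `hasattr` guards. A minimal sketch of the pattern with a generic base class standing in for the real PaddleNLP `LlamaConfig` (names other than the three attributes are invented for illustration):

```python
# Generic stand-in for the real PaddleNLP LlamaConfig base class
class BaseConfig:
    def __init__(self, **kwargs):
        # instance kwargs override class-level defaults
        for k, v in kwargs.items():
            setattr(self, k, v)

class LlavaLikeConfig(BaseConfig):
    model_type = "llava"
    mm_patch_merge_type = "spatial_unpad"
    use_cachekv_int8 = None  # new default: deployment code can read it unconditionally

cfg = LlavaLikeConfig()
print(cfg.use_cachekv_int8)        # None until deployment enables it
cfg_int8 = LlavaLikeConfig(use_cachekv_int8="dynamic")
print(cfg_int8.use_cachekv_int8)   # the per-instance override
```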
2 changes: 1 addition & 1 deletion paddlemix/tools/README.md
@@ -183,7 +183,7 @@ python -u -m paddle.distributed.launch --gpus "0,1,2,3" paddlemix/tools/supervi
Note: after LoRA training, the LoRA parameters need to be merged back. We provide a merge script that folds the LoRA parameters into the base model and saves the resulting weights. The command is as follows:

```bash
- python paddlemix/paddlemix/tools/merge_lora_params.py \
+ python paddlemix/tools/merge_lora_params.py \
--model_name_or_path qwen-vl/qwen-vl-chat-7b \
--lora_path output_qwen_vl \
--merge_model_path qwen_vl_merge
```
2 changes: 1 addition & 1 deletion paddlemix/tools/README_en.md
@@ -172,7 +172,7 @@ python paddlemix/tools/supervised_finetune.py paddlemix/config/qwen_vl/lora_sft
Note: After training with LoRA, it's necessary to merge the LoRA parameters. We provide a script for merging LoRA parameters, which combines the LoRA parameters into the main model and saves the corresponding weights. The command is as follows:

```bash
- python paddlemix/paddlemix/tools/merge_lora_params.py \
+ python paddlemix/tools/merge_lora_params.py \
--model_name_or_path qwen-vl/qwen-vl-chat-7b \
--lora_path output_qwen_vl \
--merge_model_path qwen_vl_merge
```