This is a LoRA training pipeline (virtual idol training, if you like) that is friendlier to developers than the automatic1111 webui.
Here is a demo of a LoRA trained on a small number of Dilraba Dilmurat photos: a Dilraba with mixed Western looks.
pip install -r requirements.txt
git lfs install
# BLIP model
wget https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_caption_capfilt_large.pth -P ./pretrained_models
# bert-base-uncased
cd pretrained_models
git clone https://huggingface.co/bert-base-uncased
# diffusion base model
# I use chilloutmix_NiPrunedFp32Fix
git clone https://huggingface.co/naonovn/chilloutmix_NiPrunedFp32Fix
# convert the safetensors checkpoint to diffusers format
cd ..
python process/convert_original_stable_diffusion_to_diffusers.py \
--checkpoint_path ./pretrained_models/chilloutmix_NiPrunedFp32Fix/chilloutmix_NiPrunedFp32Fix.safetensors \
--dump_path ./pretrained_models/chilloutmixNiPruned_Tw1O --from_safetensors
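Once converted, the dump path is a standard diffusers folder and can be loaded directly. A minimal smoke test, assuming the conversion above wrote to ./pretrained_models/chilloutmixNiPruned_Tw1O:

import torch
from diffusers import StableDiffusionPipeline

# Load the converted base model and generate one image as a quick check
pipe = StableDiffusionPipeline.from_pretrained(
    "./pretrained_models/chilloutmixNiPruned_Tw1O", torch_dtype=torch.float16
).to("cuda")
pipe("a portrait photo, best quality").images[0].save("smoke_test.png")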
- Hugging Face data [optional]
Using the pokemon dataset as an example
# download the data
mkdir -p dataset
cd dataset
git clone https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions/
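If you prefer not to clone the repo, the same captions can be pulled with the datasets library instead; a sketch:

from datasets import load_dataset

# Stream the BLIP-captioned pokemon set straight from the Hub
ds = load_dataset("lambdalabs/pokemon-blip-captions", split="train")
print(ds[0]["text"])          # BLIP caption
ds[0]["image"].save("sample.png")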
- Custom user data [optional]
LoRA training from as little as a single image
# generate captions for the images
python process/run_caption.py --img_base ./dataset/custom
# replace 'a woman' with <dlrb>
python process/change_txt.py --img_base ./dataset/custom --ori_txt 'a woman' --new_txt "<dlrb>"
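The replacement itself is simple: every image has a sibling .txt caption, and the script swaps the phrase so the rare token <dlrb> binds to the subject. A minimal sketch of that logic (the actual process/change_txt.py may differ):

from pathlib import Path

# Rewrite each per-image caption file, binding the rare token to the subject
img_base, ori_txt, new_txt = "./dataset/custom", "a woman", "<dlrb>"
for txt in Path(img_base).glob("*.txt"):
    txt.write_text(txt.read_text().replace(ori_txt, new_txt))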
Adjust the parameter self.custom: True uses your own data, False uses the Hugging Face data
--train_text_encoder # also train a LoRA on the text encoder
--dist # turn off the DDP multi-machine multi-GPU training mode
--batch_size 1 # set the batch size
# training script
python train.py --batch_size 1 --dist --train_text_encoder
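For intuition: LoRA freezes the pretrained weights and learns a low-rank residual on top of selected linear layers (in the UNet, plus the text encoder when --train_text_encoder is set). A minimal sketch of such a layer, not this repo's exact implementation:

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight plus a trainable low-rank update (illustrative)."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)        # freeze the pretrained weight
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)     # start as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))

# inference script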
python inference.py \
--mode 'lora' \
--lora_path checkpoint/Lora/000-00000600.pth \
--prompt "<dlrb>,solo, long hair, black hair, choker, breasts, earrings, blue eyes, jewelry, lipstick, makeup, dark, bare shoulders, mountain, night, upper body, dress, large breasts, ((masterpiece))" \
--outpath results/1.png \
--num_images_per_prompt 2
The fewer training images you have, the earlier the checkpoint you should pick: around iteration 1000 when training on a single image, around 2500 for 10 images.
ControlNet conversion has been added, see Here
- Download the original models v1-5-pruned.ckpt and control_sd15_openpose.pth into pretrained_models
- Convert your own base model into ControlNet form
python process/tool_transfer_control.py \
--path_input pretrained_models/chilloutmix_NiPrunedFp32Fix/chilloutmix_NiPrunedFp32Fix.safetensors \
--path_output pretrained_models/chilloutmix_control.pth
- Convert the ControlNet into diffusers form
python process/convert_controlnet_to_diffusers.py \
--checkpoint_path pretrained_models/chilloutmix_control.pth \
--original_config_file model/third/cldm_v15.yaml \
--dump_path pretrained_models/chilloutmix_control --device cuda
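The dump path is a diffusers-format ControlNet, so it should also load with the stock diffusers pipeline. A sketch, assuming both conversions above succeeded (inference.py --mode 'control' wraps similar logic):

import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Load the converted ControlNet (depending on the conversion script, the
# weights may live in a controlnet/ subfolder of the dump path)
controlnet = ControlNetModel.from_pretrained(
    "pretrained_models/chilloutmix_control", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "./pretrained_models/chilloutmixNiPruned_Tw1O",
    controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
pipe("<dlrb>, upper body, night", image=Image.open("assets/pose.png")).images[0]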
- Download the openpose models body_pose_model.pth and hand_pose_model.pth into pretrained_models/openpose
- Inference
python inference.py \
--mode 'control' \
--lora_path checkpoint/Lora/000-00000600.pth \
--control_path pretrained_models/chilloutmix_control \
--pose_img assets/pose.png \
--prompt "<dlrb>,solo, long hair, black hair, choker, breasts, earrings, blue eyes, jewelry, lipstick, makeup, dark, bare shoulders, mountain, night, upper body, dress, large breasts, ((masterpiece))" \
--outpath results/1.png \
--num_images_per_prompt 2
- Download the model
cd pretrained_models
git clone https://huggingface.co/runwayml/stable-diffusion-inpainting
# download the face parsing model
wget https://github.com/LeslieZhoa/LVT/releases/download/v0.0/face_parsing.pt -P pretrained_models
- Inference
python inference.py \
--mode 'inpait' \
--inpait_path pretrained_models/stable-diffusion-inpainting \
--mask_area all \
--ref_img assets/ref.png \
--prompt "green hair,short hair,curly hair, green hair,beach,seaside" \
--outpath results/1.png \
--num_images_per_prompt 2
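For reference, the heavy lifting here is diffusers' inpainting pipeline: the repo derives the mask from face_parsing.pt according to --mask_area and repaints only that region. A sketch with a precomputed mask (mask.png is hypothetical):

import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "pretrained_models/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")
ref = Image.open("assets/ref.png").convert("RGB")
mask = Image.open("mask.png")    # white = region to repaint (hypothetical file)
pipe(prompt="green hair, beach, seaside", image=ref, mask_image=mask).images[0]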
Makes inpainting even smoother
- Download the adapter model
wget https://huggingface.co/TencentARC/T2I-Adapter/resolve/main/models/t2iadapter_seg_sd14v1.pth -P pretrained_models
- Inference
python inference.py \
--mode 't2iinpait' \
--ref_img assets/t2i-input.png \
--mask assets/t2i-mask.png \
--adapter_mask assets/t2i-adapter.png \
--prompt "green hair,curly hair, green hair,beach,seaside" \
--outpath results/1.png \
--num_images_per_prompt 2
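This repo routes the adapter through its own pipeline code (after T2I-Adapter-for-Diffusers); newer diffusers releases also ship native support. An alternative sketch, where the TencentARC/t2iadapter_seg_sd14v1 Hub repo and the SD 1.4 base are assumptions, not what inference.py uses:

import torch
from PIL import Image
from diffusers import T2IAdapter, StableDiffusionAdapterPipeline

# Segmentation-conditioned generation with diffusers' built-in adapter pipeline
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2iadapter_seg_sd14v1", torch_dtype=torch.float16
)
pipe = StableDiffusionAdapterPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", adapter=adapter, torch_dtype=torch.float16
).to("cuda")
seg = Image.open("assets/t2i-adapter.png")   # segmentation condition map
pipe("green hair, curly hair, beach, seaside", image=seg).images[0]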
- Download the model
cd pretrained_models
git clone https://huggingface.co/timbrooks/instruct-pix2pix
- Inference
python inference.py \
--mode 'instruct' \
--ref_img assets/t2i-input.png \
--prompt "turn her face to comic style" \
--neg_prompt None \
--image_guidance_scale 1 \
--outpath results/1.png \
--num_images_per_prompt 1
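Equivalently, the edit can be run straight through diffusers' InstructPix2Pix pipeline; a sketch using the model cloned above:

import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "pretrained_models/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")
img = Image.open("assets/t2i-input.png").convert("RGB")
# image_guidance_scale balances faithfulness to the input image vs. the edit
pipe("turn her face to comic style", image=img, image_guidance_scale=1.0).images[0]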
The model comes mainly from FaceVid2Vid, with the output upgraded to 512 high-definition resolution
wget https://github.com/LeslieZhoa/Simple-Lora/releases/download/v0.0/script.zip
unzip script.zip && rm -rf script.zip
python script/run.py --input assets/6.png
ffmpeg -r 25 -f image2 -i results/%06d.png -vcodec libx264 11.mp4
- https://github.com/huggingface/diffusers
- https://github.com/AUTOMATIC1111/stable-diffusion-webui
- https://github.com/salesforce/BLIP
- https://github.com/haofanwang/Lora-for-Diffusers
- https://github.com/lllyasviel/ControlNet
- https://github.com/haofanwang/ControlNet-for-Diffusers
- https://github.com/haofanwang/T2I-Adapter-for-Diffusers
- https://github.com/TencentARC/T2I-Adapter
- https://github.com/HimariO/diffusers-t2i-adapter
- https://github.com/zhanglonghao1992/One-Shot_Free-View_Neural_Talking_Head_Synthesis