The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing (CVPR 2024)
The task of manipulating real image attributes through StyleGAN inversion has been extensively researched. This process involves searching for latent variables of a well-trained StyleGAN generator that can synthesize a real image, modifying these latent variables, and then synthesizing an image with the desired edits. A balance must be struck between the quality of the reconstruction and the ability to edit. Earlier studies utilized the low-dimensional W-space for latent search, which facilitated effective editing but struggled with reconstructing intricate details. More recent research has turned to the high-dimensional feature space F, which successfully inverts the input image but loses much of the detail during editing. In this paper, we introduce StyleFeatureEditor -- a novel method that enables editing in both w-latents and F-latents. This technique not only allows for the reconstruction of finer image details but also ensures their preservation during editing. We also present a new training pipeline specifically designed to train our model to accurately edit F-latents. Our method is compared with state-of-the-art encoding approaches, demonstrating that our model excels in terms of reconstruction quality and is capable of editing even challenging out-of-domain examples.
SFE is able to apply a desired edit to a real face image. It first reconstructs (inverts) the original image and then edits it according to the chosen direction. On the left are examples of how our method works for several directions with different editing power p. On the right we display a comparison with previous approaches. LPIPS (lower is better) indicates inversion quality, while FID (lower is better) indicates editing ability. The size of the markers indicates the inference time of the method, with larger markers indicating longer inference time.
18.06.2024: StyleFeatureEditor release
15.07.2024: Add gradio demo
20.07.2024: Add DeltaEdit editings
02.08.2024: Add image unalignment
- Linux or macOS
- NVIDIA GPU + CUDA CuDNN
- CMAKE
- Python 3.10
- Clone this repo:
git clone https://github.com/AIRI-Institute/StyleFeatureEditor
cd StyleFeatureEditor
- Install the environment:
Step 1, create new conda environment:
conda create -n sfe python=3.10 -y
conda deactivate
conda activate sfe
Step 2, install all necessary libraries via script:
bash env_install.sh
- Download pretrained models:
git clone https://huggingface.co/AIRI-Institute/StyleFeatureEditor
cd StyleFeatureEditor && git lfs pull && cd ..
mv StyleFeatureEditor/pretrained_models pretrained_models
rm -rf StyleFeatureEditor
By default, we assume that all auxiliary models are downloaded and saved to the directory pretrained_models. However, you can use your own paths by changing the necessary values in configs/paths.py.
- Download full weights [optional]:
The weights of the Inverter (result of phase 1) and the Feature Editor (result of phase 2) are stored in pretrained_models/sfe_inverter_light.pt and pretrained_models/sfe_editor_light.pt respectively. If you need full checkpoints including the weights of all parts of our pipeline (discriminator, optimisers, etc.), you can download them manually from Google Drive:
Path | Description |
---|---|
SFE Editor | SFE trained both phases on FFHQ dataset. |
SFE Inverter | SFE Inverter trained on FFHQ dataset first phase only. |
Examples of how our method works on several real images. You can find inference of these examples in our Google Colab notebook below.
We provide a Jupyter Notebook that demonstrates how our method works. It includes downloading all the necessary components, running our method on several examples and creating a gif.
If you need to edit one or several images, you can use SimpleRunner from runners/simple_runner.py. Initialize it with the path to the SFE checkpoint. To edit an image, call the .edit() method and pass the path to the input image, the name of the desired editing, the power of the desired editing, and the path where the edited image should be saved:
from runners.simple_runner import SimpleRunner
runner = SimpleRunner(
editor_ckpt_pth="pretrained_models/sfe_editor_light.pt"
)
# Inference
result = runner.edit(
orig_img_pth="path/to/original/image.jpg",
editing_name="editing_name",
edited_power=1.0,
save_pth="path/to/save/edited/image.jpg",
align=False
)
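A minimal sketch of sweeping one direction over several editing powers with the call above. The helper `build_edit_jobs` and its output naming scheme are hypothetical and not part of the repository; only the keyword names (`orig_img_pth`, `editing_name`, `edited_power`, `save_pth`, `align`) are taken from the example above.

```python
from pathlib import Path


def build_edit_jobs(orig_img_pth, editing_name, powers, out_dir, align=False):
    """Return one kwargs dict per editing power, suitable for runner.edit(**job)."""
    src = Path(orig_img_pth)
    jobs = []
    for power in powers:
        # Hypothetical naming scheme: <stem>_<editing>_<power><suffix>
        save_name = f"{src.stem}_{editing_name}_{power:.1f}{src.suffix}"
        jobs.append({
            "orig_img_pth": str(src),
            "editing_name": editing_name,
            "edited_power": float(power),
            "save_pth": str(Path(out_dir) / save_name),
            "align": align,
        })
    return jobs


jobs = build_edit_jobs("photos/face.jpg", "editing_name", [0.5, 1.0, 1.5], "results")
for job in jobs:
    print(job["edited_power"], "->", job["save_pth"])
    # With an initialized SimpleRunner: result = runner.edit(**job)
```

Keeping the jobs as plain dictionaries makes it easy to inspect the planned outputs before any GPU work starts.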
You can find all available directions in available_directions.txt or by running:
print(runner.available_editings())
- Alignment
If you want to edit a raw image, do not forget to align it and resize it to 1024 x 1024 by passing align=True to runner.edit(...). Alignment means that the face is cropped from the original image. If you are using SimpleRunner, the edited image is automatically inserted back into the original one, and the result can be found in the parent directory of save_pth with the suffix "_unaligned".
- Masking
If artefacts appear on the background during editing, or the wrong parts of the image are being edited, you can use an image mask to choose which regions of the image should be edited: just pass use_mask=True to runner.edit(...). By default we use FARL to separate the face zone from the background and leave the background unedited. You can control which part of the image counts as background by passing mask_trashold=0.35 to runner.edit(...): the higher mask_trashold is, the larger the part of the image that is treated as background.
result = runner.edit(
orig_img_pth="path/to/original/image.jpg",
editing_name="editing_name",
edited_power=1.0,
save_pth="path/to/save/edited/image.jpg",
use_mask=True,
mask_trashold=0.995
)
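The effect of the threshold can be illustrated with a small pure-Python sketch. It assumes the masker produces a per-pixel face score in [0, 1] and that pixels scoring below the threshold are kept from the original image; this is an illustration of the idea, not the repository's implementation, and `composite_with_mask` is a hypothetical helper.

```python
def composite_with_mask(original, edited, face_prob, mask_trashold=0.35):
    """Keep original pixels wherever the face score is below the threshold."""
    result = []
    for orig_row, edit_row, prob_row in zip(original, edited, face_prob):
        row = []
        for orig_px, edit_px, prob in zip(orig_row, edit_row, prob_row):
            # Below the threshold -> treated as background -> left unedited.
            row.append(orig_px if prob < mask_trashold else edit_px)
        result.append(row)
    return result


# Toy 2x2 "images": 10 marks an original pixel, 99 an edited pixel.
original = [[10, 10], [10, 10]]
edited = [[99, 99], [99, 99]]
face_prob = [[0.1, 0.5], [0.9, 0.999]]

low = composite_with_mask(original, edited, face_prob, mask_trashold=0.35)
high = composite_with_mask(original, edited, face_prob, mask_trashold=0.995)
print(low)   # one background pixel
print(high)  # three background pixels
```

Under these assumptions, raising the threshold from 0.35 to 0.995 moves more pixels into the background, so fewer pixels are taken from the edited image.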
After the default masker is used, the obtained mask, the cropped face, and the cropped background are saved in the directory where save_pth is stored. If you need specific regions to remain unedited, you can pass your own mask. Just specify the path to it by passing mask_path="path/to/mask.jpg" to runner.edit(...):
result = runner.edit(
orig_img_pth="path/to/original/image.jpg",
editing_name="editing_name",
edited_power=1.0,
save_pth="path/to/save/edited/image.jpg",
use_mask=True,
mask_path="path/to/mask.jpg"
)
- Script
You can also use a script with the same syntax:
python scripts/simple_inference.py \
--orig_img_pth=path/to/original/image.jpg \
--editing_name=editing_name \
--edited_power=1.0 \
--save_pth=path/to/save/edited/image.jpg \
--align \
--use_mask \
--mask_trashold=0.995 \
--mask_path=path/to/mask.jpg
If you need to run inference on a large set of images rather than a single image, you can use scripts/inference.py.
First, select the powers and directions you want to infer and pass them to configs/fse_inference.yaml as the editings_data argument (JSON dict-like format, as in the original config). Then run the script:
python scripts/inference.py \
exp.config_dir=configs \
exp.config=fse_inference.yaml \
model.checkpoint_path="path/to/sfe/checkpoint" \
data.inference_dir="path/to/input/dir" \
exp.output_dir="path/where/to/save/results"
Remember that the input data should be aligned. If you are using a custom dataset (not FFHQ or CelebaHQ), do not forget to align it first.
- Inversion metrics
To calculate inversion metrics, you could use the script scripts/calculate_metrics.py:
python scripts/calculate_metrics.py \
--orig_path="path/to/original/aligned/images/dir" \
--reconstr_path="path/to/reconstructed/images/dir" \
--metrics fid l2 lpips
Available metrics are l2, lpips, fid, id, id_vit and msssim; more details can be found in metrics/metrics.py. Metric names in --metrics should be separated by spaces. If you need to save the metric values of individual images, add --metrics_dir "path/where/to/save/metrics" to the arguments; this information will be saved in JSON format.
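When the metrics script is launched from Python, the metric names can be checked before the process starts. The following is a minimal sketch: `build_metrics_command` is a hypothetical helper, while the flags and metric names are the ones listed above.

```python
AVAILABLE_METRICS = ("l2", "lpips", "fid", "id", "id_vit", "msssim")


def build_metrics_command(orig_path, reconstr_path, metrics, metrics_dir=None):
    """Build the argv list for scripts/calculate_metrics.py."""
    unknown = [name for name in metrics if name not in AVAILABLE_METRICS]
    if unknown:
        raise ValueError(f"Unknown metrics: {unknown}")
    cmd = [
        "python", "scripts/calculate_metrics.py",
        f"--orig_path={orig_path}",
        f"--reconstr_path={reconstr_path}",
        "--metrics", *metrics,  # space-separated metric names
    ]
    if metrics_dir is not None:
        # Optional: also save per-image metric values.
        cmd += ["--metrics_dir", metrics_dir]
    return cmd


cmd = build_metrics_command("data/orig", "data/recon", ["fid", "l2", "lpips"])
print(" ".join(cmd))
# To execute: subprocess.run(cmd, check=True) from the repository root.
```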
- Editing metric
To calculate the editing metric (described in the paper), we assume that you have a dataset or subset of original CelebaHQ images and its edited version (e.g. obtained by running scripts/inference.py of our method). Use the following script:
python scripts/fid_calculation.py \
--orig_path="path/to/original/celeba/images/dir" \
--synt_path="path/to/edited/celeba/images/dir" \
--attr_name=Eyeglasses
The attribute name should be one of the names listed in CelebAMask-HQ-attribute-anno.txt. If the selected attribute was removed rather than added during editing, pass the --attr_is_reversed flag.
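To see which images carry a given attribute, the annotation file can be parsed directly. The sketch below assumes the usual CelebA annotation layout (first line: number of images; second line: attribute names; then one row per image with 1 / -1 flags); verify this against your copy of the file. `split_by_attribute` is a hypothetical helper and the sample text is made up for illustration.

```python
def split_by_attribute(anno_text, attr_name):
    """Split image names into (with_attribute, without_attribute)."""
    lines = [line for line in anno_text.splitlines() if line.strip()]
    attr_names = lines[1].split()  # assumed: second line holds attribute names
    if attr_name not in attr_names:
        raise ValueError(f"Unknown attribute: {attr_name}")
    col = attr_names.index(attr_name)
    with_attr, without_attr = [], []
    for line in lines[2:]:
        filename, *values = line.split()
        # Assumed encoding: "1" means present, "-1" means absent.
        (with_attr if values[col] == "1" else without_attr).append(filename)
    return with_attr, without_attr


# Made-up sample in the assumed layout.
SAMPLE = """3
Eyeglasses Smiling
0.jpg -1 1
1.jpg 1 -1
2.jpg 1 1
"""

with_glasses, without_glasses = split_by_attribute(SAMPLE, "Eyeglasses")
print(with_glasses, without_glasses)
```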
We use the OmegaConf package to manage our configs. All configs can be found in the configs/ directory. You can change them according to the lists of all arguments stored in the arguments/ directory. In addition, if you are using a script, you can change arguments directly on the command line (see the examples below in section Scripts).
For each experiment you need to pass the path to the config directory exp.config_dir, the name of the .yaml config exp.config and the name of the experiment exp.name. The directory associated with exp.name will be created in exp.exp_dir and all necessary results will be stored in it.
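How dotted command-line arguments such as exp.name=... map onto a nested config can be sketched in plain Python. This is a simplified stand-in for what OmegaConf does (values stay strings here, whereas OmegaConf also parses types); `apply_overrides`, the base config values and the resulting directory layout are assumptions made for illustration.

```python
import copy
from pathlib import Path


def apply_overrides(config, overrides):
    """Apply 'a.b.c=value' overrides to a nested dict and return a new dict."""
    merged = copy.deepcopy(config)
    for item in overrides:
        dotted_key, value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = merged
        for name in parents:
            node = node.setdefault(name, {})  # create missing sections
        node[leaf] = value
    return merged


# Hypothetical base config; real defaults live in configs/ and arguments/.
base = {"exp": {"exp_dir": "experiments", "name": "default"}, "data": {}}
cfg = apply_overrides(base, [
    "exp.name=fse_inverter_train",
    "data.input_train_dir=path/to/train/images",
])

# Assumed layout: <exp.exp_dir>/<exp.name>
experiment_dir = Path(cfg["exp"]["exp_dir"]) / cfg["exp"]["name"]
print(experiment_dir)
```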
You will also need to pass the paths to the datasets. Pass the path to the training dataset via data.input_train_dir and the path to the validation images via data.input_val_dir. All inversion metrics will be calculated on the validation dataset. When using custom datasets, remember that all data should be aligned.
To track our experiments we use Weights & Biases (option exp.wandb, which is True by default). It will log the repository code (at the start of the training), the passed config, metrics, losses and the inversion of several selected aligned images (you need to pass the path to them in data.special_dir). If you are using W&B, do not forget to put your W&B API key into the WANDB_KEY environment variable.
To reproduce the results of our paper, you can use the default configs from configs/.
- Stage 1. This stage trains the Inverter. To start it, run:
python3 scripts/train.py \
exp.config_dir=configs \
exp.config=fse_inverter_train.yaml \
exp.name=fse_inverter_train \
data.input_train_dir=path/to/train/images \
data.input_val_dir=path/to/validation/images \
data.special_dir=path/to/several/special/images
- Stage 2. This stage trains the Feature Editor. To start it, run:
python3 scripts/train.py \
exp.config_dir=configs \
exp.config=fse_editor_train.yaml \
exp.name=fse_editor_train \
methods_args.fse_full.inverter_pth=path/to/trained/inverter.pt \
data.input_train_dir=path/to/train/images \
data.input_val_dir=path/to/validation/images \
data.special_dir=path/to/several/special/images \
train.start_step=300001
If you are using W&B, it is better to set train.start_step according to the last training step of the Inverter to get a better visualisation of the inversion metrics.
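A small sketch of deriving the stage 2 arguments from the last Inverter step. It assumes that train.start_step should be the last Inverter step plus one, which is inferred from the value 300001 in the command above and not confirmed elsewhere; `stage2_overrides` is a hypothetical helper.

```python
def stage2_overrides(inverter_pth, last_inverter_step, train_dir, val_dir, special_dir):
    """Return the command-line overrides for Feature Editor training."""
    return [
        "exp.config_dir=configs",
        "exp.config=fse_editor_train.yaml",
        "exp.name=fse_editor_train",
        f"methods_args.fse_full.inverter_pth={inverter_pth}",
        f"data.input_train_dir={train_dir}",
        f"data.input_val_dir={val_dir}",
        f"data.special_dir={special_dir}",
        # Assumption: continue step numbering right after the Inverter run.
        f"train.start_step={last_inverter_step + 1}",
    ]


overrides = stage2_overrides(
    "path/to/trained/inverter.pt", 300000,
    "path/to/train/images", "path/to/validation/images",
    "path/to/several/special/images",
)
print(" ".join(["python3", "scripts/train.py", *overrides]))
```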
- Training stage 1
The Inverter training pipeline. Input image
- Training stage 2 and Inference
The Feature Editor training pipeline and inference. To obtain the editing loss, one needs to synthesize training samples:
Training Runner                       # Training Runner responsible for ...
├── _setup_device(...)                # Setting pipeline device
├── _setup_experiment_dir(...)        # Setting directory to save checkpoints
├── _setup_datasets(...)              # Setting train/val/special datasets
├── _setup_dataloaders(...)           # Setting train/val/special loaders
├── run(...)                          # Training loop, responsible for ...
│   ├── train_step(...)               # Model forward, loss calculation, optimizer step, etc.
│   ├── validate(...)                 # Metrics calculation, inference of special images
│   ├── save_..._logs(...)            # Saving training/validation logs
│   └── save_checkpoint(...)          # Saving model and optimizer checkpoints
│
├── Logger                            # Gathers all training logs
├── Metrics                           # Inversion metrics to validate
├── Optimizers                        # Encoder and Discriminator optimizers
├── LossBuilder                       # Contains all losses used for training
├── LatentEditor                      # Latent Editor
│   ├── Editing models                # Contains all models for editing
│   └── get_[...]_editings(...)       # Editing particular directions for [...] editing method
│
└── Method                            # Method
    ├── load_weights(...)             # Responsible for loading checkpoints
    ├── forward(...)                  # Responsible for batch inversion via Inverter
    ├── Discriminator                 # StyleGAN 2 Discriminator, trainable for adv loss
    ├── Decoder                       # StyleGAN 2 Generator, not trainable
    └── Encoder                       # Trainable part, either Inverter or Feature Editor
        └ ─ ─ Inverter                # Pretrained module used only in second stage
.
├── arguments                         # Contains all arguments used in training and inference
├── assets                            # Folder with method preview and example images
├── configs                           # Includes configs (associated with arguments) for training and inference
├── criteria                          # Contains original code for used losses and metrics
├── datasets
│   ├── datasets.py                   # A collection of custom datasets
│   ├── loaders.py                    # Custom infinite loader
│   └── transforms.py                 # Transforms used in SFE
│
├── editings                          # Includes original code for various editing methods and an editor that applies them
│   ├── ...
│   └── latent_editor.py              # Implementation of module that edits w or stylespace latents
│
├── metrics                           # Contains wrappers over original code for all used inversion metrics
├── models                            # Includes original code from several previous inversion methods
│   ├── ...
│   ├── farl                          # Modified FARL module, used to search face mask
│   ├── psp
│   │   ├── encoders                  # Contains all the Inverter, Feature Editor and E4E parts
│   │   └── stylegan2                 # Includes modified StyleGAN 2 generator
│   │
│   └── methods.py                    # Contains code for Inverter and Feature Editor modules
│
├── notebook                          # Folder for Jupyter Notebook and raw images
├── runners                           # Includes main code for training and inference pipelines
├── scripts                           # Script to ...
│   ├── align_all_parallel.py         # Align raw images
│   ├── calculate_metrics.py          # Inversion metrics calculation
│   ├── fid_calculation.py            # Editing metric calculation
│   ├── inference.py                  # Inference large set of data with several directions
│   ├── simple_inference.py           # Inference single image with one direction and mask
│   └── train.py                      # Start training process
│
├── training
│   ├── loggers.py                    # Code for loggers used in training
│   ├── losses.py                     # Wrappers over used losses
│   └── optimizers.py                 # Wrappers over used optimizers
│
├── utils                             # Folder with utility functions
├── CelebAMask-HQ-attribute-anno.txt  # Matches between CelebA HQ images and attributes
├── available_directions.txt          # Info about available editing directions
├── requirements.txt                  # Lists required Python packages
└── env_install.sh                    # Script to install the necessary environment
The code structure of this repository is heavily based on pSp and e4e.
The project has also been inspired by a number of existing inversion techniques, using the source code of several prominent examples. These include HyperInverter, FeatureStyleEncoder and StyleRes.
If you use this code for your research, please cite our paper:
@InProceedings{Bobkov_2024_CVPR,
author = {Bobkov, Denis and Titov, Vadim and Alanov, Aibek and Vetrov, Dmitry},
title = {The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {9337-9346}
}