This project fine-tunes the Flux.1-dev model with LoRA (Low-Rank Adaptation) to generate Calvin and Hobbes comic strip images. The pipeline covers scraping the data, annotating the dataset with a vision model, and fine-tuning the model for image generation. The fine-tuned model is available here, and the dataset used for training will be published soon. You can try the model at Hugging Face Spaces.
The base model used for fine-tuning is Flux.1-dev, a text-to-image diffusion model. I fine-tuned it with LoRA so it can generate unique Calvin and Hobbes comic strip scenes.
I began by collecting comic strip images from Reddit posts using a custom Python scraper. The script fetches and downloads images from a specified Reddit user's posts, handling pagination and filtering by file type (`.jpg`, `.png`, `.jpeg`, and `.gif`). This allowed me to build a comprehensive dataset for fine-tuning the model.
Once the images were downloaded, I used `merge_datasets.py` to combine datasets from multiple sources if needed. This script also removed duplicate images to ensure the dataset was clean and ready for annotation.
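Deduplication can be done by hashing file contents; hashing is an assumption about how `merge_datasets.py` detects duplicates, so treat this as a sketch:

```python
import hashlib
from pathlib import Path

def remove_duplicate_images(folder: Path) -> int:
    """Delete files whose byte content duplicates an earlier file in the folder.
    Returns the number of duplicates removed."""
    seen: dict[str, Path] = {}
    removed = 0
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            path.unlink()  # identical bytes already kept under another name
            removed += 1
        else:
            seen[digest] = path
    return removed
```

Hashing catches byte-identical copies only; re-encoded or resized duplicates would need perceptual hashing instead.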
After preparing the dataset, I annotated the images using the LLaVA 13B vision model (`llava:13b`) through Ollama. The `annotate_dataset.py` script generated a textual description of each image, focusing on the interaction between the characters in the comic strips. These captions give the fine-tuning run text to pair with each image.
- Input Image: Calvin and Hobbes fishing.
- Annotated Description: Calvin and Hobbes attempt to catch a fish while boating together.
The annotated dataset was saved to a CSV file (`image_descriptions.csv`) for further processing.
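Calling `llava:13b` through Ollama might look like the sketch below, using Ollama's local `/api/generate` endpoint, which accepts base64-encoded images for multimodal models. The prompt wording and function names are assumptions, not the actual `annotate_dataset.py` code.

```python
import base64
import json
import urllib.request
from pathlib import Path

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

PROMPT = "Describe the interaction between the characters in this comic strip."

def build_payload(image_path: Path, prompt: str = PROMPT) -> dict:
    """Build the JSON body for Ollama's /api/generate: multimodal models
    such as llava:13b take base64-encoded images in the 'images' field."""
    image_b64 = base64.b64encode(image_path.read_bytes()).decode("ascii")
    return {"model": "llava:13b", "prompt": prompt,
            "images": [image_b64], "stream": False}

def annotate_image(image_path: Path) -> str:
    """Send one image to the local Ollama server and return its description."""
    data = json.dumps(build_payload(image_path)).encode("utf-8")
    req = urllib.request.Request(OLLAMA_URL, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"].strip()
```

With `stream: False` the server returns one JSON object whose `response` field holds the full description, which is then written out as a CSV row.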
I ran `txt_prep.py` to convert the annotated dataset into the required format. This script generated an individual text file for each image, containing the description needed for fine-tuning the Flux.1-dev model.
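The conversion amounts to writing one `.txt` caption per image, named after the image's basename. The `image` and `description` column names below are assumptions about `image_descriptions.csv`:

```python
import csv
from pathlib import Path

def csv_to_caption_files(csv_path: Path, out_dir: Path) -> int:
    """Write one .txt caption file per CSV row, named after the image's stem
    (e.g. calvin_001.jpg -> calvin_001.txt).  Returns the number of files written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    with csv_path.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            stem = Path(row["image"]).stem
            (out_dir / f"{stem}.txt").write_text(row["description"].strip(),
                                                 encoding="utf-8")
            count += 1
    return count
```

Matching basenames are what lets the trainer pair each image with its caption file.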
To fine-tune the model, the AI Toolkit repository (with submodules) is already included in this repository, so there's no need to clone it again.
To set up the environment:
```bash
cd ComicStrips-LoRA/
python -m venv venv
source venv/bin/activate
pip install torch
pip install -r requirements.txt
pip install --upgrade accelerate transformers diffusers huggingface_hub
```
Create a new folder in the root of the repository called `dataset`. Move the `.jpg`, `.jpeg`, and `.png` images and their corresponding `.txt` files generated in the previous steps into this folder.
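Before training, it is worth checking that every image in `dataset/` has a matching caption file. A small helper (not part of the repository, just a convenience sketch) could look like:

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def unpaired_images(dataset_dir: Path) -> list[Path]:
    """Return the images in dataset_dir that lack a same-named .txt caption."""
    return [p for p in sorted(dataset_dir.iterdir())
            if p.suffix.lower() in IMAGE_EXTENSIONS
            and not p.with_suffix(".txt").exists()]
```

An empty result means every image has a caption; anything listed would train without text.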
Login to Hugging Face and request access to the Flux.1-dev model:
- Get a READ token from Hugging Face.
- Request access to Flux.1-dev.
- Run the following command and paste the access token:
```bash
huggingface-cli login
```
I edited a configuration file for fine-tuning:
- Copy an example config file from `config/examples/` to the `config/` folder and rename it (e.g., `calvin_and_hobbes_finetune.yml`).
- Edit the config file:
  - Set `folder_path: "/path/to/your/dataset/folder"` to the actual dataset folder path.
  - Set
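For orientation, the dataset section of the edited config might look like the fragment below. Apart from `folder_path`, the key names (`config`, `process`, `datasets`, `caption_ext`) are assumptions based on typical AI Toolkit example configs, so always start from the shipped example file rather than this sketch:

```yaml
config:
  process:
    - datasets:
        - folder_path: "/path/to/your/dataset/folder"  # your dataset folder
          caption_ext: "txt"  # assumed key: captions share the image's basename
```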
- Start the fine-tuning process:

```bash
python run.py config/calvin_and_hobbes_finetune.yml
```
The training will begin, and the model will fine-tune on the custom dataset.
The dataset used for this project is available here.
Here are some sample GIFs that show how the model was converging:
- Calvin and Hobbes swimming in the pool
- Calvin and Hobbes climbing the tree
- Calvin and Hobbes talking about quitting school
- Calvin and Hobbes eating dinner
- Calvin and Hobbes going shopping
Special thanks to the creators of the AI Toolkit for providing the tools necessary to fine-tune the Flux.1-dev model using LoRA.