GitHub - SailorJoe6/Agentic_Story_Book_Workflow: An agentic workflow for story book generation

Chinese version

Agentic Story Book Workflow

A multi-agent workflow framework, built on AutoGen, for creating children's picture books.

(Demo video: AgenticWorkflow.mp4)

Agentic workflow

The code uses several of AutoGen's multi-agent collaboration patterns:

- First, the User_Proxy agent acts on behalf of the user and talks with the Receptionist agent to gather the user's requirements.
- Each of the next two stages uses a GroupChat, and each GroupChat has a GroupChat Manager that coordinates which agent speaks next.
- In both GroupChats, every content-creation role (e.g., Story Editor, Storyboard Editor, Prompt Editor) is paired with an agent that reviews its output. If the review is not approved, the GroupChat Manager sends the content back to the responsible editor for revision.
- The final stage, generating images, videos, and PPTX files, currently lives in a separate script (generate.py) for ease of use and to leave room for future changes to how the GroupChats are organized. For now, this stage is handled by an Image Creator Agent: a standalone agent that internally contains two sub-agents, an Image Generation Agent that generates images with AI and a second agent that reviews the generated images.
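The editor/reviewer loop described above can be sketched in plain Python to make the control flow explicit. This is a minimal illustration under stated assumptions, not the project's AutoGen code: in the repository a GroupChat Manager routes messages between LLM-backed agents, whereas here a simple loop stands in for it, and the names `create_with_review`, `Review`, `story_editor`, `story_reviewer`, and `max_rounds` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Review:
    approved: bool
    feedback: str = ""


def create_with_review(
    create: Callable[[str, str], str],  # hypothetical editor: (requirements, feedback) -> draft
    review: Callable[[str], Review],    # hypothetical reviewer: draft -> verdict
    requirements: str,
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Draft, review, and revise until approved or max_rounds is used up.

    Returns the last draft and the number of rounds run. If no draft is
    approved, the last (unapproved) draft is returned.
    """
    feedback = ""
    draft = ""
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        draft = create(requirements, feedback)
        verdict = review(draft)
        if verdict.approved:
            break
        # Not approved: send the reviewer's feedback back to the editor.
        feedback = verdict.feedback
    return draft, rounds


# Usage with stub agents standing in for the LLM-backed editor and reviewer.
def story_editor(requirements: str, feedback: str) -> str:
    draft = f"Story about {requirements}"
    return draft + " (revised)" if feedback else draft


def story_reviewer(draft: str) -> Review:
    if "(revised)" in draft:
        return Review(approved=True)
    return Review(approved=False, feedback="Give the story a clearer ending.")


print(create_with_review(story_editor, story_reviewer, "a brave turtle"))
# → ('Story about a brave turtle (revised)', 2)
```

Capping the number of rounds keeps a reviewer that never approves from looping forever; the same idea appears in the image stage through the retry limits configured in `.env`.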

System Requirements

- LLM: GPT-4o is recommended. The current code has been tested against the GPT-4o service in Azure OpenAI; in theory it should also work with OpenAI's own API after minor configuration changes. Although AutoGen supports many LLMs, in practical tests Claude 3.5 Sonnet did not always follow the prompt instructions strictly, so other LLMs are not recommended.
- Text2Image: DALL-E 3 and Flux Schnell (via Replicate) are supported. Weighing cost and speed, I ultimately chose the Flux Schnell API endpoint on Replicate:
  - DALL-E 3 in HD mode costs $12 per 100 images ($0.12 per image), and each image takes more than ten seconds to generate.
  - The Flux Schnell API costs only $0.003 per image and takes 1-2 seconds per image, so it is the better fit in terms of both cost and scheduling. If you find Schnell's quality too low, the Flux Dev API costs only $0.03 per image. (The Pro version on Replicate costs $0.055 per image, but it seems to run on CPU and is very slow, so I didn't try it.) Adjust to suit your needs.
- Text-to-speech: an Azure account with a Speech service resource enabled.
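To make the cost comparison concrete, here is a small back-of-the-envelope calculator using the per-image prices quoted above. The 20-page book size and the `attempts_per_page` factor (for example, regenerations after a failed review) are illustrative assumptions, the function name `book_image_cost` is hypothetical, and the Replicate and Azure prices may have changed since this was written.

```python
# Per-image prices quoted above, in USD; they may change over time.
PRICE_PER_IMAGE = {
    "dall-e-3-hd": 12 / 100,  # $12 per 100 images
    "flux-schnell": 0.003,
    "flux-dev": 0.03,
    "flux-pro": 0.055,
}


def book_image_cost(model: str, pages: int, attempts_per_page: int = 1) -> float:
    """Estimated image cost for one book: pages x attempts x price per image."""
    if pages < 0 or attempts_per_page < 1:
        raise ValueError("pages must be >= 0 and attempts_per_page >= 1")
    return pages * attempts_per_page * PRICE_PER_IMAGE[model]


for model in ("dall-e-3-hd", "flux-schnell"):
    print(f"{model}: ${book_image_cost(model, pages=20):.2f}")
# → dall-e-3-hd: $2.40
# → flux-schnell: $0.06
```

The gap widens when images need several attempts, since every retry is billed as a new image.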

How to use

Create a Python virtual environment (tested on Python 3.11) and install dependencies:

pip install -r requirements.txt

Create a .env file by copying the contents of .env.example, then edit it with your own settings.

Create a story:

python app.py

Generate images/videos/PPTX: first, set story_id in generate.py to the ID of the story you want to render (the ID is printed in the output of app.py). Then run:

python generate.py

.env configurations

| Environment variable | Description | Default value |
| --- | --- | --- |
| AGENTOPS_API_KEY | AgentOps API key | |
| MODEL | Deployment name on Azure, or model name on OpenAI | |
| API_VERSION | API version | '2024-06-01' |
| API_TYPE | 'azure' or 'openai' | azure |
| API_KEY | API key | |
| BASE_URL | API base URL; on Azure it looks like 'https://{resource_name}.openai.azure.com/' | |
| IMAGE_GENERATION_TYPE | 'azure', 'openai' or 'replicate' | |
| IMAGE_SHAPE | 'landscape', 'portrait' or 'square' | landscape |
| DALLE_MODEL | DALL-E deployment name on Azure, or model name on OpenAI | |
| DALLE_API_VERSION | DALL-E API version | '2024-06-01' |
| DALLE_API_KEY | DALL-E API key | |
| DALLE_BASE_URL | DALL-E API base URL; on Azure it looks like 'https://{resource_name}.openai.azure.com/' | |
| DALLE_IMAGE_QUALITY | 'hd' or 'standard' | 'hd' |
| DALLE_IMAGE_STYLE | 'vivid' or 'natural' | 'vivid' |
| REPLICATE_API_TOKEN | Replicate API token | |
| REPLICATE_MODEL_NAME | 'black-forest-labs/flux-schnell', 'black-forest-labs/flux-dev' or 'black-forest-labs/flux-pro' | 'black-forest-labs/flux-schnell' |
| IMAGE_GENERATION_RETRIES | Maximum retry count per image | 3 |
| IMAGE_CRITICISM_RETRIES | Maximum number of review rounds per image | 2 |
| IMAGE_SAVE_FAILURED_IMAGES | Whether to save images that failed review: True or False | False |
| AZURE_SPEECH_KEY | Azure Speech service API key | |
| AZURE_SPEECH_REGION | Region where the Azure Speech resource is deployed | |
| AZURE_SPEECH_VOICE_NAME | Azure TTS voice name | 'zh-CN-XiaoxiaoMultilingualNeural' |
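A minimal sketch of how a subset of these variables could be read in Python with the defaults from the table, assuming the `.env` file has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`). The `Settings` class and `load_settings` function are hypothetical and not part of the repository; only a few variables are shown.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


# Hypothetical settings object covering a subset of the variables in the table.
@dataclass(frozen=True)
class Settings:
    api_type: str
    api_version: str
    image_generation_type: Optional[str]
    image_shape: str
    replicate_model_name: str
    image_generation_retries: int
    image_criticism_retries: int
    image_save_failured_images: bool
    azure_speech_voice_name: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from `env`, falling back to the table's default values."""

    def get(name: str, default: Optional[str] = None) -> Optional[str]:
        value = env.get(name)
        # Treat unset and empty variables the same way.
        return default if value is None or value == "" else value

    def get_bool(name: str, default: bool) -> bool:
        raw = get(name)
        return default if raw is None else raw.strip().lower() in {"1", "true", "yes"}

    api_type = get("API_TYPE", "azure")
    if api_type not in {"azure", "openai"}:
        raise ValueError(f"API_TYPE must be 'azure' or 'openai', got {api_type!r}")
    shape = get("IMAGE_SHAPE", "landscape")
    if shape not in {"landscape", "portrait", "square"}:
        raise ValueError(f"IMAGE_SHAPE must be landscape/portrait/square, got {shape!r}")

    return Settings(
        api_type=api_type,
        api_version=get("API_VERSION", "2024-06-01"),
        image_generation_type=get("IMAGE_GENERATION_TYPE"),
        image_shape=shape,
        replicate_model_name=get("REPLICATE_MODEL_NAME", "black-forest-labs/flux-schnell"),
        image_generation_retries=int(get("IMAGE_GENERATION_RETRIES", "3")),
        image_criticism_retries=int(get("IMAGE_CRITICISM_RETRIES", "2")),
        image_save_failured_images=get_bool("IMAGE_SAVE_FAILURED_IMAGES", False),
        azure_speech_voice_name=get("AZURE_SPEECH_VOICE_NAME", "zh-CN-XiaoxiaoMultilingualNeural"),
    )


settings = load_settings({"IMAGE_GENERATION_TYPE": "replicate"})
print(settings.replicate_model_name, settings.image_generation_retries)
```

Validating values up front means a typo in `.env` fails immediately instead of midway through a long generation run.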

Roadmap

- Add more FLUX models and API providers
- Improve the content-generation logic
- Add human-in-the-loop steps during story content creation and generation
- Background music

FAQ

- I see that the story content in your demo is in Chinese. Does it support other languages? Yes. The content-creation prompts instruct the agents to follow the user's requirements or to use the language the user writes in.
- What about multilingual voice support? Azure TTS supports more than a hundred languages and locales. Just set the voice name for the desired language in the AZURE_SPEECH_VOICE_NAME field of the .env file (some voices can speak dozens of different languages).
- Why are your prompts written in English? In my experience, English prompts work slightly better than Chinese ones. A very useful tip: the Anthropic Console includes a tool that generates prompts for you. Enter your initial idea there, and it produces a prompt that needs only minor edits before you use it in your program.
- The visual quality seems low. There are two reasons:
  - First, the demo content shown here was generated with Flux's Schnell model, which is fast and cheap. Using the dev or pro models will certainly improve image quality. These models are not yet supported in the current code but will be added in the future.
  - Second, the current image review logic is not yet good enough and has room for improvement.
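As the multilingual voice answer above notes, the narration voice is chosen by its Azure voice name. For illustration, here is a sketch that wraps story text in a minimal SSML document selecting a specific voice. It is a hypothetical helper (`build_ssml` is not part of the repository, which sets the voice through AZURE_SPEECH_VOICE_NAME) and it only builds the XML string; with the Azure Speech SDK (azure-cognitiveservices-speech), such a string can be passed to `SpeechSynthesizer.speak_ssml_async`, which is not shown here.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice_name: str, lang: str = "en-US") -> str:
    """Return a minimal SSML document that reads `text` with the Azure voice `voice_name`."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        # escape() turns &, < and > in the story text into XML entities.
        f"<voice name={quoteattr(voice_name)}>{escape(text)}</voice>"
        "</speak>"
    )


# Same voice name as the AZURE_SPEECH_VOICE_NAME default in the table.
ssml = build_ssml(
    "Once upon a time, a turtle & a rabbit...",
    "zh-CN-XiaoxiaoMultilingualNeural",
    lang="zh-CN",
)
print(ssml)
```

Escaping the text matters because story content can contain characters such as `&` or `<` that would otherwise produce invalid SSML.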

Others

See some generated content demos here
