Img2Text

Image Caption Generator

Using a CNN-RNN Merge Architecture with GloVe Embeddings
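A merge architecture keeps its two branches separate. The CNN turns the image into a feature vector, the RNN encodes the caption prefix, and the two encodings are combined only afterwards to predict the next word. A common way to build training data for such a model is to expand each caption into one example per word position. This is an assumption here, not something taken from this repository's notebooks. The sketch below is plain Python. The `startseq`/`endseq` boundary tokens, the `build_vocab` and `make_training_pairs` helpers, and the left-padding scheme are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical caption-boundary tokens (an assumption, not taken from the repo).
START, END = "startseq", "endseq"


def build_vocab(captions: Sequence[str]) -> Dict[str, int]:
    """Map every word to an integer id; id 0 is reserved for padding."""
    words = {START, END}
    for caption in captions:
        words.update(caption.split())
    return {word: i for i, word in enumerate(sorted(words), start=1)}


def make_training_pairs(
    image_feature: List[float], caption: str, vocab: Dict[str, int], max_len: int
) -> List[Tuple[List[float], List[int], int]]:
    """Expand one caption into (image feature, padded prefix, next-word id) examples."""
    ids = [vocab[START]] + [vocab[w] for w in caption.split()] + [vocab[END]]
    pairs = []
    for t in range(1, len(ids)):
        prefix = ids[max(0, t - max_len) : t]  # keep at most max_len most recent ids
        padded = [0] * (max_len - len(prefix)) + prefix  # left-pad with the padding id
        # The target is a single integer id, the label format that
        # sparse_categorical_crossentropy expects (no one-hot vectors needed).
        pairs.append((image_feature, padded, ids[t]))
    return pairs


caption = "man in black jacket"
vocab = build_vocab([caption])
for _, prefix, target in make_training_pairs([0.1, 0.2], caption, vocab, max_len=6):
    print(prefix, "->", target)
```

A caption of n words yields n + 1 examples (the last one predicts `endseq`), and the same image feature is repeated for every prefix.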

Let's take a look:


man in black jacket is standing in front of large building
little girl in pink shirt is sitting in front of rainbow painting
group of people are standing in front of fence

Model and Hyperparameters:

Setting	Value
CNN	InceptionV3
RNN	CuDNNLSTM
Word Embedding	GloVe
Loss	sparse_categorical_crossentropy
Optimizer	Adam
Embedding Dimension	300
Embedding Trainable	True
Layer Size	256
Dropout Rate	0.5
Max Epochs	20
Early Stopping	monitor='val_loss', min_delta=0.01, patience=10
Model Checkpoint	monitor='val_loss', save_best_only=True
Batch Size	2048
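The early-stopping and checkpoint settings above interact in a way that is easy to miss. Early stopping counts an epoch as an improvement only if `val_loss` drops by more than `min_delta=0.01`. `save_best_only=True`, by contrast, saves whenever `val_loss` reaches any new minimum. Keras's `EarlyStopping` also does not restore the best weights by default (`restore_best_weights=False`), which is one reason to pair it with a checkpoint. With `patience=10` and a 20-epoch cap, early stopping can only cut training short if the improvements larger than 0.01 end within roughly the first half of training.

The sketch below is a simplified pure-Python re-implementation of that logic for illustration, not Keras's own code. The `simulate_callbacks` helper and the loss curve are invented.

```python
import math
from typing import List, Optional, Tuple


def simulate_callbacks(
    val_losses: List[float], min_delta: float = 0.01, patience: int = 10
) -> Tuple[Optional[int], List[int]]:
    """Simplified mimic (not Keras itself) of
    EarlyStopping(monitor='val_loss', min_delta, patience) and
    ModelCheckpoint(monitor='val_loss', save_best_only=True).

    Returns (1-indexed epoch at which training stops early, or None;
             1-indexed epochs at which a checkpoint would be written).
    """
    es_best = math.inf    # reference value used by early stopping
    ckpt_best = math.inf  # reference value used by the checkpoint callback
    wait = 0              # epochs since the last min_delta-sized improvement
    saved: List[int] = []
    for epoch, loss in enumerate(val_losses, start=1):
        # save_best_only: any strict improvement triggers a save, however small.
        if loss < ckpt_best:
            ckpt_best = loss
            saved.append(epoch)
        # Early stopping resets its counter only if the loss beats the best by > min_delta.
        if loss < es_best - min_delta:
            es_best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, saved
    return None, saved


stop_epoch, saved_epochs = simulate_callbacks(
    [3.40, 3.20, 3.10, 3.095, 3.092, 3.091] + [3.12] * 14  # invented val_loss curve
)
print(stop_epoch, saved_epochs)
```

In this invented run, the drops to 3.095, 3.092 and 3.091 are each less than 0.01 below the epoch-3 value of 3.10. Early stopping therefore counts no improvement after epoch 3 and halts at epoch 13. The checkpoint still saves the epoch-6 weights, which have the lowest `val_loss` seen.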

Performance:

1. Cross-Entropy Loss (lower is better)

Split	With GloVe	Without GloVe
Train	2.6006	2.6338
Dev	3.0556	3.1157
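`sparse_categorical_crossentropy` is the mean negative log-probability the model assigns to the correct next word, where each target is an integer word id. Keras uses the natural log. Exponentiating the loss gives per-word perplexity, which can make the loss table easier to read. The dev losses of 3.0556 and 3.1157 correspond to perplexities of about 21.2 and 22.5. In other words, the model is roughly as uncertain as a uniform choice among 21 or 22 words. The sketch below is plain Python with toy probabilities. The function names are hypothetical, and it omits the probability clipping Keras applies internally.

```python
import math
from typing import Sequence


def sparse_categorical_crossentropy(
    targets: Sequence[int], probs: Sequence[Sequence[float]]
) -> float:
    """Mean of -ln p(correct word); targets are integer word ids, not one-hot vectors."""
    losses = [-math.log(p[t]) for t, p in zip(targets, probs)]
    return sum(losses) / len(losses)


def perplexity(loss: float) -> float:
    """Per-word perplexity implied by a natural-log cross-entropy loss."""
    return math.exp(loss)


# Two decoding steps over a toy 3-word vocabulary; the correct ids are 0 and 2.
toy_loss = sparse_categorical_crossentropy([0, 2], [[0.5, 0.25, 0.25], [0.25, 0.25, 0.5]])
print(toy_loss, perplexity(toy_loss))

# Perplexities implied by the dev losses reported in the loss table.
for label, dev_loss in [("with GloVe", 3.0556), ("without GloVe", 3.1157)]:
    print(label, round(perplexity(dev_loss), 1))
```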

2. CIDEr Score (Test Set)

Embedding	Decoding	CIDEr
With GloVe	Greedy Search	0.44643053
With GloVe	Beam Search (B = 3)	0.48076379
Without GloVe	Greedy Search	0.46058652
Without GloVe	Beam Search (B = 3)	0.49261228
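CIDEr rewards n-grams that a generated caption shares with the human reference captions for the same image. It uses TF-IDF weighting to down-weight n-grams that are common across the whole corpus. The sketch below implements the original CIDEr definition in plain Python: uniform weights over n = 1..4, with cosine similarity averaged over the references. It is an illustration only. Widely used evaluation code, such as the COCO caption evaluation toolkit, computes the CIDEr-D variant, which adds count clipping, a length penalty and a ×10 scale. The README does not say which scorer produced the numbers above, so this function is not expected to reproduce them. The helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def tfidf(tokens: Sequence[str], n: int, df: Counter, num_images: int) -> Dict[NGram, float]:
    """TF-IDF vector of one caption. Raw counts stand in for TF because cosine
    similarity ignores per-sentence normalisation; n-grams absent from every
    reference are treated as df = 1 to avoid dividing by zero."""
    return {
        g: c * math.log(num_images / max(df[g], 1))
        for g, c in ngram_counts(tokens, n).items()
    }


def cosine(a: Dict[NGram, float], b: Dict[NGram, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def cider(candidates: List[str], references: List[List[str]], max_n: int = 4) -> float:
    """Corpus-average CIDEr, original definition (no clipping, no length penalty).

    candidates[i] is the generated caption for image i; references[i] its human captions.
    """
    num_images = len(candidates)
    cand_tokens = [c.split() for c in candidates]
    ref_tokens = [[r.split() for r in refs] for refs in references]
    total = 0.0
    for n in range(1, max_n + 1):
        # Document frequency: number of images whose references contain each n-gram.
        df: Counter = Counter()
        for refs in ref_tokens:
            df.update({g for r in refs for g in ngram_counts(r, n)})
        for cand, refs in zip(cand_tokens, ref_tokens):
            cand_vec = tfidf(cand, n, df, num_images)
            sims = [cosine(cand_vec, tfidf(r, n, df, num_images)) for r in refs]
            total += sum(sims) / len(sims) / max_n  # uniform weight 1/N per order
    return total / num_images


candidates = ["a dog runs on the grass", "a man rides a red bike"]
references = [
    ["a brown dog runs on the grass", "a dog is running outside"],
    ["a man is riding a bike", "a person rides a red bicycle"],
]
print(round(cider(candidates, references), 3))
```

An n-gram that appears in the references of every image gets an IDF of log(1) = 0, so it contributes nothing to the score.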

3. BLEU Score (Test Set)

Embedding	Decoding	BLEU-1	BLEU-2	BLEU-3	BLEU-4
With GloVe	Greedy Search	0.594145	0.373877	0.242624	0.152859
With GloVe	Beam Search (B = 3)	0.604485	0.393628	0.267615	0.177017
Without GloVe	Greedy Search	0.612248	0.391558	0.254003	0.16078
Without GloVe	Beam Search (B = 3)	0.619149	0.405042	0.27374	0.179306
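Cumulative BLEU-n is the geometric mean of the clipped ("modified") n-gram precisions for orders 1 to n, multiplied by a brevity penalty. The sketch below computes corpus-level BLEU in plain Python with uniform weights and no smoothing. It takes the brevity penalty from the reference length closest to each candidate, which matches the defaults of NLTK's `corpus_bleu`. It illustrates the formula only and is not guaranteed to match the exact scorer behind the table. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def _ngrams(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates: List[str], references: List[List[str]], max_n: int = 4) -> float:
    """Cumulative BLEU-max_n with uniform weights and no smoothing."""
    matched = [0] * max_n  # clipped n-gram matches, per order
    total = [0] * max_n    # candidate n-grams, per order
    cand_len = ref_len = 0
    for cand, refs in zip(candidates, references):
        c = cand.split()
        rs = [r.split() for r in refs]
        cand_len += len(c)
        # Brevity penalty uses the reference length closest to the candidate (ties -> shorter).
        ref_len += min((abs(len(r) - len(c)), len(r)) for r in rs)[1]
        for n in range(1, max_n + 1):
            counts = _ngrams(c, n)
            # Clip each n-gram count by its maximum count in any single reference.
            max_ref: Counter = Counter()
            for r in rs:
                max_ref |= _ngrams(r, n)
            matched[n - 1] += sum(min(cnt, max_ref[g]) for g, cnt in counts.items())
            total[n - 1] += sum(counts.values())
    if min(matched) == 0:
        return 0.0  # one zero precision makes the unsmoothed geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    bp = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return bp * math.exp(log_precision)


candidates = ["a dog runs on the grass"]
references = [["a brown dog runs on the grass", "a dog is running outside"]]
for n in range(1, 5):
    print(f"BLEU-{n}", round(corpus_bleu(candidates, references, max_n=n), 4))
```

Clipping is what stops a degenerate caption such as "the the the the" from scoring a perfect unigram precision.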

Conclusion:

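Use InceptionV3 to encode images.
Beam Search (B = 3) beats Greedy Search on every test metric: by about 7 to 8% (relative) on CIDEr, and by about 1% (BLEU-1) to 16% (BLEU-4) on BLEU.
Using a large batch size together with early stopping and model checkpointing can achieve a lower cross-entropy loss.
Fine-tuning a pre-trained embedding does not always give better results: the GloVe model reaches a lower train and dev loss, but the model without GloVe scores higher on every CIDEr and BLEU test metric.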
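Greedy search commits to the single most likely word at each step. Beam search with B = 3 instead keeps the three highest-scoring partial captions, ranked by summed log-probability, and extends each of them. The toy sketch below replaces the trained model with a hand-written next-word table (`TOY_MODEL` and `next_word_probs` are hypothetical). It shows how beam search can find a caption that greedy search misses.

```python
import math
from typing import Dict, List, Tuple

START, END = "<s>", "</s>"

# Hypothetical next-word distributions keyed by the previous word; a trained
# captioning model would instead condition on the image and the whole prefix.
TOY_MODEL: Dict[str, Dict[str, float]] = {
    START: {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.4, "cat": 0.3, "man": 0.3},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {END: 1.0},
    "cat": {END: 1.0},
    "man": {END: 1.0},
}


def next_word_probs(prefix: List[str]) -> Dict[str, float]:
    return TOY_MODEL[prefix[-1]]


def greedy_search(max_len: int = 10) -> List[str]:
    seq = [START]
    for _ in range(max_len):
        probs = next_word_probs(seq)
        word = max(probs, key=probs.__getitem__)  # single most likely next word
        seq.append(word)
        if word == END:
            break
    return [w for w in seq if w not in (START, END)]


def beam_search(beam_width: int = 3, max_len: int = 10) -> List[str]:
    beams: List[Tuple[float, List[str]]] = [(0.0, [START])]  # (sum of log-probs, tokens)
    finished: List[Tuple[float, List[str]]] = []
    for _ in range(max_len):
        candidates = [
            (score + math.log(p), seq + [word])
            for score, seq in beams
            for word, p in next_word_probs(seq).items()
        ]
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        # Keep the beam_width best extensions; completed captions leave the beam.
        for score, seq in candidates[:beam_width]:
            (finished if seq[-1] == END else beams).append((score, seq))
        if not beams:
            break
    _, best = max(finished + beams, key=lambda c: c[0])
    return [w for w in best if w not in (START, END)]


print("greedy:", " ".join(greedy_search()))
print("beam:  ", " ".join(beam_search(beam_width=3)))
```

Here greedy search returns "a dog" (probability 0.6 × 0.4 = 0.24), while beam search returns "the dog" (0.4 × 0.9 = 0.36). With `beam_width=1`, beam search reduces to greedy search.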

DavidMouse1118/Img2Text
