-
Notifications
You must be signed in to change notification settings - Fork 243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added an Example for BLEU. #806
base: master
Are you sure you want to change the base?
Conversation
I would like to inform you that I have made some updates to the content. Firstly, I have added an example for BLEU which was not present before. This new example serves as an illustration of the concept and should be helpful for those who are new to this topic. Furthermore, I have also fixed the URLs by wrapping them in markdown. This should make it easier for readers to access the resources mentioned in the content.
@abheesht17 I have deleted the previous pull request because I messed up with it. Please review this one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments. Most of them our minor, but there is one major comment regarding rank of inputs!
keras_nlp/metrics/bleu.py
Outdated
@@ -4,7 +4,7 @@ | |||
# you may not use this file except in compliance with the License. | |||
# You may obtain a copy of the License at | |||
# | |||
# https://www.apache.org/licenses/LICENSE-2.0 | |||
# https://www.apache.org/licenses/LICENSE-2.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove extra spaces.
keras_nlp/metrics/bleu.py
Outdated
>>> bleu = keras_nlp.metrics.Bleu(max_order=4) | ||
>>> ref_sentence = "the quick brown fox jumps over the lazy dog" | ||
>>> pred_sentence = "the quick brown fox jumps over the box" | ||
>>> score = bleu([ref_sentence], [pred_sentence]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we mention that this example is with Python string inputs, this should ideally be
bleu([ref_sentence], pred_sentence)
keras_nlp/metrics/bleu.py
Outdated
"the quick brown fox jumps over the lazy frog" | ||
] | ||
>>> pred_sentence = ["the quick brown fox jumps over the box"] | ||
>>> score = bleu(ref_sentence, pred_sentence) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of assigning the score to a variable, let's just have
>>> bleu(ref_sentence, pred_sentence)
<tf.Tensor(0.7420885, shape=(), dtype=float32)>
Please do the same in other places as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although the above example works, the ideal input formula is:
rank(ref_sentence) = rank(pred_sentence) + 1
I think we should probably stick to this formula in our examples. Otherwise, it becomes confusing for the reader. Could you please make this change everywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although the above example works, the ideal input formula is:
rank(ref_sentence) = rank(pred_sentence) + 1
I think we should probably stick to this formula in our examples. Otherwise, it becomes confusing for the reader. Could you please make this change everywhere?
Correct me if I am wrong, for this, do I need to change blue(ref_sentence, pred_sentence) to bleu([ref_sentence], pred_sentence)?
keras_nlp/metrics/bleu.py
Outdated
c. RaggedTensor. | ||
>>> bleu = keras_nlp.metrics.Bleu(max_order=4) | ||
>>> ref_sentence = tf.ragged.constant([ | ||
[ | ||
"the quick brown fox jumps over the lazy dog", | ||
"the quick brown fox jumps over the lazy frog" | ||
] | ||
]) | ||
>>> pred_sentence = tf.ragged.constant([ | ||
["the quick brown fox jumps over the box"] | ||
]) | ||
>>> score = bleu(ref_sentence, pred_sentence) | ||
<tf.Tensor(0.7420885, shape=(), dtype=float32)> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we mention that we are using ragged tensors here, it would be great if we could actually have a tensor with variable dimension along an axis. Also, I think your example will throw an error; pred_sentence
should be a dense tensor here, not a ragged tensor.
So, something like:
>>> bleu = keras_nlp.metrics.Bleu(max_order=4)
>>> ref_sentence = tf.ragged.constant([
... ["the quick brown fox jumps over the lazy dog"],
... ["the quick brown fox jumps over the lazy dog", "the quick brown fox jumps over the lazy frog"]
... ])
>>> pred_sentence = tf.constant([
... ["the quick brown fox jumps over the box"],
... ["the quick brown fox jumps over the box"]
... ])
>>> bleu(ref_sentence, pred_sentence)
<tf.Tensor: shape=(), dtype=float32, numpy=0.7420885>
>>> score = bleu([ref_sentence], [pred_sentence]) | ||
<tf.Tensor(0.7420885, shape=(), dtype=float32)> | ||
1.2. rank 1 inputs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should change these to
rank 1 prediction inputs, rank 2 reference inputs
or something like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the Rouge L metric documentation, where this same convention is used. Is it okay to make these changes?
@mattdangerw - for context, this is the previous PR: #799 |
Minor improvements are done!
Hi @abheesht17 and @mattdangerw, I made the minor changes as you suggested. Thank you for your feedback! However, I am not sure about the part where you mentioned changing the rank of inputs. Could you please provide more details or clarify what you meant by that? |
I would like to inform you that I have made some updates to the content. Firstly, I have added an example for BLEU which was not present before. This new example serves as an illustration of the concept and should be helpful for those who are new to this topic.
Furthermore, I have also fixed the URLs by wrapping them in markdown. This should make it easier for readers to access the resources mentioned in the content.