
Question regarding semantic correctness of sentences #169

Open
rphlstck opened this issue Sep 27, 2023 · 0 comments

Excuse me if this is not the right venue to ask this question, but maybe your expertise could help me out!

Consider the following candidates:
'My left ear hurts but my right eye is good.'
and
'My left eye is good but my right ear hurts.'

Now, I calculate the BERTScore against my ground truth:
'My right ear hurts but my left eye is good.'

I would expect the second sentence to yield a higher BERTScore, as it more faithfully captures the information in the ground truth sentence. However, this would require the score to operate at a higher level of abstraction than individual tokens (right?).
Is there a way to adapt BERTScore to achieve this? Or do you know of a metric that can capture the meaning of a sentence?
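The token-level limitation can be illustrated with a toy sketch (the vectors below are made up, context-free token embeddings, not real BERT vectors): both candidates contain exactly the same multiset of tokens as each other, so BERTScore-style greedy matching cannot separate them at all once contextual information is removed. Real BERTScore uses contextual embeddings, which is why the actual scores differ slightly, but the context apparently does not encode the role binding ("which side hurts") strongly enough.

```python
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Toy static embeddings (hypothetical; every occurrence of a token gets the
# same vector, regardless of position or context).
EMB = {
    "my": [1, 0, 0], "left": [0, 1, 0], "right": [0, 1, 1],
    "ear": [1, 1, 0], "eye": [1, 0, 1], "hurts": [0, 0, 1],
    "but": [1, 1, 1], "is": [0.5, 0.5, 0], "good": [0.2, 0.9, 0.1],
}

def greedy_f1(cand, ref):
    """BERTScore-style greedy matching: each token is matched to the most
    similar token on the other side; F1 combines precision and recall."""
    c = [EMB[t] for t in cand.split()]
    r = [EMB[t] for t in ref.split()]
    p = sum(max(cosine(ct, rt) for rt in r) for ct in c) / len(c)
    rec = sum(max(cosine(rt, ct) for ct in c) for rt in r) / len(r)
    return 2 * p * rec / (p + rec)

ref = "my right ear hurts but my left eye is good"
c1 = "my left ear hurts but my right eye is good"   # meaning altered
c2 = "my left eye is good but my right ear hurts"   # meaning preserved

print(f"c1 vs ref: {greedy_f1(c1, ref):.6f}")
print(f"c2 vs ref: {greedy_f1(c2, ref):.6f}")
# The two scores are identical (up to float rounding): both candidates are
# the same multiset of tokens, so token-level matching cannot distinguish
# the swapped role binding without contextual information.
```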

These are the results I get:

Reference: My right ear hurts but my left eye is good.
Candidate:
My left ear hurts but my right eye is good.
microsoft/deberta-xlarge-mnli_L40_no-idf_version=0.3.12(hug_trans=4.33.2): P=0.966590 R=0.951585 F=0.959029
Candidate:
My left eye is good but my right ear hurts.
microsoft/deberta-xlarge-mnli_L40_no-idf_version=0.3.12(hug_trans=4.33.2): P=0.895558 R=0.891666 F=0.893608

This is my code:

from typing import List
from bert_score import score

def calc_score(refs: List[str], cands: List[str]):
    # With return_hash=True, score() returns a hash identifying the
    # model/version configuration alongside the (P, R, F) tensors.
    (P, R, F), hashname = score(cands, refs, model_type='microsoft/deberta-xlarge-mnli', return_hash=True)
    print(f'Candidate:\n{cands[0]}')
    print(
        f"{hashname}: P={P.mean().item():.6f} R={R.mean().item():.6f} F={F.mean().item():.6f}"
    )

def sentence_ordering():
    '''
    Calculate the BERTScore for two candidate sentences.
    In the first candidate, the meaning is altered by swapping "left"/"right".
    In the second candidate, the meaning is preserved but the clause order is swapped.
    '''
    cands = ['My left ear hurts but my right eye is good.', 
             'My left eye is good but my right ear hurts.']
    refs = ['My right ear hurts but my left eye is good.']

    print(f'Reference: {refs[0]}')
    for cand in cands:
        calc_score(refs, [cand])

if __name__ == "__main__":
    sentence_ordering()
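For comparison, a metric operating above the token level would need to compare extracted propositions rather than tokens. Here is a minimal rule-based sketch of that idea (a toy parser that assumes clauses are joined by "but"; an illustration only, not a general solution):

```python
import re

def extract_facts(sentence):
    """Extract (side, organ, state) triples with a tiny rule-based parser.
    Toy assumption: clauses are joined by 'but', and the state is either
    'hurts' or 'good'."""
    facts = set()
    for clause in re.split(r"\bbut\b", sentence.lower()):
        m = re.search(r"(left|right)\s+(ear|eye)", clause)
        if not m:
            continue
        state = "hurts" if "hurts" in clause else "good"
        facts.add((m.group(1), m.group(2), state))
    return facts

ref = "My right ear hurts but my left eye is good."
c1 = "My left ear hurts but my right eye is good."
c2 = "My left eye is good but my right ear hurts."

# Comparing extracted facts instead of tokens: candidate 2 matches the
# reference exactly, candidate 1 shares no fact with it.
print(extract_facts(c2) == extract_facts(ref))   # True
print(extract_facts(c1) & extract_facts(ref))    # set()
```

A more general version of the same idea would be to use an NLI model (e.g. the same microsoft/deberta-xlarge-mnli checkpoint) to score entailment between candidate and reference, rather than using it for token matching.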

Note that ChatGPT 3.5 gives the following answer:
Prompt:

Which of the following sentences (`Sentence 1` or `Sentence 2`) more accurately contains the information provided in the `Ground truth`?

Sentence 1: 'My left ear hurts but my right eye is good.'
Sentence 2: 'My left eye is good but my right ear hurts.'
Ground truth: 'My right ear hurts but my left eye is good.'

Answer:

Sentence 2 more accurately contains the information provided in the Ground truth. It correctly represents the information about the right ear hurting and the left eye being in good condition.
rphlstck changed the title from "Question regarding" to "Question regarding semantic correctness of sentences" on Sep 27, 2023