Machine translation metrics #1891

Open
mahmoudaymo opened this issue Dec 5, 2024 · 2 comments

@mahmoudaymo

Machine translation metrics such as BLEU or PED are usually computed at the corpus level, meaning that to evaluate a program we need to feed the metric all of the examples in the devset at once instead of one example at a time. How can we use these metrics in DSPy?

@chenmoneygithub
Collaborator

I don't fully understand the issue. Computing BLEU in DSPy shouldn't be different from other cases: you can define any custom metric that takes in the train/dev/test data and the output from the DSPy program. See https://dspy.ai/cheatsheet/#dspy-metrics

@okhat closed this as completed Jan 3, 2025
@okhat reopened this Jan 3, 2025
@mahmoudaymo
Author

In the example at https://dspy.ai/cheatsheet/#dspy-metrics, the evaluation metric function takes a single model prediction and its ground truth.


def gsm8k_metric(gold, pred, trace=None) -> int:
    return int(parse_integer_answer(str(gold.answer))) == int(parse_integer_answer(str(pred.answer)))

To compute an NMT metric at the corpus level, we evaluate many sentences at the same time, as follows:

from sacrebleu.metrics import BLEU, CHRF, TER

sys = ['The dog bit the man.', "It wasn't surprising.", 'The man had just bitten him.']
refs = [['The dog bit the man.', 'It was not unexpected.', 'The man bit him first.']]

bleu = BLEU()
bleu.corpus_score(sys, refs).score

chrf = CHRF()
chrf.corpus_score(sys, refs).score

In the above example, the corpus is three sentences, and I am computing the metric not for each sentence but for the whole corpus.
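
As a rough illustration of why this matters, here is a minimal sketch using a toy metric (clipped unigram precision, a simplified stand-in and not real BLEU). The function names and the two example sentences are made up for illustration. Averaging per-sentence scores and pooling counts over the corpus give different numbers, because the pooled version weights longer sentences more heavily.

```python
from collections import Counter


def clipped_matches(hyp, ref):
    """Return (clipped unigram matches, hypothesis length) for one sentence pair."""
    hyp_tokens = hyp.split()
    ref_counts = Counter(ref.split())
    matches = sum(
        min(count, ref_counts[token])
        for token, count in Counter(hyp_tokens).items()
    )
    return matches, len(hyp_tokens)


def sentence_average(hyps, refs):
    """Score each sentence separately, then average the scores."""
    scores = []
    for hyp, ref in zip(hyps, refs):
        matches, length = clipped_matches(hyp, ref)
        scores.append(matches / length if length else 0.0)
    return sum(scores) / len(scores)


def corpus_level(hyps, refs):
    """Pool the counts over the whole corpus, then divide once."""
    total_matches = total_length = 0
    for hyp, ref in zip(hyps, refs):
        matches, length = clipped_matches(hyp, ref)
        total_matches += matches
        total_length += length
    return total_matches / total_length if total_length else 0.0


# Hypothetical data: one short perfect sentence, one longer imperfect one.
hyps = ["the cat", "a dog sat on the mat today"]
refs = ["the cat", "the dog sat on a rug"]

print(sentence_average(hyps, refs))  # (2/2 + 5/7) / 2 = 6/7
print(corpus_level(hyps, refs))      # (2 + 5) / (2 + 7) = 7/9
```

So a per-example metric that is averaged afterwards is not, in general, the same quantity as the corpus-level score.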

I hope this makes it clear.
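
One possible workaround, sketched under assumptions: run the program over the whole devset first, collect every output, and only then call the corpus-level metric once. Everything named below (`evaluate_corpus`, `fake_translate`, `exact_match_rate`, the `(source, reference)` tuple layout) is hypothetical and is not a DSPy API. With DSPy, `program` would be the compiled module and the output would be read from whatever field the signature defines.

```python
def evaluate_corpus(program, devset, corpus_metric):
    """Run `program` on every example, then score all outputs in one call.

    `devset` is assumed to be an iterable of (source, reference) pairs and
    `corpus_metric` a callable taking (hypotheses, references).
    """
    hypotheses, references = [], []
    for source, reference in devset:
        hypotheses.append(program(source))
        references.append(reference)
    return corpus_metric(hypotheses, references)


# Stand-in for a translation program; a real one would call the LM.
def fake_translate(source):
    return source.lower()


# Stand-in corpus metric so the sketch runs without extra dependencies.
# With sacrebleu installed, one could pass instead:
#   lambda hyps, refs: BLEU().corpus_score(hyps, [refs]).score
def exact_match_rate(hypotheses, references):
    matches = sum(h == r for h, r in zip(hypotheses, references))
    return matches / len(references)


devset = [("HELLO", "hello"), ("WORLD", "world!")]
score = evaluate_corpus(fake_translate, devset, exact_match_rate)
print(score)
```

This only covers evaluation. Whether an optimizer can consume a corpus-level score is a separate question that this sketch does not address.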
