Machine translation metrics #1891

mahmoudaymo · 2024-12-05T18:43:13Z

Machine translation metrics such as BLEU or PED are usually computed at the corpus level, meaning that to evaluate the program we need to feed it all the Examples or the whole Devset instead of one example at a time. How can we use these metrics in DSPy?

chenmoneygithub · 2024-12-11T20:53:49Z

I don't fully understand the issue; computing BLEU in DSPy shouldn't be different from other cases. You can define any custom metric that takes in the train/dev/test data and the output from the DSPy program: https://dspy.ai/cheatsheet/#dspy-metrics

mahmoudaymo · 2025-01-07T12:29:44Z

In the example here (https://dspy.ai/cheatsheet/#dspy-metrics), the evaluation metric function takes a single model prediction and a single ground truth:


def gsm8k_metric(gold, pred, trace=None) -> int:
    return int(parse_integer_answer(str(gold.answer))) == int(parse_integer_answer(str(pred.answer)))

To compute an NMT metric at the corpus level, we evaluate many sentences at the same time, as follows:

from sacrebleu.metrics import BLEU, CHRF, TER

sys = ['The dog bit the man.', "It wasn't surprising.", 'The man had just bitten him.']
refs = [['The dog bit the man.', 'It was not unexpected.', 'The man bit him first.']]

bleu = BLEU()
bleu.corpus_score(sys, refs).score

chrf = CHRF()
chrf.corpus_score(sys, refs).score

In the above example, the corpus is three sentences, and I am computing the metric for the whole corpus rather than for each sentence.
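
To illustrate why the distinction matters, here is a toy sketch in plain Python. It uses clipped unigram precision as a simplified stand-in for BLEU (an assumption for illustration only; real BLEU also uses higher-order n-grams and a brevity penalty), and all helper names are hypothetical. Pooling counts over the corpus generally gives a different number than averaging per-sentence scores:

```python
from collections import Counter

def unigram_counts(hyp, ref):
    """Return (clipped unigram matches, hypothesis length) for one sentence pair."""
    hyp_tokens, ref_tokens = hyp.split(), ref.split()
    overlap = Counter(hyp_tokens) & Counter(ref_tokens)  # per-token minimum counts
    return sum(overlap.values()), len(hyp_tokens)

def averaged_precision(hyps, refs):
    """Score each sentence separately, then average the scores."""
    scores = []
    for hyp, ref in zip(hyps, refs):
        matches, length = unigram_counts(hyp, ref)
        scores.append(matches / length if length else 0.0)
    return sum(scores) / len(scores)

def pooled_precision(hyps, refs):
    """Sum matches and lengths over the whole corpus, then divide once."""
    total_matches = total_length = 0
    for hyp, ref in zip(hyps, refs):
        matches, length = unigram_counts(hyp, ref)
        total_matches += matches
        total_length += length
    return total_matches / total_length if total_length else 0.0

# Toy data (not from a real system): one perfect sentence, one with no overlap.
hyps = ["the dog bit the man", "it was fine"]
refs = ["the dog bit the man", "that seemed unsurprising to everyone"]

print(averaged_precision(hyps, refs))  # (5/5 + 0/3) / 2 = 0.5
print(pooled_precision(hyps, refs))    # (5 + 0) / (5 + 3) = 0.625
```

The longer sentence carries more weight in the pooled score, which is why a per-example metric followed by averaging does not reproduce a corpus-level score.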

I hope this made it clear.
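
One possible workaround, sketched here under assumptions: run the program over the whole devset first, collect the outputs, and score them in a single call. The `source` and `translation` field names, `translate_stub`, and `exact_match_rate` are hypothetical stand-ins so that the block runs without DSPy or sacrebleu installed; in practice the program would be a DSPy module and the scorer would wrap a real corpus-level metric.

```python
from types import SimpleNamespace

def corpus_level_score(program, devset, scorer):
    """Run `program` on every example, then score all outputs in one call."""
    hyps, refs = [], []
    for example in devset:
        pred = program(source=example.source)  # field names are assumptions
        hyps.append(pred.translation)
        refs.append(example.translation)
    return scorer(hyps, refs)

# Hypothetical stand-in for a translation program: a fixed lookup table.
def translate_stub(source):
    lookup = {
        "Der Hund biss den Mann.": "The dog bit the man.",
        "Das war zu erwarten.": "It wasn't surprising.",
    }
    return SimpleNamespace(translation=lookup[source])

# Hypothetical stand-in scorer with the same (hyps, refs) -> float shape
# that a corpus-level metric wrapper would have.
def exact_match_rate(hyps, refs):
    return sum(h == r for h, r in zip(hyps, refs)) / len(refs)

# Toy devset; SimpleNamespace stands in for the example objects.
devset = [
    SimpleNamespace(source="Der Hund biss den Mann.",
                    translation="The dog bit the man."),
    SimpleNamespace(source="Das war zu erwarten.",
                    translation="It was not unexpected."),
]

score = corpus_level_score(translate_stub, devset, exact_match_rate)
print(score)  # one of two outputs matches its reference: 0.5
```

With sacrebleu installed, the scorer could be something like `lambda hyps, refs: BLEU().corpus_score(hyps, [refs]).score`, where `[refs]` is a single reference stream as in the snippet above.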

okhat closed this as completed Jan 3, 2025

okhat reopened this Jan 3, 2025
