Machine translation metrics such as BLEU or PED are usually computed at the corpus level, meaning that to evaluate the program we need to feed it all the Examples or the whole Devset instead of one example at a time. How can we use these metrics in DSPy?
I don't fully understand the issue; computing BLEU in DSPy shouldn't be different from other cases. You can define any custom metric that takes in the train/dev/test data and the output from the DSPy program: https://dspy.ai/cheatsheet/#dspy-metrics
To compute an NMT metric at the corpus level, we evaluate many sentences at the same time, as follows:
from sacrebleu.metrics import BLEU, CHRF, TER

# System outputs: one hypothesis per sentence
sys = ['The dog bit the man.', "It wasn't surprising.", 'The man had just bitten him.']
# One reference stream, aligned with the hypotheses
refs = [['The dog bit the man.', 'It was not unexpected.', 'The man bit him first.']]

bleu = BLEU()
bleu.corpus_score(sys, refs).score

chrf = CHRF()
chrf.corpus_score(sys, refs).score
In the above example, the corpus is three sentences, and I am not computing the metric for each sentence but for the whole corpus.
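One possible workaround, sketched here under assumptions rather than taken from the DSPy docs, is to run the program over the whole devset yourself, collect the outputs, and score them once at the end. The helper name `corpus_level_score` and the field names `source`, `reference`, and `translation` are hypothetical; substitute whatever your signature and examples actually use.

```python
def corpus_level_score(program, devset, corpus_scorer,
                       source_field="source",        # hypothetical input field name
                       reference_field="reference",  # hypothetical gold field name
                       output_field="translation"):  # hypothetical output field name
    """Run `program` on every example, then score all outputs together."""
    hypotheses, references = [], []
    for example in devset:
        # Call the program with the example's source text as a keyword argument.
        prediction = program(**{source_field: getattr(example, source_field)})
        hypotheses.append(getattr(prediction, output_field))
        references.append(getattr(example, reference_field))
    # sacrebleu-style layout: a list of reference streams, each aligned
    # with the hypotheses (a single stream here).
    return corpus_scorer(hypotheses, [references])
```

With sacrebleu installed, the scorer could be something like `lambda hyps, refs: BLEU().corpus_score(hyps, refs).score`. Note that this computes a single number for the devset, so it suits final evaluation; an optimizer that expects a per-example metric would still need a sentence-level score.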