Some experiments that a colleague ran during the MaCoCu project found that deduplicating on only the source side or only the target side improved translation quality. IIRC it was not clear which was better, source or target, but both were better than deduplicating on the whole sentence pair. In some cases I think the gain was about 1 BLEU point for mid-resource languages. This probably reduces the number of translation inconsistencies.
I couldn't find the table with the results, but I think this is worth exploring.
Maybe you are already doing this, but I was not sure. At least in the old pipeline, `dedupe` uses the whole sentence pair.
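
To make the difference concrete, here is a minimal sketch of the three variants in Python. This is not the pipeline's actual dedupe code: `dedupe_pairs`, its `key` parameter and the toy corpus are hypothetical names made up for illustration. Keeping the first occurrence and matching exact strings without any normalization are also my assumptions.

```python
from typing import Iterable, Iterator, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def dedupe_pairs(pairs: Iterable[Pair], key: str = "pair") -> Iterator[Pair]:
    """Yield the first pair seen for each dedup key.

    key="pair"   -> dedupe on the whole (source, target) pair
    key="source" -> dedupe on the source side only
    key="target" -> dedupe on the target side only
    (Hypothetical helper; exact string match, no normalization.)
    """
    if key not in {"pair", "source", "target"}:
        raise ValueError(f"unknown key: {key!r}")
    seen = set()
    for src, tgt in pairs:
        if key == "source":
            k = src
        elif key == "target":
            k = tgt
        else:
            k = (src, tgt)
        if k in seen:
            continue  # a pair with the same key was already emitted
        seen.add(k)
        yield src, tgt


# Toy corpus, invented for illustration only.
corpus = [
    ("Hello", "Hola"),
    ("Hello", "Buenas"),  # same source, different target
    ("Hi", "Hola"),       # same target, different source
    ("Hello", "Hola"),    # exact duplicate pair
]

by_pair = list(dedupe_pairs(corpus, key="pair"))
by_source = list(dedupe_pairs(corpus, key="source"))
by_target = list(dedupe_pairs(corpus, key="target"))
```

On this toy input, whole-pair dedup only drops the exact duplicate, while source-side dedup keeps a single translation per source sentence and target-side dedup keeps a single source per target sentence. That is the sense in which one-sided dedup could reduce inconsistent translations, at the cost of discarding more data.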