Machine Translation Testing

This project provides a toolkit for automated machine translation testing, a first step towards robust and practical machine translation. With the toolkit, users can automatically find translation errors produced by any machine translation model.

🔭 If you use any of our tools or datasets in research that you publish, please cite the corresponding papers listed below.

Input:

A list of sentences in the source language. For example:

Sentences:
I live on campus with smart people.
I like basketball.

Output:

A list of suspicious issues. Each issue contains a pair of sentences in the source language and their translations. For example (en->zh):

Suspicious issue 1:
I live on campus with smart people. -> 我和聪明的人住在校园里。(Meaning: I live on campus with smart people.)
I live on campus with tall people. -> 我住在校园里,身材高大。(Meaning: I live on campus, I am tall.)
Suspicious issue 2:
I like hiking. -> 我喜欢徒步。(Meaning: I like hiking.)
I hate hiking. -> 我喜欢徒步。(Meaning: I like hiking.)

Note that the meanings in parentheses are not part of the output; they are shown here only for clarity.
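
The shape of a suspicious issue, and the kind of check that produces the second example above, can be sketched in Python. This is a minimal illustration and not the implementation of any tool in this repository: `SuspiciousIssue`, `find_identical_translation_issues`, and `fake_translate` are hypothetical names, the translator is a canned lookup table rather than a real API, and the translation of "I like basketball." is made up for the demonstration. The check only flags two different source sentences that receive the same translation; the tools listed below use more elaborate criteria.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class SuspiciousIssue:
    """One reported issue: two source sentences and their translations."""
    source_a: str
    translation_a: str
    source_b: str
    translation_b: str


def find_identical_translation_issues(
    sentence_pairs: Iterable[Tuple[str, str]],
    translate: Callable[[str], str],
) -> List[SuspiciousIssue]:
    """Flag pairs of different source sentences that get the same translation."""
    issues = []
    for source_a, source_b in sentence_pairs:
        translation_a = translate(source_a)
        translation_b = translate(source_b)
        if source_a != source_b and translation_a == translation_b:
            issues.append(
                SuspiciousIssue(source_a, translation_a, source_b, translation_b)
            )
    return issues


# Hypothetical stand-in for a translation system: a fixed lookup table.
CANNED_TRANSLATIONS = {
    "I like hiking.": "我喜欢徒步。",
    "I hate hiking.": "我喜欢徒步。",      # wrong on purpose, as in the example
    "I like basketball.": "我喜欢篮球。",  # illustrative translation
}


def fake_translate(sentence: str) -> str:
    return CANNED_TRANSLATIONS[sentence]


pairs = [
    ("I like hiking.", "I hate hiking."),
    ("I like hiking.", "I like basketball."),
]
issues = find_identical_translation_issues(pairs, fake_translate)

# Print the issues in the same layout as the example output above.
for number, issue in enumerate(issues, start=1):
    print(f"Suspicious issue {number}:")
    print(f"{issue.source_a} -> {issue.translation_a}")
    print(f"{issue.source_b} -> {issue.translation_b}")
```

Passing the translator in as a function keeps the check independent of any particular translation service, so the same loop could be pointed at a real API wrapper.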

Testing approaches currently available:

| Tool | Reference |
| --- | --- |
| SIT | [ICSE'20] Structure-Invariant Testing for Machine Translation, by Pinjia He, Clara Meister, Zhendong Su. |
| TransRepair | [ICSE'20] Automatic Testing and Improvement of Machine Translation, by Zeyu Sun, Jie M. Zhang, Mark Harman, Mike Papadakis, Lu Zhang. |
| PatInv | [ESEC/FSE'20] Machine Translation Testing via Pathological Invariance, by Shashij Gupta, Pinjia He, Clara Meister, Zhendong Su. |
| SemMT | [arXiv'20] SemMT: A Semantic-based Testing Approach for Machine Translation Systems, by Jialun Cao, Meiziniu Li, Yeting Li, Ming Wen, Shing-Chi Cheung. |
| RTI | [ICSE'21] Testing Machine Translation via Referential Transparency, by Pinjia He, Clara Meister, Zhendong Su. |

Get started

Code organization:

  • code: the machine translation testing package
  • dataset: seed sentences for translation testing.

APIs provided by Google and Microsoft:

  • For Google Translate, useful tips can be found here: [G1][G2].
  • For Bing Microsoft Translator, obtain an API key and copy it into the apikey string (e.g., in SIT). Useful tips: [B1][B2].
  • To test Bing Microsoft Translator with your own approach, you can use the bingtranslate method.
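
For orientation, here is a minimal sketch of what a direct call to Microsoft Translator can look like. It is not the repository's bingtranslate method, whose signature is not shown here. It assumes the Translator Text API v3 REST endpoint and its `Ocp-Apim-Subscription-Key` header; the function names, the `to_lang` default, and the `YOUR_API_KEY` placeholder are illustrative, so check the official documentation for your subscription before relying on it. Only the request is built at import time; nothing is sent until `send_translate_request` is called.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed endpoint of the Translator Text API v3 (verify against the official docs).
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"


def build_translate_request(
    text: str,
    apikey: str,
    to_lang: str = "zh-Hans",
    region: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a translation request for one sentence."""
    query = urllib.parse.urlencode({"api-version": "3.0", "to": to_lang})
    headers = {
        "Ocp-Apim-Subscription-Key": apikey,
        "Content-Type": "application/json",
    }
    if region:
        # Some subscriptions also require the resource region.
        headers["Ocp-Apim-Subscription-Region"] = region
    body = json.dumps([{"Text": text}]).encode("utf-8")
    return urllib.request.Request(
        f"{ENDPOINT}?{query}", data=body, headers=headers, method="POST"
    )


def send_translate_request(request: urllib.request.Request) -> str:
    """Send the request and return the first translation (needs a valid key)."""
    with urllib.request.urlopen(request) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload[0]["translations"][0]["text"]


# Placeholder key: replace it with your own before sending the request.
request = build_translate_request("I like hiking.", apikey="YOUR_API_KEY")
print(request.full_url)
```

Separating request construction from sending makes it possible to inspect the URL, headers, and body without network access or a valid key.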

Feedback

For any questions or feedback, please open an issue on the issue page.