A toolkit for testing machine translation [ICSE'20, '21, ESEC/FSE'20]

RobustNLP/TestTranslation

Machine Translation Testing

This project provides a toolkit for automated machine translation testing, a first step towards robust and practical machine translation. With the toolkit, users can automatically find translation errors produced by any machine translation model.

🔭 If you use any of our tools or datasets in research for publication, please cite the corresponding papers listed below.

Input:

A list of sentences in the source language. For example:

Sentences:
I live on campus with smart people.
I like basketball.

Output:

A list of suspicious issues. Each issue contains a pair of sentences in the source language and their translations. For example (en->zh):

Suspicious issue 1:
I live on campus with smart people. -> 我和聪明的人住在校园里。(Meaning: I live on campus with smart people.)
I live on campus with tall people. -> 我住在校园里,身材高大。(Meaning: I live on campus, I am tall.)
Suspicious issue 2:
I like hiking. -> 我喜欢徒步。(Meaning: I like hiking.)
I hate hiking. -> 我喜欢徒步。(Meaning: I like hiking.)

Note that the meanings of the translations (in parentheses) are not part of the output; they are shown here for clarity.
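The input/output format above can be illustrated with a minimal sketch. It is not the implementation of any tool in this toolkit: the `SuspiciousIssue` class, the `find_suspicious_issues` function, the `translate` parameter, and the `min_similarity` threshold (an arbitrary value) are all hypothetical names chosen for illustration, and the character-level similarity check is a crude stand-in for the structural and semantic comparisons the actual approaches use. Canned translations copied from the example stand in for a real translator.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List, Tuple


@dataclass
class SuspiciousIssue:
    """A pair of source sentences together with their translations."""
    source_a: str
    translation_a: str
    source_b: str
    translation_b: str
    reason: str


def find_suspicious_issues(
    sentence_pairs: List[Tuple[str, str]],
    translate: Callable[[str], str],
    min_similarity: float = 0.7,  # arbitrary threshold, for illustration only
) -> List[SuspiciousIssue]:
    """Flag pairs whose translations are identical or differ too much."""
    issues = []
    for source_a, source_b in sentence_pairs:
        translation_a = translate(source_a)
        translation_b = translate(source_b)
        similarity = SequenceMatcher(None, translation_a, translation_b).ratio()
        if source_a != source_b and translation_a == translation_b:
            reason = "different sources, identical translations"
        elif similarity < min_similarity:
            reason = "similar sources, very different translations"
        else:
            continue  # nothing suspicious about this pair
        issues.append(
            SuspiciousIssue(source_a, translation_a, source_b, translation_b, reason)
        )
    return issues


# Canned translations copied from the example above (hypothetical stand-in
# for a call to a real machine translation system).
CANNED = {
    "I live on campus with smart people.": "我和聪明的人住在校园里。",
    "I live on campus with tall people.": "我住在校园里,身材高大。",
    "I like hiking.": "我喜欢徒步。",
    "I hate hiking.": "我喜欢徒步。",
}

pairs = [
    ("I live on campus with smart people.", "I live on campus with tall people."),
    ("I like hiking.", "I hate hiking."),
]
issues = find_suspicious_issues(pairs, CANNED.__getitem__)
for number, issue in enumerate(issues, start=1):
    print(f"Suspicious issue {number}: {issue.reason}")
    print(f"{issue.source_a} -> {issue.translation_a}")
    print(f"{issue.source_b} -> {issue.translation_b}")
```

Passing the translator as a plain callable keeps the check independent of any particular translation service.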

Testing approaches currently available:

| Tool | Reference |
| --- | --- |
| SIT | [ICSE'20] Structure-Invariant Testing for Machine Translation, by Pinjia He, Clara Meister, Zhendong Su. |
| TransRepair | [ICSE'20] Automatic Testing and Improvement of Machine Translation, by Zeyu Sun, Jie M. Zhang, Mark Harman, Mike Papadakis, Lu Zhang. |
| PatInv | [ESEC/FSE'20] Machine Translation Testing via Pathological Invariance, by Shashij Gupta, Pinjia He, Clara Meister, Zhendong Su. |
| SemMT | [arXiv'20] SemMT: A Semantic-based Testing Approach for Machine Translation Systems, by Jialun Cao, Meiziniu Li, Yeting Li, Ming Wen, Shing-Chi Cheung. |
| RTI | [ICSE'21] Testing Machine Translation via Referential Transparency, by Pinjia He, Clara Meister, Zhendong Su. |

Get started

Code organization:

  • code: the machine translation testing package.
  • dataset: seed sentences for translation testing.

APIs provided by Google and Microsoft:

  • For Google Translate, useful tips can be found here: [G1] [G2].
  • For Bing Microsoft Translator, after obtaining your API key, copy it into the apikey string in the code (e.g., for SIT). Useful tips: [B1] [B2].
  • If you intend to test Bing Microsoft Translator with your own approach, you can use the bingtranslate method.
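As a rough sketch of what "your own approach" could look like, the following generates mutated sentences by word substitution and collects translations for each seed/mutant pair. Everything here is hypothetical: the signature of bingtranslate is not shown in this README, so the `translate` parameter is assumed to be any function that takes one source string and returns one translated string, and `fake_translate` is a placeholder that makes no API calls. The helper names and the substitution table are also invented for illustration.

```python
from typing import Callable, Dict, List, Tuple


def substitute_word(sentence: str, substitutions: Dict[str, List[str]]) -> List[str]:
    """Return mutants of `sentence` with one word replaced at a time."""
    mutants = []
    words = sentence.split()
    for i, word in enumerate(words):
        core = word.rstrip(".,!?")      # word without trailing punctuation
        trailing = word[len(core):]     # punctuation to re-attach
        for replacement in substitutions.get(core, []):
            mutated = words[:i] + [replacement + trailing] + words[i + 1:]
            mutants.append(" ".join(mutated))
    return mutants


def translate_pairs(
    seeds: List[str],
    substitutions: Dict[str, List[str]],
    translate: Callable[[str], str],
) -> List[Tuple[str, str, str, str]]:
    """Translate each seed and its mutants; return (seed, t_seed, mutant, t_mutant)."""
    results = []
    for seed in seeds:
        seed_translation = translate(seed)
        for mutant in substitute_word(seed, substitutions):
            results.append((seed, seed_translation, mutant, translate(mutant)))
    return results


def fake_translate(text: str) -> str:
    """Placeholder translator (assumption): swap in a real translation call here."""
    return f"<zh>{text}</zh>"


results = translate_pairs(["I like hiking."], {"like": ["hate"]}, fake_translate)
for seed, seed_translation, mutant, mutant_translation in results:
    print(f"{seed} -> {seed_translation}")
    print(f"{mutant} -> {mutant_translation}")
```

To test a real system, replace `fake_translate` with a thin wrapper around the translation call you use, adapting the arguments to its actual signature.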

Feedback

For any questions or feedback, please post on the issue page.
