This code accompanies the paper "Deep Learning for Cognate Identification". It presents a neural model for cognate identification that incorporates information from a character-level CNN language model trained on cognate pairs. The model identifies cognates in a dataset of Romance and Germanic word pairs that have undergone a variety of phonological, orthographic, and morphological changes. The model also successfully transfers this knowledge to the task of identifying cognates between Spanish and Basque, a previously unseen low-resource language.
A new dataset of Basque-Spanish cognates is also available under data/basque_spanish_cognates.txt
This project is the work of Eve Fleisig under the advice of Professor Christiane Fellbaum and was created for an independent work paper in Spring 2020.