Convert CLDF Wordlist with cognates to old LingPy's QLC format #5

Open
xrotwang opened this issue May 16, 2018 · 5 comments

@xrotwang
Contributor

The LingPy tutorial uses LingPy's old QLC format (see polynesian.tsv). We should have a recipe to convert a CLDF Wordlist into this format. Should be a csvkit one-liner.
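As a rough illustration of what such a recipe has to do, here is a minimal standard-library sketch (Python rather than csvkit). It assumes the CLDF default column names (`ID`, `Language_ID`, `Parameter_ID`, `Form`, `Segments` in the forms table, `Form_ID` and `Cognateset_ID` in the cognates table, `ID` and `Name` in the language and parameter tables); a dataset with customized headers would need a different mapping. The function names and the `ID DOCULECT CONCEPT IPA TOKENS COGID` column selection are choices made for this sketch, not part of any library.

```python
import csv

QLC_HEADER = ["ID", "DOCULECT", "CONCEPT", "IPA", "TOKENS", "COGID"]


def read_table(path):
    """Read one CLDF CSV table into a list of row dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def cldf_to_qlc(forms, languages, parameters, cognates):
    """Join the four tables into QLC-style rows (header row first)."""
    doculect = {row["ID"]: row["Name"] for row in languages}
    concept = {row["ID"]: row["Name"] for row in parameters}
    cogset_of = {row["Form_ID"]: row["Cognateset_ID"] for row in cognates}
    cogids = {}  # CLDF cognate set ID (string) -> integer COGID
    rows = [QLC_HEADER]
    for idx, form in enumerate(forms, start=1):
        cogset = cogset_of.get(form["ID"])
        # Assumption of this sketch: forms without a cognate judgement get COGID 0.
        cogid = cogids.setdefault(cogset, len(cogids) + 1) if cogset else 0
        rows.append([
            idx,
            doculect[form["Language_ID"]],
            concept[form["Parameter_ID"]],
            form["Form"],
            form.get("Segments", ""),
            cogid,
        ])
    return rows


def write_qlc(rows, path):
    """Write the rows as tab-separated text."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(rows)
```

With the tables loaded via `read_table` from the dataset's forms, languages, parameters and cognates files, `write_qlc(cldf_to_qlc(...), 'wordlist.tsv')` would produce the tab-separated file; the output file name here is only a placeholder.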

@LinguList
Contributor

With LingPy, it is:

>>> from lingpy.convert.cldf import from_cldf
>>> from_cldf('path').output('tsv', filename='filename', prettify=False)

@xrotwang
Contributor Author

Yes, this would just be a "proof-of-concept" recipe, or for providing backward compatibility with earlier LingPy versions.

@LinguList
Contributor

BTW: it's also what @thiagochacon wanted, namely that we help convert data to "edictor" format.

@Anaphory

Anaphory commented Nov 15, 2018

If you want support for non-standard CLDF column headers, it is

>>> from lingpy import Wordlist
>>> Wordlist.from_cldf('path').output('tsv', filename='filename', prettify=False)

although that keeps the non-standard column headers and does not yet change them into the standard DOCULECT CONCEPT IPA headers that Edictor expects.
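One way to close that gap after the fact is to rewrite only the header line of the exported file. The sketch below uses nothing but string operations; the mapping is an assumption for illustration (the `language_name`, `concept_name` and `segments` keys mirror common CLDF-derived column names, while `form` to `IPA` is a guess), and `rename_headers` is a hypothetical helper, not a LingPy function.

```python
# Hypothetical mapping from (lowercased) exported headers to the names Edictor expects.
HEADER_MAP = {
    "language_name": "DOCULECT",
    "concept_name": "CONCEPT",
    "form": "IPA",
    "segments": "TOKENS",
}


def rename_headers(tsv_text, mapping=HEADER_MAP):
    """Return tsv_text with its first line's column names replaced via mapping.

    Unmapped names are kept, uppercased; the data rows are left untouched.
    """
    header, newline, body = tsv_text.partition("\n")
    new_header = "\t".join(
        mapping.get(name.lower(), name.upper()) for name in header.split("\t")
    )
    return new_header + newline + body
```

This assumes the file starts directly with the header row, which should be checked against the actual export before relying on it.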

@LinguList
Contributor

You can easily work around that:

from lingpy import Wordlist
wl = Wordlist.from_cldf('path.json')
# copy the CLDF-derived columns into the standard LingPy column names
wl.add_entries('doculect', 'language_name', lambda x: x)
wl.add_entries('concept', 'concept_name', lambda x: x)
wl.add_entries('tokens', 'segments', lambda x: x)
# write only the renamed columns
wl.output('tsv', filename='bla', prettify=False, subset=True, cols=['doculect', 'concept', 'tokens'])

This is okay enough for the time being, I'd say.
