How to handle unglossed words? #158
Comments
Hm, the most transparent practice I've seen in this regard is using ellipsis |
That's a very reasonable solution, works for me. Should |
Yes, I would say so. After all, one of the main reasons for using ellipsis for unglossed words is that we get lists of |
Maybe we could keep some sort of backwards compatibility (with somewhat undefined behaviour) by converting |
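A minimal sketch of the ellipsis convention discussed above. It assumes the placeholder is the single character `…` (U+2026; a dataset might use three ASCII dots instead), and `fill_unglossed` is a hypothetical helper, not part of pycldf. pycldf reads an empty item in a tab-separated list column as `None`, so the sketch replaces such items with the placeholder. The lists stay aligned with the words of the primary text, and downstream code never has to handle `None`.

```python
from typing import List, Optional

# Assumption: the ellipsis placeholder is the single character "…" (U+2026);
# some datasets may prefer three ASCII dots "..." instead.
ELLIPSIS = "\u2026"


def fill_unglossed(items: List[Optional[str]], placeholder: str = ELLIPSIS) -> List[str]:
    """Hypothetical helper: replace missing (None or empty) items with a placeholder."""
    return [placeholder if not item else item for item in items]


# An Analyzed_Word list as pycldf yields it for the cell "x\ty\t\tz".
analyzed = ["x", "y", None, "z"]
print(fill_unglossed(analyzed))  # ['x', 'y', '…', 'z']
```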
Quite often, people will not gloss words like person or place names or unparsable words, so some words may only be present in `Primary_Text`, but not in `Analyzed_Word` or `Gloss`. The most transparent way to store an example like that in CLDF is to have an empty list item in these two columns:
- `Primary_Text`: `"x y Person z"`
- `Analyzed_Word`: `"x\ty\t\tz"` (`["x","y",None,"z"]` once read by pycldf)
- `Gloss`: `"xg\tyg\t\tzg"` (`["xg","yg",None,"zg"]`)
This passes validation, but, for example, `cldf createdb` does not work (`TypeError: sequence item 1: expected str instance, NoneType found`), and I've been doing things like `ex["Analyzed_Word"] = ["" if x is None else x for x in ex["Analyzed_Word"]]` in `initializedb.py` scripts.

Should empty items in a gloss column raise an error upon validation? If so, is the way to handle unglossed words simply to leave them out (i.e. `"x\ty\tz"` → `["x","y","z"]`)? Or, if empty items are allowed, would it be OK for pycldf to yield `""` instead of `None` (i.e. `"x\ty\t\tz"` → `["x","y","","z"]`)?
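The following sketch illustrates the failure and the trade-off between the two options. The quoted `TypeError` has the same form as the one `str.join` raises for a `None` item, which suggests (an assumption, not verified against the `cldf createdb` code) that a list is joined somewhere along the way. `none_to_empty` reproduces the workaround quoted above. `check_example` is hypothetical and applies made-up rules rather than pycldf's validator. It flags empty items, and it also flags a mismatch between the whitespace-separated words of `Primary_Text` and the items of `Analyzed_Word`, which is what leaving unglossed words out would produce.

```python
from typing import List, Optional

Items = List[Optional[str]]

# Analyzed_Word for "x y Person z", as pycldf yields it for "x\ty\t\tz".
analyzed: Items = ["x", "y", None, "z"]

# str.join accepts only strings, so a None item raises this kind of
# TypeError (the item index depends on the data).
try:
    "\t".join(analyzed)
except TypeError as err:
    print(err)  # sequence item 2: expected str instance, NoneType found


def none_to_empty(items: Items) -> List[str]:
    """The workaround from the issue: map None items back to ""."""
    return ["" if x is None else x for x in items]


def check_example(primary_text: str, analyzed_word: Items, gloss: Items) -> List[str]:
    """Problems a stricter validator *might* report (hypothetical rules)."""
    problems = []
    for name, items in (("Analyzed_Word", analyzed_word), ("Gloss", gloss)):
        # Treat both None and "" as an empty item.
        problems += [f"{name} item {i} is empty" for i, x in enumerate(items) if not x]
    # Assumption: Primary_Text words are separated by whitespace.
    n_words = len(primary_text.split())
    if len(analyzed_word) != n_words:
        problems.append(
            f"Primary_Text has {n_words} words but Analyzed_Word has {len(analyzed_word)} items"
        )
    return problems


print(repr("\t".join(none_to_empty(analyzed))))  # 'x\ty\t\tz'
# Empty items allowed: both columns are flagged at index 2.
print(check_example("x y Person z", analyzed, ["xg", "yg", None, "zg"]))
# Unglossed words left out: the word count no longer matches.
print(check_example("x y Person z", ["x", "y", "z"], ["xg", "yg", "zg"]))
```

Neither option is free of cost. Keeping empty items preserves word alignment but requires every consumer to handle a missing value. Dropping them yields clean string lists but loses the link between glosses and the words of `Primary_Text`.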