Aligning Proto-Forms against their reflexes #12

Open
LinguList opened this issue Nov 11, 2014 · 2 comments

Comments

@LinguList

Now that I have added the proto-forms as "simple languages" (language *PT in the Edictor), all proto-forms should be aligned with their reflexes. That way, we can later check and model how the proto-forms changed into the reflexes. We can use this to test:

  • how well the proto-language can predict the daughter languages, and
  • which sound changes frequently occur in the data, and
  • which cases of sound changes needed to model the data are problematic

For all of this, we'll need the alignments.
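
To illustrate why the alignments matter for the second point, here is a minimal sketch of counting proto-to-reflex correspondences column by column. The data is a toy example, the function name `count_correspondences` is hypothetical, and the placement of the gap in the last row is an arbitrary assumption; it is not how the Edictor stores or processes alignments.

```python
from collections import Counter

# Toy alignment (hypothetical data): every row has the same number of
# columns, and "-" marks a gap.
alignment = {
    "PROTO": ["X", "Y", "Z"],
    "L1":    ["X", "Y", "W"],
    "L4":    ["X", "T", "-"],  # gap placement is an assumption
}

def count_correspondences(alignment, proto="PROTO"):
    """Count (proto segment, reflex segment) pairs column by column."""
    counts = Counter()
    proto_row = alignment[proto]
    for language, row in alignment.items():
        if language == proto:
            continue
        if len(row) != len(proto_row):
            raise ValueError(f"{language} is not aligned to {proto}")
        for source, target in zip(proto_row, row):
            counts[(source, target)] += 1
    return counts

correspondences = count_correspondences(alignment)
for (source, target), n in correspondences.most_common():
    print(f"{source} > {target}: {n}")
```

Without the alignment, there is no principled way to decide which reflex segment to pair with which proto segment, so counts like these cannot be computed.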

@thiagochacon

Great. I think I can add here the suggestion I made by email. I suggested we should try to "hierarchize" the proto-form with the descendant forms. This could be helpful for the main problematic cases in the alignment: metathesis, and phonological splits and mergers.

If we work with some sort of hierarchy, we could link the particular reflexes with a proto-form cell (i.e., a reconstructed sound). The normal/unmarked situation could be handled by the alignment proper. Otherwise, we could link a particular reflex to one or more proto-form cells.

Suppose we have the following scenario:
Proto-L XYZ
L1 XYW
L2 XZY
L3 XYAB
L4 XT

In this scenario, L1 W would be aligned with, and thus automatically linked to, PL Z.
L2 Z would be linked to PL Z.
L3 AB would be linked to PL Z.
L4 T would be linked to PL YZ.
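
One possible way to picture these links as data, purely as a sketch: each reflex segment carries the indices of the Proto-L cells it is linked to. The layout and the helper name `reflexes_of` are hypothetical and only meant to make the scenario concrete.

```python
# Proto-L cells, indexed 0 = X, 1 = Y, 2 = Z.
proto = ["X", "Y", "Z"]

# Each reflex segment is paired with the indices of the proto cells it is
# linked to (hypothetical layout, following the scenario above).
links = {
    "L1": [("X", [0]), ("Y", [1]), ("W", [2])],
    "L2": [("X", [0]), ("Z", [2]), ("Y", [1])],
    "L3": [("X", [0]), ("Y", [1]), ("A", [2]), ("B", [2])],
    "L4": [("X", [0]), ("T", [1, 2])],
}

def reflexes_of(cell, links):
    """Return, per language, the reflex segments linked to one proto cell."""
    return {
        language: [segment for segment, cells in segments if cell in cells]
        for language, segments in links.items()
    }

# All reflexes of Proto-L Z (index 2).
print(reflexes_of(2, links))
```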

Do you think this would be a good idea? How close are we to managing that with the current state of the alignment tech?

@LinguList
Author

The easiest and most straightforward approach here is to add another column containing the "linking". This would start from the proto-form in its tokenized representation (that is, the "TOKENS" column). We could then use some easy-to-define markup in which, for each reflex, the relation to the proto-form is defined. This would come close to the solution Paul presented.

A possible example for markup would be:

PROTO X/1 Y/2 Z/3
L1 X/1 Y/2 W/3
L2 X/1 Z/3 Y/2
L3 X/1 Y/2 A/3 B/3
L4 X/1 T/2,3 

Here, the numbers attached to the reflex segments refer to the numbered segments of the proto-form.
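
As a rough sketch of how such markup could be read, and of how the rows that really need it could be picked out automatically, something like the following might work. The function names `parse_links` and `classify` and the flag labels are hypothetical, and the parser assumes every token has the form `SEGMENT/N` or `SEGMENT/N,M`.

```python
from collections import Counter

def parse_links(markup):
    """Parse 'X/1 T/2,3' into [('X', [1]), ('T', [2, 3])]."""
    parsed = []
    for token in markup.split():
        segment, _, refs = token.partition("/")
        parsed.append((segment, [int(r) for r in refs.split(",")]))
    return parsed

def classify(markup):
    """Flag relations that a plain column-by-column alignment cannot express."""
    parsed = parse_links(markup)
    flags = set()
    # Order of proto positions as they appear in the reflex.
    firsts = [refs[0] for _, refs in parsed]
    if firsts != sorted(firsts):
        flags.add("reordered")      # e.g. metathesis
    # One reflex segment linked to several proto segments.
    if any(len(refs) > 1 for _, refs in parsed):
        flags.add("many-to-one")
    # One proto segment linked to several reflex segments.
    usage = Counter(ref for _, refs in parsed for ref in refs)
    if any(n > 1 for n in usage.values()):
        flags.add("one-to-many")
    return flags

rows = {
    "L1": "X/1 Y/2 W/3",
    "L2": "X/1 Z/3 Y/2",
    "L3": "X/1 Y/2 A/3 B/3",
    "L4": "X/1 T/2,3",
}
for language, markup in rows.items():
    print(language, sorted(classify(markup)))
```

Rows that come back with no flags are the normal case that the alignment already handles; only the flagged rows would need the extra linking column.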

I could also write a tool similar to the alignment editor which would display these internal formats nicely or allow for quick editing.

But before starting to work on technical solutions here, I suggest we use this issue to collect the cognate sets where such a representation is actually needed. If, in the end, there are only two cases or so, we might come up with an easier solution. If not, the examples will help us identify which functionality we actually need.
