z-sampa grouped superscripts fail #52

Open
mi2ebi opened this issue Jul 6, 2023 · 10 comments

@mi2ebi

mi2ebi commented Jul 6, 2023

image

i'd assume this should result in /kʷʰ/ (and yes this is currently doable as just z/k+w+h/)

i'm unsure how related to #48 this is

@bbrk24
Collaborator

bbrk24 commented Jul 6, 2023

Ah, yeah that's a problem. I don't know how to determine whether the user intends ( or a group of superscripts. The code currently just does regex substitutions, which can't identify balanced parentheses in the general case (though it might be possible in this specific case? I haven't thought about it much).
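
For the specific, non-nested case, a single regex substitution with a callback might be enough. The sketch below is in Python rather than the bot's own code, and the `SUPERSCRIPTS` table and the `expand_groups` name are made up for illustration. It also assumes that `+(` followed by letters and a closing `)` always means a group of superscripts, which is exactly the part that is in doubt.

```python
import re

# Hypothetical, deliberately tiny superscript table (not the bot's real data).
SUPERSCRIPTS = {"h": "\u02b0", "w": "\u02b7", "t": "\u1d57", "s": "\u02e2"}

# "+(" followed by one or more known letters and a closing ")".
# Nothing can nest inside the group, so a plain regex is able to match it.
LETTERS = "".join(SUPERSCRIPTS)
GROUP = re.compile(r"\+\(([" + LETTERS + r"]+)\)")

def expand_groups(text):
    """Replace each non-nested +(...) group with superscripted letters."""
    return GROUP.sub(
        lambda m: "".join(SUPERSCRIPTS[c] for c in m.group(1)),
        text,
    )

print(expand_groups("k+(wh)"))  # kʷʰ
```

Input with no closing `)` is left untouched by this pattern, so a lone `+(` would still need its own rule.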

@mi2ebi
Author

mi2ebi commented Jul 7, 2023

i can't think of anything either, so i gave chatgpt an explanation of z-sampa + and it came back with

> If the parser's only capabilities are regex replacements, it may not be possible to determine the exact intention of the user when encountering +(. However, you can make some assumptions based on common patterns and prioritize certain interpretations over others.

> One possible approach is to check for the presence of +) after +(. If +) follows +(, it can be interpreted as a segment-initial and -final diacritic application as shown in your example (2). In this case, you would replace +( with the appropriate diacritic symbol, such as .

> If +) is not present, you can assume that +( is meant to indicate multiple superscripts as shown in your example (3). In this case, you would look for a group of symbols within parentheses and superscript each of them. For example, you would replace +(wh) with ʷʰ.

this won't actually work, because +( need not have a following +)

> However, it's important to note that this approach is based on assumptions and may not cover all possible cases accurately. If you have additional information or context available, such as the position of +( within the transcription or specific rules that can guide the interpretation, it would be helpful in making a more precise determination.


i also notice a lack of latin extended-f support: z/l\`+d`/ should output /𝼈𐞋/ rather than /ɺ̢ᵈ˞ / (with an added legibility space because consolas bad), etc

@bbrk24
Collaborator

bbrk24 commented Jul 7, 2023

> i also notice a lack of latin extended-f support: z/l\`+d`/ should output /𝼈𐞋/ rather than /ɺ̢ᵈ˞ / (with an added legibility space because consolas bad), etc

There are some font limitations here:

image

Even the current translation of z/F\/ is not uncontroversial.

@mi2ebi
Author

mi2ebi commented Jul 8, 2023

ah sorry- the characters in the "should output" are
image

(free stuff identifier)

@xsduan
Owner

xsduan commented Jul 9, 2023

> Ah, yeah that's a problem. I don't know how to determine whether the user intends ( or a group of superscripts. The code currently just does regex substitutions, which can't identify balanced parentheses in the general case (though it might be possible in this specific case? I haven't thought about it much).

Honestly I think at this point there should just be EBNF support or something, I feel like there's been a lot of cases like this. Regex subs work well for X-SAMPA but Z-SAMPA has a lot of innocent-seeming bracket rules that turn into a giant fucking mess

@mi2ebi
Author

mi2ebi commented Jul 9, 2023

how exactly does ebnf help here? /genq

@xsduan
Owner

xsduan commented Jul 12, 2023

> how exactly does ebnf help here? /genq

EBNF is a way to specify context free grammars, which generally allow trickier syntax (the canonical example is a run of a's followed by the same number of b's, like aaaabbbb). Essentially, that covers anything that requires knowing something else about the string: in that example you need to know how many a's there were, which regexes can't remember. We'd probably mostly use it for balanced parentheses.

Also, technically a "regex" as we use in JavaScript or whatever is more powerful than a textbook regular expression (backreferences, and recursion in some engines), but it's extremely convoluted to make that work because it's more of a semi-unintended interaction of features than a properly designed functionality.
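
To make the point about counting concrete, here is a rough Python sketch (the function names are made up for illustration). Both checks only need a counter, which is the piece of state a classical regular expression has no way to keep.

```python
def is_anbn(s):
    """True if s is some number of 'a's followed by exactly as many 'b's."""
    count = 0
    i = 0
    # Count the leading run of 'a's.
    while i < len(s) and s[i] == "a":
        count += 1
        i += 1
    # Whatever is left must be exactly `count` copies of 'b'.
    return s[i:] == "b" * count

def is_balanced(s):
    """True if every '(' in s is closed by a later ')' and vice versa."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' showed up before its '('
                return False
    return depth == 0

print(is_anbn("aaaabbbb"), is_anbn("aaabbbb"))      # True False
print(is_balanced("k+(wh)"), is_balanced("k+(wh"))  # True False
```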

@bbrk24
Collaborator

bbrk24 commented Jul 12, 2023

> EBNF is a way to specify context free grammars

And it's only that. EBNF doesn't provide a mechanism for parsing them.
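
As a rough illustration of that gap, the Python sketch below puts a toy grammar in a comment and then parses it with a hand-written recursive-descent parser. The grammar, the `parse_segment` name, and the returned shape are all hypothetical and far simpler than real Z-SAMPA; the only point is that the EBNF text and the code that parses it are separate pieces of work.

```python
# Hypothetical toy grammar in EBNF (not the real Z-SAMPA grammar):
#   segment = letter , { "+" , ( group | letter ) } ;
#   group   = "(" , letter , { letter } , ")" ;

def parse_segment(text):
    """Return (base letter, letters to superscript); raise ValueError on bad input."""
    pos = 0

    def letter():
        nonlocal pos
        if pos < len(text) and text[pos].isalpha():
            pos += 1
            return text[pos - 1]
        raise ValueError(f"expected a letter at position {pos}")

    def group():
        nonlocal pos
        pos += 1  # consume "("
        letters = [letter()]
        while pos < len(text) and text[pos] != ")":
            letters.append(letter())
        if pos >= len(text):
            raise ValueError("unclosed group")
        pos += 1  # consume ")"
        return letters

    base = letter()
    supers = []
    while pos < len(text) and text[pos] == "+":
        pos += 1  # consume "+"
        if pos < len(text) and text[pos] == "(":
            supers.extend(group())
        else:
            supers.append(letter())
    if pos != len(text):
        raise ValueError(f"unexpected input at position {pos}")
    return base, supers

print(parse_segment("k+(wh)"))  # ('k', ['w', 'h'])
print(parse_segment("k+w+h"))   # ('k', ['w', 'h'])
```

This toy grammar has no nesting, so a regex could handle it too. A parser generator would produce the same kind of code from the grammar instead of having it written by hand.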

@bbrk24
Collaborator

bbrk24 commented Jul 12, 2023

Regardless, there are some situations where well-formed Z-SAMPA -- in the original spec, not the modified one the bot uses -- is ambiguous. Consider the string /k+(hts)/. That could either be /kʰᵗˢ/ (which we call /k+h+t+s/) or /k⁽ht͡s/ (which we call /k+(hts/). No amount of grammar specification or parsing can handle genuine ambiguity.
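
A small Python sketch of that ambiguity, under loud assumptions: the `SUPERSCRIPTS` table and the `readings` helper are invented for illustration, and the trailing `)` in the second reading is taken to be a tie bar over the two letters before it, as the /k⁽ht͡s/ reading above suggests. Given the same base and letters, both outputs are equally valid, so any code has to pick one by convention.

```python
TIE = "\u0361"         # combining tie bar placed between two letters
SUPER_PAREN = "\u207d"  # superscript left parenthesis

# Hypothetical, deliberately tiny superscript table (not the bot's real data).
SUPERSCRIPTS = {"h": "\u02b0", "t": "\u1d57", "s": "\u02e2"}

def readings(base, letters):
    """Both readings of base + "+(" + letters + ")", e.g. k+(hts)."""
    # Reading 1: "+(...)" is a group, so every letter is superscripted.
    grouped = base + "".join(SUPERSCRIPTS[c] for c in letters)
    # Reading 2: "+(" is a superscript parenthesis and ")" ties the
    # last two letters together (needs at least two letters).
    paren = base + SUPER_PAREN + letters[:-2] + letters[-2] + TIE + letters[-1]
    return [grouped, paren]

for reading in readings("k", "hts"):
    print(reading)
```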

@xsduan
Copy link
Owner

xsduan commented Jul 12, 2023 via email
