z-sampa grouped superscripts fail #52

mi2ebi · 2023-07-06T17:45:15Z

i'd assume this should result in /kʷʰ/ (and yes this is currently doable as just z/k+w+h/)

i'm unsure how related to #48 this is

bbrk24 · 2023-07-06T19:43:22Z

Ah, yeah that's a problem. I don't know how to determine whether the user intends ⁽ or a group of superscripts. The code currently just does regex substitutions, which can't identify balanced parentheses in the general case (though it might be possible in this specific case? I haven't thought about it much).
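A minimal sketch of the "specific case" idea, assuming groups are never nested and every symbol inside a group is a single character (neither is guaranteed by Z-SAMPA). `expand_groups` is a hypothetical helper, not the bot's actual code. It only rewrites a closed `+(...)` into the already-supported `+x+y` form and does nothing to decide whether the user meant ⁽ instead.

```python
import re

# Hypothetical helper, not the bot's actual code.
# Matches "+(" ... ")" where the body contains no parentheses and no "+",
# so nested or unclosed groups are deliberately left untouched.
GROUP = re.compile(r"\+\(([^()+]+)\)")


def expand_groups(text):
    """Rewrite each closed, non-nested +(...) group as a run of +x items."""
    def expand(match):
        # Assumption: each character in the group is one symbol.
        return "".join("+" + ch for ch in match.group(1))

    return GROUP.sub(expand, text)
```

With this, `expand_groups("k+(wh)")` gives `"k+w+h"`, while an unclosed `"k+(hts"` passes through unchanged.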

mi2ebi · 2023-07-07T16:23:56Z

i can't think of anything either, so i gave chatgpt an explanation of z-sampa + and it came back with

> If the parser's only capabilities are regex replacements, it may not be possible to determine the exact intention of the user when encountering +(. However, you can make some assumptions based on common patterns and prioritize certain interpretations over others.
>
> One possible approach is to check for the presence of +) after +(. If +) follows +(, it can be interpreted as a segment-initial and -final diacritic application as shown in your example (2). In this case, you would replace +( with the appropriate diacritic symbol, such as ⁽.
>
> If +) is not present, you can assume that +( is meant to indicate multiple superscripts as shown in your example (3). In this case, you would look for a group of symbols within parentheses and superscript each of them. For example, you would replace +(wh) with ʷʰ.

this won't actually work, because +( need not have a following +)

> However, it's important to note that this approach is based on assumptions and may not cover all possible cases accurately. If you have additional information or context available, such as the position of +( within the transcription or specific rules that can guide the interpretation, it would be helpful in making a more precise determination.

i also notice a lack of latin extended-f support: z/l\`+d`/ should output /𝼈𐞋/ rather than /ɺ̢ᵈ˞ / (with an added legibility space because consolas bad), etc

bbrk24 · 2023-07-07T17:06:12Z

> i also notice a lack of latin extended-f support: z/l\`+d`/ should output /𝼈𐞋/ rather than /ɺ̢ᵈ˞ / (with an added legibility space because consolas bad), etc

There's some font limitations here:

Even the current translation of z/F\/ is not uncontroversial.

mi2ebi · 2023-07-08T17:30:06Z

ah sorry- the characters in the "should output" are

(free stuff identifier)

xsduan · 2023-07-09T14:00:20Z

> Ah, yeah that's a problem. I don't know how to determine whether the user intends ⁽ or a group of superscripts. The code currently just does regex substitutions, which can't identify balanced parentheses in the general case (though it might be possible in this specific case? I haven't thought about it much).

Honestly I think at this point there should just be EBNF support or something, I feel like there have been a lot of cases like this. Regex subs work well for X-SAMPA, but Z-SAMPA has a lot of innocent-seeming bracket rules that turn into a giant fucking mess

mi2ebi · 2023-07-09T16:11:33Z

how exactly does ebnf help here? /genq

xsduan · 2023-07-12T00:22:23Z

> how exactly does ebnf help here? /genq

EBNF is a way to specify context free grammars, which generally allow trickier syntax (the canonical example is a string with the same number of a's and b's, like aaaabbbb). That covers essentially anything where matching one part of the string requires knowing something about another part: in that example you need to know how many a's there were, which a regex can't remember. We'd probably mostly use it for balanced parentheses.
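To make the "remembering" point concrete, here is a small sketch (function names are made up for illustration). Both checks need a count that grows with the input, which is exactly the kind of state a plain regular expression has no place to keep.

```python
def is_an_bn(text):
    """True if text is n a's followed by exactly n b's (n may be 0)."""
    half, remainder = divmod(len(text), 2)
    return remainder == 0 and text == "a" * half + "b" * half


def is_balanced(text):
    """True if every "(" has a matching ")" in the right order."""
    depth = 0  # number of currently open parentheses
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" appeared before its "("
                return False
    return depth == 0
```

For example, `is_an_bn("aaaabbbb")` is true and `is_an_bn("aaabbbb")` is false; `is_balanced("(()())")` is true and `is_balanced("(()")` is false.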

Also, technically a "regex" as we use in JavaScript or whatever is a context free grammar, but it's extremely convoluted to make that work because it's more of a semi-unintended interaction of features than a properly designed feature.

bbrk24 · 2023-07-12T00:23:26Z

> EBNF is a way to specify context free grammars

And it's only that. EBNF doesn't provide a mechanism for parsing them.

bbrk24 · 2023-07-12T00:30:09Z

Regardless, there are some situations where well-formed Z-SAMPA -- in the original spec, not the modified one the bot uses -- is ambiguous. Consider the string /k+(hts)/. That could either be /kʰᵗˢ/ (which we call /k+h+t+s/) or /k⁽ht͡s/ (which we call /k+(hts/). No amount of grammar specification or parsing can handle genuine ambiguity.
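A rough sketch of what that ambiguity looks like in code. Everything here is illustrative: the function names are hypothetical, the superscript table only covers the letters used in this thread, and the rule that `xy)` means a tie bar over `xy` is inferred from the /k⁽ht͡s/ reading above rather than taken from the spec. The point is only that an honest parser has to return both candidates for this input.

```python
import re

# Illustrative tables only; a real converter would need far more entries.
SUPERSCRIPT = {"h": "\u02b0", "t": "\u1d57", "s": "\u02e2", "w": "\u02b7"}
SUPERSCRIPT_PAREN = "\u207d"  # the superscript "(" character
TIE_BAR = "\u0361"            # combining double inverted breve


def read_as_group(text):
    """Reading 1: +(...) is a group of superscripts."""
    return re.sub(
        r"\+\(([hstw]+)\)",
        lambda m: "".join(SUPERSCRIPT[ch] for ch in m.group(1)),
        text,
    )


def read_as_literal(text):
    """Reading 2: +( is a superscript paren, and "xy)" ties x and y."""
    text = text.replace("+(", SUPERSCRIPT_PAREN)
    # Assumed rule: two letters followed by ")" get a tie bar between them.
    return re.sub(
        r"([a-z])([a-z])\)",
        lambda m: m.group(1) + TIE_BAR + m.group(2),
        text,
    )


def readings(text):
    """Return every distinct interpretation, in a fixed order."""
    candidates = [read_as_group(text), read_as_literal(text)]
    return list(dict.fromkeys(candidates))  # drop duplicates, keep order
```

`readings("k+(hts)")` returns two different strings, one per interpretation, while an input without `+(` returns just one.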

xsduan · 2023-07-12T00:41:37Z

Well there’s always antlr

