Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make isFuzzy return true if tag string includes "fuzzy" #179

Merged
merged 1 commit into from
Apr 18, 2024

Conversation

uriesk
Copy link
Contributor

@uriesk uriesk commented Mar 9, 2024

Some editors turn the #, javascript-format into #, fuzzy, javascript-format when they merge.
ttag doesn't understand that and doesn't treat them as fuzzy.

fix #178

@uriesk
Copy link
Contributor Author

uriesk commented Mar 9, 2024

@AlexMost
hi friend. I got a few little PRs for filtering fuzzy labels, i need those for weblate.org, which uses msgmerge that ads its fuzzy label additional to the existing labels.

To my knowledge, it's all backwards compatible. The other PRs:
ttag-org/ttag-cli#153
ttag-org/ttag-po-loader#4

@dimaqq
Copy link

dimaqq commented Mar 18, 2024

Hi @uriesk I'm happy to merge this, code seems OK.
Could you expand a little more on the background for this change?

Some editors turn the #, javascript-format into #, fuzzy, javascript-format when they merge.

What editor, specifically?
Any idea why why that is?

What worries me is that a rather reasonable javascript-format tag is marked as fuzzy, and with this change, the translation is removed entirely, isn't it? Would such outcome be undesirable?

@uriesk
Copy link
Contributor Author

uriesk commented Mar 21, 2024

@dimaqq
Here an example, by using original GNU gettext to translate:

  1. babel-plugin-ttag extracts the string jt`Hello ${name}!` into a pot file:

test.pot

msgid ""
msgstr ""
"Content-Type: text/plain; charset=utf-8\n"
"Plural-Forms: nplurals=2; plural=(n!=1);\n"

#: src/test.js:10
#, javascript-format
msgid "Hello ${ name }!"
msgstr ""
  1. i create a translation using gettext with msginit and translate the string, resulting in

de_AT.po

msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"
"Project-Id-Version: PACKAGE VERSION\n"
"PO-Revision-Date: 2024-03-21 17:14+0100\n"
"Last-Translator: uriesk <[email protected]>\n"
"Language-Team: Austrian\n"
"Language: de_AT\n"
"MIME-Version: 1.0\n"
"Content-Transfer-Encoding: 8bit\n"

#: src/test.js:10
#, javascript-format
msgid "Hello ${ name }!"
msgstr "Hallo ${ name }!"
  1. code gets updated, the name of the variable changes and jt`Hello ${name}!` becomes jt`Hello ${userName}!`

test.pot

msgid ""
msgstr ""
"Content-Type: text/plain; charset=utf-8\n"
"Plural-Forms: nplurals=2; plural=(n!=1);\n"

#: src/test.js:10
#, javascript-format
msgid "Hello ${ userName }!"
msgstr ""
  1. update the translation file with gettext msgmerge de_AT.po test.pot, results in:
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"PO-Revision-Date: 2024-03-21 17:14+0100\n"
"Last-Translator: uriesk <[email protected]>\n"
"Language-Team: Austrian\n"
"Language: de_AT\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"

#: src/test.js:10
#, fuzzy, javascript-format
msgid "Hello ${ userName }"
msgstr "Hallo ${ name }!"

And here it is, it added the fuzzy tag additional to the javascript-format tag. Marking the string, to tell you that you have to check the change (and that is good, cause it is now faulty). And every single editor that uses msgmerge for updating from pot files does this. Weblate is one of them.

Currently, ttag is incapable of telling that this is fuzzy.

with this change, the translation is removed entirely, isn't it? Would such outcome be undesirable?

Every string that is detected as fuzzy is ignored by default. Ordinary j`Some text` strings do not get the javascript-format tag, so when msgmerge marks those as fuzzy, ttag ignores them. There is no good reason why it should accept a fuzzy string just because it has one additional tag.

Recently there was an option added to tell ttag to also include fuzzy strings: #169
That would apply here too, if you want to include the fuzzy, you can chose to do that.

I can't think of any situation where someone would rely on fuzzy javascript-format to be included, but ordinary fuzzy strings not. But it is a change in behavior, so maybe bumping the major version would be good? But i am not familiar in how the versioning is done.

@dimaqq
Copy link

dimaqq commented Mar 22, 2024

Unfortunately, I'm working for a different company and I don't have access to the project where I used ttag.

My gut tells me that GNU gettext was not very good for some reason, and I merged translation files using ttag tooling instead: https://github.com/ttag-org/ttag-cli/blob/master/src/lib/merge.ts

@dimaqq
Copy link

dimaqq commented Mar 22, 2024

I think last time I did something with this, ttag merge was the answer. Since you've got ttag in the toolchain, there's no need for traditional tooling.

That being said, I'll be happy to try and reproduce this behaviour.

P.S. I couldn't find a reported issue against GNU gettext, is this a bug on their side, or documented, established behaviour?

https://savannah.gnu.org/search/?search=Search&words0=msgmerge&type_of_search=bugs&only_group_id=425&exact=1&max_rows=#options

@uriesk
Copy link
Contributor Author

uriesk commented Mar 22, 2024

@dimaqq
This is part of the po file format. Lines starting with #, indicate flags and flags are a coma separated list.
https://www.gnu.org/software/gettext/manual/html_node/PO-Files.html

It is established behaviour for as long as GNU gettext exist. In the oldest available version on their server from 23 years ago, you can read in its included documentation:

The comment lines beginning with <KBD>#,</KBD> are special because they are
not completely ignored by the programs as comments generally are.  The
COMMA SEPERATED LIST of <VAR>flag</VAR>s is used by the <CODE>msgfmt</CODE>...

Marking fuzzy strings as fuzzy, is not a bug.
Not removing the javascript-format flag when adding the fuzzy flag, is not a bug.

But an isFuzzy() function not detecting that a string is fuzzy, just because there is one additional flag, is a bug.

I think last time I did something with this, ttag merge was the answer. Since you've got ttag in the toolchain, there's no need for traditional tooling.

If you use weblate or properly even crowdin, msgmerge is already part of the tooling. They watch the repository (with hooks or whatever), detect changes in the .pot file and update the .po files automatically and notify translators that changes arrived (if they want to).

@jorrit
Copy link
Contributor

jorrit commented Mar 22, 2024

What I find strange is that translationObj.comments.flag seems to be regarded as a string. In gettext-parser this is an array (https://github.com/smhg/gettext-parser/blob/272ac171e385895d8692997fddb96366d143fc46/lib/poparser.js#L271). Is this because of what applyFormat() does? I think it should remain an array throughout the code.

@uriesk
Copy link
Contributor Author

uriesk commented Mar 22, 2024

What I find strange is that translationObj.comments.flag seems to be regarded as a string. In gettext-parser this is an array (https://github.com/smhg/gettext-parser/blob/272ac171e385895d8692997fddb96366d143fc46/lib/poparser.js#L271). Is this because of what applyFormat() does? I think it should remain an array throughout the code.

It's even worse with the references, if you have

#: src/test1.js src/haha.js
#: src/lol.js

then translationObj.comments.reference becomes

'src/test1.js src/hahajs \nsrc/lol.js'

However, since arrays have an includes prototype too, this merge request here will still work ;)

@jorrit
Copy link
Contributor

jorrit commented Mar 22, 2024

That's true, but if the variable is a string and (for some reason) contains an entry that contains the string fuzzy, but is not exactly fuzzy, the code will match.

@uriesk
Copy link
Contributor Author

uriesk commented Mar 22, 2024

good that there are no flags that include fuzzy but aren't fuzzy.

@@ -190,7 +190,8 @@ export function hasTranslations(translationObj) {
export function isFuzzy(translationObj) {
return (
translationObj && translationObj.comments
&& translationObj.comments.flag === 'fuzzy');
&& translationObj.comments.flag
&& translationObj.comments.flag.includes('fuzzy'));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good enough.
A bit quirky, but I figure that the format of PO files is changed rather rarely.

As a though experiment:

One alternative would be to parse the comment (split on comma, trim whitespace).

Another alternative would be match a more exact pattern.

Suggested change
&& translationObj.comments.flag.includes('fuzzy'));
&& translationObj.comments.flag.includes(', fuzzy'));

Ref:
https://github.com/autotools-mirror/gettext/blob/6456980949f3766b366d3a7afba05d7fefe27cd0/gettext-tools/src/read-stringtable.c#L462-L474

@AlexMost
Copy link
Member

Seems like a reasonable PR. Thanks for the contribution @uriesk ! Will try to add some test cases and release the new version ASAP

@AlexMost AlexMost merged commit ebeb9e4 into ttag-org:master Apr 18, 2024
@AlexMost
Copy link
Member

The new version is available - [email protected]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fuzzy not detected when there are multiple tags in one line
4 participants