add some localizations #8
base: master
Rudimentary support for en, de, pt, es, fr, it, ..., for the time being, hopefully without any clashes.
Hi, thank you for the patch. I have some questions. On its own terms, I don't understand the […]. I also don't know how I feel about this patch in general. You'll note (although the code is structured so that people who merely want to use the program can ignore this) that there is a CPAN directory, whereby the same code lives on CPAN as a module called Lingua::EN::Titlecase::Simple. Note the language in the moniker: this code is explicitly English-centric, and reflects a very specific and narrow English-speaking sensibility at that (the NYT style manual). I dealt with this code too long ago to remember the details, but I believe some of the rules implemented by the rest of the code do not apply well to German as a German speaker would expect it to look. I don't have the same feel for the other languages, but based on the one I do know, I would be surprised to hear that the same is not true for any of the others. So I am trying to understand the circumstances that make this patch useful. Is the idea to process mostly English text for an English audience that just happens to contain stray foreign words, other-language quotations, or other such embeds? Even that seems a strange guess, but I can't think of anything better.
It's just that it didn't seem to work with texts in other languages, like German or Spanish; […]
The reason there is no such script, at least for German, is that in German, capitalization is part of spelling and grammar, so arbitrarily titlecasing German text this way simply makes it incorrect. Headlines are either cased the same as regular text or set in all caps; titlecasing is not a thing. Sorry, I should have been fully awake before my first response and noticed this fundamental issue before getting bogged down with details like the […]. I don't know whether the same is true for Spanish or any of the other languages in your patch. I suspect it is, because I know it is also true for Greek, so it is not a peculiarity of German in particular, but I don't know it for a fact. Do you?
Is that observation the only basis, though? Do you not speak any of those languages? (Or write or edit for audiences reading them, or have some other direct knowledge of them?) I do appreciate the sentiment of wanting to help, and I appreciate that you took the time to prepare and submit a patch. Apologies that the subject of this code has something of a barrier to entry.
Wait, they weren't all nonsense. I only asked about them in German, French, and Spanish specifically, and there was a reason for that: in the lists for the other languages, the […]
Yes, this is understood. The less a script does, the easier it is to be good at it.
[…] might not be to everyone's taste, but they're rare enough that there's no need to go to great lengths over them, and titlecase is still of good use.
The version I just released matches John Gruber's latest public upstream version (from 2015 as of this writing), which contains code to handle exactly these cases. As for making this code multilingual, I'm afraid I'm not interested in shipping that. However, the module version of the code already allows modifying the small-word list by changing the […]
Very laudable, thank you. Is that the code?

    s{
        \b
        (?<! -)              # Negative lookbehind for a hyphen; we don't want to match man-in-the-middle but do want (in-flight)
        ( $small_re )
        (?= -[[:alpha:]]+)   # lookahead for "-someword"
    }{\u\L$1}xig;
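To see what the quoted substitution does on its own, here is a minimal, self-contained Perl sketch. It is not the script itself: the small-word list below is an assumed, simplified stand-in for the script's real list, and `cap_small_before_hyphen` is a hypothetical wrapper that applies only this one rule and nothing else from the titlecasing logic.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assumption: a simplified small-word list for illustration only. The real
# titlecase script defines its own, longer list.
my @small_words = qw( a an and as at but by for if in of on or the to via vs );
my $small_re    = join '|', @small_words;

# Hypothetical helper: applies only the "small word followed by a hyphen" rule.
sub cap_small_before_hyphen {
    my ($text) = @_;
    $text =~ s{
        \b
        (?<! -)              # not preceded by a hyphen, so man-in-the-middle is skipped
        ( $small_re )        # capture one small word (case-insensitive via /i)
        (?= -[[:alpha:]]+ )  # only if followed by "-someword"
    }{\u\L$1}xig;            # replacement: first letter upper, rest lower
    return $text;
}

for my $s ('man-in-the-middle', '(in-flight)', 'an IN-depth look', 'stand-in') {
    printf "%-20s => %s\n", $s, cap_small_before_hyphen($s);
}
```

Run as-is, it should leave `man-in-the-middle` and `stand-in` unchanged (every small word there follows a hyphen) while turning `(in-flight)` into `(In-flight)` and `IN-depth` into `In-depth`. The `/x` flag is what makes the space in `(?<! -)` insignificant, and the `\u\L` pair normalizes the captured word regardless of its original case.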
This is an open-source, voluntary effort. I am just thinking from a user's perspective. How many users will have the inclination, knowledge, and time to dig through the Perl code base and replace the array with their localized version?
Yes.
That's why I'm asking whether it would satisfy your use case if the program had a switch, something like […]
I've made my decision. If Gruber were to add such a change upstream, I would take it with no questions asked; otherwise, it's not going in. The many, many people who would prefer otherwise will have to either consider whether the proposed […]
Makes perfect sense. Yes, […]
Thank you for your useful tool.