Incorrect casing of standard library values #2105

toastal · 2024-11-24T08:58:36Z

Describe the bug
Acronyms & initialisms in English use upper case letters to differentiate them from other kinds of words. Using improper casing loses this syntactic information. I see some errors in the standard library:

'Json should be 'JSON for JavaScript Object Notation https://www.json.org
'Toml should be 'TOML for Tom’s Obvious, Minimal Language https://toml.io/en/
'Yaml should be 'YAML for YAML Ain’t Markup Language https://yaml.org

Note that 'Text makes sense as is since it is not an acronym or initialism, but also that all of the official sites for these listed values use all capital letters, unlike the incorrect casing seen in Nickel.

To Reproduce
Look at source

Expected behavior
The standard library casing should be fixed to match the English conventions & how the languages officially reference themselves.

Environment

OS name + version: NixOS unstable
Version of the code: All



yannham · 2024-11-25T09:29:58Z

I think there are many different ways to write those acronyms with reasonable justification for each. I would be inclined to say that in code we shouldn't follow English written conventions for text, as code isn't really text. Which leaves us with:

use only lowercase letters, as for command-line arguments: json, yaml, etc.
use only uppercase letters, as they're acronyms, as you propose. But then, as you mention, text isn't an acronym, so that would create inconsistencies (or at least surprise: I find both 'JSON versus 'text and 'TEXT to be surprising).
use one of the popular case conventions for source code: snake case, camel case, kebab case. In Nickel we followed the Rust (and one could argue OCaml as well, at least for ADT variants) approach (albeit only as a convention) that values and types use different casing to differentiate them (even more so because we can use types as normal values and vice versa). For enum tags, I think we followed (maybe unconsciously?) the general tradition of functional languages that ADT constructors use uppercase camel case, since enum tags can be seen as zero-ary constructors. Note that in that case Json or Yaml are indeed the usual recommended way of writing acronyms in CamelCase, avoiding multiple consecutive uppercase letters (Microsoft guidelines, Google guidelines, etc.).

I think ultimately any of those choices is reasonable, and IMHO it doesn't really matter (as long as you're consistent). I think I slightly prefer the current approach (as opposed to say uppercase acronyms) because the casing doesn't really depend on the meaning of each enum tag, so you can use the same consistent writing for all of the values in the stdlib (say, in the option type [| 'Some a, 'None |], for hash algorithms, for stuff that is not acronyms, etc.), and it's also the same case convention as for types and contracts, so you don't have to learn a new one or think too much about it.

I must say I'm not too inclined to break backward compatibility for this, unless there is a strong motivation (and I'm even less inclined to accept multiple casings, such as both 'Json and 'JSON, to maintain backward compatibility). Do you think this has any consequence with respect to discoverability, principle of least surprise, etc.? Has this bitten you in any way, or is it more of an "it just doesn't feel right" case?

toastal · 2024-11-25T09:53:31Z

Google also put out this style guide: https://google.github.io/styleguide/go/decisions.html#initialisms

It would break backwards compatibility, but I think this was the wrong decision in the first place. Many ‘functional’ projects use proper acronym/initialism casing even in their ADTs. The problem is that you start to lose that casing information: JSON is just a stand-in for “JavaScript Object Notation” & the use of the initialism here makes it clear that it means as such (as “json” isn’t a word). Since Nickel doesn’t have casing restrictions I would lean in favor of spelling things as the author intended. I don’t think following the Rust crowd is a great argument when you can set your own terms.

yannham · 2024-11-25T10:25:14Z

Ah, it's interesting that the style guide differs for Go and JavaScript. I agree it's ultimately all pretty arbitrary conventions, at the end of the day.

However, I will reiterate that Nickel is code and not prose and that we should have a convincing practical motivation for breaking backward compatibility. As language maintainers, I think that "spelling things as the author intended" is, to put it a bit bluntly, the least of our concerns. As we favor Nickel users over acronym authors, I think it's even worse: now you have to do additional mental gymnastics to differentiate between 'Text and 'JSON. Sometimes it's also not entirely trivial to know the intended casing for commercial acronyms and brands, which can depend on the fad of marketing. I prefer a purely "algorithmic" casing that is uniform and consistent.

I also agree that it's not a good argument per se to do "just like Rust" (or even a slightly dangerous irrational bias). In our case though the choice was first and foremost practically motivated: because types and values live in the same namespace, it's better for disambiguation to use entirely different casing conventions, rather than slightly different ones (such as the usual camelCase/CamelCase). It just happened that Rust is a now prominent language that has made this choice as well. Additionally, Rust, OCaml, C++ and other existing languages haven't been created out of thin air, and following precedents when you're out of technical criteria to make a decision and just need to make an arbitrary, normative choice helps fulfill the principle of least surprise.
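As an illustration only, a purely "algorithmic" casing rule of the kind described in this comment could be sketched in Python as follows. The helper name `to_enum_tag` is hypothetical and is not part of Nickel or its standard library; the sketch only assumes that every word, acronym or not, is cased the same way.

```python
def to_enum_tag(words: list[str]) -> str:
    """Build a Nickel-style enum tag by casing every word identically.

    Hypothetical helper for illustration; not part of Nickel.
    """
    # str.capitalize() uppercases the first character and lowercases the rest,
    # so acronyms are treated exactly like ordinary words.
    return "'" + "".join(word.capitalize() for word in words)


for name in (["JSON"], ["YAML"], ["TOML"], ["text"]):
    print(to_enum_tag(name))
```

Under this rule the casing never depends on whether a word is an acronym, which is the uniformity argued for above; the trade-off, as raised elsewhere in the thread, is that the initialism information is no longer visible in the tag.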

toastal · 2024-11-25T10:57:00Z

I mean if it were up to me, I would use 'Jsᴏɴ to have the camel casing and not lose the initialism information about the word, but this is the kind of thing that would actually cause “surprise” despite making sense; seeing JSON case changed was a surprise to me, which is why I raised the issue. “Algorithmic” casing loses information, which is why JavaScriptObjectNotation becomes JSON when you drop the lowercase letters to become an initialism. This isn’t a stylistic/branding thing either in the case of all 3, JSON, TOML, & YAML. I think 'Text vs. 'JSON is the perfect example for this specifically since TEXT, unlike JSON, isn’t an acronym or initialism so I can’t say I understand the example.

The OCaml naming conventions are usually snake_cased anyhow & C++ is hardly standardized in naming… where snake casing gets to ignore jsonAPI vs jsonApi arguments with json_api.

rben01 · 2024-12-02T15:30:27Z

In general I think that capital words should just get title cased like any other words. If you use PascalCase, then each capital letter should denote the start of a word, but only the J, not the S, O, or N, denotes the start of a word. Supposing we had something like “JSON IO”, it would be spelled much more clearly as JsonIo than JSONIO, as the latter doesn't make clear the word boundaries, or even that there are two words to begin with.
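The word-boundary point can be made concrete with a small Python sketch. It assumes the rule stated in this comment, that every capital letter starts a new word; the function name `split_pascal_case` is made up for the example.

```python
import re


def split_pascal_case(identifier: str) -> list[str]:
    """Split an identifier, treating every capital letter as a word start."""
    # One uppercase letter followed by any run of lowercase letters or digits.
    return re.findall(r"[A-Z][a-z0-9]*", identifier)


print(split_pascal_case("JsonIo"))  # ['Json', 'Io']
print(split_pascal_case("JSONIO"))  # ['J', 'S', 'O', 'N', 'I', 'O']
```

With this rule, JsonIo recovers the two intended words, while JSONIO falls apart into single letters, so a reader or tool cannot tell where "JSON" ends and "IO" begins without already knowing both acronyms.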

toastal · 2024-12-03T10:15:30Z

J, S, O, N do denote the start of a word, three(ish, “JS” is itself an initialism) words actually: JavaScript Object Notation. This is why JSON is an acronym in the first place… & also why you wrote it in English as “JSON”, not “Json” or “json” since it matters. “Io” is a moon of Jupiter, not to be confused with the initialism “IO” which stands for “input/output”.
