Incorrect casing of standard library values #2105

Open
toastal opened this issue Nov 24, 2024 · 6 comments

@toastal

toastal commented Nov 24, 2024

Describe the bug
Acronyms & initialisms in English use upper case letters to differentiate them from other kinds of words. Using improper casing loses this syntactic information. I see some errors in the standard library:

Note that 'Text makes sense as is, since it is not an acronym or initialism, but also that all of the official sites for these listed values use all capital letters, unlike the incorrect casing seen in Nickel.

To Reproduce
Look at source

Expected behavior
The standard library casing should be fixed to match the English conventions & how the languages officially reference themselves.

Environment

  • OS name + version: NixOS unstable
  • Version of the code: All

@yannham
Member

yannham commented Nov 25, 2024

I think there are many different ways to write those acronyms with reasonable justification for each. I would be inclined to say that in code we shouldn't follow English written conventions for text, as code isn't really text. Which leaves us with:

  • use only lowercase letters, as for command-line arguments: json, yaml, etc.
  • use only uppercase letters, as they're acronyms, as you propose. But then as you mention text isn't an acronym, so that would create inconsistencies (or at least surprise: I find both 'JSON versus 'text and 'TEXT to be surprising).
  • use one of the popular case conventions for source code: snake case, camel case, kebab case. In Nickel we followed the Rust (and one could argue OCaml as well, at least for ADT variants) approach (albeit only as a convention) that values and types use different casing to differentiate them (even more because we can use types as normal values and vice-versa). For enum tags, I think we followed (maybe unconsciously?) the general tradition of functional languages that ADT constructors use uppercase camel case, since enum tags can be seen as zero-ary constructors. Note that in that case Json or Yaml are indeed the usual recommended way of writing acronyms in CamelCase, avoiding multiple consecutive uppercase letters (Microsoft guidelines, Google guidelines, etc.).

I think ultimately any of those choices is reasonable, and IMHO it doesn't really matter (as long as you're consistent). I think I slightly prefer the current approach (as opposed to say uppercase acronyms) because the casing doesn't really depend on the meaning of each enum tag, so you can use the same consistent writing for all of the values in the stdlib (say, in the option type [| 'Some a, 'None |], for hash algorithms, for stuff that is not acronyms, etc.), and it's also the same case convention as for types and contracts, so you don't have to learn a new one or think too much about it.

I must say I'm not too inclined to break backward compatibility for this, unless there is a strong motivation (and I'm even less inclined to accept multiple casing, such as both 'Json and 'JSON to maintain backward compatibility). Do you think this has any consequence with respect to discoverability, principle of least surprise, etc.? Has this bitten you in any way, or is it more of an "it just doesn't feel right" case?

@toastal
Author

toastal commented Nov 25, 2024

Google also put out this style guide: https://google.github.io/styleguide/go/decisions.html#initialisms

It would break backwards compatibility, but I think this was the wrong decision in the first place. Many ‘functional’ projects use proper acronym/initialism casing even in their ADTs. The problem is that you start to lose that casing information: JSON is just a stand-in for “JavaScript Object Notation” & the usage of the initialism here makes it clear that it means as such (as “json” isn’t a word). Since Nickel doesn’t have casing restrictions, I would lean in favor of spelling things as the author intended. I don’t think following the Rust crowd is a great argument when you can set your own terms.

@yannham
Member

yannham commented Nov 25, 2024

Ah, it's interesting that the style guide differs for Go and JavaScript. I agree it's ultimately all pretty arbitrary conventions, at the end of the day.

However, I will reiterate that Nickel is code and not prose and that we should have a convincing practical motivation for breaking backward compatibility. As language maintainers, I think that "spelling things as the author intended" is, to put it a bit bluntly, the least of our concerns. As we favor Nickel users over acronym authors, I think it's even worse: now you have to do additional mental gymnastics to differentiate between 'Text and 'JSON. Sometimes it's also not entirely trivial to know the intended casing for commercial acronyms and brands, which can depend on the fad of marketing. I prefer a purely "algorithmic" casing, that is uniform and consistent.

I also agree that it's not a good argument per se to do "just like Rust" (or even a slightly dangerous irrational bias). In our case though the choice was first and foremost practically motivated: because types and values live in the same namespace, it's better for disambiguation to use entirely different casing conventions, rather than slightly different ones (such as the usual camelCase/CamelCase). It just happened that Rust is a now prominent language that has made this choice as well. Additionally, Rust, OCaml, C++ and other existing languages haven't been created out of thin air, and following precedents when you're out of technical criteria to make a decision and just need to make an arbitrary, normative choice helps fulfill the principle of least surprise.
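
To make the "algorithmic" casing point concrete, here is a minimal sketch in Python (not Nickel, and not anything from the Nickel codebase). The function names and the `KNOWN_ACRONYMS` set are hypothetical, introduced only for illustration. It contrasts a uniform rule, which needs no knowledge of what a word means, with an acronym-aware rule, which depends on a maintained list.

```python
# Hypothetical list of acronyms. A meaning-dependent convention needs
# something like this, and someone has to keep it up to date.
KNOWN_ACRONYMS = {"json", "yaml", "toml"}


def uniform_tag(word: str) -> str:
    """Capitalize the first letter and lowercase the rest, whatever the word means."""
    return word[:1].upper() + word[1:].lower()


def acronym_aware_tag(word: str) -> str:
    """Uppercase known acronyms; fall back to the uniform rule otherwise."""
    if word.lower() in KNOWN_ACRONYMS:
        return word.upper()
    return uniform_tag(word)


words = ["json", "yaml", "toml", "text"]
print([uniform_tag(w) for w in words])        # ['Json', 'Yaml', 'Toml', 'Text']
print([acronym_aware_tag(w) for w in words])  # ['JSON', 'YAML', 'TOML', 'Text']
```

The second rule reproduces the 'JSON versus 'Text split discussed above, but only for words that appear in the list.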

@toastal
Author

toastal commented Nov 25, 2024

I mean if it were up to me, I would use 'Jsᴏɴ to have the camel casing and not lose the initialism information about the word, but this is the kind of thing that would actually cause “surprise” despite making sense; seeing JSON case changed was a surprise to me, which is why I raised the issue. “Algorithmic” casing loses information, which is why JavaScriptObjectNotation becomes JSON when you drop the lowercase letters to become an initialism. This isn’t a stylistic/branding thing either in the case of all 3: JSON, TOML, & YAML. I think 'Text vs. 'JSON is the perfect example for this specifically, since TEXT, unlike JSON, isn’t an acronym or initialism, so I can’t say I understand the example.

The OCaml naming conventions are usually snake_cased anyhow & C++ is hardly standardized in naming… where snake casing gets to ignore jsonAPI vs jsonApi arguments with json_api.

@rben01
Contributor

rben01 commented Dec 2, 2024

In general I think that capital words should just get title cased like any other words. If you use PascalCase, then each capital letter should denote the start of a word, but only the J, not the S, O, or N, denotes the start of a word. Supposing we had something like “JSON IO”, it would be spelled much more clearly as JsonIo than JSONIO, as the latter doesn't make clear the word boundaries, or even that there are two words to begin with.
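
The word-boundary argument can be checked mechanically. The following is a minimal sketch in Python, assuming the rule "every uppercase letter starts a new word"; `split_pascal_case` is a hypothetical helper written for this illustration, not part of any library.

```python
import re


def split_pascal_case(identifier: str) -> list[str]:
    """Split at every uppercase letter, assuming each one starts a new word."""
    return re.findall(r"[A-Z][a-z0-9]*", identifier)


print(split_pascal_case("JsonIo"))  # ['Json', 'Io']
print(split_pascal_case("JSONIO"))  # ['J', 'S', 'O', 'N', 'I', 'O']
```

Under that rule, JsonIo yields two words while JSONIO yields six single-letter pieces, so recovering "JSON" and "IO" from the latter would need extra knowledge about which letter runs form acronyms.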

@toastal
Author

toastal commented Dec 3, 2024

J, S, O, N do denote the start of a word, three(ish, “JS” is itself an initialism) words actually: JavaScript Object Notation. This is why JSON is an acronym in the first place… & also why you wrote it in English as “JSON”, not “Json” or “json” since it matters. “Io” is a moon of Jupiter, not to be confused with the initialism “IO” which stands for “input/output”.
