Namespace prefixes in KDL #295

marrus-sh · 2022-09-05T08:09:44Z

Why?

Namespacing is extremely important in the world of metadata, and an important prerequisite for using KDL as a metadata language.

As one trivial example, the DCMI term created is used to give the date on which a resource was created.

</resource> dc:created "1971-12-31"^^xsd:date .

However, another person might choose to use created as an inverse of the creator property, which is to say, to point from the creator to the resource they made.

</author> ex:created </resource> .

Being able to distinguish these two senses of “created” is necessary for any metadata author.

It is already possible to represent the above examples in KDL as follows:

// we can assume all node names are IRIs, but strings must be
// specifically typed as such to disambiguate them from plain
// strings

"/resource" {
  "http://purl.org/dc/terms/creator" (irl-reference)"/author"
  "http://purl.org/dc/terms/created" ("http://www.w3.org/2001/XMLSchema#date")"1971-12-31"
}
"/author" {
  "http://example.example/ns/created" (irl-reference)"/resource"
}

…but this is very cumbersome.

Details

There are actually two different kinds of “namespacing” you can have, depending on what you consider your “expanded names” to be.

In XML, “expanded names” are pairs of a local name (an identifier) and a namespace (an IRI). For example, <html xmlns="http://www.w3.org/1999/xhtml"> has a local name of html and a namespace of http://www.w3.org/1999/xhtml. It is not possible to represent both components losslessly in a single string.
However, most contemporary solutions instead use IRIs as their “expanded names”, and these can be “compacted” into a namespace prefix and a suffix. For example, http://www.w3.org/2001/XMLSchema#date might be compacted into xsd:date by assigning the prefix xsd: to http://www.w3.org/2001/XMLSchema#.
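A minimal Python sketch of the difference between the two kinds. The names PREFIXES, flatten_pair and expand are hypothetical, used only for illustration. Flattening an XML-style (namespace, local name) pair into one string loses the boundary between the two parts, whereas an IRI-style compacted name expands by plain concatenation.

```python
# Hypothetical prefix table: "xsd" -> the XML Schema datatypes namespace.
PREFIXES = {"xsd": "http://www.w3.org/2001/XMLSchema#"}


def flatten_pair(namespace: str, local: str) -> str:
    """Naively flatten an XML expanded name (namespace, local name) into one string."""
    return namespace + local


def expand(compact: str, prefixes: dict) -> str:
    """Expand "prefix:suffix" by concatenating the prefix's IRI and the suffix.

    Only the first ':' ends the prefix; unknown prefixes are left untouched.
    """
    prefix, sep, suffix = compact.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + suffix
    return compact


# Two different XML expanded names flatten to the same string,
# so the original (namespace, local name) pair cannot be recovered.
a = flatten_pair("http://example.example/a/", "bc")
b = flatten_pair("http://example.example/a/b", "c")
print(a == b)  # True

# An IRI-style compacted name has exactly one expansion.
print(expand("xsd:date", PREFIXES))  # http://www.w3.org/2001/XMLSchema#date
```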

In order to implement the second kind, KDL would need two things:

A way of declaring prefixes for a given scope (e.g. a node), and
A syntax for compacted names which is canonically equivalent to catenating the expansion for the prefix to whatever follows it.

The first kind can be left to XiK, as it is really XML‐specific.

Possible Solution

Reserve properties which end with a : as a prefix declaration.
- Forbid non‐final : in such properties (only one colon is allowed).
- A prefix which is only a single : is allowed (empty prefix is OK).
- The values of such properties must be strings (with no type annotations).
- No restrictions on : in properties where it isn’t the final character.
When a node name, ~~property key~~, or type annotation contains a : and there is a matching prefix declaration in that node or any ancestor, the computed name is the value of the prefix declaration plus the text which follows the first :.
- Node names may contain multiple :, but only the first can end a prefix.
- If no prefix is in scope, no expansion occurs (the prefix is treated literally).
- For ~~property keys~~ and type annotations, the prefix declaration must precede them on the current node, or belong to an ancestor node. Prefix declarations which follow a ~~property key~~ or type annotation on the same node won’t be used for that ~~property key~~ or type annotation (although they may be used for others).
- EDIT: I crossed out property keys because they make it ambiguous whether blah: is a prefix declaration or a prefix with no suffix, and because namespaced property keys are (I think) unnecessary. I’m fine limiting prefixing to just node names and type annotations, where there is no ambiguity.

Here is what that would look like:

"/resource" dc:="http://purl.org/dc/terms/" xsd:="http://www.w3.org/2001/XMLSchema#" {
  dc:creator (irl-reference)"/author"
  dc:created (xsd:date)"1971-12-31"
}
"/author" ex:="http://example.example/ns/" {
  ex:created (irl-reference)"/resource"
}
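The rules and example above can be read as a post-processing pass over an already-parsed document, roughly like the Python sketch below. Everything in it is an assumption for illustration: Node, Arg and Prop are hypothetical stand-ins for whatever tree a KDL parser produces, node-level type annotations are omitted for brevity, and property keys are left unexpanded, as in the EDIT.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical stand-ins for a parsed KDL tree (not any real parser's API).
@dataclass
class Arg:
    value: object
    type: Optional[str] = None  # type annotation, e.g. "xsd:date"


@dataclass
class Prop:
    key: str
    value: object
    type: Optional[str] = None


@dataclass
class Node:
    name: str
    entries: list = field(default_factory=list)  # Args and Props, in source order
    children: list = field(default_factory=list)


def is_prefix_decl(entry) -> bool:
    """A property whose key ends with ':' declares a prefix."""
    return isinstance(entry, Prop) and entry.key.endswith(":")


def check_decl(entry: Prop) -> None:
    # Only one ':' is allowed, and the value must be an unannotated string.
    if entry.key.count(":") != 1:
        raise ValueError(f"non-final ':' in prefix declaration {entry.key!r}")
    if not isinstance(entry.value, str) or entry.type is not None:
        raise ValueError(f"prefix {entry.key!r} must map to a plain string")


def expand_name(name: str, scope: dict) -> str:
    """Expand at the first ':' if that prefix is in scope; otherwise keep it literal."""
    prefix, sep, rest = name.partition(":")
    if sep and prefix + ":" in scope:
        return scope[prefix + ":"] + rest
    return name


def expand_tree(node: Node, inherited: Optional[dict] = None) -> Node:
    inherited = dict(inherited or {})
    # Every declaration on the node applies to its own name and to its children.
    full = dict(inherited)
    for e in node.entries:
        if is_prefix_decl(e):
            check_decl(e)
            full[e.key] = e.value
    # Type annotations only see declarations that precede them on this node.
    running = dict(inherited)
    entries = []
    for e in node.entries:
        if is_prefix_decl(e):
            running[e.key] = e.value
            entries.append(e)
            continue
        new_type = expand_name(e.type, running) if e.type else None
        if isinstance(e, Arg):
            entries.append(Arg(e.value, new_type))
        else:  # property keys are deliberately not expanded
            entries.append(Prop(e.key, e.value, new_type))
    children = [expand_tree(c, full) for c in node.children]
    return Node(expand_name(node.name, full), entries, children)


# The "/resource" node from the example above.
doc = Node(
    "/resource",
    [Prop("dc:", "http://purl.org/dc/terms/"),
     Prop("xsd:", "http://www.w3.org/2001/XMLSchema#")],
    [Node("dc:creator", [Arg("/author", "irl-reference")]),
     Node("dc:created", [Arg("1971-12-31", "xsd:date")])],
)
out = expand_tree(doc)
print(out.children[1].name)             # http://purl.org/dc/terms/created
print(out.children[1].entries[0].type)  # http://www.w3.org/2001/XMLSchema#date
```

Because the pass runs on an already-parsed tree, a KDL 1.0 parser that knows nothing about prefixes can still read the document; only the expansion step is new.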

Note that attributes ending in : are invalid in (namespace‐aware) XML, so prefix definitions are not possible and identifiers with : will continue to be treated literally in XiK. This allows the existing XML namespacing solutions to continue working with no changes.

larsgw (Maintainer) · 2022-09-05T21:18:00Z

I would prefer making this a guideline for a separate format within KDL (like XiK). Different use cases of KDL, e.g. CSS selectors or Rust module paths, might want to use : for a different purpose, and with different rules.

tabatkins (Maintainer) · 2022-09-06T15:07:58Z

Also of note, using URLs (or any of their equivalents with similar initialisms) as a unique token is incredibly bad. They encode a lot more information/syntax into the token than is required for uniqueness; they imply resolvability (which can be very bad for highly-used namespaces; see the lengths the W3C has to go thru to deal with the bandwidth of badly-written XML tools trying to resolve their namespace URLs); people naturally assume they're manipulable in the same ways that URLs are (http vs https, www. vs no, trailing slash versus no, trailing hash versus no, etc).

The sole benefit of using URLs is that there's already a reasonable mechanism for "claiming" them (the URL registrars), tho even that's just a meta-benefit; it doesn't mean anything at the actual namespace level (and there are namespace domains that host a bunch of namespaces, so people don't have to bother with maintaining a domain registration themselves).

Java practice, which is still somewhat tied to URLs for historical reasons, strips all those issues away by just using reverse-domain (com.foo.MyClass); most other namespacing mechanisms don't involve URLs at all and just employ some central registry for claiming tokens, or simply rely on people rarely clashing even when there's no registration process at all.

XiK allows for XML namespace usage using the exact same syntax as XML itself, but I'd be strongly against trying to saddle KDL itself with a similar generic mechanism.

zkat (Maintainer) · 2022-09-07T00:11:37Z

This has been on my mind all day. For some context/history: SDLang, the language that KDL is based on, does support namespaces, and I very intentionally removed namespace support because I considered it to complicate the language too much. My own personal agenda here is for KDL to continue being an easy to learn, implement, and use language, and I believe having namespaces at its core goes against the grain on this front. I'll also point to various other languages that have been used very successfully in many scenarios without requiring explicit namespace support.

Furthermore, giving special meaning to : is very tricky to make work well, and it falls apart very quickly when you realize that KDL identifiers are, in the end, simply strings. "Plain" identifiers are just a syntactic convenience: when you happen to be within the bounds of "legal" characters, you can write the identifiers out without quotes. That's it.

What would it mean for identifiers with namespaces? Is "foo:bar" a namespaced identifier? Or should it be "foo":"bar" if you want to namespace string identifiers? Serializers would now have to also take extra care when auto-formatting/serializing data to make sure namespaced identifiers get properly quoted in the right place.

I started a discussion over at https://twitter.com/zkat__/status/1567194570278187010 and there were some great points made in favor of XML namespaces, particularly for stuff like XSLT or the metadata use-case you mentioned.

At the same time, there exist such things as JSON-LD for data linking stuff, and that makes me wonder if the right place for a "metadata" abstraction is an additional layer on top of KDL, not a namespace system that all implementers are required to get right, but few users will ever use.

tabatkins (Maintainer) · 2022-09-07T20:21:11Z

I started a discussion over at twitter.com/zkat__/status/1567194570278187010 and there were some great points made in favor of XML namespaces, particularly for stuff like XSLT or the metadata use-case you mentioned.

Note that, as XiK says, if you're embedding XML in KDL the existing syntax is just fine; you can't use : in XML node names already, so a XiK node named foo:bar is unambiguously a <bar> element using a foo namespace.

For languages designed to be widely applicable in a generic fashion, like metadata, uniquifying the names themselves is, in practice, more than enough. If everyone in the world writes foaf:name, then just declaring the node names to be foaf-name is more than fine; you get the exact same avoidance of collisions without the extra effort of writing a correct xmlns:foaf attribute. Collisions of common prefixes are rare in practice. (And note that, in practice, metadata-consuming tools tend to rely on this; people leave off xmlns declarations or write the wrong namespace URL all the time, so if you actually want to reliably consume these common metadata formats from the web (or any other context that isn't strictly linted) you can pretty much just rely on the common prefix name instead.)

There are still use-cases for having a generic, separate namespacing mechanism that is clearly distinct from the plain names, and which is managed by some type of central registry to prevent collisions. But they are, in practice, minuscule in comparison to the set of use-cases that work fine with just manually-uniqued names. And they can be handled in a fashion similar to json-ld, where a recognized meta-layer can communicate it.

djmattyg007 · 2022-09-16T22:46:22Z

Namespacing is probably the worst part of XML. If it became a core part of KDL I’d probably just go back to JSON5. No thank you.

2 replies

hughbris Sep 20, 2022

Namespacing is probably the worst part of XML.

If you are going to offer an opinion like this, please provide some explanation. This isn't useful by itself.

djmattyg007 Sep 20, 2022

Based on my experience with XML, it significantly complicated all code that needs to parse it and handle it. IMO it’s not worth that complication.

marrus-sh (Author) · 2022-09-19T22:19:43Z

RE @tabatkins to clear up some misconceptions regarding IRIs/URLs :—

using URLs (or any of their equivalents with similar initialisms) as a unique token is incredibly bad.

If it is so incredibly bad, why does every major metadata institution do it? (Examples: the Library of Congress, Wikidata, the British National Bibliography, Google, the fediverse, etc…) The thing you are criticizing is an internet best practice which has existed for decades and which has only become more important in recent years.

They encode a lot more information/syntax into the token than is required for uniqueness

urn:uuid: URIs, in fact, do not.
This additional information is incredibly important for humans.

they imply resolvability (which can be very bad for highly-used namespaces; see the lengths the W3C has to go thru to deal with the bandwidth of badly-written XML tools trying to resolve their namespaces URLs)

Examples of IRIs which do not imply resolvability include urn:, tag:, ark:… all of which are in active use.
Resolvability is often useful (the fediverse, for example, depends on identifiers being resolvable).

The sole benefit of using URLs is that there's already a reasonable mechanism for "claiming" them (the URL registrars)

Actually the sole benefit of using URLs is that it is a universally‐recognizable syntax with defined semantics, which is a very good benefit.

For languages designed to be widely applicable in a generic fashion, like metadata, uniquifying the names themselves is, in practice, more than enough.

This requires having additional documentation about what each (locally) unique name means in a global sense (for example, “in this document the prefix foaf- refers to those terms defined in the http://xmlns.com/foaf/0.1/ namespace”). This is in fact the same exact information encoded in a prefix declaration, except that the latter is readable by computers in addition to humans.

Assuming that all humans/computers reading the file already know what the foaf- prefix represents is folly. Resolvable prefixes are an established, elegant solution to this problem.

Responding to other points :—

RE: @zkat, wanting to keep things simple is why I focused on a simple string expansion/compacting mechanism rather than a more complicated data model like one would find in XML. I am sympathetic to this concern but I do want to emphasize that I think “prefixes let you shorten strings” is a very simple and approachable concept for both users and implementers.
I used : specifically because it is a backwards‐compatible identifier character (so documents would still be able to be processed by KDL 1.0 applications).
I initially imagined that foo:bar and "foo:bar" would be equivalent, because that seemed simplest. If you need foo: to literally expand to "foo:", but have previously defined it otherwise, you can always write foo:="foo:". It might be worthwhile to allow foo:=null to serve this purpose.
Remarks about “what if I want to use : for other purposes!” are not relevant to this proposal because it already proposes : being treated literally if no prefix has been defined. If you don’t want prefix semantics, you can just not define any prefixes.
As a historical note, XML did not define namespaces in the original specification. Namespaces were added in a separate “Namespaces in XML” specification, which XML implementations could then optionally support. Virtually everyone considers this separation to have been a mistake.
Ideally, whether an identifier was written in a prefixed or expanded form would be invisible to users. This can only be achieved via support in the core language.
Serialization issues can be easily resolved by always serializing KDL documents in a fully‐expanded form, although implementations may want to implement mechanisms for compacting KDL documents as well.
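Compaction is the reverse step. As a hedged Python sketch (the compact function and the prefix table are hypothetical): pick the longest declared namespace that the IRI starts with, and otherwise fall back to the fully-expanded form. One subtlety the sketch checks for is that a full IRI written literally could itself be mistaken for a prefixed name if its scheme (e.g. http) has been declared as a prefix.

```python
def compact(iri: str, prefixes: dict) -> str:
    """Compact an IRI to "prefix:suffix" using the longest matching namespace.

    `prefixes` maps prefix names (without the trailing ':') to namespace IRIs.
    Falls back to the full IRI when no namespace matches.
    """
    best = None
    for prefix, ns in prefixes.items():
        if ns and iri.startswith(ns) and (best is None or len(ns) > len(prefixes[best])):
            best = prefix
    if best is not None:
        return f"{best}:{iri[len(prefixes[best]):]}"
    # Writing the IRI out in full is only safe if it won't be re-expanded.
    head, sep, _ = iri.partition(":")
    if sep and head in prefixes:
        raise ValueError(f"{iri!r} would be misread as using prefix {head!r}")
    return iri


PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "ex": "http://example.example/",
    "exns": "http://example.example/ns/",
}
print(compact("http://purl.org/dc/terms/created", PREFIXES))   # dc:created
print(compact("http://example.example/ns/created", PREFIXES))  # exns:created
print(compact("urn:isbn:0451450523", PREFIXES))                # urn:isbn:0451450523
```

Expanding the result at its first : gives back the original IRI, since under the proposed rules a prefix name cannot itself contain a colon.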

1 reply

zkat Sep 19, 2022
Maintainer

I have yet to see what feels like a compelling argument as to why this can't simply be layered on top of vanilla KDL, and why namespaces need to be a core part of the language. I'll point again to JSON-LD, which has managed metadata linking without having to modify JSON itself.

tabatkins (Maintainer) · 2022-09-21T18:45:04Z

If it is so incredibly bad, why does every major metadata institution do it? (Examples: the Library of Congress, Wikidata, the British National Bibliography, Google, the fediverse, etc…) The thing you are criticizing is an internet best practice which has existed for decades and which has only become more important in recent years.

Because, for historical reasons, the big "metadata" crowd invested heavily in XML at one point in time, and that historical association has stuck for both inertia and personal (that is, literally the same people involved in many efforts) reasons. (Note that I've been involved in web standards for roughly 15 years now, so while I postdate the initial XML stuff, I've been around for a lot of discussions that are basically "ah, you want to express metadata? then you have to inherit all these xml-isms or it's terrible".)

A lot of it is that if you use tooling that hides the gory details of namespaces from you (so you can just work with prefixes), then it's no problem, because then it has roughly the same usability as all the non-URL-based namespacing mechanisms I mentioned. When you have a diversity of tooling and authors, tho, such that those details aren't hidden, it gets substantially worse.

urn:uuid: URIs, in fact, do not.

Sure, but those are (a) stripped down to precisely the amount needed for uniqueness and nothing else, making them unreadable for humans, and (b) in practice not used by any grammar intended for hand-authoring (due to (a)).

This additional information is incredibly important for humans.

Yup, meaningful names are important. It's the rest of the URL that's not meaningful, and comes with significant practical downsides.

Actually the sole benefit of using URLs is that it is a universally‐recognizable syntax with defined semantics, which is a very good benefit.

foo.bar or foo-bar is also universally-recognized (it's incredibly prevalent across virtually all methods of "namespacing" that have been invented), but doesn't contain a bunch of boilerplate cruft like URLs do, which don't meaningfully uniquify the identifier but do provide more ways to make mistakes.

I also strongly dispute the "defined semantics", since people use path segments, queries, and fragments in their NS urls, as far as I can tell, utterly at random. A URL on the web has reasonably well-defined semantics for its chunks, due to a combination of user expectations, common server configs, and browser behavior (for example, using a query segment triggers a request, but using a fragment doesn't), but a URL-ish string used in non-browser contexts can mean ~~whatever~~.

This requires having additional documentation about what each (locally) unique name means in a global sense (for example, “in this document the prefix foaf- refers to those terms defined in the http://xmlns.com/foaf/0.1/ namespace”). This is in fact the same exact information encoded in a prefix declaration, except that the latter is readable by computers in addition to humans.

Assuming that all humans/computers reading the file already know what the foaf- prefix represents is folly. Resolvable prefixes are an established, elegant solution to this problem.

And yet that is, in fact, precisely what FOAF processors consuming information from the web at large need to do. When hand-authoring, people regularly omit the namespace declaration, or provide the wrong one (either an entirely wrong URL for some reason, or a subtly wrong one due to errors in the URL that register as insignificant to our human eyes, like http vs https), but they very consistently use the prefix "foaf", because it's meaningful to them, readable, short enough that it's hard to make mistakes, and is used in every example they've ever seen. So robust consumers of FOAF graphs, in addition to doing standard NS processing, do non-standard processing based on the literal "foaf" prefix, or else they miss out on a lot of data. Same applies to any other NS-based metadata scheme used across a reasonable swath of the web (as opposed to processors that run on a locked-down community of files that can actually have linting processes applied to them).
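What such a “robust consumer” does can be sketched in Python as below. The function names and the exact set of tolerated URL variations are assumptions for illustration, not taken from any particular FOAF tool; the canonical namespace http://xmlns.com/foaf/0.1/ is the one quoted earlier in the thread.

```python
from typing import Optional

FOAF_NS = "http://xmlns.com/foaf/0.1/"


def loose(iri: str) -> str:
    """Drop differences that human authors tend to treat as insignificant."""
    s = iri.strip()
    for scheme in ("https://", "http://"):
        if s.startswith(scheme):
            s = s[len(scheme):]
            break
    if s.startswith("www."):
        s = s[len("www."):]
    return s.rstrip("/#")


def foaf_local_name(name: str, declared: dict) -> Optional[str]:
    """Return the FOAF term for "prefix:local", or None if it is not FOAF."""
    prefix, sep, local = name.partition(":")
    if not sep:
        return None
    ns = declared.get(prefix)
    # Standard processing, tolerating http/https, "www." and trailing '/' or '#'.
    if ns is not None and loose(ns) == loose(FOAF_NS):
        return local
    # Non-standard fallback: trust the conventional "foaf" prefix itself,
    # even when its declaration is missing or points somewhere else.
    if prefix == "foaf":
        return local
    return None


print(foaf_local_name("foaf:name", {}))                                  # name
print(foaf_local_name("f:name", {"f": "https://xmlns.com/foaf/0.1"}))    # name
print(foaf_local_name("dc:created", {"dc": "http://purl.org/dc/terms/"}))  # None
```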

All that said, you don't have to try and convince me; I've been working in web standards, as I said, for 15 years now, and I've formed my opinions thru long exposure. I find that XML Namespaces and their spiritual successors appeal enormously to certain people, for reasons I've never quite understood, and it's very difficult to convince them otherwise, but it's similarly difficult for them to make inroads back. There's a mindset disconnect that just can't be bridged very often.

larsgw (Maintainer) · 2022-09-21T19:11:09Z

I really like linked data and I think KDL would be well suited to encode it, but I don't see a major downside with implementing namespace handling on top of the KDL standard, like JiK and XiK.

1 reply

larsgw Sep 21, 2022
Maintainer

I had written down some more specific responses to some of the points made but it comes down to this.
