Fix regex in api_xrefs #728

etomzak · 2025-02-14T19:31:46Z

Based on line 72, I think the intent was always that - would be a valid character for api-xrefs IDs (even though it's not a valid character for C++ identifiers). Prior to this fix, API IDs with - characters (e.g., [api]#my-api# in asciidoc) were silently not getting the href treatment, because they were missed by the regex on line 117. I.e., [api]#my_api# would get a link in the HTML, [api]#my-api# would not, and no error was generated.

This took a while to debug.

Caveat: I know nothing about Ruby.

TApplencourt · 2025-02-14T19:51:03Z

Make sense \w == [a-zA-Z0-9_]; honestly, I think \S*? (aka everything that is not space) will work best.

gmlueck · 2025-02-27T22:12:55Z

My weak preference would be to change line 72 to this:

(document.attr 'api-xrefs').scan(/([\w:]*)=([\w:-]*)/) do |api, id|

I.e. removing the - from the regex, making it consistent with ApiSpan and ApiDefSpan. This makes sense to me because the "api name" is always a C++ identifier in our current specification. I tried doing this, and the SYCL spec still builds, so I think it would work fine for the SYCL repo.

I assume you're using the [apidef] stuff in your fork of our repo in some other way that does not use C++ identifiers as the "api name"? Can you explain your use case?

I'll be out for a few weeks, so others in the groups should decide what to do. I'm not super opposed to this PR. However, if we are going to expand the set of legal characters for an "api name", it seems like we should allow anything, not just add - to the list.

etomzak · 2025-02-28T10:42:24Z

Can you explain your use case?

Sure. The short version is that in SYCL SC, we're looking at creating some "tags" -- essentially glorified comments that can be placed anywhere in the spec. So the spec can say, "foo() conforms to snafu-1," and when you click on snafu-1, you go to the section explaining what snafu-1 means. So it's not exactly the same thing as an API definition, but it's similar enough that it makes sense to handle it with the same documentation facility (instead of, say, copy-pasing the api_xrefs script and creating a subtly different version).

The longer version for people not familiar with this corner of spec tooling (based on my best understanding today):

The new spec format uses asciidoc [api] and [apidef] attributes to create clickable links to specific APIs. For example, [api]#context::get_backend# produces a fixed-width formatted clickable link to [apidef]#context::get_backend# (where get_backend is defined). The problem is that there's no way in asciidoc to create the same style of link to a section. For example, it's not possible to create a clickable link context (i.e., [api]#context# in asciidoc) that takes you to the context class section (sec:context-class). This is a painful limitation, because there are many places where we'd like to talk about a SYCL class and get the class name to link to the relevant section.

api_xrefs is a post-processing script that gets applied to the HTML output (so it's not available in the PDF). api_xrefs creates additional clickable links based on a key-value map. [api]#context# can be mapped to the sec:context-class anchor, and api_xrefs will create the corresponding href in the HTML.

So while api_xrefs is primarily intended for C++ identifiers, I don't see any reason for limiting it to C++ identifiers. As I've discovered, - could be useful. Just allowing all characters also seems reasonable, because this would, for example, allow for link text with a wildcard (e.g., foo_*_bar). IMO the goal here should really be to limit the surprise factor for people editing the spec: "Why can I use _ in my link, but not +!?"

gmlueck · 2025-02-28T13:44:46Z

So the spec can say, "foo() conforms to snafu-1," and when you click on snafu-1, you go to the section explaining what snafu-1 means.

I see. There are similar cases in the SYCL spec, but my plan was to handle this case differently. In SYCL, there are places where we want an expression to be in fixed-width font, and the expression might have terms that correspond to different APIs. For example, something like foo::bar()+baz::boom(). In a case like this, we'd want foo::bar and baz::boom to each be separate links.

My plan was to add tooling that recognized Asciiidoc syntax like [apiexpr]#foo::bar()+baz::boom()#. The tooling would parse the string foo::bar()+baz::boom(), find the terms foo::bar and baz::boom, and make each of these links to their respective API definitions.

Expressions like this are not that common in the SYCL spec, so I haven't gotten around to writing this tooling, and it will probably remain low priority for a while. However, that is the direction I was thinking. If your project needs something like this sooner, you could consider writing this tooling, and contributing it to the "api_xrefs" extension.

With this background, you can see why I'm hesitant to extend the regex the way this PR proposes. Once we add tooling for [apixref], we can't recognize - as part of the API name. Instead, - would be an operator that separates terms in an expression.

That said, I'm not totally opposed to the change in this PR as a temporary thing. However, you should expect that we will undo the change later when we implement [apixref].

etomzak · 2025-02-28T15:09:47Z

For example, something like foo::bar()+baz::boom(). In a case like this, we'd want foo::bar and baz::boom to each be separate links.

Hmm, yes, I see. That does seem like a more important use case than being able to use - in snafu-1. (Though it also implies implementing a rudimentary C++ parser in Ruby...) There are other ways to make snafu-1 work. I'll have to have a think about the options for SYCL SC.

I wasn't going to bring this up at first, but since we're talking about future directions, I'll just mention it: I have minor reservations about post-processing the HTML spec in general, because it raises the question of whether the PDF and HTML specs have parity and are equal sources of truth. I've thought of the api_xrefs script as just a user-friendly convenience that happens to be impossible in vanilla asciidoc --- if it makes the HTML reader's life easier without making the PDF reader's life harder, then might as well do it. The risk is that cross-referencing is a way of creating "words of power" (explicitly giving a word a specific technical meaning). If the spec text is rewritten with only the HTML target in mind, over time the PDF target could lose clarity (because what reads like a specific term in HTML might read like a generic term in the PDF). I don't think there's a risk of that with the current [api] -> [apidef] linking, and I don't imagine it being an issue with the proposed [apixref] as you've described, but there might be a need for a discussion on whether the PDF target is relevant, and if it is, how it can be maintained.

api_xrefs missing regex hyphen

1a7aef4

etomzak added the editorial Some purely editorial problem label Feb 14, 2025

TApplencourt approved these changes Feb 14, 2025

View reviewed changes

TApplencourt closed this Feb 27, 2025

TApplencourt reopened this Feb 27, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix regex in api_xrefs #728

Fix regex in api_xrefs #728

etomzak commented Feb 14, 2025

TApplencourt commented Feb 14, 2025

gmlueck commented Feb 27, 2025

etomzak commented Feb 28, 2025

gmlueck commented Feb 28, 2025

etomzak commented Feb 28, 2025

Fix regex in api_xrefs #728

Are you sure you want to change the base?

Fix regex in api_xrefs #728

Conversation

etomzak commented Feb 14, 2025

TApplencourt commented Feb 14, 2025

gmlueck commented Feb 27, 2025

etomzak commented Feb 28, 2025

gmlueck commented Feb 28, 2025

etomzak commented Feb 28, 2025