RFC: Discover schedule.xml URL/location #336

Open
rixx opened this issue Mar 7, 2024 · 8 comments

@rixx

rixx commented Mar 7, 2024

Summary

Make the location of an event's schedule.xml file / Giggity data source discoverable by finding a <link> tag in websites supplied by users.

Background

Both for events I run and for events I just provide support for (as maintainer of pretalx)¹, I frequently get questions about how to add an event to Giggity. pretalx at least links to the XML file openly – though not mentioning Giggity by name –, whereas with other events, it's even harder to figure out how to find the data, as documented in your README.

Some of the time, people have already tried putting the event's schedule page into Giggity, only for that not to work. Of course, not all users try this, and not all schedule pages are actually served by CfP/scheduling systems (often, they are static exports or custom representations), but from my (obviously biased) perspective, it's a sizeable amount.

Proposal

Define a <link rel="alternate"> tag that events and tools can include in their websites in order to link to their Giggity-compatible data source.

Implementation

The well-defined way for websites to link to other data formats is the rel=alternate attribute on <link> tags. The href attribute contains the actual URL, and the type attribute specifies the MIME type:

<link rel="alternate" type="application/xml" href="https://example.org/schedule.xml" title="Schedule data">

Giggity doesn't have a registered MIME type. Given the proliferation of application-specific MIME types (the official list includes more than 400 types with a +xml suffix), I think it would be permissible to call the type something like application/giggity+xml or application/schedule+xml. I don't really see a need for this, though: application/xml is correct, even if other similar links use more specific types, such as application/rss+xml.

The application following this link should expect the link to redirect, and follow redirects.

Link format

From Giggity's point of view, it would be ideal if providing this link also allowed event organisers to include Giggity-style metadata, like the JSON you can pass to ggt.gaa.st links. I don't think putting ggt.gaa.st links in this proposed <link> tag would be a good idea though, as we do want to link to the actual schedule data in a fairly tool-agnostic way, so the link should always lead to an XML document. Instead, I think adding the same metadata as a fragment to the supplied link should work, allowing non-Giggity tools to still use the link, while Giggity could parse the additional data:

<link rel="alternate" type="application/xml" href="https://example.org/schedule.xml#json=<base64 as produced by ggt.sh>" title="Schedule data">
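
A rough sketch of how a client might split such a link into the plain schedule URL and the optional metadata. The function name is hypothetical, and the encoding is an assumption (URL-safe base64 of UTF-8 JSON, no compression); whatever ggt.sh actually produces may differ and would need to be matched.

```python
import base64
import json
from urllib.parse import urldefrag


def split_schedule_href(href):
    """Return (url_without_fragment, metadata_or_None) for a discovered href."""
    url, fragment = urldefrag(href)
    blob = None
    # Look for a json=... entry in the fragment without URL-decoding it.
    for part in fragment.split("&"):
        key, _, value = part.partition("=")
        if key == "json" and value:
            blob = value
    if blob is None:
        return url, None
    # Assumption: URL-safe base64 of UTF-8 JSON; restore any stripped padding.
    padded = blob + "=" * (-len(blob) % 4)
    try:
        meta = json.loads(base64.urlsafe_b64decode(padded).decode("utf-8"))
    except ValueError:
        # Covers bad base64, bad UTF-8 and bad JSON; fall back to the bare URL.
        meta = None
    return url, meta


example_meta = {"title": "Example Conf"}
example_blob = base64.urlsafe_b64encode(
    json.dumps(example_meta).encode("utf-8")
).decode("ascii")
example_href = "https://example.org/schedule.xml#json=" + example_blob
print(split_schedule_href(example_href))
```

Tools that don't know about the fragment can simply ignore it, since fragments are never sent to the server.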

Impact on Giggity

Giggity would need to fully parse websites for this – <link> tags are well-defined and simple to parse, but websites often contain weirdness and errors, so you'd need a fault-tolerant parser that will still give you a functioning link tag, even when there are e.g. unclosed tags further down the document. I'm not sure if Giggity already does HTML parsing or uses a dependency that provides this (I know you do some caching of websites, but not sure if you inspect them along the way).
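
To illustrate what "fault-tolerant" could look like, here is a minimal Python sketch (not Giggity code, which is Java) using the standard library's lenient HTML parser. The class and function names are made up for this example, and the accepted type is assumed to be application/xml as proposed above.

```python
from html.parser import HTMLParser

# Assumption: the MIME types a client would accept for discovery.
ACCEPTED_TYPES = {"application/xml"}


class ScheduleLinkFinder(HTMLParser):
    """Collects href values of <link rel="alternate"> tags with an accepted type."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        link_type = attr.get("type", "").lower()
        if "alternate" in rels and link_type in ACCEPTED_TYPES and attr.get("href"):
            self.hrefs.append(attr["href"])


def find_schedule_links(html):
    finder = ScheduleLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.hrefs


# Deliberately broken markup: unclosed <p> and <b>, missing </html>.
messy = """<html><head>
<link rel="stylesheet" href="style.css">
<link rel="alternate" type="application/xml" href="https://example.org/schedule.xml" title="Schedule data">
</head><body><div><p>unclosed paragraph<b>and bold
</body>"""
print(find_schedule_links(messy))
```

The parser never builds a tree, so unclosed tags further down the page don't prevent the <link> from being found.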

For positive impact, Giggity would be even easier to recommend to organisers looking for an accompanying phone app for their event, and users would need less insider knowledge (the tool used by the conference, or the fact that "frab-compatible XML" is the URL they need) to add an event to Giggity.

Impact on events

Organisers would be able to add this link (optionally including a menu structure and metadata!) on their websites, allowing them to determine – and change – how their event looks in Giggity. They would also hopefully have fewer support cases.


¹ Also, just gotta say, as maintainer of pretalx, I love that giggity exists and recommend it to events all the time. Thank you for making and maintaining it.

@Wilm0r
Owner

Wilm0r commented Mar 9, 2024

Woo, thank you for the very detailed proposal! Yes, this is an interesting idea. Let me try to summarise my thoughts..

First: Giggity picked up Markwon a few versions ago and I think it could help here. https://noties.io/Markwon/docs/v4/html/ Though I wonder whether just feeding HTML to a streaming XML parser would also work? No need for it to build a DOM tree, a streaming XML parser is probably very unpicky as well, I'd expect?

For the MIME-type .... So theoretically maybe some day the standard format could become JSON-based? You'll know that better than me obviously :) Just searching for a <link type=application/xml> and then checking whether the href= points at something we can parse is a way, but I'd love something more exact, either indeed by claiming a MIME-type for this (application/giggity+(xml|json|yaml(ugh, please no BTW))?), or by using another tag (title= is available? Though I guess that theoretically needs to be localised..).

Nit: It'd probably be cool if the link could be included not just on the schedule page of the conference but also the frontpage or perhaps just ~all of them? Since the user is going to have to manually enter/copypaste a URL, it'd be cool if it could be as simple as typing fosdem.org. But if the link is on more pages, would rel=alternate still be accurate?
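
A small sketch of the "just type fosdem.org" flow, in Python for brevity. Both helper names are hypothetical, and defaulting to HTTPS when no scheme is given is an assumption; the point is only that a bare hostname can be turned into a page URL and that a relative href on that page can be resolved against it.

```python
from urllib.parse import urljoin


def normalise_user_input(text):
    """Turn what the user typed into a fetchable URL."""
    text = text.strip()
    if "://" not in text:
        # Assumption: default to HTTPS when the user omits the scheme.
        text = "https://" + text
    return text


def resolve_href(page_url, href):
    """Resolve a (possibly relative) <link href> against the page it was found on."""
    return urljoin(page_url, href)


page = normalise_user_input("  example.org ")
print(page)
print(resolve_href(page, "/schedule.xml"))
print(resolve_href(page, "https://cdn.example.org/schedule.xml"))
```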

I do have deduplication on my mind a bit.. Very important that we don't end up with two entries in the chooser menu for example, which could happen if the conference later submits a menu.json entry. And there's the interesting case of FOSDEM who finally switched over to Pretalx this year AFAIK, but their published schedule XML file was not a straight Pretalx export but something they generated themselves that looked more closely like their old file format (roughly Pentabarf so still pretty similar, just without some of the (recently) added fields). Not sure what to do here, probably hard to really get this right. Merging may be feasible if IDs/GUIDs match, but still an odd feature.

I started reintroducing an id= tag in the menu.json file to at least support rare cases where schedule files get moved around, but that won't help here either probably.

> ¹ Also, just gotta say, as maintainer of pretalx, I love that giggity exists and recommend it to events all the time. Thank you for making and maintaining it.

Thank you and you're welcome! It's an interesting hobby when the daytime job is more SRE-like. I'm sad that the "each event their own app" model seems to have won, but I'll keep doing what I think is better. :)

And obviously thank you for Pretalx!

@rixx
Author

rixx commented Mar 9, 2024

> Theoretically maybe some day the standard format could become JSON-based?

The XML standard is maintained by c3voc and their validator, who also provide a JSON schema for frab's (and hence pretalx's etc) JSON export. They should include largely the same data, so you could have a go at that instead? I don't mind either way, pretalx implements both APIs, and if we include one <link> tag, we may as well (and probably would) include both.

> but I'd love something more exact, either indeed by claiming a MIME-type for this (application/giggity+(xml|json) or by using another tag (title= is available?)

Sure, that would work just as well. I think I have a soft preference for using application/frab+(xml|json), as the format is mostly known for (and maintained via) frab, and hardcoding Giggity there feels a bit exclusive, but tbh it doesn't really matter and would just become a de-facto standard either way (even without a formally registered mime type, which I don't think would be necessary).

> But if the link is on more pages, would rel=alternate still be accurate?

I think so, yes, or at least I had intended it that way. As long as the format is sufficiently specific, that'd be like providing the RSS feed link on all pages of a website, which is also commonly done.

(Re: deduplication: I think the whole menu.json thing is a secondary concern, as we can just as well start without it, and you can decide at any point to support it or not, imo! If conferences provide their own modified export, they could link to that instead, too, so hopefully there wouldn't be too much duplication going on.)

Going to tag @saerdnaer here, as he's involved with the C3VOC side / the schedule.xsd and JSON schema, and may have ideas or opinions.

@saerdnaer

I would prefer application/schedule+(xml|json) – in my opinion the format has become independent from frab/pentabarf... And as far as I understood, FOSDEM and CCC / Congress will both use pretalx – so it doesn't really make sense to add new schedule format features to the frab source code...

@Wilm0r
Owner

Wilm0r commented Mar 11, 2024

Sorry, yes obviously, generic has my preference too, typed that response too fast apparently. :)

I'd say that "schedule" is too generic though? :( That'd be the benefit of picking for example "frab": With most of us that's a pretty good descriptor for a file format describing conference schedules. Just "schedule" could still be anything else calendar'y.

For the duplication, you may have a good guess on how (un)likely my worries are? If folks build their own format like how FOSDEM did it, oh well oops... but hopefully Pretalx at least by default doesn't have 5 different canonical links pointing at the same XML content? Either way, yeah I'll think that one through later on.

@rixx
Author

rixx commented Mar 12, 2024

> I'd say that "schedule" is too generic though?

Yeah, +1 on that. Claiming a MIME type of "application/schedule+xml" would be too ballsy imo. "frab" has the advantage of being an established tool – as an alternative, "application/c3voc+xml" might work, as VOC are maintaining the format definition?

> CCC / Congress will both use pretalx

Huh, good to know. That's news to me.

@saerdnaer

I asked in the local hack space and got directed to application/vnd.c3voc.schedule+xml which I filed as a media-type registration request:

https://mailarchive.ietf.org/arch/msg/media-types/l_coKFlR20ZsQifB9GWeXIoRaA0/

A review is to be expected by March 29th.
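
With a dedicated type, a client could match on it instead of the generic application/xml. A minimal sketch in Python, assuming the type string from the registration request above; the helper name is hypothetical, and nothing here implies the registration has been approved.

```python
# The type named in the registration request; approval is not assumed here.
SCHEDULE_TYPE = "application/vnd.c3voc.schedule+xml"


def is_schedule_type(type_attr):
    """Check a <link type="..."> value against the requested media type."""
    # Media type names are case-insensitive; ignore any ";charset=..." parameters.
    essence = type_attr.split(";", 1)[0].strip().lower()
    return essence == SCHEDULE_TYPE


print(is_schedule_type("application/vnd.c3voc.schedule+xml"))
print(is_schedule_type("application/xml"))
```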

@Wilm0r
Owner

Wilm0r commented Mar 24, 2024

Terrific!

I'll go play with XML/HTML parsers a little bit.

@Wilm0r
Owner

Wilm0r commented Mar 25, 2024

OK, the standard XML parser already used by Schedule.java won't help I'm pretty sure. Even though it's possible to intercept non-fatal errors according to the SAX parser docs, my attempts at doing so aren't working, and in fact the error message suggests that it's secretly using this parser instead. o_O

So while it picked up the <link href="http://gaa.st/giggity" type="application/vnd.c3voc.schedule+xml" title="yo"> in my test file, it immediately chokes on the intentionally missing />:

03-24 21:34:59.938  6259  6304 W System.err: org.xmlpull.v1.XmlPullParserException: expected: /link read: head (position:END_TAG </head>@35:10 in java.io.BufferedReader@fe562c6) 

Don't see a way to make it less strict, validation is already turned off.

So let's try the Markwon HTML parser instead.
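
The same strict-versus-lenient difference is easy to reproduce outside Android. This is a Python sketch for illustration only, not the Java code from Schedule.java and not Markwon; the helper and class names are invented, and the test page mirrors the <link> tag quoted above with its closing /> left out.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

PAGE = """<html><head>
<link href="http://gaa.st/giggity" type="application/vnd.c3voc.schedule+xml" title="yo">
</head><body></body></html>"""


def strict_parse_ok(text):
    """Return True if a strict XML parser accepts the document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The unclosed <link> makes </head> a mismatched end tag.
        return False


class LinkCollector(HTMLParser):
    """Lenient HTML parsing: records the attributes of every <link> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            self.links.append(dict(attrs))


collector = LinkCollector()
collector.feed(PAGE)
collector.close()
print(strict_parse_ok(PAGE))
print(collector.links[0]["href"])
```

The XML parser rejects the page as soon as it reaches </head>, while the HTML parser treats <link> as a standalone tag and carries on.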
