This repository has been archived by the owner on Jan 3, 2021. It is now read-only.

%-encoded versions of # and ? in IRIs in the original data don't work #24

Open
cygri opened this issue Oct 22, 2013 · 0 comments

That's because we %-encode the unencoded versions of these characters (# as %23, ? as %3F) when rewriting IRIs.

Original IRI in the data => IRI where Pubby makes that data accessible:

  • http://dataset-base/foo?bar#baz => http://pubby-base/foo%3Fbar%23baz
  • http://dataset-base/foo%3Fbar%23baz => http://pubby-base/foo%3Fbar%23baz

Requested IRI in the web application => IRI that Pubby looks for in the dataset:

  • http://pubby-base/foo%3Fbar%23baz => http://dataset-base/foo?bar#baz

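So, if the original IRI contains %23 or %3F, Pubby will not round-trip it correctly: the reverse mapping always produces the unencoded # or ?, so Pubby looks for an IRI that is not in the dataset.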
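
To make the collision concrete, here is a minimal Python sketch of the two mappings listed above. It is not Pubby's actual code (Pubby is a Java web application); the function names `to_pubby` and `to_dataset` are made up for illustration, and the base IRIs are the placeholder ones used in this issue.

```python
# Illustrative model of the rewriting described in this issue, not Pubby's code.
DATASET_BASE = "http://dataset-base/"
PUBBY_BASE = "http://pubby-base/"


def to_pubby(dataset_iri: str) -> str:
    """Dataset IRI -> Pubby IRI: %-encode literal '?' and '#' in the local part."""
    local = dataset_iri[len(DATASET_BASE):]
    return PUBBY_BASE + local.replace("?", "%3F").replace("#", "%23")


def to_dataset(pubby_iri: str) -> str:
    """Pubby IRI -> dataset IRI: decode %3F and %23 back to '?' and '#'."""
    local = pubby_iri[len(PUBBY_BASE):]
    return DATASET_BASE + local.replace("%3F", "?").replace("%23", "#")


plain = "http://dataset-base/foo?bar#baz"
encoded = "http://dataset-base/foo%3Fbar%23baz"

# Both dataset IRIs land on the same Pubby IRI ...
print(to_pubby(plain) == to_pubby(encoded))       # True

# ... so only one of them can survive the trip back.
print(to_dataset(to_pubby(plain)) == plain)       # True
print(to_dataset(to_pubby(encoded)) == encoded)   # False
```

The forward mapping is not injective, which is why no reverse mapping can recover both forms.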

One fix would be to also %-encode the percent sign as %25 (so %23 becomes %2523), but that isn't nice: it only works if we %-encode every percent sign in every original data IRI. Then %20 and other common %-sequences become really ugly (%20 turns into %2520). We want to keep Pubby's workings predictable and rewrite as little as possible, so this is bad.
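
For comparison, a rough Python sketch of the %25 approach, again with hypothetical helper names (`escape_local`, `unescape_local`) that are not part of Pubby. It does round-trip correctly, but it shows the cost: every existing %-sequence in the data gets rewritten.

```python
import re

# Hypothetical helpers for illustration only; they operate on the local part
# of an IRI (everything after the base).
_ENCODE = {ord("%"): "%25", ord("?"): "%3F", ord("#"): "%23"}
_DECODE = {"25": "%", "3F": "?", "23": "#"}


def escape_local(local: str) -> str:
    """Encode '%', '?' and '#' in one pass so nothing is encoded twice."""
    return local.translate(_ENCODE)


def unescape_local(local: str) -> str:
    """Decode only the three sequences that escape_local can produce."""
    return re.sub(r"%(25|3F|23)", lambda m: _DECODE[m.group(1)], local)


for original in ["foo?bar#baz", "foo%3Fbar%23baz", "a%20b"]:
    escaped = escape_local(original)
    print(original, "=>", escaped, "round-trips:", unescape_local(escaped) == original)
```

Under these assumptions the output includes `foo%3Fbar%23baz => foo%253Fbar%2523baz` and `a%20b => a%2520b`, which is exactly the ugliness described above.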

A better solution is perhaps to think hard about ways of not requiring the escaping of # and ? in the first place. The # is escaped because of its special role in IRIs (the part after the hash is not sent to the server when an HTTP request is made). The ? is, I believe, treated specially because of the ?output=xxx parameter we support, and perhaps because of uncertainty about whether it's still possible to get exactly the original IRI after the servlet container has chopped it into request parameters.
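
As a small illustration of why the two characters are special, the following Python sketch uses the standard `urllib.parse` module to show which parts of an IRI would reach the server. The `request_target` helper is made up for this example, and it only models generic HTTP client behaviour, not what a particular servlet container does.

```python
from urllib.parse import parse_qs, urlsplit


def request_target(iri: str) -> str:
    """Path plus query, i.e. what an HTTP client sends; the fragment is dropped."""
    parts = urlsplit(iri)
    return parts.path + ("?" + parts.query if parts.query else "")


# Unescaped: the fragment never reaches the server and 'bar' becomes a query string.
print(request_target("http://pubby-base/foo?bar#baz"))          # /foo?bar

# Escaped: the whole local part stays in the path.
print(request_target("http://pubby-base/foo%3Fbar%23baz"))      # /foo%3Fbar%23baz

# An unescaped '?' in the data would be ambiguous with the output parameter.
parts = urlsplit("http://pubby-base/foo%3Fbar?output=xxx")
print(parts.path, parse_qs(parts.query))    # /foo%3Fbar {'output': ['xxx']}
```

Any scheme that stops escaping # would have to cope with the first case, where the server simply never sees `#baz`.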
