Support i18n in URLs #5

Open
torhve opened this issue Jun 28, 2016 · 9 comments


@torhve

torhve commented Jun 28, 2016

Would love support for both IDN-encoded domains, such as http://øl.no/, and encoded paths and query args, like http://google.com/?q=æøå or http://google.com/å.

Relevant RFC:
https://www.ietf.org/rfc/rfc3987.txt
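
To make the request concrete, here is a minimal Python sketch of the IRI-to-URI mapping being asked for: the host is converted to its ASCII (Punycode) form and non-ASCII characters in the path, query and fragment are percent-encoded as UTF-8. The function name `iri_to_uri` is hypothetical, and the host step uses Python's built-in `idna` codec, which implements the older IDNA 2003 rules; it stands in for whatever IDNA processing a real implementation would choose and is not how lpeg_patterns works.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched when percent-encoding; "%" is kept so that
# already-encoded sequences are not encoded twice.
_PATH_SAFE = "/%:@!$&'()*+,;="
_QUERY_SAFE = "/?%:@!$&'()*+,;="


def iri_to_uri(iri: str) -> str:
    """Map an IRI to an ASCII-only URI (illustrative sketch, hypothetical name).

    Assumptions: the host is encoded with Python's built-in "idna" codec
    (IDNA 2003), and any userinfo part of the authority is dropped for brevity.
    """
    parts = urlsplit(iri)
    host = parts.hostname or ""
    # Encode each label of the host as Punycode ("xn--...") where needed.
    netloc = host.encode("idna").decode("ascii")
    if parts.port is not None:
        netloc += f":{parts.port}"
    # Percent-encode non-ASCII characters as UTF-8 bytes.
    path = quote(parts.path, safe=_PATH_SAFE)
    query = quote(parts.query, safe=_QUERY_SAFE)
    fragment = quote(parts.fragment, safe=_QUERY_SAFE)
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


if __name__ == "__main__":
    for example in ("http://øl.no/", "http://google.com/?q=æøå", "http://google.com/å"):
        print(example, "->", iri_to_uri(example))
```

With the three URLs from this comment, the host `øl.no` becomes `xn--l-4ga.no`, and `å` in a path becomes `%C3%A5`.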

@daurnimator
Owner

daurnimator commented Nov 15, 2016

Normalisation for domain names is hard.

Links

@daurnimator
Owner

Started work on a new module to provide the functionality required: https://github.com/daurnimator/lua-unistring

Though I don't know how I feel about adding a dependency for lpeg_patterns.

@daurnimator
Owner

Interesting discussion in https://tools.ietf.org/html/draft-ietf-iri-3987bis-13 (found via http://blog.jclark.com/2008/11/what-allowed-in-uri.html, thanks @jclark) about the 'ucschar' production

@bagder

bagder commented Nov 17, 2016

More URL problems are also detailed in: https://tools.ietf.org/html/draft-ruby-url-problem-01 and I blogged about a few a while ago: https://daniel.haxx.se/blog/2016/05/11/my-url-isnt-your-url/

There really is no good URL standard right now.

@rockdaboot

> Normalisation for domain names is hard.

libicu has TR46/UTS#46 support (transitional and non-transitional), but as you said (@daurnimator), your code has to work as a plugin on systems without libicu. I mention it just for the record: there is an 'easy' solution. libidn (=IDNA 2003) is obsolete and risky to use, and libidn2 currently lacks UTS#46.
Yesterday I found idnkit-2 which has UTS#46 as well (used on Dragonfly BSD).

@daurnimator
Owner

daurnimator commented Nov 21, 2016

I came up with this snippet that generates the IdnaMappingTable in pure Lua: https://gist.github.com/daurnimator/be276c5d32329e2a9250f4aabeea48a8

The generated file is 880K. However, loading it into memory seems to take up ~5.5M, which makes me think it's not a good solution.

@daurnimator
Owner

@rockdaboot do I recall you saying libidn2 had some fixes and is now a good solution?

@rockdaboot

Yes, libidn2 0.14 (in Debian unstable, maybe also already in testing) has TR46 support.
I condensed the mapping table so that it is now < 100k; the stripped libidn2 is now 179592 bytes. There is still room for improvement.

When using idn2_lookup_*, add either IDN2_TRANSITIONAL or IDN2_NONTRANSITIONAL to the flags to get TR46 transitional or TR46 non-transitional behavior.

Another good thing about TR46 is that you don't have to lowercase and/or NFC-normalise the input; the TR46 processing does this automatically.
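
To illustrate why that matters, here is a small Python sketch of the manual pre-processing a caller would otherwise do: lowercase the label, then apply Unicode NFC normalisation, so that different spellings of the same name compare equal. The helper name `naive_prepare_label` is hypothetical, and this is only those two steps done with the standard library; the real TR46 mapping goes further (it uses the full mapping table) and this code does not call libidn2.

```python
import unicodedata


def naive_prepare_label(label: str) -> str:
    """Lowercase a label and normalise it to NFC (hypothetical helper).

    This is only the lowercase + NFC step mentioned above, not the full
    TR46 mapping.
    """
    return unicodedata.normalize("NFC", label.lower())


if __name__ == "__main__":
    precomposed = "\u00e5l"   # "ål" written with the single code point U+00E5
    decomposed = "A\u030aL"   # "A" + combining ring above, upper case
    # The raw strings differ, but they agree after lowercasing and NFC.
    print(precomposed == decomposed)
    print(naive_prepare_label(precomposed) == naive_prepare_label(decomposed))
```

The two inputs differ both in case and in how the ring accent is encoded, which is exactly the kind of variation that has to be removed before domain names can be compared or converted.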

@daurnimator
Owner

Today I packaged libidn2 for Arch: https://aur.archlinux.org/packages/libidn2/
And wrote bindings for Lua: https://github.com/daurnimator/lua-idn2
