Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The new Uri type in 0.96.0 is hard to use and have almost no interoperability #284

Open
oxalica opened this issue May 20, 2024 · 15 comments

Comments

@oxalica
Copy link
Contributor

oxalica commented May 20, 2024

In #282, we changed the default URI type from url::Url to fluent_uri::Uri. Despite fluent_uri is several order of magnitude less used (and maybe less tested?) than url, it is also hard to use in the context of LSP servers and clients.

  1. We lacks the conversion method between lsp_types::Uri and fluent_uri::Uri (and also the popular url::Url), making lsp_types::Uri a standalone type which can only be constructed by parsing a literal string through FromStr, which is costly and suffers from escaping traps (also see (2)).
  2. Most of LSP URIs are file:// paths, plus untitled: dangling files. url::Url has convenient Url::to_file_path and Url::from_file_path for error-resistant conversions. But fluent_uri does not provide any relevant methods, and we can only serializing and parsing the path again. Non-UTF-8 and other URL-special characters (#, ? and etc) should also be carefully handled by the user.
  3. I'm not convinced by the reason in Consider not using url crate for directory specifiers? #261 for this big change. It complains that .join need some extra steps to work properly, but the crate migration makes it worse: fluent_uri does not provide join at all. The same use-case in that issue will now need to serialize, concatenate and parse again, plus all escaping work mentioned in (2).

I suggest to either:

  1. Revert back to Url, warn user that joining directory URLs should take extra cares. There are only few places that URLs refer to directories rather than files.
  2. Change them to String, fully transferring the URL handling logic to downstream users.

Either of these approaches would provide better usability and interoperability of URLs from and into LSP protocol uses.

@archseer
Copy link

Agree, this is a major regression for Helix. We're going to stick to the older package version for the time being.

@tooltitude-support
Copy link
Contributor

Are there any updates on this? It's understandable that the gluon lang is the main motivation for the existence of this crate. The change broke compatibility and for us it's hard to quickly fix it.

P.S. Using String IMO is the best solution.

@Myriad-Dreamin
Copy link

I encountered subtle issue when playing with url::Url, so I think url::Url is also not good enough for all cases.

To ensure both extensibility and type safety, Perhaps a struct Uri(String) can be provided. It might be look like:

#[derive(Serialize, Deseriaize)]
pub struct Uri(String);

#[cfg(feature = "rust-url")]
impl From<Uri> for url::Url { todo!() }

@Marwes
Copy link
Member

Marwes commented Jun 4, 2024

Yes, because url::Url does the wrong thing in some cases we did need to replace it with something. I haven't seen any better candidates than the fluent_uri crate so that really only leaves that crate or just using a plain String type as an alternative. Since fluent_uri can be easily converted to and from String (as_str/FromStr) it seemed best to have at least some helper methods and validations by using that crate.

To ensure both extensibility and type safety, Perhaps a struct Uri(String) can be provided. It might be look like:

Would that really be better than the current situation? There would still not be any of the helper methods that the old type provided, and the type would still be different than before so it would still be a breaking change?

@Marwes
Copy link
Member

Marwes commented Jun 4, 2024

We lacks the conversion method between lsp_types::Uri and fluent_uri::Uri (and also the popular url::Url), making lsp_types::Uri a standalone type which can only be constructed by parsing a literal string through FromStr, which is costly and suffers from escaping traps (also see (2)).

The newtype is unfortunate but it can hopefully be removed with updates to fluent_url eventually. If there are missing conversions please file a PR.

Most of LSP URIs are file:// paths, plus untitled: dangling files. url::Url has convenient Url::to_file_path and Url::from_file_path for error-resistant conversions. But fluent_uri does not provide any relevant methods, and we can only serializing and parsing the path again. Non-UTF-8 and other URL-special characters (#, ? and etc) should also be carefully handled by the user.

Most is not enough unfortunately, url::Url has incompatibilities with the uris used in the LSP spec so after people found and reported it, it had to go. It is unfortunate that some methods you were relying on were lost in that process but those can at least be restored, whereas the incompatibilities can't be fixed in the url::Url type.

I'm not convinced by the reason in
#261 for this big change. It complains that .join need some extra steps to work properly, but the crate migration makes it worse: fluent_uri does not provide join at all. The same use-case in that issue will now need to serialize, concatenate and parse again, plus all escaping work mentioned in (2).

It is not just inconvenicances, the url crate adds extra trailing slashes in some cases e3d0ed2#diff-b1a35a68f14e696205874893c07fd24fdb88882b47c23cc0e0c80a30c7d53759L2820-L2827 which causes changes in behaviour in the LSP.

Revert back to Url, warn user that joining directory URLs should take extra cares. There are only few places that URLs refer to directories rather than files.

Not possible, url::Url does not match the LSP spec.

Change them to String, fully transferring the URL handling logic to downstream users.

That seems overall worse to me. With plain strings we get no validations and no convenience methods at all. The current type at least gets us validation and some convenience methods and if the conveniences are not enough we are at least able to add those (either to this crate, or fluent_uri`) to build to something better. If that is not enough at this immediate moment you can always convert to/from strings anyway and do your own handling.

@oxalica
Copy link
Contributor Author

oxalica commented Jun 4, 2024

Yes, because url::Url does the wrong thing

wrong thing in some cases we did need to replace it with something

url::Url has incompatibilities with the uris used in the LSP spec

Could you elaborate so we can check them one-by-one? I assume you are just referring to #261 which I'll explain below. I'm using url for a long time and it's fairly standard-compliant to me, if not more than fluent_uri.

Turns out there's a divergence of what's "wrong". According to the RFC3986 section 5.2.3 mentioned in that same issue here:

return a string consisting of the reference's path component
appended to all but the last segment of the base URI's path (i.e.,
excluding any characters after the right-most "/" in the base URI
path
, or excluding the entire base URI path if it does not contain
any "/" characters).

You can also confirm the "correct" behavior from other language's URI/URL implementation, eg. in nodejs 20.12.2:

> new URL('foo', new URL('file:///no/trailing/slash')).toString()
"file:///no/trailing/foo"
> new URL('foo', new URL('file:///no/trailing/slash/')).toString()
"file:///no/trailing/slash/foo" 

Though it's kinda unfortunately, it seems manually adding a trailing slash is necessary. In the LSP Spec's recommended implementation vscode-uri, it has a Utils.joinPath which converting into POSIX path, joining them and converting it back to URI (just like Url::to_file_path, Path::join then Url::from_file_path), instead of using the native URI join. I think they were also aware of this problem and chose not to change the language provided URI-join implementation.

To ensure both extensibility and type safety, Perhaps a struct Uri(String) can be provided. It might be look like:

Would that really be better than the current situation?

To me, letting user parse it using whatever crate they like directly, is better than parsing to an predetermined type (which some users don't like), serializing to string, and parsing to another type again. The current solution is only better if most users can directly use the fluent_uri type, which turns out to be not the case, based on the number of 👍 on this issue.

so after people found and reported it, it had to go.

Note that users like me who don't encounter any issue previously, of course, do not report issues. I guess there's a bias here. We only report after the change, like this one.

It is not just inconvenicances, the url crate adds extra trailing slashes in some cases e3d0ed2#diff-b1a35a68f14e696205874893c07fd24fdb88882b47c23cc0e0c80a30c7d53759L2820-L2827 which causes changes in behaviour in the LSP.
Url::parse("file://test").unwrap()

That's because you missed one slash after the scheme. LSP Spec already mention the format explicitly in here:

  foo://example.com:8042/over/there?name=ferret#nose
  \_/   \______________/\_________/ \_________/ \__/
   |           |            |            |        |
scheme     authority       path        query   fragment

file:///c:/project/readme.md
file:///C%3A/project/readme.md

For local machine (empty authority), it should be file:///path. Three slashes.

println!("{}", url::Url::parse("file:///test").unwrap());
> file:///test

Not possible, url::Url does not match the LSP spec.

As I explained above, it seems to match it (and RFC3986) better than fluent_uri.

With plain strings we get no validations and no convenience methods at all.

Validation is a good point, though I don't think it's such important considering we have to trust the peer (either server or client) anyway. Also I don't see convenience methods on fluent_uri either. The only useful method is path() but you also have to manually unescape it.

// An example URI in the LSP Spec, see the link above.
let uri = fluent_uri::Uri::parse("file:///C%3A/project/readme.md").unwrap();
println!("{}", uri.path());
> /C%3A/project/readme.md

It does not look convenient to me.

I agree fluent_uri is a good crate for fast parsing and component extraction, but it's hard to use when you want to construct, store, and access regularly as a generic container.

@tooltitude-support
Copy link
Contributor

@Marwes May be it's possible to parameterize the module with the URL type. I.e. have a feature flag in the module. You could define a type alias and fill it depending on the the feature values.

It will at least allow to postpone migration to different url type for your users.

osiewicz added a commit to zed-industries/zed that referenced this issue Jun 18, 2024
This reverts URI changes made in
#12928 while keeping the perf
goodies in tact. We should keep an eye out for
gluon-lang/lsp-types#284
Fixes: #13135
Fixes: #13131
Release Notes:

- N/A
@Timmmm
Copy link

Timmmm commented Sep 3, 2024

Yeah I'm also finding this very painful. I don't have a high opinion of the url crate (I've barely even used it and I ran into an extremely trivial crash bug that still hasn't been fixed)... So I'm definitely sympathetic to using something better, but I would say there's no way fluent_uri is overall better yet. It's very immature, there are zero convenience methods, the version used by lsp-types doesn't have any way of mutating or constructing URIs, the API changed substantially in the newest version.

@lpil
Copy link

lpil commented Oct 7, 2024

Hello all! We have also been unable to upgrade in the Gleam compiler due to lost functionality. Thank you.

@ogoffart
Copy link

ogoffart commented Oct 7, 2024

(We're also holding up the update in Slint)

@mkaput
Copy link

mkaput commented Oct 8, 2024

Same in the LS for Cairo language.

@archseer
Copy link

archseer commented Oct 8, 2024

We've ended up forking the crate (https://github.com/helix-editor/helix/tree/master/helix-lsp-types) but we also plan to make some changes to the fork to better work with the editor internals.

@udoprog
Copy link
Contributor

udoprog commented Oct 27, 2024

Same here, holding back from upgrading since the new URI container does not seem to:

  • Allow for modifying the path of the URI.
  • Have a to_file_path replacement.

@Decodetalkers
Copy link

It is a big project for changing pathbuf to uri. Returning uri to pathbuf is also big effort.. I admire the author of fluent_uri, but when considering a lot of edge conditions...

@alesbrelih
Copy link

Same here, can't update https://github.com/alesbrelih/gitlab-ci-ls

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests