Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't Glob if Glob Ain't Glob 2: The Globbening #10518

Merged
merged 4 commits into from
Dec 13, 2024

Conversation

parsonsmatt
Copy link
Collaborator

This PR is a less invasive variant of #10502 which moves the special treatment of glob patterns into runDirFileGlob directly. Locally, I'm observing a comparable performance improvement to that PR: #10502 takes about 150ms, and this one takes about 250ms.

Template Α: This PR modifies behaviour or interface

Include the following checklist in your PR:

  • Patches conform to the coding conventions.
  • Any changes that could be relevant to users have been recorded in the changelog.
  • The documentation has been updated, if necessary.
  • Manual QA notes have been included.
  • Tests have been added. (Ask for help if you don’t know how to write them! Ask for an exemption if tests are too complex for too little coverage!)

Template B: This PR does not modify behaviour or interface

E.g. the PR only touches documentation or tests, does refactorings, etc.

Include the following checklist in your PR:

  • Patches conform to the coding conventions.
  • Is this a PR that fixes CI? If so, it will need to be backported to older cabal release branches (ask maintainers for directions).

@parsonsmatt parsonsmatt marked this pull request as draft November 4, 2024 16:59
@parsonsmatt parsonsmatt mentioned this pull request Nov 4, 2024
5 tasks
@parsonsmatt parsonsmatt marked this pull request as ready for review November 4, 2024 17:37
@alt-romes alt-romes self-requested a review November 5, 2024 12:06
Copy link
Collaborator

@alt-romes alt-romes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

Minor points:

  • Please squash the "throwaway" commits into the base one
  • I wonder if we can try to be even smarter, or whether GlobFile [Literal filename] is the only relevant shortcut we can take

@alt-romes
Copy link
Collaborator

The error message in one of the glob tests seems to regress a bit:

-Warning: [glob-missing-dir] In 'extra-source-files': the pattern '/home/user/file' attempts to match files in the directory '/home/user', but there is no directory by that name.
+Warning: [no-glob-match] In 'extra-source-files': the pattern '/home/user/file' does not match any files.

My intuition is that go in the glob matching function would traverse glob piece by piece, so if /home/user is not a valid directory it gets reported as such before file being a non-existent file.

I think this is fine, since the speedup is likely more relevant than reporting non-existent directory root rather vs invalid filepath.

Do check whether the test itself is meant to test specifically for the behaviour which we're losing, so you can update it accordingly and say the behaviour is now something else (something something, "despite /home/user not existing, we report the error with the full path because of the early "fast" check..."), if that makes sense.

@parsonsmatt
Copy link
Collaborator Author

[glob-missing-dir] In 'extra-source-files': the pattern '/home/user/file' attempts to match files in the directory '/home/user', but there is no directory by that name.

I actually think this is a pretty confusing message: /home/user/file is not a glob pattern at all, it's a literal filepath. I can see there being some value to reporting "This file was not found, additionally, the entire directory isn't present," but I think we'd want to construct this error message after a file isn't found - otherwise we're going to be doing number of directories in segment redundant lookups on the common path (file exists, is a literal).

I'll review the tests, but I'm unsure what's actually failing. ./validate.sh passes for me locally.

@parsonsmatt
Copy link
Collaborator Author

I wonder if we can try to be even smarter, or whether GlobFile [Literal filename] is the only relevant shortcut we can take

There's a prior check that separates out the GlobDirectory [Literal segment] globs already, so if we get to a GlobFile [Literal filename], then the whole chain would have been GlobDirectory [Literal seg0] (GlobDirectory [Literal seg1] (... (GlobFile [Literal filename])...).

It is slightly faster to bypass the entire parsing step and joining of prefixes like that (as seen in #10502 ), but doing so in a type-safe way requires changing the API. I don't think the marginal performance impact there is worth breaking it.

Do check whether the test itself is meant to test specifically for the behaviour which we're losing, so you can update it accordingly and say the behaviour is now something else (something something, "despite /home/user not existing, we report the error with the full path because of the early "fast" check..."), if that makes sense.

I looked into the git blame and found this PR which introduced it: #8250 with related PR #8248. It seems like the intent of the test was just to capture the behavior so that the check rewrite would have identical behavior to to what was previously done. I don't think the test was intended to capture any particular error message.

@parsonsmatt
Copy link
Collaborator Author

RE squashing commits: I'm not great at git, any chance y'all can do a squash and merge on your side?

@ulysses4ever
Copy link
Collaborator

@parsonsmatt yes, squashing is easy on our side: we should make sure that the merge label we use is "squash+merge". Moreover, it's the author's job to pick the label, normally.

Copy link
Collaborator

@ulysses4ever ulysses4ever left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Cabal/src/Distribution/Simple/Glob.hs Show resolved Hide resolved
@alt-romes
Copy link
Collaborator

Very good @parsonsmatt. Ready to land.

@ulysses4ever ulysses4ever added the squash+merge me Tell Mergify Bot to squash-merge label Nov 7, 2024
@parsonsmatt
Copy link
Collaborator Author

@parsonsmatt yes, squashing is easy on our side: we should make sure that the merge label we use is "squash+merge". Moreover, it's the author's job to pick the label, normally.

Thanks! I'm unable to add labels to the PR though.

@mergify mergify bot added the ready and waiting Mergify is waiting out the cooldown period label Nov 7, 2024
@ulysses4ever
Copy link
Collaborator

@parsonsmatt I just added respective permissions to your account I hope. But I also applied the label to this PR.

@mergify mergify bot added the merge delay passed Applied (usually by Mergify) when PR approved and received no updates for 2 days label Nov 9, 2024
@alt-romes
Copy link
Collaborator

@ulysses4ever for some reason the PR is still not being merged, could you try to land it?

@ulysses4ever
Copy link
Collaborator

@alt-romes thanks for the ping! Yes, will look into it today.

@ulysses4ever
Copy link
Collaborator

@mergify refresh

Copy link
Contributor

mergify bot commented Dec 13, 2024

refresh

✅ Pull request refreshed

@ulysses4ever ulysses4ever force-pushed the mattp/skip-glob-filepaths-2 branch from 956fc0b to f8b1263 Compare December 13, 2024 12:02
Copy link
Contributor

mergify bot commented Dec 13, 2024

This pull request has been removed from the queue for the following reason: pull request dequeued.

Pull request #10518 has been dequeued. The pull request has been synchronized by a user.

You should look at the reason for the failure and decide if the pull request needs to be fixed or if you want to requeue it.

If you want to requeue this pull request, you need to post a comment with the text: @mergifyio requeue

@haskell haskell deleted a comment from mergify bot Dec 13, 2024
@ulysses4ever
Copy link
Collaborator

@mergify requeue

Copy link
Contributor

mergify bot commented Dec 13, 2024

requeue

☑️ This pull request is already queued

@haskell haskell deleted a comment from mergify bot Dec 13, 2024
@Mikolaj
Copy link
Member

Mikolaj commented Dec 13, 2024

@mergify help

Copy link
Contributor

mergify bot commented Dec 13, 2024

help

❌ Sorry but I didn't understand the command. Please consult the commands documentation 📚.

@Mikolaj
Copy link
Member

Mikolaj commented Dec 13, 2024

@mergify dequeue

Copy link
Contributor

mergify bot commented Dec 13, 2024

This pull request has been removed from the queue for the following reason: pull request dequeued.

Pull request #10518 has been dequeued by a dequeue command.

You should look at the reason for the failure and decide if the pull request needs to be fixed or if you want to requeue it.

If you want to requeue this pull request, you need to post a comment with the text: @mergifyio requeue

Copy link
Contributor

mergify bot commented Dec 13, 2024

dequeue

✅ The pull request has been removed from the queue squash-merge

@Mikolaj
Copy link
Member

Mikolaj commented Dec 13, 2024

@mergify refresh

Copy link
Contributor

mergify bot commented Dec 13, 2024

refresh

✅ Pull request refreshed

@Mikolaj
Copy link
Member

Mikolaj commented Dec 13, 2024

@mergify requeue

Copy link
Contributor

mergify bot commented Dec 13, 2024

requeue

☑️ This pull request is already queued

@mergify mergify bot merged commit 1586aaa into haskell:master Dec 13, 2024
49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge delay passed Applied (usually by Mergify) when PR approved and received no updates for 2 days ready and waiting Mergify is waiting out the cooldown period squash+merge me Tell Mergify Bot to squash-merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants