Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Probable concurrency issue with updateHackageIndex #111

Open
int-e opened this issue Oct 4, 2023 · 0 comments
Open

Probable concurrency issue with updateHackageIndex #111

int-e opened this issue Oct 4, 2023 · 0 comments

Comments

@int-e
Copy link

int-e commented Oct 4, 2023

I saw this transient (hadrian+)stack build failure on IRC, https://gitlab.haskell.org/ghc/ghc/-/jobs/1682718

As far as I can make out, the key events are

[...]
hadrian/build-stack --version
[which invokes `stack build --no-library-profiling` in ghc's `hadrian` subdirectory]
[...]
Error: [S-922]
No cryptographic hash found for Hackage package data-array-byte-0.1.0.1, updating
[...]
Error: [S-7282]
       Stack failed to execute the build plan.
       
       While executing the build plan, Stack encountered the error:
       
       Error: [S-922]
       No cryptographic hash found for Hackage package splitmix-0.1.0.4

Note that the first S-922 error mentions a different package than the final error, and there was no corresponding "..., updating" message for splitmix-0.1.0.4. That looks like a race.

A bit of analysis supports this idea:

The messages come from here

          updated <- updateHackageIndex $ Just $ display exc <> ", updating"
          mpair2 <-
            case updated of
              UpdateOccurred -> withStorage $ loadHackageTarballInfo name ver
              NoUpdateOccurred -> pure Nothing
          case mpair2 of
            Nothing -> throwIO exc
            Just pair2 -> pure pair2

So when the corresponding package information is not found, the hackage index is updated, and if an update actually took place, the operation is retried.

Inside updateHackageIndex we find this code

  gateUpdate inner = do
    pc <- view pantryConfigL
    join $ modifyMVar (pcUpdateRef pc) $ \toUpdate -> pure $
      if toUpdate
        then (False, UpdateOccurred <$ inner)
        else (False, pure NoUpdateOccurred)

which uses an MVar to ensure that the index isn't updated multiple times. However, if two threads find a stale index concurrently, only one of them will be notified that an update occurred; the other one will be stuck with whatever stale information was in the index.

Fixing this looks tricky, because the current updateHackageIndex interface fundamentally cannot reliably detect whether another update has just taken place, even if it is changed to wait for pending updates of the index to complete (that part looks easy to do by moving the inner call inside the modifyMVar, but I don't know how pcUpdateRef is used elsewhere). So all callers need to be investigated, I believe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant