Timeout running job when updating NuGet packages after updater logic was re-written in C# #9375

Open
batkaevruslan opened this issue Mar 27, 2024 · 31 comments
Labels
L: dotnet:nuget NuGet packages via nuget or dotnet T: bug 🐞 Something isn't working

Comments

@batkaevruslan

batkaevruslan commented Mar 27, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Package ecosystem

NuGet

Package manager version

No response

Language version

No response

Manifest location and content before the Dependabot update

No response

dependabot.yml content

No response

Updated dependency

No response

What you expected to see, versus what you actually saw

Hi! Since the release of https://github.blog/changelog/2023-11-28-improvements-to-nuget-support-for-dependabot/, Dependabot has stopped working in our non-tiny monorepo. Before that, it had been working for more than half a year.

proxy | 2024/03/26 05:40:57 proxy starting, commit: cf8623577dad71c128f219df2b27df6de35b909d
...
updater | 2024/03/26 05:41:06 INFO <job_805408216> Starting job processing
...
updater |     file: Directory.Packages.props, metadata: 
updater |   name: Microsoft.Build.Traversal, version: 4.1.0
updater |     file: Build.IntegrationTests.proj, metadata: 
updater | 2024/03/26 05:48:58 INFO <job_805408216> Checking if prometheus-net 8.2.1 needs updating
updater | 2024/03/26 05:48:59 INFO <job_805408216> Filtered out 128 pre-release versions
  proxy | 2024/03/26 05:48:59 [227] GET https://nuget.***.dev:443/nuget/Packages(Id='prometheus-net',Version='8.2.1')
  proxy | 2024/03/26 05:48:59 [227] * authenticating nuget feed request (host: nuget.***.dev, basic auth)
  proxy | 2024/03/26 05:48:59 [227] 404 https://nuget.***.dev:443/nuget/Packages(Id='prometheus-net',Version='8.2.1')
  proxy | 2024/03/26 05:48:59 [229] GET https://api.nuget.org:443/v3-flatcontainer/prometheus-net/8.2.1/prometheus-net.nuspec
  proxy | 2024/03/26 05:48:59 [229] 200 https://api.nuget.org:443/v3-flatcontainer/prometheus-net/8.2.1/prometheus-net.nuspec
  proxy | 2024/03/26 05:49:57 Posting metrics to remote API endpoint
  proxy | 2024/03/26 05:49:57 Successfully posted metrics data via api client
updater | time="2024-03-26T06:28:30Z" level=info msg="task complete" container_id=job-805408216-updater exit_code=137 job_id=805408216 step=updater
updater | time="2024-03-26T06:28:32Z" level=warning msg="timeout running job" error="waiting for updater: waiting for container: context deadline exceeded" job_id=805408216

What is going on between 05:49:57 and 06:28:32?

Native package manager behavior

No response

Images of the diff or a link to the PR, issue, or logs

No response

Smallest manifest that reproduces the issue

No response

@batkaevruslan batkaevruslan added the T: bug 🐞 Something isn't working label Mar 27, 2024
@github-actions github-actions bot added L: dotnet:nuget NuGet packages via nuget or dotnet L: git:submodules Git submodules L: go:modules Golang modules labels Mar 27, 2024
@abdulapopoola abdulapopoola removed L: go:modules Golang modules L: git:submodules Git submodules labels Mar 27, 2024
@batkaevruslan
Author

Today's log

...
 proxy | 2024/04/16 11:11:12 [369] 200 https://api.nuget.org:443/v3-flatcontainer/system.security.cryptography.xml/index.json
  proxy | 2024/04/16 11:12:02 Posting metrics to remote API endpoint
  proxy | 2024/04/16 11:12:02 Successfully posted metrics data via api client
updater | time="2024-04-16T11:52:13Z" level=info msg="task complete" container_id=job-815472451-updater exit_code=137 job_id=815472451 step=updater
updater | time="2024-04-16T11:52:15Z" level=warning msg="timeout running job" error="waiting for updater: waiting for container: context deadline exceeded" job_id=815472451

Nothing between 11:12:02 and 11:52:15...

@brettfo, hi! Do you have any ideas or know someone who can help?

@brettfo
Contributor

brettfo commented Apr 16, 2024

@batkaevruslan Two questions for you:

  1. Can you share the full log? I want to see what steps have run already.
  2. Is this repo public, or if not, do you have a public repo where the same timeout occurs?

There's obviously something very slow happening, and I'd like to experiment a bit to see where we could potentially speed things up. There are some instances, however, where there's not much we can do, e.g., if dependabot is running against a huge repo and trying to process all updates. The C# rewrite calls into the real NuGet and dotnet code, which does much more checking and is certainly slower than the original code, which was a sort of best guess.

@batkaevruslan
Author

@brettfo ,

  1. https://pastebin.com/9vp33hNk
  2. It is private, and I can't share it. I don't have a public repro.

Can you add some more logging so we can see what is going on between 11:12:02 and 11:52:15?

@batkaevruslan
Author

batkaevruslan commented Apr 18, 2024

@brettfo ,
I created a solution with one project. That project references almost all NuGet packages that we have in our product solution. Almost all, because I removed ~20 packages that are stored in private repositories.
https://github.com/batkaevruslan/dependabotTimeoutRepro/blob/main/DependabotTimeoutRepro/DependabotTimeoutRepro.A/DependabotTimeoutRepro.A.csproj

And there was a timeout. There is also some unknown error, but I don't see it in the logs... (screenshot attached)

P.S. The update log is so huge that GitHub fails to return it (50x errors), and even when GitHub does return a response, Chrome fails to render it :)
On the second attempt (816330767), Dependabot created a couple of PRs, but there was still an "Unknown error":
https://github.com/batkaevruslan/dependabotTimeoutRepro/pulls

@brettfo
Contributor

brettfo commented Apr 18, 2024

@batkaevruslan Thank you for this, I'll start experimenting.

@batkaevruslan
Author

batkaevruslan commented Apr 25, 2024

@brettfo, hi! I've seen your PR #9566 and ran a Dependabot update on my public repo: at least GitHub no longer fails to render the update log :) Timeouts are still here. The log shows 5k+ requests to POST https://dc.services.visualstudio.com:443/v2/track from 05:31:17 till 06:14:56.
Can this tracking affect performance? (screenshot attached)

@brettfo
Contributor

brettfo commented Apr 25, 2024

The telemetry calls shouldn't be the determining factor in the timeouts, but I'll keep experimenting with your repo.

@brettfo
Contributor

brettfo commented Apr 25, 2024

For a quick follow-up: I removed the telemetry calls and started running an update on the sample repo provided, and right now it's still going at 4 hours. There will certainly be more work to get the time down, but at some point we'll hit a wall because much of the work now is done either directly with the dotnet CLI tool or indirectly through the NuGet APIs. The result is that it's slower, but (hopefully) more correct.

@batkaevruslan
Author

@brettfo, thanks for the update!
When we eventually hit the wall:

  1. Can you extend the timeout limits for the NuGet updater, since it is "slow by definition"?
  2. Do you know what we can do on our side to stay within the limit, other than reducing the number of dependencies? Can we somehow change dependabot.yml to split the update job?

@brettfo
Contributor

brettfo commented Apr 30, 2024

@batkaevruslan I'm not aware of any setting to change the max run length, but given how long I've left this running locally with no end in sight, I don't think that's a viable option.

As for modifying the dependabot.yml, you should be able to split it up a bit by dependency name. I haven't tried this yet, but you might be able to do something like this:

version: 2
registries:
  public-nuget:
    type: nuget-feed
    url: https://api.nuget.org/v3/index.json
updates:
  - package-ecosystem: nuget
    directory: "/"
    registries:
      - public-nuget
    schedule:
      interval: daily
      time: "07:00"
      timezone: "America/Los_Angeles"
    open-pull-requests-limit: 15
    groups:
      # each group here should run as a separate update job
      MicrosoftExtensions: # this group name is purely for you
        patterns:
          - "Microsoft.Extensions.*" # here we group a few dependencies at a time
      FluentAssertions:
        patterns:
          - "FluentAssertions" # you can list multiple patterns
          - "FluentAssertions.*"
      # add any other groups you see fit

@batkaevruslan
Author

Just for history: using grouping does not help to avoid timeouts.
My public repository does use grouping: https://github.com/batkaevruslan/dependabotTimeoutRepro/blob/main/.github/dependabot.yml#L16

An attempt to use several - package-ecosystem: "nuget" sections with different allowed packages and different schedules failed with

Update configs must have a unique combination of 'package-ecosystem', 'directory', and 'target-branch'
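
For reference, here is a minimal sketch of the kind of config that triggers that validation error (the allow patterns and schedules are placeholders): two nuget update entries for the same directory and target branch are rejected even when their allowed packages differ.

version: 2
updates:
  - package-ecosystem: nuget
    directory: "/"
    schedule:
      interval: daily
    allow:
      - dependency-name: "Microsoft.Extensions.*"
  # second entry has the same package-ecosystem, directory, and (default) target-branch,
  # so the whole config is rejected with the error above
  - package-ecosystem: nuget
    directory: "/"
    schedule:
      interval: weekly
    allow:
      - dependency-name: "FluentAssertions*"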

@david-brink-talogy

david-brink-talogy commented May 6, 2024

at some point we'll hit a wall because much of the work now is done either directly with the dotnet CLI tool or indirectly through the NuGet APIs. The result is that it's slower,

I'm also seeing this on a private repo and I'm curious about this comment. Did the old solution not have to make similar API calls to determine out-of-date dependencies?

When I run dotnet list package --outdated --include-transitive in the example solution's directory, I get the list of packages in less than 30 seconds. I traced requests to api.nuget.org and I see about 300 requests to get the index and then paged data via GET /v3/registration5-gz-semver2/{package_name}/*.

I know dependabot collects release notes and the commit diff. Is it this extra information that causes the wall to be hit?

@batkaevruslan
Author

@brettfo, hi! Any news on the issue?

I was wondering if our product solution is the largest in the world and the only one suffering from timeouts after this release...

  1. I forked eShopOnContainers (a demo project from Microsoft): https://github.com/batkaevruslan/eShop
  2. Ran Dependabot and got timeouts

(screenshot attached)

@brettfo
Contributor

brettfo commented May 30, 2024

@batkaevruslan This is something we're actively working on, but the short version is that we don't want to generate PRs that will fail the end user's CI, and with the example you shared, updating the requested package resulted in one or more NU1605/NU1608 warnings, which indicates a conflict with the package versions specified. Currently the approach is an unsophisticated brute-force search that tries to resolve all of those warnings, and that is really slow. Our current work is trying to smartly trim down the package search space, but like all good bugs, fixing one issue surfaces 6 more.

@batkaevruslan
Author

updating the requested package resulted in one or more NU1605/NU1608

Looks like you are talking about my other issue #6710

Currently the approach is an unsophisticated brute-force search that tries to resolve all of those warnings, and that is really slow.

Is it possible to build a graph of all dependencies once and create PRs only for leaf-level dependencies? Even if the PR grouping feature of Dependabot won't work, it is better for us to have one PR per leaf dependency every day than no PRs at all.

@Patrick-3000

Patrick-3000 commented Oct 24, 2024

Same issue here. After 60 minutes, Azure kills our Dependabot run due to a timeout, and with our tier it is not possible to increase the timeout to more than 60 minutes. Also, I am not sure what Dependabot is doing: we see 20,000 lines of GET/POST log entries in our pipeline log.

@roldengarm

Yep, same issue here, it times out after 60 minutes. Last run had 70k log lines and then got cancelled.

@Patrick-3000

@roldengarm: I managed to figure out the issue and a solution: Microsoft kills the pipeline after 60 minutes, no matter which timeout we specify, because we were using free Microsoft-hosted agents.

Now we use the paid Microsoft-hosted agents and have a timeout of 360 minutes. To achieve this, you have to order the paid agents and set the timeout value of the pipeline to 0.
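
For anyone else hitting the Azure DevOps limit, here is a minimal sketch of the relevant pipeline setting (the job name, pool image, and step are placeholders); on Microsoft-hosted agents, timeoutInMinutes: 0 means the job may run up to the maximum the tier allows (360 minutes on paid parallel jobs, but still only 60 minutes on the free tier):

jobs:
  - job: dependabot                # placeholder job name
    timeoutInMinutes: 0            # 0 = run up to the maximum the agent tier allows
    pool:
      vmImage: ubuntu-latest       # placeholder image
    steps:
      - script: echo "run the dependabot update here"   # placeholder step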

But I still don't understand why we need > 100k requests and > 2 hours to update a handful of dependencies.

I observed that a lot of requests result in 401/403 errors, followed by 200 status code responses. To me (as somebody who has absolutely no background knowledge of Dependabot) it seems as if Dependabot first tries to authenticate without credentials, receives an authentication error in response, and only then includes credentials.

@roldengarm

I'm using GitHub, not ADO, so I'm not sure how to increase the timeout.
For now I've reverted to doing the updates manually, which takes a couple of minutes.

Thanks @Patrick-3000

@brettfo
Contributor

brettfo commented Nov 13, 2024

There ideally shouldn't be any 401/403 codes in the log; are you passing in a token via dependabot.yml?
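
As a rough sketch (the registry name, feed URL, and secret name are placeholders), feed credentials are supplied through a registries entry that the nuget update config references, so that requests to the private feed can be authenticated:

version: 2
registries:
  private-feed:                                    # placeholder registry name
    type: nuget-feed
    url: https://nuget.example.dev/v3/index.json   # placeholder feed URL
    token: ${{ secrets.PRIVATE_FEED_TOKEN }}       # token stored as a Dependabot secret
updates:
  - package-ecosystem: nuget
    directory: "/"
    registries:
      - private-feed
    schedule:
      interval: daily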

As for the large number of requests: the NuGet protocol dictates that every package needs to be downloaded to properly determine its dependencies, so a lot of network traffic is expected during a NuGet update.

@Patrick-3000

Patrick-3000 commented Nov 20, 2024

@brettfo: Yes, I understand. However, the current algorithm leads to issues which prevent us from using Dependabot on our main repo: due to the runtime of > 2.5 hours we get pipeline timeouts, and our agents run out of disk space due to log files of > 10 GB.

My (to be honest, very naive) approach would be to just try to update every NuGet dependency. If there is no newer version, fine. If there is, we update it. That way we would save a lot of round trips (and therefore runtime and disk space for logs). And I am sure that ~100 NuGet packages won't take > 2.5 hours to update.

@batkaevruslan
Author

batkaevruslan commented Nov 20, 2024

By the way, Dependabot fails to update the repro repository that I mentioned earlier, with an error about disk space:

updater | 2024/11/14 03:39:48 ERROR <job_916598998> Starting analysis of MediatR...
updater | Unhandled exception: System.IO.IOException: No space left on device : '/tmp/dependabot-packages_22752f9a-511a-42fb-b23a-6ec5d0ffcf64'
updater |    at System.IO.FileSystem.CreateDirectory(String fullPath, UnixFileMode unixCreateMode)
updater |    at System.IO.Directory.CreateDirectory(String path)
updater |    at NuGetUpdater.Core.Analyze.NuGetContext..ctor(String currentDirectory, ILogger logger) in /opt/nuget/lib/NuGetUpdater/NuGetUpdater.Core/Analyze/NuGetContext.cs:line 41
updater |    at NuGetUpdater.Core.Analyze.AnalyzeWorker.RunAsync(String repoRoot, WorkspaceDiscoveryResult discovery, DependencyInfo dependencyInfo) in /opt/nuget/lib/NuGetUpdater/NuGetUpdater.Core/Analyze/AnalyzeWorker.cs:line 96
updater |    at NuGetUpdater.Core.Analyze.AnalyzeWorker.RunWithErrorHandlingAsync(String repoRoot, String discoveryPath, String dependencyPath) in /opt/nuget/lib/NuGetUpdater/NuGetUpdater.Core/Analyze/AnalyzeWorker.cs:line 47
updater |    at NuGetUpdater.Core.Analyze.AnalyzeWorker.RunAsync(String repoRoot, String discoveryPath, String dependencyPath, String analysisDirectory) in /opt/nuget/lib/NuGetUpdater/NuGetUpdater.Core/Analyze/AnalyzeWorker.cs:line 34
updater |    at NuGetUpdater.Cli.Commands.AnalyzeCommand.<>c.<<GetCommand>b__4_0>d.MoveNext() in /opt/nuget/lib/NuGetUpdater/NuGetUpdater.Cli/Commands/AnalyzeCommand.cs:line 30
updater | --- End of stack trace from previous location ---
updater |    at System.CommandLine.Invocation.AnonymousCommandHandler.InvokeAsync(InvocationContext context)
updater |    at System.CommandLine.Invocation.InvocationPipeline.<>c__DisplayClass4_0.<<BuildInvocationChain>b__0>d.MoveNext()
updater | --- End of stack trace from previous location ---
updater |    at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass17_0.<<UseParseErrorReporting>b__0>d.MoveNext()
updater | --- End of stack trace from previous location ---
updater |    at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass12_0.<<UseHelp>b__0>d.MoveNext()
updater | --- End of stack trace from previous location ---
updater |    at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass22_0.<<UseVersionOption>b__0>d.MoveNext()
updater | --- End of stack trace from previous location ---
updater |    at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass19_0.<<UseTypoCorrections>b__0>d.MoveNext()
updater | --- End of stack trace from previous location ---
updater |    at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<UseSuggestDirective>b__18_0>d.MoveNext()
updater | --- End of stack trace from previous location ---
updater |    at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass16_0.<<UseParseDirective>b__0>d.MoveNext()
updater | --- End of stack trace from previous location ---
updater |    at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<RegisterWithDotnetSuggest>b__5_0>d.MoveNext()
updater | --- End of stack trace from previous location ---
updater |    at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass8_0.<<UseExceptionHandler>b__0>d.MoveNext()

@Patrick-3000

@batkaevruslan: for this problem, I opened issue tinglesoftware/dependabot-azure-devops#1473

@brettfo
Contributor

brettfo commented Nov 20, 2024

@Patrick-3000 Even looking for newer packages means we have to download and extract each package to see what its dependencies are. It's not uncommon for a package update to not be possible because some transitive dependency won't work. There's not really a quick way around it; otherwise we'd be generating massive numbers of PRs that don't build.

@david-brink-talogy

david-brink-talogy commented Nov 20, 2024

@Patrick-3000 Even looking for newer packages means we have to download and extract each package to see what its dependencies are. It's not uncommon for a package update to not be possible because some transitive dependency won't work. There's not really a quick way around it; otherwise we'd be generating massive numbers of PRs that don't build.

@brettfo, on the other hand, for myself and many others, dependabot simply no longer works and isn't generating PRs because it times out most or even all of the time.

@chriscameron-vertexinc

@Patrick-3000 Even looking for newer packages means we have to download and extract each package to see what its dependencies are. It's not uncommon for a package update to not be possible because some transitive dependency won't work. There's not really a quick way around it; otherwise we'd be generating massive numbers of PRs that don't build.

I may be speaking for myself, but I'd be fine with seeing lots of update PRs that don't build. It highlights that newer versions are available and it's up to the human developer if they want to make the necessary changes to fix compatibility issues.

I'd be very happy if there was an option to force the Dependabot NuGet module into "dumb mode", where it simply makes PRs for any immediate dependencies that have a new version available (grouping and filtering by Dependabot config settings).

Alternatively, have you guys considered any kind of NuGet version cache? On a cache miss, download and extract the package, examine the dependencies, and store the result in a database somewhere.

@Patrick-3000

Patrick-3000 commented Dec 6, 2024

@brettfo: thank you for your response. In my mind, I was thinking about something similar to the "Manage NuGet Packages" tab in Visual Studio. There, I can see and update all NuGet packages. In Visual Studio this works very quickly, which is why I can't comprehend why it takes so long for Dependabot to figure out whether new updates are available.

@brettfo
Contributor

brettfo commented Dec 11, 2024

@Patrick-3000 When running locally, your machine has a package cache where downloaded packages are saved. For example, you may have downloaded Newtonsoft.Json/10.0.1 several years ago; it now exists on your machine and never needs to be downloaded again. But when dependabot starts, it doesn't have anything, so it has to download all packages. Further, Visual Studio has already loaded all packages and transitive dependencies, but dependabot is working from a fresh git clone with nothing else.

We've discussed pre-loading common packages into the dependabot Docker image, but that would make the updater image absolutely huge and, from a security standpoint, isn't good. For example, we don't want to pre-load Newtonsoft.Json/13.0.3 if your code uses a private NuGet feed where 13.0.3 hasn't been approved and your IT department has only loaded version 12 onto the feed. Dependabot needs to honor that.

@Patrick-3000

@brettfo: thank you very much for your explanation.

@KennethHoff

KennethHoff commented Jan 3, 2025

Is there any news here? Any workarounds?

Back in November Dependabot stopped working for me as well. I noticed back then, but I had a lot of other, more important things to work on. I also assumed it would be fixed in a couple of days/weeks, but this is seemingly still an issue.

Turning off grouping seems to make it work again - at least it's generating a couple of PRs - but it generates way too many overlapping PRs, which makes it a show-stopper. I want a single PR that updates everything; that makes testing a lot easier.

Nov 18 was the last time my weekly dependabot update worked.

@niels-neogeo

As a workaround and probably a permanent solution, we switched to https://github.com/renovatebot/renovate instead of using dependabot because of this issue.

Projects
Status: Planned
Development

No branches or pull requests

9 participants