API calls cancelled by client because of Timeout (Reopen #899) #904

Closed
eitah opened this issue Jul 10, 2024 · 14 comments · Fixed by #926 or #928

Comments

@eitah

eitah commented Jul 10, 2024

This issue reopens #899 because it was not addressed for our company.

Terraform Version

Terraform v1.5.7
on linux_amd64

Affected Resource(s)

  • event_orchestrations
  • services (esp auto_pause_notifications_parameters)
  • schedules

The issue occurs randomly and without warning. We've definitely seen it with those resources, and possibly with others.

Terraform Configuration Files

Debug Output

https://gist.github.com/eitah/12333e7982ecf7bbe3da5283a4cb4b8d

Expected Behavior

What should have happened?

Timeouts should have been handled gracefully, or the Terraform provider should have reported which quotas it had exceeded to produce this timeout.

Actual Behavior

The plan failed when it should have succeeded

Steps to Reproduce

Please list the steps required to reproduce the issue, for example:

  1. terraform apply

Important Factoids

Are there anything atypical about your accounts that we should know? For example: Running in EC2 Classic? Custom version of OpenStack? Tight ACLs?

Pinning to 3.14.4 as suggested in the issue made no difference.

References

CC @jcpoconnor @imjaroiswebdev @whyisjacob

@dhrapson

dhrapson commented Jul 16, 2024

Experiencing the same issue, and I've upvoted above.
If it helps, the URLs we are hitting it on are:

  • https://api.pagerduty.com/services/<id>?include%5B%5D=auto_pause_notifications_parameters same as in the Gist
  • https://api.pagerduty.com/event_orchestrations/services/<id>

In our case the issue is oddly linked with a transition from GitLab Pipelines with Runners on AWS in eu-west-1 (no timeouts) to GitHub Actions Workflows with GitHub-hosted Runners, which the docs indicate are in Azure.

Since I can see that PagerDuty is hosted in AWS, we will be trying self-hosted GitHub Runners in AWS shortly.

Edited

Works fine with GitHub Runners self-hosted in AWS, in EU region in my case.

So AWS EU client -> AWS US service leads to fewer timeouts than Azure US client -> AWS US service
¯_(ツ)_/¯
Would probably have to know more detail about PagerDuty traffic routing to understand the reasons behind that.

@shonun1

shonun1 commented Jul 30, 2024

Experiencing the exact same issue as well. I tried changing the HTTP client configuration used by the PagerDuty SDK underneath, and the following values completely resolved the issue for us. They might be useful in further investigation of why this issue persists:
https://github.com/shonun1/terraform-provider-pagerduty/blob/v3.16.2/pagerduty/config.go#L77-L102

@ronballesteros

ronballesteros commented Aug 13, 2024

I've encountered the same issue, but it seems that version 3.11.0 does not have this problem (I know it's not the latest, but we were quite behind).

@victorbiga

> I've encountered the same issue, but it seems that version 3.11.0 does not have this problem (I know it's not the latest, but we were quite behind).

I can confirm 3.11.0 has, surprisingly, worked for a few runs now. We will see how it behaves over time, but for now it looks good.

@imjaroiswebdev
Contributor

Based on @shonun1's recommendation, I'm going to publish #926 with the settings that made sense to me. However, since I haven't been able to fully diagnose the reason behind this issue yet, I would really appreciate your feedback on this patch, and on any additional ones if they are necessary. This new version of the PagerDuty Terraform provider is going to be released in the next few minutes.

@imjaroiswebdev
Contributor

TF Provider v3.15.4 is out. Once again, I'll appreciate your feedback on whether this patch is effective, so we can iterate on it if needed. Thank you all in advance for your help and support ✌🏽

@victorbiga

victorbiga commented Aug 19, 2024

> TF Provider v3.15.4 is out. Once again, I'll appreciate your feedback on whether this patch is effective, so we can iterate on it if needed. Thank you all in advance for your help and support ✌🏽

@imjaroiswebdev I tried the new version and got timeouts in the plans.

Error: Error reading: XXXXXX: Get "https://api.pagerduty.com/users/XXXXXX": net/http: request canceled (Client.Timeout exceeded while awaiting headers)

  with module.pagerduty.pagerduty_user.user["mikeXXXXXX"],
  on ../modules/pagerduty/users.tf line 1, in resource "pagerduty_user" "user":
   1: resource "pagerduty_user" "user" {

back to 3.11.0 😒

@onepabz

onepabz commented Aug 20, 2024

Same for us: 40% failing because of timeouts.

@imjaroiswebdev
Contributor

After some trials and research, we came to the conclusion that the solution is to reduce the timeout, and that's why @victorbiga noticed an improvement when switching to v3.11.0. A shorter timeout should allow the provider's retry logic to recover from a closed connection when no response arrives inside the timeout window. So #928 aims to do just this.

@imjaroiswebdev
Contributor

Thank you all once again for your patience and support, especially to @victorbiga for insisting on the use of v3.11.0 and for providing quick feedback. Terraform provider v3.15.5 has just been released and should finally solve this issue. Again, if anyone believes this issue needs to be reopened, feel free to do so; feedback like that provided by @onepabz will be very welcome and helpful. However, based on our findings, this update should be enough.

@victorbiga

@imjaroiswebdev v3.15.5 has passed with our whole stack, and there is a lot in there.

I only did one test, but it seems to be fixed, with a positive outcome 😄

Will post here if we experience any issues...

@imjaroiswebdev
Contributor

Hooray! Thank you very much for the feedback @victorbiga 🥇 It's very much appreciated! We'll keep an eye on this issue, just in case 💪🏽

@onepabz

onepabz commented Aug 23, 2024

Works like a charm for us too.

Thanks a bunch guys 🎉

@imjaroiswebdev
Contributor

Again! Thanks to you all for the feedback 🥲 Appreciate it! We can mark this as solved then 🏆
