Enabling the creation of networkPolicy in the helm charts causes networking (DNS) issues for datadog-clusterchecks #1606

Open
Vassilis0 opened this issue Nov 12, 2024 · 2 comments

@Vassilis0

Describe what happened:
Enabling the creation of networkPolicy in the helm charts causes networking (DNS) issues for datadog-clusterchecks, because the policy that is created does not allow egress traffic on port 53.

Spec:
  PodSelector:     app=datadog-clusterchecks
  Allowing ingress traffic:
    <none> (Selected pods are isolated for ingress connectivity)
  Allowing egress traffic:
    To Port: 443/TCP
    To: <any> (traffic not restricted by destination)
    ------------
    To Port: 5005/TCP
    To:
        PodSelector: app=datadog-cluster-agent
    ------------
    To Port: <any> (traffic allowed to all ports)
    To: <any> (traffic not restricted by destination)
Policy Types: Ingress, Egress

More specifically, datadog-clusterchecks logs the following errors for these endpoints:

• https://api.datadoghq.com/api/v1/check_run
• https://api.datadoghq.com/api/v2/series
2024-11-04 16:21:09 UTC | CORE | ERROR | (comp/forwarder/defaultforwarder/worker.go:187 in process) | Too many errors for endpoint 'https://7-58-0-app.agent.datadoghq.com/api/v1/check_run': retrying later

2024-11-04 16:21:09 UTC | CORE | ERROR | (comp/forwarder/defaultforwarder/worker.go:187 in process) | Too many errors for endpoint 'https://7-58-0-app.agent.datadoghq.com/api/v2/series': retrying later

2024-11-08 14:00:46 UTC | CORE | ERROR | (comp/forwarder/defaultforwarder/worker.go:191 in process) | Error while processing transaction: error while sending transaction, rescheduling it: Post "https://7-58-0-app.agent.datadoghq.com/intake/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

2024-11-06 15:28:47 UTC | CORE | DEBUG | (comp/forwarder/defaultforwarder/transaction/transaction.go:87 in 1) | DNS Lookup failure: lookup 7-58-0-app.agent.datadoghq.com: i/o timeout

Steps to reproduce the issue:
Enable the creation of networkPolicy in the chart for a Datadog deployment that runs on AWS EKS with network policy enforcement enabled, then inspect the resulting policy:

kubectl get networkpolicy 
kubectl describe networkpolicy <networkpolicy-name>

Then check name resolution from within the datadog-clusterchecks pod.

Additional environment details (Operating System, Cloud provider, etc):
AWS EKS 1.30
Datadog Helm chart: 3.77

@Vassilis0 changed the title from "Enabling the creation of networkPolicy in the helm charts causes networking (DNS) issues for datadog-clusterchecks." to "Enabling the creation of networkPolicy in the helm charts causes networking (DNS) issues for datadog-clusterchecks" on Nov 13, 2024
@celenechang
Contributor

Hi @Vassilis0, thanks for opening the issue and for providing those details. Did adding an egress rule on port 53 help resolve the DNS error?

Do you mind opening a support ticket so that our team can investigate further? Thank you in advance.

@Vassilis0
Author

Thank you for your reply, @celenechang.
Yes, adding an egress rule on port 53 helped resolve the DNS error, but we ended up disabling the creation of the network policy.

Support ticket: #1970318
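
For anyone hitting the same problem, the port 53 egress rule mentioned above could be sketched roughly as a separate NetworkPolicy like the one below. This is only an illustration, not what the chart generates: the name `datadog-clusterchecks-allow-dns` is made up, the `app=datadog-clusterchecks` label is taken from the `kubectl describe` output earlier in this issue, and it assumes the policy is applied in the same namespace as the clusterchecks pods. Both UDP and TCP are listed because DNS can use either.

```yaml
# Sketch only: an additional policy that permits DNS egress for the
# cluster checks runners. The name is hypothetical; the pod label is
# copied from the policy shown earlier in this issue.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: datadog-clusterchecks-allow-dns   # hypothetical name
spec:
  podSelector:
    matchLabels:
      app: datadog-clusterchecks
  policyTypes:
    - Egress
  egress:
    # No "to" clause, so the destination is not restricted;
    # only the port and protocol are.
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

NetworkPolicies are additive, so a policy like this should widen what the selected pods may reach without replacing the rules created by the chart. Restricting the destination to the cluster DNS pods would be tighter, but the selector for that depends on the cluster and is not covered here.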
