Track OOM-kills in Docker compose setup #322

Open
kam193 opened this issue Feb 11, 2025 · 0 comments
kam193 commented Feb 11, 2025

Is your feature request related to a problem? Please describe.
OOM kills are a fairly common problem for AL services. In the Docker Compose setup, it is difficult to track which service or container has been OOM-killed. In the Kubernetes deployment, this is solved by _monitor_pods, which runs in a separate thread in the scaler and prints the last state event when it detects restarts (by default, after every restart). There is currently no similar solution for the Docker Compose setup.

The "restart always" policy ensures that containers are automatically restored after an OOM kill, but it also resets the OOMKilled flag.
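
To illustrate why polling the container state is not enough, here is a minimal sketch that reads the relevant fields from a container's inspect data. The helper names `summarize_state` and `print_project_states` are hypothetical; the sketch assumes the Docker SDK for Python and the usual inspect fields `State.OOMKilled`, `State.Running` and `RestartCount`. After an automatic restart, one would expect `oom_killed` to be false again while `restart_count` has grown, so the cause of the restart is no longer visible.

```python
def summarize_state(attrs):
    """Extract the OOM-related fields from a container's inspect data (a dict)."""
    state = attrs.get("State", {})
    return {
        "running": state.get("Running", False),
        "oom_killed": state.get("OOMKilled", False),
        "restart_count": attrs.get("RestartCount", 0),
    }


def print_project_states(project):
    """Print the summary for every container of one compose project."""
    import docker  # imported lazily so summarize_state works without the SDK

    client = docker.from_env()
    label = f"com.docker.compose.project={project}"
    for container in client.containers.list(all=True, filters={"label": label}):
        print(container.name, summarize_state(container.attrs))
```

Calling `print_project_states("oomtst")` only shows the state at the moment of the call, which is why an event-based approach looks more reliable here.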

Having a clear log indicating an OOM would be a great debugging improvement.

Describe the solution you'd like
As in Kubernetes, we can easily subscribe to Docker events and process them in real time. Since we don't need to collect as much information as in Kubernetes deployments, I suggest a much simpler method focused only on OOM. It could be extended later if necessary.

The following example, tested on my machine, should be enough to report OOMKills from a given compose project:

import docker

client = docker.from_env()

# Only OOM events for containers that belong to the given compose project
event_filters = {"label": "com.docker.compose.project=oomtst", "event": "oom", "type": "container"}
for event in client.events(decode=True, filters=event_filters):
    print(f"Container {event['Actor']['Attributes']['name']} killed by OOM")
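
If this were to run next to other work, as the Kubernetes monitor does in its own thread, the loop could be wrapped roughly as follows. This is only a sketch under assumptions: the names `describe_oom_event`, `watch_oom_events`, `start_monitor` and the logger name are hypothetical, and it assumes the event attributes carry the compose service label (`com.docker.compose.service`), falling back to the plain name or ID when they do not. The formatting is kept separate from the Docker subscription so it can be exercised without a daemon.

```python
import logging
import threading

logger = logging.getLogger("oom-monitor")  # hypothetical logger name


def describe_oom_event(event):
    """Build a log message from a decoded Docker event dict."""
    actor = event.get("Actor", {})
    attrs = actor.get("Attributes", {})
    # Fall back to the shortened container ID if the name is missing.
    name = attrs.get("name") or actor.get("ID", "<unknown>")[:12]
    service = attrs.get("com.docker.compose.service")
    suffix = f" (service {service})" if service else ""
    return f"Container {name}{suffix} killed by OOM"


def watch_oom_events(events, log=logger.warning):
    """Log one message per event from any iterable of decoded events."""
    for event in events:
        log(describe_oom_event(event))


def start_monitor(project):
    """Subscribe to OOM events of one compose project in a daemon thread."""
    import docker  # imported lazily so the helpers above work without the SDK

    client = docker.from_env()
    event_filters = {
        "label": f"com.docker.compose.project={project}",
        "event": "oom",
        "type": "container",
    }
    stream = client.events(decode=True, filters=event_filters)
    thread = threading.Thread(target=watch_oom_events, args=(stream,), daemon=True)
    thread.start()
    return thread
```

A caller would run `start_monitor("oomtst")` once at startup; the daemon thread ends together with the main process.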

Describe alternatives you've considered

  • Matching OOM or Docker logs after the kill is extremely hard, as they may not record the container name, and the IDs they do give are not always useful.
  • Copying the logic of the Kubernetes monitor is not necessary. Subscribing to specific events keeps the solution simple.

Additional context
References:

@kam193 kam193 added assess We still haven't decided if this will be worked on or not enhancement New feature or request labels Feb 11, 2025