How does draino work? #78
Yes, your understanding is correct.

I'm sorry, I'd like to know more.
Hey, can someone help me test or review these monitor configs?

docker-monitor.json: |
{
  "plugin": "journald",
  "pluginConfig": {
    "source": "docker"
  },
  "logPath": "/var/log/journal",
  "lookback": "5m",
  "bufferSize": 10,
  "source": "docker-monitor",
  "conditions": [],
  "rules": [
    {
      "type": "temporary",
      "reason": "CorruptDockerImage",
      "pattern": "Error trying v2 registry: failed to register layer: rename /var/lib/docker/image/(.+) /var/lib/docker/image/(.+): directory not empty.*"
    }
  ]
}
kernel-monitor.json: |
{
  "plugin": "journald",
  "pluginConfig": {
    "source": "kernel"
  },
  "logPath": "/var/log/journal",
  "lookback": "5m",
  "bufferSize": 10,
  "source": "kernel-monitor",
  "conditions": [
    {
      "type": "KernelDeadlock",
      "reason": "KernelHasNoDeadlock",
      "message": "kernel has no deadlock"
    },
    {
      "type": "Ready",
      "reason": "NodeStatusUnknown",
      "message": "Kubelet stopped posting node status"
    }
  ],
  "rules": [
    {
      "type": "temporary",
      "reason": "OOMKilling",
      "pattern": "Kill process \\d+ (.+) score \\d+ or sacrifice child\\nKilled process \\d+ (.+) total-vm:\\d+kB, anon-rss:\\d+kB, file-rss:\\d+kB"
    },
    {
      "type": "temporary",
      "reason": "TaskHung",
      "pattern": "task \\S+:\\w+ blocked for more than \\w+ seconds\\."
    },
    {
      "type": "temporary",
      "reason": "UnregisterNetDevice",
      "pattern": "unregister_netdevice: waiting for \\w+ to become free. Usage count = \\d+"
    },
    {
      "type": "temporary",
      "reason": "KernelOops",
      "pattern": "BUG: unable to handle kernel NULL pointer dereference at .*"
    },
    {
      "type": "temporary/permanent",
      "condition": "NodeStatusUnknown",
      "reason": "NodeStatusUnknown",
      "pattern": "Kubelet stopped posting node status"
    },
    {
      "type": "temporary",
      "reason": "KernelOops",
      "pattern": "divide error: 0000 \\[#\\d+\\] SMP"
    },
    {
      "type": "permanent",
      "condition": "KernelDeadlock",
      "reason": "AUFSUmountHung",
      "pattern": "task umount\\.aufs:\\w+ blocked for more than \\w+ seconds\\."
    },
    {
      "type": "permanent",
      "condition": "KernelDeadlock",
      "reason": "DockerHung",
      "pattern": "task docker:\\w+ blocked for more than \\w+ seconds\\."
    }
  ]
}

I'm majorly concerned about the following rule, as this happens quite frequently for us:

{
  "type": "temporary/permanent",
  "condition": "NodeStatusUnknown",
  "reason": "NodeStatusUnknown",
  "pattern": "Kubelet stopped posting node status"
}

In the draino deployment I have:

- command: [/draino, --debug, --evict-daemonset-pods, --evict-emptydir-pods, --evict-unreplicated-pods, KernelDeadlock, NodeStatusUnknown]

Is that how it works?
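For context, here is a minimal sketch of how that command line typically sits inside a draino Deployment. The image tag, namespace, and service account name below are placeholders, not taken from this issue; the flags enable eviction of DaemonSet, emptyDir, and unreplicated pods, and the trailing positional arguments (KernelDeadlock, NodeStatusUnknown) are the node condition types draino watches for.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: draino
  namespace: kube-system        # placeholder namespace
spec:
  replicas: 1
  selector:
    matchLabels: {app: draino}
  template:
    metadata:
      labels: {app: draino}
    spec:
      serviceAccountName: draino   # placeholder; needs RBAC to patch nodes and evict pods
      containers:
      - name: draino
        image: planetlabs/draino:latest   # placeholder tag
        # Flags first, then the node condition types draino should act on.
        command: [/draino, --debug, --evict-daemonset-pods, --evict-emptydir-pods,
                  --evict-unreplicated-pods, KernelDeadlock, NodeStatusUnknown]
```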
I'd like to ask a question about NPD (node-problem-detector); I'm not sure my understanding is right, and I hope you can correct me. Draino is a remediation system that can be used together with NPD. For example, when a node hits a kernel deadlock, or its CPU or disk is broken, NPD detects this and reports it. To prevent further use of that node, draino marks it as unschedulable (cordons it), so that no new containers are scheduled onto it, and then evicts the pods running on the node. If that is right, then draino's job is to check whether the condition NPD reports (kernel deadlock, CPU or disk problem) matches one of its configured rules and, if so, to cordon and drain the node. After the eviction, can the remaining nodes be guaranteed to have capacity to schedule the pods from the drained node? Perhaps the cluster autoscaler would be used for that. Is this understanding correct?
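To make that flow concrete, here is a hedged sketch of the kind of node condition a permanent NPD rule (for example the DockerHung rule above) sets in the node's status; the process id and timestamps are made up for illustration. Draino watches for conditions whose type matches one of its positional arguments and, once the status turns True, cordons the node and starts draining it:

```yaml
# Excerpt of `kubectl get node <node-name> -o yaml` after a permanent rule fires (illustrative values).
status:
  conditions:
  - type: KernelDeadlock                 # matches draino's KernelDeadlock argument
    status: "True"
    reason: DockerHung
    message: "task docker:12345 blocked for more than 120 seconds."
    lastHeartbeatTime: "2019-01-01T00:00:00Z"
    lastTransitionTime: "2019-01-01T00:00:00Z"
```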