Fleet with shutdownOnIdle and inbound launcher fails to scale out due to the preserved temporaryOffline state of stopped agents #633
Comments
The IP addresses in the logs are inconsistent due to my copy-paste error, but otherwise, they are correct. Additionally, clicking 'Bring this node back online' on the agent's status page resolves the issue. Maybe something similar to the following could fix the issue:
Thanks, if you're able to test that change it will be useful.
I've manually tested the change, and it works for me (aside from the missing semicolon).
Mind submitting a pull request?
There's probably no need to do this separately for SSH and inbound agents, is there?
Sure, I will create a PR tomorrow (and retest to verify the case with the SSH launcher).
It seems this issue has become more complicated. Let me share the current status:
--- a/src/main/java/com/microsoft/azure/vmagent/AzureVMCloud.java
+++ b/src/main/java/com/microsoft/azure/vmagent/AzureVMCloud.java
@@ -691,6 +691,7 @@ public class AzureVMCloud extends Cloud {
getServiceDelegate().setVirtualMachineDetails(
agentNode, template);
Jenkins.get().addNode(agentNode);
+ azureComputer.setTemporaryOfflineCause(null);
if (agentNode.getAgentLaunchMethod().equalsIgnoreCase("SSH")) {
retrySshConnect(azureComputer);
} else { // Wait until node is online
Logs
VM is stopped while the job that triggered scaling up is still in the queue.
My settings for retention:
(For certain reasons, I would like the agent to have a public IP but use the internal network for Jenkins connections. Adding a public IP is implemented as a custom image feature.)
Fix? (untested)
--- a/src/main/java/com/microsoft/azure/vmagent/AzureVMManagementServiceDelegate.java
+++ b/src/main/java/com/microsoft/azure/vmagent/AzureVMManagementServiceDelegate.java
@@ -1169,9 +1169,9 @@ public final class AzureVMManagementServiceDelegate {
String publicIPStr = "";
String privateIP = vm.getPrimaryNetworkInterface().primaryPrivateIP();
String fqdn;
- if (publicIP == null) {
+ if (publicIP == null || template.getUsePrivateIP()) {
fqdn = privateIP;
- LOGGER.log(Level.INFO, "The Azure agent doesn't have a public IP. Will use the private IP");
+ LOGGER.log(Level.INFO, "The Azure agent doesn't have a public IP or usePrivateIP is set. Will use the private IP");
} else {
fqdn = publicIP.fqdn();
publicIPStr = publicIP.ipAddress();
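The condition changed in the untested patch above can be looked at in isolation. The following is only a minimal sketch of that decision, not the plugin's code: `AgentAddressSketch`, `chooseHost` and its parameters are hypothetical names, and the addresses are made-up examples.

```java
public class AgentAddressSketch {

    /**
     * Picks the host the controller would connect to.
     * Hypothetical helper: publicFqdn is null when the VM has no public IP,
     * usePrivateIp stands in for the template's "use private IP" setting.
     */
    static String chooseHost(String publicFqdn, String privateIp, boolean usePrivateIp) {
        // Same shape as the patched condition: fall back to the private IP
        // when there is no public IP OR when the template asks for it.
        if (publicFqdn == null || usePrivateIp) {
            return privateIp;
        }
        return publicFqdn;
    }

    public static void main(String[] args) {
        // Public IP exists, but the template prefers the internal network.
        System.out.println(chooseHost("agent.example.com", "10.0.0.4", true));
        // Public IP exists and is allowed to be used.
        System.out.println(chooseHost("agent.example.com", "10.0.0.4", false));
        // No public IP at all.
        System.out.println(chooseHost(null, "10.0.0.4", false));
    }
}
```

With the original condition (only the null check), the first case would return the public name even though the template asks for the private address, which matches the setup described above where the public IP is added by the custom image.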
Fixed in #636
Jenkins and plugins versions report
Environment
Jenkins: 2.492.1
OS: Linux - 4.18.0-553.37.1.el8_10.x86_64
Java: 17.0.14 - Red Hat, Inc. (OpenJDK 64-Bit Server VM)
[...]
azure-credentials:343.vd80f9c4859df
azure-sdk:191.v53ec8913ee10
azure-vm-agents:1001.vf3448fe27897
[...]
What Operating System are you using (both controller, and any agents involved in the problem)?
Controller and Agents: Rocky Linux 8
Jenkins 2.492.1 with azure-vm-agents:1001.vf3448fe27897
Reproduction steps
Expected Results
Agent should be online.
Actual Results
Agent is offline:
Status:
Status check script
Logs
Anything else?
It looks like the agent hangs in waitUntilJNLPNodeIsOnline due to its Temporary Offline status. This may be related to a change in Jenkins 2.479.3: "Retain user-generated offline reason when agent connects or disconnects for technical reasons" (pull 9855, JENKINS-30101, JENKINS-30175).
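To illustrate the suspected mechanism, here is a toy model in plain Java. It is a sketch under assumptions, not Jenkins or plugin code: `ToyComputer` and `waitUntilOnline` are hypothetical stand-ins, and the model assumes a node counts as online only when it has a connection and no temporary offline cause is set.

```java
import java.util.concurrent.TimeUnit;

public class OfflineStateSketch {

    /** Toy stand-in for an agent's computer; NOT the Jenkins Computer class. */
    static class ToyComputer {
        private boolean channelConnected;
        private String temporaryOfflineCause; // null means "not marked offline"

        void connect() { channelConnected = true; }

        // Assumption: disconnecting keeps the offline cause, as described above.
        void disconnect() { channelConnected = false; }

        void setTemporaryOfflineCause(String cause) { temporaryOfflineCause = cause; }

        boolean isOnline() { return channelConnected && temporaryOfflineCause == null; }
    }

    /** Polls until the computer reports online or the attempts run out. */
    static boolean waitUntilOnline(ToyComputer computer, int maxAttempts, long sleepMillis) {
        for (int i = 0; i < maxAttempts; i++) {
            if (computer.isOnline()) {
                return true;
            }
            try {
                TimeUnit.MILLISECONDS.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag and give up
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ToyComputer agent = new ToyComputer();
        agent.setTemporaryOfflineCause("idle shutdown"); // set when the VM was stopped
        agent.disconnect();
        agent.connect(); // VM restarted, inbound agent reconnects

        System.out.println("online with stale cause: " + waitUntilOnline(agent, 3, 10));
        agent.setTemporaryOfflineCause(null); // what the proposed one-line change does
        System.out.println("online after clearing: " + waitUntilOnline(agent, 3, 10));
    }
}
```

In this model the first wait times out and prints `false` even though the agent has reconnected, and the second prints `true` once the cause is cleared. That mirrors how clicking 'Bring this node back online' resolves the real issue.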
Are you interested in contributing a fix?
No response