Kitchen converge race condition #41
Comments
I'll try to take a look next week @jeremyciak
@scalp42 Awesome! ...and thank you! I have rudimentary ability to read and write Ruby code, but I have no idea how to do underlying Chef code development to test whether anything I would write is actually functional in that realm. If you're able to point me in the right direction there I would love to help out.
@scalp42 Please let me know if/when you get a chance to assist here. I'm desperate!
Looking at it, it doesn't seem to be possible because some of the stuff like the That being said, I believe you should be able to create the JSON files in advance. Is it not working?
What do you mean about creating the JSON files in advance? You mean manually define them? Or you mean basically implement what this provisioner is doing in an out-of-band method? And what are you referring to specifically not working?
So I believe it's working as intended. This plugin does it both ways. Here's an example on Ubuntu 18.04 with a simple recipe:

suites:
  - name: node1
    run_list:
      - recipe[example::search]

tag 'jeremyciak'
nodes = search(:node, 'tags:jeremyciak').sort
if nodes.empty?
  Chef::Log.info %|#{cookbook_name} => Could not find node matching "tags:jeremyciak".|
else
  Chef::Log.info %|#{cookbook_name} => The following nodes were found:|
  nodes.uniq.each { |n| Chef::Log.info "- #{n.name} (ip: #{n['ipaddress']})" }
end

Now if I just run converge, it won't find any node matching But if I drop a simple JSON with whatever needed to make it work (here we just care about the

// Place this file under cookbook_name/test/nodes/node1-ubuntu-1804.json
{
"name": "node1-ubuntu-1804",
"chef_environment": "kitchen",
"normal": {
"run_completed": false,
"tags": [
"jeremyciak"
]
}
}

Now if I destroy and converge again:
Notice that the But if you run converge again,
Yes, over my time troubleshooting this situation I have become intimately familiar with how this provisioner should operate. The issue I am wondering if we can fix is that the node data for all relevant nodes must be generated before any node pulls down the generated node data. I am seeing a race condition where one of my nodes generates its node data and pulls it down before any of the other nodes have generated theirs. As a result, that node is unable to find any of the other nodes. I'm wondering if we can add some kind of context awareness to wait until all nodes have produced their node data before continuing the converge actions. I don't know whether this can live within this provisioner or whether it would have to be implemented somewhere in the test kitchen code.
The use case I have is to orchestrate nodes for a Microsoft Windows Remote Desktop infrastructure deployment (RD Broker(s), RD Gateway(s), RD Web Server(s), RD Host(s)). I need accurate networking data populated so that these nodes can be referenced with PowerShell/DSC and a deployment can be created from them.
The other issue I have is that our development and CI utilize different networks, so I rely on DHCP to hand out the IP addresses and then reference those IPs in the node data that this provisioner allows me to dynamically produce. I can't manually specify node data related to networking or it will break either my development process or my CI process.
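For illustration, the kind of wait described in this comment could look roughly like the Ruby sketch below. It is not code from this provisioner or from test kitchen: the helper names `wait_for_node_files` and `node_file_ready?`, the timeout values, and the assumption that each node's data lands as `<node name>.json` in a single directory (as in the `test/nodes/node1-ubuntu-1804.json` example earlier in the thread) are all hypothetical. Where such a check would have to be hooked in is exactly the open question here.

```ruby
require 'json'

# Hypothetical helper (not part of any plugin): returns true once the JSON
# file for one node exists and parses to an object carrying a "name" key.
def node_file_ready?(nodes_dir, name)
  path = File.join(nodes_dir, "#{name}.json")
  return false unless File.file?(path)

  data = JSON.parse(File.read(path))
  data.is_a?(Hash) && data.key?('name')
rescue JSON::ParserError, Errno::ENOENT
  false # file is missing, or present but still being written
end

# Hypothetical helper: poll until every expected node has usable node data,
# or give up when the timeout (in seconds) expires.
def wait_for_node_files(nodes_dir, expected_names, timeout: 300, interval: 5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    missing = expected_names.reject { |name| node_file_ready?(nodes_dir, name) }
    return true if missing.empty?
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleep interval
  end
end
```

A caller would pass the directory holding the node JSON files and the list of instance names it expects, and only continue with the converge when the call returns true. Polling the filesystem keeps the sketch independent of how the files get written, at the cost of a fixed delay between checks.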
Unfortunately, I'm not sure I can help more with your environment. Someone else might be able to chime in. In general, if I really have to rely on "time", you could also
Yeah, my recipes already have a bunch of retries and waits to account for other orchestration stuff. The issue here is that this problem manifests before my recipes are even relevant so I have no control over this from within my Chef recipe code.
I know this repository hasn't been contributed to in a long time, and I have also adopted scalp42's fork myself due to compatibility issues, BUT... I'm hoping that posting this issue may get some visibility from someone who may have some suggestions or answers.
What I am experiencing has to do with platform composition and the race condition that manifests because of it. I have a test kitchen suite that I am running where I have 1 Windows Server 2016 node and 3 Windows Server 2019 nodes. For whatever reason the Windows Server 2016 node will blast through the first aspects of a kitchen converge so that it gets to the "Preparing nodes" step before any of the Windows Server 2019 nodes have populated any node data. This causes any node search functionality from the Windows Server 2016 node to fail since there is no node data for the other nodes.
My current attempt at a "solution" is to simply run the converge an initial time, wait for it to fail, and then run converge again. I had not hit this race condition previously, and I had always assumed that the node data was populated at the end of the kitchen create action and not at the start of the kitchen converge action. I am assuming that with my "solution" the first converge will populate all of the node data and fail, and then the next converge will succeed.
Is there a way to update test kitchen code anywhere/anyhow so that the node data is populated at the end of the kitchen create action and not at the start of the kitchen converge action?
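The two-pass workaround described above (converge once, let it fail, converge again) can be scripted rather than run by hand. The following is a minimal Ruby sketch under that assumption; `converge_with_retry` and its `runner` parameter are hypothetical names introduced here, and the only external command it invokes by default is `kitchen converge`. It does nothing to fix the underlying race and simply stops at the first attempt that succeeds.

```ruby
# Hypothetical wrapper: run the given command up to `attempts` times and
# return true as soon as one run succeeds. The runner is injectable so the
# retry logic can be exercised without actually invoking test kitchen.
def converge_with_retry(command = %w[kitchen converge], attempts: 2,
                        runner: ->(cmd) { system(*cmd) })
  attempts.times do |i|
    return true if runner.call(command)

    warn "converge attempt #{i + 1} of #{attempts} failed"
  end
  false
end
```

Calling `converge_with_retry` with no arguments runs `kitchen converge` at most twice; the intent is that the first pass populates the node data even if it fails, and the second pass can then find the other nodes.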