Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor caching of chef-client #300

Closed
cheeseplus opened this issue Mar 21, 2017 · 15 comments
Closed

Refactor caching of chef-client #300

cheeseplus opened this issue Mar 21, 2017 · 15 comments

Comments

@cheeseplus
Copy link
Contributor

cheeseplus commented Mar 21, 2017

The current implementation suffers from a few issues that make the experience less than delightful, especially new users.

  • caching on by default - this changed the default and breaks existing users with unexpected behaviour
  • the cache itself isn't structured so we have issues like Cache directory should be structured #293
  • we have to enumerate every platform this doesn't work for instead of only enumerating the platforms it does work for (there are more of the former category) see Caching should use whitelist of supported platforms #296
  • even if we have a list of boxes it works against, we are only doing so based on name so this breaks the second anyone chooses a name that doesn't include the OS
  • if the virt tools aren't properly installed on the base box or the host version of the tooling is newer/older than the box vagrant has, this breaks
  • we never clean/prune the cache and that can get out of hand pretty quickly

I personally feel like caching by default breaks the semver contract but I do understand why we added caching (downloads cost money) so that ship may have sailed.

I propose that we:

  • fix the cache structuring
  • only enable caching on known platform names or if we can only for bento boxes (we have nominal control over those)
  • investigate if it's possible to catch the error from Vagrant and re-try without shared folders on platforms that otherwise should be supported
@charlesjohnson
Copy link

What we can't do is increase the number of repeat downloads served of chef-client, because that comes out of Chef's pocket.

What ideas do people have to fix the use cases that are broken, without increasing the number of repeat downloads we serve of chef-client?

@tas50
Copy link
Member

tas50 commented Mar 21, 2017

So there's 2 major changes that absolutely need to be made, otherwise we've just entirely broken test kitchen for users:

  1. Caching should only occur for bento boxes. As Seth mentioned we don't know the state of other boxes. A lot / most out there don't have the tooling necessary to setup the NFS mount for the cache, and Test Kitchen just fails there.

  2. We need to whitelist operating systems instead of blacklisting. As I found out the other week trying to test chef on OmniOS we entirely forgot to blacklist Solaris variants so those no longer work with Test Kitchen. There's probably a few other operating systems we've also forgotten about so lets just assume we don't know what should be in a blacklist.

@cheeseplus
Copy link
Contributor Author

Core to this is that we absolutely must know that the end user has the correct/working combination of hypervisor and base box with compatible vmtools for the shared folder method to be viable. The problem is that we can't introspect this in a meaningful way without either building this matrix into kitchen-vagrant or having reach out to some external endpoint that has a compatibility list - neither of these are viable solutions to maintain.

At best we should only enable caching by default for base boxes that we know should at least have vmtools installed which would be things in chef/bento (excluding platforms known not to work like FreeBSD). Even then, it's still a gamble as the local hypervisor version may be fundamentally incompatible with the base box that is local to the system.

Shared folders via hypervisors are inherently brittle and the caching solution relies on them to be a lot more predictable and useable than they are in practice.

@charlesjohnson
Copy link

We can't ask Chef to pay to serve up repeat downloads of chef-client to people running Kitchen, unless we know that those people can't get chef-client onto their instances in any automated way other than by re-downloading chef-client from Chef.

How can we ensure that we don't waste Chef's resources, but still provide a great user experience?

@coderanger
Copy link
Contributor

@charlesjohnson Do we collect data on how many installs as a proportion of the whole are coming from kitchen+kitchen-vagrant? Because that seems like a good thing to have numbers on before we cite it as a reason to do or not do things.

@charlesjohnson
Copy link

Wouldn't that be great? But no, don't have a way to get that data afaik, paging @schisamo to confirm.

@coderanger
Copy link
Contributor

We could, at a minimum, have the install.sh/ps1 scripts record $TEST_KITCHEN and forward that along with the download somehow (header?). We would have to make TK expose the driver name in a similar way to be able to tell which driver is in use but that is very doable.

@coderanger
Copy link
Contributor

Or, now that it's all handled in mixlib-install, we could directly pass it the driver name and test_kitchen: true :)

@schisamo
Copy link
Contributor

So mixlib-install does have the ability to set User-Agent headers now:
https://github.com/chef/mixlib-install#user-agent-request-headers

We'll need to tweak test-kitchen to set this option properly.

There is also some work in flight to properly extract the User-Agent header (and a few others) in our Faslty logs. Once that is done we'll begin indexing the data in our ES cluster and being to explore this facet of our packages.chef.io data.

@coderanger
Copy link
Contributor

@schisamo That doesn't seem to be supported by Mixlib::Install::ScriptGenerator which is how almost everyone (I'm not sure anyone outside of Chef Software but me is even aware of theother mode) installs things in TK.

@schisamo
Copy link
Contributor

@coderanger Yeah Mixlib::Install::ScriptGenerator is more or less deprecated, we need to update Test Kitchen to use the newer idioms. I'll let @wrightp weigh in with the specifics.

@cheeseplus
Copy link
Contributor Author

I've outlined the short term fixes here: https://gist.github.com/cheeseplus/8a1871b837a31cd6c0113a382fe8e03d

@cheeseplus
Copy link
Contributor Author

I've co-opted two existing issues to track this:

@afiune
Copy link

afiune commented Mar 24, 2017

I'll take care of #293 👍

@cheeseplus
Copy link
Contributor Author

We've already addressed #296 and #293 is sufficient for the remaining work so closing this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants