Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow instance_reconfigure to resize a stopped instance #6774

Merged
merged 9 commits into from
Nov 19, 2024

Conversation

iximeow
Copy link
Member

@iximeow iximeow commented Oct 4, 2024

mostly just adapted #6321 to the new instance update endpoint machinery. same general behavior as outlined there: can only resize stopped instances, updated instance still returns the updated InstanceAndActiveVmm. same size validation as we do for a new instance, tests checking all that as well.

Also fixes #3769.

@hawkw hawkw self-requested a review October 4, 2024 18:05
@karencfv
Copy link
Contributor

karencfv commented Oct 4, 2024

Heya! Can we hold off on merging this until v12 please? It may break the TF provider which I've already released for v11 😅
and I'll be out for 2 weeks

@iximeow iximeow added this to the 12 milestone Oct 4, 2024
Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that we will currently only perform updates that change both the number of vCPUs and the amount of memory, which I think is wrong. I think we should be able to change either individually.

Besides that, this seems basically fine; it would be nice to avoid the nop boot disk update but if the diesel is too scary, whatever...

Comment on lines 1226 to 1233
// Do not allow setting sizes of running instances to ensure that if an
// instance is running, its resource usage matches what we record in the
// database.
const OK_TO_SET_SIZE_STATES: &'static [InstanceState] = &[
InstanceState::NoVmm,
InstanceState::Failed,
InstanceState::Creating,
];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It occurs to me that OK_TO_SET_SIZE_STATES and OK_TO_SET_BOOT_DISK_STATES are the same list of states, and what we really want in both cases is "states in which there is no VMM actively incarnating this instance". I kind of wonder if, rather than duplicating in both the instance-resize and instance-set-boot-disk queries, we should just have a constant on InstanceState that's like NOT_INCARNATED_STATES or similar, so that these queries and any future ones can all use the same set of states, and if we add new states that have "not incarnated" semantics, we can add them to the one definition of that list of states, instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on one hand i agree (and enough to go just do the thing, there's now InstanceState::NO_VMM_STATES*), but from the conversations we've had we've been pretty close to having caveouts for specific not-incarnated states and fields. namely "is it OK to set the boot disk in Creating", and does allowing that allow correctness issues (no) or a possibly-confusing-but-correct API result (yes). otoh "actions we can take on not-incarnated instances" is a much easier concept than having to think through each kind of action and state combination, so it's nicer in an axiomatic way.

*: after writing this i realize why you said NOT_INCARNATED rather than NO_VMM - because there is a NoVmm state which is very distinct from what this list is getting at. so i should go change that name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, my thinking is that we should avoid having weird exceptions for setting specific parameters unless it's absolutely necessary. I'd be quite surprised if we don't basically just end up being able to divide every possible instance_reconfigure into either "can do this only if there is no Real Life VMM Process incarnating the instance" and "can do this whenever", and I'd prefer to have to burn that bridge only once we come to it...

nexus/db-queries/src/db/datastore/instance.rs Outdated Show resolved Hide resolved
nexus/db-queries/src/db/datastore/instance.rs Outdated Show resolved Hide resolved
"better" here meaning "if we don't need to update the database, don't
update the database". if we would change the boot disk to the same
value, or we would change it from null to null, this change means we
will instead not even select the row for an update. since we still
`check_if_exists` we'll still fetch an Instance for the operation if
there's no update performed. so, if the database can detect that the
update would have been a no-op and elide it, this change is not
particularly useful. if the database would have to commit some state as
a result of an ineffectual `set`, then this is helpful!

more importantly, this is moderately more legible than smearing the
update logic and consistency checking between the Diesel query and
handling in the `NotUpdatedButExists` arm.

at the same time, Eliza rightfully pointed out that there are now (at
least) two copies of a list of states that are both roughly "there is no
VMM for this instance, so we can do things to its' database record".
extract that out to a const on `InstanceState` and name it
appropriately.

finally, the instance size change logic incorrectly rejected size
changes that only changed one of `(memory, cpus)`. fix that to allow
changing one or the other independently and add tests to catch a
future dingus (me) who might regress that somehow.
avoid ambiguity between "NoVmm the state" and "NO_VMM the states that do not have a VMM"
Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me overall! I had some last nitpicks, but no blocking concerns now. Sorry I didn't get back to you on the latest changes sooner!

nexus/db-model/src/instance.rs Show resolved Hide resolved
nexus/db-queries/src/db/datastore/instance.rs Show resolved Hide resolved
nexus/db-queries/src/db/datastore/instance.rs Show resolved Hide resolved
Comment on lines 1208 to 1212
// There should be no other reason the update fails on an
// existing instance.
return Err(err.bail(Error::internal_error(
"unable to reconfigure instance boot disk",
)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be worth adding more details to the internal message for this error, since it indicates pretty clearly that there was a bug --- it would be nice if the instance record was logged if this ever did happen? not a blocker though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

went for the instance ID instead of the full record because sometimes we add fields that need to be secret and i wouldn't want these to surprise someone by dumping a future instance key out somewhere. but yes, it's good to give a more concrete pointer this happened than just an otherwise-unnoted error out to a user.. done

nexus/db-queries/src/db/datastore/instance.rs Show resolved Hide resolved
nexus/db-queries/src/db/datastore/instance.rs Outdated Show resolved Hide resolved
assert_eq!(instance.ncpus.0, new_ncpus.0);
assert_eq!(instance.memory, new_memory);

// No harm in reverting to the original size one field at a time though:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for testing the case where only one value differs!

@iximeow iximeow merged commit d9153ff into main Nov 19, 2024
17 checks passed
@iximeow iximeow deleted the ixi/instance-resize branch November 19, 2024 00:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Want ability to resize the vcpu and memory of stopped instances
4 participants