Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read_exact_from and write_all_to from GuestMemory don't have the correct semantics #174

Open
sboeuf opened this issue Sep 27, 2021 · 6 comments

Comments

@sboeuf
Copy link

sboeuf commented Sep 27, 2021

When using the GuestMemory implementation of read_exact_from and write_all_to, I was expecting they would handle the retry, but they simply return an Error saying they couldn't read or write the entire buffer. That's the wrong behavior IMO, especially because of the naming that suggest they should behave exactly as the Rust traits Read and Write.

The proper solution would be to call underlying functions read_exact_from and write_all_to from the GuestRegionMmap implementation after we found the regions affected by the range, leading to a proper retry when not all data has been read or written.

/cc @andreeaflorescu @alexandruag @jiangliu

sboeuf pushed a commit to sboeuf/cloud-hypervisor-1 that referenced this issue Sep 27, 2021
Both read_exact_from() and write_all_to() functions from the GuestMemory
trait implementation in vm-memory are buggy. They should retry until
they wrote or read the amount of data that was expected, but instead
they simply return an error when this happens. This causes the migration
to fail when trying to send important amount of data through the
migration socket, due to large memory regions.

This should be eventually fixed in vm-memory, and here is the link to
follow up on the issue: rust-vmm/vm-memory#174

Signed-off-by: Sebastien Boeuf <[email protected]>
@bonzini
Copy link
Member

bonzini commented Sep 27, 2021

I agree, they should retry as long as read_from/write_to return Ok.

However, there's the problem of returning the number of bytes processed before an IOError. It would be best to change IOError to { error: io::Error, processed: usize } and implementing From<io::Error> for guest_memory::Error. That change is an API breakage, but somewhat unlikely to cause problems.

@andreeaflorescu
Copy link
Member

andreeaflorescu commented Sep 27, 2021

Is this issue related to #171?

@sboeuf
Copy link
Author

sboeuf commented Sep 27, 2021

@andreeaflorescu

Is this issue related to #171?

I don't think so.

sboeuf pushed a commit to cloud-hypervisor/cloud-hypervisor that referenced this issue Sep 27, 2021
Both read_exact_from() and write_all_to() functions from the GuestMemory
trait implementation in vm-memory are buggy. They should retry until
they wrote or read the amount of data that was expected, but instead
they simply return an error when this happens. This causes the migration
to fail when trying to send important amount of data through the
migration socket, due to large memory regions.

This should be eventually fixed in vm-memory, and here is the link to
follow up on the issue: rust-vmm/vm-memory#174

Signed-off-by: Sebastien Boeuf <[email protected]>
@alindima
Copy link
Contributor

That's the wrong behavior IMO, especially because of the naming that suggest they should behave exactly as the Rust traits Read and Write.

The rust trait documentation says that ErrorKind::Interrupted errors are handled with a retry. Any other IOError is not and results in returning an Error from the function. Moreover, it says that the contents of the buffer are undefined in this case.
See: https://doc.rust-lang.org/std/io/trait.Read.html#errors-3
So I'm not sure that this would be the right semantic for read_exact_from and write_all_to.
I don't think we should be hiding unrecoverable errors, because this call could block indefinitely

I agree, they should retry as long as read_from/write_to return Ok.

I agree as well, but this is already true. read_exact_from calls read_from only once, but read_from calls read in a try_access, which performs it in a loop as long as there isn't an error or until we're done reading. This is also the root-cause of this issue: #171

@bonzini
Copy link
Member

bonzini commented Sep 27, 2021

I don't think we should be hiding unrecoverable errors, because this call could block indefinitely

Indeed, absolutely not. So I'm not sure what can/should be changed, except possibly adjusting the error type.

@bonzini
Copy link
Member

bonzini commented Sep 27, 2021

Looking at the cloud-hypervisor commit, that seems like cloud-hypervisor is setting sockets to non-blocking mode? If so, that is certainly something that read_exact cannot handle, and neither should read_exact_from. cloud-hypervisor instead must handle WouldBlock correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants