Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finalize IPC behaviour #335

Closed
wants to merge 4 commits into from
Closed

Finalize IPC behaviour #335

wants to merge 4 commits into from

Conversation

rkuklik
Copy link
Contributor

@rkuklik rkuklik commented Jun 26, 2024

This is probably largest part of my IPC journey. Currently very much work in progress. It will probably consist of three parts:

  1. Sketch out new types and API in general and implement alongside old system.
  2. Fuzz the hell out of it. Memory allocation and type safety very much depend on correctness of Serialize impl and is crucial to get right.
  3. If two previous steps work, out with the old, in with the new.
  4. Profit

Given that this will basically a rewrite of all communication, I would like opinion of other people before merging, as this is quite broad.

Copy link
Owner

@LGFae LGFae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may take me a bit longer than usual because I nearing the end of my final semester as an undergrad, so I apologize for that.

common/src/ipc/serde.rs Show resolved Hide resolved
common/src/mmap.rs Show resolved Hide resolved
@rkuklik
Copy link
Contributor Author

rkuklik commented Jun 27, 2024

I may take me a bit longer than usual because I nearing the end of my final semester as an undergrad, so I apologize for that.

Of course, take your time. On that note, next two weeks I will also be less available, so no pressure. Also, good luck, hope you do well.

@@ -33,18 +33,17 @@ impl Mmap {
#[must_use]
pub fn create(len: usize) -> Self {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we're implementing this all manually instead of using memmap2? It would remove a lot of complexity and potential unsafe-ness on swww's end and also improve compile times by improving crate parallelism.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea. Seems nice though, I would love to use it.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it will improve compile times because we already depend on rustix for other things (so we need to compile it anyway). Also, specially release builds, tends to take more time during linking, which may actually get worse with more crates to link.

Generally speaking, I don't like including dependencies only to use one or two functions in them. Our memmap logic is pretty simple, I don't think it's that problematic to keep it here.

Finally, I believe some things would break on the daemon if we were to do the switch. We can probably work around it, but it would involve more work.

use super::types2::Vec2;

/// Type for managing access in [`Serialize`] and [`Deserialize`]
pub struct Cursor<T> {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems nearly identical to std::io::Cursor (just with a usize vs a u64) - why don't we just use that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at it, but didn't see a reason to use it. It implements BufRead and Write, but I would want to panic anyway with the way I allocate memory (with size on Serialize). I would also implement extension traits to ease tagged slice writing, so I just wrote couple of lines and left it at that.

io::Cursor was the first thing I looked at, but disliked being unable to directly access it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't know if I'm following - io::Cursor has get_ref() and get_ref_mut() methods to get a reference to the underlying data, so you can directly access it.

And are you saying that when writing a wrong amount of data to the Cursor you'd rather panic than get back information about how much was written? That would make sense as a rational, I guess, but feels like less ideal for a user in the long run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read correctly, get_* gets underlying value, not remaining. There is remaining_slice, which is nightly for some reason. So I just implemented bare minimum of what would be required.

And with the panics, given that (currently) reading and writing is constrained to common crate, I wanted to implement fuzzing for correctness. So I just opted for panics.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And are you saying that when writing a wrong amount of data to the Cursor you'd rather panic than get back information about how much was written? That would make sense as a rational, I guess, but feels like less ideal for a user in the long run.

I disagree with the last part; I think it makes more sense for the user to panic. This isn't a recoverable error anyway: if our serialization/deserialization fails, something fundamentally wrong has happened (for example, maybe a different program is trying to write something malicious to the swww socket). It also makes for a much easier-to-debug error, since it's just a panic.

}

/// Serializes data structure into byte slice.
pub trait Serialize {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the benefit of writing these traits and all this logic ourselves instead of just using the serde crate with their derive macros and some well-supported format (e.g. postcard)? This seems very error-prone, just due to how much bit-messing you have to do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't be against it, @LGFae expressed that he wouldn't want to include it. See comment.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh. @LGFae could we revisit that? Serde does, by default, copy a lot of data around, but utilizing slices and #[serde(borrow)] should allow you to avoid all of that. Or is it something else that's an issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would simplify the codebase a great deal. +1 from me.

Copy link
Owner

@LGFae LGFae Jul 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very interesting, @itsjunetime, I did not know about #[serde(borrow)].

Well, first, I don't really care about using a well-supported format. There was never any plans of supporting multiple client implementations. On the contrary, I actually prefer a simple, custom-made format just for ourselves (time will tell whether I will regret this decision later).

I don't know how easy it is to transfer over the current implementation to serde. We actually had a serde implementation first, then I switched to rkyv, then back to serde with bitcode, then just to bitcode directly, and then to the current implementation. Each of these moves was motivated by eliminating dependencies and making things more efficient.

From my understanding, serde's greatest benefit would come from using another library that actually implements the serialization routines. I think if we were to use serde ourselves we would have to re-implement those anyway, or pull along yet another dependency besides serde.

In any case, I am willing to check what this would look like, but I think we should leave it outside the scope of this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As June wrote, there is a crate, which has a similar format to my current one. Usually, you can just derive desired traits, so no major refactoring needed.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, my bad, I thought postcard was like, a standard format like protobuf or something. It seems to be a lot more flexible than that.

Sure, we can try it out! But I do think we should do it in a separate PR, because I feel like it will be a pretty big change.

($($name:ident $num:literal),* $(,)?) => {
pub enum Code {
#[derive(Copy, Clone, PartialEq, Eq)]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it would be worth it for another dependency, but the strum crate has derive macros that can do this conversion between a repr for you (FromRepr) - also, if you mark Code as #[repr(u64)], you can just do e.g. Code::ReqPing as u64 for free without needing the into fn.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to convert back from u64, which (to my knowledge) cannot be done with num as Code. Also, owner of this project expressed his desire not to add dependencies.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like I said before, I specially don't like pulling in dependencies just to use a single function or feature, a single time. In fact, if I was writing this PR, I wouldn't even have used a macro at all. I would have instead written everything manually, though I can see the benefits of the macro @rkuklik wrote.

Code::ReqQuery => Self::Query,
Code::ReqKill => Self::Kill,
Code::ReqClear => {
let mmap = value.shm.as_ref().expect("clear request must contain data");
Copy link

@itsjunetime itsjunetime Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't IPCMessage be an enum to encode these invariants (of Clear and Img requiring shm to be Some)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should, I think I left a todo comment there, just was too lazy to implement it (yet).

let info = Box::<[Info]>::deserialize(&mut Cursor::new(mmap.slice()));
Self::Info(info)
}
_ => unreachable!("`Response` builder reached invalid state"),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this state invalid? imo I feel like it should at least be mentioned in a comment, if not the message itself, why those are invalid states (and maybe the unreachable! could explicitly say what state it's in right now for debugging purposes). If these invalid states could be prevented the type system somehow, though, that would obvs be best.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message consists of code and optional payload. When I read via socket and shove it in IpcMessage, I then let this message be interpreted by calling code. If you get a message and expect it to be a Response, getting Request breaks this expectation. I didn't find elegant way to do this, but at least I will try to improve the diagnostics.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you can at least build a better error message, I think that would greatly improve debugging and figuring out what went wrong when people report issues.

let image = ImageRequest::deserialize(&mut Cursor::new(mmap.slice()));
Self::Img(image)
}
_ => unreachable!("`Request` builder reached invalid state"),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as below about the unreachable! with the invalid states

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above. Thank you very much for reviewing, I would wait for owners opinion on additional dependencies and the get to work. If you would want to add some changes to this PR yourself, can you, or do I have to add you somehow?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also like to thank you for you input, @itsjunetime. I've had a hard time reviewing this thoroughly, so your help is much appreciated.

@rkuklik
Copy link
Contributor Author

rkuklik commented Jul 18, 2024

Hello, sorry for the delay. Given that my previous PR broke swww, I would close this PR and reopen it after I fix the breakage.

I have looked into postcard and must say, +1 for me. I consider dependencies one of the biggest wins with Rust, and would like to introduce serde and postcard to common crate. serde is goto solution for the ecosystem as a whole and postcard seems decent to me.

What do you think?

@LGFae
Copy link
Owner

LGFae commented Jul 21, 2024

Hey there. I am going to slowly get back into this over the course of the following weeks.

I've thought a lot about using serde and dependencies in general after your and @itsjunetime's comments. There's really just two problems I have with dependencies:

  1. they increase compilation time -- this isn't super troublesome, because most users will only pay the price for this once. It does make development more annoying, but not by much since we can use debug builds.
  2. they increase resource consumption in the daemon. This is the big one. Because the daemon will run "forever", I would really like for it to be as efficient as reasonably possible. For example, right now, after setting a static image, we unmap the underlying memory, such that RSS reports that just around 2.5MB are being used. I would really like to keep this behavior, and it's not going to be possible with higher-level libraries.

So, in conclusion, if you think using serde and postcard is possible while retaining the ability to unmap the underlying image bytes' memory, let's go for it.

@rkuklik
Copy link
Contributor Author

rkuklik commented Jul 22, 2024

Hello. I will close this and get back to it when I fix the regression with socket. As to the two points:

  1. they increase compilation time -- this isn't super troublesome, because most users will only pay the price for this once. It does make development more annoying, but not by much since we can use debug builds.

About speed of compilation, that depends. Cargo can concurrently compile multiple crates, so the biggest slowdown is in downloading them (with the now resolver, shouldn't be a big deal). When recompiling, you only recompile your crates, not the whole dependency graph, so it shouldn't impact development (can speed it up even), with the exception of link-time optimization (which can be turned off) and macro expansion (which shouldn't be an issue, only ones we want to be using are for serde).

  1. they increase resource consumption in the daemon. This is the big one. Because the daemon will run "forever", I would really like for it to be as efficient as reasonably possible. For example, right now, after setting a static image, we unmap the underlying memory, such that RSS reports that just around 2.5MB are being used. I would really like to keep this behavior, and it's not going to be possible with higher-level libraries.

I am not sure, why should they increase resource usage? You can stumble upon a poorly written crate, sure, but I would say using widely used and well maintained library can improve correctness and efficiency of your code. Big selling point of Rust is the ability to use high level abstractions without the penalties usually associated with them (Iterator would be a great example).

I think it is stupid to use crates like console_error_panic_hook which is about 20 lines of trivial
code, but for complex tasks it is beneficial to lean into other people's work.

So, in conclusion, if you think using serde and postcard is possible while retaining the ability to unmap the underlying image bytes' memory, let's go for it.

Sure. I will fix the regression and get to it. Thank you and @itsjunetime for your work.

@rkuklik rkuklik closed this Jul 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants