Tracking issue for core: improve performance of IO backends #684

Open · 14 tasks

LtdJorge opened this issue Jan 14, 2025 · 3 comments

@LtdJorge
Contributor

LtdJorge commented Jan 14, 2025

This is a tracking issue for changes to the IO backends to improve their performance. Additions, suggestions and improvements welcome!

Rationale

Motivation and my current thoughts

I have been testing the unix and io_uring backends against each other, and the unix backend consistently has lower latency. Of course, my testing is pretty limited, since it only used the testing database file with the current benches. I got the same results for both backends on a SATA SSD and on NVMe, so I'm fairly sure the overhead of the io_uring calls currently outweighs the potential performance gains. Fortunately, I think there is a lot of room for improvement, including registering the files to the ring, registering buffers (this is pretty complex with the current Limbo codebase, but I'm working on something), turning the remaining syscalls into io_uring opcodes, improving O_DIRECT alignment sizes, and so much more. Many of these changes should also translate into better performance for the other backends.
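To make the file-registration idea concrete, here is a minimal sketch using the io-uring crate: register the fd once, then have SQEs refer to it by index. The buffer size, user_data value and error handling are illustrative, not Limbo's actual backend code.

```rust
use std::os::unix::io::AsRawFd;

use io_uring::{opcode, types, IoUring};

fn read_with_registered_fd(path: &str) -> std::io::Result<()> {
    let file = std::fs::File::open(path)?;
    let mut ring = IoUring::new(8)?;

    // Register the fd once; later SQEs refer to it by index via types::Fixed,
    // skipping the per-request fd lookup/refcount work in the kernel.
    ring.submitter().register_files(&[file.as_raw_fd()])?;

    let mut buf = vec![0u8; 4096];
    let read = opcode::Read::new(types::Fixed(0), buf.as_mut_ptr(), buf.len() as u32)
        .offset(0)
        .build()
        .user_data(0x42);

    unsafe { ring.submission().push(&read).expect("submission queue full") };
    ring.submit_and_wait(1)?;

    let cqe = ring.completion().next().expect("completion missing");
    println!("read {} bytes", cqe.result());
    Ok(())
}
```

Registered buffers (with ReadFixed/WriteFixed opcodes) would follow the same pattern, but they require the buffers to stay pinned for the lifetime of the registration, which is where it gets complicated with the current codebase.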

However, before turning to implementation, better observability and benchmarking are needed. I will open an issue for each in a moment and edit this. Right now, the Limbo CLI uses log with env_logger, and core has criterion with pprof-rs. However, the set of benches and the current log points are not very exhaustive. Since Limbo is in heavy development, this is fine, but I think improving the situation now would benefit even the development process, as there is frequently a need to debug some behavior or to compare the performance of implementations.
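As a rough sketch of what wiring pprof-rs into a criterion bench looks like (the bench name and target are hypothetical, not existing Limbo benches):

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use pprof::criterion::{Output, PProfProfiler};

// Hypothetical benchmark target; the body would call into the code under test.
fn bench_read_page(c: &mut Criterion) {
    c.bench_function("read_page", |b| {
        b.iter(|| {
            // exercise the IO path here
        })
    });
}

criterion_group! {
    name = benches;
    // Sample at 100 Hz and emit a flamegraph alongside criterion's report.
    config = Criterion::default().with_profiler(PProfProfiler::new(100, Output::Flamegraph(None)));
    targets = bench_read_page
}
criterion_main!(benches);
```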

For IO testing, we also need tests that stress high concurrency. I will have new hardware in a few days, where I can work on this better than on my personal system, which is tuned for responsiveness.

The performance improvements don't have to be isolated to the IO backends; it's just that I have worked mostly on those and have a deeper understanding of them. If anyone wants to add other parts of the system, I welcome the additions to this issue and will rename it.

Steps

An initial list of steps, in rough order

  • Align Node bindings to the completion model with an asynchronous Node API (Improve I/O efficiency with WebAssembly on Node #271)
  • Switch to the IOCP/IORing API on Windows (Switch to IOCP API on Windows for asynchronous I/O #41)
  • io_uring: register the file descriptors to the ring
  • Use logical logging? (Logical logging? #2)
  • Finish support for TPC-H benchmarking (OLAP) (TPC-H support #36)
  • Add support for TPC-E benchmarking (OLTP) (TPC-E support #685)
  • Consider adding PGO by default (Profile-Guided Optimization (PGO) benchmark report #78)
  • Support many in-flight IO requests. Right now, my understanding of the VDBE is pretty lacking, so I'm not sure how Limbo multiplexes IO requests, if at all, or whether it depends on the application setting up multiple threads.
  • io_uring: register buffers to the ring. This feature will depend on all of the following.
  • Make decisions around aligned allocations at runtime, when running natively, to optimize Direct I/O for the block device(s) used (see the sketch after this list). Goes hand in hand with the next item.
  • Make allocations explicit and optimize the BufferData size
  • Implement page readahead (Page readahead #203). I have some ideas for this, but it requires rewrites to a few components in VDBE/storage to provide hints to the IO layer; it might need help from Pere, Jussi, etc.
  • Implement a per-connection memory allocator (Feature: per-connection memory allocator configuration #523). This is very interesting and not that difficult, but sadly allocator-api2 doesn't have Rc and Arc support yet. I have asked about it here.
  • Re-architect the storage layer. Remove the page cache? [1][2][3]
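Regarding the aligned-allocation item above, a minimal sketch of what runtime alignment discovery could look like; the sysfs path, device name and sizes are illustrative, and a real implementation would first resolve the block device backing the database file:

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};

// Reads the device's logical block size from sysfs, falling back to 4096.
fn logical_block_size(dev: &str) -> usize {
    std::fs::read_to_string(format!("/sys/block/{dev}/queue/logical_block_size"))
        .ok()
        .and_then(|s| s.trim().parse().ok())
        .unwrap_or(4096)
}

// O_DIRECT requires the buffer address, length and file offset to all be
// multiples of the device's logical block size.
fn aligned_buffer(len: usize, align: usize) -> (*mut u8, Layout) {
    let layout = Layout::from_size_align(len, align).expect("bad layout");
    let ptr = unsafe { alloc_zeroed(layout) };
    assert!(!ptr.is_null());
    (ptr, layout)
}

fn main() {
    let align = logical_block_size("nvme0n1"); // illustrative device name
    let (ptr, layout) = aligned_buffer(64 * 1024, align);
    // ... issue O_DIRECT reads/writes into `ptr` here ...
    unsafe { dealloc(ptr, layout) };
}
```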

Issues that require or would be affected by this issue

Footnotes

  1. Swizzling (LeanStore)

  2. Umbra

  3. Virtual-Memory Assisted Buffer Management (LeanStore)

@PThorpe92
Contributor

Some great stuff here! ❤️

I also made some observations when redesigning the io_uring module a couple of weeks ago, and although it wasn't the direction the project wanted to go (it involved batching all writev SQEs into one syscall submission during an SQ overflow event), I walked away with some good insight into areas that could be improved. Pretty much all of which you have listed here already :)

  • io_uring: register FD's and buffers to the ring.
  • optimize Direct I/O for the block device(s) used.
  • optimize BufferData size

I am also gaining more context about the rest of the codebase daily, but it's great to have somewhere to note observations or open questions around IO performance 👍 I'll be sure to keep up with this thread.

Note: for reference, I believe that #570 is indeed fixed on main at the moment.

@LtdJorge
Contributor Author

I also made some observations when re-designing the io_uring module a couple weeks ago, and although it wasn't the direction the project wanted to go (involved batching all writev sqe's into 1 syscall submission during an event of SQ overflow), I walked away with some good insight on areas that could be improved.

Yup, I had a look at your PR a few days ago. If you come up with any other improvements, feel free to add them.

@PThorpe92
Contributor

I think we should evaluate the efficiency, or just the general use, of vectored IO operations with the io_uring backend. I personally still think there is a scenario where we can group contiguous writes and submit them as a single syscall. For anything larger than one page, at least, there seems to be a pretty big opportunity for a perf increase.
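To sketch what that grouping could look like (the types, names and fixed page size below are illustrative, not Limbo's actual structures):

```rust
const PAGE_SIZE: u64 = 4096;

struct PendingWrite {
    offset: u64,   // absolute file offset, page aligned
    data: Vec<u8>, // exactly one page of payload
}

/// Groups pending writes into runs of pages that are contiguous on disk, so
/// each run can be submitted as a single writev (or io_uring Writev SQE) with
/// one iovec per page, instead of one syscall/SQE per page.
fn coalesce(mut writes: Vec<PendingWrite>) -> Vec<Vec<PendingWrite>> {
    writes.sort_by_key(|w| w.offset);
    let mut groups: Vec<Vec<PendingWrite>> = Vec::new();
    for w in writes {
        match groups.last_mut() {
            // The new write starts exactly where the current run ends.
            Some(group) if group.last().unwrap().offset + PAGE_SIZE == w.offset => {
                group.push(w);
            }
            // Gap (or first write): start a new run.
            _ => groups.push(vec![w]),
        }
    }
    groups
}
```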
