Implement a ComponentsRepo for alpm/pacman package databases#50
Implement a ComponentsRepo for alpm/pacman package databases#50jlebon merged 4 commits intocoreos:mainfrom
ComponentsRepo for alpm/pacman package databases#50Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces support for alpm/pacman package databases by implementing a ComponentsRepo. While the implementation is a great addition, it has critical security vulnerabilities. It lacks sufficient validation and resource limits when processing untrusted package metadata, making it vulnerable to decompression bombs and Out-of-Memory (OOM) crashes due to unbounded file reads and decompression of mtree files. Malformed inputs can also trigger panics. Additionally, there are suggestions to improve code clarity and general error handling, specifically replacing expect() with proper error handling to prevent panics. Addressing these issues is crucial for safely processing untrusted root filesystems.
|
I haven't done a formal review yet, but just immediately what catches my eye is the number of added deps. At the very least, we'd need these feature gated. But even so, can you verify if those crates have features we can turn off? |
|
Thanks for pointing this out, I didn't notice that. I guess that's what you get for depending on a full-fledged parsing library. I had a quick look at parsing the files myself and ended up with an ~100 LOC implementation including comments. Contrary to the crate (where I stumbled over https://gitlab.archlinux.org/archlinux/alpm/alpm/-/issues/268), it successfully parses my local package database that contains quite a few packages, some of them build from AUR. I think this is preferrable to the crate. |
8e89564 to
ff4dd32
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces support for ALPM/pacman package databases by implementing a new ComponentsRepo. This is a significant and well-executed feature addition. The implementation in src/components/alpm.rs is well-structured, featuring a dedicated parser for ALPM's desc and files formats, and includes good test coverage. The refactoring of common utility functions like canonicalize_parent_path and calculate_stability into src/utils.rs is a great improvement for code reuse and maintainability. The new end-to-end tests for Arch Linux images are also a valuable addition. The overall code quality is high. I have a few minor suggestions to further refine the implementation.
|
@jlebon Hey, I think this has progressed pretty well and I'd consider this nearly complete. I plan to add some more unit tests and consider some of the AI improvements above, but apart from that, I think it's ready for a (human) review now :-) One thing I noted that bugs me a little is the |
Nice, that sounds great! I need to put some time aside to review this more in-depth but at a high-level it looks good. A few things:
|
|
I'm also now adding a (To be clear, that wasn't prompted by anything necessarily wrong in this submission. But assuming you're using a coding agent, it's good to have a look over it. :) ) |
|
@marcoh00 Hi! Are you still interested in pursuing this? |
|
@jlebon Hey, sorry for taking this long to finish this! I was hoping to be able to finish this before my vacation, which didn't work out unfortunately! I'll be back next week and plan to finish it then. |
|
@jlebon The commit history is cleaned up, the test fixture is now a single .tar.gz and the code now contains calls to After a brief look, I don't think the alpm repository has enough information for the "weak claims", as it doesn't know about any hashes, but I haven't had a look at the rationale for these changes, so I really cannot tell right now if there's something useful to do here. |
jlebon
left a comment
There was a problem hiding this comment.
Partial review. I haven't reviewed the parsing bits and the tests yet. Don't be alarmed by the number of comments. :) The approach looks good and I have mostly nits.
Yeah, don't worry about weak claims. We can add that if there's a use case for it in the Arch ecosystem. E.g. do bootc Arch images do some of the same moves that is done in Fedora?
7ac3f41 to
210ed75
Compare
|
@jlebon I addressed your points, implemented a simple parser for |
ComponentsRepo for alpm/pacman package databasesComponentsRepo for alpm/pacman package databases
jlebon
left a comment
There was a problem hiding this comment.
OK, full review now. More nits mostly. :)
Thanks for continuing to push this forward!
src/components/alpm.rs
Outdated
| const REPO_NAME: &str = "alpm"; | ||
|
|
||
| /// These paths are searched for a local ALPM database. First is the default path of Arch Linux, | ||
| /// second is currently used by the popular ghcr.io/bootcrew/arch-bootc image. |
There was a problem hiding this comment.
So now that we do parse pacman.conf, presumably we don't actually need to hardcode the /usr/lib/sysimage/ path in? I'd just prefer to keep semi/non-official APIs out of chunkah if we can.
Some functionality such as path canonicalization is useful for more types of repositories besides rpm. As such, move these functions from the rpm repository to the `utils` module and make them publicy callable. Additionally, make `calculate_stability` return a f64 directly and include a safety comment on why the `.iter().min().unwrap()` call won't fail.
Provides a test fixture with a local package database consisting of the packages in `base`. Moreover, provide an E2E test for chunking Arch Linux-based images.
a141b2c to
22fe977
Compare
|
@jlebon Just pushed my recent set of changes. Feel free to take a look. |
jlebon
left a comment
There was a problem hiding this comment.
LGTM! Thank you so much for this!
I took the liberty of just force-pushing and squashing the one nit I had which was to use the same tense for error contexts as the rest of the codebase. Also tweaked one or two errors while I did that. Let me know if some of those should be tweaked! We can do that in a follow-up.
|
Actually, let me just do a final Gemini run in case it finds any major issues. /gemini review |
There was a problem hiding this comment.
Code Review
This pull request adds support for Arch Linux by implementing an ALPM component repository that parses package databases and integrates with pacman configuration. It also refactors shared utilities for path canonicalization and stability calculation from the RPM module into a common utility file. Review feedback recommends optimizing performance by sharing the path resolution cache across packages, increasing the buffer size for reading version files, and making the ALPM file parser more robust against leading empty lines.
The ALPM component repository can parse local package databases managed by Arch Linux' pacman package manager. This allows chunkah to chunk container images based on Arch Linux.
…e same srpm are processed
This is a first draft and pretty much untested. I will add some more unit and end-to-end tests in the next few days.
Closes #45