Skip to content

Implement a ComponentsRepo for alpm/pacman package databases#50

Merged
jlebon merged 4 commits intocoreos:mainfrom
marcoh00:feature/alpm
Apr 1, 2026
Merged

Implement a ComponentsRepo for alpm/pacman package databases#50
jlebon merged 4 commits intocoreos:mainfrom
marcoh00:feature/alpm

Conversation

@marcoh00
Copy link
Copy Markdown
Contributor

This is a first draft and pretty much untested. I will add some more unit and end-to-end tests in the next few days.

Closes #45

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for alpm/pacman package databases by implementing a ComponentsRepo. While the implementation is a great addition, it has critical security vulnerabilities. It lacks sufficient validation and resource limits when processing untrusted package metadata, making it vulnerable to decompression bombs and Out-of-Memory (OOM) crashes due to unbounded file reads and decompression of mtree files. Malformed inputs can also trigger panics. Additionally, there are suggestions to improve code clarity and general error handling, specifically replacing expect() with proper error handling to prevent panics. Addressing these issues is crucial for safely processing untrusted root filesystems.

@jlebon
Copy link
Copy Markdown
Member

jlebon commented Feb 12, 2026

I haven't done a formal review yet, but just immediately what catches my eye is the number of added deps. At the very least, we'd need these feature gated. But even so, can you verify if those crates have features we can turn off?

@marcoh00
Copy link
Copy Markdown
Contributor Author

Thanks for pointing this out, I didn't notice that. I guess that's what you get for depending on a full-fledged parsing library.

I had a quick look at parsing the files myself and ended up with an ~100 LOC implementation including comments. Contrary to the crate (where I stumbled over https://gitlab.archlinux.org/archlinux/alpm/alpm/-/issues/268), it successfully parses my local package database that contains quite a few packages, some of them build from AUR. I think this is preferrable to the crate.

@marcoh00 marcoh00 force-pushed the feature/alpm branch 3 times, most recently from 8e89564 to ff4dd32 Compare February 14, 2026 21:04
@marcoh00
Copy link
Copy Markdown
Contributor Author

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for ALPM/pacman package databases by implementing a new ComponentsRepo. This is a significant and well-executed feature addition. The implementation in src/components/alpm.rs is well-structured, featuring a dedicated parser for ALPM's desc and files formats, and includes good test coverage. The refactoring of common utility functions like canonicalize_parent_path and calculate_stability into src/utils.rs is a great improvement for code reuse and maintainability. The new end-to-end tests for Arch Linux images are also a valuable addition. The overall code quality is high. I have a few minor suggestions to further refine the implementation.

@marcoh00
Copy link
Copy Markdown
Contributor Author

@jlebon Hey, I think this has progressed pretty well and I'd consider this nearly complete. I plan to add some more unit tests and consider some of the AI improvements above, but apart from that, I think it's ready for a (human) review now :-)

One thing I noted that bugs me a little is the now_secs() test function, which is now duplicated three times for alpm, rpm and utils tests. That is, in addition to utils::get_current_epoch(). Maybe we should just get rid of them and use get_current_epoch.unwrap() for tests?

@jlebon
Copy link
Copy Markdown
Member

jlebon commented Feb 17, 2026

I had a quick look at parsing the files myself and ended up with an ~100 LOC implementation including comments.

Nice, that sounds great!

I need to put some time aside to review this more in-depth but at a high-level it looks good. A few things:

  1. We'll need to clean up the git history here.
  2. If you'd like, we can start in a separate prep PR with the commits needed to move functions to utils.rs.
  3. tree: add logging using tracing crate #54 added logging throughout the codebase. Could you add similar log points to what now exists in the other ComponentsRepo backends?
  4. For the test fixtures, perhaps let's tar.xz that up and then unit tests can unpack it in a tempdir?

@jlebon
Copy link
Copy Markdown
Member

jlebon commented Feb 17, 2026

I'm also now adding a CONTRIBUTING.md in #61. It includes guidelines wrt AI contributions which may be relevant for this PR.

(To be clear, that wasn't prompted by anything necessarily wrong in this submission. But assuming you're using a coding agent, it's good to have a look over it. :) )

@jlebon
Copy link
Copy Markdown
Member

jlebon commented Mar 2, 2026

@marcoh00 Hi! Are you still interested in pursuing this?

@marcoh00
Copy link
Copy Markdown
Contributor Author

marcoh00 commented Mar 9, 2026

@jlebon Hey, sorry for taking this long to finish this! I was hoping to be able to finish this before my vacation, which didn't work out unfortunately!

I'll be back next week and plan to finish it then.

@marcoh00
Copy link
Copy Markdown
Contributor Author

@jlebon The commit history is cleaned up, the test fixture is now a single .tar.gz and the code now contains calls to tracing. I didn't have time to look at the changes to the core trait in detail by now and hope to catch up in the coming days.

After a brief look, I don't think the alpm repository has enough information for the "weak claims", as it doesn't know about any hashes, but I haven't had a look at the rationale for these changes, so I really cannot tell right now if there's something useful to do here.

Copy link
Copy Markdown
Member

@jlebon jlebon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review. I haven't reviewed the parsing bits and the tests yet. Don't be alarmed by the number of comments. :) The approach looks good and I have mostly nits.

Yeah, don't worry about weak claims. We can add that if there's a use case for it in the Arch ecosystem. E.g. do bootc Arch images do some of the same moves that is done in Fedora?

@marcoh00 marcoh00 force-pushed the feature/alpm branch 4 times, most recently from 7ac3f41 to 210ed75 Compare March 24, 2026 14:07
@marcoh00
Copy link
Copy Markdown
Contributor Author

@jlebon I addressed your points, implemented a simple parser for pacman.conf and made the database validation a bit more robust (a local alpm db must have an ALPM_DB_VERSION file now). Right now, it prints a warning if the version that is read doesn't match the one the code is tested with, but does not fail. I'd suggest to keep it like this and maybe change it if issues should pile up in the future. I had a quick look at the spec history and didn't see anything obvious that would've been a breaking change.

@marcoh00 marcoh00 marked this pull request as ready for review March 24, 2026 14:08
@marcoh00 marcoh00 changed the title WIP: Implement a ComponentsRepo for alpm/pacman package databases Implement a ComponentsRepo for alpm/pacman package databases Mar 24, 2026
Copy link
Copy Markdown
Member

@jlebon jlebon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, full review now. More nits mostly. :)
Thanks for continuing to push this forward!

const REPO_NAME: &str = "alpm";

/// These paths are searched for a local ALPM database. First is the default path of Arch Linux,
/// second is currently used by the popular ghcr.io/bootcrew/arch-bootc image.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So now that we do parse pacman.conf, presumably we don't actually need to hardcode the /usr/lib/sysimage/ path in? I'd just prefer to keep semi/non-official APIs out of chunkah if we can.

Some functionality such as path canonicalization is useful for more types of
repositories besides rpm. As such, move these functions from the rpm repository
to the `utils` module and make them publicy callable.

Additionally, make `calculate_stability` return a f64 directly and include a safety
comment on why the `.iter().min().unwrap()` call won't fail.
Provides a test fixture with a local package database consisting of the packages
in `base`. Moreover, provide an E2E test for chunking Arch Linux-based images.
@marcoh00 marcoh00 force-pushed the feature/alpm branch 2 times, most recently from a141b2c to 22fe977 Compare March 29, 2026 16:32
@marcoh00
Copy link
Copy Markdown
Contributor Author

@jlebon Just pushed my recent set of changes. Feel free to take a look.

Copy link
Copy Markdown
Member

@jlebon jlebon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you so much for this!

I took the liberty of just force-pushing and squashing the one nit I had which was to use the same tense for error contexts as the rest of the codebase. Also tweaked one or two errors while I did that. Let me know if some of those should be tweaked! We can do that in a follow-up.

@jlebon
Copy link
Copy Markdown
Member

jlebon commented Apr 1, 2026

Actually, let me just do a final Gemini run in case it finds any major issues.

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds support for Arch Linux by implementing an ALPM component repository that parses package databases and integrates with pacman configuration. It also refactors shared utilities for path canonicalization and stability calculation from the RPM module into a common utility file. Review feedback recommends optimizing performance by sharing the path resolution cache across packages, increasing the buffer size for reading version files, and making the ALPM file parser more robust against leading empty lines.

The ALPM component repository can parse local package databases
managed by Arch Linux' pacman package manager.
This allows chunkah to chunk container images based on Arch Linux.
@jlebon jlebon enabled auto-merge (rebase) April 1, 2026 02:44
@jlebon jlebon merged commit 1161a92 into coreos:main Apr 1, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support alpm/pacman package manager

2 participants