Skip to content

update tables #1515

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft

update tables #1515

wants to merge 3 commits into from

Conversation

RTann
Copy link
Contributor

@RTann RTann commented Apr 11, 2025

First: scannerlist is never used, so I figured why not remove it here

Second: we noticed an issue when there are two packages with the same name/version in the same image in different locations, the Indexer only outputs one of them. That is because the package table does not account for package_db (ie package location). When the coalescer gets all the packages for a layer, it will see the two unique packages with the same ID. The IndexReport's packages are keyed by the DB's package ID, so even though the coalescer knows about both packages, it is overwritten in the IndexReport.

@crozzy
Copy link
Contributor

crozzy commented Apr 11, 2025

I don't think we should do this, the package table should represent the canonical package and not be layer specific (the package_scanartifact should have any layer specific information), adding the package_db (which is often used just as the path to the package within the layer) will break that and also explode the size of that table for larger production environments.

@hdonnay
Copy link
Member

hdonnay commented Apr 11, 2025

Yeah, I agree: this is actually a feature of a layer. We do not want new packages for every path a package could exist at.

@RTann
Copy link
Contributor Author

RTann commented Apr 11, 2025

That is fair, but now I'm left wondering if there is another way to handle this while also minimizing changes. One option I can think of is adding an id BIGSERIAL column to the package_scanartifact table and then using that ID in the IndexReport instead of the ID from the package table. Perhaps that's better and also can give us more information about the specific entry that we found in the IndexReport (easy to go from this new ID to the package_scanartifact table which will tell us about the layer and such). What do you think @hdonnay @crozzy ?

This approach doesn't quite account for source packages, though, as we'd now give the package and source package the same ID...

@RTann RTann force-pushed the update-tables branch 2 times, most recently from 4c1f798 to 23b3372 Compare April 11, 2025 17:41
@@ -0,0 +1,5 @@
-- The package table only save one copy for each unique name+version of a package (and a few more fields).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore this comment for now

RTann added 3 commits April 11, 2025 10:42
rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED

Signed-off-by: RTann <[email protected]>
Signed-off-by: RTann <[email protected]>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED

Signed-off-by: RTann <[email protected]>
rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED

Signed-off-by: RTann <[email protected]>
@crozzy
Copy link
Contributor

crozzy commented Apr 11, 2025

The ol' DB IDs in the user facing responses comes back to haunt us again. Technically the keying the packages (and environments) by the Package.ID is not required, the index report just has to be consistent with itself.......

Another option is to append to the package_db string when you see the same package again in the coalescer, as in "you have this package installed at these two locations" (I haven't really thought through the implication of this I must admit).

@RTann
Copy link
Contributor Author

RTann commented Apr 12, 2025

The ol' DB IDs in the user facing responses comes back to haunt us again. Technically the keying the packages (and environments) by the Package.ID is not required, the index report just has to be consistent with itself.......

Another option is to append to the package_db string when you see the same package again in the coalescer, as in "you have this package installed at these two locations" (I haven't really thought through the implication of this I must admit).

I did consider just using some random IDs in the coalescer, which I'm sure is doable. I did like the idea of the IDs being closely coupled with the DB for debugging purposes (to know exactly which row in the table this entry is associated with). I don't think this is urgent, but it'd be nice to figure out a solution sometime in the near future. We can talk more offline at some point

@daynewlee
Copy link
Contributor

daynewlee commented Apr 21, 2025

I agree with using ir.Packages[pkg.ID+pkg.PackageDB] = pkg to add each package to the index report, if that's what the discussions are referring to. But I’m wondering if using pkg.ID+pkg.PackageDB could cause issues with layers — for example, when a package is added in one layer and then deleted in a subsequent layer.
Or maybe use pkg.ID+pkg.Filepath ?
@crozzy @RTann

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants