libimage: fix manifest race for AnnotateInstance() and RemoveInstance()#2504
Conversation
Neither AnnotateInstance() or RemoveInstance() use the lock so we were still open to many races. Fixes: https://issues.redhat.com/browse/RHEL-21291 Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Share code as they do the same thing. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
This does nothing it is just overhead. The lookupManifestList() call returns a new image which when then use and that also already calls image.reload() internally. Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Reviewer's GuideAdded manifest-specific locks in AnnotateInstance and RemoveInstance to avoid races and refactored reload logic by extracting reloadID to streamline saveAndReload and reload methods. Sequence diagram for manifest list mutation with lockingsequenceDiagram
participant Caller
participant ManifestList
participant Locker
participant Storage
Caller->>ManifestList: AnnotateInstance/RemoveInstance(d, options)
ManifestList->>Locker: LockerForImage(store, ID)
Locker->>ManifestList: lock()
ManifestList->>Storage: reload()
Storage-->>ManifestList: latest data
ManifestList->>ManifestList: mutate manifest list
Locker->>ManifestList: unlock()
ManifestList-->>Caller: result
Class diagram for ManifestList reload logic refactor and lockingclassDiagram
class ManifestList {
+saveAndReload()
+reload()
+reloadID(listID string)
+AnnotateInstance(d, options)
+RemoveInstance(d)
image
list
}
class Locker {
+Lock()
+Unlock()
}
ManifestList --> Locker : uses
ManifestList --> "1" image : has
ManifestList --> "1" list : has
Class diagram for Locker usage in AnnotateInstance and RemoveInstanceclassDiagram
class ManifestList {
+AnnotateInstance(d, options)
+RemoveInstance(d)
}
class Locker {
+Lock()
+Unlock()
}
ManifestList --> Locker : acquires lock
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
I will test tomorrow morning to verify this fixes the race on my end |
|
Tested locally. Without the patch, I consistently see 2-3 failures when creating 100 manifests in a tight loop. With the patch, I see 0 failures. |
|
LGTM |
There was a problem hiding this comment.
Hey @Luap99 - I've reviewed your changes - here's some feedback:
- Refactor the repeated lock acquisition and reload boilerplate in AnnotateInstance and RemoveInstance into a shared helper to reduce duplication and improve maintainability.
- Verify that using lookupManifestList in reloadID fully replaces the previous m.image.reload + lookup sequence to guarantee the same data freshness, since dropping the explicit reload step could introduce stale-state issues.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Refactor the repeated lock acquisition and reload boilerplate in AnnotateInstance and RemoveInstance into a shared helper to reduce duplication and improve maintainability.
- Verify that using lookupManifestList in reloadID fully replaces the previous m.image.reload + lookup sequence to guarantee the same data freshness, since dropping the explicit reload step could introduce stale-state issues.
## Individual Comments
### Comment 1
<location> `libimage/manifest_list.go:661` </location>
<code_context>
return nil
}
+ locker, err := manifests.LockerForImage(m.image.runtime.store, m.ID())
+ if err != nil {
+ return err
</code_context>
<issue_to_address>
Consider refactoring the repeated lock, reload, and unlock logic into a helper method to reduce boilerplate and improve maintainability.
Here’s one way to collapse all of the repeated “lock, reload, defer‐unlock” boilerplate into a small helper. You still get the same behavior (locking + initial reload) and leave both `saveAndReload` and `reload` untouched.
```go
// add this near the top of manifest_list.go
// withLockAndReload grabs the image‐specific mutex, reloads the
// in‐memory state, then runs `action`. Unlock happens via defer.
func (m *ManifestList) withLockAndReload(action func() error) error {
locker, err := manifests.LockerForImage(m.image.runtime.store, m.ID())
if err != nil {
return err
}
locker.Lock()
defer locker.Unlock()
if err := m.reload(); err != nil {
return err
}
return action()
}
```
Then reduce your two methods to:
```go
func (m *ManifestList) AnnotateInstance(d digest.Digest, options *ManifestListAnnotateOptions) error {
if options == nil {
return nil
}
return m.withLockAndReload(func() error {
if len(options.OS) > 0 {
if err := m.list.SetOS(d, options.OS); err != nil {
return err
}
}
if len(options.OSVersion) > 0 {
if err := m.list.SetOSVersion(d, options.OSVersion); err != nil {
return err
}
}
return m.saveAndReload()
})
}
func (m *ManifestList) RemoveInstance(d digest.Digest) error {
return m.withLockAndReload(func() error {
if err := m.list.Remove(d); err != nil {
return err
}
return m.saveAndReload()
})
}
```
This removes the duplicated lock/unlock+reload in each method while preserving all original functionality.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| return nil | ||
| } | ||
|
|
||
| locker, err := manifests.LockerForImage(m.image.runtime.store, m.ID()) |
There was a problem hiding this comment.
issue (complexity): Consider refactoring the repeated lock, reload, and unlock logic into a helper method to reduce boilerplate and improve maintainability.
Here’s one way to collapse all of the repeated “lock, reload, defer‐unlock” boilerplate into a small helper. You still get the same behavior (locking + initial reload) and leave both saveAndReload and reload untouched.
// add this near the top of manifest_list.go
// withLockAndReload grabs the image‐specific mutex, reloads the
// in‐memory state, then runs `action`. Unlock happens via defer.
func (m *ManifestList) withLockAndReload(action func() error) error {
locker, err := manifests.LockerForImage(m.image.runtime.store, m.ID())
if err != nil {
return err
}
locker.Lock()
defer locker.Unlock()
if err := m.reload(); err != nil {
return err
}
return action()
}Then reduce your two methods to:
func (m *ManifestList) AnnotateInstance(d digest.Digest, options *ManifestListAnnotateOptions) error {
if options == nil {
return nil
}
return m.withLockAndReload(func() error {
if len(options.OS) > 0 {
if err := m.list.SetOS(d, options.OS); err != nil {
return err
}
}
if len(options.OSVersion) > 0 {
if err := m.list.SetOSVersion(d, options.OSVersion); err != nil {
return err
}
}
return m.saveAndReload()
})
}
func (m *ManifestList) RemoveInstance(d digest.Digest) error {
return m.withLockAndReload(func() error {
if err := m.list.Remove(d); err != nil {
return err
}
return m.saveAndReload()
})
}This removes the duplicated lock/unlock+reload in each method while preserving all original functionality.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Luap99, sourcery-ai[bot] The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@containers/podman-maintainers PTAL |
|
/lgtm |
Neither AnnotateInstance() or RemoveInstance() use the lock so we were still open to many races.
The reason the should fix the podman manifest add race is because that one also always calls AnnotateInstance().
Fixes: https://issues.redhat.com/browse/RHEL-21291
Also some two small commits to reduce the overhead of reload a bit.
Summary by Sourcery
Prevent races in manifest instance annotation and removal by acquiring a per-image lock and reloading the manifest list, and refactor reload logic into a shared helper.
Bug Fixes:
Enhancements: