Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove some synchronization in Reference to avoid deadlock #20694

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

pshipton
Copy link
Member

Issue #13823

@pshipton
Copy link
Member Author

jenkins test sanity plinux jdk8

@pshipton
Copy link
Member Author

@babsingh
Copy link
Contributor

babsingh commented Nov 28, 2024

The deadlock happens because the two locks are acquired in different order in the two code paths.

20:55:13  "Finalizer thread" Id=29 BLOCKED on java.lang.ref.ReferenceQueue@49d657dd owned by "Thread-11" Id=33
20:55:13  	at java.lang.ref.ReferenceQueue.enqueue(ReferenceQueue.java:179) // Acquire ReferenceQueue's monitor
20:55:13  	at java.lang.ref.Reference.enqueueImpl(Reference.java:147) // Acquire Reference's monitor
20:55:13  


20:55:13  "Thread-11" Id=33 BLOCKED on java.util.logging.LogManager$LoggerWeakRef@596d6465 owned by "Finalizer thread" Id=29
20:55:13  	at java.lang.ref.Reference.dequeue(Reference.java:195) // Acquire Reference's monitor
20:55:13  	at java.lang.ref.ReferenceQueue.poll(ReferenceQueue.java:85) // Acquire ReferenceQueue's monitor
20:55:13  	at java.util.logging.LogManager.drainLoggerRefQueueBounded(LogManager.java:1130)
20:55:13  	at java.util.logging.LogManager.addLogger(LogManager.java:1160)
20:55:13  	at java.util.logging.LogManager.demandLogger(LogManager.java:556)
20:55:13  	at java.util.logging.Logger.demandLogger(Logger.java:455)
20:55:13  	at java.util.logging.Logger.getLogger(Logger.java:502)
20:55:13  	at TestLogConfigurationDeadLock$AddLogger.run(TestLogConfigurationDeadLock.java:175)

Removing the synchronized block in dequeue might change Reference.state while enqueue is being executed. This can lead to incorrect behavior.

@babsingh
Copy link
Contributor

babsingh commented Nov 28, 2024

The correct fix is to avoid the different order in which the two locks are acquired. Example: In ReferenceQueue, we can avoid invoking Reference.dequeue() from within the synchronized(this) block.

public Reference<? extends T> poll () {
        Reference ref;

        /* Optimization to return immediately and not synchronize if there is nothing in the queue */
        if(empty) {
                return null;
        }
        synchronized(this) {
                if(empty) {
                        return null;
                }
                ref = references[head];
                references[head++] = null;
                if(head == references.length) {
                        head = 0;
                }
                if(head == tail) {
                        empty = true;
                }
        }
        ref.dequeue(); // Moved outside the synchronized block
        return ref;
}
public Reference<? extends T> remove(long timeout) throws IllegalArgumentException, InterruptedException {
        if (timeout < 0) throw new IllegalArgumentException();

        Reference ref;
        synchronized(this) {
                if(empty) {
                        wait(timeout);
                        if(empty) return null;
                }
                ref = references[head];
                references[head++] = null;
                if(head == references.length) {
                        head = 0;
                }
        }
        ref.dequeue(); // Moved outside the synchronized block
        synchronized(this) { // Not sure if it matters to notifyAll() after invoking dequeue()
                if(head == tail) {
                        empty = true;
                } else {
                        notifyAll();
                }
        }
        return ref;
}

@dmitripivkine Does it matter if we move the dequeue operation at the end? Also, is there a requirement to notifyAll after dequeue or can we notifyAll first and then invoke dequeue?

@keithc-ca
Copy link
Contributor

correct fix is to avoid the different order in which the two locks are acquired

Because this is a public class, we don't have complete control over the synchronization order.

Removing synchronization in isEnqueued() is safe because there are no values of state that might be observed that weren't possible before.

All of the uses of dequeue() are in ReferenceQueue - none of them require synchronization within dequeue().

@babsingh
Copy link
Contributor

All of the uses of dequeue() are in ReferenceQueue - none of them require synchronization within dequeue().

Once the synchronized block in Reference.dequeue is removed, there will be data contention in updating Reference.state when ReferenceQueue.poll or ReferenceQueue.remove is invoked concurrently with Reference.enqueue.

@pshipton
Copy link
Member Author

I don't think the change to remove() is correct in #20694 (comment)

I've pushed a new commit (I can squash before merge) to ensure lock ordering and consistency between enqueue and dequeue.

@babsingh
Copy link
Contributor

babsingh commented Nov 28, 2024

I don't think the change to remove() is correct in #20694 (comment)

head and empty fields need to be updated in a single synchronized block. Spreading the operations across two synchronized blocks will lead to inconsistency in updating those fields. My initial proposal, before updating my previous comment, was the following:

public Reference<? extends T> remove(long timeout) throws IllegalArgumentException, InterruptedException {
        if (timeout < 0) throw new IllegalArgumentException();

        Reference ref;
        boolean notifyAll = true;
        synchronized(this) {
                if(empty) {
                        wait(timeout);
                        if(empty) return null;
                }
                ref = references[head];
                references[head++] = null;
                if(head == references.length) {
                        head = 0;
                }
                if(head == tail) {
                        empty = true;
                        notifyAll = false;
                }
        }
        ref.dequeue(); // Moved outside the synchronized block
        if (notifyAll) {
                synchronized(this) {
                        notifyAll();
                }
        }
        return ref;
}

@pshipton
Copy link
Member Author

jenkins test sanity plinux jdk8

/*[PR CMVC 96472] deadlock loading sun.misc.Cleaner */
/*[PR 102259] call Cleaner.clean(), do not enqueue */
if (reference instanceof Cleaner) {
reference.dequeue();
((Cleaner)reference).clean();
return true;
reference.setEnqueued();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this, and the other a few lines below, should be removed, but it's what was done previously.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want this call to setEnqueued(), nor the one on line 213.

/*[PR CMVC 96472] deadlock loading sun.misc.Cleaner */
/*[PR 102259] call Cleaner.clean(), do not enqueue */
if (reference instanceof Cleaner) {
reference.dequeue();
((Cleaner)reference).clean();
return true;
reference.setEnqueued();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want this call to setEnqueued(), nor the one on line 213.

}

/**
* Called when a Reference has been added to it's ReferenceQueue.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please correct the spelling of "its".

@pshipton
Copy link
Member Author

Previous PR build https://openj9-jenkins.osuosl.org/job/PullRequest-OpenJ9/6597/ passed.

@pshipton
Copy link
Member Author

jenkins compile plinux jdk8

@pshipton
Copy link
Member Author

jenkins compile plinux jdk11

@pshipton
Copy link
Member Author

pshipton commented Dec 2, 2024

Just some known problems in the test results, the non-network failures passed in reruns.

@pshipton pshipton force-pushed the refqueue branch 2 times, most recently from 7ffa7c4 to 2ecb6c9 Compare December 2, 2024 15:05
@pshipton
Copy link
Member Author

pshipton commented Dec 2, 2024

jenkins compile plinux jdk8,jdk21

@pshipton
Copy link
Member Author

pshipton commented Dec 2, 2024

Updated.

@pshipton
Copy link
Member Author

pshipton commented Dec 2, 2024

jenkins compile amac jdk21

@pshipton
Copy link
Member Author

pshipton commented Dec 2, 2024

Created ibmruntimes/openj9-openjdk-jdk#900 and a backport for jdk23. I accidentally pushed the same change to jdk21. We shouldn't remove the constructor until z/OS and valhalla are also updated.

T tempReferent = referent;
synchronized(this) {
/* Static order for the following code (DO NOT CHANGE) */
tempQueue = queue;
final ReferenceQueue tempQueue = queue;
queue = null;
if (state == STATE_ENQUEUED || tempQueue == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this test could be reordered to (sometimes) avoid a volatile read of state?

			if ((null == tempQueue) || (STATE_ENQUEUED == state)) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and again

reprocess();
}
tempQueue.enqueue(this);
if (null != tempReferent) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lines 200 and 204 should be consistent: the null (or the constant) go on the left or the right, but not a mix of both, please.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

STATE_ENQUEUED is also a constant: it should go on the left just like null.

reprocess();
}
tempQueue.enqueue(this);
if (null != tempReferent) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

STATE_ENQUEUED is also a constant: it should go on the left just like null.

Comment on lines 242 to 244
ReferenceQueue(int value) {
this();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This constructor can also be removed now that ibmruntimes/openj9-openjdk-jdk#899 and back-ports have been merged.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet, see #20694 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I didn't consider z/OS or Valhalla.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants