When Scheduler encounters an internal error, Peer CANNOT download piece #3768

karlhjm · 2025-01-15T08:29:46Z

Bug report:

I customized and rewrote the evaluate method, but a mistake of mine causes an internal error in the scheduler. As a result, the peer cannot get pieces from its parent, and the image download fails.
Moreover, in this situation the peer does not fall back to the source even though disableAutoBackSource=false.

Expected behavior:

If the scheduler encounters an internal error, the peer should fall back to downloading from the source.

How to reproduce it:

Inject an error into the evaluate method, for example:

func (e *evaluatorBase) evaluate(parent *resource.Peer, child *resource.Peer, totalPieceCount int32) float64 {
	parentLocation := parent.Host.Network.Location
	parentIDC := parent.Host.Network.IDC
	childLocation := child.Host.Network.Location
	childIDC := child.Host.Network.IDC

	// Injected bug: index out of range here.
	injectErr := make([]float64, 1)
	return injectErr[2] +
		finishedPieceWeight*e.calculatePieceScore(parent, child, totalPieceCount) +
		parentHostUploadSuccessWeight*e.calculateParentHostUploadSuccessScore(parent) +
		freeUploadWeight*e.calculateFreeUploadScore(parent.Host) +
		hostTypeWeight*e.calculateHostTypeScore(parent) +
		idcAffinityWeight*e.calculateIDCAffinityScore(parentIDC, childIDC) +
		locationAffinityWeight*e.calculateMultiElementAffinityScore(parentLocation, childLocation)
}
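A panic like this one aborts the whole scheduling call rather than just one candidate's score. As an illustration only, a scheduler could isolate custom evaluators by recovering from the panic and using a fallback score. safeEvaluate and evaluateFunc are hypothetical names, and the signature is simplified to string IDs instead of *resource.Peer; this is a sketch, not how Dragonfly's evaluator is wired.

```go
package main

import (
	"fmt"
	"log"
)

// evaluateFunc is a hypothetical, simplified signature for a custom scorer.
type evaluateFunc func(parentID, childID string, totalPieceCount int32) float64

// safeEvaluate runs eval and, if it panics (e.g. index out of range),
// returns the fallback score plus an error instead of crashing the request.
func safeEvaluate(eval evaluateFunc, parentID, childID string, totalPieceCount int32, fallback float64) (score float64, err error) {
	defer func() {
		if r := recover(); r != nil {
			score = fallback
			err = fmt.Errorf("evaluate panicked: %v", r)
		}
	}()
	return eval(parentID, childID, totalPieceCount), nil
}

func main() {
	// Reproduces the injected bug from the issue.
	buggy := func(parentID, childID string, total int32) float64 {
		injectErr := make([]float64, 1)
		i := 2
		return injectErr[i]
	}
	score, err := safeEvaluate(buggy, "peer-a", "peer-b", 10, 0)
	if err != nil {
		log.Printf("using fallback score %v: %v", score, err)
	}
}
```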

Peer A downloads an image for the first time in the cluster and successfully falls back to the source.
Peer B downloads the same image. It is expected to choose peer A as its parent, but in practice it can neither download the image nor fall back to the source.

Environment:

Dragonfly version: 2.1.0
OS: ubuntu16.04
Kernel (e.g. uname -a):
Others:


karlhjm · 2025-01-15T08:35:31Z

Here is the peer log from when it could not download images. The internal error was introduced deliberately by me in the 'evaluate' method.

Here are the key log lines. In the end, the peer failed to download the piece and could not fall back to the source.

peer/peertask_conductor.go:845 receive peer packet failed: rpc error: code = Internal desc = runtime error: index out of range [2] with length 1
peer task failed, no need to wait first peer
wait first piece failed due to peer task failed
start stream task error: peer task failed: 1500
round trip error

karlhjm added the bug label Jan 15, 2025
