When Scheduler encounters an internal error, Peer CANNOT download piece #3768

Open
karlhjm opened this issue Jan 15, 2025 · 1 comment
karlhjm commented Jan 15, 2025

Bug report:

I customized and rewrote the evaluate method, but a mistake in my code causes an internal error in the scheduler. As a result, the peer cannot get a parent to download pieces from, and the image download fails.
What's more, in this situation the peer cannot go back to source even when disableAutoBackSource=false.

Expected behavior:

If the scheduler encounters an internal error, the peer should fall back to downloading from the source (back-to-source).

How to reproduce it:

  1. Inject an error into the evaluate method, for example:
func (e *evaluatorBase) evaluate(parent *resource.Peer, child *resource.Peer, totalPieceCount int32) float64 {
	parentLocation := parent.Host.Network.Location
	parentIDC := parent.Host.Network.IDC
	childLocation := child.Host.Network.Location
	childIDC := child.Host.Network.IDC

	// index out of range here
	injectErr := make([]float64, 1)
	return injectErr[2] +
		finishedPieceWeight*e.calculatePieceScore(parent, child, totalPieceCount) +
		parentHostUploadSuccessWeight*e.calculateParentHostUploadSuccessScore(parent) +
		freeUploadWeight*e.calculateFreeUploadScore(parent.Host) +
		hostTypeWeight*e.calculateHostTypeScore(parent) +
		idcAffinityWeight*e.calculateIDCAffinityScore(parentIDC, childIDC) +
		locationAffinityWeight*e.calculateMultiElementAffinityScore(parentLocation, childLocation)
}
  2. Peer A downloads an image for the first time in the cluster and successfully goes back to source.
  3. Peer B downloads the same image. It is expected to choose peer A as its parent, but in practice it can neither download the image nor go back to source.
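
For anyone who wants to see the failure mode in isolation, the sketch below reproduces the same out-of-range panic outside the scheduler and shows one possible way to turn it into an ordinary error with `recover`. The names `evaluateFunc`, `safeEvaluate`, and `brokenEvaluate` are hypothetical and exist only in this example; I am not claiming the scheduler wraps evaluators this way.

```go
package main

import "fmt"

// evaluateFunc is a hypothetical stand-in for a custom evaluate method.
type evaluateFunc func() float64

// safeEvaluate runs fn and converts a panic into a returned error, so a
// buggy evaluator produces an error value instead of an unhandled panic.
func safeEvaluate(fn evaluateFunc) (score float64, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("evaluator panicked: %v", r)
		}
	}()
	return fn(), nil
}

// brokenEvaluate mimics the injected bug from step 1: it reads index 2
// of a slice whose length is 1.
func brokenEvaluate() float64 {
	injectErr := make([]float64, 1)
	return injectErr[2]
}

func main() {
	_, err := safeEvaluate(brokenEvaluate)
	fmt.Println(err)
	// prints: evaluator panicked: runtime error: index out of range [2] with length 1
}
```

The runtime error text is the same one that shows up in the peer log, which is how I confirmed the failure comes from my injected line.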

Environment:

  • Dragonfly version: 2.1.0
  • OS: Ubuntu 16.04
  • Kernel (e.g. uname -a):
  • Others:
@karlhjm karlhjm added the bug label Jan 15, 2025
karlhjm commented Jan 15, 2025

Here is the log from the peer when it cannot download images. The internal error is one I introduced myself in the 'evaluate' method.

Here are some key logs. In the end, the peer failed to download the piece and could not go back to source:

peer/peertask_conductor.go:845 receive peer packet failed: rpc error: code = Internal desc = runtime error: index out of range [2] with length 1
peer task failed, no need to wait first peer
wait first piece failed due to peer task failed
start stream task error: peer task failed: 1500
round trip error

[Attached screenshots: "peer err", "peer err2"]
