server.Handle returns EOF when repeatedly creating new servers and clients #3

Open
ValentaTomas opened this issue Nov 8, 2024 · 0 comments

I'm running into problems with the path mounts, and I hit the same problems when trying to implement similar functionality directly with the go-nbd library.

When I call NewDirectPathMount in a loop, the server sometimes exits with an EOF error, while the client fails to connect with "device or resource busy".

You should be able to use the following snippet to replicate the behavior:

package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"os/signal"
	"sync"

	"github.com/pojntfx/go-nbd/pkg/backend"
	"github.com/pojntfx/go-nbd/pkg/client"
	"github.com/pojntfx/go-nbd/pkg/server"
	"github.com/pojntfx/r3map/pkg/mount"
	"github.com/pojntfx/r3map/pkg/utils"
)

const blockSize = 512

func main() {
	device := backend.NewMemoryBackend(make([]byte, blockSize*1))

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	done := make(chan os.Signal, 1)
	signal.Notify(done, os.Interrupt)

	go func() {
		<-done

		cancel()
	}()

	for i := 0; ; i++ {
		select {
		case <-ctx.Done():
			return
		default:
		}

		log.Printf("[%d] starting mock nbd server\n", i)

		err := MockNbd(ctx, device, i)
		if err != nil {
			log.Printf("[%d] failed to mock nbd: %v\n", i, err)

			return
		}
	}
}

func MockNbd(ctx context.Context, device backend.Backend, index int) error {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	devicePath, err := utils.FindUnusedNBDDevice()
	if err != nil {
		return fmt.Errorf("failed to get device: %w", err)
	}

	deviceFile, err := os.Open(devicePath)
	if err != nil {
		return fmt.Errorf("failed to open device: %w", err)
	}
	defer deviceFile.Close()

	mnt := mount.NewDirectPathMount(device, deviceFile, &server.Options{
		MaximumBlockSize:   uint32(blockSize),
		MinimumBlockSize:   uint32(blockSize),
		PreferredBlockSize: uint32(blockSize),
	}, &client.Options{
		BlockSize: uint32(blockSize),
	})

	var wg sync.WaitGroup

	wg.Add(1)
	go func() {
		defer wg.Done()

		err := mnt.Wait()
		if err != nil {
			log.Printf("[%d] failed to wait: %v\n", index, err)
		}
	}()

	go func() {
		<-ctx.Done()

		log.Println("Exiting gracefully")

		err := mnt.Close()
		if err != nil {
			log.Printf("[%d] failed to close: %v\n", index, err)
		}
	}()

	err = mnt.Open()
	if err != nil {
		return fmt.Errorf("failed to open: %w", err)
	}

	log.Println("Resource available on", deviceFile.Name())

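	// Tear the mount down as soon as it is up; the goroutine above reacts to
	// the cancellation and calls mnt.Close().
	cancel()

	// wg only tracks the goroutine that calls mnt.Wait(); the goroutine that
	// calls mnt.Close() is not waited on before the next iteration starts.
	wg.Wait()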

	return nil
}

Here are logs from a run showing the error:

2024/11/08 00:55:27 Resource available on /dev/nbd0
2024/11/08 00:55:27 Exiting gracefully
2024/11/08 00:55:27 [34] starting mock nbd server
2024/11/08 00:55:27 Resource available on /dev/nbd0
2024/11/08 00:55:27 Exiting gracefully
2024/11/08 00:55:27 [35] starting mock nbd server
2024/11/08 00:55:28 Resource available on /dev/nbd0
2024/11/08 00:55:28 Exiting gracefully
2024/11/08 00:55:28 [36] starting mock nbd server
2024/11/08 00:55:28 Resource available on /dev/nbd0
2024/11/08 00:55:28 Exiting gracefully
2024/11/08 00:55:28 [37] starting mock nbd server
2024/11/08 00:55:28 Resource available on /dev/nbd0
2024/11/08 00:55:28 Exiting gracefully
2024/11/08 00:55:28 [38] starting mock nbd server
2024/11/08 00:55:28 Resource available on /dev/nbd0
2024/11/08 00:55:28 Exiting gracefully
2024/11/08 00:55:28 [39] starting mock nbd server
Failed to handle server: EOF
Failed to connect client: device or resource busy
2024/11/08 00:55:28 [39] failed to wait: device or resource busy

The "Failed to handle server: EOF" line comes from a log statement I added here.

The problem persists even when I change how the NBD devices are acquired so that the same device is not reused on every iteration.
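For illustration, one way to avoid reusing a single device is to rotate through device paths instead of always taking the first free one. This is only a minimal sketch: it assumes the devices are named /dev/nbd0 through /dev/nbdN-1, it does not check whether a device is actually free, and the devicePool type and its Next method are hypothetical helpers, not part of go-nbd or r3map.

```go
package main

import (
	"fmt"
	"sync"
)

// devicePool hands out NBD device paths in round-robin order.
// Hypothetical helper for illustration; not part of go-nbd or r3map.
type devicePool struct {
	mu    sync.Mutex
	next  int // index of the device to hand out next
	count int // number of NBD devices assumed to exist
}

// Next returns the next device path and advances the cursor,
// wrapping around once count devices have been handed out.
func (p *devicePool) Next() string {
	p.mu.Lock()
	defer p.mu.Unlock()

	path := fmt.Sprintf("/dev/nbd%d", p.next)
	p.next = (p.next + 1) % p.count

	return path
}

func main() {
	// 4096 matches the number of NBD devices on the test host.
	pool := &devicePool{count: 4096}

	for i := 0; i < 3; i++ {
		fmt.Println(pool.Next())
	}
}
```

In the reproduction above, a call like pool.Next() would take the place of utils.FindUnusedNBDDevice(), so that consecutive iterations never touch the same device.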

Digging deeper, the problem seems to occur somewhere during the transmission phase (https://github.com/pojntfx/go-nbd/blob/main/pkg/server/nbd.go#L332). Do you have any ideas or insights into why this is happening? Could this be a problem with the host setup?

This was tested on Ubuntu 22.04.4 LTS with kernel 6.5.0-1025-gcp and 4096 created NBD devices.

Also, thank you for creating this library, go-nbd, and writing the thesis.
