Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

follow-up to #3343: review uses of bufio.Scanner #3370

Open
CerberusQc opened this issue Aug 27, 2024 · 3 comments
Open

follow-up to #3343: review uses of bufio.Scanner #3370

CerberusQc opened this issue Aug 27, 2024 · 3 comments

Comments

@CerberusQc
Copy link
Contributor

What is the problem you're trying to solve

As discussed in #3343, the usage of bufio.Scanner when we do not have the control over the length of the message processed by the scanner can put the Scanner in a I/O error state failing silently and blocking the process sending a message.

@apostasie recommend that the usage of bufio.Scanner across the project is revisited.

Describe the solution you'd like

A solution is suggested with #3366 but might not be a "one size fit all solution".

Additional context

Documentation

The documentation state the following:

Scanning stops unrecoverably at EOF, the first I/O error, or a token too large to fit in the Scanner.Buffer. When a scan stops, the reader may have advanced arbitrarily far past the last token. Programs that need more control over error handling or large tokens, or must run sequential scans on a reader, should use bufio.Reader instead.

@apostasie
Copy link
Contributor

apostasie commented Aug 27, 2024

Note:

For a cursory review:
bufio.NewScanner is used for:

  1. processing soci command IO (processSociIO)
  2. processing notation command IO (processNotationIO)
  3. processing cosign command IO (processCosignIO)
  4. read /proc/net/* files (ReadStatsFileData)
  5. read /etc/os-release (DistroName)
  6. process env vars files (parseEnvVars) provided via --env-file
  7. compose log pipe tagger (PipeTagger.Run)
  8. process /etc/hosts files (parseHostsButSkipMarkedRegion)

I believe that:

  • 1, 2, and 3 are PROBABLY fine, but I defer to people with expertise on soci and cosign
  • 4, 5, 8 SHOULD be fine... under non-abnormal circumstances...
  • 6 seems fine (I would not expect the user to break the scanner with an env file...)

That leaves 7, which is certainly a problem

Hey @CerberusQc

May I interest you in patching https://github.com/containerd/nerdctl/blob/main/pkg/composer/pipetagger/pipetagger.go#L97-L110 ? :-)

@CerberusQc
Copy link
Contributor Author

Sorry for the late response, I am currently moving into a new house, I won't be able to work on this before 1-2 weeks.

I am wondering if it would be better to move the function into it's own struct to be more generic and allow to stay DRY in that case. If so, I am not really familiar with GO folders structure, where should I put the generic class ?

I was thinking about something like

type LogProcessor struct {
	wg       *sync.WaitGroup
	reader   io.Reader
	dataChan chan string
}

func NewLogProcessor(wg *sync.WaitGroup, reader io.Reader, dataChan chan string) *LogProcessor {
	return &LogProcessor{
		wg:       wg,
		reader:   reader,
		dataChan: dataChan,
	}
}

func (p *LogProcessor) Process() {
	defer p.wg.Done()
	defer close(p.dataChan)
	r := bufio.NewReader(p.reader)

	var err error

	for err == nil {
		var s string
		s, err = r.ReadString('\n')

		if len(s) > 0 {
			p.dataChan <- strings.TrimSuffix(s, "\n")
		}

		if err != nil && err != io.EOF {
			log.L.WithError(err).Error("failed to read log")
		}
	}
}

This is just a draft and did not run it, naming need improvement.

@apostasie
Copy link
Contributor

No problem. Congrats on the new house!

On your code suggestion - let's chat on your (future) PR when you get there.
On the location, what about under pkg/logging/processor.go? (and then let see what maintainers think)

See you in a few weeks then ;).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants