
AnchorStart and AnchorEnd do not match start and end of **chunks** as expected. #1731

@hippietrail

AnchorStart and AnchorEnd need to work on chunks, because chunks are the subdivisions of a document over which PatternLinters/ExprLinters run. I don't think "chunk" is a properly defined concept, though. Instead, the documentation for AnchorStart and AnchorEnd refers to a "token stream", which I don't think is a defined concept either:

/// A [`Step`] which will match only if the cursor is over the first word-like of a token stream.
/// It will return that token.

The importance of the chunk concept is seen in lint_group.rs:

impl Linter for LintGroup {
    fn lint(&mut self, document: &Document) -> Vec<Lint> {
        let mut results = Vec::new();

        // Normal linters
        for (key, linter) in &mut self.linters {
            if self.config.is_rule_enabled(key) {
                results.extend(linter.lint(document));
            }
        }

        // Pattern linters
        for chunk in document.iter_chunks() {
            let Some(chunk_span) = chunk.span() else {
                continue;
            };

            let chunk_chars = document.get_span_content(&chunk_span);
            let config_hash = self.hasher_builder.hash_one(&self.config);
            let key = (chunk_chars.into(), config_hash);

            let mut chunk_results = if let Some(hit) = self.chunk_expr_cache.get(&key) {
                hit.clone()
            } else {
                let mut pattern_lints = Vec::new();

                for (key, linter) in &mut self.expr_linters {
                    if self.config.is_rule_enabled(key) {
                        pattern_lints.extend(run_on_chunk(linter, chunk, document.get_source()));
                    }
                }

                // Make the spans relative to the chunk start
                for lint in &mut pattern_lints {
                    lint.span.pull_by(chunk_span.start);
                }

                self.chunk_expr_cache.put(key, pattern_lints.clone());
                pattern_lints
            };

            // Bring the spans back into document-space
            for lint in &mut chunk_results {
                lint.span.push_by(chunk_span.start);
            }

            results.append(&mut chunk_results);
        }

        results
    }
}

A lint group is an arbitrary collection of linters. Crucially, it is what holds the set of all linters the user has enabled when a document is linted through the LSP or any of the plugins.

You can see that a document is linted in two steps:

  1. First, run all of the "normal" linters over the document as a whole.
  2. Then iterate over the chunks and run the pattern linters (including Exprs) over each one.

Why do we need anchors at the start and end of a chunk?
Because Patterns and Exprs run over chunks, and sometimes a pattern needs to know whether it's at the start or end of its input, just as ^ and $ do in regexes.
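
To make the scope question concrete, here is a toy sketch. It is not Harper's code: the tokens are plain strings, and I'm assuming chunks are delimited by punctuation, which may not be exactly how iter_chunks() splits a document. The names split_chunks and chunk_start_indices are made up for illustration.

```rust
/// Toy punctuation test. Assumption: these tokens end a chunk.
fn is_chunk_break(tok: &str) -> bool {
    matches!(tok, "," | "." | ";")
}

/// Split a toy token stream into chunks at punctuation tokens.
fn split_chunks<'a>(stream: &'a [&'a str]) -> Vec<&'a [&'a str]> {
    stream
        .split(|tok| is_chunk_break(tok))
        .filter(|chunk| !chunk.is_empty())
        .collect()
}

/// Indices into the whole stream where a chunk-scoped start anchor
/// should match: the first word after each chunk break.
fn chunk_start_indices(stream: &[&str]) -> Vec<usize> {
    let mut starts = Vec::new();
    let mut at_start = true;
    for (i, tok) in stream.iter().enumerate() {
        if is_chunk_break(tok) {
            at_start = true;
        } else if at_start {
            starts.push(i);
            at_start = false;
        }
    }
    starts
}
```

An anchor scoped to the whole stream could only ever match at index 0. An anchor scoped to chunks should match at the first word of every chunk, which is what a pattern that runs chunk by chunk needs.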

For instance, many patterns are only applicable if not preceded by or not followed by certain words.

To check that a specific prev/next word is absent, there are two possibilities:

  1. There is a prev/next word, but it's not one of the words we're looking for. (There will be whitespace too.)
  2. There is no prev/next word at all (and no whitespace), because we're at the start or end. That means the start or end of the chunk, not of the entire document (or "token stream"), since chunks are what Patterns and Exprs operate on.

The only way to be able to check for condition 2. is to be able to check if we're at the beginning or end of the chunk.
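
As a rough sketch of the check I mean, here is the "not preceded by" case written against a toy chunk. This is plain Rust, not the Expr API: the chunk is a slice of words with whitespace tokens left out, and not_preceded_by and the banned list are hypothetical names for illustration only.

```rust
/// True if the word at `idx` is not preceded by any word in `banned`
/// within this chunk. Comparison ignores ASCII case.
fn not_preceded_by(chunk: &[&str], idx: usize, banned: &[&str]) -> bool {
    match idx.checked_sub(1).and_then(|prev_idx| chunk.get(prev_idx)) {
        // Condition 2: there is no previous word, so we are at the
        // start of the chunk. This is the case that needs an anchor.
        None => true,
        // Condition 1: there is a previous word, and it must not be
        // one of the banned words.
        Some(prev) => !banned.iter().any(|b| b.eq_ignore_ascii_case(prev)),
    }
}
```

With a slice the chunk start is just index 0. In a pattern there is no index to inspect, so the only way to express the None arm is an anchor that matches at the chunk boundary.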


This is why my linter to change "reach out" to "contact" is incomplete.

Please let me know if any of this is still not clear.
