Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework handling of assignment words #176

Merged
merged 5 commits into from
Sep 17, 2023
Merged

Rework handling of assignment words #176

merged 5 commits into from
Sep 17, 2023

Conversation

Niols
Copy link
Member

@Niols Niols commented Jun 18, 2023

This pull request reworks the way assignment words are handled in Morbig. It is to be noted that Morbig was correctly promoting parsing tokens from WORD to ASSIGNMENT_WORD and thus yielding reasonable CSTs. What this PR fixes is only the processing of assignment words in word CSTs only.

The most important change is the disappearance of the WordAssignmentWord in word_cst. I think this constructor made very little sense in itself because assignment words were already handled in the CST as CmdPrefix_AssignmentWord constructors carrying a name and a word. There were probably two reasons why this word CST constructor existed:

  1. Because it might be practical for client libraries (eg. static analysers) to already have access to assignment words in other places (eg. as arguments of alias or make). I think this is a bad argument: in the semantics of Shell, those words are not assignment words, it is only the utilities that decide to see them as such, and they should do so in the way that is practical for them. We can offer a helper to do that nicely in word CSTs.

  2. As an intermediary state to store words that could be assignment words before reaching the proper assignment words recognition. I think that makes sense but (a) it clutters the output type with an extra useless constructor, (b) Morbig actually leaves quite a lot of those in its output, and (c) it breaks tilde prefixes recognition that was based on the presence of such constructors.

I think it is better to get rid of it altogether. This PR therefore removes the WordAssignmentWord constructor of word_cst, rewrites Assignment.recognize_assignment to not rely on it, and gets rid of PrelexerState.recognize_assignment as well.

Additionally, I took this opportunity to fix the tilde prefixes recognition mechanism. The main problem was that tilde prefixes recognition changes depending on whether we are in an assignment word, in which case it splits the word on colon characters and recognises tilde prefixes in all the components. This was being overzealous. This recognition is now made in two steps: during prelexing, only standard tilde prefixes (at the beginning of words) are recognised. During parsing, when yielding a CmdPrefix_AssignmentWords, tilde prefixes are recognised again, this time fully.

I updated the expectation files and checked them one by one. I am happy with the result. A lot of them actually have to do with tests on alias and are not very interesting when it comes to parsing assignment words. There are two main interesting test files:

  • good/2.6-word-expansions/2.6.1-tilde-prefix/login.t is the one discussed in Strip tilde prefixes from WordTildePrefix #164 (comment) which led to writing Should assignment words be recognised on the right-hand side of a command name? #165. After this change, the last lines:

    echo W=~knuth/foo/bar
    echo X=~kn:~/bin:~darkvador/secret
    echo Y=~kn:/bonjour:~/bonsoir:and/this/one
    echo Z=/bonjour:~kn:~/bonsoir:and/this/one
    

    are correctly handled as containing no tilde prefixes.

  • tests/golden/good/2.9-shell-commands/2.9.1-simple-commands/assignment-words.t/input.sh, funnily enough, explains the subtelty handled in this PR:

    CC=gcc make
    make CC=cc
    ln -s /bin/ls "X=1"
    "./X"=1 echo
    
    # On line 1, the word CC=gcc is recognized as a word assignment of gcc
    # to CC because CC is a valid name for a variable, and because CC=gcc
    # is written just before the command name of the simple command
    # make. On line 2, the word CC=cc is not promoted to a word assignment
    # because it appears after the command name of a simple command. On
    # line 4, since "./X" is not a valid name for a shell variable, the
    # word "./X=1" is not promoted to a word assignment and is interpreted
    # as the command name of a simple command.
    

    However, Morbig, until now, was precisely recognising CC=cc as an assignment words. Not anymore.

@yurug WDYT?

@Niols Niols force-pushed the rework-assignment-words branch 2 times, most recently from 10147c5 to c4b22a9 Compare June 19, 2023 00:08
@Niols Niols marked this pull request as ready for review June 19, 2023 00:16
@Niols
Copy link
Member Author

Niols commented Jun 19, 2023

CI is failing on the Windows runner but that seems to be for unrelated reasons. Other PRs have it failing as well (see #177).

@Niols Niols requested a review from yurug June 19, 2023 08:54
@Niols
Copy link
Member Author

Niols commented Aug 16, 2023

@yurug Any chance you might look at this?

@Niols Niols force-pushed the rework-assignment-words branch 2 times, most recently from d62450c to 67e955b Compare August 16, 2023 12:47
@Niols Niols self-assigned this Aug 16, 2023
@Niols Niols force-pushed the rework-assignment-words branch from 67e955b to 789fd2d Compare September 17, 2023 15:17
@Niols Niols force-pushed the rework-assignment-words branch from 789fd2d to a52431c Compare September 17, 2023 16:49
@Niols Niols merged commit c750447 into main Sep 17, 2023
@Niols Niols deleted the rework-assignment-words branch September 17, 2023 21:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Should assignment words be recognised on the right-hand side of a command name?
1 participant