Generate TextMate grammar from Tree-sitter grammar #2

Open
griimick opened this issue Nov 1, 2022 · 14 comments

Comments

@griimick
Owner

griimick commented Nov 1, 2022

Maintenance effort would drop drastically if we could generate the TextMate grammar that VSCode uses from the official VHS tree-sitter grammar.

Resources

  1. Tree-sitter Parser and Grammar
  2. Writing a TextMate Grammar: Some Lessons Learned by Matt Neuburg
  3. VSCode Syntax Highlight guide
  4. VSCode Language Extension Overview
  5. Lexer and Parser Generators in Scheme
@uncenter
Contributor

I spent a bit of time searching for a proper converter between the two. It looks like a hard task, since I couldn't find a single one. I have a few ideas for automatically updating parts of the grammar, though, and I'll create a draft PR in a moment if you want to give some thoughts.

@griimick
Owner Author

Thanks for your interest. Yes, it will be a bit tricky to write a generic tree-sitter to TextMate grammar converter. Anyways feel free to open a draft PR if you have something to share.

@uncenter
Contributor

uncenter commented Jul 26, 2023

I'm just gonna give my thoughts here before I commit to any coding.

The https://github.com/charmbracelet/tree-sitter-vhs/blob/main/src/grammar.json file is easy to parse and contains a lot of information that we can scrape.

For example, the rules.setting section:

"setting": {
  "type": "CHOICE",
  "members": [
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "Shell"
        },
        {
          "type": "SYMBOL",
          "name": "string"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "FontFamily"
        },
        {
          "type": "SYMBOL",
          "name": "string"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "FontSize"
        },
        {
          "type": "SYMBOL",
          "name": "float"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "Framerate"
        },
        {
          "type": "SYMBOL",
          "name": "integer"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "PlaybackSpeed"
        },
        {
          "type": "SYMBOL",
          "name": "float"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "Height"
        },
        {
          "type": "SYMBOL",
          "name": "integer"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "LetterSpacing"
        },
        {
          "type": "SYMBOL",
          "name": "float"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "TypingSpeed"
        },
        {
          "type": "SYMBOL",
          "name": "time"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "LineHeight"
        },
        {
          "type": "SYMBOL",
          "name": "float"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "Padding"
        },
        {
          "type": "SYMBOL",
          "name": "float"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "Theme"
        },
        {
          "type": "CHOICE",
          "members": [
            {
              "type": "SYMBOL",
              "name": "json"
            },
            {
              "type": "SYMBOL",
              "name": "string"
            }
          ]
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "LoopOffset"
        },
        {
          "type": "SEQ",
          "members": [
            {
              "type": "SYMBOL",
              "name": "float"
            },
            {
              "type": "CHOICE",
              "members": [
                {
                  "type": "STRING",
                  "value": "%"
                },
                {
                  "type": "BLANK"
                }
              ]
            }
          ]
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "Width"
        },
        {
          "type": "SYMBOL",
          "name": "integer"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "BorderRadius"
        },
        {
          "type": "SYMBOL",
          "name": "integer"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "Margin"
        },
        {
          "type": "SYMBOL",
          "name": "integer"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "MarginFill"
        },
        {
          "type": "SYMBOL",
          "name": "string"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "WindowBar"
        },
        {
          "type": "SYMBOL",
          "name": "string"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "WindowBarSize"
        },
        {
          "type": "SYMBOL",
          "name": "integer"
        }
      ]
    }
  ]
}

Super easy to extract from:

#!/usr/bin/env node

// Assumed to be a local copy of src/grammar.json from tree-sitter-vhs.
const data = require('./tree-sitter.json');

// Each alternative of `setting` is a SEQ whose first member holds the setting name.
const settings = data.rules.setting.members.map((setting) => setting.members[0].value);
console.log(settings);

Output:

[
  'Shell',         'FontFamily',
  'FontSize',      'Framerate',
  'PlaybackSpeed', 'Height',
  'LetterSpacing', 'TypingSpeed',
  'LineHeight',    'Padding',
  'Theme',         'LoopOffset',
  'Width',         'BorderRadius',
  'Margin',        'MarginFill',
  'WindowBar',     'WindowBarSize'
]

We don't have to write something like this for every little bit, but it could be a good way to easily update some parts. A workflow that runs once a week could check if anything has changed and update it automatically.
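As a rough illustration of how the extracted names could feed the TextMate side, the sketch below turns the `setting` rule into a single `{ name, match }` pattern object. The inline `grammar` object is a trimmed stand-in for the real grammar.json, and the scope name `support.constant.setting.vhs` is a hypothetical choice, not one taken from this repo.

```javascript
// Sketch: turn the `setting` rule of grammar.json into one TextMate-style pattern.
// `grammar` is a trimmed stand-in for the real file (assumption for illustration).
const grammar = {
  rules: {
    setting: {
      type: 'CHOICE',
      members: [
        { type: 'SEQ', members: [{ type: 'STRING', value: 'Shell' }, { type: 'SYMBOL', name: 'string' }] },
        { type: 'SEQ', members: [{ type: 'STRING', value: 'FontSize' }, { type: 'SYMBOL', name: 'float' }] },
        { type: 'SEQ', members: [{ type: 'STRING', value: 'Width' }, { type: 'SYMBOL', name: 'integer' }] },
      ],
    },
  },
};

// Collect the leading keyword of every SEQ alternative.
function settingNames(rule) {
  return rule.members
    .filter((m) => m.type === 'SEQ' && m.members[0].type === 'STRING')
    .map((m) => m.members[0].value);
}

// Escape regex metacharacters so a keyword is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a { name, match } object in the shape TextMate grammar patterns use.
function settingPattern(names) {
  // Longest names first, so a name like 'WindowBarSize' is tried before 'WindowBar'.
  const sorted = [...names].sort((a, b) => b.length - a.length);
  return {
    name: 'support.constant.setting.vhs', // hypothetical scope name
    match: `\\b(${sorted.map(escapeRegex).join('|')})\\b`,
  };
}

const names = settingNames(grammar.rules.setting);
const pattern = settingPattern(names);
console.log(JSON.stringify(pattern, null, 2));
```

A weekly job could run something like this against the upstream file and only open a PR when the generated pattern differs from the committed one.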

@griimick
Owner Author

griimick commented Jul 29, 2023

The file https://github.com/charmbracelet/tree-sitter-vhs/blob/main/src/grammar.json gets generated from https://github.com/charmbracelet/tree-sitter-vhs/blob/main/grammar.js. I think we can use the latter to generate the TextMate grammar.

@uncenter
Contributor

> The file charmbracelet/tree-sitter-vhs@main/src/grammar.json gets generated from charmbracelet/tree-sitter-vhs@main/grammar.js. I think we can use the latter to generate the TextMate grammar.

I noticed, but it seems harder to scrape or generate from a JS file. I'll take another look.

@griimick
Owner Author

You don't have to scrape it. Think about how tree-sitter itself must use this file to generate the resulting JSON.

Can we override the global functions used in the grammar.js file, like project, seq, choice, repeat, etc., and use the same file to generate a TextMate grammar instead of a tree-sitter grammar?
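A minimal sketch of that idea, under several assumptions: replacement versions of the DSL functions build regex source strings instead of tree-sitter rule objects, tokens in a `seq` are joined with `\s*` as a crude approximation of tree-sitter skipping whitespace, and the grammar is non-recursive (true for the VHS rules quoted in this thread, but not in general). The helper names `toSource` and `wrap` are hypothetical, and the inline grammar is a small subset written in the same style as grammar.js.

```javascript
// Sketch: stand-ins for the globals grammar.js calls, producing regex sources.
const escapeRegex = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// A rule part is a literal string, a RegExp, or a { source } wrapper built below.
const toSource = (part) => {
  if (typeof part === 'string') return escapeRegex(part);
  if (part instanceof RegExp) return `(?:${part.source})`;
  return part.source;
};
const wrap = (source) => ({ source });

const seq = (...parts) => wrap(parts.map(toSource).join('\\s*'));
const choice = (...parts) => wrap(`(?:${parts.map(toSource).join('|')})`);
const optional = (part) => wrap(`(?:${toSource(part)})?`);
const repeat = (part) => wrap(`(?:${toSource(part)}\\s*)*`);
const repeat1 = (part) => wrap(`(?:${toSource(part)}\\s*)+`);

// Evaluate every rule, resolving `$.name` references lazily through a Proxy.
// A recursive grammar would loop forever here; this sketch does not guard against it.
function grammar({ name, rules }) {
  const cache = {};
  const $ = new Proxy({}, {
    get: (_, ruleName) => {
      if (!(ruleName in cache)) cache[ruleName] = wrap(toSource(rules[ruleName]($)));
      return cache[ruleName];
    },
  });
  const compiled = {};
  for (const ruleName of Object.keys(rules)) compiled[ruleName] = $[ruleName].source;
  return { name, rules: compiled };
}

// A small subset of the VHS rules, written the way grammar.js writes them.
const vhs = grammar({
  name: 'vhs',
  rules: {
    program: $ => repeat($.command),
    command: $ => choice($.sleep, $.enter),
    sleep: $ => seq('Sleep', $.time),
    enter: $ => seq('Enter', optional($.speed), optional($.integer)),
    speed: $ => seq('@', $.time),
    integer: $ => /\d+/,
    time: $ => /\d*\.?\d+m?s?/,
  },
});

console.log(vhs.rules.sleep);
console.log(vhs.rules.enter);
```

To run the real file instead of an inline subset, one option would be to assign these functions to `globalThis` before calling `require('./grammar.js')`, since that file refers to them as globals. The output is still JavaScript regex source, so each rule would need a scope name and a compatibility check before it could go into a TextMate grammar.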

@uncenter
Contributor

> You don't have to scrape it. Think about how tree-sitter itself must use this file to generate the resulting JSON.
>
> Can we override the global functions used in the grammar.js file, like project, seq, choice, repeat, etc., and use the same file to generate a TextMate grammar instead of a tree-sitter grammar?

Totally, that's why I said scrape/generate. The only issue I'm noticing is just naming certain patterns and rulesets. I'll give it a go tonight and see what gives.

@uncenter
Contributor

uncenter commented Aug 5, 2023

I'm gonna be honest, this is pretty difficult. A lot of it has to be hard-coded into the functions, and it might honestly be easier to just do it by hand.

module.exports = grammar({
  name: 'vhs',
  rules: {
    program: $ => repeat(choice($.command, $.comment)),
    command: $ => choice(
      $.control,
      $.alt,
      $.hide,
      $.show,
      $.output,
      $.sleep,
      $.type,
      $.backspace,
      $.down,
      $.enter,
      $.escape,
      $.left,
      $.right,
      $.set,
      $.space,
      $.tab,
      $.up,
      $.pageup,
      $.pagedown,
    ),

    control: $ =>   /Ctrl\+[A-Z]/,
    alt: $ =>       /Alt\+[A-Z]/,
    hide: $ =>      seq('Hide'),
    show: $ =>      seq('Show'),
    output: $ =>    seq('Output',    $.path),
    set: $ =>       seq('Set',       $.setting),
    sleep: $ =>     seq('Sleep',     $.time),
    type: $ =>      seq('Type',      optional($.speed), repeat1($.string)),
    backspace: $ => seq('Backspace', optional($.speed), optional($.integer)),
    down: $ =>      seq('Down',      optional($.speed), optional($.integer)),
    enter: $ =>     seq('Enter',     optional($.speed), optional($.integer)),
    escape: $ =>    seq('Escape',    optional($.speed), optional($.integer)),
    left: $ =>      seq('Left',      optional($.speed), optional($.integer)),
    right: $ =>     seq('Right',     optional($.speed), optional($.integer)),
    space: $ =>     seq('Space',     optional($.speed), optional($.integer)),
    tab: $ =>       seq('Tab',       optional($.speed), optional($.integer)),
    up: $ =>        seq('Up',        optional($.speed), optional($.integer)),
    pageup: $ =>    seq('PageUp',    optional($.speed), optional($.integer)),
    pagedown: $ =>  seq('PageDown',  optional($.speed), optional($.integer)),

    setting: $ => choice(
      seq('Shell',         $.string),
      seq('FontFamily',    $.string),
      seq('FontSize',      $.float),
      seq('Framerate',     $.integer),
      seq('PlaybackSpeed', $.float),
      seq('Height',        $.integer),
      seq('LetterSpacing', $.float),
      seq('TypingSpeed',   $.time),
      seq('LineHeight',    $.float),
      seq('Padding',       $.float),
      seq('Theme',         choice($.json, $.string)),
      seq('LoopOffset',    seq($.float, optional('%'))),
      seq('Width',         $.integer),
      seq('BorderRadius',  $.integer),
      seq('Margin',        $.integer),
      seq('MarginFill',    $.string),
      seq('WindowBar',     $.string),
      seq('WindowBarSize', $.integer),
    ),

    string: $ =>  choice(/"[^"]*"/, /'[^']*'/, /`[^`]*`/),
    comment: $ => /#.*/,
    float: $ =>   /\d*\.?\d+/,
    integer: $ => /\d+/,
    json: $ =>    /\{.*\}/,
    path: $ =>    /[\.\-\/A-Za-z0-9%]+/,
    speed: $ =>   seq('@', $.time),
    time: $ =>    /\d*\.?\d+m?s?/,
  }
});

There are all of the types (string, comment, float, integer, json, path, speed, time) along with other properties like setting. We would also have to ignore things like rules.program that have no relation to the TextMate grammar, and more (let alone the workarounds I had to use to get it to run properly). It honestly seems easier to parse select bits of the JSON or do it by hand. LMK your thoughts.

@griimick
Owner Author

griimick commented Aug 5, 2023

No worries. Thanks for looking into this @uncenter and I really appreciate you spending time on this.

I know this is a bit tricky. We can definitely do this by parsing the JSON and we are already maintaining this repo by hand.

I see this as a coding exercise and want to solve it by writing a good-enough parser. Let me look into this and come up with a short writeup on how this can be achieved, maybe with some example code. If it looks achievable, maybe you can pick up from there.

This can become a good learning experience for both of us, if you are up for it :)

@uncenter
Contributor

uncenter commented Aug 5, 2023

Totally! I would love to figure this out, I'm just totally stumped/lost.


@griimick
Owner Author

griimick commented Aug 14, 2023

I gave it a shot here: https://github.com/griimick/vscode-vhs/blob/treesitter-textmate/generate.js

I found that the tokens generated by the tree-sitter grammar are less detailed than those in the TextMate grammar in this repo. Tree-sitter tokens also do not map directly to the highlight definitions.

Also, TextMate uses Ruby regex, and I don't think JS regex can always be converted to it, since the two dialects are not fully compatible.

Knowing all this, I am inclined to maintain the rules manually now. If someone still wants to give it a shot, feel free.
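To make the regex concern above concrete, here is a small sketch that flags JavaScript regex sources containing constructs that, to my knowledge, behave differently under the Ruby-style (Oniguruma) engine. The list is deliberately tiny and not exhaustive, and `findRiskyConstructs` is a hypothetical helper name.

```javascript
// Sketch: flag JS regex sources that likely need manual attention before being
// pasted into a TextMate grammar. Only two well-known differences are listed.
const riskyConstructs = [
  {
    // In JS (without the `u` flag) \h is just the letter "h";
    // in Ruby-style regex it is a hexadecimal digit class.
    label: '\\h means "hex digit" in Ruby regex but a literal "h" in JS',
    // Match \h only when its backslash is not itself escaped.
    test: /(?<!\\)(?:\\\\)*\\h/,
  },
  {
    // In JS [^] matches any character; Ruby rejects it as an empty character class.
    label: '[^] matches any character in JS but is rejected by Ruby regex',
    test: /\[\^\]/,
  },
];

// Return the labels of every risky construct found in the regex source.
function findRiskyConstructs(regex) {
  return riskyConstructs
    .filter(({ test }) => test.test(regex.source))
    .map(({ label }) => label);
}

// The `time` rule from grammar.js uses only constructs shared by both dialects.
console.log(findRiskyConstructs(/\d*\.?\d+m?s?/));
// A contrived pattern that trips both checks.
console.log(findRiskyConstructs(/[^]*\h/));
```

The regexes in the VHS grammar quoted earlier in this thread are simple enough that none of them trip these two checks, so for this grammar the bigger obstacle is probably the token-to-scope mapping rather than the regex dialect.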

@uncenter
Contributor

Exactly my thinking. At least we tried 😅...
