Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recast needs additional maintainers #1086

Closed
conartist6 opened this issue Apr 24, 2022 · 58 comments
Closed

Recast needs additional maintainers #1086

conartist6 opened this issue Apr 24, 2022 · 58 comments

Comments

@conartist6
Copy link
Contributor

@benjamn I am trying every way that might be possible to get your attention, because if I can't I'm forking the library.

@conartist6
Copy link
Contributor Author

I think I'm just going to start work on the fork. It's macrome-js/macrome that I'm building which fundamentally relies on recast, so perhaps macrome-js/recast and @macrome/recast would be appropriate.

@conartist6
Copy link
Contributor Author

Currently my organizations consist solely of me, but I've created them in the explicit hope that I can bring others in and form communities of trust and shared responsibility.

@conartist6
Copy link
Contributor Author

I'm going to give this one day. Let's see how far out ahead I can get my fork in just one day of work.

@eventualbuddha
Copy link
Collaborator

@conartist6 I suggest talking with @benjamn about helping maintain recast. If you have a stake in it, then you're clearly more motivated.

@conartist6
Copy link
Contributor Author

conartist6 commented Apr 24, 2022

I've been trying rather aggressively, but no luck. I think the most helpful thing I can do is make my fork and show him what my direction would be if I had control over the package. I'm working on that right now.

I'm getting my fork ready to be a 1.0.0 release. I'm dropping support for everything that isn't absolutely essential.

I'm going to kick all the parsers out of the repo as they really need to be their own packages in order to configure deps correctly. I'll recommend babel-plugin-recast for babel users. No more one-size-fits-all parser options either.

I'm stripping away everything that isn't what the library is about to reduce the work involved documenting, maintaining, and versioning. No more run or runFile or runString. That is for other tools and libraries to build (and document). I'm not even exporting ast-types anymore because I don't want to merge those API surfaces and have to worry about which version is exported.

@conartist6
Copy link
Contributor Author

Or maybe I should just change parsers to a different signature, something like parser => parser. I don't think recast should have the responsibility of actually requiring the parser, it just needs a way to let users specify parser options they want while recast dictates certain parser options that it needs.

@conartist6
Copy link
Contributor Author

I've also ripped out the dependency on acorn as a default parsers. I just error now if your parser doesn't define tokens. I think most people will want to use @babel/parse, but I don't think recast should have dependencies on any particular parser at all. That probably means that recast shouldn't have a parse method at all but rather a wrapParse method. Again, stripping away the magic makes the library that much easier to document and maintain.

@conartist6
Copy link
Contributor Author

Here's how I imagine usage looking roughly:

import { parse } from '@babel/parse';
import { wrapParseWithFormatting, printFormatted } from 'recast';

const parseFormatted = wrapParseWithFormatting(parse, 'babel');
const ast = parseFormatted(source, options);
// ast mutations
return printFormatted(ast);

@conartist6
Copy link
Contributor Author

Replacing delete ... with ... = undefined as delete is a known perf-killer.

@conartist6
Copy link
Contributor Author

Making the tests javascript instead of typescript. // @ts-check ensures they still get some basic benefits from typescript checking while devs get a few less things to worry about.

@conartist6
Copy link
Contributor Author

conartist6 commented Apr 24, 2022

Typescript doesn't understand self-references in js files (bug filed), and I'd prefer having those to having typescript. I think cleaning up the tests is going to be the biggest chunk of this work. The test cases need to be reorganized so that as many test cases as possible are expressed in a manner that is agnostic to the specific parser and ast structure being used.

@conartist6
Copy link
Contributor Author

Actually I'm going to do the minimum amount of work necessary to get the test suite to pass then push, merge and push my fix for the parens issues, and continue working. I don't want to fall into the trap of verifying behavior which is wrong.

@conartist6
Copy link
Contributor Author

Another thing I'm thinking about is that there's just very little left that recast is needed for at all. Recast has a lot of pretty-printing logic baked into, and these days all that really belongs in prettier. All that recast really does in an environment with pretty-printing is to ensure that certain line breaks remain untouched because prettier makes certain decisions about retaining its original formatting by comparing ast node locations to the source string.

I think the best thing to do for the long run would be to help prettier avoid dependence on the original source text so that the workflow would be:

  • parse with prettier (prettier wraps the real parser)
  • transform the AST
  • print with prettier

And then we get to deprecate recast! I've created an issue to discuss the changes to prettier that would be necessary.

@eventualbuddha
Copy link
Collaborator

I can't say I disagree with most of your thoughts here. I also wonder about the future of recast in a world where many (most?) use Prettier.

That said, I decided to spend a little time getting things in slightly better shape and released v0.21.1: https://github.com/benjamn/recast/releases/tag/v0.21.1. If there's any specific PR you want me to prioritize, let me know and I'll hopefully get to it this week.

@conartist6
Copy link
Contributor Author

Thank you!! Are you interested in my putting together PRs for the API changes I mentioned as being a potential 1.0.0? Even if the long term is to move away from recast, I think there's a strong argument to shipping a 1.x branch first. Don't wanna end up on the 0ver list!

@conartist6
Copy link
Contributor Author

Also would it be possible to get at least some basic perms on this repo, like triage? If I had that I could take a proper pass through the open bugs and see what's legit.

@gnprice
Copy link
Contributor

gnprice commented Apr 28, 2022

I just wanted to chime in that I really appreciate the work that goes into maintaining recast ❤️

Even in a possible future where the Prettier maintainers respond enthusiastically to prettier/prettier#12709 and Prettier grows the ability to support writing codemods, there will be a lot of codebases out there that don't use Prettier. I appreciate being able to transform that code without completely reformatting it, meeting the code where it is. Recast is a focused tool that does the job of enabling transforms without making a lot of assumptions, and I think there's a lot of value in that.

@gnprice
Copy link
Contributor

gnprice commented Apr 28, 2022

On the subject of specific PRs, I would appreciate a look at #1089 and #1090 :-). Those are new -- I just sent them this week.

@conartist6 to your thought at #1068 (comment) about test cases, I quite like the structure of test/flow.ts after #1089 (and it's mostly not my doing, I only made small changes there.) An individual test case there takes just a line or two to express, and I think that helps a lot in being able to write a thorough set of test cases. So you might find that part interesting to take a look at.

One thing that might be productive for someone to do would be to put some of the other tests into the same style, and then just churn out a few dozen more one-line test cases for areas that seem to be tricky, like extra/missing parens. (Hmm, in fact: test/parens.ts is the other file that's closest to that structure already -- in fact most of it is basically exactly like the existing version of test/flow.ts, which is already quite good for adding more cases. So that's convenient.)

@conartist6
Copy link
Contributor Author

conartist6 commented Apr 28, 2022

One thing's for sure, both recast and prettier are super high-maintenance projects which need continuous fixes and improvements from a community of active maintainers to keep them healthy.

My impression is that both are fundamentally a bit unstable due to exceedingly high costs in two major dimensions:

  • constant additions to the language, and even the need to support proposals
  • support for a wide variety of parsers which all represent the AST just a little differently, especially when it comes to whitespace, location, and comments.

Prettier is the more stable project just because it has so many more users and it is run so much more often.

The language changing is pretty much a constant, so what I think could use the most attention is some way to curb the cost of supporting so many different parsers. Does anyone else have ideas about this?

For now I'm going to reopen this issue as I think it's still relevant and there's productive discussion to be had about direction and combining efforts that have started independently.

@conartist6 conartist6 reopened this Apr 28, 2022
@conartist6
Copy link
Contributor Author

One way to reduce maintenance costs would be to try to extract as much common functionality as possible between prettier and recast and share it. needsParens is a great example of something that would best be shared between the two libraries, if they could standardize on the arguments to the function and what its responsibilities are or are not. I'm sure there are more.

@gnprice
Copy link
Contributor

gnprice commented Apr 28, 2022

I agree that there's fundamentally a bit of ongoing instability in that the language changes.

But I think the volume of those changes is not actually all that high. And when a feature is new, or especially when it's an early proposal, there's not a lot of code out there that uses it yet -- so for the great majority of users, it's perfectly fine if the tool isn't quick to gain support for that feature, because none of their code uses it. If the feature becomes widespread, someone will be motivated to add it.

One thing I've appreciated about Recast is that it's not a lot of code. My experience with #1089 and #1090, where I add or fix support for some less-widespread (because they're specific to Flow) language features, was that it wasn't a lot of work to do so.

(FWIW I think fixing something like where parens do or don't get added is inherently trickier, because it's logic that cuts across wide swathes of the language. But it's also not something where the language really changes over time. The existing code doesn't need a fix because the language changed, but rather because it wasn't quite right in the first place. A really spot-on solution for that, if/when we nail one down, shouldn't inherently have any ongoing need for maintenance.)

@gnprice
Copy link
Contributor

gnprice commented Apr 28, 2022

I am curious about the possibility of simplifying by supporting fewer parsers. It looks to me (from looking in parsers/) like there are three different parsers -- Babel, Acorn, and Esprima. (Then three different choices of what Babel plugins to use, called babel, babel-ts, and typescript; plus babel has two aliases, flow and babylon. But I imagine those are all pretty well aligned with each other in what output they give, for the language features they have in common.)

What do the differences look like that add complexity to the Recast code?

I also don't know enough about the context of the various parsers to understand what value some people might be getting from different ones. Personally I believe I wouldn't miss anything if both the non-Babel parsers were dropped; but I don't really know the history of the different parsers, or what use cases people might have had in the past, and might still have, that the others serve and Babel didn't or doesn't.

@conartist6
Copy link
Contributor Author

conartist6 commented Apr 29, 2022

These are excellent points. I'm not so worried about the language itself either. I am a bit more concerned about things like JSX, Typescript, and Flow which are not the language (but whose syntax you must handle in tools). These even complicate the experience of writing a codemod, for example using the babel AST because you may be writing a mod for plain javascript codebase and the type checker will be telling you that you need to handle nodes for type annotations and JSX.

I'm going to dig more into what the differences are and whether there are better ways of abstracting them away. Prettier has a postprocess step that does a lot of mangling to try to clean up some of the main differences. That's actually another bit of code that prettier could conceivably share with recast.

@conartist6
Copy link
Contributor Author

Also after having read some more or prettier's logic I think the biggest thing to abstract would be a representation of the token stream. Recast demands that it be given tokens to work (though I haven't yet found the locations where they are used). Prettier already essentially relies on being able to treat the source text like a token stream, looking around for particular tokens like { or ( or =>. Prettier also doesn't trust the parser's representation of comments at all, instead preferring to grab the complete comment text from the source by slicing it on using the start/end of the comment node.

Since I'm always thinking in iterables, I think it might be possible to write a token generator which can generate tokens by traversing an AST. That way it would still be possible to work with an AST but easily write rules that involve source ordering. Also one of the benefits or writing a generator is that without the need to store the data you can parameterize the way you generate it, e.g. as function* nextTokens(node, includeWhitespace = false) { /*...*/ }. I think such a method could power most of the checks that prettier currently needs to make against the source text.

@conartist6
Copy link
Contributor Author

conartist6 commented Apr 29, 2022

Oh wasm is another thing that I learned (from reading prettier sources) introduces some new syntaxes that some parsers handle and some do not, e.g. import {"string" as name} from "source".

@conartist6
Copy link
Contributor Author

conartist6 commented Apr 29, 2022

OK I think I feel pretty confident though that prettier should be my starting point. Prettier's starting point was recast, and they've come a very long way since then. I want to adopt their solutions to the problems that recast still struggles with. I'm finding it very interesting to look at some of the oldest history of prettier. Here is what is essentially its initial commit. Some of the content of that readme is documentation about its mechanism that does not exist for current versions though the mechanism remains the same!!

@benjamn
Copy link
Owner

benjamn commented Apr 29, 2022

@conartist6 I’m here. You have my attention. I’m sorry I haven’t been able to prioritize a proper response until now.

Would you mind summarizing what you need from Recast in order to use it without having to fork it (also fine/allowed, just hopefully unnecessary)?

@benjamn benjamn changed the title Recast is not maintained. Recast is looking for additional maintainers! Apr 29, 2022
@conartist6
Copy link
Contributor Author

conartist6 commented Apr 29, 2022

@eventualbuddha Oh cool, taking a look. I'm on the same page with you about babel's APIs being the ones that you really need to be able to do serious codemodding work. You might be interested in what I was working on immediately before this.

My main project that's directly relevant is macrome, a build system that allows you to keep generated output continuously up to date even as input and the transforms themselves change. Its aim is to provide robust clean and watch operations that "just work", and to help users bake all the magic out of their code so that they have the complete logic of their application expressed in plain, static javascript (maybe even checked into VCS). No more complicated bundler configs to parse all that wacky fake syntax!

Imagine being able to use CSS modules but instead just having a magic syntax it actually generates the file containing the mappings between importable selectors and their secret hashes.

Or imagine being able to use macros to inject a stable ID into your source code. You could write code like this:

import uid from '@macrome/uid.macro';
translation(uid``, 'description');
// and the build system would change it to:
translation(uid`e9af0`, 'description');

This solves so many problems around knowing what is and isn't the same translation across multiple version of the code without offering any particular opinion about how you build your translation system. It would also be very relevant for generating stable error codes that could be used to link to a knowledgebase!

@conartist6
Copy link
Contributor Author

Long story short, I don't want to end up maintaining recast. In order to be able to execute on such an ambitious vision, I need modifying and printing code to also "just work" so that I can build on top of it without spending a lot of my time on maintenance of this library.

@conartist6
Copy link
Contributor Author

conartist6 commented May 4, 2022

OK, I think I've got my recommendation. It is: embrace concrete syntax trees by adding node.tokens to every node. To allow nesting of the structure without duplicating the existing tree structure, intersperse the existing token types with a Reference type which contains the name of a reference to another property of node as its value. This ensures a single source of truth, and will allow me to do a lot of work for everyone in handling edge cases.

Let's say you had a source like this: (astexplorer)

if (true) {
  consequent;
}

In my CST representation it would look like:

{ // = node
  "type": "IfStatement",
  "test": {
    "type": "Literal",
    "value": true,
    "tokens": [
      { "type": "Boolean", "value": "true" }
    ]
  },
  "consequent": {
    "type": "BlockStatement",
    "body": {/*...*/},
    "tokens": [
      { "type": "Punctuator", "value": "{" },
      { "type": "Whitespace", "value": "\n  " },
      { "type": "Reference", "value": "body" }, // interpret as node.body.tokens
      { "type": "Whitespace", "value": "\n" },
      { "type": "Punctuator", "value": "}" }
    ]
  },
  "tokens": [
    { "type": "Keyword", "value": "if" },
    { "type": "Whitespace", "value": " " },
    { "type": "Punctuator", "value": "(" },
    { "type": "Reference", "value": "test" },
    { "type": "Punctuator", "value": ")" },
    { "type": "Whitespace", "value": " " },
    { "type": "Reference", "value": "consequent" }
  ]
}

I think I can provide the equivalent of @babel/traverse over this structure in such a way that I can make the AST the real source of truth if ever it should disagree with the tokens array. Thus the basic add, remove, delete operations would be covered. If you add a node, the tokens iterator would generate a minimal set of correct tokens from the AST. If you move a node its tokens move with it. If you remove a node its tokens are removed with it. If you get fancy with the inside of a node such that the tokens don't make any sense any more, the default printer would ignore the garbage and generate new tokens. I can also provide operations to validate, rebuild, or strictly print the token data.

The idea would be to provide a rich API, document it, and then see if I could get enough momentum to perhaps merge with @babel/traverse and slog through the process of refactoring prettier.

At the end though you'd have exactly what I want: a way that things can interoperate and "just work" without the explicit need to maintain a library like recast.

I'm going to work on building the library here. It's gonna be a bit of a bear because it actually needs what is essentially a custom parser over a token stream so that it understands node-by-node both how to generate tokens and how to verify that the existing tokens constitute legal syntax. Fortunately I've been experimenting with various kinds of syntax matching architectures...

@conartist6
Copy link
Contributor Author

conartist6 commented May 5, 2022

Here's a "little" piece of code that shows how I think a token management engine would look and work:

EDIT: I have moved the prototype to a gist since I keep changing it.

@conartist6
Copy link
Contributor Author

conartist6 commented May 5, 2022

Hmm, and maybe I could have one more kind of builder, SourceBuilder which would use loc or range information to construct tokens from the source text during initial parsing before any AST modifications have begun (much like prettier assumes it is safe to do). That way this approach would be compatible with any existing parser, as recast is.

@conartist6
Copy link
Contributor Author

conartist6 commented May 5, 2022

Well, that's it. I don't want to burden this thread with all my dev work. I talk a lot. I think this shows pretty clearly a concrete proposal -- essentially a rearchitecture of recast. I could call it something else, or I would also be perfectly happy if there was interest in publishing this under the recast name. I'd sure love to have help building it. My hope would be that building this would make it possible for prettier to eventually depend on recast again, as this structure supports all the kinds of queries prettier needs to make. If this did happen, I don't think recast would be at a loss for maintainers anymore.

@conartist6
Copy link
Contributor Author

I really like it because it'd be an implementation of recast with absolutely no logic about pretty-printing, the incomplete implementation of which is one of recast's current maintenance burdens.

Also in addition to making a best effort to preserve formatting, it permits easy creation of formatting-aware transforms: those which also manipulate the tokens array in such a way that it remains valid for the given AST.

@conartist6
Copy link
Contributor Author

I now have a complete implementation for a small grammar in the repo. I know some of my examples contained magical thinking, but this doesn't. It works!

@benjamn
Copy link
Owner

benjamn commented May 9, 2022

This is a lot to digest 😅, but I want you to know I am digesting it @conartist6. In particular, I agree it would be convenient if Recast could output tokens rather than (or in addition to) a string of code.

Some initial thoughts, from a first glance:

Instead of providing a tokens property on every output AST node, which might overlap significantly with the tokens arrays of descendant nodes, I think it might make more sense to have one global list of output tokens, with each AST node indexing into that list with a property like node.tokenRange which holds numeric [start, end] pairs. That seems like it would be a lot cheaper (less redundant) overall, and still relatively easy to serialize to JSON (in case that matters)?

I really like it because it'd be an implementation of recast with absolutely no logic about pretty-printing, the incomplete implementation of which is one of recast's current maintenance burdens.

I agree keeping up with AST and parsing/printing changes is an ongoing chore, potentially never finished, and I know I've fallen behind. But I'm not sure I'm following how the CST output tokens idea would save Recast from needing a pretty-printer. Generating a stream of concrete output tokens (including whitespace and comments) seems pretty much equivalent to full pretty-printing, with all the choices that need to be made about indentation and parenthesization, etc. Some of the token ranges could be copied directly from the original source token stream, but the ones that Recast generates will need to be formatted by some sort of pretty-printing-like process, right? (Not a rhetorical question—please let me know if I'm misunderstanding something!)

In more general terms, I think it would be valuable if Recast could be configured to delegate pretty-printing to another library (as it can already delegate parsing), even if it still needs to maintain its own default pretty-printer. The challenge is that pretty-printing is only part of the conservative reprinting process, so the other printing library needs to have a certain kind of structure that allows alternating recursively between pretty-printing and just copying source code. Since Prettier was originally a fork of the Recast pretty-printer, it's probably the closest to having that structure already (passing a print callback to the genericPrintNoParens function, IIRC).

@benjamn
Copy link
Owner

benjamn commented May 9, 2022

In case this isn't already shared understanding: I strongly believe Recast users should not have to worry about manipulating tokens while doing their AST transformations. If we make Recast capable of generating output tokens, it needs to be an automatic process based entirely on inferred AST changes, to avoid increasing the implementation burdens of AST transforms.

One more question for async discussion: can we enumerate the benefits of having Recast generate tokens directly, rather than just tokenizing the output string that it already prints? I imagine it might be faster to skip the re-tokenization, though tokenization is fairly quick compared to full parsing. I ask because there might be a quick and dirty implementation here where Recast reuses its parser somehow to tokenize the output string. A little slower, perhaps, but it might be enough to unblock downstream experimentation, if output tokens are ultimately what you need?

@conartist6
Copy link
Contributor Author

@benjamn Thanks for taking the time to look through this all and start thinking about it! I know I have a tendency to race off ahead in my own direction, but it's the only way to find out in a concrete way what is possible and what problems would be encountered.

@conartist6
Copy link
Contributor Author

I really think it makes a lot of sense to have the tokens arrays on the nodes. You mention the possibility of tokenRanges, but that is essentially not much different than what we have now -- ranges into sourceText. In particular it means you'll be unable to do anything like inserting tokens because you'd upset all those indexes referencing into the array.

What i've built ensures that code transforms are able to add or modify tokens if they want to, but also ensures that they need not do so, that a reasonable default set of tokens will be provided.

@conartist6
Copy link
Contributor Author

About deciding which printing algorithm to apply to different parts of the AST, have you seen this part of my code?

@conartist6
Copy link
Contributor Author

As for tokenizing an output string, I'm not really thinking in that direction. In the world I'm imagining there would be no need for a pure string representation until all the transformation work was done. That's why I also want prettier to offer a compatible formatCst(cst) method.

@conartist6
Copy link
Contributor Author

Something cool about the way my code works: it falls back from one printer to another. Suppose you have:

{
  key1: 'value1',
}

And you write a codemod whose purpose is to change that to

{
  key1: 'value1',
  key2: 'value2'
}

You do this by pushing to the ObjectExpression.properties array. Suppose you don't update the tokens attached to the ObjectExpression (they are: ['{', '\n ', 'ref:properties', ',\n', '}']). Now when you go to print the updated AST you're going to fail reprinting the existing tokens -- you'll find '}' where you need to see a second 'ref:properties'. This will force you into fallback which generates only the necessary tokens and the immediate result will be the ugly and uninentional:

{
  key1: 'value1',
  key2: 'value2'}

I agree that that looks bad, but it's actually great for feeding into a pretty printer (prettier, or one built into recast) because it retaines the most crucial formatting decision: whether the object should be printed on one line or more than one, represented as whether '{' is followed by '\n'.

@eventualbuddha
Copy link
Collaborator

Where does this fallback decision happen? How does your code tell that the new ObjectProperty means the tokens array is outdated?

@conartist6
Copy link
Contributor Author

conartist6 commented May 10, 2022

@conartist6
Copy link
Contributor Author

conartist6 commented May 10, 2022

The tokens array only becomes outdated if the number of properties changes. That's because I use the simple { type: 'reference', value: 'properties' } token as a placeholder for both an AST node and its tokens. I don't do anything fancy like value: 'properties[0]'.

I can tell if the token stream is outdated using my grammar which defines which combination of tokens are valid. Notice that whitespace and comments are allowed, but for an object property or specifier, references, commas, and curly braces are ensured (optional = false), which is to say that a fallback would occur when the ensured reference token for the second property was not found (or the comma if there had been no trailing comma).

@conartist6
Copy link
Contributor Author

conartist6 commented May 11, 2022

I've made up some basic architecture documentation to try to get the basic principles of the design across quickly.

I'm also going to start experimenting with a mechanism for diffing references that would allow me to handle cases where it would be appropriate to go into fallback generation and then come back out of it within a single node.

That would allow me to handle some tricky cases like transforming:

import { /* a */ a, /* c */ c } from 'source';

// after inserting b currently you'd get the c comment attached to b:
import { /* a */ a, /* c */ b, c } from 'source';

// but b should use fallback tokens while c gets the ones that belong to it
import { /* a */ a,  b, /* c */ c } from 'source';

@conartist6
Copy link
Contributor Author

conartist6 commented May 17, 2022

I've done more of my due diligence on prettier now too. I'm sure I haven't found every surprise, but I think I have the basics.

Essentially I have two related questions that I'm trying to understand the answers to:

  1. Can prettier use a CST to answer its queries about what the existing syntax is in a way that would allow it to work with altered trees? My impression is that the answer is yes. I would need to build a syntax-querying API that was equivalent to the APIs prettier currently uses, and then convince prettier to adopt it.
  2. Can prettier represent its output as mutations to a CST? I think the answer is a qualified yes, but first I need to convince them that they want to. They are faced with a problem currently: prettier mutates its input AST, most notably to turn (a && b) && c into a && b && c. They consider this mutation of the input AST to be a bug, because it means that any formatAst(ast) => string would have a mixture of pure (string) and impure (modified AST) output. I can offer them a solution to this problem they weren't previously considering: embracing impure output instead of pure output. They would have a formatCst(cst) => undefined method whose entire point would be to mutate the input tree. I can offer them the ability to use the tools I write to take the out array they build (containing string token values) and use it as a source by which to rebuild the node.tokens arrays in the input cst (which also has the ast modifications in it). This way the output is entirely as impure mutations, which is unambiguous. It's unclear as to whether they will really come around to my way of thinking on this as it would require them to adopt the CST format that I am proposing.

@conartist6
Copy link
Contributor Author

conartist6 commented Jun 11, 2022

OK I'm back-ish after some distractions. I've also changed tack a bit on how I plan to integrate with prettier. The new plan is to use exactly the existing APIs that prettier has for output. In other words, I expect prettier essentially to be able to do this:

const {
  __debug: { formatAST },
} = require("prettier");

const formatCst = (cst) => {
  // Note: formatAst mutates the AST to rebalance binary expressions.
  const formattedText = formatAST(cst);

  rebuildTokensFromText(cst, formattedText);
}

That's great because that part at least doesn't require any changes from prettier at all. They even export the formatAST method that works the way I need it to!

Then the remaining challenge is the one that can't be avoided: writing prettier's language rules to query for tokens against the CST. I haven't really started digging into that yet, but I have heard at least one contrib there express enthusiasm for the idea of that change which is encouraging.

@conartist6
Copy link
Contributor Author

I've got it! I think. It took a while to get to this design, but now that I'm here it feels pretty powerful and correct. I'm using coroutines -- one side of the coroutine is the grammar generator for a given node, and on the other side we have an engine of some kind, either for validation, generator, or mutation.

The grammar generator yields simple commands to the engine like takeMatch, take, match, and drop. The use of these commands completely decouples the grammar concerns from the details of the task the grammar is being used for, while ensuring that all the functions involved in the coroutine are being evaluated in lockstep. This allows me to fulfill my basic goal of driving a variety of actions through easy-to-specify imperative grammar.

I've punted on the question of how to attach whitespace and comments by saying that it will be up to the plugin ecosystem to define a comment association strategy. I'll always hoist whitespace and comments up in the tree, user code can insert whitespace and comments wherever it wants, and plugins can be uses to systemically push comments downward. For example you might use plugins to push typecast comments or doc comments down into the nodes they describe, so that they are sure to move with those nodes if those nodes move. In this way I ensure that the most practical use cases can be served without endorsing any particular set of rules.

I have a working implementation of the engine that consumes the grammar to build up node.tokens for each node in an AST. Now I need a generative engine that merges existing valid node.tokens tokens with emitted tokens. Finally I'll need to build some kind of API that prettier will be able to use to do linear token lookarounds and on-the-fly normalization during a pathful tree traversal. I don't yet see any reason why this shouldn't be feasible...

@conartist6
Copy link
Contributor Author

I think I won't have to mess with the prettier language grammar either, which is amazing. I only have to define my own grammar, and I'm quite liking the syntax I've developed for expressing those definitions.

@conartist6
Copy link
Contributor Author

There's still a ton of stuff to do, but I'd say that after three rewrites my architectural prototype is now complete. This is the design that I will move forward with!

I'm going to rebuild my architecture docs, but for now here's the most interesting part of the grammar definition:

const visitors = {
  // This generator is executed by the `match` coroutine to build a `matchNode` for `node`
  *ImportSpecifier(node, { matchNodes }) {
    const { local, imported } = node;

    // These next three lines are equivalent to `` take(ref`imported`) ``
    // We use `match`/`emit` instead of `take` in order to get `importedRef`
    // `match` does recursive submatching, and returns to us an array of matched tokens
    // ref tokens can be used to get match nodes, e.g. `matchNodes.get(importedRef)`
    const importedMatch = yield match(ref`imported`);
    const importedRef = importedMatch[0];

    // Causes the coroutine to add these tokens to `matchNode.tokens`
    yield emit(importedMatch);

    if (local.name !== imported.name) {
      yield take(_, ID`as`, _, ref`local`);
    } else {
      const asMatch = yield match(_, ID`as`, _, ref`local`);
      const localRef = arrayLast(asMatch);

      // Ensure that `foo as bar` becoming `foo as foo` only emits `foo`
      const valid = !matchNodes.get(importedRef).fallback && !matchNodes.get(localRef).fallback;

      if (asMatch && valid) {
        yield emit(asMatch);
      }
    }
  },
}

@conartist6
Copy link
Contributor Author

OK, so I've had one last revelation. That grammar that I just shared is sufficiently abstract that it's actually compatible with prettier's grammar definitions! That means that instead of maintaining a complete grammar for recast and a complete grammar for prettier, there can be a single next-gen grammar and tool (maybe called prettier, maybe not) which is capable of printing just about anything -- pretty printing, reprinting original whitespace, and any blend of the two. This would largely eliminate the need for a separate repository with the responsibilities that recast has had until now.

Since my final answer to the question as to what to do about the difficulties maintaining recast is "heal the fork with prettier" I am going to close this issue and track further efforts elsewhere, for now on prettier/prettier#12806

@conartist6
Copy link
Contributor Author

conartist6 commented Jul 13, 2022

Update: I think for a start I'll make a grammar independent of prettier's, and then shift towards maybe merging grammars with prettier as a potential long term goal. But for the moment I need to focus on my life, so there probably won't be too much progress in the near future.

@conartist6
Copy link
Contributor Author

conartist6 commented Aug 20, 2022

Juuuust kidding, I've still been working on cst-tokens. After the 4th (ish) big refactor, I think the core architecture is now likely to be pretty stable.

This last refactor has allowed me to do some really cool stuff. For example, I no longer need to include whitespace in the definition of my javascript grammar at all. The parser simply understands that whitespace is necessary between a keyword and an identifier (but not between a punctuator and an identifier).

I have some work left to do at this point, but I anticipate that I'll soon be have a web playground I can point folks to to try out what I've built and see how it allows them to transform code. I'll also be able to showcase the power of integration with prettier including the things you just couldn't do before, like insert a blank line or even control block expansion from inside a transform!

@conartist6
Copy link
Contributor Author

conartist6 commented Feb 17, 2023

This project got huge -- I've been working on it (instead of having a job) this entire time. It certainly is intended to replace recast, but it's also now meant to replace major parts of babel and eslint and prettier as well. The architecture turned on its head yet again, making the grammar look much more like a parser. Just for fun the ImportSpecifier snippet I had shared previously now looks like this:

const visitors = {
  *ImportDeclaration() {
    yield eat(KW`import`);

    const special = yield eatMatch(ref`specifiers:ImportSpecialSpecifier`);

    const brace = special ? yield eatMatch(PN`,`, LPN`{`) : yield eatMatch(LPN`{`);
    if (brace) {
      for (;;) {
        yield eat(ref`specifier:ImportSpecifier`);

        if (yield match(RPN`}`)) break;
        if (yield match(PN`,`, RPN`}`)) break;
        yield eat(PN`,`);
      }
      yield eatMatch(PN`,`);
      yield eat(RPN`}`);
      yield eat(KW`from`);
    }

    yield eat(ref`source:StringLiteral`);
    yield eatMatch(PN`;`);
  },
};

Also I should say at some point this project turned into the core of a new IDE. I'm hoping to put together a conference talk once I have some of the coolest demos I know how to create working properly.

EDIT:
Sorry that snippet is ImportDeclaration but I had shared ImportSpecifier previously, which now looks like this:

v = {
  *ImportSpecifier() {
    yield eat(ref`imported:Identifier`);
    yield eatMatch(KW`as`, ref`local:Identifier`);
  },
};

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants