-
Notifications
You must be signed in to change notification settings - Fork 350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recast needs additional maintainers #1086
Comments
I think I'm just going to start work on the fork. It's macrome-js/macrome that I'm building which fundamentally relies on recast, so perhaps |
Currently my organizations consist solely of me, but I've created them in the explicit hope that I can bring others in and form communities of trust and shared responsibility. |
I'm going to give this one day. Let's see how far out ahead I can get my fork in just one day of work. |
@conartist6 I suggest talking with @benjamn about helping maintain recast. If you have a stake in it, then you're clearly more motivated. |
I've been trying rather aggressively, but no luck. I think the most helpful thing I can do is make my fork and show him what my direction would be if I had control over the package. I'm working on that right now. I'm getting my fork ready to be a
I'm stripping away everything that isn't what the library is about to reduce the work involved documenting, maintaining, and versioning. No more |
Or maybe I should just change parsers to a different signature, something like |
I've also ripped out the dependency on |
Here's how I imagine usage looking roughly: import { parse } from '@babel/parse';
import { wrapParseWithFormatting, printFormatted } from 'recast';
const parseFormatted = wrapParseWithFormatting(parse, 'babel');
const ast = parseFormatted(source, options);
// ast mutations
return printFormatted(ast); |
Replacing |
Making the tests javascript instead of typescript. |
Typescript doesn't understand self-references in js files (bug filed), and I'd prefer having those to having typescript. I think cleaning up the tests is going to be the biggest chunk of this work. The test cases need to be reorganized so that as many test cases as possible are expressed in a manner that is agnostic to the specific parser and ast structure being used. |
Actually I'm going to do the minimum amount of work necessary to get the test suite to pass then push, merge and push my fix for the parens issues, and continue working. I don't want to fall into the trap of verifying behavior which is wrong. |
Another thing I'm thinking about is that there's just very little left that I think the best thing to do for the long run would be to help prettier avoid dependence on the original source text so that the workflow would be:
And then we get to deprecate recast! I've created an issue to discuss the changes to prettier that would be necessary. |
I can't say I disagree with most of your thoughts here. I also wonder about the future of recast in a world where many (most?) use Prettier. That said, I decided to spend a little time getting things in slightly better shape and released v0.21.1: https://github.com/benjamn/recast/releases/tag/v0.21.1. If there's any specific PR you want me to prioritize, let me know and I'll hopefully get to it this week. |
Thank you!! Are you interested in my putting together PRs for the API changes I mentioned as being a potential |
Also would it be possible to get at least some basic perms on this repo, like triage? If I had that I could take a proper pass through the open bugs and see what's legit. |
I just wanted to chime in that I really appreciate the work that goes into maintaining recast ❤️ Even in a possible future where the Prettier maintainers respond enthusiastically to prettier/prettier#12709 and Prettier grows the ability to support writing codemods, there will be a lot of codebases out there that don't use Prettier. I appreciate being able to transform that code without completely reformatting it, meeting the code where it is. Recast is a focused tool that does the job of enabling transforms without making a lot of assumptions, and I think there's a lot of value in that. |
On the subject of specific PRs, I would appreciate a look at #1089 and #1090 :-). Those are new -- I just sent them this week. @conartist6 to your thought at #1068 (comment) about test cases, I quite like the structure of One thing that might be productive for someone to do would be to put some of the other tests into the same style, and then just churn out a few dozen more one-line test cases for areas that seem to be tricky, like extra/missing parens. (Hmm, in fact: |
One thing's for sure, both My impression is that both are fundamentally a bit unstable due to exceedingly high costs in two major dimensions:
Prettier is the more stable project just because it has so many more users and it is run so much more often. The language changing is pretty much a constant, so what I think could use the most attention is some way to curb the cost of supporting so many different parsers. Does anyone else have ideas about this? For now I'm going to reopen this issue as I think it's still relevant and there's productive discussion to be had about direction and combining efforts that have started independently. |
One way to reduce maintenance costs would be to try to extract as much common functionality as possible between prettier and recast and share it. |
I agree that there's fundamentally a bit of ongoing instability in that the language changes. But I think the volume of those changes is not actually all that high. And when a feature is new, or especially when it's an early proposal, there's not a lot of code out there that uses it yet -- so for the great majority of users, it's perfectly fine if the tool isn't quick to gain support for that feature, because none of their code uses it. If the feature becomes widespread, someone will be motivated to add it. One thing I've appreciated about Recast is that it's not a lot of code. My experience with #1089 and #1090, where I add or fix support for some less-widespread (because they're specific to Flow) language features, was that it wasn't a lot of work to do so. (FWIW I think fixing something like where parens do or don't get added is inherently trickier, because it's logic that cuts across wide swathes of the language. But it's also not something where the language really changes over time. The existing code doesn't need a fix because the language changed, but rather because it wasn't quite right in the first place. A really spot-on solution for that, if/when we nail one down, shouldn't inherently have any ongoing need for maintenance.) |
I am curious about the possibility of simplifying by supporting fewer parsers. It looks to me (from looking in What do the differences look like that add complexity to the Recast code? I also don't know enough about the context of the various parsers to understand what value some people might be getting from different ones. Personally I believe I wouldn't miss anything if both the non-Babel parsers were dropped; but I don't really know the history of the different parsers, or what use cases people might have had in the past, and might still have, that the others serve and Babel didn't or doesn't. |
These are excellent points. I'm not so worried about the language itself either. I am a bit more concerned about things like JSX, Typescript, and Flow which are not the language (but whose syntax you must handle in tools). These even complicate the experience of writing a codemod, for example using the babel AST because you may be writing a mod for plain javascript codebase and the type checker will be telling you that you need to handle nodes for type annotations and JSX. I'm going to dig more into what the differences are and whether there are better ways of abstracting them away. Prettier has a postprocess step that does a lot of mangling to try to clean up some of the main differences. That's actually another bit of code that prettier could conceivably share with recast. |
Also after having read some more or prettier's logic I think the biggest thing to abstract would be a representation of the token stream. Recast demands that it be given tokens to work (though I haven't yet found the locations where they are used). Prettier already essentially relies on being able to treat the source text like a token stream, looking around for particular tokens like Since I'm always thinking in iterables, I think it might be possible to write a token generator which can generate tokens by traversing an AST. That way it would still be possible to work with an AST but easily write rules that involve source ordering. Also one of the benefits or writing a generator is that without the need to store the data you can parameterize the way you generate it, e.g. as |
Oh wasm is another thing that I learned (from reading prettier sources) introduces some new syntaxes that some parsers handle and some do not, e.g. |
OK I think I feel pretty confident though that prettier should be my starting point. Prettier's starting point was recast, and they've come a very long way since then. I want to adopt their solutions to the problems that recast still struggles with. I'm finding it very interesting to look at some of the oldest history of prettier. Here is what is essentially its initial commit. Some of the content of that readme is documentation about its mechanism that does not exist for current versions though the mechanism remains the same!! |
@conartist6 I’m here. You have my attention. I’m sorry I haven’t been able to prioritize a proper response until now. Would you mind summarizing what you need from Recast in order to use it without having to fork it (also fine/allowed, just hopefully unnecessary)? |
@eventualbuddha Oh cool, taking a look. I'm on the same page with you about babel's APIs being the ones that you really need to be able to do serious codemodding work. You might be interested in what I was working on immediately before this. My main project that's directly relevant is macrome, a build system that allows you to keep generated output continuously up to date even as input and the transforms themselves change. Its aim is to provide robust Imagine being able to use CSS modules but instead just having a magic syntax it actually generates the file containing the mappings between importable selectors and their secret hashes. Or imagine being able to use macros to inject a stable ID into your source code. You could write code like this: import uid from '@macrome/uid.macro';
translation(uid``, 'description');
// and the build system would change it to:
translation(uid`e9af0`, 'description'); This solves so many problems around knowing what is and isn't the same translation across multiple version of the code without offering any particular opinion about how you build your translation system. It would also be very relevant for generating stable error codes that could be used to link to a knowledgebase! |
Long story short, I don't want to end up maintaining recast. In order to be able to execute on such an ambitious vision, I need modifying and printing code to also "just work" so that I can build on top of it without spending a lot of my time on maintenance of this library. |
OK, I think I've got my recommendation. It is: embrace concrete syntax trees by adding Let's say you had a source like this: (astexplorer) if (true) {
consequent;
} In my CST representation it would look like: { // = node
"type": "IfStatement",
"test": {
"type": "Literal",
"value": true,
"tokens": [
{ "type": "Boolean", "value": "true" }
]
},
"consequent": {
"type": "BlockStatement",
"body": {/*...*/},
"tokens": [
{ "type": "Punctuator", "value": "{" },
{ "type": "Whitespace", "value": "\n " },
{ "type": "Reference", "value": "body" }, // interpret as node.body.tokens
{ "type": "Whitespace", "value": "\n" },
{ "type": "Punctuator", "value": "}" }
]
},
"tokens": [
{ "type": "Keyword", "value": "if" },
{ "type": "Whitespace", "value": " " },
{ "type": "Punctuator", "value": "(" },
{ "type": "Reference", "value": "test" },
{ "type": "Punctuator", "value": ")" },
{ "type": "Whitespace", "value": " " },
{ "type": "Reference", "value": "consequent" }
]
} I think I can provide the equivalent of The idea would be to provide a rich API, document it, and then see if I could get enough momentum to perhaps merge with At the end though you'd have exactly what I want: a way that things can interoperate and "just work" without the explicit need to maintain a library like recast. I'm going to work on building the library here. It's gonna be a bit of a bear because it actually needs what is essentially a custom parser over a token stream so that it understands node-by-node both how to generate tokens and how to verify that the existing tokens constitute legal syntax. Fortunately I've been experimenting with various kinds of syntax matching architectures... |
Here's a "little" piece of code that shows how I think a token management engine would look and work: EDIT: I have moved the prototype to a gist since I keep changing it. |
Hmm, and maybe I could have one more kind of builder, |
Well, that's it. I don't want to burden this thread with all my dev work. I talk a lot. I think this shows pretty clearly a concrete proposal -- essentially a rearchitecture of recast. I could call it something else, or I would also be perfectly happy if there was interest in publishing this under the recast name. I'd sure love to have help building it. My hope would be that building this would make it possible for prettier to eventually depend on recast again, as this structure supports all the kinds of queries prettier needs to make. If this did happen, I don't think recast would be at a loss for maintainers anymore. |
I really like it because it'd be an implementation of recast with absolutely no logic about pretty-printing, the incomplete implementation of which is one of recast's current maintenance burdens. Also in addition to making a best effort to preserve formatting, it permits easy creation of formatting-aware transforms: those which also manipulate the tokens array in such a way that it remains valid for the given AST. |
I now have a complete implementation for a small grammar in the repo. I know some of my examples contained magical thinking, but this doesn't. It works! |
This is a lot to digest 😅, but I want you to know I am digesting it @conartist6. In particular, I agree it would be convenient if Recast could output tokens rather than (or in addition to) a string of code. Some initial thoughts, from a first glance: Instead of providing a
I agree keeping up with AST and parsing/printing changes is an ongoing chore, potentially never finished, and I know I've fallen behind. But I'm not sure I'm following how the CST output tokens idea would save Recast from needing a pretty-printer. Generating a stream of concrete output tokens (including whitespace and comments) seems pretty much equivalent to full pretty-printing, with all the choices that need to be made about indentation and parenthesization, etc. Some of the token ranges could be copied directly from the original source token stream, but the ones that Recast generates will need to be formatted by some sort of pretty-printing-like process, right? (Not a rhetorical question—please let me know if I'm misunderstanding something!) In more general terms, I think it would be valuable if Recast could be configured to delegate pretty-printing to another library (as it can already delegate parsing), even if it still needs to maintain its own default pretty-printer. The challenge is that pretty-printing is only part of the conservative reprinting process, so the other printing library needs to have a certain kind of structure that allows alternating recursively between pretty-printing and just copying source code. Since Prettier was originally a fork of the Recast pretty-printer, it's probably the closest to having that structure already (passing a |
In case this isn't already shared understanding: I strongly believe Recast users should not have to worry about manipulating tokens while doing their AST transformations. If we make Recast capable of generating output tokens, it needs to be an automatic process based entirely on inferred AST changes, to avoid increasing the implementation burdens of AST transforms. One more question for async discussion: can we enumerate the benefits of having Recast generate tokens directly, rather than just tokenizing the output string that it already prints? I imagine it might be faster to skip the re-tokenization, though tokenization is fairly quick compared to full parsing. I ask because there might be a quick and dirty implementation here where Recast reuses its parser somehow to tokenize the output string. A little slower, perhaps, but it might be enough to unblock downstream experimentation, if output tokens are ultimately what you need? |
@benjamn Thanks for taking the time to look through this all and start thinking about it! I know I have a tendency to race off ahead in my own direction, but it's the only way to find out in a concrete way what is possible and what problems would be encountered. |
I really think it makes a lot of sense to have the tokens arrays on the nodes. You mention the possibility of tokenRanges, but that is essentially not much different than what we have now -- ranges into What i've built ensures that code transforms are able to add or modify tokens if they want to, but also ensures that they need not do so, that a reasonable default set of tokens will be provided. |
About deciding which printing algorithm to apply to different parts of the AST, have you seen this part of my code? |
As for tokenizing an output string, I'm not really thinking in that direction. In the world I'm imagining there would be no need for a pure string representation until all the transformation work was done. That's why I also want prettier to offer a compatible formatCst(cst) method. |
Something cool about the way my code works: it falls back from one printer to another. Suppose you have: {
key1: 'value1',
} And you write a codemod whose purpose is to change that to {
key1: 'value1',
key2: 'value2'
} You do this by pushing to the {
key1: 'value1',
key2: 'value2'} I agree that that looks bad, but it's actually great for feeding into a pretty printer (prettier, or one built into recast) because it retaines the most crucial formatting decision: whether the object should be printed on one line or more than one, represented as whether |
Where does this fallback decision happen? How does your code tell that the new |
Fallback is in the builders: https://github.com/conartist6/cst-tokens/blob/248c6dd60934840c13cabb5019ef050c68ae96b4/lib/builders/tokens.js#L23 |
The tokens array only becomes outdated if the number of properties changes. That's because I use the simple I can tell if the token stream is outdated using my grammar which defines which combination of tokens are valid. Notice that whitespace and comments are allowed, but for an object property or specifier, references, commas, and curly braces are ensured ( |
I've made up some basic architecture documentation to try to get the basic principles of the design across quickly. I'm also going to start experimenting with a mechanism for diffing references that would allow me to handle cases where it would be appropriate to go into fallback generation and then come back out of it within a single node. That would allow me to handle some tricky cases like transforming: import { /* a */ a, /* c */ c } from 'source';
// after inserting b currently you'd get the c comment attached to b:
import { /* a */ a, /* c */ b, c } from 'source';
// but b should use fallback tokens while c gets the ones that belong to it
import { /* a */ a, b, /* c */ c } from 'source'; |
I've done more of my due diligence on prettier now too. I'm sure I haven't found every surprise, but I think I have the basics. Essentially I have two related questions that I'm trying to understand the answers to:
|
OK I'm back-ish after some distractions. I've also changed tack a bit on how I plan to integrate with prettier. The new plan is to use exactly the existing APIs that prettier has for output. In other words, I expect prettier essentially to be able to do this: const {
__debug: { formatAST },
} = require("prettier");
const formatCst = (cst) => {
// Note: formatAst mutates the AST to rebalance binary expressions.
const formattedText = formatAST(cst);
rebuildTokensFromText(cst, formattedText);
} That's great because that part at least doesn't require any changes from prettier at all. They even export the Then the remaining challenge is the one that can't be avoided: writing prettier's language rules to query for tokens against the CST. I haven't really started digging into that yet, but I have heard at least one contrib there express enthusiasm for the idea of that change which is encouraging. |
I've got it! I think. It took a while to get to this design, but now that I'm here it feels pretty powerful and correct. I'm using coroutines -- one side of the coroutine is the grammar generator for a given node, and on the other side we have an engine of some kind, either for validation, generator, or mutation. The grammar generator yields simple commands to the engine like I've punted on the question of how to attach whitespace and comments by saying that it will be up to the plugin ecosystem to define a comment association strategy. I'll always hoist whitespace and comments up in the tree, user code can insert whitespace and comments wherever it wants, and plugins can be uses to systemically push comments downward. For example you might use plugins to push typecast comments or doc comments down into the nodes they describe, so that they are sure to move with those nodes if those nodes move. In this way I ensure that the most practical use cases can be served without endorsing any particular set of rules. I have a working implementation of the engine that consumes the grammar to build up |
I think I won't have to mess with the prettier language grammar either, which is amazing. I only have to define my own grammar, and I'm quite liking the syntax I've developed for expressing those definitions. |
There's still a ton of stuff to do, but I'd say that after three rewrites my architectural prototype is now complete. This is the design that I will move forward with! I'm going to rebuild my architecture docs, but for now here's the most interesting part of the grammar definition: const visitors = {
// This generator is executed by the `match` coroutine to build a `matchNode` for `node`
*ImportSpecifier(node, { matchNodes }) {
const { local, imported } = node;
// These next three lines are equivalent to `` take(ref`imported`) ``
// We use `match`/`emit` instead of `take` in order to get `importedRef`
// `match` does recursive submatching, and returns to us an array of matched tokens
// ref tokens can be used to get match nodes, e.g. `matchNodes.get(importedRef)`
const importedMatch = yield match(ref`imported`);
const importedRef = importedMatch[0];
// Causes the coroutine to add these tokens to `matchNode.tokens`
yield emit(importedMatch);
if (local.name !== imported.name) {
yield take(_, ID`as`, _, ref`local`);
} else {
const asMatch = yield match(_, ID`as`, _, ref`local`);
const localRef = arrayLast(asMatch);
// Ensure that `foo as bar` becoming `foo as foo` only emits `foo`
const valid = !matchNodes.get(importedRef).fallback && !matchNodes.get(localRef).fallback;
if (asMatch && valid) {
yield emit(asMatch);
}
}
},
} |
OK, so I've had one last revelation. That grammar that I just shared is sufficiently abstract that it's actually compatible with prettier's grammar definitions! That means that instead of maintaining a complete grammar for recast and a complete grammar for prettier, there can be a single next-gen grammar and tool (maybe called prettier, maybe not) which is capable of printing just about anything -- pretty printing, reprinting original whitespace, and any blend of the two. This would largely eliminate the need for a separate repository with the responsibilities that recast has had until now. Since my final answer to the question as to what to do about the difficulties maintaining recast is "heal the fork with prettier" I am going to close this issue and track further efforts elsewhere, for now on prettier/prettier#12806 |
Update: I think for a start I'll make a grammar independent of prettier's, and then shift towards maybe merging grammars with prettier as a potential long term goal. But for the moment I need to focus on my life, so there probably won't be too much progress in the near future. |
Juuuust kidding, I've still been working on cst-tokens. After the 4th (ish) big refactor, I think the core architecture is now likely to be pretty stable. This last refactor has allowed me to do some really cool stuff. For example, I no longer need to include whitespace in the definition of my javascript grammar at all. The parser simply understands that whitespace is necessary between a keyword and an identifier (but not between a punctuator and an identifier). I have some work left to do at this point, but I anticipate that I'll soon be have a web playground I can point folks to to try out what I've built and see how it allows them to transform code. I'll also be able to showcase the power of integration with prettier including the things you just couldn't do before, like insert a blank line or even control block expansion from inside a transform! |
This project got huge -- I've been working on it (instead of having a job) this entire time. It certainly is intended to replace recast, but it's also now meant to replace major parts of babel and eslint and prettier as well. The architecture turned on its head yet again, making the grammar look much more like a parser. Just for fun the const visitors = {
*ImportDeclaration() {
yield eat(KW`import`);
const special = yield eatMatch(ref`specifiers:ImportSpecialSpecifier`);
const brace = special ? yield eatMatch(PN`,`, LPN`{`) : yield eatMatch(LPN`{`);
if (brace) {
for (;;) {
yield eat(ref`specifier:ImportSpecifier`);
if (yield match(RPN`}`)) break;
if (yield match(PN`,`, RPN`}`)) break;
yield eat(PN`,`);
}
yield eatMatch(PN`,`);
yield eat(RPN`}`);
yield eat(KW`from`);
}
yield eat(ref`source:StringLiteral`);
yield eatMatch(PN`;`);
},
}; Also I should say at some point this project turned into the core of a new IDE. I'm hoping to put together a conference talk once I have some of the coolest demos I know how to create working properly. EDIT: v = {
*ImportSpecifier() {
yield eat(ref`imported:Identifier`);
yield eatMatch(KW`as`, ref`local:Identifier`);
},
}; |
@benjamn I am trying every way that might be possible to get your attention, because if I can't I'm forking the library.
The text was updated successfully, but these errors were encountered: