Draft
98 commits
c46d22d
a Rascal formatter setup
jurgenvinju Aug 13, 2025
d6d886e
make sure around empty boxes no additional spacing is added
jurgenvinju Aug 14, 2025
6f038c0
removed dead var
jurgenvinju Aug 14, 2025
2fcf830
progress on a rascal formatter
jurgenvinju Aug 14, 2025
3c20142
Merge branch 'hifi-tree-diff' into rascal-formatter
jurgenvinju Aug 14, 2025
b36df06
Merge branch 'main' into rascal-formatter
jurgenvinju Aug 14, 2025
f196ba9
Merge branch 'main' into rascal-formatter
jurgenvinju Aug 14, 2025
2c616ef
removed dead doc comment
jurgenvinju Aug 14, 2025
6adec9f
minor progress on rascal formatter
jurgenvinju Aug 14, 2025
afdc030
convenience for separated lists
jurgenvinju Aug 14, 2025
7085bfa
improving the formatter
jurgenvinju Aug 17, 2025
73356c3
Merge branch 'minor-fixes-for-formatters' into rascal-formatter
jurgenvinju Aug 17, 2025
d43ecf5
more additions to the formatter
jurgenvinju Aug 17, 2025
962b521
renamed test to a unique name
jurgenvinju Aug 18, 2025
bc33cb6
added lots of new formatting rules for Rascal
jurgenvinju Aug 18, 2025
bd5d40e
some deforestation makes the algorithm much faster
jurgenvinju Aug 18, 2025
a0cae59
added missing group function (ported from Set)
jurgenvinju Aug 18, 2025
5610e68
factored expensive constant to private global
jurgenvinju Aug 18, 2025
b5ca046
more constructs are formatted now and we group statements if they wer…
jurgenvinju Aug 18, 2025
be576a1
added lots of formatting rules
jurgenvinju Aug 18, 2025
2b7f7c3
prevent generation of empty literals, replace by NULL where neceessary
jurgenvinju Aug 18, 2025
7709572
fixed an issue and added some debug code
jurgenvinju Aug 19, 2025
9279533
added assert for empty literals
jurgenvinju Aug 21, 2025
2958ef0
Merge branch 'main' into rascal-formatter
jurgenvinju Aug 21, 2025
0993c59
minor fixes
jurgenvinju Aug 21, 2025
0a1d7ea
fixed unused warning
jurgenvinju Aug 21, 2025
850acdd
fixed some unused imports and removed unused function
jurgenvinju Aug 21, 2025
16dc9b7
replaced the complex and broken GG feature of Box2Text by an eager re…
jurgenvinju Aug 21, 2025
ff20d58
continued with adding more formatting rules
jurgenvinju Aug 21, 2025
bd75140
added rs for rowSeparator feature to Arrays, for when we want to disp…
jurgenvinju Aug 23, 2025
84213b4
tables fixed for the occurence of nested splices. This involves infer…
jurgenvinju Aug 25, 2025
a23e703
separated lists do not have space by default between the previous ele…
jurgenvinju Aug 25, 2025
5d08cf4
removed unused pattern variable
jurgenvinju Aug 25, 2025
dd6fc23
added more formatting rules. needed fixes in Box2Text
jurgenvinju Aug 25, 2025
43e23fe
streamlining
jurgenvinju Aug 25, 2025
e6c0b45
some progress with string termplate formatting
jurgenvinju Aug 25, 2025
a88b978
added todo
jurgenvinju Aug 26, 2025
95dc0f4
turned prefix of \' continuation into a layout node to make sure layo…
jurgenvinju Aug 26, 2025
c4badbe
re-introduced striprec which has disappeared from Type during a cleanup
jurgenvinju Aug 26, 2025
eb94796
gave constructor names to StringLiteral alternatives for easy API
jurgenvinju Aug 26, 2025
ace1460
added plausably correct formatting of string templates with left-alig…
jurgenvinju Aug 27, 2025
5ccca1d
slices
jurgenvinju Aug 27, 2025
d10dc4a
Merge branch 'main' into rascal-formatter
jurgenvinju Aug 27, 2025
d68abb3
better callOrTree indentation
jurgenvinju Aug 28, 2025
e0d7d25
Merge branch 'main' into rascal-formatter
jurgenvinju Oct 4, 2025
9786b06
inlined G semantics again to be able to deal with nested U boxes prop…
jurgenvinju Oct 6, 2025
75e0191
started formatting syntax definitions
jurgenvinju Oct 6, 2025
2679cea
fixes in G boxes with H contexts
jurgenvinju Oct 6, 2025
ba3128c
added todo
jurgenvinju Oct 6, 2025
a2da436
added and used toClusterBox function for retaining vertical clusters …
jurgenvinju Oct 7, 2025
e95a888
add clustering to function statements
jurgenvinju Oct 7, 2025
fc257f3
AG is now also inlined and lazily evaluated to allow for U and G grou…
jurgenvinju Oct 7, 2025
98dc007
improved relation tables
jurgenvinju Oct 7, 2025
791d6b9
added debUG function for debugging G U and AG groups
jurgenvinju Oct 7, 2025
eca370e
fixed Main again
jurgenvinju Oct 7, 2025
f2af14d
added more constructs to format
jurgenvinju Oct 7, 2025
7b6dd34
all binary expressions are now lists by tree2box
jurgenvinju Oct 8, 2025
52823cd
added backward grouping
jurgenvinju Oct 8, 2025
3cfe450
added backward option to G groups
jurgenvinju Oct 8, 2025
46736f0
added backwards grouping to debUG
jurgenvinju Oct 8, 2025
6f5658f
single line comments that have to end with a newline now end with a n…
jurgenvinju Oct 8, 2025
bf88448
expression elements wrapped in HV because they can become lists now d…
jurgenvinju Oct 8, 2025
f2527b3
single line comments fixed
jurgenvinju Oct 8, 2025
2070d2c
incremental additions to the Rascal formatter'
jurgenvinju Oct 8, 2025
e3e44aa
incremental additions to the Rascal formatter'
jurgenvinju Oct 8, 2025
a56918e
better empty sets
jurgenvinju Oct 8, 2025
a45661d
fixed tricky issues with single line comments in layoutDiff
jurgenvinju Oct 9, 2025
8cfe4c4
much better solution for single line comments that end up formatted i…
jurgenvinju Oct 9, 2025
f7e2f18
finetuning
jurgenvinju Oct 9, 2025
dcb6934
radical optimization due to caching symbol for newline character class
jurgenvinju Oct 9, 2025
280edda
conditional debug prints get special treatment
jurgenvinju Oct 9, 2025
d3565ca
added asserts to formatter
jurgenvinju Oct 9, 2025
c5b68b9
less leading space for indented data declaration variants
jurgenvinju Oct 10, 2025
082ab36
much better indentation for assignments and declarations with initial…
jurgenvinju Oct 10, 2025
d2ea2c5
fix for non-single-line comments that do not need forced newlines, bu…
jurgenvinju Oct 10, 2025
21531a7
removed test comments
jurgenvinju Oct 10, 2025
97393ec
inlined the hot width function (alias for string size)
jurgenvinju Oct 10, 2025
f153c9c
factored constant to a global for efficiency sake
jurgenvinju Oct 10, 2025
e5d3399
introduced varargs versions of all boxes and renamed the respective b…
jurgenvinju Oct 11, 2025
f72ba58
lot of finetuning and removing little issues
jurgenvinju Oct 12, 2025
13a3474
removed all superfluous brackets using the cleanMeUp function
jurgenvinju Oct 12, 2025
f8f6c47
fixed issues caused by rough cleanup
jurgenvinju Oct 12, 2025
1d70edf
more fixes
jurgenvinju Oct 12, 2025
2303b89
fixed multi-variable declarations
jurgenvinju Oct 13, 2025
ef55725
fixed additional statements in for loops inside string templates
jurgenvinju Oct 13, 2025
e5d03b5
fixed return of binary expressions
jurgenvinju Oct 13, 2025
883a798
added some more sanity checks, also for documentation purposes
jurgenvinju Oct 14, 2025
6427de1
removed the superfluous brackets from the Box2Text test cases, for do…
jurgenvinju Oct 14, 2025
54641d8
test renamings
jurgenvinju Oct 14, 2025
34e3cc2
removed more superfluous brackets
jurgenvinju Oct 14, 2025
5214fd3
Merge branch 'main' into rascal-formatter
jurgenvinju Oct 16, 2025
5e2b8f0
moving ahead with testing all files in the library automatically
jurgenvinju Oct 17, 2025
a7ad1a3
some bugs
jurgenvinju Oct 17, 2025
849a60a
fixed an 11-year-old search/replace bug in the C grammar
jurgenvinju Oct 19, 2025
bf82621
still fixing fall out of changin the default formatter for binary exp…
jurgenvinju Oct 19, 2025
f2b50f1
checker found another undeclared constructor
jurgenvinju Oct 19, 2025
a88df2c
restored relations-as-tables
jurgenvinju Oct 19, 2025
892e0de
single expressions end up in the first cell
jurgenvinju Oct 19, 2025
1 change: 0 additions & 1 deletion src/org/rascalmpl/interpreter/IEvaluatorContext.java
@@ -23,7 +23,6 @@
import java.util.Optional;
import java.util.Stack;

import org.checkerframework.checker.nullness.qual.Nullable;
import org.rascalmpl.ast.AbstractAST;
import org.rascalmpl.debug.IRascalMonitor;
import org.rascalmpl.exceptions.StackTrace;
23 changes: 23 additions & 0 deletions src/org/rascalmpl/library/List.rsc
@@ -655,6 +655,29 @@ tuple[list[&T],list[&T]] split(list[&T] l) {
return <take(half,l), drop(half,l)>;
}

@synopsis{Groups sublists for consecutive elements which are `similar`}
@description{
This function does not change the order of the elements. Only elements
which are similar end-up in a sub-list with more than one element. The
elements which are not similar to their siblings, end up in singleton
lists.
}
@examples{
```rascal-shell
import List;
bool bothEvenOrBothOdd(int a, int b) = (a % 2 == 0 && b % 2 == 0) || (a % 2 == 1 && b % 2 == 1);
group([1,7,3,6,2,9], bothEvenOrBothOdd);
```
}
public list[list[&T]] group(list[&T] input, bool (&T a, &T b) similar) {
lres = while ([hd, *tl] := input) {
sim = [hd, *takeWhile(tl, bool (&T a) { return similar(a, hd); })];
append sim;
input = drop(size(sim), input);
}

return lres;
}
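A note on the semantics of `group`: each run is extended by comparing candidates against the run's first element (`hd`), not against their direct neighbour. The following is a minimal Python sketch of that behaviour, not the Rascal implementation itself; the names `group` and `both_even_or_both_odd` simply mirror the example in the doc comment.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def group(items: List[T], similar: Callable[[T, T], bool]) -> List[List[T]]:
    """Split `items` into consecutive runs; the order of elements is preserved."""
    result: List[List[T]] = []
    i = 0
    while i < len(items):
        head = items[i]
        j = i + 1
        # Extend the run while the next element is similar to the run's head.
        while j < len(items) and similar(items[j], head):
            j += 1
        result.append(items[i:j])
        i = j
    return result


def both_even_or_both_odd(a: int, b: int) -> bool:
    return a % 2 == b % 2


print(group([1, 7, 3, 6, 2, 9], both_even_or_both_odd))
# prints [[1, 7, 3], [6, 2], [9]]
```

For an equivalence-like predicate such as parity the distinction does not matter, but for a non-transitive predicate (for example "differs by at most one") comparing against the head yields shorter runs than comparing neighbours would.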

@synopsis{Sum the elements of a list.}
@examples{
79 changes: 58 additions & 21 deletions src/org/rascalmpl/library/analysis/diff/edits/HiFiLayoutDiff.rsc
@@ -27,6 +27,8 @@ module analysis::diff::edits::HiFiLayoutDiff
extend analysis::diff::edits::HiFiTreeDiff;
import ParseTree; // this should not be necessary because imported by HiFiTreeDiff
import String; // this should not be necessary because imported by HiFiTreeDiff
import lang::rascal::grammar::definition::Characters;
import IO;

@synopsis{Normalization choices for case-insensitive literals.}
data CaseInsensitivity
@@ -106,22 +108,31 @@ list[TextEdit] layoutDiff(Tree original, Tree formatted, bool recoverComments =
default list[TextEdit] rec(
Tree t:appl(Production p, list[Tree] argsA),
appl(p /* must be the same by the above assert */, list[Tree] argsB))
= [*rec(a, b) | <a, b> <- zip2(argsA, argsB)];
= [*rec(argsA[i], argsB[i]) | i <- [0..size(argsA)]];

// first add required locations to layout nodes
original = reposition(original, markLit=true, markLayout=true, markSubLayout=true);
// TODO: check if indeed repositioning is never needed
// original = reposition(original, markLit=true, markLayout=true, markSubLayout=true);

return rec(original, formatted);
}

private Symbol newlineClass = \char-class([range(10,10)]);

@synopsis{Make sure the new layout still contains all the source code comments of the original layout}
@description{
This algorithm uses the @category("Comments") tag to detect source code comments inside layout substrings. If the original
This algorithm uses the `@category(/[cC]omments/)` tag to detect source code comments inside layout substrings. If the original
layout contains comments, we re-introduce the comments at the expected level of indentation. New comments present in the
replacement are kept and will overwrite any original comments.

This trick is complicated by the syntax of multiline comments and single line comments that have
to end with a newline.
There are corner cases with respect to the original comments:
* a single line comment does not include its terminating newline itself, yet a newline must always follow it.
* multiple single line comments after each other

Then there are corner cases with respect to the replacement whitespace:
* the last line of the replacement whitespace is special. This is the indentation to use for all comments.
* but there could be no newlines in the replacement whitespace; and still there is a single line comment to be included.
Now we need to infer an indentation level for what follows the comment from "thin air".
}
@benefits{
* if comments are kept and formatted by tools like Tree2Box, then this algorithm does not overwrite these.
@@ -132,7 +143,14 @@ to end with a newline.
* if comments are not marked with `@category("Comment")` in the original grammar, then this algorithm recovers nothing.
}
private str learnComments(Tree original, Tree replacement) {
originalComments = ["<c>" | /c:appl(prod(_,_,{\tag("category"(/^[Cc]omment$/)), *_}), _) := original];
bool mustEndWithNewline(lit("\n")) = true;
bool mustEndWithNewline(conditional(Symbol s, _)) = mustEndWithNewline(s);
// if a comment can not contain newline characters, but everything else, then it must be followed by one:
bool mustEndWithNewline(\iter(Symbol cc:\char-class(_))) = intersection(cc, newlineClass) != newlineClass;
bool mustEndWithNewline(\iter-star(Symbol cc:\char-class(_))) = intersection(cc, newlineClass) != newlineClass;
default bool mustEndWithNewline(_) = false;

originalComments = [<s, s[-1] == "\n" || mustEndWithNewline(lastSym)> | /c:appl(prod(_,[*_,Symbol lastSym],{\tag("category"(/^[Cc]omment$/)), *_}), _) := original, str s := "<c>"];

if (originalComments == []) {
// if the original did not contain comments, stick with the replacements
@@ -146,23 +164,42 @@ private str learnComments(Tree original, Tree replacement) {
return "<replacement>";
}

// At this point, we know that: (a) comments are not present in the replacement and (b) they used to be there in the original.
// So the old comments are going to be the new output. however, we want to learn indentation from the replacement.
// At this point, we know that:
// (a) comments are not present in the replacement and
// (b) they used to be there in the original.
// So the old comments are going to be copied to the new output.
// But, we want to indent them using the style of the replacement.

// The last line of the replacement string typically has the indentation for the construct that follows:
// | // a comment
// | if (true) {
// ^^^^
// newIndent
//
// However, if the replacement string is on a single line, then we don't have the indentation
// for the string on the next line readily available. In this case we indent the next line
// to the start column of the replacement layout, as a proxy.

str replString = "<replacement>";
str newIndent = split("\n", replString)[-1] ? "";

// Drop the last newline of single-line comments, because we don't want two newlines in the output for every comment:
str dropEndNl(str line:/^.*\n$/) = (line[..-1]);
default str dropEndNl(str line) = line;
if (/\n/ !:= replString) {
// no newline in the repl string, so no indentation available for what follows the comment...
newIndent = "<for (_ <- [0..replacement@\loc.begin.column]) {> <}>";
}

// the first line of the replacement is the indentation to use.
str replString = "<replacement>";
str replacementIndent = /^\n+$/ !:= replString
? split("\n", replString)[0]
: "";

// trimming each line makes sure we forget about the original indentation, and drop accidental spaces after comment lines
return replString + indent(replacementIndent,
"<for (c <- originalComments, str line <- split("\n", dropEndNl(c))) {><indent(replacementIndent, trim(line), indentFirstLine=true)>
'<}>"[..-1], indentFirstLine=false) + replString;
// we always place sequential comments vertically, because we don't know if we are dealing
// with a single line comment that has to end with newline by follow restriction or by a literal "\n".
// TODO: a deeper analysis of the comment rule that's in use could also be used to discover this.
str trimmedOriginals = "<for (<c, newLine> <- originalComments) {><trim(c)><if (newLine) {>
'<}><}>";

// we wrap the comment with the formatted whitespace to assure the proper indentation
// of its first line, and the proper indentation of what comes after this layout node
return replString
+ indent(newIndent, trimmedOriginals, indentFirstLine=false)
+ newIndent
;
}
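The re-indentation strategy of the new `learnComments` can be summarised in a small Python sketch. This is an approximation under stated assumptions, not a port that has been checked against Rascal: `indent_rest` is a hypothetical stand-in for the `indent(..., indentFirstLine=false)` call, `reinsert_comments` and `start_column` are illustrative names, and Python's `str.split` is assumed to behave like the `split` used in the hunk.

```python
from typing import List, Tuple


def indent_rest(indentation: str, content: str) -> str:
    """Indent every non-empty line except the first (assumed behaviour of `indent`)."""
    first, *rest = content.split("\n")
    return "\n".join([first] + [indentation + line if line else line for line in rest])


def reinsert_comments(replacement: str,
                      comments: List[Tuple[str, bool]],
                      start_column: int) -> str:
    """`comments` pairs each comment text with "must be followed by a newline"."""
    # The last line of the replacement layout carries the indentation of what follows.
    new_indent = replacement.split("\n")[-1]
    if "\n" not in replacement:
        # Single-line layout: use the start column of the layout as a proxy.
        new_indent = " " * start_column
    # Stack comments vertically; add a newline only where the comment needs one.
    trimmed = "".join(c.strip() + ("\n" if needs_newline else "")
                      for c, needs_newline in comments)
    # Wrap the comments in the formatted whitespace so that both the first comment
    # line and the construct after the layout node end up indented.
    return replacement + indent_rest(new_indent, trimmed) + new_indent


print(repr(reinsert_comments("\n    ", [("// a comment\n", True)], 0)))
# prints '\n    // a comment\n    '
```

With a replacement that contains no newline, for example `reinsert_comments(" ", [("// c", True)], 4)`, the sketch falls back to four spaces derived from the start column, which corresponds to the "thin air" case described in the doc comment.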

private Symbol delabel(label(_, Symbol t)) = t;
91 changes: 79 additions & 12 deletions src/org/rascalmpl/library/lang/box/syntax/Box.rsc
@@ -10,6 +10,8 @@
@synopsis{An abstract declarative language for two dimensional text layout}
module lang::box::\syntax::Box

import List;

@synopsis{Every kind of box encodes one or more parameterized two-dimensional text constraints.}
@description{
* `H` puts its elements next to each other on the same line separated by `hs` spaces.
@@ -22,8 +24,10 @@ module lang::box::\syntax::Box
* `SPACE` produces `space` spaces
* `L` produces a literal word. This word may only contain printable characters and no spaces; this is a required property that the formatting algorithm depends on for correctness.
* `U` splices its contents in the surrounding box, for automatic flattening of overly nested structures in syntax trees.
* `G` is an additional group-by feature that reduces to the above core features
* `SL` is a convenience box for separated syntax lists based on `G`
* `G` is an additional group-by feature for `list[Box]` that reduces to the above core features. You can use it to wrap another
box around every `gs` elements.
* `AG` is an additional group-by feature for array `Row`s that reduces to the above core features. You can use it to wrap an `R` row
around every `gs` elements and then construct an `A` around those rows.
* `NULL()` is the group that will disappear from its context, useful for skipping content. It is based on the `U` box.
}
@benefits{
@@ -38,27 +42,44 @@ set on every `I` Box according to the current preferences of the user.
* `U(boxes)` is rendered as `H(boxes)` if it's the outermost Box.
}
data Box(int hs=1, int vs=0, int is=4)
= H(list[Box] boxes)
| V(list[Box] boxes)
| HOV(list[Box] boxes)
| HV(list[Box] boxes)
| I(list[Box] boxes)
| WD(list[Box] boxes)
| A(list[Row] rows, list[Alignment] columns=[l() | [R(list[Box] cs), *_] := rows, _ <- cs] /* learns the amount of columns from the first row */)
= _H(list[Box] boxes)
| _V(list[Box] boxes)
| _HOV(list[Box] boxes)
| _HV(list[Box] boxes)
| _I(list[Box] boxes)
| _WD(list[Box] boxes)
| _A(list[Row] rows, Box rs=NULL(), list[Alignment] columns=[])
| _AG(list[Box] boxes, int gs=2, list[Alignment] columns=[], Box rs=NULL())
| SPACE(int space)
| L(str word)
| U(list[Box] boxes)
| G(list[Box] boxes, Box(list[Box]) op = H, int gs=2)
| _U(list[Box] boxes)
| _G(list[Box] boxes, bool backwards=false, int gs=2, Box op = H([]))
| NULL()
;

Box H(Box boxes..., int hs=1) = _H(boxes, hs=hs);
Box V(Box boxes..., int vs=0) = _V(boxes, vs=vs);
Box HOV(Box boxes..., int hs=1, int vs=0) = _HOV(boxes, hs=hs, vs=vs);
Box HV(Box boxes..., int hs=1, int vs=0) = _HV(boxes, hs=hs, vs=vs);
Box I(Box boxes...) = _I(boxes);
Box WD(Box boxes...) = _WD(boxes);
Box A(Row rows..., Box rs=NULL(), list[Alignment] columns=[])
= _A(rows, rs=rs, columns=columns);
Box AG(Box boxes..., int gs=2, list[Alignment] columns=[], Box rs=NULL())
= _AG(boxes, gs=gs, columns=columns, rs=rs);
Box U(Box boxes...) = _U(boxes);
Box G(Box boxes..., bool backwards=false, int gs=2, Box op = H([]))
= _G(boxes, backwards=backwards, gs=gs, op=op);

@synopsis{A row is a list of boxes that go into an `A` array/table.}
@description{
Rows do not have parameters. These are set on the `A` level instead,
or per cell Box.
}
data Row = R(list[Box] cells);

// Row R(Box cells...) = _R(cells);

data Alignment = l() | r() | c();

@synopsis{NULL can be used to return a Box that will completely disappear in the surrounding context.}
@@ -81,4 +102,50 @@ algorithm starts counting boxes and widths.
* Do not use `NULL` for empty Row cells, unless you do want your cells aligned to the left and filled up to the right with empty H boxes.
* NULL will be formatted as `H([])` if it's the outermost Box.
}
Box NULL() = U([]);
Box NULL() = U([]);

@synopsis{Convenience box for adding separators to an existing box list}
@description{
Each element is wrapped by the `op` operator together with the next separator.
The resulting list is wrapped by a G box, of which the elements will be spliced
into their context.
}
Box SL(list[Box] boxes, Box sep, Box op = H([], hs=0))
= G([b, sep | b <- boxes][..-1], op=op, gs=2);

@synopsis{Flatten and fold U and G boxes to simplify the Box structure}
@description{
U and G and AG boxes greatly simplify the Box tree before it is formatted. This
happens "just-in-time" for efficiency reasons. However, from a Box tree
with many U and G boxes it can become hard to see what the actual formatting
constraints are going to be.

This function applies the semantics of G and U and returns a Box that renders
exactly the same output, but with a lot less nested structure.
}
@benefits{
* useful to debug complex `toBox` mappings
* formatting semantics preserving transformation
}
@pitfalls{
* only useful for debugging purposes, because it becomes a pipeline bottleneck otherwise.
}
Box debUG(Box b) {
list[Box] groupBy([], int _gs, Box _op) = [];
list[Box] groupBy(list[Box] boxes:[Box _, *_], int gs, Box op)
= [op[boxes=boxes[..gs]], *groupBy(boxes[gs..], gs, op)];

list[Box] groupByBackward([], int _gs, Box _op) = [];
list[Box] groupByBackward(list[Box] boxes:[Box _, *_], int gs, Box op)
= [op[boxes=boxes[..size(boxes) mod gs]], *groupBy(boxes[size(boxes) mod gs..], gs, op)];

list[Row] groupRows([], int _gs) = [];
list[Row] groupRows(list[Box] boxes:[Box _, *_], int gs)
= [R(boxes[..gs]), *groupRows(boxes[gs..], gs)];

return innermost visit(b) {
case [*Box pre, _U([*Box mid]), *Box post] => [*pre, *mid, *post]
case _G(list[Box] boxes, gs=gs, op=op, backwards=bw) => _U(bw ? groupByBackward(boxes, gs, op) : groupBy(boxes, gs, op))
case _AG(list[Box] boxes, gs=gs, columns=cs, rs=rs) => A(groupRows(boxes, gs), columns=cs, rs=rs)
}
}
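The grouping that `G`, `SL` and `debUG` rely on can be illustrated with plain lists. The Python sketch below is only an illustration of the chunking order; `group_forward`, `group_backward` and `separated` are hypothetical names, and the boxes are represented by strings. One deliberate difference: when the list length is a multiple of `gs`, the sketch skips the empty leading group, which is an assumption about the intended result rather than a transcription of `groupByBackward` as written in this hunk.

```python
from typing import List, TypeVar

T = TypeVar("T")


def group_forward(items: List[T], gs: int) -> List[List[T]]:
    """Chunks of `gs` counted from the left; a short remainder ends up last."""
    return [items[i:i + gs] for i in range(0, len(items), gs)]


def group_backward(items: List[T], gs: int) -> List[List[T]]:
    """Chunks of `gs` aligned to the right; a short remainder ends up first."""
    rest = len(items) % gs
    head = [items[:rest]] if rest else []  # assumption: no empty leading group
    return head + group_forward(items[rest:], gs)


def separated(items: List[T], sep: T) -> List[T]:
    """Interleave `sep` after each element and drop the final separator."""
    out: List[T] = []
    for item in items:
        out += [item, sep]
    return out[:-1]


print(group_forward(["a", "b", "c", "d", "e"], 2))
# prints [['a', 'b'], ['c', 'd'], ['e']]
print(group_backward(["a", "b", "c", "d", "e"], 2))
# prints [['a'], ['b', 'c'], ['d', 'e']]
print(group_forward(separated(["x", "y", "z"], ","), 2))
# prints [['x', ','], ['y', ','], ['z']]
```

The last call shows why `SL` uses `gs=2`: every element is paired with the separator that follows it, so a line break can only occur after a separator and never between an element and its separator.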