[RFC007] Migration of the typechecker, part 2b - focus on the cache #2178

yannham · 2025-02-24T17:50:28Z

Cache migration

This is once again a slice of #2134, this time focused on the migration of the cache to accommodate the new AST.

Note: this PR won't be merged as it is, because NLS hasn't been migrated yet, so it can't pass the CI. However, I'm checking locally that it passes all of the nickel-lang-core tests. It's separate for reviewing purpose.

Content

Gets rid of the legacy typecheck module, replace it with bytecode::typecheck instead
Entirely rework the cache module (more details below)
Move bytecode::ast::AstAlloc and Allocable to a new module bytecode::ast::alloc, which also gets a MoveTo trait which makes it possible to deep-copy arena-allocated data from one arena to the other
Simplify the REPL to mostly rely on the cache instead of performing some stuff itself

A few helpers are added here and there as well.

Cache migration design decisions

In order to properly cache asts, we now need to use an allocator. One possibility would be to parametrize the whole cache with a lifetime 'ast and ask for an additional allocator argument on methods whenever required. However, beside making the whole cache module much heavier, the allocator will need to lie somewhere at some point, but it seems that cache should actually be in charge of owning the allocator: if we instantiate the allocator in, say, main or program, then it can not be freed as long as the cache is alive, although we don't need to retain memory during evaluation. If the cache owns it instead, it can take the decision of dropping the ASTs whenever it sees fit. Also, even if we store the allocator at the Program level, the borrowing difficulties described below will just be pushed in Program. The only way to avoid them is to create an allocator in the outermost function (main()), which means, once again, that it's never freed.

AST cache

Thus, we store the AST allocator together with the AST cache directly in the main cache. This has a number of issues, because this requires to write a struct with a self-referential lifetime - which isn't possible in safe Rust (without extra crates). This is encapsulated in a separate submodule, ast_cache, and in the AstCache struct, which handle the unsafe parts and offer a safe interface. As we needed to split the cache, this PR split it further for more modularity and better separation of concerns.

Term cache

We still need RichTerms for evaluation. The simplest was to keep all the existing caches, including the term cache, and insert the AST cache at the front of the pipeline. Whenever we parse something, to avoid subtle bugs of having e.g. the term but not the AST, or the converse, we directly populate both the AST cache and the term cache. The latter still serves as the reference for the state of a given file. The AST cache is just a supplementary cache that is used when e.g. typechecking, to avoid parsing stuff again and again, but a large part of the Nickel pipeline doesn't use it.

Import resolution

Import are handled differently in the old AST (RichTerm) and the new one (Ast). The former has two forms, a resolved and a non-resolved one, and requires that before typechecking, unresolved imports are substituted for resolved ones in a separate phase. This is a bit artificial, and requires an additional traversal that could be avoided. The new AST, however, has just one representation corresponding to the syntax 'import <something> [as 'Something].

When migrating the typechecker, we made the choice of letting typecheck be the driver for import resolution. No prior import resolution phase is needed: instead, when the typechecker sees an import, it resolves it and peek at its content if needed for typechecking. The import resolver is responsible for 1. avoid reparsing of the same imports again and again, i.e cache them 2. record dependencies and reverse dependencies of files. Another motivation for this setup is to prepare for lazy imports. Indeed, the pipeline now doesn't require imports to be resolved upfront, up to the typechecker.

Note that, unfortunately, import resolution on the RichTerm representation is still needed for evaluation. Once we've typechecked a term, we convert it to a RichTerm and run usual program transformation, including import resolution. Though those resolutions should all hit the cache, and shouldn't need to actually reload or reparse any data. This means that we have two different resolver now, one for ASTs and one for RichTerms.

To accommodate for the upcoming LSP changes, where NLS has a different allocator for each live file, the interface of AstImportResolver is a bit more restricted in that it doesn't guarantee that the imported AST has the same lifetime than the importer (which are indeed different in NLS). The typechecker thus has to move types at the import boundaries. We don't expect those types to be huge (they are either top-level signatures or inferred apparent types). If that becomes a problem, we can also offer a specialized method for the main pipeline that doesn't need to move anything (or a richer return value that can distinguish the two cases).

A last change is that the typechecker now only peek at one import deep. That is, if there is a chain of imports, the apparent type isn't propagated through the whole chain anymore. It's a trade-off, because this was difficult to implement with the weaker AstImportResolver. Also, I don't expect many files with just an import "foo" at the top-level, being nested, and where the users rely on the apparent type for typechecking. At worst, one can always add a simple type annotation (and apparent types are always very simple).

REPL

The REPL has the following peculiarity: it needs to maintain its own typing and term environment, because of top-level let. However, since typing contexts now depend on a 'ast lifetime, it was just impossible to mix that in a satisfactory way with the self-referential AST cache. Instead, as many things in this PR, the general solution was to move everything that depend on the AST allocator directly into the AST cache - here the typing context - and offers an interface to manipulate it (mostly insert new bindings). Similarly, the type_of feature requires to look into the context or other 'ast-dependent data, so it has been moved to the cache.

How to review

The diff is big, but part of it can be ignored:

the typecheck module has been replaced verbatim by bytecode::ast::typecheck. The only small difference (between administrative fixes) is the handling of imports, that is the case Node::Import(..) in both walk and check. The rest can be ignored.
Many functions of the new cache are close, if not verbatim copies, of the previous ones, especially in SourceCache, TermCache, etc. It's still worth taking a broad look at the new structure. On the other hand, I believe ast_cache deserve to be extensively scrutinized as it is new and concentrates the unsafe usages.

yannham · 2025-02-26T09:16:19Z

core/src/eval/contract_eq.rs

@@ -161,8 +161,8 @@ fn contract_eq_bounded(
                    let had_gas = state.use_gas();
                    state.use_gas();

-                    let closure1 = idx1.borrow();
-                    let closure2 = idx2.borrow();
+                    let closure1 = idx1.borrow_orig();


This was lost in the original extraction of contract_eq. I noticed it thanks to a unused warning for borrow_orig popping up after getting rid of the legacy typecheck module.

Is it worth having a test that would have caught this?

jneem

I made it up to mod ast_cache and decided I was too tired to read it properly 😄 Will resume tomorrow

core/src/bytecode/ast/alloc.rs

core/src/cache.rs

Co-authored-by: jneem <[email protected]>

jneem

The unsafety part looks fine to me, although I have some disagreement about which aspects are the responsibility of which methods.

In general, it seems to me that all of the unsafe functions are fine as long as they always get both alloc and ast (or type_ctxt) from self. So it would be tempting to have a method like, say,

pub fn borrow_asts_mut<'ast>(&'ast mut self) -> &'ast mut HashMap<FileId, &'ast Ast<'ast>> {}

which wouldn't even need to be unsafe. The problem (as you mentioned in one of the comments) is that this borrows all of self and thus is too inflexible. But I wonder if this can be solved with a macro: have a borrow_asts_mut!(self) macro that's safe to call because it guarantees internally to take both alloc and asts from self.

If all of the helper methods could be written as macros, I think there wouldn't need to be any unsafe code when using them.

core/src/cache.rs

core/src/bytecode/ast/alloc.rs

core/src/cache.rs

core/src/eval/operation.rs

yannham · 2025-02-27T09:55:18Z

The macro approach is an interesting idea! I also don't really like the current situation where each call-site must have a Safety comment, but this comment is always the same, copy pasted many times. It doesn't feel great to have the potential unsafety at the call sites, which are numerous.

Co-authored-by: jneem <[email protected]>

yannham · 2025-02-27T10:43:20Z

Ah, in fact, I think the macro idea can't work, because a function is the only way to get a parametric lifetime over the allocator (that is, to get a named lifetime that you can use in transmute). Maybe one possibility is still to wrap borrow_xxx with a macro over self, which would be safe.

jneem · 2025-02-27T13:16:46Z

Maybe one possibility is still to wrap borrow_xxx with a macro over self, which would be safe.

Right, the macro would call some function under the hood. The whole point of it would be to ensure that self.ast is paired with self.alloc

core/src/cache.rs

yannham · 2025-02-27T14:47:12Z

The last version uses macro wrappers, which does make things nicer.

yannham · 2025-02-27T15:40:30Z

I think it's virtually good to go. I'm keeping this PR open so that the last PR (focus on NLS) can be based onto this one. Then this one will be closed, and the last one merge instead (as this one doesn't pass the CI).

yannham added 14 commits February 24, 2025 17:38

Get rid of legacy typecheck module

64f5446

Big bag of changes

7f11583

Entirely reset the LSP changes

95ba4e7

Clean-up of the cache

74e1552

Simplify further cache, get rid of unused method

72f5b5b

Add guarantee in AstAlloc doc

6cf8274

End of cache cleaning - make it compile

cc26641

Fix unnoticed regression when extracting contract_eq

81fff55

Add clone_for_eval to Caches

02e3759

Fix utils::bench

756ccdb

Fix stdlib typecheck integration test

e3eb24f

Also resolve import in walk mode

739f011

Introduce CacheHub and CacheHubSlice

6a81636

Rename slice() -> split_asts()

9d45d5a

yannham force-pushed the rfc007/typechecking-part-2-cache branch from f66eb99 to 9d45d5a Compare February 25, 2025 17:07

yannham commented Feb 26, 2025

View reviewed changes

yannham added 4 commits February 26, 2025 10:20

Reset utils/src/bench.rs to original state

197b176

Update bench helpers for new cache

6c07d84

Add missing conditional compilation check

4035fd3

Make some imports conditional on their feature

7addf34

yannham marked this pull request as ready for review February 26, 2025 09:57

yannham requested a review from jneem February 26, 2025 09:58

jneem reviewed Feb 26, 2025

View reviewed changes

yannham and others added 2 commits February 26, 2025 14:47

Update core/src/cache.rs

381c431

Co-authored-by: jneem <[email protected]>

Apply suggestions from code review

7831f7b

Co-authored-by: jneem <[email protected]>

jneem reviewed Feb 27, 2025

View reviewed changes

core/src/eval/operation.rs Outdated Show resolved Hide resolved

jneem approved these changes Feb 27, 2025

View reviewed changes

CopyTo/Move -> Clone

7faac47

yannham and others added 6 commits February 27, 2025 11:04

Update core/src/bytecode/ast/alloc.rs

c030f1d

Co-authored-by: jneem <[email protected]>

Update core/src/bytecode/ast/alloc.rs

8c419d5

Co-authored-by: jneem <[email protected]>

Make update_state return a Result instead of Option

eb82d73

Rationalize _no_cache method, get rid of unused one

730c109

Update core/src/cache.rs

84f218f

Co-authored-by: jneem <[email protected]>

CacheHubSlice -> CacheHubView

2419628

jneem reviewed Feb 27, 2025

View reviewed changes

core/src/cache.rs Show resolved Hide resolved

yannham added 3 commits February 27, 2025 15:39

Use macro for safer self-referential borrowing in cache

2822a4b

Update safety comment

15cd3b9

Undo fix typo to avoid merge conflict upstream

e69c7e2

yannham added 2 commits February 27, 2025 16:34

Apply clippy fixes

1fc62ec

Fix broken link in rust doc

94fa071

yannham mentioned this pull request Mar 3, 2025

Rfc007/typechecking part 2 nls #2182

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC007] Migration of the typechecker, part 2b - focus on the cache #2178

[RFC007] Migration of the typechecker, part 2b - focus on the cache #2178

yannham commented Feb 24, 2025 •

edited

Loading

yannham Feb 26, 2025 •

edited

Loading

jneem Feb 27, 2025

jneem left a comment

jneem left a comment

yannham commented Feb 27, 2025

yannham commented Feb 27, 2025

jneem commented Feb 27, 2025

yannham commented Feb 27, 2025

yannham commented Feb 27, 2025

[RFC007] Migration of the typechecker, part 2b - focus on the cache #2178

Are you sure you want to change the base?

[RFC007] Migration of the typechecker, part 2b - focus on the cache #2178

Conversation

yannham commented Feb 24, 2025 • edited Loading

Cache migration

Content

Cache migration design decisions

AST cache

Term cache

Import resolution

REPL

How to review

yannham Feb 26, 2025 • edited Loading

Choose a reason for hiding this comment

jneem Feb 27, 2025

Choose a reason for hiding this comment

jneem left a comment

Choose a reason for hiding this comment

jneem left a comment

Choose a reason for hiding this comment

yannham commented Feb 27, 2025

yannham commented Feb 27, 2025

jneem commented Feb 27, 2025

yannham commented Feb 27, 2025

yannham commented Feb 27, 2025

yannham commented Feb 24, 2025 •

edited

Loading

yannham Feb 26, 2025 •

edited

Loading