-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC007] Migration of the typechecker, part 2b - focus on the cache #2178
base: master
Are you sure you want to change the base?
Conversation
f66eb99
to
9d45d5a
Compare
@@ -161,8 +161,8 @@ fn contract_eq_bounded( | |||
let had_gas = state.use_gas(); | |||
state.use_gas(); | |||
|
|||
let closure1 = idx1.borrow(); | |||
let closure2 = idx2.borrow(); | |||
let closure1 = idx1.borrow_orig(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was lost in the original extraction of contract_eq
. I noticed it thanks to a unused warning for borrow_orig
popping up after getting rid of the legacy typecheck module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth having a test that would have caught this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made it up to mod ast_cache
and decided I was too tired to read it properly 😄 Will resume tomorrow
Co-authored-by: jneem <[email protected]>
Co-authored-by: jneem <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unsafety part looks fine to me, although I have some disagreement about which aspects are the responsibility of which methods.
In general, it seems to me that all of the unsafe functions are fine as long as they always get both alloc
and ast
(or type_ctxt
) from self
. So it would be tempting to have a method like, say,
pub fn borrow_asts_mut<'ast>(&'ast mut self) -> &'ast mut HashMap<FileId, &'ast Ast<'ast>> {}
which wouldn't even need to be unsafe. The problem (as you mentioned in one of the comments) is that this borrows all of self
and thus is too inflexible. But I wonder if this can be solved with a macro: have a borrow_asts_mut!(self)
macro that's safe to call because it guarantees internally to take both alloc
and asts
from self
.
If all of the helper methods could be written as macros, I think there wouldn't need to be any unsafe code when using them.
The macro approach is an interesting idea! I also don't really like the current situation where each call-site must have a |
Co-authored-by: jneem <[email protected]>
Co-authored-by: jneem <[email protected]>
Co-authored-by: jneem <[email protected]>
Ah, in fact, I think the macro idea can't work, because a function is the only way to get a parametric lifetime over the allocator (that is, to get a named lifetime that you can use in |
Right, the macro would call some function under the hood. The whole point of it would be to ensure that |
The last version uses macro wrappers, which does make things nicer. |
I think it's virtually good to go. I'm keeping this PR open so that the last PR (focus on NLS) can be based onto this one. Then this one will be closed, and the last one merge instead (as this one doesn't pass the CI). |
Cache migration
This is once again a slice of #2134, this time focused on the migration of the cache to accommodate the new AST.
Note: this PR won't be merged as it is, because NLS hasn't been migrated yet, so it can't pass the CI. However, I'm checking locally that it passes all of the
nickel-lang-core
tests. It's separate for reviewing purpose.Content
bytecode::typecheck
insteadbytecode::ast::AstAlloc
andAllocable
to a new modulebytecode::ast::alloc
, which also gets aMoveTo
trait which makes it possible to deep-copy arena-allocated data from one arena to the otherA few helpers are added here and there as well.
Cache migration design decisions
In order to properly cache asts, we now need to use an allocator. One possibility would be to parametrize the whole cache with a lifetime
'ast
and ask for an additional allocator argument on methods whenever required. However, beside making the wholecache
module much heavier, the allocator will need to lie somewhere at some point, but it seems thatcache
should actually be in charge of owning the allocator: if we instantiate the allocator in, say,main
orprogram
, then it can not be freed as long as the cache is alive, although we don't need to retain memory during evaluation. If the cache owns it instead, it can take the decision of dropping the ASTs whenever it sees fit. Also, even if we store the allocator at theProgram
level, the borrowing difficulties described below will just be pushed inProgram
. The only way to avoid them is to create an allocator in the outermost function (main()
), which means, once again, that it's never freed.AST cache
Thus, we store the AST allocator together with the AST cache directly in the main cache. This has a number of issues, because this requires to write a struct with a self-referential lifetime - which isn't possible in safe Rust (without extra crates). This is encapsulated in a separate submodule,
ast_cache
, and in theAstCache
struct, which handle the unsafe parts and offer a safe interface. As we needed to split the cache, this PR split it further for more modularity and better separation of concerns.Term cache
We still need
RichTerm
s for evaluation. The simplest was to keep all the existing caches, including the term cache, and insert the AST cache at the front of the pipeline. Whenever we parse something, to avoid subtle bugs of having e.g. the term but not the AST, or the converse, we directly populate both the AST cache and the term cache. The latter still serves as the reference for the state of a given file. The AST cache is just a supplementary cache that is used when e.g. typechecking, to avoid parsing stuff again and again, but a large part of the Nickel pipeline doesn't use it.Import resolution
Import are handled differently in the old AST (
RichTerm
) and the new one (Ast
). The former has two forms, a resolved and a non-resolved one, and requires that before typechecking, unresolved imports are substituted for resolved ones in a separate phase. This is a bit artificial, and requires an additional traversal that could be avoided. The new AST, however, has just one representation corresponding to the syntax'import <something> [as 'Something]
.When migrating the typechecker, we made the choice of letting
typecheck
be the driver for import resolution. No prior import resolution phase is needed: instead, when the typechecker sees an import, it resolves it and peek at its content if needed for typechecking. The import resolver is responsible for 1. avoid reparsing of the same imports again and again, i.e cache them 2. record dependencies and reverse dependencies of files. Another motivation for this setup is to prepare for lazy imports. Indeed, the pipeline now doesn't require imports to be resolved upfront, up to the typechecker.Note that, unfortunately, import resolution on the
RichTerm
representation is still needed for evaluation. Once we've typechecked a term, we convert it to aRichTerm
and run usual program transformation, including import resolution. Though those resolutions should all hit the cache, and shouldn't need to actually reload or reparse any data. This means that we have two different resolver now, one for ASTs and one forRichTerm
s.To accommodate for the upcoming LSP changes, where NLS has a different allocator for each live file, the interface of
AstImportResolver
is a bit more restricted in that it doesn't guarantee that the imported AST has the same lifetime than the importer (which are indeed different in NLS). The typechecker thus has to move types at the import boundaries. We don't expect those types to be huge (they are either top-level signatures or inferred apparent types). If that becomes a problem, we can also offer a specialized method for the main pipeline that doesn't need to move anything (or a richer return value that can distinguish the two cases).A last change is that the typechecker now only peek at one import deep. That is, if there is a chain of imports, the apparent type isn't propagated through the whole chain anymore. It's a trade-off, because this was difficult to implement with the weaker
AstImportResolver
. Also, I don't expect many files with just animport "foo"
at the top-level, being nested, and where the users rely on the apparent type for typechecking. At worst, one can always add a simple type annotation (and apparent types are always very simple).REPL
The REPL has the following peculiarity: it needs to maintain its own typing and term environment, because of top-level let. However, since typing contexts now depend on a
'ast
lifetime, it was just impossible to mix that in a satisfactory way with the self-referential AST cache. Instead, as many things in this PR, the general solution was to move everything that depend on the AST allocator directly into the AST cache - here the typing context - and offers an interface to manipulate it (mostly insert new bindings). Similarly, thetype_of
feature requires to look into the context or other'ast
-dependent data, so it has been moved to the cache.How to review
The diff is big, but part of it can be ignored:
typecheck
module has been replaced verbatim bybytecode::ast::typecheck
. The only small difference (between administrative fixes) is the handling of imports, that is the caseNode::Import(..)
in bothwalk
andcheck
. The rest can be ignored.SourceCache
,TermCache
, etc. It's still worth taking a broad look at the new structure. On the other hand, I believeast_cache
deserve to be extensively scrutinized as it is new and concentrates theunsafe
usages.