Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the module structure of the namespace module #6291

Open
wants to merge 106 commits into
base: master
Choose a base branch
from

Conversation

jjcnn
Copy link
Contributor

@jjcnn jjcnn commented Jul 22, 2024

Description

Fixes #5498.

This PR contains a major refactoring of the namespace module, which is responsible for keeping track of the symbol bindings in the program. The refactoring is intended to reduce memory allocation during compilation, and also fixes a number of minor bugs related to path resolution.

Advice to the reviewer:

I recommend reviewing this PR in 3 steps:

  1. Review the PR description. If something in the description is unclear, then leave a review comment, and I will try to address the issue. The code changes will be significantly easier to review once you get the overall idea of what is happening.
  2. Review the files sway-core/src/semantic_analysis/namespace/root.rs, sway-core/src/semantic_analysis/namespace/namespace.rs and sway-core/src/language/call_path.rs. These files contain the major datastructure changes, and if anything in those files needs to be addressed, then there is limited value in reviewing the rest of the code changes.
  3. Review the remaining files. These code changes are all consequences of the changes in step 2, so it shouldn't take too much time, even though there are quite a few changes.

Remaining issues

  • Address the remaining test failures.

Changes made by this PR

Representation of the dependency graph (main file: sway-core/src/semantic_analysis/namespace/root.rs)

Previous: External libraries were represented as submodules of every module in the program. This meant that every module in every external package was cloned for every module in the program, and recursively so, since external libraries could also have dependencies that required clones for every module. In particular, std has core as a dependency:

                       core:

                       "" (lib.sw)
          /         /     |         \      \	
   primitives  raw_ptr  raw_slice  slice   ...


                        std:

                          "" (lib.sw)
          /         /        |         \                      \	
       core    constants    vec     primitive_conversions    ...
    / / | \ \       |        |         |        |     \
      ...         core      core      core     b256   ...
               / / | \ \  / / | \ \ / / | \ \   | 
                 ...        ...       ...     core
                                            / / | \ \
					       ...  


                      External library:


                          "" (lib.sw)
          /         /           |              \
       core        std         submodule        ...
    / / | \ \     /   \       /       \    \
      ...       core  ...   core      std    ...
             / / | \ \   / / | \ \    /   \   
                 ...        ...     core   ... 
                                  / / | \ \
				   	       

                         Main program:


                          "" (main.sw)
          /         /           |                        \
       core        std     external library                 submodule
    / / | \ \     /   \       /       \    \               /  |       \
      ...       core  ...   core      std    submodule  core  std    external library
             / / | \ \   / / | \ \    /   \      /  \    / \  /   \      /       |     \
                 ...        ...     core   ...  ... ...  ...  core ...  core    std    ...
                                  / / | \ \                   / \        / \    / \ 
                                                              ...        ... core  ...
                                                                              / \
                                                                              ...

Changes in this PR: External libraries are now represented only at the package root. The root module is given the same name as the package so that paths beginning with the package name can be resolved as a module path. If multiple libraries depend on the same external library, then they each contain a clone (this can be resolved a at later stage by introducing a global table of libraries).

                       core:

                       core (lib.sw)
          /         /     |         \      \	
   primitives  raw_ptr  raw_slice  slice   ...


                        std:

                     std (lib.sw)    ====== externals =====>   core
                                                             / / | \ \
           /      |          |                \                 ...
      constants  vec   primitive_conversions    ...             
                           /     \
                        b256     ...


                   External library:

     external_lib (lib.sw)  === externals ===>   core
        /          \                          / / | \ \
    submodule      ...                           ...
       |
      ...                                        std    == externals => core
                                              / / | \ \              / / | \ \
                                                 ...                    ...       				   	       


                     Main program:

    main_program (main.sw)  === externals ===>  core
      /           \                          / / | \ \
  submodule      ...                            ...
      |
     ...                                        std      == externals => core
                                             / / | \ \                / / | \ \
                                                ...                      ...       				   	       

                                            extenral_lib == externals => core
                                             / / | \ \                / / | \ \
                                                ...                      ...       				   	       

                                                                         std      == externals => core  
                                                                      / / | \ \                / / | \ \
                                                                         ...                      ...       			

Reasoning: This change dramatically reduces the amount of cloning of external libraries. The cloning is not eliminated completely, since multiple libraries may have the same library as a dependency (e.g., core is used as a dependency for both the program and std, so if the program depends on std, then there will be two clones of core in the tree). The remaining cloning can be eliminated by introducing a global table of libraries, which should be doable without making significant changes to the path resolution algorithm.

Callpath representation (main file: sway-core/src/language/call_path.rs)

Previous: Paths to externals were considered relative to current module. This worked because all externals existed a submodules of every module. Paths inside the current package were ambiguous, and were either interpreted relative to current module or relative to the root module. The root module did not have a name, so the empty path was used to indicate the root module.

Now: All full paths start with the name of the package, which is also the name of the root module. If the first identifier is not the name of the current package, then we look in the externals of that package, and if the name isn't found there we throw an error. Note that the name of an external is defined in Forc.toml, and may be different from the name used inside the external library (e.g., a library may call itself X, but a project may use X as an external under the name Y). This means that any path that escapes from a library must have its package name changed.

Reasoning: This is all a conseqence of the change to the representation of the dependency graph. We need to be able to the rely on the package name in order to resolve the path within the correct pacakge. Additionally, if a path escapes from a dependency to a dependent, then the package name must be changed to what the package is called in the dependent.

CallPath.callpath_type (main file: sway-core/src/language/call_path.rs)

Previous: The CallPath struct contained a flag is_absolute, which, if set, was used to mean two different things. For parsed paths it was used to signal that the path started with ::, i.e., that the path should be resolved relative to the current package root. For paths generated during typechecking the flag was used to indicate that the path was a full path, i.e., that it should either be resolved relative to the current package root or that it was a path to an external library.

Now: is_absolute has been replaced with callpath_type, which has 3 possible values: RelativeToPackageRoot, which indicates that the path started with ::, Full, which indicates that the path starts with the package name, and Ambiguous, which indicates that the path should either be resolved relative to the current module, or that it is a full path referring to an external library.

Reasoning: A path has 3 different states, so a boolean flag is not sufficient to represent all states.

Path representation (main file: sway-core/src/language/call_path.rs)

Previous: Full paths always pointed to the module containing the declaration of the thing that the path resolves to.

Now: Full paths point to a module where the suffix is bound. This may not be the module containing the declaration of the thing the suffix is bound to, because the binding may be the result of an import. The trait map is special in that the path to the actual declaration is used as a key in the map, so for traits we use the canonical path, which point to the module where the declaration resides.

Reasoning: The previous definition of full paths required a duplication of the symbol resolution algorithm in order to construct the full path, and it was getting very difficult to maintain. In the vast majority of cases we only need the full path to indicate somewhere where the symbol can be resolved, and don't actually need to know where the declaration resides. In the few cases where we do need to know the path to the declaration (the trait map uses these paths as keys) we can generate the canonical path by first converting to the full path, and then resolving the path and using the path stored with the binding in the lexical scope, since this is the path to the actual declaration. Additionally, the previous solution was only able to generate full paths for paths where the suffix is an identifier - for other suffix types we could not generate a full path. This has now been changed so that we can generate full paths for all paths, although canonical paths are still restricted to paths with an identifier as the suffix. This also moves us closer to a situation where we can represent semantic (canonical) paths separately from syntactic callpaths.

Namespace.init removed (main file: sway-core/src/semantic_analysis/namespace/namespace.rs)

Previous: The Namespace struct contained an init module which was used as a template for new modules, a clone of which was created every time the collection pass or the typechecker entered a submodule. This template contained all external libraries as submodules, as well as (for contract pacakges) a (new) declaration of CONTRACT_ID.

Now: The init module has been removed. When the collection pass or the typechecker enters a new submodule, Namespace creates a new Module (if none exists yet for the relevant submodule), and populates it with implicit imports (core::prelude::* and std::prelude::*, and for contract packages CONTRACT_ID).

Reasoning: The current solution makes it significantly clearer what is in a new Module object. Additionally, the declaration of CONTRACT_ID is no longer duplicated in each module - it is now declared once in the root module, and imported everywhere else.

namespace encapsulation (main file: sway-core/src/semantic_analysis/namespace/namespace.rs)

Previous: The Namespace struct contained methods to extract the underlying datastructures in the namespace module.

Now: An attempt has been made to encapsulate the namespace module a little bit more, so that changes and lookups to the datastructures in the namespace module are done by calling methods on the Namespace struct rather than letting the caller extract the datastructures and manipulating them directly.

Reasoning: When the datastructures are manipulated in many different places it makes it very hard to track down all the places that need to be changed when making changes to the datastructure. The current solution is not perfect, but it does encapsulate things a bit better than before.

Minor changes and improvements

A few minor improvements have been made:

  • A compilation unit (core, std, etc.) is now referred to as a package. (Note: The Root struct should probably be renamed to Package for consistency)
  • The visibility check for path resolution in type_resolve.rs had to be updated because path resolution was changed. The current version is not perfect, but it does improve the faulty check we had before.
  • TyModule so far contained a clone of the Namespace struct in the state it was in when typechecking of the relevant module was done. The Namespace object was never used except for the one in the root module, so I have moved the Namespace clone to TyProgram, thus eliminating quite a bit of unnecessary cloning (the Namespace struct owns the current package Root object, and thus cloning the Namespace struct clones the entire dependency graph).

Updated tests

A number of tests have been updated for various reasons:

  • Improved or changed error messages
  • Module visibility was not checked correctly, so some tests had to have certain submodules made public.
  • In once case, core::str_slice was referenced using the path ::core::str_slice. This is invalid, but since core used to be treated as a submodule of every module in the program, the path resolved anyway. (This also shows that the invalid paths in issue Fix performance and semantic problem when creating Modules and make Module::name mandatory  #5498 are now actually treated as invalid - I have not added regression tests of that issue, since it seems unlikely that we will reintroduce the bug).

Unresolved issues

  • The Namespace struct is cloned before the symbol collection phase. This is expensive, since the Namespace struct at that point contains the entire dependency graph (minus the modules of the current package). Ideally we would like the symbol collection phase and the typechecker to use the same Namespace struct, but at the moment the Module struct is unable to perform updates from parsed to typechecked declarations, so this isn't possible. The cloning happens in sway-core/src/lib.rs and sway-core/src/semantic_analysis/namespace/contract_helpers.rs. Once we represent externals as a global table we should be able to change the Root.externals map to only use references into the global table, which will make the cloning step much cheaper.

  • contract_helpers.rs throws a regular compiler error if typechecking fails, but that should be an ICE, since it is generated code that is failing to typecheck.

  • Paths may in some cases escape from external packages to the current package. If this happens across multiple packages, then this may result in the path no longer being valid. For instance, if A depends on B, which in turn depends on C, then a path may escape from C via B into A, but since A does not have C as a direct dependency, then the path can no longer be resolved. This again can be fixed once we introduce a global table of externals.

  • The visibility check in type_resolve.rs is insufficient, because it doesn't handle enum variants and associated types correctly.

Checklist

  • I have linked to any relevant issues.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation where relevant (API docs, the reference, and the Sway book).
  • I have added tests that prove my fix is effective or that my feature works.
  • I have added (or requested a maintainer to add) the necessary Breaking* or New Feature labels where relevant.
  • I have done my best to ensure that my PR adheres to the Fuel Labs Code Review Standards.
  • I have requested a review from the relevant team or maintainers.

Copy link

codspeed-hq bot commented Sep 5, 2024

CodSpeed Performance Report

Merging #6291 will degrade performances by 10.82%

Comparing jjcnn/move-external-modules-to-root (37df6ec) with master (1192b3f)

Summary

⚡ 1 improvements
❌ 1 regressions
✅ 20 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark master jjcnn/move-external-modules-to-root Change
did_change_with_caching 503.8 ms 443.9 ms +13.51%
semantic_tokens 3.2 ms 3.6 ms -10.82%

@jjcnn jjcnn requested a review from a team as a code owner December 28, 2024 17:42

/// The set of items that represent the namespace context passed throughout type checking.
#[derive(Clone, Debug, Default)]
#[derive(Clone, Debug)]
pub struct Namespace {
Copy link
Contributor

@tritao tritao Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the need for the separate Namespace and Root types now, but probably better to leave that be for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a way to keep track of the current module path during the traversal. Namespace is currently responsible for that, but I guess we could do that some other way. It doesn't feel like it belongs in Root, though, since it's responsibility is to contain the module tree of the package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I agree that's an issue that's better to leave for a future PR.

forc-pkg/src/pkg.rs Outdated Show resolved Hide resolved
self.is_contract_package
}

pub fn add_span_to_root_module(&mut self, span: Span) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this single-use one liner method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I introduced it in order to improve the encapsulation of the namespace module. Without it we need to introduce current_package_root_module_mut, and set the span on the resulting Module.

I've committed the change for review, and I guess it's closer to idiomatic Rust this way, but I have to say I dislike the solution. I prefer to call methods that cause the owner of an object to mutate the object, rather than call methods that return mutable borrows of the object. In this case it's just a single call site, but generally we end up with mutating code all over the place, and I do find it harder to maintain.

// The contents of the package being compiled.
current_package: Module,
// True if the current package is a contract, false otherwise.
is_contract_package: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about dropping this out of here?

That would simplify the constructor here, and clean contracts out of the namespace.
It does mean we'd have to do the implicit import of CONTRACT_ID out of this file, but seems like it would be worth it and clean some complexity here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I'm not sure how I feel about that.

I think the implicit import of CONTRACT_ID is an improvement over what we had before, where a new declaration of CONTRACT_ID was inserted into each module. We need to perform that import for each module, so we have to have is_contract_package (or logic that computes it) in the namespace module somewhere.

I suppose we could move it to the Namespace struct instead. The consequence would be that we have to pass is_contract_package through from forc-pkg down to the place where we create Namespace structs in parsed_to_ast.

Alternatively the Namespace constructor could inspect the root module to see whether CONTRACT_ID is declared there, but that feels like a bit of a hack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix performance and semantic problem when creating Modules and make Module::name mandatory
2 participants