Refactor the module structure of the namespace module #6291

jjcnn · 2024-07-22T16:07:45Z

Description

This PR contains a major refactoring of the namespace module, which is responsible for keeping track of the symbol bindings in the program. The refactoring is intended to reduce memory allocation during compilation, and also fixes a number of minor bugs related to path resolution.

Advice to the reviewer:

I recommend reviewing this PR in 3 steps:

Review the PR description. If something in the description is unclear, then leave a review comment, and I will try to address the issue. The code changes will be significantly easier to review once you get the overall idea of what is happening.
Review the files sway-core/src/semantic_analysis/namespace/root.rs, sway-core/src/semantic_analysis/namespace/namespace.rs and sway-core/src/language/call_path.rs. These files contain the major datastructure changes, and if anything in those files needs to be addressed, then there is limited value in reviewing the rest of the code changes.
Review the remaining files. These code changes are all consequences of the changes in step 2, so it shouldn't take too much time, even though there are quite a few changes.

Remaining issues

Address the remaining test failures.

Changes made by this PR

Representation of the dependency graph (main file: sway-core/src/semantic_analysis/namespace/root.rs)

Previous: External libraries were represented as submodules of every module in the program. This meant that every module in every external package was cloned for every module in the program, and recursively so, since external libraries could also have dependencies that required clones for every module. In particular, std has core as a dependency:

                       core:

                       "" (lib.sw)
          /         /     |         \      \	
   primitives  raw_ptr  raw_slice  slice   ...


                        std:

                          "" (lib.sw)
          /         /        |         \                      \	
       core    constants    vec     primitive_conversions    ...
    / / | \ \       |        |         |        |     \
      ...         core      core      core     b256   ...
               / / | \ \  / / | \ \ / / | \ \   | 
                 ...        ...       ...     core
                                            / / | \ \
					       ...  


                      External library:


                          "" (lib.sw)
          /         /           |              \
       core        std         submodule        ...
    / / | \ \     /   \       /       \    \
      ...       core  ...   core      std    ...
             / / | \ \   / / | \ \    /   \   
                 ...        ...     core   ... 
                                  / / | \ \
				   	       

                         Main program:


                          "" (main.sw)
          /         /           |                        \
       core        std     external library                 submodule
    / / | \ \     /   \       /       \    \               /  |       \
      ...       core  ...   core      std    submodule  core  std    external library
             / / | \ \   / / | \ \    /   \      /  \    / \  /   \      /       |     \
                 ...        ...     core   ...  ... ...  ...  core ...  core    std    ...
                                  / / | \ \                   / \        / \    / \ 
                                                              ...        ... core  ...
                                                                              / \
                                                                              ...

Changes in this PR: External libraries are now represented only at the package root. The root module is given the same name as the package so that paths beginning with the package name can be resolved as a module path. If multiple libraries depend on the same external library, then they each contain a clone (this can be resolved a at later stage by introducing a global table of libraries).

                       core:

                       core (lib.sw)
          /         /     |         \      \	
   primitives  raw_ptr  raw_slice  slice   ...


                        std:

                     std (lib.sw)    ====== externals =====>   core
                                                             / / | \ \
           /      |          |                \                 ...
      constants  vec   primitive_conversions    ...             
                           /     \
                        b256     ...


                   External library:

     external_lib (lib.sw)  === externals ===>   core
        /          \                          / / | \ \
    submodule      ...                           ...
       |
      ...                                        std    == externals => core
                                              / / | \ \              / / | \ \
                                                 ...                    ...       				   	       


                     Main program:

    main_program (main.sw)  === externals ===>  core
      /           \                          / / | \ \
  submodule      ...                            ...
      |
     ...                                        std      == externals => core
                                             / / | \ \                / / | \ \
                                                ...                      ...       				   	       

                                            extenral_lib == externals => core
                                             / / | \ \                / / | \ \
                                                ...                      ...       				   	       

                                                                         std      == externals => core  
                                                                      / / | \ \                / / | \ \
                                                                         ...                      ...

Reasoning: This change dramatically reduces the amount of cloning of external libraries. The cloning is not eliminated completely, since multiple libraries may have the same library as a dependency (e.g., core is used as a dependency for both the program and std, so if the program depends on std, then there will be two clones of core in the tree). The remaining cloning can be eliminated by introducing a global table of libraries, which should be doable without making significant changes to the path resolution algorithm.

Callpath representation (main file: sway-core/src/language/call_path.rs)

Previous: Paths to externals were considered relative to current module. This worked because all externals existed a submodules of every module. Paths inside the current package were ambiguous, and were either interpreted relative to current module or relative to the root module. The root module did not have a name, so the empty path was used to indicate the root module.

Now: All full paths start with the name of the package, which is also the name of the root module. If the first identifier is not the name of the current package, then we look in the externals of that package, and if the name isn't found there we throw an error. Note that the name of an external is defined in Forc.toml, and may be different from the name used inside the external library (e.g., a library may call itself X, but a project may use X as an external under the name Y). This means that any path that escapes from a library must have its package name changed.

Reasoning: This is all a conseqence of the change to the representation of the dependency graph. We need to be able to the rely on the package name in order to resolve the path within the correct pacakge. Additionally, if a path escapes from a dependency to a dependent, then the package name must be changed to what the package is called in the dependent.

CallPath.callpath_type (main file: sway-core/src/language/call_path.rs)

Previous: The CallPath struct contained a flag is_absolute, which, if set, was used to mean two different things. For parsed paths it was used to signal that the path started with ::, i.e., that the path should be resolved relative to the current package root. For paths generated during typechecking the flag was used to indicate that the path was a full path, i.e., that it should either be resolved relative to the current package root or that it was a path to an external library.

Now: is_absolute has been replaced with callpath_type, which has 3 possible values: RelativeToPackageRoot, which indicates that the path started with ::, Full, which indicates that the path starts with the package name, and Ambiguous, which indicates that the path should either be resolved relative to the current module, or that it is a full path referring to an external library.

Reasoning: A path has 3 different states, so a boolean flag is not sufficient to represent all states.

Path representation (main file: sway-core/src/language/call_path.rs)

Previous: Full paths always pointed to the module containing the declaration of the thing that the path resolves to.

Now: Full paths point to a module where the suffix is bound. This may not be the module containing the declaration of the thing the suffix is bound to, because the binding may be the result of an import. The trait map is special in that the path to the actual declaration is used as a key in the map, so for traits we use the canonical path, which point to the module where the declaration resides.

Reasoning: The previous definition of full paths required a duplication of the symbol resolution algorithm in order to construct the full path, and it was getting very difficult to maintain. In the vast majority of cases we only need the full path to indicate somewhere where the symbol can be resolved, and don't actually need to know where the declaration resides. In the few cases where we do need to know the path to the declaration (the trait map uses these paths as keys) we can generate the canonical path by first converting to the full path, and then resolving the path and using the path stored with the binding in the lexical scope, since this is the path to the actual declaration. Additionally, the previous solution was only able to generate full paths for paths where the suffix is an identifier - for other suffix types we could not generate a full path. This has now been changed so that we can generate full paths for all paths, although canonical paths are still restricted to paths with an identifier as the suffix. This also moves us closer to a situation where we can represent semantic (canonical) paths separately from syntactic callpaths.

Namespace.init removed (main file: sway-core/src/semantic_analysis/namespace/namespace.rs)

Previous: The Namespace struct contained an init module which was used as a template for new modules, a clone of which was created every time the collection pass or the typechecker entered a submodule. This template contained all external libraries as submodules, as well as (for contract pacakges) a (new) declaration of CONTRACT_ID.

Now: The init module has been removed. When the collection pass or the typechecker enters a new submodule, Namespace creates a new Module (if none exists yet for the relevant submodule), and populates it with implicit imports (core::prelude::* and std::prelude::*, and for contract packages CONTRACT_ID).

Reasoning: The current solution makes it significantly clearer what is in a new Module object. Additionally, the declaration of CONTRACT_ID is no longer duplicated in each module - it is now declared once in the root module, and imported everywhere else.

namespace encapsulation (main file: sway-core/src/semantic_analysis/namespace/namespace.rs)

Previous: The Namespace struct contained methods to extract the underlying datastructures in the namespace module.

Now: An attempt has been made to encapsulate the namespace module a little bit more, so that changes and lookups to the datastructures in the namespace module are done by calling methods on the Namespace struct rather than letting the caller extract the datastructures and manipulating them directly.

Reasoning: When the datastructures are manipulated in many different places it makes it very hard to track down all the places that need to be changed when making changes to the datastructure. The current solution is not perfect, but it does encapsulate things a bit better than before.

Minor changes and improvements

A few minor improvements have been made:

A compilation unit (core, std, etc.) is now referred to as a package. (Note: The Root struct should probably be renamed to Package for consistency)
The visibility check for path resolution in type_resolve.rs had to be updated because path resolution was changed. The current version is not perfect, but it does improve the faulty check we had before.
TyModule so far contained a clone of the Namespace struct in the state it was in when typechecking of the relevant module was done. The Namespace object was never used except for the one in the root module, so I have moved the Namespace clone to TyProgram, thus eliminating quite a bit of unnecessary cloning (the Namespace struct owns the current package Root object, and thus cloning the Namespace struct clones the entire dependency graph).

Updated tests

A number of tests have been updated for various reasons:

Improved or changed error messages
Module visibility was not checked correctly, so some tests had to have certain submodules made public.
In once case, core::str_slice was referenced using the path ::core::str_slice. This is invalid, but since core used to be treated as a submodule of every module in the program, the path resolved anyway. (This also shows that the invalid paths in issue Fix performance and semantic problem when creating Modules and make Module::name mandatory #5498 are now actually treated as invalid - I have not added regression tests of that issue, since it seems unlikely that we will reintroduce the bug).

Unresolved issues

The Namespace struct is cloned before the symbol collection phase. This is expensive, since the Namespace struct at that point contains the entire dependency graph (minus the modules of the current package). Ideally we would like the symbol collection phase and the typechecker to use the same Namespace struct, but at the moment the Module struct is unable to perform updates from parsed to typechecked declarations, so this isn't possible. The cloning happens in sway-core/src/lib.rs and sway-core/src/semantic_analysis/namespace/contract_helpers.rs. Once we represent externals as a global table we should be able to change the Root.externals map to only use references into the global table, which will make the cloning step much cheaper.
contract_helpers.rs throws a regular compiler error if typechecking fails, but that should be an ICE, since it is generated code that is failing to typecheck.
Paths may in some cases escape from external packages to the current package. If this happens across multiple packages, then this may result in the path no longer being valid. For instance, if A depends on B, which in turn depends on C, then a path may escape from C via B into A, but since A does not have C as a direct dependency, then the path can no longer be resolved. This again can be fixed once we introduce a global table of externals.
The visibility check in type_resolve.rs is insufficient, because it doesn't handle enum variants and associated types correctly.

Checklist

I have linked to any relevant issues.
I have commented my code, particularly in hard-to-understand areas.
I have updated the documentation where relevant (API docs, the reference, and the Sway book).
- If my change requires substantial documentation changes, I have requested support from the DevRel team
I have added tests that prove my fix is effective or that my feature works.
I have added (or requested a maintainer to add) the necessary Breaking* or New Feature labels where relevant.
I have done my best to ensure that my PR adheres to the Fuel Labs Code Review Standards.
I have requested a review from the relevant team or maintainers.

…ate-namespace-module

…ernal-modules-to-root

…aths correctly

codspeed-hq · 2024-09-05T19:07:33Z

CodSpeed Performance Report

Merging #6291 will degrade performances by 10.82%

_{Comparing jjcnn/move-external-modules-to-root (37df6ec) with master (1192b3f)}

Summary

⚡ 1 improvements
❌ 1 regressions
✅ 20 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`master`	`jjcnn/move-external-modules-to-root`	Change
⚡	`did_change_with_caching`	503.8 ms	443.9 ms	+13.51%
❌	`semantic_tokens`	3.2 ms	3.6 ms	-10.82%

…package_root (for UseStatement)

tritao · 2025-01-08T17:33:34Z

sway-core/src/semantic_analysis/namespace/namespace.rs


 /// The set of items that represent the namespace context passed throughout type checking.
-#[derive(Clone, Debug, Default)]
+#[derive(Clone, Debug)]
 pub struct Namespace {


Not sure about the need for the separate Namespace and Root types now, but probably better to leave that be for now.

We need a way to keep track of the current module path during the traversal. Namespace is currently responsible for that, but I guess we could do that some other way. It doesn't feel like it belongs in Root, though, since it's responsibility is to contain the module tree of the package.

But I agree that's an issue that's better to leave for a future PR.

sway-core/src/abi_generation/fuel_abi.rs

forc-pkg/src/pkg.rs

tritao · 2025-01-08T17:44:37Z

sway-core/src/semantic_analysis/namespace/root.rs

+        self.is_contract_package
+    }
+
+    pub fn add_span_to_root_module(&mut self, span: Span) {


Do we really need this single-use one liner method?

I introduced it in order to improve the encapsulation of the namespace module. Without it we need to introduce current_package_root_module_mut, and set the span on the resulting Module.

I've committed the change for review, and I guess it's closer to idiomatic Rust this way, but I have to say I dislike the solution. I prefer to call methods that cause the owner of an object to mutate the object, rather than call methods that return mutable borrows of the object. In this case it's just a single call site, but generally we end up with mutating code all over the place, and I do find it harder to maintain.

sway-core/src/ir_generation/compile.rs

tritao · 2025-01-08T17:58:51Z

sway-core/src/semantic_analysis/namespace/root.rs

+    // The contents of the package being compiled.
+    current_package: Module,
+    // True if the current package is a contract, false otherwise.
+    is_contract_package: bool,


What do you think about dropping this out of here?

That would simplify the constructor here, and clean contracts out of the namespace.
It does mean we'd have to do the implicit import of CONTRACT_ID out of this file, but seems like it would be worth it and clean some complexity here.

Hmm... I'm not sure how I feel about that.

I think the implicit import of CONTRACT_ID is an improvement over what we had before, where a new declaration of CONTRACT_ID was inserted into each module. We need to perform that import for each module, so we have to have is_contract_package (or logic that computes it) in the namespace module somewhere.

I suppose we could move it to the Namespace struct instead. The consequence would be that we have to pass is_contract_package through from forc-pkg down to the place where we create Namespace structs in parsed_to_ast.

Alternatively the Namespace constructor could inspect the root module to see whether CONTRACT_ID is declared there, but that feels like a bit of a hack.

sway-core/src/semantic_analysis/ast_node/declaration/abi.rs

Co-authored-by: João Matos <[email protected]>

…ernal-modules-to-root

…Labs/sway into jjcnn/move-external-modules-to-root

jjcnn and others added 26 commits July 15, 2024 23:13

Make module name mandatory and private

8aa2e55

Merge branch 'master' of github.com:FuelLabs/sway into jjcnn/encapsul…

aa9fec4

…ate-namespace-module

Encapsulate visiblity and span fiels of Module

158dfd9

Merge branch 'master' into jjcnn/encapsulate-namespace-module

9c301f2

Fix sway-core tests

b87dcd1

clippy; fmt

329fecb

Save game

19ecf89

Rebase

d7061c1

Namespace module builds, interface defined

960b2b5

Change to new namespace interface

74ca425

Merge branch 'master' of github.com:FuelLabs/sway into jjcnn/move-ext…

3f802bb

…ernal-modules-to-root

Change externals to be packages rather than modules

428996a

forc-pkg and sway-lsp usage of namespace module updated

1b06988

Updated test harness

fcf2644

Merge branch 'master' of github.com:FuelLabs/sway into jjcnn/move-ext…

20863df

…ernal-modules-to-root

Simplify compilation of core during testing

d463cf8

Ensure the preludes are imported into all modules

93aab4d

Root module is not visited before its submodules

a63db83

Remove SubmoduleNamespace

3fc997f

Rename is_absolute to is_relative_to_package_root, convert relative p…

a76a1ee

…aths correctly

Pop submodule when done typechecking it

019c089

Fix various path issues

3a45380

Fix require_module

e6bbc60

Fix resolution of ambiguous paths

4f80b55

Merge

a17290e

Changed CallPath.is_absolute to CallPath.callpath_type

c028199

JoshuaBatty assigned jjcnn Aug 22, 2024

jjcnn added 2 commits September 3, 2024 20:08

Merge

cacbd4e

Fix path in shadow check in asm blocks

51a673c

jjcnn requested a review from a team as a code owner December 28, 2024 17:42

jjcnn added 7 commits January 2, 2025 16:20

Fix use of single-identifier paths referring to an enum variant

b2aef5d

Fix unit tests

003ed9e

Change is_absolute to callpath_type (for CallPath) or is_relative_to_…

74a027c

…package_root (for UseStatement)

Only check for variant star imports if the path is long enough

4392fa8

Workaround to avoid CI aborts on warning

b328e1b

Fixed constant collection for storage declarations

830275a

Move namespace from TyModule to TyProgram

dc8faed

tritao reviewed Jan 8, 2025

View reviewed changes

sway-core/src/abi_generation/fuel_abi.rs Outdated Show resolved Hide resolved

tritao reviewed Jan 8, 2025

View reviewed changes

forc-pkg/src/pkg.rs Outdated Show resolved Hide resolved

tritao reviewed Jan 8, 2025

View reviewed changes

sway-core/src/ir_generation/compile.rs Outdated Show resolved Hide resolved

tritao reviewed Jan 8, 2025

View reviewed changes

sway-core/src/semantic_analysis/ast_node/declaration/abi.rs Outdated Show resolved Hide resolved

jjcnn and others added 15 commits January 8, 2025 19:39

Simplify AbiContext::to_str_context

e6cf7d0

Improve error message when library dependency isn't found

e58ec1b

Co-authored-by: João Matos <[email protected]>

Eliminate Root::add_span_to_root_module

810a877

Simplify compilation of constants

2bc81ce

Fixed duplicate cloning of current mod path

168d8d3

Merge branch 'master' into jjcnn/move-external-modules-to-root

cec5bd3

Consume error iterator when constant compilation results in errors

73142ca

Merge branch 'master' into jjcnn/move-external-modules-to-root

9c39636

Merge branch 'master' into jjcnn/move-external-modules-to-root

22f79d7

Fix is_module check in type_check_delineated_path

bbf99af

Visibility check for paths should take reexports into account

eb2dc06

Merge branch 'master' into jjcnn/move-external-modules-to-root

c8fdd02

Fix test after improved visibility check

75f407c

Merge branch 'master' of github.com:FuelLabs/sway into jjcnn/move-ext…

de64127

…ernal-modules-to-root

Merge branch 'jjcnn/move-external-modules-to-root' of github.com:Fuel…

37df6ec

…Labs/sway into jjcnn/move-external-modules-to-root

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor the module structure of the namespace module #6291

Refactor the module structure of the namespace module #6291

jjcnn commented Jul 22, 2024 •

edited

Loading

codspeed-hq bot commented Sep 5, 2024 •

edited

Loading

tritao Jan 8, 2025 •

edited

Loading

jjcnn Jan 8, 2025

jjcnn Jan 8, 2025

tritao Jan 8, 2025

jjcnn Jan 8, 2025

tritao Jan 8, 2025

jjcnn Jan 8, 2025

Refactor the module structure of the namespace module #6291

Are you sure you want to change the base?

Refactor the module structure of the namespace module #6291

Conversation

jjcnn commented Jul 22, 2024 • edited Loading

Description

Advice to the reviewer:

Remaining issues

Changes made by this PR

Minor changes and improvements

Updated tests

Unresolved issues

Checklist

codspeed-hq bot commented Sep 5, 2024 • edited Loading

CodSpeed Performance Report

Merging #6291 will degrade performances by 10.82%

Summary

Benchmarks breakdown

tritao Jan 8, 2025 • edited Loading

Choose a reason for hiding this comment

jjcnn Jan 8, 2025

Choose a reason for hiding this comment

jjcnn Jan 8, 2025

Choose a reason for hiding this comment

tritao Jan 8, 2025

Choose a reason for hiding this comment

jjcnn Jan 8, 2025

Choose a reason for hiding this comment

tritao Jan 8, 2025

Choose a reason for hiding this comment

jjcnn Jan 8, 2025

Choose a reason for hiding this comment

jjcnn commented Jul 22, 2024 •

edited

Loading

codspeed-hq bot commented Sep 5, 2024 •

edited

Loading

tritao Jan 8, 2025 •

edited

Loading