Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Develop a way to create and a package format for smart contracts #550

Closed
Dominik1999 opened this issue Mar 27, 2024 · 23 comments
Closed

Develop a way to create and a package format for smart contracts #550

Dominik1999 opened this issue Mar 27, 2024 · 23 comments
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@Dominik1999
Copy link
Collaborator

Dominik1999 commented Mar 27, 2024

Feature description

Background

Currently, the Miden client only allows users to create accounts with standardized code (basic wallet and faucets) and storage. We want users to be able to create accounts with custom code and storage and deploy them on Miden using the Miden client. This is the equivalent of deploying Ethereum smart contracts, and it will widen the space for use cases as users can define their account interfaces.

We need to devise a way for users to write their own smart contracts and initialize their custom storage, which results in a .mac file that can be injected into the Miden client. See 0xPolygonMiden/miden-client#204.

We could come up with a simple Rust tool at the beginning.

We can mostly follow how we create accounts in here. To create an account, we need the following:

pub struct Account {
    id: AccountId,
    vault: AssetVault,
    storage: AccountStorage,
    code: AccountCode,
    nonce: Felt,
}

Account id

The account ID should be ground at account creation. We should notify the user that this might take some time.

Account vault and nonce

It should be clear that the vault is empty and the nonce is 0. Users can always populate the account's vault in the first transaction.

Account storage

That might be the most complex part. To initialize the AccountStorage, we need to know the type (simple Value, Map, Array) and the content of each slot. We have 255 slots. In technical terms, we need to construct a Vec<SlotItem> whereas a SlotItem consists of (u8, (StorageSlotType, Word)).

One problem is that certain slots are reserved. The basic authentication smart contract assumes that the private key is always at slot 0. So, our tool or package needs to warn the user if he occupies the slot with something else. We should also clearly document this.

Another problem is that we must ensure that the user provides the necessary data if the StorageSlotType is Map or Array.

We should also think of conflict resolution. The tool can notify a user if he wants to occupy the same slot twice.

Account code

After the user has written his MASM or Rust account code, we need to parse the code, pick the right assembler and create the AccountCode. I think this should not be an issue here.

Why is this feature needed?

We want generalized smart contracts. And we want users to write them.

@bobbinth
Copy link
Contributor

A couple of preliminary comments:

Account code

We are moving toward using MAST as the format for serializing Miden VM programs. So, the code included in the package will probably be serialized MAST. This means two things:

  • The compiler will need to output the account package and MAST code would be a part of this. This is already planned.
  • We'll need a tool that can do the same but for hand-written MASM code.

Account storage

This is indeed the trickiest part. I actually think what we are looking for is a way to define an "account template" which would consist of MASM/MAST code + storage layout that is required to support this code. This account template can then be instantiated into a specific account (at this time, account ID would also be generated), and then this account can be serialized as .mac file.

The code part should probably in the package format discussed in 0xPolygonMiden/compiler#129 (cc @bitwalker, @greenhat).

The storage part could be some metadata file describing which storage slots should be initialized and how. Just to illustrate and using JSON format, for basic wallet the storage template could look like so:

{
  "slots": [
    {
      "index": 0,
      "name": "pub_key",
      "value:" {
        "arity": 0,
        "value": "{{auth::rpo_falcon512::pub_key}}"
      }
    }
  ]
}

This basically specifies that when an account with this template is instantiated, we'll need to get a RpoFalcon512 public key from the user and put it into the first slot of account storage (the expected type of the storage slot would be a simple value with arity 0 - i.e., just a single word).

Of course, we'll actually need to define all the details of this format (what I have above is just a very basic illustration) - but I don't think that's going to be too difficult.

A more difficult question to resolve is what to do when a user combines different components into a single contract (e.g., basic wallet + something else). The main issue is potential conflicts in how different components try to access storage. The simplest solution is to just panic on conflicts. A more sophisticated one would be to try to resolve conflicts by dynamically modifying the code - but this probably best left to the compiler.

@bitwalker
Copy link
Contributor

A more difficult question to resolve is what to do when a user combines different components into a single contract (e.g., basic wallet + something else).

I think to some degree this will tie back to the fundamentals of packaging and distribution of libraries/programs. A package that contains a library which implements an account/contract built out of other, more foundational, libraries/components, is going to be doing so either by referencing code from other packages, or by generating code that integrates behavior provided by another package (e.g. basically along the lines of the difference between calling an extern function in Rust, and calling a generic function in Rust, where the generic function is monomorphized in the caller's crate). I would imagine this would have to extend to data (including config) as well.

I would recommend that we define the details of storage layout abstractly, but deterministically, much like how Rust defines how it lays out types in memory. So if you know the offset at which your storage slots start, you can compute the offset of specific storage keys by adding a zero-based index to that offset. This is akin to how one can compute the memory address of a specific struct field in a complex nested data structure, all you need to know is the offset of the data structure, and from there the rules for deriving the address of a specific field are entirely deterministic.

This would make it possible to combine together storage requirements from multiple "parent" contracts/components in a new contract, as long as they can all fit in 255 slots - each need only know (or be provided) the offset at which their storage begins. Because the derived contract knows all of the components that it is mixing together, it can compute the offset for each of those components statically, so accessing specific slots doesn't necessarily even require any runtime calculations.

Admittedly, I have no idea where we're at with regard to account storage in general (i.e. how it is specified, how it gets initialized, etc.), so maybe some of those details have already been pinned down. It'll certainly be something @greenhat will need to get deeper into pretty soon (right now we've largely been focused on surfacing the low-level primitives, but coming up with a natural idiomatic Rust way of expressing storage is something we haven't really done yet).

We probably should make accommodations in the package format for surfacing additional arbitrary custom package metadata (e.g. Rollup-specific details like whether a package provides an account or note script, what its storage requirements are, etc.). Tooling specific to the Rollup will want to be able to easily access that information, but I don't think we want to bake in Rollup-specific details in the package format, nor do I think we would want to have multiple package formats. Cargo is a good example of this, custom tools can describe metadata in Cargo.toml that is not used by Cargo itself, but this metadata is still accessible to tools querying info about a specific Cargo package via Cargo, or via crates.io, or by parsing Cargo.toml manually. I'd like to use a similar model for our packaging.

With that in mind, we'd then only really need some lightweight tooling, built on top of the more general packaging infra, that can validate how specific packages are used together (i.e. ensuring the package types are compatible, validating that there are no storage conflicts, etc.).

In any case, definitely interested to understand more of the gritty detail of account storage and how some of the interactions will work - definitely a weak spot in my knowledge of the overall system right now.

@bobbinth
Copy link
Contributor

We probably should make accommodations in the package format for surfacing additional arbitrary custom package metadata (e.g. Rollup-specific details like whether a package provides an account or note script, what its storage requirements are, etc.).

Yes - totally agreed: we should have a single package format for the VM programs - but allow users to add custom metadata to these packages (i.e., for rollup purposes).

I would recommend that we define the details of storage layout abstractly, but deterministically, much like how Rust defines how it lays out types in memory. So if you know the offset at which your storage slots start, you can compute the offset of specific storage keys by adding a zero-based index to that offset.

This could work, but has one potential downside: if the offsets are part of the code, then MAST roots of public procedures would change depending on what other components are added to the same contract. In some cases this may be fine, but in other cases, it may break things (e.g., notes would not have a stable MAST root to call on an account interface).

It would be ideal if we could somehow provide these offsets at runtime in a way that doesn't change MAST roots - but I'm not sure it is possible.

There are other solutions to this (e.g., using only storage maps and salting the keys) - but these come with their own set of pros and cons.

@bitwalker
Copy link
Contributor

This could work, but has one potential downside: if the offsets are part of the code, then MAST roots of public procedures would change depending on what other components are added to the same contract.

I'm not sure I follow how this happens - wouldn't the slot indices be fixed when publishing the contract anyway? In other words, if you are "mixing in" elements from other contracts acting as templates, you end up with code that is either unique to the derived contract (and so published with it), or code which is generic across all derivations (and so external to the derived contract and published with the base contract). In the latter case, presumably the base contract would expect to be provided with the offset at which its storage slots begin, at runtime, so that contracts which derive its functionality, but that have storage offset at potentially different slot indices, can all share the same code. In the former case, the slot indices are computed/known statically during compilation, and the code is published with those indices baked in (or, possibly also compiled to be relative to some runtime-provided base offset, if it is intended to itself be derivable).

I'm not sure if what I'm saying is applicable here or not, since I only have a rough understanding of how account storage is supposed to work, but since contracts are going to have to be compiled to MAST, whatever mechanism we come up with here, has to be something that is either known at the time the MAST is compiled (and so baked in), or is provided at runtime. Either way, the code never changes (and can't change anyway), regardless of how the contract is used downstream.

It kind of sounds like the mechanism would need to be based on runtime-provided offsets, but I'm definitely way out of my depth at this point, so I hesitate to opine too much on what makes sense here. The problem statement stood out to me as very similar to how the concept of ABIs, type layouts in memory, and how linkers allocate segments of memory to different parts of a program that are potentially linked in dynamically at runtime; so I do think there is prior art on how to approach that kind of thing when interop is in play - but it might be that there are other elements here that make it a fundamentally unique problem.

@bobbinth
Copy link
Contributor

I'm not sure I follow how this happens - wouldn't the slot indices be fixed when publishing the contract anyway?

This is correct - but let me explain what I mean on a concrete example.

Let's use our BasicWallet as an example. This is not the best example because procedures in this module do not touch the storage at all - but for the purposes of the example, let's pretend that the do.

So, we have two procedures defined as a part of BasicWallet: receive_asset and send_asset. There are standardized notes which assume that these procedures have stable MAST roots (P2ID note is a good example). That is, when we create a P2ID note, we include the MAST root of receive_asset into the script of P2ID note. So, if the MAST root of receive_asset were to change somehow, we would not be able to execute standard P2ID notes against it.

Now, let's say a user wants to create an account which combines functionality of BasicWallet with some other components. Let's assume that these components need to access storage as well. When we compile them together, may need to change storage offsets for receive_asset procedure. While this results in an internally consistent account code (i.e., there are no conflicts between different components and the provided functionality is the same as we'd expect), this could change the MAST root of receive_assets. This would render all P2ID notes sent to this account un-spendable. Or, alternatively, anyone creating P2ID notes would now need to know the specifics of this account to send notes to it (this will probably be unsustainable at scale).

So, basically, in the ideal scenario we want to achieve 2 goals:

  1. Be able to mix and match any number of components to create a single account without creating conflicts between them.
  2. Keep MAST root of exported procedures the same across different accounts - even if different accounts may use other combinations of components.

There are many ways to achieve the above - I think the tricky part is finding a way which minimizes performance overhead. Here is one approach:

By convention, we designate storage slot 0 to store offsets for all components deployed in the account. The type of this slot would be storage map (basically a key-value map with 256-bit keys and 256-bit values). Each component would be identified by some unique 256-bit ID and so, when a component is added to the account, we'd insert a key-value pair for this component into storage slot zero which would look like (component_id, storage_offset).

Then, when a component needs to access storage, it would look up its offset in this map, and then use it accordingly. For example, one of the first steps of receive_asset could be to get the storage offset and save it into memory. From that point on, any time it needs to access storage, it would look up the offset in memory and add it to the storage slot it wants to access.

This does create quite a bit of overhead though. Specifically:

  1. There is an overhead in writing this code correctly. Though, if this is handled by the compiler - it's probably not too bad.
  2. There is an overhead of reading the offset from the storage map and then applying for every storage access. Maybe this is also tolerable.

Also, there is a question of how to we guarantee that the 256-bit component IDs are indeed unique (or whether we can even guarantee this at all). So, I do wonder if there is a better alternative.

@bobbinth
Copy link
Contributor

Another approach, which is simpler in many ways but also less flexible in others:

A component which doesn't want to have any potential conflicts with other components would always use a storage map in some designated slot (let's say storage slot 0 again). This slot would always type storage map - and so, could hold practically unlimited amount of data. To avoid key collisions before inserting values into the storage map, we'd compute the new key a:

new_key = hash(key || component_id)

As long as component ID can be easily identified, this key mapping can be handled by the kernel.

@bitwalker
Copy link
Contributor

Now, let's say a user wants to create an account which combines functionality of BasicWallet with some other components. Let's assume that these components need to access storage as well. When we compile them together, may need to change storage offsets for receive_asset procedure.

I think I see where we're diverging here. My expectation would be that we would not be compiling the code of BasicWallet again, i.e. the MAST root of receive_assets remains the same (it is an external dependency). Or more precisely, compiling the components "together" here actually means compiling the source code for the derived contract, while linking against the procedures provided by the contracts it derives functionality from (e.g. BasicWallet::receive_asset). To be clear, what you are stating absolutely would have been the case where MASM is the artifact we ship, but I'm assuming two things here:

  1. We're shipping BasicWallet (all contracts really) as a package containing MAST
  2. BasicWallet is a contract that was published (presumably on-chain), and is designed to provide core low-level functionality used by higher-level contracts that in turn provide additional, richer functionality.

With those assumptions in mind, the only actual requirement imposed on BasicWallet here, is that it be written in such a way that its storage accesses are expressed in terms of relative offsets, not absolute offsets, and that the "base" offset that the others are relative to, is provided at runtime somehow. So any new contract that derives from BasicWallet, only need provide the offset at which the BasicWallet storage begins, when it calls the BasicWallet APIs (assuming there is no way to provide this configuration in some more global fashion).

If instead, BasicWallet is a library that you sort of vendor into your own project as source code (MASM in this case, but in theory it could be some other language), then yeah, each time it gets compiled, the MAST root could change (though that is true even if nothing else about the source code has changed at all, i.e. a new version of the assembler or the compiler could result in changes to the way the source code is lowered to MAST). That said, it seems to me that in this model, you literally could not bake in any assumptions about the MAST root of any of the procedures even if you wanted to (since you can't feasibly know what they are before the code is compiled). And so things like P2ID wouldn't be portable across different derivations of BasicWallet. For that reason, I'm assuming that my first interpretation is more what we're talking about here, but wanted to at least draw the comparison.

That said, assuming I'm still missing the key thing here that breaks my mental model as laid out above, what you've described as a possible solution seems sound to me. We can likely lean on the compiler to generate some of the more repetitive/annoying code as you mentioned, I don't see any reason why that wouldn't be possible as long as some pretty minimal requirements are met (i.e. that we know the storage layout somehow, that we know all of the call sites where storage is being accessed, and that at each call site we know what "component" it belongs to), and I'm pretty sure that will be the case in general.

@bitwalker
Copy link
Contributor

It does occur to me that one of the things I'm obviously forgetting to account for here are the boundaries of the system with regard to storage. When we talk about BasicWallet as a distinct contract, that kind of implicitly sounds like its APIs are going to be called, but my assumption is that storage doesn't get shared across call boundaries - and so I'm realizing that I'm definitely missing a piece of the puzzle here.

Do we have an up-to-date description/sketch of how we have been expecting storage to work up until now, and how it relates to Miden execution contexts (to the degree there is a relationship there)? I think that would probably sort me out.

@bobbinth
Copy link
Contributor

Do we have an up-to-date description/sketch of how we have been expecting storage to work up until now, and how it relates to Miden execution contexts (to the degree there is a relationship there)?

Unfortunately, there isn't a single comprehensive description of this yet (@Dominik1999 let's create an issue to add this to the docs). The basic design of the storage is described in #239 (comment) and following comments. To summarize:

  • Account storage consists of 255 slots each of each can have one of 3 types: (1) simple value (a word), (2) an addressable array of word, or (3) a map where both keys and values are words.
    • The types of each slot (collectively called "storage layout") are set at account creation and cannot be changed afterwards. There is an issue for relaxing this restriction in the future.
  • We have kernel procedures which allow account code to read/write from storage. Currently, we have implemented only the procedures for reading/writing simple values (see here and here). Also, in #521 we have implemented support for storage maps, but this hasn't been merged yet.
    • These procedures can be invoked only from one of account's procedures (this is enforced by the kernel). So, for example, a note's script cannot directly modify an account's storage.

With those assumptions in mind, the only actual requirement imposed on BasicWallet here, is that it be written in such a way that its storage accesses are expressed in terms of relative offsets, not absolute offsets, and that the "base" offset that the others are relative to, is provided at runtime somehow.

Yes, agreed. I think the tricky part here is how exactly to provide these at runtime in a way which introduces minimal overhead both in terms of performance and complexity.

There is another thing we may want to consider: not all account procedures added to the account have the same requirements. Specifically, I think we have 4 cases:

  1. A procedure does not touch account storage. This is for example the case for send_asset and receive_asset procedures of BasicWallet. This is the simplest case to consider because there is nothing extra we need to do about them (though, it may be beneficial to mark such procedures somehow - e.g., via attributes).
  2. A procedure does need to touch account storage but the MAST of such a procedure does not need to be stable because it is expected to be invoked only via transaction scripts (rather than note scripts). For example, this is the case the basic authentication procedure. This procedure does access storage, but in theory, we can pick any storage slot for it and it wouldn't really matter for anything else. Offsets for such procedures could be defined statically (though, how to do it in the context of the MAST-based package format, is not clear).
  3. A procedure needs to touch account storage and MAST for such procedure must be stable. This is the case for providing offsets at runtime.
  4. A procedure needs to touch the storage and it needs to do it statically (i.e., it always must touch the same storage slot) while maintaining a stable MAST root. In this case, we the only thing we can do is detect if two procedures need to use the same storage slot, and raise an error if the user is trying to add both procedures their account.

So far, we've been focusing on use case 3 (and use case 1 is already handled), but I think we should figure out if we want to (and are able to) support use cases 2 and 4.

@Dominik1999
Copy link
Collaborator Author

Dominik1999 commented Apr 5, 2024

Interesting discussion, indeed.

I will create an issue to explain how account storage works in the docs. I wanted to wait until we merge StorageMaps.

However, aren't we overthinking this here a bit?

So, the problem, as I understand it, lies in the combination of different contracts or procedures that have conflicting storage assumptions. And please correct me, if I am oversimplifying it.

A more difficult question to resolve is what to do when a user combines different components into a single contract (e.g., basic wallet + something else). The main issue is potential conflicts in how different components try to access storage. The simplest solution is to just panic on conflicts. A more sophisticated one would be to try to resolve conflicts by dynamically modifying the code - but this probably best left to the compiler.

  1. So, if we can identify conflicts at account creation/compilation, then we should panic. Let's do more sophisticated architecture later if needed. That shouldn't be too difficult to implement. We could use decorators for storage assumptions and analyze them. Or, we could add metadata to our contract packages or procedures (we only have 3 at the moment) and get the info like this (I think this is what @bitwalker suggested in his first answer).

  2. About adjusting dynamically, as I said, I would do that later. At the moment we have 3 contracts and only 3 storage assumptions (0 - authentication, 1 - faucet assumes token metadata, 255 - storage layout).
    But in the future, we could say, that all non miden-lib contracts, so user-defined contracts, get only one slot per contract and they must use StorageMaps. So, there is a hardcap for contract mixing / combination of 253 contracts. We can then sort all importer contracts according to their MAST root and map StorageSlots dynamically.

@bobbinth
Copy link
Contributor

bobbinth commented May 5, 2024

Revisiting this again, I think the solution is actually much simpler than what I originally thought.

First, for every storage access we'll implicitly apply an offset. So, for example, when the user executes account::get_item or account::set_item procedures, the kernel will add an offset to the specified slot index. The offset will come from the proc_root |-> offset map that the kernel maintains in its memory. Here, proc_root is the MAST root of the procedure exposed via the account's public interface and it can be looked up via caller instruction at the time when get_item or set_item instructions are called.

The process of performing the offset lookup would be conceptually similar to what is happening in the current authenticate_procedure procedure. In fact, we may want to create a separate authenticate_storage_access procedure which will look something like this:

#! Verifies that the procedure root is part of the account code Merkle tree and applies the offset
#! associated with this procedure to the provided slot index. Panics if the procedure root is not
#! part of the account code Merkle tree.
#!
#! Stack: [PROC_ROOT, slot_index]
#! Output: [PROC_ROOT, offset_slot_index]
#!
#! - PROC_ROOT is the hash of the procedure to authenticate.
export.authenticate_storage_access

One of the implications of this is that both storage reads and storage writes would need to be authenticated (currently, we authenticate writes but not reads) - but I think that's a good idea anyway.

To support the above, we'll need to modify how account code tree is built. Specifically, right now this is a Merkle tree where each leaf is a root of a procedure in the public interface of the account. We'll need to change this to (proc_root, storage_offset) tuples instead.

Separately, it may be a good idea to move away from using a Merkle tree here in favor of a sequential hash which would be "unhashed" into the kernel memory during the transaction prologue. This will substantially reduce the cost of accessing storage and authenticating procedures in general.

The overall mechanism would be that at account instantiation time (or at upgrade time) we'll set storage offsets for all procedures in the account's public interface. These offsets could be computed by the compiler as it combines different components into a single contract, or could be done manually for simpler contracts.

Overall, I think with this we achieve all desired properties:

  • Each component can be written independently without the need to consider how it will be combined with other components inside a given account.
  • MAST roots of the components remain stable (i.e., they are not affected by the presence of other components in the account).
  • The introduced overhead is minimal (probably leas than 10 extra VM cycles per storage access).

One other implication is that we should make the public account procedures accessible only via the call instruction. For this we'll need to have support for attributes implemented in the assembler - but we are going there anyway.

One the "account template" side (mentioned in #550 (comment)), the additional storage metadata associated with account code may need to look something like this:

{
  "slots": [
    {
      "index": 0,
      "name": "pub_key",
      "value:" {
        "arity": 0,
        "value": "{{auth::rpo_falcon512::pub_key}}"
      }
    }
  ],
  "offsets": {
    "0x12345...": 0,
    "0x23456...": 0,
    "0x34567...": 5,
  }
}

Where procedures with MAST roots 0x12345... and 0x23456... have the same storage offset 0, while procedure with MAST root 0x34567 has storage offset 5.

@bobbinth
Copy link
Contributor

With implementation of #667, we now have the ability to associate storage offsets with every account procedure. This will be enhanced a little with #866 - but overall, this aspect is near its final form. We still need to have a way to specify this info via MASM, but this will be addressed with procedure annotations in Miden Assembly (0xPolygonMiden/miden-vm#1434).

Taking a step back, for account package format, I think we need to have the following pieces:

  1. A way to define account code: this is covered by the current MastForest struct which we derive from MASM.
  2. A way to attach storage attributes to account procedures: this will be covered by procedure annotations, which I assume will also be added to the MastForest struct.
  3. A way to define account ABI for higher-level languages (primarily to be used by the compiler): this could be done either by adding type signatures to procedures as discussed in 0xPolygonMiden/miden-vm#1436 (comment) or by providing WIT metadata for accounts, or both.
  4. A way to specify how account storage is to be initialized whenever a new account is instantiated.

In my mind, for all of these, except for the last item, we have a pretty good idea of what needs to be done. So, it probably is a good idea to start brainstorming what we want to do with the last item as well.

Perhaps, a good way to start is to take a concrete example. Let's say we want to instantiate a fungible faucet account. For this, the storage could look roughly like this:

- slot 0 (value): reserved for internal faucet usage, cannot be set by the user.
- slot 1 (value): authentication data - e.g., public key for RpoFalcon512 signature scheme.
- slot 2 (value): faucet metadata which includes `max_supply`, `symbol`, and `decimals`.

We could described the above with something like this:

[
    {
        "type": "value",
        "value": ["0", "0", "0", "0"]
    },
    {
        "type": "value",
        "name": "pub_key",
        "value": "{{auth::rpo_falcon512::pub_key}}"
    },
    {
        "type": "value",
        "name": "metadata",
        "value": [
            "{{decimals}}",
            "{{symbol}}",
            "{{max_supply}}",
            "0"
        ]
    }
]

The main idea is that values in {{}} would need to be provided by the user - but there are some open questions about these:

  1. If we are going to present these to the user for input, we should probably add much more information for each such element. For example, we need to define what are the valid values (e.g., max_supply must be a value between $0$ and $2^{63}$), provide some basic description of what the field would be used for etc. We probably need to define something like a user_input element which can describe these attributes in more detail.
  2. What should we do about {{auth::rpo_falcon512::pub_key}}. There are a couple of things to consider here:
    a. Should we expect the user to enter their key directly? If so, we should probably define acceptable format for the encoding these.
    b. Do we want to make this extensible beyond a very narrow use case of authentication?

I think the way we want to go will probably depend on the answer to this last question. For example, one approach could be that with every account we need to provide an "initialization function" (maybe represented by a WASM program). This function would take some set of user inputs and output the AccountStorage struct (or something like the above JSON but with all {{}} replaced with actual values).

@bobbinth
Copy link
Contributor

bobbinth commented Dec 4, 2024

With the introduction of account components in #935, the approach changed slightly from what I described above. Now, AccountComponent is the primary unit of functionality which can be added to an account.

An instantiated account component currently looks like so:

pub struct AccountComponent {
    library: Library,
    storage_slots: Vec<StorageSlot>,
    supported_types: BTreeSet<AccountType>,
}

Basically, it contains the code of the component, the storage of the component, and a set of account types to which the component can be added (e.g., some components can be added only to faucet accounts).

So, a component package should contain enough information to instantiate a component (or at least to describe how one should be instantiated). One approach is to say that a component consists of 2 files:

  1. A .masl file which can be directly converted into a Library struct.
  2. A metadata file which contains all other info (structured as a .toml file).

We could also add other files to this (e.g., WIT file that could be useful for the compiler) but I'll omit them for now.

The metadata file could look something like this:

name = "MyComponent"
description = "Description of my component"
version = "0.1.0"
targets = ["MutableAccount", "ImmutableAccount", "FungibleFaucet", "NonFungibleFaucet"]

# simple value slot with the value initialized to 0
[[storage]]
name = "simple-slot"
description = "some description of what this slot is for"
slot = 0
type = "value"
value = ["0", "0", "0", "0"]

# simple value slot which requires initialization
[[storage]]
name = "simple-slot"
slot = 1
type = "value"
value = "{{foo.bar.baz}}"

# sometimes a logical value can span multiple slots; would be good to handle this somehow but also this
# could be something left for the future
[[storage]]
name = "multi-slot"
slots = [2, 3]

# a map storage slot initialized with 2 key-value pairs
[[storage]]
name = "map-slot"
slot = 4
type = "map"
values = [
    { key = "0x123", value = "0x456" },
    { key = "0x234, "value = "0x567 }
]

In the above the [[storage]] table defines the schema for the storage, but this is just an initial design and we should probably think though more methodically what the requirements for this could be.

The .masl file and the .toml file could be combined together into a single package file (maybe with .acp extension).

To instantiate an account from the above component, we'd need to provide the data for foo.bar.baz (which should be a single word). This could be done in a variety of ways, to to illustrate the approach, this could also be expressed as a .toml file:

foo.bar.baz = "0x123456"

Then, we could have a constructor that given these files can create an account component:

impl AccountComponent {
    pub fn read_from_file(
        component_package: PathBuf,
        component_data: PathBuf,
    ) -> Result<Self, AccountError> {
        ...
    }
}

@greenhat
Copy link

greenhat commented Dec 5, 2024

...

So, a component package should contain enough information to instantiate a component (or at least to describe how one should be instantiated). One approach is to say that a component consists of 2 files:

  1. A .masl file which can be directly converted into a Library struct.
  2. A metadata file which contains all other info (structured as a .toml file).

We could also add other files to this (e.g., WIT file that could be useful for the compiler) but I'll omit them for now.

...

The .masl file and the .toml file could be combined together into a single package file (maybe with .acp extension).

This sounds like the miden-package 0xPolygonMiden/miden-vm#1544 I'm porting from the compiler.
It will include an optional WIT file in the near future and could easily include a .toml file as well. We use .masp file extension for it, but it's not set in stone.

@bobbinth
Copy link
Contributor

bobbinth commented Dec 6, 2024

This sounds like the miden-package 0xPolygonMiden/miden-vm#1544 I'm porting from the compiler.
It will include an optional WIT file in the near future and could easily include a .toml file as well. We use .masp file extension for it, but it's not set in stone.

Yes, I was thinking about this too. One thing we need to decide is whether the account component package is basically the same as Miden VM package just with extra metadata or if they are different and account component package can be derived from the VM package.

For example, Miden VM package can contain either an executable or a library code. But account component package is always a library. Miden VM package could also be used to describe a note - but for notes here we might want to have a slightly different format (e.g., the metadata would be different). So, the question is whether the complier produces "generic packages" or something more specific like "account component package" or "note package". Or maybe it does both - first produces a more generic package and then this gets converted into more specific packages.

Regardless though, one of the main things in this PR is to define a format for the account storage schema. Once we have this, we can move this definition to other places, if needed.

@igamigo
Copy link
Collaborator

igamigo commented Dec 6, 2024

After reading this issue and taking a look at the approach on Miden VM, I have a couple of questions/comments:

We probably want to make a new crate for this, right? I guess an alternative could be to add it to miden-objects (under a feature flag, possibly) but at first sight it seems cleaner to have a separate crate, considering the dependencies and scope of this work.

Regarding formats, I don't think I have too many notes on @bobbinth's schema as it seems to cover all bases while looking very usable. One thing that stands out is that it seems that defining the slot indices seems unnecessary because they are arranged contiguously in the component's storage_slots anyway, but it also makes sense that we enforce adding that field (and making sure they are actually contiguous) for ease of use when inspecting the storage layout and looking at the component's code. If they were not explicitly specified in the serialized format, they could be retrieved after having loaded/compiled this template. In this sense, the type of the storage slots also seems a bit redundant since I believe we can express maps, arrays and values in TOML directly (although there would be ambiguity for words expressed as 4 elements and arrays, and also in the case of defining initial values outside the main file).

Another small thing is that we could add a library field to the component in order to allow specifying a path to the masl file that corresponds to the package, and maybe assume the same filename if the field is not present. Lastly, not sure if it's worth adding some sort of component ID, where the account template name would be more UI-oriented and the component ID would be more systems-oriented (for things like account template repos, detecting more than one component for the same account, etc.), unless this was the idea for name in the first place.

@bobbinth
Copy link
Contributor

bobbinth commented Dec 7, 2024

We probably want to make a new crate for this, right? I guess an alternative could be to add it to miden-objects (under a feature flag, possibly) but at first sight it seems cleaner to have a separate crate, considering the dependencies and scope of this work.

I was thinking this would be in miden-objects as I didn't think there would actually be a lot of new dependencies (i.e., the ones that came to mind is parsing/writing .toml files which would be gated by the existing std feature) - but it is also possible I didn't think it far enough. Overall, I don't mind creating another crate (not sure what to call it yet though). I'm guessing this new crate will depend on miden-base, right?

Regarding formats, I don't think I have too many notes on @bobbinth's schema as it seems to cover all bases while looking very usable.

I think there are still a few open questions for me here:

  • How do we set the value property for value slots? I have 2 ways in my example - but should there be more? Specifically, we could have the following options:
    • Should we support full word initialization or element-by-element initialization (i.e., 0x0 vs. [0, 0, 0, 0]) or both?
    • Similar question but for when the initial values are to be provided later on.
    • And if we support element-by-element initialization, can these be missed and matched? For example [0, {{foo}}, 1 {{bar}}].
  • What are the options to initialize map? I have one example, but similar to the above we can have "static" keys and values, or static keys and dynamic values, or both keys and values could be dynamic.

One thing that stands out is that it seems that defining the slot indices seems unnecessary because they are arranged contiguously in the component's storage_slots anyway, but it also makes sense that we enforce adding that field (and making sure they are actually contiguous) for ease of use when inspecting the storage layout and looking at the component's code.

Initially that's how I thought about them as well, but if we do want to add support for entries that span multiple slots (e.g., a public key that requires 64 bytes) we might have to specify slots indexes explicitly.

In this sense, the type of the storage slots also seems a bit redundant since I believe we can express maps, arrays and values in TOML directly (although there would be ambiguity for words expressed as 4 elements and arrays, and also in the case of defining initial values outside the main file).

I generally agree with the approach not to have redundant info (e.g., if it is clear form the value field what type the slot is, we shouldn't have type as a separate field). But I'd like to see how different variations here would look like.

Another small thing is that we could add a library field to the component in order to allow specifying a path to the masl file that corresponds to the package, and maybe assume the same filename if the field is not present.

I've assumed that we'd actually "merge" the .masl file and .toml file into a single file (e.g., by zipping them together) so that the package is always just a single file. But maybe that complicates things?

Lastly, not sure if it's worth adding some sort of component ID, where the account template name would be more UI-oriented and the component ID would be more systems-oriented (for things like account template repos, detecting more than one component for the same account, etc.), unless this was the idea for name in the first place.

I was thinking name would be the unique identifier (or name + version), but it may be a good idea to have something more semantically binding. One option could be to use Library::digest() for this.

@igamigo
Copy link
Collaborator

igamigo commented Dec 9, 2024

I think there are still a few open questions for me here:

  • How do we set the value property for value slots? I have 2 ways in my example - but should there be more? Specifically, we could have the following options:

    • Should we support full word initialization or element-by-element initialization (i.e., 0x0 vs. [0, 0, 0, 0]) or both?
    • Similar question but for when the initial values are to be provided later on.
    • And if we support element-by-element initialization, can these be missed and matched? For example [0, {{foo}}, 1 {{bar}}].
  • What are the options to initialize map? I have one example, but similar to the above we can have "static" keys and values, or static keys and dynamic values, or both keys and values could be dynamic.

Regarding these, I think it would be best to be as flexible as possible. Basically, I don't see any significant downside to supporting most (if not all) the variations you mentioned, as most of them represent a valid scenario, at least at first glance: For values intended as single words, simple hex notation is easier. For cases that require partial dynamic substitution, an array-like syntax is beneficial. Scenarios where you are matching against a subset of elements make sense for things such as assets where you are trying to represent something like [faucet_id, 0, 0, {{amount}} ].
For maps, it also makes sense to make values dynamic. I don't think I could come up with a good example of a key being dynamic, but it's also not too hard to imagine something where it could be useful (and also, it wouldn't be too simple to work around if we didn't allow it).

One possible downside is that this might make parsing a bit more complex, but I don't see this as a blocker (unless it's much more complex than I am originally thinking).

Initially that's how I thought about them as well, but if we do want to add support for entries that span multiple slots (e.g., a public key that requires 64 bytes) we might have to specify slots indexes explicitly.

I think this could still be supported without making the slot indices explicit, but rather specifying the length of the storage section that the logical slot will use, and assuming a length of one word otherwise (in your example, instead of slots = [2, 3], there'd be something like length = 2 or similar, and we'd remove the slot field elsewhere). However, I think I'd still favor explicitly setting absolute (to the component) slot indices as they make it easier to correlate to the component's code.

@greenhat
Copy link

Another small thing is that we could add a library field to the component in order to allow specifying a path to the masl file that corresponds to the package, and maybe assume the same filename if the field is not present.

I've assumed that we'd actually "merge" the .masl file and .toml file into a single file (e.g., by zipping them together) so that the package is always just a single file. But maybe that complicates things?

I'd rather keep everything in a single file. Since we plan to have a repository for Miden packages, any path might be either local or remote, which complicates things.

@bobbinth
Copy link
Contributor

I'd rather keep everything in a single file. Since we plan to have a repository for Miden packages, any path might be either local or remote, which complicates things.

Do you mean to keep the contents of the .toml file in the .masl file?

@greenhat
Copy link

I'd rather keep everything in a single file. Since we plan to have a repository for Miden packages, any path might be either local or remote, which complicates things.

Do you mean to keep the contents of the .toml file in the .masl file?

Oh, no. I agreed with you, i.e, merging .masl and .toml files into a single file (package). I meant that in a package we should refrain from using a local path to refer to any of its "dependencies" like a .masl file, but rather keep all the package's contents in a single file. This way, we can easily distribute the package as a single file and not worry about paths (local/remote).

@igamigo
Copy link
Collaborator

igamigo commented Dec 10, 2024

I agree with that as well. What I meant was to keep the single .acp package file, but also have the library file be defined as part of the toml, so when packaging the templates, the file to use would be indicated within the toml itself. But I also don't mind if they are fully independent pre-packaging.

@bobbinth
Copy link
Contributor

bobbinth commented Jan 9, 2025

Closed by #1015.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: Done
Development

No branches or pull requests

5 participants