Arbitrary nesting of heap types in contract returns #944

segfault-magnet · 2023-04-26T18:32:07Z

@hal3e and I have been discussing something, details to follow:

Currently, the Rust SDK cannot handle a nested heap type (e.g. -> Option<Vec<u64>>) in the return type of a contract/script. The limitation stems from the fact that returning heap types only returns pointers which are useless without the VM's memory.

A partial workaround was implemented where we would inject some extra bytecode after the contract call. The injected bytecode would generate a ReturnData receipt containing the previously inaccessible heap data. This allowed for returning a single non-nested heap type (e.g. -> Vec<u64>). The injected bytecode is currently incapable of handling anything more complex.

We suggest taking this to the extreme and supporting arbitrarily nested heap types.

There are three main problems we need to solve: structs, enums, and heap types nested in other heap types. Let's start with structs:

Structs

Let's consider the following example:

struct AnotherStruct {
  a: u64,
  b: Vec<u64>
}

struct SomeStruct { 
  a: u64,
  b: u64,
  c: AnotherStruct
}
// ... some contract method below
fn foo() -> SomeStruct {
    SomeStruct {
        a: 10,
        b: 11,
        c: AnotherStruct {
            a: 12,
            b: vec![13, 14, 15],
        },
    }
}

Encoded, the fields appear in declaration order, and the vector shows up only as a pointer into the heap, not as its elements.

In order to support this we would need to inject a retd instruction that will yank the missing heap data into a separate receipt.

The only difference between this and our current workaround is that the vector pointer isn't encoded immediately at the start. We would need to calculate this offset and add it to the address in the RET register in order to yank the heap data.

This can be done easily since we already know the exact size of every part of the struct.

In this case, it doesn't matter how deeply the vector is nested inside structs: a single retd with the correct offset can get its data.
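The offset calculation for the struct example can be sketched as follows. This is only an illustration, not SDK code: the `Ty` tree and its methods are hypothetical, and the sketch assumes every `u64` takes one 8-byte word and a vector is encoded as three words (pointer, capacity, length).

```rust
const WORD: usize = 8;

/// Hypothetical, simplified type tree; just enough for the struct example.
enum Ty {
    U64,
    Vector(Box<Ty>),
    Struct(Vec<Ty>),
}

impl Ty {
    /// Encoded size in bytes.
    /// Assumption: a vector is three words (ptr, cap, len).
    fn size(&self) -> usize {
        match self {
            Ty::U64 => WORD,
            Ty::Vector(_) => 3 * WORD,
            Ty::Struct(fields) => fields.iter().map(Ty::size).sum::<usize>(),
        }
    }

    /// Collects the byte offset (relative to the start of the returned value)
    /// of every vector reachable through nested structs.
    fn heap_offsets(&self, base: usize, out: &mut Vec<usize>) {
        match self {
            Ty::U64 => {}
            Ty::Vector(_) => out.push(base),
            Ty::Struct(fields) => {
                let mut offset = base;
                for field in fields {
                    field.heap_offsets(offset, out);
                    offset += field.size();
                }
            }
        }
    }
}

/// Type tree for `SomeStruct { a: u64, b: u64, c: AnotherStruct { a: u64, b: Vec<u64> } }`.
fn some_struct() -> Ty {
    let another_struct = Ty::Struct(vec![Ty::U64, Ty::Vector(Box::new(Ty::U64))]);
    Ty::Struct(vec![Ty::U64, Ty::U64, another_struct])
}
```

Under these assumptions, `heap_offsets` on `some_struct()` with a base of 0 yields a single offset of 24 bytes: three `u64` fields precede the vector. That offset is what would be added to the returned address before issuing the retd.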

Enums

Let's consider the following example:

struct WrappingStruct {
    a: u64,
    b: Vec<u64>,
}

enum TheEnum {
    A(u64),
    B(WrappingStruct),
}

fn foo() -> TheEnum {
    TheEnum::B(WrappingStruct {
        a: 10,
        b: vec![11, 12, 13],
    })
}

With enums, we need to inject bytecode that will branch depending on the enum discriminant. Variants that don't contain heap types don't need to be considered, e.g. our example will only check if the enum carries the second variant.

After the variant is determined, the enum data can be used to generate the additional receipt data as before.
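The branching can be sketched off-chain in Rust. The real thing would be injected bytecode, so this only mirrors the decision it has to make. The helper names are hypothetical, and the layout is an assumption: a one-word big-endian discriminant followed directly by the variant data, with no padding considered because variant B is the largest variant in this example.

```rust
const WORD: usize = 8;

/// Reads one big-endian word at `offset`. Panics on truncated input.
fn read_word(bytes: &[u8], offset: usize) -> u64 {
    let mut buf = [0u8; WORD];
    buf.copy_from_slice(&bytes[offset..offset + WORD]);
    u64::from_be_bytes(buf)
}

/// Hypothetical helper: encodes a list of words as big-endian bytes.
fn encode_words(words: &[u64]) -> Vec<u8> {
    words.iter().flat_map(|w| w.to_be_bytes()).collect()
}

/// For `TheEnum`, returns the offset at which the vector's
/// (ptr, cap, len) triple starts, or `None` if no heap data is carried.
fn heap_offset_in_the_enum(encoded: &[u8]) -> Option<usize> {
    match read_word(encoded, 0) {
        // Variant A(u64): no heap type, nothing extra to emit.
        0 => None,
        // Variant B(WrappingStruct): skip the discriminant and field `a`.
        1 => Some(WORD + WORD),
        // Unknown discriminant: nothing we can safely emit.
        _ => None,
    }
}
```

Only the variants that contain heap types need a branch; every other discriminant falls through without producing an extra receipt.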

Heap types in other heap types

struct WrappingStruct {
    a: u64,
    b: Vec<u64>,
}

struct ParentStruct {
    a: u64,
    b: Vec<WrappingStruct>,
}

fn foo() -> ParentStruct {
    ParentStruct {
        a: 9,
        b: vec![
            WrappingStruct {
                a: 10,
                b: vec![11, 12, 13],
            },
            WrappingStruct {
                a: 14,
                b: vec![15, 16, 17],
            },
        ],
    }
}

In order to get all the necessary data we'd need to issue 3 retd instructions.

Notice that this issue is recursive in nature. Once we issue the first and simplest retd we're right back to the same problem only now the starting address has been moved along.
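To make the recursion concrete, here is a small simulation over a fake memory buffer. Nothing here is SDK or VM code: `Ty`, `collect_receipts` and the memory layout are hypothetical, a vector is assumed to be three big-endian words (ptr, cap, len), and each pushed buffer stands in for one retd. The emission order (a vector's own buffer first, then its elements left to right) is a choice made for this sketch.

```rust
const WORD: usize = 8;

/// Hypothetical, simplified type tree.
enum Ty {
    U64,
    Vector(Box<Ty>),
    Struct(Vec<Ty>),
}

impl Ty {
    /// Encoded size in bytes; a vector is assumed to be (ptr, cap, len).
    fn size(&self) -> usize {
        match self {
            Ty::U64 => WORD,
            Ty::Vector(_) => 3 * WORD,
            Ty::Struct(fields) => fields.iter().map(Ty::size).sum::<usize>(),
        }
    }
}

/// Reads one big-endian word at `offset`. Panics on truncated input.
fn read_word(mem: &[u8], offset: usize) -> u64 {
    let mut buf = [0u8; WORD];
    buf.copy_from_slice(&mem[offset..offset + WORD]);
    u64::from_be_bytes(buf)
}

/// Walks the value at `addr` and pushes one buffer per vector encountered.
fn collect_receipts(mem: &[u8], addr: usize, ty: &Ty, out: &mut Vec<Vec<u8>>) {
    match ty {
        Ty::U64 => {}
        Ty::Struct(fields) => {
            let mut offset = addr;
            for field in fields {
                collect_receipts(mem, offset, field, out);
                offset += field.size();
            }
        }
        Ty::Vector(elem) => {
            let ptr = read_word(mem, addr) as usize;
            let len = read_word(mem, addr + 2 * WORD) as usize;
            // Stand-in for a single retd over the vector's buffer.
            out.push(mem[ptr..ptr + len * elem.size()].to_vec());
            // Same problem again, with the start address moved to each element.
            for i in 0..len {
                collect_receipts(mem, ptr + i * elem.size(), elem, out);
            }
        }
    }
}

/// Type tree for `ParentStruct { a: u64, b: Vec<WrappingStruct> }`.
fn parent_struct() -> Ty {
    let wrapping = Ty::Struct(vec![Ty::U64, Ty::Vector(Box::new(Ty::U64))]);
    Ty::Struct(vec![Ty::U64, Ty::Vector(Box::new(wrapping))])
}

/// Fake memory holding the example value; returns (memory, address of the value).
fn example_memory() -> (Vec<u8>, usize) {
    let words: Vec<u64> = vec![
        11, 12, 13, // addr 0: first inner vector
        15, 16, 17, // addr 24: second inner vector
        10, 0, 3, 3, // addr 48: WrappingStruct { a: 10, b: (ptr 0, cap 3, len 3) }
        14, 24, 3, 3, // addr 80: WrappingStruct { a: 14, b: (ptr 24, cap 3, len 3) }
        9, 48, 2, 2, // addr 112: ParentStruct { a: 9, b: (ptr 48, cap 2, len 2) }
    ];
    (words.iter().flat_map(|w| w.to_be_bytes()).collect(), 112)
}
```

Running `collect_receipts` on the example memory produces three buffers: the outer vector's two `WrappingStruct` elements, followed by each inner vector's data.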

After collecting the receipts

Due to the structured and deterministic way we've approached walking the type tree (a post-order traversal), we can now use the extra receipts to decode the return type.

Since the ABI encoder also does a post-order traversal, we can adapt it to accept a stack of receipts, popping one each time a heap type is to be decoded.
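A decoder along these lines might look like the sketch below. It is not the SDK's decoder: `Ty`, `Value` and `decode` are hypothetical, a vector is assumed to be three big-endian words (ptr, cap, len), and a `VecDeque` consumed from the front plays the role of the stack. It also assumes receipts arrive in the order "a vector's own buffer first, then the buffers of its elements, left to right"; the only real requirement is that the decoder consumes them in the order they were emitted.

```rust
use std::collections::VecDeque;

const WORD: usize = 8;

/// Hypothetical, simplified type tree.
enum Ty {
    U64,
    Vector(Box<Ty>),
    Struct(Vec<Ty>),
}

/// Hypothetical decoded value.
#[derive(Debug, PartialEq)]
enum Value {
    U64(u64),
    Vector(Vec<Value>),
    Struct(Vec<Value>),
}

impl Ty {
    fn size(&self) -> usize {
        match self {
            Ty::U64 => WORD,
            Ty::Vector(_) => 3 * WORD,
            Ty::Struct(fields) => fields.iter().map(Ty::size).sum::<usize>(),
        }
    }
}

/// Reads one big-endian word; panics on truncated input
/// (a real decoder would return an error instead).
fn read_word(bytes: &[u8], offset: usize) -> u64 {
    let mut buf = [0u8; WORD];
    buf.copy_from_slice(&bytes[offset..offset + WORD]);
    u64::from_be_bytes(buf)
}

/// Hypothetical helper: encodes a list of words as big-endian bytes.
fn words(ws: &[u64]) -> Vec<u8> {
    ws.iter().flat_map(|w| w.to_be_bytes()).collect()
}

/// Decodes `ty` from `bytes`, taking one receipt per vector encountered.
/// Returns `None` if a receipt is missing.
fn decode(bytes: &[u8], ty: &Ty, receipts: &mut VecDeque<Vec<u8>>) -> Option<Value> {
    match ty {
        Ty::U64 => Some(Value::U64(read_word(bytes, 0))),
        Ty::Struct(fields) => {
            let mut offset = 0;
            let mut values = Vec::new();
            for field in fields {
                values.push(decode(&bytes[offset..], field, receipts)?);
                offset += field.size();
            }
            Some(Value::Struct(values))
        }
        Ty::Vector(elem) => {
            // The pointer word is useless without VM memory; only `len` is read.
            let len = read_word(bytes, 2 * WORD) as usize;
            let data = receipts.pop_front()?;
            let mut values = Vec::new();
            for i in 0..len {
                values.push(decode(&data[i * elem.size()..], elem, receipts)?);
            }
            Some(Value::Vector(values))
        }
    }
}

/// Type tree for `ParentStruct { a: u64, b: Vec<WrappingStruct> }`.
fn parent_struct() -> Ty {
    let wrapping = Ty::Struct(vec![Ty::U64, Ty::Vector(Box::new(Ty::U64))]);
    Ty::Struct(vec![Ty::U64, Ty::Vector(Box::new(wrapping))])
}
```

Given the return data for `ParentStruct` and the three receipts from the nested example, `decode` rebuilds the whole value without ever dereferencing a pointer.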

Or we can merge all the receipts into one, taking care to update the pointers to point to their respective data. Decoding would then be trivial, as though we had the VM's memory loaded.
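For the single-vector case, the merge could be sketched like this. The function names are hypothetical and the layout is an assumption (a vector as three big-endian words: ptr, cap, len). The receipt is appended to the return data and the stale pointer is rewritten to the position of the appended copy, so the merged buffer can be read as if it were memory.

```rust
const WORD: usize = 8;

/// Reads one big-endian word at `offset`. Panics on truncated input.
fn read_word(mem: &[u8], offset: usize) -> u64 {
    let mut buf = [0u8; WORD];
    buf.copy_from_slice(&mem[offset..offset + WORD]);
    u64::from_be_bytes(buf)
}

/// Hypothetical helper: encodes a list of words as big-endian bytes.
fn words(ws: &[u64]) -> Vec<u8> {
    ws.iter().flat_map(|w| w.to_be_bytes()).collect()
}

/// Appends `receipt` to `ret` and rewrites the pointer word at `ptr_offset`
/// so that it points at the appended copy.
fn merge(ret: &[u8], ptr_offset: usize, receipt: &[u8]) -> Vec<u8> {
    let mut merged = ret.to_vec();
    let new_ptr = merged.len() as u64;
    merged[ptr_offset..ptr_offset + WORD].copy_from_slice(&new_ptr.to_be_bytes());
    merged.extend_from_slice(receipt);
    merged
}

/// Follows a (ptr, cap, len) triple inside `mem` as if `mem` were VM memory.
fn read_vec_u64(mem: &[u8], triple_offset: usize) -> Vec<u64> {
    let ptr = read_word(mem, triple_offset) as usize;
    let len = read_word(mem, triple_offset + 2 * WORD) as usize;
    (0..len).map(|i| read_word(mem, ptr + i * WORD)).collect()
}
```

With the `SomeStruct` example, the vector triple sits at offset 24 of the return data, so `merge(&ret, 24, &receipt)` produces a buffer from which `read_vec_u64(&merged, 24)` can recover the elements. Nested heap types would need the same rewrite applied once per receipt.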

But that is an implementation detail, not that relevant right now. The point is, decoding is possible.

Other technical challenges also exist, such as register management, but nothing unsolvable as far as we can see.

Considerations

The indexer
The indexer will not be able to rely on this every time, just as it could not rely on the current injection approach. If the contract wasn't called through the SDK, then the bytecode was never injected and the additional receipts were never generated.
It could still handle returned raw or typed slices, but nested heap types would not be possible in that case.

Logging
Logging would not be supported since we cannot inject bytecode at the appropriate places. Logging heap types will only be possible through typed slices, without support for nested heap types.

Script support
We need to investigate the possibility of injecting the bytecode at the end of user-provided scripts. If it transpires that we'll always have a deterministic way of reaching the script return data, then the approach will be the same as for contracts.

segfault-magnet · 2023-05-08T14:20:53Z

The following is a brief summary of a meeting held between @hal3e, @iqdecay and @segfault-magnet.

Is there any possibility for the compiler to always generate the needed extra receipts so that we don't have to inject retd instructions?

Will this cause unacceptable bloat on the blockchain? It will probably be unavoidable compiler-wise for the extra receipts to be generated even when they are not needed, e.g. in the case of a script calling a contract.

If the bloat is unacceptable, can we then investigate other possibilities, such as a Sway attribute on heap-type-returning functions that produces an additional version of the function which spawns the receipts?

e.g.

#[generate_extra_receipts]
fn i_return_heap_types() -> Vec<Vec<u64>>;

which results in two functions, a normal one and one that logs the necessary heap data:

#[generate_extra_receipts]
fn i_return_heap_types() -> Vec<Vec<u64>>;
fn i_return_heap_types_extra() -> Vec<Vec<u64>>;

Scripts called without the need for the SDK could then call the fn without the overhead.

Is there any way we can support logging nested heap types? What about the above-mentioned considerations?

What about returning nested heap types in scripts? The previously explained approach would have difficulties working for user-provided scripts as the last instruction in scripts terminates the execution of the script without giving us a chance to do bytecode injection.

If we don't receive native support for returning nested heap types, is there a way to execute the user-provided script so that we may inject the retd instructions after the last script instruction?

Ideally, we would wish to avoid modifying the user script in any way, save for extending with the retd instructions.

IGI-111 · 2023-05-18T19:25:44Z

I think producing the receipts in every case is unacceptable, but so is asking the user to figure out this esoteric an annotation for what should be normal behavior.

What we're missing to be able to do this from the compiler side is at least a way to know what is and isn't a heap type, most probably through tracking allocs, having some sort of Box type, or using the typed pointers we're building.
This is doable, but it's not trivial and we should probably do an RFC for it.
