binary attachments on an enhanced DeployData message #39

Open
dckc opened this issue Jun 10, 2021 · 6 comments
Labels: enhancement (New feature or request)

Comments

dckc commented Jun 10, 2021

Motivation

RChain aspires to “content delivery at the scale of Facebook”. One of the pain points in RCat was hex-encoding assets as rholang strings and then decoding them on chain.

Several projects involve binary assets: the encrypted ID wallet, dappy, etc.

Design Sketch

  • add a new GRPC message like DeployData, but in addition to the term string, it would have space for binary attachments; perhaps a list of them, or a map-like structure of names and byte-sequence values (a hypothetical named-attachment variant is sketched after the example below).
    • add an HTTP analog, using multipart/form-data for the binary attachments (put the normal JSON DeployData stuff in one of the parts)
  • add a new name syntax for referring to these attachments from deployed code. For example:
new song1(`rho:attachment:1`) in {
  new stream in {
    contract stream(payment, ret) = {
      ...
      ret!(*song1)
    }
  }
}

This would let the stream contract send a ByteArray to ret. The bytes of the ByteArray would come from the 1st binary attachment in the GRPC message.
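For the map-like option above, here is a purely hypothetical sketch of how named attachments might be referenced. Neither the `rho:attachment:...` URIs nor the attachment names used here exist today; they only illustrate the proposed feature.

new
  cover(`rho:attachment:cover.png`),
  song(`rho:attachment:track01.ogg`),
  return(`rho:rchain:deployId`)
in {
  // each URI-bound name would dereference to a ByteArray taken from the
  // attachment with the same name in the GRPC (or multipart/form-data) request
  return!(*cover, *song)
}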

Drawbacks

Non-trivial development time. More stuff for client devs to learn (even if only to ignore it).

Requires a hard-fork.

Alternatives

Do nothing.

Perhaps the 2x size price of hex-encoding and the compute cost of hexToBytes are not worth the bother? But that cost is ongoing, whereas the cost of this feature is mostly a one-time thing (modulo ongoing maintenance).

cc @steverosstalbot

dckc added the enhancement label Jun 10, 2021
jimscarver (Collaborator) commented

From the discussion in tech gov today: compressing the hex and converting it to binary were considered, along with the inevitable size limits, which suggest that chunking is a necessity and that streaming will have to be done in rholang using linked lists of roughly 10 MB chunks, each deployed separately. Compressing the rholang deploys on chain is also under consideration; hex compresses well, so that looked like a simpler, more general way to save space at this time.

The issue of retention was also raised: perhaps use a timestamp or block height for expiration, and maybe a deployId or some unforgeable name to extend the expiration.

tgrospic (Collaborator) commented

As @jimscarver mentioned, chunking can already be done at the deploy level with hex-encoded strings.
Putting a large amount of data into a single deploy is difficult with gRPC, which has a maximum message size unless streaming mode is used. Validation of such a large deploy also becomes more complex.

I've tested the difference (or impact) of storing binary data as a hex string versus converting it to bytes.

[1] Storing hex string on a channel

new return(`rho:rchain:deployId`), x in {
  x!("<bytes>") |
  for(@a <- x) {
    return!(a)
  }
}

[2] Storing binary on a channel

new return(`rho:rchain:deployId`), x in {
  x!("<bytes>".hexToBytes()) |
  for(@a <- x) {
    return!(a)
  }
}

Cost in phlogiston of storing hex string with or without conversion to binary.

value \ cost   10 bytes   20 bytes   write/read per 10 bytes
[1] string     903        983        80
[2] bytes      919        989        70

Calling hexToBytes adds a constant overhead, but the binary version has a lower cost per byte.
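To make that concrete (extrapolating linearly from just these two measurements): at 10 bytes the binary version costs 919 - 903 = 16 phlogiston more, but it saves 80 - 70 = 10 phlogiston per additional 10 bytes, so the break-even point would be somewhere around 26 bytes of payload; beyond that, converting to binary should come out cheaper.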

tgrospic (Collaborator) commented

As part of the REV vault changes, @Isaac-DeFrain created this example of a linked list, which can be used to store chunks of binary data.

new
  empty,
  cons,
  print,
  stdout(`rho:io:stdout`)
in {
  // adds an element to the head of an existing linked list
  contract cons(@value, pointer, ret) = {
    new elem in {
      elem!(value, *pointer) |
      ret!(*elem)
    }
  } |
  // prints all elements in the list from head to tail
  contract print(elem, ret) = {
    for (@value, @next_elem <- elem) {
      if (value != Nil) {
        ret!(value) |
        print!(next_elem, *ret)
      }
    }
  } |
  // build a linked list and print the elements from head to tail
  new tmp in {
    empty!(Nil, Nil) |
    cons!(2, *empty, *tmp) |
    for (@elem <- tmp) {
      cons!(1, elem, *tmp) |
      for (@elem <- tmp) {
        cons!(0, elem, *tmp) |
        for (@elem <- tmp) {
          print!(elem, *stdout)
        }
      }
    }
  }
}
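Putting this together with the hexToBytes measurements above, here is a minimal sketch (under the same assumptions) of how hex-decoded chunks could be consed onto such a list. The cons contract is repeated so the snippet stands alone, and the short hex literals are only placeholders for real chunk data.

new empty, cons, stdout(`rho:io:stdout`) in {
  // same cons as above: prepend a chunk to an existing list
  contract cons(@chunk, pointer, ret) = {
    new elem in {
      elem!(chunk, *pointer) |
      ret!(*elem)
    }
  } |
  new tmp in {
    empty!(Nil, Nil) |
    // in practice each chunk would arrive in its own deploy
    cons!("00ff".hexToBytes(), *empty, *tmp) |
    for (@list <- tmp) {
      cons!("ffee".hexToBytes(), list, *tmp) |
      for (@head <- tmp) {
        // read the newest chunk and the pointer to the rest of the list
        for (@chunk, @rest <- @head) {
          stdout!(chunk)
        }
      }
    }
  }
}

Note that the final for consumes the head node; a real consumer would re-send (chunk, rest) on the head channel to keep the list intact.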

dckc (Author) commented Jun 23, 2022

So clearly large amounts of data have to be split between deploys.

But still, for chunks of moderate size, the cost of hex-encoding seems fairly high. Thanks for the measurements of the evaluation cost, @tgrospic. We also have a charge for parsing the rholang source code containing the hex string, before interpreting it, right?

dckc (Author) commented Jun 23, 2022

Meanwhile, I gather there's a BitTorrent connection in progress; surely that would render this moot. Anybody have a pointer handy?

tgrospic (Collaborator) commented

> So clearly large amounts of data have to be split between deploys.
>
> But still, for chunks of moderate size, the cost of hex-encoding seems fairly high. Thanks for the measurements of the evaluation cost, @tgrospic. We also have a charge for parsing the rholang source code containing the hex string, before interpreting it, right?

My measurements showed that the hexToBytes conversion has a constant overhead, which means that once parsing is done all the binary data is already converted; the cost of the conversion from hex does not depend on the size of the data.

The cost of parsing is 1 phlogiston per byte of source code.
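For a rough sense of scale, assuming the ~10 MB chunk size mentioned above and the 2x blow-up of hex encoding: a chunk holding 10 MB of binary data becomes roughly 20 MB of hex in the deploy source, so parsing alone would cost on the order of 20 million phlogiston, on top of the evaluation costs measured above.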
