Creating better file content streaming RPC methods #799

aryanjassal · 2024-08-30T07:29:18Z

Description

Currently, VaultsSecretsGet and VaultsSecretsEdit are UnaryHandlers, which poses many restrictions on its functionality, and especially impacts speed and performance. As such, I will be replacing them by a trimmed-down version of fileTree.serializerStreamFactory and fileTree.parserTransformStreamFactory. (#785 (comment), #785 (comment)).

This PR will also remove the two aforementioned RPC methods, instead introducing two new methods, VaultsSecretsGet and VaultSecretsSet for doing the same, but using RawHandlers instead. (MatrixAI/Polykey-CLI#267 (comment)).

This is important for many UNIX commands like secrets edit, secrets cat, etc.

Issues Fixed

Relates to Implement secrets edit command Polykey-CLI#266
REF ENG-398

Tasks

1. Trim functionality of parserTransformerStreamFactory and serializerStreamFactory
2. Update VaultsSecretsGet
3. Update VaultsSecretsEdit to VaultsSecretsSet

Final checklist

linear · 2024-08-30T07:29:20Z

ENG-398 Implement `secrets edit` command

CMCDragonkai · 2024-08-30T13:35:32Z

If we aren't going to use the old methods anymore, then the API isn't guaranteed now anyway, you can just remove and replace. However I believe we have a naming convention for streaming methods. You should check.

CMCDragonkai · 2024-08-30T13:36:30Z

Especially that our SOV usually has to match the object methods. 30 Aug 2024 00:29:41 Aryan Jassal ***@***.***>:

…

*Description* Currently, *VaultsSecretsGet* and *VaultsSecretsEdit* are *UnaryHandler*s, which poses many restrictions on its functionality, and especially impacts speed and performance. As such, I will be replacing them by a trimmed-down version of *fileTree.serializerStreamFactory* and *fileTree.parserTransformStreamFactory*. (#785 (comment)[#785 (comment)], #785 (comment)[#785 (comment)]). This PR will also remove the two aforementioned RPC methods, instead introducing two new methods, *VaultsSecretsFetch* and *VaultSecretsUpdate* for doing the same, but using *RawHandler*s instead. (MatrixAI/Polykey-CLI#267 (comment)[MatrixAI/Polykey-CLI#267 (comment)]). This is important for many UNIX commands like *secrets edit*, *secrets cat*, etc. *Issues Fixed* * Relates to MatrixAI/Polykey-CLI#266[MatrixAI/Polykey-CLI#266] * REF ENG-398 *Tasks* * ☑1. Trim functionality of *parserTransformerStreamFactory* and *serializerStreamFactory* * ☐2. Remove *VaultsSecretsGet* and *VaultsSecretsEdit* * ☐3. Implement *VaultsSecretsFetch* and *VaultsSecretsUpdate* *Final checklist* * ☐Domain specific tests * ☐Full tests * ☐Updated inline-comment documentation * ☐Lint fixed * ☐Squash and rebased * ☐Sanity check the final build ---------------------------------------- *You can view, comment on, or merge this pull request online at:* #799 *Commit Summary* * 5aaae26[5aaae26] wip: pruning features of file tree serialization *File Changes* (2 files[https://github.com/MatrixAI/Polykey/pull/799/files]) * *M* src/vaults/fileTree.ts[https://github.com/MatrixAI/Polykey/pull/799/files#diff-7203abc9fad315307d043c1af677e66b8f032cc9e7e2e24ba508ad1efd0cea4a] (278) * *M* tests/vaults/fileTree.test.ts[https://github.com/MatrixAI/Polykey/pull/799/files#diff-5208b001b4e30ed6ebf162d181e11271e0e265d5881c19d19dfd18e033e74b6d] (29) *Patch Links:* * https://github.com/MatrixAI/Polykey/pull/799.patch * https://github.com/MatrixAI/Polykey/pull/799.diff — Reply to this email directly, view it on GitHub[#799], or unsubscribe[https://github.com/notifications/unsubscribe-auth/AAE4OHKERG4HEF66XVQLX5TZUANOHAVCNFSM6AAAAABNL5E32OVHI2DSMVQWIX3LMV43ASLTON2WKOZSGQ4TMNBVHE3DEMY]. You are receiving this because you are subscribed to this thread. [Tracking image][https://github.com/notifications/beacon/AAE4OHNAQSUVSQFJDYJ6HWDZUANOHA5CNFSM6AAAAABNL5E32OWGG33NNVSW45C7OR4XAZNFJFZXG5LFVJRW63LNMVXHIX3JMTHJJTHTM4.gif]

aryanjassal · 2024-09-02T06:36:11Z

If we aren't going to use the old methods anymore, then the API isn't guaranteed now anyway, you can just remove and replace. However I believe we have a naming convention for streaming methods. You should check.

I asked @tegefaulkes and we settled on calling the methods VaultsSecretsGet and VaultsSecretsSet, as that is concise and makes the most sense.

If you don't like these names, an alternative naming scheme can be calling the methods VaultsSecretsFetch and VaultsSecretsUpdate.

aryanjassal · 2024-09-02T06:40:38Z

The behaviour of serializerStreamFactory is to stream the contents of multiple files in the order of the files. This will send data as a stream to the client.

But how do we handle incoming data? Do we also expect the client to stream secret contents back to us in the same format of multiple files? Or do we expect a single file only? Would it even make sense for the client to stream multiple files back? When would that even happen? The same question goes for streaming file contents to the client. When would we even need the functionality of streaming multiple files concurrently? AFAIK, no UNIX command we will be implementing would need that (globbing might require that, but we are not globbing anyways).

I just need some clarity on these topics @tegefaulkes.

aryanjassal · 2024-09-02T07:25:56Z

On that note, the CONTENT header can be simplified further by removing the iNode field. This is because the files returned will be in the same order that they were sent in. Thus, we don't need to keep track of which file is being returned and in what order.

Can't we also simplify it further by removing the type field, as it will always and only return CONTENT objects? Then, the header would be simplified to only the dataSize. At this point, can't we just make a terminator sequence and break/split the streams when we hit that sequence instead of dealing with serializing headers?

[ci skip]

tegefaulkes · 2024-09-02T22:33:23Z

Even without globbing we still need to send entire trees for the mv and cp command. In those cases we need to know about directories, their contents and sending multiple files at once.

aryanjassal · 2024-09-03T00:15:36Z

Even without globbing we still need to send entire trees for the mv and cp command. In those cases we need to know about directories, their contents and sending multiple files at once.

Now that I have removed the functionality of sending over file trees from serializer and parser, how do we plan on doing this? Maybe sending a JSON object as a response containing an array of the file paths in the order of serialisation might work, but I'm not sure how would I handle doing that for mv and cp commands.

Can't we also simplify it further by removing the type field, as it will always and only return CONTENT objects? Then, the header would be simplified to only the dataSize. At this point, can't we just make a terminator sequence and break/split the streams when we hit that sequence instead of dealing with serializing headers?

We cannot do this, as we will rely on sending over TreeNodes about file data for future iterations of secrets ls command. So, I think that we should retain the type field for the time being, and only remove the iNode field.

… serialization [ci skip]

CMCDragonkai · 2024-09-03T17:37:57Z

I haven't reviewed this in detail, but just some advice, remember to don't make the API too complex, we may expect other clients to integrate into the future. My style is to always start with simple functions and then combine them into more complex functionality using combinators. The functional programming style, in the context of RPC, we don't actually have procedure composition like in Cap'n Proto, so I always start with simple f and then create complex variants like f' and f''. And the suffix names is what is used to indicate variants. I want to maintain this conceptual integrity.

CMCDragonkai · 2024-09-03T17:39:23Z

I haven't reviewed this in detail, but just some advice, remember to don't make the API too complex, we may expect other clients to integrate into the future. My style is to always start with simple functions and then combine them into more complex functionality using combinators. The functional programming style, in the context of RPC, we don't actually have procedure composition like in Cap'n Proto, so I always start with simple f and then create complex variants like f' and f''. And the suffix names is what is used to indicate variants. I want to maintain this conceptual integrity.

Which means by the way that the underlying JS functions get composed together for each handler, but handlers themselves are rather more atomic and uncomposable (they technically be composed on the client side, but it's not as efficient as composing the procedure on the server side as it involves back and forth data transfer).

[ci skip]

aryanjassal · 2024-09-06T12:59:57Z

Apparently the RawHandler will not, by default, serialize errors. If an error is thrown, then it would propagate to the transport layer, and cause a read: 0 at the client. To handle this, the serializer needs a way to serialize errors, and the parser needs a way to watch for and parse thrown errors. The code for doing this can be taken from js-rpc and how it is currently handled in other RPC handlers like UnaryHandler, etc.

This, among the other implementation requirement, makes this PR take too long, and as such, won't be part of the first iteration. The first iteration of the secrets edit command will be a Polykey CLI change only, creating a file if it doesn't already exist. Using RawHandlers for better streaming is out of scope for that feature, and will be worked upon in future iterations. The code in this PR will be cleaned up, and worked upon later.

CMCDragonkai · 2024-09-06T23:45:54Z

@tegefaulkes Is the primary purpose of this PR mostly to improve performance of secret editing commands? As in, this is due to file content transfer between the PK client and the PK agent? I can see how raw content might help for larger files, especially if you need to stream edits, or block edits, a single unary call wouldn't get you any sort of progress update.

However I fear that doing this across all of our calls is going to make integration for other clients much more difficult in the future, necessitating streaming operations for them. Have you considered the trade offs here?

CMCDragonkai · 2024-09-06T23:47:15Z

src/utils/utils.ts

+  streamToAsyncGenerator,
+  asyncGeneratorToStream,


I feel like these should really be in its own utility system.

Also don't we already have ixjs combinators available? Doesn't that already do this? Usually that library has these features.

See the ix library 5.0.0. I feel like there would already be some combinator for async iterator to web streams.

CMCDragonkai · 2024-09-06T23:49:07Z

src/vaults/fileTree.ts

A file called fileTree.ts under src/vaults? Is this right? What is this file for? Usually utilities go until utils/ otherwise this should be a class based module. What's the meaning behind this?

Please start writing module-comments typedoc style, as well as function doc comments too.

tegefaulkes · 2024-09-10T01:52:56Z

When I was creating the utilities I was assuming the serialisation as a raw stream was going to be a necessity for some of the commands. Especially some of the more complicated ones such as mv and cp since they need to operate over a whole tree.

We may have to have a discussion on this to clarify the constraints and get on the same page @CMCDragonkai.

CMCDragonkai · 2024-09-10T02:53:09Z

When moving within a vault or between vaults, that wouldn't involve any serialisation.

When moving into and out of the vault. That may involve a serialisation or deserialisation of the tree structure.

Instead of inventing a new serialisation, why not use a standardized container format? Like zip or tar.

tegefaulkes · 2024-09-10T03:19:14Z

That is one option. I'll need look into it once I get some time to go over it.

CMCDragonkai · 2024-09-11T00:06:06Z

A standardized format would be easier for other implementations to do. Plus this means we can introduce this: https://github.com/matrixai/js-virtualtar. I never finished the implementation, but it would be cool to do.

Plus as a tar thing, it doesn't compress, but that's unnecessary since we expect the transport layer to introduce compression if necessary - no need double compression.

As a streaming container format, it can be stream deserialized into file contents, basically it's perfect for what we want to do.

It's also suitable for binary streams.

So I vote for virtualtar over a custom format, and not zip, because zip compresses things and we don't really need that, plus we don't need random access when streaming over the network.

Note that if we want to eventually support resumable uploads/downloads, one would also need implement deterministic tar. But that's down the line.

Plus by using tar, this actually enables another command secrets tar which would allow users to easily call this to stream out a particular archive of a secret contents, as an onramp/offramp, and even pipe such a command using - as the option to gzip, gnupg or any other unix command. It's perfect for this!

tegefaulkes · 2024-09-11T00:30:03Z

It will definitely simplify things if it's easy to integrate. I do like the idea but it will need to work with a DIed fs and be able to work with the EFS directly. If possible we really want to avoid writing to the disk just to construct the tar file. Not the end of the world but the less the secrets touch the disk the better.

aryanjassal · 2024-10-08T05:22:14Z

The relevant work for this PR has already been completed in other PRs. Closing this as not planned.

wip: pruning features of file tree serialization

5aaae26

aryanjassal requested a review from tegefaulkes August 30, 2024 07:29

aryanjassal self-assigned this Aug 30, 2024

aryanjassal marked this pull request as draft September 2, 2024 06:43

aryanjassal added 2 commits September 2, 2024 17:40

chore: simplified code

7531ef0

wip: updating secret content getter

8f886cf

[ci skip]

aryanjassal added 2 commits September 3, 2024 12:19

chore: using array of filenames instead of generator of treenodes for…

fc78adc

… serialization [ci skip]

chore: secrets get mostly done

378aa45

bug: working on throwing errors

0dfb838

[ci skip]

aryanjassal force-pushed the feature-unix-edit branch from dd7b99d to 0dfb838 Compare September 6, 2024 03:51

aryanjassal added 2 commits September 6, 2024 15:24

bug: working on not throwing read: 0

34323fd

[ci skip]

bug: working on stream reading

1f48da5

[ci skip]

CMCDragonkai reviewed Sep 6, 2024

View reviewed changes

aryanjassal mentioned this pull request Sep 10, 2024

Implement secrets cat command MatrixAI/Polykey-CLI#243

Closed

aryanjassal mentioned this pull request Sep 23, 2024

Use tar archival format to transfer file tree structure and data between Polykey vaults and local file systems #811

Open

aryanjassal mentioned this pull request Oct 8, 2024

Migrate all VaultsSecrets handlers to RawHandler #822

Open

8 tasks

aryanjassal closed this Oct 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Creating better file content streaming RPC methods #799

Creating better file content streaming RPC methods #799

aryanjassal commented Aug 30, 2024 •

edited

Loading

linear bot commented Aug 30, 2024

CMCDragonkai commented Aug 30, 2024 via email

CMCDragonkai commented Aug 30, 2024 via email

aryanjassal commented Sep 2, 2024

aryanjassal commented Sep 2, 2024

aryanjassal commented Sep 2, 2024

tegefaulkes commented Sep 2, 2024

aryanjassal commented Sep 3, 2024

CMCDragonkai commented Sep 3, 2024

CMCDragonkai commented Sep 3, 2024

aryanjassal commented Sep 6, 2024

CMCDragonkai commented Sep 6, 2024

CMCDragonkai Sep 6, 2024

CMCDragonkai Sep 6, 2024

CMCDragonkai Sep 6, 2024

CMCDragonkai Sep 6, 2024

tegefaulkes commented Sep 10, 2024

CMCDragonkai commented Sep 10, 2024

tegefaulkes commented Sep 10, 2024

CMCDragonkai commented Sep 11, 2024

tegefaulkes commented Sep 11, 2024

aryanjassal commented Oct 8, 2024 •

edited

Loading

Creating better file content streaming RPC methods #799

Creating better file content streaming RPC methods #799

Conversation

aryanjassal commented Aug 30, 2024 • edited Loading

Description

Issues Fixed

Tasks

Final checklist

linear bot commented Aug 30, 2024

CMCDragonkai commented Aug 30, 2024 via email

CMCDragonkai commented Aug 30, 2024 via email

aryanjassal commented Sep 2, 2024

aryanjassal commented Sep 2, 2024

aryanjassal commented Sep 2, 2024

tegefaulkes commented Sep 2, 2024

aryanjassal commented Sep 3, 2024

CMCDragonkai commented Sep 3, 2024

CMCDragonkai commented Sep 3, 2024

aryanjassal commented Sep 6, 2024

CMCDragonkai commented Sep 6, 2024

CMCDragonkai Sep 6, 2024

Choose a reason for hiding this comment

CMCDragonkai Sep 6, 2024

Choose a reason for hiding this comment

CMCDragonkai Sep 6, 2024

Choose a reason for hiding this comment

CMCDragonkai Sep 6, 2024

Choose a reason for hiding this comment

tegefaulkes commented Sep 10, 2024

CMCDragonkai commented Sep 10, 2024

tegefaulkes commented Sep 10, 2024

CMCDragonkai commented Sep 11, 2024

tegefaulkes commented Sep 11, 2024

aryanjassal commented Oct 8, 2024 • edited Loading

aryanjassal commented Aug 30, 2024 •

edited

Loading

aryanjassal commented Oct 8, 2024 •

edited

Loading