Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-Get and Multi-Put with list of keys #39

Open
fm3 opened this issue Jun 5, 2023 · 3 comments
Open

Multi-Get and Multi-Put with list of keys #39

fm3 opened this issue Jun 5, 2023 · 3 comments

Comments

@fm3
Copy link
Member

fm3 commented Jun 5, 2023

Sometimes users know a specific list of keys they want to get or put. Currently, they have to send many grpcs, one per key (except for the special case of prefix scan gets). To reduce overhead, it would be nice to add API endpoints that directly support this.

message OptionalLong {
    optional uint64 value = 1;
}

message GetMultipleKeysByListRequest {
    required string collection = 1;
    repeated string keys = 2;
    repeated OptionalLong versions = 3;
}

// length of versions should either be equal to length of keys, in which case each individual version is considered optional, or 0 (empty list), in which case None is assumed for every key.

message GetMultipleKeysByListReply {
    required bool success = 1;
    repeated string errorMessage = 2;
    repeated bytes values = 4;
    repeated uint64 actualVersions = 5;
    repeated bool missingKeys = 6;
}

// if only some keys exist, values and actualVersions are be densified. missingValueIndices lists the keys for wich nothing has been found

(Note that the name GetMultipleKeys was unfortunately already taken by the prefix-scan-based request)

message PutMultipleKeysByListRequest {
    required string collection = 1;
    repeated string keys = 2;
    repeated OptionalLong versions = 3;
}

message PutMultipleKeysByListReply {
    required bool success = 1;
    optional string errorMessage = 2;
}

Any comments on the API draft? @normanrz @philippotto

@philippotto
Copy link
Member

Since I haven't really worked with the fossilDB yet, I might not be the best to give feedback, but my current thoughts are:

  • looks good overall 👍
  • maybe it makes sense to also support the case that GetMultipleKeysByListRequest has N keys and only 1 version (meaning that all keys would use that version). not sure whether this could be helpful for our use cases, but in general it might be nice.
  • if GetMultipleKeysByListReply contains missing keys, I imagine the structure a bit tedious to use. For example, when iterating over the output, one has to keep track of the iterated keys and which of these are missing to match the result with the request. How about something like this:
message GetMultipleKeysByListReply {
    required bool success = 1;
    repeated GetMultipleKeysResponseItem = 2;
    required bool missingKeys = 3;
}

message GetMultipleKeysResponseItem {
    required string key = 1;
    required bytes value = 2;
    required uint64 actualVersion = 3;
}

However, I understand that this introduces a bit of redundancy, since the keys are encoded again. Not sure whether this is a performance hit.

All in all, I don't have a strong opinion here.

@normanrz
Copy link
Member

normanrz commented Jun 5, 2023

What are the expected error cases. Does it make sense to treat the multi-get/puts as atomic transactions? They could even use the transaction system from rocksdb.

@fm3
Copy link
Member Author

fm3 commented Jun 12, 2023

when iterating over the output, one has to keep track of the iterated keys

I agree, I like the ResponseItem approach, and I’d assume that the overhead is not super significant.

N keys and only 1 version

Yes, might be a valid use case, I guess we can encode that by versions containing exactly one entry.

What are the expected error cases

  • Item missing (not an error, but entry would be empty)
  • i/o error (file missing, corrupt file system, file permissions, for writing also disk full)
  • I’d also expect that there may be exceptions thrown that are not specifically expected

Does it make sense to treat the multi-get/puts as atomic transactions?

It might!

  • I guess a con could be performance, but I don’t know how severe that would be.
  • During normal operation, webknossos would await the result anyway and only afterwards send the next request, so transaction management is not super important in the “happy path”
  • However, if the connection breaks, problems might arise that might be saved by making these transactions.
  • Another con might be that it would not really work with pagination/streaming. When iterating over many keys in the client, to have proper transaction management the client would have to fetch everything at once.

So in summary I agree that this would bring possible benefits for various codepaths. However, I’m not super sure that this would directly solve problems in wk, since there are a few ways in which the wk code is breaking up transactions into chunks. To achieve end-to-end transactions, larger changes in wk would be required. Also, I don’t know if transactions across column families are possible with rocksDB, those would probably be required to have truly transactional saves

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants