CUBE <--> pfcon file transfer #43
jennydaman
started this conversation in General
During today's roundtable, @jbernal0019 presented CUBE's plugininstances/services/manager.py code.
We have several ideas for how the program can be optimized.
"In-Network" pfcon
First, we discuss the case where pfcon is able to access CUBE's file storage directly.
"No-transfer" pfcon
FNNDSC/pfcon#136
The simplest and most efficient solution is for pfcon to do nothing. pman would create job containers which mount CUBE's file storage directly.
"No-transfer" is a misnomer, but the oversimplification makes the concept easier to understand. It is possible for CUBE's file storage to be network-based, e.g. Swift or NFS. In these cases, file data are being transferred from storage to compute node transparently. This is desirable because the underlying system can do transfer more efficiently and more easily than if we were to implement transfer ourselves. The container engine (Docker/Podman/Singularity/Kubernetes) used by pman would need a plugin to be able to mount Swift or NFS as volumes to a container.
@jbernal0019 raised a concern about eventual consistency. Let's enumerate the possible situations:
"Pull" pfcon
@jbernal0019's proposed architecture is for pfcon to have direct access to CUBE's file storage: instead of CUBE pushing files to pfcon, pfcon pulls the input files from storage itself.
*Assumption: Swift is eventually consistent; the volume is consistent.
It is less performant and more complicated to implement than "no-transfer" pfcon. However, it solves the problem of eventual consistency.
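By contrast with "no-transfer", here is a rough sketch of what "pull" could look like if CUBE's storage is Swift, using python-swiftclient. The container name, object prefix, and destination directory are illustrative, not the real pfcon code.

```python
import os
from swiftclient.client import Connection  # python-swiftclient

def pull_inputs(conn: Connection, container: str, prefix: str, dest_dir: str) -> None:
    """Copy a job's input files from CUBE's Swift storage to local disk
    on the compute side before the job container starts (sketch)."""
    _, objects = conn.get_container(container, prefix=prefix, full_listing=True)
    for obj in objects:
        name = obj["name"]
        local_path = os.path.join(dest_dir, os.path.relpath(name, prefix))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        # stream the object body in chunks instead of buffering it whole
        _, body = conn.get_object(container, name, resp_chunk_size=65536)
        with open(local_path, "wb") as f:
            for chunk in body:
                f.write(chunk)
```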
Deployment Targets
Our cyber-infrastructure is described here: https://github.com/FNNDSC/CHRIS_docs/wiki/Available-Infrastructure-for-the-ChRIS-Project
Our primary target is to use ChRIS internally at BCH, where NFS is the only option (true for rc-nfs and mghpcc-bch). Hence, the "no-transfer" pfcon is a low-effort, high-reward feature. "Pull" pfcon requires a lot more effort, and is more suitable for a highly-available cloud deployment.
Optimizing File Transmission to Pfcon
https://github.com/FNNDSC/ChRIS_ultron_backEnd/blob/59d3784be659d41c0c903edd31148cb57c293e82/chris_backend/plugininstances/services/manager.py#L482-L509
tl;dr: the current implementation is inefficient because it is sequential and buffers everything into memory. A more efficient solution would use streams.
Some criticisms of the implementation of file transmission to pfcon were previously described here: #27
Currently, it works roughly like this:
1. CUBE reads every input file from storage into memory.
2. CUBE creates a zip archive of the files, also in memory.
3. CUBE sends the zip to pfcon in a single HTTP request.
The memory requirement of this program is about 150% of the size of the data. For instance, if your input data is 10GB, then you'll need 15GB of memory (20GB to be safe) to send the data to pfcon.
The program can be improved:
1. Stream file data from storage instead of reading whole files into memory.
2. Zip the data on-the-fly.
3. Stream the zip to pfcon as it is produced.
Moreover, steps 2 and 3 can be concurrent.
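To make that concrete, here is a minimal Python sketch of the streaming idea: the zip archive is produced and uploaded at the same time through a pipe, so memory use is bounded by the pipe buffer rather than by the size of the data. The function name and endpoint are hypothetical, the inputs are assumed to be local file paths, and the real pfcon request format (a multipart form upload) is glossed over.

```python
import os
import shutil
import threading
import zipfile

import requests

def stream_zip_to_pfcon(file_paths: list[str], url: str) -> requests.Response:
    """Zip input files on-the-fly and stream the archive to pfcon (sketch)."""
    read_fd, write_fd = os.pipe()
    reader = os.fdopen(read_fd, "rb")
    writer = os.fdopen(write_fd, "wb")

    def produce() -> None:
        # step 2: zip each file as it is read, writing into the pipe
        with zipfile.ZipFile(writer, "w") as zf:
            for path in file_paths:
                # force_zip64 because member sizes are not known up front
                with open(path, "rb") as src, \
                        zf.open(os.path.basename(path), "w", force_zip64=True) as dst:
                    shutil.copyfileobj(src, dst)  # chunked copy, no full read()
        writer.close()

    def body():
        # step 3: hand fixed-size chunks to requests as the zip is produced
        while chunk := reader.read(64 * 1024):
            yield chunk

    producer = threading.Thread(target=produce)
    producer.start()
    # the upload runs concurrently with zipping (chunked transfer-encoding)
    response = requests.post(url, data=body())
    producer.join()
    reader.close()
    return response
```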
This is pretty much how chrs uploads/downloads files:
https://github.com/FNNDSC/chrs/blob/841e3957232d199ba8662c2915d12744217fd3d1/chrs/src/upload.rs#L85-L91
https://github.com/FNNDSC/chrs/blob/841e3957232d199ba8662c2915d12744217fd3d1/chrs/src/files/download.rs#L140-L142
The team agrees that the efficiency of file transmission to pfcon should be improved. @jennydaman says that, since it needs an overhaul, we should discuss more and take it further with the "pull-into-cache" pfcon idea: #27