CUBE <--> pfcon file transfer #43
jennydaman
started this conversation in General
During today's roundtable, @jbernal0019 presented CUBE's plugininstances/services/manager.py code.
We have several ideas for how the program can be optimized.
"In-Network" pfcon
First, we discuss the case where pfcon is able to access CUBE's file storage directly.
"No-transfer" pfcon
FNNDSC/pfcon#136
The simplest and most efficient solution is for pfcon to do nothing. pman would create job containers which mount CUBE's file storage directly.
"No-transfer" is a misnomer, but the oversimplification makes the concept easier to understand. It is possible for CUBE's file storage to be network-based, e.g. Swift or NFS. In these cases, file data are being transferred from storage to compute node transparently. This is desirable because the underlying system can do transfer more efficiently and more easily than if we were to implement transfer ourselves. The container engine (Docker/Podman/Singularity/Kubernetes) used by pman would need a plugin to be able to mount Swift or NFS as volumes to a container.
@jbernal0019 raised a concern about eventual consistency. Let's enumerate the possible situations:
"Pull" pfcon
@jbernal0019's proposed architecture is for pfcon to have direct access to CUBE's file storage: instead of CUBE pushing files to pfcon, pfcon pulls the input files from storage itself.
*Assumption: Swift is eventually consistent; the volume is consistent.
It is less performant and more complicated to implement than "no-transfer" pfcon. However, it solves the problem of eventual consistency.
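By contrast with "no-transfer", here is a rough sketch of what "pull" could look like if CUBE's storage is Swift, using python-swiftclient. The container name, object prefix, and destination directory are illustrative, not the real pfcon code.

```python
import os
from swiftclient.client import Connection  # python-swiftclient

def pull_inputs(conn: Connection, container: str, prefix: str, dest_dir: str) -> None:
    """Copy a job's input files from CUBE's Swift storage to local disk
    on the compute side before the job container starts (sketch)."""
    _, objects = conn.get_container(container, prefix=prefix, full_listing=True)
    for obj in objects:
        name = obj["name"]
        local_path = os.path.join(dest_dir, os.path.relpath(name, prefix))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        # stream the object body in chunks instead of buffering it whole
        _, body = conn.get_object(container, name, resp_chunk_size=65536)
        with open(local_path, "wb") as f:
            for chunk in body:
                f.write(chunk)
```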
Deployment Targets
Our cyber-infrastructure is described here: https://github.com/FNNDSC/CHRIS_docs/wiki/Available-Infrastructure-for-the-ChRIS-Project
Our primary target is to use ChRIS internally at BCH, where NFS is the only option (true for rc-nfs and mghpcc-bch). Hence, the "no-transfer" pfcon is a low-effort, high-reward feature. "Pull" pfcon requires a lot more effort, and is more suitable for a highly-available cloud deployment.
Optimizing File Transmission to Pfcon
https://github.com/FNNDSC/ChRIS_ultron_backEnd/blob/59d3784be659d41c0c903edd31148cb57c293e82/chris_backend/plugininstances/services/manager.py#L482-L509
tl;dr: the current implementation is inefficient because it is sequential and buffers everything into memory. A more efficient solution would use streams.
Some criticisms of the implementation of file transmission to pfcon were previously described here: #27
Currently, it works roughly like this:
1. CUBE reads every input file from storage into memory.
2. CUBE creates a zip archive of the files, also in memory.
3. CUBE sends the zip to pfcon in a single HTTP request.
The memory requirement of this program is about 150% of the size of the data. For instance, if your input data is 10GB, then you'll need 15GB of memory (20GB to be safe) to send the data to pfcon.
The program can be improved:
1. Stream file data from storage instead of reading whole files into memory.
2. Zip the data on-the-fly.
3. Stream the zip to pfcon as it is produced.
Moreover, steps 2 and 3 can be concurrent.
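To make that concrete, here is a minimal Python sketch of the streaming idea: the zip archive is produced and uploaded at the same time through a pipe, so memory use is bounded by the pipe buffer rather than by the size of the data. The function name and endpoint are hypothetical, the inputs are assumed to be local file paths, and the real pfcon request format (a multipart form upload) is glossed over.

```python
import os
import shutil
import threading
import zipfile

import requests

def stream_zip_to_pfcon(file_paths: list[str], url: str) -> requests.Response:
    """Zip input files on-the-fly and stream the archive to pfcon (sketch)."""
    read_fd, write_fd = os.pipe()
    reader = os.fdopen(read_fd, "rb")
    writer = os.fdopen(write_fd, "wb")

    def produce() -> None:
        # step 2: zip each file as it is read, writing into the pipe
        with zipfile.ZipFile(writer, "w") as zf:
            for path in file_paths:
                # force_zip64 because member sizes are not known up front
                with open(path, "rb") as src, \
                        zf.open(os.path.basename(path), "w", force_zip64=True) as dst:
                    shutil.copyfileobj(src, dst)  # chunked copy, no full read()
        writer.close()

    def body():
        # step 3: hand fixed-size chunks to requests as the zip is produced
        while chunk := reader.read(64 * 1024):
            yield chunk

    producer = threading.Thread(target=produce)
    producer.start()
    # the upload runs concurrently with zipping (chunked transfer-encoding)
    response = requests.post(url, data=body())
    producer.join()
    reader.close()
    return response
```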
This is pretty much how chrs uploads/downloads files:
https://github.com/FNNDSC/chrs/blob/841e3957232d199ba8662c2915d12744217fd3d1/chrs/src/upload.rs#L85-L91
https://github.com/FNNDSC/chrs/blob/841e3957232d199ba8662c2915d12744217fd3d1/chrs/src/files/download.rs#L140-L142
The team agrees that the efficiency of file transmission to pfcon should be improved. @jennydaman says that, since it needs an overhaul, we should discuss more and take it further with the "pull-into-cache" pfcon idea: #27