Convenient way to download all data connected to a flow #205

JaGeo · 2024-11-05T09:19:50Z

Hi all,

I was wondering if there is currently a convenient way to download all data connected to a flow. While I can retrieve data for a certain job, I haven't found such an option for a whole flow. I would assume that this might be something other users would be interested in as well.

gpetretto · 2024-11-06T10:32:24Z

Hi @JaGeo,

when you mention the data connected to a Flow, are you referring to the Flow structure (e.g. the list of all the Jobs' information plus the connections between the Jobs), to the Job outputs, or to both?
Which of the functionality currently available for Jobs would you like to have for a whole Flow?

JaGeo · 2024-11-06T11:25:10Z

@gpetretto I would like to be able to download all raw data, but ideally such a download would include the information about the flow and its jobs as well. I did not think about this last part at first, but it would make reconstructing the data much easier.

It would be nice to have some kind of archive option: download all raw data into one folder, fetch all outputs from the database, and add all job connections. Or do you have a better solution in mind for moving data to long-term storage?

gpetretto · 2024-11-07T13:34:35Z

I see, sorry, I had mistakenly assumed that you were referring to the content of the DB and not the raw data. So basically an equivalent of jf job get but for flows.
An example could be a command like jf flow files get that, for a specific flow (or maybe many flows, based on results selected with a query), downloads the data from the worker and puts it in an organized folder structure. For example: the uuid of the flow as the main folder; then it could also use the uuids of the hosts to group jobs according to their subflows; then each job in its own folder named "{job_name}{uuid}{index}". And maybe a dump of the connections in the main flow folder?
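
To make the proposed layout concrete, here is a minimal sketch in plain Python. It is not jobflow-remote code: the function name `build_flow_archive`, the dictionary keys (`name`, `uuid`, `index`, `hosts`), the `connections.json` file name, and the underscore separators in the job folder names are all assumptions made for illustration. It only creates the folder skeleton and dumps the connections; fetching the files from the worker is left out.

```python
import json
from pathlib import Path


def build_flow_archive(root, flow_uuid, jobs, connections):
    """Create the folder skeleton for one flow and return the job folders.

    Hypothetical inputs (not a jobflow-remote API):
      jobs: list of dicts with keys "name", "uuid", "index" and "hosts",
            where "hosts" lists the uuids of the enclosing subflows,
            outermost first, excluding the main flow.
      connections: dict mapping a job uuid to the uuids of its parents.
    """
    # Main folder named after the flow uuid.
    flow_dir = Path(root) / flow_uuid
    flow_dir.mkdir(parents=True, exist_ok=True)

    job_dirs = {}
    for job in jobs:
        # Nest the job under one folder per subflow (host) uuid.
        # Underscore separators are an assumption for readability.
        folder_name = f"{job['name']}_{job['uuid']}_{job['index']}"
        job_dir = flow_dir.joinpath(*job["hosts"]) / folder_name
        job_dir.mkdir(parents=True, exist_ok=True)
        job_dirs[(job["uuid"], job["index"])] = job_dir

    # Dump the job connections in the main flow folder.
    (flow_dir / "connections.json").write_text(json.dumps(connections, indent=2))
    return job_dirs
```

A job with an empty `hosts` list lands directly in the flow folder, while a job inside a subflow lands one level deeper, under the subflow's uuid. Keying the returned mapping by `(uuid, index)` keeps reruns of the same job apart.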

I would tend to keep the backup of the raw data separate from that of the output Store, because if the output documents are split and dumped into the corresponding job folders, it may be difficult to reconstruct the output Store afterwards, if needed. Since there are many kinds of Stores, I am inclined to think that it would be better to rely on dedicated tools for backing them up. What do you think?
