The froster index [folders...] command extracts metadata from the given folders using the pwalk crawler.
This command also has a couple of options:
--pwalk-copy PATH copies the pwalk-generated CSV to PATH.
--pwalk-csv FILE uses the given FILE as CSV input for the indexer instead of running pwalk again.
I have several questions regarding this command:
What should be the standard workflow for a user? Something like this?
Use froster index [folders...] to index folders and spot hotspots.
Use visidata, or inspect the generated CSV file directly, to spot folders worth archiving.
Use froster archive FOLDER to archive FOLDER in the AWS S3 bucket.
Use froster delete or froster delete [FOLDERS...] to delete archived files/folders.
Why is the hotspots output copied again to a folder the user has write access to?
Why does it matter?
How does the user know which hotspot file to check?
Can we avoid copying the file again and have a command to retrieve this info? Like: froster index --archivable-hotspots [FOLDERS...]?
--pwalk-copy FOLDER
Why would a user use this file?
Is this file used elsewhere? As an input for --pwalk-csv FILE maybe?
--pwalk-csv FILE
Why use it? If the FILE is already generated, the hotspot file is already generated.
It appears to me that using --pwalk-csv FILE works fine with a single user but it struggles when used in a shared configuration, as the user's colleagues may not know if a folder has been indexed or where the output is located.
As a future feature: I assume that in a large HPC system, storing the raw output of the pwalk command is not feasible due to size constraints. If so, would it be possible to store the folder's metadata currently used to locate hotspots in the data_dir or shared_data_dir? This way when a user runs froster index [folders...] over a folder that has already been indexed, the command can automatically retrieve the stored information instead of running pwalk again. There could be an option to force the pwalk execution to refresh the folder's metadata.
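To make the caching idea concrete, here is a minimal sketch of what I have in mind. It is not froster's actual implementation: `index_folder`, `cache_file`, the `cache_dir` argument (standing in for data_dir or shared_data_dir), the `force` option, and the `crawl` callable (standing in for a pwalk run that returns a small summary) are all hypothetical names I made up for illustration.

```python
import hashlib
import json
import os
from pathlib import Path
from typing import Callable


def cache_file(cache_dir: Path, folder: str) -> Path:
    """Map a folder path to a cache file name (hypothetical layout)."""
    # Hash the path so any folder maps to a safe, fixed-length file name.
    digest = hashlib.sha256(folder.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.json"


def index_folder(
    folder: str,
    cache_dir: Path,
    crawl: Callable[[str], dict],
    force: bool = False,
) -> dict:
    """Return the folder's metadata summary, reusing a stored copy if present.

    `crawl` is a stand-in for running pwalk and reducing its output to a
    small summary; `force=True` re-runs it and refreshes the stored copy.
    """
    # Normalize so "dir" and "./dir" share one cache entry.
    folder = os.path.abspath(folder)
    path = cache_file(cache_dir, folder)

    if path.exists() and not force:
        # Already indexed: read the stored summary instead of crawling again.
        return json.loads(path.read_text())

    summary = crawl(folder)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(summary))
    return summary
```

With a shared `cache_dir`, a colleague indexing the same folder would get the stored summary back without having to know who indexed it or where the output was written.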