PySpark code for anonymizing the Spider 2 snapshot files. It was used to obfuscate the snapshot data analyzed in the following study:
- Seung-Hwan Lim, Hyogi Sim, Raghul Gunasekaran, and Sudharshan S. Vazhkudai, "Scientific User Behavior and Data-Sharing Trends in a Petascale File System," in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), New York, NY, USA, 2017.
This software was run on the Andes cluster at OLCF using the Magpie framework.
The anonymized snapshot will be made publicly available.
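
This README does not describe the anonymization scheme itself. As an illustration only, the sketch below shows one common way to obfuscate file paths: replacing each path component with a keyed hash (HMAC-SHA256), so that the same name always maps to the same pseudonym and the directory depth is preserved. The function names, the key, and the example path are all hypothetical and are not taken from this repository. In a PySpark job, a function like this could plausibly be applied per record, for example by wrapping it as a UDF.

```python
import hashlib
import hmac

# Hypothetical secret key for illustration; a real run would keep it private.
SECRET_KEY = b"example-key"


def anonymize_token(token: str, key: bytes = SECRET_KEY, length: int = 12) -> str:
    """Map a token to a stable pseudonym using HMAC-SHA256 (truncated hex)."""
    digest = hmac.new(key, token.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def anonymize_path(path: str, key: bytes = SECRET_KEY) -> str:
    """Anonymize every non-empty path component, keeping the '/' structure."""
    parts = path.split("/")
    return "/".join(anonymize_token(p, key) if p else p for p in parts)


if __name__ == "__main__":
    # Hypothetical path, not from the actual snapshot data.
    print(anonymize_path("/proj/user1/run.dat"))
```

Because the mapping is deterministic for a given key, files under the same directory still share an anonymized prefix, which keeps hierarchy-based analysis possible without exposing the original names.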