Temp pool resource files written to disk on each trial #24

Open
gdesor opened this issue Jun 21, 2024 · 5 comments

gdesor commented Jun 21, 2024

Each time a chip is detected, the resource files (images, sounds, videos...) are written to disk in a temporary folder, even if the experiment is the same. (FilePoolStore does not receive a 'folder' argument.)

The proposed solution is to modify the _get_osexp function of the OpenMonkeyMind class, and to proceed in the same way as for the 'osexp' file, by passing the 'pool_folder' argument to Experiment.

    def _get_osexp(self, json):

        t0 = time.time()
        for f in json['files']:
            if not f['type'] == 'experiment':
                continue
            path = f['path']
            updated_at = f['updated_at']
            size = f['size']
            break
        else:
            raise InvalidJSON(safe_decode(json))

        tmpname = os.path.join(
            tempfile.gettempdir(),
            hashlib.md5(safe_encode(path + updated_at)).hexdigest()
        )

        cache_path = tmpname + '.osexp'

        # If a cached file that matches in name and size exists, we re-use it.
        # The file name also includes the updated_at field, and thus
        # re-uploading a new experiment with the same size will still refresh
        # the cache.
        if os.path.exists(cache_path) and os.path.getsize(cache_path) == size:
            oslogger.info('using cached {}'.format(cache_path))
        else:
            response = requests.get(self._osexp_url + path)
            if not response.ok:
                raise FailedToDownloadExperiment()
            with open(cache_path, 'wb') as fd:
                fd.write(response.content)
            oslogger.info('caching {} to {}'.format(path, cache_path))
        self._experiment = experiment(pool_folder=tmpname, string=cache_path)
        oslogger.info(
            'building experiment ({:.3f} s)'.format(time.time() - t0)
        )
        return self._experiment

smathot commented Jun 24, 2024

Thanks for this! I understand the idea, but fixing the pool folder will not really change anything, because the osexp file will still be extracted each time. The only thing that would change is that this happens in the same folder over and over again.

Just to clarify this for me: What is the issue that you're trying to address?

Is it that OMM creates many temporary pool folders that are never cleaned up? If so, then this would indeed solve it, but it's no longer necessary because it's already fixed in the main OpenSesame code.

Or do you want to speed up the process by avoiding unnecessary extraction of the file pool each time a trial is started? If so, then this won't change that, unfortunately.


gdesor commented Jun 25, 2024

Hello, you are right, thank you for the clarification. Since I am working on a Raspberry Pi, where read/write speed is slow and disk space is limited, I am trying to avoid writing the same files multiple times, for two reasons: 1) to not consume too much disk space, and 2) to improve execution speed.

I haven't tested everything, but this seems to address both issues:

    def _get_osexp(self, json):

        t0 = time.time()
        for f in json['files']:
            if not f['type'] == 'experiment':
                continue
            path = f['path']
            updated_at = f['updated_at']
            size = f['size']
            break
        else:
            raise InvalidJSON(safe_decode(json))

        tmpname = os.path.join(
            tempfile.gettempdir(),
            hashlib.md5(safe_encode(path + updated_at)).hexdigest()
        )

        cache_path = tmpname + '.osexp'

        # If a cached file that matches in name and size exists, we re-use it.
        # The file name also includes the updated_at field, and thus
        # re-uploading a new experiment with the same size will still refresh
        # the cache.
        if os.path.exists(cache_path) and os.path.getsize(cache_path) == size:
            oslogger.info('using cached {}'.format(cache_path))
            # If the experiment has already been decompressed, 'cache_path'
            # is replaced with the name of the experiment script file.
            cache_path = tmpname + '/script.opensesame'
        else:
            response = requests.get(self._osexp_url + path)
            if not response.ok:
                raise FailedToDownloadExperiment()
            with open(cache_path, 'wb') as fd:
                fd.write(response.content)
            oslogger.info('caching {} to {}'.format(path, cache_path))

        self._experiment = experiment(pool_folder=tmpname, string=cache_path)
        oslogger.info(
            'building experiment ({:.3f} s)'.format(time.time() - t0)
        )
        return self._experiment

But for this to work, you obviously need to comment out the last line (#os.remove(script_path)) of the _read_tarfile function in the 'osexpfile/_osexpreader.py' file.


smathot commented Jun 26, 2024

Ok, I see. I'm willing to accept a patch along these lines, but not implemented like this, because it would require the main OpenSesame code to be hacked in order to make this possible. Here's how I would approach this:

  • Check if a cache folder exists for an experiment file. (So rather than having a cached file for an experiment, it's a folder.)
  • If the cache folder doesn't exist, use OSExpReader (you can instantiate it by passing None for the exp argument) directly to extract the experiment file into a new cache folder.
  • Instantiate the experiment using the pool inside the cache folder for the pool folder and the script inside the cache folder for the string argument.

Does that make sense? If you want to implement this, please submit it as a pull request and also include tests to make sure that this works properly. There's a lot of room for breakage with things like this!
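
As a rough, untested sketch of these steps (the function name _get_cached_experiment, the OSExpReader constructor arguments, the way the extraction folder is chosen, and the cache layout are all assumptions that would need to be checked against 'osexpfile/_osexpreader.py' and the existing OMM code):

    import os
    import tempfile
    import hashlib

    from libopensesame.experiment import experiment
    # OSExpReader and safe_encode are assumed to come from the same imports
    # that the existing OMM module already uses; the exact import path for
    # OSExpReader should be taken from osexpfile/_osexpreader.py.


    def _get_cached_experiment(self, osexp_path, path, updated_at):

        # One cache *folder* per (path, updated_at) pair, so re-uploading an
        # experiment automatically invalidates the cache.
        cache_folder = os.path.join(
            tempfile.gettempdir(),
            hashlib.md5(safe_encode(path + updated_at)).hexdigest()
        )
        script_cache = os.path.join(cache_folder, 'script.opensesame')
        if not os.path.isdir(cache_folder):
            os.makedirs(cache_folder)
            # Assumed constructor: per the suggestion above, None is passed for
            # the exp argument; how OSExpReader is told to extract into
            # cache_folder still needs to be checked against the real API.
            reader = OSExpReader(osexp_path, exp=None)
            # The script is preserved as a property on the reader; keep a copy
            # so that later runs don't need to re-extract the osexp file.
            with open(script_cache, 'w') as fd:
                fd.write(reader.script)
        with open(script_cache) as fd:
            script = fd.read()
        # Use the pool inside the cache folder as the pool folder and the
        # script string as the string argument. Whether the pool files end up
        # at the top level or in a subfolder depends on how extraction is done.
        return experiment(pool_folder=cache_folder, string=script)

The key difference from the snippets above is that the cache check is on the folder (and the script copy kept inside it) rather than on the .osexp file, so the main OpenSesame code does not need to be changed.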


gdesor commented Jun 26, 2024

From what I understand, if I use 'OSExpReader' as is, I will never have the script in the cache folder. This is because the '_read_tarfile' method deletes the script right after decompressing it.

So, for me, the only way to avoid modifying the main OpenSesame code would be to rewrite the 'OSExpReader' class for OMM (as one that does not delete the script), and use it to decompress the osexp file.

Do you agree with that, or am I missing something?

I understand that modifying the main code is generally to be avoided, but what is the point of deleting the script from the cache? Leaving it there changes nothing, and it will be removed during the 'pool' cleanup by OpenSesame anyway. I think it makes sense to keep the script together with the resource files.


smathot commented Jun 27, 2024

From what I understand, if I use 'OSExpReader' as is, I will never have the script in the cache folder. This is because the '_read_tarfile' method deletes the script right after decompressing it.

Ah yes, that's true. The script itself (as a string) is still preserved in the script property of the OSExpReader object, and you can pass that as the string argument to the Experiment object.
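
So, as a sketch (cache_osexp_path and cache_folder are placeholder names, and the OSExpReader constructor call is an assumption to verify against the real code):

    # Assumed constructor; per the earlier comment, None can be passed for the
    # exp argument.
    reader = OSExpReader(cache_osexp_path, exp=None)
    # The script string preserved on the reader is passed as the string
    # argument, and the cache folder with the extracted pool as pool_folder.
    self._experiment = experiment(pool_folder=cache_folder, string=reader.script)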
