Temp pool resource files written to disk on each trial #24

Open
gdesor opened this issue Jun 21, 2024 · 5 comments

gdesor commented Jun 21, 2024

Each time a chip is detected, the resource files (images, sounds, videos...) are written to disk in a temporary folder, even if the experiment is the same. (FilePoolStore does not receive a 'folder' argument.)

The proposed solution is to modify the _get_osexp function of the OpenMonkeyMind class, and to proceed in the same way as for the 'osexp' file, by passing the 'pool_folder' argument to Experiment.

    def _get_osexp(self, json):

        t0 = time.time()
        for f in json['files']:
            if not f['type'] == 'experiment':
                continue
            path = f['path']
            updated_at = f['updated_at']
            size = f['size']
            break
        else:
            raise InvalidJSON(safe_decode(json))

        tmpname = os.path.join(
            tempfile.gettempdir(),
            hashlib.md5(safe_encode(path + updated_at)).hexdigest()
        )

        cache_path = tmpname + '.osexp'

        # If a cached file that matches in name and size exists, we re-use it.
        # The file name also includes the updated_at field, and thus
        # re-uploading a new experiment with the same size will still refresh
        # the cache.
        if os.path.exists(cache_path) and os.path.getsize(cache_path) == size:
            oslogger.info('using cached {}'.format(cache_path))
        else:
            response = requests.get(self._osexp_url + path)
            if not response.ok:
                raise FailedToDownloadExperiment()
            with open(cache_path, 'wb') as fd:
                fd.write(response.content)
            oslogger.info('caching {} to {}'.format(path, cache_path))
        self._experiment = experiment(pool_folder=tmpname, string=cache_path)
        oslogger.info(
            'building experiment ({:.3f} s)'.format(time.time() - t0)
        )
        return self._experiment

smathot commented Jun 24, 2024

Thanks for this! I understand the idea, but fixing the pool folder will not really change anything, because the osexp file will still be extracted each time. The only thing that would change is that this happens in the same folder over and over again.

Just to clarify this for me: What is the issue that you're trying to address?

Is it that OMM creates many temporary pool folders that are never cleaned up? If so, then this would indeed solve it, but it's no longer necessary because it's already fixed in the main OpenSesame code.

Or do you want to speed up the process by avoiding unnecessary extraction of the file pool each time a trial is started? If so, then this won't change that, unfortunately.


gdesor commented Jun 25, 2024

Hello, you are right, thank you for the clarification. Since I am working on a Raspberry Pi, where read/write speed is slow and disk space is limited, I am trying to avoid writing the same files multiple times, for two reasons: 1) to not consume too much disk space, and 2) to improve execution speed.

I haven't tested everything, but this seems to address both issues:

    def _get_osexp(self, json):

        t0 = time.time()
        for f in json['files']:
            if not f['type'] == 'experiment':
                continue
            path = f['path']
            updated_at = f['updated_at']
            size = f['size']
            break
        else:
            raise InvalidJSON(safe_decode(json))

        tmpname = os.path.join(
            tempfile.gettempdir(),
            hashlib.md5(safe_encode(path + updated_at)).hexdigest()
        )

        cache_path = tmpname + '.osexp'

        # If a cached file that matches in name and size exists, we re-use it.
        # The file name also includes the updated_at field, and thus
        # re-uploading a new experiment with the same size will still refresh
        # the cache.
        if os.path.exists(cache_path) and os.path.getsize(cache_path) == size:
            oslogger.info('using cached {}'.format(cache_path))
            # If the experiment has already been decompressed, 'cache_path'
            # is replaced with the name of the experiment script file.
            cache_path = tmpname + '/script.opensesame'
        else:
            response = requests.get(self._osexp_url + path)
            if not response.ok:
                raise FailedToDownloadExperiment()
            with open(cache_path, 'wb') as fd:
                fd.write(response.content)
            oslogger.info('caching {} to {}'.format(path, cache_path))

        self._experiment = experiment(pool_folder=tmpname, string=cache_path)
        oslogger.info(
            'building experiment ({:.3f} s)'.format(time.time() - t0)
        )
        return self._experiment

But for this to work, you obviously need to comment out the last line (#os.remove(script_path)) of the _read_tarfile function in the 'osexpfile/_osexpreader.py' file.


smathot commented Jun 26, 2024

Ok, I see. I'm willing to accept a patch along these lines, but not implemented like this, because it would require the main OpenSesame code to be hacked in order to make this possible. Here's how I would approach this:

  • Check if a cache folder exists for an experiment file. (So rather than having a cached file for an experiment, it's a folder.)
  • If the cache folder doesn't exist, use OSExpReader (you can instantiate it by passing None for the exp argument) directly to extract the experiment file into a new cache folder.
  • Instantiate the experiment using the pool inside the cache folder for the pool folder and the script inside the cache folder for the string argument.

Does that make sense? If you want to implement this, please submit it as a pull request and also include tests to make sure that this works properly. There's a lot of room for breakage with things like this!
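
As a rough, untested sketch of these steps (the function name _get_cached_experiment, the OSExpReader constructor arguments, the way the extraction folder is chosen, and the cache layout are all assumptions that would need to be checked against 'osexpfile/_osexpreader.py' and the existing OMM code):

    import os
    import tempfile
    import hashlib

    from libopensesame.experiment import experiment
    # OSExpReader and safe_encode are assumed to come from the same imports
    # that the existing OMM module already uses; the exact import path for
    # OSExpReader should be taken from osexpfile/_osexpreader.py.


    def _get_cached_experiment(self, osexp_path, path, updated_at):

        # One cache *folder* per (path, updated_at) pair, so re-uploading an
        # experiment automatically invalidates the cache.
        cache_folder = os.path.join(
            tempfile.gettempdir(),
            hashlib.md5(safe_encode(path + updated_at)).hexdigest()
        )
        script_cache = os.path.join(cache_folder, 'script.opensesame')
        if not os.path.isdir(cache_folder):
            os.makedirs(cache_folder)
            # Assumed constructor: per the suggestion above, None is passed for
            # the exp argument; how OSExpReader is told to extract into
            # cache_folder still needs to be checked against the real API.
            reader = OSExpReader(osexp_path, exp=None)
            # The script is preserved as a property on the reader; keep a copy
            # so that later runs don't need to re-extract the osexp file.
            with open(script_cache, 'w') as fd:
                fd.write(reader.script)
        with open(script_cache) as fd:
            script = fd.read()
        # Use the pool inside the cache folder as the pool folder and the
        # script string as the string argument. Whether the pool files end up
        # at the top level or in a subfolder depends on how extraction is done.
        return experiment(pool_folder=cache_folder, string=script)

The key difference from the snippets above is that the cache check is on the folder (and the script copy kept inside it) rather than on the .osexp file, so the main OpenSesame code does not need to be changed.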


gdesor commented Jun 26, 2024

From what I understand, if I use 'OSExpReader' as is, I will never have the script in the cache folder. This is because the '_read_tarfile' method deletes the script right after decompressing it.

So, for me, the only way to avoid modifying the main OpenSesame code would be to rewrite the 'OSExpReader' class for OMM (as one that does not delete the script), and use it to decompress the osexp file.

Do you agree with that, or am I missing something?

I understand that modifying the main code is generally to be avoided, but what is the point of deleting the script from the cache? Leaving it there changes nothing, and it will be removed during the 'pool' cleanup by OpenSesame anyway. I think it makes sense to keep the script together with the resource files.


smathot commented Jun 27, 2024

From what I understand, if I use 'OSExpReader' as is, I will never have the script in the cache folder. This is because the '_read_tarfile' method deletes the script right after decompressing it.

Ah yes, that's true. The script itself (as a string) is still preserved in the script property of the OSExpReader object, and you can pass that as the string argument to the Experiment object.
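
So, as a sketch (cache_osexp_path and cache_folder are placeholder names, and the OSExpReader constructor call is an assumption to verify against the real code):

    # Assumed constructor; per the earlier comment, None can be passed for the
    # exp argument.
    reader = OSExpReader(cache_osexp_path, exp=None)
    # The script string preserved on the reader is passed as the string
    # argument, and the cache folder with the extracted pool as pool_folder.
    self._experiment = experiment(pool_folder=cache_folder, string=reader.script)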
