OS: Ubuntu 20.04
Python: 3.10
DVC-data: 3.7.0 (the bug is still present on the main branch)
I am using a `DVCFileSystem` object to get files from a remote repository. To make this efficient, I added a local cache so that the same md5 is never downloaded twice, and that part works. However, after the file is downloaded into the cache, it is copied into the destination directory instead of being symlinked as the cache configuration specifies.
While debugging, I traced the cause to the `fs.py` module, specifically the `get_files` method of `DataFileSystem`. When an md5 is absent from the cache, it is downloaded with `_cache_remote_file`, but the subsequent `_transfer` call then copies it, because it receives the remote's `storage` options instead of the `cache_storage` options it should use.
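To make the mechanism concrete, here is a minimal, self-contained model of the two-step flow described above. It is a sketch, not the dvc-data code: `StorageOptions`, `transfer` and `get_file` are hypothetical stand-ins for the ODB options, `_transfer` and the method in `fs.py`, and only the `symlink` and `copy` link types are modeled. The point it illustrates is that the link type is decided by whichever options object reaches the transfer step.

```python
import os
import shutil
import tempfile
from dataclasses import dataclass, field


@dataclass
class StorageOptions:
    """Hypothetical stand-in for an ODB's storage options (not the dvc-data class)."""
    path: str
    link_types: list = field(default_factory=lambda: ["copy"])


def transfer(src: str, dst: str, options: StorageOptions) -> str:
    """Materialize src at dst with the first supported link type; return the type used."""
    for link_type in options.link_types:
        if link_type == "symlink":
            os.symlink(src, dst)
            return "symlink"
        if link_type == "copy":
            shutil.copyfile(src, dst)
            return "copy"
    raise ValueError(f"no supported link type in {options.link_types}")


def get_file(md5, content, dest, remote, cache, use_cache_options):
    """Hypothetical model of the reported flow: cache the object, then transfer it."""
    # Step 1 (analogous to _cache_remote_file): ensure the object is in the local cache.
    cached = os.path.join(cache.path, md5)
    if not os.path.exists(cached):
        with open(cached, "w") as f:
            f.write(content)
    # Step 2 (analogous to _transfer): the reported bug corresponds to
    # use_cache_options=False, i.e. passing the remote's options here.
    options = cache if use_cache_options else remote
    return transfer(cached, dest, options)
```

With a cache configured as `["symlink", "copy"]` and a remote that only supports `copy`, the buggy path returns `"copy"` while passing the cache's options returns `"symlink"`, which matches the behaviour reported here.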
Steps to replicate:
1. Create a `DVCFileSystem`, passing a remote configuration with `remote_config` and a cache configuration with `config`.
2. The cache configuration must use `symlink` (or any other link type).
3. Use the `get` method to pull a `file` from remote storage to a `location`.
4. Inspect the file created at `location`: it is a copy.
5. Inspect the cache: the file is present there as well.
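Steps 4 and 5 can be checked programmatically. The helper below is a hypothetical sketch using only the standard library; it assumes you know the destination path and the local cache directory, and it reports whether the destination is a symlink and whether it resolves to a path inside the cache.

```python
import os


def describe_checkout(dest: str, cache_dir: str) -> dict:
    """Hypothetical helper: report how `dest` was materialized relative to a cache dir."""
    is_link = os.path.islink(dest)
    # Resolve symlinks on both sides so e.g. /tmp vs /private/tmp compare correctly.
    target = os.path.realpath(dest)
    cache_root = os.path.realpath(cache_dir)
    in_cache = os.path.commonpath([target, cache_root]) == cache_root
    return {"is_symlink": is_link, "points_into_cache": is_link and in_cache}
```

With the bug present, `describe_checkout(location, cache_dir)` would be expected to report `is_symlink` as `False` for a cache configured to symlink.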