[Agent][Feat] Add remote file caching #139
Closed
Bobholamovic wants to merge 42 commits into PaddlePaddle:develop from Bobholamovic:agent/dev/file_cache
6f8adea to 405edee
Bobholamovic commented Dec 13, 2023
```diff
@@ -178,7 +184,7 @@ def _get_default_headers(self) -> Dict[str, str]:
 
     def _build_file_obj_from_dict(self, dict_: Dict[str, Any]) -> RemoteFile:
         metadata: Dict[str, Any]
-        if "meta" in dict_:
+        if dict_.get("meta"):
```
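Consider the case where `meta` is an empty string: `"meta" in dict_` is true even then, whereas `dict_.get("meta")` treats an empty value as absent.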
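To illustrate the difference between the old and new checks, the snippet compares them on a few dicts. The dict contents (the `id` field and the `meta` values) and the helper names `has_meta_old` / `has_meta_new` are hypothetical; the real server payload is not shown in this PR.

```python
from typing import Any, Dict


def has_meta_old(dict_: Dict[str, Any]) -> bool:
    # Old check: true whenever the key exists, even if the value is "" or None.
    return "meta" in dict_


def has_meta_new(dict_: Dict[str, Any]) -> bool:
    # New check: true only when the value is truthy (e.g. a non-empty string).
    return bool(dict_.get("meta"))


samples = [
    {"id": "file-1", "meta": '{"a": 1}'},
    {"id": "file-2", "meta": ""},
    {"id": "file-3"},
]
for sample in samples:
    print(sample["id"], has_meta_old(sample), has_meta_new(sample))
```

Only `file-2` differs: the old check reports `True` for an empty `meta`, the new one `False`.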
… into agent/dev/file_cache
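This PR is outdated and has diverged too far from the main branch, so it is being closed. A new approach will be considered later, based on the latest main-branch code.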
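Add optional caching to `RemoteFile` to improve performance. Caching is enabled by default. The key points of the current naive approach are as follows:

- When constructing a `FileManager`, the `cache_remote_files` parameter can be specified to enable or disable remote file caching.
- Internally, `FileManager` uses a `FileCacheManager` object, which is responsible for assigning a `FileCache` object to each file ID.
- A `FileCache` object is bound to a local file and supports reading and writing that file, periodic refreshing, and so on.
- A `FileCache` object has three states: alive and active, alive but not active, and not alive / discarded. After initialization, a `FileCache` is in one of the first two states. The first two states can transition into each other, but the transition alive / not discarded -> not alive / discarded is irreversible.
- A `FileCache` object has an internal timer that invalidates the cache after a certain amount of time (active -> not active). Once invalidated, the cache is updated automatically on the next read; it can also be updated manually by calling a `FileCache` method.
- When `FileManager` creates a remote file (via `create_file_from_path`, `create_remote_file_from_path`, or `create_file_from_bytes`) with caching enabled, it creates a `RemoteFileWithCache` object. Specifically, a `RemoteFileWithCache` is a `RemoteFile` with cache support, obtained by binding a `FileCache` to a `RemoteFile`.
- When `RemoteFileWithCache.read_contents` is called, it first tries to get the data from the cache; if the cache is unavailable, it fetches the data from the remote side as before.
- When no object in memory uses a `FileCache` object any longer, that `FileCache` object is destroyed (alive / not discarded -> not alive / discarded), removed from the `FileCacheManager`, and any delegated cleanup function (such as deleting the cache file) is executed.

See the code comments for more details.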
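A minimal, synchronous sketch of the design described above, assuming a local-file cache with a fixed time-to-live and a caller-supplied download function. This is not the PR's code: the names `fetch_remote`, `expire_after`, and `_remove_quietly` are hypothetical; expiry is modeled by checking a monotonic timestamp on each read rather than by a background timer; and "destroy the `FileCache` once nothing uses it, remove it from the manager, run the cleanup function" is modeled with `weakref.WeakValueDictionary` plus `weakref.finalize`.

```python
import os
import threading
import time
import weakref
from typing import Callable, Optional


def _remove_quietly(path: str) -> None:
    """Hypothetical cleanup delegate: delete the cache file if it still exists."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


class FileCache:
    """Cache bound to one local file; contents expire after `expire_after` seconds."""

    def __init__(self, cache_path: str, expire_after: float) -> None:
        self.cache_path = cache_path
        self.expire_after = expire_after
        self._lock = threading.Lock()
        self._written_at: Optional[float] = None  # None until the first write

    def _is_active(self) -> bool:
        # Active = filled and younger than expire_after; otherwise not active.
        return (
            self._written_at is not None
            and time.monotonic() - self._written_at < self.expire_after
        )

    def write(self, data: bytes) -> None:
        with self._lock:
            with open(self.cache_path, "wb") as f:
                f.write(data)
            self._written_at = time.monotonic()  # not active -> active

    def read(self) -> Optional[bytes]:
        with self._lock:
            if not self._is_active():
                return None  # expired or never filled
            with open(self.cache_path, "rb") as f:
                return f.read()


class FileCacheManager:
    """Hands out one FileCache per file ID and forgets it once it is unused."""

    def __init__(self, cache_dir: str, expire_after: float) -> None:
        self._cache_dir = cache_dir
        self._expire_after = expire_after
        # Weak values: an entry disappears when no object holds the FileCache.
        self._caches: "weakref.WeakValueDictionary[str, FileCache]" = (
            weakref.WeakValueDictionary()
        )

    def get_cache(self, file_id: str) -> FileCache:
        cache = self._caches.get(file_id)
        if cache is None:
            path = os.path.join(self._cache_dir, file_id)
            cache = FileCache(path, self._expire_after)
            self._caches[file_id] = cache
            # Runs once the FileCache is garbage-collected (alive -> discarded).
            weakref.finalize(cache, _remove_quietly, path)
        return cache


class RemoteFileWithCache:
    """A remote file paired with a FileCache; `fetch_remote` stands in for the real download."""

    def __init__(
        self, file_id: str, fetch_remote: Callable[[str], bytes], cache: FileCache
    ) -> None:
        self.id = file_id
        self._fetch_remote = fetch_remote
        self._cache = cache

    def read_contents(self) -> bytes:
        data = self._cache.read()
        if data is None:
            # Cache unavailable: fetch from the remote side and refresh the cache.
            data = self._fetch_remote(self.id)
            self._cache.write(data)
        return data
```

With `expire_after=60.0`, two consecutive `read_contents()` calls reach the remote fetcher only once; with `expire_after=0.0`, every read falls back to the remote copy. Dropping the last reference to a `FileCache` removes it from the manager and deletes its file.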
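Some points that could be optimized further later on:

- `FileManager` manages and allocates file resources centrally and hands created file resources to `FileCache` when needed. In other words, `FileCache` only has read and write access to files and cannot create or delete them. If more file-management responsibility were delegated to `FileCache`, further performance optimizations would become possible, such as cleaning up files as soon as a cache becomes invalid.
- Cache destruction is triggered only when a `FileCache` is no longer used by any object. Since `FileManager` works in auto-registering mode by default, a `FileCache` rarely becomes unused unless the user manually unregisters a file to discard it. In scenarios with a large number of files, limiting the maximum number of caches and adopting a caching policy such as LRU or LFU might give better performance.
- `RemoteFileWithCache` objects are not notified, and instead fall back directly to a plain `RemoteFile` without caching. The relationships among `FileCache`, `FileCacheManager`, and `RemoteFileWithCache` could be refactored using the observer and mediator patterns, allowing the cache information in `RemoteFileWithCache` to be updated dynamically.
- Make `FileCacheManager` more centralized.
- The locks used by `CacheFile`.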
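One way to cap the number of caches, as the second point suggests, is a size-bounded LRU map whose eviction callback performs cleanup such as deleting a cache file. This is a hypothetical sketch (`BoundedLRU` and `on_evict` are illustrative names) that the PR does not implement; an LFU variant would track access counts instead of recency order.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, Optional, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class BoundedLRU(Generic[K, V]):
    """Keeps at most `max_entries` items and evicts the least recently used one."""

    def __init__(
        self, max_entries: int, on_evict: Optional[Callable[[K, V], None]] = None
    ) -> None:
        if max_entries <= 0:
            raise ValueError("max_entries must be positive")
        self._max = max_entries
        self._items: "OrderedDict[K, V]" = OrderedDict()
        self._on_evict = on_evict

    def get(self, key: K) -> Optional[V]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: K, value: V) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        while len(self._items) > self._max:
            old_key, old_value = self._items.popitem(last=False)  # least recently used
            if self._on_evict is not None:
                self._on_evict(old_key, old_value)  # e.g. delete the cache file

    def __len__(self) -> int:
        return len(self._items)
```

For example, with `max_entries=2`, inserting `a` and `b`, reading `a`, then inserting `c` evicts `b`, because `a` was used more recently.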