diff --git a/docs/modules/file.md b/docs/modules/file.md
new file mode 100644
index 000000000..255e3c6d7
--- /dev/null
+++ b/docs/modules/file.md
@@ -0,0 +1,145 @@
+# File Module
+
+## 1. Introduction to the File Module
+
+The file management module provides a set of classes for managing files so that users can exchange files with the Agent. It includes the `File` base class and its subclasses, `FileManager`, `GlobalFileManagerHandler`, and `RemoteFileClient`, which interacts with a remote file server.
+
+We recommend using `GlobalFileManagerHandler` to initialize the `FileManager` at the start of the event loop and to obtain the global `FileManager`. After that, use this global `FileManager` to add, delete, and look up files, and to retrieve files produced by the Agent.
+
+!!! notes Note
+
+    - Operating on `File` objects directly is **not recommended**, because it can leak resources.
+
+    - `FileManager` file operations are intended mainly for use inside asynchronous functions; they may not work when called from synchronous functions.
+
+    - `FileManager` is the longest-lived object in this module. When closed, it reclaims every object it holds (the remote client and temporary local files), so do not close it casually. If you do need to close it, make sure that none of its registered files are still in use.
+
+## 2. The File Base Class and Its Subclasses
+
+The `File` class is the base class of the file management module and represents a generic file object. (Creating `File` objects yourself is not recommended, because the `Agent` may not recognize them and they may not be reclaimed.) It holds the basic attributes of a file: ID, name, size, creation time, purpose, and metadata.
+
+`File` also defines a number of methods. The most commonly used are:
+
+* `read_contents`, which asynchronously reads the file contents
+* `write_contents_to`, which writes the file contents to a local path
+
+along with some helper methods that:
+
+* produce a string representation of the file
+* convert the file to a dictionary
+
+`File` has two main subclasses: `LocalFile` and `RemoteFile`.
+
+The attributes and methods of the `File` base class are listed below:
+
+| Attribute  | Type           | Description                                                     |
+| ---------- | -------------- | --------------------------------------------------------------- |
+| id         | str            | Unique identifier of the file                                   |
+| filename   | str            | File name                                                       |
+| byte_size  | int            | File size in bytes                                              |
+| created_at | str            | Timestamp of the file's creation time                           |
+| purpose    | str            | Purpose of the file; one of "assistants" or "assistants_output" |
+| metadata   | Dict[str, Any] | Additional metadata associated with the file                    |
+
+| Method            | Description                                                 |
+| ----------------- | ----------------------------------------------------------- |
+| read_contents     | Asynchronously read the file contents                       |
+| write_contents_to | Asynchronously write the file contents to a local path      |
+| get_file_repr     | Return a string representation for use in specific contexts |
+| to_dict           | Convert the File object to a dictionary                     |
+
+### 2.1 File Subclasses
+
+#### 2.1.1 LocalFile
+
+`LocalFile` is a subclass of `File` that represents a local file. In addition to the inherited attributes, it adds a `path` attribute holding the file's path in the local file system.
+
+#### 2.1.2 RemoteFile
+
+`RemoteFile` is also a subclass of `File` and represents a remote file. Unlike `LocalFile`, its contents are stored on a remote file server. The `RemoteFile` class also contains the logic for interacting with that server.
+
+## 3. The FileManager Class
+
+`FileManager` is a high-level file management tool. It wraps file creation, upload, deletion, and similar operations, handles file exchange with the `Agent`, and manages both `LocalFile` and `RemoteFile` objects in a uniform way. It integrates the logic for talking to a remote file server (upload, download, and deletion through `RemoteFileClient`) and the logic for working with local files (creating a `LocalFile` from a local path). It relies on `FileRegistry` to register and look up files across the whole application.
+
+Its attributes and methods are listed below:
+
+| Attribute          | Type               | Description                              |
+| ------------------ | ------------------ | ---------------------------------------- |
+| remote_file_client | RemoteFileClient   | Remote file client                       |
+| save_dir           | Optional[FilePath] | Directory for saving local files         |
+| closed             | bool               | Whether the file manager has been closed |
+
+| Method                       | Description                                                       |
+| ---------------------------- | ----------------------------------------------------------------- |
+| create_file_from_path        | Create a file from the given file path                            |
+| create_local_file_from_path  | Create a local file from a file path                              |
+| create_remote_file_from_path | Create a remote file from a file path and upload it to the server |
+| create_file_from_bytes       | Create a file from bytes                                          |
+| retrieve_remote_file_by_id   | Retrieve a remote file by ID                                      |
+| look_up_file_by_id           | Look up a registered file by ID                                   |
+| list_remote_files            | List remote files                                                 |
+
+!!! notes Note
+
+    - `FileManager` cannot be copied; this prevents resource leaks.
+
+    - If `save_dir` is not specified, all local files associated with the `FileManager` are reclaimed when it closes. Otherwise, they are kept.
+
+    - If the `FileManager` has an associated `RemoteFileClient`, that client is closed together with the `FileManager`.
+
+## 4. The RemoteFileClient Class
+
+`RemoteFileClient` is the class used to interact with a remote file server. It defines methods for uploading, downloading, and deleting files. `AIStudioFileClient` is the recommended concrete implementation of `RemoteFileClient`. It authenticates with an `access token` passed as an argument, after which it can upload, retrieve, and list files in the AI Studio file service and create temporary URLs for accessing them. In use, a `RemoteFileClient` is held by a `FileManager`; once the `FileManager` closes, the `RemoteFileClient` is closed too and its resources are released.
+
+!!! notes Note
+    * In most cases you do not need `RemoteFile`; by default all files are `LocalFile`. To use remote files, set `enable_remote_file` to True when configuring `GlobalFileManagerHandler`.
+
+## 5. Usage
+
+1. Obtain the global `FileManager` through `GlobalFileManagerHandler` and use it to manage all files. Note: its lifetime matches that of the event loop.
+
+```python
+from erniebot_agent.file import GlobalFileManagerHandler
+
+async def demo_function():
+    file_manager = await GlobalFileManagerHandler().get()
+```
+2. Create a `File` through `GlobalFileManagerHandler`
+
+```python
+from erniebot_agent.file import GlobalFileManagerHandler
+
+async def demo_function():
+    file_manager = await GlobalFileManagerHandler().get()
+    # Create a File from a path; file_type can be 'local' or 'remote'. file_purpose='assistants' marks the file as input for the LLM.
+    local_file = await file_manager.create_file_from_path(file_path='your_path', file_type='local')
+```
+3. Look up and save a `File` through `GlobalFileManagerHandler`
+
+```python
+from erniebot_agent.file import GlobalFileManagerHandler
+
+async def demo_function():
+    file_manager = await GlobalFileManagerHandler().get()
+    # Look up the file by its file ID
+    file = file_manager.look_up_file_by_id(file_id='your_file_id')
+    # Read the file contents (bytes)
+    file_content = await file.read_contents()
+    # Write the contents to the target location
+    await file.write_contents_to('your_willing_path')
+```
+4. Configure `GlobalFileManagerHandler` so that files produced by the Agent can be retrieved directly
+    ```python
+    from erniebot_agent.file import GlobalFileManagerHandler
+
+    async def demo_function():
+        await GlobalFileManagerHandler().configure(save_dir='your_path')  # must be configured at the very start of the event loop
+        ... # agent creation omitted here
+        response = await agent.async_run('Please draw a picture of Beijing for me')
+        # All files from the agent are available via AgentResponse.files; the generated image is also saved in save_dir
+        files = response.files
+    ```
+
+## 6. File API Reference
+For the API of the `File` module, see the [documentation](../../package/erniebot_agent/file/).
\ No newline at end of file
diff --git a/docs/package/erniebot_agent/file.md b/docs/package/erniebot_agent/file.md
new file mode 100644
index 000000000..c1fe9d2eb
--- /dev/null
+++ b/docs/package/erniebot_agent/file.md
@@ -0,0 +1,62 @@
+# File Module
+
+::: erniebot_agent.file
+    options:
+        summary: true
+        members:
+        -
+
+
+::: erniebot_agent.file.base
+    options:
+        summary: true
+        separate_signature: true
+        show_signature_annotations: true
+        members:
+        - File
+
+
+::: erniebot_agent.file.local_file
+    options:
+        summary: true
+        separate_signature: true
+        show_signature_annotations: true
+        line_length: 60
+        members:
+        - LocalFile
+
+::: erniebot_agent.file.remote_file
+    options:
+        summary: true
+        separate_signature: true
+        show_signature_annotations: true
+        line_length: 60
+        members:
+        - RemoteFile
+
+::: erniebot_agent.file.file_manager
+    options:
+        summary: true
+        separate_signature: true
+        show_signature_annotations: true
+        line_length: 60
+        members:
+        - FileManager
+
+::: erniebot_agent.file.global_file_manager_handler
+    options:
+        summary: true
+        separate_signature: true
+        show_signature_annotations: true
+        line_length: 60
+        members:
+        - GlobalFileManagerHandler
+
+::: erniebot_agent.file.remote_file
+    options:
+        summary: true
+        separate_signature: true
+        show_signature_annotations: true
+        line_length: 60
+        members:
+        - AIStudioFileClient
\ No newline at end of file
diff --git a/erniebot-agent/src/erniebot_agent/file/__init__.py b/erniebot-agent/src/erniebot_agent/file/__init__.py
index 2d89f4142..46fd483ae 100644
--- a/erniebot-agent/src/erniebot_agent/file/__init__.py
+++ b/erniebot-agent/src/erniebot_agent/file/__init__.py
@@ -12,4 +12,47 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-from erniebot_agent.file.global_file_manager_handler import GlobalFileManagerHandler
+"""
+The file module provides a series of classes for managing files,
+which facilitate user interaction with the agent,
+including the File base class and its subclasses, FileManager, GlobalFileManagerHandler,
+and RemoteFileClient, which interacts with remote file servers.
+
+It is recommended to use the GlobalFileManagerHandler to initialize the FileManager
+and obtain the global FileManager at the beginning of the event loop.
+Afterwards, you can simply use this global FileManager to perform operations such as adding,
+deleting, and querying files, as well as obtaining files generated by the agent.
+
+
+A few notes about this submodule:
+
+- If the environment variable `AISTUDIO_ACCESS_TOKEN` is not set, the default settings are used.
+
+- Method `GlobalFileManagerHandler().configure()` can only be called **once** at the beginning.
+
+- When you want to get a file manager, you can use method `GlobalFileManagerHandler().get()`.
+
+- The lifecycle of the `FileManager` class is synchronized with the event loop.
+
+- The `FileManager` class is noncopyable.
+
+- If you want to get the content of a `File` object, you can use `read_contents`,
+  and use `write_contents_to` to write the file to the location you want.
+
+- We do **not** recommend creating `File` objects yourself.
+ +Examples: + >>> from erniebot_agent.file import GlobalFileManagerHandler + >>> async def demo_function(): + >>> # need to use at the beginning of event loop + >>> await GlobalFileManagerHandler().configure(save_dir='your_path') + >>> file_manager = await GlobalFileManagerHandler().get() + >>> local_file = await file_manager.create_file_from_path(file_path='your_path', file_type='local') + + >>> file = file_manager.look_up_file_by_id(file_id='your_file_id') + >>> file_content = await file.read_contents() # get file content(bytes) + >>> await local_file.write_contents_to('your_willing_path') # save to location you want +""" + +from .global_file_manager_handler import GlobalFileManagerHandler +from .remote_file import AIStudioFileClient diff --git a/erniebot-agent/src/erniebot_agent/file/base.py b/erniebot-agent/src/erniebot_agent/file/base.py index 3236ed04b..4ccc002cf 100644 --- a/erniebot-agent/src/erniebot_agent/file/base.py +++ b/erniebot-agent/src/erniebot_agent/file/base.py @@ -20,6 +20,24 @@ class File(metaclass=abc.ABCMeta): + """ + Abstract base class representing a generic file. + + Attributes: + id (str): Unique identifier for the file. + filename (str): File name. + byte_size (int): Size of the file in bytes. + created_at (str): Timestamp indicating the file creation time. + purpose (str): Purpose or use case of the file. + metadata (Dict[str, Any]): Additional metadata associated with the file. + + Methods: + read_contents: Abstract method to asynchronously read the file contents. + write_contents_to: Asynchronously write the file contents to a local path. + get_file_repr: Return a string representation for use in specific contexts. + to_dict: Convert the File object to a dictionary. + """ + def __init__( self, *, @@ -30,6 +48,20 @@ def __init__( purpose: str, metadata: Dict[str, Any], ) -> None: + """ + Init method for the File class. + + Args: + id (str): Unique identifier for the file. + filename (str): File name. 
+ byte_size (int): Size of the file in bytes. + created_at (str): Timestamp indicating the file creation time. + purpose (str): Purpose or use case of the file. + metadata (Dict[str, Any]): Additional metadata associated with the file. + + Returns: + None + """ super().__init__() self.id = id self.filename = filename @@ -54,6 +86,7 @@ async def read_contents(self) -> bytes: raise NotImplementedError async def write_contents_to(self, local_path: Union[str, os.PathLike]) -> None: + """Create a file to the location you want.""" contents = await self.read_contents() await anyio.Path(local_path).write_bytes(contents) diff --git a/erniebot-agent/src/erniebot_agent/file/file_manager.py b/erniebot-agent/src/erniebot_agent/file/file_manager.py index a950845b8..2af4db6bd 100644 --- a/erniebot-agent/src/erniebot_agent/file/file_manager.py +++ b/erniebot-agent/src/erniebot_agent/file/file_manager.py @@ -50,6 +50,25 @@ @final class FileManager(Closeable, Noncopyable): + """ + Manages files, providing methods for creating, retrieving, and listing files. + + Attributes: + remote_file_client(RemoteFileClient): The remote file client. + save_dir (Optional[FilePath]): Directory for saving local files. + closed: Whether the file manager is closed. + + Methods: + create_file_from_path: Create a file from a specified file path. + create_local_file_from_path: Create a local file from a file path. + create_remote_file_from_path: Create a remote file from a file path. + create_file_from_bytes: Create a file from bytes. + retrieve_remote_file_by_id: Retrieve a remote file by its ID. + look_up_file_by_id: Look up a file by its ID. + list_remote_files: List remote files. + + """ + _temp_dir: Optional[tempfile.TemporaryDirectory] = None def __init__( @@ -59,6 +78,19 @@ def __init__( *, prune_on_close: bool = True, ) -> None: + """ + Initialize the FileManager object. + + Args: + remote_file_client (Optional[RemoteFileClient]): The remote file client. 
+ prune_on_close (bool): Control whether to automatically clean up files + that can be safely deleted when the object is closed. + save_dir (Optional[FilePath]): Directory for saving local files. + + Returns: + None + + """ super().__init__() self._remote_file_client = remote_file_client @@ -131,6 +163,23 @@ async def create_file_from_path( file_metadata: Optional[Dict[str, Any]] = None, file_type: Optional[Literal["local", "remote"]] = None, ) -> Union[LocalFile, RemoteFile]: + """ + Create a file from a specified file path. + + Args: + file_path (FilePath): The path to the file. + file_purpose (FilePurpose): The purpose or use case of the file, + including `assistant`: used for llm and `assistant_output`: used for output. + file_metadata (Optional[Dict[str, Any]]): Additional metadata associated with the file. + file_type (Optional[Literal["local", "remote"]]): The type of file ("local" or "remote"). + + Returns: + Union[LocalFile, RemoteFile]: The created file. + + Raises: + ValueError: If an unsupported file type is provided. + + """ self.ensure_not_closed() file: Union[LocalFile, RemoteFile] if file_type is None: @@ -149,6 +198,19 @@ async def create_local_file_from_path( file_purpose: protocol.FilePurpose, file_metadata: Optional[Dict[str, Any]], ) -> LocalFile: + """ + Create a local file from a local file path. + + Args: + file_path (FilePath): The path to the file. + file_purpose (FilePurpose): The purpose or use case of the file, + including `assistant`: used for llm and `assistant_output`: used for output. + file_metadata (Optional[Dict[str, Any]]): Additional metadata associated with the file. + + Returns: + LocalFile: The created local file. 
+ + """ file = await self._create_local_file_from_path( pathlib.Path(file_path), file_purpose, @@ -163,6 +225,19 @@ async def create_remote_file_from_path( file_purpose: protocol.FilePurpose, file_metadata: Optional[Dict[str, Any]], ) -> RemoteFile: + """ + Create a remote file from a file path and upload it to the client. + + Args: + file_path (FilePath): The path to the file. + file_purpose (FilePurpose): The purpose or use case of the file, + including `assistant`: used for llm and `assistant_output`: used for output. + file_metadata (Optional[Dict[str, Any]]): Additional metadata associated with the file. + + Returns: + RemoteFile: The created remote file. + + """ file = await self._create_remote_file_from_path( pathlib.Path(file_path), file_purpose, @@ -217,6 +292,20 @@ async def create_file_from_bytes( file_metadata: Optional[Dict[str, Any]] = None, file_type: Optional[Literal["local", "remote"]] = None, ) -> Union[LocalFile, RemoteFile]: + """ + Create a file from bytes. + + Args: + file_contents (bytes): The contents of the file. + filename (str): The name of the file. + file_purpose (FilePurpose): The purpose or use case of the file. + file_metadata (Optional[Dict[str, Any]]): Additional metadata associated with the file. + file_type (Optional[Literal["local", "remote"]]): The type of file ("local" or "remote"). + + Returns: + Union[LocalFile, RemoteFile]: The created file. + + """ self.ensure_not_closed() if file_type is None: file_type = self._get_default_file_type() @@ -250,6 +339,16 @@ async def create_file_from_bytes( return file async def retrieve_remote_file_by_id(self, file_id: str) -> RemoteFile: + """ + Retrieve a remote file by its ID. + + Args: + file_id (str): The ID of the remote file. + + Returns: + RemoteFile: The retrieved remote file. 
+
+        """
         self.ensure_not_closed()
         file = await self._get_remote_file_client().retrieve_file(file_id)
         self._file_registry.register_file(file)
@@ -261,6 +360,19 @@ async def list_remote_files(self) -> List[RemoteFile]:
         return files

     def look_up_file_by_id(self, file_id: str) -> File:
+        """
+        Look up a file by its ID.
+
+        Args:
+            file_id (str): The ID of the file.
+
+        Returns:
+            File: The looked-up file.
+
+        Raises:
+            FileError: If the file with the specified ID is not found.
+
+        """
         self.ensure_not_closed()
         file = self._file_registry.look_up_file(file_id)
         if file is None:
@@ -271,10 +383,18 @@ def look_up_file_by_id(self, file_id: str) -> File:
         return file

     def list_registered_files(self) -> List[File]:
+        """
+        List registered files.
+
+        Returns:
+            List[File]: The list of registered files.
+
+        """
         self.ensure_not_closed()
         return self._file_registry.list_files()

     async def prune(self) -> None:
+        """Clean up the local cache of the file manager."""
         while True:
             try:
                 file = self._fully_managed_files.pop()
@@ -292,6 +412,7 @@ async def prune(self) -> None:
             self._file_registry.unregister_file(file)

     async def close(self) -> None:
+        """Close the file manager and clean up its cache."""
         if not self._closed:
             if self._remote_file_client is not None:
                 await self._remote_file_client.close()
diff --git a/erniebot-agent/src/erniebot_agent/file/file_registry.py b/erniebot-agent/src/erniebot_agent/file/file_registry.py
index f3de22a9a..8feffe3dd 100644
--- a/erniebot-agent/src/erniebot_agent/file/file_registry.py
+++ b/erniebot-agent/src/erniebot_agent/file/file_registry.py
@@ -19,11 +19,37 @@

 @final
 class FileRegistry(object):
+    """
+    Singleton class for managing file registration.
+
+
+    Methods:
+        register_file: Register a file in the registry.
+        unregister_file: Unregister a file from the registry.
+        look_up_file: Look up a file by its ID in the registry.
+        list_files: Get a list of all registered files.
+
+    """
+
     def __init__(self) -> None:
         super().__init__()
         self._id_to_file: Dict[str, File] = {}

     def register_file(self, file: File, *, allow_overwrite: bool = False) -> None:
+        """
+        Register a file in the registry.
+
+        Args:
+            file (File): The file object to register.
+            allow_overwrite (bool): Allow overwriting if a file with the same ID is already registered.
+
+        Returns:
+            None
+
+        Raises:
+            RuntimeError: If the file ID is already registered and allow_overwrite is False.
+
+        """
         file_id = file.id
         if file_id in self._id_to_file:
             if not allow_overwrite:
@@ -31,13 +57,43 @@ def register_file(self, file: File, *, allow_overwrite: bool = False) -> None:
         self._id_to_file[file_id] = file

     def unregister_file(self, file: File) -> None:
+        """
+        Unregister a file from the registry.
+
+        Args:
+            file (File): The file object to unregister.
+
+        Returns:
+            None
+
+        Raises:
+            ValueError: If the file ID is not registered.
+
+        """
         file_id = file.id
         if file_id not in self._id_to_file:
             raise ValueError(f"File with ID {repr(file_id)} is not registered.")
         self._id_to_file.pop(file_id)

     def look_up_file(self, file_id: str) -> Optional[File]:
+        """
+        Look up a file by its ID in the registry.
+
+        Args:
+            file_id (str): The ID of the file to look up.
+
+        Returns:
+            Optional[File]: The File object if found, or None if not found.
+
+        """
         return self._id_to_file.get(file_id, None)

     def list_files(self) -> List[File]:
+        """
+        Get a list of all registered files.
+
+        Returns:
+            List[File]: The list of registered File objects.
+ + """ return list(self._id_to_file.values()) diff --git a/erniebot-agent/src/erniebot_agent/file/global_file_manager_handler.py b/erniebot-agent/src/erniebot_agent/file/global_file_manager_handler.py index c8ef83fb2..b0c3f4574 100644 --- a/erniebot-agent/src/erniebot_agent/file/global_file_manager_handler.py +++ b/erniebot-agent/src/erniebot_agent/file/global_file_manager_handler.py @@ -25,6 +25,20 @@ @final class GlobalFileManagerHandler(metaclass=SingletonMeta): + """Singleton handler for managing the global FileManager instance. + + This class provides a singleton instance for managing the global FileManager + and allows for its configuration and retrieval. + + + Methods: + get: Asynchronously retrieves the global FileManager instance. + configure: Asynchronously configures the global FileManager + at the beginning of event loop. + set: Asynchronously sets the global FileManager explicitly. + + """ + _file_manager: Optional[FileManager] def __init__(self) -> None: @@ -33,6 +47,18 @@ def __init__(self) -> None: self._file_manager = None async def get(self) -> FileManager: + """ + Retrieve the global FileManager instance. + + This method returns the existing global FileManager instance, + creating one if it doesn't exist. + + + Returns: + FileManager: The global FileManager instance. + + """ + async with self._lock: if self._file_manager is None: self._file_manager = await self._create_default_file_manager( @@ -50,6 +76,26 @@ async def configure( enable_remote_file: bool = False, **opts: Any, ) -> None: + """ + Configure the global FileManager. + + This method configures the global FileManager with the provided parameters + at the beginning of event loop. + If the global FileManager is already set, it raises an error. + + Args: + access_token (Optional[str]): The access token for remote file client. + save_dir (Optional[str]): The directory for saving local files. + enable_remote_file (bool): Whether to enable remote file. 
+ **opts (Any): Additional options for FileManager. + + Returns: + None + + Raises: + RuntimeError: If the global FileManager is already set. + + """ async with self._lock: if self._file_manager is not None: self._raise_file_manager_already_set_error() @@ -61,6 +107,21 @@ async def configure( ) async def set(self, file_manager: FileManager) -> None: + """ + Set the global FileManager explicitly. + + This method sets the global FileManager instance explicitly. + If the global FileManager is already set, it raises an error. + + Args: + file_manager (FileManager): The FileManager instance to set as global. + + Returns: + None + + Raises: + RuntimeError: If the global FileManager is already set. + """ async with self._lock: if self._file_manager is not None: self._raise_file_manager_already_set_error() @@ -73,6 +134,8 @@ async def _create_default_file_manager( enable_remote_file: bool, **opts: Any, ) -> FileManager: + """Create the default FileManager instance.""" + async def _close_file_manager(): await file_manager.close() diff --git a/erniebot-agent/src/erniebot_agent/file/local_file.py b/erniebot-agent/src/erniebot_agent/file/local_file.py index f32027be3..e58ec8463 100644 --- a/erniebot-agent/src/erniebot_agent/file/local_file.py +++ b/erniebot-agent/src/erniebot_agent/file/local_file.py @@ -27,6 +27,22 @@ def create_local_file_from_path( file_purpose: protocol.FilePurpose, file_metadata: Dict[str, Any], ) -> "LocalFile": + """ + Create a LocalFile object from a local file path. + + Args: + file_path (pathlib.Path): The path to the local file. + file_purpose (protocol.FilePurpose): The purpose or use case of the file, + including "assistants" and "assistants_output". + file_metadata (Dict[str, Any]): Additional metadata associated with the file. + + Returns: + LocalFile: The created LocalFile object. + + Raises: + FileNotFoundError: If the specified file does not exist. 
+ + """ if not file_path.exists(): raise FileNotFoundError(f"File {file_path} does not exist.") file_id = _generate_local_file_id() @@ -46,6 +62,26 @@ def create_local_file_from_path( class LocalFile(File): + """ + Represents a local file. + + Attributes: + id (str): Unique identifier for the file. + filename (str): File name. + byte_size (int): Size of the file in bytes. + created_at (str): Timestamp indicating the file creation time. + purpose (str): Purpose or use case of the file, + including "assistants" and "assistants_output". + metadata (Dict[str, Any]): Additional metadata associated with the file. + path (pathlib.Path): The path to the local file. + + Methods: + read_contents: Asynchronously read the contents of the local file. + write_contents_to: Asynchronously write the file contents to a local path. + get_file_repr: Return a string representation for use in specific contexts. + + """ + def __init__( self, *, @@ -58,6 +94,23 @@ def __init__( path: pathlib.Path, validate_file_id: bool = True, ) -> None: + """ + Initialize a LocalFile object. + + Args: + id (str): The unique identifier for the file. + filename (str): The name of the file. + byte_size (int): The size of the file in bytes. + created_at (str): The timestamp indicating the file creation time. + purpose (protocol.FilePurpose): The purpose or use case of the file. + metadata (Dict[str, Any]): Additional metadata associated with the file. + path (pathlib.Path): The path to the local file. + validate_file_id (bool): Flag to validate the file ID. Default is True. + + Raises: + ValueError: If the file ID is invalid. 
+
+        """
         if validate_file_id:
             if not protocol.is_local_file_id(id):
                 raise ValueError(f"Invalid file ID: {id}")
@@ -72,6 +125,7 @@ def __init__(
         self.path = path

     async def read_contents(self) -> bytes:
+        """Asynchronously read the contents of the local file."""
         return await anyio.Path(self.path).read_bytes()

     def _get_attrs_str(self) -> str:
diff --git a/erniebot-agent/src/erniebot_agent/file/protocol.py b/erniebot-agent/src/erniebot_agent/file/protocol.py
index 149c6c5b5..d2f4780cb 100644
--- a/erniebot-agent/src/erniebot_agent/file/protocol.py
+++ b/erniebot-agent/src/erniebot_agent/file/protocol.py
@@ -31,6 +31,7 @@


 def create_local_file_id_from_uuid(uuid: str) -> str:
+    """Create a local file ID by prepending the local file ID prefix to a UUID."""
     return _LOCAL_FILE_ID_PREFIX + uuid


@@ -39,26 +40,32 @@ def get_timestamp() -> str:


 def is_file_id(str_: str) -> bool:
+    """Check whether a string is a valid file ID."""
     return is_local_file_id(str_) or is_remote_file_id(str_)


 def is_local_file_id(str_: str) -> bool:
+    """Check whether a string is a valid local file ID."""
     return _compiled_local_file_id_pattern.fullmatch(str_) is not None


 def is_remote_file_id(str_: str) -> bool:
+    """Check whether a string is a valid remote file ID."""
     return _compiled_remote_file_id_pattern.fullmatch(str_) is not None


 def extract_file_ids(str_: str) -> List[str]:
+    """Find all file IDs in a string."""
     return extract_local_file_ids(str_) + extract_remote_file_ids(str_)


 def extract_local_file_ids(str_: str) -> List[str]:
+    """Find all local file IDs in a string."""
     return _compiled_local_file_id_pattern.findall(str_)


 def extract_remote_file_ids(str_: str) -> List[str]:
+    """Find all remote file IDs in a string."""
     return _compiled_remote_file_id_pattern.findall(str_)


diff --git a/erniebot-agent/src/erniebot_agent/file/remote_file.py b/erniebot-agent/src/erniebot_agent/file/remote_file.py
index f9d851445..8c9415283 100644
--- a/erniebot-agent/src/erniebot_agent/file/remote_file.py
+++ b/erniebot-agent/src/erniebot_agent/file/remote_file.py
@@ -28,6 +28,28 @@


 class RemoteFile(File):
+    """
+    Represents a remote file.
+
+    Attributes:
+        id (str): Unique identifier for the file.
+        filename (str): File name.
+        byte_size (int): Size of the file in bytes.
+        created_at (str): Timestamp indicating the file creation time.
+        purpose (str): Purpose or use case of the file,
+            including "assistants" and "assistants_output".
+        metadata (Dict[str, Any]): Additional metadata associated with the file.
+        client (RemoteFileClient): The client of remote file.
+
+    Methods:
+        read_contents: Asynchronously read the contents of the remote file.
+        write_contents_to: Asynchronously write the file contents to a local path.
+        get_file_repr: Return a string representation for use in specific contexts.
+        delete: Asynchronously delete the file from client.
+        create_temporary_url: Asynchronously create a temporary URL for the file.
+
+    """
+
     def __init__(
         self,
         *,
@@ -65,6 +87,7 @@ async def delete(self) -> None:
         await self._client.delete_file(self.id)

     async def create_temporary_url(self, expire_after: float = 600) -> str:
+        """Create a temporary URL for the file."""
         return await self._client.create_temporary_url(self.id, expire_after)

     def get_file_repr_with_url(self, url: str) -> str:
@@ -100,6 +123,20 @@ async def create_temporary_url(self, file_id: str, expire_after: float) -> str:


 class AIStudioFileClient(RemoteFileClient):
+    """
+    Recommended remote file client: AI Studio.
+
+    Methods:
+        upload_file: Upload a file to AI Studio client.
+        retrieve_file: Retrieve information about a file from AI Studio.
+        retrieve_file_contents: Retrieve the contents of a file from AI Studio.
+        list_files: List files available in AI Studio.
+        delete_file: Delete a file in AI Studio client (#TODO: not supported now).
+        create_temporary_url: Create a temporary URL for accessing a file in AI Studio.
+        close: Close the AIStudioFileClient.
+ + """ + _BASE_URL: ClassVar[str] = "https://sandbox-aistudio.baidu.com" _UPLOAD_ENDPOINT: ClassVar[str] = "/llm/lmapp/files" _RETRIEVE_ENDPOINT: ClassVar[str] = "/llm/lmapp/files/{file_id}" @@ -109,6 +146,14 @@ class AIStudioFileClient(RemoteFileClient): def __init__( self, access_token: str, *, aiohttp_session: Optional[aiohttp.ClientSession] = None ) -> None: + """ + Initialize the AIStudioFileClient. + + Args: + access_token (str): The access token for AI Studio. + aiohttp_session (Optional[aiohttp.ClientSession]): A custom aiohttp session (default is None). + + """ super().__init__() self._access_token = access_token if aiohttp_session is None: @@ -123,6 +168,7 @@ def closed(self) -> bool: async def upload_file( self, file_path: pathlib.Path, file_purpose: protocol.FilePurpose, file_metadata: Dict[str, Any] ) -> RemoteFile: + """Upload a file to AI Studio client.""" self.ensure_not_closed() url = self._get_url(self._UPLOAD_ENDPOINT) headers: Dict[str, str] = {} @@ -143,6 +189,7 @@ async def upload_file( return self._create_file_obj_from_dict(result) async def retrieve_file(self, file_id: str) -> RemoteFile: + """Retrieve a file in AI Studio client by id.""" self.ensure_not_closed() url = self._get_url(self._RETRIEVE_ENDPOINT).format(file_id=file_id) headers: Dict[str, str] = {} @@ -157,6 +204,7 @@ async def retrieve_file(self, file_id: str) -> RemoteFile: return self._create_file_obj_from_dict(result) async def retrieve_file_contents(self, file_id: str) -> bytes: + """Retrieve file content in AI Studio client by id.""" self.ensure_not_closed() url = self._get_url(self._RETRIEVE_CONTENTS_ENDPOINT).format(file_id=file_id) headers: Dict[str, str] = {} @@ -170,6 +218,7 @@ async def retrieve_file_contents(self, file_id: str) -> bytes: return resp_bytes async def list_files(self) -> List[RemoteFile]: + """List files in AI Studio client.""" self.ensure_not_closed() url = self._get_url(self._LIST_ENDPOINT) headers: Dict[str, str] = {} diff --git a/mkdocs.yml 
b/mkdocs.yml index 5e25450cf..2d2ceb3d7 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -12,11 +12,13 @@ nav: - messages: 'modules/messages.md' - chat_modles: 'modules/chat_models.md' - memory: 'modules/memory.md' + - file: 'modules/file.md' - tools: 'modules/tools/create-tool.md' - API: - erniebot-agent: - messages: "package/erniebot_agent/messages.md" - memory: "package/erniebot_agent/memory.md" + - file: "package/erniebot_agent/file.md" - tools: "package/erniebot_agent/tools.md" - chat_models: "package/erniebot_agent/chat_models.md" - agents: "package/erniebot_agent/agents.md"
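
Reviewer note on the `FileRegistry` docstrings in this diff: the register, unregister, and look-up semantics they describe can be checked with a small standalone sketch. This is not the library code. `StubFile` and `SketchFileRegistry` are hypothetical names introduced here, the file ID strings are made up, and the `RuntimeError` on duplicate registration is assumed from the docstring because the corresponding `raise` is outside the hunk shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class StubFile:
    """Hypothetical stand-in for `File`; the registry only needs `id`."""

    id: str
    filename: str


class SketchFileRegistry:
    """Simplified restatement of the registry behaviour described in the diff."""

    def __init__(self) -> None:
        self._id_to_file: Dict[str, StubFile] = {}

    def register_file(self, file: StubFile, *, allow_overwrite: bool = False) -> None:
        # Assumption: duplicates raise RuntimeError, as the docstring states.
        if file.id in self._id_to_file and not allow_overwrite:
            raise RuntimeError(f"File with ID {file.id!r} is already registered.")
        self._id_to_file[file.id] = file

    def unregister_file(self, file: StubFile) -> None:
        # The code in the diff raises ValueError for an unknown ID.
        if file.id not in self._id_to_file:
            raise ValueError(f"File with ID {file.id!r} is not registered.")
        self._id_to_file.pop(file.id)

    def look_up_file(self, file_id: str) -> Optional[StubFile]:
        # Returns None instead of raising when the ID is unknown.
        return self._id_to_file.get(file_id)

    def list_files(self) -> List[StubFile]:
        return list(self._id_to_file.values())


if __name__ == "__main__":
    registry = SketchFileRegistry()
    original = StubFile(id="demo-id-1", filename="a.txt")
    registry.register_file(original)
    # Re-registering the same ID is only accepted with allow_overwrite=True.
    registry.register_file(StubFile(id="demo-id-1", filename="b.txt"), allow_overwrite=True)
    print([f.filename for f in registry.list_files()])
```

The sketch keeps the split visible in the diff: `look_up_file` on the registry returns `None` for an unknown ID, while `FileManager.look_up_file_by_id` is the layer that turns a missing file into an error.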