diff --git a/.github/issue_template/bug_report.yml b/.github/issue_template/bug_report.yml new file mode 100644 index 000000000..7a0c2f1a5 --- /dev/null +++ b/.github/issue_template/bug_report.yml @@ -0,0 +1,43 @@ +name: 🐞 Bug report +description: Submit a bug report +title: "[Bug]: " +labels: ["bug"] +body: + - type: markdown + attributes: + value: | + Thank you for reporting this bug. Please fill in the information below to help us reproduce the issue. + - type: input + id: env + attributes: + label: Environment Information + description: OS version / Browser / Runtime environment + placeholder: macOS 14.5, Chrome 118 + validations: + required: true + - type: textarea + id: steps + attributes: + label: Reproduction Steps + description: List, step by step, how to trigger this bug + validations: + required: true + - type: textarea + id: expected + attributes: + label: Expected Behavior + description: What you expected to happen + validations: + required: true + - type: textarea + id: actual + attributes: + label: Actual Behavior + description: What actually happened + validations: + required: true + - type: textarea + id: screenshots + attributes: + label: Screenshots / Logs + description: Provide screenshots or logs (optional) diff --git a/.github/issue_template/enhancement.yml b/.github/issue_template/enhancement.yml new file mode 100644 index 000000000..c16631bb8 --- /dev/null +++ b/.github/issue_template/enhancement.yml @@ -0,0 +1,24 @@ +name: ⚙️ Enhancement +description: Suggest improvements to existing features +title: "[Enhancement]: " +labels: ["enhancement"] +body: + - type: textarea + id: current + attributes: + label: Current Behavior + description: How the existing feature currently works + validations: + required: true + - type: textarea + id: improved + attributes: + label: Improvement Suggestions + description: How you would like to improve it + validations: + required: true + - type: textarea + id: benefit + attributes: + label:
Benefits of Improvement + description: Expected benefits of the improvement diff --git a/.github/issue_template/feature_request.yml b/.github/issue_template/feature_request.yml new file mode 100644 index 000000000..3314f3de9 --- /dev/null +++ b/.github/issue_template/feature_request.yml @@ -0,0 +1,28 @@ +name: ✨ Feature request +description: Request a new feature +title: "[Feature]: " +labels: ["feature"] +body: + - type: markdown + attributes: + value: | + Please describe the new feature you would like to add and its use cases. + - type: textarea + id: description + attributes: + label: Feature Description + description: Briefly describe what the feature does + validations: + required: true + - type: textarea + id: usecase + attributes: + label: Use Cases + description: How this feature will be used + validations: + required: true + - type: textarea + id: design + attributes: + label: Design Ideas / Technical Details + description: (Optional) Your thoughts on the implementation approach diff --git a/.github/issue_template/performance.yml b/.github/issue_template/performance.yml new file mode 100644 index 000000000..0345ab893 --- /dev/null +++ b/.github/issue_template/performance.yml @@ -0,0 +1,22 @@ +name: 🚀 Performance issue +description: Report a performance issue or suggest an optimization +title: "[Performance]: " +labels: ["performance"] +body: + - type: textarea + id: bottleneck + attributes: + label: Current Bottleneck + description: Describe the performance issue + validations: + required: true + - type: textarea + id: metrics + attributes: + label: Performance Metrics + description: Provide test data / benchmarks + - type: textarea + id: proposal + attributes: + label: Optimization Suggestions + description: Your optimization ideas diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md new file mode 100644 index 000000000..80a7550fe --- /dev/null +++ b/.github/pull_request_template.md @@ -0,0 +1,60 @@ +# Pull Request + +## 
Description +Brief description of the changes + +## Type of Change +- [ ] Feature addition +- [ ] Bug fix +- [ ] Performance optimization +- [ ] Code refactoring +- [ ] Documentation update +- [ ] Testing related +- [ ] Security fix + +## Impact Scope +- [ ] User interface +- [ ] API interface +- [ ] Database +- [ ] Configuration files +- [ ] Dependencies + +## Priority +- [ ] High - Release immediately +- [ ] Medium - Next version +- [ ] Low - Future version + +## Changes Made +- [ ] Change 1 +- [ ] Change 2 +- [ ] Change 3 + +## Testing +- [ ] Unit tests pass +- [ ] Integration tests pass +- [ ] Manual testing completed +- [ ] Performance testing (if applicable) + +## Documentation +- [ ] Code comments added/updated +- [ ] README updated (if applicable) +- [ ] API documentation updated (if applicable) + +## Breaking Changes +- [ ] No breaking changes +- [ ] Breaking changes documented +- [ ] Migration guide provided + +## Related Issues +Closes #123, Related to #456 + +## Additional Notes +Any additional information or context + +--- + +## Release Notes Points +- User-visible changes +- Configuration change instructions +- Migration steps +- Known issues \ No newline at end of file diff --git a/.github/pull_request_template/README.md b/.github/pull_request_template/README.md new file mode 100644 index 000000000..45a0c6fb4 --- /dev/null +++ b/.github/pull_request_template/README.md @@ -0,0 +1,55 @@ +# Pull Request Templates + +This directory contains specialized PR templates for different types of changes. Choose the appropriate template based on your change type. + +## Available Templates + +### 🚀 [Feature](feature.md) +Use this template when adding new functionality or features to the project. + +### 🐛 [Bug Fix](bugfix.md) +Use this template when fixing bugs or issues in the existing codebase. + +### ⚡ [Performance](performance.md) +Use this template when optimizing performance or improving efficiency. 
+ +### 🔧 [Refactor](refactor.md) +Use this template when restructuring or improving existing code without changing functionality. + +### 📚 [Documentation](documentation.md) +Use this template when updating documentation, README files, or code comments. + +### 🧪 [Testing](testing.md) +Use this template when adding or improving tests, test coverage, or testing infrastructure. + +### 🔒 [Security](security.md) +Use this template when fixing security vulnerabilities or implementing security improvements. + +## How to Use + +1. **Choose the appropriate template** based on your change type +2. **Copy the template content** into your PR description +3. **Fill in all required sections** with relevant information +4. **Check the appropriate boxes** for change type, impact scope, and priority +5. **Provide detailed information** in the release notes points section + +## Standardized Fields + +All templates include standardized fields to help with: +- **Automated categorization** of PRs +- **Release note generation** +- **Impact assessment** +- **Priority determination** + +## Benefits + +These templates help: +- **Standardize PR descriptions** across the project +- **Ensure completeness** of information provided +- **Facilitate automated release note generation** +- **Improve code review efficiency** +- **Maintain consistent documentation** + +## Default Template + +If none of the specialized templates fit your needs, use the default template at `.github/pull_request_template.md`. 
\ No newline at end of file diff --git a/.github/pull_request_template/bugfix.md b/.github/pull_request_template/bugfix.md new file mode 100644 index 000000000..268fbd269 --- /dev/null +++ b/.github/pull_request_template/bugfix.md @@ -0,0 +1,61 @@ +# 🐛 Bug Fix + +## Problem Description +Detailed description of the observed bug behavior + +## Problem Analysis +- Root cause of the bug +- Scope of impact +- Reproduction steps + +## Fix Solution +- How the bug was fixed +- Why this approach was chosen +- Alternative solutions considered + +## Fix Verification +- [ ] Post-fix testing +- [ ] Regression testing +- [ ] Edge case testing + +## Impact Assessment +- Impact of the fix on existing functionality +- Performance impact +- Backward compatibility + +## Prevention Measures +- How to avoid similar issues +- Whether new test cases are needed +- Points to check during code review + +## Related Issues +Fixes #123, Related to #456 + +--- + +## Change Type +- [ ] Feature addition +- [x] Bug fix +- [ ] Performance optimization +- [ ] Code refactoring +- [ ] Documentation update +- [ ] Testing related +- [ ] Security fix + +## Impact Scope +- [ ] User interface +- [ ] API interface +- [ ] Database +- [ ] Configuration files +- [ ] Dependencies + +## Priority +- [ ] High - Release immediately +- [ ] Medium - Next version +- [ ] Low - Future version + +## Release Notes Points +- User-visible changes +- Configuration change instructions +- Migration steps +- Known issues \ No newline at end of file diff --git a/.github/pull_request_template/documentation.md b/.github/pull_request_template/documentation.md new file mode 100644 index 000000000..79863a631 --- /dev/null +++ b/.github/pull_request_template/documentation.md @@ -0,0 +1,59 @@ +# 📚 Documentation Update + +## Update Scope +- Types of documentation to update +- Document sections involved +- Depth and breadth of updates + +## Update Content +- New documentation content +- Modified documentation content +- Outdated content to remove + +## 
Update Reasons +- Why updates are needed +- User feedback +- Keeping docs in sync with feature changes + +## Documentation Quality +- [ ] Content accuracy +- [ ] Clear, correct wording +- [ ] Formatting conventions followed +- [ ] Example completeness + +## User Impact +- Impact on how users work with the product +- Whether user training is needed +- Whether a migration guide is needed + +## Related Issues +Updates documentation for #123, Related to #456 + +--- + +## Change Type +- [ ] Feature addition +- [ ] Bug fix +- [ ] Performance optimization +- [ ] Code refactoring +- [x] Documentation update +- [ ] Testing related +- [ ] Security fix + +## Impact Scope +- [ ] User interface +- [ ] API interface +- [ ] Database +- [ ] Configuration files +- [ ] Dependencies + +## Priority +- [ ] High - Release immediately +- [ ] Medium - Next version +- [ ] Low - Future version + +## Release Notes Points +- User-visible changes +- Configuration change instructions +- Migration steps +- Known issues \ No newline at end of file diff --git a/.github/pull_request_template/feature.md b/.github/pull_request_template/feature.md new file mode 100644 index 000000000..654cccc13 --- /dev/null +++ b/.github/pull_request_template/feature.md @@ -0,0 +1,68 @@ +# 🚀 Feature + +## Feature Description +Brief description of the new feature + +## Feature Details +- [ ] Specific feature point 1 +- [ ] Specific feature point 2 +- [ ] Specific feature point 3 + +## Use Cases +Describe the use cases and applicable scenarios for this feature + +## Technical Implementation +- Implementation approach +- Technologies involved +- Architectural considerations + +## Test Coverage +- [ ] Unit tests +- [ ] Integration tests +- [ ] Manual testing + +## Documentation Updates +- [ ] API documentation +- [ ] User guide +- [ ] Example code + +## Backward Compatibility +- [ ] Fully compatible +- [ ] Migration required +- [ ] Breaking changes + +## Performance Impact +- Performance improvement/degradation +- Resource consumption changes + +## Related Issues +Closes #123, 
Related to #456 + +--- + +## Change Type +- [x] Feature addition +- [ ] Bug fix +- [ ] Performance optimization +- [ ] Code refactoring +- [ ] Documentation update +- [ ] Testing related +- [ ] Security fix + +## Impact Scope +- [ ] User interface +- [ ] API interface +- [ ] Database +- [ ] Configuration files +- [ ] Dependencies + +## Priority +- [ ] High - Release immediately +- [ ] Medium - Next version +- [ ] Low - Future version + +## Release Notes Points +- User-visible changes +- Configuration change instructions +- Migration steps +- Known issues \ No newline at end of file diff --git a/.github/pull_request_template/performance.md b/.github/pull_request_template/performance.md new file mode 100644 index 000000000..e96c50bfe --- /dev/null +++ b/.github/pull_request_template/performance.md @@ -0,0 +1,63 @@ +# ⚡ Performance Optimization + +## Optimization Goals +- Specific performance metrics to optimize +- Expected results + +## Performance Analysis +- Current performance bottlenecks +- Performance test data +- Performance analysis tools used + +## Optimization Strategy +- Specific optimization strategies +- Algorithm/data structure improvements +- Cache strategy optimization +- Concurrency processing optimization + +## Optimization Results +- Performance improvement data +- Resource consumption changes +- Benchmark test results + +## Optimization Verification +- [ ] Performance testing +- [ ] Stress testing +- [ ] Regression testing + +## Trade-off Considerations +- Side effects of optimization +- Code complexity changes +- Maintenance cost impact + +## Related Issues +Addresses #123, Related to #456 + +--- + +## Change Type +- [ ] Feature addition +- [ ] Bug fix +- [x] Performance optimization +- [ ] Code refactoring +- [ ] Documentation update +- [ ] Testing related +- [ ] Security fix + +## Impact Scope +- [ ] User interface +- [ ] API interface +- [ ] Database +- [ ] Configuration files +- [ ] Dependencies + +## Priority +- [ ] High - Release immediately 
+- [ ] Medium - Next version +- [ ] Low - Future version + +## Release Notes Points +- User-visible changes +- Configuration change instructions +- Migration steps +- Known issues \ No newline at end of file diff --git a/.github/pull_request_template/refactor.md b/.github/pull_request_template/refactor.md new file mode 100644 index 000000000..8f4983e61 --- /dev/null +++ b/.github/pull_request_template/refactor.md @@ -0,0 +1,63 @@ +# 🔧 Code Refactoring + +## Refactoring Goals +- Code quality improvement objectives +- Maintainability enhancement +- Code structure optimization + +## Refactoring Content +- Specific modules/functions to refactor +- Before and after comparison +- Refactoring level (function/class/module level) + +## Refactoring Strategy +- Refactoring methodology +- Step-by-step refactoring plan +- Risk control measures + +## Refactoring Results +- Code quality metric changes +- Readability improvements +- Test coverage changes + +## Backward Compatibility +- [ ] Fully compatible +- [ ] Adaptation required +- [ ] Breaking changes + +## Testing Verification +- [ ] Functional testing +- [ ] Regression testing +- [ ] Performance testing + +## Related Issues +Refactors #123, Related to #456 + +--- + +## Change Type +- [ ] Feature addition +- [ ] Bug fix +- [ ] Performance optimization +- [x] Code refactoring +- [ ] Documentation update +- [ ] Testing related +- [ ] Security fix + +## Impact Scope +- [ ] User interface +- [ ] API interface +- [ ] Database +- [ ] Configuration files +- [ ] Dependencies + +## Priority +- [ ] High - Release immediately +- [ ] Medium - Next version +- [ ] Low - Future version + +## Release Notes Points +- User-visible changes +- Configuration change instructions +- Migration steps +- Known issues \ No newline at end of file diff --git a/.github/pull_request_template/security.md b/.github/pull_request_template/security.md new file mode 100644 index 000000000..2e1326547 --- /dev/null +++ b/.github/pull_request_template/security.md 
@@ -0,0 +1,58 @@ +# 🔒 Security Fix + +## Security Vulnerability +- Vulnerability type and severity +- Impact scope and attack vectors +- How the vulnerability was discovered + +## Fix Solution +- Specific fix measures +- Security best practices applied +- Defense in depth strategy + +## Risk Assessment +- Risk level before fix +- Risk level after fix +- Remaining risk analysis + +## Security Testing +- [ ] Vulnerability verification testing +- [ ] Penetration testing +- [ ] Security scanning + +## User Notification +- Whether users need immediate updates +- Security advisory publication +- User guidance documentation + +## Related Issues +Security fix for #123, Related to #456 + +--- + +## Change Type +- [ ] Feature addition +- [ ] Bug fix +- [ ] Performance optimization +- [ ] Code refactoring +- [ ] Documentation update +- [ ] Testing related +- [x] Security fix + +## Impact Scope +- [ ] User interface +- [ ] API interface +- [ ] Database +- [ ] Configuration files +- [ ] Dependencies + +## Priority +- [ ] High - Release immediately +- [ ] Medium - Next version +- [ ] Low - Future version + +## Release Notes Points +- User-visible changes +- Configuration change instructions +- Migration steps +- Known issues \ No newline at end of file diff --git a/.github/pull_request_template/testing.md b/.github/pull_request_template/testing.md new file mode 100644 index 000000000..8889f4ad7 --- /dev/null +++ b/.github/pull_request_template/testing.md @@ -0,0 +1,59 @@ +# 🧪 Testing Related + +## Test Types +- Unit tests +- Integration tests +- End-to-end tests +- Performance tests + +## Test Coverage +- New test cases added +- Code coverage scope +- Test scenario completeness + +## Testing Tools +- Testing frameworks used +- Test data management +- Test environment configuration + +## Test Quality +- Test case design +- Edge condition coverage +- Exception handling + +## Continuous Integration +- CI/CD process updates +- Automated testing integration +- Test report generation 
+ +## Related Issues +Adds tests for #123, Related to #456 + +--- + +## Change Type +- [ ] Feature addition +- [ ] Bug fix +- [ ] Performance optimization +- [ ] Code refactoring +- [ ] Documentation update +- [x] Testing related +- [ ] Security fix + +## Impact Scope +- [ ] User interface +- [ ] API interface +- [ ] Database +- [ ] Configuration files +- [ ] Dependencies + +## Priority +- [ ] High - Release immediately +- [ ] Medium - Next version +- [ ] Low - Future version + +## Release Notes Points +- User-visible changes +- Configuration change instructions +- Migration steps +- Known issues \ No newline at end of file diff --git a/.github/workflows/macOS_test.yml b/.github/workflows/macOS_test.yml index c05e8137e..e0d2d3b23 100644 --- a/.github/workflows/macOS_test.yml +++ b/.github/workflows/macOS_test.yml @@ -4,10 +4,20 @@ on: push: branches: - main + paths-ignore: + - "**.md" + - ".git*" + - "docs/assets/**" + - "docs/**" pull_request_target: types: [opened, synchronize, labeled, reopened] branches: - main + paths-ignore: + - "**.md" + - ".git*" + - "docs/assets/**" + - "docs/**" workflow_dispatch: concurrency: diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index 81e75e042..0ee94a44a 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -4,6 +4,11 @@ on: push: branches: - main + paths-ignore: + - "**.md" + - ".git*" + - "docs/assets/**" + - "docs/**" pull_request: branches: - main @@ -11,6 +16,7 @@ on: - "**.md" - ".git*" - "docs/assets/**" + - "docs/**" env: CI_PATH: '/home/mnt/platform_ci/GitHub/${{ github.repository }}/${GITHUB_RUN_NUMBER}' diff --git a/.github/workflows/win_test.yml b/.github/workflows/win_test.yml index c3e0af390..d10edf947 100644 --- a/.github/workflows/win_test.yml +++ b/.github/workflows/win_test.yml @@ -4,10 +4,20 @@ on: push: branches: - main + paths-ignore: + - "**.md" + - ".git*" + - "docs/assets/**" + - "docs/**" pull_request_target: types: [opened, synchronize, labeled, reopened] 
branches: - main + paths-ignore: + - "**.md" + - ".git*" + - "docs/assets/**" + - "docs/**" workflow_dispatch: concurrency: diff --git a/LazyLLM-Env b/LazyLLM-Env index 0fb5bc5f6..cf704323d 160000 --- a/LazyLLM-Env +++ b/LazyLLM-Env @@ -1 +1 @@ -Subproject commit 0fb5bc5f68fefb5fb2e6c8d410e16a5adff86866 +Subproject commit cf704323d5dd229770d397b1150e267187a2e87b diff --git a/README.CN.md b/README.CN.md index e11ba6e52..db31b417a 100644 --- a/README.CN.md +++ b/README.CN.md @@ -14,6 +14,7 @@ LazyLLM is a low-code development tool for building **multi-agent** large language model applications. It helps developers build complex AI applications at very low cost and keep iterating to improve their results. LazyLLM provides a convenient workflow for building applications, along with a large set of standard processes and tools for every stage of application development.
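The AI application development process with LazyLLM follows **prototype building -> data feedback -> iterative optimization**: you can first use LazyLLM to quickly get a working prototype of your application, then analyze bad cases using task data from your scenario, and then iterate on the algorithms and fine-tune the models at the application's key stages to steadily improve its overall performance.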
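+LazyLLM aims to unify agility and efficiency: developers can iterate on algorithms efficiently and then apply the refined algorithms in industrial production, with support for multiple users, fault tolerance, and high concurrency. **User documentation**: https://docs.lazyllm.ai/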
Scan the QR code below with WeChat to join the discussion group (left), or watch the video to learn more (right)
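@@ -340,3 +341,62 @@ Flow is the data flow defined in LazyLLM; it describes how data is passed from one callable 1. You can easily combine, add, and replace various modules and components; the design of Flow makes adding new features simple and facilitates collaboration between different modules and even projects. 2. Through a standardized interface and data flow mechanism, Flow reduces the repetitive work developers face when handling data transfer and transformation. Developers can focus more on core business logic, thus improving overall development efficiency. 3. Some Flows support asynchronous processing and parallel execution, significantly enhancing response speed and system performance when dealing with large-scale data or complex tasks. + + +## 9. Future Plans + +### 9.1 Timeline +V0.6 is expected to start on September 1 and last 3 months, with minor releases such as v0.6.1 and v0.6.2 published along the way +V0.7 is expected to start on December 1 and last 3 months, with minor releases such as v0.7.1 and v0.7.2 published along the way + +### 9.2 Feature Modules +9.2.1 RAG + - 9.2.1.1 Engineering + - Consolidate LazyRAG's capabilities into LazyLLM (V0.6) + - Extend RAG's macro Q&A capability to multiple knowledge bases (V0.6) + - Full horizontal scaling for RAG modules, with RAG algorithms working together across multi-machine deployments (V0.6) + - Integrate at least one open-source knowledge graph framework (V0.6) + - Support at least 20 common data-splitting strategies covering a wide range of document types (V0.6) + - 9.2.1.2 Data Capabilities + - Table parsing (V0.6 - 0.7) + - CAD image parsing (V0.7 -) + - 9.2.1.3 Algorithm Capabilities + - Processing of relatively structured text such as CSV (V0.6) + - Multi-hop retrieval (links within documents, references, etc.) (V0.6) + - Handling of conflicting information (V0.7) + - AgenticRL & solving problems by writing code (V0.7) + +9.2.2 Functional Modules + - Memory support (V0.6) + - Distributed Launcher support (V0.7) + - Database-backed Globals support (V0.6) + - Publishing a ServerModule as an MCP service (V0.7) + - Integration of online sandbox services (V0.7) + +9.2.3 Model Training and Inference + - Deployment and inference through the OpenAI interface (V0.6) + - Unified prompts for fine-tuning and inference (V0.7) + - Fine-tuning examples in Examples (V0.7) + - Integration of 2-3 prompt repositories, so prompts can be selected from them directly (V0.6) + - Smarter model-type detection and inference-framework selection; refactor and simplify the auto-finetune framework-selection logic (V0.6) + - End-to-end GRPO support (V0.7) + +9.2.4 Documentation + - Complete the API documentation: every public interface has API docs whose parameters match the function signature, with runnable sample code (V0.6) + - Expand the CookBook to 50 cases, with comparisons against LangChain / LlamaIndex (code size, speed, extensibility) (V0.6) + - Improve the Environment docs: add installation instructions for Windows/Linux/macOS and describe the package-splitting strategy (V0.6) + - Improve the Learn docs to teach, in order: using large models, building agents, using workflows, and building RAG (V0.6) + +9.2.5 Quality + - Cut CI time to under 10 minutes by mocking most modules (V0.6) + - Add daily builds and move time- or token-intensive tasks into them (V0.6) + +9.2.6 Development, Deployment, and Release + - Debugging improvements (V0.7) + - Process monitoring [output + performance] (V0.7) + - Environment isolation and automated environment setup for the training and inference frameworks LazyLLM depends on (V0.6) + +9.2.7 Ecosystem + - Push to open-source LazyCraft (V0.6) + - Push to open-source LazyRAG (V0.7) + - Publish the code on two code-hosting sites besides GitHub and seek community collaboration (V0.6) \ No newline at end of file diff --git a/README.md b/README.md index 2317ac3e4..b0e3df7e3 100644 --- a/README.md +++ b/README.md @@ -348,3 +348,61 @@ Flow in LazyLLM defines the data stream, describing how data is passed from one 1.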
You can easily combine, add, and replace various modules and components; the design of Flow makes adding new features simple and facilitates collaboration between different modules and even projects. 2. Through a standardized interface and data flow mechanism, Flow reduces the repetitive work developers face when handling data transfer and transformation. Developers can focus more on core business logic, thus improving overall development efficiency. 3. Some Flows support asynchronous processing and parallel execution, significantly enhancing response speed and system performance when dealing with large-scale data or complex tasks. + +## Future Plans + +### Timeline +V0.6 is expected to start on September 1 and last 3 months, with minor releases such as v0.6.1 and v0.6.2 published along the way +V0.7 is expected to start on December 1 and last 3 months, with minor releases such as v0.7.1 and v0.7.2 published along the way + +### Feature Modules +RAG + - Engineering + - Consolidate LazyRAG capabilities into LazyLLM (V0.6) + - Extend RAG's macro Q&A capabilities to multiple knowledge bases (V0.6) + - Full horizontal scaling for RAG modules, with RAG algorithms working together across multi-machine deployments (V0.6) + - Integrate at least one open-source knowledge graph framework (V0.6) + - Support at least 20 common data-splitting strategies covering a wide range of document types (V0.6) + - Data Capabilities + - Table parsing (V0.6 - 0.7) + - CAD image parsing (V0.7 -) + - Algorithm Capabilities + - Support processing relatively structured text such as CSV (V0.6) + - Multi-hop retrieval (links in documents, references, etc.)
(V0.6) + - Handling of conflicting information (V0.7) + - AgenticRL & solving problems by writing code (V0.7) + +Functional Modules + - Memory support (V0.6) + - Distributed Launcher support (V0.7) + - Database-backed Globals support (V0.6) + - Publish a ServerModule as an MCP service (V0.7) + - Integration of online sandbox services (V0.7) + +Model Training and Inference + - Support deployment and inference through the OpenAI interface (V0.6) + - Unify fine-tuning and inference prompts (V0.7) + - Provide fine-tuning examples in Examples (V0.7) + - Integrate 2-3 prompt repositories so prompts can be selected from them directly (V0.6) + - Smarter model-type detection and inference-framework selection; refactor and simplify the auto-finetune framework-selection logic (V0.6) + - End-to-end GRPO support (V0.7) + +Documentation + - Complete the API documentation: every public interface gets API docs whose parameters match the function signature, plus runnable sample code (V0.6) + - Expand the CookBook to 50 cases, with comparisons against LangChain/LlamaIndex (code size, speed, extensibility) (V0.6) + - Improve the Environment docs: add installation instructions for Windows/Linux/macOS and document the package-splitting strategy (V0.6) + - Improve the Learn docs to teach, in order: using large models, building agents, using workflows, and building RAG (V0.6) + +Quality + - Cut CI time to under 10 minutes by mocking most modules (V0.6) + - Add daily builds and move time- or token-intensive tasks into them (V0.6) + +Development, Deployment and Release + - Debugging improvements (V0.7) + - Process monitoring [output + performance] (V0.7) + - Environment isolation and automated environment setup for the training and inference frameworks LazyLLM depends on (V0.6) + +Ecosystem + - Push to open-source LazyCraft (V0.6) + - Push to open-source LazyRAG 
(V0.7) + - Publish the code on two code-hosting sites besides GitHub and seek community collaboration (V0.6) diff --git a/docs/assets/env/git_bash.png b/docs/assets/env/git_bash.png new file mode 100644 index 000000000..c0f4bc819 Binary files /dev/null and b/docs/assets/env/git_bash.png differ diff --git a/docs/assets/env/install_python.png b/docs/assets/env/install_python.png new file mode 100644 index 000000000..6ed43de4a Binary files /dev/null and b/docs/assets/env/install_python.png differ diff --git a/docs/assets/env/map.png b/docs/assets/env/map.png new file mode 100644 index 000000000..1cf495eb2 Binary files /dev/null and b/docs/assets/env/map.png differ diff --git a/docs/assets/env/set_python_install_path.png b/docs/assets/env/set_python_install_path.png new file mode 100644 index 000000000..84fa6bac4 Binary files /dev/null and b/docs/assets/env/set_python_install_path.png differ diff --git a/docs/assets/env/virtualize.png b/docs/assets/env/virtualize.png new file mode 100644 index 000000000..ddb590aac Binary files /dev/null and b/docs/assets/env/virtualize.png differ diff --git a/docs/assets/env/virtualize_2.png b/docs/assets/env/virtualize_2.png new file mode 100644 index 000000000..16ca0c282 Binary files /dev/null and b/docs/assets/env/virtualize_2.png differ diff --git a/docs/assets/env/virtualize_3.png b/docs/assets/env/virtualize_3.png new file mode 100644 index 000000000..d218cf4c0 Binary files /dev/null and b/docs/assets/env/virtualize_3.png differ diff --git a/docs/assets/env/virtualize_4.png b/docs/assets/env/virtualize_4.png new file mode 100644 index 000000000..1e970cfe8 Binary files /dev/null and b/docs/assets/env/virtualize_4.png differ diff --git a/docs/assets/env/vscode_extensions.png b/docs/assets/env/vscode_extensions.png new file mode 100644 index 000000000..45c649604 Binary files /dev/null and b/docs/assets/env/vscode_extensions.png differ diff --git a/docs/assets/env/vscode_interpret.png 
b/docs/assets/env/vscode_interpret.png new
file mode 100644 index 000000000..84c9ae36d Binary files /dev/null and b/docs/assets/env/vscode_interpret.png differ diff --git a/docs/assets/env/vscode_interpret_manual.png b/docs/assets/env/vscode_interpret_manual.png new file mode 100644 index 000000000..31445e421 Binary files /dev/null and b/docs/assets/env/vscode_interpret_manual.png differ diff --git a/docs/assets/env/winversion.png b/docs/assets/env/winversion.png new file mode 100644 index 000000000..81a5f24cc Binary files /dev/null and b/docs/assets/env/winversion.png differ diff --git a/docs/assets/env/winversion_2.png b/docs/assets/env/winversion_2.png new file mode 100644 index 000000000..518fa23bc Binary files /dev/null and b/docs/assets/env/winversion_2.png differ diff --git a/docs/assets/env/wsl_passward.png b/docs/assets/env/wsl_passward.png new file mode 100644 index 000000000..57ac9713d Binary files /dev/null and b/docs/assets/env/wsl_passward.png differ diff --git a/docs/assets/js/lang-redirect.js b/docs/assets/js/lang-redirect.js new file mode 100644 index 000000000..febc999b7 --- /dev/null +++ b/docs/assets/js/lang-redirect.js @@ -0,0 +1,42 @@ +document.addEventListener("DOMContentLoaded", function() { + console.log("[i18n] Language redirect initialized"); + + const currentUrl = new URL(window.location.href); + const currentPath = currentUrl.pathname; + const currentHash = currentUrl.hash; // #xxxx + const currentSearch = currentUrl.search; // ?key=value + + const currentLang = currentPath.startsWith('/zh-cn/') ? 'zh' : + currentPath.startsWith('/en/') ? 
'en' : + 'default'; + + document.querySelectorAll('a[lang], a[hreflang]').forEach(link => { + const targetLang = link.getAttribute('lang') || link.getAttribute('hreflang'); + + if (targetLang === currentLang) { + link.addEventListener('click', (e) => { + e.preventDefault(); + console.log(`[i18n] Blocked redundant switch to ${targetLang}`); + }); + return; + } + + const newUrl = new URL(link.href, window.location.origin); + + if (currentPath.startsWith('/zh-cn/')) { + newUrl.pathname = currentPath.replace('/zh-cn/', '/en/'); + } + else if (currentPath.startsWith('/en/')) { + newUrl.pathname = currentPath.replace('/en/', '/zh-cn/'); + } + else { + newUrl.pathname = targetLang === 'en' ? '/en/' : '/zh-cn/'; + } + + newUrl.search = currentSearch; + newUrl.hash = currentHash; + + link.href = newUrl.toString(); + console.log(`[i18n] Converted to: ${newUrl}`); + }); +}); diff --git a/docs/en/API Reference/common.md b/docs/en/API Reference/common.md index 3df31ac99..8ea2b35c5 100644 --- a/docs/en/API Reference/common.md +++ b/docs/en/API Reference/common.md @@ -4,8 +4,23 @@ options: heading_level: 3 +::: lazyllm.common.registry.LazyDict + options: + heading_level: 3 + members: [remove, set_default] + --- +::: lazyllm.common.common.ResultCollector + members: + - keys + - items + exclude-members: + +::: lazyllm.common.common.EnvVarContextManager + members: + exclude-members: + ## Bind ::: lazyllm.common.bind @@ -20,6 +35,12 @@ options: heading_level: 3 +## Identity + +::: lazyllm.common.Identity + options: + heading_level: 3 + --- ## Compilation @@ -33,4 +54,53 @@ ::: lazyllm.common.FileSystemQueue members: enqueue, dequeue, peek, size, clear exclude-members: + +::: lazyllm.common.ReadOnlyWrapper + members: set, isNone + exclude-members: + +::: lazyllm.common.queue.RedisQueue + members: + exclude-members: + +::: lazyllm.common.CaseInsensitiveDict + members: + exclude-members: + +::: lazyllm.common.ProcessPoolExecutor + members: submit + exclude-members: + +## Multiprocessing 
+ +::: lazyllm.common.ForkProcess + members: work, start + exclude-members: + +## Options + +::: lazyllm.common.Option + members: + exclude-members: + +::: lazyllm.common.multiprocessing.SpawnProcess + members: start + exclude-members: + +::: lazyllm.common.queue.SQLiteQueue + options: + heading_level: 3 + +## Threading + +::: lazyllm.common.Thread + members: work, get_result + exclude-members: + + +## LazyLLMCMD + +::: lazyllm.common.LazyLLMCMD + members: with_cmd, get_args + exclude-members: \ No newline at end of file diff --git a/docs/en/API Reference/components.md b/docs/en/API Reference/components.md index cff6b2c84..fe90b9ac0 100644 --- a/docs/en/API Reference/components.md +++ b/docs/en/API Reference/components.md @@ -12,6 +12,14 @@ options: heading_level: 3 +::: lazyllm.components.deploy.LazyLLMDeployBase + options: + heading_level: 3 + +::: lazyllm.components.deploy.LazyLLMDeployBase.extract_result + options: + heading_level: 3 + ::: lazyllm.components.finetune.FlagembeddingFinetune options: heading_level: 3 @@ -29,6 +37,7 @@ ::: lazyllm.components.deploy.Lightllm options: heading_level: 3 + members: [cmd, geturl, extract_result] ::: lazyllm.components.deploy.Vllm options: @@ -37,44 +46,49 @@ ::: lazyllm.components.deploy.LMDeploy options: heading_level: 3 + members: [cmd, geturl, extract_result] -::: lazyllm.components.auto.AutoDeploy +::: lazyllm.components.deploy.base.DummyDeploy options: heading_level: 3 -::: lazyllm.components.deploy.EmbeddingDeploy +::: lazyllm.components.auto.AutoDeploy options: heading_level: 3 -::: lazyllm.components.deploy.embed.RerankDeploy +::: lazyllm.components.deploy.embed.AbstractEmbedding options: heading_level: 3 -::: lazyllm.components.deploy.Mindie +::: lazyllm.components.deploy.EmbeddingDeploy options: heading_level: 3 -::: lazyllm.components.deploy.OCRDeploy + +::: lazyllm.components.deploy.embed.RerankDeploy options: heading_level: 3 ---- -## Launcher +::: lazyllm.components.deploy.embed.LazyHuggingFaceRerank + 
options: + heading_level: 3 + members: [load_reranker, rebuild] -::: lazyllm.launcher.EmptyLauncher +::: lazyllm.components.deploy.Mindie options: heading_level: 3 -::: lazyllm.launcher.RemoteLauncher + +::: lazyllm.components.deploy.OCRDeploy options: heading_level: 3 +--- -::: lazyllm.launcher.SlurmLauncher +::: lazyllm.components.deploy.relay.base.RelayServer options: heading_level: 3 - filters: - - '!get_idle' + members: [cmd, geturl] -::: lazyllm.launcher.ScoLauncher +::: lazyllm.components.deploy.OCRDeploy options: heading_level: 3 @@ -83,6 +97,22 @@ ## Prompter ::: lazyllm.components.prompter.LazyLLMPrompterBase + options: + heading_level: 3 + inherited_members: + - generate_prompt + - get_response + members: [pre_hook] + +::: lazyllm.components.prompter.EmptyPrompter + options: + heading_level: 3 + members: true + +::: lazyllm.components.Prompter + options: + heading_level: 3 + members: [from_dict, from_template, from_file, empty, generate_prompt, get_response] options: heading_level: 3 inherited_members: @@ -118,11 +148,45 @@ --- -## Register +## MultiModal -::: lazyllm.common.Register +### Text to Image + +::: lazyllm.components.StableDiffusionDeploy options: - heading_level: 3 + heading_level: 4 + +### Visual Question Answering + +Reference [LMDeploy][lazyllm.components.deploy.LMDeploy], which supports the Visual Question Answering model. 
+ +### Text to Sound + +::: lazyllm.components.TTSDeploy + options: + heading_level: 4 + +::: lazyllm.components.ChatTTSDeploy + options: + heading_level: 4 + +::: lazyllm.components.BarkDeploy + options: + heading_level: 4 + +::: lazyllm.components.MusicGenDeploy + options: + heading_level: 4 + +### Speech to Text + +::: lazyllm.components.SenseVoiceDeploy + options: + heading_level: 4 + +::: lazyllm.components.deploy.speech_to_text.sense_voice.SenseVoice + options: + heading_level: 4 --- @@ -168,48 +232,47 @@ options: heading_level: 3 -::: lazyllm.components.JsonFormatter +::: lazyllm.components.formatter.formatterbase.JsonLikeFormatter options: heading_level: 3 -::: lazyllm.components.EmptyFormatter +::: lazyllm.components.formatter.formatterbase.PythonFormatter options: heading_level: 3 ---- - -## MultiModal - -### Text to Image - -::: lazyllm.components.StableDiffusionDeploy +::: lazyllm.components.formatter.FileFormatter options: - heading_level: 4 - -### Visual Question Answering + heading_level: 3 -Reference [LMDeploy][lazyllm.components.deploy.LMDeploy], which supports the Visual Question Answering model. 
+::: lazyllm.components.formatter.YamlFormatter + options: + heading_level: 3 -### Text to Sound +::: lazyllm.components.formatter.encode_query_with_filepaths + options: + heading_level: 3 -::: lazyllm.components.TTSDeploy +::: lazyllm.components.formatter.decode_query_with_filepaths options: - heading_level: 4 + heading_level: 3 -::: lazyllm.components.ChatTTSDeploy +::: lazyllm.components.formatter.lazyllm_merge_query options: - heading_level: 4 + heading_level: 3 -::: lazyllm.components.BarkDeploy +::: lazyllm.components.JsonFormatter options: - heading_level: 4 + heading_level: 3 -::: lazyllm.components.MusicGenDeploy +::: lazyllm.components.EmptyFormatter options: - heading_level: 4 + heading_level: 3 -### Speech to Text +--- -::: lazyllm.components.SenseVoiceDeploy +## ComponentBase + +::: lazyllm.components.core.ComponentBase options: - heading_level: 4 + heading_level: 3 + members: [apply, cmd] diff --git a/docs/en/API Reference/configs.md b/docs/en/API Reference/configs.md index 7f92ce611..74c3db8dd 100644 --- a/docs/en/API Reference/configs.md +++ b/docs/en/API Reference/configs.md @@ -4,4 +4,7 @@ - done - getenv - add - - get_all_configs \ No newline at end of file + - get_all_configs + - get_config + - temp + - refresh \ No newline at end of file diff --git a/docs/en/API Reference/flow.md b/docs/en/API Reference/flow.md index f21ceeaf3..3eb015de1 100644 --- a/docs/en/API Reference/flow.md +++ b/docs/en/API Reference/flow.md @@ -1,19 +1,26 @@ ::: lazyllm.flow.FlowBase - members: is_root, ancestor, for_each + members: is_root, ancestor, for_each, id exclude-members: ::: lazyllm.flow.LazyLLMFlowsBase members: + - register_hook + - unregister_hook + - clear_hooks + - set_sync + - wait + - invoke + - bind exclude-members: ::: lazyllm.flow.Pipeline - members: + members: output exclude-members: ::: lazyllm.flow.save_pipeline_result ::: lazyllm.flow.Parallel - members: + members: join, sequential exclude-members: ::: lazyllm.flow.Diverter diff --git 
a/docs/en/API Reference/hook.md b/docs/en/API Reference/hook.md new file mode 100644 index 000000000..d84ad89da --- /dev/null +++ b/docs/en/API Reference/hook.md @@ -0,0 +1,3 @@ +::: lazyllm.hook.LazyLLMHook + members: pre_hook, post_hook, report + exclude-members: \ No newline at end of file diff --git a/docs/en/API Reference/launcher.md b/docs/en/API Reference/launcher.md new file mode 100644 index 000000000..7a4a72dd9 --- /dev/null +++ b/docs/en/API Reference/launcher.md @@ -0,0 +1,35 @@ +::: lazyllm.LazyLLMLaunchersBase + options: + members: + - makejob + - launch + - cleanup + - wait + - clone + +::: lazyllm.launcher.EmptyLauncher + options: + heading_level: 3 + +::: lazyllm.launcher.RemoteLauncher + options: + heading_level: 3 + +::: lazyllm.launcher.SlurmLauncher + options: + heading_level: 3 + filters: + - '!get_idle' + +::: lazyllm.launcher.ScoLauncher + options: + heading_level: 3 + +::: lazyllm.launcher.Job + options: + heading_level: 3 + +::: lazyllm.launcher.K8sLauncher + options: + heading_level: 3 + members: [makejob, launch] \ No newline at end of file diff --git a/docs/en/API Reference/module.md b/docs/en/API Reference/module.md index 48e6c9ca4..cc30b5627 100644 --- a/docs/en/API Reference/module.md +++ b/docs/en/API Reference/module.md @@ -10,7 +10,14 @@ - start - restart - update - + +::: lazyllm.module.servermodule.LLMBase + options: + members: + - prompt + - formatter + - share + ::: lazyllm.module.ActionModule options: members: @@ -46,17 +53,25 @@ members: ::: lazyllm.module.TrialModule - members: start + members: [start] exclude-members: ::: lazyllm.module.OnlineChatModule members: exclude-members: +::: lazyllm.module.llms.onlinemodule.supplier.doubao.DoubaoModule + members: + exclude-members: + ::: lazyllm.module.OnlineEmbeddingModule members: exclude-members: +::: lazyllm.module.llms.onlinemodule.supplier.openai.OpenAIEmbedding + members: + exclude-members: + ::: lazyllm.module.OnlineChatModuleBase options: members: @@ -66,3 +81,12 @@ ::: 
lazyllm.module.OnlineEmbeddingModuleBase members: exclude-members: forward + +::: lazyllm.module.llms.onlinemodule.supplier.doubao.DoubaoEmbedding + options: + members: + +::: lazyllm.module.llms.onlinemodule.fileHandler.FileHandlerBase + members: get_finetune_data + exclude-members: + diff --git a/docs/en/API Reference/tools.md b/docs/en/API Reference/tools.md index 0e6db035b..e7c3e1c92 100644 --- a/docs/en/API Reference/tools.md +++ b/docs/en/API Reference/tools.md @@ -1,15 +1,13 @@ -::: lazyllm.tools.Document +::: lazyllm.tools.IntentClassifier members: + - intent_promt_hook + - post_process_result exclude-members: -::: lazyllm.tools.rag.store.ChromadbStore +::: lazyllm.tools.Document members: exclude-members: -::: lazyllm.tools.rag.store.MilvusStore - members: - exclude-members: - ::: lazyllm.tools.rag.store.ChromadbStore members: exclude-members: @@ -20,135 +18,132 @@ ::: lazyllm.tools.rag.readers.ReaderBase members: - exclude-members: + exclude-members: -::: lazyllm.tools.rag.component.bm25.BM25 +::: lazyllm.tools.rag.readers.PandasCSVReader members: - exclude-members: + exclude-members: -::: lazyllm.tools.rag.doc_to_db.DocInfoSchemaItem +::: lazyllm.tools.rag.readers.PandasExcelReader members: - exclude-members: + exclude-members: -::: lazyllm.tools.rag.doc_to_db.DocGenreAnalyser +::: lazyllm.tools.rag.readers.PDFReader members: - exclude-members: + exclude-members: -::: lazyllm.tools.rag.doc_to_db.DocInfoSchemaAnalyser +::: lazyllm.tools.rag.readers.PPTXReader members: - exclude-members: + exclude-members: -::: lazyllm.tools.rag.doc_to_db.DocInfoExtractor +::: lazyllm.tools.rag.readers.VideoAudioReader members: - exclude-members: + exclude-members: -::: lazyllm.tools.rag.doc_to_db.DocToDbProcessor +::: lazyllm.tools.SqlManager members: - - extract_info_from_docs - - analyze_info_schema_by_llm - exclude-members: - -::: lazyllm.tools.rag.doc_to_db.extract_db_schema_from_files - -::: lazyllm.tools.rag.readers.DocxReader - members: - exclude-members: - -::: 
lazyllm.tools.rag.readers.EpubReader - members: - exclude-members: - -::: lazyllm.tools.rag.readers.HWPReader - members: - exclude-members: - -::: lazyllm.tools.rag.readers.ImageReader - members: - exclude-members: - -::: lazyllm.tools.rag.readers.IPYNBReader - members: - exclude-members: - -::: lazyllm.tools.rag.readers.MagicPDFReader - members: - exclude-members: + - get_session + - check_connection + - set_desc + - get_all_tables + - get_table_orm_class + - execute_commit + - execute_query + - create_table + - drop_table + - insert_values + exclude-members: -::: lazyllm.tools.rag.readers.MarkdownReader +::: lazyllm.tools.Reranker members: - - remove_images - - remove_hyperlinks - exclude-members: + exclude-members: -::: lazyllm.tools.rag.readers.MboxReader +::: lazyllm.tools.rag.readers.readerBase.LazyLLMReaderBase members: exclude-members: -::: lazyllm.tools.rag.component.bm25.BM25 +::: lazyllm.tools.rag.component.bm25 members: - exclude-members: + exclude-members: ::: lazyllm.tools.rag.doc_to_db.DocInfoSchemaItem members: - exclude-members: + exclude-members: ::: lazyllm.tools.rag.doc_to_db.DocGenreAnalyser members: - exclude-members: + exclude-members: ::: lazyllm.tools.rag.doc_to_db.DocInfoSchemaAnalyser - members: + members: analyse_info_schema exclude-members: ::: lazyllm.tools.rag.doc_to_db.DocInfoExtractor - members: + members: extract_doc_info exclude-members: +::: lazyllm.tools.rag.doc_to_db.DocInfoExtractor + members: + exclude-members: + ::: lazyllm.tools.rag.doc_to_db.DocToDbProcessor members: - extract_info_from_docs - analyze_info_schema_by_llm - exclude-members: + exclude-members: ::: lazyllm.tools.rag.doc_to_db.extract_db_schema_from_files ::: lazyllm.tools.rag.readers.DocxReader members: - exclude-members: + exclude-members: ::: lazyllm.tools.rag.readers.EpubReader members: - exclude-members: + exclude-members: ::: lazyllm.tools.rag.readers.HWPReader members: - exclude-members: + exclude-members: ::: lazyllm.tools.rag.readers.ImageReader 
members: - exclude-members: + exclude-members: ::: lazyllm.tools.rag.readers.IPYNBReader members: - exclude-members: + exclude-members: -::: lazyllm.tools.rag.readers.MagicPDFReader +::: lazyllm.tools.rag.readers.MineruPDFReader members: - exclude-members: + exclude-members: ::: lazyllm.tools.rag.readers.MarkdownReader members: - remove_images - remove_hyperlinks - exclude-members: + exclude-members: ::: lazyllm.tools.rag.readers.MboxReader members: exclude-members: +::: lazyllm.tools.SqlCall + members: + - sql_query_promt_hook + - sql_explain_prompt_hook + - extract_sql_from_response + exclude-members: + +::: lazyllm.tools.rag.default_index.DefaultIndex + members: + - update + - remove + - query + exclude-members: + ::: lazyllm.tools.Reranker - members: register_reranker - members: register_reranker + members: [register_reranker] exclude-members: forward ::: lazyllm.tools.Retriever @@ -156,17 +151,46 @@ exclude-members: forward ::: lazyllm.tools.rag.retriever.TempDocRetriever - members: + members: [create_node_group, add_subretriever] exclude-members: -::: lazyllm.tools.rag.retriever.TempDocRetriever - members: +::: lazyllm.tools.rag.retriever.UrlDocument + members: [find] exclude-members: ::: lazyllm.tools.rag.DocManager members: exclude-members: +::: lazyllm.tools.rag.utils.SqliteDocListManager + members: + - table_inited + - get_status_cond_and_params + - validate_paths + - update_need_reparsing + - list_files + - get_docs + - set_docs_new_meta + - fetch_docs_changed_meta + - list_all_kb_group + - add_kb_group + - list_kb_group_files + - delete_unreferenced_doc + - get_docs_need_reparse + - get_existing_paths_by_pattern + - update_file_message + - update_file_status + - add_files_to_kb_group + - delete_files_from_kb_group + - get_file_status + - update_kb_group + - release + exclude-members: + +::: lazyllm.tools.rag.data_loaders.DirectoryReader + members: load_data + exclude-members: + ::: lazyllm.tools.SentenceSplitter members: exclude-members: @@ -191,45 
+215,34 @@ lazyllm.tools.rag.transform.NodeTransform members: exclude-members: -::: lazyllm.tools.rag.dataReader.SimpleDirectoryReader - members: - exclude-members: - -::: lazyllm.tools.rag.dataReader.FileReader - members: +::: lazyllm.tools.rag.doc_processor.DocumentProcessor + members: register_algorithm, drop_algorithm +::: lazyllm.tools.rag.doc_node.QADocNode + members: get_text exclude-members: -lazyllm.tools.rag.transform.NodeTransform - members: - exclude-members: - -::: lazyllm.tools.rag.transform.TransformArgs - members: - exclude-members: - -::: lazyllm.tools.rag.similarity.register_similarity +::: lazyllm.tools.rag.dataReader.SimpleDirectoryReader members: exclude-members: -::: lazyllm.tools.rag.doc_node.DocNode +::: lazyllm.tools.rag.dataReader.FileReader members: exclude-members: -::: lazyllm.tools.rag.dataReader.SimpleDirectoryReader - members: +::: lazyllm.tools.rag.transform.FuncNodeTransform + members: transform exclude-members: -::: lazyllm.tools.rag.dataReader.FileReader +::: lazyllm.tools.rag.web.DocWebModule members: - exclude-members: - + exclude-members: ::: lazyllm.tools.WebModule members: exclude-members: forward ::: lazyllm.tools.CodeGenerator - members: + members: [choose_prompt] exclude-members: forward ::: lazyllm.tools.ParameterExtractor @@ -237,7 +250,7 @@ lazyllm.tools.rag.transform.NodeTransform exclude-members: forward ::: lazyllm.tools.QustionRewrite - members: + members: choose_prompt exclude-members: forward ::: lazyllm.tools.agent.toolsManager.ToolManager @@ -272,8 +285,20 @@ lazyllm.tools.rag.transform.NodeTransform members: exclude-members: forward -::: lazyllm.tools.IntentClassifier - members: +::: lazyllm.tools.rag.smart_embedding_index.SmartEmbeddingIndex + members: update, remove, query + exclude-members: + +::: lazyllm.tools.rag.doc_node.ImageDocNode + members: do_embedding, get_content, get_text + exclude-members: + +::: lazyllm.tools.rag.transform.AdaptiveTransform + members: transform + exclude-members: + +::: 
lazyllm.tools.rag.rerank.ModuleReranker + members: forward exclude-members: ::: lazyllm.tools.rag.utils.DocListManager members: @@ -281,6 +306,19 @@ lazyllm.tools.rag.transform.NodeTransform ::: lazyllm.tools.rag.global_metadata.GlobalMetadataDesc members: exclude-members: + +::: lazyllm.tools.rag.IndexBase.update + members: + exclude-members: + +::: lazyllm.tools.rag.IndexBase.remove + members: + exclude-members: + +::: lazyllm.tools.rag.IndexBase.query + members: + exclude-members: + ::: lazyllm.tools.rag.index_base.IndexBase members: @@ -317,61 +355,36 @@ lazyllm.tools.rag.transform.NodeTransform exclude-members: ::: lazyllm.tools.DBManager - members: + members: execute_query exclude-members: ::: lazyllm.tools.MongoDBManager members: exclude-members: -::: lazyllm.tools.rag.utils.DocListManager - members: - exclude-members: -::: lazyllm.tools.rag.global_metadata.GlobalMetadataDesc - members: - exclude-members: -::: lazyllm.tools.rag.index_base.IndexBase - members: -::: lazyllm.tools.BaseEvaluator - members: - exclude-members: - -::: lazyllm.tools.ResponseRelevancy - members: - exclude-members: - -::: lazyllm.tools.Faithfulness - members: - exclude-members: - -::: lazyllm.tools.LLMContextRecall - members: - exclude-members: - -::: lazyllm.tools.NonLLMContextRecall +::: lazyllm.tools.HttpTool members: exclude-members: -::: lazyllm.tools.ContextRelevance +::: lazyllm.tools.agent.functionCall.StreamResponse members: exclude-members: -::: lazyllm.tools.HttpRequest - members: +::: lazyllm.tools.MCPClient + members: [call_tool, list_tools, get_tools, aget_tools, deploy] exclude-members: -::: lazyllm.tools.JobDescription - members: - exclude-members: +::: lazyllm.tools.tools.GoogleSearch + members: forward -::: lazyllm.tools.DBManager +::: lazyllm.tools.tools.tencent_search.TencentSearch members: exclude-members: -::: lazyllm.tools.MongoDBManager +::: lazyllm.tools.rag.web.WebUi members: exclude-members: -::: lazyllm.tools.HttpTool - members: +::: 
lazyllm.tools.http_request.http_executor_response.HttpExecutorResponse + members: extract_file, get_content_type exclude-members: \ No newline at end of file diff --git a/docs/en/Home/environment.md b/docs/en/Home/environment.md index c0023ef14..64bc6cf0d 100644 --- a/docs/en/Home/environment.md +++ b/docs/en/Home/environment.md @@ -20,3 +20,143 @@ - gradio_client: The Gradio client library allows users to load and use Gradio interfaces from a remote server. - protobuf: Google's Protocol Buffers Python implementation, used for serializing structured data. - setuptools: A Python package installation and distribution tool, used for packaging and distributing Python applications and libraries. + + +## Install on Different Operating Systems + +### Windows + +#### Step 1: Install Git +Download and install Git for Windows from: +https://github.com/git-for-windows/git/releases/download/v2.50.1.windows.1/Git-2.50.1-64-bit.exe + +#### Step 2: Install Python +Download page: https://python.p2hp.com/downloads/ +Recommended version: Python 3.10.9 +1. Download the installer for that version. During installation, choose "Customize installation" so that you can set the installation path, and check "Add to PATH" at the bottom of the window. +!!! Note + If Python is already installed, you can choose "Uninstall" to remove it and then reinstall. + +![install_python](../assets/env/install_python.png) + +2. Customize the installation path, for example D:\Python\Python310. + +![set_python_install_path](../assets/env/set_python_install_path.png) + +#### Step 3: Install and Use VS Code +1. Download and install VS Code. +2. Install the Python extensions. + +![vscode_extensions](../assets/env/vscode_extensions.png) + +3. Open any Python file in VS Code, then select the Python interpreter from the status bar at the bottom of the window. + +![vscode_interpret](../assets/env/vscode_interpret.png) + +4.
VS Code detects installed interpreters automatically, so you can pick one from the list; alternatively, enter the path D:\Python\Python310\python.exe manually. + +![vscode_interpret_manual](../assets/env/vscode_interpret_manual.png) + +5. Select Git Bash as the terminal to get a Linux-like command-line environment. + +![git_bash](../assets/env/git_bash.png) + +#### Step 4: Install LazyLLM +1. Install lazyllm from the terminal: +```bash +pip install lazyllm +``` + +2. Set the API keys as environment variables. + +In PowerShell, set them as follows: +```powershell +$env:LAZYLLM_SENSENOVA_API_KEY = "7ACAxxxxxxxxxxxxxxx" +$env:LAZYLLM_SENSENOVA_SECRET_KEY = "2B0F7xxxxxxxxxxxxxxxx" +``` + +In Bash, set them as follows: +```bash +export LAZYLLM_SENSENOVA_API_KEY="7ACACxxxxxxxxxxxxxxx" +export LAZYLLM_SENSENOVA_SECRET_KEY="2B0F72xxxxxxxxxxxxxx" +``` + +### Windows with WSL + +#### Prerequisites +1. Check the Windows build number: press Win + R and run "winver". WSL 2 requires build 19041 or higher; otherwise, update Windows first. + +![winversion](../assets/env/winversion.png) +![winversion2](../assets/env/winversion_2.png) + +2. Open Task Manager and, on the Performance tab, confirm that CPU virtualization is enabled. + +![virtualize](../assets/env/virtualize.png) +![virtualize2](../assets/env/virtualize_2.png) + +If it is not enabled, enable it and restart your computer. + +![virtualize3](../assets/env/virtualize_3.png) +![virtualize4](../assets/env/virtualize_4.png) + +#### Download the WSL 2 Kernel Update Package +Download the WSL 2 Linux kernel update package from https://aka.ms/wsl2kernel +After downloading, run the file to install it. + +#### Install a Linux Distribution +1. Open PowerShell as Administrator, then list the Linux distributions available from the online store: +```powershell +PS C:\Users\name> wsl --list --online +The following is a list of valid distributions that can be installed. +Use "wsl --install -d " to install.
+ +NAME FRIENDLY NAME +Ubuntu Ubuntu +Debian Debian GNU/Linux +kali-linux Kali Linux Rolling +Ubuntu-18.04 Ubuntu 18.04 LTS +Ubuntu-20.04 Ubuntu 20.04 LTS +Ubuntu-22.04 Ubuntu 22.04 LTS +Ubuntu-24.04 Ubuntu 24.04 LTS +OracleLinux_7_9 Oracle Linux 7.9 +OracleLinux_8_10 Oracle Linux 8.10 +OracleLinux_9_5 Oracle Linux 9.5 +openSUSE-Leap-15.6 openSUSE Leap 15.6 +SUSE-Linux-Enterprise-15-SP6 SUSE Linux Enterprise 15 SP6 +openSUSE-Tumbleweed openSUSE Tumbleweed +``` + +2. List the installed distributions (there are none by default): +```powershell +PS C:\Users\name> wsl --list --verbose +No distributions have been installed for the Windows Subsystem for Linux. +You can install distributions by visiting the Microsoft Store: +https://aka.ms/wslstore +``` + +3. Install the chosen distribution, for example Ubuntu 22.04: +```powershell +PS C:\Users\name> wsl --install -d Ubuntu-22.04 +Installing: Ubuntu 22.04 LTS +[= 3.0% +``` + +4. When installation finishes, you will be prompted to create a username and password. + +![password](../assets/env/wsl_passward.png) + +5. Map the WSL file system as a network drive +Press Win + R and enter \\wsl$ +Right-click the Ubuntu folder and click "Map network drive" to add it to This PC. Note that the drive can only be opened while Ubuntu is running. + +![map](../assets/env/map.png) + +#### Using WSL in VS Code +1. Install the WSL extension. +2. Open a WSL terminal. +3.
Install Python and lazyllm inside WSL. + +#### Using the Local Command Line +Search for "WSL" in Windows search and open it to enter the Linux subsystem. + +### macOS diff --git a/docs/gen_mkdocs_yaml.py b/docs/gen_mkdocs_yaml.py index efe3e3915..d6e9b1be0 100644 --- a/docs/gen_mkdocs_yaml.py +++ b/docs/gen_mkdocs_yaml.py @@ -1,15 +1,18 @@ import os +import yaml -language = os.getenv('LAZYLLM_LANGUAGE', 'ENGLISH') +language = os.getenv('LAZYLLM_LANGUAGE', 'ENGLISH').upper() assert language in ('ENGLISH', 'CHINESE') with open(os.path.join(os.path.dirname(os.path.abspath(__file__)), 'mkdocs.template.yml')) as f: - content = f.read() + config = yaml.safe_load(f) doc_dir = 'en' if language == 'ENGLISH' else 'zh' -en_default = 'true' if language == 'ENGLISH' else 'false' -zh_default = 'true' if language == 'CHINESE' else 'false' -content = content.format(doc_dir=doc_dir, en_default=en_default, zh_default=zh_default) +config['docs_dir'] = f'docs/{doc_dir}' -with open(os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), 'mkdocs.yml'), 'w+') as f: - f.write(content) +nav_file = 'nav_en.yml' if language == 'ENGLISH' else 'nav_zh.yml' +with open(os.path.join(os.path.dirname(os.path.abspath(__file__)), nav_file)) as f: + config['nav'] = yaml.safe_load(f) + +with open(os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), 'mkdocs.yml'), 'w') as f: + yaml.dump(config, f, allow_unicode=True, sort_keys=False) diff --git a/docs/mkdocs.template.yml b/docs/mkdocs.template.yml index 605c889b6..079252bf3 100644 --- a/docs/mkdocs.template.yml +++ b/docs/mkdocs.template.yml @@ -2,59 +2,6 @@ site_name: LazyLLM repo_url: https://github.com/LazyAGI/LazyLLM repo_name: LazyAGI/LazyLLM docs_dir: docs/{doc_dir} -nav: -- Home: - - Getting Started: index.md - - FAQ: Home/FAQ.md - - Environment: Home/environment.md - - Supported Models: Home/model_list.md -- Cookbook: - - Chatbot: Cookbook/robot.md - - Painting
Cookbook/multimodal_robot.md - - Great Writer: Cookbook/great_writer.md - - RAG: Cookbook/rag.md - - Streaming: Cookbook/streaming.md -- Best Practice: - - Flow: Best Practice/flow.md - - Flowapp: Best Practice/flowapp.md - - Module: Best Practice/module.md - - Prompt: Best Practice/prompt.md - - Rag: Best Practice/rag.md - - FunctionCall: Best Practice/functionCall.md - - Stream: Best Practice/stream.md -- Advanced Topics: - - Contribution: Advanced Topics/contribution.md - - Changelog: Advanced Topics/changelog.md -- Api Reference: - - Cli: API Reference/cli.md - - Common: API Reference/common.md - - Components: API Reference/components.md - - Configs: API Reference/configs.md - - Flow: API Reference/flow.md - - Module: API Reference/module.md - - Tools: API Reference/tools.md -- Tutorials: - - Overview: Tutorial/index.md - - Lesson 1: Tutorial/1.md - - Lesson 2: Tutorial/2.md - - Lesson 3: Tutorial/3.md - - Lesson 4: Tutorial/4.md - - Lesson 5: Tutorial/5.md - - Lesson 6: Tutorial/6.md - - Lesson 7: Tutorial/7.md - - Lesson 8: Tutorial/8.md - - Lesson 9: Tutorial/9.md - - Lesson 10: Tutorial/10.md - - Lesson 11: Tutorial/11.md - - Lesson 12: Tutorial/12.md - - Lesson 13: Tutorial/13.md - - Lesson 14: Tutorial/14.md - - Lesson 15: Tutorial/15.md - - Lesson 16: Tutorial/16.md - - Lesson 17: Tutorial/17.md - - Lesson 18: Tutorial/18.md - - Lesson 19: Tutorial/19.md theme: language: en name: material @@ -109,6 +56,7 @@ extra: link: https://github.com/LazyAGI/LazyLLM extra_javascript: - 'assets/js/assistant.js' + - 'assets/js/lang-redirect.js' - 'https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js' plugins: - search: diff --git a/docs/nav_en.yml b/docs/nav_en.yml new file mode 100644 index 000000000..e8df2b178 --- /dev/null +++ b/docs/nav_en.yml @@ -0,0 +1,54 @@ +- Home: + - Getting Started: index.md + - FAQ: Home/FAQ.md + - Environment: Home/environment.md + - Supported Models: Home/model_list.md +- Cookbook: + - Chatbot: Cookbook/robot.md + - Painting 
Master: Cookbook/painting_master.md + - Multimodal Chatbot: Cookbook/multimodal_robot.md + - Great Writer: Cookbook/great_writer.md + - RAG: Cookbook/rag.md + - Streaming: Cookbook/streaming.md +- Best Practice: + - Flow: Best Practice/flow.md + - Flowapp: Best Practice/flowapp.md + - Module: Best Practice/module.md + - Prompt: Best Practice/prompt.md + - Rag: Best Practice/rag.md + - FunctionCall: Best Practice/functionCall.md + - Stream: Best Practice/stream.md +- Advanced Topics: + - Contribution: Advanced Topics/contribution.md + - Changelog: Advanced Topics/changelog.md +- Api Reference: + - CLI: API Reference/cli.md + - Common: API Reference/common.md + - Components: API Reference/components.md + - Flow: API Reference/flow.md + - Module: API Reference/module.md + - Tools: API Reference/tools.md + - Configs: API Reference/configs.md + - Launcher: API Reference/launcher.md + - Hook: API Reference/hook.md +- Tutorials: + - Overview: Tutorial/index.md + # - 1. RAG Fundamentals: Tutorial/1.md + # - 2. Quickstart with RAG: Tutorial/2.md + # - 3. Mastering LLM with LazyLLM: Tutorial/3.md + # - 4. Engineering Basics: Tutorial/4.md + # - 5. Custom Document Readers: Tutorial/5.md + # - 6. Retrieval Optimization: Tutorial/6.md + # - 7. Hands-on Retrieval Tuning: Tutorial/7.md + # - 8. Custom Retrieval Strategies: Tutorial/8.md + # - 9. Domain-Specific Fine-Tuning: Tutorial/9.md + # - 10. Deepseek Integration: Tutorial/10.md + # - 11. Performance Optimization: Tutorial/11.md + # - 12. Speed-Up Techniques: Tutorial/12.md + # - 13. Multimodal RAG: Tutorial/13.md + # - 14. Academic Paper QA: Tutorial/14.md + # - 15. Statistical RAG: Tutorial/15.md + # - 16. Advanced Paper QA: Tutorial/16.md + # - 17. Enterprise RAG Solutions: Tutorial/17.md + # - 18. Agentic RAG: Tutorial/18.md + # - 19. 
Knowledge Graph RAG: Tutorial/19.md diff --git a/docs/nav_zh.yml b/docs/nav_zh.yml new file mode 100644 index 000000000..167380f22 --- /dev/null +++ b/docs/nav_zh.yml @@ -0,0 +1,54 @@ +- 首页: + - 快速开始: index.md + - 常见问题: Home/FAQ.md + - 环境配置: Home/environment.md + - 支持模型: Home/model_list.md +- 使用示例: + - 聊天机器人: Cookbook/robot.md + - 绘画大师: Cookbook/painting_master.md + - 多模态聊天: Cookbook/multimodal_robot.md + - 写作大师: Cookbook/great_writer.md + - 检索增强: Cookbook/rag.md + - 流式输出: Cookbook/streaming.md +- 最佳实践: + - 工作流: Best Practice/flow.md + - 流程应用: Best Practice/flowapp.md + - 模块: Best Practice/module.md + - 提示词: Best Practice/prompt.md + - 检索增强: Best Practice/rag.md + - 函数调用: Best Practice/functionCall.md + - 流式处理: Best Practice/stream.md +- 高级主题: + - 贡献指南: Advanced Topics/contribution.md + - 更新日志: Advanced Topics/changelog.md +- API参考: + - 命令行: API Reference/cli.md + - 通用: API Reference/common.md + - 组件: API Reference/components.md + - 工作流: API Reference/flow.md + - 模块: API Reference/module.md + - 工具: API Reference/tools.md + - 配置: API Reference/configs.md + - 启动器: API Reference/launcher.md + - 钩子: API Reference/hook.md +- 教程: + - 概述: Tutorial/index.md + - 1. RAG原理解读: Tutorial/1.md + - 2. 快速上手RAG: Tutorial/2.md + - 3. Lazy玩转LLM: Tutorial/3.md + - 4. 工程化入门: Tutorial/4.md + - 5. 自定义Reader: Tutorial/5.md + - 6. 召回优化技巧: Tutorial/6.md + - 7. 召回优化实战: Tutorial/7.md + - 8. 自定义召回策略: Tutorial/8.md + - 9. 领域微调实践: Tutorial/9.md + - 10. Deepseek实战: Tutorial/10.md + - 11. 性能优化指南: Tutorial/11.md + - 12. 性能加速实践: Tutorial/12.md + - 13. 多模态RAG: Tutorial/13.md + - 14. 学术论文问答: Tutorial/14.md + - 15. RAG的统计问题: Tutorial/15.md + - 16. 论文问答进阶: Tutorial/16.md + - 17. 企业级RAG方案: Tutorial/17.md + - 18. Agentic RAG: Tutorial/18.md + - 19. 
知识图谱 RAG: Tutorial/19.md diff --git a/docs/scripts/lazynote/manager/base.py b/docs/scripts/lazynote/manager/base.py index 3e4dc4aeb..eb93fb922 100644 --- a/docs/scripts/lazynote/manager/base.py +++ b/docs/scripts/lazynote/manager/base.py @@ -137,7 +137,7 @@ def traverse(self, obj: object, skip_modules: Optional[List[str]] = None) -> Non skip_modules = [] if get_member_type(obj) == MemberType.PACKAGE: - for importer, modname, ispkg in pkgutil.walk_packages(obj.__path__, obj.__name__ + "."): + for _, modname, ispkg in pkgutil.walk_packages(obj.__path__, obj.__name__ + "."): if any(modname.startswith(skip_mod) for skip_mod in skip_modules): continue if ispkg: @@ -172,7 +172,7 @@ async def atraverse(self, obj: object, skip_modules: Optional[List[str]] = None, if get_member_type(obj) == MemberType.PACKAGE: tasks = [] - for importer, modname, ispkg in pkgutil.walk_packages(obj.__path__, obj.__name__ + "."): + for _, modname, ispkg in pkgutil.walk_packages(obj.__path__, obj.__name__ + "."): if any(modname.startswith(skip_mod) for skip_mod in skip_modules): continue if ispkg: diff --git a/docs/zh/API Reference/common.md b/docs/zh/API Reference/common.md index 3df31ac99..c18768b2c 100644 --- a/docs/zh/API Reference/common.md +++ b/docs/zh/API Reference/common.md @@ -4,8 +4,23 @@ options: heading_level: 3 +::: lazyllm.common.registry.LazyDict + options: + heading_level: 3 + members: [remove, set_default] + --- +::: lazyllm.common.common.ResultCollector + members: + - keys + - items + exclude-members: + +::: lazyllm.common.common.EnvVarContextManager + members: + exclude-members: + ## Bind ::: lazyllm.common.bind @@ -22,15 +37,88 @@ --- +## Identity + +::: lazyllm.common.Identity + options: + heading_level: 3 + +--- + ## Compilation ::: lazyllm.common.compile_func options: heading_level: 3 +--- + ## Queue ::: lazyllm.common.FileSystemQueue members: enqueue, dequeue, peek, size, clear exclude-members: - \ No newline at end of file + +::: 
lazyllm.common.multiprocessing.SpawnProcess + members: start + +::: lazyllm.common.queue.SQLiteQueue + options: + heading_level: 3 + +::: lazyllm.common.ReadOnlyWrapper + members: set, isNone + exclude-members: + +::: lazyllm.common.queue.RedisQueue + members: + exclude-members: + +--- + +## Multiprocessing + +::: lazyllm.common.ForkProcess + members: work, start + exclude-members: + +--- + +## Options + +::: lazyllm.common.Option + members: + exclude-members: + +--- + +## DynamicDescriptor + +::: lazyllm.common.DynamicDescriptor + members: + - Impl + exclude-members: + +::: lazyllm.common.CaseInsensitiveDict + members: + exclude-members: + +::: lazyllm.common.ProcessPoolExecutor + members: submit + exclude-members: + +--- + +## Threading + +::: lazyllm.common.Thread + members: work, get_result + exclude-members: + +--- + +## LazyLLMCMD + +::: lazyllm.common.LazyLLMCMD + members: with_cmd, get_args + exclude-members: diff --git a/docs/zh/API Reference/components.md b/docs/zh/API Reference/components.md index cff6b2c84..b87852208 100644 --- a/docs/zh/API Reference/components.md +++ b/docs/zh/API Reference/components.md @@ -12,6 +12,14 @@ options: heading_level: 3 +::: lazyllm.components.deploy.LazyLLMDeployBase + options: + heading_level: 3 + +::: lazyllm.components.deploy.LazyLLMDeployBase.extract_result + options: + heading_level: 3 + ::: lazyllm.components.finetune.FlagembeddingFinetune options: heading_level: 3 @@ -29,6 +37,7 @@ ::: lazyllm.components.deploy.Lightllm options: heading_level: 3 + members: [cmd, geturl, extract_result] ::: lazyllm.components.deploy.Vllm options: @@ -37,44 +46,49 @@ ::: lazyllm.components.deploy.LMDeploy options: heading_level: 3 + members: [cmd, geturl, extract_result] -::: lazyllm.components.auto.AutoDeploy +::: lazyllm.components.deploy.base.DummyDeploy options: heading_level: 3 -::: lazyllm.components.deploy.EmbeddingDeploy +::: lazyllm.components.auto.AutoDeploy options: heading_level: 3 -::: 
lazyllm.components.deploy.embed.RerankDeploy +::: lazyllm.components.deploy.embed.AbstractEmbedding options: heading_level: 3 -::: lazyllm.components.deploy.Mindie +::: lazyllm.components.deploy.EmbeddingDeploy options: heading_level: 3 -::: lazyllm.components.deploy.OCRDeploy + +::: lazyllm.components.deploy.embed.RerankDeploy options: heading_level: 3 ---- -## Launcher +::: lazyllm.components.deploy.embed.LazyHuggingFaceRerank + options: + heading_level: 3 + members: [load_reranker, rebuild] -::: lazyllm.launcher.EmptyLauncher +::: lazyllm.components.deploy.Mindie options: heading_level: 3 -::: lazyllm.launcher.RemoteLauncher + +::: lazyllm.components.deploy.OCRDeploy options: heading_level: 3 +--- -::: lazyllm.launcher.SlurmLauncher +::: lazyllm.components.deploy.relay.base.RelayServer options: heading_level: 3 - filters: - - '!get_idle' + members: [cmd, geturl] -::: lazyllm.launcher.ScoLauncher +::: lazyllm.components.deploy.OCRDeploy options: heading_level: 3 @@ -83,12 +97,12 @@ ## Prompter ::: lazyllm.components.prompter.LazyLLMPrompterBase - options: - heading_level: 3 + options: + heading_level: 3 inherited_members: - generate_prompt - get_response - members: false + members: [pre_hook] ::: lazyllm.components.prompter.EmptyPrompter options: @@ -118,12 +132,45 @@ --- -## Register +## MultiModal + +### Text to Image -::: lazyllm.common.Register +::: lazyllm.components.StableDiffusionDeploy options: - heading_level: 3 + heading_level: 4 + +### Visual Question Answering + +Reference [LMDeploy][lazyllm.components.deploy.LMDeploy], which supports the Visual Question Answering model. 
+ +### Text to Sound + +::: lazyllm.components.TTSDeploy + options: + heading_level: 4 + +::: lazyllm.components.ChatTTSDeploy + options: + heading_level: 4 + +::: lazyllm.components.BarkDeploy + options: + heading_level: 4 +::: lazyllm.components.MusicGenDeploy + options: + heading_level: 4 + +### Speech to Text + +::: lazyllm.components.SenseVoiceDeploy + options: + heading_level: 4 + +::: lazyllm.components.deploy.speech_to_text.sense_voice.SenseVoice + options: + heading_level: 4 --- ## ModelManager @@ -178,38 +225,8 @@ --- -## MultiModal - -### Text to Image - -::: lazyllm.components.StableDiffusionDeploy +## ComponentBase +::: lazyllm.components.core.ComponentBase options: - heading_level: 4 - -### Visual Question Answering - -Reference [LMDeploy][lazyllm.components.deploy.LMDeploy], which supports the Visual Question Answering model. - -### Text to Sound - -::: lazyllm.components.TTSDeploy - options: - heading_level: 4 - -::: lazyllm.components.ChatTTSDeploy - options: - heading_level: 4 - -::: lazyllm.components.BarkDeploy - options: - heading_level: 4 - -::: lazyllm.components.MusicGenDeploy - options: - heading_level: 4 - -### Speech to Text - -::: lazyllm.components.SenseVoiceDeploy - options: - heading_level: 4 + heading_level: 3 + members: [apply, cmd] diff --git a/docs/zh/API Reference/configs.md b/docs/zh/API Reference/configs.md index 7f92ce611..74c3db8dd 100644 --- a/docs/zh/API Reference/configs.md +++ b/docs/zh/API Reference/configs.md @@ -4,4 +4,7 @@ - done - getenv - add - - get_all_configs \ No newline at end of file + - get_all_configs + - get_config + - temp + - refresh \ No newline at end of file diff --git a/docs/zh/API Reference/flow.md b/docs/zh/API Reference/flow.md index f21ceeaf3..3eb015de1 100644 --- a/docs/zh/API Reference/flow.md +++ b/docs/zh/API Reference/flow.md @@ -1,19 +1,26 @@ ::: lazyllm.flow.FlowBase - members: is_root, ancestor, for_each + members: is_root, ancestor, for_each, id exclude-members: ::: 
lazyllm.flow.LazyLLMFlowsBase members: + - register_hook + - unregister_hook + - clear_hooks + - set_sync + - wait + - invoke + - bind exclude-members: ::: lazyllm.flow.Pipeline - members: + members: output exclude-members: ::: lazyllm.flow.save_pipeline_result ::: lazyllm.flow.Parallel - members: + members: join, sequential exclude-members: ::: lazyllm.flow.Diverter diff --git a/docs/zh/API Reference/hook.md b/docs/zh/API Reference/hook.md new file mode 100644 index 000000000..d84ad89da --- /dev/null +++ b/docs/zh/API Reference/hook.md @@ -0,0 +1,3 @@ +::: lazyllm.hook.LazyLLMHook + members: pre_hook, post_hook, report + exclude-members: \ No newline at end of file diff --git a/docs/zh/API Reference/launcher.md b/docs/zh/API Reference/launcher.md new file mode 100644 index 000000000..7a4a72dd9 --- /dev/null +++ b/docs/zh/API Reference/launcher.md @@ -0,0 +1,35 @@ +::: lazyllm.LazyLLMLaunchersBase + options: + members: + - makejob + - launch + - cleanup + - wait + - clone + +::: lazyllm.launcher.EmptyLauncher + options: + heading_level: 3 + +::: lazyllm.launcher.RemoteLauncher + options: + heading_level: 3 + +::: lazyllm.launcher.SlurmLauncher + options: + heading_level: 3 + filters: + - '!get_idle' + +::: lazyllm.launcher.ScoLauncher + options: + heading_level: 3 + +::: lazyllm.launcher.Job + options: + heading_level: 3 + +::: lazyllm.launcher.K8sLauncher + options: + heading_level: 3 + members: [makejob, launch] \ No newline at end of file diff --git a/docs/zh/API Reference/module.md b/docs/zh/API Reference/module.md index af1c71769..4d8f579a9 100644 --- a/docs/zh/API Reference/module.md +++ b/docs/zh/API Reference/module.md @@ -10,7 +10,14 @@ - start - restart - update - + +::: lazyllm.module.servermodule.LLMBase + options: + members: + - prompt + - formatter + - share + ::: lazyllm.module.ActionModule options: members: @@ -48,17 +55,25 @@ members: ::: lazyllm.module.TrialModule - members: start + members: [start] exclude-members: ::: 
lazyllm.module.OnlineChatModule members: exclude-members: +::: lazyllm.module.llms.onlinemodule.supplier.doubao.DoubaoModule + members: + exclude-members: + ::: lazyllm.module.OnlineEmbeddingModule members: exclude-members: +::: lazyllm.module.llms.onlinemodule.supplier.openai.OpenAIEmbedding + members: + exclude-members: + ::: lazyllm.module.OnlineChatModuleBase options: members: @@ -68,3 +83,11 @@ ::: lazyllm.module.OnlineEmbeddingModuleBase members: exclude-members: forward + +::: lazyllm.module.llms.onlinemodule.supplier.doubao.DoubaoEmbedding + options: + members: + +::: lazyllm.module.llms.onlinemodule.fileHandler.FileHandlerBase + members: get_finetune_data + exclude-members: diff --git a/docs/zh/API Reference/tools.md b/docs/zh/API Reference/tools.md index 4d8da9bad..0f674d9ab 100644 --- a/docs/zh/API Reference/tools.md +++ b/docs/zh/API Reference/tools.md @@ -1,15 +1,13 @@ -::: lazyllm.tools.Document +::: lazyllm.tools.IntentClassifier members: + - intent_promt_hook + - post_process_result exclude-members: -::: lazyllm.tools.rag.store.ChromadbStore +::: lazyllm.tools.Document members: exclude-members: -::: lazyllm.tools.rag.store.MilvusStore - members: - exclude-members: - ::: lazyllm.tools.rag.store.ChromadbStore members: exclude-members: @@ -20,95 +18,67 @@ ::: lazyllm.tools.rag.readers.ReaderBase members: - exclude-members: + exclude-members: -::: lazyllm.tools.rag.component.bm25 +::: lazyllm.tools.rag.readers.readerBase.LazyLLMReaderBase members: exclude-members: -::: lazyllm.tools.rag.doc_to_db.DocInfoSchemaItem +::: lazyllm.tools.rag.readers.PandasExcelReader members: - exclude-members: + exclude-members: -::: lazyllm.tools.rag.doc_to_db.DocGenreAnalyser +::: lazyllm.tools.rag.readers.PDFReader members: - exclude-members: + exclude-members: -::: lazyllm.tools.rag.doc_to_db.DocInfoSchemaAnalyser +::: lazyllm.tools.rag.readers.PPTXReader members: - exclude-members: + exclude-members: -::: lazyllm.tools.rag.doc_to_db.DocInfoExtractor +::: 
lazyllm.tools.rag.readers.VideoAudioReader members: - exclude-members: + exclude-members: -::: lazyllm.tools.rag.doc_to_db.DocToDbProcessor +::: lazyllm.tools.SqlManager members: - - extract_info_from_docs - - analyze_info_schema_by_llm - exclude-members: - -::: lazyllm.tools.rag.doc_to_db.extract_db_schema_from_files - -::: lazyllm.tools.rag.readers.DocxReader - members: - exclude-members: - -::: lazyllm.tools.rag.readers.EpubReader - members: - exclude-members: - -::: lazyllm.tools.rag.readers.HWPReader - members: - exclude-members: - -::: lazyllm.tools.rag.readers.ImageReader - members: - exclude-members: - -::: lazyllm.tools.rag.readers.IPYNBReader - members: - exclude-members: - -::: lazyllm.tools.rag.readers.MagicPDFReader - members: - exclude-members: - -::: lazyllm.tools.rag.readers.MarkdownReader - members: - - remove_images - - remove_hyperlinks - exclude-members: - -::: lazyllm.tools.rag.readers.MboxReader - members: - exclude-members: + - get_session + - check_connection + - set_desc + - get_all_tables + - get_table_orm_class + - execute_commit + - execute_query + - create_table + - drop_table + - insert_values + exclude-members: -::: lazyllm.tools.rag.component.bm25 +::: lazyllm.tools.rag.component.bm25.BM25 members: - exclude-members: + exclude-members: ::: lazyllm.tools.rag.doc_to_db.DocInfoSchemaItem members: - exclude-members: + exclude-members: ::: lazyllm.tools.rag.doc_to_db.DocGenreAnalyser members: - exclude-members: + exclude-members: ::: lazyllm.tools.rag.doc_to_db.DocInfoSchemaAnalyser - members: + members: analyse_info_schema exclude-members: ::: lazyllm.tools.rag.doc_to_db.DocInfoExtractor - members: + members: extract_doc_info exclude-members: ::: lazyllm.tools.rag.doc_to_db.DocToDbProcessor members: - extract_info_from_docs - analyze_info_schema_by_llm - exclude-members: + exclude-members: ::: lazyllm.tools.rag.doc_to_db.extract_db_schema_from_files @@ -132,7 +102,7 @@ members: exclude-members: -::: 
lazyllm.tools.rag.readers.MagicPDFReader +::: lazyllm.tools.rag.readers.MineruPDFReader members: exclude-members: @@ -146,9 +116,15 @@ members: exclude-members: +::: lazyllm.tools.rag.default_index.DefaultIndex + members: + - update + - remove + - query + exclude-members: + ::: lazyllm.tools.Reranker - members: register_reranker - members: register_reranker + members: [register_reranker] exclude-members: forward ::: lazyllm.tools.Retriever @@ -156,17 +132,46 @@ exclude-members: forward ::: lazyllm.tools.rag.retriever.TempDocRetriever - members: + members: [create_node_group, add_subretriever] exclude-members: -::: lazyllm.tools.rag.retriever.TempDocRetriever - members: +::: lazyllm.tools.rag.retriever.UrlDocument + members: [find] exclude-members: ::: lazyllm.tools.rag.DocManager members: exclude-members: +::: lazyllm.tools.rag.utils.SqliteDocListManager + members: + - table_inited + - get_status_cond_and_params + - validate_paths + - update_need_reparsing + - list_files + - get_docs + - set_docs_new_meta + - fetch_docs_changed_meta + - list_all_kb_group + - add_kb_group + - list_kb_group_files + - delete_unreferenced_doc + - get_docs_need_reparse + - get_existing_paths_by_pattern + - update_file_message + - update_file_status + - add_files_to_kb_group + - delete_files_from_kb_group + - get_file_status + - update_kb_group + - release + exclude-members: + +::: lazyllm.tools.rag.data_loaders.DirectoryReader + members: load_data + exclude-members: + ::: lazyllm.tools.SentenceSplitter members: exclude-members: @@ -191,28 +196,12 @@ lazyllm.tools.rag.transform.NodeTransform members: exclude-members: -::: lazyllm.tools.rag.dataReader.SimpleDirectoryReader - members: - exclude-members: - -::: lazyllm.tools.rag.dataReader.FileReader - members: - exclude-members: - -lazyllm.tools.rag.transform.NodeTransform - members: - exclude-members: - -::: lazyllm.tools.rag.transform.TransformArgs - members: - exclude-members: - -::: lazyllm.tools.rag.similarity.register_similarity - 
members: +::: lazyllm.tools.rag.doc_node.QADocNode + members: get_text exclude-members: -::: lazyllm.tools.rag.doc_node.DocNode - members: +::: lazyllm.tools.rag.doc_processor.DocumentProcessor + members: register_algorithm, drop_algorithm exclude-members: ::: lazyllm.tools.rag.dataReader.SimpleDirectoryReader @@ -222,14 +211,17 @@ lazyllm.tools.rag.transform.NodeTransform ::: lazyllm.tools.rag.dataReader.FileReader members: exclude-members: - + +::: lazyllm.tools.rag.web.DocWebModule + members: + exclude-members: ::: lazyllm.tools.WebModule members: exclude-members: forward ::: lazyllm.tools.CodeGenerator - members: + members: [choose_prompt] exclude-members: forward ::: lazyllm.tools.ParameterExtractor @@ -237,7 +229,7 @@ lazyllm.tools.rag.transform.NodeTransform exclude-members: forward ::: lazyllm.tools.QustionRewrite - members: + members: choose_prompt exclude-members: forward ::: lazyllm.tools.agent.toolsManager.ToolManager @@ -272,8 +264,20 @@ lazyllm.tools.rag.transform.NodeTransform members: exclude-members: forward -::: lazyllm.tools.IntentClassifier - members: +::: lazyllm.tools.rag.smart_embedding_index.SmartEmbeddingIndex + members: update, remove, query + exclude-members: + +::: lazyllm.tools.rag.doc_node.ImageDocNode + members: do_embedding, get_content, get_text + exclude-members: + +::: lazyllm.tools.rag.transform.AdaptiveTransform + members: transform + exclude-members: + +::: lazyllm.tools.rag.rerank.ModuleReranker + members: forward exclude-members: ::: lazyllm.tools.rag.utils.DocListManager members: @@ -284,6 +288,18 @@ lazyllm.tools.rag.transform.NodeTransform ::: lazyllm.tools.rag.index_base.IndexBase members: +::: lazyllm.tools.rag.IndexBase.update + members: + exclude-members: + +::: lazyllm.tools.rag.IndexBase.remove + members: + exclude-members: + +::: lazyllm.tools.rag.IndexBase.query + members: + exclude-members: + ::: lazyllm.tools.BaseEvaluator members: exclude-members: @@ -317,61 +333,36 @@ lazyllm.tools.rag.transform.NodeTransform 
exclude-members: ::: lazyllm.tools.DBManager - members: + members: execute_query exclude-members: ::: lazyllm.tools.MongoDBManager members: exclude-members: -::: lazyllm.tools.rag.utils.DocListManager - members: - exclude-members: -::: lazyllm.tools.rag.global_metadata.GlobalMetadataDesc - members: - exclude-members: -::: lazyllm.tools.rag.index_base.IndexBase - members: - -::: lazyllm.tools.BaseEvaluator - members: - exclude-members: - -::: lazyllm.tools.ResponseRelevancy - members: - exclude-members: -::: lazyllm.tools.Faithfulness - members: - exclude-members: - -::: lazyllm.tools.LLMContextRecall - members: - exclude-members: - -::: lazyllm.tools.NonLLMContextRecall +::: lazyllm.tools.HttpTool members: exclude-members: -::: lazyllm.tools.ContextRelevance +::: lazyllm.tools.agent.functionCall.StreamResponse members: exclude-members: -::: lazyllm.tools.HttpRequest - members: +::: lazyllm.tools.MCPClient + members: [call_tool, list_tools, get_tools, aget_tools, deploy] exclude-members: -::: lazyllm.tools.JobDescription - members: - exclude-members: +::: lazyllm.tools.tools.GoogleSearch + members: forward -::: lazyllm.tools.DBManager +::: lazyllm.tools.tools.tencent_search.TencentSearch members: exclude-members: -::: lazyllm.tools.MongoDBManager +::: lazyllm.tools.rag.web.WebUi members: exclude-members: -::: lazyllm.tools.HttpTool - members: +::: lazyllm.tools.http_request.http_executor_response.HttpExecutorResponse + members: extract_file, get_content_type exclude-members: \ No newline at end of file diff --git a/docs/zh/Home/environment.md b/docs/zh/Home/environment.md index 6754c4fe7..b095fc30c 100644 --- a/docs/zh/Home/environment.md +++ b/docs/zh/Home/environment.md @@ -20,3 +20,145 @@ - gradio_client: Gradio的客户端库,允许用户从远程服务器加载和使用Gradio界面。 - protobuf: Google的Protocol Buffers的Python实现,用于序列化结构化数据。 - setuptools: 一个Python包安装和分发工具,用于打包和分发Python应用程序和库。 + + +## 在不同操作系统上安装 + +### windows + +#### step 1: 安装git +下载并安装: 
+https://github.com/git-for-windows/git/releases/download/v2.50.1.windows.1/Git-2.50.1-64-bit.exe + +#### step 2: 安装python +官网:https://python.p2hp.com/downloads/ +推荐: python3.10.9 +1. 选择对应版本下载,安装时选择 Customize installation 自定义安装路径,勾选下面的加入PATH +!!! Note + 如果已经安装过可选择 uninstall 卸载后重新安装 + +![install_python](../assets/env/install_python.png) + +2. 自定义安装路径,例如可以设置为 D:\Python\Python310 + +![set_python_install_path](../assets/env/set_python_install_path.png) + + +#### step 3: 安装和使用VS Code +1. 下载vscode并安装 +2. 安装python组件 + +![vscode_extensions](../assets/env/vscode_extensions.png) + +3. 在vscode中随便打开一个python文件后,可在最下面选择python解释器 + +![vscode_interpret](../assets/env/vscode_interpret.png) + +4. 默认会识别到所有的解释器,选择一个;或者手动输入两遍 D:\Python\Python310\python.exe + +![vscode_interpret_manual](../assets/env/vscode_interpret_manual.png) + +5. 终端中选用git bash 就可以使用类似 Linux 的命令行环境 + +![git_bash](../assets/env/git_bash.png) + +#### step 4: 安装LazyLLM +1. 在终端中通过命令行安装lazyllm +```code +pip install lazyllm +``` + +2. 设置环境变量 key + +在powershell中,通过如下代码设置 +```code +$env:LAZYLLM_SENSENOVA_API_KEY = "7ACAxxxxxxxxxxxxxxx" +$env:LAZYLLM_SENSENOVA_SECRET_KEY = "2B0F7xxxxxxxxxxxxxxxx" +``` + +在bash中,通过如下代码设置 +```code +export LAZYLLM_SENSENOVA_API_KEY="7ACACxxxxxxxxxxxxxxx" +export LAZYLLM_SENSENOVA_SECRET_KEY="2B0F72xxxxxxxxxxxxxx" +``` + +### windows with wsl + +#### 前置条件 +1. 查看内部版本:按 Win + r 输入 winver,要求内部版本不低于 19041,否则需先更新 Windows 系统 + +![winversion](../assets/env/winversion.png) +![winversion2](../assets/env/winversion_2.png) + +2. 打开任务管理器,确认 CPU 虚拟化已开启。 + +![virtualize](../assets/env/virtualize.png) +![virtualize2](../assets/env/virtualize_2.png) + +如未开启,需要开启虚拟化并重启电脑 + +![virtualize3](../assets/env/virtualize_3.png) +![virtualize4](../assets/env/virtualize_4.png) + +#### 下载wsl2内核更新包 +WSL 2 Linux内核更新包地址:https://aka.ms/wsl2kernel +下载完成后直接运行该文件进行安装 + +#### 安装linux系统 +1. 
以管理员身份运行 PowerShell,然后查看在线商店中可下载的 Linux 分发版列表 +```code +PS C:\Users\name> wsl --list --online +以下是可安装的有效分发的列表。 +请使用“wsl --install -d <分发>”安装。 + +NAME FRIENDLY NAME +Ubuntu Ubuntu +Debian Debian GNU/Linux +kali-linux Kali Linux Rolling +Ubuntu-18.04 Ubuntu 18.04 LTS +Ubuntu-20.04 Ubuntu 20.04 LTS +Ubuntu-22.04 Ubuntu 22.04 LTS +Ubuntu-24.04 Ubuntu 24.04 LTS +OracleLinux_7_9 Oracle Linux 7.9 +OracleLinux_8_10 Oracle Linux 8.10 +OracleLinux_9_5 Oracle Linux 9.5 +openSUSE-Leap-15.6 openSUSE Leap 15.6 +SUSE-Linux-Enterprise-15-SP6 SUSE Linux Enterprise 15 SP6 +openSUSE-Tumbleweed openSUSE Tumbleweed +``` + +2. 查看已安装的系统(默认没有安装过) +```code +PS C:\Users\name> wsl --list --verbose +适用于 Linux 的 Windows 子系统没有已安装的分发版。 +可以通过访问 Microsoft Store 来安装分发版: +https://aka.ms/wslstore +``` + +3. 安装指定系统 +```code +PS C:\Users\name> wsl --install -d Ubuntu-22.04 +正在安装: Ubuntu 22.04 LTS +[= 3.0% +``` + +4. 安装完成后需要设置用户名和密码 + +![password](../assets/env/wsl_passward.png) + +5. 查看映射的本地路径 +Win + r 输入 \\wsl$ +右键点击 Ubuntu 文件夹,选择“映射网络驱动器”即可将其添加到“此电脑”中。注意:只有启动 Ubuntu 之后才能打开该磁盘。 + +![map](../assets/env/map.png) + +#### 在vscode中使用wsl +1. 安装 WSL 插件 +2. 在终端中打开 WSL +3. 
安装python和lazyllm + +#### 本地命令行使用 +直接搜索wsl,打开,即可进入子系统 + +### macOS + diff --git a/examples/rag_map_store_with_milvus_index.py b/examples/rag_map_store_with_milvus_index.py index 1687ffc44..792d91bf8 100644 --- a/examples/rag_map_store_with_milvus_index.py +++ b/examples/rag_map_store_with_milvus_index.py @@ -6,59 +6,61 @@ import tempfile def run(query): - _, store_file = tempfile.mkstemp(suffix=".db") - - milvus_store_conf = { - 'type': 'map', - 'indices': { - 'smart_embedding_index': { - 'backend': 'milvus', - 'kwargs': { - 'uri': store_file, - 'index_kwargs': { - 'index_type': 'HNSW', - 'metric_type': 'COSINE', - } + fd, store_file = tempfile.mkstemp(suffix=".db") + os.close(fd) + try: + milvus_store_conf = { + 'type': 'map', + 'indices': { + 'smart_embedding_index': { + 'backend': 'milvus', + 'kwargs': { + 'uri': store_file, + 'index_kwargs': { + 'index_type': 'FLAT', + 'metric_type': 'COSINE', + } + }, }, }, - }, - } + } - documents = lazyllm.Document(dataset_path="rag_master", - embed=lazyllm.TrainableModule("bge-large-zh-v1.5"), - manager=False, - store_conf=milvus_store_conf) + documents = lazyllm.Document(dataset_path="rag_master", + embed=lazyllm.TrainableModule("bge-large-zh-v1.5"), + manager=False, + store_conf=milvus_store_conf) - documents.create_node_group(name="sentences", - transform=lambda s: '。'.split(s)) + documents.create_node_group(name="sentences", + transform=lambda s: [x for x in s.split('。') if x.strip()]) - prompt = 'You will play the role of an AI Q&A assistant and complete a dialogue task.'\ - ' In this task, you need to provide your answer based on the given context and question.' + prompt = 'You will play the role of an AI Q&A assistant and complete a dialogue task.'\ + ' In this task, you need to provide your answer based on the given context and question.' 
- with lazyllm.pipeline() as ppl: - ppl.retriever = lazyllm.Retriever(doc=documents, group_name="sentences", topk=3, - index='smart_embedding_index') + with lazyllm.pipeline() as ppl: + ppl.retriever = lazyllm.Retriever(doc=documents, group_name="sentences", topk=3) - ppl.reranker = lazyllm.Reranker(name='ModuleReranker', - model="bge-reranker-large", - topk=1, - output_format='content', - join=True) | bind(query=ppl.input) + ppl.reranker = lazyllm.Reranker(name='ModuleReranker', + model="bge-reranker-large", + topk=1, + output_format='content', + join=True) | bind(query=ppl.input) - ppl.formatter = ( - lambda nodes, query: dict(context_str=nodes, query=query) - ) | bind(query=ppl.input) + ppl.formatter = ( + lambda nodes, query: dict(context_str=nodes, query=query) + ) | bind(query=ppl.input) - ppl.llm = lazyllm.TrainableModule('internlm2-chat-7b').prompt( - lazyllm.ChatPrompter(instruction=prompt, extra_keys=['context_str'])) + ppl.llm = lazyllm.TrainableModule('internlm2-chat-7b').prompt( + lazyllm.ChatPrompter(instruction=prompt, extra_keys=['context_str'])) rag = lazyllm.ActionModule(ppl) rag.start() res = rag(query) - - os.remove(store_file) - - return res + return res + finally: + try: + os.remove(store_file) + except Exception: + pass if __name__ == '__main__': res = run('何为天道?') diff --git a/lazyllm/__init__.py b/lazyllm/__init__.py index 2f2042a3f..1cd672ffd 100644 --- a/lazyllm/__init__.py +++ b/lazyllm/__init__.py @@ -18,14 +18,16 @@ FunctionCallAgent, fc_register, ReactAgent, PlanAndSolveAgent, ReWOOAgent, SentenceSplitter, LLMParser) from .docs import add_doc -from . 
import patch +from .patch import patch_os_env config.done() +patch_os_env(lambda key, value: config.refresh(key), config.refresh) del LazyLLMRegisterMetaClass # noqa F821 +del LazyLLMRegisterMetaABCClass # noqa F821 del _get_base_cls_from_registry # noqa F821 -del patch +del patch_os_env __all__ = [ diff --git a/lazyllm/cli/install.py b/lazyllm/cli/install.py index 04a40dc4a..5e46c9311 100644 --- a/lazyllm/cli/install.py +++ b/lazyllm/cli/install.py @@ -130,6 +130,18 @@ def install_multiple_packages(package_names_with_versions): packages_to_install.append(package_with_version) install_packages(packages_to_install) +def install_mineru(): + try: + subprocess.check_call([sys.executable, '-m', 'pip', 'install', '--upgrade', 'pip', '-i', + 'https://mirrors.aliyun.com/pypi/simple/']) + subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'uv', '-i', + 'https://mirrors.aliyun.com/pypi/simple/']) + subprocess.check_call([sys.executable, '-m', 'uv', 'pip', 'install', + 'mineru[all]==2.1.10', '-i', 'https://mirrors.aliyun.com/pypi/simple/']) + except subprocess.CalledProcessError as e: + logging.error(f"Mineru installation failed: {e}") + sys.exit(1) + def install(commands): # noqa C901 extras_desc = load_extras_descriptions() epilog_lines = ["Supported extras groups:"] @@ -157,10 +169,14 @@ def install(commands): # noqa C901 logging.error("Extras for finetune/local inference are not supported on macOS/Windows.") sys.exit(1) - extras = load_extras() # dict of extras + extras = load_extras() # dict of extras deps = load_dependencies() # dict of dependencies to_install = OrderedDict() + if "mineru" in items: + install_mineru() + items.remove("mineru") + for cmd in items: if cmd in extras: for pkg in extras[cmd]: diff --git a/lazyllm/common/__init__.py b/lazyllm/common/__init__.py index aa8d1f883..b135bca1a 100644 --- a/lazyllm/common/__init__.py +++ b/lazyllm/common/__init__.py @@ -1,4 +1,4 @@ -from .registry import LazyLLMRegisterMetaClass, 
_get_base_cls_from_registry, Register +from .registry import LazyLLMRegisterMetaClass, LazyLLMRegisterMetaABCClass, _get_base_cls_from_registry, Register from .common import package, kwargs, arguments, LazyLLMCMD, timeout, final, ReadOnlyWrapper, DynamicDescriptor, override from .common import FlatList, Identity, ResultCollector, ArgsDict, CaseInsensitiveDict from .common import ReprRule, make_repr, modify_repr, is_valid_url, is_valid_path @@ -17,6 +17,7 @@ __all__ = [ # registry 'LazyLLMRegisterMetaClass', + 'LazyLLMRegisterMetaABCClass', '_get_base_cls_from_registry', 'Register', @@ -95,5 +96,5 @@ 'LOG', # file-system queue - 'FileSystemQueue', + 'FileSystemQueue' ] diff --git a/lazyllm/common/bind.py b/lazyllm/common/bind.py index b1a1d6c24..c40a65335 100644 --- a/lazyllm/common/bind.py +++ b/lazyllm/common/bind.py @@ -1,14 +1,14 @@ import copy import builtins import itertools -from typing import Callable, Any +from typing import Callable, Any, Optional, List from .globals import globals from .common import package class AttrTree(object): - def __init__(self, name=None, pres=[]): - self._path = copy.deepcopy(pres) + def __init__(self, name: Optional[str] = None, pres: Optional[List[str]] = None): + self._path = copy.deepcopy(pres or []) if name is not None: self._path.append(name) @@ -58,7 +58,7 @@ def __reduce__(self) -> tuple[Any, ...]: def _setattr(self, key, v): raise RuntimeError(f'Cannot set attr for Placeholder, you want to set {key}={v}') -setattr(Placeholder, '__setattr__', _setattr) +Placeholder.__setattr__ = _setattr class _MetaBind(type): @@ -152,4 +152,4 @@ def __setattr__(self, __name: str, __value: Any) -> None: return super(__class__, self).__setattr__(__name, __value) -setattr(builtins, 'bind', Bind) +builtins.bind = Bind diff --git a/lazyllm/common/common.py b/lazyllm/common/common.py index e1de1c6d7..7e9ad245b 100644 --- a/lazyllm/common/common.py +++ b/lazyllm/common/common.py @@ -2,7 +2,7 @@ import os import builtins import typing -from 
typing import Any, Callable, Optional +from typing import Any, Callable, Optional, List, Dict from contextlib import contextmanager import copy import threading @@ -110,7 +110,7 @@ def append(self, x): return self -setattr(builtins, 'package', package) +builtins.package = package class LazyLLMCMD(object): @@ -248,7 +248,9 @@ def check_combine(cls, cate, type, subs): def rreplace(s, old, new, count): return (s[::-1].replace(old[::-1], new[::-1], count))[::-1] -def make_repr(category, type, *, name=None, subs=[], attrs=dict(), **kw): +def make_repr(category: str, type: str, *, name: Optional[str] = None, + subs: Optional[List[str]] = None, attrs: Optional[Dict[str, Any]] = None, **kw): + subs, attrs = subs or [], attrs or {} if len(kw) > 0: assert len(attrs) == 0, 'Cannot provide attrs and kwargs at the same time' attrs = kw diff --git a/lazyllm/common/multiprocessing.py b/lazyllm/common/multiprocessing.py index 64b802db2..960886ff4 100644 --- a/lazyllm/common/multiprocessing.py +++ b/lazyllm/common/multiprocessing.py @@ -24,8 +24,8 @@ def start(self): class ForkProcess(multiprocessing.Process): def __init__(self, group=None, target=None, name=None, args=(), - kwargs={}, *, daemon=None, sync=True): - super().__init__(group, ForkProcess.work(target, sync), name, args, kwargs, daemon=daemon) + kwargs=None, *, daemon=None, sync=True): + super().__init__(group, ForkProcess.work(target, sync), name, args, kwargs or {}, daemon=daemon) @staticmethod def work(f, sync): diff --git a/lazyllm/common/registry.py b/lazyllm/common/registry.py index 3b25891f7..67676e4e9 100644 --- a/lazyllm/common/registry.py +++ b/lazyllm/common/registry.py @@ -5,6 +5,7 @@ from .bind import _MetaBind from ..configs import config from typing import Optional +from abc import ABCMeta # Special Dict for lazy programmer. 
Suppose we have a LazyDict as follows: # >>> ld = LazyDict(name='ld', ALd=int) @@ -106,6 +107,9 @@ def __new__(metas, name, bases, attrs): return new_cls +class LazyLLMRegisterMetaABCClass(LazyLLMRegisterMetaClass, ABCMeta): pass + + def _get_base_cls_from_registry(cls_str, *, registry=LazyLLMRegisterMetaClass.all_clses): if cls_str == '': return registry.base diff --git a/lazyllm/components/auto/autodeploy.py b/lazyllm/components/auto/autodeploy.py index 68d99b7ec..7a3700b59 100644 --- a/lazyllm/components/auto/autodeploy.py +++ b/lazyllm/components/auto/autodeploy.py @@ -37,7 +37,6 @@ def _get_embed_deployer(launcher, type, kw): def get_deployer(cls, base_model: str, source: Optional[str] = None, trust_remote_code: bool = True, launcher: Optional[LazyLLMLaunchersBase] = None, type: Optional[str] = None, log_path: Optional[str] = None, **kw): - base_model = ModelManager(source).download(base_model) or '' model_name = get_model_name(base_model) kw['log_path'], kw['trust_remote_code'] = log_path, trust_remote_code if not type: diff --git a/lazyllm/components/auto/autofinetune.py b/lazyllm/components/auto/autofinetune.py index be42b3269..97ca26495 100644 --- a/lazyllm/components/auto/autofinetune.py +++ b/lazyllm/components/auto/autofinetune.py @@ -9,7 +9,7 @@ class AutoFinetune(LazyLLMFinetuneBase): def __new__(cls, base_model, target_path, source=lazyllm.config['model_source'], merge_path=None, ctx_len=1024, - batch_size=32, lora_r=8, launcher=launchers.remote(ngpus=1), **kw): + batch_size=32, lora_r=8, launcher=launchers.remote(ngpus=1), **kw): # noqa B008 base_model = ModelManager(source).download(base_model) or '' model_name = get_model_name(base_model) model_type = ModelManager.get_model_type(model_name) diff --git a/lazyllm/components/core.py b/lazyllm/components/core.py index c95533988..2cb288f04 100644 --- a/lazyllm/components/core.py +++ b/lazyllm/components/core.py @@ -5,7 +5,7 @@ from typing import Union class ComponentBase(object, 
metaclass=LazyLLMRegisterMetaClass): - def __init__(self, *, launcher=launchers.empty()): + def __init__(self, *, launcher=launchers.empty()): # noqa B008 self._llm_name = None self.job = ReadOnlyWrapper() if isinstance(launcher, LazyLLMLaunchersBase): diff --git a/lazyllm/components/deploy/base.py b/lazyllm/components/deploy/base.py index 5391950b0..54f3def19 100644 --- a/lazyllm/components/deploy/base.py +++ b/lazyllm/components/deploy/base.py @@ -19,7 +19,7 @@ class LazyLLMDeployBase(ComponentBase): def extract_result(output, inputs): return output - def __init__(self, *, launcher=launchers.remote()): + def __init__(self, *, launcher=launchers.remote()): # noqa B008 super().__init__(launcher=launcher) @@ -33,7 +33,7 @@ class DummyDeploy(LazyLLMDeployBase, flows.Pipeline): } } - def __init__(self, launcher=launchers.remote(sync=False), *, stream=False, **kw): + def __init__(self, launcher=launchers.remote(sync=False), *, stream=False, **kw): # noqa B008 super().__init__(launcher=launcher) def func(): diff --git a/lazyllm/components/deploy/infinity.py b/lazyllm/components/deploy/infinity.py index 7b39a091c..60919c55b 100644 --- a/lazyllm/components/deploy/infinity.py +++ b/lazyllm/components/deploy/infinity.py @@ -19,7 +19,7 @@ class Infinity(LazyLLMDeployBase): default_headers = {'Content-Type': 'application/json'} target_name = 'embeddings' - def __init__(self, launcher=launchers.remote(ngpus=1), model_type='embed', log_path=None, **kw): + def __init__(self, launcher=launchers.remote(ngpus=1), model_type='embed', log_path=None, **kw): # noqa B008 super().__init__(launcher=launcher) self.kw = ArgsDict({ 'host': '0.0.0.0', diff --git a/lazyllm/components/deploy/lightllm.py b/lazyllm/components/deploy/lightllm.py index 23ebfc927..5931b19bf 100644 --- a/lazyllm/components/deploy/lightllm.py +++ b/lazyllm/components/deploy/lightllm.py @@ -33,7 +33,7 @@ class Lightllm(LazyLLMDeployBase): stream_url_suffix = '_stream' stream_parse_parameters = {"delimiter": b"\n\n"} - 
def __init__(self, trust_remote_code=True, launcher=launchers.remote(ngpus=1), log_path=None, **kw): + def __init__(self, trust_remote_code=True, launcher=launchers.remote(ngpus=1), log_path=None, **kw): # noqa B008 super().__init__(launcher=launcher) self.kw = ArgsDict({ 'tp': 1, diff --git a/lazyllm/components/deploy/lmdeploy.py b/lazyllm/components/deploy/lmdeploy.py index 9e62fe65c..43176e92f 100644 --- a/lazyllm/components/deploy/lmdeploy.py +++ b/lazyllm/components/deploy/lmdeploy.py @@ -44,7 +44,7 @@ class LMDeploy(LazyLLMDeployBase): } stream_parse_parameters = {"delimiter": b"\n"} - def __init__(self, launcher=launchers.remote(ngpus=1), trust_remote_code=True, log_path=None, **kw): + def __init__(self, launcher=launchers.remote(ngpus=1), trust_remote_code=True, log_path=None, **kw): # noqa B008 super().__init__(launcher=launcher) self.kw = ArgsDict({ 'server-name': '0.0.0.0', diff --git a/lazyllm/components/deploy/mindie.py b/lazyllm/components/deploy/mindie.py index db7e61ee7..3d8ec58de 100644 --- a/lazyllm/components/deploy/mindie.py +++ b/lazyllm/components/deploy/mindie.py @@ -34,7 +34,7 @@ class Mindie(LazyLLMDeployBase): 'max_seq_len': ('maxSeqLen', int) } - def __init__(self, trust_remote_code=True, launcher=launchers.remote(), log_path=None, **kw): + def __init__(self, trust_remote_code=True, launcher=launchers.remote(), log_path=None, **kw): # noqa B008 super().__init__(launcher=launcher) assert lazyllm.config['mindie_home'], 'Ensure you have installed MindIE and \ "export LAZYLLM_MINDIE_HOME=/path/to/mindie/latest"' diff --git a/lazyllm/components/deploy/mineru/__init__.py b/lazyllm/components/deploy/mineru/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/lazyllm/components/deploy/mineru/mineru_patches.py b/lazyllm/components/deploy/mineru/mineru_patches.py new file mode 100644 index 000000000..f52910f06 --- /dev/null +++ b/lazyllm/components/deploy/mineru/mineru_patches.py @@ -0,0 +1,196 @@ +import copy +from 
mineru.backend.pipeline import pipeline_middle_json_mkcontent +from mineru.backend.pipeline.pipeline_middle_json_mkcontent import merge_para_with_text as pipeline_merge_para_with_text +from mineru.backend.vlm import vlm_middle_json_mkcontent +from mineru.backend.vlm.vlm_middle_json_mkcontent import merge_para_with_text as vlm_merge_para_with_text +from mineru.utils.enum_class import BlockType, ContentType + +# patches to mineru (to output bbox) + +def _parse_line_spans(para_block, page_idx): + lines_metas = [] + if 'lines' in para_block: + for line_info in para_block['lines']: + if not line_info['spans']: + continue + line_meta = copy.deepcopy(line_info['spans'][0]) + line_meta.pop('score', None) + cross_page = line_meta.pop('cross_page', None) + line_meta['page'] = page_idx + 1 if cross_page is True else page_idx + lines_metas.append(line_meta) + return lines_metas + + +# patches to pipeline + +def pipeline_make_blocks_to_content_list(para_block, img_buket_path, page_idx): # noqa: C901 + para_type = para_block['type'] + para_content = {} + if para_type in [BlockType.TEXT, BlockType.LIST, BlockType.INDEX]: + para_content = { + 'type': ContentType.TEXT, + 'text': pipeline_merge_para_with_text(para_block), + 'lines': _parse_line_spans(para_block, page_idx) + } + elif para_type == BlockType.TITLE: + para_content = { + 'type': ContentType.TEXT, + 'text': pipeline_merge_para_with_text(para_block), + 'lines': _parse_line_spans(para_block, page_idx) + } + title_level = pipeline_middle_json_mkcontent.get_title_level(para_block) + if title_level != 0: + para_content['text_level'] = title_level + elif para_type == BlockType.INTERLINE_EQUATION: + if len(para_block['lines']) == 0 or len(para_block['lines'][0]['spans']) == 0: + return None + para_content = { + 'type': ContentType.EQUATION, + 'img_path': f"{img_buket_path}/{para_block['lines'][0]['spans'][0].get('image_path', '')}", + 'lines': _parse_line_spans(para_block, page_idx) + } + if 
para_block['lines'][0]['spans'][0].get('content', ''): + para_content['text'] = pipeline_merge_para_with_text(para_block) + para_content['text_format'] = 'latex' + elif para_type == BlockType.IMAGE: + image_lines_metas = [] + para_content = { + 'type': ContentType.IMAGE, + 'img_path': '', + BlockType.IMAGE_CAPTION: [], + BlockType.IMAGE_FOOTNOTE: [] + } + for block in para_block['blocks']: + image_lines_metas.extend(_parse_line_spans(block, page_idx)) + if block['type'] == BlockType.IMAGE_BODY: + for line in block['lines']: + for span in line['spans']: + if span['type'] == ContentType.IMAGE: + if span.get('image_path', ''): + para_content['img_path'] = f"{img_buket_path}/{span['image_path']}" + if block['type'] == BlockType.IMAGE_CAPTION: + para_content[BlockType.IMAGE_CAPTION].append( + pipeline_merge_para_with_text(block)) + if block['type'] == BlockType.IMAGE_FOOTNOTE: + para_content[BlockType.IMAGE_FOOTNOTE].append( + pipeline_merge_para_with_text(block)) + para_content['lines'] = image_lines_metas + elif para_type == BlockType.TABLE: + para_content = { + 'type': ContentType.TABLE, + 'img_path': '', + BlockType.TABLE_CAPTION: [], + BlockType.TABLE_FOOTNOTE: [] + } + table_lines_metas = [] + for block in para_block['blocks']: + table_lines_metas.extend(_parse_line_spans(block, page_idx)) + if block['type'] == BlockType.TABLE_BODY: + for line in block['lines']: + for span in line['spans']: + if span['type'] == ContentType.TABLE: + if span.get('html', ''): + para_content[BlockType.TABLE_BODY] = f"{span['html']}" + + if span.get('image_path', ''): + para_content['img_path'] = f"{img_buket_path}/{span['image_path']}" + + if block['type'] == BlockType.TABLE_CAPTION: + para_content[BlockType.TABLE_CAPTION].append( + pipeline_merge_para_with_text(block)) + if block['type'] == BlockType.TABLE_FOOTNOTE: + para_content[BlockType.TABLE_FOOTNOTE].append( + pipeline_merge_para_with_text(block)) + para_content['lines'] = table_lines_metas + + para_content['page_idx'] = 
page_idx + para_content['bbox'] = para_block['bbox'] + return para_content + + +pipeline_middle_json_mkcontent.make_blocks_to_content_list = pipeline_make_blocks_to_content_list + + +# patches to vlm + +def vlm_make_blocks_to_content_list(para_block, img_buket_path, page_idx): # noqa: C901 + para_type = para_block['type'] + para_content = {} + if para_type in [BlockType.TEXT, BlockType.LIST, BlockType.INDEX]: + para_content = { + 'type': ContentType.TEXT, + 'text': vlm_merge_para_with_text(para_block), + 'lines': _parse_line_spans(para_block, page_idx) + } + elif para_type == BlockType.TITLE: + title_level = vlm_middle_json_mkcontent.get_title_level(para_block) + para_content = { + 'type': ContentType.TEXT, + 'text': vlm_merge_para_with_text(para_block), + 'lines': _parse_line_spans(para_block, page_idx) + } + if title_level != 0: + para_content['text_level'] = title_level + elif para_type == BlockType.INTERLINE_EQUATION: + para_content = { + 'type': ContentType.EQUATION, + 'text': vlm_merge_para_with_text(para_block), + 'text_format': 'latex', + 'lines': _parse_line_spans(para_block, page_idx) + } + elif para_type == BlockType.IMAGE: + image_lines_metas = [] + para_content = { + 'type': ContentType.IMAGE, + 'img_path': '', + BlockType.IMAGE_CAPTION: [], + BlockType.IMAGE_FOOTNOTE: [] + } + for block in para_block['blocks']: + image_lines_metas.extend(_parse_line_spans(block, page_idx)) + if block['type'] == BlockType.IMAGE_BODY: + for line in block['lines']: + for span in line['spans']: + if span['type'] == ContentType.IMAGE: + if span.get('image_path', ''): + para_content['img_path'] = f"{img_buket_path}/{span['image_path']}" + if block['type'] == BlockType.IMAGE_CAPTION: + para_content[BlockType.IMAGE_CAPTION].append( + vlm_merge_para_with_text(block)) + if block['type'] == BlockType.IMAGE_FOOTNOTE: + para_content[BlockType.IMAGE_FOOTNOTE].append( + vlm_merge_para_with_text(block)) + para_content['lines'] = image_lines_metas + elif para_type == BlockType.TABLE: 
+ table_lines_metas = [] + para_content = { + 'type': ContentType.TABLE, + 'img_path': '', + BlockType.TABLE_CAPTION: [], + BlockType.TABLE_FOOTNOTE: [] + } + for block in para_block['blocks']: + table_lines_metas.extend(_parse_line_spans(block, page_idx)) + if block['type'] == BlockType.TABLE_BODY: + for line in block['lines']: + for span in line['spans']: + if span['type'] == ContentType.TABLE: + if span.get('html', ''): + para_content[BlockType.TABLE_BODY] = f"{span['html']}" + + if span.get('image_path', ''): + para_content['img_path'] = f"{img_buket_path}/{span['image_path']}" + + if block['type'] == BlockType.TABLE_CAPTION: + para_content[BlockType.TABLE_CAPTION].append( + vlm_merge_para_with_text(block)) + if block['type'] == BlockType.TABLE_FOOTNOTE: + para_content[BlockType.TABLE_FOOTNOTE].append( + vlm_merge_para_with_text(block)) + para_content['lines'] = table_lines_metas + + para_content['page_idx'] = page_idx + para_content['bbox'] = para_block['bbox'] + return para_content + +vlm_middle_json_mkcontent.make_blocks_to_content_list = vlm_make_blocks_to_content_list diff --git a/lazyllm/components/deploy/mineru/mineru_server_module.py b/lazyllm/components/deploy/mineru/mineru_server_module.py new file mode 100644 index 000000000..97d820043 --- /dev/null +++ b/lazyllm/components/deploy/mineru/mineru_server_module.py @@ -0,0 +1,417 @@ +import os +import json +import subprocess +import platform +import shutil +import hashlib +import uuid +import tempfile +import atexit +from pathlib import Path +from fastapi import UploadFile, File, Form, HTTPException +from fastapi.responses import JSONResponse +from typing import List, Optional, Union + +from lazyllm import ServerModule, LOG +from lazyllm import FastapiApp as app + +from mineru.cli.common import aio_do_parse, read_fn, pdf_suffixes, image_suffixes +from . 
import mineru_patches # noqa: F401 + + +def _check_libreoffice(): + system = platform.system() + + if system != 'Linux': + LOG.warning(f'[MINERU SERVER] Office conversion requires Linux; only PDF parsing is supported on {system}') + return False + + libreoffice_installed = False + commands = ['libreoffice', 'soffice'] + + for cmd in commands: + try: + result = subprocess.run([cmd, '--version'], capture_output=True, text=True, timeout=5) + if result.returncode == 0: + version = result.stdout.strip().split('\n')[0] + LOG.info(f'[MINERU SERVER] LibreOffice is installed: {version}') + libreoffice_installed = True + break + except (FileNotFoundError, subprocess.TimeoutExpired): + continue + + if not libreoffice_installed: + LOG.warning('[MINERU SERVER] LibreOffice is not installed; only PDF parsing is supported') + return False + + try: + output = subprocess.check_output(['fc-list', ':lang=zh'], encoding='utf-8') + if not output.strip(): + LOG.warning('[MINERU SERVER] No Chinese fonts were detected, \ + the converted document may not display Chinese content properly.
\ + It is recommended to install Chinese fonts: sudo apt install fonts-noto-cjk') + except Exception: + LOG.error('[MINERU SERVER] Font check failed') + + return True + + +class MineruServerBase: + def __init__(self, cache_dir: str = None, image_save_dir: str = None, + default_backend: str = 'pipeline', default_lang: str = 'ch_server', + default_parse_method: str = 'auto', default_formula_enable: bool = True, + default_table_enable: bool = True, default_return_md: bool = False, + default_return_content_list: bool = True, mem_fraction_static: float = 0.8): + if default_backend not in ['pipeline', 'vlm-sglang-engine', 'vlm-transformers']: + raise ValueError(f'Invalid backend: {default_backend}, \ + only support pipeline, vlm-sglang-engine, vlm-transformers') + if default_lang not in ['ch', 'ch_server', 'ch_lite', 'en']: + raise ValueError(f'Invalid language: {default_lang}, \ + only support ch, ch_server, ch_lite, en') + self._default_backend = default_backend + self._cache_dir = cache_dir + if image_save_dir: + self._image_save_dir = os.path.join(image_save_dir, 'images') + else: + self._image_save_dir = None + self._default_lang = default_lang + self._default_parse_method = default_parse_method + self._default_formula_enable = default_formula_enable + self._default_table_enable = default_table_enable + self._default_return_md = default_return_md + self._default_return_content_list = default_return_content_list + self._mem_fraction_static = mem_fraction_static + self._supported_office_types = ['.pptx', '.ppt', '.docx', '.doc'] if _check_libreoffice() else [] + LOG.info(f'[MINERU SERVER] Supported office types: {self._supported_office_types}') + self._middle_file_dir = tempfile.mkdtemp() + atexit.register(lambda: shutil.rmtree(self._middle_file_dir, ignore_errors=True)) + try: + for path in [self._cache_dir, self._image_save_dir]: + if path: + os.makedirs(path, exist_ok=True) + except Exception as e: + raise Exception(f'Failed to create directory: {e}') + + 
@app.post('/api/v1/pdf_parse') + async def parse_pdf(self, # noqa: C901 + files: List[str] = Form([]), # noqa B008 + upload_files: List[UploadFile] = File([]), # noqa B008 + use_cache: bool = Form(False, description='if True, cache_dir must be set'), # noqa B008 + lang: str = Form(None, # noqa B008 + description='Document language (pipeline backend only): ch|ch_server|ch_lite|en'), + backend: str = Form(None, description='Parsing backend: pipeline|vlm-sglang-engine|vlm-transformers'), # noqa B008 + parse_method: str = Form(None), # noqa B008 + formula_enable: bool = Form(None, description='Whether to enable formula parsing'), # noqa B008 + table_enable: bool = Form(None, description='Whether to enable table parsing'), # noqa B008 + return_md: bool = Form(None, description='Whether to return markdown content'), # noqa B008 + return_content_list: bool = Form(None, description='Whether to return content list')): # noqa B008 + if files and upload_files: + raise HTTPException(status_code=400, detail='Either provide only \'files\' or only \'upload_files\'!') + for file in files: + if not os.path.isfile(file): + raise HTTPException(status_code=400, detail=f'File Not Found: {file}') + + if lang and lang not in ['ch', 'ch_server', 'ch_lite', 'en']: + raise HTTPException(status_code=400, detail=f'Invalid language: {lang}, \ + only support ch, ch_server, ch_lite, en') + + if backend and backend not in ['pipeline', 'vlm-sglang-engine', 'vlm-transformers']: + raise HTTPException(status_code=400, detail=f'Invalid backend: {backend}, \ + only support pipeline, vlm-sglang-engine, vlm-transformers') + + unique_id = str(uuid.uuid4()) + unique_dir = os.path.join(self._middle_file_dir, unique_id) + os.makedirs(unique_dir, exist_ok=True) + + if upload_files: + files = await self._resolve_upload_files(upload_files, unique_dir) + + for file in files: + if Path(file).suffix.lower() not in self._supported_office_types + ['.pdf']: + raise HTTPException(status_code=400, detail=f'Unsupported file type:
{Path(file).suffix}') + + backend = backend or self._default_backend + lang = lang or self._default_lang + parse_method = parse_method or self._default_parse_method + formula_enable = formula_enable if formula_enable is not None else self._default_formula_enable + table_enable = table_enable if table_enable is not None else self._default_table_enable + return_md = return_md if return_md is not None else self._default_return_md + return_content_list = return_content_list if return_content_list is not None \ + else self._default_return_content_list + + LOG.info(f'[MINERU SERVER] GOT FILE {[Path(file).stem for file in files]} --- BACKEND: {backend}') + + try: + results = {file: {} for file in files} + if use_cache and not self._cache_dir: + LOG.warning('[MINERU SERVER] CACHE_DIR is not set, the Cache will not be used!') + + files_to_process = files + if use_cache and self._cache_dir: + results, files_to_process = self._check_cache(files, results, backend, + return_md, return_content_list, + table_enable, formula_enable) + if not files_to_process: + LOG.info(f'[MINERU SERVER] RETURN RESULTS FROM CACHE: {files}') + results = [results[file] for file in files] + return JSONResponse(status_code=200, content={'result': results, 'unique_id': unique_id}) + + mineru_results = await self._run_mineru(files_to_process, unique_dir, backend, lang, + parse_method, formula_enable, table_enable, + return_md, return_content_list) + results.update(mineru_results) + results = [results[file] for file in files] + LOG.info(f'[MINERU SERVER] RETURN RESULTS: {files}') + return JSONResponse(status_code=200, + content={'result': results, 'unique_id': unique_id}) + except Exception as e: + LOG.error(f'[MINERU SERVER] Parse Failed: {str(e)}') + return JSONResponse(status_code=500, + content={'error': f'Failed to process file: {str(e)}'}) + finally: + shutil.rmtree(unique_dir) + + async def _run_mineru(self, files_to_process, unique_dir, backend, lang, # noqa: C901 + parse_method, formula_enable, 
table_enable, + return_md, return_content_list): + results = {file: {} for file in files_to_process} + + pdf_file_names = [] + pdf_bytes_list = [] + + for file in files_to_process: + pdf_file_name, pdf_byte = self._load_files(Path(file), unique_dir) + pdf_file_names.append(pdf_file_name) + pdf_bytes_list.append(pdf_byte) + + lang_list = [lang] * len(pdf_bytes_list) + + params = dict(output_dir=unique_dir, pdf_file_names=pdf_file_names, + pdf_bytes_list=pdf_bytes_list, p_lang_list=lang_list, backend=backend, + parse_method=parse_method, formula_enable=formula_enable, + table_enable=table_enable, f_draw_layout_bbox=False, f_draw_span_bbox=False, + f_dump_md=True, f_dump_middle_json=False, f_dump_model_output=False, + f_dump_orig_pdf=False, f_dump_content_list=True) + if backend == 'vlm-sglang-engine': + params['mem_fraction_static'] = self._mem_fraction_static + + await aio_do_parse(**params) + + for pdf_name, pdf_path in zip(pdf_file_names, files_to_process): + # Directory output by mineru + if backend.startswith('pipeline'): + parse_dir = os.path.join(unique_dir, pdf_name, parse_method) + else: + parse_dir = os.path.join(unique_dir, pdf_name, 'vlm') + + if os.path.exists(parse_dir): + hash_id = self._file_sha256(pdf_path) + md_content = self._read_parse_result('.md', pdf_name, parse_dir) + content_list = self._read_parse_result('_content_list.json', pdf_name, parse_dir) + + if return_md: + if md_content: + results[pdf_path]['md_content'] = md_content + if return_content_list: + if content_list: + results[pdf_path]['content_list'] = content_list + if self._cache_dir: + self._cache_parse_result(hash_id, results[pdf_path], mode=backend, + table_enable=table_enable, + formula_enable=formula_enable) + + if self._image_save_dir: + source_dir = Path(f'{parse_dir}/images/') + target_dir = Path(self._image_save_dir) + for jpg_file in source_dir.glob('*.jpg'): + shutil.move(str(jpg_file), str(target_dir / jpg_file.name)) + + return results + + async def 
_resolve_upload_files(self, upload_files: List[UploadFile], unique_dir: str) -> List[str]: + if not upload_files: + return [] + + temp_upload_dir = os.path.join(self._middle_file_dir, f'{unique_dir}/upload') + os.makedirs(temp_upload_dir, exist_ok=True) + file_paths = [] + for upload_file in upload_files: + content = await upload_file.read() + temp_file_path = os.path.join(temp_upload_dir, upload_file.filename) + with open(temp_file_path, 'wb') as f: + f.write(content) + file_paths.append(temp_file_path) + return file_paths + + def _get_func_suffix(self, table_enable, formula_enable): + if table_enable and formula_enable: + return '_a' + elif table_enable: + return '_t' + elif formula_enable: + return '_f' + else: + return '_n' + + def _check_cache(self, files, results, backend, return_md, return_content_list, + table_enable, formula_enable): + if not self._cache_dir: + return results, files + + func_suffix = self._get_func_suffix(table_enable, formula_enable) + func_suffix_map = {'_a': ['_a'], + '_t': ['_t', '_a'], + '_f': ['_f', '_a'], + '_n': ['_n', '_a', '_t', '_f']} + func_suffix_list = func_suffix_map[func_suffix] + + uncached_files = [] + + for file in files: + file_hash = self._file_sha256(file) + valid_hash_ids = [file_hash + func_suffix for func_suffix in func_suffix_list] + result = {} + + file_content_list_found = False + file_md_found = False + + if return_content_list: + for valid_hash in valid_hash_ids: + json_path = os.path.join(self._cache_dir, backend, f'{valid_hash}_content_list.json') + if os.path.isfile(json_path): + with open(json_path, 'r', encoding='utf-8') as f: + result['content_list'] = json.load(f) + file_content_list_found = True + break + + if return_md: + for valid_hash in valid_hash_ids: + md_path = os.path.join(self._cache_dir, backend, f'{valid_hash}.md') + if os.path.isfile(md_path): + with open(md_path, 'r', encoding='utf-8') as f: + result['md_content'] = f.read() + file_md_found = True + break + + results[file].update(result) + 
+ file_cache_complete = True + if return_content_list and not file_content_list_found: + file_cache_complete = False + if return_md and not file_md_found: + file_cache_complete = False + + if not file_cache_complete: + uncached_files.append(file) + + return results, uncached_files + + def _read_parse_result(self, file_suffix_identifier: str, + pdf_name: str, parse_dir: str) -> Optional[Union[str, dict]]: + result_file_path = os.path.join(parse_dir, f'{pdf_name}{file_suffix_identifier}') + if os.path.exists(result_file_path): + try: + if file_suffix_identifier == '.md': + with open(result_file_path, 'r', encoding='utf-8') as fp: + return fp.read() + elif file_suffix_identifier == '_content_list.json': + with open(result_file_path, 'r', encoding='utf-8') as fp: + return json.load(fp) + except Exception: + LOG.error(f'[MINERU SERVER] Failed to read result file {result_file_path}') + return None + return None + + def _cache_parse_result(self, hash_id: str, result: dict, mode: str, + table_enable: bool, formula_enable: bool): + try: + cache_subdir = os.path.join(self._cache_dir, mode) + os.makedirs(cache_subdir, exist_ok=True) + + if table_enable and formula_enable: + func_suffix = '_a' + elif table_enable: + func_suffix = '_t' + elif formula_enable: + func_suffix = '_f' + else: + func_suffix = '_n' + + hash_id += func_suffix + md_content = result.get('md_content', None) + if md_content: + cache_path = os.path.join(cache_subdir, f'{hash_id}.md') + with open(cache_path, 'w', encoding='utf-8') as f: + f.write(md_content) + + content_list = result.get('content_list', None) + if content_list: + cache_path = os.path.join(cache_subdir, f'{hash_id}_content_list.json') + with open(cache_path, 'w', encoding='utf-8') as f: + json.dump(content_list, f, ensure_ascii=False, indent=4) + + except Exception as e: + LOG.error(f'Failed to cache data for {hash_id}: {e}') + + def _load_files(self, file_path: str, unique_dir: str): + suffix = file_path.suffix.lower() + if suffix in 
pdf_suffixes + image_suffixes + self._supported_office_types: + if suffix in self._supported_office_types: + self._convert_file_to_pdf(file_path, unique_dir) + output_path = os.path.join(unique_dir, f'{file_path.stem}.pdf') + file_path = Path(output_path) + try: + pdf_bytes = read_fn(file_path) + return (file_path.stem, pdf_bytes) + except Exception as e: + raise HTTPException(status_code=400, detail=f'Failed to read file {file_path}: {e}') + else: + raise HTTPException(status_code=400, detail=f'Unsupported file type: {file_path.suffix}') + + def _convert_file_to_pdf(self, input_path, output_dir): + if not os.path.isfile(input_path): + raise FileNotFoundError(f'The input file {input_path} does not exist.') + + os.makedirs(output_dir, exist_ok=True) + + cmd = [ + 'libreoffice', + '--headless', + '--norestore', + '--invisible', + '--convert-to', 'pdf', + '--outdir', str(output_dir), + str(input_path) + ] + + process = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE) + + if process.returncode != 0: + raise Exception(f'LibreOffice conversion failed: {process.stderr.decode()}') + + def _file_sha256(self, file_path: str) -> str: + hasher = hashlib.sha256() + with open(file_path, 'rb') as f: + for chunk in iter(lambda: f.read(8192), b''): + hasher.update(chunk) + return hasher.hexdigest() + + +class MineruServer(ServerModule): + def __init__(self, + cache_dir: str = None, + image_save_dir: str = None, + default_backend: str = 'pipeline', + default_lang: str = 'ch_server', + default_parse_method: str = 'auto', + default_formula_enable: bool = True, + default_table_enable: bool = True, + default_return_md: bool = False, + default_return_content_list: bool = True, + *args, **kwargs): + mineru_server = MineruServerBase( + cache_dir=cache_dir, image_save_dir=image_save_dir, default_backend=default_backend, + default_lang=default_lang, default_parse_method=default_parse_method, + default_formula_enable=default_formula_enable,
default_table_enable=default_table_enable, + default_return_md=default_return_md, default_return_content_list=default_return_content_list) + super().__init__(mineru_server, *args, **kwargs) diff --git a/lazyllm/components/deploy/ray.py b/lazyllm/components/deploy/ray.py index 02ec43808..e8be6aec4 100644 --- a/lazyllm/components/deploy/ray.py +++ b/lazyllm/components/deploy/ray.py @@ -33,7 +33,7 @@ def reallocate_launcher(launcher): class Distributed(LazyLLMDeployBase): - def __init__(self, launcher=launchers.remote(ngpus=1), port=None): + def __init__(self, launcher=launchers.remote(ngpus=1), port=None): # noqa B008 super().__init__(launcher=launcher) self.port = port or random.randint(30000, 40000) self.finetuned_model = None diff --git a/lazyllm/components/deploy/relay/base.py b/lazyllm/components/deploy/relay/base.py index fd338b8eb..fab892c5b 100644 --- a/lazyllm/components/deploy/relay/base.py +++ b/lazyllm/components/deploy/relay/base.py @@ -14,7 +14,7 @@ class RelayServer(LazyLLMDeployBase): message_format = None def __init__(self, port=None, *, func=None, pre_func=None, post_func=None, - pythonpath=None, log_path=None, cls=None, launcher=launchers.remote(sync=False)): + pythonpath=None, log_path=None, cls=None, launcher=launchers.remote(sync=False)): # noqa B008 # func must dump in __call__ to wait for dependancies. 
self.func = func self.pre = dump_obj(pre_func) diff --git a/lazyllm/components/deploy/vllm.py b/lazyllm/components/deploy/vllm.py index 8a2c0bced..48220b225 100644 --- a/lazyllm/components/deploy/vllm.py +++ b/lazyllm/components/deploy/vllm.py @@ -41,7 +41,8 @@ class Vllm(LazyLLMDeployBase, metaclass=_VllmStreamParseParametersMeta): optional_keys = set(["max-model-len"]) # TODO(wangzhihong): change default value for `openai_api` argument to True - def __init__(self, trust_remote_code: bool = True, launcher: LazyLLMLaunchersBase = launchers.remote(ngpus=1), + def __init__(self, trust_remote_code: bool = True, + launcher: LazyLLMLaunchersBase = launchers.remote(ngpus=1), # noqa B008 log_path: str = None, openai_api: bool = False, **kw): self.launcher_list, launcher = reallocate_launcher(launcher) super().__init__(launcher=launcher) diff --git a/lazyllm/components/finetune/alpaca-lora/finetune.py b/lazyllm/components/finetune/alpaca-lora/finetune.py index 380f9db23..115f7b450 100755 --- a/lazyllm/components/finetune/alpaca-lora/finetune.py +++ b/lazyllm/components/finetune/alpaca-lora/finetune.py @@ -60,7 +60,7 @@ def train( # noqa C901 # model/data params base_model: str = "", # the only required argument data_path: str = "", - output_dir: str = os.path.abspath("./output_dir"), + output_dir: str = os.path.abspath("./output_dir"), # noqa B008 # training hyperparams batch_size: int = 128, micro_batch_size: int = 4, @@ -73,7 +73,7 @@ def train( # noqa C901 lora_r: int = 8, lora_alpha: int = 16, lora_dropout: float = 0.05, - lora_target_modules: List[str] = [ + lora_target_modules: List[str] = [ # noqa B006 "q_proj", "v_proj", ], @@ -220,7 +220,7 @@ def generate_and_tokenize_prompt(data_point): else: datas.append(load_dataset(data_path)) elif os.path.isdir(data_path): - for root, dirs, files in os.walk(data_path): + for root, _, files in os.walk(data_path): for file in files: if file.endswith(".json") or file.endswith(".jsonl"): file_path = os.path.join(root, file) diff 
--git a/lazyllm/components/finetune/alpacalora.py b/lazyllm/components/finetune/alpacalora.py index c0571aa59..6d64f73c4 100644 --- a/lazyllm/components/finetune/alpacalora.py +++ b/lazyllm/components/finetune/alpacalora.py @@ -36,7 +36,7 @@ def __init__(self, merge_path=None, model_name='LLM', cp_files='tokeniz*', - launcher=launchers.remote(ngpus=1), + launcher=launchers.remote(ngpus=1), # noqa B008 **kw ): if not merge_path: diff --git a/lazyllm/components/finetune/base.py b/lazyllm/components/finetune/base.py index 78182c50f..616b2d45e 100644 --- a/lazyllm/components/finetune/base.py +++ b/lazyllm/components/finetune/base.py @@ -5,7 +5,7 @@ class LazyLLMFinetuneBase(ComponentBase): __reg_overwrite__ = 'cmd' - def __init__(self, base_model, target_path, *, launcher=launchers.remote()): + def __init__(self, base_model, target_path, *, launcher=launchers.remote()): # noqa B008 super().__init__(launcher=launcher) self.base_model = base_model self.target_path = target_path @@ -20,7 +20,7 @@ def __call__(self, *args, **kw): class DummyFinetune(LazyLLMFinetuneBase): - def __init__(self, base_model='base', target_path='target', *, launcher=launchers.remote(), **kw): + def __init__(self, base_model='base', target_path='target', *, launcher=launchers.remote(), **kw): # noqa B008 super().__init__(base_model, target_path, launcher=launchers.empty) self.kw = kw diff --git a/lazyllm/components/finetune/collie.py b/lazyllm/components/finetune/collie.py index 1659fae9f..bc6b542c3 100644 --- a/lazyllm/components/finetune/collie.py +++ b/lazyllm/components/finetune/collie.py @@ -35,7 +35,7 @@ def __init__(self, merge_path=None, model_name='LLM', cp_files='tokeniz*', - launcher=launchers.remote(ngpus=1), + launcher=launchers.remote(ngpus=1), # noqa B008 **kw ): if not merge_path: diff --git a/lazyllm/components/finetune/flagembedding.py b/lazyllm/components/finetune/flagembedding.py index dada5c9e6..da9e4c22c 100644 --- a/lazyllm/components/finetune/flagembedding.py +++ 
b/lazyllm/components/finetune/flagembedding.py @@ -62,7 +62,7 @@ def __init__( self, base_model, target_path, - launcher=launchers.remote(ngpus=1, sync=True), + launcher=launchers.remote(ngpus=1, sync=True), # noqa B008 **kw ): model_type = ModelManager.get_model_type(base_model.split('/')[-1]) diff --git a/lazyllm/components/finetune/llamafactory.py b/lazyllm/components/finetune/llamafactory.py index 440db9f56..d58b9a50f 100644 --- a/lazyllm/components/finetune/llamafactory.py +++ b/lazyllm/components/finetune/llamafactory.py @@ -27,7 +27,7 @@ def __init__(self, lora_r=None, modules_to_save=None, lora_target_modules=None, - launcher=launchers.remote(ngpus=1, sync=True), + launcher=launchers.remote(ngpus=1, sync=True), # noqa B008 **kw ): if not os.path.exists(base_model): diff --git a/lazyllm/components/utils/downloader/model_downloader.py b/lazyllm/components/utils/downloader/model_downloader.py index e752ff480..9c7e58390 100644 --- a/lazyllm/components/utils/downloader/model_downloader.py +++ b/lazyllm/components/utils/downloader/model_downloader.py @@ -173,7 +173,7 @@ def _do_download(self, model='', call_back=None): try: return self.hub_downloader.download(model, full_model_dir, call_back) # Use `BaseException` to capture `KeyboardInterrupt` and normal `Exceptioin`. 
- except BaseException as e: + except BaseException as e: # noqa B036 lazyllm.LOG.warning(f"Download encountered an error: {e}") if not self.token and 'Permission denied' not in str(e): lazyllm.LOG.warning('Token is empty, which may prevent private models from being downloaded, ' diff --git a/lazyllm/configs.py b/lazyllm/configs.py index 1a1366c2e..78896e184 100644 --- a/lazyllm/configs.py +++ b/lazyllm/configs.py @@ -1,7 +1,7 @@ import os from enum import Enum import json -from typing import List, Union +from typing import List, Union, Optional from contextlib import contextmanager import logging @@ -13,7 +13,7 @@ class Mode(Enum): class Config(object): - def __init__(self, prefix='LAZYLLM', home=os.path.join(os.path.expanduser('~'), '.lazyllm')): + def __init__(self, prefix='LAZYLLM', home=os.path.join(os.path.expanduser('~'), '.lazyllm')): # noqa B008 self._config_params = dict() self._env_map_name = dict() self.prefix = prefix @@ -49,7 +49,7 @@ def temp(self, name, value): yield self.impl[name] = old_value - def add(self, name, type, default=None, env=None): + def add(self, name: str, type: type, default: Optional[Union[int, str, bool]] = None, env: Union[str, dict] = None): update_params = (type, default, env) if name not in self._config_params or self._config_params[name] != update_params: if name in self._config_params: @@ -64,7 +64,8 @@ def add(self, name, type, default=None, env=None): self._update_impl(name, type, default, env) return self - def _update_impl(self, name, type, default=None, env=None): + def _update_impl(self, name: str, type: type, default: Optional[Union[int, str, bool]] = None, + env: Union[str, dict] = None): self.impl[name] = self.cfgs.pop(name) if name in self.cfgs else default if isinstance(env, dict): for k, v in env.items(): @@ -78,15 +79,17 @@ def _update_impl(self, name, type, default=None, env=None): def __getitem__(self, name): try: + if isinstance(name, bytes): name = name.decode('utf-8') return self.impl[name] except 
KeyError: - raise RuntimeError(f'Key {name} is not in lazyllm global config') + raise RuntimeError(f'Key `{name}` is not in lazyllm global config') def __str__(self): return str(self.impl) - def refresh(self, targets: Union[str, List[str]] = None) -> None: + def refresh(self, targets: Union[bytes, str, List[str]] = None) -> None: names = targets + if isinstance(targets, bytes): targets = targets.decode('utf-8') if isinstance(targets, str): names = targets.lower() if names.startswith('lazyllm_'): @@ -97,7 +100,7 @@ def refresh(self, targets: Union[str, List[str]] = None) -> None: names = list(set([self._env_map_name[key] for key in curr_envs if key in self._env_map_name])) assert isinstance(names, list) for name in names: - self._update_impl(name, *self._config_params[name]) + if name in self.impl: self._update_impl(name, *self._config_params[name]) config = Config().add('mode', Mode, Mode.Normal, dict(DISPLAY=Mode.Display, DEBUG=Mode.Debug) ).add('repr_ml', bool, False, 'REPR_USE_ML' diff --git a/lazyllm/docs/common.py b/lazyllm/docs/common.py index d40f029f3..b95476668 100644 --- a/lazyllm/docs/common.py +++ b/lazyllm/docs/common.py @@ -71,6 +71,83 @@ # ... return input # ''') +add_chinese_doc('registry.LazyDict', '''\ +一个为懒惰的程序员设计的特殊字典类。支持多种便捷的访问和操作方式。 + +特性: +1. 使用点号代替['str']访问字典元素 +2. 支持首字母小写来使语句更像函数调用 +3. 当字典只有一个元素时支持直接调用 +4. 支持动态默认键 +5. 如果组名出现在名称中,允许省略组名 + +参数: + name (str): 字典的名称,默认为空字符串。 + base: 基类引用,默认为None。 + *args: 位置参数,传递给dict父类。 + **kw: 关键字参数,传递给dict父类。 +''') + +add_english_doc('registry.LazyDict', '''\ +A special dictionary class designed for lazy programmers. Supports various convenient access and operation methods. + +Features: +1. Use dot notation instead of ['str'] to access dictionary elements +2. Support lowercase first character to make statements more like function calls +3. Support direct calls when dictionary has only one element +4. Support dynamic default keys +5. 
Allow omitting group name if it appears in the name + +Args: + name (str): Name of the dictionary, defaults to empty string. + base: Base class reference, defaults to None. + *args: Positional arguments passed to dict parent class. + **kw: Keyword arguments passed to dict parent class. +''') + +add_chinese_doc('registry.LazyDict.remove', '''\ +从字典中移除指定的键值对。 + +参数: + key (str): 要移除的键。支持与__getattr__相同的键匹配规则,包括首字母小写和组名省略等特性。 + +注意: + 如果找不到匹配的键,将抛出AttributeError异常。 +''') + +add_english_doc('registry.LazyDict.remove', '''\ +Remove the specified key-value pair from the dictionary. + +Args: + key (str): The key to remove. Supports the same key matching rules as __getattr__, + including lowercase first character and group name omission features. + +Note: + Raises AttributeError if no matching key is found. +''') + +add_chinese_doc('registry.LazyDict.set_default', '''\ +设置字典的默认键。设置后可以通过.default属性访问该键对应的值。 + +参数: + key (str): 要设置为默认的键名。 + +注意: + - key必须是字符串类型 + - 设置后可以通过.default访问,或在字典只有一个元素时直接调用 +''') + +add_english_doc('registry.LazyDict.set_default', '''\ +Set the default key for the dictionary. After setting, the value can be accessed through the .default property. + +Args: + key (str): The key name to set as default. 
+ +Note: + - key must be a string type + - After setting, the value can be accessed via .default, or the dictionary can be called directly when it has only one element +''') + add_chinese_doc('compile_func', ''' 将一段 python 函数字符串编译成一个可执行函数并返回。 @@ -94,6 +171,114 @@ assert identity('hello') == 'hello' ''') +# ============= Threading +# Thread +add_chinese_doc('Thread', '''\ +LazyLLM 提供的增强线程类,继承自 Python 标准库的 `threading.Thread`。此类提供了额外的功能,包括会话ID管理、预钩子函数支持和异常处理机制。 + +Args: + group: 线程组,默认为 ``None`` + target: 要在线程中执行的函数,默认为 ``None`` + name: 线程名称,默认为 ``None`` + args: 传递给目标函数的参数元组,默认为 ``()`` + kwargs: 传递给目标函数的关键字参数字典,默认为 ``None`` + prehook: 在线程执行前要调用的函数或函数列表,默认为 ``None`` + daemon: 是否为守护线程,默认为 ``None`` +''') + +add_english_doc('Thread', '''\ +Enhanced thread class provided by LazyLLM, inheriting from Python's standard library `threading.Thread`. This class provides additional functionality including session ID management, pre-hook function support, and exception handling mechanisms. + +Args: + group: Thread group, defaults to ``None`` + target: Function to be executed in the thread, defaults to ``None`` + name: Thread name, defaults to ``None`` + args: Tuple of arguments to pass to the target function, defaults to ``()`` + kwargs: Dictionary of keyword arguments to pass to the target function, defaults to ``None`` + prehook: Function or list of functions to call before thread execution, defaults to ``None`` + daemon: Whether the thread is a daemon thread, defaults to ``None`` +''') + +add_example('Thread', '''\ +>>> import lazyllm +>>> from lazyllm.common.threading import Thread +>>> import time +>>> def simple_task(name): +... time.sleep(0.1) +... return f"Hello from {name}" +>>> thread = Thread(target=simple_task, args=("Worker",)) +>>> thread.start() +>>> result = thread.get_result() +>>> print(result) +Hello from Worker +>>> def setup_environment(): +... print("Setting up environment...") +... return "environment_ready" +>>> def validate_input(data): +... print(f"Validating input: {data}") +...
if not isinstance(data, (int, float)): +... raise ValueError("Input must be numeric") +>>> def process_data(data): +... print(f"Processing data: {data}") +... time.sleep(0.1) +... return data * 2 +>>> thread = Thread( +... target=process_data, +... args=(42,), +... prehook=[setup_environment, lambda: validate_input(42)] +... ) +>>> thread.start() +Setting up environment... +Validating input: 42 +Processing data: 42 +>>> result = thread.get_result() +>>> print(f"Final result: {result}") +Final result: 84 +''') + +# Thread.work +add_chinese_doc('Thread.work', '''\ +线程的核心工作方法,负责执行预钩子函数、目标函数,并处理异常和结果。 + +Args: + prehook: 预钩子函数列表,在线程执行前调用 + target: 要执行的目标函数 + args: 传递给目标函数的参数 + **kw: 传递给目标函数的关键字参数 + +**注意**: 此方法由 `Thread` 类内部调用,用户通常不需要直接调用此方法。 +''') + +add_english_doc('Thread.work', '''\ +Core working method of the thread, responsible for executing the pre-hook functions and the target function, and for handling exceptions and results. + +Args: + prehook: List of pre-hook functions to call before thread execution + target: Target function to execute + args: Arguments to pass to the target function + **kw: Keyword arguments to pass to the target function + +**Note**: This method is called internally by the `Thread` class; users typically do not need to call it directly. +''') + +# Thread.get_result +add_chinese_doc('Thread.get_result', '''\ +获取线程执行结果的方法。此方法会阻塞直到线程执行完成,然后返回执行结果或重新抛出异常。 + +**Returns:**\n +- 线程执行的结果。如果目标函数正常执行,返回其返回值;如果发生异常,会重新抛出该异常。 + +**注意**: 此方法应该在调用 `thread.start()` 之后使用,用于获取线程的执行结果。 +''') + +add_english_doc('Thread.get_result', '''\ +Retrieve the thread's execution result. This method blocks until the thread finishes, then returns the result or re-raises the exception. + +**Returns:**\n +- The result of the thread's execution. If the target function completes normally, its return value is returned; if it raised an exception, that exception is re-raised.
+ +**Note**: This method should be called after `thread.start()` to retrieve the thread's execution result. +''') # ============= Bind/bind add_chinese_doc('bind', '''\ Bind 类用于函数绑定与延迟调用,支持动态参数传入和上下文参数解析,实现灵活的函数组合与流水线式调用。 @@ -110,6 +295,70 @@ **kw: 绑定时固定的关键字参数,可以包含占位符。 ''') +add_chinese_doc('common.CaseInsensitiveDict', '''\ +大小写不敏感的字典类。 + +CaseInsensitiveDict 继承自 dict,提供大小写不敏感的键值存储和检索功能。所有的键都会被转换为小写形式存储,确保无论使用大写、小写或混合大小写的键名都能访问到相同的值。 + +特点: + - 所有键在存储时自动转换为小写 + - 支持标准的字典操作(获取、设置、检查包含关系) + - 保持字典的原有功能,只是键名处理方式不同 + +Args: + *args: 传递给父类 dict 的位置参数 + **kwargs: 传递给父类 dict 的关键字参数 +''') + +add_english_doc('common.CaseInsensitiveDict', '''\ +Case-insensitive dictionary class. + +CaseInsensitiveDict inherits from dict and provides case-insensitive key-value storage and retrieval. All keys are converted to lowercase when stored, so the same value can be accessed whether the key is written in uppercase, lowercase, or mixed case. + +Features: + - All keys are automatically converted to lowercase when stored + - Supports standard dictionary operations (get, set, check containment) + - Retains all other dict functionality; only key handling differs + +Args: + *args: Positional arguments passed to the parent dict class + **kwargs: Keyword arguments passed to the parent dict class +''') + +add_example('common.CaseInsensitiveDict', '''\ +>>> from lazyllm.common import CaseInsensitiveDict +>>> # Create a case-insensitive dictionary +>>> d = CaseInsensitiveDict({'Name': 'John', 'AGE': 25, 'City': 'New York'}) +>>> +>>> # Access the same key with different casing +>>> print(d['name']) # lowercase +John +>>> print(d['NAME']) # uppercase +John +>>> print(d['Name']) # capitalized +John +>>> +>>> # Keys are also lowercased when a value is set +>>> d['EMAIL'] = 'john@example.com' +>>> print(d['email']) # access with lowercase +john@example.com +>>> +>>> # Membership checks are case-insensitive +>>> 'AGE' in d +True +>>> 'age' in d +True +>>> 'Age' in d +True +>>> +>>> # Standard dictionary operations are supported +>>> d['PHONE'] = '123-456-7890' +>>> print(d.get('phone')) +123-456-7890 +>>> print(len(d)) +5 +''') + add_english_doc('bind', '''\ The Bind class provides function binding and deferred invocation capabilities, supporting dynamic argument passing and context-based argument resolution for flexible function composition and pipeline-style calls. @@ -362,4 +611,457 @@ 0 >>> queue.peek() is None True -""") \ No newline at end of file +""") + + +add_chinese_doc('common.ResultCollector', '''\ +结果收集器,用于在流程或任务执行过程中按名称存储和访问结果。 +它通过调用自身(传入 name)返回一个可调用的 Impl 对象来收集指定名称的结果。 +适用于需要跨步骤共享中间结果的场景。 +''') + +add_english_doc('common.ResultCollector', '''\ +A result collector used to store and access results by name during the execution of a flow or task. +Calling the instance with a name returns a callable Impl object that collects results for that name. +Useful for scenarios where intermediate results need to be shared across steps. +''') +add_chinese_doc('common.ResultCollector.Impl', '''\ +ResultCollector 的内部实现类,负责为指定名称收集结果。 +不应直接实例化,需通过 ResultCollector(name) 获取。 + +Args: + name (str): 结果名称。 + value (dict): 存储结果的字典引用。 +''') + +add_english_doc('common.ResultCollector.Impl', '''\ +Internal implementation class of ResultCollector, responsible for collecting results for a given name. +It should not be instantiated directly; obtain an instance via ResultCollector(name). + +Args: + name (str): The result name. + value (dict): A reference to the dictionary where results are stored. +''') + + +add_chinese_doc('common.ResultCollector.keys', '''\ +获取所有已存储结果的名称。 + +**Returns**\n +- KeysView[str]: 结果名称集合。 +''') + +add_english_doc('common.ResultCollector.keys', '''\ +Get all stored result names. + +**Returns**\n +- KeysView[str]: A set-like object containing result names. +''') + +add_chinese_doc('common.ResultCollector.items', '''\ +获取所有已存储的 (名称, 值) 对。 + +**Returns**\n +- ItemsView[str, Any]: 结果的键值对集合。 +''') + +add_english_doc('common.ResultCollector.items', '''\ +Get all stored (name, value) pairs.
+ +**Returns**\n +- ItemsView[str, Any]: A set-like object containing name-value pairs of results. +''') + +add_chinese_doc('common.EnvVarContextManager', '''\ +环境变量上下文管理器,用于在代码块执行期间临时设置环境变量,退出时自动恢复原始环境变量。 + +Args: + env_vars_dict (dict): 需要临时设置的环境变量字典,值为 None 的变量将被忽略。 +''') + +add_english_doc('common.EnvVarContextManager', '''\ +Environment variable context manager that temporarily sets environment variables while a code block executes and automatically restores the original values upon exit. + +Args: + env_vars_dict (dict): Dictionary of environment variables to temporarily set; variables with None values are ignored. +''') + +add_chinese_doc('ReadOnlyWrapper', '''\ +一个轻量级只读包装器,用于包裹任意对象并对外提供只读访问(实际并未完全禁止修改,但复制时不会携带原始对象)。包装器可以动态替换内部对象,并提供判断对象是否为空的辅助方法。 + +Args: + obj (Optional[Any]): 初始被包装的对象,默认为 None。 +''') + +add_english_doc('ReadOnlyWrapper', '''\ +A lightweight read-only wrapper that holds an arbitrary object and exposes its attributes. It supports swapping the internal object dynamically and provides a helper for checking whether it is empty. Note: it does not enforce deep immutability, but deepcopy drops the wrapped object. + +Args: + obj (Optional[Any]): The initial wrapped object, defaults to None. +''') + +add_chinese_doc('ReadOnlyWrapper.set', '''\ +替换当前包装的内部对象。 + +Args: + obj (Any): 新的内部对象。 +''') + +add_english_doc('ReadOnlyWrapper.set', '''\ +Replace the currently wrapped internal object. + +Args: + obj (Any): New object to wrap. +''') + +add_chinese_doc('ReadOnlyWrapper.isNone', '''\ +检查当前包装器是否未持有任何对象。 + +Args: + None. + +**Returns**\n +- bool: 如果内部对象为 None 返回 True,否则 False。 +''') + +add_english_doc('ReadOnlyWrapper.isNone', '''\ +Check whether the wrapper currently holds no object. + +Args: + None. + +**Returns**\n +- bool: True if the internal object is None, otherwise False.
+''') + +add_chinese_doc('queue.RedisQueue', '''\ +基于 Redis 实现的文件系统队列(继承自 FileSystemQueue),用于跨进程/节点的消息传递与队列管理。内部使用指定的 redis_url 初始化并管理底层存储,同时提供线程安全的初始化逻辑。 + +Args: + klass (str): 队列的分类名称,用于区分不同队列实例,默认值为 '__default__'。 +''') + +add_english_doc('queue.RedisQueue', '''\ +Redis-backed file system queue (inherits from FileSystemQueue) for cross-process/node message passing and queue management. It initializes its underlying storage using a configured Redis URL and employs thread-safe setup logic. + +Args: + klass (str): Classification name for the queue instance to distinguish different queues. Defaults to '__default__'. +''') + + +add_chinese_doc('Identity', '''\ +恒等模块,用于直接返回输入值。 + +该模块常用于模块拼接结构中占位,无实际处理逻辑。若输入为多个参数,将自动打包为一个整体结构输出。 + +Args: + *args: 可选的位置参数,占位用。 + **kw: 可选的关键字参数,占位用。 +''') + +add_english_doc('Identity', '''\ +Identity module that directly returns the input as output. + +This module serves as a no-op placeholder in composition pipelines. If multiple inputs are provided, they are packed together before returning. + +Args: + *args: Optional positional arguments for placeholder compatibility. + **kw: Optional keyword arguments for placeholder compatibility. +''') + + + +add_chinese_doc('ProcessPoolExecutor.submit', '''\ +将任务提交到进程池中执行。 + +此方法将一个函数及其参数序列化后提交到进程池中执行,返回一个 `Future` 对象,用于获取任务执行结果或状态。 + +Args: + fn (Callable): 要执行的函数。 + *args: 传递给函数的位置参数。 + **kwargs: 传递给函数的关键字参数。 + +Returns: + concurrent.futures.Future: 表示任务执行状态的 `Future` 对象。 +''') + +add_english_doc('ProcessPoolExecutor.submit', '''\ +Submit a task to the process pool for execution. + +This method serializes a function and its arguments, then submits them to the process pool for execution. It returns a `Future` object to track the task's status or result. + +Args: + fn (Callable): The function to execute. + *args: Positional arguments passed to the function. + **kwargs: Keyword arguments passed to the function. 
+ +Returns: + concurrent.futures.Future: A `Future` object representing the task's execution status. +''') + +add_example('ProcessPoolExecutor.submit', '''\ +>>> from lazyllm.common.multiprocessing import ProcessPoolExecutor +>>> import time +>>> +>>> def task(x): +... time.sleep(1) +... return x * 2 +... +>>> with ProcessPoolExecutor(max_workers=2) as executor: +... future = executor.submit(task, 5) +... result = future.result() +... print(result) +10 +''') + + +# ============= Multiprocessing +# ForkProcess +add_chinese_doc('ForkProcess', '''\ +LazyLLM 提供的增强进程类,继承自 Python 标准库的 `multiprocessing.Process`。此类专门使用 fork 启动方法来创建子进程,并提供了同步/异步执行模式的支持。 + +Args: + group: 进程组,默认为 ``None`` + target: 要在进程中执行的函数,默认为 ``None`` + name: 进程名称,默认为 ``None`` + args: 传递给目标函数的参数元组,默认为 ``()`` + kwargs: 传递给目标函数的关键字参数字典,默认为 ``{}`` + daemon: 是否为守护进程,默认为 ``None`` + sync: 是否为同步模式,默认为 ``True``。在同步模式下,进程执行完目标函数后会自动退出;在异步模式下,进程会持续运行直到被手动终止。 + +**注意**: 此类主要用于 LazyLLM 内部的进程管理,特别是在需要长期运行的服务器进程中。 +''') + +add_english_doc('ForkProcess', '''\ +Enhanced process class provided by LazyLLM, inheriting from Python's standard library `multiprocessing.Process`. This class always uses the fork start method to create child processes and supports both synchronous and asynchronous execution modes. + +Args: + group: Process group, defaults to ``None`` + target: Function to be executed in the process, defaults to ``None`` + name: Process name, defaults to ``None`` + args: Tuple of arguments to pass to the target function, defaults to ``()`` + kwargs: Dictionary of keyword arguments to pass to the target function, defaults to ``{}`` + daemon: Whether the process is a daemon process, defaults to ``None`` + sync: Whether to use synchronous mode, defaults to ``True``. In synchronous mode, the process automatically exits after executing the target function; in asynchronous mode, the process keeps running until it is manually terminated.
+ +**Note**: This class is primarily used for LazyLLM's internal process management, especially in long-running server processes. +''') + +add_example('ForkProcess', '''\ +>>> import lazyllm +>>> from lazyllm.common import ForkProcess +>>> import time +>>> import os +>>> def simple_task(task_id): +... print(f"Process {os.getpid()} executing task {task_id}") +... time.sleep(0.1) +... return f"Task {task_id} completed by process {os.getpid()}" +>>> process = ForkProcess(target=simple_task, args=(1,), sync=True) +>>> process.start() +Process 12345 executing task 1 +''') + +# ForkProcess.work +add_chinese_doc('ForkProcess.work', '''\ +ForkProcess 的核心工作方法,负责包装目标函数并处理同步/异步执行逻辑。 + +Args: + f: 要执行的目标函数 + sync: 是否为同步模式。在同步模式下,执行完目标函数后进程会退出;在异步模式下,进程会持续运行。 +''') + +add_english_doc('ForkProcess.work', '''\ +Core working method of ForkProcess, responsible for wrapping the target function and handling synchronous/asynchronous execution logic. + +Args: + f: Target function to execute + sync: Whether to use synchronous mode. In synchronous mode, the process exits after executing the target function; in asynchronous mode, the process continues running. +''') + +# ForkProcess.start +add_chinese_doc('ForkProcess.start', '''\ +启动 ForkProcess 进程。此方法会使用 fork 启动方法来创建子进程,并开始执行目标函数。 + +此方法的特点: + +- **Fork 启动**: 使用 fork 方法创建子进程,在 Unix/Linux 系统上提供更好的性能 +- **上下文管理**: 自动管理进程启动方法的上下文,确保使用正确的启动方式 +- **继承父类**: 继承自 `multiprocessing.Process.start()` 的所有功能 + +**注意**: 此方法会实际创建新的进程并开始执行,调用后进程会立即开始运行。 + +''') + +add_english_doc('ForkProcess.start', '''\ +Start the ForkProcess. This method uses the fork start method to create a child process and begin executing the target function. 
+ +Features of this method: + +- **Fork Start**: Uses the fork method to create child processes, which typically starts faster on Unix/Linux systems +- **Context Management**: Automatically manages the process start-method context, ensuring the correct start method is used +- **Parent Inheritance**: Inherits all functionality from `multiprocessing.Process.start()` + +**Note**: This method actually creates a new process; the process starts running immediately after the call. + +''') + +# ============= Options +# Option +add_chinese_doc('Option', '''\ +LazyLLM 提供的选项管理类,用于管理多个选项值并在它们之间进行迭代。此类主要用于参数网格搜索和超参数调优场景。 + +Args: + *obj: 一个或多个选项值,可以是任意类型的对象。如果传入单个列表或元组,会自动展开。 + +此类的主要特性: + +- **多选项管理**: 可以管理多个不同的选项值 +- **迭代支持**: 支持标准的 Python 迭代协议,可以遍历所有选项 +- **当前值访问**: 始终可以访问当前选中的选项值 +- **深度复制**: 支持深度复制当前选中的选项值 +- **多进程兼容**: 支持在多进程环境中使用 + +**注意**: 此类主要用于 LazyLLM 内部的参数搜索和试验管理,特别是在 TrialModule 中进行参数网格搜索时。 + +''') + +add_english_doc('Option', '''\ +Option management class provided by LazyLLM, used to manage multiple option values and iterate over them. This class is primarily used for parameter grid search and hyperparameter tuning. + +Args: + *obj: One or more option values, which can be objects of any type. If a single list or tuple is passed, it is automatically expanded. + +Key features of this class: + +- **Multi-option Management**: Manages multiple distinct option values +- **Iteration Support**: Supports the standard Python iteration protocol, so all options can be iterated over +- **Current Value Access**: The currently selected option value is always accessible +- **Deep Copy**: Supports deep copying of the currently selected option value +- **Multi-process Compatibility**: Can be used in multi-process environments + +**Note**: This class is primarily used for LazyLLM's internal parameter search and trial management, especially for parameter grid search in TrialModule.
+ +''') + +add_example('Option', '''\ +>>> import lazyllm +>>> from lazyllm.common.option import Option +>>> learning_rates = Option(0.001, 0.01, 0.1) +>>> print(f"当前学习率: {learning_rates}") +当前学习率: