Skip to content

[XPU] add fetch req log#8070

Open
cmcamdy wants to merge 1 commit into
PaddlePaddle:developfrom
cmcamdy:update_log
Open

[XPU] add fetch req log#8070
cmcamdy wants to merge 1 commit into
PaddlePaddle:developfrom
cmcamdy:update_log

Conversation

@cmcamdy

@cmcamdy cmcamdy commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator

Motivation

需要在 PD disaggregation 场景下定位 prefill 和 decode 之间的请求传递问题,通过环境变量控制是否输出请求调试日志。

Modifications

  • fastdeploy/envs.py 中新增 FD_PD_LOG_REQUEST 环境变量,默认关闭。
  • fastdeploy/engine/common_engine.py 的 prefill 发送请求和 decode 接收请求路径增加 PD 请求日志。
  • 日志内容需脱敏为请求摘要,避免记录原始 prompt、messages、token ids 或多模态数据。

Usage or Command

设置 FD_PD_LOG_REQUEST=1 可启用 PD 请求日志;默认 0 关闭。

Accuracy Tests

N/A(仅新增日志开关和日志输出,不改变模型计算结果。)

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@codecov-commenter

codecov-commenter commented Jun 23, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@4653221). Learn more about missing BASE report.

Files with missing lines Patch % Lines
fastdeploy/engine/common_engine.py 0.00% 3 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #8070   +/-   ##
==========================================
  Coverage           ?   67.50%           
==========================================
  Files              ?      475           
  Lines              ?    66885           
  Branches           ?    10315           
==========================================
  Hits               ?    45154           
  Misses             ?    18860           
  Partials           ?     2871           
Flag Coverage Δ
GPU 77.53% <0.00%> (?)
XPU 6.95% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@PaddlePaddle-bot PaddlePaddle-bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Paddle-CI-Agent | pr_review | 2026-06-23 18:56:43

📋 Review 摘要

PR 概述:新增 FD_PD_LOG_REQUEST 开关,在 PD 分离的 prefill 发送和 decode 接收路径打印请求日志。
变更范围fastdeploy/engine/common_engine.pyfastdeploy/envs.py
影响面 Tag[Engine] [PD Disaggregation]

问题

级别 文件 概述
🔴 安全 fastdeploy/engine/common_engine.py:985 prefill 连续资源分支将完整用户请求写入 INFO 日志
🔴 安全 fastdeploy/engine/common_engine.py:1017 prefill 非连续资源分支将完整用户请求写入 INFO 日志
🔴 安全 fastdeploy/engine/common_engine.py:2103 decode 接收分支将完整用户请求写入 INFO 日志

📝 PR 规范检查

标题当前使用 [XPU],但 diff 主要修改 PD 分离/Engine 请求日志;PR 描述仍是模板占位内容,未填写实际 Motivation/Modifications/Usage/Accuracy Tests。

标题建议(可直接复制):

  • [PD Disaggregation] Add gated PD request logging
PR 描述建议(点击展开,可直接复制)
## Motivation
需要在 PD disaggregation 场景下定位 prefill 和 decode 之间的请求传递问题,通过环境变量控制是否输出请求调试日志。

## Modifications
-`fastdeploy/envs.py` 中新增 `FD_PD_LOG_REQUEST` 环境变量,默认关闭。
-`fastdeploy/engine/common_engine.py` 的 prefill 发送请求和 decode 接收请求路径增加 PD 请求日志。
- 日志内容需脱敏为请求摘要,避免记录原始 prompt、messages、token ids 或多模态数据。

## Usage or Command
设置 `FD_PD_LOG_REQUEST=1` 可启用 PD 请求日志;默认 `0` 关闭。

## Accuracy Tests
N/A(仅新增日志开关和日志输出,不改变模型计算结果。)

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

总体评价

新增开关默认关闭,但当前实现一旦开启会把完整用户请求内容落到 INFO 日志,存在敏感信息泄漏风险。建议先改为脱敏摘要日志后再合入。

)
task.metrics.ask_decode_resource_start_time = time.time()
if envs.FD_PD_LOG_REQUEST:
self.llm_logger.info(f"[PD_LOG] P sends Request: {task.to_dict()}")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 安全 这里会把完整用户请求写入 INFO 日志。

Request.to_dict() 包含 promptprompt_token_idsmessagessystemhistorytoolsmultimodal_data 等用户输入字段;一旦 FD_PD_LOG_REQUEST=1,PD 请求的原始内容会进入持久化日志,命中仓库 checklist 的日志泄漏项。

建议修复方式:
改为专用的脱敏/摘要结构,只输出 request_idprompt_token_ids_lenidxnum_computed_tokens、必要的 block/resource 元数据;不要调用 task.to_dict() 直接落日志。

)
task.metrics.ask_decode_resource_start_time = time.time()
if envs.FD_PD_LOG_REQUEST:
self.llm_logger.info(f"[PD_LOG] P sends Request: {task.to_dict()}")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 安全 这里会把完整用户请求写入 INFO 日志。

Request.to_dict() 包含 promptprompt_token_idsmessagessystemhistorytoolsmultimodal_data 等用户输入字段;一旦 FD_PD_LOG_REQUEST=1,PD 请求的原始内容会进入持久化日志,命中仓库 checklist 的日志泄漏项。

建议修复方式:
改为专用的脱敏/摘要结构,只输出 request_idprompt_token_ids_lenidxnum_computed_tokens、必要的 block/resource 元数据;不要调用 task.to_dict() 直接落日志。

)
for task in tasks:
if envs.FD_PD_LOG_REQUEST:
self.llm_logger.info(f"[PD_LOG] D received Request: {task.to_dict()}")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 安全 这里会把完整用户请求写入 INFO 日志。

Request.to_dict() 包含 promptprompt_token_idsmessagessystemhistorytoolsmultimodal_data 等用户输入字段;一旦 FD_PD_LOG_REQUEST=1,PD 请求的原始内容会进入持久化日志,命中仓库 checklist 的日志泄漏项。

建议修复方式:
改为专用的脱敏/摘要结构,只输出 request_idprompt_token_ids_lenidxnum_computed_tokens、必要的 block/resource 元数据;不要调用 task.to_dict() 直接落日志。

@PaddlePaddle-bot

PaddlePaddle-bot commented Jun 23, 2026

Copy link
Copy Markdown

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-25 20:22:15

CI报告基于以下代码生成(30分钟更新一次):
PR commit: 0c8e80d | Merge base: 4653221 (branch: develop)


1 Required任务 : 8/10 通过

总执行(rerun次数) 总任务 ✅ 通过 ❌ 失败 ⏳ 运行中 ⏸️ 等待中 跳过
41(0) 41 36 5 0 0 0
任务 错误类型 置信度 日志
Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage PR问题 Job
Approval 需要 Approval Job

2 失败详情

🔴 Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage — PR问题(置信度: 高)

错误类型: PR问题 | 置信度: 高
分析器: 通用分析(fallback)
失败用例: 差异覆盖率检查

用例 错误摘要
fastdeploy/engine/common_engine.py:985, :1017, :2103 新增 PD 请求日志分支未被差异覆盖,diff coverage 为 50%,低于 80% 阈值

关键日志:

failed_steps: Verify Code Coverage Threshold (80%)
TEST_EXIT_CODE: 0
COVERAGE_EXIT_CODE: 9
violations: [[985, null], [1017, null], [2103, null]]
total_num_lines: 6
total_num_violations: 3
total_percent_covered: 50
Process completed with exit code 9.
  • 根因摘要: 新增 PD 请求日志行缺少差异覆盖

PR 新增了 FD_PD_LOG_REQUEST 开关下的 3 条日志输出,分别位于 fastdeploy/engine/common_engine.py:985:1017:2103。单元测试步骤本身通过,但覆盖率校验发现这些新增可执行行未被覆盖,导致差异覆盖率只有 50%,触发 80% 阈值失败。

修复建议:

  1. FD_PD_LOG_REQUEST=1 场景补充或扩展测试,覆盖 prefill 发送请求的两条路径以及 decode 接收请求路径,确保新增日志分支被执行。
  2. 如果这些日志行按项目规范不要求覆盖,应使用仓库认可的覆盖率忽略方式处理,但优先建议补测试验证开关行为。

关联变更: fastdeploy/engine/common_engine.py:984:1016:2102fastdeploy/envs.py:273

🔴 Approval — 需要 Approval(置信度: 高)

该 Job 需要人工 Approval,完成审批后 CI 才会继续执行。

  • 根因摘要: Workflow 等待人工审批

修复建议:

  1. 请通过人工审批。

关联变更: 无代码关联

@hong19860320 hong19860320 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants