scripts: add HIP->FlyDSL multi-agent port orchestrator #666
Open
fsx950223 wants to merge 8 commits into
Adds port_hip_to_flydsl_agent.py: an Anthropic-API multi-agent loop (analyze -> implement -> test-author -> evaluate) that ports a HIP kernel to FlyDSL, driven by a single natural-language prompt. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
Pull request overview
Adds a new Python CLI under scripts/ that orchestrates a multi-agent Anthropic Messages API workflow to port HIP kernels to FlyDSL, including optional test generation and iterative evaluation with artifact capture (IR dumps, perf notes, traces).
Changes:
- Introduces `scripts/port_hip_to_flydsl_agent.py`, implementing the end-to-end analyzer → implementer → (optional) test-author → evaluator loop with local tool execution.
- Adds HIP source fetching support (local path, URL download, git/GitHub clone) and structured prompt→config parsing via a dedicated “task parser” agent.
- Records iteration artifacts (plan, per-run performance markdown, IR dump directory) and enforces a local `COMPILE_ONLY` smoke gate before evaluation.
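The source-fetching change listed above distinguishes several kinds of HIP source. The following is a minimal sketch of how such a classification could look, not the script's actual logic: the function name `classify_hip_source`, the returned labels, and the example URLs are all hypothetical and chosen only for illustration.

```python
from urllib.parse import urlparse


def classify_hip_source(spec: str) -> str:
    """Classify a HIP source spec as 'github_blob', 'git', 'url', or 'local'.

    Hypothetical helper: the labels and rules are assumptions, not the
    behavior of port_hip_to_flydsl_agent.py.
    """
    # scp-style git remotes (git@host:org/repo.git) have no URL scheme.
    if spec.startswith("git@"):
        return "git"
    parsed = urlparse(spec)
    if parsed.scheme in ("git", "ssh"):
        return "git"
    if parsed.scheme in ("http", "https"):
        # A GitHub "blob" URL points at one file inside a repository.
        if parsed.netloc.lower() == "github.com" and "/blob/" in parsed.path:
            return "github_blob"
        if parsed.path.endswith(".git"):
            return "git"
        return "url"
    # Anything else is treated as a path on the local filesystem.
    return "local"


# Illustrative inputs only; none of these locations are taken from the PR.
examples = [
    "kernels/gemm.hip",
    "https://example.com/files/gemm.hip",
    "https://example.com/org/repo.git",
    "git@example.com:org/repo.git",
    "https://github.com/example-org/example-repo/blob/main/kernels/gemm.hip",
]
kinds = [classify_hip_source(s) for s in examples]
```

A dispatcher could then map each label to a fetch strategy (copy, download, or clone with dependencies), which keeps the network-touching code separate from the parsing.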
Comment on lines +1165 to +1181
    test_reference = str(Path(fields["test_reference"]).resolve()) if fields["test_reference"] else ""

    eval_mode = detect_eval_mode(fields["eval_mode"])
    if fields["ssh_host"] and fields["eval_mode"] == "auto":
        eval_mode = "gpu"  # a remote GPU is configured; don't fall back to compile_ir
    print(f"Evaluation mode: {eval_mode}{' (remote GPU)' if fields['ssh_host'] else ''}")
    if eval_mode == "gpu" and not trace_skill.exists():
        print(f"WARNING: capture-kernel-trace skill not found at {trace_skill}; "
              "the evaluator will record trace as unavailable.")

    return Config(
        hip_source=hip_source,
        hip_root=hip_root,
        kernel_name=fields["kernel_name"],
        output=output,
        test_file=Path(fields["test_file"]).resolve() if fields["test_file"] else None,
        repo_root=repo_root,
…col) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
General task context now reaches all agents; test-construction stays isolated via the explicit per-agent guards + test_reference field. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Collaborator
We don't need this script?
Summary
Adds `scripts/port_hip_to_flydsl_agent.py`, an Anthropic-API multi-agent orchestrator that ports a HIP kernel to FlyDSL from a single natural-language prompt. A parser agent turns the request into structured config; then a loop of specialized agents runs until the port is accepted or max iterations is hit:
- Analyzer (max): reads the HIP source + deps, plans an LLVM-1:1-aligned FlyDSL port; on later iterations diffs FlyDSL↔HIP LLVM IR and the captured kernel trace to refine the plan.
- Implementer (xhigh): writes the FlyDSL kernel, then must pass a local `COMPILE_ONLY` gate (a self-written smoke harness; failures are fed back and retried) before going further.
- Evaluator (medium): runs accuracy + performance + ATT trace + real device-IR export on GPU, records everything to a per-run `performance_<hash>.md`, and emits a structured verdict.

Supports local path / URL / git / GitHub-blob HIP sources (clones with deps), and prompt-driven remote-GPU execution (ssh/srun/podman) for environments where the test GPU is remote.
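The control flow described above (plan, implement, pass a compile-only gate, evaluate, repeat until accepted or out of iterations) can be sketched as follows. This is an illustrative skeleton under stated assumptions rather than the script's implementation: `run_port_loop`, `Verdict`, `PortState`, the retry count, and the stub agents are hypothetical, and the real agents would call the Anthropic API and local tools instead of returning canned strings.

```python
from dataclasses import dataclass, field


@dataclass
class Verdict:
    accepted: bool
    feedback: str = ""


@dataclass
class PortState:
    plan: str = ""
    kernel: str = ""
    history: list = field(default_factory=list)


def run_port_loop(analyze, implement, compile_only_gate, evaluate,
                  max_iterations=3, max_compile_retries=2):
    """Run analyze -> implement -> gate -> evaluate until accepted or exhausted."""
    state = PortState()
    feedback = ""
    for iteration in range(1, max_iterations + 1):
        state.plan = analyze(feedback)
        compile_error = ""
        # The implementer must clear the compile-only gate; errors are fed back.
        for _ in range(max_compile_retries + 1):
            state.kernel = implement(state.plan, compile_error)
            compile_error = compile_only_gate(state.kernel)
            if not compile_error:
                break
        if compile_error:
            state.history.append((iteration, "compile_failed"))
            feedback = f"compile gate failed: {compile_error}"
            continue
        verdict = evaluate(state.kernel)
        state.history.append((iteration, "accepted" if verdict.accepted else "rejected"))
        if verdict.accepted:
            return state, True
        feedback = verdict.feedback
    return state, False


# Stub agents (hypothetical) standing in for the real LLM-backed ones.
calls = {"implement": 0}

def analyze(feedback):
    return "plan" if not feedback else f"plan refined by: {feedback}"

def implement(plan, compile_error):
    calls["implement"] += 1
    return f"kernel#{calls['implement']}"

def compile_only_gate(kernel):
    return "syntax error" if kernel == "kernel#1" else ""

def evaluate(kernel):
    return Verdict(accepted=(kernel == "kernel#3"), feedback="accuracy mismatch")

final_state, accepted = run_port_loop(analyze, implement, compile_only_gate, evaluate)
```

Passing the agents in as plain callables keeps the loop testable without network access; in this stubbed run the first kernel fails the gate, the second is rejected by the evaluator, and the third is accepted on iteration 2.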
Validation: `gemm1_a4w4` (aiter MXFP4 MoE GEMM-1)
The tool ported the aiter MXFP4 MoE GEMM-1 kernel to FlyDSL and validated it bit-exact against the aiter/HIP gemm1 golden (e8m0 scale bytes + packed-fp4 nibbles match exactly), with 1:1 LLVM intrinsic alignment (e.g. `mfma.scale.f32.16x16x128.f8f6f4` 449=449, `buffer.load.lds` 33=33, barriers 344=344). Real KIMI MXFP4 inputs/layouts, measured on MI355 (gfx950) via `rocprofv3 --kernel-trace --stats` (per-dispatch End−Start):

Notes: both kernels produce identical correct output. The HIP launcher dispatches a MAX_M-derived grid where most workgroups early-exit, while FlyDSL dispatches the exact needed grid, so part of the speedup is launching fewer idle workgroups. FlyDSL uses VGPR=72 vs HIP=64 (same SGPR=112, LDS=32 KB). Only M=16/64 measured.
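The 1:1 intrinsic alignment check mentioned above amounts to counting call sites of selected intrinsics in both LLVM IR dumps and comparing the totals. Below is a minimal sketch of that idea, assuming textual IR; the helper names, the `s.barrier` pattern for barriers, and the tiny IR snippets are hypothetical and synthetic, not output from the PR's runs.

```python
import re

# The first two substrings come from the PR text; "s.barrier" is an assumption
# about how barriers appear in the IR.
PATTERNS = ("mfma.scale.f32.16x16x128.f8f6f4", "buffer.load.lds", "s.barrier")


def count_intrinsic_calls(ir_text, patterns=PATTERNS):
    """Count call sites (not declarations) whose line mentions each pattern."""
    counts = {p: 0 for p in patterns}
    for line in ir_text.splitlines():
        if not re.search(r"\bcall\b", line):
            continue  # skip `declare` lines and everything else
        for p in patterns:
            if p in line:
                counts[p] += 1
    return counts


def alignment_report(hip_ir, fly_ir, patterns=PATTERNS):
    """Return {pattern: (hip_count, fly_count, counts_match)}."""
    hip = count_intrinsic_calls(hip_ir, patterns)
    fly = count_intrinsic_calls(fly_ir, patterns)
    return {p: (hip[p], fly[p], hip[p] == fly[p]) for p in patterns}


# Synthetic IR fragments for illustration only (arguments elided).
HIP_IR = """\
declare void @llvm.amdgcn.s.barrier()
  call void @llvm.amdgcn.s.barrier()
  call void @llvm.amdgcn.raw.buffer.load.lds(...)
  %acc = call <4 x float> @llvm.amdgcn.mfma.scale.f32.16x16x128.f8f6f4(...)
  call void @llvm.amdgcn.s.barrier()
"""
FLY_IR = """\
declare void @llvm.amdgcn.s.barrier()
  call void @llvm.amdgcn.raw.buffer.load.lds(...)
  call void @llvm.amdgcn.s.barrier()
  %acc = call <4 x float> @llvm.amdgcn.mfma.scale.f32.16x16x128.f8f6f4(...)
  call void @llvm.amdgcn.s.barrier()
"""

report = alignment_report(HIP_IR, FLY_IR)
all_aligned = all(match for _, _, match in report.values())
```

Equal counts are a necessary signal, not a proof of equivalence: two kernels can issue the same number of intrinsics in a different order or with different operands, which is why the PR pairs this check with a bit-exact output comparison.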
🤖 Generated with Claude Code