scripts: add HIP->FlyDSL multi-agent port orchestrator#666

Open
fsx950223 wants to merge 8 commits into main from add-port-hip-to-flydsl-agent

@fsx950223

Contributor

Summary

Adds scripts/port_hip_to_flydsl_agent.py, an Anthropic-API multi-agent
orchestrator that ports a HIP kernel to FlyDSL from a single natural-language
prompt. A parser agent turns the request into a structured config; a loop of
specialized agents then runs until the port is accepted or the maximum
iteration count is reached:

  • Analyzer (effort max): reads the HIP source + deps, plans an LLVM-1:1-aligned FlyDSL port; on later iterations diffs FlyDSL↔HIP LLVM IR and the captured kernel trace to refine the plan.
  • Implementer (effort xhigh): writes the FlyDSL kernel, then must pass a local COMPILE_ONLY gate (a self-written smoke harness; failures are fed back and retried) before going further.
  • Test author: if no test is given, generates a numerical-correctness pytest using the HIP/aiter kernel as the golden (can build the test from a reference harness).
  • Evaluator (effort medium): runs accuracy + performance + ATT trace + real device-IR export on GPU, records everything to a per-run performance_<hash>.md, and emits a structured verdict.
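
The control flow described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: `run_port_loop`, `Verdict`, the callable parameters, and the retry limits are all hypothetical names chosen for this example, and the optional test-author step is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Verdict:
    """Hypothetical structured verdict returned by the evaluator stage."""
    accepted: bool
    feedback: str = ""


def run_port_loop(
    analyze: Callable[[str], str],                    # feedback -> plan
    implement: Callable[[str, str], str],             # (plan, compile error) -> kernel source
    compile_gate: Callable[[str], Tuple[bool, str]],  # kernel -> (ok, log)
    evaluate: Callable[[str], Verdict],               # kernel -> verdict
    max_iterations: int = 3,
    max_compile_retries: int = 2,
) -> Optional[Tuple[int, str]]:
    """Return (iteration, kernel) once accepted, or None if the limit is hit."""
    feedback = ""
    for iteration in range(1, max_iterations + 1):
        plan = analyze(feedback)
        kernel = implement(plan, "")
        ok, log = compile_gate(kernel)
        retries = 0
        # Compile failures are fed back to the implementer before evaluation.
        while not ok and retries < max_compile_retries:
            kernel = implement(plan, log)
            ok, log = compile_gate(kernel)
            retries += 1
        if not ok:
            feedback = f"compile gate failed: {log}"
            continue
        verdict = evaluate(kernel)
        if verdict.accepted:
            return iteration, kernel
        feedback = verdict.feedback  # refine the plan on the next iteration
    return None
```

Passing the stages in as plain callables keeps the loop testable with stubs, without any API or GPU access.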

Supports local path / URL / git / GitHub-blob HIP sources (clones with deps),
and prompt-driven remote-GPU execution (ssh/srun/podman) for environments where
the test GPU is remote.
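
As an illustration of how the four source kinds might be told apart, here is a small sketch. The function name `classify_hip_source` and the exact rules are assumptions made for this example; the script's real detection logic may differ.

```python
from urllib.parse import urlparse


def classify_hip_source(spec: str) -> str:
    """Classify a HIP source spec as 'github_blob', 'git', 'url', or 'local'.

    Hypothetical helper: the rules below are illustrative assumptions only.
    """
    parsed = urlparse(spec)
    if parsed.scheme in ("http", "https"):
        parts = [p for p in parsed.path.split("/") if p]
        # https://github.com/<owner>/<repo>/blob/<ref>/<path>
        if parsed.netloc == "github.com" and len(parts) >= 4 and parts[2] == "blob":
            return "github_blob"
        if parsed.path.endswith(".git"):
            return "git"
        return "url"
    # scp-style or explicit git/ssh remotes
    if spec.startswith("git@") or parsed.scheme in ("git", "ssh"):
        return "git"
    return "local"


for spec in (
    "https://github.com/org/repo/blob/main/kernel.hip",
    "https://github.com/org/repo.git",
    "https://example.com/kernel.hip",
    "./kernels/kernel.hip",
):
    print(spec, "->", classify_hip_source(spec))
```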

Validation — gemm1_a4w4 (aiter MXFP4 MoE GEMM-1)

The tool ported the aiter MXFP4 MoE GEMM-1 kernel to FlyDSL and validated it
bit-exact against the aiter/HIP gemm1 golden (e8m0 scale bytes + packed-fp4
nibbles match exactly), with 1:1 LLVM intrinsic alignment (e.g.
mfma.scale.f32.16x16x128.f8f6f4 449=449, buffer.load.lds 33=33, barriers
344=344). Real KIMI MXFP4 inputs/layouts, measured on MI355 (gfx950) via
rocprofv3 --kernel-trace --stats (per-dispatch End−Start):

| M | FlyDSL | HIP/aiter | ratio (fly/hip) |
|---|---|---|---|
| 16 | 69.0 µs | 87.6 µs | 0.79 (~21% faster) |
| 64 | 150.3 µs | 163.8 µs | 0.92 (~8% faster) |

Notes: both kernels produce identical correct output. The HIP launcher
dispatches a MAX_M-derived grid where most workgroups early-exit, while FlyDSL
dispatches the exact needed grid — so part of the speedup is launching fewer
idle workgroups. FlyDSL uses VGPR=72 vs HIP=64 (same SGPR=112, LDS=32 KB). Only
M=16/64 measured.
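
For reference, the ratio and "faster" figures follow directly from the reported timings. The snippet below only redoes that arithmetic; the helper name `summarize` is made up for this example.

```python
# Reported per-dispatch times in microseconds: M -> (FlyDSL, HIP/aiter)
measurements = {16: (69.0, 87.6), 64: (150.3, 163.8)}


def summarize(fly_us: float, hip_us: float) -> tuple:
    """Return (fly/hip ratio rounded to 2 places, percent faster rounded)."""
    ratio = fly_us / hip_us
    return round(ratio, 2), round((1 - ratio) * 100)


for m, (fly_us, hip_us) in measurements.items():
    ratio, pct = summarize(fly_us, hip_us)
    print(f"M={m}: ratio {ratio:.2f} (~{pct}% faster)")
```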

🤖 Generated with Claude Code

Adds port_hip_to_flydsl_agent.py: an Anthropic-API multi-agent loop
(analyze -> implement -> test-author -> evaluate) that ports a HIP kernel
to FlyDSL, driven by a single natural-language prompt.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 8, 2026 08:17

Copilot AI left a comment

Contributor


Pull request overview

Adds a new Python CLI under scripts/ that orchestrates a multi-agent Anthropic Messages API workflow to port HIP kernels to FlyDSL, including optional test generation and iterative evaluation with artifact capture (IR dumps, perf notes, traces).

Changes:

  • Introduces scripts/port_hip_to_flydsl_agent.py, implementing the end-to-end analyzer → implementer → (optional) test-author → evaluator loop with local tool execution.
  • Adds HIP source fetching support (local path, URL download, git/GitHub clone) and structured prompt→config parsing via a dedicated “task parser” agent.
  • Records iteration artifacts (plan, per-run performance markdown, IR dump directory) and enforces a local COMPILE_ONLY smoke gate before evaluation.

Comment thread scripts/port_hip_to_flydsl_agent.py
Comment on lines +1165 to +1181
    test_reference = str(Path(fields["test_reference"]).resolve()) if fields["test_reference"] else ""

    eval_mode = detect_eval_mode(fields["eval_mode"])
    if fields["ssh_host"] and fields["eval_mode"] == "auto":
        eval_mode = "gpu"  # a remote GPU is configured; don't fall back to compile_ir
    print(f"Evaluation mode: {eval_mode}{' (remote GPU)' if fields['ssh_host'] else ''}")
    if eval_mode == "gpu" and not trace_skill.exists():
        print(f"WARNING: capture-kernel-trace skill not found at {trace_skill}; "
              "the evaluator will record trace as unavailable.")

    return Config(
        hip_source=hip_source,
        hip_root=hip_root,
        kernel_name=fields["kernel_name"],
        output=output,
        test_file=Path(fields["test_file"]).resolve() if fields["test_file"] else None,
        repo_root=repo_root,
fsx950223 and others added 7 commits June 8, 2026 08:23
…col)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
General task context now reaches all agents; test-construction stays
isolated via the explicit per-agent guards + test_reference field.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderfeli

Collaborator

We don't need this script?
