Add comprehensive logging with IRIS_LOG_LEVEL support #522
Merged
Instrument 17 files with ~30 log lines covering CCL ops, symmetric heap, allocators, distributed helpers, HIP platform, kernel launches, and fused ops. Two-tier logging: DEBUG for entry-point tracing, INFO for lifecycle events (init, peer access, IPC). Enhanced log format with timestamp, level, module name, and rank. Add IRIS_LOG_LEVEL env var to control level at import time.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
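A minimal sketch of how an import-time level override of this kind could work. It assumes that an unset or unrecognized value falls back to INFO (the default named in the PR summary); the helper name `level_from_env` and the logger name `iris_sketch` are hypothetical and not taken from the Iris source.

```python
import logging
import os

# Level names the PR summary lists as accepted values.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def level_from_env(env=None, default=logging.INFO):
    """Map IRIS_LOG_LEVEL to a logging level (hypothetical helper).

    Assumption: unknown or missing values fall back to `default`.
    """
    env = os.environ if env is None else env
    name = env.get("IRIS_LOG_LEVEL", "").strip().upper()
    return _LEVELS.get(name, default)


# Evaluated once, when the module is imported.
logger = logging.getLogger("iris_sketch")
logger.setLevel(level_from_env())
```

Because the variable is read at import time, it has to be set before the package is imported, for example on the command line that launches the job.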
_log_rank() now captures the caller's filename via sys._getframe so the formatter can show [module]. User-facing ctx.info()/ctx.debug() calls go through _log_with_rank(pathname="") so the module bracket is omitted.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
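The caller-capture idea can be sketched roughly as follows. This is an illustration under assumptions, not the Iris implementation: `SketchFormatter`, `make_record`, and `log_rank_sketch` are hypothetical names, and the output layout is invented apart from the [Iris] prefix mentioned in the test plan. Note that `sys._getframe` is a CPython-oriented function and is not guaranteed to exist in every Python implementation.

```python
import logging
import os
import sys


class SketchFormatter(logging.Formatter):
    """Show a [module] bracket only when the record carries a pathname."""

    def format(self, record):
        module = ""
        if record.pathname:
            module = os.path.splitext(os.path.basename(record.pathname))[0]
        bracket = f" [{module}]" if module else ""
        return f"[Iris]{bracket} {record.getMessage()}"


def make_record(msg, pathname):
    # LogRecord(name, level, pathname, lineno, msg, args, exc_info)
    return logging.LogRecord("iris_sketch", logging.DEBUG, pathname, 0, msg, None, None)


def log_rank_sketch(msg):
    # Frame 1 is whoever called this helper; record its source file.
    caller_file = sys._getframe(1).f_code.co_filename
    return make_record(msg, caller_file)
```

Passing an empty pathname, as the user-facing path does in the commit message above, is what suppresses the bracket in this sketch.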
The distributed_allgather, distributed_broadcast_tensor, and distributed_barrier functions were calling _log_rank without rank= and num_ranks= kwargs, causing log lines to show ?/? instead of the actual rank info.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
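To illustrate the failure mode, here is a small sketch of a rank tag that falls back to a placeholder when the caller omits the kwargs. The function name `rank_tag` is hypothetical; only the ?/? placeholder comes from the commit message.

```python
def rank_tag(rank=None, num_ranks=None):
    """Return 'rank/num_ranks', using '?' for any value not supplied."""
    r = "?" if rank is None else str(rank)
    n = "?" if num_ranks is None else str(num_ranks)
    return f"{r}/{n}"


# Omitting the kwargs yields the placeholder seen in the buggy log lines.
missing = rank_tag()
# Passing them explicitly restores the real rank info.
fixed = rank_tag(rank=1, num_ranks=4)
```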
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds rank-aware, env-configurable logging across Iris distributed ops and runtime to improve debuggability while keeping default output low-noise.
Changes:
- Introduces IRIS_LOG_LEVEL env override and _log_rank() helper for rank-aware internal logging.
- Updates IrisFormatter to include timestamps/levels/rank and module information.
- Instruments CCL ops, matmul collectives, allocators/symmetric heap, kernel launch tracing, and distributed helpers with DEBUG/INFO logs.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| iris/host/logging/logging.py | Adds IRIS_LOG_LEVEL, expands formatter output, and introduces _log_rank() helper. |
| iris/host/tracing/kernel_artifacts.py | Adds DEBUG tracing for kernel launches via _log_rank(). |
| iris/host/platform/hip.py | Logs HIP/CUDA errors and DMA-BUF export at ERROR and DEBUG, respectively. |
| iris/host/memory/symmetric_heap.py | Adds INFO/DEBUG lifecycle tracing for heap init, allocations, and peer refresh. |
| iris/host/memory/allocators/vmem_allocator.py | Adds INFO init and DEBUG allocation tracing. |
| iris/host/memory/allocators/torch_allocator.py | Adds INFO init, DEBUG allocation tracing, and ERROR OOM log. |
| iris/host/iris.py | Adds INFO init log and DEBUG barrier trace. |
| iris/host/distributed/helpers.py | Adds DEBUG tracing for allgather/broadcast/barrier helpers. |
| iris/host/distributed/fd_passing.py | Adds DEBUG tracing for FD infrastructure setup. |
| iris/ccl/all_reduce.py | Adds DEBUG tracing for all-reduce preamble and main call. |
| iris/ccl/all_gather.py | Adds DEBUG tracing for all-gather entry. |
| iris/ccl/all_to_all.py | Adds DEBUG tracing for all-to-all entry. |
| iris/ccl/reduce_scatter.py | Adds DEBUG tracing for reduce-scatter entry. |
| iris/ops/matmul_all_reduce.py | Adds DEBUG tracing for matmul all-reduce entry. |
| iris/ops/matmul_all_gather.py | Adds DEBUG tracing for matmul all-gather entry. |
| iris/ops/matmul_reduce_scatter.py | Adds DEBUG tracing for matmul reduce-scatter entry. |
| iris/ops/all_gather_matmul.py | Adds DEBUG tracing for fused all-gather matmul entry. |
- Only show [module] for internal _log_rank() calls via iris_internal flag, not for user-facing ctx.info()/ctx.debug() logs
- Only set iris_num_ranks when num_ranks is not None (avoids "None" in output)
- Guard eager .item() calls behind logger.isEnabledFor(DEBUG)
- Lower refresh_peer_access() logs from INFO to DEBUG (too noisy)
- Guard f-string log in iris.py init behind isEnabledFor(INFO)
- Update test assertions to match new timestamp+level format
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
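The isEnabledFor guard from the list above can be sketched like this. `FakeScalar` is a hypothetical stand-in for a device tensor whose .item() forces a host sync, and `trace_alloc` and the logger name are illustrative only; the point is that the expensive call is skipped entirely when DEBUG is off.

```python
import logging

guard_logger = logging.getLogger("iris_sketch.guard")


class FakeScalar:
    """Stand-in for a tensor; counts how often .item() is called."""

    def __init__(self, value):
        self.value = value
        self.item_calls = 0

    def item(self):
        self.item_calls += 1
        return self.value


def trace_alloc(nbytes):
    # Without the guard, nbytes.item() would run even when the
    # DEBUG line is never emitted.
    if guard_logger.isEnabledFor(logging.DEBUG):
        guard_logger.debug("allocate: %d bytes", nbytes.item())
```

The same pattern applies to the f-string in the init log: building the string is the cost, so the guard has to wrap the construction and not just the call.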
Summary
- IRIS_LOG_LEVEL env var (DEBUG/INFO/WARNING/ERROR) to control iris log verbosity
- _log_rank() helper for rank-aware logging with lazy formatting
- [module] bracket only for internal iris logs, not user-facing ctx.info() calls
At default INFO level, only 4-5 lifecycle lines appear during init. At DEBUG, full tracing of every CCL call, kernel launch, allocation, and barrier.
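As an illustration of what lazy formatting buys, the sketch below compares %-style arguments, which the standard logging module only formats when a record is actually emitted, with an f-string, which is built before the call. All names here (`Probe`, `ListHandler`, `log_lazy`, `log_eager`, the logger name) are hypothetical and not part of Iris.

```python
import logging


class Probe:
    """Counts how often it is converted to a string."""

    def __init__(self):
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return "probe"


class ListHandler(logging.Handler):
    """Collects formatted messages in memory instead of printing them."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(record.getMessage())


lazy_logger = logging.getLogger("iris_sketch.lazy")
lazy_logger.propagate = False  # keep the sketch isolated from root handlers
lazy_handler = ListHandler()
lazy_logger.addHandler(lazy_handler)


def log_lazy(obj):
    # Arguments are formatted only if the DEBUG record is emitted.
    lazy_logger.debug("value=%s", obj)


def log_eager(obj):
    # The f-string is evaluated before logging checks the level.
    lazy_logger.debug(f"value={obj}")
```

At INFO, `log_lazy` never touches the object, while `log_eager` still pays for the string conversion even though nothing is logged.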
Test plan
- ruff check iris/ && ruff format --check iris/
- IRIS_LOG_LEVEL=DEBUG torchrun --nproc_per_node=4 tests/run_tests_distributed.py tests/ccl/test_all_reduce.py -v (verify debug output with [Iris] prefix)
🤖 Generated with Claude Code