
Conversation

@diaconuccalin diaconuccalin (Contributor) commented Sep 24, 2025

This PR brings the changes required to run a minimal TinyViT on the Siracusa platform, without tiling.

Added

  • PULP 2D FP DW conv Im2Col template and kernel, with bias support.
  • Bias support for PULP 2D FP regular conv Im2Col in template & kernel.
  • PULP FP DW conv 2D parser.
  • FP conv 2D (simple & DW), reshape & skip connection, and TinyViT demo tests to the non-tiled Siracusa CI pipeline.
  • FP bindings and mappings for PULP slice, DW conv 2D, and reduce mean operations.
  • FP PULP DW conv lowering optimization pass, similar to the existing one for the integer version.
  • RemoveEmptyConvBiasPass to the PULP optimizer.

Changed

  • Reduced the size of the reshape & skip connection test for non-tiled Siracusa memory compatibility.

Fixed

  • Fixed a bug in the alias_of node parameter handling, which tracks buffer lifetimes in skip-connection situations.
  • Fixed a bug in the handling of non-batched elements in the PULPOpen FP GEMM and matmul templates.
  • Prefixed closure names with an underscore to avoid naming issues when they would otherwise start with unsupported characters, such as digits (see the sketch after this list).
  • Fixed data types in the PULPOpen FP add and mul templates.
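
For illustration, the closure-name fix boils down to the following sketch (hypothetical helper name; the actual pass in Closure.py has its own structure):

```python
# Hypothetical illustration of the fix: C identifiers may not start
# with a digit, so the generated closure name gets a leading underscore.
def closure_name(node_name: str) -> str:
    return f"_{node_name}_closure"

print(closure_name("0_conv"))  # _0_conv_closure (a valid C identifier)
```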

PR Merge Checklist

  1. The PR is rebased on the latest devel commit and pointing to devel.
  2. Your PR is reviewed and approved.
  3. All checks are passing.
  4. The CHANGELOG.md file has been updated.
  5. If the Docker image was modified, change its link back after review.

@diaconuccalin diaconuccalin self-assigned this Sep 24, 2025
@diaconuccalin diaconuccalin added the Feature Addition of new features label Sep 24, 2025

coderabbitai bot commented Sep 24, 2025

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added floating-point depthwise 2D convolution with bias support.
    • Added floating-point slice operations.
    • Added floating-point reduce mean operations with mixed data types.
    • Enabled multi-core configuration parameter for deployment optimization.
  • Bug Fixes

    • Fixed batching handling in matrix operations.
    • Corrected data type handling in arithmetic operations.
    • Resolved naming conflicts in generated code.
  • Tests

    • Extended test coverage for new floating-point operations on Siracusa platform.

Walkthrough

Adds floating‑point depthwise 2D conv (im2col) with bias, FP slice/reduce-mean bindings and parsers, threads core count (n_cores) through contexts/deployers/tiler/tests/templates, updates FP templates/kernels for bias and per-core im2col sizing, prefixes generated closure names with an underscore, and adds Siracusa CI FP tests.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **CI test coverage**<br>`.github/workflows/ci-platform-siracusa.yml` | Added Siracusa test variants for FP conv (regular + DW, bias/no-bias), FP reshape with skip connection, and TinyViT demo. |
| **Core-count / context propagation**<br>`Deeploy/DeeployTypes.py`, `Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py`, `Deeploy/Targets/PULPOpen/Deployer.py`, `Deeploy/TilingExtension/TilerExtension.py` | Introduced n_cores parameter stored on NetworkContext/NetworkContainer/NetworkDeployer and SignPropDeployer; threaded existing context constructions to propagate ctxt.n_cores; TilerExtension sets self.ctxt.n_cores. |
| **Test & CLI plumbing**<br>`DeeployTest/testMVP.py`, `DeeployTest/testUtils/platformMapping.py`, `DeeployTest/testUtils/testRunner.py`, `DeeployTest/testRunner_tiled_siracusa.py` | Propagated n_cores through setup/map/deployer calls, added -n_cores CLI option, integrated cores into TestRunner and CMake gen args. |
| **Parsers**<br>`Deeploy/Targets/PULPOpen/Parsers.py` | FP Conv2D now accepts 2–3 inputs and optional bias mapping; added PULPFPDWConv2DParser to detect/parse depthwise FP conv with optional bias. |
| **Bindings**<br>`Deeploy/Targets/PULPOpen/Bindings.py` | Added PULPSliceBindings and PULPFloatDWConv2DBindings; extended reduce-mean bindings for float/int combos; DMASlice binding uses DMASliceTemplate reference. |
| **Platform mapping & passes**<br>`Deeploy/Targets/PULPOpen/Platform.py` | Registered new parsers/mappers/bindings (FPDWConv2D, DMASlice/Slice), exposed PULPFloatDWConv2DBindings/PULPSliceBindings, and added RemoveEmptyConvBiasPass to topology passes. |
| **FP templates / codegen**<br>`Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py`, `.../FloatAddTemplate.py`, `.../FloatMulTemplate.py`, `.../FloatGemmTemplate.py`, `.../FloatMatMulTemplate.py`, `.../ReshapeTemplate.py` | FloatConv: switched im2col buffer sizing to use ctxt.n_cores; added PULP2DFloatDWConvIm2ColTemplate and hoisted ctxtBuffer; Add/Mul updated integer widths/casts to unsigned 32-bit; Gemm/MatMul fixed batched/non-batched handling; Reshape now uses VariableBuffer and _GenericReshapeTemplate pattern. |
| **PULP kernel headers & sources**<br>`TargetLibraries/PULPOpen/inc/kernel/Conv.h`, `TargetLibraries/PULPOpen/src/Convolution_fp32.c`, `TargetLibraries/PULPOpen/src/DWConvolution_fp32.c` | Extended conv/Im2Col signatures to accept pSrcBias and has_bias; regular conv adjusted for per-core output-channel slicing; added PULP_DW_Conv2d_Im2Col_fp32 implementation with per-core channel slicing and bias support. |
| **Code transformation**<br>`Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py` | Prefixed generated closure names with an underscore to avoid problematic starting characters. |
| **Generic reshape template rename**<br>`Deeploy/Targets/Generic/Templates/ReshapeTemplate.py`, `Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py` | Renamed internal _ReshapeTemplate to _GenericReshapeTemplate; PULPOpen reshape now inherits from generic reshape and uses VariableBuffer aliasing. |
| **Changelog**<br>`CHANGELOG.md` | Documented FP DW conv, slice/reduce-mean bindings, bias support, core-count configurability, new tests, and fixes for batching/naming/types. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as CLI/Test
    participant Setup as setupDeployer
    participant Map as mapDeployer
    participant PulpDep as PULPDeployer
    participant Context as NetworkContext

    CLI->>Setup: setupDeployer(..., n_cores)
    Setup->>Map: mapDeployer(..., n_cores)
    Map->>PulpDep: PULPDeployer(..., n_cores)
    PulpDep->>Context: NetworkContext(..., n_cores)
    Context-->>PulpDep: context (n_cores)
```

```mermaid
sequenceDiagram
    participant Parser as PULPFPDWConv2DParser
    participant Mapper as FPDWConv2DMapper
    participant Binding as PULPFloatDWConv2DBindings
    participant Template as PULP2DFloatDWConvIm2ColTemplate
    participant Kernel as PULP_DW_Conv2d_Im2Col_fp32

    Parser->>Parser: parseNode(2-3 inputs, optional bias)
    Parser->>Mapper: create node mapping (with bias flag)
    Mapper->>Binding: select binding
    Binding->>Template: choose reference template
    Template->>Template: computeTransientBuffersSize(ctxt.n_cores)
    Template->>Kernel: emit kernel call (pass pContextBuffer per-core)
    Kernel->>Kernel: per-core channel slicing + optional bias accumulation
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Areas needing extra attention:
    • Matching template argument/ordering changes (pSrcBias, has_bias, ctxtBuffer) between templates, mappers, and kernel signatures.
    • Per-core channel-slicing math and early-exit conditions in conv and DW conv kernels.
    • Correct propagation and defaulting of n_cores through contexts/deployers/tests.
    • Integer/unsigned cast changes in Add/Mul templates for overflow/ABI implications.
    • Parser depthwise detection correctness and binding/mapper registration consistency.

Possibly related PRs

Suggested reviewers

  • Xeratec
  • Victor-Jung

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 6.90%, which is insufficient. The required threshold is 80.00%. | Run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title Check | ✅ Passed | The PR title "TinyViT on non-tiled Siracusa" is fully related to the changeset and clearly communicates the primary objective. The changes across multiple files, including new templates, parsers, bindings, optimizations, and CI pipeline updates, all work together to achieve this goal. The title is concise, specific, and avoids generic terms; a teammate reviewing the history would immediately understand this PR enables TinyViT deployment on the Siracusa platform without tiling. |
| Description Check | ✅ Passed | The PR description is clearly related to the changeset. It provides a well-structured summary organized into Added, Changed, and Fixed sections that correspond to the actual changes in the repository. The description is specific and detailed, covering new templates and kernels with bias support, parsers, bindings, optimizations, test additions, bug fixes, and CI pipeline updates. The description demonstrates clear understanding of the changes and is not vague or generic. |
✨ Finishing touches
  • 📝 Generate docstrings
  • 🧪 Generate unit tests (beta)
    • Create PR with unit tests
    • Post copyable unit tests in a comment


@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)

344-346: Prevent KeyError when 'group' attr is absent

Depthwise detection assumes 'group' exists. If missing, this pass will crash. Use a safe default (1) as elsewhere in the file.

-    if opNode.attrs['group'] == 1:
-        return graph
+    group = opNode.attrs.get('group', 1)
+    if group == 1:
+        return graph
Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1)

16-23: Loop index type should match buffer indexing

Use unsigned 32-bit for i to match array indexing and avoid UB on large sizes.

(Handled in the diff above.)

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)

131-149: DW Im2Col arg order — swap X/Y for image dims, kernel dims and strides; bias placement is correct

  • Prototype (TargetLibraries/PULPOpen/inc/kernel/Conv.h): PULP_DW_Conv2d_Im2Col(..., uint32_t H, uint32_t W, uint32_t C, ..., uint32_t P, uint32_t Q, uint32_t SP, uint32_t SQ, const float32_t *pSrcBias, const bool has_bias, float32_t *pDstC, uint32_t pad_top, uint32_t pad_bottom, uint32_t pad_left, uint32_t pad_right, float32_t *pContextBuffer).
  • Template call (Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py lines 131–149) currently passes (dim_im_in_x, dim_im_in_y), (dim_kernel_x, dim_kernel_y), (stride_x, stride_y). These should be swapped to (dim_im_in_y, dim_im_in_x), (dim_kernel_y, dim_kernel_x), (stride_y, stride_x).
  • (bias, has_bias), padding order and context buffer position are already correct.
🧹 Nitpick comments (16)
Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)

10-11: Avoid libm at runtime: use integer log2 for NUM_CORES

Replace double log2() with __builtin_ctz() to avoid libm dependency and ensure compile‑time integer computation. Also assert the power‑of‑two assumption that the chunking math relies on.

Apply this diff:

-uint32_t ${nodeName}_log2Core = (uint32_t) log2(NUM_CORES);
+// NUM_CORES must be a power of two; compute log2(NUM_CORES) without libm
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
+_Static_assert((NUM_CORES & (NUM_CORES - 1)) == 0, "NUM_CORES must be power of two");
+#endif
+uint32_t ${nodeName}_log2Core = (uint32_t)__builtin_ctz(NUM_CORES);
.github/workflows/ci-platform-siracusa.yml (1)

98-98: TinyViT demo test: validate runtime envelope on Siracusa

Great to see model‑level coverage. Consider job timeout safeguards if this grows. For now, please confirm the demo’s memory/latency fit the non‑tiled target on the selected runner.

Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1)

11-21: LGTM: per-iteration A/B selection handles mixed batched/unbatched inputs.

This fixes the non-batched handling cleanly and mirrors expected strides (A: MN, B: NO). Consider applying the same per-iteration pointer style in GEMM for consistency, but not blocking.
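
For illustration, a minimal sketch of the per-iteration pointer style (assumed names and a naive inner product; the real templates emit PULP kernel calls):

```c
#include <stddef.h>
#include <stdint.h>

// Illustrative sketch, not the actual template output: operands that are
// not batched keep their base pointer instead of advancing per batch.
static void gemm_batched_sketch(const float *A, const float *B, float *out,
                                uint32_t batch, uint32_t M, uint32_t N,
                                uint32_t O, int A_batched, int B_batched) {
  for (uint32_t b = 0; b < batch; b++) {
    const float *pA = A_batched ? A + (size_t)b * M * N : A;  // stride M*N
    const float *pB = B_batched ? B + (size_t)b * N * O : B;  // stride N*O
    float *pOut = out + (size_t)b * M * O;  // output is always batched here
    for (uint32_t m = 0; m < M; m++)
      for (uint32_t o = 0; o < O; o++) {
        float sum = 0.0f;
        for (uint32_t n = 0; n < N; n++)
          sum += pA[m * N + n] * pB[n * O + o];
        pOut[m * O + o] = sum;
      }
  }
}
```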

Deeploy/DeeployTypes.py (3)

320-324: Prefer None over mutable default and let ctor initialize alias_of.

Passing alias_of = [] creates a new list here (safe), but use None consistently and centralize initialization in the constructor to avoid future in-place mutations leaking via shared references.

-        return (cls(
-            name = node.name,
-            shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
-            alias_of = [],
-        ))
+        return cls(
+            name = node.name,
+            shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
+            alias_of = None,
+        )

469-474: Avoid sharing alias list between buffers; copy it.

Passing buffer.alias_of by reference couples future mutations between the source and new ConstantBuffer.

-        ret = cls(
+        ret = cls(
             name = buffer.name,
             shape = buffer.shape,
             values = values,
-            alias_of = buffer.alias_of,
+            alias_of = list(buffer.alias_of),
         )

1176-1180: Pass alias_of=None and let the ctor initialize.

Keeps alias handling consistent and avoids accidental shared lists if ctor defaults are ever reintroduced.

-                nb = ctxt.VariableBuffer(
-                    name = name,
-                    shape = node.shape,
-                    alias_of = [],
-                )
+                nb = ctxt.VariableBuffer(
+                    name = name,
+                    shape = node.shape,
+                    alias_of = None,
+                )
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (3)

137-141: Remove unnecessary modulo and precompute local weight pointer inside inner loop

im2col_idx % (P * Q) is redundant. Precompute a per-filter base pointer for tighter inner loops.

Apply this diff in both bias and non-bias paths:

-            for (uint32_t im2col_idx = 0; im2col_idx < P * Q; im2col_idx++) {
-              sum +=
-                  im2col_buffer[im2col_idx] *
-                  weight_ptr[(f - ch_out_start) * P * Q + im2col_idx % (P * Q)];
-            }
+            const float32_t *local_w = weight_ptr + (f - ch_out_start) * (P * Q);
+            for (uint32_t k = 0; k < P * Q; k++) {
+              sum += im2col_buffer[k] * local_w[k];
+            }

Also applies to: 236-240


47-48: Remove unused variable kernel_size

kernel_size is computed but never used.

-  uint32_t kernel_size = P * Q * F_total;

61-97: Consider simplifying padding handling with a single p/q nested loop

You can mirror the im2col fill used in the regular Im2Col kernel (check Convolution_fp32.c), avoiding four separate padding loops and the type-casting hazards.
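
A sketch of what that single nested fill could look like (names assumed from the prototype above, with pSrc as the HWC input; this is not the existing kernel code):

```c
#include <stdint.h>

typedef float float32_t;

// Sketch only: fill one P*Q im2col window for output pixel (ho, wo) of
// channel c, zeroing out-of-range taps inline instead of using four
// separate padding loops.
static void im2col_window(const float32_t *pSrc, float32_t *im2col_buffer,
                          uint32_t H, uint32_t W, uint32_t C, uint32_t c,
                          uint32_t P, uint32_t Q, uint32_t SP, uint32_t SQ,
                          uint32_t pad_top, uint32_t pad_left,
                          uint32_t ho, uint32_t wo) {
  for (uint32_t p = 0; p < P; p++) {
    for (uint32_t q = 0; q < Q; q++) {
      const int32_t h = (int32_t)(ho * SP + p) - (int32_t)pad_top;
      const int32_t w = (int32_t)(wo * SQ + q) - (int32_t)pad_left;
      const int inside = h >= 0 && h < (int32_t)H && w >= 0 && w < (int32_t)W;
      im2col_buffer[p * Q + q] =
          inside ? pSrc[((uint32_t)h * W + (uint32_t)w) * C + c] : 0.0f;
    }
  }
}
```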

TargetLibraries/PULPOpen/inc/kernel/Conv.h (1)

20-27: Document pContextBuffer size expectations and bias semantics

The new signatures look correct. Please add brief comments indicating required pContextBuffer sizes:

  • Conv2d_Im2Col: NUM_CORES * (C * P * Q) floats
  • DW_Conv2d_Im2Col: NUM_CORES * (P * Q) floats

Also note that has_bias governs whether pSrcBias is read.

Also applies to: 29-36
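
The comments could take roughly this form (suggested wording only, not the existing header):

```c
// Suggested documentation sketch for Conv.h:
//
// PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC:
//   pContextBuffer must hold NUM_CORES * C * P * Q float32_t elements
//   (one C*P*Q im2col window per core).
// PULP_DW_Conv2d_Im2Col_fp32_fp32_fp32_HWC:
//   pContextBuffer must hold NUM_CORES * P * Q float32_t elements
//   (one P*Q window per core).
// In both: pSrcBias is read only when has_bias is true and may be
// NULL otherwise.
```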

Deeploy/Targets/PULPOpen/Bindings.py (2)

239-245: Constrain FP DWConv binding to supported float type(s).

FloatConvTemplate/referenceDW2DIm2ColTemplate likely only implements float32. Avoid advertising unsupported float types by narrowing the generator.

Apply this diff:

-        ForkTransformer) for float_type in FloatDataTypes
+        ForkTransformer) for float_type in [float32_t]

299-302: ReduceMean (float): restrict axes integer type to int32.

ONNX axes are int64/int32; templates and parsers typically operate on int32 here. Generating bindings for all signed integer widths adds binder churn without benefit.

Apply this diff:

-    for integer_type in SignedIntegerDataTypes
-    for float_type in FloatDataTypes
+    for integer_type in [int32_t]
+    for float_type in FloatDataTypes
Deeploy/Targets/PULPOpen/Platform.py (1)

108-108: Depthwise vs regular FP Conv: prevent mapper shadowing by ordering.

If PULPFPConv2DParser accepts DW cases, FPConv2DMapper may shadow FPDWConv2DMapper due to precedence.

Apply this diff to prioritize DWConv first:

-    'Conv': ConvLayer([FPConv2DMapper, FPDWConv2DMapper]),
+    'Conv': ConvLayer([FPDWConv2DMapper, FPConv2DMapper]),

Alternatively, ensure PULPFPConv2DParser explicitly rejects depthwise.
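
If the parser-side rejection is preferred, a minimal sketch of the guard (hypothetical attribute/shape access; the real parseNode has its own plumbing, and the DW condition mirrors the group == ch_im_in check noted below):

```python
# Hypothetical guard inside the regular FP Conv2D parser's parseNode():
# bail out on depthwise convolutions so FPDWConv2DMapper handles them.
group = node.attrs.get('group', 1)
ch_im_in = node.inputs[0].shape[1]  # assumed NCHW input at parse time
if group != 1 and group == ch_im_in:
    return False  # depthwise: defer to PULPFPDWConv2DParser
```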

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (3)

42-51: Fix return typing and silence ARG004 (unused ctxt)

  • The function can now return a str for the size; widen the return annotation to reflect this.
  • Rename ctxt to _ctxt to address Ruff ARG004.

Apply:

-    def computeTransientBuffersSize(ctxt: NetworkContext,
-                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, str]]:
+    def computeTransientBuffersSize(_ctxt: NetworkContext,
+                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, Union[int, str, IntVar]]]:

93-93: Typo in comment: “ChannelOout” → “ChannelOut”

Minor doc polish.

-// 2D FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})

124-126: Typo in comment: “ChannelOout” → “ChannelOut”

Same typo in DW template header.

-// 2D DW FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D DW FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2fa3c31 and de2752b.

📒 Files selected for processing (17)
  • .github/workflows/ci-platform-siracusa.yml (3 hunks)
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2 hunks)
  • Deeploy/DeeployTypes.py (4 hunks)
  • Deeploy/Targets/Generic/Parsers.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Bindings.py (5 hunks)
  • Deeploy/Targets/PULPOpen/Parsers.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Platform.py (7 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (4 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1 hunks)
  • TargetLibraries/PULPOpen/inc/DeeployPULPMath.h (1 hunks)
  • TargetLibraries/PULPOpen/inc/kernel/Conv.h (1 hunks)
  • TargetLibraries/PULPOpen/src/Convolution_fp32.c (4 hunks)
  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
Deeploy/Targets/Generic/Parsers.py (1)
Deeploy/DeeployTypes.py (2)
  • add_aliases (326-345)
  • lookup (763-795)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
Deeploy/DeeployTypes.py (6)
  • NodeTemplate (82-224)
  • computeTransientBuffersSize (137-157)
  • NetworkContext (554-1063)
  • hoistTransientBuffers (160-180)
  • hoistTransientBuffer (864-883)
  • lookup (763-795)
TargetLibraries/PULPOpen/inc/kernel/Conv.h (2)
TargetLibraries/PULPOpen/src/Convolution_fp32.c (2)
  • PULP_Conv2d_fp32_fp32_fp32_HWC (10-102)
  • PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC (104-219)
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)
  • PULP_DW_Conv2d_Im2Col_fp32_fp32_fp32_HWC (10-251)
Deeploy/Targets/PULPOpen/Bindings.py (3)
Deeploy/CommonExtensions/DataTypes.py (4)
  • float32_t (74-78)
  • int8_t (12-15)
  • int32_t (24-27)
  • uint8_t (36-39)
Deeploy/Targets/Generic/TypeCheckers.py (3)
  • SliceChecker (37-51)
  • ConvChecker (456-474)
  • ReduceMeanChecker (310-324)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (522-545)
Deeploy/Targets/PULPOpen/Platform.py (4)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
  • RemoveEmptyConvBiasPass (667-679)
Deeploy/Targets/PULPOpen/Parsers.py (1)
  • PULPFPDWConv2DParser (122-188)
Deeploy/DeeployTypes.py (1)
  • NodeMapper (1695-1838)
Deeploy/Targets/Generic/Parsers.py (1)
  • SliceParser (97-155)
Deeploy/Targets/PULPOpen/Parsers.py (2)
Deeploy/DeeployTypes.py (5)
  • inputs (2492-2509)
  • lookup (763-795)
  • parseNode (1076-1091)
  • parseNodeCtxt (1094-1119)
  • NetworkContext (554-1063)
Deeploy/Targets/Generic/Parsers.py (31)
  • Conv2DParser (1182-1248)
  • parseNode (19-27)
  • parseNode (49-58)
  • parseNode (81-94)
  • parseNode (102-112)
  • parseNode (163-169)
  • parseNode (193-207)
  • parseNode (229-251)
  • parseNode (291-304)
  • parseNode (326-338)
  • parseNode (375-387)
  • parseNode (422-426)
  • parseNode (449-460)
  • parseNode (484-495)
  • parseNode (531-535)
  • parseNode (551-554)
  • parseNodeCtxt (29-41)
  • parseNodeCtxt (60-76)
  • parseNodeCtxt (114-155)
  • parseNodeCtxt (171-185)
  • parseNodeCtxt (209-221)
  • parseNodeCtxt (253-283)
  • parseNodeCtxt (306-318)
  • parseNodeCtxt (340-367)
  • parseNodeCtxt (389-414)
  • parseNodeCtxt (428-441)
  • parseNodeCtxt (462-476)
  • parseNodeCtxt (497-523)
  • parseNodeCtxt (537-543)
  • parseNodeCtxt (556-573)
  • parseNodeCtxt (586-605)
🪛 Ruff (0.13.1)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py

43-43: Unused static method argument: ctxt

(ARG004)

Deeploy/DeeployTypes.py

433-433: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


433-433: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


433-433: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Test Type Inference (fail-input2) (listed 10 times)
🔇 Additional comments (23)
Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)

13-14: LGTM: unsigned arithmetic for start/end bounds

Casting ${size} to uint32_t aligns types and prevents signed/unsigned min mixups. Indices and loop vars are consistently uint32_t.
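
For context, the per-core chunking these casts protect looks roughly like the following (a sketch with assumed names, not the generated template code; pi_core_id() is the PULP runtime core-id call):

```c
// Sketch: split [0, size) across cores with all-unsigned arithmetic.
// Assumes NUM_CORES is a power of two so the shift-based split is exact.
uint32_t core_id  = pi_core_id();
uint32_t log2Core = (uint32_t)__builtin_ctz(NUM_CORES);
uint32_t chunk    = ((uint32_t)size + NUM_CORES - 1u) >> log2Core;  // ceil
uint32_t start    = core_id * chunk;
uint32_t end      = (start + chunk < (uint32_t)size) ? (start + chunk)
                                                     : (uint32_t)size;
for (uint32_t i = start; i < end; i++) {
  /* elementwise op on index i */
}
```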

.github/workflows/ci-platform-siracusa.yml (2)

75-75: Reshape + skip connection test: ensure alias_of fix is covered

Good addition to exercise the alias/buffer lifetime fix. Please confirm this test indeed hits the alias_of scenario (non‑tiled) and fails without the fix.


61-64: CI workflow lists new DW conv FP tests but those tests aren't registered in the repo

.github/workflows/ci-platform-siracusa.yml (lines 61–64) lists testFloat2DDWConvolution, testFloat2DDWConvolutionBias, testFloat2DDWConvolutionZeroBias but a repo search found no matches — add/register these tests or update the workflow. After they exist, verify the shapes fit non-tiled Siracusa memory and that the zero-bias test exercises the RemoveEmptyConvBias pass.

Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (2)

27-37: Good: batch-gated A/B/C strides prevent OOB on non-batched inputs.

Looks correct and aligns with the matmul change. Please confirm the strides match the in-memory layout for transposed cases (i.e., still MN for A and NO for B even when transA/transB are set).


39-39: Unconditional out stride can desync when data_out aliases C but C is not batched.

If data_out may alias C while C is reused (non-batched), advancing only the out pointer will diverge the pair across iterations and yield incorrect reads/writes. Either:

  • enforce at mapping time that alias-of(C, data_out) implies identical batched flags, or
  • gate the out advance with an explicit data_out_batched flag.

Example if such a flag is available:

-    ref_${data_out}_${data_out} += ${M} * ${O};
+    % if data_out_batched:
+    ref_${data_out}_${data_out} += ${M} * ${O};
+    % endif
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4)

370-374: Verify FP DW Conv layout: input-only transpose may be insufficient

For FP DW Conv, you transpose only the input and not the weights. Confirm kernel expectations: if weights are stored in channels-first layout, they likely also need DW-specific permutation (as above) to avoid semantic mismatch when setting channels_first to default.


385-403: New PULPDWConvPass wiring looks good

Pattern graph + partial replacement is clear and consistent with the pipeline.


404-426: New PULPFPDWConvPass wiring looks good

Extends NHWC conversion to FP DW conv as intended.


505-505: Pass ordering LGTM

Adding PULPFPDWConvPass after PULPDWConvPass in the sequence is reasonable and non-overlapping.

TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)

41-43: Confirm context buffer sizing per core

This implementation expects pContextBuffer sized to NUM_CORES * (P*Q) floats. Please confirm templates/allocations provide that.
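
Under that sizing, each core would address its private slice along these lines (sketch only):

```c
// Illustrative per-core im2col slice into the shared context buffer,
// assuming pContextBuffer holds NUM_CORES * P * Q float32_t elements.
const uint32_t core_id = pi_core_id();
float32_t *im2col_buffer = pContextBuffer + (size_t)core_id * P * Q;
```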

TargetLibraries/PULPOpen/src/Convolution_fp32.c (2)

10-16: LGTM on signature extension and core partitioning

Bias handling and per-core channel slicing are consistent. Weight and output indexing are correct.

Please ensure all call sites updated to pass pSrcBias, has_bias, and pContextBuffer (for the Im2Col variant).

Also applies to: 18-39


139-176: Good reuse of im2col buffer and contiguous weight blocks

The inner product uses a contiguous kernel_size slice, matching the buffer layout. Looks correct.

Also applies to: 179-218

Deeploy/Targets/PULPOpen/Parsers.py (3)

104-116: Bias handling looks correct; maps optional third input and flags has_bias

Parsing and context mapping are consistent with the new kernels.

Confirm downstream templates use operatorRepresentation["has_bias"] and ["bias"] as introduced here.


127-159: DW FP2D parser shape/padding checks look reasonable

Depthwise gating via inputs count and padding symmetry is fine.


185-186: Depthwise condition check aligns with ONNX semantics

group == ch_im_in is the expected DW condition.

Deeploy/Targets/PULPOpen/Bindings.py (2)

143-153: DMA Slice binding switch looks good.

Good move to DMASliceTemplate with MemoryAwareForkTransformer for integer tensors; aligns with the DMA path introduced in Platform.


155-165: Float Slice mapping: validate starts/ends/axes/steps typing.

Using uint8_t pointers for starts/ends/axes/steps is consistent with the DMA variant, but please verify SliceTemplate.referenceTemplate expects raw byte buffers for these (not int32_t). Also confirm this path covers only float inputs while integers fall back to DMA.

Deeploy/Targets/PULPOpen/Platform.py (4)

77-79: FP DWConv mapper wiring: LGTM.

Parser and bindings are correctly introduced and exposed.


92-95: Two Slice mappers ordering: LGTM; confirm fallback behavior.

Order ensures float Slice binds first; integers should fall back to DMA. Please confirm CI covers both float and int slice paths.


8-9: RemoveEmptyConvBiasPass import: LGTM.

Consistent with optimizer usage below.


235-236: Optimizer: adding RemoveEmptyConvBiasPass is sensible.

Placement at the end of the topology passes is fine as long as it runs pre-binding. If the intent is to normalize bias presence for FP convs (so templates always see 3 inputs), consider placing it before conv-related merges; otherwise current placement is acceptable.

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (2)

60-62: Good: transient buffer typed to input element

Ensures the generated C sees the correct pointee type instead of void. This helps alloc/codegen correctness.


81-81: Bias args correct — signature matches PULP_Conv2d_fp32_fp32_fp32_HWC

TargetLibraries/PULPOpen/src/Convolution_fp32.c and TargetLibraries/PULPOpen/inc/kernel/Conv.h declare pSrcBias, has_bias in that exact position before pDstC/padding, so adding ${bias}, ${has_bias} is correct.

@Xeratec Xeratec added this to the Release 0.2.1 milestone Sep 29, 2025
@Xeratec Xeratec added the Bug Something isn't working label Sep 29, 2025
@Xeratec Xeratec moved this to Need Reviewer in Deeploy Sep 29, 2025
@Xeratec Xeratec moved this from Need Reviewer to In review in Deeploy Sep 29, 2025
@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
Deeploy/DeeployTypes.py (1)

241-241: Fix remaining mutable default arguments.

VariableBuffer.__init__ still has mutable defaults (shape = [1], alias_of: Optional[List[str]] = []). These should use immutable defaults (tuples or None) to follow Python best practices and avoid shared mutable state across calls.

Apply this diff:

-    def __init__(self, name: str = '', shape = [1], alias_of: Optional[List[str]] = []):
+    def __init__(self, name: str = '', shape: Sequence[int] = (1,), alias_of: Optional[List[str]] = None):

The defensive copy on line 260 already handles the None case correctly, so no further changes are needed there.

Based on past review feedback and static analysis (Ruff B006).
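
As a self-contained illustration of why this matters (not Deeploy code):

```python
# Standalone demonstration of the shared-default pitfall.
class Buf:
    def __init__(self, alias_of=[]):         # anti-pattern (Ruff B006)
        self.alias_of = alias_of              # all instances share one list

a, b = Buf(), Buf()
a.alias_of.append("x")
print(b.alias_of)                             # ['x'] -- leaked across instances

class SafeBuf:
    def __init__(self, alias_of=None):        # safe: fresh list per instance
        self.alias_of = list(alias_of) if alias_of is not None else []
```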

🧹 Nitpick comments (2)
Deeploy/DeeployTypes.py (2)

454-454: Optional: Avoid redundant defensive copy.

Since VariableBuffer.__init__ already defensively copies alias_of (line 260), passing list(alias_of) here results in double copying. You can simplify by passing the list directly:

-        super().__init__(name, shape, list(alias_of) if alias_of is not None else None)
+        super().__init__(name, shape, alias_of)

This is a minor optimization and doesn't affect correctness.


490-495: Optional: Make alias propagation more explicit.

While correct (the constructor defensively copies alias_of), explicitly copying here would make the intent clearer:

     ret = cls(
         name = buffer.name,
         shape = buffer.shape,
         values = values,
-        alias_of = buffer.alias_of,
+        alias_of = list(buffer.alias_of),  # Explicit copy for clarity
     )

This is a minor clarity improvement and doesn't affect correctness.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce7e4c8 and 4165983.

📒 Files selected for processing (2)
  • CHANGELOG.md (4 hunks)
  • Deeploy/DeeployTypes.py (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
Deeploy/DeeployTypes.py (1)
DeeployTest/testUtils/dmaUtils.py (1)
  • size (72-73)
🪛 Ruff (0.13.3)
Deeploy/DeeployTypes.py

452-452: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


452-452: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🔇 Additional comments (4)
Deeploy/DeeployTypes.py (4)

325-329: LGTM!

Correctly initializes alias_of with an explicit empty list for each new instance, avoiding shared mutable state.


362-362: LGTM!

Correctly returns a defensive copy of the internal alias list, preventing external code from accidentally mutating the buffer's state.


404-406: LGTM!

The updates to support size as Union[int, str] are consistent across TransientBuffer.__init__, _bufferRepresentation, and hoistTransientBuffer. This enables symbolic size expressions, which aligns with the PR's tiling and optimization goals.

Also applies to: 431-431, 888-888


1200-1204: LGTM!

Correctly initializes non-global outputs with an explicit empty alias_of list, maintaining consistency with the alias tracking implementation.

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 5

♻️ Duplicate comments (2)
Deeploy/DeeployTypes.py (2)

241-261: Fix mutable defaults in VariableBuffer.init (shape/alias_of)

Defaulting shape to [1] and alias_of to [] risks shared state; use immutable defaults and copy incoming lists.

Apply:

-    def __init__(self, name: str = '', shape = [1], alias_of: Optional[List[str]] = []):
+    def __init__(self, name: str = '', shape: Sequence[int] = (1,), alias_of: Optional[List[str]] = None):
         self.name: str = name  #: str: Canonical name that this buffer is registered as in the NetworkContext
-        self.shape: Sequence[
-            int] = shape  #: Sequence[int]: Represents the dimensions of the underlying tensor as a sequence of dimension sizes
+        self.shape: Sequence[int] = shape  #: Sequence[int]: Represents the dimensions of the underlying tensor as a sequence of dimension sizes
@@
-        self.alias_of: List[str] = list(alias_of) if alias_of is not None else []
+        self.alias_of: List[str] = list(alias_of) if alias_of is not None else []

As per Ruff B006.


452-456: Fix mutable defaults in ConstantBuffer.init (shape/values)

Use immutable defaults and forward alias_of safely.

Apply:

-    def __init__(self, name: str = '', shape = [1], values = [0], alias_of: Optional[List[str]] = None):
-        # Pass a copy of alias_of to avoid shared references
-        super().__init__(name, shape, list(alias_of) if alias_of is not None else None)
+    def __init__(self,
+                 name: str = '',
+                 shape: Sequence[int] = (1,),
+                 values: Union[Sequence, np.ndarray] = (0,),
+                 alias_of: Optional[List[str]] = None):
+        super().__init__(name, shape, alias_of)

(values processing via np.asarray below remains correct.)

🧹 Nitpick comments (4)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (3)

21-24: Im2Col buffer bytes should derive from input element width (not weight).

The im2col buffer stores input elements; computing bytes via weight_type may work today (fp32), but couples unrelated types and can break if they diverge. Prefer data_in_type width.

Apply:

-        im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8) * ctxt.n_cores * operatorRepresentation[
-            'ch_im_in'] * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
+        im2col_dim = (operatorRepresentation["data_in_type"].referencedType.typeWidth // 8) \
+                     * ctxt.n_cores \
+                     * operatorRepresentation['ch_im_in'] \
+                     * operatorRepresentation['dim_kernel_x'] \
+                     * operatorRepresentation['dim_kernel_y']

43-51: Fix return type annotation and stale comment; ensure consistent sizing semantics.

  • Return type says List[Tuple[str, str]] but the code returns int; align with base signature (Union[int, IntVar]).
  • Comment says “value is only used as string”, but an int is produced. Update comment and/or value.

Also, consider returning element count (NUM_CORES * Kx * Ky) and relying on the buffer’s referencedType to derive bytes, for consistency with the established pattern.

-    def computeTransientBuffersSize(ctxt: NetworkContext,
-                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, str]]:
+    def computeTransientBuffersSize(ctxt: NetworkContext,
+                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, Union[int, IntVar]]]:
@@
-        # Memory allocation for the im2col buffer can be dynamic, based on the number of cores.
-        # WARNING: This works because value is only used as string, in the allocate template.
-        im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8
-                     ) * ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
+        # Im2Col buffer scales with number of cores and kernel area; size below is in ELEMENTS.
+        # Byte size is derived from the buffer's referenced type at allocation/codegen time.
+        im2col_dim = ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']

Note: If you prefer to keep bytes here, update the comment accordingly and see the next comment on double-accounting risk.


124-154: DW Im2Col call wiring is consistent; minor docstring typo.

The DW Im2Col invocation mirrors the FP path with bias and ctxtBuffer. Typo: “ChannelOout” → “ChannelOut”.

-// 2D DW FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D DW FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})
Deeploy/DeeployTypes.py (1)

2708-2714: Preserve context name when recreating NetworkContext during parse

Recreating self.ctxt drops custom names; pass name=self.name to keep mangling consistent.

Apply:

-        self.ctxt = NetworkContext(
+        self.ctxt = NetworkContext(
             variableBuffer = self.Platform.VariableBuffer,
             constantBuffer = self.Platform.ConstantBuffer,
             structBuffer = self.Platform.StructBuffer,
             transientBuffer = self.Platform.TransientBuffer,
-            n_cores = self.ctxt.n_cores,
+            n_cores = self.ctxt.n_cores,
+            name = self.name,
         )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4165983 and af78d75.

📒 Files selected for processing (9)
  • Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (2 hunks)
  • Deeploy/DeeployTypes.py (15 hunks)
  • Deeploy/Targets/PULPOpen/Deployer.py (2 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (5 hunks)
  • Deeploy/TilingExtension/TilerExtension.py (1 hunks)
  • DeeployTest/testMVP.py (4 hunks)
  • DeeployTest/testRunner_tiled_siracusa.py (1 hunks)
  • DeeployTest/testUtils/platformMapping.py (2 hunks)
  • DeeployTest/testUtils/testRunner.py (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-24T12:17:21.624Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:46-0
Timestamp: 2025-09-24T12:17:21.624Z
Learning: In Deeploy's PULP templates, transient buffer size calculation can return element counts as strings from computeTransientBuffersSize(), and then manually set the buffer type in hoistTransientBuffers() using ctxt.lookup(buffer_name)._type.referencedType = input_type. The allocation system automatically multiplies the element count by the element size when the buffer type is properly set, achieving correct byte allocation.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
🧬 Code graph analysis (4)
DeeployTest/testMVP.py (3)
Deeploy/DeeployTypes.py (1)
  • NetworkDeployer (3256-3633)
DeeployTest/testUtils/platformMapping.py (1)
  • mapDeployer (92-261)
DeeployTest/testSchedulingExtension.py (1)
  • _mockScheduler (29-33)
DeeployTest/testRunner_tiled_siracusa.py (1)
DeeployTest/testUtils/testRunner.py (1)
  • TestRunner (279-450)
DeeployTest/testUtils/platformMapping.py (2)
Deeploy/DeeployTypes.py (3)
  • DeploymentPlatform (2451-2494)
  • TopologyOptimizer (2249-2278)
  • NetworkDeployer (3256-3633)
Deeploy/Targets/PULPOpen/Deployer.py (1)
  • PULPDeployer (29-128)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
Deeploy/DeeployTypes.py (6)
  • NodeTemplate (87-229)
  • computeTransientBuffersSize (142-162)
  • NetworkContext (575-1090)
  • hoistTransientBuffers (165-185)
  • hoistTransientBuffer (891-910)
  • lookup (790-822)
🪛 GitHub Actions: CI • Snitch (Tiled)
DeeployTest/testUtils/testRunner.py

[error] 334-334: AttributeError: 'TestRunner' object has no attribute '_cores'. Did you mean: 'n_cores'?


[error] 334-334: Command failed with exit code 1. Failure occurred while executing testRunner_tiled_snitch.py with parameters: -t Tests/$testName --cores=9 --simulator=$simulator --l1 $L1_value --defaultMemLevel=L2 --memAllocStrategy=MiniMalloc --searchStrategy=random-max

🪛 Ruff (0.13.3)
DeeployTest/testMVP.py

58-58: Unused function argument: verbose

(ARG001)

Deeploy/Targets/PULPOpen/Deployer.py

40-40: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Deeploy/DeeployTypes.py

452-452: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


452-452: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py

25-25: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (150: the following 15 checks, each listed 10 times)
  • GitHub Check: siracusa-models / test-runner-siracusa
  • GitHub Check: snitch-kernels / test-runner-snitch
  • GitHub Check: cortexm-kernels / test-runner-cortexm
  • GitHub Check: linting
  • GitHub Check: chimera-kernels / test-runner-chimera
  • GitHub Check: generic-kernels / test-runner-generic
  • GitHub Check: Test Type Inference (fail-input2)
  • GitHub Check: Test Type Inference (pass)
  • GitHub Check: Test Type Inference (fail-input0)
  • GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
  • GitHub Check: siracusa-neureka-models-tiled-doublebuffer-L3-wmem (miniMobileNet, 2000, 8, true, L3, true) / test-runner-siracusa-neureka-tiled (2000)
  • GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2-wmem / test-runner-siracusa-neureka-tiled
  • GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
  • GitHub Check: mempool-models / test-runner-mempool
  • GitHub Check: mempool-kernels / test-runner-mempool
🔇 Additional comments (6)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (2)

81-84: Bias/has_bias plumbing for direct conv looks good.

Bias flags are correctly passed to the FP Conv2d HWC call.


100-117: Im2Col call argument order (X,Y) acknowledged as correct for this kernel.

The X,Y ordering for dims, kernel, and stride matches the intended prototype in this codebase.

If you changed any kernel prototypes elsewhere in this PR, ensure consistency across integer/float and DW/non-DW variants.

Deeploy/TilingExtension/TilerExtension.py (1)

987-989: LGTM: propagate core count to context

Assigning ctxt.n_cores ensures downstream kernels see the configured core count.

DeeployTest/testRunner_tiled_siracusa.py (1)

20-26: LGTM: cores threaded into TestRunner and CMake

Passing args.cores to TestRunner and exporting NUM_CORES is correct.

Also applies to: 28-28

DeeployTest/testUtils/platformMapping.py (1)

92-104: LGTM: add n_cores and forward to PULPDeployer

Signature and forwarding align with deployer changes.

Also applies to: 214-224

DeeployTest/testMVP.py (1)

57-60: LGTM: introduce and propagate n_cores

CLI option, setupDeployer signature, and forwarding to mapDeployer are consistent.

Also applies to: 84-93, 201-206, 261-263

@lukamac lukamac (Collaborator) left a comment

This is a first round of review. There is already a few comments that need your answers. I haven't managed to look at the actual templates/kernels. I'll do that tomorrow.
Good work :)

     deeployStateDir: str = "DeeployState",
-    inputOffsets: Dict[str, int] = {}):
+    inputOffsets: Dict[str, int] = {},
+    n_cores: int = 8):
@lukamac (Collaborator) commented:

Why do you add the n_cores argument? We have the N_CORES define passed through the cmake script.

@diaconuccalin (Contributor, Author) replied:

The number of cores is needed to dynamically compute the size of the im2col buffer for the regular and DW Conv2Ds. This was the method I found to pass it on to the network context (PULPDeployer inherits from SignPropDeployer, i.e., this class, which in turn inherits from NetworkDeployer). Let me know if you think we should proceed differently with this.
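
Condensed, the path looks like this (abbreviated signatures, not the actual constructors):

```python
# Abbreviated sketch of the n_cores propagation path; the real
# constructors take many more arguments.
class NetworkContext:
    def __init__(self, n_cores: int = 8):
        self.n_cores = n_cores        # read by computeTransientBuffersSize

class NetworkDeployer:
    def __init__(self, n_cores: int = 8):
        self.ctxt = NetworkContext(n_cores = n_cores)

# PULPDeployer -> SignPropDeployer -> NetworkDeployer, so a value passed
# at the top reaches the context that the conv templates query.
deployer = NetworkDeployer(n_cores = 8)
assert deployer.ctxt.n_cores == 8
```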

@lukamac lukamac (Collaborator) commented Oct 28, 2025:

This is not a universal feature; some targets, such as CortexM or Generic, don't have multiple cores. You should add this just to the specific target's deployer, i.e., PULPDeployer.

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
Deeploy/Targets/PULPOpen/Deployer.py (1)

31-53: Avoid the shared mutable default for inputOffsets

Keeping inputOffsets = {} means every deployer instance shares the same dict, so one run can mutate another’s offsets. Since this signature just changed, please switch to a safe None default and materialize a new dict per instance before calling super().

-from typing import Callable, Dict, List, Type
+from typing import Callable, Dict, List, Optional, Type
@@
-                 inputOffsets = {},
+                 inputOffsets: Optional[Dict[str, int]] = None,
                  n_cores: int = 8):
+        if inputOffsets is None:
+            inputOffsets = {}
+
         super().__init__(
             graph = graph,
             deploymentPlatform = deploymentPlatform,
             inputTypes = inputTypes,
             loweringOptimizer = loweringOptimizer,
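To see the pitfall in isolation, a toy demonstration (hypothetical register helpers, not Deeploy code):

def register(offsets = {}):  # one dict object, created once at definition time
    offsets["input_0"] = len(offsets)
    return offsets

a = register()
b = register()
print(a is b)  # True: both calls mutated the same dict
print(b)       # {'input_0': 1}, state leaked from the first call


def register_safe(offsets = None):
    if offsets is None:
        offsets = {}  # fresh dict per call
    offsets["input_0"] = len(offsets)
    return offsets

print(register_safe())  # {'input_0': 0}
print(register_safe())  # {'input_0': 0}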
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4db4978 and 48b2ff5.

📒 Files selected for processing (5)
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
  • Deeploy/DeeployTypes.py (15 hunks)
  • Deeploy/Targets/Generic/Parsers.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Deployer.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Parsers.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py
🧰 Additional context used
🧬 Code graph analysis (2)
Deeploy/Targets/Generic/Parsers.py (1)
Deeploy/DeeployTypes.py (3)
  • add_aliases (331-350)
  • get_aliases_of (352-364)
  • lookup (788-820)
Deeploy/Targets/PULPOpen/Parsers.py (2)
Deeploy/DeeployTypes.py (5)
  • inputs (2579-2596)
  • lookup (788-820)
  • parseNode (1101-1116)
  • parseNodeCtxt (1119-1144)
  • NetworkContext (574-1088)
Deeploy/Targets/Generic/Parsers.py (31)
  • Conv2DParser (1204-1270)
  • parseNode (19-27)
  • parseNode (49-58)
  • parseNode (81-94)
  • parseNode (102-112)
  • parseNode (163-169)
  • parseNode (193-207)
  • parseNode (229-251)
  • parseNode (291-304)
  • parseNode (326-338)
  • parseNode (375-387)
  • parseNode (422-426)
  • parseNode (449-460)
  • parseNode (484-495)
  • parseNode (531-535)
  • parseNode (551-554)
  • parseNodeCtxt (29-41)
  • parseNodeCtxt (60-76)
  • parseNodeCtxt (114-155)
  • parseNodeCtxt (171-185)
  • parseNodeCtxt (209-221)
  • parseNodeCtxt (253-283)
  • parseNodeCtxt (306-318)
  • parseNodeCtxt (340-367)
  • parseNodeCtxt (389-414)
  • parseNodeCtxt (428-441)
  • parseNodeCtxt (462-476)
  • parseNodeCtxt (497-523)
  • parseNodeCtxt (537-543)
  • parseNodeCtxt (556-573)
  • parseNodeCtxt (586-605)
🪛 Ruff (0.14.0)
Deeploy/Targets/PULPOpen/Deployer.py

40-40: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

@coderabbitai (bot) left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 48b2ff5 and 7cba0cf.

📒 Files selected for processing (2)
  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4 hunks)
  • Deeploy/Targets/Generic/Parsers.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • Deeploy/Targets/Generic/Parsers.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-24T11:56:35.781Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py:364-0
Timestamp: 2025-09-24T11:56:35.781Z
Learning: When reviewing pull requests, focus comments only on code that was actually changed (lines with + or - in diffs), not on unchanged context code that appears in the diff.

Applied to files:

  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
🧬 Code graph analysis (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (2)
  • contextagnostic (285-298)
  • ReplaceSequentialPatternPass (265-282)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (120)
  • GitHub Check: linting
  • GitHub Check: mempool-models / test-runner-mempool
  • GitHub Check: siracusa-kernels-tiled-doublebuffer-L2 / test-runner-siracusa-tiled
  • GitHub Check: softhier-kernels / test-runner-softhier
  • GitHub Check: snitch-kernels-tiled-singlebuffer-L2 / test-runner-snitch-tiled
  • GitHub Check: cortexm-models / test-runner-cortexm
  • GitHub Check: cortexm-kernels / test-runner-cortexm
  • GitHub Check: snitch-kernels / test-runner-snitch
  • GitHub Check: siracusa-neureka-models-tiled-doublebuffer-L3-wmem (Attention, 3500, 8, true, L3, true) / test-runner-siracusa-neureka-tiled (3500)
  • GitHub Check: siracusa-neureka-models-tiled-doublebuffer-L3-wmem (microLlama/microLlama1, 10000, 8, true, L3, t... / test-runner-siracusa-neureka-tiled (10000)
  • GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
  • GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (Transformer, 15000, 8, L3) / test-runner-siracusa-neureka-tiled (15000)
🔇 Additional comments (4)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4)

250-250: LGTM! Improved group attribute handling.

Using .get('group', 1) properly handles the ONNX Conv default and is more robust than explicit attribute checking.


370-373: LGTM! Correctly handles FP DW Conv input transpose.

The else block appropriately transposes the input for non-RequantizedConv DW convolutions (e.g., FP Conv), while the RequantizedConv path transposes weights. This distinction aligns with the requirements for adding FP DW Conv support.


380-426: LGTM! Well-structured pass classes with clear separation of concerns.

The refactoring of PULPDWConvPass improves readability with explicit comments and sections. The new PULPFPDWConvPass class properly mirrors the structure for FP DW Conv handling, using the 'Conv' op pattern and a distinct pass name while sharing the same replacement function via partial().


505-505: LGTM! Correctly integrates the FP DW Conv pass into the sequence.

The placement after PULPDWConvPass and before dense conv passes is logical, ensuring DW convolutions (both quantized and FP) are handled before dense convolutions.

@coderabbitai (bot) left a comment

Actionable comments posted: 20

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py (2)

105-107: Compute NSize based on transA.

Current code assumes no transpose; wrong K/N leads to bad tiles.

-        NSize = buffA.shape[-1]
+        NSize = buffA.shape[-1] if transA == 0 else buffA.shape[-2]

215-223: Critical undefined identifiers remain in IntGEMMTileConstraint.serializeTilingSolution.

The verification reveals multiple undefined variables that will cause runtime errors:

  • Line 132: BSize used but never defined (should be BatchSize)
  • Lines 151, 155, 167, 171, 370, 374, 386, 390, 405, 407: BatchOffset used but never defined
  • Lines 156, 172, 375, 391, 407: BOffset used but never defined

In contrast, FloatGEMMTileConstraint.serializeTilingSolution correctly uses BatchSize at line 341, suggesting the IntGEMMTileConstraint version requires similar fixes. The undefined BatchOffset and BOffset offsets need to be extracted from cube.offset similar to how MOffset and OOffset are extracted at line 119.

TargetLibraries/PULPOpen/src/Convolution_fp32.c (1)

22-28: Avoid 16-bit overflow in channel partitioning.

Use 32-bit for ch_out_* to support large F_total; identical change recommended in both functions.

Apply:

-  uint16_t ch_out_chunk =
+  uint32_t ch_out_chunk =
       (F_total >> log2Core) + ((F_total & (NUM_CORES - 1)) != 0);
-  uint16_t ch_out_start = MIN(ch_out_chunk * core_id, F_total);
-  uint16_t ch_out_stop = MIN(ch_out_start + ch_out_chunk, F_total);
-  uint16_t ch_out_count = ch_out_stop - ch_out_start;
+  uint32_t ch_out_start = MIN(ch_out_chunk * core_id, F_total);
+  uint32_t ch_out_stop = MIN(ch_out_start + ch_out_chunk, F_total);
+  uint32_t ch_out_count = ch_out_stop - ch_out_start;

Repeat the same replacement in PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC.
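A standalone sketch of the wrap-around this guards against, assuming a hypothetical channel count above UINT16_MAX:

#include <stdint.h>
#include <stdio.h>

int main(void) {
  uint32_t F_total = 70000; // larger than UINT16_MAX = 65535
  // Any uint16_t copy of F_total (or of ch_out_chunk * core_id near the
  // top of the range) silently wraps modulo 65536:
  uint16_t truncated = (uint16_t)F_total;
  printf("F_total = %u, as uint16_t = %u\n", F_total, truncated); // 70000, 4464
  return 0;
}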

Deeploy/MemoryLevelExtension/NetworkDeployers/MemoryLevelDeployer.py (1)

119-125: Mutable default list for passes.

Use None and initialize to avoid shared state across instances.

Apply:

-                 memoryLevelAnnotationPasses: List[NetworkOptimizationPass] = []):
+                 memoryLevelAnnotationPasses: List[NetworkOptimizationPass] | None = None):
@@
-        if len(memoryLevelAnnotationPasses) == 0:
+        if not memoryLevelAnnotationPasses:
             memoryLevelAnnotationPasses.append(AnnotateDefaultMemoryLevel(self.Platform.memoryHierarchy))
         self.memoryLevelAnnotationOptimizer = NetworkOptimizer(memoryLevelAnnotationPasses)

Repeat the same pattern in MemoryLevelAwareSignPropDeployer.__init__ and MemoryDeployerWrapper.__init__.

♻️ Duplicate comments (5)
Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (1)

25-26: n_cores addition looks reasonable; mutable default still present.

The n_cores parameter addition aligns with the PR's goal of supporting dynamic im2col buffer sizing. However, two observations:

  1. The mutable default inputOffsets: Dict[str, int] = {} remains unfixed from the previous review.
  2. The n_cores parameter lacks documentation and validation (e.g., ensuring it's positive).

Consider adding a docstring or inline comment explaining the n_cores parameter's purpose and optionally validate it:

                 inputOffsets: Dict[str, int] = {},
-                 n_cores: int = 8):
+                 n_cores: int = 8):  # Number of cores for im2col buffer sizing
+        if n_cores <= 0:
+            raise ValueError(f"n_cores must be positive, got {n_cores}")

For the mutable default issue, refer to the previous review comment suggesting inputOffsets: Optional[Dict[str, int]] = None.

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)

43-51: Fix DW Im2Col buffer sizing: remove typeWidth multiplier and correct return type annotation

In PULP2DFloatDWConvIm2ColTemplate.computeTransientBuffersSize() (lines 43–51), the size computation multiplies by typeWidth // 8 to produce byte count, but hoistTransientBuffers() (lines 59–61) then assigns the buffer a concrete type. The allocator subsequently multiplies by sizeof(type), causing double-accounting on tight L2 budgets. Additionally, the return type annotation incorrectly declares str instead of Union[int, IntVar].

Align with the existing PULP2DFloatConvIm2ColTemplate pattern (lines 18–24): compute element count (no typeWidth division) and let type-setting handle the byte multiplication.

-    def computeTransientBuffersSize(ctxt: NetworkContext,
-                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, str]]:
+    def computeTransientBuffersSize(ctxt: NetworkContext,
+                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, Union[int, IntVar]]]:

-        im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8
-                     ) * ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
+        im2col_dim = ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
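A quick arithmetic check of the double-accounting, assuming fp32 with 8 cores and a 3x3 kernel:

n_cores, dim_kernel_x, dim_kernel_y = 8, 3, 3
bytes_per_elem = 32 // 8  # typeWidth // 8 for fp32

elements = n_cores * dim_kernel_x * dim_kernel_y
print(elements)                   # 72 elements returned as the size
print(elements * bytes_per_elem)  # 288 bytes once the allocator multiplies
                                  # by sizeof(float)

# With typeWidth // 8 baked into the size, the allocator sees 288 "elements"
# and the byte multiplication happens twice:
print(elements * bytes_per_elem * bytes_per_elem)  # 1152 bytes, 4x too large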
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)

61-97: Critical: signed/unsigned mix in im2col padding loops leads to out‑of‑bounds indexing

h_in/w_in are uint32_t while h_in_start/w_in_start are int32_t; expressions like (h_in - h_in_start) and (w_in - w_in_start) perform unsigned arithmetic when mixed, producing incorrect large indices. Use int32_t consistently and cast bounds.

Apply (pattern applies similarly to LEFT/RIGHT blocks):

-        for (uint32_t h_in = MAX(H, h_in_start); h_in < h_in_start + P; h_in++) {
+        for (int32_t h_in = MAX((int32_t)H, h_in_start); h_in < (int32_t)(h_in_start + (int32_t)P); h_in++) {
           for (int32_t w_in = (int32_t)w_in_start;
                w_in < (int32_t)(w_in_start + Q); w_in++) {
             im2col_buffer[(h_in - h_in_start) * Q + (w_in - w_in_start)] = 0.0f;
           }
         }
-        for (uint32_t h_in = MAX(0, h_in_start); h_in < MIN(H, h_in_start + P); h_in++) {
+        for (int32_t h_in = MAX(0, h_in_start); h_in < MIN((int32_t)H, h_in_start + (int32_t)P); h_in++) {
           for (int32_t w_in = (int32_t)w_in_start;
                w_in < MIN(0, (int32_t)(w_in_start + Q)); w_in++) {
             im2col_buffer[(h_in - h_in_start) * Q + (w_in - w_in_start)] = 0.0f;
           }
         }
-        for (uint32_t h_in = MAX(0, h_in_start); h_in < MIN(H, h_in_start + P); h_in++) {
+        for (int32_t h_in = MAX(0, h_in_start); h_in < MIN((int32_t)H, h_in_start + (int32_t)P); h_in++) {
-          for (uint32_t w_in = MAX(W, w_in_start); w_in < w_in_start + Q; w_in++) {
+          for (int32_t w_in = MAX((int32_t)W, w_in_start); w_in < (int32_t)(w_in_start + (int32_t)Q); w_in++) {
             im2col_buffer[(h_in - h_in_start) * Q + (w_in - w_in_start)] = 0.0f;
           }
         }

Repeat the same signed fixes in the no‑bias branch (lines 170‑195).
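The conversion rule at fault, shown in a minimal toy program rather than the kernel itself:

#include <stdint.h>
#include <stdio.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

int main(void) {
  uint32_t H = 32;
  int32_t h_in_start = -1; // negative start produced by padding
  // The usual arithmetic conversions turn the signed operand into uint32_t,
  // so -1 becomes 4294967295 and wins the comparison:
  uint32_t mixed = MAX(H, h_in_start);
  printf("mixed:  %u\n", mixed);  // 4294967295, not 32
  // Keeping both operands signed yields the intended bound:
  int32_t fixed = MAX((int32_t)H, h_in_start);
  printf("signed: %d\n", fixed);  // 32
  return 0;
}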

Deeploy/Targets/PULPOpen/Deployer.py (1)

40-53: Avoid mutable default for inputOffsets.

Switch to Optional and init when calling super.

Apply:

-from typing import Callable, Dict, List, Type
+from typing import Callable, Dict, List, Optional, Type
@@
-                 deeployStateDir: str = "DeeployStateDir",
-                 inputOffsets = {},
+                 deeployStateDir: str = "DeeployStateDir",
+                 inputOffsets: Optional[Dict[str, int]] = None,
                  n_cores: int = 8):
         super().__init__(
@@
-            inputOffsets = inputOffsets,
+            inputOffsets = inputOffsets if inputOffsets is not None else {},
             n_cores = n_cores,
         )

If SignPropDeployer already accepts Optional[Dict[str, int]], you can pass it through directly and drop the or {}.

DeeployTest/testUtils/testRunner.py (1)

349-356: Remove duplicate command construction; keep single source of truth.

The first assignment is redundant; append -n_cores via self.n_cores once.

Apply:

-        command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform} {self.gen_args}"
-
-        command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform}"
+        command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform}"
         if self._tiling is True:
             command += f" -n_cores {self.n_cores}"
         command += f" {self.gen_args}"
🧹 Nitpick comments (15)
CHANGELOG.md (1)

26-32: Vary sentence structure in Added section to improve readability.

The static analysis tool flagged repetitive sentence beginnings ("FP PULP DW conv", "PULP FP DW conv", "FP bindings"). While the entries are accurate and comprehensive, consider rephrasing for better variety. For example:

- PULP 2D FP DW conv Im2Col template and kernel, with bias support.
- Bias support for PULP 2D FP regular conv Im2Col in template & kernel.
- PULP FP DW conv 2D parser.
- FP conv 2D (simple & DW), reshape & skip connection, and TinyViT demo tests to the non-tiled Siracusa CI pipeline.
- FP bindings and mappings for PULP slice, DW conv 2D, and reduce mean operations.
- FP PULP DW conv lowering optimization pass similar to the existent one for integer version.
+ 2D FP depthwise convolution Im2Col template, kernel, and parser for PULP, with bias support.
+ Bias support for PULP 2D FP regular convolution Im2Col template and kernel.
+ FP convolution 2D (regular and depthwise), reshape, skip connection, and TinyViT demo tests in the non-tiled Siracusa CI pipeline.
+ FP bindings and mappings: PULP slice, depthwise convolution 2D, and reduce mean operations.
+ Depthwise convolution lowering optimization pass (FP variant) for PULP, analogous to the integer version.

This maintains factual accuracy while improving stylistic variety.

Deeploy/Targets/Neureka/TileConstraints/NeurekaDenseConstraint.py (1)

519-526: Tighten schedule pairing with zip(strict=True)

Ensure 1:1 correspondence between base input loads and requant cubes.

-        } for schedule, requant in zip(tilingSchedule.inputLoadSchedule, inputRequantCubes)]
+        } for schedule, requant in zip(tilingSchedule.inputLoadSchedule, inputRequantCubes, strict=True)]
Deeploy/Targets/Neureka/TileConstraints/NeurekaDepthwiseConstraint.py (1)

515-517: Add zip(strict=True) to guard schedule alignment

Match counts between input and requant loads for WMEM DW RQS.

-        } for schedule, requant in zip(tilingSchedule.inputLoadSchedule, inputRequantCubes)]
+        } for schedule, requant in zip(tilingSchedule.inputLoadSchedule, inputRequantCubes, strict=True)]
Deeploy/Targets/Neureka/TileConstraints/NeurekaPointwiseConstraint.py (1)

295-299: Strengthen pairing with zip(strict=True) in PW RQS

Guarantee equal lengths when combining base and requant schedules.

-        } for load, rqLoad in zip(tilingSchedule.inputLoadSchedule, requantSchedule)]
+        } for load, rqLoad in zip(tilingSchedule.inputLoadSchedule, requantSchedule, strict=True)]
Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)

10-14: Avoid FP log2; use bit ops and assert power-of-two NUM_CORES

log2() pulls in libm and risks FP rounding; use CTZ and add a compile-time check. Also removes any need for <math.h>/-lm.

-uint32_t ${nodeName}_core_id = pi_core_id();
-uint32_t ${nodeName}_log2Core = (uint32_t) log2(NUM_CORES);
+uint32_t ${nodeName}_core_id = pi_core_id();
+_Static_assert((NUM_CORES & (NUM_CORES - 1)) == 0, "NUM_CORES must be a power of two");
+const uint32_t ${nodeName}_log2Core = __builtin_ctz(NUM_CORES);
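A small sanity check of the GCC/Clang builtin used above:

#include <stdio.h>

int main(void) {
  // __builtin_ctz counts trailing zero bits, so for a power of two it
  // equals the integer log2 without pulling in libm:
  for (unsigned n = 1; n <= 16; n <<= 1)
    printf("ctz(%2u) = %d\n", n, __builtin_ctz(n));
  // ctz( 1) = 0, ctz( 2) = 1, ctz( 4) = 2, ctz( 8) = 3, ctz(16) = 4
  return 0;
}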
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (3)

93-93: Fix typo in comment: “ChannelOout” → “ChannelOut”

-// 2D FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})

125-154: DW Im2Col call: argument order LGTM; minor comment typo

-// 2D DW FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D DW FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})

Based on learnings.


46-51: Clarify misleading comment about size-as-string

The value is currently computed as int, not string; please update the comment to avoid confusion.

Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1)

181-185: Use wider unsigned types for tiling sizes for consistency and headroom.

Dims rarely fit in int8; GEMM uses uint16_t/uint8_t. Align MatMul for consistency and to avoid overflow.

Apply:

-from Deeploy.CommonExtensions.DataTypes import int8_t
+from Deeploy.CommonExtensions.DataTypes import uint8_t, uint16_t
...
-        replacementTypes = {
-            "M": PointerClass(int8_t),
-            "N": PointerClass(int8_t),
-            "O": PointerClass(int8_t),
-            "batch": PointerClass(int8_t)
-        }
+        replacementTypes = {
+            "M": PointerClass(uint16_t),
+            "N": PointerClass(uint16_t),
+            "O": PointerClass(uint16_t),
+            "batch": PointerClass(uint8_t),
+        }

If other MatMul code relies on int8_t here, confirm and we can gate this by target or keep uint16_t only for sizes ≥ 256.

Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py (1)

83-87: Avoid assigning unused variable.

modulus is unused. Either drop assignment or use _ =.

-            modulus = tilerModel.addTileSizeDivisibleConstraint(parseDict, 'O', BSecondDimVar, 16, prefix = "16_")
+            tilerModel.addTileSizeDivisibleConstraint(parseDict, 'O', BSecondDimVar, 16, prefix = "16_")
Deeploy/Targets/Generic/Parsers.py (1)

1071-1101: Reshape aliasing looks good; mark constant shape as non-deploy and use cached alias list

  • Alias flow is correct and symmetric; iterate the cached list to avoid relying on a second lookup.
  • If the shape input is a constant, mark it non‑live/non‑deploy to avoid useless allocations (mirrors Unsqueeze).

Apply near this block:

-        new_output_node_aliases = input_node.get_aliases_of()
+        new_output_node_aliases = list(input_node.get_aliases_of())
         new_output_node_aliases.append(input_node.name)
@@
-        for alias in output_node.get_aliases_of():
+        for alias in new_output_node_aliases:
             alias_node = ctxt.lookup(alias)
             alias_node.add_aliases(aliases_to_add = [
                 output_node.name,
             ])
+
+        # If shape is a constant tensor, exclude it from deployment
+        shape_buf = ctxt.lookup(node.inputs[inputs.index("shape")].name)
+        if hasattr(shape_buf, "values"):
+            shape_buf._live = False
+            shape_buf._deploy = False
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (2)

137-145: Minor: remove redundant modulo and consider precomputing weight base

im2col_idx runs over [0, P * Q), so the % (P * Q) is redundant. Also, precompute (f - ch_out_start) * P * Q outside the inner loop.

-            for (uint32_t im2col_idx = 0; im2col_idx < P * Q; im2col_idx++) {
-              sum += im2col_buffer[im2col_idx] *
-                  weight_ptr[(f - ch_out_start) * P * Q + im2col_idx % (P * Q)];
-            }
+            uint32_t w_base = (f - ch_out_start) * P * Q;
+            for (uint32_t im2col_idx = 0; im2col_idx < P * Q; im2col_idx++) {
+              sum += im2col_buffer[im2col_idx] * weight_ptr[w_base + im2col_idx];
+            }

47-51: Nit: kernel_size is unused

kernel_size is computed but never used; remove to avoid confusion.

Deeploy/DeeployTypes.py (1)

3262-3317: Redundant n_cores field in child; base already stores it.

NetworkDeployer saves n_cores; removing the extra self.n_cores avoids duplication.

Apply:

-        self.prepared = False
-        self.n_cores = n_cores
+        self.prepared = False
DeeployTest/testMVP.py (1)

57-59: Consider removing the unused verbose parameter.

The verbose parameter is declared but never used in the function body. While the n_cores parameter addition looks good and aligns with the PR objectives, consider removing verbose to clean up the signature.

Apply this diff:

 def setupDeployer(graph: gs.Graph, memoryHierarchy: MemoryHierarchy, defaultTargetMemoryLevel: MemoryLevel,
-                  defaultIoMemoryLevel: MemoryLevel, verbose: CodeGenVerbosity, args: argparse.Namespace,
-                  n_cores: int) -> Tuple[NetworkDeployer, bool]:
+                  defaultIoMemoryLevel: MemoryLevel, args: argparse.Namespace,
+                  n_cores: int) -> Tuple[NetworkDeployer, bool]:

Note: make sure to update the call site on line 256 as well if you apply this change.

Based on learnings

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da162b0 and 85edf76.

📒 Files selected for processing (30)
  • .github/workflows/ci-platform-siracusa.yml (3 hunks)
  • CHANGELOG.md (4 hunks)
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
  • Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (2 hunks)
  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (5 hunks)
  • Deeploy/DeeployTypes.py (14 hunks)
  • Deeploy/MemoryLevelExtension/NetworkDeployers/MemoryLevelDeployer.py (4 hunks)
  • Deeploy/Targets/Generic/Parsers.py (1 hunks)
  • Deeploy/Targets/Neureka/TileConstraints/NeurekaDenseConstraint.py (1 hunks)
  • Deeploy/Targets/Neureka/TileConstraints/NeurekaDepthwiseConstraint.py (1 hunks)
  • Deeploy/Targets/Neureka/TileConstraints/NeurekaPointwiseConstraint.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Bindings.py (6 hunks)
  • Deeploy/Targets/PULPOpen/Deployer.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Parsers.py (2 hunks)
  • Deeploy/Targets/PULPOpen/Platform.py (7 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (5 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py (3 hunks)
  • Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1 hunks)
  • Deeploy/TilingExtension/TilerExtension.py (1 hunks)
  • DeeployTest/testMVP.py (4 hunks)
  • DeeployTest/testRunner_tiled_siracusa.py (1 hunks)
  • DeeployTest/testUtils/platformMapping.py (2 hunks)
  • DeeployTest/testUtils/testRunner.py (3 hunks)
  • TargetLibraries/PULPOpen/inc/kernel/Conv.h (1 hunks)
  • TargetLibraries/PULPOpen/src/Convolution_fp32.c (4 hunks)
  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py
  • Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py
  • Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py
  • DeeployTest/testUtils/platformMapping.py
  • DeeployTest/testRunner_tiled_siracusa.py
  • TargetLibraries/PULPOpen/inc/kernel/Conv.h
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-09-24T12:53:00.168Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: TargetLibraries/PULPOpen/src/DWConvolution_fp32.c:0-0
Timestamp: 2025-09-24T12:53:00.168Z
Learning: In PULP platform code, log2 is defined in pulp_nn_utils and can be used safely without requiring math.h or causing floating-point build/runtime issues, unlike the standard library log2 function.

Applied to files:

  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
  • TargetLibraries/PULPOpen/src/Convolution_fp32.c
  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
📚 Learning: 2025-09-24T11:56:35.781Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py:364-0
Timestamp: 2025-09-24T11:56:35.781Z
Learning: When reviewing pull requests, focus comments only on code that was actually changed (lines with + or - in diffs), not on unchanged context code that appears in the diff.

Applied to files:

  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
📚 Learning: 2025-09-24T11:43:47.236Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: .github/workflows/ci-platform-siracusa.yml:57-60
Timestamp: 2025-09-24T11:43:47.236Z
Learning: In the Deeploy test system, test names in CI workflows correspond to directory names under DeeployTest/Tests/, not Python function names. The TestRunner class executes tests by passing directory paths via the `-t` argument, where each directory contains test configurations and definitions.

Applied to files:

  • .github/workflows/ci-platform-siracusa.yml
📚 Learning: 2025-09-24T12:17:21.624Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:46-0
Timestamp: 2025-09-24T12:17:21.624Z
Learning: In Deeploy's PULP templates, transient buffer size calculation can return element counts as strings from computeTransientBuffersSize(), and then manually set the buffer type in hoistTransientBuffers() using ctxt.lookup(buffer_name)._type.referencedType = input_type. The allocation system automatically multiplies the element count by the element size when the buffer type is properly set, achieving correct byte allocation.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
🧬 Code graph analysis (14)
Deeploy/Targets/PULPOpen/Platform.py (4)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
  • RemoveEmptyConvBiasPass (587-592)
Deeploy/Targets/PULPOpen/Parsers.py (2)
  • PULPFPConv2DParser (65-120)
  • PULPFPDWConv2DParser (123-189)
Deeploy/DeeployTypes.py (1)
  • NodeMapper (1736-1892)
Deeploy/Targets/Generic/Parsers.py (1)
  • SliceParser (97-155)
Deeploy/Targets/PULPOpen/Bindings.py (5)
Deeploy/CommonExtensions/DataTypes.py (4)
  • float32_t (74-78)
  • int8_t (12-15)
  • int32_t (24-27)
  • uint8_t (36-39)
Deeploy/DeeployTypes.py (3)
  • CodeTransformation (2366-2400)
  • NodeBinding (1588-1733)
  • NodeTemplate (87-229)
Deeploy/FutureExtension/Bindings/AutoFutureBinding.py (1)
  • AutoFutureBinding (13-62)
Deeploy/Targets/Generic/TypeCheckers.py (3)
  • SliceChecker (37-51)
  • ConvChecker (456-474)
  • ReduceMeanChecker (310-324)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (536-559)
Deeploy/Targets/PULPOpen/Parsers.py (2)
Deeploy/DeeployTypes.py (5)
  • inputs (2583-2600)
  • lookup (792-824)
  • parseNode (1105-1120)
  • parseNodeCtxt (1123-1148)
  • NetworkContext (578-1092)
Deeploy/Targets/Generic/Parsers.py (31)
  • Conv2DParser (1246-1312)
  • parseNode (19-27)
  • parseNode (49-58)
  • parseNode (81-94)
  • parseNode (102-112)
  • parseNode (163-169)
  • parseNode (193-207)
  • parseNode (229-243)
  • parseNode (271-293)
  • parseNode (333-351)
  • parseNode (373-385)
  • parseNode (422-434)
  • parseNode (469-473)
  • parseNode (496-507)
  • parseNode (531-542)
  • parseNode (578-582)
  • parseNodeCtxt (29-41)
  • parseNodeCtxt (60-76)
  • parseNodeCtxt (114-155)
  • parseNodeCtxt (171-185)
  • parseNodeCtxt (209-221)
  • parseNodeCtxt (245-263)
  • parseNodeCtxt (295-325)
  • parseNodeCtxt (353-365)
  • parseNodeCtxt (387-414)
  • parseNodeCtxt (436-461)
  • parseNodeCtxt (475-488)
  • parseNodeCtxt (509-523)
  • parseNodeCtxt (544-570)
  • parseNodeCtxt (584-590)
  • parseNodeCtxt (603-620)
Deeploy/Targets/Neureka/TileConstraints/NeurekaPointwiseConstraint.py (2)
Deeploy/DeeployTypes.py (1)
  • lookup (792-824)
Deeploy/TilingExtension/TilingCodegen.py (2)
  • HyperRectangle (24-35)
  • calculateFlatOffsetInBytes (239-242)
Deeploy/DeeployTypes.py (3)
Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (1)
  • sizeInBytes (181-182)
Deeploy/AbstractDataTypes.py (2)
  • typeWidth (399-400)
  • Pointer (303-388)
DeeployTest/testUtils/dmaUtils.py (1)
  • size (72-73)
Deeploy/Targets/Generic/Parsers.py (1)
Deeploy/DeeployTypes.py (3)
  • lookup (792-824)
  • get_aliases_of (352-364)
  • add_aliases (331-350)
Deeploy/Targets/Neureka/TileConstraints/NeurekaDenseConstraint.py (6)
Deeploy/TilingExtension/TilingCodegen.py (5)
  • TilingSchedule (53-122)
  • AbsoluteHyperRectangle (39-49)
  • VariableReplacementScheme (126-158)
  • HyperRectangle (24-35)
  • calculateFlatOffsetInBytes (239-242)
Deeploy/TilingExtension/TileConstraint.py (3)
  • TileConstraint (19-221)
  • serializeTilingSolution (184-221)
  • extractBaseAddr (56-74)
Deeploy/DeeployTypes.py (1)
  • lookup (792-824)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (536-559)
Deeploy/CommonExtensions/DataTypes.py (3)
  • uint8_t (36-39)
  • uint32_t (48-51)
  • uint16_t (42-45)
Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py (2)
  • Conv2DTileConstraint (225-435)
  • computeInputCube (320-347)
Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1)
Deeploy/TilingExtension/TilingCodegen.py (1)
  • HyperRectangle (24-35)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (3)
  • contextagnostic (285-298)
  • SequentialPass (201-210)
  • ReplaceSequentialPatternPass (265-282)
Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py (1)
Deeploy/TilingExtension/TilingCodegen.py (1)
  • HyperRectangle (24-35)
Deeploy/Targets/Neureka/TileConstraints/NeurekaDepthwiseConstraint.py (6)
Deeploy/TilingExtension/TilingCodegen.py (5)
  • TilingSchedule (53-122)
  • AbsoluteHyperRectangle (39-49)
  • VariableReplacementScheme (126-158)
  • HyperRectangle (24-35)
  • calculateFlatOffsetInBytes (239-242)
Deeploy/TilingExtension/TileConstraint.py (2)
  • TileConstraint (19-221)
  • extractBaseAddr (56-74)
Deeploy/TilingExtension/TilerModel.py (3)
  • TilerModel (34-402)
  • addTensorDimToModel (143-157)
  • getTensorDimVar (131-135)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (536-559)
Deeploy/CommonExtensions/DataTypes.py (3)
  • uint8_t (36-39)
  • uint32_t (48-51)
  • uint16_t (42-45)
Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py (2)
  • Conv2DTileConstraint (225-435)
  • computeInputCube (320-347)
Deeploy/MemoryLevelExtension/NetworkDeployers/MemoryLevelDeployer.py (1)
Deeploy/DeeployTypes.py (5)
  • _parseNode (2670-2677)
  • ONNXLayer (1895-2223)
  • NetworkContext (578-1092)
  • optimize (2260-2280)
  • optimize (2310-2329)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
Deeploy/DeeployTypes.py (6)
  • NodeTemplate (87-229)
  • computeTransientBuffersSize (142-162)
  • NetworkContext (578-1092)
  • hoistTransientBuffers (165-185)
  • hoistTransientBuffer (893-912)
  • lookup (792-824)
DeeployTest/testMVP.py (3)
Deeploy/DeeployTypes.py (1)
  • NetworkDeployer (3258-3635)
DeeployTest/testUtils/platformMapping.py (1)
  • mapDeployer (92-261)
DeeployTest/testSchedulingExtension.py (1)
  • _mockScheduler (29-33)
🪛 LanguageTool
CHANGELOG.md

[style] ~31-~31: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... conv 2D, and reduce mean operations. - FP PULP DW conv lowering optimization pass...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 Ruff (0.14.1)
Deeploy/Targets/Neureka/TileConstraints/NeurekaPointwiseConstraint.py

233-233: Undefined name absoluteCube

(F821)

Deeploy/DeeployTypes.py

401-401: Redefinition of unused sizeInBytes from line 390

(F811)


447-447: Redefinition of unused sizeInBytes from line 444

(F811)


456-456: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


456-456: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Deeploy/Targets/Neureka/TileConstraints/NeurekaDenseConstraint.py

263-263: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


263-263: Undefined name inputRequantCubes

(F821)


281-281: Local variable dilation is assigned to but never used

Remove assignment to unused variable dilation

(F841)


289-289: Local variable inputChannelVar is assigned to but never used

Remove assignment to unused variable inputChannelVar

(F841)


294-294: Local variable outputChannelVar is assigned to but never used

Remove assignment to unused variable outputChannelVar

(F841)


317-317: Local variable weightBuffer is assigned to but never used

Remove assignment to unused variable weightBuffer

(F841)


320-320: Local variable inputBatchVar is assigned to but never used

Remove assignment to unused variable inputBatchVar

(F841)


325-325: Local variable outputBatchVar is assigned to but never used

Remove assignment to unused variable outputBatchVar

(F841)


326-326: Local variable outputHeightVar is assigned to but never used

Remove assignment to unused variable outputHeightVar

(F841)


327-327: Local variable outputWidthVar is assigned to but never used

Remove assignment to unused variable outputWidthVar

(F841)


328-328: Local variable outputChannelVar is assigned to but never used

Remove assignment to unused variable outputChannelVar

(F841)


331-331: Local variable padding is assigned to but never used

Remove assignment to unused variable padding

(F841)


413-413: Unpacked variable BatchOffset is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


413-413: Unpacked variable HOffset is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


413-413: Unpacked variable WOffset is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


413-413: Unpacked variable COffset is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


414-414: Unpacked variable BatchSize is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


426-426: Unpacked variable inBSize is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


426-426: Unpacked variable inHSize is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


522-522: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py

122-122: Undefined name BSize

(F821)


133-133: Consider (BatchOffset, *AMatrixOffsets) instead of concatenation

Replace with (BatchOffset, *AMatrixOffsets)

(RUF005)


133-133: Undefined name BatchOffset

(F821)


133-133: Consider (BatchSize, *AMatrixShape) instead of concatenation

Replace with (BatchSize, *AMatrixShape)

(RUF005)


136-139: Consider (BatchOffset, BOffset, *AMatrixOffsets) instead of concatenation

Replace with (BatchOffset, BOffset, *AMatrixOffsets)

(RUF005)


137-137: Undefined name BatchOffset

(F821)


138-138: Undefined name BOffset

(F821)


140-143: Consider (BatchSize, BSize, *AMatrixShape) instead of concatenation

Replace with (BatchSize, BSize, *AMatrixShape)

(RUF005)


142-142: Undefined name BSize

(F821)


149-149: Consider (BatchOffset, *BMatrixOffsets) instead of concatenation

Replace with (BatchOffset, *BMatrixOffsets)

(RUF005)


149-149: Undefined name BatchOffset

(F821)


149-149: Consider (BatchSize, *BMatrixShape) instead of concatenation

Replace with (BatchSize, *BMatrixShape)

(RUF005)


152-155: Consider (BatchOffset, BOffset, *BMatrixOffsets) instead of concatenation

Replace with (BatchOffset, BOffset, *BMatrixOffsets)

(RUF005)


153-153: Undefined name BatchOffset

(F821)


154-154: Undefined name BOffset

(F821)


156-159: Consider (BatchSize, BSize, *BMatrixShape) instead of concatenation

Replace with (BatchSize, BSize, *BMatrixShape)

(RUF005)


158-158: Undefined name BSize

(F821)

Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py

305-306: Expected an indented block after if statement

(invalid-syntax)


310-310: Expected an indented block after if statement

(invalid-syntax)

Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py

132-132: Undefined name BSize

(F821)


151-151: Consider (BatchOffset, *AMatrixOffsets) instead of concatenation

Replace with (BatchOffset, *AMatrixOffsets)

(RUF005)


151-151: Undefined name BatchOffset

(F821)


151-151: Consider (BatchSize, *AMatrixShape) instead of concatenation

Replace with (BatchSize, *AMatrixShape)

(RUF005)


154-157: Consider (BatchOffset, BOffset, *AMatrixOffsets) instead of concatenation

Replace with (BatchOffset, BOffset, *AMatrixOffsets)

(RUF005)


155-155: Undefined name BatchOffset

(F821)


156-156: Undefined name BOffset

(F821)


158-161: Consider (BatchSize, BSize, *AMatrixShape) instead of concatenation

Replace with (BatchSize, BSize, *AMatrixShape)

(RUF005)


160-160: Undefined name BSize

(F821)


167-167: Consider (BatchOffset, *BMatrixOffsets) instead of concatenation

Replace with (BatchOffset, *BMatrixOffsets)

(RUF005)


167-167: Undefined name BatchOffset

(F821)


167-167: Consider (BatchSize, *BMatrixShape) instead of concatenation

Replace with (BatchSize, *BMatrixShape)

(RUF005)


170-173: Consider (BatchOffset, BOffset, *BMatrixOffsets) instead of concatenation

Replace with (BatchOffset, BOffset, *BMatrixOffsets)

(RUF005)


171-171: Undefined name BatchOffset

(F821)


172-172: Undefined name BOffset

(F821)


174-177: Consider (BatchSize, BSize, *BMatrixShape) instead of concatenation

Replace with (BatchSize, BSize, *BMatrixShape)

(RUF005)


176-176: Undefined name BSize

(F821)


370-370: Consider (BatchOffset, *AMatrixOffsets) instead of concatenation

Replace with (BatchOffset, *AMatrixOffsets)

(RUF005)


370-370: Undefined name BatchOffset

(F821)


370-370: Consider (BatchSize, *AMatrixShape) instead of concatenation

Replace with (BatchSize, *AMatrixShape)

(RUF005)


373-376: Consider (BatchOffset, BOffset, *AMatrixOffsets) instead of concatenation

Replace with (BatchOffset, BOffset, *AMatrixOffsets)

(RUF005)


374-374: Undefined name BatchOffset

(F821)


375-375: Undefined name BOffset

(F821)


377-380: Consider (BatchSize, BSize, *AMatrixShape) instead of concatenation

Replace with (BatchSize, BSize, *AMatrixShape)

(RUF005)


379-379: Undefined name BSize

(F821)


386-386: Consider (BatchOffset, *BMatrixOffsets) instead of concatenation

Replace with (BatchOffset, *BMatrixOffsets)

(RUF005)


386-386: Undefined name BatchOffset

(F821)


386-386: Consider (BatchSize, *BMatrixShape) instead of concatenation

Replace with (BatchSize, *BMatrixShape)

(RUF005)


389-392: Consider (BatchOffset, BOffset, *BMatrixOffsets) instead of concatenation

Replace with (BatchOffset, BOffset, *BMatrixOffsets)

(RUF005)


390-390: Undefined name BatchOffset

(F821)


391-391: Undefined name BOffset

(F821)


393-396: Consider (BatchSize, BSize, *BMatrixShape) instead of concatenation

Replace with (BatchSize, BSize, *BMatrixShape)

(RUF005)


395-395: Undefined name BSize

(F821)


405-405: Consider (BatchOffset, *CMatrixOffsets) instead of concatenation

Replace with (BatchOffset, *CMatrixOffsets)

(RUF005)


405-405: Undefined name BatchOffset

(F821)


405-405: Consider (BatchSize, *CMatrixShape) instead of concatenation

Replace with (BatchSize, *CMatrixShape)

(RUF005)


407-407: Consider (BatchOffset, BOffset, *CMatrixOffsets) instead of concatenation

Replace with (BatchOffset, BOffset, *CMatrixOffsets)

(RUF005)


407-407: Undefined name BatchOffset

(F821)


407-407: Undefined name BOffset

(F821)


407-407: Consider (BatchSize, BSize, *CMatrixShape) instead of concatenation

Replace with (BatchSize, BSize, *CMatrixShape)

(RUF005)


407-407: Undefined name BSize

(F821)

Deeploy/Targets/Neureka/TileConstraints/NeurekaDepthwiseConstraint.py

259-259: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


259-259: Undefined name inputRequantCubes

(F821)


277-277: Local variable dilation is assigned to but never used

Remove assignment to unused variable dilation

(F841)


314-314: Local variable weightBuffer is assigned to but never used

Remove assignment to unused variable weightBuffer

(F841)


317-317: Local variable inputBatchVar is assigned to but never used

Remove assignment to unused variable inputBatchVar

(F841)


320-320: Local variable inputChannelVar is assigned to but never used

Remove assignment to unused variable inputChannelVar

(F841)


322-322: Local variable outputBatchVar is assigned to but never used

Remove assignment to unused variable outputBatchVar

(F841)


323-323: Local variable outputHeightVar is assigned to but never used

Remove assignment to unused variable outputHeightVar

(F841)


324-324: Local variable outputWidthVar is assigned to but never used

Remove assignment to unused variable outputWidthVar

(F841)


325-325: Local variable outputChannelVar is assigned to but never used

Remove assignment to unused variable outputChannelVar

(F841)


328-328: Local variable padding is assigned to but never used

Remove assignment to unused variable padding

(F841)


408-408: Unpacked variable BatchOffset is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


408-408: Unpacked variable HOffset is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


408-408: Unpacked variable WOffset is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


408-408: Unpacked variable COffset is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


409-409: Unpacked variable BatchSize is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


421-421: Unpacked variable inBSize is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


421-421: Unpacked variable inHSize is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


517-517: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

Deeploy/MemoryLevelExtension/NetworkDeployers/MemoryLevelDeployer.py

131-131: Undefined name ONNXLayer

(F821)

Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py

25-25: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Deeploy/Targets/PULPOpen/Deployer.py

40-40: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

DeeployTest/testMVP.py

58-58: Unused function argument: verbose

(ARG001)

🔇 Additional comments (29)
.github/workflows/ci-platform-siracusa.yml (1)

57-75: Test additions well-integrated and properly organized.

The new kernel tests (lines 57–63 for Float2D convolutions with and without DW variants, line 75 for reshape with skip connection) and the model test (line 99 for TinyViT demo) follow established naming conventions, are logically grouped by operation type, and align with the PR's stated objectives to add FP test coverage.

Based on the established pattern (documented in learnings), these test names map to directories in DeeployTest/Tests/, and the prior review confirmed these directories exist and tests pass successfully.

Also applies to: 99-99

CHANGELOG.md (1)

100-103: Approve changelog entries for bug fixes.

The fixed section comprehensively documents the four key bug fixes (alias_of parameter handling, non-batched FP GEMM/matmul, closure naming, and FP add/mul data types) in alignment with the PR objectives. The descriptions are clear and concise.

Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (1)

35-35: LGTM!

The storage of n_cores is straightforward and correct.

Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)

127-131: Nice performance improvement.

The refactored implementation using iterative construction is O(n) instead of O(n²) with repeated .index() calls, and the logic is clearer.


338-383: New DW convolution passes look well-structured.

The NCHWtoNHWCDwConvPass and PULPFPDWConvPass classes follow the established pattern-based pass architecture correctly. However, they depend on _PULPDWNCHWtoNHWC_fun, which has a syntax error that must be fixed first.

Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)

13-14: LGTM: MIN casts prevent signed/unsigned mixups

Casting ${size} to uint32_t here is correct and avoids UB on MIN comparisons.
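For readers unfamiliar with the hazard: in C's usual arithmetic conversions, a signed operand in a mixed MIN comparison is converted to unsigned, so a negative value silently becomes huge (strictly a surprising implicit conversion rather than UB proper). A Python emulation of that wraparound:

def as_uint32(x: int) -> int:
    # Emulate C's implicit int32 -> uint32 conversion.
    return x % (1 << 32)

size, chunk = -4, 16                  # e.g. a miscomputed leftover size
masked = min(as_uint32(size), chunk)  # mixed-sign C MIN: -4 becomes 4294967292
visible = min(size, chunk)            # all-signed MIN keeps the bad value visible
assert (masked, visible) == (16, -4)

Casting both operands to uint32_t up front makes the comparison's domain explicit instead of relying on implicit conversion.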

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (2)

100-117: Im2Col call argument order looks correct

X/Y, kernel, and stride ordering matches the expected implementation for these templates.

Based on learnings.


21-24: Conv2D Im2Col scratch sizing formula is correct

The formula matches the C kernel's pContextBuffer requirement. The buffer is correctly sized as (typeWidth // 8) * n_cores * ch_im_in * kx * ky bytes, which corresponds to the kernel logic where each core gets ch_im_in * kernel_x * kernel_y float elements, split across n_cores with pointer arithmetic advancing by element count. For fp32 (typeWidth=32), this yields the exact buffer needed: n_cores * ch_im_in * kx * ky * 4 bytes.
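As a quick cross-check of the arithmetic, a sketch under the assumptions stated above (fp32; the function is illustrative, not Deeploy API):

def im2col_scratch_bytes(type_width_bits, n_cores, ch_im_in, kx, ky):
    # Per core: one input patch of ch_im_in * kx * ky elements;
    # total: that patch replicated once per core, in bytes.
    return (type_width_bits // 8) * n_cores * ch_im_in * kx * ky

# fp32, 8 cores, 16 input channels, 3x3 kernel -> 8 * 16 * 9 * 4 = 4608 bytes
assert im2col_scratch_bytes(32, 8, 16, 3, 3) == 4608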

Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1)

164-173: Review comment is incorrect: prior cleanup was not completed.

The first BCube construction block (lines 146-160) still exists as dead code. It is never appended to inputBCubes—only the second BCube constructed at line 172 is appended (line 173). The review comment incorrectly asserts that "earlier deleted branch already tried to build BCube," but this branch was never deleted. While lines 164-173 correctly represent the single active path, the orphaned BCube construction at lines 146-160 contradicts the cleanup claim.

Likely an incorrect or invalid review comment.

Deeploy/Targets/PULPOpen/Parsers.py (3)

80-82: FP Conv2D: bias arity check LGTM

Allowing 2 or 3 inputs at parse time is correct and aligns with bias support.


105-117: FP Conv2D: bias flags wiring is consistent

Setting has_bias ("true"/"false") and mapping bias when present matches existing templates.

Please confirm FloatConvTemplate.reference2DIm2ColTemplate expects:

  • keys: data_in, weight, data_out, has_bias, bias.

123-190: Remove suggestion to validate DW early in parseNode; current design is sound.

The review comment suggests early depthwise validation in parseNode, but this is architecturally infeasible. The parseNode method receives only the node (without context), so ch_im_in—needed to verify group == ch_im_in—is unavailable. Input channel dimensions are only accessible in parseNodeCtxt via ctxt.lookup(), which is where the DW check correctly occurs (line ~167).

The mention of "F_total % C == 1" references an undefined formula not present anywhere in the codebase.

The current two-stage design (attribute validation in parseNode, context validation in parseNodeCtxt) correctly reflects the available information at each stage and is the proper approach.

Likely an incorrect or invalid review comment.

Deeploy/Targets/PULPOpen/Bindings.py (5)

12-19: Import expansion LGTM

Float data types and new templates are correctly imported for the added FP bindings.


151-152: DMA Slice: template swap LGTM

Switch to DMASliceTemplate.referenceTemplate is consistent with DMA path.


163-172: Slice bindings for FP types: OK

SliceChecker + SliceTemplate with ForkTransformer over FloatDataTypes matches the Float path.


299-303: Float ReduceMean bindings: OK

Mixed float + integer axes signature bound to FloatReduceMeanTemplate with ClusterTransformer is consistent with parser behavior for FP opset.


239-245: FP DW Conv2D binding confirmed: template keys match parser

Verified that PULPFPDWConv2DParser.parseNodeCtxt() produces both has_bias and bias keys in operatorRepresentation (Parsers.py:175–176, 179), which are correctly expected by FloatConvTemplate.referenceDW2DIm2ColTemplate (FloatConvTemplate.py:81, 110, 142). The binding is correct.

Deeploy/TilingExtension/TilerExtension.py (1)

988-989: Review comment is incorrect — no bug exists

NetworkContext.copy() uses copy.copy(self), which is a shallow copy that preserves instance attributes like n_cores (they remain in the object's __dict__). _convertCtxtToStaticSchedule returns the same ctxt instance modified in-place, so n_cores naturally survives. Line 988's reassignment is defensive programming but not required for correctness.
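A standalone illustration of why the shallow copy preserves plain instance attributes (generic Python, not the actual NetworkContext):

import copy

class Ctxt:
    def __init__(self, n_cores = 8):
        self.n_cores = n_cores

a = Ctxt(n_cores = 4)
b = copy.copy(a)        # shallow copy carries over __dict__ entries
assert b.n_cores == 4   # the attribute survives without reassignment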

TargetLibraries/PULPOpen/src/Convolution_fp32.c (1)

36-39: Check argument order consistency at call sites.

Kernels accept (H, W, P, Q, SP, SQ); FloatConvTemplate uses X,Y ordering per prior notes. Please confirm bindings pass the correct order to avoid transposed dims.

Based on learnings

Also applies to: 134-138

Deeploy/DeeployTypes.py (1)

1205-1210: Constructor argument mismatch in parseOutputs.

Use aliases=[] instead of alias_of=[] to match VariableBuffer.__init__.

Apply:

-                nb = ctxt.VariableBuffer(
-                    name = name,
-                    shape = node.shape,
-                    alias_of = [],
-                )
+                nb = ctxt.VariableBuffer(
+                    name = name,
+                    shape = node.shape,
+                    aliases = [],
+                )

Likely an incorrect or invalid review comment.

DeeployTest/testMVP.py (3)

84-92: LGTM!

The n_cores parameter is correctly propagated to mapDeployer, and the formatting is clean and consistent.


201-205: LGTM!

The CLI argument is well-defined with a sensible default and clear help text. The parameter correctly flows through to the deployment pipeline.


256-262: LGTM!

The n_cores parameter is correctly passed from CLI arguments to setupDeployer, completing the propagation chain through the deployment pipeline.

Deeploy/Targets/PULPOpen/Platform.py (6)

8-9: LGTM!

All new imports are properly structured and used within the file. The additions support FP depthwise convolution, slice operations, and the new optimization pass.

Also applies to: 32-32, 35-36


77-77: LGTM!

The FPDWConv2DMapper definition follows the established pattern and correctly pairs the FP depthwise conv parser with its bindings.


92-94: LGTM!

Both slice mappers correctly use SliceParser with different bindings to support DMA-based and software-based slice implementations. The mapper defined first in the mapping list (line 134) will have precedence during binding selection.


234-235: LGTM!

The RemoveEmptyConvBiasPass is correctly positioned near the end of the optimization pipeline, after merge and requantization passes. This placement allows the pass to clean up any empty bias parameters that may have been introduced or remain after earlier transformations.


134-134: No action needed; original review comment is incorrect.

The mapper order does not affect precedence. PULPDMASliceBindings operates on IntegerDataTypes while PULPSliceBindings operates on FloatDataTypes—these are mutually exclusive sets. The SliceChecker type constraints determine which binding applies, not the array order. The current ordering [SliceMapper, DMASliceMapper] is safe.

Likely an incorrect or invalid review comment.


108-108: Based on my verification of the parser constraint logic:

FPConv2DMapper matches regular convolutions exclusively:

  • PULPFPConv2DParser.parseNode() explicitly requires self.operatorRepresentation['group'] == 1
  • This constraint rejects any depthwise convolution (where group == ch_im_in)

FPDWConv2DMapper matches depthwise convolutions:

  • PULPFPDWConv2DParser.parseNode() does not have a group==1 constraint
  • Depthwise validation occurs in parseNodeCtxt() at line 186: if self.operatorRepresentation['group'] == self.operatorRepresentation['ch_im_in']

The mapper ordering [FPConv2DMapper, FPDWConv2DMapper] is correct: the more constrained parser (requiring group==1) is tried first, ensuring regular convolutions are matched before attempting depthwise matching. Any regular convolution caught by FPConv2DMapper will not reach FPDWConv2DMapper.

…ffer size computation. Added fixes to remove warnings
…ion. Adjusted memory allocation for the im2col buffer in 2d float conv for PULPOpen (lower requirement, dynamic allocation). Included all Float 2D Conv tests in the CI pipeline for siracusa
…version). Added MatMulAddMergePass to the PULP optimizer. Added float-reshape-skip-connection test to Siracusa CI pipeline. Fixes in the FloatAddTemplate of PULPOpen to eliminate warnings
lukamac and others added 15 commits October 27, 2025 13:54
* Use [reuse-tool](https://github.com/fsfe/reuse-tool) for checking license headers

Added a wrapper python script to filter the files so we can manually
skip what we don't want reuse to check. Otherwise it complains about
every single file.

* Replace LICENSE file with LICENSES directory and Apache-2.0.txt file

Reuse explicitly checks for licenses in the LICENSES directory and
doesn't look for it in the LICENSE file.

* Add MIT license for the scripts/run_clang_format.py

* Add missing license headers

* Configure  and add CC BY-NC 4.0 license

* Update Readme

* Call reuse via python if not in PATH

* Ignore "TEST_*" directories

* Update CONTRIBUTING.md

---------

Co-authored-by: Philip Wiese <[email protected]>
This pull request introduces several improvements and refactors across the codebase, focusing on enhanced logging, memory tracking, type checking, and developer usability. The most significant changes include replacing print statements with structured logging, adding detailed memory usage tracking for buffers, improving type and level checking for data types, and providing more informative string representations and debugging messages for core classes.

All runners and the generation scripts support different verbosity levels:
- `-v`: Show errors, warnings and info messages
- `-vv`: Show errors, warnings, info and debug messages
- `-vvv`: Maximum verbosity with information about where the message was emitted. Also enable verbose compilation during the build step for the test runners.

- Logging and Error Handling Improvements: Replaced all `print` statements with structured logging using `DEFAULT_LOGGER` throughout the codebase, improving consistency and enabling better log management.
- Added debug-level logs to type checking and parsing routines to indicate success/failure and binding exhaustion, aiding troubleshooting.
- Implemented `__repr__` methods for core classes (`NodeTemplate`, `NodeBinding`, `NodeMapper`) to provide more informative string representations for debugging and logging.
- Added `checkNumLevels` methods to relevant data type classes to validate that the number of levels assigned to buffers is supported by their type, and integrated warning logs if mismatches are detected during type inference.
- Introduced a `sizeInBytes` method to `VariableBuffer`, `TransientBuffer`, and related classes to standardize buffer size calculation.

- Report the maximum static and dynamic memory usage for all memory levels:
  - Added dynamic and maximum memory usage tracking to `NetworkContext` via `_dynamicSize` and `_maxDynamicSize` dictionaries, and updated memory allocation/deallocation logic in `MemoryAllocation.py` to maintain these statistics per memory level. (This is only used for untiled platforms to track the maximum buffer size.)
  - Exposed new summary methods (`_printMemorySummary`, `_printInputOutputSummary`) in deployer wrappers and implemented input/output summary logging in `SignPropDeployer`.

- Deployer workflow now uses `prepare(...)` instead of `generateFunction(...)`.
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
Deeploy/DeeployTypes.py (2)

241-245: Replace mutable default (shape=[1]) with immutable and type-annotate.

Avoid shared mutable defaults.

-    def __init__(self, name: str = '', shape = [1], aliases: Optional[List[str]] = None):
+    def __init__(self, name: str = '', shape: Sequence[int] = (1,), aliases: Optional[List[str]] = None):

379-388: Fix visited set creation in has_live_ancestors (bug).

set(self.name) splits into characters; use a singleton set.

-        visited = set(self.name)
+        visited = {self.name}
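A one-line check of the difference (plain Python, independent of Deeploy):

name = "conv0"
assert set(name) == {"c", "o", "n", "v", "0"}  # iterates characters
assert {name} == {"conv0"}                     # singleton set, as intended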
♻️ Duplicate comments (6)
CHANGELOG.md (1)

23-23: Remove duplicate PR entry (#105).

The “Refactor tiling code generation [#105]” line is duplicated. Remove the second occurrence to avoid confusion.

- - Refactor tiling code generation [#105](https://github.com/pulp-platform/Deeploy/pull/105)
Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (1)

25-26: Avoid mutable default for inputOffsets.

Use Optional and initialize inside to prevent shared state.

-                 inputOffsets: Dict[str, int] = {},
+                 inputOffsets: Optional[Dict[str, int]] = None,
                  n_cores: int = 8):
@@
-        if inputOffsets == {}:
-            for key in inputTypes.keys():
-                inputOffsets[key] = 0
+        if inputOffsets is None:
+            inputOffsets = {key: 0 for key in inputTypes.keys()}
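For context, a minimal sketch of the shared-state hazard that the Optional pattern avoids (illustrative functions, not Deeploy code):

def register_bad(key, offsets = {}):       # default dict shared across calls
    offsets[key] = 0
    return offsets

def register_good(key, offsets = None):    # fresh dict per call
    if offsets is None:
        offsets = {}
    offsets[key] = 0
    return offsets

assert register_bad("a") == {"a": 0}
assert register_bad("b") == {"a": 0, "b": 0}   # "a" leaked from the first call
assert register_good("b") == {"b": 0}          # isolated, as expected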
Deeploy/Targets/PULPOpen/Deployer.py (1)

40-53: Avoid mutable default for inputOffsets; forward Optional to base.

Align with SignPropDeployer’s fix.

-from typing import Callable, Dict, List, Type
+from typing import Callable, Dict, List, Optional, Type
@@
-                 inputOffsets = {},
+                 inputOffsets: Optional[Dict[str, int]] = None,
                  n_cores: int = 8):
@@
-            inputOffsets = inputOffsets,
+            inputOffsets = inputOffsets,
             n_cores = n_cores,
Deeploy/DeeployTypes.py (3)

325-329: Constructor arg mismatch: use ‘aliases’, not ‘alias_of’.

This raises a TypeError at runtime.

-        return (cls(
-            name = node.name,
-            shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
-            alias_of = [],
-        ))
+        return cls(
+            name = node.name,
+            shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
+            aliases = [],
+        )

442-447: Avoid mutable defaults in ConstantBuffer.__init__.

Use immutable defaults and convert.

-    def __init__(self, name: str = '', shape = [1], values = [0]):
+    def __init__(self, name: str = '', shape: Sequence[int] = (1,), values: Union[Sequence, np.ndarray] = (0,)):
         # Pass a copy of alias_of to avoid shared references
         super().__init__(name, shape)
 
-        values = np.asarray(values)
+        values = np.asarray(values)

331-365: Align alias helpers with self.aliases (set) and remove alias_of usage.

Current code references a non-existent attribute and never updates aliases.

-    def add_aliases(self, aliases_to_add: List[str]):
+    def add_aliases(self, aliases_to_add: List[str]):
@@
-        if not hasattr(self, "alias_of"):
-            return None
-
-        for alias in aliases_to_add:
-            if alias not in self.alias_of:
-                self.alias_of.append(alias)
+        if not hasattr(self, "aliases") or self.aliases is None:
+            self.aliases = set()
+        for alias in aliases_to_add:
+            self.aliases.add(alias)
         return None
@@
-    def get_aliases_of(self):
+    def get_aliases_of(self) -> List[str]:
@@
-        if hasattr(self, "alias_of"):
-            return list(self.alias_of)
-        else:
-            return list()
+        return list(self.aliases) if hasattr(self, "aliases") and self.aliases is not None else []
🧹 Nitpick comments (14)
Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1)

158-159: Consider defensive handling for edge cases in name prefixing.

The underscore prefix addresses names starting with numbers as intended. However, the implementation doesn't guard against:

  • Empty or None name values
  • Names already starting with _ (which would produce __, a reserved identifier)

Consider adding a check:

-        # Prepend underscore to avoid name issues when beginning with problematic characters (like numbers)
-        self.closureName = "_" + name + self.closureSuffix
+        # Prepend underscore to avoid name issues when beginning with problematic characters (like numbers)
+        # Avoid creating double-underscore if name already starts with underscore
+        prefix = "" if name and name[0] == "_" else "_"
+        self.closureName = prefix + name + self.closureSuffix
Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1)

11-21: LGTM! Correct fix for non-batched input handling.

The conditional logic properly addresses the bug by only advancing input pointers when the corresponding input is actually batched. When A_batched or B_batched is false, the base pointer is reused across all batch iterations, which correctly implements broadcasting semantics.

The pointer arithmetic for the batched case is correct:

  • A offset: b * M * N for shape [batch, M, N]
  • B offset: b * N * O for shape [batch, N, O]

Minor: Line 17 has trailing whitespace after the colon that could be cleaned up.

-    % if B_batched:                  
+    % if B_batched:
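The intended broadcasting semantics can be sanity-checked against numpy: a non-batched operand's single matrix is reused for every batch, which is exactly what keeping its base pointer fixed achieves. A small sketch (shapes are illustrative):

import numpy as np

batch, M, N, O = 3, 2, 4, 5
A = np.random.rand(batch, M, N)   # batched input
B = np.random.rand(N, O)          # non-batched input, reused per batch

ref = A @ B                       # numpy broadcasts B across the batch

# Pointer-arithmetic view: advance A by b*M*N elements, keep B's base fixed.
flatA, flatB = A.ravel(), B.ravel()
for b in range(batch):
    a = flatA[b * M * N:(b + 1) * M * N].reshape(M, N)
    assert np.allclose(a @ flatB.reshape(N, O), ref[b])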
Deeploy/Targets/Generic/Parsers.py (1)

1071-1074: Consider adding strict=True to zip() calls for defensive programming.

The zip() calls iterate over inputs and outputs with expected fixed-length lists. While parseNode already validates these lengths, adding strict=True (Python 3.10+) provides an explicit safety guarantee and catches bugs if the validation logic changes.

Apply this diff if Python 3.10+ is used:

-    for tensor, symName in zip(node.inputs, ['data_in', 'shape']):
+    for tensor, symName in zip(node.inputs, ['data_in', 'shape'], strict=True):
         self.operatorRepresentation[symName] = ctxt.lookup(tensor.name).name
-    for tensor, symName in zip(node.outputs, ['data_out']):
+    for tensor, symName in zip(node.outputs, ['data_out'], strict=True):
         self.operatorRepresentation[symName] = ctxt.lookup(tensor.name).name

Based on static analysis hints.

Deeploy/TilingExtension/TilerExtension.py (1)

988-989: Propagate n_cores earlier (and guard against missing attribute).

Set ctxt.n_cores when the context is created (or before tiling) to avoid surprises, and guard for attribute presence to prevent AttributeError if wrappers differ.

Example:

-        self.ctxt = self.tiler._convertCtxtToStaticSchedule(self.ctxt, memoryMap)
-        self.ctxt.n_cores = self.n_cores
+        self.ctxt = self.tiler._convertCtxtToStaticSchedule(self.ctxt, memoryMap)
+        if hasattr(self, "n_cores"):
+            self.ctxt.n_cores = self.n_cores
Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (1)

35-35: Drop redundant self.n_cores assignment.

NetworkDeployer already stores n_cores; remove this duplicate to keep a single source of truth.

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (4)

43-51: Fix return type annotation to match base (Union[int, IntVar]).

The function returns an int, but is annotated as str. Align with NodeTemplate’s signature.

-    def computeTransientBuffersSize(ctxt: NetworkContext,
-                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, str]]:
+    def computeTransientBuffersSize(
+        ctxt: NetworkContext,
+        operatorRepresentation: OperatorRepresentation,
+    ) -> List[Tuple[str, Union[int, IntVar]]]:

46-51: Update stale comment about using size as string.

The returned size is numeric and also passed to hoistTransientBuffer; the “used as string” warning is misleading. Remove or clarify it.


59-65: Prefer annotateType over mutating referencedType.

Setting _type.referencedType mutates the class and can have side effects. Use annotateType to set a precise Pointer type and instance.

-        # Manually set the type of the im2col buffer to match the input type, since it defaults to void for transient buffers
-        ctxt.lookup(im2col_name)._type.referencedType = ctxt.lookup(
-            operatorRepresentation['data_in'])._type.referencedType
+        # Set the buffer pointer type to match data_in
+        ctxt.annotateType(im2col_name, ctxt.lookup(operatorRepresentation['data_in'])._type)

93-101: Nit: fix comment typos (“ChannelOout” → “ChannelOut”).

Purely cosmetic; improves readability.

Also applies to: 124-131

Deeploy/DeeployTypes.py (1)

573-576: Avoid mutable dict defaults in NetworkContext.init.

Initialize dicts inside to prevent shared state.

-                 globalObjects = {},
-                 localObjects = {},
+                 globalObjects: Optional[OrderedDict] = None,
+                 localObjects: Optional[OrderedDict] = None,
@@
-        self.globalObjects = OrderedDict()
-        self.localObjects = OrderedDict()
+        self.globalObjects = OrderedDict() if globalObjects is None else globalObjects
+        self.localObjects = OrderedDict() if localObjects is None else localObjects
DeeployTest/testUtils/platformMapping.py (2)

92-104: Use int for n_cores (not Optional) since default is numeric.

-    n_cores: Optional[int] = 8,
+    n_cores: int = 8,

114-188: Propagate n_cores to all SignProp-based deployers for consistency.

CMSIS, MemPool, SoftHier, Generic deployers inherit SignPropDeployer and accept n_cores. Pass it through so CLI behavior is uniform.

Example for each block:

-        deployer = CMSISDeployer(...,
-                                 deeployStateDir = deeployStateDir,
-                                 inputOffsets = inputOffsets)
+        deployer = CMSISDeployer(...,
+                                 deeployStateDir = deeployStateDir,
+                                 inputOffsets = inputOffsets,
+                                 n_cores = n_cores)

Apply similarly to MemPoolDeployer, SoftHierDeployer, GenericDeployer.

Also applies to: 140-188

TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)

50-148: Code duplication acceptable for performance.

The has_bias and no-bias paths are duplicated (lines 50-148 vs 149-248), which increases maintainability burden but avoids branching inside tight loops. This trade-off is reasonable for performance-critical embedded kernels.

Consider adding a comment noting this is intentional for performance if future maintainers question the duplication.

TargetLibraries/PULPOpen/src/Convolution_fp32.c (1)

41-101: Code duplication acceptable for performance.

The has_bias and no-bias code paths are duplicated to avoid conditional branching inside the innermost loops (lines 47-64). This is a common optimization in performance-critical embedded code where avoiding branches can significantly improve execution speed.

Similar duplication exists in the im2col path (lines 140-218). Consider adding a brief comment explaining this trade-off for future maintainers.
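The pattern generalizes: hoisting the has_bias decision out of the hot loop trades a duplicated body for branch-free inner iterations. A language-agnostic sketch in Python (kernel shapes and names are illustrative):

def accumulate(acc, bias, has_bias):
    # Branch inside the loop: evaluated once per output element.
    return [(a + bias[f]) if has_bias else a for f, a in enumerate(acc)]

def accumulate_specialized(acc, bias, has_bias):
    # Branch hoisted: two straight-line variants, chosen once up front.
    if has_bias:
        return [a + bias[f] for f, a in enumerate(acc)]
    return list(acc)

acc, bias = [1.0, 2.0], [0.5, 0.25]
assert accumulate(acc, bias, True) == accumulate_specialized(acc, bias, True)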

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 85edf76 and 945b610.

📒 Files selected for processing (24)
  • .github/workflows/ci-platform-siracusa.yml (3 hunks)
  • CHANGELOG.md (4 hunks)
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
  • Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (2 hunks)
  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1 hunks)
  • Deeploy/DeeployTypes.py (12 hunks)
  • Deeploy/Targets/Generic/Parsers.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Bindings.py (6 hunks)
  • Deeploy/Targets/PULPOpen/Deployer.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Parsers.py (2 hunks)
  • Deeploy/Targets/PULPOpen/Platform.py (7 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (5 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1 hunks)
  • Deeploy/TilingExtension/TilerExtension.py (1 hunks)
  • DeeployTest/testMVP.py (4 hunks)
  • DeeployTest/testRunner_tiled_siracusa.py (1 hunks)
  • DeeployTest/testUtils/platformMapping.py (2 hunks)
  • DeeployTest/testUtils/testRunner.py (3 hunks)
  • TargetLibraries/PULPOpen/inc/kernel/Conv.h (1 hunks)
  • TargetLibraries/PULPOpen/src/Convolution_fp32.c (4 hunks)
  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py
  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
  • Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py
  • DeeployTest/testUtils/testRunner.py
  • Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py
  • DeeployTest/testRunner_tiled_siracusa.py
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-09-24T12:53:00.168Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: TargetLibraries/PULPOpen/src/DWConvolution_fp32.c:0-0
Timestamp: 2025-09-24T12:53:00.168Z
Learning: In PULP platform code, log2 is defined in pulp_nn_utils and can be used safely without requiring math.h or causing floating-point build/runtime issues, unlike the standard library log2 function.

Applied to files:

  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
  • TargetLibraries/PULPOpen/inc/kernel/Conv.h
  • TargetLibraries/PULPOpen/src/Convolution_fp32.c
📚 Learning: 2025-09-24T11:43:47.236Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: .github/workflows/ci-platform-siracusa.yml:57-60
Timestamp: 2025-09-24T11:43:47.236Z
Learning: In the Deeploy test system, test names in CI workflows correspond to directory names under DeeployTest/Tests/, not Python function names. The TestRunner class executes tests by passing directory paths via the `-t` argument, where each directory contains test configurations and definitions.

Applied to files:

  • .github/workflows/ci-platform-siracusa.yml
📚 Learning: 2025-09-24T12:17:21.624Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:46-0
Timestamp: 2025-09-24T12:17:21.624Z
Learning: In Deeploy's PULP templates, transient buffer size calculation can return element counts as strings from computeTransientBuffersSize(), and then manually set the buffer type in hoistTransientBuffers() using ctxt.lookup(buffer_name)._type.referencedType = input_type. The allocation system automatically multiplies the element count by the element size when the buffer type is properly set, achieving correct byte allocation.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
🧬 Code graph analysis (9)
Deeploy/Targets/Generic/Parsers.py (1)
Deeploy/DeeployTypes.py (2)
  • inputs (2569-2586)
  • lookup (778-810)
Deeploy/Targets/PULPOpen/Parsers.py (2)
Deeploy/DeeployTypes.py (5)
  • inputs (2569-2586)
  • lookup (778-810)
  • parseNode (1091-1106)
  • parseNodeCtxt (1109-1134)
  • NetworkContext (564-1078)
Deeploy/Targets/Generic/Parsers.py (31)
  • Conv2DParser (1220-1286)
  • parseNode (19-27)
  • parseNode (49-58)
  • parseNode (81-94)
  • parseNode (102-112)
  • parseNode (163-169)
  • parseNode (193-207)
  • parseNode (229-243)
  • parseNode (271-293)
  • parseNode (333-351)
  • parseNode (373-385)
  • parseNode (422-434)
  • parseNode (469-473)
  • parseNode (496-507)
  • parseNode (531-542)
  • parseNode (578-582)
  • parseNodeCtxt (29-41)
  • parseNodeCtxt (60-76)
  • parseNodeCtxt (114-155)
  • parseNodeCtxt (171-185)
  • parseNodeCtxt (209-221)
  • parseNodeCtxt (245-263)
  • parseNodeCtxt (295-325)
  • parseNodeCtxt (353-365)
  • parseNodeCtxt (387-414)
  • parseNodeCtxt (436-461)
  • parseNodeCtxt (475-488)
  • parseNodeCtxt (509-523)
  • parseNodeCtxt (544-570)
  • parseNodeCtxt (584-590)
  • parseNodeCtxt (603-620)
DeeployTest/testMVP.py (3)
Deeploy/DeeployTypes.py (1)
  • NetworkDeployer (3244-3629)
DeeployTest/testUtils/platformMapping.py (1)
  • mapDeployer (92-261)
DeeployTest/testSchedulingExtension.py (1)
  • _mockScheduler (29-33)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
Deeploy/DeeployTypes.py (6)
  • NodeTemplate (87-229)
  • computeTransientBuffersSize (142-162)
  • NetworkContext (564-1078)
  • hoistTransientBuffers (165-185)
  • hoistTransientBuffer (879-898)
  • lookup (778-810)
Deeploy/Targets/PULPOpen/Bindings.py (4)
Deeploy/CommonExtensions/DataTypes.py (4)
  • float32_t (74-78)
  • int8_t (12-15)
  • int32_t (24-27)
  • uint8_t (36-39)
Deeploy/DeeployTypes.py (3)
  • CodeTransformation (2352-2386)
  • NodeBinding (1574-1719)
  • NodeTemplate (87-229)
Deeploy/Targets/Generic/TypeCheckers.py (2)
  • SliceChecker (37-51)
  • ReduceMeanChecker (310-324)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (536-559)
DeeployTest/testUtils/platformMapping.py (2)
Deeploy/DeeployTypes.py (3)
  • DeploymentPlatform (2439-2482)
  • TopologyOptimizer (2237-2266)
  • NetworkDeployer (3244-3629)
Deeploy/Targets/PULPOpen/Deployer.py (1)
  • PULPDeployer (29-127)
Deeploy/DeeployTypes.py (2)
DeeployTest/testUtils/dmaUtils.py (1)
  • size (72-73)
Deeploy/AbstractDataTypes.py (1)
  • Pointer (303-388)
TargetLibraries/PULPOpen/inc/kernel/Conv.h (2)
TargetLibraries/PULPOpen/src/Convolution_fp32.c (2)
  • PULP_Conv2d_fp32_fp32_fp32_HWC (10-102)
  • PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC (104-219)
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)
  • PULP_DW_Conv2d_Im2Col_fp32_fp32_fp32_HWC (10-251)
Deeploy/Targets/PULPOpen/Platform.py (4)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
  • RemoveEmptyConvBiasPass (519-524)
Deeploy/Targets/PULPOpen/Parsers.py (1)
  • PULPFPDWConv2DParser (124-190)
Deeploy/DeeployTypes.py (1)
  • NodeMapper (1722-1878)
Deeploy/Targets/Generic/Parsers.py (1)
  • SliceParser (97-155)
🪛 LanguageTool
CHANGELOG.md

[style] ~32-~32: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... conv 2D, and reduce mean operations. - FP PULP DW conv lowering optimization pass...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 Ruff (0.14.1)
Deeploy/Targets/Generic/Parsers.py

1071-1071: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


1073-1073: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

DeeployTest/testMVP.py

58-58: Unused function argument: verbose

(ARG001)

Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py

25-25: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Deeploy/Targets/PULPOpen/Deployer.py

40-40: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🔇 Additional comments (18)
.github/workflows/ci-platform-siracusa.yml (1)

57-64: CI test coverage additions align well with PR features.

The new tests for FP 2D convolution (standard and depthwise variants with/without bias), reshape with skip connection, and TinyViT demo align cleanly with the PR's feature additions. Test naming is consistent with existing patterns, and per the earlier review resolution, these tests exist as directories in DeeployTest/Tests/. The additions integrate smoothly into the existing kernel and model test lists without logic changes or structural issues.

Also applies to: 75-75, 99-99

Deeploy/Targets/Generic/Parsers.py (1)

1070-1076: LGTM! Clear and Pythonic mapping implementation.

The explicit mapping of inputs to ['data_in', 'shape'] and outputs to ['data_out'] improves clarity and follows the pattern established by other parsers in the codebase (e.g., iRMSNormParser lines 65-71). The use of zip() is more Pythonic than index-based access.

Deeploy/DeeployTypes.py (2)

879-899: Ensure hoistTransientBuffer size is always int.

If any templates pass symbolic strings, they’ll break here. Confirm all computeTransientBuffersSize implementations return int/IntVar as per signature.


2696-2702: Good: preserve n_cores on context reinit.

Reconstructing NetworkContext with n_cores=self.ctxt.n_cores avoids losing state on parse.

Deeploy/Targets/PULPOpen/Parsers.py (3)

80-81: LGTM! Bias support correctly integrated.

The relaxed input count check now allows optional bias (2 inputs without bias, 3 inputs with bias), which aligns with the bias handling added in parseNodeCtxt.


106-118: LGTM! Clean bias handling implementation.

The conditional bias handling correctly maps inputs based on the node's input count, setting appropriate flags for downstream code generation.


124-191: LGTM! Well-structured depthwise convolution parser.

The parser correctly validates depthwise convolution constraints:

  • Square padding requirement
  • Optional bias support (2 or 3 inputs)
  • Depthwise verification via group == ch_im_in check (line 187)

The structure mirrors the regular FP Conv2D parser while adding depthwise-specific validation.

DeeployTest/testMVP.py (1)

57-59: LGTM! Clean n_cores parameter propagation.

The n_cores parameter is consistently threaded through the deployment setup:

  1. Added to setupDeployer signature
  2. Passed to mapDeployer
  3. Exposed via CLI argument (default: 8)
  4. Propagated from parsed arguments

This enables proper sizing of per-core buffers (e.g., im2col buffers) in multi-core PULP deployments.

Also applies to: 84-92, 201-205, 261-262

TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (2)

10-36: LGTM! Proper per-core work distribution.

The function correctly partitions output channels across cores:

  • Computes per-core channel chunks with remainder handling
  • Early-exits when no channels assigned to a core
  • Adjusts weight and im2col buffer pointers per core

This enables efficient parallel execution on multi-core PULP clusters.
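A sketch of the chunking arithmetic described above, including remainder handling and the early exit (a Python stand-in for the C kernel logic; names are illustrative):

def core_channel_range(core_id: int, n_cores: int, f_total: int):
    # Ceil-divide so the leading cores absorb the remainder.
    chunk = (f_total + n_cores - 1) // n_cores
    start = core_id * chunk
    stop = min(start + chunk, f_total)
    if start >= f_total:       # early exit: nothing assigned to this core
        return None
    return start, stop

# 10 channels over 8 cores: cores 0-4 take two each, cores 5-7 idle.
assert core_channel_range(0, 8, 10) == (0, 2)
assert core_channel_range(4, 8, 10) == (8, 10)
assert core_channel_range(5, 8, 10) is None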


103-145: LGTM! Correct depthwise convolution logic.

The input channel iteration (lines 103-104) correctly computes the range based on output channels assigned to the core, accounting for the depthwise constraint where each input channel produces F_total/C output channels (the "group" parameter in ONNX Conv).

The output channel range computation (lines 118-130) properly clamps to the core's assigned range, ensuring correct work distribution.
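The depthwise channel relationship can be made concrete: with C input channels and F_total output channels, each input channel owns F_total / C consecutive outputs. A small sketch, assuming F_total is a multiple of C as the kernel requires:

def input_channel_of(f: int, f_total: int, c_total: int) -> int:
    # Each input channel produces f_total // c_total consecutive outputs.
    return f // (f_total // c_total)

# 16 outputs from 8 inputs: outputs 0-1 map to input 0, 2-3 to input 1, ...
assert [input_channel_of(f, 16, 8) for f in range(4)] == [0, 0, 1, 1]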

Deeploy/Targets/PULPOpen/Bindings.py (3)

151-151: LGTM! Clear separation of DMA and non-DMA slice operations.

The update clarifies the distinction:

  • PULPDMASliceBindings (line 151): Uses DMASliceTemplate for asynchronous DMA-based slicing returning a Future
  • PULPSliceBindings (lines 163-172): Uses SliceTemplate for synchronous float slicing with ForkTransformer

This separation enables appropriate code transformation for each use case.

Also applies to: 163-172


239-245: LGTM! Depthwise convolution bindings correctly configured.

The new PULPFloatDWConv2DBindings:

  • Support all float types via FloatDataTypes iteration
  • Use FloatConvTemplate.referenceDW2DIm2ColTemplate (the new DW im2col template)
  • Apply ForkTransformer for multi-core execution

This aligns with the new PULPFPDWConv2DParser and enables FP depthwise convolution support.


298-302: LGTM! Extended ReduceMean support for mixed float/integer inputs.

The addition supports ReduceMean operations with float data and integer axes/keepdims parameters, using FloatReduceMeanTemplate. This is a natural extension of the existing integer-only ReduceMean bindings.

Deeploy/Targets/PULPOpen/Platform.py (2)

8-9: LGTM! Proper wiring of new depthwise and slice capabilities.

The platform changes correctly integrate the new functionality:

  1. FPDWConv2DMapper (line 77) combines the new parser and bindings for FP depthwise convolution
  2. Conv layer (line 108) now supports both FPConv2DMapper and FPDWConv2DMapper
  3. Slice operation (line 134) supports both regular and DMA-based mappers
  4. All necessary imports added for new parsers, bindings, and optimization passes

This completes the integration stack for FP depthwise convolution on PULP.

Also applies to: 32-32, 35-36, 77-77, 92-94, 108-108, 134-134


234-235: LGTM! RemoveEmptyConvBiasPass addition is appropriate.

Adding RemoveEmptyConvBiasPass to the PULPOptimizer makes sense when introducing bias support. This pass likely removes unnecessary bias parameters when the bias tensor is empty or zero, optimizing the generated code.

The placement after PULPAddRequantMergePass is reasonable in the optimization pass sequence.

TargetLibraries/PULPOpen/inc/kernel/Conv.h (1)

12-27: LGTM! Header declarations consistent with implementations.

The function signatures correctly extend the existing FP32 convolution API:

  1. PULP_Conv2d_fp32_fp32_fp32_HWC (lines 12-18): Added pSrcBias and has_bias parameters
  2. PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC (lines 20-27): Added bias parameters and pContextBuffer
  3. PULP_DW_Conv2d_Im2Col_fp32_fp32_fp32_HWC (lines 29-36): New entry point for depthwise im2col convolution

All declarations match the implementations in Convolution_fp32.c and DWConvolution_fp32.c. Based on learnings.

Also applies to: 29-36

TargetLibraries/PULPOpen/src/Convolution_fp32.c (2)

10-34: LGTM! Proper multi-core parallelization added.

The updated function now includes:

  1. Bias parameter support (pSrcBias, has_bias)
  2. Per-core output channel distribution (lines 18-31)
  3. Core-specific weight pointer adjustment (line 34)
  4. Early exit for cores with no assigned channels (lines 29-31)

This enables efficient parallel execution while maintaining correctness. Based on learnings.


66-67: LGTM! Correct per-core output indexing and bias application.

The output indexing (lines 66-67, 96-97, 173, 212-214) correctly accounts for:

  1. Global output index calculation: (h_out * W_out + w_out) * F_total + ...
  2. Core-specific channel offset: ch_out_start + f or just f (depending on loop variable)
  3. Bias application using the global channel index (line 67: pSrcBias[f + ch_out_start], line 175: pSrcBias[f])

This ensures correct results when work is distributed across multiple cores.

Also applies to: 175-175
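The flat-index arithmetic quoted above is easy to verify in isolation (pure index math for a row-major HWC layout):

def hwc_index(h_out, w_out, f, w_total, f_total, ch_out_start = 0):
    # Channels are contiguous, then width, then height.
    return (h_out * w_total + w_out) * f_total + ch_out_start + f

# 4-wide output with 8 channels; a core owning channels 4..7 writes its
# first element at offset 4, and (h=1, w=2, f=3) lands at (1*4+2)*8+3 = 51.
assert hwc_index(0, 0, 0, 4, 8, ch_out_start = 4) == 4
assert hwc_index(1, 2, 3, 4, 8) == 51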

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
Deeploy/DeeployTypes.py (2)

241-241: Complete the mutable default fix for shape parameter.

The aliases parameter correctly uses None, but shape = [1] is still a mutable default. This was flagged in past reviews and by static analysis.

Apply this diff:

-    def __init__(self, name: str = '', shape = [1], aliases: Optional[List[str]] = None):
+    def __init__(self, name: str = '', shape: Optional[Sequence[int]] = None, aliases: Optional[List[str]] = None):
         self.name: str = name
-        self.shape: Sequence[int] = shape
+        self.shape: Sequence[int] = shape if shape is not None else (1,)

As per past review feedback and static analysis (Ruff B006).


437-439: Fix mutable default arguments in ConstantBuffer.__init__.

Both shape = [1] and values = [0] are mutable defaults, flagged by past reviews and static analysis (Ruff B006).

Apply this diff:

-    def __init__(self, name: str = '', shape = [1], values = [0]):
-        super().__init__(name, shape)
+    def __init__(self, name: str = '', shape: Optional[Sequence[int]] = None, values: Union[Sequence, np.ndarray, None] = None):
+        super().__init__(name, shape if shape is not None else (1,))
-        values = np.asarray(values)
+        values = np.asarray(values if values is not None else [0])

As per past review feedback and static analysis.

♻️ Duplicate comments (2)
CHANGELOG.md (1)

23-23: Remove duplicate PR reference.

Line 23 duplicates the PR #105 entry already present at line 18. Remove the redundant entry to maintain changelog clarity.

- Use Docker digests instead of arch-specific tags [#106](https://github.com/pulp-platform/Deeploy/pull/106)
- Refactor tiling code generation [#105](https://github.com/pulp-platform/Deeploy/pull/105)
- Fix `Unsqueeze` Op. when using ONNX opset 13 or higher (from attribute to input) [#119](https://github.com/pulp-platform/Deeploy/pull/119)
Deeploy/DeeployTypes.py (1)

331-350: Critical bug: cannot append to a set.

Line 260 initializes self.aliases as a set, but line 348 calls .append() which doesn't exist on sets. This will raise an AttributeError at runtime.

Apply this diff:

     def add_aliases(self, aliases_to_add: List[str]):
         """
         Adds list of aliases to the aliases attribute.
         Parameters
         ----------
         alias_to_add : List[str]
             List of names of aliases to add to the aliases attribute.
         Returns
         -------
         None
         """
 
         if not hasattr(self, "aliases"):
             return None
 
         for alias in aliases_to_add:
-            if alias not in self.aliases:
-                self.aliases.append(alias)
+            self.aliases.add(alias)
 
         return None
🧹 Nitpick comments (1)
CHANGELOG.md (1)

30-32: Reduce successive FP prefix repetition.

Three consecutive entries (lines 30–32) begin with "FP", which reduces readability. Consider rewording one or more entries to vary the opening phrase.

Example refactors:

  • Line 30: "Conv 2D (simple & DW), reshape & skip connection, and TinyViT demo tests..."
  • Line 31: "Bindings and mappings for PULP slice, DW conv 2D, and reduce mean operations."
  • Line 32: "DW conv lowering optimization pass (FP variant) similar to the existent one for integer version."
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba1e15a and ffe7a27.

📒 Files selected for processing (2)
  • CHANGELOG.md (4 hunks)
  • Deeploy/DeeployTypes.py (9 hunks)
🧰 Additional context used
🪛 LanguageTool
CHANGELOG.md

[style] ~32-~32: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... conv 2D, and reduce mean operations. - FP PULP DW conv lowering optimization pass...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (114)
  • GitHub Check: deeploy-memory-allocation
  • GitHub Check: generic-kernels / test-runner-generic
  • GitHub Check: generic-models / test-runner-generic
  • GitHub Check: siracusa-kernels / test-runner-siracusa
  • GitHub Check: siracusa-models / test-runner-siracusa
  • GitHub Check: siracusa-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-tiled
  • GitHub Check: siracusa-kernels-tiled-doublebuffer-L2 / test-runner-siracusa-tiled
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (60000)
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (10000)
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (5000)
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8_parallel, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (60000)
  • GitHub Check: snitch-kernels / test-runner-snitch

(the same set of checks repeats throughout the 114 skipped entries; duplicates omitted)
🔇 Additional comments (4)
Deeploy/DeeployTypes.py (4)

558-567: LGTM: n_cores parameter added correctly.

The n_cores parameter is properly added to NetworkContext.__init__ with a default value of 8 and stored as an instance variable. No duplicate assignment.


2479-2519: LGTM: n_cores propagated correctly to NetworkContext.

The n_cores parameter is properly added to NetworkContainer.__init__ and correctly passed through to the NetworkContext constructor at lines 2518-2519.


2679-2685: LGTM: NetworkContext recreation preserves n_cores.

The new NetworkContext is correctly created with n_cores = self.ctxt.n_cores, preserving the core count from the existing context.


3231-3285: LGTM: n_cores propagated through NetworkDeployer.

The n_cores parameter is properly added to NetworkDeployer.__init__, passed to the parent NetworkContainer via super().__init__(), and stored as an instance variable at line 3285.

@diaconuccalin diaconuccalin requested a review from lukamac October 27, 2025 18:58
Collaborator

@lukamac lukamac left a comment

Hi, I reviewed the whole PR. There are still some changes I would like to see specifically around the n_cores and some remnant changes around reshape aliasing. I left some comments also in the older threads so check those too. Thanks!

deeployStateDir: str = "DeeployState",
inputOffsets: Dict[str, int] = {}):
inputOffsets: Dict[str, int] = {},
n_cores: int = 8):
Collaborator

@lukamac lukamac Oct 28, 2025

This is not a universal feature, meaning some targets don't have multiple cores, for example CortexM or Generic. You should add this just to the specific target's deployer, i.e., PULPDeployer

localObjects = {},
name: str = 'DeeployNetwork'):
name: str = 'DeeployNetwork',
n_cores: int = 8):
Collaborator

same comment as above, add this to the target that uses it, not everyone

         name: str = 'DeeployNetwork',
-        deeployStateDir: str = "DeeployState"):
+        deeployStateDir: str = "DeeployState",
+        n_cores: int = 8):
Collaborator

same as above

Collaborator

It seems to me that you will have to revert 99% of your changes to DeeployTypes.py

                  operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, Union[int, IntVar]]]:
-    im2col_dim = 4 * 8 * (operatorRepresentation['ch_im_in'] * operatorRepresentation['dim_kernel_x'] *
-                          operatorRepresentation['dim_kernel_y'])
+    im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8) * ctxt.n_cores * operatorRepresentation[
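
To make the sizing change concrete, a small worked comparison (the trailing factor of the new line is cut off in the diff above, so it is reconstructed here only under the assumption that it mirrors the old per-patch product):

# im2col scratch buffer sizing, following the diff above
ch_im_in, dim_kernel_x, dim_kernel_y = 16, 3, 3
patch = ch_im_in * dim_kernel_x * dim_kernel_y

old_dim = 4 * 8 * patch  # hardcoded 4-byte (fp32) elements and 8 cores

weight_type_width_bits = 32  # now taken from the weight type instead of hardcoded
n_cores = 8                  # now taken from ctxt.n_cores instead of hardcoded
new_dim = (weight_type_width_bits // 8) * n_cores * patch

assert old_dim == new_dim  # identical for fp32 on 8 cores, but now parametric
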
Collaborator

Another way of getting the n_cores information might be through the operatorRepresentation, which you would add in the parse step of the PULP deployer. This might be interesting for the future if we ever want the number of cores to vary per layer. I'm kinda leaning more towards that approach. Let me know what you think.
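
For illustration, a hypothetical sketch of that alternative (helper name and dictionary contents are invented; only the n_cores key follows the discussion):

def annotate_cores(operatorRepresentation: dict, n_cores: int) -> dict:
    # would run once in the PULP deployer's parse step; templates then read
    # operatorRepresentation['n_cores'], which could later vary per layer
    operatorRepresentation['n_cores'] = n_cores
    return operatorRepresentation


opRep = annotate_cores({'ch_im_in': 16}, n_cores = 8)
assert opRep['n_cores'] == 8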

Collaborator

@Xeratec any opinions on this?

Member

I believe this information should be added at the operator level. In a future multi-cluster application, the number of cores will only be relevant for executing a kernel, not for the entire network. Therefore, this information should be associated with the engine and reflected in the operator representation.

I’m fine with merging a temporary solution for now. However, the change should only affect the platform currently under consideration and be linked to the DeploymentEngine, where we ultimately want it to reside anyway.
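
For illustration, a hypothetical sketch of that engine-level placement (a simplified stand-in, not the real DeploymentEngine API):

class DeploymentEngine:  # toy stand-in; the real class carries much more state

    def __init__(self, name: str, n_cores: int = 8):
        self.name = name
        self.n_cores = n_cores  # an execution property of this engine's kernels


cluster = DeploymentEngine('pulp_cluster', n_cores = 8)
opRep = {'n_cores': cluster.n_cores}  # reflected in the operator representation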

Collaborator

The changes to ReshapeTemplate.py should be unnecessary after #125, no?

Contributor Author

@diaconuccalin diaconuccalin Oct 28, 2025

No, it's actually porting the changes you made in the generic template, since this one is used in the Siracusa tests. Without it, the ReshapeWithSkipConnection test (and, if I remember correctly, other tests containing reshape) would fail on Siracusa.

Collaborator

Then I change my question to: why are we not using the generic reshape template in the first place? 😅 Could you please try with the generic one and confirm that it's still doing the same thing?

Contributor Author

There is a leftover _alias parameter that is still used by the tiler, which breaks if it is removed. I will remove the unnecessary duplicated code by inheriting the template from the generic one and only adding this extra parameter.
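
For illustration, a rough sketch of that resolution (the _Buffer and _Ctxt classes are toy stand-ins so the snippet is self-contained; only the inheritance and the _alias link follow the discussion):

class _Buffer:

    def __init__(self, name: str):
        self.name = name


class _Ctxt:

    def __init__(self, buffers: dict):
        self.buffers = buffers

    def lookup(self, name: str) -> _Buffer:
        return self.buffers[name]


class _GenericReshapeTemplate:

    def alignToContext(self, ctxt, opRep):
        # the generic bidirectional-alias bookkeeping would live here
        return ctxt, opRep, []


class _PULPReshapeTemplate(_GenericReshapeTemplate):

    def alignToContext(self, ctxt, opRep):
        ctxt, opRep, aliases = super().alignToContext(ctxt, opRep)
        # extra linking still required by the tiler
        ctxt.lookup(opRep['data_out'])._alias = ctxt.lookup(opRep['data_in']).name
        return ctxt, opRep, aliases


ctxt = _Ctxt({'x': _Buffer('x'), 'y': _Buffer('y')})
_PULPReshapeTemplate().alignToContext(ctxt, {'data_in': 'x', 'data_out': 'y'})
assert ctxt.lookup('y')._alias == 'x'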

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py (1)

42-48: Consider removing redundant buffer retrieval.

The parent class _GenericReshapeTemplate.alignToContext() already retrieves and validates these buffers (lines 29-33 in Generic template). Since you call super().alignToContext() on line 40, you can retrieve the buffers directly from the returned ctxt without re-fetching them.

Apply this diff to streamline the code:

-        # Get buffers
-        bufferIn = ctxt.lookup(operatorRepresentation['data_in'])
-        assert isinstance(bufferIn, VariableBuffer)
-
-        bufferOut = ctxt.lookup(operatorRepresentation['data_out'])
-        assert isinstance(bufferOut, VariableBuffer)
-
+        # Get buffers (already retrieved and validated by parent)
+        bufferIn = ctxt.lookup(operatorRepresentation['data_in'])
+        bufferOut = ctxt.lookup(operatorRepresentation['data_out'])
+

Or even more concisely, inline them:

-        # Get buffers
-        bufferIn = ctxt.lookup(operatorRepresentation['data_in'])
-        assert isinstance(bufferIn, VariableBuffer)
-
-        bufferOut = ctxt.lookup(operatorRepresentation['data_out'])
-        assert isinstance(bufferOut, VariableBuffer)
-
         # Linking required for tiling
-        bufferOut._alias = bufferIn.name
+        ctxt.lookup(operatorRepresentation['data_out'])._alias = ctxt.lookup(operatorRepresentation['data_in']).name
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3ea12db and 71b7e79.

📒 Files selected for processing (2)
  • Deeploy/Targets/Generic/Templates/ReshapeTemplate.py (3 hunks)
  • Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
🧬 Code graph analysis (2)
Deeploy/Targets/Generic/Templates/ReshapeTemplate.py (1)
Deeploy/DeeployTypes.py (3)
  • NodeTemplate (87-229)
  • lookup (722-754)
  • VariableBuffer (232-360)
Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py (2)
Deeploy/DeeployTypes.py (4)
  • NetworkContext (508-1022)
  • VariableBuffer (232-360)
  • alignToContext (119-139)
  • lookup (722-754)
Deeploy/Targets/Generic/Templates/ReshapeTemplate.py (2)
  • _GenericReshapeTemplate (10-39)
  • alignToContext (15-39)
🔇 Additional comments (3)
Deeploy/Targets/Generic/Templates/ReshapeTemplate.py (1)

10-10: LGTM! Clean refactoring to clarify template hierarchy.

The rename from _ReshapeTemplate to _GenericReshapeTemplate improves clarity and aligns well with the inheritance pattern where the PULP-specific template now extends this base class.

Also applies to: 42-42

Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py (2)

28-32: LGTM! Inheritance approach correctly implemented.

The addition of the VariableBuffer import and inheritance from _GenericReshapeTemplate properly implements the agreed-upon design from the past review discussion.


49-50: LGTM! Tiler-specific aliasing correctly preserved.

The _alias attribute assignment is correctly preserved for tiler compatibility, as discussed in the past review comments. This works alongside the parent's bidirectional aliases set mechanism.


Labels

Bug (Something isn't working), Feature (Addition of new features)

Projects

Status: In review


3 participants