TinyViT on non-tiled Siracusa #117
base: devel
Conversation
📝 Walkthrough (Summary by CodeRabbit)

Adds floating-point depthwise 2D conv (im2col) with bias, FP slice/reduce-mean bindings and parsers, threads core count (n_cores) through contexts/deployers/tiler/tests/templates, updates FP templates/kernels for bias and per-core im2col sizing, prefixes generated closure names with an underscore, and adds Siracusa CI FP tests.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as CLI/Test
    participant Setup as setupDeployer
    participant Map as mapDeployer
    participant PulpDep as PULPDeployer
    participant Context as NetworkContext
    CLI->>Setup: setupDeployer(..., n_cores)
    Setup->>Map: mapDeployer(..., n_cores)
    Map->>PulpDep: PULPDeployer(..., n_cores)
    PulpDep->>Context: NetworkContext(..., n_cores)
    Context-->>PulpDep: context (n_cores)
```

```mermaid
sequenceDiagram
    participant Parser as PULPFPDWConv2DParser
    participant Mapper as FPDWConv2DMapper
    participant Binding as PULPFloatDWConv2DBindings
    participant Template as PULP2DFloatDWConvIm2ColTemplate
    participant Kernel as PULP_DW_Conv2d_Im2Col_fp32
    Parser->>Parser: parseNode(2-3 inputs, optional bias)
    Parser->>Mapper: create node mapping (with bias flag)
    Mapper->>Binding: select binding
    Binding->>Template: choose reference template
    Template->>Template: computeTransientBuffersSize(ctxt.n_cores)
    Template->>Kernel: emit kernel call (pass pContextBuffer per-core)
    Kernel->>Kernel: per-core channel slicing + optional bias accumulation
```
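The n_cores threading shown in the first diagram amounts to forwarding one constructor argument down the chain until it reaches the context. The sketch below mirrors the names in the diagram, but the bodies are hypothetical stubs, not Deeploy's actual implementation:

```python
class NetworkContext:
    def __init__(self, n_cores: int = 8):
        # consumed later, e.g. by computeTransientBuffersSize
        self.n_cores = n_cores

class PULPDeployer:
    def __init__(self, n_cores: int = 8):
        self.ctxt = NetworkContext(n_cores = n_cores)

def mapDeployer(n_cores: int = 8) -> PULPDeployer:
    return PULPDeployer(n_cores = n_cores)

def setupDeployer(n_cores: int = 8) -> PULPDeployer:
    return mapDeployer(n_cores = n_cores)

deployer = setupDeployer(n_cores = 8)
assert deployer.ctxt.n_cores == 8
```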
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
344-346: Prevent KeyError when 'group' attr is absent

Depthwise detection assumes 'group' exists. If missing, this pass will crash. Use a safe default (1) as elsewhere in the file.

```diff
-    if opNode.attrs['group'] == 1:
-        return graph
+    group = opNode.attrs.get('group', 1)
+    if group == 1:
+        return graph
```

Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1)
16-23: Loop index type should match buffer indexing

Use unsigned 32-bit for `i` to match array indexing and avoid UB on large sizes. (Handled in the diff above.)

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
131-149: DW Im2Col arg order — swap X/Y for image dims, kernel dims and strides; bias placement is correct
- Prototype (TargetLibraries/PULPOpen/inc/kernel/Conv.h): PULP_DW_Conv2d_Im2Col(..., uint32_t H, uint32_t W, uint32_t C, ..., uint32_t P, uint32_t Q, uint32_t SP, uint32_t SQ, const float32_t *pSrcBias, const bool has_bias, float32_t *pDstC, uint32_t pad_top, uint32_t pad_bottom, uint32_t pad_left, uint32_t pad_right, float32_t *pContextBuffer).
- Template call (Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py lines 131–149) currently passes (dim_im_in_x, dim_im_in_y), (dim_kernel_x, dim_kernel_y), (stride_x, stride_y). These should be swapped to (dim_im_in_y, dim_im_in_x), (dim_kernel_y, dim_kernel_x), (stride_y, stride_x).
- (bias, has_bias), padding order and context buffer position are already correct.
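A quick way to see why the swap matters: with a non-square input, the standard conv output-size formula gives different extents for height and width, so exchanging them silently changes the output geometry. The helper below is illustrative, not Deeploy code; names mirror the prototype (H/W image dims, P/Q kernel dims, SP/SQ strides).

```python
def out_dim(in_dim: int, kernel: int, stride: int, pad_lo: int, pad_hi: int) -> int:
    # Standard convolution output-size formula.
    return (in_dim + pad_lo + pad_hi - kernel) // stride + 1

H, W, P, Q, SP, SQ = 32, 16, 3, 3, 1, 2

# Correct pairing: height with P/SP, width with Q/SQ.
assert out_dim(H, P, SP, 1, 1) == 32
assert out_dim(W, Q, SQ, 1, 1) == 8

# Swapping H and W yields a different (wrong) output geometry.
assert out_dim(W, P, SP, 1, 1) != out_dim(H, P, SP, 1, 1)
```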
🧹 Nitpick comments (16)
Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)
10-11: Avoid libm at runtime: use integer log2 for NUM_CORES

Replace double `log2()` with `__builtin_ctz()` to avoid libm dependency and ensure compile-time integer computation. Also assert the power-of-two assumption that the chunking math relies on.

Apply this diff:

```diff
-uint32_t ${nodeName}_log2Core = (uint32_t) log2(NUM_CORES);
+// NUM_CORES must be a power of two; compute log2(NUM_CORES) without libm
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
+_Static_assert((NUM_CORES & (NUM_CORES - 1)) == 0, "NUM_CORES must be power of two");
+#endif
+uint32_t ${nodeName}_log2Core = (uint32_t)__builtin_ctz(NUM_CORES);
```

.github/workflows/ci-platform-siracusa.yml (1)
98-98: TinyViT demo test: validate runtime envelope on Siracusa

Great to see model-level coverage. Consider job timeout safeguards if this grows. For now, please confirm the demo's memory/latency fit the non-tiled target on the selected runner.
Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1)
11-21: LGTM: per-iteration A/B selection handles mixed batched/unbatched inputs.

This fixes the non-batched handling cleanly and mirrors expected strides (A: M×N, B: N×O). Consider applying the same per-iteration pointer style in GEMM for consistency, but not blocking.
Deeploy/DeeployTypes.py (3)
320-324: Prefer None over mutable default and let ctor initialize `alias_of`.

Passing `alias_of = []` creates a new list here (safe), but use `None` consistently and centralize initialization in the constructor to avoid future in-place mutations leaking via shared references.

```diff
-        return (cls(
-            name = node.name,
-            shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
-            alias_of = [],
-        ))
+        return cls(
+            name = node.name,
+            shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
+            alias_of = None,
+        )
```
469-474: Avoid sharing alias list between buffers; copy it.

Passing `buffer.alias_of` by reference couples future mutations between the source and new ConstantBuffer.

```diff
         ret = cls(
             name = buffer.name,
             shape = buffer.shape,
             values = values,
-            alias_of = buffer.alias_of,
+            alias_of = list(buffer.alias_of),
         )
```
1176-1180: Pass `alias_of=None` and let the ctor initialize.

Keeps alias handling consistent and avoids accidental shared lists if ctor defaults are ever reintroduced.

```diff
-        nb = ctxt.VariableBuffer(
-            name = name,
-            shape = node.shape,
-            alias_of = [],
-        )
+        nb = ctxt.VariableBuffer(
+            name = name,
+            shape = node.shape,
+            alias_of = None,
+        )
```

TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (3)
137-141: Remove unnecessary modulo and precompute local weight pointer inside inner loop

`im2col_idx % (P * Q)` is redundant. Precompute a per-filter base pointer for tighter inner loops.

Apply this diff in both bias and non-bias paths:

```diff
-      for (uint32_t im2col_idx = 0; im2col_idx < P * Q; im2col_idx++) {
-        sum +=
-            im2col_buffer[im2col_idx] *
-            weight_ptr[(f - ch_out_start) * P * Q + im2col_idx % (P * Q)];
-      }
+      const float32_t *local_w = weight_ptr + (f - ch_out_start) * (P * Q);
+      for (uint32_t k = 0; k < P * Q; k++) {
+        sum += im2col_buffer[k] * local_w[k];
+      }
```

Also applies to: 236-240
47-48: Remove unused variable `kernel_size`

`kernel_size` is computed but never used.

```diff
-  uint32_t kernel_size = P * Q * F_total;
```
61-97: Consider simplifying padding handling with a single p/q nested loop

You can mirror the im2col fill used in the regular Im2Col kernel (check Convolution_fp32.c), avoiding four separate padding loops and the type-casting hazards.
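The single p/q loop suggested above can be sketched host-side as follows: each tap of the window either reads from the input or writes a zero, with no separate edge loops. Names (H, W, P, Q) follow the PR's kernel signatures, but this is an illustrative sketch, not the PULP kernel.

```python
def im2col_fill_window(src, buf, H, W, P, Q, h0, w0):
    """Fill one P x Q im2col window whose top-left input coordinate is
    (h0, w0), already offset by padding (may be negative). Out-of-bounds
    taps are written as 0.0 in the same loop instead of being handled by
    four dedicated padding loops."""
    for p in range(P):
        for q in range(Q):
            h, w = h0 + p, w0 + q
            in_bounds = 0 <= h < H and 0 <= w < W
            buf[p * Q + q] = src[h * W + w] if in_bounds else 0.0

src = [1.0, 2.0, 3.0, 4.0]                         # 2x2 input, row-major
buf = [0.0] * 9
im2col_fill_window(src, buf, 2, 2, 3, 3, -1, -1)   # 3x3 window, pad = 1
assert buf[0] == 0.0                               # padded corner
assert buf[4] == 1.0                               # center tap: src[0][0]
assert buf[8] == 4.0                               # bottom-right tap: src[1][1]
```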
TargetLibraries/PULPOpen/inc/kernel/Conv.h (1)
20-27: Document `pContextBuffer` size expectations and bias semantics

The new signatures look correct. Please add brief comments indicating required `pContextBuffer` sizes:

- Conv2d_Im2Col: NUM_CORES * (C*P*Q) floats
- DW_Conv2d_Im2Col: NUM_CORES * (P*Q) floats

Also note that `has_bias` governs whether `pSrcBias` is read.

Also applies to: 29-36
Deeploy/Targets/PULPOpen/Bindings.py (2)
239-245: Constrain FP DWConv binding to supported float type(s).

FloatConvTemplate/referenceDW2DIm2ColTemplate likely only implements float32. Avoid advertising unsupported float types by narrowing the generator.

Apply this diff:

```diff
-                ForkTransformer) for float_type in FloatDataTypes
+                ForkTransformer) for float_type in [float32_t]
```
299-302: ReduceMean (float): restrict axes integer type to int32.

ONNX axes are int64/int32; templates and parsers typically operate on int32 here. Generating bindings for all signed integer widths adds binder churn without benefit.

Apply this diff:

```diff
-    for integer_type in SignedIntegerDataTypes
-    for float_type in FloatDataTypes
+    for integer_type in [int32_t]
+    for float_type in FloatDataTypes
```

Deeploy/Targets/PULPOpen/Platform.py (1)
108-108: Depthwise vs regular FP Conv: prevent mapper shadowing by ordering.

If PULPFPConv2DParser accepts DW cases, FPConv2DMapper may shadow FPDWConv2DMapper due to precedence.

Apply this diff to prioritize DWConv first:

```diff
-        'Conv': ConvLayer([FPConv2DMapper, FPDWConv2DMapper]),
+        'Conv': ConvLayer([FPDWConv2DMapper, FPConv2DMapper]),
```

Alternatively, ensure PULPFPConv2DParser explicitly rejects depthwise.
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (3)
42-51: Fix return typing and silence ARG004 (unused `ctxt`)

- Return type now returns a `str` for size; widen the annotation to reflect this.
- Rename `ctxt` to `_ctxt` to address Ruff ARG004.

Apply:

```diff
-    def computeTransientBuffersSize(ctxt: NetworkContext,
-                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, str]]:
+    def computeTransientBuffersSize(_ctxt: NetworkContext,
+                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, Union[int, str, IntVar]]]:
```
93-93: Typo in comment: "ChannelOout" → "ChannelOut"

Minor doc polish.

```diff
-// 2D FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})
```
124-126: Typo in comment: "ChannelOout" → "ChannelOut"

Same typo in DW template header.

```diff
-// 2D DW FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D DW FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
- `.github/workflows/ci-platform-siracusa.yml` (3 hunks)
- `Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py` (1 hunks)
- `Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py` (2 hunks)
- `Deeploy/DeeployTypes.py` (4 hunks)
- `Deeploy/Targets/Generic/Parsers.py` (1 hunks)
- `Deeploy/Targets/PULPOpen/Bindings.py` (5 hunks)
- `Deeploy/Targets/PULPOpen/Parsers.py` (1 hunks)
- `Deeploy/Targets/PULPOpen/Platform.py` (7 hunks)
- `Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py` (1 hunks)
- `Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py` (4 hunks)
- `Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py` (1 hunks)
- `Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py` (1 hunks)
- `Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py` (1 hunks)
- `TargetLibraries/PULPOpen/inc/DeeployPULPMath.h` (1 hunks)
- `TargetLibraries/PULPOpen/inc/kernel/Conv.h` (1 hunks)
- `TargetLibraries/PULPOpen/src/Convolution_fp32.c` (4 hunks)
- `TargetLibraries/PULPOpen/src/DWConvolution_fp32.c` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
Deeploy/Targets/Generic/Parsers.py (1)
- Deeploy/DeeployTypes.py (2): `add_aliases` (326-345), `lookup` (763-795)

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
- Deeploy/DeeployTypes.py (6): `NodeTemplate` (82-224), `computeTransientBuffersSize` (137-157), `NetworkContext` (554-1063), `hoistTransientBuffers` (160-180), `hoistTransientBuffer` (864-883), `lookup` (763-795)

TargetLibraries/PULPOpen/inc/kernel/Conv.h (2)
- TargetLibraries/PULPOpen/src/Convolution_fp32.c (2): `PULP_Conv2d_fp32_fp32_fp32_HWC` (10-102), `PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC` (104-219)
- TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1): `PULP_DW_Conv2d_Im2Col_fp32_fp32_fp32_HWC` (10-251)

Deeploy/Targets/PULPOpen/Bindings.py (3)
- Deeploy/CommonExtensions/DataTypes.py (4): `float32_t` (74-78), `int8_t` (12-15), `int32_t` (24-27), `uint8_t` (36-39)
- Deeploy/Targets/Generic/TypeCheckers.py (3): `SliceChecker` (37-51), `ConvChecker` (456-474), `ReduceMeanChecker` (310-324)
- Deeploy/AbstractDataTypes.py (1): `PointerClass` (522-545)

Deeploy/Targets/PULPOpen/Platform.py (4)
- Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1): `RemoveEmptyConvBiasPass` (667-679)
- Deeploy/Targets/PULPOpen/Parsers.py (1): `PULPFPDWConv2DParser` (122-188)
- Deeploy/DeeployTypes.py (1): `NodeMapper` (1695-1838)
- Deeploy/Targets/Generic/Parsers.py (1): `SliceParser` (97-155)

Deeploy/Targets/PULPOpen/Parsers.py (2)
- Deeploy/DeeployTypes.py (5): `inputs` (2492-2509), `lookup` (763-795), `parseNode` (1076-1091), `parseNodeCtxt` (1094-1119), `NetworkContext` (554-1063)
- Deeploy/Targets/Generic/Parsers.py (31): `Conv2DParser` (1182-1248), `parseNode` (19-27, 49-58, 81-94, 102-112, 163-169, 193-207, 229-251, 291-304, 326-338, 375-387, 422-426, 449-460, 484-495, 531-535, 551-554), `parseNodeCtxt` (29-41, 60-76, 114-155, 171-185, 209-221, 253-283, 306-318, 340-367, 389-414, 428-441, 462-476, 497-523, 537-543, 556-573, 586-605)
🪛 Ruff (0.13.1)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
43-43: Unused static method argument: ctxt
(ARG004)
Deeploy/DeeployTypes.py
433-433: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
433-433: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
433-433: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Test Type Inference (fail-input2)
🔇 Additional comments (23)
Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)
13-14: LGTM: unsigned arithmetic for start/end bounds

Casting `${size}` to `uint32_t` aligns types and prevents signed/unsigned min mixups. Indices and loop vars are consistently `uint32_t`.

.github/workflows/ci-platform-siracusa.yml (2)
75-75: Reshape + skip connection test: ensure alias_of fix is covered

Good addition to exercise the alias/buffer lifetime fix. Please confirm this test indeed hits the alias_of scenario (non-tiled) and fails without the fix.
61-64: CI workflow lists new DW conv FP tests but those tests aren't registered in the repo.

.github/workflows/ci-platform-siracusa.yml (lines 61-64) lists testFloat2DDWConvolution, testFloat2DDWConvolutionBias, and testFloat2DDWConvolutionZeroBias, but a repo search found no matches — add/register these tests or update the workflow. After they exist, verify the shapes fit non-tiled Siracusa memory and that the zero-bias test exercises RemoveEmptyConvBiasPass.
Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (2)
27-37: Good: batch-gated A/B/C strides prevent OOB on non-batched inputs.

Looks correct and aligns with the matmul change. Please confirm the strides match the in-memory layout for transposed cases (i.e., still M×N for A and N×O for B even when transA/transB are set).

39-39: Unconditional out stride can desync when data_out aliases C but C is not batched.

If data_out may alias C while C is reused (non-batched), advancing only the out pointer will diverge the pair across iterations and yield incorrect reads/writes. Either:
- enforce at mapping time that alias-of(C, data_out) implies identical batched flags, or
- gate the out advance with an explicit data_out_batched flag.
Example if such a flag is available:

```diff
-    ref_${data_out}_${data_out} += ${M} * ${O};
+    % if data_out_batched:
+    ref_${data_out}_${data_out} += ${M} * ${O};
+    % endif
```

Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4)
370-374: Verify FP DW Conv layout: input-only transpose may be insufficient

For FP DW Conv, you transpose only the input and not the weights. Confirm kernel expectations: if weights are stored in channels-first layout, they likely also need DW-specific permutation (as above) to avoid semantic mismatch when setting channels_first to default.

385-403: New PULPDWConvPass wiring looks good

Pattern graph + partial replacement is clear and consistent with the pipeline.

404-426: New PULPFPDWConvPass wiring looks good

Extends NHWC conversion to FP DW conv as intended.

505-505: Pass ordering LGTM

Adding PULPFPDWConvPass after PULPDWConvPass in the sequence is reasonable and non-overlapping.
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)
41-43: Confirm context buffer sizing per core

This implementation expects `pContextBuffer` sized to NUM_CORES * (P*Q) floats. Please confirm templates/allocations provide that.
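The expected allocation can be cross-checked host-side with a trivial calculation; NUM_CORES and the kernel dims below are example values, and the two formulas mirror the per-core sizing stated in the Conv.h review comment above.

```python
NUM_CORES = 8  # example core count; the actual cluster value may differ

def dw_im2col_ctxt_floats(P: int, Q: int) -> int:
    # DW_Conv2d_Im2Col: one P*Q window per core
    return NUM_CORES * P * Q

def im2col_ctxt_floats(C: int, P: int, Q: int) -> int:
    # Conv2d_Im2Col: one full C*P*Q patch per core
    return NUM_CORES * C * P * Q

assert dw_im2col_ctxt_floats(3, 3) == 72
assert im2col_ctxt_floats(16, 3, 3) == 1152
```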
10-16: LGTM on signature extension and core partitioning

Bias handling and per-core channel slicing are consistent. Weight and output indexing are correct. Please ensure all call sites are updated to pass `pSrcBias`, `has_bias`, and `pContextBuffer` (for the Im2Col variant).

Also applies to: 18-39

139-176: Good reuse of im2col buffer and contiguous weight blocks

The inner product uses a contiguous `kernel_size` slice, matching the buffer layout. Looks correct.

Also applies to: 179-218
Deeploy/Targets/PULPOpen/Parsers.py (3)
104-116: Bias handling looks correct; maps optional third input and flags has_bias

Parsing and context mapping are consistent with the new kernels. Confirm downstream templates use operatorRepresentation["has_bias"] and ["bias"] as introduced here.

127-159: DW FP2D parser shape/padding checks look reasonable

Depthwise gating via input count and padding symmetry is fine.

185-186: Depthwise condition check aligns with ONNX semantics

`group == ch_im_in` is the expected DW condition.
143-153: DMA Slice binding switch looks good.

Good move to DMASliceTemplate with MemoryAwareForkTransformer for integer tensors; aligns with the DMA path introduced in Platform.

155-165: Float Slice mapping: validate starts/ends/axes/steps typing.

Using uint8_t pointers for starts/ends/axes/steps is consistent with the DMA variant, but please verify SliceTemplate.referenceTemplate expects raw byte buffers for these (not int32_t). Also confirm this path covers only float inputs while integers fall back to DMA.
Deeploy/Targets/PULPOpen/Platform.py (4)
77-79: FP DWConv mapper wiring: LGTM.Parser and bindings are correctly introduced and exposed.
92-95: Two Slice mappers ordering: LGTM; confirm fallback behavior.Order ensures float Slice binds first; integers should fall back to DMA. Please confirm CI covers both float and int slice paths.
8-9: RemoveEmptyConvBiasPass import: LGTM.Consistent with optimizer usage below.
235-236: Optimizer: adding RemoveEmptyConvBiasPass is sensible.Placement at the end of the topology passes is fine as long as it runs pre-binding. If the intent is to normalize bias presence for FP convs (so templates always see 3 inputs), consider placing it before conv-related merges; otherwise current placement is acceptable.
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (2)
60-62: Good: transient buffer typed to input element

Ensures the generated C sees the correct pointee type instead of `void`. This helps alloc/codegen correctness.

81-81: Bias args correct — signature matches PULP_Conv2d_fp32_fp32_fp32_HWC

TargetLibraries/PULPOpen/src/Convolution_fp32.c and TargetLibraries/PULPOpen/inc/kernel/Conv.h declare pSrcBias, has_bias in that exact position before pDstC/padding, so adding ${bias}, ${has_bias} is correct.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
Deeploy/DeeployTypes.py (1)
241-241: Fix remaining mutable default arguments.

`VariableBuffer.__init__` still has mutable defaults (`shape = [1]`, `alias_of: Optional[List[str]] = []`). These should use immutable defaults (tuples or `None`) to follow Python best practices and avoid shared mutable state across calls.

Apply this diff:

```diff
-    def __init__(self, name: str = '', shape = [1], alias_of: Optional[List[str]] = []):
+    def __init__(self, name: str = '', shape: Sequence[int] = (1,), alias_of: Optional[List[str]] = None):
```

The defensive copy on line 260 already handles the `None` case correctly, so no further changes are needed there.

Based on past review feedback and static analysis (Ruff B006).
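The hazard B006 flags is easy to reproduce: a mutable default is created once at function definition time and shared by every call that omits the argument. The stand-alone demo below is illustrative, not Deeploy code.

```python
def make_buffer(alias_of=[]):  # anti-pattern: mutable default (Ruff B006)
    alias_of.append("x")
    return alias_of

a = make_buffer()
b = make_buffer()
assert a is b               # both calls returned the SAME list object
assert b == ["x", "x"]      # state leaked across calls

def make_buffer_safe(alias_of=None):
    # Idiomatic fix: create (or copy) the list inside the function body.
    alias_of = [] if alias_of is None else list(alias_of)
    alias_of.append("x")
    return alias_of

assert make_buffer_safe() == ["x"]
assert make_buffer_safe() == ["x"]  # no shared state between calls
```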
🧹 Nitpick comments (2)
Deeploy/DeeployTypes.py (2)
454-454: Optional: Avoid redundant defensive copy.

Since `VariableBuffer.__init__` already defensively copies `alias_of` (line 260), passing `list(alias_of)` here results in double copying. You can simplify by passing the list directly:

```diff
-        super().__init__(name, shape, list(alias_of) if alias_of is not None else None)
+        super().__init__(name, shape, alias_of)
```

This is a minor optimization and doesn't affect correctness.
490-495: Optional: Make alias propagation more explicit.

While correct (the constructor defensively copies `alias_of`), explicitly copying here would make the intent clearer:

```diff
         ret = cls(
             name = buffer.name,
             shape = buffer.shape,
             values = values,
-            alias_of = buffer.alias_of,
+            alias_of = list(buffer.alias_of),  # Explicit copy for clarity
         )
```

This is a minor clarity improvement and doesn't affect correctness.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `CHANGELOG.md` (4 hunks)
- `Deeploy/DeeployTypes.py` (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
Deeploy/DeeployTypes.py (1)
- DeeployTest/testUtils/dmaUtils.py (1): `size` (72-73)
🪛 Ruff (0.13.3)
Deeploy/DeeployTypes.py
452-452: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
452-452: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
🔇 Additional comments (4)
Deeploy/DeeployTypes.py (4)
325-329: LGTM!

Correctly initializes `alias_of` with an explicit empty list for each new instance, avoiding shared mutable state.

362-362: LGTM!

Correctly returns a defensive copy of the internal alias list, preventing external code from accidentally mutating the buffer's state.

404-406: LGTM!

The updates to support `size` as `Union[int, str]` are consistent across `TransientBuffer.__init__`, `_bufferRepresentation`, and `hoistTransientBuffer`. This enables symbolic size expressions, which aligns with the PR's tiling and optimization goals.

Also applies to: 431-431, 888-888

1200-1204: LGTM!

Correctly initializes non-global outputs with an explicit empty `alias_of` list, maintaining consistency with the alias tracking implementation.
Actionable comments posted: 5
♻️ Duplicate comments (2)
Deeploy/DeeployTypes.py (2)
241-261: Fix mutable defaults in VariableBuffer.__init__ (shape/alias_of)

Defaulting shape to [1] and alias_of to [] risks shared state; use immutable defaults and copy incoming lists.

Apply:

```diff
-    def __init__(self, name: str = '', shape = [1], alias_of: Optional[List[str]] = []):
+    def __init__(self, name: str = '', shape: Sequence[int] = (1,), alias_of: Optional[List[str]] = None):
         self.name: str = name  #: str: Canonical name that this buffer is registered as in the NetworkContext
-        self.shape: Sequence[
-            int] = shape  #: Sequence[int]: Represents the dimensions of the underlying tensor as a sequence of dimension sizes
+        self.shape: Sequence[int] = shape  #: Sequence[int]: Represents the dimensions of the underlying tensor as a sequence of dimension sizes
@@
-        self.alias_of: List[str] = list(alias_of) if alias_of is not None else []
+        self.alias_of: List[str] = list(alias_of) if alias_of is not None else []
```

As per Ruff B006.
452-456: Fix mutable defaults in ConstantBuffer.__init__ (shape/values)

Use immutable defaults and forward alias_of safely.

Apply:

```diff
-    def __init__(self, name: str = '', shape = [1], values = [0], alias_of: Optional[List[str]] = None):
-        # Pass a copy of alias_of to avoid shared references
-        super().__init__(name, shape, list(alias_of) if alias_of is not None else None)
+    def __init__(self,
+                 name: str = '',
+                 shape: Sequence[int] = (1,),
+                 values: Union[Sequence, np.ndarray] = (0,),
+                 alias_of: Optional[List[str]] = None):
+        super().__init__(name, shape, alias_of)
```

(values processing via np.asarray below remains correct.)
🧹 Nitpick comments (4)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (3)
21-24: Im2Col buffer bytes should derive from input element width (not weight).

The im2col buffer stores input elements; computing bytes via weight_type may work today (fp32), but couples unrelated types and can break if they diverge. Prefer data_in_type width.

Apply:

```diff
-    im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8) * ctxt.n_cores * operatorRepresentation[
-        'ch_im_in'] * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
+    im2col_dim = (operatorRepresentation["data_in_type"].referencedType.typeWidth // 8) \
+        * ctxt.n_cores \
+        * operatorRepresentation['ch_im_in'] \
+        * operatorRepresentation['dim_kernel_x'] \
+        * operatorRepresentation['dim_kernel_y']
```
43-51: Fix return type annotation and stale comment; ensure consistent sizing semantics.

- Return type says List[Tuple[str, str]] but the code returns int; align with the base signature (Union[int, IntVar]).
- Comment says "value is only used as string", but an int is produced. Update comment and/or value.

Also, consider returning element count (NUM_CORES * Kx * Ky) and relying on the buffer's referencedType to derive bytes, for consistency with the established pattern.

```diff
-    def computeTransientBuffersSize(ctxt: NetworkContext,
-                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, str]]:
+    def computeTransientBuffersSize(ctxt: NetworkContext,
+                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, Union[int, IntVar]]]:
@@
-    # Memory allocation for the im2col buffer can be dynamic, based on the number of cores.
-    # WARNING: This works because value is only used as string, in the allocate template.
-    im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8
-                 ) * ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
+    # Im2Col buffer scales with number of cores and kernel area; size below is in elements.
+    # Byte size is derived from the buffer's referenced type at allocation/codegen time.
+    im2col_dim = ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
```

Note: If you prefer to keep bytes here, update the comment accordingly and see the next comment on double-accounting risk.
124-154: DW Im2Col call wiring is consistent; minor docstring typo.

The DW Im2Col invocation mirrors the FP path with bias and ctxtBuffer. Typo: "ChannelOout" → "ChannelOut".

```diff
-// 2D DW FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D DW FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})
```

Deeploy/DeeployTypes.py (1)
2708-2714: Preserve context name when recreating NetworkContext during parse

Recreating self.ctxt drops custom names; pass name=self.name to keep mangling consistent.

Apply:

```diff
         self.ctxt = NetworkContext(
             variableBuffer = self.Platform.VariableBuffer,
             constantBuffer = self.Platform.ConstantBuffer,
             structBuffer = self.Platform.StructBuffer,
             transientBuffer = self.Platform.TransientBuffer,
             n_cores = self.ctxt.n_cores,
+            name = self.name,
         )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- `Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py` (2 hunks)
- `Deeploy/DeeployTypes.py` (15 hunks)
- `Deeploy/Targets/PULPOpen/Deployer.py` (2 hunks)
- `Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py` (5 hunks)
- `Deeploy/TilingExtension/TilerExtension.py` (1 hunks)
- `DeeployTest/testMVP.py` (4 hunks)
- `DeeployTest/testRunner_tiled_siracusa.py` (1 hunks)
- `DeeployTest/testUtils/platformMapping.py` (2 hunks)
- `DeeployTest/testUtils/testRunner.py` (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-24T12:17:21.624Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:46-0
Timestamp: 2025-09-24T12:17:21.624Z
Learning: In Deeploy's PULP templates, transient buffer size calculation can return element counts as strings from computeTransientBuffersSize(), and then manually set the buffer type in hoistTransientBuffers() using ctxt.lookup(buffer_name)._type.referencedType = input_type. The allocation system automatically multiplies the element count by the element size when the buffer type is properly set, achieving correct byte allocation.
Applied to files:
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
Applied to files:
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
🧬 Code graph analysis (4)
DeeployTest/testMVP.py (3)
- Deeploy/DeeployTypes.py (1): `NetworkDeployer` (3256-3633)
- DeeployTest/testUtils/platformMapping.py (1): `mapDeployer` (92-261)
- DeeployTest/testSchedulingExtension.py (1): `_mockScheduler` (29-33)

DeeployTest/testRunner_tiled_siracusa.py (1)
- DeeployTest/testUtils/testRunner.py (1): `TestRunner` (279-450)

DeeployTest/testUtils/platformMapping.py (2)
- Deeploy/DeeployTypes.py (3): `DeploymentPlatform` (2451-2494), `TopologyOptimizer` (2249-2278), `NetworkDeployer` (3256-3633)
- Deeploy/Targets/PULPOpen/Deployer.py (1): `PULPDeployer` (29-128)

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
- Deeploy/DeeployTypes.py (6): `NodeTemplate` (87-229), `computeTransientBuffersSize` (142-162), `NetworkContext` (575-1090), `hoistTransientBuffers` (165-185), `hoistTransientBuffer` (891-910), `lookup` (790-822)
🪛 GitHub Actions: CI • Snitch (Tiled)
DeeployTest/testUtils/testRunner.py
[error] 334-334: AttributeError: 'TestRunner' object has no attribute '_cores'. Did you mean: 'n_cores'?
[error] 334-334: Command failed with exit code 1. Failure occurred while executing testRunner_tiled_snitch.py with parameters: -t Tests/$testName --cores=9 --simulator=$simulator --l1 $L1_value --defaultMemLevel=L2 --memAllocStrategy=MiniMalloc --searchStrategy=random-max
🪛 Ruff (0.13.3)
DeeployTest/testMVP.py
58-58: Unused function argument: verbose
(ARG001)
Deeploy/Targets/PULPOpen/Deployer.py
40-40: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
Deeploy/DeeployTypes.py
452-452: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
452-452: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py
25-25: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (150)
- GitHub Check: siracusa-models / test-runner-siracusa
- GitHub Check: snitch-kernels / test-runner-snitch
- GitHub Check: cortexm-kernels / test-runner-cortexm
- GitHub Check: linting
- GitHub Check: chimera-kernels / test-runner-chimera
- GitHub Check: generic-kernels / test-runner-generic
- GitHub Check: Test Type Inference (fail-input2)
- GitHub Check: Test Type Inference (pass)
- GitHub Check: Test Type Inference (fail-input0)
- GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
- GitHub Check: siracusa-neureka-models-tiled-doublebuffer-L3-wmem (miniMobileNet, 2000, 8, true, L3, true) / test-runner-siracusa-neureka-tiled (2000)
- GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2-wmem / test-runner-siracusa-neureka-tiled
- GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
- GitHub Check: mempool-models / test-runner-mempool
- GitHub Check: mempool-kernels / test-runner-mempool
🔇 Additional comments (6)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (2)
81-84: Bias/has_bias plumbing for direct conv looks good. Bias flags are correctly passed to the FP Conv2d HWC call.
100-117: Im2Col call argument order (X,Y) acknowledged as correct for this kernel. The X,Y ordering for dims, kernel, and stride matches the intended prototype in this codebase.
If you changed any kernel prototypes elsewhere in this PR, ensure consistency across integer/float and DW/non-DW variants.
Deeploy/TilingExtension/TilerExtension.py (1)
987-989: LGTM: propagate core count to context. Assigning ctxt.n_cores ensures downstream kernels see the configured core count.
DeeployTest/testRunner_tiled_siracusa.py (1)
20-26: LGTM: cores threaded into TestRunner and CMake. Passing args.cores to TestRunner and exporting NUM_CORES is correct.
Also applies to: 28-28
DeeployTest/testUtils/platformMapping.py (1)
92-104: LGTM: add n_cores and forward to PULPDeployer. Signature and forwarding align with deployer changes.
Also applies to: 214-224
DeeployTest/testMVP.py (1)
57-60: LGTM: introduce and propagate n_cores. CLI option, setupDeployer signature, and forwarding to mapDeployer are consistent.
Also applies to: 84-93, 201-206, 261-263
This is a first round of review. There are already a few comments that need your answers. I haven't managed to look at the actual templates/kernels yet; I'll do that tomorrow.
Good work :)
```diff
                  deeployStateDir: str = "DeeployState",
-                 inputOffsets: Dict[str, int] = {}):
+                 inputOffsets: Dict[str, int] = {},
+                 n_cores: int = 8):
```
Why do you add the n_cores argument? We have the N_CORES define passed through the cmake script.
The number of cores is needed to dynamically compute the size of the im2col buffer for the regular and DW Conv2Ds. This was the method I found to pass it on to the network context (PULPDeployer inherits from this class, SignPropDeployer, which in turn inherits from NetworkDeployer). Let me know if you think we should proceed differently with this.
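For intuition, one plausible sizing rule (not necessarily Deeploy's exact formula) gives each core a private im2col patch, so the scratch buffer's element count scales linearly with the core count:

```python
# Hedged sketch: with one private im2col column per core, the scratch buffer
# grows with n_cores. The exact formula used in Deeploy may differ.

def im2col_buffer_elems(n_cores: int, kernel_x: int, kernel_y: int, ch_im_in: int) -> int:
    per_core_patch = kernel_x * kernel_y * ch_im_in  # one patch per core
    return n_cores * per_core_patch

print(im2col_buffer_elems(8, 3, 3, 16))  # → 1152
```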
This is not a universal feature, meaning some targets don't have multiple cores, for example CortexM or Generic. You should add this just to the specific target's deployer, i.e., PULPDeployer
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
Deeploy/Targets/PULPOpen/Deployer.py (1)
31-53: Avoid the shared mutable default for inputOffsets. Keeping inputOffsets = {} means every deployer instance shares the same dict, so one run can mutate another's offsets. Since this signature just changed, please switch to a safe None default and materialize a new dict per instance before calling super().

```diff
-from typing import Callable, Dict, List, Type
+from typing import Callable, Dict, List, Optional, Type
@@
-                 inputOffsets = {},
+                 inputOffsets: Optional[Dict[str, int]] = None,
                  n_cores: int = 8):
+        if inputOffsets is None:
+            inputOffsets = {}
+
         super().__init__(
             graph = graph,
             deploymentPlatform = deploymentPlatform,
             inputTypes = inputTypes,
             loweringOptimizer = loweringOptimizer,
```
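The pitfall behind this suggestion can be demonstrated in a few lines; the functions below are illustrative, not Deeploy code:

```python
# B006 in a nutshell: a dict default is created once at function definition
# time and then shared by every call that falls back to it.

def bad(offsets={}):
    offsets["x"] = offsets.get("x", 0) + 1
    return offsets

def good(offsets=None):
    if offsets is None:
        offsets = {}  # fresh dict per call
    offsets["x"] = offsets.get("x", 0) + 1
    return offsets

print(bad()["x"], bad()["x"])    # → 1 2  (state leaks between calls)
print(good()["x"], good()["x"])  # → 1 1
```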
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
- Deeploy/DeeployTypes.py (15 hunks)
- Deeploy/Targets/Generic/Parsers.py (1 hunks)
- Deeploy/Targets/PULPOpen/Deployer.py (1 hunks)
- Deeploy/Targets/PULPOpen/Parsers.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py
🧰 Additional context used
🧬 Code graph analysis (2)
Deeploy/Targets/Generic/Parsers.py (1)
Deeploy/DeeployTypes.py (3)
add_aliases (331-350), get_aliases_of (352-364), lookup (788-820)
Deeploy/Targets/PULPOpen/Parsers.py (2)
Deeploy/DeeployTypes.py (5)
inputs (2579-2596), lookup (788-820), parseNode (1101-1116), parseNodeCtxt (1119-1144), NetworkContext (574-1088)
Deeploy/Targets/Generic/Parsers.py (31)
Conv2DParser (1204-1270); parseNode (19-27, 49-58, 81-94, 102-112, 163-169, 193-207, 229-251, 291-304, 326-338, 375-387, 422-426, 449-460, 484-495, 531-535, 551-554); parseNodeCtxt (29-41, 60-76, 114-155, 171-185, 209-221, 253-283, 306-318, 340-367, 389-414, 428-441, 462-476, 497-523, 537-543, 556-573, 586-605)
🪛 Ruff (0.14.0)
Deeploy/Targets/PULPOpen/Deployer.py
40-40: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4 hunks)
- Deeploy/Targets/Generic/Parsers.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- Deeploy/Targets/Generic/Parsers.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-24T11:56:35.781Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py:364-0
Timestamp: 2025-09-24T11:56:35.781Z
Learning: When reviewing pull requests, focus comments only on code that was actually changed (lines with + or - in diffs), not on unchanged context code that appears in the diff.
Applied to files:
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
Applied to files:
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
🧬 Code graph analysis (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (2)
contextagnostic (285-298), ReplaceSequentialPatternPass (265-282)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (120)
- GitHub Check: linting
- GitHub Check: mempool-models / test-runner-mempool
- GitHub Check: siracusa-kernels-tiled-doublebuffer-L2 / test-runner-siracusa-tiled
- GitHub Check: softhier-kernels / test-runner-softhier
- GitHub Check: snitch-kernels-tiled-singlebuffer-L2 / test-runner-snitch-tiled
- GitHub Check: cortexm-models / test-runner-cortexm
- GitHub Check: cortexm-kernels / test-runner-cortexm
- GitHub Check: snitch-kernels / test-runner-snitch
- GitHub Check: siracusa-neureka-models-tiled-doublebuffer-L3-wmem (Attention, 3500, 8, true, L3, true) / test-runner-siracusa-neureka-tiled (3500)
- GitHub Check: siracusa-neureka-models-tiled-doublebuffer-L3-wmem (microLlama/microLlama1, 10000, 8, true, L3, t... / test-runner-siracusa-neureka-tiled (10000)
- GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
- GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (Transformer, 15000, 8, L3) / test-runner-siracusa-neureka-tiled (15000)
🔇 Additional comments (4)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4)
250-250: LGTM! Improved group attribute handling. Using .get('group', 1) properly handles the ONNX Conv default and is more robust than explicit attribute checking.
370-373: LGTM! Correctly handles FP DW Conv input transpose. The else block appropriately transposes the input for non-RequantizedConv DW convolutions (e.g., FP Conv), while the RequantizedConv path transposes weights. This distinction aligns with the requirements for adding FP DW Conv support.
380-426: LGTM! Well-structured pass classes with clear separation of concerns. The refactoring of PULPDWConvPass improves readability with explicit comments and sections. The new PULPFPDWConvPass class properly mirrors the structure for FP DW Conv handling, using the 'Conv' op pattern and a distinct pass name while sharing the same replacement function via partial().
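The partial() sharing mentioned here can be sketched as follows; the pass names are real, but the replacement function's signature is hypothetical:

```python
# Sketch of sharing one replacement function between two pass variants via
# functools.partial. The signature below is illustrative, not Deeploy's API.
from functools import partial

def _replace_dw_conv(graph, match, name, default_channels_first):
    return f"{name}: rewrote {match} (channels_first={default_channels_first})"

# Quantized and FP variants differ only in pattern and name, not rewrite logic.
quant_replacement = partial(_replace_dw_conv, name="PULPDWConvPass", default_channels_first=True)
fp_replacement = partial(_replace_dw_conv, name="PULPFPDWConvPass", default_channels_first=True)

print(fp_replacement("graph", "Conv"))
# → PULPFPDWConvPass: rewrote Conv (channels_first=True)
```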
505-505: LGTM! Correctly integrates the FP DW Conv pass into the sequence. The placement after PULPDWConvPass and before dense conv passes is logical, ensuring DW convolutions (both quantized and FP) are handled before dense convolutions.
Force-pushed from bf5a286 to 85edf76 (Compare)
Actionable comments posted: 20
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py (2)
105-107: Compute NSize based on transA. Current code assumes no transpose; wrong K/N leads to bad tiles.

```diff
-        NSize = buffA.shape[-1]
+        NSize = buffA.shape[-1] if transA == 0 else buffA.shape[-2]
```
215-223: Critical undefined identifiers remain in IntGEMMTileConstraint.serializeTilingSolution. The verification reveals multiple undefined variables that will cause runtime errors:
- Line 132: BSize used but never defined (should be BatchSize)
- Lines 151, 155, 167, 171, 370, 374, 386, 390, 405, 407: BatchOffset used but never defined
- Lines 156, 172, 375, 391, 407: BOffset used but never defined
In contrast, FloatGEMMTileConstraint.serializeTilingSolution correctly uses BatchSize at line 341, suggesting the IntGEMMTileConstraint version requires similar fixes. The undefined BatchOffset and BOffset offsets need to be extracted from cube.offset, similar to how MOffset and OOffset are extracted at line 119.
TargetLibraries/PULPOpen/src/Convolution_fp32.c (1)
FloatGEMMTileConstraint.serializeTilingSolutioncorrectly usesBatchSizeat line 341, suggesting theIntGEMMTileConstraintversion requires similar fixes. The undefinedBatchOffsetandBOffsetoffsets need to be extracted fromcube.offsetsimilar to howMOffsetandOOffsetare extracted at line 119.TargetLibraries/PULPOpen/src/Convolution_fp32.c (1)
22-28: Avoid 16-bit overflow in channel partitioning.

Use 32-bit for the `ch_out_*` variables to support large `F_total`; the identical change is recommended in both functions.

Apply:

```diff
- uint16_t ch_out_chunk =
+ uint32_t ch_out_chunk =
      (F_total >> log2Core) + ((F_total & (NUM_CORES - 1)) != 0);
- uint16_t ch_out_start = MIN(ch_out_chunk * core_id, F_total);
- uint16_t ch_out_stop = MIN(ch_out_start + ch_out_chunk, F_total);
- uint16_t ch_out_count = ch_out_stop - ch_out_start;
+ uint32_t ch_out_start = MIN(ch_out_chunk * core_id, F_total);
+ uint32_t ch_out_stop = MIN(ch_out_start + ch_out_chunk, F_total);
+ uint32_t ch_out_count = ch_out_stop - ch_out_start;
```

Repeat the same replacement in `PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC`.
Deeploy/MemoryLevelExtension/NetworkDeployers/MemoryLevelDeployer.py (1)
119-125: Mutable default list for passes.

Use `None` and initialize inside the function to avoid shared state across instances.

Apply:

```diff
- memoryLevelAnnotationPasses: List[NetworkOptimizationPass] = []):
+ memoryLevelAnnotationPasses: List[NetworkOptimizationPass] | None = None):
@@
+     if memoryLevelAnnotationPasses is None:
+         memoryLevelAnnotationPasses = []
-     if len(memoryLevelAnnotationPasses) == 0:
+     if not memoryLevelAnnotationPasses:
          memoryLevelAnnotationPasses.append(AnnotateDefaultMemoryLevel(self.Platform.memoryHierarchy))
      self.memoryLevelAnnotationOptimizer = NetworkOptimizer(memoryLevelAnnotationPasses)
```

Repeat the same pattern in `MemoryLevelAwareSignPropDeployer.__init__` and `MemoryDeployerWrapper.__init__`.
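The hazard behind this suggestion is the classic Python mutable-default pitfall: the default list is created once at definition time and shared by every call. A standalone sketch (class names hypothetical, not the Deeploy classes):

```python
class Deployer:
    # BAD: the default list object is created once at function
    # definition time and shared by every call that omits the argument.
    def __init__(self, passes=[]):
        passes.append("AnnotateDefaultMemoryLevel")
        self.passes = passes

a, b = Deployer(), Deployer()
assert a.passes is b.passes   # same list object
assert len(b.passes) == 2     # a's append leaked into b

class FixedDeployer:
    # GOOD: None sentinel; each instance gets a fresh list.
    def __init__(self, passes=None):
        if passes is None:
            passes = []
        passes.append("AnnotateDefaultMemoryLevel")
        self.passes = passes

c, d = FixedDeployer(), FixedDeployer()
assert c.passes is not d.passes
assert len(d.passes) == 1
```

The `None` sentinel is the standard fix flagged by Ruff rule B006.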
♻️ Duplicate comments (5)
Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (1)
25-26: n_cores addition looks reasonable; mutable default still present.

The `n_cores` parameter addition aligns with the PR's goal of supporting dynamic im2col buffer sizing. However, two observations:

- The mutable default `inputOffsets: Dict[str, int] = {}` remains unfixed from the previous review.
- The `n_cores` parameter lacks documentation and validation (e.g., ensuring it's positive).

Consider adding a docstring or inline comment explaining the `n_cores` parameter's purpose, and optionally validate it:

```diff
  inputOffsets: Dict[str, int] = {},
- n_cores: int = 8):
+ n_cores: int = 8):  # Number of cores for im2col buffer sizing
+     if n_cores <= 0:
+         raise ValueError(f"n_cores must be positive, got {n_cores}")
```

For the mutable default issue, refer to the previous review comment suggesting `inputOffsets: Optional[Dict[str, int]] = None`.

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
43-51: Fix DW Im2Col buffer sizing: remove the typeWidth multiplier and correct the return type annotation.

In `PULP2DFloatDWConvIm2ColTemplate.computeTransientBuffersSize()` (lines 43-51), the size computation multiplies by `typeWidth // 8` to produce a byte count, but `hoistTransientBuffers()` (lines 59-61) then assigns the buffer a concrete type. The allocator subsequently multiplies by `sizeof(type)`, causing double-accounting on tight L2 budgets. Additionally, the return type annotation incorrectly declares `str` instead of `Union[int, IntVar]`.

Align with the existing `PULP2DFloatConvIm2ColTemplate` pattern (lines 18-24): compute the element count (no typeWidth division) and let type-setting handle the byte multiplication.

```diff
-    def computeTransientBuffersSize(ctxt: NetworkContext,
-                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, str]]:
+    def computeTransientBuffersSize(ctxt: NetworkContext,
+                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, Union[int, IntVar]]]:
-        im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8
-                      ) * ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
+        im2col_dim = ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
```

TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)
61-97: Critical: signed/unsigned mix in im2col padding loops leads to out-of-bounds indexing.

`h_in`/`w_in` are `uint32_t` while `h_in_start`/`w_in_start` are `int32_t`; expressions like `(h_in - h_in_start)` and `(w_in - w_in_start)` perform unsigned arithmetic when mixed, producing incorrect large indices. Use `int32_t` consistently and cast the bounds.

Apply (the pattern applies similarly to the LEFT/RIGHT blocks):

```diff
- for (uint32_t h_in = MAX(H, h_in_start); h_in < h_in_start + P; h_in++) {
+ for (int32_t h_in = MAX((int32_t)H, h_in_start); h_in < (int32_t)(h_in_start + (int32_t)P); h_in++) {
    for (int32_t w_in = (int32_t)w_in_start; w_in < (int32_t)(w_in_start + Q); w_in++) {
      im2col_buffer[(h_in - h_in_start) * Q + (w_in - w_in_start)] = 0.0f;
    }
  }
- for (uint32_t h_in = MAX(0, h_in_start); h_in < MIN(H, h_in_start + P); h_in++) {
+ for (int32_t h_in = MAX(0, h_in_start); h_in < MIN((int32_t)H, h_in_start + (int32_t)P); h_in++) {
    for (int32_t w_in = (int32_t)w_in_start; w_in < MIN(0, (int32_t)(w_in_start + Q)); w_in++) {
      im2col_buffer[(h_in - h_in_start) * Q + (w_in - w_in_start)] = 0.0f;
    }
  }
- for (uint32_t h_in = MAX(0, h_in_start); h_in < MIN(H, h_in_start + P); h_in++) {
+ for (int32_t h_in = MAX(0, h_in_start); h_in < MIN((int32_t)H, h_in_start + (int32_t)P); h_in++) {
-   for (uint32_t w_in = MAX(W, w_in_start); w_in < w_in_start + Q; w_in++) {
+   for (int32_t w_in = MAX((int32_t)W, w_in_start); w_in < (int32_t)(w_in_start + (int32_t)Q); w_in++) {
      im2col_buffer[(h_in - h_in_start) * Q + (w_in - w_in_start)] = 0.0f;
    }
  }
```

Repeat the same signed fixes in the no-bias branch (lines 170-195).
Deeploy/Targets/PULPOpen/Deployer.py (1)
40-53: Avoid mutable default for inputOffsets.

Switch to `Optional` and initialize when calling `super()`.

Apply:

```diff
-from typing import Callable, Dict, List, Type
+from typing import Callable, Dict, List, Optional, Type
@@
-                 deeployStateDir: str = "DeeployStateDir",
-                 inputOffsets = {},
+                 deeployStateDir: str = "DeeployStateDir",
+                 inputOffsets: Optional[Dict[str, int]] = None,
                  n_cores: int = 8):
         super().__init__(
@@
-            inputOffsets = inputOffsets,
+            inputOffsets = inputOffsets if inputOffsets is not None else {},
             n_cores = n_cores,
         )
```

If `SignPropDeployer` already accepts `Optional[Dict[str, int]]`, you can pass it through directly and drop the fallback.

DeeployTest/testUtils/testRunner.py (1)
DeeployTest/testUtils/testRunner.py (1)
349-356: Remove duplicate command construction; keep a single source of truth.

The first assignment is redundant; append `-n_cores` via `self.n_cores` once.

Apply:

```diff
- command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform} {self.gen_args}"
-
  command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform}"
  if self._tiling is True:
      command += f" -n_cores {self.n_cores}"
  command += f" {self.gen_args}"
```
🧹 Nitpick comments (15)
CHANGELOG.md (1)
26-32: Vary sentence structure in the Added section to improve readability.

The static analysis tool flagged repetitive sentence beginnings ("FP PULP DW conv", "PULP FP DW conv", "FP bindings"). While the entries are accurate and comprehensive, consider rephrasing for better variety. For example:

```diff
- PULP 2D FP DW conv Im2Col template and kernel, with bias support.
- Bias support for PULP 2D FP regular conv Im2Col in template & kernel.
- PULP FP DW conv 2D parser.
- FP conv 2D (simple & DW), reshape & skip connection, and TinyViT demo tests to the non-tiled Siracusa CI pipeline.
- FP bindings and mappings for PULP slice, DW conv 2D, and reduce mean operations.
- FP PULP DW conv lowering optimization pass similar to the existent one for integer version.
+ 2D FP depthwise convolution Im2Col template, kernel, and parser for PULP, with bias support.
+ Bias support for PULP 2D FP regular convolution Im2Col template and kernel.
+ FP convolution 2D (regular and depthwise), reshape, skip connection, and TinyViT demo tests in the non-tiled Siracusa CI pipeline.
+ FP bindings and mappings: PULP slice, depthwise convolution 2D, and reduce mean operations.
+ Depthwise convolution lowering optimization pass (FP variant) for PULP, analogous to the integer version.
```

This maintains factual accuracy while improving stylistic variety.
Deeploy/Targets/Neureka/TileConstraints/NeurekaDenseConstraint.py (1)
519-526: Tighten schedule pairing with `zip(strict=True)`.

Ensure 1:1 correspondence between base input loads and requant cubes.

```diff
- } for schedule, requant in zip(tilingSchedule.inputLoadSchedule, inputRequantCubes)]
+ } for schedule, requant in zip(tilingSchedule.inputLoadSchedule, inputRequantCubes, strict=True)]
```
515-517: Add `zip(strict=True)` to guard schedule alignment.

Match counts between input and requant loads for WMEM DW RQS.

```diff
- } for schedule, requant in zip(tilingSchedule.inputLoadSchedule, inputRequantCubes)]
+ } for schedule, requant in zip(tilingSchedule.inputLoadSchedule, inputRequantCubes, strict=True)]
```
295-299: Strengthen pairing with `zip(strict=True)` in PW RQS.

Guarantee equal lengths when combining base and requant schedules.

```diff
- } for load, rqLoad in zip(tilingSchedule.inputLoadSchedule, requantSchedule)]
+ } for load, rqLoad in zip(tilingSchedule.inputLoadSchedule, requantSchedule, strict=True)]
```
10-14: Avoid FP `log2`; use bit ops and assert power-of-two NUM_CORES.

`log2()` pulls in libm and risks FP rounding; use CTZ and add a compile-time check. This also removes any need for `<math.h>`/`-lm`.

```diff
-uint32_t ${nodeName}_core_id = pi_core_id();
-uint32_t ${nodeName}_log2Core = (uint32_t) log2(NUM_CORES);
+uint32_t ${nodeName}_core_id = pi_core_id();
+_Static_assert((NUM_CORES & (NUM_CORES - 1)) == 0, "NUM_CORES must be a power of two");
+const uint32_t ${nodeName}_log2Core = __builtin_ctz(NUM_CORES);
```

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (3)
93-93: Fix typo in comment: "ChannelOout" → "ChannelOut".

```diff
-// 2D FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})
```
125-154: DW Im2Col call: argument order LGTM; minor comment typo.

```diff
-// 2D DW FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D DW FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})
```

Based on learnings.
46-51: Clarify misleading comment about size-as-string.

The value is currently computed as an int, not a string; please update the comment to avoid confusion.
Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1)
181-185: Use wider unsigned types for tiling sizes for consistency and headroom.

Dims rarely fit in int8; GEMM uses `uint16_t`/`uint8_t`. Align MatMul for consistency and to avoid overflow.

Apply:

```diff
-from Deeploy.CommonExtensions.DataTypes import int8_t
+from Deeploy.CommonExtensions.DataTypes import uint8_t, uint16_t
 ...
-    replacementTypes = {
-        "M": PointerClass(int8_t),
-        "N": PointerClass(int8_t),
-        "O": PointerClass(int8_t),
-        "batch": PointerClass(int8_t)
-    }
+    replacementTypes = {
+        "M": PointerClass(uint16_t),
+        "N": PointerClass(uint16_t),
+        "O": PointerClass(uint16_t),
+        "batch": PointerClass(uint8_t),
+    }
```

If other MatMul code relies on `int8_t` here, confirm and we can gate this by target or keep `uint16_t` only for sizes ≥ 256.
Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py (1)
83-87: Avoid assigning an unused variable.

`modulus` is unused. Either drop the assignment or use `_ =`.

```diff
- modulus = tilerModel.addTileSizeDivisibleConstraint(parseDict, 'O', BSecondDimVar, 16, prefix = "16_")
+ tilerModel.addTileSizeDivisibleConstraint(parseDict, 'O', BSecondDimVar, 16, prefix = "16_")
```

Deeploy/Targets/Generic/Parsers.py (1)
1071-1101: Reshape aliasing looks good; mark the constant shape as non-deploy and use the cached alias list.

- The alias flow is correct and symmetric; iterate the cached list to avoid relying on a second lookup.
- If the shape input is a constant, mark it non-live/non-deploy to avoid useless allocations (mirrors Unsqueeze).

Apply near this block:

```diff
- new_output_node_aliases = input_node.get_aliases_of()
+ new_output_node_aliases = list(input_node.get_aliases_of())
  new_output_node_aliases.append(input_node.name)
@@
- for alias in output_node.get_aliases_of():
+ for alias in new_output_node_aliases:
      alias_node = ctxt.lookup(alias)
      alias_node.add_aliases(aliases_to_add = [
          output_node.name,
      ])
+
+ # If shape is a constant tensor, exclude it from deployment
+ shape_buf = ctxt.lookup(node.inputs[inputs.index("shape")].name)
+ if hasattr(shape_buf, "values"):
+     shape_buf._live = False
+     shape_buf._deploy = False
```

TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (2)
137-145: Minor: remove the redundant modulo and consider precomputing the weight base.

`im2col_idx` runs over [0, P * Q), so `% (P * Q)` is redundant. Also, precompute `(f - ch_out_start) * P * Q` outside the inner loop.

```diff
- for (uint32_t im2col_idx = 0; im2col_idx < P * Q; im2col_idx++) {
-   sum += im2col_buffer[im2col_idx] *
-          weight_ptr[(f - ch_out_start) * P * Q + im2col_idx % (P * Q)];
- }
+ uint32_t w_base = (f - ch_out_start) * P * Q;
+ for (uint32_t im2col_idx = 0; im2col_idx < P * Q; im2col_idx++) {
+   sum += im2col_buffer[im2col_idx] * weight_ptr[w_base + im2col_idx];
+ }
```
47-51: Nit: `kernel_size` is unused.

`kernel_size` is computed but never used; remove it to avoid confusion.
Deeploy/DeeployTypes.py (1)
3262-3317: Redundant `n_cores` field in child; the base already stores it.

`NetworkDeployer` saves `n_cores`; removing the extra `self.n_cores` avoids duplication.

Apply:

```diff
-        self.prepared = False
-        self.n_cores = n_cores
+        self.prepared = False
```

DeeployTest/testMVP.py (1)
57-59: Consider removing the unused `verbose` parameter.

The `verbose` parameter is declared but never used in the function body. While the `n_cores` parameter addition looks good and aligns with the PR objectives, consider removing `verbose` to clean up the signature.

Apply this diff:

```diff
 def setupDeployer(graph: gs.Graph, memoryHierarchy: MemoryHierarchy, defaultTargetMemoryLevel: MemoryLevel,
-                  defaultIoMemoryLevel: MemoryLevel, verbose: CodeGenVerbosity, args: argparse.Namespace,
-                  n_cores: int) -> Tuple[NetworkDeployer, bool]:
+                  defaultIoMemoryLevel: MemoryLevel, args: argparse.Namespace,
+                  n_cores: int) -> Tuple[NetworkDeployer, bool]:
```

Note: Ensure to update the call site on line 256 as well if you apply this change.

Based on learnings.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (30)
- .github/workflows/ci-platform-siracusa.yml (3 hunks)
- CHANGELOG.md (4 hunks)
- Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
- Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (2 hunks)
- Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (5 hunks)
- Deeploy/DeeployTypes.py (14 hunks)
- Deeploy/MemoryLevelExtension/NetworkDeployers/MemoryLevelDeployer.py (4 hunks)
- Deeploy/Targets/Generic/Parsers.py (1 hunks)
- Deeploy/Targets/Neureka/TileConstraints/NeurekaDenseConstraint.py (1 hunks)
- Deeploy/Targets/Neureka/TileConstraints/NeurekaDepthwiseConstraint.py (1 hunks)
- Deeploy/Targets/Neureka/TileConstraints/NeurekaPointwiseConstraint.py (1 hunks)
- Deeploy/Targets/PULPOpen/Bindings.py (6 hunks)
- Deeploy/Targets/PULPOpen/Deployer.py (1 hunks)
- Deeploy/Targets/PULPOpen/Parsers.py (2 hunks)
- Deeploy/Targets/PULPOpen/Platform.py (7 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (5 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (1 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1 hunks)
- Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py (3 hunks)
- Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1 hunks)
- Deeploy/TilingExtension/TilerExtension.py (1 hunks)
- DeeployTest/testMVP.py (4 hunks)
- DeeployTest/testRunner_tiled_siracusa.py (1 hunks)
- DeeployTest/testUtils/platformMapping.py (2 hunks)
- DeeployTest/testUtils/testRunner.py (3 hunks)
- TargetLibraries/PULPOpen/inc/kernel/Conv.h (1 hunks)
- TargetLibraries/PULPOpen/src/Convolution_fp32.c (4 hunks)
- TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
- Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py
- Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py
- Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py
- Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py
- DeeployTest/testUtils/platformMapping.py
- DeeployTest/testRunner_tiled_siracusa.py
- TargetLibraries/PULPOpen/inc/kernel/Conv.h
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-09-24T12:53:00.168Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: TargetLibraries/PULPOpen/src/DWConvolution_fp32.c:0-0
Timestamp: 2025-09-24T12:53:00.168Z
Learning: In PULP platform code, log2 is defined in pulp_nn_utils and can be used safely without requiring math.h or causing floating-point build/runtime issues, unlike the standard library log2 function.
Applied to files:
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
Applied to files:
TargetLibraries/PULPOpen/src/DWConvolution_fp32.cTargetLibraries/PULPOpen/src/Convolution_fp32.cDeeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.pyDeeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
📚 Learning: 2025-09-24T11:56:35.781Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py:364-0
Timestamp: 2025-09-24T11:56:35.781Z
Learning: When reviewing pull requests, focus comments only on code that was actually changed (lines with + or - in diffs), not on unchanged context code that appears in the diff.
Applied to files:
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
📚 Learning: 2025-09-24T11:43:47.236Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: .github/workflows/ci-platform-siracusa.yml:57-60
Timestamp: 2025-09-24T11:43:47.236Z
Learning: In the Deeploy test system, test names in CI workflows correspond to directory names under DeeployTest/Tests/, not Python function names. The TestRunner class executes tests by passing directory paths via the `-t` argument, where each directory contains test configurations and definitions.
Applied to files:
.github/workflows/ci-platform-siracusa.yml
📚 Learning: 2025-09-24T12:17:21.624Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:46-0
Timestamp: 2025-09-24T12:17:21.624Z
Learning: In Deeploy's PULP templates, transient buffer size calculation can return element counts as strings from computeTransientBuffersSize(), and then manually set the buffer type in hoistTransientBuffers() using ctxt.lookup(buffer_name)._type.referencedType = input_type. The allocation system automatically multiplies the element count by the element size when the buffer type is properly set, achieving correct byte allocation.
Applied to files:
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
🧬 Code graph analysis (14)
Deeploy/Targets/PULPOpen/Platform.py (4)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
RemoveEmptyConvBiasPass (587-592)
Deeploy/Targets/PULPOpen/Parsers.py (2)
PULPFPConv2DParser (65-120), PULPFPDWConv2DParser (123-189)
Deeploy/DeeployTypes.py (1)
NodeMapper (1736-1892)
Deeploy/Targets/Generic/Parsers.py (1)
SliceParser(97-155)
Deeploy/Targets/PULPOpen/Bindings.py (5)
Deeploy/CommonExtensions/DataTypes.py (4)
float32_t (74-78), int8_t (12-15), int32_t (24-27), uint8_t (36-39)
Deeploy/DeeployTypes.py (3)
CodeTransformation (2366-2400), NodeBinding (1588-1733), NodeTemplate (87-229)
Deeploy/FutureExtension/Bindings/AutoFutureBinding.py (1)
AutoFutureBinding (13-62)
Deeploy/Targets/Generic/TypeCheckers.py (3)
SliceChecker (37-51), ConvChecker (456-474), ReduceMeanChecker (310-324)
Deeploy/AbstractDataTypes.py (1)
PointerClass(536-559)
Deeploy/Targets/PULPOpen/Parsers.py (2)
Deeploy/DeeployTypes.py (5)
inputs (2583-2600), lookup (792-824), parseNode (1105-1120), parseNodeCtxt (1123-1148), NetworkContext (578-1092)
Deeploy/Targets/Generic/Parsers.py (31)
Conv2DParser (1246-1312); parseNode (19-27, 49-58, 81-94, 102-112, 163-169, 193-207, 229-243, 271-293, 333-351, 373-385, 422-434, 469-473, 496-507, 531-542, 578-582); parseNodeCtxt (29-41, 60-76, 114-155, 171-185, 209-221, 245-263, 295-325, 353-365, 387-414, 436-461, 475-488, 509-523, 544-570, 584-590, 603-620)
Deeploy/Targets/Neureka/TileConstraints/NeurekaPointwiseConstraint.py (2)
Deeploy/DeeployTypes.py (1)
lookup (792-824)
Deeploy/TilingExtension/TilingCodegen.py (2)
HyperRectangle (24-35), calculateFlatOffsetInBytes (239-242)
Deeploy/DeeployTypes.py (3)
Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (1)
sizeInBytes (181-182)
Deeploy/AbstractDataTypes.py (2)
typeWidth (399-400), Pointer (303-388)
DeeployTest/testUtils/dmaUtils.py (1)
size(72-73)
Deeploy/Targets/Generic/Parsers.py (1)
Deeploy/DeeployTypes.py (3)
lookup (792-824), get_aliases_of (352-364), add_aliases (331-350)
Deeploy/Targets/Neureka/TileConstraints/NeurekaDenseConstraint.py (6)
Deeploy/TilingExtension/TilingCodegen.py (5)
TilingSchedule (53-122), AbsoluteHyperRectangle (39-49), VariableReplacementScheme (126-158), HyperRectangle (24-35), calculateFlatOffsetInBytes (239-242)
Deeploy/TilingExtension/TileConstraint.py (3)
TileConstraint (19-221), serializeTilingSolution (184-221), extractBaseAddr (56-74)
Deeploy/DeeployTypes.py (1)
lookup (792-824)
Deeploy/AbstractDataTypes.py (1)
PointerClass (536-559)
Deeploy/CommonExtensions/DataTypes.py (3)
uint8_t (36-39), uint32_t (48-51), uint16_t (42-45)
Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py (2)
Conv2DTileConstraint (225-435), computeInputCube (320-347)
Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1)
Deeploy/TilingExtension/TilingCodegen.py (1)
HyperRectangle(24-35)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (3)
contextagnostic (285-298), SequentialPass (201-210), ReplaceSequentialPatternPass (265-282)
Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py (1)
Deeploy/TilingExtension/TilingCodegen.py (1)
HyperRectangle(24-35)
Deeploy/Targets/Neureka/TileConstraints/NeurekaDepthwiseConstraint.py (6)
Deeploy/TilingExtension/TilingCodegen.py (5)
TilingSchedule (53-122), AbsoluteHyperRectangle (39-49), VariableReplacementScheme (126-158), HyperRectangle (24-35), calculateFlatOffsetInBytes (239-242)
Deeploy/TilingExtension/TileConstraint.py (2)
TileConstraint (19-221), extractBaseAddr (56-74)
Deeploy/TilingExtension/TilerModel.py (3)
TilerModel (34-402), addTensorDimToModel (143-157), getTensorDimVar (131-135)
Deeploy/AbstractDataTypes.py (1)
PointerClass (536-559)
Deeploy/CommonExtensions/DataTypes.py (3)
uint8_t (36-39), uint32_t (48-51), uint16_t (42-45)
Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py (2)
Conv2DTileConstraint (225-435), computeInputCube (320-347)
Deeploy/MemoryLevelExtension/NetworkDeployers/MemoryLevelDeployer.py (1)
Deeploy/DeeployTypes.py (5)
_parseNode (2670-2677), ONNXLayer (1895-2223), NetworkContext (578-1092), optimize (2260-2280), optimize (2310-2329)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
Deeploy/DeeployTypes.py (6)
NodeTemplate (87-229), computeTransientBuffersSize (142-162), NetworkContext (578-1092), hoistTransientBuffers (165-185), hoistTransientBuffer (893-912), lookup (792-824)
DeeployTest/testMVP.py (3)
Deeploy/DeeployTypes.py (1)
NetworkDeployer (3258-3635)
DeeployTest/testUtils/platformMapping.py (1)
mapDeployer (92-261)
DeeployTest/testSchedulingExtension.py (1)
_mockScheduler(29-33)
🪛 LanguageTool
CHANGELOG.md
[style] ~31-~31: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... conv 2D, and reduce mean operations. - FP PULP DW conv lowering optimization pass...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 Ruff (0.14.1)
Deeploy/Targets/Neureka/TileConstraints/NeurekaPointwiseConstraint.py
233-233: Undefined name absoluteCube
(F821)
Deeploy/DeeployTypes.py
401-401: Redefinition of unused sizeInBytes from line 390
(F811)
447-447: Redefinition of unused sizeInBytes from line 444
(F811)
456-456: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
456-456: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
Deeploy/Targets/Neureka/TileConstraints/NeurekaDenseConstraint.py
263-263: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
263-263: Undefined name inputRequantCubes
(F821)
281-281: Local variable dilation is assigned to but never used
Remove assignment to unused variable dilation
(F841)
289-289: Local variable inputChannelVar is assigned to but never used
Remove assignment to unused variable inputChannelVar
(F841)
294-294: Local variable outputChannelVar is assigned to but never used
Remove assignment to unused variable outputChannelVar
(F841)
317-317: Local variable weightBuffer is assigned to but never used
Remove assignment to unused variable weightBuffer
(F841)
320-320: Local variable inputBatchVar is assigned to but never used
Remove assignment to unused variable inputBatchVar
(F841)
325-325: Local variable outputBatchVar is assigned to but never used
Remove assignment to unused variable outputBatchVar
(F841)
326-326: Local variable outputHeightVar is assigned to but never used
Remove assignment to unused variable outputHeightVar
(F841)
327-327: Local variable outputWidthVar is assigned to but never used
Remove assignment to unused variable outputWidthVar
(F841)
328-328: Local variable outputChannelVar is assigned to but never used
Remove assignment to unused variable outputChannelVar
(F841)
331-331: Local variable padding is assigned to but never used
Remove assignment to unused variable padding
(F841)
413-413: Unpacked variable BatchOffset is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
413-413: Unpacked variable HOffset is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
413-413: Unpacked variable WOffset is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
413-413: Unpacked variable COffset is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
414-414: Unpacked variable BatchSize is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
426-426: Unpacked variable inBSize is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
426-426: Unpacked variable inHSize is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
522-522: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py
122-122: Undefined name BSize
(F821)
133-133: Consider (BatchOffset, *AMatrixOffsets) instead of concatenation
Replace with (BatchOffset, *AMatrixOffsets)
(RUF005)
133-133: Undefined name BatchOffset
(F821)
133-133: Consider (BatchSize, *AMatrixShape) instead of concatenation
Replace with (BatchSize, *AMatrixShape)
(RUF005)
136-139: Consider (BatchOffset, BOffset, *AMatrixOffsets) instead of concatenation
Replace with (BatchOffset, BOffset, *AMatrixOffsets)
(RUF005)
137-137: Undefined name BatchOffset
(F821)
138-138: Undefined name BOffset
(F821)
140-143: Consider (BatchSize, BSize, *AMatrixShape) instead of concatenation
Replace with (BatchSize, BSize, *AMatrixShape)
(RUF005)
142-142: Undefined name BSize
(F821)
149-149: Consider (BatchOffset, *BMatrixOffsets) instead of concatenation
Replace with (BatchOffset, *BMatrixOffsets)
(RUF005)
149-149: Undefined name BatchOffset
(F821)
149-149: Consider (BatchSize, *BMatrixShape) instead of concatenation
Replace with (BatchSize, *BMatrixShape)
(RUF005)
152-155: Consider (BatchOffset, BOffset, *BMatrixOffsets) instead of concatenation
Replace with (BatchOffset, BOffset, *BMatrixOffsets)
(RUF005)
153-153: Undefined name BatchOffset
(F821)
154-154: Undefined name BOffset
(F821)
156-159: Consider (BatchSize, BSize, *BMatrixShape) instead of concatenation
Replace with (BatchSize, BSize, *BMatrixShape)
(RUF005)
158-158: Undefined name BSize
(F821)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
305-306: Expected an indented block after if statement
(invalid-syntax)
310-310: Expected an indented block after if statement
(invalid-syntax)
Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py
132-132: Undefined name BSize
(F821)
151-151: Consider (BatchOffset, *AMatrixOffsets) instead of concatenation
Replace with (BatchOffset, *AMatrixOffsets)
(RUF005)
151-151: Undefined name BatchOffset
(F821)
151-151: Consider (BatchSize, *AMatrixShape) instead of concatenation
Replace with (BatchSize, *AMatrixShape)
(RUF005)
154-157: Consider (BatchOffset, BOffset, *AMatrixOffsets) instead of concatenation
Replace with (BatchOffset, BOffset, *AMatrixOffsets)
(RUF005)
155-155: Undefined name BatchOffset
(F821)
156-156: Undefined name BOffset
(F821)
158-161: Consider (BatchSize, BSize, *AMatrixShape) instead of concatenation
Replace with (BatchSize, BSize, *AMatrixShape)
(RUF005)
160-160: Undefined name BSize
(F821)
167-167: Consider (BatchOffset, *BMatrixOffsets) instead of concatenation
Replace with (BatchOffset, *BMatrixOffsets)
(RUF005)
167-167: Undefined name BatchOffset
(F821)
167-167: Consider (BatchSize, *BMatrixShape) instead of concatenation
Replace with (BatchSize, *BMatrixShape)
(RUF005)
170-173: Consider (BatchOffset, BOffset, *BMatrixOffsets) instead of concatenation
Replace with (BatchOffset, BOffset, *BMatrixOffsets)
(RUF005)
171-171: Undefined name BatchOffset
(F821)
172-172: Undefined name BOffset
(F821)
174-177: Consider (BatchSize, BSize, *BMatrixShape) instead of concatenation
Replace with (BatchSize, BSize, *BMatrixShape)
(RUF005)
176-176: Undefined name BSize
(F821)
370-370: Consider (BatchOffset, *AMatrixOffsets) instead of concatenation
Replace with (BatchOffset, *AMatrixOffsets)
(RUF005)
370-370: Undefined name BatchOffset
(F821)
370-370: Consider (BatchSize, *AMatrixShape) instead of concatenation
Replace with (BatchSize, *AMatrixShape)
(RUF005)
373-376: Consider (BatchOffset, BOffset, *AMatrixOffsets) instead of concatenation
Replace with (BatchOffset, BOffset, *AMatrixOffsets)
(RUF005)
374-374: Undefined name BatchOffset
(F821)
375-375: Undefined name BOffset
(F821)
377-380: Consider (BatchSize, BSize, *AMatrixShape) instead of concatenation
Replace with (BatchSize, BSize, *AMatrixShape)
(RUF005)
379-379: Undefined name BSize
(F821)
386-386: Consider (BatchOffset, *BMatrixOffsets) instead of concatenation
Replace with (BatchOffset, *BMatrixOffsets)
(RUF005)
386-386: Undefined name BatchOffset
(F821)
386-386: Consider (BatchSize, *BMatrixShape) instead of concatenation
Replace with (BatchSize, *BMatrixShape)
(RUF005)
389-392: Consider (BatchOffset, BOffset, *BMatrixOffsets) instead of concatenation
Replace with (BatchOffset, BOffset, *BMatrixOffsets)
(RUF005)
390-390: Undefined name BatchOffset
(F821)
391-391: Undefined name BOffset
(F821)
393-396: Consider (BatchSize, BSize, *BMatrixShape) instead of concatenation
Replace with (BatchSize, BSize, *BMatrixShape)
(RUF005)
395-395: Undefined name BSize
(F821)
405-405: Consider (BatchOffset, *CMatrixOffsets) instead of concatenation
Replace with (BatchOffset, *CMatrixOffsets)
(RUF005)
405-405: Undefined name BatchOffset
(F821)
405-405: Consider (BatchSize, *CMatrixShape) instead of concatenation
Replace with (BatchSize, *CMatrixShape)
(RUF005)
407-407: Consider (BatchOffset, BOffset, *CMatrixOffsets) instead of concatenation
Replace with (BatchOffset, BOffset, *CMatrixOffsets)
(RUF005)
407-407: Undefined name BatchOffset
(F821)
407-407: Undefined name BOffset
(F821)
407-407: Consider (BatchSize, BSize, *CMatrixShape) instead of concatenation
Replace with (BatchSize, BSize, *CMatrixShape)
(RUF005)
407-407: Undefined name BSize
(F821)
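All of the RUF005 findings above share one pattern: building a sequence by concatenation where iterable unpacking is clearer and avoids an intermediate list. A minimal sketch (the shape and batch names are illustrative stand-ins; F821 marks the real symbols as undefined at their use sites):

```python
# Illustrative stand-ins for the flagged symbols.
BatchSize = 4
AMatrixShape = [16, 32]

# Flagged style: list concatenation builds a throwaway intermediate list.
shape_concat = [BatchSize] + AMatrixShape

# Preferred style (RUF005): unpack the iterable directly.
shape_unpacked = (BatchSize, *AMatrixShape)

assert tuple(shape_concat) == shape_unpacked == (4, 16, 32)
```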
Deeploy/Targets/Neureka/TileConstraints/NeurekaDepthwiseConstraint.py
259-259: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
259-259: Undefined name inputRequantCubes
(F821)
277-277: Local variable dilation is assigned to but never used
Remove assignment to unused variable dilation
(F841)
314-314: Local variable weightBuffer is assigned to but never used
Remove assignment to unused variable weightBuffer
(F841)
317-317: Local variable inputBatchVar is assigned to but never used
Remove assignment to unused variable inputBatchVar
(F841)
320-320: Local variable inputChannelVar is assigned to but never used
Remove assignment to unused variable inputChannelVar
(F841)
322-322: Local variable outputBatchVar is assigned to but never used
Remove assignment to unused variable outputBatchVar
(F841)
323-323: Local variable outputHeightVar is assigned to but never used
Remove assignment to unused variable outputHeightVar
(F841)
324-324: Local variable outputWidthVar is assigned to but never used
Remove assignment to unused variable outputWidthVar
(F841)
325-325: Local variable outputChannelVar is assigned to but never used
Remove assignment to unused variable outputChannelVar
(F841)
328-328: Local variable padding is assigned to but never used
Remove assignment to unused variable padding
(F841)
408-408: Unpacked variable BatchOffset is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
408-408: Unpacked variable HOffset is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
408-408: Unpacked variable WOffset is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
408-408: Unpacked variable COffset is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
409-409: Unpacked variable BatchSize is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
421-421: Unpacked variable inBSize is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
421-421: Unpacked variable inHSize is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
517-517: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
Deeploy/MemoryLevelExtension/NetworkDeployers/MemoryLevelDeployer.py
131-131: Undefined name ONNXLayer
(F821)
Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py
25-25: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
Deeploy/Targets/PULPOpen/Deployer.py
40-40: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
DeeployTest/testMVP.py
58-58: Unused function argument: verbose
(ARG001)
🔇 Additional comments (29)
.github/workflows/ci-platform-siracusa.yml (1)
57-75: Test additions well-integrated and properly organized. The new kernel tests (lines 57–63 for Float2D convolutions with and without DW variants, line 75 for reshape with skip connection) and the model test (line 99 for TinyViT demo) follow established naming conventions, are logically grouped by operation type, and align with the PR's stated objectives to add FP test coverage.
Based on the established pattern (documented in learnings), these test names map to directories in DeeployTest/Tests/, and the prior review confirmed these directories exist and tests pass successfully.
Also applies to: 99-99
CHANGELOG.md (1)
100-103: Approve changelog entries for bug fixes. The fixed section comprehensively documents the four key bug fixes (alias_of parameter handling, non-batched FP GEMM/matmul, closure naming, and FP add/mul data types) in alignment with the PR objectives. The descriptions are clear and concise.
Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (1)
35-35: LGTM! The storage of n_cores is straightforward and correct.
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
127-131: Nice performance improvement. The refactored implementation using iterative construction is O(n) instead of O(n²) with repeated .index() calls, and the logic is clearer.
338-383: New DW convolution passes look well-structured. The NCHWtoNHWCDwConvPass and PULPFPDWConvPass classes follow the established pattern-based pass architecture correctly. However, they depend on _PULPDWNCHWtoNHWC_fun, which has a syntax error that must be fixed first.
Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)
13-14: LGTM: MIN casts prevent signed/unsigned mixups. Casting ${size} to uint32_t here is correct and avoids UB on MIN comparisons.
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (2)
100-117: Im2Col call argument order looks correct. X/Y, kernel, and stride ordering matches the expected implementation for these templates.
Based on learnings.
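As context for the scratch-buffer discussion in these templates, the per-core im2col sizing can be sketched as a rough model; the parameter names below are assumptions for illustration, not the template's verbatim symbols:

```python
def im2col_buffer_bytes(type_width_bits: int, n_cores: int,
                        ch_im_in: int, kx: int, ky: int) -> int:
    # Each core needs ch_im_in * kx * ky elements; the full buffer holds
    # one such slice per core, each element (type_width_bits / 8) bytes wide.
    return (type_width_bits // 8) * n_cores * ch_im_in * kx * ky

# fp32 (32-bit) on 8 cores, 16 input channels, 3x3 kernel:
print(im2col_buffer_bytes(32, 8, 16, 3, 3))  # 4 * 8 * 16 * 9 = 4608 bytes
```

The kernel then offsets into this buffer by element count per core, so the byte total and the per-core slices line up exactly for fp32.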
21-24: Conv2D Im2Col scratch sizing formula is correct. The formula matches the C kernel's pContextBuffer requirement. The buffer is correctly sized as (typeWidth // 8) * n_cores * ch_im_in * kx * ky bytes, which corresponds to the kernel logic where each core gets ch_im_in * kernel_x * kernel_y float elements, split across n_cores with pointer arithmetic advancing by element count. For fp32 (typeWidth = 32), this yields the exact buffer needed: n_cores * ch_im_in * kx * ky * 4 bytes.
Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1)
164-173: Review comment is incorrect: prior cleanup was not completed. The first BCube construction block (lines 146-160) still exists as dead code. It is never appended to inputBCubes; only the second BCube constructed at line 172 is appended (line 173). The review comment incorrectly asserts that "earlier deleted branch already tried to build BCube," but this branch was never deleted. While lines 164-173 correctly represent the single active path, the orphaned BCube construction at lines 146-160 contradicts the cleanup claim.
Likely an incorrect or invalid review comment.
Deeploy/Targets/PULPOpen/Parsers.py (3)
80-82: FP Conv2D: bias arity check LGTM. Allowing 2 or 3 inputs at parse time is correct and aligns with bias support.
105-117: FP Conv2D: bias flags wiring is consistent. Setting has_bias ("true"/"false") and mapping bias when present matches existing templates.
Please confirm FloatConvTemplate.reference2DIm2ColTemplate expects:
- keys: data_in, weight, data_out, has_bias, bias.
123-190: Remove suggestion to validate DW early in parseNode; current design is sound. The review comment suggests early depthwise validation in parseNode, but this is architecturally infeasible. The parseNode method receives only the node (without context), so ch_im_in, needed to verify group == ch_im_in, is unavailable. Input channel dimensions are only accessible in parseNodeCtxt via ctxt.lookup(), which is where the DW check correctly occurs (line ~167). The mention of "F_total % C == 1" references an undefined formula not present anywhere in the codebase.
The current two-stage design (attribute validation in parseNode, context validation in parseNodeCtxt) correctly reflects the available information at each stage and is the proper approach.
Likely an incorrect or invalid review comment.
Deeploy/Targets/PULPOpen/Bindings.py (5)
12-19: Import expansion LGTM. Float data types and new templates are correctly imported for the added FP bindings.
151-152: DMA Slice: template swap LGTM. Switch to DMASliceTemplate.referenceTemplate is consistent with DMA path.
163-172: Slice bindings for FP types: OK. SliceChecker + SliceTemplate with ForkTransformer over FloatDataTypes matches the Float path.
299-303: Float ReduceMean bindings: OK. Mixed float + integer axes signature bound to FloatReduceMeanTemplate with ClusterTransformer is consistent with parser behavior for FP opset.
239-245: FP DW Conv2D binding confirmed: template keys match parser. Verified that PULPFPDWConv2DParser.parseNodeCtxt() produces both has_bias and bias keys in operatorRepresentation (Parsers.py:175–176, 179), which are correctly expected by FloatConvTemplate.referenceDW2DIm2ColTemplate (FloatConvTemplate.py:81, 110, 142). The binding is correct.
Deeploy/TilingExtension/TilerExtension.py (1)
988-989: Review comment is incorrect: no bug exists. NetworkContext.copy() uses copy.copy(self), which is a shallow copy that preserves instance attributes like n_cores (they remain in the object's __dict__). _convertCtxtToStaticSchedule returns the same ctxt instance modified in-place, so n_cores naturally survives. Line 988's reassignment is defensive programming but not required for correctness.
TargetLibraries/PULPOpen/src/Convolution_fp32.c (1)
36-39: Check argument order consistency at call sites. Kernels accept (H, W, P, Q, SP, SQ); FloatConvTemplate uses X,Y ordering per prior notes. Please confirm bindings pass the correct order to avoid transposed dims.
Based on learnings
Also applies to: 134-138
Deeploy/DeeployTypes.py (1)
1205-1210: Constructor argument mismatch in parseOutputs. Use aliases=[] instead of alias_of=[] to match VariableBuffer.__init__.
Apply:
- nb = ctxt.VariableBuffer(
-     name = name,
-     shape = node.shape,
-     alias_of = [],
- )
+ nb = ctxt.VariableBuffer(
+     name = name,
+     shape = node.shape,
+     aliases = [],
+ )
Likely an incorrect or invalid review comment.
DeeployTest/testMVP.py (3)
84-92: LGTM! The n_cores parameter is correctly propagated to mapDeployer, and the formatting is clean and consistent.
201-205: LGTM! The CLI argument is well-defined with a sensible default and clear help text. The parameter correctly flows through to the deployment pipeline.
256-262: LGTM! The n_cores parameter is correctly passed from CLI arguments to setupDeployer, completing the propagation chain through the deployment pipeline.
Deeploy/Targets/PULPOpen/Platform.py (6)
8-9: LGTM! All new imports are properly structured and used within the file. The additions support FP depthwise convolution, slice operations, and the new optimization pass.
Also applies to: 32-32, 35-36
77-77: LGTM! The FPDWConv2DMapper definition follows the established pattern and correctly pairs the FP depthwise conv parser with its bindings.
92-94: LGTM! Both slice mappers correctly use SliceParser with different bindings to support DMA-based and software-based slice implementations. The mapper defined first in the mapping list (line 134) will have precedence during binding selection.
234-235: LGTM! The RemoveEmptyConvBiasPass is correctly positioned near the end of the optimization pipeline, after merge and requantization passes. This placement allows the pass to clean up any empty bias parameters that may have been introduced or remain after earlier transformations.
134-134: No action needed; original review comment is incorrect. The mapper order does not affect precedence. PULPDMASliceBindings operates on IntegerDataTypes while PULPSliceBindings operates on FloatDataTypes; these are mutually exclusive sets. The SliceChecker type constraints determine which binding applies, not the array order. The current ordering [SliceMapper, DMASliceMapper] is safe.
Likely an incorrect or invalid review comment.
108-108: Based on my verification of the parser constraint logic:
FPConv2DMapper matches regular convolutions exclusively:
- PULPFPConv2DParser.parseNode() explicitly requires self.operatorRepresentation['group'] == 1
- This constraint rejects any depthwise convolution (where group == ch_im_in)
FPDWConv2DMapper matches depthwise convolutions:
- PULPFPDWConv2DParser.parseNode() does not have a group==1 constraint
- Depthwise validation occurs in parseNodeCtxt() at line 186: if self.operatorRepresentation['group'] == self.operatorRepresentation['ch_im_in']
The mapper ordering [FPConv2DMapper, FPDWConv2DMapper] is correct: the more constrained parser (requiring group==1) is tried first, ensuring regular convolutions are matched before attempting depthwise matching. Any regular convolution caught by FPConv2DMapper will not reach FPDWConv2DMapper.
…ed closure naming issue
…ffer size computation. Added fixes to remove warnings
…ion. Adjusted memory allocation for the im2col buffer in 2d float conv for PULPOpen (lower requirement, dynamic allocation). Included all Float 2D Conv tests in the CI pipeline for siracusa
…version). Added MatMulAddMergePass to the PULP optimizer. Added float-reshape-skip-connection test to Siracusa CI pipeline. Fixes in the FloatAddTemplate of PULPOpen to eliminate warnings
…lice and reduce mean bindings for Siracusa.
* Use [reuse-tool](https://github.com/fsfe/reuse-tool) for checking license headers. Added a wrapper python script to filter the files so we can manually skip what we don't want reuse to check. Otherwise it complains about every single file.
* Replace LICENSE file with LICENSES directory and Apache-2.0.txt file. Reuse explicitly checks for licenses in the LICENSES directory and doesn't look for it in the LICENSE file.
* Add MIT license for the scripts/run_clang_format.py
* Add missing license headers
* Configure and add CC BY-NC 4.0 license
* Update Readme
* Call reuse via python if not in PATH
* Ignore "TEST_*" directories
* Update CONTRIBUTING.md
---------
Co-authored-by: Philip Wiese <[email protected]>
This pull request introduces several improvements and refactors across the codebase, focusing on enhanced logging, memory tracking, type checking, and developer usability. The most significant changes include replacing print statements with structured logging, adding detailed memory usage tracking for buffers, improving type and level checking for data types, and providing more informative string representations and debugging messages for core classes.
All runners and the generation scripts support different verbosity levels:
- `-v`: Show errors, warnings and info messages
- `-vv`: Show errors, warnings, info and debug messages
- `-vvv`: Maximum verbosity with information about where the message was emitted.
Also enable verbose compilation during the build step for the test runners.
- Logging and Error Handling Improvements: Replaced all `print` statements with structured logging using `DEFAULT_LOGGER` throughout the codebase, improving consistency and enabling better log management.
- Added debug-level logs to type checking and parsing routines to indicate success/failure and binding exhaustion, aiding troubleshooting.
- Implemented `__repr__` methods for core classes (`NodeTemplate`, `NodeBinding`, `NodeMapper`) to provide more informative string representations for debugging and logging.
- Added `checkNumLevels` methods to relevant data type classes to validate that the number of levels assigned to buffers is supported by their type, and integrated warning logs if mismatches are detected during type inference.
- Introduced a `sizeInBytes` method to `VariableBuffer`, `TransientBuffer`, and related classes to standardize buffer size calculation.
- Report the maximum static and dynamic memory usage for all memory levels:
  - Added dynamic and maximum memory usage tracking to `NetworkContext` via `_dynamicSize` and `_maxDynamicSize` dictionaries, and updated memory allocation/deallocation logic in `MemoryAllocation.py` to maintain these statistics per memory level. (This is only used for untiled platforms to track the maximum buffer size.)
  - Exposed new summary methods (`_printMemorySummary`, `_printInputOutputSummary`) in deployer wrappers and implemented input/output summary logging in `SignPropDeployer`.
- Deployer workflow now uses `prepare(...)` instead of `generateFunction(...)`.
98d51f7 to 945b610
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
Deeploy/DeeployTypes.py (2)
241-245: Replace mutable default (shape=[1]) with immutable and type-annotate. Avoid shared mutable defaults.
- def __init__(self, name: str = '', shape = [1], aliases: Optional[List[str]] = None):
+ def __init__(self, name: str = '', shape: Sequence[int] = (1,), aliases: Optional[List[str]] = None):
379-388: Fix visited set creation in has_live_ancestors (bug). set(self.name) splits into characters; use a singleton set.
- visited = set(self.name)
+ visited = {self.name}
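The one-character diff above matters because set(str) iterates the string, while a set literal wraps the whole string as a single element; for illustration:

```python
name = "node_42"

chars = set(name)   # iterates the string: a set of its characters
singleton = {name}  # a one-element set containing the whole string

assert chars == {"n", "o", "d", "e", "_", "4", "2"}
assert singleton == {"node_42"}
assert name in singleton and name not in chars
```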
♻️ Duplicate comments (6)
CHANGELOG.md (1)
23-23: Remove duplicate PR entry (#105). The "Refactor tiling code generation [#105]" line is duplicated. Remove the second occurrence to avoid confusion.
- - Refactor tiling code generation [#105](https://github.com/pulp-platform/Deeploy/pull/105)
Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (1)
25-26: Avoid mutable default for inputOffsets. Use Optional and initialize inside to prevent shared state.
- inputOffsets: Dict[str, int] = {},
+ inputOffsets: Optional[Dict[str, int]] = None,
  n_cores: int = 8):
@@
- if inputOffsets == {}:
-     for key in inputTypes.keys():
-         inputOffsets[key] = 0
+ if inputOffsets is None:
+     inputOffsets = {key: 0 for key in inputTypes.keys()}
Deeploy/Targets/PULPOpen/Deployer.py (1)
40-53: Avoid mutable default for inputOffsets; forward Optional to base. Align with SignPropDeployer's fix.
-from typing import Callable, Dict, List, Type
+from typing import Callable, Dict, List, Optional, Type
@@
- inputOffsets = {},
+ inputOffsets: Optional[Dict[str, int]] = None,
  n_cores: int = 8):
@@
- inputOffsets = inputOffsets,
+ inputOffsets = inputOffsets,
  n_cores = n_cores,
Deeploy/DeeployTypes.py (3)
325-329: Constructor arg mismatch: use 'aliases', not 'alias_of'. This raises a TypeError at runtime.
- return (cls(
-     name = node.name,
-     shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
-     alias_of = [],
- ))
+ return cls(
+     name = node.name,
+     shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
+     aliases = [],
+ )
442-447: Avoid mutable defaults in ConstantBuffer.__init__. Use immutable defaults and convert.
- def __init__(self, name: str = '', shape = [1], values = [0]):
+ def __init__(self, name: str = '', shape: Sequence[int] = (1,), values: Union[Sequence, np.ndarray] = (0,)):
      # Pass a copy of alias_of to avoid shared references
      super().__init__(name, shape)
-     values = np.asarray(values)
+     values = np.asarray(values)
331-365: Align alias helpers with self.aliases (set) and remove alias_of usage. Current code references a non-existent attribute and never updates aliases.
- def add_aliases(self, aliases_to_add: List[str]):
+ def add_aliases(self, aliases_to_add: List[str]):
@@
- if not hasattr(self, "alias_of"):
-     return None
-
- for alias in aliases_to_add:
-     if alias not in self.alias_of:
-         self.alias_of.append(alias)
+ if not hasattr(self, "aliases") or self.aliases is None:
+     self.aliases = set()
+ for alias in aliases_to_add:
+     self.aliases.add(alias)
  return None
@@
- def get_aliases_of(self):
+ def get_aliases_of(self) -> List[str]:
@@
- if hasattr(self, "alias_of"):
-     return list(self.alias_of)
- else:
-     return list()
+ return list(self.aliases) if hasattr(self, "aliases") and self.aliases is not None else []
🧹 Nitpick comments (14)
Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1)
158-159: Consider defensive handling for edge cases in name prefixing. The underscore prefix addresses names starting with numbers as intended. However, the implementation doesn't guard against:
- Empty or None name values
- Names already starting with _ (which would produce __, a reserved identifier)
Consider adding a check:
- # Prepend underscore to avoid name issues when beginning with problematic characters (like numbers)
- self.closureName = "_" + name + self.closureSuffix
+ # Prepend underscore to avoid name issues when beginning with problematic characters (like numbers)
+ # Avoid creating double-underscore if name already starts with underscore
+ prefix = "" if name and name[0] == "_" else "_"
+ self.closureName = prefix + name + self.closureSuffix
Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1)
11-21: LGTM! Correct fix for non-batched input handling. The conditional logic properly addresses the bug by only advancing input pointers when the corresponding input is actually batched. When A_batched or B_batched is false, the base pointer is reused across all batch iterations, which correctly implements broadcasting semantics.
The pointer arithmetic for the batched case is correct:
- A offset: b * M * N for shape [batch, M, N]
- B offset: b * N * O for shape [batch, N, O]
Minor: Line 17 has trailing whitespace after the colon that could be cleaned up.
- % if B_batched:
+ % if B_batched:
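The batched-offset arithmetic above can be checked numerically; a small sketch (the helper name is illustrative, not from the template):

```python
def base_offset(b: int, rows: int, cols: int, batched: bool) -> int:
    # Advance the base pointer only when the operand is batched;
    # otherwise reuse the same matrix every iteration (broadcast).
    return b * rows * cols if batched else 0

M, N, O = 4, 8, 2
assert base_offset(3, M, N, batched=True) == 3 * M * N   # A: [batch, M, N]
assert base_offset(3, N, O, batched=True) == 3 * N * O   # B: [batch, N, O]
assert base_offset(3, N, O, batched=False) == 0          # non-batched B reused
```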
1071-1074: Consider addingstrict=Truetozip()calls for defensive programming.The
zip()calls iterate over inputs and outputs with expected fixed-length lists. WhileparseNodealready validates these lengths, addingstrict=True(Python 3.10+) provides an explicit safety guarantee and catches bugs if the validation logic changes.Apply this diff if Python 3.10+ is used:
- for tensor, symName in zip(node.inputs, ['data_in', 'shape']): + for tensor, symName in zip(node.inputs, ['data_in', 'shape'], strict=True): self.operatorRepresentation[symName] = ctxt.lookup(tensor.name).name - for tensor, symName in zip(node.outputs, ['data_out']): + for tensor, symName in zip(node.outputs, ['data_out'], strict=True): self.operatorRepresentation[symName] = ctxt.lookup(tensor.name).nameBased on static analysis hints.
Deeploy/TilingExtension/TilerExtension.py (1)
988-989: Propagate n_cores earlier (and guard against missing attribute). Set ctxt.n_cores when the context is created (or before tiling) to avoid surprises, and guard for attribute presence to prevent AttributeError if wrappers differ.
Example:
- self.ctxt = self.tiler._convertCtxtToStaticSchedule(self.ctxt, memoryMap)
- self.ctxt.n_cores = self.n_cores
+ self.ctxt = self.tiler._convertCtxtToStaticSchedule(self.ctxt, memoryMap)
+ if hasattr(self, "n_cores"):
+     self.ctxt.n_cores = self.n_cores
Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (1)
35-35: Drop redundant self.n_cores assignment. NetworkDeployer already stores n_cores; remove this duplicate to keep a single source of truth.
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (4)
43-51: Fix return type annotation to match base (Union[int, IntVar]). The function returns an int, but is annotated as str. Align with NodeTemplate's signature.
- def computeTransientBuffersSize(ctxt: NetworkContext,
-                                 operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, str]]:
+ def computeTransientBuffersSize(
+     ctxt: NetworkContext,
+     operatorRepresentation: OperatorRepresentation,
+ ) -> List[Tuple[str, Union[int, IntVar]]]:
46-51: Update stale comment about using size as string. The returned size is numeric and also passed to hoistTransientBuffer; the "used as string" warning is misleading. Remove or clarify it.
59-65: Prefer annotateType over mutating referencedType. Setting _type.referencedType mutates the class and can have side effects. Use annotateType to set a precise Pointer type and instance.
- # Manually set the type of the im2col buffer to match the input type, since it defaults to void for transient buffers
- ctxt.lookup(im2col_name)._type.referencedType = ctxt.lookup(
-     operatorRepresentation['data_in'])._type.referencedType
+ # Set the buffer pointer type to match data_in
+ ctxt.annotateType(im2col_name, ctxt.lookup(operatorRepresentation['data_in'])._type)
93-101: Nit: fix comment typos ("ChannelOout" → "ChannelOut"). Purely cosmetic; improves readability.
Also applies to: 124-131
Deeploy/DeeployTypes.py (1)
573-576: Avoid mutable dict defaults in NetworkContext.__init__. Initialize dicts inside to prevent shared state.
- globalObjects = {},
- localObjects = {},
+ globalObjects: Optional[OrderedDict] = None,
+ localObjects: Optional[OrderedDict] = None,
@@
- self.globalObjects = OrderedDict()
- self.localObjects = OrderedDict()
+ self.globalObjects = OrderedDict() if globalObjects is None else globalObjects
+ self.localObjects = OrderedDict() if localObjects is None else localObjects
DeeployTest/testUtils/platformMapping.py (2)
92-104: Use int for n_cores (not Optional) since default is numeric.
- n_cores: Optional[int] = 8,
+ n_cores: int = 8,
114-188: Propagate n_cores to all SignProp-based deployers for consistency. CMSIS, MemPool, SoftHier, Generic deployers inherit SignPropDeployer and accept n_cores. Pass it through so CLI behavior is uniform.
Example for each block:
- deployer = CMSISDeployer(...,
-     deeployStateDir = deeployStateDir,
-     inputOffsets = inputOffsets)
+ deployer = CMSISDeployer(...,
+     deeployStateDir = deeployStateDir,
+     inputOffsets = inputOffsets,
+     n_cores = n_cores)
Apply similarly to MemPoolDeployer, SoftHierDeployer, GenericDeployer.
Also applies to: 140-188
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)
50-148: Code duplication acceptable for performance. The has_bias and no-bias paths are duplicated (lines 50-148 vs 149-248), which increases maintainability burden but avoids branching inside tight loops. This trade-off is reasonable for performance-critical embedded kernels.
Consider adding a comment noting this is intentional for performance if future maintainers question the duplication.
TargetLibraries/PULPOpen/src/Convolution_fp32.c (1)
41-101: Code duplication acceptable for performance. The has_bias and no-bias code paths are duplicated to avoid conditional branching inside the innermost loops (lines 47-64). This is a common optimization in performance-critical embedded code where avoiding branches can significantly improve execution speed.
Similar duplication exists in the im2col path (lines 140-218). Consider adding a brief comment explaining this trade-off for future maintainers.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (24)
.github/workflows/ci-platform-siracusa.yml (3 hunks)
CHANGELOG.md (4 hunks)
Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (2 hunks)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1 hunks)
Deeploy/DeeployTypes.py (12 hunks)
Deeploy/Targets/Generic/Parsers.py (1 hunks)
Deeploy/Targets/PULPOpen/Bindings.py (6 hunks)
Deeploy/Targets/PULPOpen/Deployer.py (1 hunks)
Deeploy/Targets/PULPOpen/Parsers.py (2 hunks)
Deeploy/Targets/PULPOpen/Platform.py (7 hunks)
Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1 hunks)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (5 hunks)
Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (1 hunks)
Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1 hunks)
Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1 hunks)
Deeploy/TilingExtension/TilerExtension.py (1 hunks)
DeeployTest/testMVP.py (4 hunks)
DeeployTest/testRunner_tiled_siracusa.py (1 hunks)
DeeployTest/testUtils/platformMapping.py (2 hunks)
DeeployTest/testUtils/testRunner.py (3 hunks)
TargetLibraries/PULPOpen/inc/kernel/Conv.h (1 hunks)
TargetLibraries/PULPOpen/src/Convolution_fp32.c (4 hunks)
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py
- Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
- Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py
- DeeployTest/testUtils/testRunner.py
- Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py
- DeeployTest/testRunner_tiled_siracusa.py
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-09-24T12:53:00.168Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: TargetLibraries/PULPOpen/src/DWConvolution_fp32.c:0-0
Timestamp: 2025-09-24T12:53:00.168Z
Learning: In PULP platform code, log2 is defined in pulp_nn_utils and can be used safely without requiring math.h or causing floating-point build/runtime issues, unlike the standard library log2 function.
Applied to files:
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
Applied to files:
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
TargetLibraries/PULPOpen/inc/kernel/Conv.h
TargetLibraries/PULPOpen/src/Convolution_fp32.c
📚 Learning: 2025-09-24T11:43:47.236Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: .github/workflows/ci-platform-siracusa.yml:57-60
Timestamp: 2025-09-24T11:43:47.236Z
Learning: In the Deeploy test system, test names in CI workflows correspond to directory names under DeeployTest/Tests/, not Python function names. The TestRunner class executes tests by passing directory paths via the `-t` argument, where each directory contains test configurations and definitions.
Applied to files:
.github/workflows/ci-platform-siracusa.yml
📚 Learning: 2025-09-24T12:17:21.624Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:46-0
Timestamp: 2025-09-24T12:17:21.624Z
Learning: In Deeploy's PULP templates, transient buffer size calculation can return element counts as strings from computeTransientBuffersSize(), and then manually set the buffer type in hoistTransientBuffers() using ctxt.lookup(buffer_name)._type.referencedType = input_type. The allocation system automatically multiplies the element count by the element size when the buffer type is properly set, achieving correct byte allocation.
Applied to files:
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
🧬 Code graph analysis (9)
Deeploy/Targets/Generic/Parsers.py (1)
  Deeploy/DeeployTypes.py (2)
    inputs (2569-2586), lookup (778-810)
Deeploy/Targets/PULPOpen/Parsers.py (2)
  Deeploy/DeeployTypes.py (5)
    inputs (2569-2586), lookup (778-810), parseNode (1091-1106), parseNodeCtxt (1109-1134), NetworkContext (564-1078)
  Deeploy/Targets/Generic/Parsers.py (31)
    Conv2DParser (1220-1286), parseNode (19-27, 49-58, 81-94, 102-112, 163-169, 193-207, 229-243, 271-293, 333-351, 373-385, 422-434, 469-473, 496-507, 531-542, 578-582), parseNodeCtxt (29-41, 60-76, 114-155, 171-185, 209-221, 245-263, 295-325, 353-365, 387-414, 436-461, 475-488, 509-523, 544-570, 584-590, 603-620)
DeeployTest/testMVP.py (3)
  Deeploy/DeeployTypes.py (1)
    NetworkDeployer (3244-3629)
  DeeployTest/testUtils/platformMapping.py (1)
    mapDeployer (92-261)
  DeeployTest/testSchedulingExtension.py (1)
    _mockScheduler (29-33)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
  Deeploy/DeeployTypes.py (6)
    NodeTemplate (87-229), computeTransientBuffersSize (142-162), NetworkContext (564-1078), hoistTransientBuffers (165-185), hoistTransientBuffer (879-898), lookup (778-810)
Deeploy/Targets/PULPOpen/Bindings.py (4)
  Deeploy/CommonExtensions/DataTypes.py (4)
    float32_t (74-78), int8_t (12-15), int32_t (24-27), uint8_t (36-39)
  Deeploy/DeeployTypes.py (3)
    CodeTransformation (2352-2386), NodeBinding (1574-1719), NodeTemplate (87-229)
  Deeploy/Targets/Generic/TypeCheckers.py (2)
    SliceChecker (37-51), ReduceMeanChecker (310-324)
  Deeploy/AbstractDataTypes.py (1)
    PointerClass (536-559)
DeeployTest/testUtils/platformMapping.py (2)
  Deeploy/DeeployTypes.py (3)
    DeploymentPlatform (2439-2482), TopologyOptimizer (2237-2266), NetworkDeployer (3244-3629)
  Deeploy/Targets/PULPOpen/Deployer.py (1)
    PULPDeployer (29-127)
Deeploy/DeeployTypes.py (2)
  DeeployTest/testUtils/dmaUtils.py (1)
    size (72-73)
  Deeploy/AbstractDataTypes.py (1)
    Pointer (303-388)
TargetLibraries/PULPOpen/inc/kernel/Conv.h (2)
  TargetLibraries/PULPOpen/src/Convolution_fp32.c (2)
    PULP_Conv2d_fp32_fp32_fp32_HWC (10-102), PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC (104-219)
  TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)
    PULP_DW_Conv2d_Im2Col_fp32_fp32_fp32_HWC (10-251)
Deeploy/Targets/PULPOpen/Platform.py (4)
  Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
    RemoveEmptyConvBiasPass (519-524)
  Deeploy/Targets/PULPOpen/Parsers.py (1)
    PULPFPDWConv2DParser (124-190)
  Deeploy/DeeployTypes.py (1)
    NodeMapper (1722-1878)
  Deeploy/Targets/Generic/Parsers.py (1)
    SliceParser (97-155)
🪛 LanguageTool
CHANGELOG.md
[style] ~32-~32: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... conv 2D, and reduce mean operations. - FP PULP DW conv lowering optimization pass...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 Ruff (0.14.1)
Deeploy/Targets/Generic/Parsers.py
1071-1071: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
1073-1073: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
DeeployTest/testMVP.py
58-58: Unused function argument: verbose
(ARG001)
Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py
25-25: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
Deeploy/Targets/PULPOpen/Deployer.py
40-40: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
🔇 Additional comments (18)
.github/workflows/ci-platform-siracusa.yml (1)
57-64: CI test coverage additions align well with PR features. The new tests for FP 2D convolution (standard and depthwise variants, with and without bias), reshape with skip connection, and TinyViT demo align cleanly with the PR's feature additions. Test naming is consistent with existing patterns, and per the earlier review resolution, these tests exist as directories in DeeployTest/Tests/. The additions integrate smoothly into the existing kernel and model test lists without logic changes or structural issues.
Also applies to: 75-75, 99-99
Deeploy/Targets/Generic/Parsers.py (1)
1070-1076: LGTM! Clear and Pythonic mapping implementation. The explicit mapping of inputs to ['data_in', 'shape'] and outputs to ['data_out'] improves clarity and follows the pattern established by other parsers in the codebase (e.g., iRMSNormParser, lines 65-71). The use of zip() is more Pythonic than index-based access.
Deeploy/DeeployTypes.py (2)
879-899: Ensure the hoistTransientBuffer size is always int. If any templates pass symbolic strings, they'll break here. Confirm all computeTransientBuffersSize implementations return int/IntVar as per the signature.
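As a quick illustration of the contract the comment asks for, a size guard can reject symbolic strings up front (a hypothetical sketch; `hoist_transient_buffer` is a stand-in name, not Deeploy's actual implementation):

```python
def hoist_transient_buffer(name: str, size) -> int:
    # Reject symbolic/string sizes early instead of failing later
    # during allocation; this sketch only accepts plain ints.
    if not isinstance(size, int):
        raise TypeError(
            f"Transient buffer '{name}' needs an int size, got {type(size).__name__}")
    return size

# A per-core im2col element count is a plain int, so it passes:
im2col_elems = 8 * 3 * 3 * 16  # n_cores * kernel_y * kernel_x * ch_im_in
assert hoist_transient_buffer("im2col", im2col_elems) == 1152
```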
2696-2702: Good: preserve n_cores on context reinit. Reconstructing NetworkContext with n_cores=self.ctxt.n_cores avoids losing state on parse.
Deeploy/Targets/PULPOpen/Parsers.py (3)
80-81: LGTM! Bias support correctly integrated. The relaxed input count check now allows an optional bias (2 inputs without bias, 3 inputs with bias), which aligns with the bias handling added in parseNodeCtxt.
106-118: LGTM! Clean bias handling implementation. The conditional bias handling correctly maps inputs based on the node's input count, setting appropriate flags for downstream code generation.
124-191: LGTM! Well-structured depthwise convolution parser. The parser correctly validates depthwise convolution constraints:
- Square padding requirement
- Optional bias support (2 or 3 inputs)
- Depthwise verification via the group == ch_im_in check (line 187)
The structure mirrors the regular FP Conv2D parser while adding depthwise-specific validation.
DeeployTest/testMVP.py (1)
57-59: LGTM! Clean n_cores parameter propagation. The n_cores parameter is consistently threaded through the deployment setup:
- Added to the setupDeployer signature
- Passed to mapDeployer
- Exposed via a CLI argument (default: 8)
- Propagated from parsed arguments
This enables proper sizing of per-core buffers (e.g., im2col buffers) in multi-core PULP deployments.
Also applies to: 84-92, 201-205, 261-262
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (2)
10-36: LGTM! Proper per-core work distribution. The function correctly partitions output channels across cores:
- Computes per-core channel chunks with remainder handling
- Early-exits when no channels are assigned to a core
- Adjusts weight and im2col buffer pointers per core
This enables efficient parallel execution on multi-core PULP clusters.
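The chunk-plus-remainder partitioning described above can be sketched as follows (illustrative only; one common scheme, with assumed names rather than the kernel's actual identifiers):

```python
def core_channel_range(core_id: int, n_cores: int, ch_total: int):
    """Return the [start, stop) output-channel range for one core.

    Channels divide as evenly as possible: the first `remainder`
    cores each take one extra channel. An empty range (start == stop)
    corresponds to the early-exit case in the kernel.
    """
    chunk, remainder = divmod(ch_total, n_cores)
    start = core_id * chunk + min(core_id, remainder)
    stop = start + chunk + (1 if core_id < remainder else 0)
    return start, stop

# 10 channels over 8 cores: cores 0-1 get 2 channels, cores 2-7 get 1
ranges = [core_channel_range(c, 8, 10) for c in range(8)]
assert ranges[0] == (0, 2) and ranges[7] == (9, 10)
assert [b - a for a, b in ranges] == [2, 2, 1, 1, 1, 1, 1, 1]
```

The weight and im2col pointer adjustments then simply offset by `start` times the per-channel stride.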
103-145: LGTM! Correct depthwise convolution logic. The input channel iteration (lines 103-104) correctly computes the range based on the output channels assigned to the core, accounting for the depthwise constraint where each input channel produces F_total/C output channels (the "group" parameter in ONNX Conv).
The output channel range computation (lines 118-130) properly clamps to the core's assigned range, ensuring correct work distribution.
Deeploy/Targets/PULPOpen/Bindings.py (3)
151-151: LGTM! Clear separation of DMA and non-DMA slice operations. The update clarifies the distinction:
- PULPDMASliceBindings (line 151): uses DMASliceTemplate for asynchronous DMA-based slicing returning a Future
- PULPSliceBindings (lines 163-172): uses SliceTemplate for synchronous float slicing with ForkTransformer
This separation enables appropriate code transformation for each use case.
Also applies to: 163-172
239-245: LGTM! Depthwise convolution bindings correctly configured. The new PULPFloatDWConv2DBindings:
- Support all float types via FloatDataTypes iteration
- Use FloatConvTemplate.referenceDW2DIm2ColTemplate (the new DW im2col template)
- Apply ForkTransformer for multi-core execution
This aligns with the new PULPFPDWConv2DParser and enables FP depthwise convolution support.
298-302: LGTM! Extended ReduceMean support for mixed float/integer inputs. The addition supports ReduceMean operations with float data and integer axes/keepdims parameters, using FloatReduceMeanTemplate. This is a natural extension of the existing integer-only ReduceMean bindings.
Deeploy/Targets/PULPOpen/Platform.py (2)
8-9: LGTM! Proper wiring of new depthwise and slice capabilities. The platform changes correctly integrate the new functionality:
- FPDWConv2DMapper (line 77) combines the new parser and bindings for FP depthwise convolution
- The Conv layer (line 108) now supports both FPConv2DMapper and FPDWConv2DMapper
- The Slice operation (line 134) supports both regular and DMA-based mappers
- All necessary imports added for new parsers, bindings, and optimization passes
This completes the integration stack for FP depthwise convolution on PULP.
Also applies to: 32-32, 35-36, 77-77, 92-94, 108-108, 134-134
234-235: LGTM! RemoveEmptyConvBiasPass addition is appropriate. Adding RemoveEmptyConvBiasPass to the PULPOptimizer makes sense when introducing bias support. This pass likely removes unnecessary bias parameters when the bias tensor is empty or zero, optimizing the generated code.
The placement after PULPAddRequantMergePass is reasonable in the optimization pass sequence.
TargetLibraries/PULPOpen/inc/kernel/Conv.h (1)
12-27: LGTM! Header declarations consistent with implementations. The function signatures correctly extend the existing FP32 convolution API:
- PULP_Conv2d_fp32_fp32_fp32_HWC (lines 12-18): added pSrcBias and has_bias parameters
- PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC (lines 20-27): added bias parameters and pContextBuffer
- PULP_DW_Conv2d_Im2Col_fp32_fp32_fp32_HWC (lines 29-36): new entry point for depthwise im2col convolution
All declarations match the implementations in Convolution_fp32.c and DWConvolution_fp32.c. Based on learnings.
Also applies to: 29-36
TargetLibraries/PULPOpen/src/Convolution_fp32.c (2)
10-34: LGTM! Proper multi-core parallelization added. The updated function now includes:
- Bias parameter support (pSrcBias, has_bias)
- Per-core output channel distribution (lines 18-31)
- Core-specific weight pointer adjustment (line 34)
- Early exit for cores with no assigned channels (lines 29-31)
This enables efficient parallel execution while maintaining correctness. Based on learnings.
66-67: LGTM! Correct per-core output indexing and bias application. The output indexing (lines 66-67, 96-97, 173, 212-214) correctly accounts for:
- Global output index calculation: (h_out * W_out + w_out) * F_total + ...
- Core-specific channel offset: ch_out_start + f, or just f (depending on the loop variable)
- Bias application using the global channel index (line 67: pSrcBias[f + ch_out_start], line 175: pSrcBias[f])
This ensures correct results when work is distributed across multiple cores.
Also applies to: 175-175
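The indexing formula quoted in this comment can be checked numerically; a minimal sketch (function and parameter names are illustrative, not the kernel's identifiers):

```python
def out_index(h_out, w_out, f, ch_out_start, F_total, W_out):
    # Global HWC output index for the element this core writes:
    # row-major over (h, w), channels innermost, with the core's
    # channel offset folded in.
    return (h_out * W_out + w_out) * F_total + ch_out_start + f

# Two cores splitting F_total = 4 channels (2 each) must write
# disjoint, adjacent indices within one (h, w) position.
W_out, F_total = 5, 4
idx_core0 = {out_index(1, 2, f, 0, F_total, W_out) for f in range(2)}
idx_core1 = {out_index(1, 2, f, 2, F_total, W_out) for f in range(2)}
assert idx_core0 == {28, 29} and idx_core1 == {30, 31}
assert idx_core0.isdisjoint(idx_core1)
```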
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
Deeploy/DeeployTypes.py (2)
241-241: Complete the mutable default fix for the shape parameter. The aliases parameter correctly uses None, but shape = [1] is still a mutable default. This was flagged in past reviews and by static analysis. Apply this diff:
-    def __init__(self, name: str = '', shape = [1], aliases: Optional[List[str]] = None):
+    def __init__(self, name: str = '', shape: Optional[Sequence[int]] = None, aliases: Optional[List[str]] = None):
         self.name: str = name
-        self.shape: Sequence[int] = shape
+        self.shape: Sequence[int] = shape if shape is not None else (1,)
As per past review feedback and static analysis (Ruff B006).
437-439: Fix mutable default arguments in ConstantBuffer.__init__. Both shape = [1] and values = [0] are mutable defaults, flagged by past reviews and static analysis (Ruff B006). Apply this diff:
-    def __init__(self, name: str = '', shape = [1], values = [0]):
-        super().__init__(name, shape)
+    def __init__(self, name: str = '', shape: Optional[Sequence[int]] = None, values: Union[Sequence, np.ndarray, None] = None):
+        super().__init__(name, shape if shape is not None else (1,))
-        values = np.asarray(values)
+        values = np.asarray(values if values is not None else [0])
As per past review feedback and static analysis.
♻️ Duplicate comments (2)
CHANGELOG.md (1)
23-23: Remove duplicate PR reference. Line 23 duplicates the PR #105 entry already present at line 18. Remove the redundant entry to maintain changelog clarity.
- Use Docker digests instead of arch-specific tags [#106](https://github.com/pulp-platform/Deeploy/pull/106)
- Refactor tiling code generation [#105](https://github.com/pulp-platform/Deeploy/pull/105)
- Fix `Unsqueeze` Op. when using ONNX opset 13 or higher (from attribute to input) [#119](https://github.com/pulp-platform/Deeploy/pull/119)
Deeploy/DeeployTypes.py (1)
331-350: Critical bug: cannot append to a set. Line 260 initializes self.aliases as a set, but line 348 calls .append(), which doesn't exist on sets. This will raise an AttributeError at runtime. Apply this diff:
     def add_aliases(self, aliases_to_add: List[str]):
         """
         Adds list of aliases to the aliases attribute.

         Parameters
         ----------
         alias_to_add : List[str]
             List of names of aliases to add to the aliases attribute.

         Returns
         -------
         None
         """
         if not hasattr(self, "aliases"):
             return None

         for alias in aliases_to_add:
-            if alias not in self.aliases:
-                self.aliases.append(alias)
+            self.aliases.add(alias)

         return None
🧹 Nitpick comments (1)
CHANGELOG.md (1)
30-32: Reduce successive FP prefix repetition. Three consecutive entries (lines 30-32) begin with "FP", which reduces readability. Consider rewording one or more entries to vary the opening phrase.
Example refactors:
- Line 30: "Conv 2D (simple & DW), reshape & skip connection, and TinyViT demo tests..."
- Line 31: "Bindings and mappings for PULP slice, DW conv 2D, and reduce mean operations."
- Line 32: "DW conv lowering optimization pass (FP variant) similar to the existent one for integer version."
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
CHANGELOG.md (4 hunks)
Deeploy/DeeployTypes.py (9 hunks)
🧰 Additional context used
🪛 LanguageTool
CHANGELOG.md
[style] ~32-~32: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... conv 2D, and reduce mean operations. - FP PULP DW conv lowering optimization pass...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (114)
- GitHub Check: deeploy-memory-allocation
- GitHub Check: generic-kernels / test-runner-generic
- GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (60000)
- GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (5000)
- GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (10000)
- GitHub Check: siracusa-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-tiled
- GitHub Check: siracusa-kernels-tiled-doublebuffer-L2 / test-runner-siracusa-tiled
- GitHub Check: snitch-kernels / test-runner-snitch
- GitHub Check: siracusa-models / test-runner-siracusa
- GitHub Check: siracusa-kernels / test-runner-siracusa
- GitHub Check: deeploy-memory-allocation
- GitHub Check: generic-kernels / test-runner-generic
- GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (60000)
- GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8_parallel, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (60000)
- GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (5000)
- GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (10000)
- GitHub Check: siracusa-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-tiled
- GitHub Check: siracusa-kernels-tiled-doublebuffer-L2 / test-runner-siracusa-tiled
- GitHub Check: snitch-kernels / test-runner-snitch
- GitHub Check: siracusa-models / test-runner-siracusa
- GitHub Check: siracusa-kernels / test-runner-siracusa
- GitHub Check: deeploy-memory-allocation
- GitHub Check: generic-models / test-runner-generic
- GitHub Check: generic-kernels / test-runner-generic
- GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (60000)
- GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8_parallel, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (60000)
- GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (5000)
- GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (10000)
- GitHub Check: siracusa-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-tiled
- GitHub Check: siracusa-kernels-tiled-doublebuffer-L2 / test-runner-siracusa-tiled
- GitHub Check: snitch-kernels / test-runner-snitch
- GitHub Check: siracusa-models / test-runner-siracusa
- GitHub Check: siracusa-kernels / test-runner-siracusa
🔇 Additional comments (4)
Deeploy/DeeployTypes.py (4)
558-567: LGTM: n_cores parameter added correctly. The n_cores parameter is properly added to NetworkContext.__init__ with a default value of 8 and stored as an instance variable. No duplicate assignment.
2479-2519: LGTM: n_cores propagated correctly to NetworkContext. The n_cores parameter is properly added to NetworkContainer.__init__ and correctly passed through to the NetworkContext constructor at lines 2518-2519.
2679-2685: LGTM: NetworkContext recreation preserves n_cores. The new NetworkContext is correctly created with n_cores = self.ctxt.n_cores, preserving the core count from the existing context.
3231-3285: LGTM: n_cores propagated through NetworkDeployer. The n_cores parameter is properly added to NetworkDeployer.__init__, passed to the parent NetworkContainer via super().__init__(), and stored as an instance variable at line 3285.
Hi, I reviewed the whole PR. There are still some changes I would like to see, specifically around n_cores and some remnant changes around reshape aliasing. I also left some comments in the older threads, so check those too. Thanks!
| deeployStateDir: str = "DeeployState", | ||
| inputOffsets: Dict[str, int] = {}): | ||
| inputOffsets: Dict[str, int] = {}, | ||
| n_cores: int = 8): |
This is not a universal feature; some targets don't have multiple cores, for example CortexM or Generic. You should add this just to the specific target's deployer, i.e., PULPDeployer.
| localObjects = {}, | ||
| name: str = 'DeeployNetwork'): | ||
| name: str = 'DeeployNetwork', | ||
| n_cores: int = 8): |
same comment as above, add this to the target that uses it, not everyone
| name: str = 'DeeployNetwork', | ||
| deeployStateDir: str = "DeeployState"): | ||
| deeployStateDir: str = "DeeployState", | ||
| n_cores: int = 8): |
same as above
It seems to me that you will have to revert 99% of your changes to DeeployTypes.py
| operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, Union[int, IntVar]]]: | ||
| im2col_dim = 4 * 8 * (operatorRepresentation['ch_im_in'] * operatorRepresentation['dim_kernel_x'] * | ||
| operatorRepresentation['dim_kernel_y']) | ||
| im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8) * ctxt.n_cores * operatorRepresentation[ |
Another way of getting the n_cores information might be through the operatorRepresentation which you add in the parse step of the PULP deployer. This might be interesting for the future if we ever want to have the number of cores variable per layer. I'm kinda leaning more towards that approach. Let me know what you think
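For reference, the buffer-size expression in the diff above amounts to the following computation (a minimal sketch; the function name is illustrative, only the formula mirrors the diff):

```python
def im2col_bytes(type_width_bits: int, n_cores: int, ch_im_in: int,
                 dim_kernel_x: int, dim_kernel_y: int) -> int:
    # One im2col patch (ch_im_in * kx * ky elements) per core,
    # scaled by the element size in bytes derived from the type width.
    return (type_width_bits // 8) * n_cores * ch_im_in * dim_kernel_x * dim_kernel_y

# fp32 weights, 8 cores, 16 input channels, 3x3 kernel
assert im2col_bytes(32, 8, 16, 3, 3) == 4608
```

Whether `n_cores` arrives via the context or via the operatorRepresentation only changes where the second factor is read from; the sizing itself stays per-core.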
@Xeratec any opinions on this?
I believe this information should be added at the operator level. In a future multi-cluster application, the number of cores will only be relevant for executing a kernel, not for the entire network. Therefore, this information should be associated with the engine and reflected in the operator representation.
I’m fine with merging a temporary solution for now. However, the change should only affect the platform currently under consideration and be linked to the DeploymentEngine, where we ultimately want it to reside anyway.
The changes to ReshapeTemplate.py should be unnecessary after #125, no?
No, it's actually porting the changes you made in the generic template, since this one is used in the Siracusa tests. Without this, the ReshapeWithSkipConnection test (and also others containing reshape if I remember correctly) would fail on Siracusa.
Then I change my question to: Why are we not using the generic reshape template in the first place? 😅 could you please try with the generic one and confirm that it's still doing the same thing?
There is an extra _alias leftover parameter that is still used by the tiler, which breaks if removed. I will remove the unnecessary extra code by inheriting the template from generic and only adding this extra parameter.
Actionable comments posted: 0
🧹 Nitpick comments (1)
Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py (1)
42-48: Consider removing redundant buffer retrieval. The parent class _GenericReshapeTemplate.alignToContext() already retrieves and validates these buffers (lines 29-33 in the Generic template). Since you call super().alignToContext() on line 40, you can retrieve the buffers directly from the returned ctxt without re-fetching them. Apply this diff to streamline the code:
-    # Get buffers
-    bufferIn = ctxt.lookup(operatorRepresentation['data_in'])
-    assert isinstance(bufferIn, VariableBuffer)
-
-    bufferOut = ctxt.lookup(operatorRepresentation['data_out'])
-    assert isinstance(bufferOut, VariableBuffer)
-
+    # Get buffers (already retrieved and validated by parent)
+    bufferIn = ctxt.lookup(operatorRepresentation['data_in'])
+    bufferOut = ctxt.lookup(operatorRepresentation['data_out'])
+
Or, even more concisely, inline them:
-    # Get buffers
-    bufferIn = ctxt.lookup(operatorRepresentation['data_in'])
-    assert isinstance(bufferIn, VariableBuffer)
-
-    bufferOut = ctxt.lookup(operatorRepresentation['data_out'])
-    assert isinstance(bufferOut, VariableBuffer)
-
     # Linking required for tiling
-    bufferOut._alias = bufferIn.name
+    ctxt.lookup(operatorRepresentation['data_out'])._alias = ctxt.lookup(operatorRepresentation['data_in']).name
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- Deeploy/Targets/Generic/Templates/ReshapeTemplate.py (3 hunks)
- Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
🧬 Code graph analysis (2)
Deeploy/Targets/Generic/Templates/ReshapeTemplate.py (1)
Deeploy/DeeployTypes.py (3)
- NodeTemplate (87-229)
- lookup (722-754)
- VariableBuffer (232-360)
Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py (2)
Deeploy/DeeployTypes.py (4)
- NetworkContext (508-1022)
- VariableBuffer (232-360)
- alignToContext (119-139)
- lookup (722-754)

Deeploy/Targets/Generic/Templates/ReshapeTemplate.py (2)
- _GenericReshapeTemplate (10-39)
- alignToContext (15-39)
🔇 Additional comments (3)
Deeploy/Targets/Generic/Templates/ReshapeTemplate.py (1)
10-10: LGTM! Clean refactoring to clarify template hierarchy.

The rename from `_ReshapeTemplate` to `_GenericReshapeTemplate` improves clarity and aligns well with the inheritance pattern where the PULP-specific template now extends this base class.

Also applies to: 42-42
Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py (2)
28-32: LGTM! Inheritance approach correctly implemented.

The addition of the `VariableBuffer` import and the inheritance from `_GenericReshapeTemplate` properly implement the agreed-upon design from the past review discussion.
49-50: LGTM! Tiler-specific aliasing correctly preserved.

The `_alias` attribute assignment is correctly preserved for tiler compatibility, as discussed in the past review comments. This works alongside the parent's bidirectional `aliases` set mechanism.
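For context, a toy illustration (not Deeploy's actual tiler code, and the helper name `resolve` is hypothetical) of how an `_alias` chain like this can be followed so that a reshape output resolves to the buffer that actually owns the memory:

```python
# Hypothetical helper: follow _alias links until reaching a buffer
# without an alias, i.e. the buffer that actually owns the memory.
class Buf:

    def __init__(self, name, alias = None):
        self.name = name
        self._alias = alias


def resolve(name, table):
    buf = table[name]
    while getattr(buf, '_alias', None) is not None:
        buf = table[buf._alias]
    return buf.name


table = {
    'conv_out': Buf('conv_out'),
    'reshape_out': Buf('reshape_out', alias = 'conv_out'),
}
print(resolve('reshape_out', table))  # -> conv_out
```

Because reshape does not move data, treating its input and output as the same underlying buffer lets a tiler plan memory and tile boundaries once instead of twice.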
This PR brings the changes required for a minimal TinyViT to run on the Siracusa platform, without tiling.
Added
Changed
Fixed
PR Merge Checklist
- The PR is rebased on the latest `devel` commit and pointing to `devel`.
- The `CHANGELOG.md` file has been updated.