fix(aem-workflow-skills) - updated gaps in the workflow skills.#108
akankshajain18 wants to merge 25 commits into adobe:main
…rectness
Bundle 12 runbooks + 3 docs under references/ in both 6.5-lts and cloud-service variants, plus OSGi config examples and a working StaleWorkflowServlet for cloud-service.
Also apply targeted Cloud Service correctness fixes that the previous JMX-copied content masked:
- StaleWorkflowServlet: add 403 guard on wfSession.isSuperuser(). The prior code silently returned only workflows the caller initiated for non-superusers, so ops would read "staleCount: 0" while the system had a large stale backlog. Push RUNNING state filter into the JCR query instead of loading every workflow into memory.
- Cloud Service SKILL.md: replace /libs/granite/operations/config/maintenance (/libs is read-only on AEMaaCS) with /conf/global/settings/granite/operations/maintenance. Add post-deploy verification step for queue.maxparallel override with service.ranking tiebreak guidance and an equal-ranking duplicate-registration warning.
- Correct the cq.workflow.job.max.procs myth in both variants: the real parallelism knob is queue.maxparallel on the Granite Workflow Queue (verified against WorkflowSessionFactory source).
- Rewrite 4 Cloud Service runbooks (stale-workflows, failed-work-items, purge-and-cleanup, job-throughput-and-concurrency). They were previously byte-identical copies of 6.5-lts, telling customers to invoke JMX operations that are not reachable on Cloud Service.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… 3 docs for AEMaaCS correctness
The Cloud Service variant of workflow-debugging shipped with runbooks and docs that were byte-identical copies of the 6.5-lts variant, instructing customers to invoke JMX operations (countStaleWorkflows, retryFailedWorkItems, purgeCompleted, countRunningWorkflows, returnSystemJobInfo, etc.) that are not reachable on AEMaaCS. This commit diverges the Cloud Service copies and replaces every JMX remediation with its AEMaaCS-correct equivalent.
Runbooks rewritten (Cloud Service variant):
- runbook-decision-guide.md: first-action column now routes to servlet / OSGi-config / Developer Console paths; adds a JMX→CS translation table.
- runbook-workflow-stuck.md: full CS rewrite; adds thread-pool saturation check for system-wide auto-advance failure.
- runbook-workflow-fails-or-shows-error.md: CS-correct retry/terminate flow; propagates the audit-trail warning for bulk replay (pharma / finance / legal must not use terminate+restart).
- runbook-task-not-in-inbox.md: IMS-federation-aware; adds principal-rotation gotcha (assigning to individuals is a time bomb).
- runbook-inbox-and-permissions.md: group-vs-individual superuser guidance with repoinit patterns; warns against toggling enforce flags off in prod.
- runbook-launcher-not-starting.md: ui.content deploy flow, run-mode scoping, /libs read-only guard, CRX/DE non-durability note.
- runbook-model-delete-and-update.md: replaces JMX countRunningWorkflows with Workflow Console + custom read-only servlet; Sync silent-failure gotcha (empty OR/AND branch).
- runbook-validate-workflow-setup.md: checklist-first restructure with a copy-paste pre-release block.
Docs rewritten:
- mbeans.md: full rewrite as a JMX→Cloud Service operation translation table. The previous file described unreachable JMX infrastructure as if it were callable, which silently misled customers.
- error-patterns.md: adds Cloud Service log-access context (Cloud Manager Logs, Splunk, Developer Console), logger-class reference, and the LogManager factory config approach for raising log levels without Felix.
- debugging-index.md: variant note clarifying that symptom_ids are portable but runbook_ref targets are CS-specific.
Verification:
- 12/12 CS runbooks now diverge from 6.5-lts (was 0/12 before this series).
- 3/3 CS docs files diverge from 6.5-lts.
- 0 broken intra-skill links across all 15 CS runbook+doc files.
- 0 prescriptive JMX call-sites in CS runbooks (operation names appear only in "not reachable on AEMaaCS, use X instead" translation context).
Scope: all changes live under cloud-service/.../workflow-debugging/references/. No changes to workflow-orchestrator, workflow-development, workflow-triaging, workflow-triggering, workflow-launchers, or workflow-model-design.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…unbooks
11 runbooks referenced 7 sibling docs that were never authored: configurations.md, custom-process-development.md, jcr-paths.md, workflow-editor-and-steps.md, authoring-and-inbox.md, references-and-sources.md, and examples/example-jmx-purge-and-restart.md. The dead links made 9 cross-refs non-functional in runbook-validate-workflow-setup.md alone and another 21 across the rest of the 6.5-lts runbooks.
Fix: remove the broken links and replace load-bearing ones with inline pointers to SKILL.md Step 5 (which already holds the OSGi property matrix they were trying to point at). Dropped links where the target was duplicative of content elsewhere in the skill family (custom-process-development, workflow-editor-and-steps) rather than authoring new docs.
Per-file:
- runbook-decision-guide.md: configurations.md → inline SKILL.md Step 5 ref
- runbook-failed-work-items.md: References section cleaned; SKILL.md pointer
- runbook-inbox-and-permissions.md: References cleaned; SKILL.md pointer
- runbook-job-throughput-and-concurrency.md: 2 configurations.md refs removed
- runbook-launcher-not-starting.md: References cleaned; inline path note
- runbook-model-delete-and-update.md: jcr-paths.md removed (paths already inline)
- runbook-purge-and-cleanup.md: References cleaned; SKILL.md pointer
- runbook-task-not-in-inbox.md: References cleaned; SKILL.md pointer
- runbook-workflow-fails-or-shows-error.md: References cleaned
- runbook-workflow-stuck.md: inline custom-process ref replaced with Felix Console hint; References cleaned
- runbook-validate-workflow-setup.md: full rewrite (9 broken links scattered across body and routing table); now a checklist-first runbook with a copy-paste pre-release block
Verified: 0 broken intra-skill links across both 6.5-lts and cloud-service variants after this change.
Scope: only plugins/aem/6.5-lts/skills/aem-workflow/workflow-debugging/. No changes to cloud-service, workflow-orchestrator, workflow-development, workflow-triaging, workflow-triggering, workflow-launchers, or workflow-model-design. No new docs authored.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… symptom_id, /libs leak
Four in-scope residuals from the post-rewrite audit:
- cloud-service runbooks/README.md — was written pre-rewrite and warned that
"the runbooks in this folder still reference JMX MBeans — use the
translation table below". After the 12-runbook rewrite that framing is
obsolete and contradicts the runbooks. Replaced with a light table of
contents + JMX-operation → runbook pointers + bundled-artifact index.
- debugging-index.md (both variants) — workflow_auto_advance_failure appears
in each variant's SKILL.md Step 1 table but was missing from the
machine-readable YAML index and lookup table. Added a YAML block with
root_cause_categories and a lookup-table row so scripts / agents keying
on debugging-index.md can classify auto-advance failures.
- 6.5-lts runbook-decision-guide.md — cloud-service version got the
workflow_auto_advance_failure row in the prior rewrite pass; 6.5-lts
still had the original 11-row table. Added the matching row.
- cloud-service Purge Scheduler example JSON — the "//" comment block cited
/libs/granite/operations/config/maintenance as the scheduling window
location. That's the 6.5-lts path and is read-only on AEMaaCS. Updated
the comment to point at /conf/global/settings/granite/operations/maintenance
(matching the P2-a fix applied to SKILL.md and runbook-purge-and-cleanup.md
earlier). JSON still parses.
Verified post-fix:
- 0 broken intra-skill links across both variants.
- 0 obsolete JMX-framing mentions in README.md.
- workflow_auto_advance_failure present in SKILL.md + debugging-index.md +
runbook-decision-guide.md in both variants.
- Purge Scheduler JSON still valid; /libs only mentioned in the corrective
"Do NOT reference /libs on AEMaaCS" comment.
Scope: only plugins/aem/{6.5-lts,cloud-service}/skills/aem-workflow/workflow-debugging/.
No changes to workflow-orchestrator, workflow-development, workflow-triaging,
workflow-triggering, workflow-launchers, or workflow-model-design.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… set with workflow-debugging
The References section in workflow-triaging/SKILL.md (both variants) pointed at 4 paths relative to an "aem-agent-marketplace-workflow-knowledge-base" root that does not exist in this repo:
- aem-agent-marketplace-workflow-knowledge-base/docs/debugging-index.md
- runbooks/runbook-decision-guide.md (triaging has no runbooks/ folder)
- Workflow-docs/splunk-workflow-triaging.md (parent dir does not exist)
- docs/error-patterns.md (triaging has no docs/ folder)
Replaced with resolvable paths into the sibling workflow-debugging skill (../workflow-debugging/references/docs/ and .../runbooks/). Dropped the Splunk-file reference in favor of a pointer to Step 3 where the Splunk queries are already inlined.
Also added the workflow_auto_advance_failure symptom row to Step 1 in both variants. It was present in workflow-debugging/SKILL.md but missing from triaging, so "workflow auto-advance stopped firing" had no classifier. Triaging and debugging now share an identical 12-symptom taxonomy.
Verified:
- 0 broken links in workflow-triaging/SKILL.md (both variants).
- Symptom-id set matches 1:1 between triaging and debugging.
- No changes to workflow-debugging, workflow-orchestrator, workflow-development, workflow-triggering, workflow-launchers, or workflow-model-design.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…llback, and polish
Technical accuracy
- ProjectEditorsChooser now returns a rep:principalName (reads editors multi-value property on the project roles node) instead of a JCR path.
- Workflow rules noted as ECMAScript (Rhino); removed incorrect Groovy mention.
- Workflow-package detection uses adaptTo(ResourceCollection.class); the primary-type check for cq:WorkflowContentPackage was inaccurate.
- getRoutes comment clarified; points to getBackRoutes for back routes.
- DS R6 template now guards on payloadType == JCR_PATH.
- Dropped unused @reference fields from DS R6 and Felix SCR templates.
- SimpleMetaDataMap availability note added for test scope.
Scope and consistency
- Variant Scope now includes AMS deployments of 6.5 LTS (aligns with parent).
- Felix SCR lifecycle clarifier: supported only for 6.5 LTS lifetime.
- quick-start-guide.md intent tree adds debugging + triaging rows.
Customer-facing safety
- Rollback section added: bundle uninstall (with change-control caveat), launcher disable, pre-termination confirmation, 15-min verification window.
- Escalation section added: four trigger criteria + artifact collection pack.
- PII / payload-content logging guardrail.
- Local-only caveats on curl -u admin:admin examples.
Navigation
- SKILL.md References expanded to link all five foundation files.
- New Audience line and inline Prerequisites summary in SKILL.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mo Banner Approval validation
Gaps exposed when validating an agent-generated workflow against the spec:
- Add canonical design-time model page wrapper (cq:Page + cq:WorkflowModel) to jcr-paths-reference.md, with the correct cq:template (/libs/settings/workflow/templates/model) and sling:resourceType, plus an explicit "do not use" list covering the agent-hallucinated path (.../workflow-model) and the legacy /etc-era path (/libs/cq/workflow/...).
- Add Guardrail in SKILL.md: model XML and Java are co-authored. The PROCESS= value on a cq:WorkflowNode must resolve to a deployed FQ class or process.label, otherwise the engine fails with "Process not found".
- Add Pattern 4 to participant-step-patterns.md: route a participant step to the workflow initiator via a chooser that reads the engine-set 'initiator' metadata key. Includes a no-go on PARTICIPANT="\$initiator\$" substitution which is not consistently supported on 6.5 LTS.
- Add Notification-Only Participant Steps section: one node per distinct outcome; do not multiplex outcomes through a shared step that branches in Java. Email path explicitly out of scope.
- Add OR_SPLIT-after-Participant section to variables-and-metadata.md with worked Approve/Reject ECMAScript transitions, document-order evaluation rule, and explicit anti-pattern callout.
- Remove the Testing section from process-step-patterns.md. Workflow-step testing is generic AEM testing (better served by AEM Mocks / aem-mock-junit5) and out of scope for this skill. Also removes the SimpleMetaDataMap visibility caveat that came with it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… api-reference copies
WorkflowTransition.getRule() is evaluated by the Rhino ECMAScript engine on AEM 6.5 LTS. Groovy is not part of the Granite workflow rule pipeline; the prior "ECMA/Groovy" wording was misleading and could push customers to write Groovy rules that silently never compile.
Aligns six sibling api-reference.md copies and drops the now-redundant "Groovy is not supported" line in variables-and-metadata.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tes and patterns
Bring the skill's copy-ready samples and decision rules up to the level a Cursor / Claude Code consumer needs to generate correct AEM 6.5 LTS workflow code on the first pass.
SKILL.md:
- Variant Scope: explicit DS R6 vs Felix SCR decision rule for new code, matching the project's existing annotation style rather than mixing both.
- Variant Scope: AEMaaCS stop-rule that refuses to generate 6.5-only patterns for cloud-service projects.
- Felix SCR template: replaced "// same body as DS R6 example" with the full body so an LLM does not mis-substitute.
- Workflow checklist: payload-type guard moved before payload extraction, matching the corrected Pattern 1 ordering.
process-step-patterns.md:
- Pattern 1: type-check before getPayload().toString() (BLOB/null safety).
- Pattern 5: getServiceResourceResolver / commit() exception handling added; the prior snippet did not compile against WorkflowProcess.execute(...) signature.
- Pattern 6: null guard on resolver.getResource(payloadPath) before adaptTo(Node.class), which prevents NPE → stuck workflow on missing payload.
- Pattern 2: PROCESS_ARGS marked legacy with explicit "do not generate for new steps" directive so an LLM defaults to named args.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…architecture considerations
Bring the model-design skill up to the level a Cursor / Claude Code consumer
needs to produce deployable, well-shaped AEM 6.5 LTS workflow models on the
first pass. Several samples taught patterns that would silently fail to
deploy or stall at runtime.
SKILL.md:
- Audience, Prerequisites, Required Permissions sections — match the
sibling workflow-development skill's structure.
- Dependencies section — explicit Java-first / model-second order of
operations; prevents the most common cross-skill failure mode (Process
not found).
- Variant Scope: AEMaaCS stop-rule.
- Workflow checklist: payload-type wording aligned with the corrected
guardrail (cq:Page collection via adaptTo(ResourceCollection.class)).
- Workflow checklist step 8: verify the model loads in the Workflow Model
Editor — catches page-wrapper failures at the developer's desk.
- Architecture Considerations: transient vs persistent, participant
timeouts, Goto retry caps, design-time purge configuration, model
versioning. Closes the gap that the skill taught XML structure but not
workflow design judgment.
model-xml-reference.md:
- Full canonical /conf model XML structure: cq:Page → cq:PageContent (with
required cq:template and sling:resourceType) → cq:WorkflowModel "model"
child. Prior incomplete root-only XML deployed but the Workflow Model
Editor would not load it.
- Common-pitfalls block listing the wrong template/structure variants the
community frequently reaches for.
- SetVariableProcess argument modes (LITERAL, RELATIVE_TO_PAYLOAD,
ABSOLUTE_PATH, EXPRESSION, VARIABLE, JSON_DOT_NOTATION, XPATH).
65-lts-guardrails.md:
- Workflow-package detection: replaced cq:WorkflowContentPackage primary-
type check with adaptTo(ResourceCollection.class) — the prior pattern
silently missed every multi-page workflow payload.
- Manual purge curl: local-development-only warning + admin-credential
caveat.
model-design-patterns.md:
- Pattern 2: declared `routes` via session.getRoutes(item, false) — prior
snippet referenced an undefined variable.
- Pattern 4: Long retryCount with 0L default — prior int default risked
ClassCastException across step boundaries.
- OR_SPLIT and Goto rules: strict equality (===) with String(...) wrap to
match sibling skill's convention; eliminates Rhino coercion fragility.
- Pattern 4 Goto rule: type-safe raw read + longValue() to survive Java
Long values being read from Rhino.
- Pattern 5 (Task Manager): canonical PROCESS XML with PROCESS=Task Manager
Step, PROCESS_AUTO_ADVANCE={Boolean}false, taskTitle/Description/
Instructions/Owner/Priority. Prior version showed only the conceptual
flow.
step-types-catalog.md:
- ECMAScript (Rhino) terminology aligned with api-reference.md.
- Goto Step section (XML + arg explanation + hard-cap rule), so retry-loop
generation is fully self-contained.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
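The Pattern 4 fix described above (a Long retryCount with a 0L default, read type-safely) can be sketched as follows. This is a minimal illustration, not the skill's actual sample: a plain `Map<String, Object>` stands in for the workflow metadata map, and the class name `RetryCounter`, the method names, and the `MAX_RETRIES` cap are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a plain Map stands in for the workflow metadata map.
final class RetryCounter {
    static final String KEY = "retryCount";
    static final long MAX_RETRIES = 3L; // assumed hard cap, not from the source

    // Read the counter without casting to a concrete type: the stored value may
    // arrive as Integer, Long, or Double depending on which step last wrote it.
    static long read(Map<String, Object> meta) {
        Object raw = meta.get(KEY);
        return (raw instanceof Number) ? ((Number) raw).longValue() : 0L;
    }

    // Increment the counter, always store it back as a Long, and report
    // whether another retry is still allowed under the cap.
    static boolean recordAttempt(Map<String, Object> meta) {
        long next = read(meta) + 1L;
        meta.put(KEY, Long.valueOf(next));
        return next <= MAX_RETRIES;
    }

    public static void main(String[] args) {
        Map<String, Object> meta = new HashMap<>();
        meta.put(KEY, Integer.valueOf(2)); // an earlier step wrote an Integer
        System.out.println(recordAttempt(meta));
        System.out.println(recordAttempt(meta));
    }
}
```

Going through `Number.longValue()` rather than a direct `(Long)` cast is what avoids the ClassCastException when one step stores an Integer and another expects a Long.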
…chitecture considerations
Bring the triggering skill up to parity with the now-richer workflow-development and workflow-model-design skills. Fixes defect classes that an IDE-LLM consumer would faithfully reproduce: wrong workflow-package detection, NPE-prone model lookup, deprecation-confusion, and a Sling Scheduler example that taught workflow-flood patterns.
SKILL.md:
- Audience, Dependencies, Prerequisites, Required Permissions sections for parity with sibling skills; Dependencies explicitly states the upstream workflow-model-design and workflow-development requirements.
- Variant Scope: AEMaaCS stop-rule (no traditional replication agents, different HTTP auth, /etc paths deprecated).
- Manage Publication payload: replaced cq:WorkflowContentPackage description with the corrected workflow-package guidance: cq:Page collection under /var/workflow/packages/ (newer) or /etc/workflow/packages/ (legacy), detected at runtime via adaptTo(ResourceCollection.class).
- Programmatic example: getModel() null-guard with descriptive WorkflowException; explicit non-blocking note on startWorkflow().
- HTTP Workflow API: local-development-only warning on curl examples plus service-account guidance for non-local environments.
- Architecture Considerations: async-by-default, bulk-trigger caps, transient workflows for high-volume triggers, recursive-trigger prevention, mechanism-stacking warning, initiator-as-service-user semantics.
- Triggering Mechanisms Summary: removed unclear "Classic UI Activate" row; replaced with explicit "Replication Trigger (6.5 LTS only)" row that points at Section 5.
programmatic-api.md:
- Service Class Pattern: getModel() null-guard with descriptive exception; inline async-execution note.
- Sling Scheduler example: hard cap (MAX_PER_RUN=500), per-iteration try/catch around startWorkflow (one bad payload no longer aborts the batch), getModel null check, named LOG instead of inline LoggerFactory.getLogger(getClass()), specific exception types instead of generic Exception, transient-workflow cross-reference for high-volume triggers.
triggering-mechanisms.md:
- Manage Publication payload: same cq:WorkflowContentPackage correction as SKILL.md.
- Service user requirement: corrected the false "deprecated SlingRepository.loginService()" claim. loginService() is the supported method; only loginAdministrative() is deprecated and must not be used.
- HTTP Workflow REST API: local-development-only warning on curl examples.
65-lts-guardrails.md (this skill's copy):
- Workflow Packages section: replaced primary-type check with the canonical adaptTo(ResourceCollection.class) pattern, matching the fix already applied in workflow-model-design.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
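The scheduler hardening described above (a hard cap per run plus a per-iteration try/catch so one bad payload does not abort the batch) can be sketched in plain Java. This is an illustration under stated assumptions, not the skill's sample: `WorkflowStarter` and `StartFailedException` are hypothetical stand-ins for the real workflow-start call and its exception type, and only the `MAX_PER_RUN=500` value comes from the commit message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a capped, failure-isolated batch trigger.
final class CappedBatchTrigger {
    static final int MAX_PER_RUN = 500; // cap named in the commit message

    /** Hypothetical checked exception for a single failed start. */
    static final class StartFailedException extends Exception {
        StartFailedException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for the call that starts one workflow. */
    interface WorkflowStarter {
        void start(String payloadPath) throws StartFailedException;
    }

    // Starts at most MAX_PER_RUN workflows. A failure on one payload is
    // recorded in 'failed' and the loop moves on to the next payload.
    static int run(List<String> payloads, WorkflowStarter starter, List<String> failed) {
        int limit = Math.min(payloads.size(), MAX_PER_RUN);
        int started = 0;
        for (String path : payloads.subList(0, limit)) {
            try {
                starter.start(path);
                started++;
            } catch (StartFailedException e) {
                failed.add(path);
            }
        }
        return started;
    }

    public static void main(String[] args) {
        List<String> payloads = new ArrayList<>();
        for (int i = 0; i < 600; i++) {
            payloads.add("/content/example/page-" + i); // illustrative paths
        }
        List<String> failed = new ArrayList<>();
        int started = run(payloads, path -> {
            if (path.endsWith("-3")) {
                throw new StartFailedException("bad payload: " + path);
            }
        }, failed);
        System.out.println(started + " started, " + failed.size() + " failed");
    }
}
```

Payloads beyond the cap are simply left for the next scheduled run, which is what keeps a large backlog from flooding the workflow engine in a single pass.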
…r AI-IDE consumers
UX review pass on the post-architecture-fix skill. The skill produced excellent
codegen but went quiet on the post-trigger conversation — "did it work?", "how
do I parse the response?", "can I cancel?", "why isn't it running?" all had
no canonical answer. These six additions close that loop.
SKILL.md:
- Triggering Mechanisms Summary: added a "Common Scenarios" sub-table that
routes developer intent ("nightly batch job processes pending assets",
"CI pipeline triggers review after deploy", etc.) directly to the right
mechanism. Lifts the Decision Matrix from the reference into the
first-read surface.
- Manage Publication: added the payload-shape pairing requirement — the
selected workflow model must be designed for multi-page payloads, with
PROCESS steps that adapt the payload to ResourceCollection and iterate.
Prevents the "model + Manage Publication" combination that fails on the
first step.
- HTTP Workflow API: added an explicit response-shape note. POST returns
201 with the new instance path in the Location response header (capture
this in CI scripts); 4xx/5xx with Sling JSON error body on failure;
GET returns a JSON array; DELETE returns 200.
- HTTP Workflow API: added cancellation semantics — termination is
irreversible, completed steps are not rolled back, in-flight execute()
is abandoned, the instance becomes ABORTED and cannot be resumed.
Prefer suspend/resume for business-critical workflows.
- Verifying the Trigger: new section with three concrete confirmation
paths (UI / HTTP / Java). Closes the most common post-trigger question:
"did it actually start?"
- When the trigger succeeds but the workflow doesn't progress: explicit
failure-mode handoff. Most common cause is a missing WorkflowProcess
registration (cross-ref to workflow-development Dependencies); for full
diagnosis points at workflow-debugging. Closes the "I triggered something
and it's broken" conversational loop.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
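The response-shape note above (POST returns 201 with the new instance path in the Location header) suggests a small check that a CI script could apply to the response. The sketch below is hypothetical: `TriggerResponse` and `newInstancePath` are illustrative names, the instance path in `main` is made up, and the HTTP call itself is left out so that only the interpretation of status and header is shown.

```java
import java.util.Optional;

// Hypothetical helper: interprets the status code and Location header of a
// workflow-start POST. The HTTP request itself is intentionally omitted.
final class TriggerResponse {

    // Returns the new instance path on 201; fails loudly on anything else so
    // a CI script does not continue with a workflow that never started.
    static String newInstancePath(int status, Optional<String> locationHeader) {
        if (status != 201) {
            throw new IllegalStateException("Workflow start failed with HTTP " + status);
        }
        return locationHeader
                .filter(value -> !value.isEmpty())
                .orElseThrow(() -> new IllegalStateException("201 response without a Location header"));
    }

    public static void main(String[] args) {
        // Illustrative value only; real instance paths are assigned by the server.
        String path = newInstancePath(201, Optional.of("/var/workflow/instances/example-instance"));
        System.out.println(path);
    }
}
```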
…hitecture considerations
Add Audience, Variant Scope (with AEMaaCS stop-rule), Dependencies, Prerequisites, Required Permissions, Common Scenarios (intent→pattern + when-not-to-use), Architecture Considerations (glob narrowing, multi-event amplification, loop prevention, transient workflows, lower-env discipline, mechanism stacking), and Verifying the Launcher sections to SKILL.md.
Document `transient` and `noProcess` properties in launcher-config-reference. Flag `runModes` honoring as unreliable on 6.5 LTS in both files and steer toward `config.author/` packaging. Add a local-dev-only guardrail on the `curl -u admin:admin` debug example.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ten routing
Add Audience, Variant Scope (with AEMaaCS stop-rule), Dependencies, and
Cross-Cutting Invariants (loop prevention, JMX safety, AEMaaCS stop-rule) to
SKILL.md. Flip routing default to debugging-first; load workflow-triaging only
when the user explicitly invokes a multi-instance / log-mining context. Drop
production-support routing rows that force IDE-LLM hallucination of
Splunk / multi-host / ticket context, and remove Patterns E and G; renumber
Pattern F to E. Fix Pattern A step 6 ("Sync via Package Manager" was
incorrect — split into the real Tools → Workflow → Models → Sync action and
the Maven autoInstallPackage iteration path). Flip Pattern C service-user
mapping to a dedicated-user-first recommendation with narrow ACLs. Extend
workflow-debugging reference loading to include runbooks and docs subdirs.
Add local-dev-only guardrail to quick-start-guide.md curl examples.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s all 7 skills
Strengthen the AEMaaCS stop-rule in workflow-debugging and workflow-triaging so
their Variant Scope blocks match the imperative "Not for AEM as a Cloud Service"
form already used by the dev cluster — IDE LLM consumers were getting weaker
guidance from the operational skills. Propagate the orchestrator's full
JMX-safety invariant ("Never recommend JMX remediation without confirming target
instance with the user") into the two skills that actually emit JMX commands.
Add a launcher-re-trigger loop-prevention guardrail (setUserData
"workflowmanager") to workflow-development Guardrails — the skill that emits
process-step code now carries the constraint at the point of code generation,
not just from the orchestrator. Add Audience and Dependencies blocks to
workflow-debugging, workflow-triaging, and workflow-development so all 7 skills
share the same opening structure. Fix workflow-orchestrator frontmatter
description (still said "spanning development and production support" after the
body was cleaned), and add a runModes-on-launchers reliability row to its
Guardrails Summary table. Add a "Routing back to dev skills" subsection to
workflow-debugging so diagnoses that conclude in code/model defects can route
forward to the right dev skill.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…p-prevention guardrail
The loop-prevention guardrail in workflow-orchestrator, workflow-launchers, and
workflow-development told the LLM to call
`session.getWorkspace().getObservationManager().setUserData("workflowmanager")`
on the `session` parameter — but in `WorkflowProcess.execute(WorkItem,
WorkflowSession, MetaDataMap)` that parameter is a `WorkflowSession`, which
does not expose `getWorkspace()`. An IDE LLM faithfully reproducing the
guardrail would emit code that does not compile. Update all three sites to
make the JCR-Session-vs-WorkflowSession distinction explicit and show the
`adaptTo(javax.jcr.Session.class)` step inline, plus a note that a service-user
`ResourceResolver` write path uses a different `Session` instance and must be
tagged separately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ice variant - Phase 1
Replace the launchers debug curl that emitted admin:admin against an /etc endpoint with Tools → Workflow → Launchers UI guidance plus a local-AEMaaCS-SDK-only fallback. Neither admin:admin nor the /etc endpoint is appropriate for Cloud Service (production auth is IMS-based; /etc/workflow/launcher.json is not the canonical surface on AEMaaCS).
Add an imperative safety guardrail above the OOTB launcher overlay section warning never to disable dam_update_asset_* / dam_xmp_writeback without confirming intent, because disabling these silently breaks asset processing.
Flip the orchestrator service-user mapping (Guardrails Summary table and Pattern C step 3) from "Always use workflow-process-service" to "Use a dedicated service user with narrow ACLs"; reusing the OOTB privileged user as an application sub-service violates least-privilege.
Fix the workflow-development Variant Scope claim that the bundle "goes into the ui.apps content package": the Java source lives in the core (or equivalent) Maven module and the built bundle is wrapped by the all content package, not ui.apps.
Add a Tools → Workflow → Models → Sync step to orchestrator Pattern A so the runtime path at /var/workflow/models/<id> matches design-time; without it the engine cannot resolve the deployed model.
Fix the workflow-debugging Reference Loading Order entry that pointed at a non-existent reference.md; it now lists the actual debugging-index.md and runbooks paths that exist on disk.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… cloud-service variant — Phase 2
Mirror the 6.5-lts suite-level review work for the AEMaaCS variant, with
AEMaaCS-specific framing throughout — the underlying tools and surfaces differ
(no Felix Console JMX in production, Cloud Manager pipeline-only deploy,
Developer Console for diagnostics, IMS auth, all-package wrapping) so the
guardrail wording is not a literal port of the 6.5-lts text.
Add Audience + Variant Scope (with reverse "Not for AEM 6.5 LTS" stop-rule) +
Dependencies blocks to all 7 cloud-service skills so the IDE LLM has the same
opening structure across both variants. Add Cross-Cutting Invariants to the
orchestrator (loop prevention with WorkflowSession-vs-JCR-Session
disambiguation, AEMaaCS-flavored JMX-safety: "no JMX in production — use Inbox
Retry / Purge Scheduler / Cloud Manager-driven config changes", and the 6.5-LTS
reverse stop-rule). Propagate the JMX-safety framing into workflow-debugging
and workflow-triaging where JMX commands are emitted, replacing the weaker
existing wording with imperative "Never recommend JMX-based remediation"
guardrails. Remove the Production Support routing rows that forced LLM
hallucination of Splunk / multi-host / ticket context and remove orchestrator
Pattern D ("Workflow errors on host X for past 4 hours"); rename Pattern E to
Pattern D. Flip the routing default from triaging-first to debugging-first;
load workflow-triaging only on explicit invocation. Bring workflow-development
Guardrails to parity with the 6.5-lts variant (PII/payload-content logging
guard, model-vs-Java co-authorship constraint, loop-prevention with
disambiguated JCR Session adapt step). Add an Architecture Considerations
section to workflow-launchers (glob narrowing, multi-event amplification, loop
prevention, transient workflows, lower-env discipline via run-mode-aware
folders, mechanism stacking) — adjusted for AEMaaCS auto-scaling cost
implications. Strengthen the runModes guidance: the property has known
honoring issues, and the canonical AEMaaCS pattern is config.author/-style
run-mode-aware folder packaging. Fix the workflow-debugging file's broken
[reference.md](reference.md) link by listing the actual references that exist
on disk (debugging-index.md, runbooks/, error-patterns.md, mbeans.md). Add a
"Routing back to dev skills" closing subsection to workflow-debugging matching
the 6.5-lts variant. Add a runModes reliability row to the orchestrator
Guardrails Summary. Add a local-AEMaaCS-SDK-only guardrail above the HTTP
Workflow API curl examples in workflow-triggering, and flip its service-user
mapping recommendation from "workflow-process-service" to dedicated user.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ase 3
Update workflow-orchestrator frontmatter description from "spanning development and production support" (which contradicted the cleaned body) to a routing-focused line with the AEMaaCS-only / 6.5-LTS-stop-rule gate so the LLM's skill-selection step sees the variant gate without loading the body.
Add an "author-tier only by default" note above the orchestrator's Quick Architecture Recap so the LLM does not assume publish-tier workflow infrastructure on AEMaaCS (publish is read-mostly and replication-driven; workflow execution runs on author).
Add a "cloud environments only" note above the Cloud Service Production Support Constraints table that distinguishes cloud environments from the local AEMaaCS SDK. The SDK has Felix Console with JMX, accepts Package Manager uploads, supports admin:admin auth, and gives jstack access, none of which apply to cloud, and conflating the two leads the LLM to suggest local affordances against cloud.
Sweep ECMA / Groovy / "ECMA (JavaScript)" rule-language references to "ECMAScript (Rhino)" across workflow-model-design SKILL.md, model-xml-reference, step-types-catalog, model-design-patterns, workflow-development variables-and-metadata, and the five workflow-foundation api-reference duplicates. Groovy was never the correct term; workflow rules on AEMaaCS are evaluated by Rhino. Mirrors the 6.5-lts ac5e5f6 sweep so the LLM emits the canonical wording.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rrectness defects
The same logical workflow-foundation/ directory was duplicated across 5 sub-skills with 4 different SHAs of 65-lts-guardrails.md, 3 of quick-start-guide.md, and 2 of jcr-paths-reference.md. The orchestrator copy (loaded first per its own SKILL.md) carried a stale workflow-package detection pattern (cq:WorkflowContentPackage + /etc/workflow/packages/) that an IDE LLM would faithfully emit, silently making multi-page Manage-Publication payloads fall through the single-payload branch.
Consolidate to one canonical workflow-foundation/ under workflow-orchestrator/ referenced by relative path from each sub-skill. SHA drift is now structurally impossible. Promote the corrected ResourceCollection adapter pattern, the local-dev curl disclaimer, and the design-time model page wrapper template into the canonical copies. Delete the four duplicate trees.
Also close three smaller IDE-LLM defects:
- condition-patterns.md multi-condition .content.xml example was missing the backslash-escaped commas FileVault requires; an LLM copying it produced a launcher that silently never matched. Fix and call out the failure mode.
- condition-patterns.md "Run Mode Patterns" section taught runModes="[author]" as primary, contradicting every SKILL.md's "unreliable on 6.5 LTS - package under config.author/" caveat. Add the same caveat.
- workflow-orchestrator Pattern C said "do not reuse workflow-process-service" while workflow-triggering correctly says it is the standard target. Align orchestrator with triggering: reuse OOTB unless the starter writes outside scope or compliance requires a narrower user.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
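The escaped-comma failure mode in the first bullet above can be illustrated with a minimal sketch. It assumes DocView-style multi-value syntax, where `[a,b]` is split on unescaped commas and `\,` keeps a literal comma inside one value. The function `split_docview_multivalue` and the condition strings are hypothetical, written only for illustration; this is not FileVault's actual parser.

```python
def split_docview_multivalue(raw: str) -> list[str]:
    """Split a DocView-style "[a,b,c]" string on unescaped commas.

    Illustrative only: a backslash escapes the next character, so "\\,"
    stays inside the current value instead of starting a new one.
    """
    if not (raw.startswith("[") and raw.endswith("]")):
        return [raw]  # single-value property, nothing to split
    body = raw[1:-1]
    if body == "":
        return []
    values, current = [], []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            # Escaped character: keep it literally and skip the backslash.
            current.append(body[i + 1])
            i += 2
            continue
        if ch == ",":
            # Unescaped comma: close the current value.
            values.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    values.append("".join(current))
    return values


# Hypothetical condition strings, not taken from condition-patterns.md.
escaped = split_docview_multivalue(
    r"[jcr:content/tags==a\,b,jcr:content/type==page]"
)
unescaped = split_docview_multivalue(
    "[jcr:content/tags==a,b,jcr:content/type==page]"
)
print(escaped)    # two values, the first containing a literal comma
print(unescaped)  # three values, the first condition is cut in half
```

Under these assumptions, the unescaped form yields a stray value (`b`) that is not a valid condition on its own, which is consistent with a launcher that deploys cleanly but never matches.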
…orkflow skills
The "use OOTB workflow-process-service vs. dedicated service user" guidance had no Adobe-doc backing and kept flipping across commits. Drop the prescriptive lines from workflow-orchestrator constraint table, Pattern C step 3, and workflow-triggering Variant Scope / Required Permissions in both cloud-service and 6.5-lts variants. Mechanical content (ACL list, ServiceUserMapper requirement, "never admin credentials") is preserved where it appears in the non-prescriptive Guardrails sections.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@akankshajain18 - please ensure that the Adobe CLA bot identifies you as an employee. Then I can grant you write access and you can submit the PR from a branch of this repo so the extended checks will run.
Replace the incorrect cq:WorkflowModel/cq:WorkflowNode/cq:WorkflowTransition
structure (runtime /var format) with the correct flow/parsys design-time format
that the AEM Workflow Model Editor actually reads and writes at /conf. Fixes four
reference files:
- model-xml-reference.md: fix cq:template to /libs/cq/workflow/templates/model,
add cq:designPath, replace model/nodes/transitions with flow/parsys structure,
rewrite pitfalls section, clarify variables as runtime-only, add Sync instruction
- step-types-catalog.md: rewrite all step snippets to nt:unstructured +
sling:resourceType, add sling:resourceType table, document initiatorparticipant,
remove {Boolean} type hints, remove transition XML (not in flow layer)
- model-design-patterns.md: fix Pattern 5 Task Manager XML node type, annotate
Pattern 6 variables block as runtime-only
- SKILL.md: add Sync + test-instance verification to checklist, add /conf vs /var
distinction note under storage table
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Apply the same flow/parsys design-time format corrections as the 6.5 LTS fix,
preserving AEMaaCS-specific differences (Cloud Manager deploy, no /etc path,
EXTERNAL_PROCESS step type, cloud-service-guardrails references):
- model-xml-reference.md: was using cq:WorkflowModel as root element (runtime
format) and wrong file path jcr:content/model/.content.xml; rewritten to
correct cq:Page/flow/parsys design-time structure, fix cq:template to
/libs/cq/workflow/templates/model, add cq:designPath, rewrite pitfalls,
clarify variables as runtime-only, add /etc deprecation note
- step-types-catalog.md: rewrite all step snippets to nt:unstructured +
sling:resourceType, add sling:resourceType table including EXTERNAL_PROCESS,
document initiatorparticipant, remove {Boolean} type hints, remove transition
XML, add AEMaaCS note on Activate/Deactivate Page
- SKILL.md: fix checklist step 5, add Sync + test-instance verification steps,
fix Cloud Service Deployment path, add /conf vs /var distinction note
- model-design-patterns.md: annotate Pattern 6 variables block as runtime-only
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
trieloff left a comment
Dumping a 99-file PR without substantive description, motivation, or explanation makes the work for maintainers incredibly difficult. Improve the quality or close the PR.
@trieloff, the changes address gaps in the workflow skills. This PR is initial work on the workflow skills: no other skill has been modified, and every file and improvement concerns the workflow skills. Please check the commit history for details.
Update: the Adobe CLA bot now identifies me as an employee.
…to cloud-service variant
Add Output Contract, Default Path Rule, Runtime Model Structure, and Forbidden Patterns sections to the cloud-service workflow-model-design skill, aligning it with the equivalent fixes applied to 6.5-lts in the preceding commit.
@akankshajain18 - now that you're part of this GitHub org I have added you as a collaborator so you can push to a branch of this repo and have the full CI checks run. Not sure if you can reuse the same PR, but please explicitly address Lars' comment from #108 (review) when you update. Thanks!
Description
Comprehensive quality pass across all seven AEM Workflow skills for both AEM 6.5 LTS and AEM as a Cloud Service variants. Changes span 99 files covering correctness, IDE-LLM consumability, and structural integrity.
Skills updated: workflow-debugging, workflow-development, workflow-launchers, workflow-model-design, workflow-orchestrator, workflow-triaging, workflow-triggering
Related Issue
Motivation and Context
The aem-workflow skill suite is consumed directly by AI-IDE tooling to guide developers through workflow tasks. Incorrect API names, broken cross-references, outdated XML formats, and missing runbook entries cause the LLM to generate invalid code or misleading guidance. This PR closes all known correctness
and consumability defects identified during a structured QA sweep of both the 6.5 LTS and Cloud Service variants.
How Has This Been Tested?
CRX/DE-observed model structure).
Screenshots (if appropriate):
Types of changes
Checklist: