Collaborator

@sroussey sroussey commented Jan 18, 2026

Note

Introduces a chainable RAG pipeline and input resolution system, with storage/type refactors and CI/dependency updates.

  • New RAG tasks: DocumentEnricherTask, ChunkToVectorTask, ChunkVectorUpsertTask, ChunkVectorSearchTask, ChunkVectorHybridSearchTask, ContextBuilderTask, DocumentNodeRetrievalTask (+ workflow helpers)
  • Input resolvers via schema format annotations; model resolver registered in ModelRegistry; docs added on schema annotations and RAG usage
  • Storage refactor: rename TabularRepository → TabularStorage; InMemoryModelRepository and ModelRepository updated
  • Remove legacy document source/converter classes; docs reworked for graph execution and instrumentation (provenance mentions trimmed)
  • AI provider run fns migrated to new task input/output types; text/image embedding now supports arrays; minor fixes (e.g., translation result handling)
  • New @workglow/dataset package; added as peer/dev dep where needed; example apps import ArrayTask from @workglow/tasks
  • CI: GitHub Actions uses bun@latest and bun run rebuild; testing docs clarified
  • Dependency bumps (eslint plugins, vitest, turbo, globals, caniuse-lite, JSON schema lib) and lockfile updates

Written by Cursor Bugbot for commit 72da8b1.

- Bumped versions of several dependencies including `caniuse-lite`, `@typescript-eslint/eslint-plugin`, `@typescript-eslint/parser`, `globals`, and `turbo` for enhanced functionality and compatibility.
- Added `@sroussey/json-schema-library` as a new dependency in the project.
- Updated testing documentation to clarify the command for running specific tests.
…ify to remove ArrayTask from JobQueueTask parentage

- Eliminated provenance tracking from the TaskGraphRunner, Task, and Dataflow classes to simplify the architecture.
- Updated related documentation to reflect the removal of provenance references.
- Adjusted task execution methods to no longer require provenance input, enhancing clarity and reducing complexity in task management.
- Refactored tests to align with the updated task structure and removed any assertions related to provenance.
- Introduced an input resolver registry to automatically resolve string identifiers to object instances based on JSON Schema format annotations.
- Enhanced the TaskRunner to utilize the input resolver for resolving model names and repository IDs before task execution.
- Registered custom resolvers for various formats, improving flexibility in task configuration.
- Updated documentation to reflect the new input resolution capabilities and usage examples.
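
A minimal sketch of how format-based input resolution could work, assuming a registry keyed by the JSON Schema `format` value. Every identifier here (`registerResolver`, `resolveInputs`, the `"model"` format string, the `demo-embedder` entry) is hypothetical and chosen for illustration; none of it is taken from the PR's actual code.

```typescript
// Hypothetical names throughout: these are illustrative, not the PR's API.
type Resolver = (id: string) => unknown;

// Maps a JSON Schema `format` value to a function that turns an ID into an instance.
const resolvers = new Map<string, Resolver>();

function registerResolver(format: string, resolver: Resolver): void {
  resolvers.set(format, resolver);
}

interface PropertySchema {
  type?: string;
  format?: string;
}

interface ObjectSchema {
  properties: Record<string, PropertySchema>;
}

// Replace string inputs whose schema property carries a registered format.
// Inputs without a matching resolver pass through unchanged.
function resolveInputs(
  schema: ObjectSchema,
  input: Record<string, unknown>
): Record<string, unknown> {
  const resolved: Record<string, unknown> = { ...input };
  for (const [key, prop] of Object.entries(schema.properties)) {
    const value = input[key];
    const resolver = prop.format ? resolvers.get(prop.format) : undefined;
    if (resolver && typeof value === "string") {
      resolved[key] = resolver(value);
    }
  }
  return resolved;
}

// Example: a made-up "model" format backed by an in-memory lookup table.
const models: Record<string, { name: string; dimensions: number }> = {
  "demo-embedder": { name: "demo-embedder", dimensions: 384 },
};

registerResolver("model", (id) => {
  const model = models[id];
  if (!model) throw new Error(`Unknown model: ${id}`);
  return model;
});

const taskSchema: ObjectSchema = {
  properties: {
    model: { type: "string", format: "model" },
    text: { type: "string" },
  },
};

// `model` becomes the looked-up object; `text` has no format and stays a string.
const resolvedInput = resolveInputs(taskSchema, { model: "demo-embedder", text: "hello" });
```

Keying the lookup on the schema annotation rather than on the property name means a task only has to declare the format, and the runner can resolve the value before execution without task-specific code.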
…ument-node-vector repositories

- Reorganized storage exports to include new queue-limiter implementations for Postgres, Sqlite, and IndexedDb.
- Added document-node-vector repositories for Postgres and Sqlite, enhancing document storage capabilities.
- Updated existing references in common-server and common files to reflect the new structure.
- Introduced multiple new tasks including ChunkToVectorTask, ContextBuilderTask, DocumentEnricherTask, DocumentNodeRetrievalTask, DocumentNodeVectorHybridSearchTask, DocumentNodeVectorSearchTask, DocumentNodeVectorUpsertTask, HierarchicalChunkerTask, HierarchyJoinTask, QueryExpanderTask, RerankerTask, StructuralParserTask, TextChunkerTask, TopicSegmenterTask, and VectorQuantizeTask.
- Enhanced the task registry to support these new tasks, allowing for improved document processing workflows and vector management capabilities.
- Updated the index file to export the new tasks for easier access and integration.
- Added comprehensive tests for each new task to ensure functionality and reliability in various scenarios.
… add it in index to prevent tree shaking

- Eliminated TaskRegistry registration from multiple AI task files to streamline task management.
- Centralized task registration in the index file to ensure all tasks are registered in one place, improving maintainability and reducing redundancy.
- Updated documentation to reflect the changes in task registration structure.
- Refactored imports in TaskUI and TaskNode components to streamline dependencies.
- Expanded TODO list with new items related to chunk and node handling, model improvements, and documentation updates.
- Removed unnecessary whitespace in several task files for cleaner code.
- Centralized task registration in the index file to improve maintainability and reduce redundancy.
- Added new tasks for chunk vector management: ChunkRetrievalTask, ChunkVectorHybridSearchTask, ChunkVectorSearchTask, and ChunkVectorUpsertTask.
- Updated existing documentation to reflect the new chunk vector tasks and their functionalities.
- Refactored related components to utilize the new chunk vector repository structure, enhancing the overall architecture for vector storage and retrieval.
- Improved task registration in the index file for better maintainability and accessibility.
- Refactored storage components to rename repository classes from `Repository` to `Storage`, enhancing clarity in naming conventions.
- Updated various files to reflect the new `Storage` naming, including `InMemory`, `IndexedDb`, `Postgres`, and `Sqlite` implementations.
- Adjusted related documentation and tests to ensure consistency with the new structure.
- Improved overall organization of storage-related code for better maintainability and readability.
- Added a new `@workglow/dataset` package to manage dataset-related functionalities, including chunk vector storage and document management.
- Updated various components to utilize the new dataset package, replacing references to the storage package where applicable.
- Enhanced the `bun.lock` and `package.json` files to include the new dataset package as a dependency across relevant modules.
- Refactored existing tasks and tests to integrate with the new dataset structure, ensuring compatibility and improved functionality.
- Updated documentation to reflect the introduction of the dataset package and its features.
…nvention

- Updated the codebase to replace references from `repository` to `dataset`, enhancing clarity in data management terminology.
- Refactored various tasks and schemas to align with the new dataset structure, including `DocumentChunkDataset` and `DocumentDataset`.
- Removed deprecated chunk vector storage components and introduced new vector storage implementations for PostgreSQL and SQLite.
- Enhanced the TODO list with new items related to dataset management and improved documentation to reflect these changes.
- Updated tests to ensure compatibility with the new dataset naming and structure.
- Introduced support for auto-generated primary keys across all TabularStorage implementations, enhancing security and simplifying client interactions.
- Updated schemas to include `x-auto-generated: true` for primary key fields, allowing automatic ID generation during entity insertion.
- Refactored `put` and `putBulk` methods to handle entities with optional auto-generated keys, ensuring compatibility with existing data structures.
- Enhanced documentation and README files to reflect the new auto-generated key features and usage examples.
- Added comprehensive tests to validate the functionality of auto-generated keys across various storage backends, including InMemory, SQLite, Postgres, and IndexedDB.
- Updated existing tests to ensure they align with the new auto-generated key logic and structure.
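
A rough sketch of the auto-generated key behavior described above, assuming an in-memory table with an integer counter. Only the `x-auto-generated: true` annotation and the `put` / `putBulk` method names come from the commit message; the `InMemoryTable` class, its constructor, and the counter strategy are assumptions made for illustration and may differ from the real TabularStorage implementations.

```typescript
// Hypothetical sketch: class shape and key strategy are illustrative only.
type Row = Record<string, unknown>;

interface TableSchema {
  properties: Record<string, { type: string; "x-auto-generated"?: boolean }>;
}

class InMemoryTable {
  private rows = new Map<number, Row>();
  private nextId = 1;
  private schema: TableSchema;
  private primaryKey: string;

  constructor(schema: TableSchema, primaryKey: string) {
    this.schema = schema;
    this.primaryKey = primaryKey;
  }

  // Insert an entity, generating the primary key when the caller omits it
  // and the schema marks that field as auto-generated.
  put(entity: Row): Row {
    const stored: Row = { ...entity };
    if (stored[this.primaryKey] === undefined) {
      const keySchema = this.schema.properties[this.primaryKey];
      if (!keySchema || keySchema["x-auto-generated"] !== true) {
        throw new Error(`Missing primary key "${this.primaryKey}"`);
      }
      stored[this.primaryKey] = this.nextId++;
    }
    const id = stored[this.primaryKey];
    if (typeof id !== "number") {
      throw new Error("This sketch supports numeric keys only");
    }
    // Keep the counter ahead of any explicitly supplied key.
    this.nextId = Math.max(this.nextId, id + 1);
    this.rows.set(id, stored);
    return stored;
  }

  putBulk(entities: Row[]): Row[] {
    return entities.map((entity) => this.put(entity));
  }

  get(id: number): Row | undefined {
    return this.rows.get(id);
  }
}

const table = new InMemoryTable(
  {
    properties: {
      id: { type: "integer", "x-auto-generated": true },
      text: { type: "string" },
    },
  },
  "id"
);

const first = table.put({ text: "first" });              // key is generated
const explicit = table.put({ id: 10, text: "explicit" }); // key is supplied
const third = table.put({ text: "third" });              // generated after 10
```

Returning the stored row from `put` lets the caller read back the generated key, which is the main reason the key can be optional on insert.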
@sroussey sroussey self-assigned this Jan 18, 2026
@sroussey sroussey requested a review from Copilot January 18, 2026 07:05
Contributor

Copilot AI left a comment

Pull request overview

This PR refactors the codebase to rename "Repository" classes to "Storage" classes and removes the provenance tracking system. It also introduces auto-generated primary keys for tabular storage, vector storage capabilities, and a new registration system for tasks and repositories.

Changes:

  • Renamed all *Repository classes to *Storage classes (e.g., SqliteTabularRepository → SqliteTabularStorage)
  • Removed provenance tracking throughout the task-graph system
  • Added auto-generated primary key support with configurable key generation strategies
  • Introduced vector storage implementations with similarity search
  • Added input resolver system for automatic dependency injection of repositories/models

Reviewed changes

Copilot reviewed 188 out of 251 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| `packages/test/src/binding/*.ts` | Updated imports from `*Repository` to `*Storage` classes |
| `packages/test/src/binding/RegisterTasks.ts` | New file providing centralized task registration |
| `packages/tasks/src/task/*.ts` | Removed `TaskRegistry.registerTask()` calls and updated task instantiation patterns |
| `packages/tasks/src/common.ts` | Added `registerCommonTasks()` function for centralized registration |
| `packages/task-graph/src/task/*.ts` | Removed provenance tracking, added input resolver system |
| `packages/storage/src/tabular/*.ts` | Renamed classes, added auto-generated key support |
| `packages/storage/src/vector/*.ts` | New vector storage implementations with similarity search |
| `packages/dataset/src/*.ts` | New dataset package with document storage schemas |
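
The centralized registration mentioned in the summary could look roughly like the sketch below. The name `registerCommonTasks()` and the task class names appear in the PR; the function's signature, the `registry` map, and the empty class bodies are stand-ins invented for illustration. The motivation given in the commit title is tree shaking: bundlers that treat a module as side-effect free may drop registration calls made at module top level, whereas an explicitly called function that references each class keeps them in the bundle.

```typescript
// Hypothetical sketch: the registry and class bodies are stand-ins, not the PR's code.
type TaskClass = { type: string; new (): object };

const registry = new Map<string, TaskClass>();

// Task modules define classes only; they no longer register themselves on import.
class TextChunkerTask {
  static type = "TextChunkerTask";
}

class RerankerTask {
  static type = "RerankerTask";
}

// One explicit entry point registers every task. Because the classes are
// referenced from a function the application calls, they stay reachable.
function registerCommonTasks(): TaskClass[] {
  const tasks: TaskClass[] = [TextChunkerTask, RerankerTask];
  for (const task of tasks) {
    registry.set(task.type, task);
  }
  return tasks;
}

const registered = registerCommonTasks();
```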

Comment on lines 26 to 27
"bun": "./src/bun.ts",
"types": "./src/types.ts",

Copilot AI Jan 18, 2026

Package exports are pointing to source TypeScript files instead of compiled JavaScript/declaration files. This breaks published packages since TypeScript source files won't be included. These should point to ./dist/bun.js and ./dist/types.d.ts respectively.

Suggested change
- "bun": "./src/bun.ts",
- "types": "./src/types.ts",
+ "bun": "./dist/bun.js",
+ "types": "./dist/types.d.ts",

Copilot uses AI. Check for mistakes.
Comment on lines 26 to 27
"bun": "./src/bun.ts",
"types": "./src/types.ts",

Copilot AI Jan 18, 2026

Package exports are pointing to source TypeScript files instead of compiled outputs. This will cause runtime errors in published packages. Should point to ./dist/bun.js and ./dist/types.d.ts.

Suggested change
- "bun": "./src/bun.ts",
- "types": "./src/types.ts",
+ "bun": "./dist/bun.js",
+ "types": "./dist/types.d.ts",

Comment on lines 26 to 27
"bun": "./src/bun.ts",
"types": "./src/types.ts",

Copilot AI Jan 18, 2026

Package exports pointing to source files instead of compiled outputs. This breaks the published package distribution. Should reference ./dist/bun.js and ./dist/types.d.ts.

Suggested change
- "bun": "./src/bun.ts",
- "types": "./src/types.ts",
+ "bun": "./dist/bun.js",
+ "types": "./dist/types.d.ts",

Comment on lines 48 to 49
"bun": "./src/bun.ts",
"types": "./src/types.ts",

Copilot AI Jan 18, 2026

Package exports incorrectly pointing to TypeScript source files. Published packages need to reference compiled outputs: ./dist/bun.js and ./dist/types.d.ts.

Suggested change
- "bun": "./src/bun.ts",
- "types": "./src/types.ts",
+ "bun": "./dist/bun.js",
+ "types": "./dist/types.d.ts",

Comment on lines 41 to 42
"bun": "./src/bun.ts",
"types": "./src/types.ts",

Copilot AI Jan 18, 2026

Package exports pointing to source TypeScript files instead of built artifacts. This breaks package distribution. Should point to ./dist/bun.js and ./dist/types.d.ts.

Comment on lines 46 to 47
"types": "./src/bun.ts",
"import": "./src/bun.ts"

Copilot AI Jan 18, 2026

The ./bun export is pointing types to a TypeScript source file instead of a declaration file. Should be ./dist/bun.d.ts for types.

Suggested change
- "types": "./src/bun.ts",
- "import": "./src/bun.ts"
+ "types": "./dist/bun.d.ts",
+ "import": "./dist/bun.js"

Comment on lines 26 to 27
"bun": "./src/bun.ts",
"types": "./src/types.ts",

Copilot AI Jan 18, 2026

Package exports pointing to source TypeScript files. This prevents the package from working when published. Should reference ./dist/bun.js and ./dist/types.d.ts.

Suggested change
- "bun": "./src/bun.ts",
- "types": "./src/types.ts",
+ "bun": "./dist/bun.js",
+ "types": "./dist/types.d.ts",

- "bun": "./dist/browser.js",
- "types": "./dist/types.d.ts",
+ "bun": "./src/browser.ts",
+ "types": "./src/types.ts",

Copilot AI Jan 18, 2026

Package exports referencing TypeScript source files instead of compiled JavaScript. Should point to ./dist/browser.js and ./dist/types.d.ts.

Suggested change
- "types": "./src/types.ts",
+ "types": "./dist/types.d.ts",

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

@sroussey sroussey merged commit e9d47ce into main Jan 18, 2026
2 checks passed