Rag v8 #176

sroussey · 2026-01-11T03:44:52Z

No description provided.

* Rag (#170)
* [feat] New VectorQuantizeTask, updated VectorSimilarityTask
* [WIP] rework document
* [refactor] Update task input handling and smartClone method for improved input data handling for tests
  - Replaced structuredClone and JSON methods with a new smartClone function that deep-clones plain objects and arrays while preserving class instances by reference.
  - Quick versions of tasks as functions now pass input to run, not the constructor, which means no defaults and cloning.
* [refactor] Removed unnecessary checks for undefined values when copying additional input properties.
* [refactor] Enhance tasks with service registry integration
  - Updated IExecuteContext and IRunConfig to include registry support.
  - Refactored TaskRunner and TaskGraphRunner to utilize the service registry for improved task execution and model retrieval.
  - Ensured backward compatibility while enhancing the overall architecture for better service management.
  - Introduced a service registry to manage model repositories and execution contexts in AiTask.
* [feat] Introduce input resolver system for enhanced schema handling
  - Added a new InputResolver to manage schema-annotated inputs, allowing for automatic resolution of string IDs to their corresponding instances.
  - Implemented repository and model resolution capabilities, improving task input handling and validation.
  - Created new schemas for tabular, vector, and document repositories to facilitate input resolution.
  - Enhanced AiTask and TaskRunner to utilize the input resolver for better integration with service registries.
  - Added comprehensive tests to ensure the functionality of the input resolver system and its integration with tasks.
* [feat] Introduce new AI tasks and enhance document processing capabilities
  - Added several new tasks including ChunkToVectorTask, ContextBuilderTask, DocumentEnricherTask, HierarchicalChunkerTask, and others to support advanced document processing workflows.
  - Enhanced the input handling for tasks to streamline the integration with the service registry and improve task execution.
  - Updated the documentation to reflect the new tasks and their functionalities, ensuring clarity for developers.
  - Implemented comprehensive tests for the new tasks to validate their behavior and integration within the workflow system.
* Update packages/ai/src/task/QueryExpanderTask.ts
* Update packages/task-graph/src/task/Task.ts
* Update packages/util/src/vector/Tensor.ts
* Update packages/ai/src/task/VectorQuantizeTask.ts
* Update packages/util/src/vector/VectorUtils.ts
* Optimize quantizeToUint8 and quantizeToUint16 with single-pass min/max
* Remove unused query variable from InputResolver test (#161)
* Fix edge case: return non-zero range for empty arrays in findMinMax
* Fix markdown auto-detection to use header pattern matching (#157)
* Improve markdown auto-detection with robust pattern matching
* Remove unused task class imports from ChunkToVector.test.ts (#160)
* Remove unused imports ChunkToVectorTask and HierarchicalChunkerTask
* Fix inconsistent vector/tensor terminology in Tensor.ts (#167)
* Update Tensor.ts to use consistent "tensor" terminology throughout
* Optimize VectorQuantizeTask min/max calculation for large vectors (#168)
* Optimize quantizeToUint8 and quantizeToUint16 to use single loop for min/max
* Add empty array guard to quantization methods
* Replace unsafe type assertions with type-safe field extraction in Document.addVariant (#158)
* Use extractConfigFields for type-safe provenance handling
* Add comprehensive tests for type-safe provenance handling
* Add test coverage for VectorSimilarityUtils functions (#165)
* Add comprehensive tests for VectorSimilarityUtils
* Extract magic number to named constant in ProvenanceUtils (#159)
* Extract magic number 512 to DEFAULT_MAX_TOKENS constant
* Add support for Float16Array in normalize function of VectorUtils.ts
* Fix naming inconsistency between Vector type and TensorSchema (#169)
* Fix naming inconsistency: rename Vector to Tensor in Tensor.ts
* Enhance jaccardSimilarity function to handle negative values by normalizing inputs to a non-negative range. This includes calculating the global minimum across both vectors and adjusting values accordingly.
* [test] Add tests for VectorUtils, covering magnitude, inner product, normalization, and handling of various TypedArray types. Update normalize function to support an additional parameter for Float32Array conversion.
* Add circular reference detection to smartClone method (#162)
* Fix circular reference detection to handle shared references correctly
* Refactor TaskEvents to import TaskStatus from TaskTypes and add unit tests for smartClone method
* Enhance StructuralParser to include title in document nodes
* Enhance DocumentSchema to include title field and update required properties
* [refactor] Document and InputResolver modules
  - Removed re-export of schemas and types from Document.ts for cleaner module structure.
  - Enhanced AiTask's getDefaultQueueName method to handle single model inputs and throw an error for multiple models.
  - Cleaned up InputResolver by removing unnecessary re-exports, streamlining the module for better clarity.
  - Added comprehensive tests for Document functionality, ensuring robust handling of variants and provenance.
* Refactor Document class to use optional chaining and nullish coalescing in getChunks method for improved safety. Update README to clarify vector metadata structure and enrich metadata fields for hierarchical documents.
* Refactor DocumentEnricherTask to utilize ModelConfig for summary and NER model parameters, enhancing type safety and clarity in method signatures.
* Refactor provenance handling to support array structure
  - Updated the Provenance type to be an array of ProvenanceItem, allowing for multiple provenance entries.
  - Modified extractConfigFields and related functions to handle provenance as an array, enhancing type safety and flexibility.
  - Adjusted Document and task classes to utilize the new provenance structure, ensuring consistent handling across the codebase.
  - Updated tests to reflect changes in provenance structure and validate functionality.
* Refactor ModelRegistry and InputResolver for improved type handling of model arrays
  - Updated setGlobalModelRepository parameter name for clarity.
  - Enhanced resolveModelFromRegistry to support both single and array of model IDs.
  - Modified resolveSchemaInputs to handle string values and arrays of strings more effectively, ensuring proper resolution of inputs.
* [refactor] Remove ArrayTask from between JobQueueTask and Task. Refactor AI task schemas to simplify model handling
  - Updated various AI task schemas to replace array-based model definitions with single model references, enhancing clarity and type safety.
  - Adjusted input schemas for tasks such as BackgroundRemovalTask, ImageClassificationTask, and others to reflect these changes.
  - Removed unnecessary type handling for model arrays in AiTask and AiVisionTask classes, streamlining the codebase.
  - Enhanced the GraphAsTask and JobQueueTask classes to support the new model structure, ensuring compatibility across the task framework.
* [refactor] Remove Provenance from task and task graph
  - Removed the Provenance type and related handling from various classes, including Task, TaskRunner, and Dataflow, to streamline the codebase.
  - Updated Document and HierarchicalChunkerTask to directly use VariantProvenance, enhancing clarity and type safety.
  - Adjusted method signatures and removed unused provenance-related methods across the task graph framework.
  - Updated tests to reflect changes in provenance structure and validate functionality.
* [refactor] Simplify Document handling by removing Provenance and variants
  - Removed Provenance-related functionality from the Document class, including the handling of variants and associated methods.
  - Updated Document methods to manage chunks directly, enhancing clarity and reducing complexity.
  - Adjusted related schemas and tests to reflect the removal of Provenance and the shift to a chunk-based structure.
  - Ensured compatibility across the codebase by updating references and method signatures accordingly.
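
For readers unfamiliar with the smartClone behavior described in the commits above (deep-clone plain objects and arrays, keep class instances by reference, cope with circular and shared references), here is a minimal sketch. It is not the code from this PR: the names `smartCloneSketch` and `isPlainObject` are hypothetical, and memoizing clones in a `Map` is an assumption about how shared references could be handled. The actual method may behave differently, for example by throwing on cycles.

```typescript
// Hypothetical sketch; names and behavior are assumptions, not the project's code.

/** True for object literals and null-prototype objects, false for class instances. */
function isPlainObject(value: unknown): value is Record<string, unknown> {
  if (value === null || typeof value !== "object") return false;
  const proto = Object.getPrototypeOf(value);
  return proto === Object.prototype || proto === null;
}

/**
 * Deep-clones plain objects and arrays. Everything else (primitives, functions,
 * and class instances such as typed arrays or repositories) is returned as-is.
 * `seen` maps each visited source object to its clone so that cycles terminate
 * and shared references stay shared in the result.
 */
function smartCloneSketch<T>(value: T, seen: Map<object, unknown> = new Map()): T {
  if (Array.isArray(value)) {
    const cached = seen.get(value);
    if (cached !== undefined) return cached as T;
    const copy: unknown[] = [];
    seen.set(value, copy); // register before recursing so cycles resolve to the copy
    for (const item of value) copy.push(smartCloneSketch(item, seen));
    return copy as unknown as T;
  }
  if (isPlainObject(value)) {
    const cached = seen.get(value);
    if (cached !== undefined) return cached as T;
    const copy: Record<string, unknown> = {};
    seen.set(value, copy);
    for (const [key, inner] of Object.entries(value)) {
      copy[key] = smartCloneSketch(inner, seen);
    }
    return copy as unknown as T;
  }
  return value; // preserved by reference
}
```

With this approach a task input such as `{ model: someModelInstance, options: { topK: 5 } }` would get a fresh `options` object while `model` still points at the same instance, which is the property the commit message calls out.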

- Added DocumentRepository and DocumentRepositoryRegistry for improved document storage and retrieval.
- Updated Document class to support chunk handling and enhanced constructor for better initialization.
- Removed IDocumentRepository interface and InMemoryDocumentRepository implementation to streamline the codebase.
- Adjusted related tests to utilize the new DocumentRepository structure, ensuring comprehensive coverage of document operations.

- Updated all instances of the `search` method to `similaritySearch` across various repositories and tasks for consistency in naming.
- Adjusted related documentation to reflect the new method name, ensuring clarity in usage.
- Enhanced the InMemoryVectorRepository to utilize an internal tabular repository for improved data handling and storage.

…o use Partial<Input> like their parents.
- Changed constructor parameter type for input in AiTask and JobQueueTask from Input to Partial<Input> for improved flexibility in input handling.
- Ensured compatibility with existing functionality while enhancing type safety.

- Refactored import statements in multiple AI task files to ensure consistency and clarity.
- Updated task schemas to utilize the new TypeReplicateArray and DeReplicateFromSchema functions for improved type handling.
- Enhanced constructor calls in task functions to align with recent changes in input handling, ensuring better compatibility and flexibility.

- Bumped versions of several dependencies including caniuse-lite, @typescript-eslint/eslint-plugin, @typescript-eslint/parser, globals, and turbo for improved functionality and compatibility.
- Updated devDependencies to their latest versions to ensure better performance and security across the project.

- Introduced a new model sample for NeuroBERT NER, including its configuration for text named entity recognition tasks.
- Removed the LaMini-Flan-T5-783M model sample to streamline the list of available models.

…t text handling
- Updated TextEmbeddingInputSchema and TextEmbeddingOutputSchema to support single or array inputs for text and vector properties, improving flexibility in handling embeddings.
- Refactored HFT_TextEmbedding function to validate and process both single and array text inputs, ensuring correct dimension checks and tensor extraction for multiple embeddings.
- Added error handling for dimension mismatches to enhance robustness in embedding operations.
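
The single-or-array handling described here can be sketched as follows. This is an illustration under stated assumptions, not the HFT_TextEmbedding implementation: `embedTextSketch`, the `EmbedBatch` type, and the `expectedDimensions` parameter are hypothetical names, and the provider call is replaced by a caller-supplied function.

```typescript
// Hypothetical stand-in for a provider call; it is not the HFT_TextEmbedding API.
type EmbedBatch = (texts: string[]) => Float32Array[];

/**
 * Accepts one text or a batch, validates every returned vector against the
 * expected dimension count, and mirrors the input shape in the output:
 * a single vector for a single string, an array of vectors for an array.
 */
function embedTextSketch(
  text: string | string[],
  embedBatch: EmbedBatch,
  expectedDimensions: number
): Float32Array | Float32Array[] {
  // Normalize to a batch so the rest of the function has one code path.
  const texts = Array.isArray(text) ? text : [text];
  const vectors = embedBatch(texts);

  if (vectors.length !== texts.length) {
    throw new Error(`Expected ${texts.length} embeddings, got ${vectors.length}`);
  }
  vectors.forEach((vector, index) => {
    if (vector.length !== expectedDimensions) {
      throw new Error(
        `Embedding ${index} has ${vector.length} dimensions, expected ${expectedDimensions}`
      );
    }
  });

  return Array.isArray(text) ? vectors : vectors[0];
}
```

Normalizing to an array first keeps the validation logic in one place; only the final return statement needs to know whether the caller passed a single string or a batch.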

…dation and type compatibility
- Updated error message in Task class to display input keys instead of the entire input object for clearer debugging.
- Refactored Workflow class to introduce new methods for extracting type identifiers from schemas and checking type compatibility, enhancing the matching logic for task inputs and outputs.
- Improved handling of required inputs by implementing strategies to connect unmatched required inputs from earlier tasks, ensuring better integration within the task graph.

- Updated TFMP_TextEmbedding function to process both single and array inputs for text, allowing for batch embedding generation.
- Added error handling for empty embedding results to ensure robustness in embedding operations.
- Improved return structure to accommodate multiple embeddings in a single response.

…ialized

…ArraySchemaOptions, simplify overall
- Refactored multiple tabular repository classes to incorporate TypedArraySchemaOptions in their entity definitions, enhancing type handling for typed arrays.
- Updated service tokens to utilize AnyTabularRepository for improved type flexibility across the repository implementations.
- Ensured consistency in constructor signatures and type definitions across various repository files.

…ma flexibility
- Refactored InMemoryVectorRepository, PostgresVectorRepository, and SqliteVectorRepository to support custom schemas, allowing for greater flexibility in vector storage.
- Updated constructor signatures to require schema definitions, primary key names, and indexes, ensuring consistency across implementations.
- Enhanced metadata handling by introducing dedicated methods for finding vector and metadata columns in schemas.
- Improved type definitions and event handling to align with the new schema structure, facilitating better integration and type safety across vector repositories.

…TabularRepository tests
- Removed unused imports from SupabaseTabularRepository test files for better clarity.
- Simplified type definitions in test classes by eliminating unnecessary generic parameters, enhancing readability and maintainability.
- Updated setupDatabase method documentation to clarify default behavior regarding table existence.

- Renamed limiter-related exports to queue-limiter for better organization
- Updated tabular repository classes to extend from BaseTabularRepository, instead of TabularRepository.

- Replaced local imports with imports from @workglow/storage for consistency.
- Updated type definitions to use AnyVectorRepository and BaseTabularRepository for improved type flexibility.
- Modified input schema properties in VectorSimilarityTask to enhance clarity and maintainability.

… vector handling
- Introduced new tasks: DocumentChunkRetrievalTask, DocumentChunkVectorHybridSearchTask, DocumentChunkVectorSearchTask, and DocumentChunkVectorUpsertTask to enhance vector processing capabilities.
- Updated existing tasks and schemas to standardize the use of `doc_id` instead of `docId` for consistency across the codebase.
- Refactored vector repository implementations to support document chunk vectors, improving schema flexibility and integration.
- Cleaned up imports and type definitions across various files to enhance maintainability and readability.

…nt nodes
- Introduced new tasks: DocumentNodeRetrievalTask, DocumentNodeVectorHybridSearchTask, DocumentNodeVectorSearchTask, and DocumentNodeVectorUpsertTask to enhance vector processing capabilities for document nodes.
- Updated existing documentation and schemas to reflect the transition from document chunk vectors to document node vectors, ensuring consistency in naming conventions.
- Refactored vector repository implementations to support document node vectors, improving schema flexibility and integration.
- Cleaned up imports and type definitions across various files to enhance maintainability and readability.

sroussey added 22 commits January 11, 2026 03:44

[feat] Add NeuroBERT NER model sample to ONNXModelSamples

19c79d9

- Introduced a new model sample for NeuroBERT NER, including its configuration for text named entity recognition tasks.
- Removed the LaMini-Flan-T5-783M model sample to streamline the list of available models.

[refactor] Update JsonSchema and TypedArray so typed arrays are deser…

f394f08

…ialized

[refactor] Remove EdgeVecRepository for now

428d7a9

[feat] Introduce new vector-related tasks for enhanced workflow tests

e7ac63c

[fix] for dimensions

b361c57

[refactor] Update repository structure

8cc8617

- Renamed limiter-related exports to queue-limiter for better organization
- Updated tabular repository classes to extend from BaseTabularRepository, instead of TabularRepository.

[refactor] move tests

f8ce8d5

github-actions bot assigned sroussey Jan 11, 2026

sroussey merged commit 62be244 into rag-v9 Jan 11, 2026
1 of 3 checks passed
