Conversation

@sroussey
Collaborator

@sroussey sroussey commented Jan 3, 2026

No description provided.

…ved input data handling for tests

- Replaced structuredClone and JSON methods with a new smartClone function that deep-clones plain objects and arrays while preserving class instances by reference.

- Quick function versions of tasks now pass input to run rather than to the constructor, which means no defaults and no cloning.
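
The cloning rule described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's actual smartClone: the names `smartCloneSketch` and `isPlainObject` are hypothetical, and using a WeakMap to map cycles and shared references onto their copies is an assumption about one reasonable design (the real method may, for example, throw on cycles instead).

```typescript
// Hypothetical helper: true only for object literals and null-prototype objects.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  if (value === null || typeof value !== "object") return false;
  const proto = Object.getPrototypeOf(value);
  return proto === Object.prototype || proto === null;
}

// Hypothetical sketch: deep-clones plain objects and arrays, returns everything
// else (primitives, typed arrays, class instances) by reference.
function smartCloneSketch<T>(value: T, seen = new WeakMap<object, unknown>()): T {
  if (Array.isArray(value)) {
    if (seen.has(value)) return seen.get(value) as T;
    const copy: unknown[] = [];
    seen.set(value, copy); // register before recursing so cycles resolve to the copy
    for (const item of value) copy.push(smartCloneSketch(item, seen));
    return copy as unknown as T;
  }
  if (isPlainObject(value)) {
    if (seen.has(value)) return seen.get(value) as T;
    const copy: Record<string, unknown> = {};
    seen.set(value, copy);
    for (const [key, item] of Object.entries(value)) {
      copy[key] = smartCloneSketch(item, seen);
    }
    return copy as unknown as T;
  }
  return value;
}
```

Unlike structuredClone, a function shaped like this leaves class instances untouched, which matters when task inputs carry live objects such as repositories or models.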
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces a new VectorQuantizeTask for efficient vector quantization and refactors vector utilities into reusable modules. The changes improve code organization by extracting common vector operations from VectorSimilarityTask into dedicated utility files.

  • New VectorQuantizeTask supporting multiple quantization types (INT8, UINT8, INT16, UINT16, FLOAT16, FLOAT32, FLOAT64)
  • Refactored vector utilities into VectorUtils and VectorSimilarityUtils modules for reusability
  • Updated VectorSimilarityTask to use the new utility functions and renamed similarity parameter to method

Reviewed changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| packages/util/src/vector/VectorUtils.ts | New utility module providing magnitude, inner product, and normalize functions for vector operations |
| packages/util/src/vector/VectorSimilarityUtils.ts | New utility module with cosine, Jaccard, and Hamming similarity/distance calculations |
| packages/util/src/vector/TypedArray.ts | Type definitions and JSON schemas for supported typed array types (Float16/32/64, Int8/16, Uint8/16) |
| packages/util/src/vector/Tensor.ts | Schema definitions for tensor/vector data structures with type, data, shape, and normalization properties |
| packages/util/src/json-schema/SchemaValidation.ts | Updated import to use @sroussey/json-schema-library package |
| packages/util/src/common.ts | Added exports for new vector utility modules |
| packages/util/package.json | Updated dependency from json-schema-library to @sroussey/json-schema-library |
| packages/test/src/test/task/VectorQuantizeTask.test.ts | Comprehensive test suite for VectorQuantizeTask covering all quantization types and edge cases |
| packages/task-graph/src/task/Task.ts | Updated stripSymbols to preserve TypedArrays by detecting ArrayBuffer views |
| packages/ai/src/task/index.ts | Added export for VectorQuantizeTask |
| packages/ai/src/task/base/AiTaskSchemas.ts | Refactored to import TypedArray and related types from @workglow/util, removed duplicate definitions |
| packages/ai/src/task/VectorSimilarityTask.ts | Refactored to use imported similarity functions from @workglow/util, removed local implementations, renamed similarity parameter to method |
| packages/ai/src/task/VectorQuantizeTask.ts | New task implementing vector quantization with normalization and multiple target type support |
| packages/ai/src/task/TextEmbeddingTask.ts | Updated imports to use TypedArraySchema from @workglow/util |
| packages/ai/src/task/ImageEmbeddingTask.ts | Updated imports to use TypedArraySchema from @workglow/util |
| packages/ai-provider/src/hf-transformers/common/HFT_JobRunFns.ts | Updated to import TypedArray from @workglow/util instead of @workglow/ai |
| packages/ai-provider/README.md | Updated comment to use "Vector" instead of "TypedArray" in code example |
| bun.lock | Updated lockfile with new @sroussey/json-schema-library dependency |

Comment on lines 196 to 204
private quantizeToUint8(values: number[]): Uint8Array {
  // Find min/max for scaling
  const min = Math.min(...values);
  const max = Math.max(...values);
  const range = max - min || 1;

  // Scale to [0, 255]
  return new Uint8Array(values.map((v) => Math.round(((v - min) / range) * 255)));
}

Copilot AI Jan 3, 2026

The quantizeToUint8 and quantizeToUint16 methods use spread operator with Math.min/Math.max on the entire values array. For large vectors, this is inefficient as it creates multiple intermediate arrays. Consider using a single loop to find both min and max values simultaneously, which would be more performant and memory-efficient.
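
A single-pass version of the suggestion might look like the sketch below. The helper names `minMax` and `scaleToUnsigned` are hypothetical and not taken from the PR. Besides saving a pass, a plain loop also avoids spreading a very large array into Math.min/Math.max, which can fail with a RangeError once the engine's argument limit is exceeded.

```typescript
// Hypothetical helper: one pass over the values to find both extremes.
function minMax(values: ArrayLike<number>): { min: number; max: number } {
  let min = Infinity;
  let max = -Infinity;
  for (let i = 0; i < values.length; i++) {
    const v = values[i];
    if (v < min) min = v;
    if (v > max) max = v;
  }
  return { min, max };
}

// Hypothetical helper: scales values linearly into [0, levels],
// e.g. levels = 255 for Uint8 or 65535 for Uint16.
function scaleToUnsigned(values: number[], levels: number): number[] {
  if (values.length === 0) return []; // nothing to scale
  const { min, max } = minMax(values);
  const range = max - min || 1; // avoid dividing by zero when all values are equal
  return values.map((v) => Math.round(((v - min) / range) * levels));
}

const quantized = new Uint8Array(scaleToUnsigned([-1, 0, 1], 255));
// quantized holds 0, 128, 255
```

Taking the number of levels as a parameter lets one helper serve both the Uint8 and Uint16 paths.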

Collaborator Author

@copilot open a new pull request to apply changes based on this feedback

Comment on lines 12 to 85
export function magnitude(arr: TypedArray | number[]): number {
  // @ts-ignore - Vector reduce works but TS doesn't recognize it
  return Math.sqrt(arr.reduce((acc, val) => acc + val * val, 0));
}

/**
 * Calculates the inner (dot) product of two vectors
 */
export function inner(arr1: TypedArray, arr2: TypedArray): number {
  // @ts-ignore - Vector reduce works but TS doesn't recognize it
  return arr1.reduce((acc, val, i) => acc + val * arr2[i], 0);
}

/**
 * Normalizes a vector to unit length (L2 normalization)
 *
 * @param vector - The vector to normalize
 * @param throwOnZero - If true, throws an error for zero vectors. If false, returns the original vector.
 * @returns Normalized vector with the same type as input
 */
export function normalize(vector: TypedArray, throwOnZero = true): TypedArray {
  const mag = magnitude(vector);

  if (mag === 0) {
    if (throwOnZero) {
      throw new Error("Cannot normalize a zero vector.");
    }
    return vector;
  }

  const normalized = Array.from(vector).map((val) => Number(val) / mag);

  // Preserve the original Vector type
  if (vector instanceof Float64Array) {
    return new Float64Array(normalized);
  }
  if (vector instanceof Float32Array) {
    return new Float32Array(normalized);
  }
  if (vector instanceof Int8Array) {
    return new Int8Array(normalized);
  }
  if (vector instanceof Uint8Array) {
    return new Uint8Array(normalized);
  }
  if (vector instanceof Int16Array) {
    return new Int16Array(normalized);
  }
  if (vector instanceof Uint16Array) {
    return new Uint16Array(normalized);
  }
  // For other integer arrays, use Float32Array since normalization produces floats
  return new Float32Array(normalized);
}

/**
 * Normalizes an array of numbers to unit length (L2 normalization)
 *
 * @param values - The array of numbers to normalize
 * @param throwOnZero - If true, throws an error for zero vectors. If false, returns the original array.
 * @returns Normalized array of numbers
 */
export function normalizeNumberArray(values: number[], throwOnZero = false): number[] {
  const norm = magnitude(values);

  if (norm === 0) {
    if (throwOnZero) {
      throw new Error("Cannot normalize a zero vector.");
    }
    return values;
  }

  return values.map((v) => v / norm);
}

Copilot AI Jan 3, 2026

The newly introduced VectorUtils module (magnitude, inner, normalize, normalizeNumberArray functions) lacks test coverage. Given that the repository has comprehensive testing for other utility functions, these vector utility functions should also have tests to ensure correctness, especially for edge cases like zero vectors, different typed array types, and Float16Array handling.
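
As a rough idea of what such coverage could check, the sketch below exercises local copies of `magnitude` and `normalizeNumberArray`, restricted to number[] so that it runs on its own. The `check` helper is hypothetical and only stands in for whatever test framework the repository uses.

```typescript
// Local copies of the logic quoted above, restricted to number[].
function magnitude(arr: number[]): number {
  return Math.sqrt(arr.reduce((acc, val) => acc + val * val, 0));
}

function normalizeNumberArray(values: number[], throwOnZero = false): number[] {
  const norm = magnitude(values);
  if (norm === 0) {
    if (throwOnZero) throw new Error("Cannot normalize a zero vector.");
    return values;
  }
  return values.map((v) => v / norm);
}

// Hypothetical minimal assertion helper.
function check(name: string, condition: boolean): void {
  if (!condition) throw new Error(`failed: ${name}`);
}

check("3-4-5 triangle", magnitude([3, 4]) === 5);
check(
  "unit length after normalizing",
  Math.abs(magnitude(normalizeNumberArray([3, 4])) - 1) < 1e-12
);

const zero = [0, 0, 0];
check("zero vector returned as-is by default", normalizeNumberArray(zero) === zero);

let threw = false;
try {
  normalizeNumberArray(zero, true);
} catch {
  threw = true;
}
check("zero vector throws when requested", threw);
```

The unit-length check uses a tolerance rather than strict equality because the squared components do not sum to exactly 1 in floating point.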

Collaborator Author

@copilot open a new pull request to apply changes based on this feedback

Comment on lines 13 to 80
export function cosineSimilarity(a: TypedArray, b: TypedArray): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dotProduct = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dotProduct += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  if (denominator === 0) {
    return 0;
  }
  return dotProduct / denominator;
}

/**
 * Calculates Jaccard similarity between two vectors
 * Uses the formula: sum(min(a[i], b[i])) / sum(max(a[i], b[i]))
 * Returns a value between 0 and 1
 */
export function jaccardSimilarity(a: TypedArray, b: TypedArray): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }

  let minSum = 0;
  let maxSum = 0;

  for (let i = 0; i < a.length; i++) {
    minSum += Math.min(a[i], b[i]);
    maxSum += Math.max(a[i], b[i]);
  }

  return maxSum === 0 ? 0 : minSum / maxSum;
}

/**
 * Calculates Hamming distance between two vectors (normalized)
 * Counts the number of positions where vectors differ
 * Returns a value between 0 and 1 (0 = identical, 1 = completely different)
 */
export function hammingDistance(a: TypedArray, b: TypedArray): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }

  let differences = 0;

  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) {
      differences++;
    }
  }

  return differences / a.length;
}

/**
 * Calculates Hamming similarity (inverse of distance)
 * Returns a value between 0 and 1 (1 = identical, 0 = completely different)
 */
export function hammingSimilarity(a: TypedArray, b: TypedArray): number {
  return 1 - hammingDistance(a, b);
}

Copilot AI Jan 3, 2026

The newly introduced VectorSimilarityUtils module (cosineSimilarity, jaccardSimilarity, hammingDistance, hammingSimilarity functions) lacks test coverage. Given that the repository has comprehensive testing for other utility functions and these functions are now factored out from VectorSimilarityTask, they should have dedicated tests to ensure correctness across different typed array types and edge cases like zero vectors and mismatched lengths.
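
A few of the edge cases mentioned here can be probed with local copies of two of the quoted functions. The copies below are typed against ArrayLike<number> only so that the sketch is self-contained; the `observed` object and `throwsOnMismatch` are hypothetical names used for illustration.

```typescript
function cosineSimilarity(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dotProduct = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dotProduct += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dotProduct / denominator;
}

function hammingDistance(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let differences = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) differences++;
  }
  return differences / a.length;
}

const ones = new Float32Array([1, 1, 1]);
const zeros = new Float32Array(3);
const empty = new Float32Array(0);

const observed = {
  identical: cosineSimilarity(ones, ones), // close to 1; compare with a tolerance
  zeroVector: cosineSimilarity(zeros, ones), // 0, via the zero-denominator guard
  emptyHamming: hammingDistance(empty, empty), // NaN, because 0 / 0
};

// Mismatched lengths should be rejected before any arithmetic happens.
function throwsOnMismatch(): boolean {
  try {
    cosineSimilarity(ones, empty);
    return false;
  } catch {
    return true;
  }
}
```

The empty-vector case is worth pinning down in tests: as quoted, hammingDistance divides by a length of zero, so the result is NaN rather than 0.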

Collaborator Author

@copilot open a new pull request to apply changes based on this feedback

Comment on lines 44 to 65
  // Preserve the original Vector type
  if (vector instanceof Float64Array) {
    return new Float64Array(normalized);
  }
  if (vector instanceof Float32Array) {
    return new Float32Array(normalized);
  }
  if (vector instanceof Int8Array) {
    return new Int8Array(normalized);
  }
  if (vector instanceof Uint8Array) {
    return new Uint8Array(normalized);
  }
  if (vector instanceof Int16Array) {
    return new Int16Array(normalized);
  }
  if (vector instanceof Uint16Array) {
    return new Uint16Array(normalized);
  }
  // For other integer arrays, use Float32Array since normalization produces floats
  return new Float32Array(normalized);
}

Copilot AI Jan 3, 2026

The normalize function doesn't handle Float16Array type preservation. When a Float16Array is passed to normalize, it will fall through all the instanceof checks and default to returning a Float32Array, losing type information. This is inconsistent with the TypedArray type definition which includes Float16Array as a supported type.
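
One way to avoid the fall-through is to build the result with the input's own constructor instead of an instanceof chain. The sketch below is an assumption about a possible fix, not the change that was actually made. `normalizePreservingType` and `NumericArray` are hypothetical names, and Float16Array is left out of the type only because not every TypeScript lib or runtime declares it yet.

```typescript
// Hypothetical union of the typed arrays used in this sketch.
type NumericArray =
  | Int8Array
  | Uint8Array
  | Int16Array
  | Uint16Array
  | Float32Array
  | Float64Array;

function normalizePreservingType<T extends NumericArray>(vector: T): T {
  const src: NumericArray = vector;

  let sumOfSquares = 0;
  for (let i = 0; i < src.length; i++) sumOfSquares += src[i] * src[i];
  const mag = Math.sqrt(sumOfSquares);
  if (mag === 0) throw new Error("Cannot normalize a zero vector.");

  // Reuse the input's constructor so the output has the same concrete class.
  // In runtimes that provide Float16Array, the same lookup would cover it too.
  const Ctor = vector.constructor as unknown as new (length: number) => NumericArray;
  const out = new Ctor(src.length);
  for (let i = 0; i < src.length; i++) out[i] = src[i] / mag;
  return out as T;
}

const unit = normalizePreservingType(new Float32Array([3, 4]));
// unit is a Float32Array holding approximately 0.6 and 0.8
```

Preserving integer types has a separate cost that tests would need to cover: normalized components lie in [-1, 1], and writing them into an integer typed array truncates toward zero, so an Int8Array input such as [3, 4] comes back as [0, 0].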

Collaborator Author

@copilot open a new pull request to apply changes based on this feedback

Comment on lines 23 to 27
/**
 * Vector schema for representing vectors as arrays of numbers
 * @param annotations - Additional annotations for the schema
 * @returns The vector schema
 */

Copilot AI Jan 3, 2026

The JSDoc comment describes this as "Vector schema" but the function and type are named "TensorSchema" and "Tensor". The documentation should be updated to use "Tensor" consistently or the naming should be clarified to explain the relationship between vectors and tensors in this context.

Collaborator Author

@copilot open a new pull request to apply changes based on this feedback

Comment on lines 212 to 220
private quantizeToUint16(values: number[]): Uint16Array {
  // Find min/max for scaling
  const min = Math.min(...values);
  const max = Math.max(...values);
  const range = max - min || 1;

  // Scale to [0, 65535]
  return new Uint16Array(values.map((v) => Math.round(((v - min) / range) * 65535)));
}

Copilot AI Jan 3, 2026

The quantizeToUint16 method uses spread operator with Math.min/Math.max on the entire values array. For large vectors, this is inefficient as it creates multiple intermediate arrays. Consider using a single loop to find both min and max values simultaneously, which would be more performant and memory-efficient.

Collaborator Author

@copilot open a new pull request to apply changes based on this feedback

...annotations,
}) as const satisfies JsonSchema;

export type Vector = FromSchema<ReturnType<typeof TensorSchema>, TypedArraySchemaOptions>;

Copilot AI Jan 3, 2026

The exported type is named "Vector" but the schema function is named "TensorSchema". This naming inconsistency is confusing. Either the type should be named "Tensor" to match the schema, or the schema should be named "VectorSchema" to match the type. The comments in the file also refer to "vector" rather than "tensor".

Collaborator Author

@copilot open a new pull request to apply changes based on this feedback

sroussey and others added 11 commits January 3, 2026 19:35
- Updated IExecuteContext and IRunConfig to include registry support.
- Refactored TaskRunner and TaskGraphRunner to utilize the service registry for improved task execution and model retrieval.
- Ensured backward compatibility while enhancing the overall architecture for better service management.
- Introduced a service registry to manage model repositories and execution contexts in AiTask.
- Added a new InputResolver to manage schema-annotated inputs, allowing for automatic resolution of string IDs to their corresponding instances.
- Implemented repository and model resolution capabilities, improving task input handling and validation.
- Created new schemas for tabular, vector, and document repositories to facilitate input resolution.
- Enhanced AiTask and TaskRunner to utilize the input resolver for better integration with service registries.
- Added comprehensive tests to ensure the functionality of the input resolver system and its integration with tasks.
…ities

- Added several new tasks including ChunkToVectorTask, ContextBuilderTask, DocumentEnricherTask, HierarchicalChunkerTask, and others to support advanced document processing workflows.
- Enhanced the input handling for tasks to streamline the integration with the service registry and improve task execution.
- Updated the documentation to reflect the new tasks and their functionalities, ensuring clarity for developers.
- Implemented comprehensive tests for the new tasks to validate their behavior and integration within the workflow system.
Contributor

Copilot AI commented Jan 4, 2026

@sroussey I've opened a new pull request, #163, to work on those changes. Once the pull request is ready, I'll request review from you.

Contributor

Copilot AI commented Jan 4, 2026

@sroussey I've opened a new pull request, #164, to work on those changes. Once the pull request is ready, I'll request review from you.

Contributor

Copilot AI commented Jan 4, 2026

@sroussey I've opened a new pull request, #165, to work on those changes. Once the pull request is ready, I'll request review from you.

Contributor

Copilot AI commented Jan 4, 2026

@sroussey I've opened a new pull request, #166, to work on those changes. Once the pull request is ready, I'll request review from you.

Contributor

Copilot AI commented Jan 4, 2026

@sroussey I've opened a new pull request, #167, to work on those changes. Once the pull request is ready, I'll request review from you.

Contributor

Copilot AI commented Jan 4, 2026

@sroussey I've opened a new pull request, #169, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 22 commits January 4, 2026 00:59
* Initial plan

* Remove unused query variable from InputResolver test

Co-authored-by: sroussey <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: sroussey <[email protected]>
* Initial plan

* Improve markdown auto-detection with robust pattern matching

Co-authored-by: sroussey <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: sroussey <[email protected]>
* Initial plan

* Remove unused imports ChunkToVectorTask and HierarchicalChunkerTask

Co-authored-by: sroussey <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: sroussey <[email protected]>
* Initial plan

* Update Tensor.ts to use consistent "tensor" terminology throughout

Co-authored-by: sroussey <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: sroussey <[email protected]>
* Initial plan

* Optimize quantizeToUint8 and quantizeToUint16 to use single loop for min/max

Co-authored-by: sroussey <[email protected]>

* Add empty array guard to quantization methods

Co-authored-by: sroussey <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: sroussey <[email protected]>
…ument.addVariant (#158)

* Initial plan

* Use extractConfigFields for type-safe provenance handling

Co-authored-by: sroussey <[email protected]>

* Add comprehensive tests for type-safe provenance handling

Co-authored-by: sroussey <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: sroussey <[email protected]>
* Initial plan

* Add comprehensive tests for VectorSimilarityUtils

Co-authored-by: sroussey <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: sroussey <[email protected]>
* Initial plan

* Extract magic number 512 to DEFAULT_MAX_TOKENS constant

Co-authored-by: sroussey <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: sroussey <[email protected]>
* Initial plan

* Fix naming inconsistency: rename Vector to Tensor in Tensor.ts

Co-authored-by: sroussey <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: sroussey <[email protected]>
…lizing inputs to a non-negative range. This includes calculating the global minimum across both vectors and adjusting values accordingly.
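
The shift described in this commit message might look roughly like the sketch below. It is only a guess at the approach: `jaccardShifted` is a hypothetical name, and shifting only when a negative value is present is an assumption. The motivation is that sum(min) / sum(max) can leave the [0, 1] range when components are negative (for example, [-1] against [1] gives -1).

```typescript
function jaccardShifted(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");

  // Global minimum across both vectors.
  let globalMin = Infinity;
  for (let i = 0; i < a.length; i++) {
    globalMin = Math.min(globalMin, a[i], b[i]);
  }

  // Assumption: shift only when negatives are present, so non-negative
  // inputs produce the same result as the unshifted formula.
  const shift = globalMin < 0 ? -globalMin : 0;

  let minSum = 0;
  let maxSum = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] + shift;
    const y = b[i] + shift;
    minSum += Math.min(x, y);
    maxSum += Math.max(x, y);
  }
  return maxSum === 0 ? 0 : minSum / maxSum;
}

// [-1, 2] and [1, 2] are shifted by 1 to [0, 3] and [2, 3]: 3 / 5.
const shifted = jaccardShifted([-1, 2], [1, 2]);
```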
…normalization, and handling of various TypedArray types. Update normalize function to support an additional parameter for Float32Array conversion.
* Initial plan

* Add circular reference detection to smartClone method

Co-authored-by: sroussey <[email protected]>

* Fix circular reference detection to handle shared references correctly

Co-authored-by: sroussey <[email protected]>

* Refactor TaskEvents to import TaskStatus from TaskTypes and add unit tests for smartClone method

- Updated TaskEvents to import TaskStatus from the correct module.
- Added comprehensive unit tests for the smartClone method, including cases for circular reference detection and handling various data structures.

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: sroussey <[email protected]>
Co-authored-by: Steven Roussey <[email protected]>
@sroussey sroussey changed the title [feat] New VectorQuantizeTask, updated VectorSimilarityTask [feat] Repo registries and RAG workflows Jan 4, 2026
@sroussey sroussey requested a review from Copilot January 4, 2026 02:16
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 126 out of 127 changed files in this pull request and generated no new comments.


@sroussey sroussey marked this pull request as draft January 4, 2026 02:32
…f model arrays

- Updated setGlobalModelRepository parameter name for clarity.
- Enhanced resolveModelFromRegistry to support both single and array of model IDs.
- Modified resolveSchemaInputs to handle string values and arrays of strings more effectively, ensuring proper resolution of inputs.