General Remarks:
Are LIST OF DOUBLE and ARRAY_OF_DOUBLES in the functions handled as float?
What about vector types ARRAY_OF_SHORTS, ARRAY_OF_INTEGERS, ARRAY_OF_LONGS?
Is ARRAY_OF_FLOATS a LIST OF FLOAT? If so, what is [1.0, 2.0] by default: a LIST or ARRAY_OF_FLOATS? I assume a list. I guess this hierarchy needs documenting.
Is SparseVector a SQL data type? Can a property have this type?
The dot product computation (e.g. the sum of squares) is implemented separately in various functions; should there be one implementation that is reused?
Specific Remarks:
vectorDimension
Errors for a NULL argument; IMHO it should return 0, as .length() or .size() do for a NULL argument.
Wouldn't vectorDim be sufficient as a name, given that it is also vectorHasInf and not vectorHasInfinity?
vectorHasNaN, vectorHasInf
Cannot be tested with LIST OF FLOAT because NaN values are automatically converted to NULL, which is not allowed in typed collections.
SELECT vectorHasNaN([1.0,sqrt(-1.0),3.0]) errors with Cannot invoke "Object.getClass()" because "elem" is null
Is a function vectorHasNull perhaps needed?
vectorIsNormalized
The test for normalization is numerically as complex as normalizing itself, so instead of testing whether a vector is normalized one could just normalize it.
Why is the default threshold 0.001? Is this supposed to be approximately sqrt(eps) for float? Then it should be 0.0000001 for doubles.
Wouldn't vectorIsNormal be sufficient as a name?
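To put numbers on the sqrt(eps) question, here is a minimal sketch that computes the square root of machine epsilon for both precisions. It assumes eps is taken as the spacing between 1.0 and the next representable value; the class and method names are hypothetical and not part of the reviewed code.

```java
// Sketch: square root of machine epsilon for float and double.
// EpsilonSketch and its members are illustrative names only.
final class EpsilonSketch {
    // Machine epsilon, taken here as the gap between 1.0 and the next representable value.
    static final float FLOAT_EPS = Math.ulp(1.0f);  // 2^-23
    static final double DOUBLE_EPS = Math.ulp(1.0); // 2^-52

    static double sqrtEpsFloat() {
        return Math.sqrt(FLOAT_EPS);
    }

    static double sqrtEpsDouble() {
        return Math.sqrt(DOUBLE_EPS);
    }

    public static void main(String[] args) {
        System.out.println(sqrtEpsFloat());  // roughly 3.45e-4
        System.out.println(sqrtEpsDouble()); // roughly 1.49e-8
    }
}
```

Under this definition both 0.001 and 0.0000001 are a small multiple of the respective sqrt(eps), so the two thresholds would at least be consistent with each other.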
vectorAdd, vectorSubtract
How to broadcast? Meaning, how to add a scalar to (or subtract it from) a vector without first creating a vector, i.e. [1.0,2.0,3.0] + 4.0 (which currently would append the element 4.0 to the vector instead of adding 4.0 to every element).
Wouldn't vectorSub be sufficient as a name?
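What I mean by broadcasting, as a minimal sketch: the scalar is applied to every element and the result has the same dimension as the input. The helper name addScalar is hypothetical; nothing like it is claimed to exist in the current function set.

```java
// Sketch of scalar broadcasting: add one scalar to every element of a vector.
// BroadcastSketch.addScalar is an illustrative name, not an existing function.
final class BroadcastSketch {
    static float[] addScalar(float[] vector, float scalar) {
        float[] result = new float[vector.length];
        for (int i = 0; i < vector.length; i++) {
            result[i] = vector[i] + scalar; // element-wise, the dimension is unchanged
        }
        return result;
    }
}
```

Subtraction would then just be addScalar(vector, -scalar).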
vectorMagnitude
Why is this function not named vectorL2Norm, symmetrically with vectorL1Norm and vectorLInfNorm?
vectorLInfNorm
The loop does not need a conditional if something like maxAbs = Math.max(maxAbs, Math.abs(value)) is used.
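A minimal sketch of the branch-free loop I have in mind; the class and method names are hypothetical and the input is assumed to be a float[].

```java
// Sketch: L-infinity norm (largest absolute value) without an explicit conditional.
final class LInfNormSketch {
    static float lInfNorm(float[] vector) {
        float maxAbs = 0.0f;
        for (float value : vector) {
            maxAbs = Math.max(maxAbs, Math.abs(value));
        }
        return maxAbs;
    }
}
```

One behavioral difference to be aware of: Math.max returns NaN if either argument is NaN, so a NaN element propagates into the result, whereas a comparison such as `abs > maxAbs` would silently skip it.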
vectorSparsity
Should there be a default threshold like sqrt(eps)?
Alternatively or additionally, the L0 pseudonorm could be computed as a sparsity measure (geometric mean of absolute values).
vectorSum, vectorAvg, vectorMax, vectorMin
There is an error in the vectorSum function: repeated calls of SELECT vectorSum([1.0,2.0,3.0]) yield different (increasing) results. The same happens when using a property as argument. vectorAvg does not have this problem.
These seem to work differently from other aggregating functions, which for a single argument aggregate over the argument; here that would mean, for example, the sum of the vector elements.
vectorAvg does not produce the arithmetic average (just one specific property of a record)
vectorStdDev, vectorVariance
Unlike the variance and stddev SQL functions, these are not aggregating.
Unlike the variance and stddev SQL functions, vectorStdDev is not reusing the vectorVariance code.
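The reuse I am suggesting could look like the following sketch. It assumes the population variance (division by n) over the elements of a single vector; whether the actual functions use the population or the sample formula is not something I have checked, and all names are hypothetical.

```java
// Sketch: standard deviation defined in terms of variance, so the logic exists once.
// Assumes population variance (divide by n); names are illustrative only.
final class VarianceSketch {
    static double variance(float[] vector) {
        if (vector.length == 0) {
            throw new IllegalArgumentException("vector must not be empty");
        }
        double mean = 0.0;
        for (float x : vector) {
            mean += x;
        }
        mean /= vector.length;

        double sumSquaredDeviations = 0.0;
        for (float x : vector) {
            double d = x - mean;
            sumSquaredDeviations += d * d;
        }
        return sumSquaredDeviations / vector.length;
    }

    static double stdDev(float[] vector) {
        return Math.sqrt(variance(vector)); // reuse instead of duplicating the loops
    }
}
```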
vectorClip
Since this is called clamp in Java terminology, should this be renamed?
vectorCosineSimilarity
The two loops in the computation can be merged into one.
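A minimal sketch of the merged loop, accumulating the dot product and both squared norms in one pass. The names are hypothetical, and returning 0 for a zero vector is my assumption, not necessarily what the current implementation does.

```java
// Sketch: cosine similarity with a single pass over both vectors.
final class CosineSketch {
    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // assumption: similarity with a zero vector is reported as 0
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```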
vectorQuantizeBinary
Why is the median used to decide?
Why is there no vectorDequantizeBinary? At least for completeness.
vectorDequantizeInt8
This does not work: SELECT vectorDequantizeInt8(vectorQuantizeInt8([1.0, 2.0, 3.0]), 1.0, 3.0) gives the error Quantized vector must be an array or list, found: QuantizationResult
vectorApproxDistance
What does "ranking is preserved" mean for INT8? Vector spaces are not ordered. Is this meant element-wise?
Why can't the function deduce the quantization from its arguments?
The following query errors with vectorQuantizeInt8(<vector>): SELECT vectorApproxDistance(vectorQuantizeInt8([1.0, 2.0, 3.0]),vectorQuantizeInt8(1.0, 3.0, 3.0),'INT8')
vectorNormalizeScores
Wouldn't it be faster to create a new array with the midpoint value for the edge case of range zero instead of looping?
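A sketch of the edge case as I picture it, assuming min-max normalization to [0, 1] with 0.5 as the midpoint value; the names and the exact midpoint are assumptions on my side.

```java
import java.util.Arrays;

// Sketch: min-max normalization where the zero-range case is handled by a single fill.
final class NormalizeScoresSketch {
    static float[] minMaxNormalize(float[] scores) {
        float[] result = new float[scores.length];
        if (scores.length == 0) {
            return result;
        }
        float min = scores[0];
        float max = scores[0];
        for (float s : scores) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        float range = max - min;
        if (range == 0.0f) {
            Arrays.fill(result, 0.5f); // assumed midpoint of the [0, 1] target interval
            return result;
        }
        for (int i = 0; i < scores.length; i++) {
            result[i] = (scores[i] - min) / range;
        }
        return result;
    }
}
```

Arrays.fill still iterates internally, so the gain may be more in brevity than in speed.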
vectorMultiScore
The associated Java class filename does not fit the pattern (it is missing the Vector prefix).
Why is the weighted average an extra type, and not just 'AVG' with an extra argument? Or AVG could always be weighted, but by default with a vector of ones.
vectorHybridScore
This is just a special case of vectorMultiScore for the case of two scores with a weighted average. Is this extra function needed?
This is not really a vector function as it does not handle vectors.
vectorRRFScore
This produces wrong results for more than two scores, because the optional last argument cannot be distinguished from a score.
Why are the scores not grouped into a vector as for vectorMultiScore?
This is not really a vector function as it does not handle vectors.
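To illustrate why grouping the inputs removes the ambiguity, here is a sketch of reciprocal rank fusion with the inputs passed as one array and the constant k as a separate, explicit parameter. It assumes the usual RRF definition, the sum of 1 / (k + rank) over all rankings; the signature is hypothetical and not the one currently implemented.

```java
// Sketch: reciprocal rank fusion with ranks grouped in an array and k passed separately.
// Assumes the common definition: sum over i of 1 / (k + rank_i).
final class RrfSketch {
    static double rrfScore(long[] ranks, double k) {
        double score = 0.0;
        for (long rank : ranks) {
            score += 1.0 / (k + rank);
        }
        return score;
    }
}
```

With this shape, rrfScore(new long[] {1, 2, 3}, 60) and rrfScore(new long[] {1, 2}, 3) can no longer be confused with each other.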
vectorScoreTransformation
This is not really a vector function as it does not handle vectors.
LN would be more clear in terms of type of logarithm than LOG.
Additionally, TANH might be a useful variant to SIGMOID.
vectorDenseToSparse
Associated Java class filename differs from the pattern (Vector prefix missing).
vectorAsSparse?
vectorSparseCreate, vectorSparseDot, vectorSparseToDense
Associated Java class filename differs from the pattern (Vector prefix missing).
vectorToString
Are these meant to be copy-pasted into code / scripts, or saved to a file which is then loaded?
When the numpy fromString method is used, a comma-separated list is expected; the brackets are only for code, AFAIK.
In MATLAB the separator determines the type of vector: space (or comma!) produces a row vector, while semicolon results in a column vector.
Julia is similar to MATLAB in many regards, so the MATLAB variant should also work in Julia, but it would be more obvious if a 'JULIA' option were also available.
vectorAsString?