Metrics: Soft dynamic-time-warping divergence for FDataGrid #412

Open · wants to merge 27 commits into base: develop
Conversation

@Clej (Contributor) commented Jan 13, 2022

Content
I'm proposing the soft dynamic-time-warping (SDTW) divergence as a metric between FDataGrid objects. The domain dimension (time) must be one; the codomain dimension can be greater than one. I use it for my work, and noted that #363 asked for a classical DTW implementation.

This metric is a smooth version of DTW, which is defined by a min operator and depends on a pre-defined alignment cost (i.e. a discrepancy measure between all pairs (x(t_i), y(s_j))), e.g. the squared Euclidean cost, or roughly speaking a Gram matrix. In SDTW, the min operator is replaced by the soft-min (through a negative log-sum-exp-like function), whose smoothness is controlled by gamma. The SDTW divergence is positive, symmetric, and indiscernible (SDTW(X, Y) = 0 <=> X = Y), but does not satisfy the triangle inequality (this depends on the alignment cost); see "Differentiable Divergences Between Time Series", Blondel et al., AISTATS 2021. It can thus be used like a distance for clustering, k-NN classification, etc.
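
For intuition, here is a minimal sketch of the soft-min operator (plain NumPy/SciPy, illustrative only, not the code of this PR):

import numpy as np
from scipy.special import logsumexp

def softmin(values, gamma):
    # soft-min: -gamma * log(sum(exp(-values / gamma)))
    return -gamma * logsumexp(-np.asarray(values) / gamma)

softmin([1.0, 2.0, 3.0], gamma=0.01)  # ~1.0, close to the hard min
softmin([1.0, 2.0, 3.0], gamma=5.0)   # well below 1.0: smoother, biased downwards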

The main advantages of the SDTW divergence are:

  • it offers the possibility to compare multivariate functional samples with varying numbers of grid points (see the sketch just after this list),
  • it is more robust to temporal shifts than Lp distances, so the comparison is both temporal and geometric.
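
As a quick illustration of the first point, two curves discretized on grids of different sizes can be compared directly (toy data; the sdtwDivergence API is the one shown in the usage section below):

import numpy as np
from skfda import FDataGrid
from skfda.misc.metrics import sdtwDivergence

t1, t2 = np.linspace(0, 1, 100), np.linspace(0, 1, 73)
x = FDataGrid(np.sin(2 * np.pi * t1)[None], t1)  # 100 grid points
y = FDataGrid(np.sin(2 * np.pi * t2)[None], t2)  # 73 grid points

sdtwDivergence(gamma=1.0, cost=None)(x, y)  # no resampling needed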

Implementation
Since SDTW is defined through the soft-min operator, computing the SDTW divergence between two FDataGrid objects X and Y requires solving a minimization problem, which we do with a dynamic-programming recursion in O(mn), where m and n are the respective grid sizes of X and Y. I coded the recursion with Cython in sdtw_fast.pyx, and so added a setup.py in ~/misc/metrics to generate the .c extension.
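
For reference, a minimal pure-Python sketch of that recursion (the Cython code is the actual implementation; following Blondel et al., the divergence is then sdtw(X, Y) - (sdtw(X, X) + sdtw(Y, Y)) / 2):

import numpy as np

def _softmin3(a, b, c, gamma):
    # stabilized soft-min of three values
    m = min(a, b, c)
    s = (np.exp(-(a - m) / gamma)
         + np.exp(-(b - m) / gamma)
         + np.exp(-(c - m) / gamma))
    return m - gamma * np.log(s)

def sdtw_value(C, gamma):
    # O(m * n) recursion; C is the (m, n) alignment-cost matrix
    m, n = C.shape
    R = np.full((m + 1, n + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            R[i, j] = C[i - 1, j - 1] + _softmin3(
                R[i - 1, j], R[i, j - 1], R[i - 1, j - 1], gamma,
            )
    return R[m, n]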

Improvements

  • Tests: what do you suggest as unit test case(s)?
  • Examples within the documentation for sdtwDivergence, with different alignment costs.
  • PairwiseMetric: I tried to implement it but did not manage to understand what exactly to return in _pairwise_metric_optimization_sdtw.

Usage:

import numpy as np
from skfda.datasets import make_sinusoidal_process
from skfda.misc.metrics import sdtwDivergence
from sklearn.gaussian_process.kernels import DotProduct, RBF

n = 50
n_grid = 10**3
x_curves = make_sinusoidal_process(
    n_samples=n,
    n_features=n_grid,
    start=0.0,
    stop=50.0,
    period=30.0,
    phase_mean=3.0,
    amplitude_mean=5.0,
    amplitude_std=1.0,
    random_state=11
)

# default alignment cost is the half squared euclidean distance
sdtw_hsqe = sdtwDivergence(gamma=1.0, cost=None)
sdtw_hsqe(x_curves[0], x_curves[1])

# which is equivalent to passing the half squared Euclidean cost explicitly:
cost_hsqe = DotProduct(sigma_0=0)

def cost_lin(X, Y):
    # 0.5 * ||x||^2 + 0.5 * ||y||^2 - <x, y> = 0.5 * ||x - y||^2
    mat = -1 * cost_hsqe(X, Y)
    mat += 0.5 * cost_hsqe.diag(X)[:, np.newaxis]
    mat += 0.5 * cost_hsqe.diag(Y)
    return mat

sdtw_hsqe = sdtwDivergence(gamma=1.0, cost=cost_lin)
sdtw_hsqe(x_curves[0], x_curves[1])

# the cost can be a non-linear kernel:
rbf = RBF(length_scale=1.0)
def cost_rbf(X, Y):
    # rbf.diag(.) is identically 1, so this is 1 - rbf(X, Y)
    mat = -1 * rbf(X, Y)
    mat += 0.5 * rbf.diag(X)[:, np.newaxis]
    mat += 0.5 * rbf.diag(Y)
    return mat

sdtw_rbf = sdtwDivergence(gamma=1.0, cost=cost_rbf)
sdtw_rbf(x_curves[0], x_curves[1])

Any other suggestions are welcome.

codecov bot commented Feb 8, 2022

Codecov Report

Base: 85.53% // Head: 85.06% // Decreases project coverage by 0.47% ⚠️

Coverage data is based on head (f43c73b) compared to base (e69154b).
Patch coverage: 26.66% of modified lines in pull request are covered.

@@             Coverage Diff             @@
##           develop     #412      +/-   ##
===========================================
- Coverage    85.53%   85.06%   -0.47%     
===========================================
  Files          141      143       +2     
  Lines        11280    11370      +90     
===========================================
+ Hits          9648     9672      +24     
- Misses        1632     1698      +66     
Impacted Files Coverage Δ
skfda/misc/_math.py 84.37% <ø> (ø)
skfda/misc/metrics/_sdtw_distances.py 24.63% <24.63%> (ø)
skfda/misc/metrics/_sdtw_numba.py 33.33% <33.33%> (ø)


@Clej (Contributor, Author) commented Feb 8, 2022

I rewrote the code in Python and used @jit decorators with fastmath=True and cache=True. Computation times are comparable to those of my Cython version.

@Clej (Contributor, Author) commented Feb 8, 2022

What is missing is the test part: any suggestions? Would checking the expected properties of the divergence (symmetry, positivity, indiscernibility) be enough? @vnmabus

@vnmabus (Member) left a comment

I still have to check the algorithm.

For the tests, IMHO the best test is to compare the results with an existing implementation if there is any, for a particular dataset.
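
For concreteness, a minimal sketch of the property checks mentioned above (hypothetical test; comparing against an existing implementation on a fixed dataset, as suggested, would complement it):

import numpy as np
from skfda.datasets import make_sinusoidal_process
from skfda.misc.metrics import sdtwDivergence

def test_sdtw_divergence_properties():
    fd = make_sinusoidal_process(n_samples=2, n_features=100, random_state=0)
    x, y = fd[0], fd[1]
    sdtw = sdtwDivergence(gamma=1.0, cost=None)

    np.testing.assert_allclose(sdtw(x, y), sdtw(y, x))  # symmetry
    assert sdtw(x, y) > 0  # positivity for distinct samples
    np.testing.assert_allclose(sdtw(x, x), 0, atol=1e-10)  # indiscernibility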

from skfda.misc._math import half_sq_euclidean


def _check_shape_postive_cost_mat(cost, expected_shape, shape_only=False):
@vnmabus (Member):

Try to use type hints even in private functions, to catch possible errors.

" ({}, {})".format(n1, n2)
)

_check_shape_postive_cost_mat(
@vnmabus (Member):

I don't think we should always check these things. Maybe we can check in the tests for particular data.

@Clej (Contributor, Author):

I know it looks inappropriate here, but otherwise, when the cost matrix does not satisfy these properties, the output value of the algorithm does not make any sense. That's why I put a boolean check_cost (default: False) as input.

Should I leave it here, or is the user supposed to check it beforehand?

@vnmabus (Member):

I guess that if it is only done when check_cost is True, and it is False by default, it is not a problem.
By the way, the code can be made more compact (and better tested) by generating the costs if cost is a callable and then reusing the same code path as if cost were a sequence of ndarrays.
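
Something like this hypothetical normalization step (names are illustrative, not from the PR):

def _as_cost_matrices(cost, X, Y):
    # normalize the input so the rest of the code only sees ndarrays
    if callable(cost):
        return cost(X, Y), cost(X, X), cost(Y, Y)
    return cost  # assumed to already be a tuple of precomputed ndarrays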

@Clej (Contributor, Author):

OK, so only a callable for the input cost.
Actually, I could raise a warning instead of a ValueError; that leaves more flexibility to the user (but increases the risk of drawing wrong conclusions), while at least making the user aware of that risk. WDYT?

@vnmabus (Member):

I didn't mean to allow only a callable. I meant that after the callable is evaluated, the code path for a callable input is the same as for an ndarray, so both can be merged.
I am not sure about the warning/exception question. It is easier to handle an exception programmatically, and you can always call the function again without checking if that is what you want. Also, if the algorithm requires these preconditions, maybe it shouldn't continue. Is there any case in which you would want to call the algorithm while breaking the preconditions?

@Clej (Contributor, Author):

Sorry, I've been busy too.
I followed your suggestion; the main code path is now:
(1) pre-compute the cost matrices,
(2) check the properties of each cost matrix if check_cost is True,
(3) return the SDTW value.

@vnmabus (Member) left a comment

Sorry for taking so long to review this more thoroughly; I have been busy.

# and symmetry
if cost is not None:
if isinstance(cost, tuple):
if not len(cost) == 3:
@vnmabus (Member):

I think it is better (easier to understand) to have the positive condition in the if instead of the else.

@Clej (Contributor, Author):

ok

@vnmabus (Member) left a comment

I think this PR is taking a great shape!

Remember to also add some tests for this feature, preferably comparing with an existing implementation and/or known results.


# np.sum(X ** 2, axis=1)=inner_product(X, X)
cost_11 = 0.5 * inner_product(
arg1.data_matrix[0],
@vnmabus (Member):

Is there any reason for not passing just the FDataGrid (thus using the information of the grid points to compute the functional inner product, instead of the multivariate one)?

@Clej (Contributor, Author):

Taking into account the grid points could be an extension, yes, but the original implementation (not functional-based) does not consider the grid points (this is noted in the function header), so the function inputs only need the data_matrix of each FDataGrid.

@vnmabus (Member):

OK, checking the original paper again, I now see what this function is supposed to do.
So, it receives as input two functions $x: \mathbb{R} \to \mathbb{R}^p$, discretized at points $t_1, \ldots, t_n$, and $y: \mathbb{R} \to \mathbb{R}^p$, discretized at points $s_1, \ldots, s_m$, and computes the matrix

$$ \begin{pmatrix} d(x(t_1), y(s_1)) & \ldots & d(x(t_1), y(s_m)) \\ \vdots & & \vdots \\ d(x(t_n), y(s_1)) & \ldots & d(x(t_n), y(s_m)) \\ \end{pmatrix} $$

with $d$ a distance function $d: \mathbb{R}^p \times \mathbb{R}^p \to \mathbb{R}$, right?

If that is the case, I would state it explicitly in the documentation, and maybe check that there is only one observation per FDataGrid, just in case. Of course, then it makes no sense to consider the actual values of the grid points, unless you want to treat the output as a function of two variables.
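
In code, that matrix for the half squared Euclidean cost is simply (illustrative snippet; X and Y stand for the (n, p) and (m, p) arrays of discretized values):

import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(5, 2)), rng.normal(size=(7, 2))

cost_matrix = 0.5 * cdist(X, Y, metric="sqeuclidean")  # entry (i, j) = d(x(t_i), y(s_j))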

@Clej (Contributor, Author) left a comment

I think it's okay now, except for the tests (another PR?).

Comment on lines +361 to +368
# @pairwise_metric_optimization.register
# def _pairwise_metric_optimization_sdtw(
# metric: sdtwDivergence,
# elem1: FData,
# elem2: FData,
# ) -> NDArrayFloat:

# return metric(elem1, elem2)
@Clej (Contributor, Author):

For this part, I left the code commented out because I don't know how the decorator @pairwise_metric_optimization.register must be used. Is its goal to compute a similarity or Gram matrix in a distributed way?

@vnmabus (Member):

Its goal is to be able to provide a more efficient implementation of the pairwise distance matrix between two data sets. Given the datasets $X = (x_1, \ldots, x_n)$ and $Y = (y_1, \ldots y_m)$, it computes

$$ \begin{pmatrix} d(x_1, y_1) & \ldots & d(x_1, y_m) \\ \vdots & & \vdots \\ d(x_n, y_1) & \ldots & d(x_n, y_m) \\ \end{pmatrix} $$

Note that by default $d$ can compute paired distances, that is, given two FData $X' = (x'_1, \ldots, x'_n)$ and $Y' = (y'_1, \ldots, y'_n)$, both of size $n$, it computes

$$ d(X', Y') = \begin{pmatrix} d(x'_1, y'_1) \\ \vdots \\ d(x'_n, y'_n) \\ \end{pmatrix} $$

By default, what is done (if you do not implement this optimization) is to build the datagrids $X' = (x_1, \ldots, x_1, x_2, \ldots, x_2, \ldots, x_n, \ldots, x_n)$ and $Y' = (y_1, y_2, \ldots, y_m, y_1, y_2, \ldots, y_m, \ldots, y_1, y_2, \ldots, y_m)$ and compute their paired distance. This works, but performs badly and consumes lots of memory. Note: I want to try, alternatively, iterating over the $x_i$ and computing $d(x_i, Y)$. This would certainly reduce memory consumption and would probably be faster due to cache effects, even if it does an iteration in pure Python, but I did not have time to try.
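
Based on that description, a minimal (hypothetical) implementation of the hook, looping over samples instead of materializing the repeated datagrids, might look like this (names taken from the commented-out code above):

import numpy as np

@pairwise_metric_optimization.register
def _pairwise_metric_optimization_sdtw(
    metric: sdtwDivergence,
    elem1: FData,
    elem2: FData,
) -> NDArrayFloat:
    # fill the n x m matrix of pairwise divergences entry by entry
    distances = np.empty((elem1.n_samples, elem2.n_samples))
    for i in range(elem1.n_samples):
        for j in range(elem2.n_samples):
            distances[i, j] = metric(elem1[i], elem2[j])
    return distances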


@vnmabus (Member) left a comment

I would rather have some tests (comparing with known results) as part of this PR, to guarantee that it works.

If you also want to do an example for the documentation, that could be in a separate PR.



) -> None:
"""check compatible, one sample and one-dimensional"""

if (fdata1.dim_domain != fdata2.dim_domain):
@vnmabus (Member):

Maybe you can reuse here the new functions check_fdata_same_dimensions and check_fdata_dimensions in skfda.misc.validation.
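
A sketch of that suggestion (the helper names come from skfda.misc.validation as mentioned; the exact signatures shown here are an assumption, so check the module before relying on them):

from skfda.misc import validation

# hypothetical usage; verify argument names against skfda.misc.validation
validation.check_fdata_same_dimensions(fdata1, fdata2)
validation.check_fdata_dimensions(fdata1, dim_domain=1)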

)


class sdtwDivergence():
@vnmabus (Member):

Class names start with uppercase. Thus it would be SDTWDivergence.



@njit(cache=True)
def _sdtw(
@vnmabus (Member):

Do you know if this is the same (or a similar) dynamic programming algorithm used for Fisher-Rao registration? If it is, we could also use it there and remove the dependency on fdasrsf.
