Scores for FDataIrregular objects #609

Open
pcuestas opened this issue Apr 1, 2024 · 3 comments
pcuestas commented Apr 1, 2024

Motivation

The package cannot yet compute scores between FDataIrregular objects. This functionality would be useful for measuring the quality of conversions from irregular objects to a basis representation.

Desired functionality

Compute scores when both y_true and y_pred are FDataIrregular objects.

How to implement each score?

There is a big problem when implementing scores for FDataIrregular: the mean of an FDataIrregular object is not well defined. Most of the scores (for FData objects) involve computing the mean of an FData object.

We can work around this issue in some cases, namely when we want the "uniform_average" of the score and not the "raw_values".
An example where we can avoid computing the mean is mean_absolute_error. The mean absolute error is defined pointwise as:
$$MAE(t) = \frac{1}{\sum w_i}\sum_{i=1}^N w_i |X_i(t) - \hat X_i(t)|.$$
To avoid having to calculate the mean of the FDataIrregular when multioutput="uniform_average", we can change the order of the mean and the integral. That is, instead of:
$$MAE = \frac{1}{V}\int_D MAE(t)\ dt,$$
we can use:
$$MAE = \frac{1}{\sum w_i}\sum_{i=1}^N w_i \frac{1}{V_i}\int_{D_i} |X_i(t) - \hat X_i(t)|\ dt.$$
Here, $D_i$ and $V_i$ denote the domain of the $i$-th irregular curve and its Lebesgue measure, respectively. I am not sure whether using $D_i$ and $V_i$ instead of the whole domain $D$ and its volume $V$ is the best choice. It might be less confusing not to compute the $V_i$'s at all, but I believe the result would be less accurate, as it would implicitly give more weight to curves that have more spread-out points.
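The per-curve normalization described above can be sketched in plain NumPy. This is only an illustration under stated assumptions, not the package's implementation: `irregular_mae` is a hypothetical helper, each curve is given as a sorted 1-D array of points with matching values, weights are uniform, and the integral is approximated with the trapezoid rule.

```python
import numpy as np


def irregular_mae(points, true_values, pred_values):
    """Average over curves of (1 / V_i) * integral over D_i of |X_i - X_hat_i|.

    Hypothetical helper: `points`, `true_values` and `pred_values` are lists
    with one sorted 1-D array per curve (at least two points per curve).
    """
    per_curve = []
    for t, y, y_hat in zip(points, true_values, pred_values):
        t = np.asarray(t, dtype=float)
        abs_err = np.abs(np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float))
        # Trapezoid rule over the curve's own observation points.
        integral = np.sum((abs_err[1:] + abs_err[:-1]) * np.diff(t)) / 2.0
        # V_i: length of the smallest interval containing the points.
        v_i = t[-1] - t[0]
        per_curve.append(integral / v_i)
    return float(np.mean(per_curve))


# Curve 0 is observed on [0, 0.5] with constant absolute error 1;
# curve 1 is observed on [0, 1] with constant absolute error 3.
points = [np.array([0.0, 0.25, 0.5]), np.array([0.0, 0.5, 1.0])]
y_true = [np.zeros(3), np.zeros(3)]
y_pred = [np.ones(3), 3.0 * np.ones(3)]

print(irregular_mae(points, y_true, y_pred))
```

In this toy case the per-curve averages are 1 and 3, so the score is 2. Dividing both integrals by a common length of 1 instead of by $V_i$ would give (0.5 + 3) / 2 = 1.75, which shows how the curve with more spread-out points would weigh more.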

This idea can be applied to mean_absolute_error, mean_absolute_percentage_error, mean_squared_error and mean_squared_log_error. I am going to implement these in feature/scoring-fdatairregular.

r2_score

I believe that r2_score cannot be implemented for the FDataIrregular case: by definition, it compares how well y_pred predicts the values of y_true against how well the mean does, and the mean is not defined.

A possible implementation of r2_score for FDataIrregular objects would be to just compute the r2_score of (y_true.values, y_pred.values). However, I do not think this is a good option: it disregards the functional structure of the curves by ignoring the points where they are measured, and the mean of the values does not have the same meaning as in the other cases (FDataGrid and FDataBasis). Moreover, a user can call r2_score(y_true.values, y_pred.values) explicitly, so I do not think we should implement this score for irregular data, as it is not properly defined.
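To make the objection concrete, here is a minimal sketch of what a pooled score does. `pooled_r2` is a hypothetical stand-in written in plain NumPy, and the two arrays stand in for `y_true.values` and `y_pred.values`. Note that the function has no argument for the measurement points, so it returns the same number however the observations are placed in the domain.

```python
import numpy as np


def pooled_r2(true_values, pred_values):
    """Plain R^2 on the pooled observed values (hypothetical helper).

    The measurement points are not an input, so the functional
    structure of the curves plays no role in the result.
    """
    y = np.asarray(true_values, dtype=float).ravel()
    y_hat = np.asarray(pred_values, dtype=float).ravel()
    ss_res = np.sum((y - y_hat) ** 2)
    # The "mean" here is the mean of all pooled values, not a mean curve.
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Stand-ins for y_true.values and y_pred.values.
values_true = np.array([0.0, 1.0, 4.0, 9.0])
values_pred = np.array([0.5, 1.0, 3.5, 9.0])

print(pooled_r2(values_true, values_pred))
```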

The case of explained_variance_score is very similar to that of r2_score.

@pcuestas pcuestas self-assigned this Apr 1, 2024
pcuestas added a commit that referenced this issue Apr 1, 2024
(testing included to assert equality with the `FDataGrid` case)
@ooodragon94

ooodragon94 commented Apr 14, 2024

Hi, thank you for opening the issue.
I think this is another method that is not well defined for FDataIrregular.

I'm trying to apply FPCA using this code.
https://fda.readthedocs.io/en/stable/auto_examples/plot_fpca_inverse_transform_outl_detection.html#sphx-glr-auto-examples-plot-fpca-inverse-transform-outl-detection-py

I have functions with R^3 -> R.

Can FPCA be implemented for FDataIrregular too?

(or should I open up another issue?)

@pcuestas
Contributor Author

Hello, @ooodragon94.

As I understand it, your case is very different from the one I outlined in this issue. There are ways to implement FPCA for irregular data, but we haven't implemented that yet, as FDataIrregular is a very recent addition to the package. You should definitely open another issue explaining in detail the type of data that you have and what you want to do.

The development efforts tend to be steered towards what users request, so it will be very useful to know what you would like to have in the package.

@pcuestas
Contributor Author

pcuestas commented Jun 30, 2024

After discussing this issue with @vnmabus and Alberto Suárez, we concluded that the integral of a functional data object should always be the integral over its domain $D$, and not over the interval bounded by the endpoints of the discretization grid (called $D_i$ in the original issue description). This is discussed in depth in #619.

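In #610, I have implemented the scores as described in the original issue description; that is, each integral is divided by the measure $V_i$ of the smallest interval $D_i$ that contains the $i$-th curve's discretization points:

$$MAE = \frac{1}{\sum w_i}\sum_{i=1}^N w_i \frac{1}{V_i}\int_{D_i} |X_i(t) - \hat X_i(t)|\ dt.$$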

However, once the integral of discretized datasets is properly defined over the domain of the functional data object (see #619), these scores must be redefined so that the integrals are divided by the domain's measure $V$ instead of $V_i$. For example, the MAE formula will be:

$$MAE = \frac{1}{\sum w_i}\sum_{i=1}^N w_i \frac{1}{V}\int_D |X_i(t) - \hat X_i(t)|\ dt.$$
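A rough NumPy sketch of this domain-normalized, weighted formula follows. It is not the package's code: `domain_mae` is a hypothetical helper, and it assumes that the absolute error is held constant between a curve's outermost observation and the domain boundary. How the integral over $D$ is actually defined for discretized data is the subject of #619 and may differ from this assumption.

```python
import numpy as np


def domain_mae(points, true_values, pred_values, domain, weights=None):
    """Weighted average of (1 / V) * integral over D of |X_i - X_hat_i|.

    Hypothetical helper. ASSUMPTION: outside a curve's observed range the
    absolute error is extended as a constant up to the domain limits.
    """
    a, b = domain
    v = b - a  # V: measure of the whole domain D
    per_curve = []
    for t, y, y_hat in zip(points, true_values, pred_values):
        t = np.asarray(t, dtype=float)
        abs_err = np.abs(np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float))
        # Pad with the domain limits (constant extension of the error).
        t_ext = np.concatenate(([a], t, [b]))
        e_ext = np.concatenate(([abs_err[0]], abs_err, [abs_err[-1]]))
        # Trapezoid rule over the padded points; zero-width pieces add nothing.
        integral = np.sum((e_ext[1:] + e_ext[:-1]) * np.diff(t_ext)) / 2.0
        per_curve.append(integral / v)
    # np.average divides by the sum of the weights (plain mean if None).
    return float(np.average(per_curve, weights=weights))


points = [np.array([0.0, 0.25, 0.5]), np.array([0.0, 0.5, 1.0])]
y_true = [np.zeros(3), np.zeros(3)]
y_pred = [np.ones(3), 3.0 * np.ones(3)]

print(domain_mae(points, y_true, y_pred, domain=(0.0, 1.0), weights=[1.0, 3.0]))
```

With constant absolute errors of 1 and 3 and weights 1 and 3, the per-curve terms are 1 and 3, and the weighted score is (1·1 + 3·3) / 4 = 2.5.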

vnmabus added a commit that referenced this issue Jul 5, 2024
Implement scores for `FDatairregular` objects as described in #609