
loss function #70

Open
saeedalahmari3 opened this issue Mar 27, 2018 · 7 comments

Comments

@saeedalahmari3

I didn't understand this loss function: return -dice_coef(y_true, y_pred). For backpropagation I think we need a differentiable loss function, for instance return 0.5*math.pow(1 - dice_coef(y_true, y_pred), 2).
Is that right?

@jocicmarko
Owner

dice_coef is a quantity that we want to maximize. You can turn it into a loss to minimize in a number of ways (see the sketch after this list):

  • -dice_coef
  • 1/dice_coef
  • your proposition
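
For illustration, a minimal sketch of the first two options written as Keras loss functions. The first is essentially the loss quoted in the question above (return -dice_coef(...)); inverse_dice_loss is just a placeholder name, not something defined in this repo:

# assumes dice_coef(y_true, y_pred) from this repo is already in scope

def dice_coef_loss(y_true, y_pred):
    # option 1: maximizing dice_coef is the same as minimizing its negative
    return -dice_coef(y_true, y_pred)

def inverse_dice_loss(y_true, y_pred):
    # option 2: the reciprocal grows as dice_coef shrinks
    # (well defined because the smooth term keeps dice_coef strictly positive)
    return 1. / dice_coef(y_true, y_pred)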

@jmargeta

jmargeta commented Mar 28, 2018

Hi @saeedalahmari3,
This project defines Dice coefficient as:

from keras import backend as K

smooth = 1.  # smoothing constant defined alongside dice_coef in this repo

def dice_coef(y_true, y_pred):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

Compared to its traditional definition, it is already smooth and differentiable.
And the negative (-dice_coef) of a differentiable function is also differentiable, even without pow.

Since y_pred and y_true are bounded between 0 and 1 (if y_pred is the output of a softmax), the Dice coefficient is also always positive and bounded.
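
If in doubt, you can ask the framework for the gradient directly. A quick standalone check (written in TensorFlow 2 style for brevity, not in the Keras-backend style of this repo):

import tensorflow as tf

smooth = 1.  # same role as the smooth constant in this repo

def dice_coef(y_true, y_pred):
    y_true_f = tf.reshape(y_true, [-1])
    y_pred_f = tf.reshape(y_pred, [-1])
    intersection = tf.reduce_sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) + smooth)

y_true = tf.constant([0., 1., 1., 0.])
y_pred = tf.Variable([0.1, 0.8, 0.7, 0.2])

with tf.GradientTape() as tape:
    loss = -dice_coef(y_true, y_pred)   # the loss discussed in this issue

print(tape.gradient(loss, y_pred))      # finite, well-defined gradient for every pixel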

The pow(..., 2) or K.square is used, for example, in the mean_squared_error loss:

def mean_squared_error(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)

pow is differentiable, but that is not its purpose there.
Most importantly, pow makes the (y_pred - y_true) difference always positive (which the Dice coefficient already is by definition). The 0.5 there is only a cosmetic detail for prettier math on paper.
Please also note that instead of math.pow(..., 2) it should probably be K.pow(..., 2) or K.square(...), so that it operates on backend tensors and stays part of the computation graph.
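
If you do want to try the squared variant from the first comment, a sketch using K.square (squared_dice_loss is just an illustrative name, assuming dice_coef from this repo is in scope):

from keras import backend as K

def squared_dice_loss(y_true, y_pred):
    # the original proposition, kept inside the Keras backend
    # (math.pow would not accept a backend tensor)
    return 0.5 * K.square(1. - dice_coef(y_true, y_pred))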

Of course, you could even try to use the mean_squared_error loss for segmentation, but it could largely ignore the less dominant class (that is why the Dice coefficient is used here in the first place, instead of the more common binary_crossentropy).

For more loss functions see here.

But maybe I misunderstand. Why would you need any pow(..., 2) here?

As @jocicmarko says, it could work; your function stays differentiable.
Of course, give it a try and use the best outcome; after all, that is what we all care about :)

@nabsabraham

@jmargeta, I'm sorry if this is a trivial question, but how does the addition of the smooth term make this function differentiable?

@jmargeta

jmargeta commented Aug 23, 2018

@BrownPanther, adding a small constant to the denominator prevents division by zero when K.sum(y_true_f) + K.sum(y_pred_f) equals 0. That would otherwise be a point with no defined derivative.

Even if the sum is a very small positive number, adding the smooth term (an epsilon) dampens the dramatic changes in gradient that could be caused even by single-pixel changes in the prediction/ground truth.
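
A tiny numeric illustration of both points (plain NumPy, assuming smooth = 1. for the smoothed case):

import numpy as np

def dice(y_true, y_pred, smooth=0.):
    intersection = np.sum(y_true * y_pred)
    return (2. * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)

empty = np.zeros(4)
one_pixel = np.array([1., 0., 0., 0.])

print(dice(empty, empty))                 # nan: 0/0, undefined, no usable gradient
print(dice(empty, empty, smooth=1.))      # 1.0: two empty masks count as a perfect match
print(dice(empty, one_pixel, smooth=1.))  # 0.5: a single stray pixel changes the score gently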

@nabsabraham

@jmargeta thanks for the explanation! But even before the addition of the smooth term, the dice function is differentiable because the last layer's output is a sigmoid/softmax, i.e. probabilities rather than hard 0s or 1s, correct? The smooth term just helps with the gradient flow - is this understanding correct?

@jmargeta

jmargeta commented Sep 3, 2018

@BrownPanther Yes, even without the smooth term the function itself would be differentiable almost everywhere (except where K.sum(y_true_f) + K.sum(y_pred_f) == 0). The term, however, makes the function differentiable everywhere, with no exceptions.

Using sigmoid/softmax in the last layer does not influence the differentiability of the dice function itself. Passing a continuous input into a differentiable function results in a continuous change of its output and can indeed help with the gradient flow.

In the end, perfect differentiability is often not such a big deal.
Even one of the most commonly used activation functions today - ReLU (def relu(x): return max(0, x)) - is not differentiable at x=0, yet we can compute its sub-derivatives (1 if x > 0 else 0) and train the nets rather successfully.
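
For instance, you can check which sub-derivative a framework actually picks at the kink (a TensorFlow sketch; returning 0 at x == 0 is TensorFlow's convention, other frameworks may choose differently):

import tensorflow as tf

x = tf.Variable([-1.0, 0.0, 2.0])
with tf.GradientTape() as tape:
    y = tf.nn.relu(x)

# sub-derivative of ReLU: 0 for x < 0, 1 for x > 0, and a chosen value at x == 0
print(tape.gradient(y, x))  # [0. 0. 1.]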

@jizhang02

Hello,
I use the dice loss function in U-Net, but the predicted images are all white or grey.
When I use the default binary_crossentropy loss function, the predicted images look good.
Is there something wrong?
