
loss function #70

Open
saeedalahmari3 opened this issue Mar 27, 2018 · 7 comments

Comments

@saeedalahmari3

I didn't understand this loss function: return -dice_coef(y_true, y_pred). For backpropagation I think we need a differentiable loss function, for instance return 0.5*math.pow(1 - dice_coef(y_true, y_pred), 2).
Is that right?

@jocicmarko
Owner

dice_coef is a quantity that we want to maximize. You can turn it into a loss to minimize in a number of ways (see the sketch after this list):

  • -dice_coef
  • 1/dice_coef
  • your proposition
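
For illustration, a minimal sketch of the first two options written as Keras loss functions. The first is essentially the loss quoted in the question above (return -dice_coef(...)); inverse_dice_loss is just a placeholder name, not something defined in this repo:

# assumes dice_coef(y_true, y_pred) from this repo is already in scope

def dice_coef_loss(y_true, y_pred):
    # option 1: maximizing dice_coef is the same as minimizing its negative
    return -dice_coef(y_true, y_pred)

def inverse_dice_loss(y_true, y_pred):
    # option 2: the reciprocal grows as dice_coef shrinks
    # (well defined because the smooth term keeps dice_coef strictly positive)
    return 1. / dice_coef(y_true, y_pred)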

@jmargeta

jmargeta commented Mar 28, 2018

Hi @saeedalahmari3,
This project defines Dice coefficient as:

from keras import backend as K

smooth = 1.  # smoothing constant defined alongside dice_coef in this repo

def dice_coef(y_true, y_pred):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

Compared to its traditional definition, it is already smooth and differentiable.
And the negative (-dice_coef) of a differentiable function is also differentiable, even without pow.

Since y_pred and y_true are bounded between 0 and 1 (if y_pred is the output of a softmax), the Dice coefficient is also always positive and bounded.
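
If in doubt, you can ask the framework for the gradient directly. A quick standalone check (written in TensorFlow 2 style for brevity, not in the Keras-backend style of this repo):

import tensorflow as tf

smooth = 1.  # same role as the smooth constant in this repo

def dice_coef(y_true, y_pred):
    y_true_f = tf.reshape(y_true, [-1])
    y_pred_f = tf.reshape(y_pred, [-1])
    intersection = tf.reduce_sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) + smooth)

y_true = tf.constant([0., 1., 1., 0.])
y_pred = tf.Variable([0.1, 0.8, 0.7, 0.2])

with tf.GradientTape() as tape:
    loss = -dice_coef(y_true, y_pred)   # the loss discussed in this issue

print(tape.gradient(loss, y_pred))      # finite, well-defined gradient for every pixel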

The pow(..., 2) or K.square is used, for example, in the mean_squared_error loss:

def mean_squared_error(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)

pow is differentiable, but that is not its purpose there.
Most importantly, pow makes the (y_pred - y_true) difference always positive (which the Dice coefficient already is by definition). The 0.5 there is only a cosmetic detail for prettier math on paper.
Please also note that instead of math.pow(..., 2) it should probably be K.pow(..., 2) or K.square(...), so that it operates on backend tensors and stays part of the computation graph.
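
If you do want to try the squared variant from the first comment, a sketch using K.square (squared_dice_loss is just an illustrative name, assuming dice_coef from this repo is in scope):

from keras import backend as K

def squared_dice_loss(y_true, y_pred):
    # the original proposition, kept inside the Keras backend
    # (math.pow would not accept a backend tensor)
    return 0.5 * K.square(1. - dice_coef(y_true, y_pred))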

Of course, you could even try to use the mean_squared_error loss for segmentation, but it could largely ignore the less dominant class (that is why the Dice coefficient is used here in the first place, instead of the more common binary_crossentropy).

For more loss functions see here.

But maybe I misunderstand. Why would you need any pow(..., 2) here?

As @jocicmarko says, it could work; your function stays differentiable.
Of course, give it a try and use the best outcome; after all, that is what we all care about :)

@nabsabraham

@jmargeta, I'm sorry if this is a trivial question, but how does the addition of the smooth term make this function differentiable?

@jmargeta

jmargeta commented Aug 23, 2018

@BrownPanther, adding a small constant to the denominator prevents division by zero when K.sum(y_true_f) + K.sum(y_pred_f) equals 0. That would otherwise be a point with no defined derivative.

Even if the sum is a very small positive number, adding the smooth term (an epsilon) dampens the dramatic changes in gradient that could be caused even by single-pixel changes in the prediction/ground truth.
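
A tiny numeric illustration of both points (plain NumPy, assuming smooth = 1. for the smoothed case):

import numpy as np

def dice(y_true, y_pred, smooth=0.):
    intersection = np.sum(y_true * y_pred)
    return (2. * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)

empty = np.zeros(4)
one_pixel = np.array([1., 0., 0., 0.])

print(dice(empty, empty))                 # nan: 0/0, undefined, no usable gradient
print(dice(empty, empty, smooth=1.))      # 1.0: two empty masks count as a perfect match
print(dice(empty, one_pixel, smooth=1.))  # 0.5: a single stray pixel changes the score gently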

@nabsabraham

@jmargeta thanks for the explanation! But even before the addition of the smooth term, the dice function is differentiable because the last layer's output is a sigmoid/softmax, i.e. probabilities rather than hard 0s or 1s, correct? The smooth term just helps with the gradient flow - is this understanding correct?

@jmargeta

jmargeta commented Sep 3, 2018

@BrownPanther Yes, even without the smooth term the function itself would be differentiable almost everywhere (except where K.sum(y_true_f) + K.sum(y_pred_f) == 0). The term, however, makes the function differentiable everywhere, with no exceptions.

Using sigmoid/softmax in the last layer does not influence the differentiability of the dice function itself. Passing a continuous input into a differentiable function results in a continuous change of its output and can indeed help with the gradient flow.

In the end, perfect differentiability is often not such a big deal.
Even one of the most commonly used activation functions today - ReLU (def relu(x): return max(0, x)) - is not differentiable at x=0, yet we can compute its sub-derivatives (1 if x > 0 else 0) and train the nets rather successfully.
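
For instance, you can check which sub-derivative a framework actually picks at the kink (a TensorFlow sketch; returning 0 at x == 0 is TensorFlow's convention, other frameworks may choose differently):

import tensorflow as tf

x = tf.Variable([-1.0, 0.0, 2.0])
with tf.GradientTape() as tape:
    y = tf.nn.relu(x)

# sub-derivative of ReLU: 0 for x < 0, 1 for x > 0, and a chosen value at x == 0
print(tape.gradient(y, x))  # [0. 0. 1.]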

@jizhang02

Hello,
I use the dice loss function in U-Net, but the predicted images are all white or grey.
When I use the default binary_crossentropy loss function, the predicted images look good.
Is there something wrong?
