Question about nesterov method #42

Open
chrismaliszewski opened this issue Jun 19, 2021 · 0 comments

Comments

chrismaliszewski commented Jun 19, 2021

Either I don't understand something, or something is wrong.
In the second for loop the code says:

```python
for idx, gradient in enumerate(model.derivate(x, y)):
    # Here we need to do a bit of gymnastic because of how the code is setup
    # We need to save the parameters state, modify it, do the simulation and then reset the parameter state
    # The update happen in the next section
    prev_weight = model.weights[idx]
    model.weights[idx] = decay_factor*gradient
    g[idx] = decay_factor*g[idx] + learning_rate*gradient
    model.weights[idx] = prev_weight
    # Update the model parameter
    model.weights[idx] = model.weights[idx] - g[idx]
```

If you trace through the code: `prev_weight` saves the current value, `model.weights[idx]` is overwritten, but the new value is never read anywhere, and shortly afterwards `prev_weight` is assigned back to `model.weights[idx]`.

My questions are:
a. Why is the derivative calculated at x, y (i.e., at theta) instead of at theta - gamma * v_{t-1}?
b. What's the purpose of changing the `model.weights` value here if it's never used?
c. What's the purpose of `prev_weight`?

What don't I understand? What am I missing?
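For comparison, here is a minimal sketch of the Nesterov update as I understand it. The names (`nesterov_step`, `grad_fn`) are mine, not from the repo; the key point is that the gradient is evaluated at the look-ahead point theta - gamma * v_{t-1}, not at theta:

```python
def nesterov_step(weights, velocity, grad_fn, learning_rate=0.01, decay_factor=0.9):
    """One Nesterov accelerated gradient step (illustrative sketch).

    grad_fn(weights) must return the gradient evaluated at the weights
    it is given, so we can take it at the look-ahead point.
    """
    # Look-ahead point: theta - gamma * v_{t-1}
    lookahead = [w - decay_factor * v for w, v in zip(weights, velocity)]
    # Gradient at the look-ahead point, not at the current weights
    gradients = grad_fn(lookahead)
    # Velocity update: v_t = gamma * v_{t-1} + lr * grad
    new_velocity = [decay_factor * v + learning_rate * g
                    for v, g in zip(velocity, gradients)]
    # Parameter update: theta = theta - v_t
    new_weights = [w - nv for w, nv in zip(weights, new_velocity)]
    return new_weights, new_velocity
```

On a simple quadratic like f(w) = w^2 (gradient 2w), repeated calls of this step drive the weight toward zero, which is the behaviour I would expect from the method.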
