Add GrayBox predictor #96
Conversation
I actually think this is super cool!
This is cool. In addition to my inline comments, it would be nice to avoid adding unnecessary additional nonlinear operators if I want to use the same NN for multiple sets of inputs. I am also not sure the full-space formulation is necessary, but I suppose it is consistent with the rest of the predictors.
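(For illustration, a minimal JuMP sketch of the reuse point above, using made-up functions rather than the predictor's internals: a single registered operator can be applied to several input vectors, so in principle the formulation only needs to register its operators once per network.)

using JuMP

model = Model()
@variable(model, x[1:2])
@variable(model, z[1:2])
f(a, b) = a^2 + b^2
op = JuMP.add_nonlinear_operator(model, 2, f; name = :op_f)
# The same registered operator can be applied to different input vectors:
y1 = op(x...)
y2 = op(z...)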
return only(Flux.outputsize(predictor, (length(x),)))
end
function with_jacobian(x)
    ret = Flux.withjacobian(x -> predictor(Float32.(x)), collect(x))
Will the Flux model always use Float32?
I think it's the default. Different precision is a bit all over the place, since if you throw x::Vector{Float64} in, it automatically changes your weights to the same type. I dislike this aspect of Flux. At the very least, let's wait until someone complains before addressing this. It works for the tests.
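(For reference, a minimal Flux sketch of the precision behavior being discussed; the chain here is illustrative. Flux initializes parameters in Float32 by default, which is why the callback above converts its Float64 input before evaluating the network.)

using Flux

chain = Flux.Chain(Flux.Dense(2 => 3, Flux.relu), Flux.Dense(3 => 1))
eltype(chain[1].weight)   # Float32: Flux's default parameter precision
x = [1.0, 2.0]            # Float64, as a solver callback would pass in
y = chain(Float32.(x))    # convert the input rather than promote the weights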
return map(1:predictor.predictor.output_size(x)) do i
    op_i = JuMP.add_nonlinear_operator(
        model,
        length(x),
        (x...) -> f(i, x...),
        (g, x...) -> ∇f(g, i, x...);
        name = Symbol("op_$(gensym())"),
    )
    return op_i(x...)
Would it be possible to add a Hessian function?
Do people compute Hessians of NNs? Or do you just want the possibility in general? Do you have an example where this is useful?
My sense is to leave as-is for the first pass. We can always add it later.
In a paper I am about to submit, we used Hessians with a PyTorch NN in an optimal control problem and saw a significant speedup. This was done with PyNumero's gray-box interface.
Can you link me to the code for getting Hessians etc. out of torch?
I'll dig it up from my former student who just graduated. In the meantime, I know that we used torch.func, which provides functions to evaluate the Jacobian and the Hessian directly: https://pytorch.org/docs/stable/func.api.html
I also found jacobian = torch.autograd.functional.jacobian(model, x). But func seems better.
We tried torch.autograd.functional, but it was quite a bit slower. Notably, we did leverage the batch abilities of torch.func to evaluate all the gradients of a NN over different sets of inputs, which probably gave torch.func an extra advantage.
Another cool thing is that this has little to no restrictions on what layers the NN can use.
Lux support is complicated, because you need to bring your own AD system. I'll leave it out for now.
As a first pass, I think this can be merged. We can come back and improve the performance, and I'll open an issue to add Hessian support.
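(For reference, JuMP.add_nonlinear_operator accepts an optional Hessian callback as a third derivative argument, which is one way Hessian support could be wired in later. A minimal sketch with an illustrative function, not the predictor's actual callbacks:)

using JuMP

model = Model()
@variable(model, x[1:2])
f(x...) = x[1]^2 + x[1] * x[2]
function ∇f(g, x...)
    g[1] = 2 * x[1] + x[2]
    g[2] = x[1]
    return
end
function ∇²f(H, x...)
    # Fill only the lower-triangular entries of the Hessian.
    H[1, 1] = 2.0
    H[2, 1] = 1.0
    H[2, 2] = 0.0
    return
end
op = JuMP.add_nonlinear_operator(model, 2, f, ∇f, ∇²f; name = :op_f)
@objective(model, Min, op(x...))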
Part of #90
@pulsipher is this what you had in mind?
I'll do something similar for Lux and PyTorch.
And I'll also sort out vector-valued outputs.