
Add GrayBox predictor #96

Merged: 7 commits merged into main from od/gray-box on Aug 29, 2024
Conversation

odow (Collaborator) commented Aug 28, 2024

Part of #90

@pulsipher is this what you had in mind?

I'll do something similar for Lux and PyTorch.

I'll also sort out vector-valued outputs.
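For context, a rough usage sketch of what the gray-box option looks like from the JuMP side. The keyword placement on add_predictor is an assumption based on the PR's original title ("Add gray_box kwarg to Flux"); the exact signature may differ:

    using Flux, JuMP, MathOptAI
    chain = Flux.Chain(Flux.Dense(2 => 16, Flux.relu), Flux.Dense(16 => 1))
    model = Model()
    @variable(model, x[1:2])
    # gray_box = true embeds the chain via nonlinear operators instead of
    # reformulating each layer into algebraic constraints (assumed keyword):
    y = MathOptAI.add_predictor(model, chain, x; gray_box = true)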

odow changed the title from "Add gray_box kwarg to Flux" to "Add GrayBox predictor" on Aug 29, 2024
odow (Collaborator, Author) commented Aug 29, 2024

I actually think this is super cool!

Review thread on test/test_Flux.jl (outdated; resolved)
pulsipher left a comment:
This is cool. In addition to my inline comments, it would be nice to avoid adding unnecessary additional nonlinear operators if I want to use the same NN for multiple sets of inputs. I am also not sure the full-space formulation is necessary, but I suppose it is consistent with the rest of the predictors.
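To illustrate the reuse concern: in plain JuMP, a user-defined operator only needs to be registered once and can then be applied to several sets of decision variables, so ideally the gray-box predictor would not re-register operators on every add_predictor call. A minimal, self-contained sketch (the operator function is a placeholder, not this PR's code):

    using JuMP
    model = Model()
    @variable(model, x1[1:2])
    @variable(model, x2[1:2])
    # Register a single 2-argument operator once (placeholder function):
    op = add_nonlinear_operator(model, 2, (x...) -> sum(abs2, x); name = :my_nn)
    # The same operator can be applied to different inputs without re-registering:
    y1 = op(x1...)
    y2 = op(x2...)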

Review thread on ext/MathOptAIFluxExt.jl (resolved):
        return only(Flux.outputsize(predictor, (length(x),)))
    end
    function with_jacobian(x)
        ret = Flux.withjacobian(x -> predictor(Float32.(x)), collect(x))

pulsipher: Will the Flux model always use Float32?


odow (Collaborator, Author): I think it's the default. Handling different precisions is a bit all over the place, since if you pass in x::Vector{Float64}, it automatically changes your weights to the same type. I dislike this aspect of Flux. At the very least, let's wait until someone complains before addressing this. It works for the tests.
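For illustration only, the conversion pattern under discussion amounts to something like the following: JuMP supplies Float64 values, which are narrowed to Float32 for the chain and widened back for the solver (names are illustrative, not the PR's code):

    # Narrow the Float64 inputs from JuMP to Float32 before calling the Flux
    # chain, then widen the outputs back to Float64 for the solver.
    call_chain(chain, x::Vector{Float64}) = Float64.(chain(Float32.(x)))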

Comment on lines +90 to +98
    return map(1:predictor.predictor.output_size(x)) do i
        # Register one scalar nonlinear operator per network output.
        op_i = JuMP.add_nonlinear_operator(
            model,
            length(x),
            (x...) -> f(i, x...),
            (g, x...) -> ∇f(g, i, x...);
            name = Symbol("op_$(gensym())"),
        )
        return op_i(x...)

pulsipher: Would it be possible to add a Hessian function?

odow (Collaborator, Author): Do people compute Hessians of NNs? Or do you just want the possibility in general? Do you have an example where this is useful?

odow (Collaborator, Author): My sense is to leave it as-is for the first pass. We can always add it later.

pulsipher: In a paper I am about to submit, we used Hessians with a PyTorch NN in an optimal control problem and saw a significant speed-up. This was done with PyNumero's gray-box interface.

odow (Collaborator, Author): Can you link me the code for getting Hessians etc. out of torch?

pulsipher: I'll dig it up from my former student, who just graduated. In the meantime, I know that we used torch.func, which provides functions to evaluate the Jacobian and the Hessian directly: https://pytorch.org/docs/stable/func.api.html

odow (Collaborator, Author): I also found jacobian = torch.autograd.functional.jacobian(model, x), but torch.func seems better.

pulsipher: We tried torch.autograd.functional, but it was quite a bit slower. Notably, we leveraged the batching abilities of torch.func to evaluate all the gradients of a NN over different sets of inputs, which probably gave torch.func an extra advantage.

pulsipher commented:

Another cool thing is that this places few, if any, restrictions on which layers the Chain can use.

odow (Collaborator, Author) commented Aug 29, 2024

Lux support is complicated, because you need to bring your own AD system. I'll leave it out for now.

odow (Collaborator, Author) commented Aug 29, 2024

As a first pass, I think this can be merged. We can come back and improve the performance, and I'll open an issue to add Hessian support.

odow merged commit 1d5fb1d into main on Aug 29, 2024
odow deleted the od/gray-box branch on August 29, 2024 at 03:16