
Fused entry-wise layers #194

Open · timmoon10 opened this issue Feb 6, 2018 · 2 comments
@timmoon10 (Collaborator)

Our forward/backward prop implementation requires that we store every layer's activations and error signals. Fusing entry-wise operations together would let us avoid storing those intermediate values, freeing up memory capacity. Since these operations are often memory-bound, it would also boost performance: we would access the data once for the whole fused chain instead of once at each layer's forward/backward prop step (a rough sketch of the idea is at the end of this comment). Steps to implement this functionality:

  • The fused entry-wise layer can execute a series of operations, possibly implemented as lambda functions.
  • The fused entry-wise layer can perform entry-wise automatic differentiation of its sequence of operations.
  • The model can parse its layer graph and construct an appropriate fused entry-wise layer.

This functionality will become especially important if #193 is implemented since custom objective functions will often require a sequence of entry-wise operations prior to a reduction.
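As a rough illustration of the memory-traffic argument (not LBANN code; the kernel names, the fixed op chain y = tanh(alpha * x + beta), and the launch parameters are just placeholders): the forward kernel keeps the intermediate value in a register and the backward kernel recomputes it from the input, so nothing between the entry-wise ops ever touches global memory.

```cuda
#include <cuda_runtime.h>

// Fused forward for the entry-wise chain y = tanh(alpha * x + beta).
// The intermediate u stays in a register and is never written out.
__global__ void fused_forward(const float* __restrict__ x,
                              float* __restrict__ y,
                              float alpha, float beta, int n) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    const float u = alpha * x[i] + beta;
    y[i] = tanhf(u);
  }
}

// Fused backward: recompute the intermediate instead of loading a stored
// activation, then apply the chain rule entry-wise.
__global__ void fused_backward(const float* __restrict__ x,
                               const float* __restrict__ dy,
                               float* __restrict__ dx,
                               float alpha, float beta, int n) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    const float t = tanhf(alpha * x[i] + beta);
    // d tanh(u)/du = 1 - tanh(u)^2 and du/dx = alpha.
    dx[i] = dy[i] * (1.0f - t * t) * alpha;
  }
}
```

Here x, y, dy, and dx are each read or written exactly once, whereas unfused scale/bias/tanh layers would each make their own pass over the data and store their own activations.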

@timmoon10 (Collaborator, Author)

I'm not sure if this is currently possible, since CUDA kernels don't support runtime polymorphism (virtual dispatch in device code). Attempts to mimic polymorphism with device function pointers haven't been successful.
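One possible workaround, sketched below with entirely hypothetical names (not existing code): compose the operations at compile time with templated functors, similar in spirit to how Thrust takes transform functors. Each chain of entry-wise ops is instantiated into its own kernel, so no device-side virtual dispatch or function pointers are needed; the tradeoff is that the set of chains has to be known at compile time.

```cuda
#include <cuda_runtime.h>

// Each entry-wise op is a plain functor with a value and a derivative.
struct Scale {
  float alpha;
  __device__ float apply(float x) const { return alpha * x; }
  __device__ float derivative(float) const { return alpha; }
};

struct Tanh {
  __device__ float apply(float x) const { return tanhf(x); }
  __device__ float derivative(float x) const {
    const float t = tanhf(x);
    return 1.0f - t * t;
  }
};

// Compile-time composition of a sequence of ops: apply() evaluates the
// chain, grad() applies the chain rule d/dx rest(op(x)) = rest'(op(x)) * op'(x).
template <typename Op, typename... Rest>
struct Chain {
  Op op;
  Chain<Rest...> rest;
  __device__ float apply(float x) const { return rest.apply(op.apply(x)); }
  __device__ float grad(float x) const {
    return rest.grad(op.apply(x)) * op.derivative(x);
  }
};

template <typename Op>
struct Chain<Op> {
  Op op;
  __device__ float apply(float x) const { return op.apply(x); }
  __device__ float grad(float x) const { return op.derivative(x); }
};

// The whole chain is baked into a single kernel instantiation.
template <typename ChainT>
__global__ void apply_chain_forward(ChainT chain, const float* __restrict__ x,
                                    float* __restrict__ y, int n) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) { y[i] = chain.apply(x[i]); }
}

template <typename ChainT>
__global__ void apply_chain_backward(ChainT chain, const float* __restrict__ x,
                                     const float* __restrict__ dy,
                                     float* __restrict__ dx, int n) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) { dx[i] = dy[i] * chain.grad(x[i]); }
}

// Usage (host side):
//   Chain<Scale, Tanh> chain{{2.0f}, {}};
//   apply_chain_forward<<<grid, block>>>(chain, d_x, d_y, n);
//   apply_chain_backward<<<grid, block>>>(chain, d_x, d_dy, d_dx, n);
```

This sidesteps the polymorphism issue entirely, but it wouldn't let the model build an arbitrary chain at runtime; each chain the layer-graph parser can produce would need its own instantiation.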

@ndryden (Collaborator) commented Oct 29, 2018

I think it may be worth looking at what other frameworks do, since fusing operations is a common optimization. TensorFlow has a mix of manually fused operations, and its XLA compiler can fuse automatically, both ahead-of-time and JIT. PyTorch has tensor comprehensions. Caffe2 (which is merging into PyTorch) also does kernel fusion for deployment. MXNet does fusion as well. Chainer doesn't do it yet, but appears to be moving in that direction.

ndryden reopened this Oct 29, 2018