Provide Bazel cache for TensorFlow builds #5

Open
angerson opened this issue May 14, 2020 · 21 comments
Comments

@angerson
Contributor

Providing a TensorFlow build cache could be very helpful to external developers and lower the barrier to entry for contributing to TF.

Some ideas for this we've discussed before are:

  • Offer Bazel RBE resources on behalf of SIG Build. This service is in alpha on GCP.
  • Provide a read-only build cache in a GCP bucket.
  • Provide devel_cache Docker images containing a build cache (these could be very large)
  • Provide code-and-cache volumes for the docker devel images.
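For the "read-only build cache in a GCP bucket" option, a contributor's side would mostly be a pair of Bazel flags. The sketch below assumes a public GCS bucket served over HTTPS. The bucket URL is a placeholder and the helper names are hypothetical, not existing TF tooling. `--remote_cache` and `--remote_upload_local_results` are standard Bazel flags. The script only builds and prints the command line and does not run Bazel.

```python
from __future__ import annotations

import shlex

# Placeholder URL: no public TensorFlow cache bucket is assumed to exist here.
CACHE_URL = "https://storage.googleapis.com/example-tf-bazel-cache"


def read_only_cache_flags(cache_url: str) -> list[str]:
    """Bazel flags that read from a remote HTTP cache without writing to it."""
    return [
        f"--remote_cache={cache_url}",
        # Download cached action results, but never upload local results.
        "--remote_upload_local_results=false",
    ]


def bazel_build_command(target: str, cache_url: str = CACHE_URL) -> list[str]:
    """Assemble a `bazel build` invocation that uses the read-only cache."""
    return ["bazel", "build", *read_only_cache_flags(cache_url), target]


if __name__ == "__main__":
    cmd = bazel_build_command("//tensorflow/tools/pip_package:build_pip_package")
    print(shlex.join(cmd))
```

Cache hits still depend on the contributor using the same toolchain and build options that populated the cache. That is why the Docker devel images are a natural place to pair with such a bucket.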

See also:

@angerson
Contributor Author

I'm looking into the feasibility of providing GCP resources (likely a long-term discussion) and devel_cache images as an evaluation (short-term, but no ETA).

@bhack
Contributor

bhack commented May 15, 2020

I just want to add a reference: we may also need to solve this so that users can adopt the new GitHub Codespaces/VS Code Remote (tensorflow/addons#1309) or Gitpod (tensorflow/tensorflow#38755).

@bhack
Contributor

bhack commented May 15, 2020

Since many SIG builds use the GitHub Actions CI infrastructure, especially the ones with C++/CUDA custom ops, it would also be nice to find a way to reuse the Bazel cache to speed up CI builds.
We tried to store the Bazel cache in the Actions cache for CI (tensorflow/addons#1655), but it is not working. As you can see in that ticket, we have an external request open on the GitHub Actions repo.

@lgeiger

lgeiger commented May 15, 2020

It would be also nice as many SIGs builds using github Actions CI infra

This would be excellent! For reference, some time ago there were discussions about improving Bazel cache support in GitHub Actions at actions/cache#109.

@bhack
Contributor

bhack commented May 15, 2020

@lgeiger Our ticket was actions/cache#260. I don't know whether the two could be merged.

@bhack
Contributor

bhack commented May 15, 2020

This is orthogonal to the approved TF modularization RFC.

@gunan

gunan commented May 29, 2020

We have started exploring internally whether we can share our RBE cache. We will also look into whether we can share a GCS cache.

@bhack
Contributor

bhack commented May 29, 2020

@gunan Thanks. I've found this candidate duplicate: tensorflow/tensorflow#34719. You can probably find others in the TF repo.

@gunan

gunan commented May 29, 2020

Yes, this has been a long running problem for TF. And as TF gets bigger it will only get worse.

@bhack
Contributor

bhack commented Jun 15, 2020

If this is going to take too long, can we find an intermediate goal, such as supporting Python-only PRs?
I think that could be an easier intermediate step. What do you think?

@bhack
Contributor

bhack commented Jul 26, 2020

See what kind of bad hack I had to suggest in tensorflow/tensorflow#41701 (comment).

@bhack
Contributor

bhack commented Aug 2, 2020

@bhack
Contributor

bhack commented Aug 29, 2020

I've tested your initial cache inside the official TF Docker devel image, but that image doesn't have the cross-compilation toolchains (d7/d8) that the RBE and custom-ops Dockerfiles/images have.

We have a thread about this in the SIG Build Gitter channel.

@adriangb

adriangb commented Nov 23, 2020

This would be great. It is very frustrating that I have to spin up docker images and compile C++ code overnight just to test a single line of code change to a Python function. The barrier to entry to contributing is extremely high. What I often end up doing is copying test_xyz.py as test.py, editing the tensorflow install in my virtual env and running test.py then crossing my fingers that CI passes.
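The workaround described here (editing the installed package instead of rebuilding) can be partly automated for pure-Python changes. This is a minimal sketch that assumes the installed wheel mirrors the source tree layout for the edited module (e.g. `tensorflow/python/ops/...`). `overlay_source_file` is a hypothetical helper, not part of TensorFlow's tooling. It copies one edited file from a checkout over the installed copy.

```python
import shutil
from pathlib import Path


def overlay_source_file(source_root: Path, rel_path: str, site_packages: Path) -> Path:
    """Copy one edited .py file from a TF checkout over the installed wheel's copy.

    Hypothetical helper: assumes `rel_path` (e.g. "tensorflow/python/ops/math_ops.py")
    exists at the same relative path in both the checkout and site-packages.
    """
    src = source_root / rel_path
    dst = site_packages / rel_path
    if not dst.exists():
        # Refuse to create new files: only overwrite modules the wheel already ships.
        raise FileNotFoundError(f"{dst} is not part of the installed package")
    shutil.copy2(src, dst)
    return dst


# Example usage (not executed here); site-packages can be located with:
#   import tensorflow
#   site_packages = Path(tensorflow.__file__).parent.parent
#   overlay_source_file(Path("~/tensorflow").expanduser(),
#                       "tensorflow/python/ops/math_ops.py", site_packages)
```

This only helps for changes to hand-written Python modules. Changes to C++ kernels, generated op wrappers, or API `__init__` files still need a full Bazel build, which is exactly where a shared cache would help.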

@bhack
Copy link
Contributor

bhack commented Nov 24, 2020

Also, when we mount the Bazel cache inside the official TensorFlow Docker devel container, we need to improve stale cache handling.
Too often I see "Deleting stale sandbox base". Is it related to bazelbuild/bazel#8525? That one seems to have been closed in Bazel 3.4.0.

@bhack
Contributor

bhack commented Jan 20, 2021

In the meantime, can we reply to https://groups.google.com/a/tensorflow.org/g/developers/c/1OJLv2ew7pA?

Is there a quick solution to iterate and modify the source code and run an example in the source dir without building and installing the wheel?

@bhack
Contributor

bhack commented Apr 1, 2021

It seems we now have a read-only cache for TF IO, but still not for TensorFlow contributors:

tensorflow/io#1294

@AdityaKane2001

@bhack
Given this situation, what is the best way to build TensorFlow while making small changes to the codebase? Can you please outline the procedure? TIA

@bhack
Contributor

bhack commented Apr 19, 2021

With @angerson and @perfinion we are prototyping with tensorflow/tensorflow#48421 (and #24) to continuously execute and monitor the external developer contribution experience/overhead (compile, lint and test).

/cc @theadactyl @nikitamaia

@bhack
Copy link
Contributor

bhack commented Nov 30, 2021

I think we could close this and monitor build reproducibility and cache efficiency in #48.

@bhack
Contributor

bhack commented Sep 6, 2022

We now have a PR at tensorflow/tensorflow#57630 if you want to support, review, or improve this baseline.


6 participants