Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tensor: allow compute to bypass caching mechanism #408

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

rohany
Copy link
Contributor

@rohany rohany commented Feb 13, 2021

This commit adds a boolean to the TensorBase::compute method allowing
for bypass of the compute caching mechanism. This is useful when
benchmarking the generated kernels.

This commit adds a boolean to the `TensorBase::compute` method allowing
for bypass of the compute caching mechanism. This is useful when
benchmarking the generated kernels.
@rohany
Copy link
Contributor Author

rohany commented Feb 25, 2021

cc @stephenchouca this PR makes it easier to write benchmarks, so I'd like to merge it (and also into the array_algebra branch) for the ooplsa benchmarks

@stephenchouca
Copy link
Contributor

Is this change really necessary? As an example, if one wants to benchmark SpMV multiple times, they can already do something like this with the current API:

for (int t = 0; t < trials; t++) {
  Tensor<double> y({A.getDimension(0)}, dv);
  y(i) = A(i,j) * x(j);
  y.compile();
  auto begin = std::chrono::high_resolution_clock::now();
  y.assemble();
  y.compute();                                             
  auto end = std::chrono::high_resolution_clock::now();
  auto diff = std::chrono::duration<double,std::milli>(end - begin).count();
}

The compute kernel would be run in every iteration since y is redefined, but compilation would actually still only happen once due to the code caching mechanism.

More fundamentally though, I wonder if perhaps we should be discouraging people from benchmarking TACO using the lazy evaluation API. As things are now, it can be difficult for users to figure out what's actually being timed, since there's so much happening in the background. For instance, if A and x were not already packed in the previous example, it wouldn't be at all obvious that the previous example actually also measures the amount of time it takes to pack the operands. Perhaps there needs to be a way for users to simply disable the lazy evaluation API so that it's more intuitive what exactly each method call is doing at runtime.

@rohany
Copy link
Contributor Author

rohany commented Feb 25, 2021

Is this change really necessary

probably not. I understand that you can benchmark it this way, but it just becomes sort of cumbersome and slower to run than being able to just do the computation in the benchmark loop.

More fundamentally though, I wonder if perhaps we should be discouraging people from benchmarking TACO using the lazy evaluation API. As things are now, it can be difficult for users to figure out what's actually being timed, since there's so much happening in the background. For instance, if A and x were not already packed in the previous example, it wouldn't be at all obvious that the previous example actually also measures the amount of time it takes to pack the operands. Perhaps there needs to be a way for users to simply disable the lazy evaluation API so that it's more intuitive what exactly each method call is doing at runtime.

I agree. I don't have suggestions as of now, but probably not how the CLI does it (manually lowering and then calling the packed functions).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants