Tensor value insertion APIs #85

Open
fredrikbk opened this issue May 22, 2017 · 9 comments

@fredrikbk (Contributor) commented May 22, 2017

To ease matrix assembly in finite element codes, we should support inserting duplicate entries into tensors; the duplicates should be summed before packing. This feature was requested by @sueda.
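
A minimal sketch (plain C++, not taco's API) of where the duplicates come from: each finite element contributes a small dense block to the global matrix, and elements that share degrees of freedom write to the same (i, j) coordinates, so their contributions have to be accumulated.

#include <cstddef>
#include <vector>

// Illustrative coordinate (COO) entry; repeated (row, col) pairs are
// meant to be summed when the matrix is eventually assembled.
struct Entry { int row, col; double val; };

// Scatter one element stiffness block Ke into global coordinates.
// Elements sharing degrees of freedom produce duplicate (row, col) pairs.
void scatterElement(const std::vector<int>& dofs,
                    const std::vector<std::vector<double>>& Ke,
                    std::vector<Entry>& coo) {
  for (std::size_t a = 0; a < dofs.size(); ++a)
    for (std::size_t b = 0; b < dofs.size(); ++b)
      coo.push_back({dofs[a], dofs[b], Ke[a][b]});
}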

@shoaibkamil (Contributor) commented May 25, 2017

I really think we should rethink these semantics. Neither Matlab nor Eigen behaves this way, and more generally it violates the principle of least surprise. Instead, can we use some form of +=? That could be a separate function (e.g. insertAndSum()), the Eigen approach (insert() assumes the element does not exist; otherwise you use coeffRef(i,j)), or some other design we come up with whose semantics are clearer.
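
For reference, the two Eigen paths mentioned above look roughly like this (standard Eigen::SparseMatrix calls, shown only as a minimal sketch):

#include <Eigen/Sparse>

int main() {
  Eigen::SparseMatrix<double> A(3, 3);
  A.reserve(Eigen::VectorXi::Constant(3, 2));  // optional: reserve space per column
  A.insert(0, 1) = 1.0;     // insert() assumes the (0,1) entry does not exist yet
  A.coeffRef(0, 1) += 1.0;  // coeffRef() finds (or creates) the entry, so += is safe
  // A.coeff(0, 1) is now 2.0
  return 0;
}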

@shoaibkamil reopened this May 25, 2017
@sueda commented May 25, 2017

From the user's perspective, I think it's important to have a way to add elements into the tensor without having to care if that element was 0 previously. So, += would work as long as the user can call it on an empty tensor.

  • Matlab doesn't have +=, but its sparse matrices allow readback, so I can say something like A = sparse(3,3); A(1,3) = A(1,3) + 1.0;.

  • In Eigen, I don't use insert() or coeffRef(i,j); I use triplet insertion.

@shoaibkamil (Contributor) commented:
@sueda I see that setFromTriplets() does sum duplicate entries. I think we need to consider whether our default semantics should match insert() or setFromTriplets(); either way, we should also provide the other behavior.
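
For concreteness, a minimal sketch of the Eigen triplet path (standard Eigen API): duplicates passed to setFromTriplets() are summed by default.

#include <vector>
#include <Eigen/Sparse>

int main() {
  // Two entries at (0, 2): setFromTriplets() sums duplicates by default.
  std::vector<Eigen::Triplet<double>> triplets = {
      {0, 2, 1.0}, {0, 2, 2.0}, {1, 1, 3.0}};

  Eigen::SparseMatrix<double> A(3, 3);
  A.setFromTriplets(triplets.begin(), triplets.end());
  // A.coeff(0, 2) == 3.0
  return 0;
}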

@fredrikbk (Contributor, Author) commented May 25, 2017

Thanks for reopening this; I agree with all of the above concerns. The current thinking is that we have a concept of a stash: when you insert values they go into the stash, and values in the stash are not part of the tensor until you pack them. This design was intended to match the Eigen triplet insertion that @sueda mentions, but I'm not very comfortable with it being hidden inside TensorBase (see the design proposal below). The reason for the stash concept is that insertion is expensive for many formats, and we should try to design our semantics so that the performance cost of each operation is clear.

I propose the following design to make the stash concept explicit (and to get closer to the Eigen triplet insertion). We add a new class called ValueList, similar to Eigen's triplet vectors. You can insert any number of values into it, including duplicates. Then we provide a method TensorBase::pack that packs a ValueList into a tensor. The pack method takes a ValueList and an enum that describes what to do with duplicates (sum, last, etc.). With the ValueList, the current TensorBase::insert method and the stash concept disappear.

// What to do with duplicates
enum DuplicatePolicy {Sum, Last};

// Method to pack a ValueList into a tensor
void TensorBase::pack(const ValueList&, DuplicatePolicy);

// Factory function to construct tensors from ValueLists?
TensorBase pack(const ValueList&, DuplicatePolicy);
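
A toy, self-contained sketch of the intended semantics (this is not taco code; ValueList::insert and the map-based stand-in for a tensor are made up here purely to illustrate how the duplicate policy would resolve entries at pack time):

#include <map>
#include <utility>
#include <vector>

enum DuplicatePolicy { Sum, Last };

// Stand-in for the proposed ValueList: an append-only list of
// (coordinate, value) pairs in which duplicate coordinates are allowed.
struct ValueList {
  std::vector<std::pair<std::pair<int, int>, double>> entries;
  void insert(int i, int j, double v) { entries.push_back({{i, j}, v}); }
};

// Stand-in for a tensor; a real implementation would build a compressed format.
struct ToyTensor {
  std::map<std::pair<int, int>, double> data;

  // Resolve duplicates according to the policy, then "pack" the result.
  void pack(const ValueList& vals, DuplicatePolicy policy) {
    data.clear();
    for (const auto& e : vals.entries) {
      if (policy == Sum)
        data[e.first] += e.second;  // accumulate duplicates
      else
        data[e.first] = e.second;   // Last: the latest insertion wins
    }
  }
};

int main() {
  ValueList vals;
  vals.insert(0, 1, 1.0);
  vals.insert(0, 1, 2.0);  // duplicate coordinate
  ToyTensor A;
  A.pack(vals, Sum);       // A.data[{0, 1}] == 3.0
  return 0;
}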

@fredrikbk (Contributor, Author) commented May 25, 2017

I agree with @shoaibkamil that we should provide semantics to actually read from and insert into a tensor. For this, maybe we should try to make it work with integer indexing through TensorBase::operator() (see #24)? So to insert a value, maybe:

A(0,1) = 1;
A(0,1) += 1;

If we provide these operations, the documentation has to be very clear about their unpredictable performance semantics. For dense storage it's not a problem (O(1)), but for sparse storage a single insertion can be very expensive (O(nnz), since compressed formats have to shift existing entries to make room). We can play tricks to reduce the cost, which is great, but that will also make the performance semantics even less predictable.
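
To make the O(nnz) point concrete, here is a generic CSR sketch (not taco's internal data structures): inserting one new entry means shifting every stored value and column index that comes after it and bumping all later row pointers.

#include <cstddef>
#include <vector>

// Generic CSR sketch (not taco's representation) showing why a single
// insertion into compressed storage costs O(nnz).
struct Csr {
  std::vector<int> rowptr;   // size rows + 1
  std::vector<int> colidx;   // size nnz
  std::vector<double> vals;  // size nnz

  explicit Csr(int rows) : rowptr(rows + 1, 0) {}

  void addTo(int i, int j, double v) {
    // Find the position of column j within row i, keeping columns sorted.
    int pos = rowptr[i];
    while (pos < rowptr[i + 1] && colidx[pos] < j) ++pos;
    if (pos < rowptr[i + 1] && colidx[pos] == j) {
      vals[pos] += v;  // entry already exists: cost is the row length
      return;
    }
    // New entry: shift everything after pos and update later row pointers -- O(nnz).
    colidx.insert(colidx.begin() + pos, j);
    vals.insert(vals.begin() + pos, v);
    for (std::size_t r = i + 1; r < rowptr.size(); ++r) ++rowptr[r];
  }
};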

Maybe we can provide both this and the insert that @shoaibkamil mentions, which assumes the element does not already exist?

@fredrikbk (Contributor, Author) commented:
@sueda Do you generally avoid direct indexing into sparse matrices in Matlab code in favor of constructing them from some triplet-list-like form?

@fredrikbk changed the title from "Duplicate entries" to "Tensor value insertion APIs" May 25, 2017
@fredrikbk added the api label May 25, 2017
@sueda commented May 25, 2017

@fredrikbk I use direct indexing in Matlab because I usually start with dense matrices and switch to sparse later. With direct indexing, I don't have to change the code. I am sure it is slower than creating a triplet list and then creating a matrix from it.

@fredrikbk (Contributor, Author) commented:
@sueda That makes sense. If we give ValueList an interface that looks a lot like a tensor's, perhaps you could use it in much the same way you use dense matrices in Matlab? Then packing would be the analogue of converting to sparse.

@sueda commented May 27, 2017

@fredrikbk I use direct indexing in Matlab for portability between dense/sparse, but I am equally happy with triplet insertion. I am wondering if adding such an interface to ValueList might end up being more confusing to the user, since the matrix does not yet exist at that point.
