-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Zigzag persistence part1 #917
base: master
Are you sure you want to change the base?
Conversation
std::set<internal_key> translatedBoundary; // set maintains the natural order on indices | ||
for (auto b : boundary) { | ||
translatedBoundary.insert(handleToKey_.at(b)); // TODO: add possibilities of coefficients | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the keys to a vector and then sorting it would be more natural, but that can wait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just took over how Clément did it in the original code. I have no preference about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using std::set is bound to be slow, but at this point I have no idea what parts of the code matter for performance.
src/Zigzag_persistence/include/gudhi/filtered_zigzag_persistence.h
Outdated
Show resolved
Hide resolved
src/Zigzag_persistence/include/gudhi/filtered_zigzag_persistence.h
Outdated
Show resolved
Hide resolved
* - For the other type, there are two classes: @ref Filtered_zigzag_persistence and | ||
* @ref Filtered_zigzag_persistence_with_storage. | ||
* They are both based on @ref Zigzag_persistence and manage additionally the filtration values which are ignored by | ||
* @ref Zigzag_persistence. They automatically translate the operation numbers into their corresponding filtration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(comment attached to a random line)
I think it makes a lot of sense to associate indexes and filtrations to arrows as you do. However, I think many people might think of filtration values as something associated to a space, not an arrow, and get a bit confused. I don't know if it is worth saying something about it, it may become clearer once we have more geometric examples with meaningful filtration values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, good question. To keep in mind.
src/Zigzag_persistence/include/gudhi/filtered_zigzag_persistence.h
Outdated
Show resolved
Hide resolved
src/Zigzag_persistence/include/gudhi/filtered_zigzag_persistence.h
Outdated
Show resolved
Hide resolved
* diagram will have arrows alternating between forward and backward. | ||
* | ||
* The module is partitioned in two types of classes: filtered by filtration values and filtered by the atomic | ||
* operations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe there isn't really much to change. I would avoid saying that Zigzag_persistence ignores filtration values, which sounds like it has access to them and chooses not to use them; they just don't exist there.
My take: the main class is Zigzag_persistence, which indexes operations by their order and represents cells by the index of the operation that created them (we could have used the order "among insertions" to avoid holes in the numbering, but we waste at most a factor 2 this way, and since all the structures are sparse the holes don't take up any space, so it isn't worth keeping a second counter). The 2 other classes are convenience wrappers, which allow labeling the arrows (indexes are replaced by those labels when reporting interval bounds), arbitrary identifiers for cells, and maybe another detail or two.
As much as possible, I'd like to avoid giving the impression that the filtration values define the zigzag in the same way that the indexes do, because that leads to misunderstandings like the bar-splitting case above.
the barcode is still very different from the barcode in the other version
They don't look that different to me, but it is probably a matter of perspective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oups, some of the comments were never submitted.
* @param arrowNumber Forward arrow number. | ||
*/ | ||
void add_birth_forward(index arrowNumber) { // amortized constant time | ||
birthToPos_.emplace_hint(birthToPos_.end(), arrowNumber, maxBirthPos_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably birthToPos_
was an ordered map
at some point and emplace_hint(end,*)
made sense, but now that it is an unordered_map, it looks a bit strange. Changing it to plain emplace
is not urgent though if you still want to experiment with other map types.
auto& col = matrix_.get_column(p.first); | ||
stream_infinite_interval(col.get_dimension(), p.second); | ||
} else { | ||
try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to get the information without having to throw and catch an exception? Sadly, C++ implementations have very costly exceptions and don't optimize them at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could add to the matrix a method like has_column(index)
which returns true if and only if the given index is not out of range. Otherwise, we can also hope that the benchmark with the oscillating rips branch shows that we can just keep erase_birth_history
at true and then we would just remove this piece of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
has_column
makes sense. It encourages making twice the same search in the hashtable though (has_column+get_column), so we could also consider a version of get_column that returns
- a
boost::optional<column&>
- a
column*
, with nullptr to indicate out_of_range
Indeed, if erase_birth_history
is just true
in the future, that makes this point less relevant.
Co-authored-by: Marc Glisse <[email protected]>
Co-authored-by: Marc Glisse <[email protected]>
src/Zigzag_persistence/include/gudhi/filtered_zigzag_persistence.h
Outdated
Show resolved
Hide resolved
* diagram will have arrows alternating between forward and backward. | ||
* | ||
* The module is partitioned in two types of classes: filtered by filtration values and filtered by the atomic | ||
* operations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok (I am not very fond of using the word "index" for the filtration values, but there is enough context that it should be fine)
I notice that you don't mention anymore that cells can be represented by arbitrary faceID in the wrappers. That's ok, but then I think it would make sense to have a paragraph in the doc of the wrappers talking about faceID, or users might be surprised by this difference between the base class and the wrappers.
src/Zigzag_persistence/include/gudhi/filtered_zigzag_persistence.h
Outdated
Show resolved
Hide resolved
src/Zigzag_persistence/include/gudhi/filtered_zigzag_persistence.h
Outdated
Show resolved
Hide resolved
auto& col = matrix_.get_column(p.first); | ||
stream_infinite_interval(col.get_dimension(), p.second); | ||
} else { | ||
try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
has_column
makes sense. It encourages making twice the same search in the hashtable though (has_column+get_column), so we could also consider a version of get_column that returns
- a
boost::optional<column&>
- a
column*
, with nullptr to indicate out_of_range
Indeed, if erase_birth_history
is just true
in the future, that makes this point less relevant.
src/Zigzag_persistence/include/gudhi/filtered_zigzag_persistence.h
Outdated
Show resolved
Hide resolved
src/Zigzag_persistence/include/gudhi/filtered_zigzag_persistence.h
Outdated
Show resolved
Hide resolved
src/Zigzag_persistence/include/gudhi/filtered_zigzag_persistence.h
Outdated
Show resolved
Hide resolved
* non-zero coefficients generating it. A face should be represented by the arrow number when the face appeared for | ||
* the first time in the filtration (if a face was inserted and then removed and reinserted etc., only the last | ||
* insertion counts). The face range should be ordered by increasing arrow numbers. | ||
* @param dimension Dimension of the inserted face. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I can see, the only thing the dimension is used for is the output, and we wouldn't lose anything if we removed the dimension completely from this class, wrappers could handle it on their own.
I am not asking to do that (I haven't thought about it much), I just wanted to write this down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is the same than for the matrix module: the dimension is only useful for the persistence diagram. I could make it optional for both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can imagine what removing dim
from Zigzag_persistence would look like. I am not so sure what making it optional would mean. It sounds like a lot of complications, extra overloads and if constexpr
It looks like the dimension is already kind of optional in the matrix, since zigzags don't pass the dimension to the matrix as far as I can see 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the dimension is already kind of optional in the matrix, since zigzags don't pass the dimension to the matrix as far as I can see 😉
Ah, wait, this is not normal! Because Zigzag was based on the simplex tree before, it only worked with simplices and the matrix would automatically compute the dimension, so I didn't had to add it. But now that the code works for general cells, we have to give the dimension also to the matrices, as col.get_dimension
is used to get the infinite bars...
In the matrix module, the dimension is always stored for chain/RU/boundary matrices and never for base matrices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests didn't notice the problem because... there were no infinite bars? Or they were all of the same dimension?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit error-prone indeed... But there is worse and at least it comes in handy, as simplices are still the bigger use case.
If the size is 1, the dimension will just be put to 0. Everything else will work completely alright. It is just that when returning the barcode, the dimension will be completely wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I am not completely right: there is one point where a wrong dimension will make problems and that is in the boundary matrix (with only R and not RU). The reduction algorithm uses the clearing optimization which uses the dimensions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the size is 1, the dimension will just be put to 0.
Doesn't that deserve an assert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could make it optional for both.
How horrible would it look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How horrible would it look?
It won't be more horrible than any other thing in the matrix module, but you will not like it for sure. I would need to make two interval types (are make the interval inherit its dimension) and then add some std::conditional
and if constexpr
here and there.
/** | ||
* @brief Type for filtration values. | ||
*/ | ||
using filtration_value = unspecified; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@VincentRouvreau would have to weigh in, but I think the type Filtration_value
is capitalized in all the rest of Gudhi, so it looks a bit strange here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to use lower case for all types which are or have high probability to be native types and upper case for classes/structs. But true, it makes the Filtration_value
case weird. I don't have a very strong opinion about it, so I will let you both decide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you get inspiration from somewhere for this style? Since we haven't used that in the rest of Gudhi, I am a bit reluctant to introduce this inconsistency whose benefits are not really obvious to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember where I got it from, but I think it was from a comment of someone, as I didn't paid much attention to this detail before. So either it was from someone working on Gudhi, or from one of my coauthors.
Like I said, I can change that for Gudhi if you prefer. Is it only for Filtration_value
or is there some unwritten rule about type nomenclature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know specifically about Gudhi (although we seem to have mostly done it this way), but it is fairly common to capitalize all types, so you can tell immediately if something is a type or a variable (in a wide sense that also includes functions). You can wait for Vincent to come back, but I think I would prefer to stick to this convention. It also makes it unnecessary (although not forbidden) to end a type name in _t
or _type
.
* using dimension_type = Zigzag_persistence::dimension_type; | ||
* using index_type = Zigzag_persistence::index; | ||
* using filtration_value_type = Filtered_zigzag_persistence::filtration_value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks strange that dimension_type
already ends in _type
while index
becomes index_type
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but when I just write using index = Zigzag_persistence::index;
the compiler complains about some "Redefinition of 'index' as different kind of symbol" with a first definition in "strings.h". I could just change index
everywhere to index_type
, but as the type is used a lot, I prefer the shorter version. Or I change dimension_type
to dimension
. Or perhaps capitalizing the first letter is just enough (as you want me to do anyway).
Like I said, I have no preferences on this. But then I would change all type names once and for all to homogenize them. So, if I understood well, your preference would be: all types start with a capitalized letter and only the name of object without any "_type" or "_t" at the end? Eg. dimension_type
---> Dimension
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Index
should work (unless it conflicts with something else), or having the typedef not in the global namespace (in a subnamespace, in a function/class, etc).
Yes on the description of my preference, I think that's mostly what we've been doing so far in Gudhi (with obvious exceptions like value_type
for compatibility with the standard library, or Thing_type where "type" is not in the C++ sense but means "kind", "category").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, then I would do a PR for the matrix module too, to keep things coherent.
/** | ||
* @brief Type for filtration values. | ||
*/ | ||
using filtration_value = unspecified; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you get inspiration from somewhere for this style? Since we haven't used that in the rest of Gudhi, I am a bit reluctant to introduce this inconsistency whose benefits are not really obvious to me.
* non-zero coefficients generating it. A face should be represented by the arrow number when the face appeared for | ||
* the first time in the filtration (if a face was inserted and then removed and reinserted etc., only the last | ||
* insertion counts). The face range should be ordered by increasing arrow numbers. | ||
* @param dimension Dimension of the inserted face. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can imagine what removing dim
from Zigzag_persistence would look like. I am not so sure what making it optional would mean. It sounds like a lot of complications, extra overloads and if constexpr
It looks like the dimension is already kind of optional in the matrix, since zigzags don't pass the dimension to the matrix as far as I can see 😉
src/Zigzag_persistence/include/gudhi/filtered_zigzag_persistence.h
Outdated
Show resolved
Hide resolved
|
||
// Only the closed bars where output so far, so the open/infinite bars still need to be retrieved. | ||
|
||
// in this example, outputs (0, 0) and (0, 7) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not "[0] 0 - inf"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The callback method will write "[0] 0 - inf", but it will be called with the arguments 0 and 0 (callback(0,0)
).
Perhaps I should write "computes" instead of "outputs" to be more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the decimal values help more than the verb.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But those are integers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh, my comment seems completely off, sorry. I wonder if I intended to post it elsewhere instead...
src/Zigzag_persistence/example/example_usage_filtered_zigzag_persistence_with_storage.cpp
Show resolved
Hide resolved
src/Zigzag_persistence/example/example_usage_filtered_zigzag_persistence_with_storage.cpp
Outdated
Show resolved
Hide resolved
src/Zigzag_persistence/example/example_usage_filtered_zigzag_persistence_with_storage.cpp
Show resolved
Hide resolved
src/Zigzag_persistence/example/example_usage_filtered_zigzag_persistence_with_storage.cpp
Show resolved
Hide resolved
* | ||
* \li \gudhi_example_link{Zigzag_persistence,example_usage_zigzag_persistence.cpp} - A simple example to showcase how | ||
* to use the @ref Zigzag_persistence class to compute a barcode. | ||
* <details> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this doesn't work with the version of doxygen we use in the CI, we really need to upgrade before the next release (not this PR).
src/Zigzag_persistence/example/example_usage_filtered_zigzag_persistence_with_storage.cpp
Show resolved
Hide resolved
src/Zigzag_persistence/example/example_usage_filtered_zigzag_persistence_with_storage.cpp
Outdated
Show resolved
Hide resolved
// The bars are stored within the class and where not output at all for now. | ||
|
||
// get all bars in a vector | ||
auto barcode = zp.get_persistence_diagram(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should use zp.get_persistence_diagram(0, true)
so it matches the other examples? It can also be a hint to users that the documentation of this function may be useful. On the other hand, it complicates (slightly) the example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to keep the example as simple as possible. It shows also what the default parameters do. But I have no strong opinion about it.
|
||
// Only the closed bars where output so far, so the open/infinite bars still need to be retrieved. | ||
|
||
// in this example, outputs (0, 0) and (0, 7) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the decimal values help more than the verb.
New module to compute zigzag persistence. To simplify the review process, this version contains only the main functionality, that is to compute the barcode.
Following functionalities will be added in separated PRs:
The code is based on PR #401 and requires PR #886 and draft #669.