Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zigzag persistence part1 #917

Open
wants to merge 63 commits into
base: master
Choose a base branch
from

Conversation

hschreiber
Copy link
Collaborator

@hschreiber hschreiber commented Jul 21, 2023

New module to compute zigzag persistence. To simplify the review process, this version contains only the main functionality, that is to compute the barcode.
Following functionalities will be added in separated PRs:

  • python interface
  • construction of oscillating rips filtrations from given edges
  • use of discrete Morse theory to simplify filtration
  • implementation of Dey and Hou's FastZigzag algorithm (?)(further tests are necessary)

The code is based on PR #401 and requires PR #886 and draft #669.

src/Zigzag_persistence/concept/ZigzagPersistenceOptions.h Outdated Show resolved Hide resolved
src/Zigzag_persistence/include/gudhi/Zigzag_persistence.h Outdated Show resolved Hide resolved
src/Zigzag_persistence/include/gudhi/Zigzag_persistence.h Outdated Show resolved Hide resolved
src/Zigzag_persistence/include/gudhi/Zigzag_persistence.h Outdated Show resolved Hide resolved
src/Zigzag_persistence/include/gudhi/Zigzag_persistence.h Outdated Show resolved Hide resolved
src/Zigzag_persistence/include/gudhi/Zigzag_persistence.h Outdated Show resolved Hide resolved
src/Zigzag_persistence/include/gudhi/Zigzag_persistence.h Outdated Show resolved Hide resolved
src/Zigzag_persistence/include/gudhi/Zigzag_persistence.h Outdated Show resolved Hide resolved
src/Zigzag_persistence/include/gudhi/Zigzag_persistence.h Outdated Show resolved Hide resolved
src/Zigzag_persistence/include/gudhi/Zigzag_persistence.h Outdated Show resolved Hide resolved
Comment on lines 147 to 150
std::set<internal_key> translatedBoundary; // set maintains the natural order on indices
for (auto b : boundary) {
translatedBoundary.insert(handleToKey_.at(b)); // TODO: add possibilities of coefficients
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the keys to a vector and then sorting it would be more natural, but that can wait.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just took over how Clément did it in the original code. I have no preference about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using std::set is bound to be slow, but at this point I have no idea what parts of the code matter for performance.

* - For the other type, there are two classes: @ref Filtered_zigzag_persistence and
* @ref Filtered_zigzag_persistence_with_storage.
* They are both based on @ref Zigzag_persistence and manage additionally the filtration values which are ignored by
* @ref Zigzag_persistence. They automatically translate the operation numbers into their corresponding filtration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(comment attached to a random line)
I think it makes a lot of sense to associate indexes and filtrations to arrows as you do. However, I think many people might think of filtration values as something associated to a space, not an arrow, and get a bit confused. I don't know if it is worth saying something about it, it may become clearer once we have more geometric examples with meaningful filtration values.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, good question. To keep in mind.

src/Zigzag_persistence/include/gudhi/zigzag_persistence.h Outdated Show resolved Hide resolved
src/Zigzag_persistence/include/gudhi/zigzag_persistence.h Outdated Show resolved Hide resolved
* diagram will have arrows alternating between forward and backward.
*
* The module is partitioned in two types of classes: filtered by filtration values and filtered by the atomic
* operations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe there isn't really much to change. I would avoid saying that Zigzag_persistence ignores filtration values, which sounds like it has access to them and chooses not to use them; they just don't exist there.
My take: the main class is Zigzag_persistence, which indexes operations by their order and represents cells by the index of the operation that created them (we could have used the order "among insertions" to avoid holes in the numbering, but we waste at most a factor 2 this way, and since all the structures are sparse the holes don't take up any space, so it isn't worth keeping a second counter). The 2 other classes are convenience wrappers, which allow labeling the arrows (indexes are replaced by those labels when reporting interval bounds), arbitrary identifiers for cells, and maybe another detail or two.
As much as possible, I'd like to avoid giving the impression that the filtration values define the zigzag in the same way that the indexes do, because that leads to misunderstandings like the bar-splitting case above.

the barcode is still very different from the barcode in the other version

They don't look that different to me, but it is probably a matter of perspective.

Copy link
Member

@mglisse mglisse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oups, some of the comments were never submitted.

* @param arrowNumber Forward arrow number.
*/
void add_birth_forward(index arrowNumber) { // amortized constant time
birthToPos_.emplace_hint(birthToPos_.end(), arrowNumber, maxBirthPos_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably birthToPos_ was an ordered map at some point and emplace_hint(end,*) made sense, but now that it is an unordered_map, it looks a bit strange. Changing it to plain emplace is not urgent though if you still want to experiment with other map types.

src/Zigzag_persistence/include/gudhi/zigzag_persistence.h Outdated Show resolved Hide resolved
auto& col = matrix_.get_column(p.first);
stream_infinite_interval(col.get_dimension(), p.second);
} else {
try {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to get the information without having to throw and catch an exception? Sadly, C++ implementations have very costly exceptions and don't optimize them at all.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could add to the matrix a method like has_column(index) which returns true if and only if the given index is not out of range. Otherwise, we can also hope that the benchmark with the oscillating rips branch shows that we can just keep erase_birth_history at true and then we would just remove this piece of code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has_column makes sense. It encourages making twice the same search in the hashtable though (has_column+get_column), so we could also consider a version of get_column that returns

  • a boost::optional<column&>
  • a column*, with nullptr to indicate out_of_range

Indeed, if erase_birth_history is just true in the future, that makes this point less relevant.

src/Zigzag_persistence/include/gudhi/zigzag_persistence.h Outdated Show resolved Hide resolved
* diagram will have arrows alternating between forward and backward.
*
* The module is partitioned in two types of classes: filtered by filtration values and filtered by the atomic
* operations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok (I am not very fond of using the word "index" for the filtration values, but there is enough context that it should be fine)

I notice that you don't mention anymore that cells can be represented by arbitrary faceID in the wrappers. That's ok, but then I think it would make sense to have a paragraph in the doc of the wrappers talking about faceID, or users might be surprised by this difference between the base class and the wrappers.

src/Zigzag_persistence/include/gudhi/zigzag_persistence.h Outdated Show resolved Hide resolved
auto& col = matrix_.get_column(p.first);
stream_infinite_interval(col.get_dimension(), p.second);
} else {
try {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has_column makes sense. It encourages making twice the same search in the hashtable though (has_column+get_column), so we could also consider a version of get_column that returns

  • a boost::optional<column&>
  • a column*, with nullptr to indicate out_of_range

Indeed, if erase_birth_history is just true in the future, that makes this point less relevant.

* non-zero coefficients generating it. A face should be represented by the arrow number when the face appeared for
* the first time in the filtration (if a face was inserted and then removed and reinserted etc., only the last
* insertion counts). The face range should be ordered by increasing arrow numbers.
* @param dimension Dimension of the inserted face.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can see, the only thing the dimension is used for is the output, and we wouldn't lose anything if we removed the dimension completely from this class, wrappers could handle it on their own.
I am not asking to do that (I haven't thought about it much), I just wanted to write this down.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is the same than for the matrix module: the dimension is only useful for the persistence diagram. I could make it optional for both.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can imagine what removing dim from Zigzag_persistence would look like. I am not so sure what making it optional would mean. It sounds like a lot of complications, extra overloads and if constexpr ☹️ Since the dimension doesn't hurt that much in Zigzag_persistence (if we don't want it, we can pass 0 all the time and ignore it in the callback), and if there was no support for dimension it wouldn't be hard to work around (store the dimension in a specific location, known to the callback, before each insert/remove), I don't think it deserves complications by making it optional, we should just pick yes or no.
It looks like the dimension is already kind of optional in the matrix, since zigzags don't pass the dimension to the matrix as far as I can see 😉

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the dimension is already kind of optional in the matrix, since zigzags don't pass the dimension to the matrix as far as I can see 😉

Ah, wait, this is not normal! Because Zigzag was based on the simplex tree before, it only worked with simplices and the matrix would automatically compute the dimension, so I didn't had to add it. But now that the code works for general cells, we have to give the dimension also to the matrices, as col.get_dimension is used to get the infinite bars...
In the matrix module, the dimension is always stored for chain/RU/boundary matrices and never for base matrices.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests didn't notice the problem because... there were no infinite bars? Or they were all of the same dimension?

Copy link
Collaborator Author

@hschreiber hschreiber Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit error-prone indeed... But there is worse and at least it comes in handy, as simplices are still the bigger use case.
If the size is 1, the dimension will just be put to 0. Everything else will work completely alright. It is just that when returning the barcode, the dimension will be completely wrong.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I am not completely right: there is one point where a wrong dimension will make problems and that is in the boundary matrix (with only R and not RU). The reduction algorithm uses the clearing optimization which uses the dimensions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the size is 1, the dimension will just be put to 0.

Doesn't that deserve an assert?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could make it optional for both.

How horrible would it look?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How horrible would it look?

It won't be more horrible than any other thing in the matrix module, but you will not like it for sure. I would need to make two interval types (are make the interval inherit its dimension) and then add some std::conditional and if constexpr here and there.

/**
* @brief Type for filtration values.
*/
using filtration_value = unspecified;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VincentRouvreau would have to weigh in, but I think the type Filtration_value is capitalized in all the rest of Gudhi, so it looks a bit strange here.

Copy link
Collaborator Author

@hschreiber hschreiber Jul 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to use lower case for all types which are or have high probability to be native types and upper case for classes/structs. But true, it makes the Filtration_value case weird. I don't have a very strong opinion about it, so I will let you both decide.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you get inspiration from somewhere for this style? Since we haven't used that in the rest of Gudhi, I am a bit reluctant to introduce this inconsistency whose benefits are not really obvious to me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember where I got it from, but I think it was from a comment of someone, as I didn't paid much attention to this detail before. So either it was from someone working on Gudhi, or from one of my coauthors.

Like I said, I can change that for Gudhi if you prefer. Is it only for Filtration_value or is there some unwritten rule about type nomenclature?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know specifically about Gudhi (although we seem to have mostly done it this way), but it is fairly common to capitalize all types, so you can tell immediately if something is a type or a variable (in a wide sense that also includes functions). You can wait for Vincent to come back, but I think I would prefer to stick to this convention. It also makes it unnecessary (although not forbidden) to end a type name in _t or _type.

Comment on lines 78 to 80
* using dimension_type = Zigzag_persistence::dimension_type;
* using index_type = Zigzag_persistence::index;
* using filtration_value_type = Filtered_zigzag_persistence::filtration_value;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks strange that dimension_type already ends in _type while index becomes index_type.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but when I just write using index = Zigzag_persistence::index; the compiler complains about some "Redefinition of 'index' as different kind of symbol" with a first definition in "strings.h". I could just change index everywhere to index_type, but as the type is used a lot, I prefer the shorter version. Or I change dimension_type to dimension. Or perhaps capitalizing the first letter is just enough (as you want me to do anyway).

Like I said, I have no preferences on this. But then I would change all type names once and for all to homogenize them. So, if I understood well, your preference would be: all types start with a capitalized letter and only the name of object without any "_type" or "_t" at the end? Eg. dimension_type ---> Dimension.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Index should work (unless it conflicts with something else), or having the typedef not in the global namespace (in a subnamespace, in a function/class, etc).
Yes on the description of my preference, I think that's mostly what we've been doing so far in Gudhi (with obvious exceptions like value_type for compatibility with the standard library, or Thing_type where "type" is not in the C++ sense but means "kind", "category").

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then I would do a PR for the matrix module too, to keep things coherent.

/**
* @brief Type for filtration values.
*/
using filtration_value = unspecified;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you get inspiration from somewhere for this style? Since we haven't used that in the rest of Gudhi, I am a bit reluctant to introduce this inconsistency whose benefits are not really obvious to me.

* non-zero coefficients generating it. A face should be represented by the arrow number when the face appeared for
* the first time in the filtration (if a face was inserted and then removed and reinserted etc., only the last
* insertion counts). The face range should be ordered by increasing arrow numbers.
* @param dimension Dimension of the inserted face.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can imagine what removing dim from Zigzag_persistence would look like. I am not so sure what making it optional would mean. It sounds like a lot of complications, extra overloads and if constexpr ☹️ Since the dimension doesn't hurt that much in Zigzag_persistence (if we don't want it, we can pass 0 all the time and ignore it in the callback), and if there was no support for dimension it wouldn't be hard to work around (store the dimension in a specific location, known to the callback, before each insert/remove), I don't think it deserves complications by making it optional, we should just pick yes or no.
It looks like the dimension is already kind of optional in the matrix, since zigzags don't pass the dimension to the matrix as far as I can see 😉


// Only the closed bars where output so far, so the open/infinite bars still need to be retrieved.

// in this example, outputs (0, 0) and (0, 7)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not "[0] 0 - inf"?

Copy link
Collaborator Author

@hschreiber hschreiber Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The callback method will write "[0] 0 - inf", but it will be called with the arguments 0 and 0 (callback(0,0)).

Perhaps I should write "computes" instead of "outputs" to be more clear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the decimal values help more than the verb.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But those are integers?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, my comment seems completely off, sorry. I wonder if I intended to post it elsewhere instead...

src/Zigzag_persistence/doc/zigzag_ex.png Outdated Show resolved Hide resolved
*
* \li \gudhi_example_link{Zigzag_persistence,example_usage_zigzag_persistence.cpp} - A simple example to showcase how
* to use the @ref Zigzag_persistence class to compute a barcode.
* <details>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this doesn't work with the version of doxygen we use in the CI, we really need to upgrade before the next release (not this PR).

// The bars are stored within the class and where not output at all for now.

// get all bars in a vector
auto barcode = zp.get_persistence_diagram();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should use zp.get_persistence_diagram(0, true) so it matches the other examples? It can also be a hint to users that the documentation of this function may be useful. On the other hand, it complicates (slightly) the example.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to keep the example as simple as possible. It shows also what the default parameters do. But I have no strong opinion about it.


// Only the closed bars where output so far, so the open/infinite bars still need to be retrieved.

// in this example, outputs (0, 0) and (0, 7)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the decimal values help more than the verb.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants