Zigzag persistence part1 #917

hschreiber · 2023-07-21T17:58:06Z

New module to compute zigzag persistence. To simplify the review process, this version contains only the main functionality, that is to compute the barcode.
Following functionalities will be added in separated PRs:

python interface
construction of oscillating rips filtrations from given edges
use of discrete Morse theory to simplify filtration
implementation of Dey and Hou's FastZigzag algorithm (?)(further tests are necessary)

The code is based on PR #401 and requires PR #886 and draft #669.

…persistence

src/Zigzag_persistence/concept/ZigzagPersistenceOptions.h

src/Zigzag_persistence/include/gudhi/Zigzag_persistence.h

src/Zigzag_persistence/include/gudhi/filtered_zigzag_persistence.h

mglisse · 2024-07-17T15:09:05Z

src/Zigzag_persistence/include/gudhi/filtered_zigzag_persistence.h

+    std::set<internal_key> translatedBoundary;  // set maintains the natural order on indices
+    for (auto b : boundary) {
+      translatedBoundary.insert(handleToKey_.at(b));  // TODO: add possibilities of coefficients
+    }


Adding the keys to a vector and then sorting it would be more natural, but that can wait.

I just took over how Clément did it in the original code. I have no preference about it.

Using std::set is bound to be slow, but at this point I have no idea what parts of the code matter for performance.

src/Zigzag_persistence/include/gudhi/filtered_zigzag_persistence.h

mglisse · 2024-07-17T16:19:38Z

src/Zigzag_persistence/doc/Intro_zigzag_persistence.h

+ * - For the other type, there are two classes: @ref Filtered_zigzag_persistence and
+ * @ref Filtered_zigzag_persistence_with_storage.
+ * They are both based on @ref Zigzag_persistence and manage additionally the filtration values which are ignored by 
+ * @ref Zigzag_persistence. They automatically translate the operation numbers into their corresponding filtration


(comment attached to a random line)
I think it makes a lot of sense to associate indexes and filtrations to arrows as you do. However, I think many people might think of filtration values as something associated to a space, not an arrow, and get a bit confused. I don't know if it is worth saying something about it, it may become clearer once we have more geometric examples with meaningful filtration values.

Hm, good question. To keep in mind.

src/Zigzag_persistence/include/gudhi/filtered_zigzag_persistence.h

src/Zigzag_persistence/include/gudhi/zigzag_persistence.h

mglisse · 2024-07-17T19:58:00Z

src/Zigzag_persistence/doc/Intro_zigzag_persistence.h

+ * diagram will have arrows alternating between forward and backward.
+ *
+ * The module is partitioned in two types of classes: filtered by filtration values and filtered by the atomic
+ * operations.


Maybe there isn't really much to change. I would avoid saying that Zigzag_persistence ignores filtration values, which sounds like it has access to them and chooses not to use them; they just don't exist there.
My take: the main class is Zigzag_persistence, which indexes operations by their order and represents cells by the index of the operation that created them (we could have used the order "among insertions" to avoid holes in the numbering, but we waste at most a factor 2 this way, and since all the structures are sparse the holes don't take up any space, so it isn't worth keeping a second counter). The 2 other classes are convenience wrappers, which allow labeling the arrows (indexes are replaced by those labels when reporting interval bounds), arbitrary identifiers for cells, and maybe another detail or two.
As much as possible, I'd like to avoid giving the impression that the filtration values define the zigzag in the same way that the indexes do, because that leads to misunderstandings like the bar-splitting case above.

the barcode is still very different from the barcode in the other version

They don't look that different to me, but it is probably a matter of perspective.

mglisse

oups, some of the comments were never submitted.

mglisse · 2024-07-17T21:03:02Z

src/Zigzag_persistence/include/gudhi/zigzag_persistence.h

+     * @param arrowNumber Forward arrow number.
+     */
+    void add_birth_forward(index arrowNumber) {  // amortized constant time
+      birthToPos_.emplace_hint(birthToPos_.end(), arrowNumber, maxBirthPos_);


Probably birthToPos_ was an ordered map at some point and emplace_hint(end,*) made sense, but now that it is an unordered_map, it looks a bit strange. Changing it to plain emplace is not urgent though if you still want to experiment with other map types.

src/Zigzag_persistence/include/gudhi/zigzag_persistence.h

mglisse · 2024-07-17T21:10:29Z

src/Zigzag_persistence/include/gudhi/zigzag_persistence.h

+        auto& col = matrix_.get_column(p.first);
+        stream_infinite_interval(col.get_dimension(), p.second);
+      } else {
+        try {


Is there a way to get the information without having to throw and catch an exception? Sadly, C++ implementations have very costly exceptions and don't optimize them at all.

I could add to the matrix a method like has_column(index) which returns true if and only if the given index is not out of range. Otherwise, we can also hope that the benchmark with the oscillating rips branch shows that we can just keep erase_birth_history at true and then we would just remove this piece of code.

has_column makes sense. It encourages making twice the same search in the hashtable though (has_column+get_column), so we could also consider a version of get_column that returns

a boost::optional<column&>

a column*, with nullptr to indicate out_of_range

Indeed, if erase_birth_history is just true in the future, that makes this point less relevant.

src/Zigzag_persistence/include/gudhi/zigzag_persistence.h

Co-authored-by: Marc Glisse <[email protected]>

src/Zigzag_persistence/include/gudhi/filtered_zigzag_persistence.h

mglisse · 2024-07-18T14:37:45Z

src/Zigzag_persistence/doc/Intro_zigzag_persistence.h

+ * diagram will have arrows alternating between forward and backward.
+ *
+ * The module is partitioned in two types of classes: filtered by filtration values and filtered by the atomic
+ * operations.


ok (I am not very fond of using the word "index" for the filtration values, but there is enough context that it should be fine)

I notice that you don't mention anymore that cells can be represented by arbitrary faceID in the wrappers. That's ok, but then I think it would make sense to have a paragraph in the doc of the wrappers talking about faceID, or users might be surprised by this difference between the base class and the wrappers.

src/Zigzag_persistence/include/gudhi/filtered_zigzag_persistence.h

src/Zigzag_persistence/include/gudhi/zigzag_persistence.h

mglisse · 2024-07-18T15:15:56Z

src/Zigzag_persistence/include/gudhi/zigzag_persistence.h

+        auto& col = matrix_.get_column(p.first);
+        stream_infinite_interval(col.get_dimension(), p.second);
+      } else {
+        try {


has_column makes sense. It encourages making twice the same search in the hashtable though (has_column+get_column), so we could also consider a version of get_column that returns

a boost::optional<column&>

a column*, with nullptr to indicate out_of_range

Indeed, if erase_birth_history is just true in the future, that makes this point less relevant.

src/Zigzag_persistence/concept/ZigzagOptions.h

src/Zigzag_persistence/include/gudhi/filtered_zigzag_persistence.h

src/Zigzag_persistence/example/example_zzfiltration_from_file.cpp

src/Zigzag_persistence/include/gudhi/filtered_zigzag_persistence.h

mglisse · 2024-07-18T22:15:24Z

src/Zigzag_persistence/include/gudhi/zigzag_persistence.h

+   * non-zero coefficients generating it. A face should be represented by the arrow number when the face appeared for
+   * the first time in the filtration (if a face was inserted and then removed and reinserted etc., only the last
+   * insertion counts). The face range should be ordered by increasing arrow numbers.
+   * @param dimension Dimension of the inserted face.


From what I can see, the only thing the dimension is used for is the output, and we wouldn't lose anything if we removed the dimension completely from this class, wrappers could handle it on their own.
I am not asking to do that (I haven't thought about it much), I just wanted to write this down.

Yes, it is the same than for the matrix module: the dimension is only useful for the persistence diagram. I could make it optional for both.

I can imagine what removing dim from Zigzag_persistence would look like. I am not so sure what making it optional would mean. It sounds like a lot of complications, extra overloads and if constexpr ☹️ Since the dimension doesn't hurt that much in Zigzag_persistence (if we don't want it, we can pass 0 all the time and ignore it in the callback), and if there was no support for dimension it wouldn't be hard to work around (store the dimension in a specific location, known to the callback, before each insert/remove), I don't think it deserves complications by making it optional, we should just pick yes or no.
It looks like the dimension is already kind of optional in the matrix, since zigzags don't pass the dimension to the matrix as far as I can see 😉

It looks like the dimension is already kind of optional in the matrix, since zigzags don't pass the dimension to the matrix as far as I can see 😉

Ah, wait, this is not normal! Because Zigzag was based on the simplex tree before, it only worked with simplices and the matrix would automatically compute the dimension, so I didn't had to add it. But now that the code works for general cells, we have to give the dimension also to the matrices, as col.get_dimension is used to get the infinite bars...
In the matrix module, the dimension is always stored for chain/RU/boundary matrices and never for base matrices.

The tests didn't notice the problem because... there were no infinite bars? Or they were all of the same dimension?

A bit error-prone indeed... But there is worse and at least it comes in handy, as simplices are still the bigger use case.
If the size is 1, the dimension will just be put to 0. Everything else will work completely alright. It is just that when returning the barcode, the dimension will be completely wrong.

Ah, I am not completely right: there is one point where a wrong dimension will make problems and that is in the boundary matrix (with only R and not RU). The reduction algorithm uses the clearing optimization which uses the dimensions.

If the size is 1, the dimension will just be put to 0.

Doesn't that deserve an assert?

I could make it optional for both.

How horrible would it look?

How horrible would it look?

It won't be more horrible than any other thing in the matrix module, but you will not like it for sure. I would need to make two interval types (are make the interval inherit its dimension) and then add some std::conditional and if constexpr here and there.

mglisse · 2024-07-18T22:36:56Z

src/Zigzag_persistence/concept/ZigzagOptions.h

+  /**
+   * @brief Type for filtration values.
+   */
+  using filtration_value = unspecified;


@VincentRouvreau would have to weigh in, but I think the type Filtration_value is capitalized in all the rest of Gudhi, so it looks a bit strange here.

I wanted to use lower case for all types which are or have high probability to be native types and upper case for classes/structs. But true, it makes the Filtration_value case weird. I don't have a very strong opinion about it, so I will let you both decide.

Did you get inspiration from somewhere for this style? Since we haven't used that in the rest of Gudhi, I am a bit reluctant to introduce this inconsistency whose benefits are not really obvious to me.

I don't remember where I got it from, but I think it was from a comment of someone, as I didn't paid much attention to this detail before. So either it was from someone working on Gudhi, or from one of my coauthors.

Like I said, I can change that for Gudhi if you prefer. Is it only for Filtration_value or is there some unwritten rule about type nomenclature?

I don't know specifically about Gudhi (although we seem to have mostly done it this way), but it is fairly common to capitalize all types, so you can tell immediately if something is a type or a variable (in a wide sense that also includes functions). You can wait for Vincent to come back, but I think I would prefer to stick to this convention. It also makes it unnecessary (although not forbidden) to end a type name in _t or _type.

…ue_from_index->value

mglisse · 2024-07-19T17:26:30Z

src/Zigzag_persistence/doc/Intro_zigzag_persistence.h

+ * using dimension_type = Zigzag_persistence::dimension_type;
+ * using index_type = Zigzag_persistence::index;
+ * using filtration_value_type = Filtered_zigzag_persistence::filtration_value;


It looks strange that dimension_type already ends in _type while index becomes index_type.

I agree, but when I just write using index = Zigzag_persistence::index; the compiler complains about some "Redefinition of 'index' as different kind of symbol" with a first definition in "strings.h". I could just change index everywhere to index_type, but as the type is used a lot, I prefer the shorter version. Or I change dimension_type to dimension. Or perhaps capitalizing the first letter is just enough (as you want me to do anyway).

Like I said, I have no preferences on this. But then I would change all type names once and for all to homogenize them. So, if I understood well, your preference would be: all types start with a capitalized letter and only the name of object without any "_type" or "_t" at the end? Eg. dimension_type ---> Dimension.

Index should work (unless it conflicts with something else), or having the typedef not in the global namespace (in a subnamespace, in a function/class, etc).
Yes on the description of my preference, I think that's mostly what we've been doing so far in Gudhi (with obvious exceptions like value_type for compatibility with the standard library, or Thing_type where "type" is not in the C++ sense but means "kind", "category").

Ok, then I would do a PR for the matrix module too, to keep things coherent.

src/Zigzag_persistence/example/example_zzfiltration_from_file.cpp

src/Zigzag_persistence/include/gudhi/filtered_zigzag_persistence.h

mglisse · 2024-07-19T18:22:30Z

src/Zigzag_persistence/concept/ZigzagOptions.h

+  /**
+   * @brief Type for filtration values.
+   */
+  using filtration_value = unspecified;


Did you get inspiration from somewhere for this style? Since we haven't used that in the rest of Gudhi, I am a bit reluctant to introduce this inconsistency whose benefits are not really obvious to me.

mglisse · 2024-07-19T19:09:27Z

src/Zigzag_persistence/include/gudhi/zigzag_persistence.h

+   * non-zero coefficients generating it. A face should be represented by the arrow number when the face appeared for
+   * the first time in the filtration (if a face was inserted and then removed and reinserted etc., only the last
+   * insertion counts). The face range should be ordered by increasing arrow numbers.
+   * @param dimension Dimension of the inserted face.


I can imagine what removing dim from Zigzag_persistence would look like. I am not so sure what making it optional would mean. It sounds like a lot of complications, extra overloads and if constexpr ☹️ Since the dimension doesn't hurt that much in Zigzag_persistence (if we don't want it, we can pass 0 all the time and ignore it in the callback), and if there was no support for dimension it wouldn't be hard to work around (store the dimension in a specific location, known to the callback, before each insert/remove), I don't think it deserves complications by making it optional, we should just pick yes or no.
It looks like the dimension is already kind of optional in the matrix, since zigzags don't pass the dimension to the matrix as far as I can see 😉

src/Zigzag_persistence/include/gudhi/filtered_zigzag_persistence.h

src/Zigzag_persistence/include/gudhi/zigzag_persistence.h

mglisse · 2024-07-22T16:19:07Z

src/Zigzag_persistence/example/example_usage_zigzag_persistence.cpp

+
+  // Only the closed bars where output so far, so the open/infinite bars still need to be retrieved.
+
+  // in this example, outputs (0, 0) and (0, 7)


Not "[0] 0 - inf"?

The callback method will write "[0] 0 - inf", but it will be called with the arguments 0 and 0 (callback(0,0)).

Perhaps I should write "computes" instead of "outputs" to be more clear.

I think the decimal values help more than the verb.

But those are integers?

Uh, my comment seems completely off, sorry. I wonder if I intended to post it elsewhere instead...

src/Zigzag_persistence/example/example_usage_filtered_zigzag_persistence_with_storage.cpp

src/Zigzag_persistence/doc/Intro_zigzag_persistence.h

src/Zigzag_persistence/example/example_usage_filtered_zigzag_persistence_with_storage.cpp

src/Zigzag_persistence/doc/zigzag_ex.png

mglisse · 2024-07-23T12:49:04Z

src/Zigzag_persistence/doc/Intro_zigzag_persistence.h

+ *
+ * \li \gudhi_example_link{Zigzag_persistence,example_usage_zigzag_persistence.cpp} - A simple example to showcase how
+ * to use the @ref Zigzag_persistence class to compute a barcode.
+ * <details>


Note that this doesn't work with the version of doxygen we use in the CI, we really need to upgrade before the next release (not this PR).

src/Zigzag_persistence/example/example_usage_filtered_zigzag_persistence_with_storage.cpp

mglisse · 2024-07-23T12:59:44Z

src/Zigzag_persistence/example/example_usage_filtered_zigzag_persistence_with_storage.cpp

+  // The bars are stored within the class and where not output at all for now.
+
+  // get all bars in a vector
+  auto barcode = zp.get_persistence_diagram();


I wonder if we should use zp.get_persistence_diagram(0, true) so it matches the other examples? It can also be a hint to users that the documentation of this function may be useful. On the other hand, it complicates (slightly) the example.

I wanted to keep the example as simple as possible. It shows also what the default parameters do. But I have no strong opinion about it.

mglisse · 2024-07-23T13:01:59Z

src/Zigzag_persistence/example/example_usage_zigzag_persistence.cpp

+
+  // Only the closed bars where output so far, so the open/infinite bars still need to be retrieved.
+
+  // in this example, outputs (0, 0) and (0, 7)


I think the decimal values help more than the verb.

hschreiber and others added 17 commits July 17, 2023 14:48

restore after merge

5407637

Merge remote-tracking branch 'origin/persistence_matrix' into zigzag_…

f03a8bf

…persistence

doc and concepts

3f8409f

deleting files not wanted for the PR

2e5a20b

doc main page

bfc5642

restore cmake files

688b381

restore cmake files

f7087fc

restore cmake files

15fbc53

Merge branch 'GUDHI:master' into zigzag_persistence

87ba652

simplex tree options correction

d8d07c5

sync with upstream Gudhi-devel

0644ca6

doc

394e33e

first update merge

7178d45

second update merge

5c4f9d7

merge error fix

2c79bc2

removal of useless simplex tree options

f85ffb9

Merge branch 'GUDHI:master' into zigzag_persistence

902db3b

hschreiber mentioned this pull request Sep 1, 2023

Zigzag persistence part2 #953

Draft

hschreiber and others added 4 commits April 18, 2024 15:16

update from upstream master

da925a0

Merge branch 'GUDHI:master' into zigzag_persistence

5208e5d

update persistence matrix

4436039

update persistence matrix

e384378

mglisse reviewed May 20, 2024

View reviewed changes

hschreiber added 7 commits May 31, 2024 15:18

generalized to faces instead of simplices

0c7d3e6

updates concepts

ece1294

updates concepts

20ae249

zigzag doc update

df39f9c

merge upstream update

c192cd6

merge Persistence_matrix update

cadfb5c

small fixes

03e7b7d

doc fix

e025cec

mglisse reviewed Jul 17, 2024

View reviewed changes

small changes

99f99ae

mglisse reviewed Jul 18, 2024

View reviewed changes

hschreiber and others added 3 commits July 18, 2024 15:42

Update src/Zigzag_persistence/include/gudhi/zigzag_persistence.h

6dc4219

Co-authored-by: Marc Glisse <[email protected]>

Update src/Zigzag_persistence/include/gudhi/zigzag_persistence.h

9446bad

Co-authored-by: Marc Glisse <[email protected]>

Merge branch 'GUDHI:master' into zigzag_persistence

32e3053

mglisse reviewed Jul 18, 2024

View reviewed changes

hschreiber added 3 commits July 19, 2024 15:19

doc

bbd3d90

replacing map_index_to_filtration_value->pair with get_filtration_val…

c7edb91

…ue_from_index->value

doc

3472160

mglisse reviewed Jul 19, 2024

View reviewed changes

mglisse reviewed Jul 21, 2024

View reviewed changes

src/Zigzag_persistence/include/gudhi/zigzag_persistence.h Outdated Show resolved Hide resolved

src/Zigzag_persistence/include/gudhi/zigzag_persistence.h Show resolved Hide resolved

hschreiber added 3 commits July 22, 2024 14:38

dim bug fix

744ac47

doc

7d84dbf

doc

564f751

mglisse reviewed Jul 22, 2024

View reviewed changes

doc

bd6d9b5

mglisse reviewed Jul 23, 2024

View reviewed changes

hschreiber mentioned this pull request Jul 23, 2024

[Persistence_matrix] Type renaming to be more coherent with Gudhi conventions #1110

Merged

hschreiber and others added 5 commits September 17, 2024 17:21

Merge branch 'GUDHI:master' into zigzag_persistence

4de6a26

upstream merge

4f40d8b

renaming after Gudhi convention

6a0fad2

doc

2b2a569

doc

1e31a4b


		// Only the closed bars where output so far, so the open/infinite bars still need to be retrieved.

		// in this example, outputs (0, 0) and (0, 7)

Zigzag persistence part1 #917

Are you sure you want to change the base?

Zigzag persistence part1 #917

Conversation

hschreiber commented Jul 21, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mglisse left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hschreiber Jul 22, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hschreiber Jul 19, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hschreiber Jul 22, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hschreiber commented Jul 21, 2023 •

edited

Loading

hschreiber Jul 22, 2024 •

edited

Loading

hschreiber Jul 19, 2024 •

edited

Loading

hschreiber Jul 22, 2024 •

edited

Loading