Finish writing the FoF tutorial.

technic960183 · technic960183 · commit dc2df48c230a · 2024-07-25T00:43:11.000+08:00
diff --git a/docs/source/tutorial/fof.rst b/docs/source/tutorial/fof.rst
@@ -1,25 +1,159 @@
 Example: Clustering using FoF algorithm
 =======================================
 
+A friends-of-friends (FoF) algorithm is useful when you want to find groups of objects that are close
+to each other and therefore form clusters. Here is an example of how to perform clustering using the **pycorrelator** package.
+
+First, let's create a mock catalog:
+
 .. code-block:: python
 
     import pandas as pd
+
+    # Create a mock catalog as a pandas DataFrame
+    catalog = pd.DataFrame([[80.894, 41.269, 15.5], [120.689, -41.269, 12.3], 
+                            [10.689, -41.269, 18.7], [10.688, -41.270, 14.1], 
+                            [10.689, -41.270, 16.4], [10.690, -41.269, 13.2], 
+                            [120.690, -41.270, 17.8]], columns=['ra', 'dec', 'mag'])
+
+.. note::
+    If you want to use a format other than a pandas DataFrame,
+    see the :doc:`supported formats <input_validation>` for more information.
+
+group_by_quadtree()
+-------------------
+
+Then, we can run the FoF algorithm with a tolerance of 0.01 degrees using the
+:func:`pycorrelator.group_by_quadtree` function.
+
+.. code-block:: python
+
     from pycorrelator import group_by_quadtree
+    result_object = group_by_quadtree(catalog, tolerance=0.01)
+
+The result object contains the clustering results. Four methods are available to get the results in different formats:
+
+get_group_dataframe()
+---------------------
+
+To get the clustering results together with the appended data (``"mag"`` in this case), use the
+:func:`pycorrelator.FoFResult.get_group_dataframe` method:
+
+.. code-block:: python
 
-    # Create a mock catalog
-    catalog = pd.DataFrame([[80.894, 41.269], [120.689, -41.269], 
-                            [10.689, -41.269], [10.688, -41.270], 
-                            [10.689, -41.270], [10.690, -41.269], 
-                            [120.690, -41.270]], columns=['ra', 'dec'])
+    groups_df = result_object.get_group_dataframe()
+    print(groups_df)
 
-    # Perform the clustering
-    result = group_by_quadtree(catalog, tolerance=0.01)
+Expected output::
+
+                       Ra     Dec   mag
+    Group Object                       
+    0     0        80.894  41.269  15.5
+    1     1       120.689 -41.269  12.3
+          6       120.690 -41.270  17.8
+    2     2        10.689 -41.269  18.7
+          3        10.688 -41.270  14.1
+          4        10.689 -41.270  16.4
+          5        10.690 -41.269  13.2
 
-    # Get the result
-    print(result.get_coordinates())
+This method returns a pandas DataFrame with two layers of indices: the group index and the object index from the original catalog.
+
+You can iterate over the groups with ``groupby()``:
+
+.. code-block:: python
+
+    for group_index, group in groups_df.groupby('Group'):
+        print(f"Print group {group_index}:")
+        print(f"The type of group is {type(group)}.")
+        print(group, end="\n\n")
+
+Expected output::
+
+    Print group 0:
+    The type of group is <class 'pandas.core.frame.DataFrame'>.
+                      Ra     Dec   mag
+    Group Object                      
+    0     0       80.894  41.269  15.5
+
+    Print group 1:
+    The type of group is <class 'pandas.core.frame.DataFrame'>.
+                       Ra     Dec   mag
+    Group Object                       
+    1     1       120.689 -41.269  12.3
+          6       120.690 -41.270  17.8
+
+    Print group 2:
+    The type of group is <class 'pandas.core.frame.DataFrame'>.
+                      Ra     Dec   mag
+    Group Object                      
+    2     2       10.689 -41.269  18.7
+          3       10.688 -41.270  14.1
+          4       10.689 -41.270  16.4
+          5       10.690 -41.269  13.2
+
+Each group is also a pandas DataFrame.
+
+.. note::
+    The iterator from ``groupby()`` is extremely slow for large datasets. The current workaround is to flatten the
+    DataFrame into a single index level and manipulate the index directly, or even convert the DataFrame into a numpy array.
+
+If you want a DataFrame with a single index level and the size of each group as a column, you can use the following code:
+
+.. code-block:: python
+
+    groups_df['group_size'] = groups_df.groupby(level='Group')['Ra'].transform('size')
+    groups_df.reset_index(level='Group', inplace=True)
+    print(groups_df)
+
+Expected output::
+
+            Group       Ra     Dec   mag  group_size
+    Object                                          
+    0           0   80.894  41.269  15.5           1
+    1           1  120.689 -41.269  12.3           2
+    6           1  120.690 -41.270  17.8           2
+    2           2   10.689 -41.269  18.7           4
+    3           2   10.688 -41.270  14.1           4
+    4           2   10.689 -41.270  16.4           4
+    5           2   10.690 -41.269  13.2           4
+
+get_group_sizes()
+-----------------
+
+To get the size of each group in the order of the group index, use the :func:`pycorrelator.FoFResult.get_group_sizes` method:
+
+.. code-block:: python
+
+    print(result_object.get_group_sizes())
+
+Expected output::
+
+    [1, 2, 4]
+
+get_coordinates()
+-----------------
+
+To get the coordinates of the objects in each group, use the :func:`pycorrelator.FoFResult.get_coordinates` method:
+
+.. code-block:: python
+
+    print(result_object.get_coordinates())
 
 Expected output::
 
     [[(80.894, 41.269)],
      [(120.689, -41.269), (120.69, -41.27)],
-     [(10.689, -41.269), (10.688, -41.27), (10.689, -41.27), (10.69, -41.269)]]
+     [(10.689, -41.269), (10.688, -41.27), (10.689, -41.27), (10.69, -41.269)]]
+
+get_group_coordinates()
+-----------------------
+
+To get the center coordinates of each group, use the :func:`pycorrelator.FoFResult.get_group_coordinates` method:
+
+.. code-block:: python
+
+    print(result_object.get_group_coordinates())
+
+Expected output::
+
+    [(80.894, 41.269), (120.6895, -41.2695), (10.689, -41.2695)]
diff --git a/docs/source/tutorial/index.rst b/docs/source/tutorial/index.rst
@@ -7,4 +7,4 @@ This section contains tutorials on how to use the **pycorrelator** package.
 
    input_validation
    xmatch
-   fof
+   fof
diff --git a/docs/source/tutorial/xmatch.rst b/docs/source/tutorial/xmatch.rst
@@ -10,7 +10,7 @@ First, let's create two mock catalogs A and B:
 
     import numpy as np
 
-    # Create two mock catalogs
+    # Create two mock catalogs as numpy arrays
     catalogA = np.array([[80.894, 41.269], [120.689, -41.269], [10.689, -41.269]])
     catalogB = np.array([[10.688, -41.270], [10.689, -41.270], [10.690, -41.269], [120.690, -41.270]])
 
diff --git a/pycorrelator/result_fof.py b/pycorrelator/result_fof.py
@@ -29,6 +29,7 @@ def get_group_coordinates(self) -> list[tuple]:
             A list of tuples of coordinates of the center of each group.
         """
         objects_coordinates = self.catalog.get_coordiantes()
+        # [FIXME] This returns a list of NDArrays, not a list of tuples.
         return [np.average(objects_coordinates[g, :], axis=0) for g in self.result_list]
     
     def get_group_sizes(self) -> list[int]:
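
Regarding the `[FIXME]` in `get_group_coordinates()`: one possible way to make the return value match the `list[tuple]` annotation is to convert each averaged row to a plain tuple. The snippet below is only a sketch under assumptions: `coords` and `result_list` are stand-ins for `objects_coordinates` and `self.result_list`, filled with the tutorial's mock data, and it has not been run against the actual class.

```python
import numpy as np

# Stand-in for objects_coordinates: the (ra, dec) columns of the tutorial catalog.
coords = np.array([[80.894, 41.269], [120.689, -41.269],
                   [10.689, -41.269], [10.688, -41.270],
                   [10.689, -41.270], [10.690, -41.269],
                   [120.690, -41.270]])

# Stand-in for self.result_list: object indices belonging to each group.
result_list = [[0], [1, 6], [2, 3, 4, 5]]

# np.average(...) yields one ndarray per group; .tolist() turns it into
# Python floats, and tuple(...) gives the annotated element type.
centers = [tuple(np.average(coords[g, :], axis=0).tolist()) for g in result_list]

for center in centers:
    print(type(center).__name__, center)
```

The averaging itself is unchanged from the current implementation; only the element type of the returned list differs.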

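For readers of the tutorial who want to see what FoF grouping means in the simplest terms, here is a brute-force sketch using a union-find structure and the haversine separation. It is not how `group_by_quadtree` works internally (the name suggests a quadtree, and nothing here is taken from the package); `angular_separation`, `naive_fof`, and the choice of `<=` for the tolerance comparison are all assumptions made for illustration.

```python
import math


def angular_separation(p, q):
    """Great-circle separation in degrees between two (ra, dec) points in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (*p, *q))
    # Haversine formula, which stays accurate for very small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def naive_fof(points, tolerance):
    """Group point indices so that linked points (separation <= tolerance) share a group."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # O(n^2) pair check: fine for a demo, too slow for real catalogs.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if angular_separation(points[i], points[j]) <= tolerance:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Keep the smaller index as the root of the merged group.
                    parent[max(ri, rj)] = min(ri, rj)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return [groups[root] for root in sorted(groups)]


points = [(80.894, 41.269), (120.689, -41.269), (10.689, -41.269),
          (10.688, -41.270), (10.689, -41.270), (10.690, -41.269),
          (120.690, -41.270)]

groups = naive_fof(points, tolerance=0.01)
print(groups)                      # [[0], [1, 6], [2, 3, 4, 5]]
print([len(g) for g in groups])    # [1, 2, 4]
```

On the mock catalog this reproduces the group sizes shown in the tutorial. The pairwise loop is quadratic in the number of objects, which is the cost a spatial index is meant to avoid.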