|
1 | 1 | Example: Clustering using FoF algorithm
|
2 | 2 | =======================================
|
3 | 3 |
|
| 4 | +A friends-of-friends (FoF) algorithm is useful for finding groups of objects that lie close |
| 5 | +to each other and form clusters. Here is an example of how to perform clustering using the **pycorrelator** package. |
| 6 | + |
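To make the idea of FoF linking concrete, here is a minimal brute-force sketch. It is not how **pycorrelator** works internally: it compares every pair of points and merges linked points with a union-find structure. The helper names ``angular_separation_deg`` and ``naive_fof`` are hypothetical, and treating a separation equal to the tolerance as a link is an assumption.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def naive_fof(points, tolerance):
    """Hypothetical O(n^2) FoF: link every pair within `tolerance` degrees."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            # Assumption: a separation exactly equal to the tolerance links.
            if angular_separation_deg(*points[i], *points[j]) <= tolerance:
                parent[find(j)] = find(i)

    # Collect the member indices of each group, keyed by the group's root.
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


points = [(80.894, 41.269), (120.689, -41.269), (10.689, -41.269),
          (10.688, -41.270), (10.689, -41.270), (10.690, -41.269),
          (120.690, -41.270)]
fof_groups = naive_fof(points, tolerance=0.01)
print(fof_groups)  # [[0], [1, 6], [2, 3, 4, 5]]
```

Friends of friends end up in the same group even when they are not within the tolerance of each other directly. The pairwise loop is only practical for small catalogs.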
| 7 | +First, let's create a mock catalog: |
| 8 | + |
4 | 9 | .. code-block:: python
|
5 | 10 |
|
6 | 11 | import pandas as pd
|
| 12 | +
|
| 13 | + # Create a mock catalog as a pandas DataFrame |
| 14 | + catalog = pd.DataFrame([[80.894, 41.269, 15.5], [120.689, -41.269, 12.3], |
| 15 | + [10.689, -41.269, 18.7], [10.688, -41.270, 14.1], |
| 16 | + [10.689, -41.270, 16.4], [10.690, -41.269, 13.2], |
| 17 | + [120.690, -41.270, 17.8]], columns=['ra', 'dec', 'mag']) |
| 18 | +
|
| 19 | +.. note:: |
| 20 | + If you want to use a format other than a pandas DataFrame, |
| 21 | + see the :doc:`supported formats <input_validation>` for more information. |
| 22 | + |
| 23 | +group_by_quadtree() |
| 24 | +------------------- |
| 25 | + |
| 26 | +Next, run the FoF clustering with a tolerance of 0.01 degrees using the |
| 27 | +:func:`pycorrelator.group_by_quadtree` function. |
| 28 | + |
| 29 | +.. code-block:: python |
| 30 | +
|
7 | 31 | from pycorrelator import group_by_quadtree
|
| 32 | + result_object = group_by_quadtree(catalog, tolerance=0.01) |
| 33 | +
|
| 34 | +The result object contains the clustering results. Four methods are available to get the results in different formats: |
| 35 | + |
| 36 | +get_group_dataframe() |
| 37 | +--------------------- |
| 38 | + |
| 39 | +To get the clustering results along with the additional data columns (``"mag"`` in this case), use the |
| 40 | +:func:`pycorrelator.FoFResult.get_group_dataframe` method: |
| 41 | + |
| 42 | +.. code-block:: python |
8 | 43 |
|
9 |
| - # Create a mock catalog |
10 |
| - catalog = pd.DataFrame([[80.894, 41.269], [120.689, -41.269], |
11 |
| - [10.689, -41.269], [10.688, -41.270], |
12 |
| - [10.689, -41.270], [10.690, -41.269], |
13 |
| - [120.690, -41.270]], columns=['ra', 'dec']) |
| 44 | + groups_df = result_object.get_group_dataframe() |
| 45 | + print(groups_df) |
14 | 46 |
|
15 |
| - # Perform the clustering |
16 |
| - result = group_by_quadtree(catalog, tolerance=0.01) |
| 47 | +Expected output:: |
| 48 | + |
| 49 | + Ra Dec mag |
| 50 | + Group Object |
| 51 | + 0 0 80.894 41.269 15.5 |
| 52 | + 1 1 120.689 -41.269 12.3 |
| 53 | + 6 120.690 -41.270 17.8 |
| 54 | + 2 2 10.689 -41.269 18.7 |
| 55 | + 3 10.688 -41.270 14.1 |
| 56 | + 4 10.689 -41.270 16.4 |
| 57 | + 5 10.690 -41.269 13.2 |
17 | 58 |
|
18 |
| - # Get the result |
19 |
| - print(result.get_coordinates()) |
| 59 | +This method returns a pandas DataFrame with two layers of indices: the group index and the object index from the original catalog. |
| 60 | + |
| 61 | +You can iterate over the groups as follows: |
| 62 | + |
| 63 | +.. code-block:: python |
| 64 | +
|
| 65 | + for group_index, group in groups_df.groupby('Group'): |
| 66 | + print(f"Print group {group_index}:") |
| 67 | + print(f"The type of group is {type(group)}.") |
| 68 | + print(group, end="\n\n") |
| 69 | +
|
| 70 | +Expected output:: |
| 71 | + |
| 72 | + Print group 0: |
| 73 | + The type of group is <class 'pandas.core.frame.DataFrame'>. |
| 74 | + Ra Dec mag |
| 75 | + Group Object |
| 76 | + 0 0 80.894 41.269 15.5 |
| 77 | + |
| 78 | + Print group 1: |
| 79 | + The type of group is <class 'pandas.core.frame.DataFrame'>. |
| 80 | + Ra Dec mag |
| 81 | + Group Object |
| 82 | + 1 1 120.689 -41.269 12.3 |
| 83 | + 6 120.690 -41.270 17.8 |
| 84 | + |
| 85 | + Print group 2: |
| 86 | + The type of group is <class 'pandas.core.frame.DataFrame'>. |
| 87 | + Ra Dec mag |
| 88 | + Group Object |
| 89 | + 2 2 10.689 -41.269 18.7 |
| 90 | + 3 10.688 -41.270 14.1 |
| 91 | + 4 10.689 -41.270 16.4 |
| 92 | + 5 10.690 -41.269 13.2 |
| 93 | + |
| 94 | +Each group is also a pandas DataFrame. |
| 95 | + |
| 96 | +.. note:: |
| 97 | + The iterator from ``groupby()`` can be very slow for large datasets. A current workaround is to flatten the |
| 98 | + DataFrame to a single-level index and manipulate the index directly, or to convert the DataFrame into a NumPy array. |
| 99 | + |
| 100 | +If you want a DataFrame with a single-level index and the size of each group as a column, you can use the following code: |
| 101 | + |
| 102 | +.. code-block:: python |
| 103 | +
|
| 104 | + groups_df['group_size'] = groups_df.groupby(level='Group')['Ra'].transform('size') |
| 105 | + groups_df.reset_index(level='Group', inplace=True) |
| 106 | + print(groups_df) |
| 107 | +
|
| 108 | +Expected output:: |
| 109 | + |
| 110 | + Group Ra Dec mag group_size |
| 111 | + Object |
| 112 | + 0 0 80.894 41.269 15.5 1 |
| 113 | + 1 1 120.689 -41.269 12.3 2 |
| 114 | + 6 1 120.690 -41.270 17.8 2 |
| 115 | + 2 2 10.689 -41.269 18.7 4 |
| 116 | + 3 2 10.688 -41.270 14.1 4 |
| 117 | + 4 2 10.689 -41.270 16.4 4 |
| 118 | + 5 2 10.690 -41.269 13.2 4 |
| 119 | + |
| 120 | +get_group_sizes() |
| 121 | +----------------- |
| 122 | + |
| 123 | +To get the size of each group in the order of the group index, use the :func:`pycorrelator.FoFResult.get_group_sizes` method: |
| 124 | + |
| 125 | +.. code-block:: python |
| 126 | +
|
| 127 | + print(result_object.get_group_sizes()) |
| 128 | +
|
| 129 | +Expected output:: |
| 130 | + |
| 131 | + [1, 2, 4] |
| 132 | + |
| 133 | +get_coordinates() |
| 134 | +----------------- |
| 135 | + |
| 136 | +To get the coordinates of the objects in each group, use the :func:`pycorrelator.FoFResult.get_coordinates` method: |
| 137 | + |
| 138 | +.. code-block:: python |
| 139 | +
|
| 140 | + print(result_object.get_coordinates()) |
20 | 141 |
|
21 | 142 | Expected output::
|
22 | 143 |
|
23 | 144 | [[(80.894, 41.269)],
|
24 | 145 | [(120.689, -41.269), (120.69, -41.27)],
|
25 |
| - [(10.689, -41.269), (10.688, -41.27), (10.689, -41.27), (10.69, -41.269)]] |
| 146 | + [(10.689, -41.269), (10.688, -41.27), (10.689, -41.27), (10.69, -41.269)]] |
| 147 | + |
| 148 | +get_group_coordinates() |
| 149 | +----------------------- |
| 150 | + |
| 151 | +To get the center coordinates of each group, use the :func:`pycorrelator.FoFResult.get_group_coordinates` method: |
| 152 | + |
| 153 | +.. code-block:: python |
| 154 | +
|
| 155 | + print(result_object.get_group_coordinates()) |
| 156 | +
|
| 157 | +Expected output:: |
| 158 | + |
| 159 | + [(80.894, 41.269), (120.6895, -41.2695), (10.689 , -41.2695)] |
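The centers in the output above are consistent with a plain arithmetic mean of the member coordinates. The sketch below reproduces them with pandas alone, under that assumption; it is not taken from the package source, and the package may compute centers differently. The ``group_ids`` labels are written out by hand from the grouping shown earlier.

```python
import numpy as np
import pandas as pd

catalog = pd.DataFrame([[80.894, 41.269], [120.689, -41.269],
                        [10.689, -41.269], [10.688, -41.270],
                        [10.689, -41.270], [10.690, -41.269],
                        [120.690, -41.270]], columns=['ra', 'dec'])

# Group label of each catalog row, copied by hand from the results above.
group_ids = np.asarray([0, 1, 2, 2, 2, 2, 1])

# Assumption: a group's center is the unweighted mean of its members.
centers = catalog.groupby(group_ids)[['ra', 'dec']].mean()
print(centers.round(4))
```

A plain mean of right ascension breaks down for groups that straddle the 0/360 degree boundary, so treat this only as a check on the small example here.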