Commit dc2df48

Finish writing the FoF tutorial.
1 parent 80ce5af commit dc2df48

4 files changed: 147 additions and 12 deletions.


docs/source/tutorial/fof.rst

Lines changed: 144 additions & 10 deletions
@@ -1,25 +1,159 @@
11
Example: Clustering using FoF algorithm
22
=======================================
33

4+
A friends-of-friends (FoF) algorithm is useful when you want to find groups of objects that are close
5+
to each other and therefore form clusters. Here is an example of how to perform clustering using the **pycorrelator** package.
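
To make the idea concrete, here is a minimal brute-force sketch of friends-of-friends grouping. It is not how **pycorrelator** works internally (judging by the function name used later, the package relies on a quadtree). The helper names ``angular_separation`` and ``naive_fof`` are hypothetical and exist only for this illustration. Two objects are linked when their great-circle separation is within the tolerance, and linked objects are merged with a union-find structure:

```python
import numpy as np


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = np.radians([ra1, dec1, ra2, dec2])
    a = (np.sin((dec2 - dec1) / 2) ** 2
         + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2) ** 2)
    return np.degrees(2 * np.arcsin(np.sqrt(a)))


def naive_fof(points, tolerance):
    """Hypothetical O(n^2) friends-of-friends; returns lists of point indices."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if angular_separation(*points[i], *points[j]) <= tolerance:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Keep the smallest index as the root of the merged group.
                    parent[max(ri, rj)] = min(ri, rj)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


points = [(80.894, 41.269), (120.689, -41.269), (10.689, -41.269),
          (10.688, -41.270), (10.689, -41.270), (10.690, -41.269),
          (120.690, -41.270)]
groups = naive_fof(points, tolerance=0.01)
print(groups)  # [[0], [1, 6], [2, 3, 4, 5]]
```

The pairwise loop is only practical for small catalogs; a spatial index is what keeps the search cheap on large ones.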
6+
7+
First, let's create a mock catalog:
8+
49
.. code-block:: python
510
611
import pandas as pd
12+
13+
# Create a mock catalog as a pandas DataFrame
14+
catalog = pd.DataFrame([[80.894, 41.269, 15.5], [120.689, -41.269, 12.3],
15+
[10.689, -41.269, 18.7], [10.688, -41.270, 14.1],
16+
[10.689, -41.270, 16.4], [10.690, -41.269, 13.2],
17+
[120.690, -41.270, 17.8]], columns=['ra', 'dec', 'mag'])
18+
19+
.. note::
20+
If you want to use a format other than a pandas DataFrame,
21+
see the :doc:`supported formats <input_validation>` for more information.
22+
23+
group_by_quadtree()
24+
-------------------
25+
26+
Then, we can perform clustering with the FoF algorithm, using a tolerance of 0.01 degrees, via the
27+
:func:`pycorrelator.group_by_quadtree` function.
28+
29+
.. code-block:: python
30+
731
from pycorrelator import group_by_quadtree
32+
result_object = group_by_quadtree(catalog, tolerance=0.01)
33+
34+
The result object contains the clustering results. Four methods are available to get the results in different formats:
35+
36+
get_group_dataframe()
37+
---------------------
38+
39+
To get the clustering results together with the appended data (``"mag"`` in this case), use the
40+
:func:`pycorrelator.FoFResult.get_group_dataframe` method:
41+
42+
.. code-block:: python
843
9-
# Create a mock catalog
10-
catalog = pd.DataFrame([[80.894, 41.269], [120.689, -41.269],
11-
[10.689, -41.269], [10.688, -41.270],
12-
[10.689, -41.270], [10.690, -41.269],
13-
[120.690, -41.270]], columns=['ra', 'dec'])
44+
groups_df = result_object.get_group_dataframe()
45+
print(groups_df)
1446
15-
# Perform the clustering
16-
result = group_by_quadtree(catalog, tolerance=0.01)
47+
Expected output::
48+
49+
                   Ra      Dec   mag
50+
Group Object
51+
0     0        80.894   41.269  15.5
52+
1     1       120.689  -41.269  12.3
53+
      6       120.690  -41.270  17.8
54+
2     2        10.689  -41.269  18.7
55+
      3        10.688  -41.270  14.1
56+
      4        10.689  -41.270  16.4
57+
      5        10.690  -41.269  13.2
1758

18-
# Get the result
19-
print(result.get_coordinates())
59+
This method returns a pandas DataFrame with two layers of indices: the group index and the object index from the original catalog.
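
As a quick illustration of the two-level index, the sketch below rebuilds a DataFrame shaped like the expected output above by hand (so it runs without **pycorrelator**) and then uses standard pandas indexing to pull out one group and the original object indices. The variable names are chosen only for this example:

```python
import pandas as pd

# Hand-built stand-in for the DataFrame returned by get_group_dataframe()
index = pd.MultiIndex.from_tuples(
    [(0, 0), (1, 1), (1, 6), (2, 2), (2, 3), (2, 4), (2, 5)],
    names=["Group", "Object"],
)
groups_df = pd.DataFrame(
    {
        "Ra": [80.894, 120.689, 120.690, 10.689, 10.688, 10.689, 10.690],
        "Dec": [41.269, -41.269, -41.270, -41.269, -41.270, -41.270, -41.269],
        "mag": [15.5, 12.3, 17.8, 18.7, 14.1, 16.4, 13.2],
    },
    index=index,
)

# Select every object in group 2; the result is indexed by "Object" only.
group_2 = groups_df.loc[2]
print(group_2.index.tolist())  # [2, 3, 4, 5]

# The same kind of selection with an explicit level name.
group_1 = groups_df.xs(1, level="Group")
print(group_1["mag"].tolist())  # [12.3, 17.8]

# Row numbers from the original catalog, in the order shown in the table.
object_ids = groups_df.index.get_level_values("Object").to_numpy()
print(object_ids.tolist())  # [0, 1, 6, 2, 3, 4, 5]
```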
60+
61+
You can iterate over the groups with ``groupby()``:
62+
63+
.. code-block:: python
64+
65+
for group_index, group in groups_df.groupby('Group'):
66+
    print(f"Print group {group_index}:")
67+
    print(f"The type of group is {type(group)}.")
68+
    print(group, end="\n\n")
69+
70+
Expected output::
71+
72+
Print group 0:
73+
The type of group is <class 'pandas.core.frame.DataFrame'>.
74+
                   Ra      Dec   mag
75+
Group Object
76+
0     0        80.894   41.269  15.5
77+
78+
Print group 1:
79+
The type of group is <class 'pandas.core.frame.DataFrame'>.
80+
                   Ra      Dec   mag
81+
Group Object
82+
1     1       120.689  -41.269  12.3
83+
      6       120.690  -41.270  17.8
84+
85+
Print group 2:
86+
The type of group is <class 'pandas.core.frame.DataFrame'>.
87+
                   Ra      Dec   mag
88+
Group Object
89+
2     2        10.689  -41.269  18.7
90+
      3        10.688  -41.270  14.1
91+
      4        10.689  -41.270  16.4
92+
      5        10.690  -41.269  13.2
93+
94+
Each group is also a pandas DataFrame.
95+
96+
.. note::
97+
The iterator returned by ``groupby()`` is extremely slow for large datasets. The current workaround is to flatten the
98+
DataFrame into a single index level and manipulate the index directly, or even to convert the DataFrame into a NumPy array.
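
One possible way to work on whole groups without the ``groupby()`` iterator is sketched below. It assumes a flattened table whose rows are already sorted by group, as in the expected output above; the DataFrame is built by hand and the variable names are illustrative. Group boundaries are found with NumPy and per-group values are computed in vectorized form:

```python
import numpy as np
import pandas as pd

# Hand-built flat table: one row per object, sorted by group index.
flat = pd.DataFrame({
    "Group": [0, 1, 1, 2, 2, 2, 2],
    "Object": [0, 1, 6, 2, 3, 4, 5],
    "mag": [15.5, 12.3, 17.8, 18.7, 14.1, 16.4, 13.2],
})

group_ids = flat["Group"].to_numpy()
mags = flat["mag"].to_numpy()

# A new group starts wherever the group id changes between adjacent rows.
boundaries = np.flatnonzero(np.diff(group_ids)) + 1
starts = np.r_[0, boundaries]

# Size of each group, from the distance between consecutive group starts.
sizes = np.diff(np.r_[starts, len(mags)])
print(sizes.tolist())  # [1, 2, 4]

# Per-group arrays, if the individual members are needed.
mag_per_group = np.split(mags, boundaries)

# Per-group mean magnitude without any Python-level loop.
mean_mag = np.add.reduceat(mags, starts) / sizes
```

This relies on the rows being contiguous per group; if they are not, sort by the group column first.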
99+
100+
If you want a DataFrame with a single index level and the size of each group as a column, you can use the following code:
101+
102+
.. code-block:: python
103+
104+
groups_df['group_size'] = groups_df.groupby(level='Group')['Ra'].transform('size')
105+
groups_df.reset_index(level='Group', inplace=True)
106+
print(groups_df)
107+
108+
Expected output::
109+
110+
        Group       Ra      Dec   mag  group_size
111+
Object
112+
0           0   80.894   41.269  15.5           1
113+
1           1  120.689  -41.269  12.3           2
114+
6           1  120.690  -41.270  17.8           2
115+
2           2   10.689  -41.269  18.7           4
116+
3           2   10.688  -41.270  14.1           4
117+
4           2   10.689  -41.270  16.4           4
118+
5           2   10.690  -41.269  13.2           4
119+
120+
get_group_sizes()
121+
-----------------
122+
123+
To get the size of each group in the order of the group index, use the :func:`pycorrelator.FoFResult.get_group_sizes` method:
124+
125+
.. code-block:: python
126+
127+
print(result_object.get_group_sizes())
128+
129+
Expected output::
130+
131+
[1, 2, 4]
132+
133+
get_coordinates()
134+
-----------------
135+
136+
To get the coordinates of the objects in each group, use the :func:`pycorrelator.FoFResult.get_coordinates` method:
137+
138+
.. code-block:: python
139+
140+
print(result_object.get_coordinates())
20141
21142
Expected output::
22143

23144
[[(80.894, 41.269)],
24145
[(120.689, -41.269), (120.69, -41.27)],
25-
[(10.689, -41.269), (10.688, -41.27), (10.689, -41.27), (10.69, -41.269)]]
146+
[(10.689, -41.269), (10.688, -41.27), (10.689, -41.27), (10.69, -41.269)]]
147+
148+
get_group_coordinates()
149+
-----------------------
150+
151+
To get the center coordinates of each group, use the :func:`pycorrelator.FoFResult.get_group_coordinates` method:
152+
153+
.. code-block:: python
154+
155+
print(result_object.get_group_coordinates())
156+
157+
Expected output::
158+
159+
[(80.894, 41.269), (120.6895, -41.2695), (10.689, -41.2695)]

docs/source/tutorial/index.rst

Lines changed: 1 addition & 1 deletion
@@ -7,4 +7,4 @@ This section contains tutorials on how to use the **pycorrelator** package.
77

88
input_validation
99
xmatch
10-
fof
10+
fof

docs/source/tutorial/xmatch.rst

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ First, let's create two mock catalogs A and B:
1010
1111
import numpy as np
1212
13-
# Create two mock catalogs
13+
# Create two mock catalogs as numpy arrays
1414
catalogA = np.array([[80.894, 41.269], [120.689, -41.269], [10.689, -41.269]])
1515
catalogB = np.array([[10.688, -41.270], [10.689, -41.270], [10.690, -41.269], [120.690, -41.270]])
1616

pycorrelator/result_fof.py

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ def get_group_coordinates(self) -> list[tuple]:
2929
A list of tuples of coordinates of the center of each group.
3030
"""
3131
objects_coordinates = self.catalog.get_coordiantes()
32+
# [FIXME] This returns a list of NDArrays, not a list of tuples.
3233
return [np.average(objects_coordinates[g, :], axis=0) for g in self.result_list]
3334

3435
def get_group_sizes(self) -> list[int]:
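
One way the ``[FIXME]`` in the hunk above could be addressed is to convert each averaged array into a tuple of plain floats before returning. The sketch below runs on standalone data rather than on the class itself: ``coords`` and ``result_list`` are stand-ins for ``objects_coordinates`` and ``self.result_list``, and this is an illustration, not a patch from the author.

```python
import numpy as np

# Stand-ins for the catalog coordinates and the grouping result.
coords = np.array([[80.894, 41.269], [120.689, -41.269], [10.689, -41.269],
                   [10.688, -41.270], [10.689, -41.270], [10.690, -41.269],
                   [120.690, -41.270]])
result_list = [[0], [1, 6], [2, 3, 4, 5]]

# Same averaging as in the diff: each element is a NumPy array of shape (2,).
centers_as_arrays = [np.average(coords[g, :], axis=0) for g in result_list]

# Convert each array to a tuple of Python floats to match list[tuple].
centers_as_tuples = [tuple(c.tolist()) for c in centers_as_arrays]
print(type(centers_as_tuples[0]))  # <class 'tuple'>
```

Note that this is a plain arithmetic mean of the two coordinate columns, exactly as in the original return statement.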
