Code to verify whether a group of stars found in the Gaia database is an open cluster. Clustering algorithms (e.g. DBSCAN) return tens of cluster candidates for every few square degrees of sky when they are applied to Gaia DR2 data. In reality, some of these candidates are not open clusters but fluctuations of the Galactic disk stellar population. Stars in such fluctuations are expected to have very different ages, absorptions and metallicities, whereas members of a true open cluster share a common age and metallicity. In this repository we train a random forest and a convolutional neural network to distinguish between actual open clusters and the general Galactic disk population.
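To illustrate how such candidates arise, here is a minimal sketch that runs scikit-learn's DBSCAN on a synthetic field in astrometric space (parallax and the two proper-motion components). The data, the choice of features and the eps/min_samples values are assumptions made for illustration; they are not the pipeline or the settings used to build the candidate lists in this repository.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic field: columns are parallax [mas], pmra [mas/yr], pmdec [mas/yr].
# All values are invented for illustration only.
rng = np.random.default_rng(42)

# A compact co-moving group: 50 stars with nearly identical astrometry.
group = rng.normal(loc=[1.0, -2.0, 3.0], scale=0.05, size=(50, 3))

# Background disk stars spread uniformly over a wide range of astrometric values.
background = np.column_stack([
    rng.uniform(0.0, 5.0, 200),     # parallax
    rng.uniform(-10.0, 10.0, 200),  # pmra
    rng.uniform(-10.0, 10.0, 200),  # pmdec
])

stars = np.vstack([group, background])

# eps and min_samples are illustrative choices, not tuned values.
labels = DBSCAN(eps=0.3, min_samples=10).fit_predict(stars)

# Label -1 marks noise; every other label is one cluster candidate.
n_candidates = len(set(labels)) - (1 if -1 in labels else 0)
print(f"cluster candidates found: {n_candidates}")
print(f"stars labelled as noise: {np.sum(labels == -1)}")
```

On real Gaia data one would typically rescale the features first, because parallax and proper motion have different units and spreads; DBSCAN alone cannot tell whether a detected overdensity is a genuine cluster, which is the question the classifiers in this repository address.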
- Open clusters
- Gaia
- Dataset for first iteration
- Evaluation metrics
- Next steps
- Files description
- References
An open cluster is a group of stars (up to several thousand) formed from a single molecular cloud and having approximately the same age.
You may have seen such an open cluster in the sky in the constellation Taurus: it is M45, the Pleiades!
Other types of star clusters that can potentially be found in the sky are globular clusters and young stellar clusters. Globular clusters contain from hundreds of thousands to a few million stars. There are approximately 150 globular clusters in our Galaxy, and all of them are already known. Young stellar clusters are typically embedded in gas and dust, so they cannot be seen in the Gaia data.
In astronomical papers you will often see the terms "abundance" and "metallicity". The first usually describes the amount of a chemical element relative to the amount of the same element in the Sun, but it can be defined differently. The second usually refers to the abundance of iron (Fe) in a star relative to that in the Sun, typically expressed on a logarithmic scale and normalised to hydrogen as [Fe/H], but it can also be defined differently.
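In the most common convention, metallicity is written in bracket notation, [Fe/H] = log10(N_Fe/N_H)_star - log10(N_Fe/N_H)_Sun, where N are number densities. Below is a minimal Python sketch of that formula; the function name and the approximate solar Fe/H ratio used in the example are illustrative assumptions.

```python
import math

def bracket_abundance(n_x_star: float, n_h_star: float, n_x_sun: float, n_h_sun: float) -> float:
    """[X/H] = log10(N_X / N_H)_star - log10(N_X / N_H)_sun, from number densities."""
    return math.log10(n_x_star / n_h_star) - math.log10(n_x_sun / n_h_sun)

# Approximate solar Fe/H number ratio, used here only to set the scale (assumption).
SUN_FE_OVER_H = 3.2e-5

solar_like = bracket_abundance(SUN_FE_OVER_H, 1.0, SUN_FE_OVER_H, 1.0)
metal_poor = bracket_abundance(SUN_FE_OVER_H / 10, 1.0, SUN_FE_OVER_H, 1.0)
print(solar_like, metal_poor)  # 0 for a solar-like star, about -1 for ten times less iron
```

So [Fe/H] = -1 means one tenth of the solar iron-to-hydrogen ratio; members of one open cluster should share a single value within the measurement errors.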
Stars in an open cluster are born from the same gas cloud in a single episode of star formation, so they are all expected to have the same metallicity.
In the Milky Way, young open clusters lie close to the Galactic plane, while older clusters are found at larger heights above and below the disk, mostly in its outer parts.
After the collapse and fragmentation of a giant molecular cloud, young stars appear. The hottest of them, of spectral classes O and B, ionise the surrounding gas and form H II regions, in and around which new stars can form.
The stars in an open cluster are gravitationally bound, move together through space and lie at about the same distance from the observer. This means that the parallaxes and proper motions of the cluster members must be similar.
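A minimal sketch of the two quantities this implies: a naive distance from the parallax (d [pc] = 1000 / parallax [mas], reasonable only for positive parallaxes with small relative errors) and the spread of parallaxes and proper motions among candidate members. The function names and the five example stars are invented for illustration.

```python
import numpy as np

def parallax_to_distance_pc(parallax_mas):
    """Naive distance estimate d [pc] = 1000 / parallax [mas].
    Only reasonable for positive parallaxes with small relative errors."""
    parallax_mas = np.asarray(parallax_mas, dtype=float)
    if np.any(parallax_mas <= 0):
        raise ValueError("naive inversion requires positive parallaxes")
    return 1000.0 / parallax_mas

def astrometric_spread(parallax, pmra, pmdec):
    """Standard deviations of parallax and proper motions of candidate members."""
    return {
        "parallax_std": float(np.std(parallax)),
        "pmra_std": float(np.std(pmra)),
        "pmdec_std": float(np.std(pmdec)),
    }

# Invented example: five candidate members of a group at roughly 500 pc.
plx = np.array([2.00, 1.98, 2.02, 2.01, 1.99])        # mas
pmra = np.array([-1.10, -1.05, -1.12, -1.08, -1.09])  # mas/yr
pmdec = np.array([3.40, 3.38, 3.45, 3.41, 3.39])      # mas/yr

print(parallax_to_distance_pc(plx).mean())
print(astrometric_spread(plx, pmra, pmdec))
```

A small spread relative to the measurement errors is consistent with a co-moving group; for distant clusters with noisy parallaxes, a proper Bayesian distance estimate is preferable to simple inversion.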
If the gravitational binding is lost but the stars still move in the same direction at similar speeds, the group of stars is called a stellar association. A cluster can become gravitationally unbound when its gas and dust component is lost, for instance after massive stars explode as supernovae. A cluster can also lose stars due to tidal forces from the Galactic disk.
The Hertzsprung–Russell diagram is a powerful diagnostic tool in stellar astronomy. The absolute magnitude of a star (y-axis) measures its luminosity and is an indication of the star's size and mass (surface gravity), while the stellar colour (x-axis) shows its temperature. Stars spend most of their lives in the core hydrogen-burning stage, during which there is a unique relation between stellar mass and temperature. This relation appears as a nearly straight line in the Hertzsprung–Russell diagram, called the main sequence. Stars at the top of the main sequence burn their nuclear fuel much faster than stars at the bottom. Therefore, in a stellar population born at the same time, the main sequence gradually disappears from the top as the population ages. The point where the cluster sequence deviates from the zero-age (initial) main sequence reveals the age of the cluster, which in turn hints at its type (open, globular, etc.).
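To place Gaia stars on such a diagram, the apparent G magnitude has to be converted to an absolute magnitude. A minimal sketch using the distance modulus with d = 1000 / parallax [mas], which simplifies to M = m + 5 log10(parallax [mas]) - 10; interstellar absorption is deliberately ignored, which is a simplification, and the function name is hypothetical.

```python
import numpy as np

def absolute_magnitude(apparent_mag, parallax_mas):
    """M = m - 5*log10(d / 10 pc) with d = 1000 / parallax_mas,
    which simplifies to m + 5*log10(parallax_mas) - 10.
    Extinction (absorption) is ignored in this sketch."""
    m = np.asarray(apparent_mag, dtype=float)
    plx = np.asarray(parallax_mas, dtype=float)
    return m + 5.0 * np.log10(plx) - 10.0

# A star with G = 10 at a parallax of 10 mas (100 pc) has M_G = 5.
print(absolute_magnitude(10.0, 10.0))
```

Plotting this absolute magnitude against the BP-RP colour gives the colour-magnitude form of the Hertzsprung–Russell diagram.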
Gaia (originally an acronym for Global Astrometric Interferometer for Astrophysics) is a space telescope built by ESA (European Space Agency).
It works in the optical wavelength range, and its main goal is to create a 3D map of the stars in our Milky Way Galaxy. Gaia provides a five-parameter astrometric solution (two sky coordinates, parallax and two proper-motion components) as well as photometric measurements for stars.
(reference file: load_data.ipynb)
From catalog: A+A/618/A93 [1]
We filter out stars for which Gaia did not measure the magnitude or the colour.
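A minimal pandas sketch of this filter. The column names phot_g_mean_mag and bp_rp are the Gaia DR2 names for the G magnitude and the BP-RP colour; whether load_data.ipynb uses exactly these names is an assumption, and the four-row table is invented.

```python
import numpy as np
import pandas as pd

def drop_missing_photometry(stars: pd.DataFrame) -> pd.DataFrame:
    """Keep only stars that have both a G magnitude and a BP-RP colour."""
    return stars.dropna(subset=["phot_g_mean_mag", "bp_rp"]).reset_index(drop=True)

# Tiny invented table with Gaia DR2-style column names.
stars = pd.DataFrame({
    "source_id": [1, 2, 3, 4],
    "phot_g_mean_mag": [12.3, np.nan, 15.1, 17.8],
    "bp_rp": [0.8, 1.2, np.nan, 1.9],
})
clean = drop_missing_photometry(stars)
print(clean["source_id"].tolist())  # stars 2 and 3 are removed
```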
We use RandomForestClassifier from scikit-learn (sklearn).
- With parameters max_depth=2, random_state=0:
  Accuracy score on test set: 0.9382113821138212
  Confusion matrix: [[265 38] [0 312]]
- With parameters max_depth=20, random_state=0:
  Accuracy score on test set: 0.9951219512195122
  Confusion matrix: [[300 3] [0 312]]
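The training and evaluation step can be sketched as follows. This is not the code from random_forest.ipynb: the 400-feature inputs are synthetic stand-ins for the flattened 20x20 Hertzsprung–Russell histograms (a narrow sequence for "clusters", a uniform spread for field groups), and the helper fake_hr_histograms is hypothetical. Only RandomForestClassifier(max_depth=20, random_state=0), accuracy_score and confusion_matrix correspond to what is reported above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def fake_hr_histograms(n_groups, is_cluster, n_stars=300):
    """Hypothetical stand-in for flattened 20x20 colour-magnitude histograms."""
    features = np.empty((n_groups, 400))
    for i in range(n_groups):
        colour = rng.uniform(0.0, 1.0, n_stars)
        if is_cluster:
            # Single-age population: stars follow one narrow sequence.
            mag = colour + rng.normal(0.0, 0.05, n_stars)
        else:
            # Mixed field population: stars spread over the whole diagram.
            mag = rng.uniform(0.0, 1.0, n_stars)
        hist, _, _ = np.histogram2d(colour, mag, bins=20, range=[[0.0, 1.0], [0.0, 1.0]])
        features[i] = hist.ravel() / hist.sum()
    return features

X = np.vstack([fake_hr_histograms(200, True), fake_hr_histograms(200, False)])
y = np.array([1] * 200 + [0] * 200)  # 1 = open cluster, 0 = field fluctuation

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = RandomForestClassifier(max_depth=20, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class
print("Accuracy score on test set:", acc)
print("Confusion matrix:", cm.tolist())
```

In scikit-learn's confusion_matrix the rows are true labels and the columns predicted labels, ordered by sorted label value.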
- Cross-validation and optimisation of the random forest hyperparameters.
- Compute additional evaluation metrics (accuracy score, precision, recall, F1-score, etc.) on the training and test sets, and decide which ones are meaningful to optimise for.
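Both next steps can be prototyped with scikit-learn's GridSearchCV and precision_recall_fscore_support. The sketch below is hypothetical: it uses make_classification data instead of the real histograms, an arbitrary parameter grid, and F1 as the selection metric, which is only one of the candidate metrics listed above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic stand-in for the 400-bin HR-diagram features (not the repository's data).
X, y = make_classification(n_samples=400, n_features=400, n_informative=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Illustrative grid; the values worth searching for the real data are an open question.
param_grid = {"max_depth": [2, 10, None], "n_estimators": [50, 100]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="f1",  # assumes label 1 is the class of interest (e.g. "open cluster")
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)
print("best parameters:", search.best_params_, "mean CV F1:", round(search.best_score_, 3))

# Precision, recall and F1 of the refitted best model on both sets,
# so that over-fitting shows up as a gap between train and test scores.
scores = {}
for name, X_part, y_part in [("train", X_train, y_train), ("test", X_test, y_test)]:
    p, r, f1, _ = precision_recall_fscore_support(
        y_part, search.predict(X_part), average="binary"
    )
    scores[name] = (p, r, f1)
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Selecting hyperparameters on cross-validated training folds and reporting the held-out test set only once keeps the test score an honest estimate.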
- data - contains .npy files with the data and example plots of both true clusters and non-clusters
- example_notebooks - contains Jupyter notebooks
- methods script - contains the scripts implementing our method
You can start from here:
- example_notebooks/pull_clusters.ipynb - requests cluster membership information from the catalogue and bins the Hertzsprung–Russell diagram into 400 2D bins (20 bins in colour and 20 bins in G magnitude)
- example_notebooks/random_forest.ipynb - reads the data and trains a random forest
- example_notebooks/cnn.ipynb - reads the data and trains a convolutional neural network
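The binning done in pull_clusters.ipynb can be sketched with numpy.histogram2d. The axis ranges and the normalisation below are hypothetical choices, not the values used in the notebook; the 2D array is a natural input for a CNN, and its flattened 400-element version for the random forest.

```python
import numpy as np

N_BINS = 20  # 20 colour bins x 20 magnitude bins = 400 features

def hr_histogram(bp_rp, g_mag, colour_range=(-0.5, 3.5), mag_range=(0.0, 20.0), normalise=True):
    """Bin one cluster candidate's colour-magnitude diagram into a 20x20 grid.

    The axis ranges and the normalisation are hypothetical choices.
    Stars outside the ranges are dropped by np.histogram2d.
    """
    hist, _, _ = np.histogram2d(bp_rp, g_mag, bins=N_BINS, range=[colour_range, mag_range])
    if normalise and hist.sum() > 0:
        hist = hist / hist.sum()  # makes candidates with different star counts comparable
    return hist  # shape (20, 20); axis 0 = colour bins, axis 1 = magnitude bins

# Invented example candidate with 500 stars.
rng = np.random.default_rng(1)
image = hr_histogram(rng.uniform(0.0, 3.0, 500), rng.uniform(5.0, 18.0, 500))
features = image.ravel()  # 400-element vector for the random forest
print(image.shape, features.shape)
```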
1. Cantat-Gaudin, T., et al. 2018, A Gaia DR2 view of the open cluster population in the Milky Way, A&A, 618, A93