AstroHackWeek2020/Random-forest-open-cluster

 
 


Random-forest-open-cluster

Made at #AstroHackWeek

Code to verify if a collection of stars found in the Gaia database is an open cluster.

Open clusters:

An open cluster is a group of stars (up to several thousand) formed from a single molecular cloud and having approximately the same age.

You may have seen one such open cluster in the sky, in the constellation Taurus: it is M45, the Pleiades!

(Image: the Pleiades, M45)

Typical features of open clusters:

1) Similar chemical composition

In articles on astronomy, you will see the terms "abundance" and "metallicity". Abundance usually means the amount of a chemical element in an object relative to the amount of the same element in the Sun, but it can be defined differently. Metallicity usually refers to the abundance of Fe in a star (or other object) relative to the abundance of Fe in the Sun, commonly written [Fe/H], but it can also be defined differently.
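As a reminder of one widely used convention (only one of several possible definitions), the bracket notation compares the number density of an element X relative to hydrogen in the star with the same ratio in the Sun, on a logarithmic scale:

```latex
[\mathrm{X}/\mathrm{H}] = \log_{10}\left(\frac{N_\mathrm{X}}{N_\mathrm{H}}\right)_{\star} - \log_{10}\left(\frac{N_\mathrm{X}}{N_\mathrm{H}}\right)_{\odot}
```

With X = Fe this gives [Fe/H]. A star with [Fe/H] = 0 has the solar iron-to-hydrogen ratio, and [Fe/H] = -1 means one tenth of it.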

2) Located either in the galactic disk or near it:

In spiral galaxies, young open clusters lie within the disk, close to the galactic plane, while older ones tend to be found farther from the galactic center and farther above or below the plane.

3) Star formation:

After the collapse and fragmentation of a giant molecular cloud, young stars appear. The hottest stars, of spectral classes O and B, ionize the surrounding gas and form H II regions around them, in which new stars can form.

4) Gravitationally bound group of stars:

The stars in an open cluster are bound by gravity and move together through space, at about the same distance from the observer. (This means that the parallaxes and proper motions of the stars within an open cluster must be similar.)

If the gravitational binding is lost but the stars still move in the same direction at similar speeds, the group is called a stellar association.
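The similarity of parallax and proper motion can be sketched as a simple spread check. This is only an illustration, not the method used in this repository: the function names, the thresholds (`max_parallax_std`, `max_pm_std`) and the sample values are all hypothetical.

```python
import numpy as np


def kinematic_spread(parallax_mas, pmra_masyr, pmdec_masyr):
    """Standard deviation of the parallax and of each proper-motion component."""
    return {
        "parallax": float(np.std(parallax_mas)),
        "pmra": float(np.std(pmra_masyr)),
        "pmdec": float(np.std(pmdec_masyr)),
    }


def looks_comoving(parallax_mas, pmra_masyr, pmdec_masyr,
                   max_parallax_std=0.3, max_pm_std=1.0):
    """True if all three spreads fall below the (hypothetical) thresholds."""
    spread = kinematic_spread(parallax_mas, pmra_masyr, pmdec_masyr)
    return (spread["parallax"] <= max_parallax_std
            and spread["pmra"] <= max_pm_std
            and spread["pmdec"] <= max_pm_std)


# Made-up values: a tight group and an unrelated set of field stars.
tight_group = looks_comoving([7.3, 7.4, 7.2, 7.35],
                             [20.0, 19.8, 20.3, 20.1],
                             [-45.0, -45.4, -44.9, -45.2])
field_stars = looks_comoving([0.5, 3.0, 7.3, 1.2],
                             [-5.0, 12.0, 20.0, 0.3],
                             [3.0, -20.0, -45.0, 1.0])
print(tight_group, field_stars)
```

A fixed threshold like this ignores measurement errors and the intrinsic size of the cluster, which is one reason to let a classifier learn the decision from labelled examples instead.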

Hertzsprung–Russell diagram

The Hertzsprung–Russell diagram of a cluster shows us its age (which also indicates the type of cluster: open, globular, etc.) through the characteristic deviation of the observed sequence from the initial (zero-age) main sequence (MS), known as the main-sequence turn-off.
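In observational form the diagram is usually drawn as colour against absolute magnitude. A minimal sketch of the conversion from apparent magnitude and parallax is shown below; it assumes the parallax is in milliarcseconds, ignores extinction and parallax uncertainties, and uses a hypothetical function name and made-up input values.

```python
import math


def absolute_magnitude(apparent_mag, parallax_mas):
    """Absolute magnitude from apparent magnitude and parallax (mas).

    Uses distance_pc = 1000 / parallax_mas and the distance modulus
    M = m - 5 * log10(distance_pc) + 5, i.e. M = m + 5 * log10(parallax_mas) - 10.
    """
    if parallax_mas <= 0:
        raise ValueError("parallax must be positive for this simple inversion")
    return apparent_mag + 5.0 * math.log10(parallax_mas) - 10.0


# Made-up stars: (apparent G magnitude, colour, parallax in mas).
stars = [(10.0, 0.8, 10.0), (12.5, 1.2, 10.0), (8.0, 0.3, 10.0)]
cmd_points = [(colour, absolute_magnitude(g, plx)) for g, colour, plx in stars]
print(cmd_points)
```

Plotting the colour on the x axis and the absolute magnitude on an inverted y axis gives the colour-magnitude diagram of the group.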

Gaia

Gaia (originally an acronym for Global Astrometric Interferometer for Astrophysics) is a space observatory built by ESA (European Space Agency).

It is a space telescope working in the optical wavelength range. Its main goal is to create a 3D map of the distribution of stars in our Milky Way Galaxy.

(Image: Gaia mission insignia)

GAIA DATA RELEASE 2 (GAIA DR2)

Description

Dataset used for the first iteration of this project

(reference file: load_data.ipynb)

From catalog: A+A/618/A93 [1]

We filter out stars for which Gaia did not measure a magnitude or colour.
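A minimal sketch of this filtering step with pandas is shown below. The column names `Gmag` and `BP-RP`, the helper name and the sample rows are assumptions for illustration; the actual names used in load_data.ipynb may differ.

```python
import numpy as np
import pandas as pd


def drop_missing_photometry(stars, mag_col="Gmag", colour_col="BP-RP"):
    """Keep only the rows where both the magnitude and the colour are present."""
    return stars.dropna(subset=[mag_col, colour_col]).reset_index(drop=True)


# Made-up table: two of the four stars lack a magnitude or a colour.
stars = pd.DataFrame({
    "Gmag": [12.1, np.nan, 15.3, 9.8],
    "BP-RP": [0.8, 1.1, np.nan, 0.4],
    "Plx": [7.3, 7.4, 7.2, 7.5],
})
clean = drop_missing_photometry(stars)
print(len(stars), "->", len(clean))
```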

Evaluation metrics

We use RandomForestClassifier from sklearn.

  1. For parameters max_depth=2, random_state=0

     Accuracy score on test set: 0.9382113821138212

     Confusion matrix: [[265 38] [0 312]]

  2. For parameters max_depth=20, random_state=0

     Accuracy score on test set: 0.9951219512195122

     Confusion matrix: [[300 3] [0 312]]
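The two runs above can be sketched as follows. The data here is a synthetic stand-in generated with `make_classification`, not the Gaia table, so the printed scores will not match the ones reported. The split sizes and feature counts are assumptions. The last lines check that each reported accuracy equals the trace of its confusion matrix divided by the total count.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real feature table (NOT the Gaia data).
X, y = make_classification(n_samples=2000, n_features=6, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

results = {}
for depth in (2, 20):
    clf = RandomForestClassifier(max_depth=depth, random_state=0)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    results[depth] = (accuracy_score(y_test, y_pred),
                      confusion_matrix(y_test, y_pred))
    print(f"max_depth={depth}: accuracy={results[depth][0]:.4f}")
    print(results[depth][1])

# Consistency check on the reported matrices: accuracy = trace / total.
reported = {2: np.array([[265, 38], [0, 312]]),
            20: np.array([[300, 3], [0, 312]])}
reported_accuracy = {d: np.trace(m) / m.sum() for d, m in reported.items()}
print(reported_accuracy)
```

Both reported matrices sum to 615 test samples, and 577/615 and 612/615 reproduce the two accuracy scores.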

Next steps

  1. Cross validation and optimisation of the hyperparameters of the random forest.
  2. Get additional evaluation metrics (accuracy score, precision, recall, F1-score, etc.) on training and test sets, and decide on the meaningful ones to optimize for.
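One possible way to approach both steps with scikit-learn is sketched below, using `GridSearchCV` for cross-validated hyperparameter search and `classification_report` for precision, recall and F1. The parameter grid, the `f1` scoring choice and the synthetic data are assumptions for illustration, not decisions made in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the real feature table (NOT the Gaia data).
X, y = make_classification(n_samples=1000, n_features=6, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Hypothetical grid; the useful ranges depend on the real data.
param_grid = {"max_depth": [2, 5, 20], "n_estimators": [50, 100]}

# 5-fold cross-validation on the training set, optimising the F1-score.
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)
print("best parameters:", search.best_params_)
print("best cross-validated F1:", round(search.best_score_, 4))

# Precision, recall and F1 per class on the training and test sets.
print(classification_report(y_train, search.predict(X_train)))
print(classification_report(y_test, search.predict(X_test)))
```

Comparing the training and test reports gives a first indication of overfitting, which is relevant when moving from max_depth=2 to max_depth=20.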

References

  1. A Gaia DR2 view of the Open Cluster population in the Milky Way https://arxiv.org/abs/1805.08726
