Chapter 1: Introduction
With our planet being just one of the billions of planets in our galaxy, which also contains billions
of stars, and with our galaxy being one of countless galaxies in the cosmos, the vastness of it all may
seem daunting. But now, with machine learning, we finally have a chance to search through the soup that
is our cosmos and draw intelligible hypotheses and conclusions from the big data that describes our universe.
Machine learning offers a wide range of algorithms and modeling tools for data processing. In recent
years, it has been argued that machine learning has greatly changed how cosmologists interpret large
data sets (Ntampaka et al., 2019).
If we are to make progressive sense of large cosmological data sets in the coming decade, we must ensure
that the statistical and data-driven tools and models we choose to adopt do not become our limiting
factor. Machine learning offers itself as a promising tool for interpreting cosmological data, thus
affording us a chance at breakthroughs as we try to understand our rather complex universe.
We are in a critical period of precision cosmology: large amounts of data and precise theory have given
us the opportunity to constrain the values of cosmological parameters with exceptional precision
(Planck Collaboration et al., 2020).
The aim of this research is to highlight some of the ways in which machine learning models have
become vital to the way cosmological data is collected, analyzed, and finally interpreted. This research
also aims to show how adopting various machine learning methods can improve the interpretation of
large cosmological data sets in the coming decade.
At the end of this research work, I hope to interpret real-life data sets with my proposed machine
learning model and to give a detailed explanation of how machine learning models have evolved over the
years to help unlock the complexity of our universe, moving us closer to precision cosmology and a
meaningful interpretation of large cosmological data.
Chapter 2: Background
2.1 Machine Learning Simulations and Cosmology
The Universe is believed to be made up of three essential components, namely: dark energy (which is
responsible for the accelerated expansion of the universe), dark matter (which makes up the majority of
the matter in our universe), and ordinary visible matter (which includes stars, planets, etc.). In the
study of the cosmos, dark matter plays a crucial role in the formation of galaxies, and the galaxy
clusters that form as a result have become one of the areas of application of machine learning. Another
area of machine learning application is the set of open questions surrounding the particle nature of
dark matter.
Cosmology is the study of the universe, that is, its content and its evolution. Machine learning has
proven useful in cosmological probes that include galaxy clustering, strong and weak gravitational
lensing, supernovae, and the cosmic microwave background. Machine learning models have become
essential in detecting and classifying cosmic sources, and they are also used to extract information
from images.
Nowadays, machine learning algorithms are successfully employed in classification, clustering,
regression, and dimensionality reduction tasks on large sets of high-dimensional input data (Marsland
2014). Among the machine learning models employed in cosmological studies are supervised models, used
especially to study galaxy formation and evolution within semi-analytical models (SAMs) (Kamdar, Turk &
Brunner 2016). Machine learning has allowed several complex phenomena to be inferred and has provided a
distinct and strong connection between the dark matter regime (large galaxy scales) and the baryonic
regime (smaller galaxy scales).
SAM outputs have also been used, with considerable success, to train tree-based ML algorithms such as
random forests (Breiman 2001) and extremely randomized trees (Geurts, Ernst & Wehenkel 2006). In
essence, ML algorithms learn approximate relationships between input data and output data so that they
can make useful inferences on new data. Supervised machine learning models, which are trained on
examples whose outputs are already known, are usually used for this.
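As a minimal illustration of what it means for a supervised model to learn an approximate relationship between input and output data, the sketch below fits a straight line by ordinary least squares to labeled training pairs and then applies it to an unseen input. The function names `fit_line` and `predict`, and the toy data (a hypothetical log halo mass as input and a hypothetical log stellar mass as output), are assumptions for illustration only; they are not taken from any SAM, and a linear fit is deliberately much simpler than the tree-based models discussed in this section.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x to labeled training pairs."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope b = covariance(x, y) / variance(x); intercept a follows from the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("inputs must not all be identical")
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def predict(params, x):
    """Apply the learned relationship to a new, unseen input."""
    a, b = params
    return a + b * x


# Hypothetical training pairs: input (e.g. log halo mass) -> output (e.g. log stellar mass).
train_x = [11.0, 11.5, 12.0, 12.5, 13.0]
train_y = [9.1, 9.6, 10.0, 10.4, 10.9]

params = fit_line(train_x, train_y)
print(params, predict(params, 12.25))
```

The "learning" here is simply choosing the parameters that minimize the squared prediction error on the training pairs; more flexible models replace the straight line with a richer family of functions.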
In recent years, ML has been applied in astronomy with a decent record of success (Ball & Brunner 2010;
Ivezić et al. 2014). Applications in astronomical subfields include classification problems such as
star–galaxy classification (Ball et al. 2006; Kim et al. 2015) and galaxy morphology classification
(Banerji et al. 2010; Dieleman et al. 2015); regression problems such as photometric redshift estimation
(Ball et al. 2007; Gerdes et al. 2010; Kind & Brunner 2013) and the estimation of stellar atmospheric
parameters (Fiorentin et al. 2007); and the determination of stellar labels from spectroscopic data
(Ness et al. 2015).
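The difference between the classification and regression tasks listed above lies in the output: a discrete label versus a continuous quantity. A minimal sketch of the classification case is a nearest-centroid classifier, which assigns a source to whichever class has the closer mean feature vector. The two features (a hypothetical "extendedness" measure and a hypothetical color), the function names, and all numbers are illustrative assumptions, not values or methods from the cited star–galaxy studies.

```python
from math import dist


def train_centroids(samples):
    """Compute the mean feature vector of each class from (features, label) pairs."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Return the label whose centroid is nearest (Euclidean distance) to the features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical training sources: (extendedness, color) -> discrete label.
training = [
    ((0.05, 0.6), "star"), ((0.10, 0.8), "star"), ((0.08, 0.7), "star"),
    ((0.90, 1.2), "galaxy"), ((0.75, 1.0), "galaxy"), ((0.85, 1.4), "galaxy"),
]
centroids = train_centroids(training)
print(classify(centroids, (0.12, 0.75)))  # → star
```

A regression model would instead return a number (for example, a redshift) rather than one of a fixed set of labels.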
Machine learning models, with their non-parametric nature and powerful predictive capabilities,
provide a remarkable opportunity to study galaxy formation and evolution. For example, to study
galaxy formation and to analyze the full extent of the influence of dark matter (DM) halos on galaxies in
the context of SAMs, some ML algorithms that can be employed are decision trees, random forests (RF),
extremely randomized trees (ERT), and k-nearest neighbors (kNN). To measure how well these algorithms
learn the relationship in a given data set, the mean squared error (MSE) is used, which is defined as
follows:
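$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2$$
where $n$ is the number of data points, $Y_i$ are the observed values, and $\hat{Y}_i$ are the predicted
values.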
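The mean squared error defined in this section can be computed in a few lines. The sketch below pairs it with an unweighted k-nearest-neighbors regressor, one of the algorithms named above, and evaluates it on a small held-out set. The function names, the two input features (a hypothetical log halo mass and a hypothetical concentration), the target galaxy property, and all numbers are illustrative assumptions rather than data from any SAM.

```python
from math import dist


def mean_squared_error(observed, predicted):
    """MSE = (1/n) * sum_i (Y_i - Yhat_i)^2 over n data points."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(observed)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)) / n


def knn_predict(train_x, train_y, query, k=3):
    """kNN regression: average the targets of the k training points closest to query."""
    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))[:k]
    return sum(train_y[i] for i in nearest) / k


# Hypothetical halo features (log mass, concentration) and a galaxy property to predict.
train_x = [(11.0, 8.0), (11.5, 7.5), (12.0, 7.0), (12.5, 6.5), (13.0, 6.0), (13.5, 5.5)]
train_y = [9.0, 9.5, 10.0, 10.4, 10.8, 11.1]

# Held-out points, never seen during "training", used only to score the model.
test_x = [(11.8, 7.2), (12.8, 6.2)]
test_y = [9.9, 10.6]

predictions = [knn_predict(train_x, train_y, q, k=2) for q in test_x]
print(predictions, mean_squared_error(test_y, predictions))
```

Plain Euclidean distance on unscaled features is used here for simplicity; in practice, features with different units are usually standardized before a kNN model is trained, and libraries such as scikit-learn provide tested implementations of kNN, decision trees, random forests, and extremely randomized trees.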
Machine learning provides a solid framework to study the halo-galaxy connection in the context of
SAMs. However, it is important to note that even though ML has made considerable progress in halo-galaxy studies, it is not a replacement for SAMs.
2.2 Future Prospects of Machine Learning Models in Cosmology
Cosmology is rich in data, and in recent years machine learning has greatly improved how cosmologists
interpret this data. Astronomy, too, holds a large store of data waiting to be explored. Efficient ML
algorithms are therefore required to harness the large volumes of data produced by modern-day
astronomy.
As the anticipated future draws nearer each year, there is a growing need to be more strategic in
interpreting big data in order to reach important hypotheses and accurate conclusions. Realizing the
full potential of machine learning and how it affects our study of the cosmos therefore becomes of
utmost importance. As we move to test the past speculations and predictions of scientists about our
universe, it is critical to create ML models and algorithms that actually work and are useful across
various fields.
The interpretability of ML is one area of active research that promises to increase the quality and
diversity of interpretation models in data science in the next decade. Cosmology, in turn, proves itself
a challenge for ML researchers in that it raises new tasks and questions about how to employ ML models
to interpret data sets. These cosmological challenges serve as opportunities for breakthroughs in the
basic understanding of ML. Thus, the point where cosmology meets ML showcases the benefits that both
fields provide.
With the arrival of ever more data sets, both small and large, astronomy appears to have entered the big
data era. These data sets afford ML more opportunities for application in both cosmology and
astronomy. One such opportunity is the large volume of data expected from the Large Synoptic Survey
Telescope (LSST) (LSST Science Collaboration et al., 2009). The continued development and future
implementation of carefully designed and selected ML algorithms at both the image-processing level
(Goulding et al., 2018; Dai & Tong, 2018; Ackermann et al., 2018) and the catalog level (Narayan et al.,
2018; Malz et al., 2018) have the potential to produce meaningful advances in our ability to efficiently
extract scientifically useful information, such as classification, distance, morphology, and mass, from
LSST data.
Another area of application of ML is supernova cosmology. Machine learning methods allow more
accurate supernova classification, which is critical in analyzing massive public supernova data sets.
As astronomical data sets become larger and more difficult to process, ML has become increasingly
popular (Ball & Brunner 2010; Bloom & Richards 2012). Traditionally, only Type Ia supernovae are used
for cosmology, which requires knowing the type of each supernova. However, supernova cosmology is now
possible without knowing the type of every supernova being used, for example by using Bayesian methods
(Kunz et al. 2007; Hlozek et al. 2012; Newling et al. 2012; Knights et al. 2013; Rubin et al. 2015).
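One way such Bayesian methods avoid needing a definite type for every object is a probability-weighted mixture likelihood, in the spirit of the BEAMS approach of Kunz et al. (2007): each supernova contributes p_i times a Type Ia likelihood plus (1 - p_i) times a non-Ia likelihood, where p_i is its probability of being Type Ia. The sketch below is a toy version under loudly stated assumptions: both populations are modeled as Gaussians in a hypothetical Hubble residual, the widths, the non-Ia offset, the single inferred `shift` parameter, and the data are all invented for illustration, and real analyses model the contaminating population and the cosmology far more carefully.

```python
from math import exp, log, pi, sqrt


def gaussian_pdf(x, mu, sigma):
    """Normal probability density with mean mu and standard deviation sigma."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))


def mixture_log_likelihood(residuals, p_ia, shift,
                           sigma_ia=0.15, non_ia_offset=1.0, sigma_non_ia=0.8):
    """Sum over objects of log[p_i * L_Ia,i + (1 - p_i) * L_nonIa,i].

    residuals: hypothetical Hubble residuals (observed minus model distance modulus)
    p_ia:      per-object probabilities of being Type Ia (e.g. from a classifier)
    shift:     hypothetical global offset parameter being inferred
    The Gaussian populations and default widths/offset are illustrative assumptions.
    """
    total = 0.0
    for r, p in zip(residuals, p_ia):
        like_ia = gaussian_pdf(r, shift, sigma_ia)
        like_non_ia = gaussian_pdf(r, shift + non_ia_offset, sigma_non_ia)
        total += log(p * like_ia + (1.0 - p) * like_non_ia)
    return total


# Toy data: three likely Type Ia objects and one probable contaminant.
residuals = [0.05, -0.10, 0.90, 0.02]
p_ia = [0.95, 0.90, 0.20, 0.99]

# Maximize the mixture likelihood over a simple grid of the offset parameter.
grid = [i / 100 for i in range(-50, 51)]
best_shift = max(grid, key=lambda s: mixture_log_likelihood(residuals, p_ia, s))
print(best_shift)
```

Setting every p_i to 1 recovers an ordinary Type Ia-only likelihood; the mixture instead lets uncertain objects contribute without forcing each one into a single class, so the contaminant barely pulls the inferred offset.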
Thus far, ML has made great strides in explaining the complexity of the cosmos, and even greater
successes have been recorded in research cutting across different fields. There is still, however, a
long way to go in achieving outstanding results in cosmology using ML over the next decade. The
successes recorded so far in machine learning for cosmology hint at the great potential that ML offers
for data discovery and interpretation, especially as data sets grow bigger and more complex.
References:
1. Ackermann et al. 2018
2. Ball & Brunner 2010
3. Ball et al. 2006
4. Ball et al. 2007
5. Banerji et al. 2010
6. Bloom & Richards 2012
7. Breiman 2001
8. Dai & Tong 2018
9. Dieleman et al. 2015
10. Fiorentin et al. 2007
11. Gerdes et al. 2010
12. Geurts, Ernst & Wehenkel 2006
13. Goulding et al. 2018
14. Hlozek et al. 2012
15. Ivezić et al. 2014
16. Kamdar, Turk & Brunner 2016
17. Kim et al. 2015
18. Kind & Brunner 2013
19. Knights et al. 2013
20. Kunz et al. 2007
21. LSST Science Collaboration et al. 2009
22. Malz et al. 2018
23. Marsland, S. 2014, Machine Learning (CRC Press, Taylor & Francis Inc., Boca Raton, FL)
24. Narayan et al. 2018
25. Ness et al. 2015
26. Newling et al. 2012
27. Ntampaka et al. 2019
28. Planck Collaboration et al. 2020
29. Rubin et al. 2015