The Exploration of Habitable Exoplanets using Data Mining Algorithms and Data Manipulation

Hrithik Pai¹, Srideep Dornala¹, Aly Nathoo¹, Sarina Mayya¹, Winnifred Regan¹, Ojasw Upadhyay¹, Shashank Karthik Rajan¹, Araav Diwan¹, Prachi Soni¹, and Robert Downing¹

¹Aspiring Scholars Directed Research Program, 46307 Warm Springs Blvd, Fremont, CA 94539

The NASA Exoplanet Archive is a dataset compiled from Keck, Kepler, TESS, and Gaia observations of stars that have been determined to host one or more planets. It is continually updated as more exoplanets, or planets outside our own solar system, are discovered and documented. Our first objective was to identify duplicate entries; removing them reduced the number of entries we worked with from 29,283 to 4,259. In previous research, this dataset was filtered by determining which of these exoplanets lie inside their Circumstellar Habitable Zone (CHZ), commonly defined as the range of distances from a host star at which a planet may sustain liquid water, a key requirement for life as we know it. However, this calculation was done only for exoplanets with M-type host stars. In this work, we extended the CHZ calculation to exoplanets with host stars of all spectral types. We then investigated planets with G-, K-, and M-type host stars in more depth by comparing them with planets in the Planetary Habitability Laboratory (PHL) exoplanet catalog to assess how similar the two sets are. The PHL catalog applies its own criteria to classify planets as habitable. Using this method, we identified 3 exoplanets with M-type host stars, 0 with a G-type host star, and 1 with a K-type host star.
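The two data-handling steps summarized above, removing duplicate entries and checking whether each planet stays inside a habitable zone for a host of any spectral type, can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: the record fields (`name`, `spectral_type`, `luminosity_solar`, `a_au`, `ecc`) are hypothetical, luminosity is taken in linear solar units, and the zone edges use the simple flux scaling d = sqrt(L / S_eff) with assumed placeholder limits S_eff = 1.1 (inner edge) and 0.53 (outer edge). As an assumed criterion, the whole orbit is checked from periastron a(1 - e) to apastron a(1 + e).

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from math import sqrt
from typing import Iterable


@dataclass(frozen=True)
class PlanetRow:
    """One catalog entry; all field names are hypothetical."""
    name: str                # planet name used to detect duplicates
    spectral_type: str       # host-star spectral type string, e.g. "M4 V"
    luminosity_solar: float  # host-star luminosity in linear solar units
    a_au: float              # orbital semi-major axis in AU
    ecc: float               # orbital eccentricity, 0 <= ecc < 1


def deduplicate(rows: Iterable[PlanetRow]) -> list[PlanetRow]:
    """Keep only the first row seen for each planet name."""
    seen: set[str] = set()
    unique: list[PlanetRow] = []
    for row in rows:
        if row.name not in seen:
            seen.add(row.name)
            unique.append(row)
    return unique


# Assumed effective stellar flux limits (Earth = 1) for the inner and outer
# HZ edges. These are rough placeholder values, not the study's boundaries.
S_INNER = 1.1
S_OUTER = 0.53


def habitable_zone_au(luminosity_solar: float) -> tuple[float, float]:
    """Return (inner, outer) HZ edges in AU from d = sqrt(L / S_eff)."""
    return sqrt(luminosity_solar / S_INNER), sqrt(luminosity_solar / S_OUTER)


def stays_in_hz(row: PlanetRow) -> bool:
    """True if the orbit, from periastron a(1 - e) to apastron a(1 + e),
    lies entirely between the inner and outer HZ edges."""
    inner, outer = habitable_zone_au(row.luminosity_solar)
    periastron = row.a_au * (1.0 - row.ecc)
    apastron = row.a_au * (1.0 + row.ecc)
    return inner <= periastron and apastron <= outer


def count_by_class(rows: Iterable[PlanetRow]) -> Counter[str]:
    """Count in-HZ planets by the first letter of the host spectral type."""
    return Counter(
        row.spectral_type.strip()[:1].upper() for row in rows if stays_in_hz(row)
    )


if __name__ == "__main__":
    # Hypothetical example rows, not real catalog data.
    catalog = [
        PlanetRow("Example-a", "G2 V", 1.0, 1.0, 0.0167),
        PlanetRow("Example-a", "G2 V", 1.0, 1.0, 0.0167),  # duplicate row
        PlanetRow("Example-b", "K1 V", 0.4, 0.7, 0.3),
        PlanetRow("Example-c", "M4 V", 0.0015, 0.045, 0.05),
    ]
    unique = deduplicate(catalog)
    print(len(catalog), "->", len(unique))
    print(count_by_class(unique))
```

Grouping by the first letter of the spectral type is what lets the same check run for hosts of every class rather than only M dwarfs; the resulting tallies depend entirely on the assumed flux limits and say nothing about overlap with the PHL catalog.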
Keywords: Habitable Zone | Kth Nearest Neighbor | Duplicates | Planet Radius | Luminosity

Correspondence: robert.downing@asdrp.org

Introduction

Background. The question of whether life exists beyond Earth has been asked throughout history, and we now have the tools to begin answering it. Early approaches included the Drake equation, a probabilistic argument intended to estimate the number of active, communicating extraterrestrial civilizations in our galaxy. Today, the NASA Exoplanet Archive holds over 29,000 entries describing confirmed exoplanets and is still being updated as more exoplanets are discovered and documented. These discoveries come primarily from the Keck, Kepler, and Gaia telescopes, along with many other telescopes, using methods including the radial velocity (RV) technique, gravitational microlensing, transits (the most reliable method), and, most recently, direct imaging (1). This dataset serves as our input data as we sort through the attributes of each entry. Determining whether exoplanets outside our solar system are habitable would tell us much about the evolution and nature of planetary and celestial bodies other than our own. It would change our perception of our place in the universe, as the Copernican revolution did, giving us insight into how vast the universe around us is and how much potential it harbors.

Planet Habitability. Arguably one of the most important contributors to habitability is a planet's ability to sustain liquid water, which requires it to lie inside its habitable zone (HZ), where the planetary temperature is suitable for life and remains stable. At a pressure of 1 atmosphere, water is liquid only between 0 °C and 100 °C. However, at higher pressures water can remain liquid over a larger range of temperatures.
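How much the liquid range widens with pressure can be estimated roughly with the integrated Clausius-Clapeyron relation, 1/T_b = 1/T_0 - (R / L_vap) ln(P / P_0), which gives the boiling point T_b at pressure P. Because the freezing point of water changes comparatively little with pressure, the range widens mainly because the boiling point rises. The sketch below is an illustration, not a method from this study: it assumes a constant molar enthalpy of vaporization of about 40.65 kJ/mol (its value near 100 °C), uses 1 atm and 373.15 K as the reference point, and the name `boiling_point_k` is hypothetical.

```python
import math

R = 8.314          # molar gas constant, J/(mol K)
L_VAP = 40_650.0   # enthalpy of vaporization of water near 100 °C, J/mol (assumed constant)
T0 = 373.15        # boiling point at the reference pressure, K
P0 = 1.0           # reference pressure, atm


def boiling_point_k(pressure_atm: float) -> float:
    """Approximate boiling point of water (K) at a given pressure (atm) from
    the integrated Clausius-Clapeyron relation:
        1/T = 1/T0 - (R / L_VAP) * ln(P / P0)
    Only a rough estimate near 1 atm, since L_VAP is treated as constant."""
    if pressure_atm <= 0:
        raise ValueError("pressure must be positive")
    inv_t = 1.0 / T0 - (R / L_VAP) * math.log(pressure_atm / P0)
    return 1.0 / inv_t


if __name__ == "__main__":
    for p in (0.5, 1.0, 2.0):
        t = boiling_point_k(p)
        print(f"{p:.1f} atm: {t:.1f} K ({t - 273.15:.1f} °C)")
```

With these assumptions the boiling point comes out near 394 K (roughly 121 °C) at 2 atm and near 354 K (roughly 81 °C) at 0.5 atm. The constant-enthalpy approximation degrades far from 1 atm, so it should not be used close to the triple point or the critical point.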
At lower pressures the temperature range for liquid water is smaller, and below a pressure of about 0.006 atmospheres (the triple point of water) no liquid water can exist; water is either solid (ice) or gaseous (water vapour). A planet's distance from its host star is part of what determines whether it is in its HZ, and therefore whether its water can be liquid. After a planetary system forms, changes in the star's interior make it brighter and hotter over time, so both the inner and outer boundaries of the HZ move outward. The continuous habitable zone (CHZ) is defined as the overlap between the habitable zones at two widely separated times, and represents the region where water can remain liquid over timescales long enough for life to form and evolve. In previous work, using each planet's semi-major and semi-minor axes (half the longest and shortest diameters of its elliptical orbit, respectively), we determined which exoplanets in the NASA Exoplanet Archive stay within their host star's CHZ and can support liquid water.

Planets also need an atmosphere that can protect the surface from harmful radiation, and the radiation a planet receives depends on several factors, one of which is the type of its host star. A star's spectral type is a classification based on its spectral characteristics, which primarily reflect its surface temperature; stars are further grouped by luminosity, which is tied to absolute magnitude, the measure of how bright a star would appear at a standard distance of 10 parsecs. To classify a star, its electromagnetic radiation is split with a prism or diffraction grating into a spectrum showing the rainbow of colors interspersed with spectral lines. Each line

Pai et al. | bioRχiv | May 15, 2021 | 1–6