The quest to find planets that resemble Earth in size, composition, and temperature, commonly referred to as Earth-like planets, has long captivated astronomers and the public alike. This search, however, is fraught with challenges. One significant hurdle is that small, rocky planets are notoriously difficult to detect with current planet-hunting techniques, which tend to favor gas giants. In addition, a planet's temperature is closely tied to its distance from its host star: to be as temperate as Earth around a Sun-like star, a planet must orbit at roughly Earth's distance from the Sun, where a single orbit takes approximately one year. Confirming such a planet can therefore mean dedicating telescope time to a single star for more than a year.
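
The link between orbital distance and observing time follows from Kepler's third law. The sketch below is a minimal illustration, assuming a planet much lighter than its star and working in astronomical units, solar masses, and years. The function name and example values are illustrative and do not come from the study.

```python
import math

def orbital_period_years(distance_au, star_mass_solar=1.0):
    """Kepler's third law in solar units: P^2 = a^3 / M.

    distance_au      orbital distance in astronomical units (assumed circular orbit)
    star_mass_solar  stellar mass in solar masses (planet mass neglected)
    """
    return math.sqrt(distance_au ** 3 / star_mass_solar)

# Earth's distance around a Sun-mass star gives a one-year orbit.
print(orbital_period_years(1.0))  # 1.0
```

Because several orbits are usually needed to confirm a planet, a one-year period translates directly into more than a year of monitoring.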

To make the best use of the time and resources spent searching for Earth-like planets, scientists are looking for ways to identify which stars warrant thorough observation. A team of astronomers recently explored whether the observable properties of planetary systems could signal the presence of Earth-like planets. They hypothesized that the arrangement of known planets within a system, together with the mass and radius of the planet closest to the star and its distance from the star, could serve as indicators for predicting the existence of habitable, Earth-like worlds.

To test their hypothesis, the astronomers turned to machine learning. They began by compiling a sample of planetary systems that included both systems with Earth-like planets and systems without them. To date, astronomers have documented around 5,000 stars that host orbiting exoplanets, a number too small for effective machine learning training. To overcome this limitation, the research team employed a computational framework known as the Bern model, which simulates the formation of planetary systems.

The Bern model starts with 20 clumps of dust, each measuring approximately 600 meters (around 2,000 feet) in diameter. These clumps act as seeds that accumulate gas and dust, growing into full-sized planets over approximately 20 million years. The system then continues to evolve until it reaches an age of 10 billion years, and the result is what is termed a synthetic planetary system, which the astronomers added to their dataset. Using the Bern model, they generated 24,365 systems around stars comparable in size to the Sun, 14,559 systems around stars half the Sun's size, and 14,958 systems around stars one-fifth the Sun's size. Each group was further subdivided into systems that contained Earth-like planets and systems that did not.
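
As a quick check on scale, the counts quoted above can be tallied and compared with the roughly 5,000 known host stars. This is only arithmetic on the figures in the text; the variable names are illustrative.

```python
# Synthetic system counts quoted in the text, keyed by stellar size relative to the Sun.
synthetic_systems = {1.0: 24_365, 0.5: 14_559, 0.2: 14_958}

total_synthetic = sum(synthetic_systems.values())
known_host_stars = 5_000  # approximate figure quoted in the text

print(total_synthetic)                                # 53882
print(round(total_synthetic / known_host_stars, 1))   # 10.8
```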

With this significantly expanded dataset, the team then applied a machine learning technique known as the Random Forest model. This model sorts the planetary systems into two groups: those likely to harbor Earth-like planets and those unlikely to do so. A Random Forest is made up of many components, referred to as trees, each of which makes decisions based on a different section of the training dataset; the trees' individual true-or-false decisions are then combined into a single classification. The researchers established that a planetary system likely to contain one or more Earth-like planets would be deemed “true.” They evaluated their algorithm using a metric called the precision score, the fraction of systems labeled “true” that really do contain an Earth-like planet.
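
The voting idea can be sketched with a toy ensemble. This is not the researchers' implementation: it uses one-split "stumps" trained on bootstrap samples, a stripped-down stand-in for a Random Forest (which grows deeper trees and also samples features at random). The data, feature layout, and function names are all made up for illustration.

```python
import random
from collections import Counter

def train_stump(rows):
    """Find the (feature index, threshold) whose rule `value >= threshold`
    misclassifies the fewest rows. Each row is (features, label)."""
    best = None
    for f in range(len(rows[0][0])):
        for features, _ in rows:
            threshold = features[f]
            errors = sum((x[f] >= threshold) != label for x, label in rows)
            if best is None or errors < best[0]:
                best = (errors, f, threshold)
    return best[1], best[2]

def train_forest(rows, n_trees=25, seed=0):
    """Train each stump on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]
        forest.append(train_stump(sample))
    return forest

def predict(forest, features):
    """Every stump votes True or False; the majority wins."""
    votes = Counter(features[f] >= threshold for f, threshold in forest)
    return votes.most_common(1)[0][0]

# Made-up data: the label is True when the first feature is at least 10;
# the second feature is uninformative noise.
toy_rows = [((i, i % 3), i >= 10) for i in range(20)]
forest = train_forest(toy_rows)
print(predict(forest, (15, 0)), predict(forest, (3, 0)))
```

Because each stump sees a slightly different sample, individual stumps may pick slightly different thresholds, and the majority vote smooths those differences out.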

To guide its decision-making, the Random Forest model was given specific characteristics of each synthetic planetary system. These included the arrangement of planets that astronomers might realistically observe in analogous real-life systems, the total number of planets present, the number of planets with more than 100 times Earth’s mass, and the size of the planet closest to the star along with its distance from the star. The team used 80% of the synthetic planetary systems for training, reserving the remaining 20% for an initial evaluation of the finished algorithm.
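
A minimal sketch of how such a feature record and an 80/20 split might look is shown below. The field names, units, and the shuffling procedure are assumptions made for illustration; the text does not say how the researchers encoded the arrangement of planets (omitted here) or how they drew the split.

```python
import random
from dataclasses import dataclass

@dataclass
class SystemFeatures:
    # Hypothetical field names; only the quantities themselves come from the text.
    n_planets: int               # total number of planets
    n_massive_planets: int       # planets above 100 Earth masses
    inner_radius_earth: float    # size of the closest planet (Earth radii, assumed unit)
    inner_distance_au: float     # its distance from the star (au, assumed unit)
    has_earth_like: bool         # label: True if the system hosts an Earth-like planet

def split_80_20(systems, seed=0):
    """Shuffle a copy of the systems and cut it into 80% training, 20% test."""
    shuffled = list(systems)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 8 // 10
    return shuffled[:cut], shuffled[cut:]

# Made-up placeholder systems, only to show the split sizes.
systems = [SystemFeatures(3, 0, 1.2, 0.05 * (i + 1), i % 2 == 0) for i in range(100)]
train, test = split_80_20(systems)
print(len(train), len(test))  # 80 20
```

Holding back the 20% matters because a score measured on systems the model has already seen would overstate how well it generalizes.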

The results were promising: the Random Forest model achieved a precision score of 0.99, meaning that 99% of the synthetic systems it flagged as hosting an Earth-like planet actually contained one. Encouraged by this success, the researchers applied their model to real-world data on 1,567 stars of similar sizes that are known to have at least one planet in orbit. Of these candidates, 44 met the algorithm's criteria for potentially possessing an Earth-like planet. Importantly, the researchers noted that most systems in this subset were stable enough to support the presence of an Earth-like planet.
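
Precision is the number of correct "true" calls divided by all "true" calls. The sketch below uses made-up labels to show what a score of 0.99 does and does not say; the counts are illustrative and are not the study's data.

```python
def precision(y_true, y_pred):
    """Fraction of systems predicted True that really are True: TP / (TP + FP)."""
    true_pos = sum(t and p for t, p in zip(y_true, y_pred))
    false_pos = sum((not t) and p for t, p in zip(y_true, y_pred))
    if true_pos + false_pos == 0:
        raise ValueError("no positive predictions, precision is undefined")
    return true_pos / (true_pos + false_pos)

# Made-up example: 100 systems flagged, 99 of them correctly,
# plus 50 real Earth-like hosts that the model missed.
y_true = [True] * 99 + [False] * 1 + [True] * 50
y_pred = [True] * 99 + [True] * 1 + [False] * 50
print(precision(y_true, y_pred))  # 0.99
```

In this example the 50 missed systems do not lower the score at all: precision measures how trustworthy the flagged list is, not how complete it is.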

While the team expressed confidence in their model's ability to identify stars that may host Earth-like worlds, they acknowledged several caveats. One limitation is the relatively small size of their training data, as creating synthetic planetary systems is both time-consuming and costly. Perhaps more crucial is their reliance on the Bern model: the classifier's predictions are only as trustworthy as that model's assumptions about how planets form. The researchers recommended that the validity of the Bern model be rigorously tested in future theoretical studies to enhance the reliability of their findings.