Battistuzzi, Fabia Ursula; Powell, Christopher Lowell Edward; Oleksyk, Taras K; Blumer-Schuette, Sara E
2024-09-25; 2024-09-25; 2023-01-01
https://hdl.handle.net/10323/18155

Over the past three decades, computational capabilities have grown so rapidly that they have enabled computationally intensive scientific fields such as phylogenomics. As more genomes are sequenced across the three domains of life, larger and more species-complete phylogenetic reconstructions are improving our understanding of the Tree of Life and of evolutionary history in deep time. However, these large datasets pose unique modeling and computational challenges. Current evolutionary models cannot yet accurately describe the evolutionary process of thousands of species, and the computational burden limits our ability to exhaustively explore and test multiple hypotheses. These limitations become even more problematic when estimating absolute times within these phylogenetic reconstructions (timetrees). Such estimates are constrained not only computationally, by run times and resource requirements, but also by the availability of fossil data that can be used to calibrate species divergence times (primary calibrations). All of these issues are particularly severe in prokaryotes because of the large number of species available in databases, their high evolutionary variability, and the scarcity of primary calibrations. Yet prokaryotes make up two of the three domains of life and are therefore key to reconstructing the Tree of Life. This combination of computational and data constraints forces researchers to make choices about the datasets they analyze without a clear understanding of how those choices affect the accuracy of the results.
This work presents an in-depth analysis of how dataset choices affect the reconstruction of phylogenetic histories, using a newly developed tool, Phylogenetic Assessment of Taxon Sampling (PATS), that enables fast, simple, and reproducible testing of taxon sampling. The PATS pipeline is available on GitHub: https://github.com/BlabOaklandU/PATS

Keywords: Bioinformatics; Molecular clock; Phylogenetics; Simulations; Taxon sampling

Assessment of Taxon Sampling on Phylogenetic Reconstructions and Timetrees: A New Methodology And Application