Battistuzzi, Fabia
Helander, Aaron
2017-05-08
2017-05-08
http://hdl.handle.net/10323/4549
Low complexity regions (LCRs) are areas within genomes whose evolutionary history is unknown even though they are present in all species. Among species, they are known to vary in length and conservation, but their variability within species (e.g., among strains) is unknown. Much research has been done on LCRs in eukaryotes, but prokaryotes have been mostly overlooked, even though they form the basis of every ecosystem on Earth and include many human pathogens. Understanding how prokaryotic genomes evolve is therefore a fundamental step toward predicting future adaptations, whether in response to environmental changes or to new drugs developed to combat disease. This study investigates the evolutionary processes of LCRs in bacterial strains by analyzing changes in their length, composition, and frequency within genomes. We focus on strains rather than species because this level provides information on the evolution of these poorly known regions at short timescales, filling a gap left by previous work, which has focused only on longer (species-level) evolutionary timescales. Using a fully computational approach, we analyzed hundreds of proteomes across multiple bacterial classes and determined the relative frequency, amino acid composition, and length of LCRs. Our data show that, relative to the proteome as a whole, LCRs are enriched in leucine, lysine, alanine, and glycine. This could indicate a role for selection in the evolution of these regions. In terms of LCR length, we observe relatively small variation among strains, but numerous outliers are present, and further research is needed to determine why they do not follow the pattern of their species.
We expect these variations to be related to habitat and pathogenicity.
DNA
Bioinformatics
Low complexity regions
The Forces that Drive Low Complexity Region Evolution
Thesis