The Forces that Drive Low Complexity Region Evolution
Abstract
Low complexity regions (LCRs) are areas within genomes whose evolutionary history is unknown
despite their presence in all species. Among different species, they are
known to be variable in length and conservation, but their variability within species (e.g., among
strains) is unknown. Much research has been done on LCRs in eukaryotes, but prokaryotes have
been mostly overlooked even though they form the basis of every ecosystem on
Earth and include many human pathogens. Understanding how their genomes
evolve is therefore a fundamental step toward predicting future adaptations, whether in response to environmental
changes or to new drugs developed to combat disease. This study aims to investigate the evolutionary
processes of LCRs in bacterial strains by analyzing changes in their length, composition, and
frequency within their genomes. We focus on strains rather than species because this level
provides information on the evolution of these poorly known regions at short timescales, thus
filling a gap left by previous work, which has focused only on longer (species-level)
evolutionary timescales. Using a fully computational approach, we analyzed hundreds of
proteomes across multiple bacterial classes and determined the relative frequency, amino acid
composition, and length of LCRs. Our data show that, relative to the whole proteome, LCRs are
enriched in leucine, lysine, alanine, and glycine, which may
indicate a role for selection in the evolution of these regions. In terms of LCR length, we
observe relatively little variation among strains, although numerous outliers are
present; further research is needed to determine why they do not follow the pattern of their
species. We expect these deviations to be related to habitat and pathogenicity.