Computational Study of the Structure and Organization of Conserved Noncoding Elements (CNE) in Eukaryotic Genomes as a tool in order to elucidate their potential function and evolutionary dynamics

Doctoral Dissertation uoadl:1309491

Unit:
Section of Biochemistry and Molecular Biology
Library of the School of Science
Deposit date:
2014-10-29
Year:
2014
Author:
Πολυχρονόπουλος Δημήτριος
Dissertation committee:
Γ. Κ. Ροδάκης, Professor, NKUA (supervisor); Σ. Ι. Χαμόδρακας, Professor, NKUA; Dr. Γ. Αλμυράντης, Researcher A', NCSR "Demokritos"
Original Title:
Υπολογιστική Μελέτη της Δομής και της Οργάνωσης των Συντηρημένων μη Εκφραζομένων Στοιχείων (CNE) στα Ευκαρυωτικά Γονιδιώματα ως εργαλείο διερεύνησης της πιθανής λειτουργίας και της εξελικτικής δυναμικής τους
Languages:
Greek
Translated title:
Computational Study of the Structure and Organization of Conserved Noncoding Elements (CNE) in Eukaryotic Genomes as a tool in order to elucidate their potential function and evolutionary dynamics
Summary:
In the present thesis, we attempted to analyse the spatial organization of
Conserved Noncoding Elements (CNEs) in vertebrate and invertebrate genomes with
the aim of investigating whether we could deduce how those sequences evolved.
We found that the distances between consecutive CNEs follow power-law-like
distributions in a variety of genomes. Such distributions are associated with
long-range correlations and fractality (notions that have been
proposed for the conformation of the chromatin inside the nucleus) and seem to
occur frequently in the genome as evidenced by the study of different genomic
elements in a variety of organisms. Given that CNEs are spatially associated
with genes, especially with those that regulate developmental processes, we
verified by appropriate gene masking that a power-law-like pattern emerges
irrespective of whether elements found inside protein-coding genes are
excluded or not. In addition, we found that the more ancient elements form the
most extended linearities in log-log plots of the distances between them. An
evolutionary model that includes segmental or whole-genome duplication events
and elimination (loss) of most of the duplicated CNEs was put forward to
explain these findings. Simulations reproduce
the main features of the observed size distributions. Power-law-like patterns
in the genomic distributions of CNEs are in accordance with current knowledge
about their evolutionary history in several genomes. CNEs display interesting
DNA composition preferences: they are generally AT-rich sequences surrounded by
regions of low AT content. This prompted us to investigate whether we could
classify them by means of their sequence characteristics alone. We attempted to
classify constrained elements in general (exons and CNEs) using two machine
learning approaches: N-Gram Graphs (NGGs) and Logic Alignment Free (LAF). The
application of these two methodologies in the field of genomics is presented
for the first time in this thesis. Overall, we managed to effectively classify
functional (or presumably functional) genomic sequences into different
categories between genomes or inside the same genome. Our analysis used
pairwise comparisons and naturally occurring surrogate sequences of the same
length and GC content as each of the sequences in the studied dataset (CNEs /
exons). We compared the classification rates obtained with both approaches
(NGGs and LAF) with those of another methodology, widely used in discriminating
whole genomes, called «Genomic Signatures» (GS). Our study is the first to
demonstrate the applicability of the GS
approach in discriminating short biological sequences of length < 50 kb. For
the purposes of all the above approaches, we also identified new Conserved
Noncoding Elements in the human (H. sapiens), worm (C. elegans) and insect (D.
melanogaster) genomes. In those cases, the species selected for CNE
identification are evolutionarily close within each pair of whole-genome
alignments. We managed to discriminate those sequences efficiently and proposed
biological interpretations. More specifically, CNEs that display high sequence
similarity (> 95% and up to 100%) in human / chicken whole-genome alignments
are thought to compose a distinct category of ultraconserved elements that
probably play roles in processes that are yet to be determined. This remarkable
degree of sequence similarity is even greater than that observed for exonic
sequences (comparing the two organisms, human / chicken), while there is no
known function that requires such a high degree of conservation.
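The inter-CNE distance analysis described in the summary can be illustrated with a minimal Python sketch. It assumes CNEs on one chromosome are given as non-overlapping, half-open (start, end) intervals. The function names and the least-squares slope estimate are illustrative assumptions, not the procedure used in the thesis. A roughly straight line in the log-log cumulative plot, summarised here by its slope, is what "power-law-like" refers to.

```python
import math
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # hypothetical (start, end) pair, half-open


def consecutive_distances(intervals: Sequence[Interval]) -> List[int]:
    """Gaps between consecutive elements, assuming non-overlapping intervals."""
    ordered = sorted(intervals)
    gaps = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gap = next_start - prev_end
        if gap > 0:  # skip touching or overlapping elements
            gaps.append(gap)
    return gaps


def cumulative_size_distribution(gaps: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (d, N) pairs, where N is the number of gaps of length >= d."""
    ordered = sorted(gaps)
    total = len(ordered)
    points = []
    for i, d in enumerate(ordered):
        if i == 0 or d != ordered[i - 1]:
            points.append((d, total - i))
    return points


def loglog_slope(points: Sequence[Tuple[int, int]]) -> float:
    """Ordinary least-squares slope of log10(N) against log10(d).

    This is only a rough diagnostic of linearity in a log-log plot,
    not a rigorous power-law fit.
    """
    xs = [math.log10(d) for d, _ in points]
    ys = [math.log10(n) for _, n in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy coordinates, invented purely for illustration.
toy_cnes = [(0, 10), (20, 30), (130, 140), (1140, 1150)]
toy_gaps = consecutive_distances(toy_cnes)
toy_points = cumulative_size_distribution(toy_gaps)
toy_slope = loglog_slope(toy_points)
```

On real data, the same functions would be applied per chromosome to element coordinates, and the extent of the linear region in the log-log plot would be inspected rather than relying on a single slope value.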
Keywords:
Conserved noncoding elements, Comparative genomics, Ultraconserved elements, Classification of biological sequences, Genome evolution
Index:
No
Number of index pages:
0
Contains images:
Yes
Number of references:
165
Number of pages:
xiii, 150
document.pdf (3 MB)