A method for identifying TSS from CAGE data using a Genomic Signal Processing approach

Postgraduate Thesis uoadl:2865306

Unit:
Specialization in Bioinformatics
Informatics
Deposit date:
2019-03-07
Year:
2019
Author:
Grigoriadis Dimitris
Supervisors info:
Prof. Artemis Hatzigeorgiou, Professor of Bioinformatics, Department of Computer, Telecommunications and Network Engineering, University of Thessaly
Original Title:
A method for identifying TSS from CAGE data using a Genomic Signal Processing approach
Languages:
English
Greek
Translated title:
A method for identifying TSS from CAGE data using a Genomic Signal Processing approach
Summary:
Genomic signal processing (GSP) can solve various biological problems at low computational cost. Many mathematical algorithms exist, mostly for gene identification and sequence comparison, and they rely on several DNA representations that did not evolve further due to lack of efficiency.
Knowing the exact position of a transcription start site (TSS), the location at the 5'-end of a gene where transcription into an RNA molecule begins, is critical for identifying the regulatory regions that flank it. Many approaches for locating TSS positions have been reported in the literature.
This study presents a novel method for identifying transcription start sites (TSSs) from CAGE (Cap Analysis of Gene Expression) data by applying features and techniques borrowed from GSP. A fairly new representation method for nucleotides is introduced that extracts the information and represents it as a time-series signal vector.
Signals were transformed from the time domain to the frequency domain and back, which allows artifacts to be filtered efficiently and robustly. Several filters were used, and their parameters were optimized to maximize the accuracy and performance of the results.
In the context of this work, a fully modular computational tool was designed and implemented that uses GSP techniques and mathematical algorithms to detect TSSs with high accuracy.
The method was tested on data from real human cells (H9 line) downloaded from the FANTOM5 repository, and its accuracy was compared with that of other algorithms and with the ground truth. All results are presented in this study.
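The time-to-frequency filtering step described in the summary can be illustrated with a minimal sketch. It is not the thesis's implementation: the per-position tag counts, the brick-wall low-pass filter, the cutoff `keep`, and the helper names `fft`, `ifft`, `low_pass` and `strongest_peak` are all assumptions made for illustration, and the record does not state which filters or parameters the author used.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(spectrum):
    """Inverse FFT via the conjugate trick: conj(fft(conj(X))) / n."""
    n = len(spectrum)
    tmp = fft([v.conjugate() for v in spectrum])
    return [v.conjugate() / n for v in tmp]


def low_pass(signal, keep):
    """Zero every frequency bin above `keep` (hypothetical cutoff) and invert."""
    n = len(signal)
    spectrum = fft(signal)
    for k in range(n):
        # Bins k and n - k hold the same absolute frequency.
        if min(k, n - k) > keep:
            spectrum[k] = 0j
    return [v.real for v in ifft(spectrum)]


def strongest_peak(signal):
    """Index of the largest value, used here as a toy TSS call."""
    return max(range(len(signal)), key=lambda i: signal[i])


# Toy signal (assumed data): CAGE tag 5'-end counts at 32 positions,
# one cluster centred on position 12 plus two isolated stray tags.
counts = [0] * 32
counts[10:15] = [3, 9, 15, 9, 3]
counts[3] = 1
counts[25] = 2

smoothed = low_pass(counts, keep=6)
print(strongest_peak(smoothed))  # 12
```

Because the DC bin is kept, the total signal mass is preserved while the isolated tags are attenuated relative to the cluster. A real pipeline would more likely use an optimized FFT library and a filter with a smoother roll-off than a brick wall.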
Main subject category:
Technology - Computer science
Keywords:
Transcription start sites, genomics, signal processing, CAGE, FFT, GSP
Index:
Yes
Number of index pages:
5
Contains images:
Yes
Number of references:
141
Number of pages:
70
File:
File access is restricted only to the intranet of UoA.

Report_final.pdf
1 MB