Data processing approach for the construction and evaluation of an organism’s UNIQUOME

Postgraduate Thesis uoadl:2817059

Unit:
Specialization: Bioinformatics
Informatics
Deposit date:
2018-11-02
Year:
2018
Author:
Pierros Vasileios
Supervisors info:
Dr. Tsangaris Georgios, Special Functional Scientist, Grade A, BRFAA
Dr. Anastasiadou Ema, Researcher, Grade D, BRFAA
Dr. Anagnostopoulos Athanasios, Special Functional Scientist, BRFAA
Original Title:
Υπολογιστική μέθοδος για την κατασκευή και αξιολόγηση του UNIQUOME ενός οργανισμού
Languages:
Greek
Translated title:
Data processing approach for the construction and evaluation of an organism’s UNIQUOME
Summary:
In a previous study we identified the shortest amino acid sequences (peptides) that appear in only one protein, thus defining the uniquome of a set of proteins or proteome. For a peptide to be unique, it must not appear in any other protein of the set (repeated occurrences within the same protein are allowed).
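The definition above can be illustrated with a minimal Python sketch. It is not the thesis implementation (which is written in C#); the function names, the toy sequences and the rule that a longer unique peptide is dropped when it already contains a shorter unique one are assumptions made for illustration only.

```python
from collections import defaultdict


def peptides_by_owner(proteins, k):
    """Map every length-k peptide to the set of protein IDs that contain it."""
    owners = defaultdict(set)
    for pid, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(pid)
    return owners


def core_unique_peptides(proteins, min_len=4, max_len=100):
    """Return {peptide: protein_id} for the shortest unique peptides.

    A peptide is unique when exactly one protein contains it; repeats
    inside that same protein do not matter because owners are sets.
    """
    core = {}
    for k in range(min_len, max_len + 1):
        for pep, ids in peptides_by_owner(proteins, k).items():
            if len(ids) != 1:
                continue  # shared by two or more proteins
            # Assumption: keep only peptides with no shorter unique peptide inside.
            if any(len(short) < k and short in pep for short in core):
                continue
            core[pep] = next(iter(ids))
    return core


# Hypothetical toy "proteome" with three short sequences.
toy = {"P1": "MKTAYIAK", "P2": "MKTAYLAK", "P3": "GGMKTAGG"}
core = core_unique_peptides(toy, min_len=4, max_len=5)
```

In this toy set, `MKTA` occurs in all three sequences and is therefore not unique, while `TAYI` occurs only in `P1`.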
The human proteome consists of more than 20,000 reviewed proteins. Splitting the human proteome into distinct peptides ranging from 4 to 100 amino acids in length creates a search space of around 500 × 10^6 peptides. This makes the construction of the uniquome an expensive problem in terms of CPU, RAM and storage usage. In this study we present the tools we developed for the construction and evaluation of the uniquome.
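To give a feel for how quickly the search space grows, the following Python sketch counts the sliding windows of length 4 to 100 in a sequence of a given length. It counts windows rather than distinct peptides, so it is only an upper bound on the search space; the function name and the example length are assumptions and are not taken from the thesis.

```python
def window_count(n, min_len=4, max_len=100):
    """Number of substrings with length in [min_len, max_len] in a sequence of length n.

    For each length k there are n - k + 1 windows, so the total is an
    arithmetic series over k.
    """
    hi = min(max_len, n)
    if hi < min_len:
        return 0
    terms = hi - min_len + 1
    first = n - min_len + 1  # windows of the shortest length
    last = n - hi + 1        # windows of the longest length
    return terms * (first + last) // 2


# Hypothetical protein of 560 residues.
windows = window_count(560)
```

Summing this count over every protein in a proteome gives the number of candidate windows that have to be compared, which is why memory and storage become the limiting factors.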
A tool suite implementing the uniquome construction algorithm was developed in C#. It is highly scalable, since it can work in parallel, distributed or hybrid mode, depending on the underlying hardware. The user can either set the parameters or let the tool propose a data separation strategy based on the hardware. The user can also decide whether, and by which method, the data should be grouped together (e.g. by protein family). Finally, the tool can perform various metadata analyses on the generated data and can be configured to run Perl scripts for further knowledge extraction.
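One possible data separation strategy is to split the work by peptide length, because each length can be scanned independently and the partial results merged afterwards. The Python sketch below illustrates that idea only; the abstract does not state how the C# suite partitions its data, and all names here are hypothetical. Threads are used to keep the example short; CPU-bound work in Python would normally use processes instead.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def unique_for_length(proteins, k):
    """Return {peptide: protein_id} for length-k peptides found in exactly one protein."""
    owners = defaultdict(set)
    for pid, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(pid)
    return {pep: next(iter(ids)) for pep, ids in owners.items() if len(ids) == 1}


def unique_peptides_parallel(proteins, lengths, workers=4):
    """Scan each peptide length in its own task and merge the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda k: unique_for_length(proteins, k), lengths))
    merged = {}
    for part in parts:
        merged.update(part)  # keys never collide: each part holds one length
    return merged


# Hypothetical toy input with two short sequences.
toy = {"P1": "MKTAYIAK", "P2": "MKTAYLAK"}
result = unique_peptides_parallel(toy, lengths=[4, 5], workers=2)
```

The same split could be handed to separate machines instead of threads, which corresponds to the distributed mode described above.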
Our tool constructed and stored the human uniquome on an i7 laptop with 8 cores and 32 GB of RAM in less than 30% of the time needed by the previous Perl implementation. The tools were implemented in .NET Core so that they can be executed on all mainstream operating systems (Windows, Linux, macOS) and run on a wide variety of hardware, ranging from an SoC (such as a Raspberry Pi) to high-end servers with tens of CPUs.
With this suite, researchers can relatively quickly construct and evaluate the uniquome of any organism. The tool offers a variety of export formats, such as Text, FASTA, CSV and JSON.
Main subject category:
Science
Keywords:
proteomics, parallel computation, distributed computation, core unique peptide, proteins
Index:
Yes
Number of index pages:
7
Contains images:
Yes
Number of references:
15
Number of pages:
84
Data processing approach for the construction and evaluation of an organisms UNIQUOME.pdf (3 MB)