Structure tensor analysis on proteins: efficient feature extraction for heteromultimeric assembly prediction

Postgraduate Thesis uoadl:2800192

Unit:
Bioinformatics Specialization
Informatics
Deposit date:
2018-10-02
Year:
2018
Author:
Rapti Melivoia
Supervisors info:
I. Emiris, Professor, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens
Original Title:
Structure tensor analysis on proteins: efficient feature extraction for heteromultimeric assembly prediction
Languages:
English
Translated title:
Structure tensor analysis on proteins: efficient feature extraction for heteromultimeric assembly prediction
Summary:
Knowledge of the shape, structure, and interactions of macromolecules defines biology at the molecular level in atomic detail. Although knowing the architecture is an important step toward understanding function, determining it remains a challenging task. Current structure resolution techniques (X-ray crystallography, cryo-EM, etc.), although quite successful, fail to generalize well across different types of structures, since each method is designed for specific kinds of components. One way to combine experimental and computational data regardless of their resolution is Integrative Modeling (IM), which provides a comprehensive structural characterization of biomolecules. It takes as input (a) high-resolution structures of the individual components composing the supramolecular complex and (b) low-resolution envelopes of native assemblies, and produces biologically relevant supramolecular assemblies consistent with the available set of experimental data. However, IM has limitations when it comes to heteromultimeric complexes, especially non-symmetric ones, where the heterogeneity increases the computational complexity. Most importantly, the individual components may adopt different conformations depending on whether they are isolated or within their assembly. Very few methods exist to tackle this problem, and even fewer actually succeed; thus, a different way of characterizing and locating these components within their assembly, regardless of their conformational states, is needed. In this work, we draw on the field of computer vision and treat our biological problem as an object recognition problem. Specifically, we adopt the concept of localizing objects in a scene, and make use of local descriptors and the main steps of the SIFT algorithm for extracting distinctive features (local extrema) from images.
Translated to our biological problem, we detect informative features (keypoints) in the density maps of the atomic structures, so as to localize them within their macromolecular assembly. Our goal is to reduce the huge number of extracted features by specifically searching for corners, as these points remain stable regardless of any rotation or change. We adopt the principles of the Harris corner detector and extend them using three-dimensional structure tensor analysis (STA). The significance lies in the fact that the eigenvalues and the corresponding eigenvectors of the structure tensor describe the principal curvatures of the neighborhood around the local extrema. Based on the statistics of the eigenvalue ratios, we apply multiple types of thresholding under different configurations, and benchmark the STA set of parameters on 54 different structures. To evaluate the parameters, we compare the extracted keypoints with a set that is known, from the already existing software, to lead to correct assembly prediction. Experimental results show that some parameter sets remove almost all of the unstable keypoints (false positives), others retain almost all of the stable ones (true positives), and others provide solutions that balance the trade-off between these two. Finally, we verify that there are specific complexes (1Z5S, 2GC7) without a trustworthy density profile, since solutions cannot be obtained for every resolution. The proposed method considerably speeds up the existing software by reducing the computational complexity, a key issue for heteromultimers, and is a general and accurate way of extracting localized features for correct assembly prediction, which can serve as a baseline for studying the dynamics of these keypoints under conformational changes.
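The eigenvalue-based keypoint filtering described in the summary can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the function names, the cubic averaging window (a simple stand-in for Gaussian weighting), and the single `max_ratio` criterion on the largest-to-smallest eigenvalue ratio are all assumptions made for illustration. The thesis benchmarks several thresholding schemes that are not specified in this record.

```python
import numpy as np

def box_smooth(vol, radius=1):
    """Average each voxel over a cubic window (edge-padded).
    Assumption: a box window stands in for Gaussian weighting."""
    k = 2 * radius + 1
    padded = np.pad(vol, radius, mode="edge")
    out = np.zeros(vol.shape, dtype=float)
    for dz in range(k):
        for dy in range(k):
            for dx in range(k):
                out += padded[dz:dz + vol.shape[0],
                              dy:dy + vol.shape[1],
                              dx:dx + vol.shape[2]]
    return out / k**3

def structure_tensor_eigenvalues(density, radius=1):
    """Eigenvalues (ascending) of the 3x3 structure tensor at every voxel."""
    grads = np.gradient(density.astype(float))  # derivatives along z, y, x
    tensor = np.empty(density.shape + (3, 3))
    for i in range(3):
        for j in range(i, 3):
            # Locally averaged product of gradient components (symmetric).
            tensor[..., i, j] = tensor[..., j, i] = box_smooth(
                grads[i] * grads[j], radius)
    return np.linalg.eigvalsh(tensor)

def filter_keypoints(density, keypoints, max_ratio=10.0, radius=1, eps=1e-12):
    """Keep keypoints whose neighborhood varies in all three directions.
    Hypothetical criterion: lambda_max / lambda_min <= max_ratio."""
    eig = structure_tensor_eigenvalues(density, radius)
    kept = []
    for z, y, x in keypoints:
        lam = eig[z, y, x]  # sorted ascending by eigvalsh
        if lam[0] > eps and lam[2] / lam[0] <= max_ratio:
            kept.append((z, y, x))
    return kept
```

For a cube-shaped density, the voxel at the cube's corner passes this filter, while points on a face or along an edge are rejected because their smallest eigenvalue is zero. This is the corner-versus-edge distinction that the Harris approach relies on, carried over to three dimensions.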
Main subject category:
Technology - Computer science
Keywords:
macromolecular structure, protein subunit localization, keypoint detection, Harris corner detection, extrema extraction
Index:
Yes
Number of index pages:
5
Contains images:
Yes
Number of references:
30
Number of pages:
48
Thesis_MSc_MRapti.pdf (12 MB)