Nonconvex Optimization Algorithms for Structured Matrix Estimation in Large-Scale Data Applications

Doctoral Dissertation uoadl:2778669

Department of Informatics and Telecommunications
Giampouras Paraskevas
Dissertation committee:
Athanasios Rontogiannis, Director of Research, National Observatory of Athens
Konstantinos Koutroumbas, Director of Research, National Observatory of Athens
Sergios Theodoridis, Professor, Department of Informatics and Telecommunications, NKUA
Petros Maragos, Professor, School of Electrical and Computer Engineering, NTUA
Nicholas Kalouptsidis, Professor, Department of Informatics and Telecommunications, NKUA
Konstantinos Berberidis, Professor, Department of Computer Engineering & Informatics, University of Patras
Eleftherios Kofidis, Associate Professor, Department of Statistics and Insurance Science, University of Piraeus
Original Title:
Nonconvex Optimization Algorithms for Structured Matrix Estimation in Large-Scale Data Applications
Structured matrix estimation belongs to the family of learning tasks whose main goal is to reveal low-dimensional embeddings of high-dimensional data. Nowadays, this task appears in various forms in a plethora of signal processing and machine learning applications. In the present thesis, novel mathematical formulations for three different instances of structured matrix estimation are proposed. Concretely, the following problems are addressed and analyzed: a) simultaneously sparse, low-rank and nonnegative matrix estimation, b) low-rank matrix factorization, and c) online low-rank subspace learning and matrix completion. In all cases, the data are assumed to be generated by a linear process, i.e., we deal with linear measurements. A suite of novel and efficient optimization algorithms amenable to handling large-scale data is presented. A key feature common to all the introduced schemes is nonconvexity. Although nonconvexity complicates the derivation of theoretical guarantees (in contrast to related convex approaches, which in most cases can be analyzed theoretically with relative ease), significant gains in the estimation performance of the emerging algorithms have recently been witnessed in several practical settings.
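The linear generation process shared by all three problem instances can be sketched compactly; the notation below is illustrative (the thesis defines the operator separately for each problem), with $\mathcal{A}$ a generic linear measurement operator and $\mathbf{E}$ additive noise:

```latex
\mathbf{Y} = \mathcal{A}(\mathbf{X}) + \mathbf{E},
```

where $\mathbf{X}$ is the sought structured (sparse, low-rank and/or nonnegative) matrix and $\mathbf{Y}$ collects the linear measurements. Each of the three formulations below specializes the structure imposed on $\mathbf{X}$ and, accordingly, the regularization used to recover it.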

Let us first focus on simultaneously sparse, low-rank and nonnegative matrix estimation from linear measurements. In the thesis this problem is solved by three different optimization algorithms, which address two different novel formulations of the task. All the proposed schemes are devised to minimize a cost function consisting of a least-squares data-fitting term and two regularization terms, which promote sparsity and low-rankness, respectively. The novelty of the first formulation lies in the use, for the first time in the literature, of the sum of the reweighted $\ell_1$ and the reweighted nuclear norms. The merits of the reweighted $\ell_1$ and nuclear norms have been demonstrated in numerous sparse and low-rank matrix recovery problems: although these two norms render the resulting optimization problems nonconvex, they approximate the $\ell_0$ norm and the rank function, respectively, better than the related convex regularizers. Here, we aspire to benefit from combining the two. The first algorithm is an incremental proximal minimization scheme, while the second is an ADMM solver. The third algorithm aims to further reduce the computational complexity; to this end, it deviates from the other two by adopting a matrix-factorization-based approach for modelling low-rankness. Since the rank of the sought matrix is generally unknown, a low-rank-imposing term, namely the variational form of the nuclear norm expressed as a function of the matrix factors, is utilized. In this case, optimization takes place via a block coordinate descent type scheme. The proposed formulations are applied in a pioneering way to a very important problem in hyperspectral image processing, namely hyperspectral image unmixing.
It is shown that both sparsity and low-rankness offer meaningful interpretations of inherent natural characteristics of hyperspectral images. More specifically, both are reasonable hypotheses for the so-called {\it abundance} matrix, i.e., the nonnegative matrix containing the fractions of presence of the different materials, called {\it endmembers}, in the region depicted by each pixel. The merits of the proposed algorithms over other state-of-the-art hyperspectral unmixing algorithms are corroborated in a wealth of experiments on simulated and real hyperspectral imaging data.
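To make the first formulation concrete, the sketch below evaluates a cost of the described form: a least-squares fit plus reweighted $\ell_1$ and reweighted nuclear-norm penalties. The function name, the representation of the measurement operator as a plain matrix acting on the vectorized unknown, and the specific smoothing constant are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

def reweighted_objective(X, Phi, y, lam, tau, eps=1e-3):
    """Least-squares fit plus reweighted l1 and reweighted nuclear-norm
    penalties for a candidate matrix X (illustrative sketch).

    Phi is assumed to be a plain matrix acting on vec(X), so the linear
    measurements read y = Phi @ X.ravel().
    """
    fit = 0.5 * np.linalg.norm(Phi @ X.ravel() - y) ** 2
    # Reweighted l1: weights are inverse current magnitudes, so small
    # entries are penalized more heavily (a closer surrogate of the l0 norm).
    w_sparse = 1.0 / (np.abs(X) + eps)
    l1_term = lam * np.sum(w_sparse * np.abs(X))
    # Reweighted nuclear norm: inverse singular values as weights,
    # approximating the rank function better than the plain nuclear norm.
    s = np.linalg.svd(X, compute_uv=False)
    nuc_term = tau * np.sum(s / (s + eps))
    return fit + l1_term + nuc_term
```

In a reweighting scheme the weights would be frozen at the previous iterate's values; here, for brevity, they are computed from the current `X`.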

In the framework of low-rank matrix factorization (LRMF), four novel optimization algorithms are presented, each modelling a different instance of the problem. All the proposed schemes share a common thread: they impose low-rankness on both the matrix factors and the sought matrix through a newly introduced regularization term. This term can be viewed as a generalized weighted version of the variational form of the nuclear norm. Notably, by appropriately selecting the weight matrix, enforcing low-rankness amounts to imposing joint column sparsity on both matrix factors. This property proves quite important in applications dealing with large-scale data, since it leads to a significant decrease in the induced computational complexity. Along these lines, three well-known machine learning tasks, namely denoising, matrix completion and low-rank nonnegative matrix factorization (NMF), are redefined according to the new low-rank regularization approach. Then, following the block successive upper bound minimization framework, alternating iteratively reweighted least-squares Newton-type algorithms are devised, accounting for the particular characteristics of each problem addressed. Lastly, an additional low-rank and sparse NMF algorithm is proposed, which hinges on the same low-rank-promoting idea while also imposing sparsity on one of the matrix factors. All the derived algorithms are tested in extensive simulated-data experiments and in real large-scale data applications such as hyperspectral image denoising, matrix completion for recommender systems, music signal decomposition and unsupervised hyperspectral image unmixing with an unknown number of endmembers.
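One way to see how joint column sparsity on the factors translates into low rank is the following sketch: penalizing the coupled Euclidean energy of matching factor columns drives whole column pairs to zero together, and every zeroed pair removes one rank-one component from the product. The exact weighting used in the thesis differs; the group-sparse coupling below is an illustrative instance:

```python
import numpy as np

def joint_column_lowrank_penalty(U, V):
    """Group-sparsity surrogate on matching columns of the factors U, V.

    Each term sqrt(||u_r||^2 + ||v_r||^2) couples the r-th columns of the
    two factors, so minimization tends to annihilate entire column pairs;
    every annihilated pair reduces the rank of U @ V.T by one.
    """
    col_energy = np.sum(U ** 2, axis=0) + np.sum(V ** 2, axis=0)
    return np.sum(np.sqrt(col_energy))
```

For comparison, the classical (unweighted) variational form of the nuclear norm is $\tfrac{1}{2}(\|U\|_F^2 + \|V\|_F^2)$; the column-wise coupling above is what makes the penalty behave like a group-sparsity regularizer over rank-one components.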

The last problem that this thesis touches upon is online low-rank subspace learning and matrix completion. This task follows a different learning model, namely online learning, which offers a valuable processing framework for large-scale streaming data, possibly under time-varying conditions. In the thesis, two different online algorithms are put forth. The first stems from a newly developed online variational Bayes scheme, applied to perform approximate inference on a carefully designed novel multi-hierarchical Bayesian model. Notably, the adopted model encompasses low-rank-promoting ideas similar to those mentioned for LRMF: low-rankness is imposed by promoting joint column sparsity on the matrix factors, which, following the Bayesian rationale, now takes place by assigning Laplace-type marginal priors to the factors. Going one step further, additional sparsity is modelled independently on the subspace matrix, thus imposing multiple structures on the same matrix. The resulting algorithm is fully automated, i.e., it does not demand fine-tuning of any parameters. The second algorithm follows a cost-function minimization strategy. Again, the low-rank-promoting idea introduced for LRMF is incorporated via a low-rank regularization term modified for the online processing scenario. Interestingly, the resulting optimization scheme can be considered the deterministic analogue of the Bayesian one. Both proposed algorithms share a favorable feature: they can learn subspaces without a priori knowledge of the true rank. Their effectiveness is showcased in extensive simulated-data experiments and in online hyperspectral image completion and eigenface learning using real data.
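The online setting processes one partially observed data vector at a time: fit coefficients on the observed entries of the current vector, then take a regularized step on the subspace matrix. The minimal sketch below follows this generic recipe only; the function name, step sizes, and the plain ridge-type shrinkage (standing in for the thesis's low-rank regularization term) are all assumptions for illustration:

```python
import numpy as np

def online_subspace_step(U, y, mask, mu=0.1, lam=1e-2):
    """One online update of the subspace matrix U for a streaming vector y
    whose observed entries are flagged by the boolean array `mask`
    (illustrative sketch, not the thesis algorithm).
    """
    Uo = U[mask]                                       # rows at observed positions
    w, *_ = np.linalg.lstsq(Uo, y[mask], rcond=None)   # coefficients on observations
    resid = Uo @ w - y[mask]                           # residual on observed entries
    G = np.zeros_like(U)
    G[mask] = np.outer(resid, w)                       # gradient of 0.5*||Uo w - y_o||^2
    U = U - mu * (G + lam * U)                         # step with shrinkage on U
    return U, w
```

The missing entries of the current vector can then be filled in as `U @ w`, which is how online matrix completion falls out of the same update.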
Main subject category:
Technology - Computer science
Keywords:
structured matrix estimation, nonconvex optimization, large-scale data, online learning
demo_pergamos.pdf (4 MB)