Supervisors info:
Ioannis Panagakis, Associate Professor, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens
Summary:
Generative Adversarial Networks (GANs) are deep-learning-based generative models that learn to map latent noise vectors to high-fidelity images. Recent work has shown that the input latent space can be decomposed into semantically meaningful directions, and that moving along these directions corresponds to human-interpretable image transformations. Everything from high-level aspects, such as face shape and overall hair style, down to smaller-scale facial features, color schemes, and microstructures can be controlled by moving along the corresponding direction in the GAN latent space.
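The editing mechanism described above can be sketched in a few lines: given a latent code and a discovered direction, an edit is simply a step of chosen strength along that direction before decoding. The function and variable names below are illustrative assumptions, not part of the thesis code.

```python
import numpy as np

def edit_latent(z, direction, alpha):
    """Move a latent code along a semantic direction.

    z         : (dim,) latent noise vector
    direction : (dim,) discovered semantic direction (need not be unit norm)
    alpha     : edit strength; the sign reverses the transformation
                (e.g. add vs. remove a smile)
    """
    d = direction / np.linalg.norm(direction)  # normalize so alpha is in consistent units
    return z + alpha * d

# Hypothetical usage with a generator G: decode a sweep of edit strengths
# images = [G(edit_latent(z, d_smile, a)) for a in np.linspace(-3, 3, 7)]
```

Sweeping alpha over a symmetric range is the standard way to visually inspect what attribute a candidate direction controls.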
To achieve image editing by identifying latent space directions, previous state-of-the-art methods either rely on supervised approaches or leverage the Principal Component Analysis (PCA) algorithm. The former severely limit the range of directions that can be explored, as they depend on a human-annotated set of scores for each attribute. The latter tend to apply the same method with minor modifications, resulting in similar experimental observations.
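The PCA-based family of methods can be illustrated with a minimal sketch: sample many latent codes (or intermediate activations), and take the top principal components as candidate edit directions. This is a generic illustration of the baseline idea, not the exact pipeline of any particular prior work.

```python
import numpy as np

def pca_directions(latents, n_directions=5):
    """PCA-baseline sketch: principal components of sampled latent
    codes serve as candidate semantic edit directions.

    latents      : (n_samples, dim) matrix of sampled latent vectors
    n_directions : how many top-variance directions to return
    """
    X = latents - latents.mean(axis=0)          # center the samples
    # right singular vectors = eigenvectors of the covariance matrix,
    # ordered by decreasing explained variance
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:n_directions]                    # (n_directions, dim), orthonormal rows
```

Because PCA is linear and unsupervised, different works built on it tend to recover similar direction sets, which is the limitation the summary points out.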
In this work, we approach the problem of discovering semantic directions in an unsupervised way, using semidefinite programming to perform nonlinear dimensionality reduction of the internal representations of GANs. In particular, we examine the generation mechanism of GANs and utilize the well-known Maximum Variance Unfolding (MVU) algorithm, also known as Semidefinite Embedding, to identify semantically meaningful directions by decomposing the pretrained weights. Furthermore, extensive experiments are conducted on the state-of-the-art GAN architectures StyleGAN and StyleGANv2, across seven different datasets.
To our knowledge, this is the first work to approach this problem from the perspective of semidefinite programming. While the computational cost can be high, the results clearly demonstrate its superiority in several experiments, and in the remaining ones it is competitive with the most recent supervised and unsupervised methods. Code is available at https://github.com/PanPapag/MVUGAN.
Keywords:
GAN, Image Editing, Semantic Directions, Latent Space, Semidefinite Programming