Multi-View Correlation and Discriminant Analysis:
Date
2024-12
Publisher
Indian Statistical Institute, Kolkata
Abstract
Advancements in data acquisition make multiple data sources available to
explain different perspectives of an object. To enhance the performance of a
single learning task such as classification, multi-view learning (MVL) leverages the
complementary and consistent information across multiple views. However, MVL has
its own set of challenges. The major issues associated with MVL include selecting
relevant and informative views while discarding the noisy and redundant views,
integrating heterogeneous views while constructing discriminant subspaces, handling
the “high-dimension low-sample size” nature of different views, and finding the intrinsic
non-linear class-geometry of the data across all the views. Moreover, applying MVL
under a multi-task learning (MTL) framework, for learning multiple related tasks
simultaneously to improve the performance of single-task MVL, is a major challenge.
In this regard, the thesis introduces several supervised MVL algorithms based on the
theory of canonical correlation analysis (CCA). To construct
discriminative subspaces while preserving the non-linear class-geometry of the data, a
novel supervised MVL method, termed class-structure preserving multi-view
correlated discriminant analysis (CSP-MVCDA), is proposed, which judiciously
integrates the merits of multiset CCA (MCCA), linear discriminant analysis (LDA), and
a locality preserving norm. The proposed method jointly optimizes the inter-set
correlation across all the views and intra-set discrimination in each view to obtain a
common discriminative latent space, where the shared and complementary
information across multiple views is exploited. The locality preserving norm with
prior class labels helps to preserve the local class-structure of the data, while both
MCCA and LDA capture its global class-structure across multiple views. A closed-form
solution, based on the generalized eigenvalue problem, makes the proposed
method applicable to high-dimensional multi-omics data integration. To
compute view relevance and inter-view dependency for a desired task, and to address
the “high-dimension low-sample size” nature of different views, a novel
supervised MVL method, termed supervised graph regularized multi-view canonical
correlation and discrimination analysis (SGR-MCCDA), is next introduced, based on the
maximum variance formulation of MCCA. Incorporating the known geometry of
source vectors encoded by the within-class and between-class graphs, the proposed
method preserves the class-structure of the data, which facilitates multi-omics cancer
stratification.
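For orientation, the generalized eigenvalue problems referred to above can be written in their standard textbook forms. This is only a sketch and may differ from the objectives actually derived in the thesis: the symbols $X_i$, $C_{ij}$, $w_i$, $S_b$, $S_w$ and the ridge term $\epsilon I$ are notation assumed here for illustration.

```latex
% Assumed notation: m views, each X_i \in \mathbb{R}^{d_i \times n} centered over n samples,
% with (cross-)covariances C_{ij} = \tfrac{1}{n} X_i X_j^{\top}.
\begin{align}
  % Multiset CCA: maximize the total inter-set correlation under a joint variance constraint.
  \max_{w_1,\dots,w_m}\ \sum_{i=1}^{m}\sum_{j \neq i} w_i^{\top} C_{ij}\, w_j
  \quad &\text{s.t.}\quad \sum_{i=1}^{m} w_i^{\top} C_{ii}\, w_i = 1 , \\
  % Stationarity of the Lagrangian gives one generalized eigenvalue problem,
  % where C is the block matrix [C_{ij}], D = \mathrm{blockdiag}(C_{11},\dots,C_{mm}),
  % and w stacks w_1,\dots,w_m.
  (C - D)\, w &= \lambda\, D\, w , \\
  % Linear discriminant analysis: between-class scatter S_b against within-class scatter S_w.
  S_b\, w &= \lambda\, S_w\, w .
\end{align}
% In the high-dimension low-sample size regime D and S_w are singular; a common remedy
% (an assumption here, not necessarily the thesis's choice) is to replace them with
% D + \epsilon I and S_w + \epsilon I for a small \epsilon > 0.
```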
In imaging genetics studies, sparse models are effective for selecting diagnosis- or
task-specific features for a comprehensive understanding of the underlying disease, and for
finding the genetic basis of the brain function and structure associated with the disease.
In this regard, a new sparse multi-task two-view algorithm, termed multi-task
learning and sparse discriminant canonical correlation analysis (MTL-SDCCA), is
proposed, judiciously integrating the theories of CCA and LDA under the MTL
framework to find the association between an imaging and a genetic modality. It uses
lasso and group lasso penalties to select diagnosis-specific and diagnosis-consistent
features from a large number of candidates and to identify group-wise imaging genetic
associations. To reduce the high complexity of existing algorithms under
multiple imaging and genetic modalities, a multi-task multi-view algorithm, termed
multi-view multi-task sparse canonical correlation analysis (MvMt-SCCA), is proposed,
which learns multiple sparse CCA tasks together to identify group-wise
imaging genetic associations. Incorporating lasso and fused lasso penalties, the
proposed method is able to select modality-wise, class-specific, and
class-consistent features for large-scale imaging genetics studies.
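The sparsity penalties named above can be illustrated with a minimal Python sketch. It shows the standard proximal operators of the lasso and group lasso penalties and the value of a fused lasso penalty; it is not the optimization procedure of MTL-SDCCA or MvMt-SCCA. The function names, and the reading of a "group" as one feature's weights across tasks, are assumptions made for illustration.

```python
import math


def soft_threshold(w, lam):
    """Proximal operator of lam * ||w||_1 (lasso).

    Each weight is shrunk toward zero by lam; weights with |x| <= lam become 0.
    """
    out = []
    for x in w:
        if x > lam:
            out.append(x - lam)
        elif x < -lam:
            out.append(x + lam)
        else:
            out.append(0.0)
    return out


def group_soft_threshold(groups, lam):
    """Proximal operator of lam * sum_g ||w_g||_2 (group lasso).

    Each group is rescaled as a whole, so a group is either kept (shrunk)
    or set to zero entirely. Here a group is assumed to hold one feature's
    weights across tasks, which is a hypothetical layout.
    """
    out = []
    for g in groups:
        norm = math.sqrt(sum(x * x for x in g))
        scale = max(1.0 - lam / norm, 0.0) if norm > 0.0 else 0.0
        out.append([scale * x for x in g])
    return out


def fused_lasso_penalty(w, lam):
    """Value of lam * sum_k |w[k+1] - w[k]|, which penalizes differences
    between neighbouring weights (for example, the same feature in adjacent tasks)."""
    return lam * sum(abs(b - a) for a, b in zip(w, w[1:]))
```

In this sketch the lasso operator zeroes individual weights, which corresponds to task-specific selection, while the group operator removes or keeps a feature for all tasks at once, which corresponds to task-consistent selection.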
Description
This thesis was carried out under the supervision of Prof. Pradipta Maji.
Keywords
Imaging genetics, Multi-omics integration, Canonical correlation analysis, Generalized lasso problem
Citation
199p.
