Please use this identifier to cite or link to this item:
http://hdl.handle.net/10263/7488
Title: | Multi-View Correlation and Discriminant Analysis: |
Other Titles: | Structure Preservation, Sparsity to Multi-Task Learning |
Authors: | Mondal, Sankar |
Keywords: | Imaging genetics; Multi-omics integration; Canonical correlation analysis; Generalized lasso problem |
Issue Date: | Dec-2024 |
Publisher: | Indian Statistical Institute, Kolkata |
Citation: | 199p. |
Series/Report no.: | ISI Ph. D Thesis;TH |
Abstract: | Advances in data acquisition make multiple data sources available, each describing a different perspective of an object. To enhance the performance of single-task learning such as classification, multi-view learning (MVL) leverages the complementary and consistent information across multiple views. However, MVL has its own set of challenges. The major issues include selecting relevant and informative views while discarding noisy and redundant ones, integrating heterogeneous views while constructing discriminant subspaces, handling the “high-dimension low-sample size” nature of different views, and finding the intrinsic non-linear class-geometry of the data across all the views. Moreover, applying MVL under a multi-task learning (MTL) framework, which learns multiple related tasks simultaneously to improve on the performance of single-task MVL, is a major challenge. In this regard, the thesis introduces several supervised MVL algorithms based on the theory of canonical correlation analysis (CCA). To construct discriminative subspaces while preserving the non-linear class-geometry of the data, a novel supervised MVL method, termed class-structure preserving multi-view correlated discriminant analysis (CSP-MVCDA), is proposed, which judiciously integrates the merits of multiset CCA (MCCA), linear discriminant analysis (LDA), and a locality preserving norm. The proposed method jointly optimizes the inter-set correlation across all the views and the intra-set discrimination in each view to obtain a common discriminative latent space, where the shared and complementary information across multiple views is exploited. The locality preserving norm with prior class labels helps to preserve the local class-structure of the data, while both MCCA and LDA take care of its global class-structure across multiple views.
A closed-form solution, based on the generalized eigenvalue problem, makes the proposed method applicable to high-dimensional multi-omics data integration. To compute view relevance and inter-view dependency for a desired task, and to address the “high-dimension low-sample size” nature of different views, a novel supervised MVL method, termed supervised graph regularized multi-view canonical correlation and discrimination analysis (SGR-MCCDA), is next introduced based on the maximum variance formulation of MCCA. By incorporating the known geometry of source vectors encoded by the within-class and between-class graphs, the proposed method preserves the class-structure of the data, which facilitates multi-omics cancer stratification. In imaging genetics studies, sparse models are effective for selecting diagnosis- or task-specific features for a comprehensive understanding of the underlying disease, and for finding the genetic basis of the brain function and structure associated with the disease. In this regard, a new sparse multi-task two-view algorithm, termed multi-task learning and sparse discriminant canonical correlation analysis (MTL-SDCCA), is proposed, judiciously integrating the theories of CCA and LDA under the MTL framework to find the association between an imaging and a genetic modality. It uses lasso and group lasso penalties to select the diagnosis-specific and diagnosis-consistent features from the large number of features and so identify group-wise imaging genetic associations. To reduce the high complexity of existing algorithms under multiple imaging and genetic modalities, a multi-task multi-view algorithm, termed multi-view multi-task sparse canonical correlation analysis (MvMt-SCCA), is proposed, which learns multiple sparse CCA tasks together to identify the group-wise imaging genetic association.
By incorporating the lasso and fused lasso penalties, the proposed method is able to select the modality-wise, class-specific, and class-consistent features for large-scale imaging genetics studies. |
Description: | This thesis was carried out under the supervision of Prof. Pradipta Maji |
URI: | http://hdl.handle.net/10263/7488 |
Appears in Collections: | Theses |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Thesis-SANKAR_MONDAL-8-1-25.pdf | Thesis | 18.72 MB | Adobe PDF | View/Open |
Form17-Sankar Mondal-8-1-25.pdf | Form 17 | 428.91 kB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.