Developing Pattern Recognition and Interpretable Convolutional Neural Network based Frameworks for Identifying Drug resistant and Pan cancer miRNAs from Expression Data
Date
2026-01-14
Publisher
Indian Statistical Institute, Kolkata
Abstract
Micro ribonucleic acids (miRNAs) are short (∼24 nucleotides) non-coding RNAs that are considered key
biomarkers in cancer diagnosis and treatment. They play a vital role in distinguishing cancer patients from
normal ones and drug-resistant patients from control ones. The control patients are those who have not
received any drug for cancer treatment. The objective is to identify a subset of miRNAs that help in the
classification of the patients using expression data. The thesis comprises four contributory chapters in
addition to an introduction and a conclusion. In the first two contributory chapters (Chapters 2 and 3),
computational methods for ranking and selecting miRNAs associated with drug resistance in cancer are
introduced. In the last two (Chapters 4 and 5), deep learning based methods are presented for identifying
miRNAs for various cancer classes in pan-cancer data. The contributory chapters are as follows:
1. Selecting drug-resistant miRNAs in cancer using Euclidean distance with fold change based score.
2. Integrating fuzzy rough set-based entropies for identifying drug resistant miRNAs and classifying cancer patients.
3. Interpretable convolutional neural network for selecting miRNAs from multiple cancer classes and cancer subtypes through pan-cancer analysis.
4. Set-theoretic explainable AI-based attribution score for identifying miRNAs in pan-cancer data.
In Chapter 1, an introduction to the related problems, literature review, motivation, and the organization of
the thesis are provided. In Chapter 2, two methods for predicting the miRNAs associated with drug resistance in
cancer are presented. The first method develops a score using the Euclidean distance with
weighted fold change (EDWFC); the second introduces a histogram-based clustering and Euclidean distance
with fold change-based ranking (HCEDFCR). The EDWFC provides a ranked list of miRNAs
for classifying control and drug-resistant patients and the HCEDFCR returns a group of miRNAs associated
with drug resistance. The methods are trained with the help of existing biological knowledge. In Chapter 3,
two new z-score-based fuzzy rough relevance and redundancy entropies are developed, and then a weighted
framework is introduced to integrate the entropies for ranking and selecting miRNAs. The selected miRNAs
are used for classifying the control and drug-resistant patients. In Chapter 4, an interpretable one-dimensional
convolutional neural network model (ICNNM) is developed, and its hyperparameters are optimized
for identifying classes of patients among multiple cancer classes in pan-cancer data. An
attribution score is also introduced using SHapley Additive exPlanations (SHAP) values for interpreting the
miRNAs and selecting important miRNAs for each cancer class. In Chapter 5, a multi-objective framework
for optimizing hyperparameters of a 1D CNN, called MOHCNN, and a set-theoretic explainable AI-based
attribution score (STEAAS) for miRNA selection are developed. The objectives for optimization are
training error, validation error, and the number of trainable parameters. The STEAAS is used
for identifying miRNAs in various cancers. The score of a miRNA is
represented by an ordered pair, where the first part represents the class score of the miRNA, and the second
part denotes the reliability score of that miRNA for belonging to the class. The miRNAs with high class
scores and reliability scores in a class are selected.
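
The three optimization objectives named for Chapter 5 (training error, validation error, and the number of parameters) can be illustrated with a plain Pareto-dominance filter. This is only a minimal sketch of the general multi-objective idea, not the MOHCNN procedure itself, which the abstract does not specify. The `Candidate` type, the configuration names, and all numeric values are hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """One trained 1D CNN configuration (hypothetical; all values are made up)."""
    name: str
    train_error: float
    val_error: float
    n_params: int


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every objective and better on at least one."""
    a_obj = (a.train_error, a.val_error, a.n_params)
    b_obj = (b.train_error, b.val_error, b.n_params)
    no_worse = all(x <= y for x, y in zip(a_obj, b_obj))
    strictly_better = any(x < y for x, y in zip(a_obj, b_obj))
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep the candidates that no other candidate dominates (all objectives minimized)."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Illustrative configurations only; not results from the thesis.
candidates = [
    Candidate("A", 0.05, 0.10, 120_000),
    Candidate("B", 0.04, 0.12, 300_000),
    Candidate("C", 0.06, 0.11, 150_000),  # worse than A on all three objectives
    Candidate("D", 0.08, 0.09, 60_000),
]
front = pareto_front(candidates)
print([c.name for c in front])
```

In this toy example, configuration C is discarded because A is at least as good on every objective, while A, B, and D each represent a different trade-off between error and model size.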
All the developed methods are compared with related miRNA and gene selection techniques and popular
classifiers. Data from public repositories such as the Gene Expression Omnibus (GEO) and The Cancer Genome
Atlas (TCGA) are used. The biological significance of the miRNAs selected by the developed
methods is established using publicly available web-based bioinformatics tools and existing literature.
Description
This thesis was carried out under the supervision of Prof. Shubhra Sankar Ray.
Keywords
Pattern Recognition, Explainable AI, miRNA Expression, Drug Resistance, Pan cancer
Citation
165p.
