Dissertations - M Tech (CS)

Permanent URI for this collection: https://dspace.isical.ac.in/handle/10263/2147

These Dissertations were submitted in partial fulfilment of the requirements for the award of M TECH (Computer Science) Degree of Indian Statistical Institute

Search Results

Now showing 1 - 10 of 608

Air-Writing Recognition
(Indian Statistical Institute, 2026-06-15) Shukla, Gaurang
Air-writing is the act of tracing characters or words in free space with a fingertip, recorded by a camera, giving a touch-free input modality for smart displays, augmented and virtual reality, and assistive interfaces. It is difficult because the finger never lifts: connecting strokes join adjacent letters with no pen-up signal to mark boundaries, and the same word varies widely in scale, position, and slant across writers. The WiTA benchmark of Kim et al. provides a large, person-disjoint dataset and a baseline that treats each clip as RGB video, recognised by a spatio-temporal 3D residual network trained with a CTC objective, reaching a character error rate (CER) of 0.292 on the English subset. The main goal of this dissertation was to improve on this error rate, which we achieve: we replace raw video with an explicit fingertip-trajectory sequence extracted from hand landmarks, fed to a Conformer encoder with a joint CTC/attention head. The resulting system attains a test CER of 0.219, improving on the published 0.292 of Kim et al. and 0.299 of Tan et al. by 25–27% relative.
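For readers unfamiliar with the metric, the following is a minimal sketch of how a character error rate can be computed: the Levenshtein edit distance between each reference and its hypothesis, summed and divided by the total number of reference characters. The function names are illustrative, and pooling over the whole corpus (rather than averaging per-sample rates) is an assumption; the abstract does not say which convention the dissertation or WiTA uses.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER (assumed convention): total edits / total reference chars."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars


if __name__ == "__main__":
    print(cer(["hello"], ["hallo"]))
```

A CER of 0.219 under this definition would mean roughly 22 character edits per 100 reference characters.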
Galaxy Morphology Classification Using Deep Learning
(Indian Statistical Institute, 2026-06-19) Kundu Roy, Dipanwita
Galaxy morphology is the study of the shape and visual appearance of galaxies, such as spiral, smooth, edge-on, and other morphological types. Morphological classification plays an important role in understanding how galaxies form and evolve over cosmic time. Most existing machine learning approaches for galaxy morphology classification rely solely on RGB galaxy images, which primarily capture spatial information and lack the physical spectral context of galaxies. In contrast, astronomical spectral datacubes contain rich information across multiple wavelengths, providing insights into the internal and physical properties of galaxies. However, such spectral observations are available for only a limited number of objects. Motivated by the availability of spectral information during training, this work investigates whether spectral knowledge can be used to improve morphology classification when only RGB images are available during testing. Different embedding techniques, including Siamese Networks, Supervised Autoencoders, and channel-based architectures, are explored to learn meaningful latent representations from spectral datacubes. These embeddings are then integrated with corresponding RGB galaxy images using multimodal deep learning frameworks for effective feature learning and classification. Experimental results demonstrate that incorporating spectral embeddings during training can guide the learning process and improve galaxy morphology classification performance using image-only inputs at inference time.
Spectral Unmixing using Machine Learning
(Indian Statistical Institute, 2026-06-15) Dhar, Debashis
Spectral unmixing is an active field of study that decomposes each pixel into the fractional abundances of its constituent materials. In this thesis, we unmix each pixel into three endmembers, namely glacial lake, debris, and others, focusing primarily on glacial lakes. We apply several linear and non-linear spectral unmixing methods to Landsat data collected over eastern Himalayan terrain. Experimental results demonstrate the effectiveness of the proposed approach in achieving high accuracy and efficiency in glacial lake tracking on Landsat data.
Efficiency Improvement of RAG based SLM for Edge Devices
(Indian Statistical Institute, 2026-06-16) Avanigadda, Pavan Prashanth
The increasing need to deploy language models on constrained devices has exposed efficiency issues in retrieval-augmented generation (RAG) approaches. Although RAG improves answer quality by retrieving knowledge from external sources, current methods use static retrieval mechanisms, resulting in unnecessary computation, higher latency, and inefficient resource usage. In this work, an efficient RAG approach based on small language models (SLMs) is presented, which uses an adaptive retrieval scheme. This method dynamically adjusts the retrieval depth and context construction based on the complexity of the query, using a trained MLP router whose routing decisions are learned from Adaptive-RAG-style oracle labels rather than hand-written rules, striking a balance between performance and efficiency. A full pipeline is provided, including dataset preprocessing, corpus generation, embedding construction, vector indexing, retrieval, and answer generation. Experiments were performed on the HotpotQA and SQuAD 2.0 benchmark datasets, comparing the presented approach with a baseline RAG approach that uses a static retrieval scheme. Experimental results show that the proposed approach lowers the computation cost while providing similar answer quality. By adaptively controlling retrieval and context size, the framework provides an effective solution for deploying RAG systems in constrained environments.
Developing a Model to Generate More Digital Data of Indian Languages for Multilingual Applications
(Indian Statistical Institute, 2026-06-16) Bagde, Arya
Most of India’s scheduled languages remain critically under-served by language technology because parallel (translated) text, the raw material that modern multilingual systems depend on, is extremely scarce. Back-translation can synthesise such data automatically, but its quality varies enormously, and unfiltered synthetic data can be worse than no data at all. This dissertation develops a framework that generates synthetic parallel data for four low-resource Indian languages spanning three language families and four scripts: Assamese (Indo-Aryan, Bengali script), Bodo (Tibeto-Burman, Devanagari), Manipuri (Tibeto-Burman, Bengali script) and Santali (Austroasiatic, Ol Chiki). It introduces CASCADE, a learned multi-signal quality gate that scores each synthetic pair from four cheap signals: semantic similarity, round-trip consistency, length ratio, and language-identification confidence. CASCADE attains a held-out ROC–AUC of 0.954 and 91.7% accuracy at separating well-aligned pairs from misaligned ones, outperforming every single-signal filter (best single signal: round-trip chrF++, AUC 0.932). An ablation shows that round-trip consistency is the dominant signal, while language-identification confidence carries no quality information (AUC 0.534). Three algorithms are contributed on top of the gate: a cost-aware staged cascade that reduces filtering compute by 37.6%, a quality-diversity selector that preserves the data diversity that naive top-K filtering destroys, and an adaptive operating-point selector. Finally, a single multilingual generator of 52.5M parameters is trained from scratch on the curated, tagged data; steered by a target-language token, it produces text in all four languages in their correct scripts, and the low-resource languages benefit measurably from joint training. An honest analysis characterises when quality filtering improves downstream translation and when generation quality, rather than selection, is the binding constraint.
Non-Euclidean Geometries and Fairness Constraints in Advanced Clustering
(Indian Statistical Institute, 2026-06-23) Seal, Arnab
A fundamental challenge in modern unsupervised learning is adapting classical clustering algorithms to handle complex, real-world data constraints. Traditional models often assume data resides in a flat, Euclidean space and optimize strictly for cluster cohesion, thereby failing to capture intrinsic hierarchical structures and ignoring sociotechnical demographic biases. This thesis addresses these critical limitations by extending generalized mean-shift dynamics into two novel clustering frameworks. First, to natively accommodate data with tree-like structures (e.g., taxonomies and social networks), we propose Hyperbolic Gaussian Blurring Mean Shift (HypeGBMS). By projecting data into the Poincaré ball model and utilizing Möbius vector space operations, HypeGBMS successfully generalizes density-based clustering to non-Euclidean manifolds. Second, to tackle algorithmic bias in noisy datasets, we introduce Fair Possibilistic C-Means (F-PCM). By embedding a group-fairness Kullback-Leibler divergence penalty into the possibilistic objective function, F-PCM explicitly enforces demographic parity without sacrificing the outlier-robust nature of possibilistic typicalities. We provide rigorous theoretical proofs for both methodologies, including convergence guarantees, statistical consistency, and optimization bounds via Majorization-Minimization. Extensive experiments on complex real-world datasets demonstrate that HypeGBMS dramatically improves cluster quality on hierarchical data, and that F-PCM maintains strict fairness criteria while matching the computational efficiency of traditional baselines.
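As background for the Möbius operations mentioned above, here is a minimal sketch of Möbius addition and the induced geodesic distance on the Poincaré ball. It assumes unit curvature (c = 1) and uses the standard textbook formulas; it is not taken from the dissertation, and the function names are illustrative.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mobius_add(x, y):
    """Mobius addition x (+) y on the unit Poincare ball (curvature -1 assumed)."""
    xy, xx, yy = dot(x, y), dot(x, x), dot(y, y)
    denom = 1 + 2 * xy + xx * yy
    return [((1 + 2 * xy + yy) * a + (1 - xx) * b) / denom for a, b in zip(x, y)]


def poincare_distance(x, y):
    """Geodesic distance d(x, y) = 2 * artanh(|| (-x) (+) y ||)."""
    diff = mobius_add([-a for a in x], y)
    return 2 * math.atanh(math.sqrt(dot(diff, diff)))


if __name__ == "__main__":
    origin, p = [0.0, 0.0], [0.3, 0.4]
    print(poincare_distance(origin, p))  # distance from the centre of the ball
```

A hyperbolic mean-shift variant would presumably replace Euclidean differences and distances with operations like these, but the exact update rule used by HypeGBMS is not given in the abstract.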
Enhanced Embedding for Multimodal Medical Visual Question and Answering
(Indian Statistical Institute, 2026-06-19) Suna, Akash
Medical visual question answering (VQA) has grown into a prominent research area that fuses natural language processing (NLP) and computer vision (CV) to assist in medical decision-making. However, effective multimodal fusion between medical images and clinical questions remains a significant challenge. This thesis examines the application of the Perceiver IO architecture as an efficient multimodal aggregator for medical VQA. The work has been carried out in multiple directions. First, a classification-based framework is developed by combining a Vision Transformer (ViT) and ClinicalBERT with a Perceiver IO aggregator to perform multimodal fusion for generating answers to medical questions. In the second approach, the Florence-2 vision-language model is fine-tuned with parameter-efficient techniques to enable generative medical VQA. Finally, a hybrid architecture is introduced, where Perceiver IO is employed as a fusion module to integrate visual and textual representations, which are then used to condition the Florence-2 model for answer generation. The VQA-RAD dataset and the ImageCLEF Med VQA 2019 dataset are used for the experiments. Through these explorations, the thesis examines both the classification and generative paradigms in medical visual question answering and analyzes the importance of Perceiver IO for multimodal understanding. Several challenges encountered in hybrid modeling are discussed, along with directions for future research, including the development of more effective fusion strategies and the extension of the proposed approaches to larger medical VQA datasets. Overall, this thesis contributes to the understanding of efficient multimodal fusion techniques and provides insights for building both classification-based and generative medical VQA systems.
Kernelizing Protein Interaction Languages: Spectral Approximations and Random Fourier Features
(Indian Statistical Institute, 2026-06-17) Ghosh, Aishik
Protein-peptide interactions play an important role in many biological phenomena, from adaptive immunity to disease pathology. In the Sliding Window Interaction Grammar (SWING) framework, interactions are represented as sequences of biochemical tokens embedded using Doc2Vec, allowing robust generalisation to unobserved MHC alleles. However, classification remains limited to a single Euclidean feature space that is incapable of resolving binding landscapes. This dissertation develops SWING for four distinct kernel types: Gaussian, Laplacian, anisotropic (ARD), and the Spectral Mixture (SM) kernel, each approximated using scalable Random Fourier Features. The SM kernel encodes prior knowledge about secondary structure in its spectral density, optimised by kernel target alignment, and the ARD kernel learns dimension-specific scales to downweight noise. Late-fusion ensembling is achieved through stacked meta-classification across diverse feature spaces. Evaluating the proposed method across five peptide-MHC binding datasets reveals that the SM kernel attains the highest AUROC in all settings, capturing 39% to 80% of the remaining headroom towards perfect prediction in each, including 80% in a mixed-class dataset on which other kernels saturate. These results show that directly encoding secondary structure periodicity into the kernel leads to consistent and generalising improvements compared to the SWING approach.
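To illustrate the Random Fourier Features idea in its simplest form, the sketch below approximates a Gaussian kernel with random cosine features, following the standard construction of sampling frequencies from the kernel's spectral density. It covers only the Gaussian case; the bandwidth, feature count, seed, and function names are illustrative assumptions, not the dissertation's settings.

```python
import math
import random


def make_rff(dim: int, n_features: int, sigma: float, seed: int = 0):
    """Return a feature map z(x) with z(x).z(y) ~ exp(-||x - y||^2 / (2 sigma^2))."""
    rng = random.Random(seed)
    # Frequencies drawn from the Gaussian kernel's spectral density N(0, sigma^-2 I).
    omegas = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
              for _ in range(n_features)]
    # Random phases drawn uniformly from [0, 2*pi].
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def features(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(omegas, phases)]

    return features


def gaussian_kernel(x, y, sigma: float) -> float:
    """Exact Gaussian (RBF) kernel, used here only for comparison."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


if __name__ == "__main__":
    z = make_rff(dim=3, n_features=5000, sigma=1.0)
    x, y = [0.1, 0.2, 0.3], [0.4, 0.0, -0.1]
    approx = sum(a * b for a, b in zip(z(x), z(y)))
    print(approx, gaussian_kernel(x, y, 1.0))
```

Other shift-invariant kernels would be approximated the same way by sampling frequencies from a different spectral density, for example a Cauchy distribution for the Laplacian kernel or a mixture of Gaussians for a Spectral Mixture kernel.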
Simultaneous Tumor Delineation and Report Generation from Brain MR Images
(Indian Statistical Institute, 2026-06-15) Mallik, Adish
Brain tumor analysis is an important application of medical image computing, where accurate segmentation and interpretation of tumor regions can support diagnosis and treatment planning. However, existing methods often address tumor segmentation and radiology report generation as separate tasks. Moreover, one of the major challenges in report generation tasks from MRI is accurate tumor localization. Most models fail to locate the lobe and hemisphere in which the tumor is located, causing incorrect report generation. In this regard, a unified 3D vision-language framework is proposed for simultaneous brain tumor segmentation and report generation from multi-modal MRI. Given T1, T2, T1C, and FLAIR volumes, the proposed model predicts clinically meaningful tumor regions and generates a textual description of tumor location and appearance. In order to encourage consistency between the predicted segmentation and generated report, the proposed model judiciously integrates a Swin UNETR-based 3D encoder-decoder, a multi-scale lesion tokenizer, auxiliary clinical grounding heads, and an iterative crossmodal refinement module. Moreover, anatomy and laterality heads are introduced, which provide clinical hints to the LLM, allowing better tumor localization. Further, an iterative refinement is incorporated so that the generated report and segmented outputs can refine each other, finally producing better outputs. Extensive experimentation on BraTS2020 and TextBraTS data sets shows that the proposed model achieves a mean Dice of 81.60% and HD95 of 6.21. In addition, the model achieves a BERTScore-F1 of 0.9226, clinical laterality F1 of 0.8459, clinical anatomy F1 of 0.7539 and clinical pathology F1 of 0.9976. These results indicate that the proposed framework can generate clinically meaningful reports while accurately localizing tumor regions and maintaining strong alignment between segmentation and textual interpretation.
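For reference, the Dice score reported above measures overlap between a predicted and a ground-truth mask. The following is a minimal sketch for flattened binary masks. Returning 1.0 when both masks are empty is a common convention assumed here, and the function name is illustrative; the dissertation's exact evaluation protocol (per-region averaging, handling of empty regions) is not stated in the abstract.

```python
def dice_coefficient(pred: list[int], target: list[int]) -> float:
    """Dice = 2 * |P intersect T| / (|P| + |T|) for flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


if __name__ == "__main__":
    # One overlapping voxel, two predicted, one true: Dice = 2 * 1 / (2 + 1).
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))
```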
An Efficient Hierarchical Deployment Of Sensors For K-coverage In Planner Wireless Sensor Network
(Indian Statistical Institute, 2026-06-17) Singh, Abhay Raj
Ensuring reliable sensing coverage is a fundamental challenge in wireless sensor networks (WSNs), particularly when multiple sensors must monitor each location to provide robustness against node failures. In this work, we address the problem of deterministic k-coverage in planar WSNs by proposing a hierarchical triangular lattice-based deployment strategy that organizes sensor locations across different refinement levels and guarantees coverage of every point in the sensing domain by at least k sensors. Each lattice is three-colorable, and selective activation of color classes ensures adjustable coverage guarantees. We prove that activating a single color class at refinement level t guarantees at least 4^t-coverage inside a triangular region. Using a base-4 decomposition of the required coverage level k, we construct a deployment scheme that achieves arbitrary k-coverage while minimizing the number of sensors. For a finite irregular hexagonal (IRH) domain, we derive closed-form expressions for the exact number of sensors and the minimum sensor spacing required to ensure k-coverage. Analytical comparison with existing IRH edge-overlap, IRH inner-diamond, and square-band based deployment strategies shows that the proposed method either reduces the required number of sensors or increases the minimum sensor spacing while maintaining similar coverage guarantees. The analytical results are also validated against extensive simulations. The proposed framework provides a constructive and scalable approach for efficient sensor deployment in large-scale wireless sensing systems.
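The base-4 decomposition step can be sketched as follows. The sketch assumes that one color class at refinement level t contributes 4^t coverage (reading the stated bound as a power of four) and that the digit d_t of k in base 4, which is at most 3, is the number of color classes activated at level t. That mapping is an interpretation of the abstract, not a confirmed description of the dissertation's scheme, and the function names are illustrative.

```python
def base4_digits(k: int) -> list[int]:
    """Digits of k in base 4, least significant first: k = sum(d_t * 4**t)."""
    if k < 0:
        raise ValueError("coverage level k must be non-negative")
    digits = []
    while k > 0:
        k, d = divmod(k, 4)
        digits.append(d)
    return digits or [0]


def coverage_from_digits(digits: list[int]) -> int:
    """Coverage obtained if d_t color classes are active at level t (assumed model)."""
    return sum(d * 4 ** t for t, d in enumerate(digits))


if __name__ == "__main__":
    # 27 = 3 * 4**0 + 2 * 4**1 + 1 * 4**2
    print(base4_digits(27))
```

Because every base-4 digit lies between 0 and 3, it never exceeds the three color classes available at a level, which is consistent with the three-colorability noted above.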
