Dissertations - M Tech (CS)
Permanent URI for this collection: http://164.52.219.250:4000/handle/10263/2147
These dissertations were submitted in partial fulfilment of the requirements for the award of the M TECH (Computer Science) degree of the Indian Statistical Institute.
586 results
Search Results
Item Glacier velocity estimation using Adaptive Search Window and Patch size (Indian Statistical Institute, Kolkata, 2025-06) Patil, Pratik
Synthetic Aperture Radar (SAR) technology offers a robust solution for monitoring glacier surface motion, particularly in regions with challenging environmental conditions, since it does not depend on time of day or weather. This paper presents an enhanced glacier motion monitoring approach based on a Deep Matching Network (DMN), which learns patch-pair correspondences in an end-to-end manner. Unlike traditional shallow feature tracking methods, DMN utilizes deep feature similarity through a Siamese network architecture with dense connection blocks to maximize feature reuse and improve training efficiency. To further improve precision and reduce computational cost, the proposed method uses a variable search window and adaptive patch sizing, enabling efficient and accurate motion estimation across diverse glacier terrains. Experimental results demonstrate the effectiveness of the proposed approach in achieving high accuracy and efficiency in glacier motion tracking on SAR data.

Item Complexity Results in Some Clustering Algorithms (Indian Statistical Institute, Kolkata, 2025-06) Das, Rajdeep
Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a prevalent unsupervised clustering method renowned for its capability to recognize arbitrarily shaped clusters and detect noise in spatial data. Unlike partitioning methods such as k-means, DBSCAN does not require a predefined number of clusters as input and is particularly effective in handling datasets with varying densities. In this dissertation, we have undertaken an in-depth exploration of the DBSCAN algorithm. We reviewed and analyzed several research papers that build upon or revise the original DBSCAN framework, with the goal of understanding their motivations, design choices, and computational implications.
In addition to studying the foundational principles, we examined traditional spatial data structures that are commonly employed to accelerate DBSCAN, such as R-trees and KD-trees. This background enabled us to identify key computational bottlenecks in both neighbor search and density estimation. Building on these insights, we proposed two novel algorithms. The first is an approximate algorithm that efficiently replicates standard DBSCAN behavior, and the second is a modified version termed Box-based DBSCAN, which operates under a slightly altered definition of neighborhood using axis-aligned bounding boxes. The box-based approach improves clustering performance for geometrically structured data and introduces new ways to identify core regions without relying on exhaustive point-wise comparisons.

Item Reducing Attention Complexity in Graph Transformers through Subgraph Partitioning (Indian Statistical Institute, Kolkata, 2025-06) Choubey, Ranjan Kumar
This dissertation addresses the challenge of scaling Graph Transformers by proposing a subgraph-based strategy to reduce attention complexity. The proposed framework preserves representational power while making attention computation tractable for large-scale graphs. The method begins by partitioning the input graph into K subgraphs using the METIS algorithm. Each subgraph is encoded using a combination of local structural features from a Graph Convolutional Network (GCN) and global positional cues from Laplacian Positional Embeddings (LPEs). These embeddings are fused via a trainable projection function to form subgraph tokens. A supergraph is constructed to model interactions among subgraphs, allowing attention to be applied over a K × K matrix instead of the full n × n space, thereby reducing complexity from O(n²) to O(K²). Finally, a component-aware prediction strategy maps subgraph-level predictions to individual nodes using learned weights and regularization.
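The reduction from an n × n to a K × K attention matrix described above can be illustrated with a small sketch. It is not the dissertation's implementation: the partition `assignment` is written by hand rather than produced by METIS, and mean pooling stands in for the GCN and LPE fusion step. All function and variable names are hypothetical.

```python
import math

def pool_subgraph_tokens(features, assignment, k):
    """Mean-pool node feature vectors into one token per subgraph.

    Assumption: mean pooling replaces the learned GCN + LPE fusion.
    """
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for vec, part in zip(features, assignment):
        counts[part] += 1
        for j, value in enumerate(vec):
            sums[part][j] += value
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]

def attention_weights(tokens):
    """Row-wise softmax of scaled dot-product scores between tokens."""
    scale = math.sqrt(len(tokens[0]))
    weights = []
    for query in tokens:
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in tokens]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Toy graph: 6 nodes with 2-dimensional features, split into K = 3 parts.
node_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0],
                 [0.1, 0.9], [0.5, 0.5], [0.4, 0.6]]
assignment = [0, 0, 1, 1, 2, 2]  # hand-written stand-in for a METIS partition
K = 3

tokens = pool_subgraph_tokens(node_features, assignment, K)
weights = attention_weights(tokens)  # K x K matrix

n = len(node_features)
full_entries = n * n      # entries needed for node-level attention
reduced_entries = K * K   # entries needed for subgraph-level attention
```

On this toy input the attention matrix shrinks from 36 entries to 9. The saving grows quadratically as n increases while K stays fixed.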
Empirical evaluations demonstrate that the framework delivers higher accuracy, improved convergence, and scalability across diverse benchmark datasets.

Item Efficient Blending of Large Language Models (Indian Statistical Institute, Kolkata, 2025-06) Chatterjee, Sandeep
Due to the limited capabilities of single Large Language Models (LLMs), multiple LLMs can be employed in tandem for better reliability of answers. Blending refers to combining the strengths of various LLMs to make use of their complementary capabilities for generating high-quality responses. It is a non-trivial problem, and the task becomes even more difficult when aiming for minimal latency and supervising the blending components. The standard framework, LLM-Blender, approaches this in three stages: response generation, candidate selection via ranking, and response fusion through summarization. However, this pipeline faces two critical limitations: high latency due to repeated ranking steps, and heavy reliance on external, supervised components including a learned encoder for ranking and a separate sequence-to-sequence summarizer for fusion.
In this thesis, we propose novel, efficient alternatives to overcome these challenges. This thesis comprises two works. First, we show that reducing the frequency of ranking within multi-turn conversations significantly improves latency with minimal degradation in output quality. Second, we introduce a peer-review-based response fusion mechanism, where LLMs collectively evaluate and revise each other’s responses, removing the need for any externally trained rankers or summarizers. This collaborative method enables fully self-contained LLM blending without additional training or supervision.
We assess our proposed methods on the task of Conversational Question Answering across five multi-turn conversational benchmarks (ConvQuestions, Atlas-Converse, CoQA, QuAC, and DoQA) using ten diverse, publicly available open-weight LLMs. Experimental results demonstrate that our peer-review-driven framework with reduced ranking achieves quality on par with existing approaches while being substantially more efficient. Our work presents a step toward scalable, modular LLM ensembling for real-world open-domain dialogue systems.

Item Explanation and Judgement of IR Ranking using LLM (Indian Statistical Institute, Kolkata, 2024-06) Mondal, Santanu
Pretrained transformer models such as BERT and T5 have significantly advanced the performance of information retrieval (IR) systems when fine-tuned with large-scale labeled datasets. However, their effectiveness diminishes notably in low-resource scenarios where annotated query-passage pairs are limited. This thesis explores an alternative supervision strategy by leveraging natural language explanations to enhance training signals during fine-tuning. We propose a novel methodology that augments traditional relevance labels with textual explanations generated by a large language model (LLM) using few-shot prompting. To achieve this, we generate explanations for 30,000 query-passage-label triples from the MS MARCO dataset using the open-source model google/gemma-2b, allowing for cost-free and scalable inference. These augmented samples are then used to fine-tune a T5-base sequence-to-sequence model, with the objective of producing both the relevance label and an accompanying explanation. During inference, the model predicts the label token, and the probability of that token is used as a soft relevance score, enabling efficient ranking. Empirical results demonstrate that our explanation-augmented retriever outperforms strong baselines, including BM25, a BERT reranker, and a T5 model trained with labels only.
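The idea of using the label token's probability as a soft relevance score can be sketched as follows. This is one common way to do it, not necessarily the thesis's exact procedure: the sketch assumes the decoder exposes one logit for a "relevant" label token and one for a "not relevant" label token, and normalizes over just those two. The logit values and passage ids are made up for illustration.

```python
import math

def soft_relevance_score(logit_relevant, logit_not_relevant):
    """Probability of the 'relevant' label token under a two-way softmax.

    Assumption: only the two label-token logits are normalized.
    """
    top = max(logit_relevant, logit_not_relevant)  # for numerical stability
    exp_rel = math.exp(logit_relevant - top)
    exp_not = math.exp(logit_not_relevant - top)
    return exp_rel / (exp_rel + exp_not)

def rank_passages(label_logits):
    """Return passage ids sorted by descending soft relevance score."""
    scores = {
        pid: soft_relevance_score(rel, not_rel)
        for pid, (rel, not_rel) in label_logits.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical (relevant, not relevant) logits for three passages of one query.
label_logits = {"p1": (2.0, 0.5), "p2": (-1.0, 1.0), "p3": (0.3, 0.2)}
ranking = rank_passages(label_logits)
```

Because the score is read from a single decoding step, ranking does not require generating the explanation at inference time. That is consistent with the efficiency claim in the abstract.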
We further analyze the effectiveness of explanation order, training data size, and the quality of generated rationales. Our findings suggest that natural language explanations offer a powerful form of supervision, particularly valuable in data-scarce IR settings, and present a compelling direction for improving neural retrievers with minimal annotation overhead.

Item Modeling and Verification of Sigma Delta Neural Networks (Indian Statistical Institute, Kolkata, 2025-06) Das, Sirshendu
In the context of modern-day embedded safety-critical systems and low-resource edge devices in particular, Sigma-Delta Neural Networks (SDNNs) offer a promising alternative to traditional Artificial Neural Networks (ANNs) by leveraging event-driven, sparse computations inspired by biological neural processing. This energy-efficient paradigm makes SDNNs well-suited for neuromorphic hardware and real-time applications, particularly in scenarios with temporal redundancy, such as video processing. However, as neural networks become integral to safety-critical systems, ensuring their robustness against adversarial perturbations is an absolute necessity. In this work, we propose an end-to-end framework for formal modeling and verification of SDNNs using Satisfiability Modulo Theory (SMT). Unlike empirical robustness evaluations, SMT-based verification provides formal guarantees by encoding SDNN behavior and adversarial robustness properties as mathematical constraints. We introduce an SMT-based formulation for encoding SDNNs with SMT constraints and define a robustness property motivated by video stream processing. Our approach systematically examines how well SDNNs can handle adversarial attacks, ensuring they work correctly in safety-critical applications. We validate our framework through experiments on a temporal version of the MNIST dataset.
To the best of our knowledge, this is the first formal verification framework for SDNNs, bridging the gap between neuromorphic computing and rigorous verification. We also focus on applying the proposed SDNN verification methodology to a real-world deep learning system: PilotNet, an end-to-end model for steering angle prediction in autonomous vehicles.

Item Geometry Based UAV Trajectory Planning for Mixed User Traffic in mmWave Communication (Indian Statistical Institute, Kolkata, 2025-06) Hasan, Sk Abid
Unmanned aerial vehicle (UAV) assisted communication is a revolutionary technology that has been recently presented as a potential candidate for beyond fifth-generation millimeter wave (mmWave) communications. Although mmWaves can offer a notably high data rate, their high penetration and propagation losses mean that line of sight (LoS) is necessary for effective communication. Due to the presence of obstacles and user mobility, UAV trajectory planning plays a crucial role in improving system performance. In this work, we propose a novel computational geometry-based trajectory planning scheme by considering the user mobility, the priority of the delay-sensitive ultra-reliable low-latency communications (URLLC) and the high throughput requirements of the enhanced mobile broadband (eMBB) traffic. Specifically, we use geometric tools such as the Apollonius circle and the minimum enclosing ball of balls to find the optimal position of the UAV that supports uninterrupted connections to the URLLC users and maximizes the aggregate throughput of the eMBB users.
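For readers unfamiliar with the Apollonius circle mentioned above, the following sketch computes it from its textbook definition: the set of points P whose distances to two fixed points A and B are in a constant ratio. How the thesis applies the circle to UAV placement is not described in the abstract, so this shows only the geometric primitive. The function name and the example coordinates are hypothetical.

```python
import math

def apollonius_circle(a, b, ratio):
    """Center and radius of the circle of points P with |PA| / |PB| == ratio.

    Requires ratio > 0 and ratio != 1 (for ratio == 1 the locus is the
    perpendicular bisector of AB, not a circle).
    """
    if ratio <= 0 or ratio == 1:
        raise ValueError("ratio must be positive and different from 1")
    k2 = ratio * ratio
    center_x = (a[0] - k2 * b[0]) / (1 - k2)
    center_y = (a[1] - k2 * b[1]) / (1 - k2)
    radius = ratio * math.dist(a, b) / abs(1 - k2)
    return (center_x, center_y), radius

# Points twice as far from A as from B, with A and B on the x-axis.
point_a = (0.0, 0.0)
point_b = (3.0, 0.0)
center, radius = apollonius_circle(point_a, point_b, 2.0)

# Sanity check: sample a point on the circle and recompute the distance ratio.
angle = 1.0
sample = (center[0] + radius * math.cos(angle),
          center[1] + radius * math.sin(angle))
sample_ratio = math.dist(sample, point_a) / math.dist(sample, point_b)
```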
Finally, the numerical results demonstrate the benefits of the suggested approach over an existing state-of-the-art benchmark scheme in terms of sum throughput obtained by URLLC and eMBB users.

Item Understanding Batch-Normalization in Deep Neural Networks (Indian Statistical Institute, Kolkata, 2025-06) Srujan, Pendyala Sai
Batch Normalization (BN) is a commonly used technique in various deep learning architectures for tasks such as image classification and object detection. It stabilizes and accelerates training by normalizing the activations of intermediate layers using the mean and variance of the batch, allowing the use of higher learning rates and often improving generalization through implicit regularization. During inference, BN uses running estimates of batch statistics accumulated during training. However, if individual batches are not representative of the overall data distribution, these accumulated statistics may not accurately approximate the population statistics. This discrepancy can lead to a phenomenon known as **estimation shift**, which impairs the model’s generalization performance. In this project, we study the behavior of estimation shift in deep learning models using BN and explore techniques to mitigate its effects. Specifically, we introduce **dynamicity** in the momentum parameter of the BN layer (DMBN) while computing exponential moving averages and evaluate its impact under various architectural configurations. We use the MNIST, FashionMNIST, and CIFAR-10/100 datasets to train and test both simple Deep Neural Networks (DNNs) and deeper Convolutional Neural Networks (CNNs) such as ResNet-50. Our experiments are conducted in two phases: first, by varying the static momentum parameter across different values, and second, by introducing layer-wise dynamic momentum where each layer is assigned the momentum (or equivalently, β) that minimizes estimation shift.
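The role of momentum in the running statistics, and the idea of choosing the momentum that minimizes estimation shift, can be sketched on scalar activations. Several things here are assumptions rather than details from the abstract: the update convention `new = (1 - momentum) * old + momentum * batch_stat`, the use of the absolute gap between the running mean and a known population mean as the shift measure, and every name and number in the toy data.

```python
def update_running_stats(running_mean, running_var, batch, momentum):
    """One exponential-moving-average step over a batch of scalar activations.

    Assumed convention: new = (1 - momentum) * old + momentum * batch_stat.
    The batch variance here is the biased (divide-by-n) estimate.
    """
    n = len(batch)
    batch_mean = sum(batch) / n
    batch_var = sum((x - batch_mean) ** 2 for x in batch) / n
    new_mean = (1 - momentum) * running_mean + momentum * batch_mean
    new_var = (1 - momentum) * running_var + momentum * batch_var
    return new_mean, new_var

def estimation_shift(batches, momentum, population_mean):
    """Gap between the final running mean and the population mean.

    Assumption: this absolute gap stands in for the shift measure.
    """
    running_mean, running_var = 0.0, 1.0  # typical initial values
    for batch in batches:
        running_mean, running_var = update_running_stats(
            running_mean, running_var, batch, momentum
        )
    return abs(running_mean - population_mean)

# Toy layer whose last batch is unrepresentative of the earlier ones.
batches = [[1.0, 3.0], [2.0, 4.0], [9.0, 11.0]]
population_mean = 5.0  # mean of all six activations
candidates = [0.1, 0.5, 0.9]

shifts = {m: estimation_shift(batches, m, population_mean) for m in candidates}
best_momentum = min(shifts, key=shifts.get)
```

A layer-wise scheme would repeat this selection once per BN layer. In this toy case a small momentum lags behind the data and a large one overreacts to the final batch, so the middle candidate gives the smallest gap.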
The performance of the proposed method, DMBN, is evaluated using various performance metrics such as sensitivity, specificity, accuracy, and F-score. DMBN is compared with the existing BN-BFN method and is observed to perform better in most cases. For example, on the FashionMNIST data, the accuracy values achieved by DMBN and BN-BFN are 0.889 and 0.853, respectively.

Item Binary Document Filtering for Retrieval-Augmented Generation (Indian Statistical Institute, Kolkata, 2025-06) Saha, Sreyan
Retrieval-Augmented Generation (RAG) has become a popular technique to enhance Large Language Models (LLMs) with access to external information sources. However, the success of RAG systems critically depends on the relevance and quality of the retrieved documents. In particular, supplying irrelevant or noisy context can lead to degraded downstream generation quality. To address this, our project focuses on improving the document filtering stage in a RAG pipeline through binary relevance classification: deciding whether a retrieved document is suitable to include in the final context window based on its usefulness in directly answering the user query. We explore a wide range of approaches to this task, including rule-based retrieval methods (TF-IDF, BM25), classical machine learning classifiers (logistic regression, SVM), deep neural networks, and LLM-based methods, both in zero-shot and few-shot settings. Our final pipeline leverages instruction-tuned LLMs to act as strict binary classifiers, with a focus on maximizing precision over recall, thereby ensuring that only the most relevant and high-quality documents are passed to the generation module. Experiments are conducted on a Reddit-based query-document dataset tailored to subjective and opinion-heavy queries.
Our evaluations suggest that LLMs, even without fine-tuning, can outperform traditional methods in this setting, offering a strong foundation for further enhancement through supervised fine-tuning.

Item Detection of Fake News in Short Videos: A Multimodal Approach (Indian Statistical Institute, Kolkata, 2025-06) Kumari, Mona
The rise of generative models and affordable video editing tools has fueled the spread of fake and manipulated videos, undermining information reliability, especially on social media. Traditional detection methods, focused on single modalities like visual artifacts or text cues, often struggle with diverse, user-generated content. This dissertation presents a unified framework for fake video detection that integrates multimodal semantics, narrative structure, and propagation behavior. Visual, audio, text, and OCR features are extracted using pretrained models (CLIP, Wav2Vec2), and segment-level graphs are built to model narrative flow using Graph Attention Networks (GATv2Conv). User engagement dynamics are modeled via a bidirectional LSTM. A cross-modal consistency loss encourages semantic alignment across modalities, improving representational coherence. The end-to-end model is evaluated on heterogeneous datasets like FakeTT, demonstrating strong generalization and robustness. Results show the proposed system outperforms existing baselines, especially in challenging cases with asynchronous or fragmented content. By combining content, structure, and behavioral cues, the framework enables more reliable and interpretable fake video detection.
