Dissertations - M Tech (CS)

Permanent URI for this collection: https://dspace.isical.ac.in/handle/10263/2147

These dissertations were submitted in partial fulfilment of the requirements for the award of the M Tech (Computer Science) degree of the Indian Statistical Institute.

  • A Switch-Point-Aware Contrastive Approach to Sentiment Analysis of Hinglish Code-Mixed Text
    (Indian Statistical Institute, 2026-06-15) Sahoo, Prasant Kumar
    With the increasing use of social media in non-English-speaking regions, especially India, people often mix Romanized Hindi and English within a single sentence in their online communication, producing code-mixed text. However, most multilingual transformer models are pre-trained primarily on monolingual data. As a result, NLP systems struggle with code-mixed text: a single word may be fragmented into meaningless subword pieces, making it difficult for the model to capture its meaning accurately. In this dissertation, we propose a parameter-efficient neural architecture with three main components to address these challenges. First, a character-level CNN encoder handles spelling variants such as "nahi", "nahin", "nah", and "nai" through character n-gram patterns. Second, an XLM-R backbone (Conneau et al., 2019), frozen except for its top three layers, which are fine-tuned at a reduced learning rate, provides rich cross-lingual embeddings. Third, a switch-point-aware bilingual gate detects where the language label switches and blends two adapters using a learned gate weight. During training, the model uses Supervised Contrastive Loss to learn better feature representations and Cross-Entropy Loss for classification. Because human annotators agreed on labels only 55% of the time, we apply label smoothing to reflect this uncertainty and keep the model from becoming overconfident on noisy labels. Evaluated on the SentiMix 2020 benchmark (Patwa et al., 2020), the proposed architecture achieves a weighted F1 score of 0.705, outperforming the M-BERT baseline (0.654 F1) and performing comparably to fully fine-tuned transformer models while requiring only one-tenth of the trainable parameters. Adapter gate visualizations provide interpretable evidence that the gating mechanism captures linguistically meaningful code-mixing structure.
The architecture is designed to generalize to other code-mixed language pairs through its modular adapter design.
  • A Study of Prompt Tuning on Small Language Models (SLMs): A Controlled Benchmark and a Lightweight Instance-Aware Method
    (2026-06-16) Sahith, Narkadamilli
    Parameter-efficient fine-tuning (PEFT) adapts a frozen pre-trained language model by training only a small number of additional parameters. Among PEFT approaches, prompt tuning prepends trainable continuous vectors (soft prompts) to the input. A recurring finding in the literature is that prompt tuning is strongly scale-dependent: it rivals full fine-tuning on very large models but lags on smaller ones. This dissertation studies prompt tuning specifically in the small-language-model (SLM) regime. We (i) re-implement a representative set of prompt-tuning methods (Prompt Tuning, P-Tuning v2, LoPT, DPT, DePT, ACCEPT, Residual Prompt Tuning, and PARA) within a single controlled harness, enabling a fair head-to-head comparison against full fine-tuning; (ii) propose IA-DePT, a lightweight instance-aware extension of Decomposed Prompt Tuning that conditions the short soft prompt on each input through a small, zero-initialised gate; and (iii) extend the benchmark beyond a single backbone and task, evaluating the full method suite on six backbone/task settings that span encoder–decoder (t5-small), encoder-only (BERT-base, RoBERTa-base, ELECTRA-small), and decoder-only (DistilGPT-2) architectures across the GLUE/SuperGLUE tasks RTE, WSC, CB, COPA, WiC, and MRPC. On RTE with t5-small, IA-DePT is the strongest parameter-efficient method in our benchmark (55.6% single-seed accuracy) and improves on its base method, DePT, by 6.5 points (53.6% vs. 47.1%, mean over three seeds) while adding only ≈16.9k parameters, for a total trainable footprint of 0.05% of the backbone. Because the gate reduces exactly to DePT at initialisation, the comparison is a clean single-variable ablation. The cross-architecture study shows that the instance gate improves on DePT in five of the six settings on each setting’s primary metric (it ties or marginally regresses only on WiC, where every PEFT method performs at chance level), so the benefit is broad but not universal.
Our analysis characterises the accuracy/parameter trade-offs across method families, the strong effect of task difficulty in the small-model regime, and the role of instance-conditioning, including an honest discussion of why many prompt-tuning methods remain close to the chance baseline at this scale.
  • Extending UniBreak: Semantic Retrieval and Harmful-Intent Direction Suppression for Token-Level LLM Jailbreaking
    (Indian Statistical Institute, 2026-06-15) Saha, Sanket
    Token-level adversarial perturbations remain one of the most efficient known attacks against the safety alignment of instruction-tuned large language models (LLMs). Among recent works, the UniBreak framework (You et al., 2026) stands out for unifying gradient-based optimization with an evolutionary perturbation repository. However, its repository relies solely on accumulated success frequency without using query content, and its fitness function implicitly assumes that suppressing refusal tokens is sufficient to elicit harmful responses. In this dissertation, we extend UniBreak along both axes and re-evaluate the framework under stricter generalization and judgment protocols. Specifically, we introduce a semantic perturbation repository that replaces frequency-only retrieval with a geometric interpolation between historical frequency and sentence-encoder cosine similarity. Furthermore, we use Harmful-Intent Direction Suppression (HIDS) to augment the fitness function by explicitly penalizing the projection of the model’s residual stream onto a validated harmful-intent direction. To separate genuine cross-query generalization from within-dataset memorization, we introduce a two-phase frozen-repository evaluation protocol. Results are evaluated under two complementary judges: a binary classification judge and a 0-10 actionability scoring judge. The scoring judge itself is subsequently analysed through Grad×Input attribution.
  • American Sign Language Recognition and Analysis Using Deep Learning
    (2026-06-19) Soni, Saurabh Kumar
    In this work I build a system that recognizes isolated American Sign Language (ASL) words, and I use it to ask one fairly direct question: when training data is scarce, is it better to look at the video pixels or at the geometry of the signer’s body? To find out, I train two very different models on exactly the same clips. The first is appearance-based: every frame is run through standard preprocessing and a ResNet50 backbone pre-trained on ImageNet, which turns it into a 2048-dimensional feature vector, and a Bidirectional LSTM then reads that sequence over time. The second model never sees a pixel. It works only on MediaPipe keypoints, the tracked coordinates of the body and the two hands, and feeds them to a Transformer encoder. Both models therefore have to learn the same two things, the shape of the hands in each frame and the way those shapes move across frames, and both are trained, validated, and tested under one identical protocol. Throughout, my goal is a recognizer that is accurate yet light enough to be useful in practice, so that it could eventually make communication a little easier between people who sign and people who do not.
  • Learning to See Lesions, Not Skin Tone: Counterfactual Multimodal Learning for Fair, Trustworthy, and Text-Free Dermatology AI
    (Indian Statistical Institute, 2026-06-16) Jangid, Shivam
    Recent advances in deep learning have significantly improved the performance of automated skin lesion classification systems, enabling accurate detection of various dermatological conditions from medical images. Despite these achievements, concerns regarding fairness and generalization remain a major challenge for the deployment of such systems in real-world clinical settings. A key factor contributing to these challenges is the presence of bias in training datasets, particularly with respect to skin-tone representation. Most publicly available skin lesion datasets contain a disproportionate number of samples from individuals with lighter skin tones. As a result, deep learning models trained on these datasets often learn representations that perform well for majority populations while exhibiting degraded performance on underrepresented groups. Such disparities can lead to unequal diagnostic outcomes and raise important concerns regarding the reliability and fairness of artificial intelligence systems in healthcare. The problem is especially relevant in the Indian context, where substantial diversity exists in skin tone, ethnicity, and geographical distribution. Variations in lesion appearance across different skin types, combined with the limited availability of representative datasets, make it difficult to develop models that generalize effectively to the broader population. Consequently, addressing dataset bias and improving model robustness across diverse demographic groups has become an important research objective in skin lesion analysis. Motivated by these challenges, this dissertation investigates fairness-aware approaches for skin lesion classification. The work focuses on the creation and analysis of diverse skin lesion datasets, the study of bias mitigation techniques, and the development of robust deep learning models capable of learning equitable representations.
Furthermore, the dissertation explores invariant and equivariant learning paradigms as potential mechanisms for improving generalization across heterogeneous data distributions. To enhance model transparency and support trustworthy decision-making, explainability techniques are also examined to provide insights into the features utilized by the learned models during classification. Through these investigations, the dissertation aims to contribute toward the development of more reliable, interpretable, and fair deep learning systems for skin lesion analysis, with particular emphasis on improving performance across diverse skin-tone distributions.
  • Multi-Frequency Associative Memory for Continual Graph Learning through Nested Optimization
    (2026-06-16) Kundu, Shuvam
    Graph Neural Networks struggle to learn new tasks without forgetting old ones, a problem known as catastrophic forgetting. In graph domains, this is compounded by structural shift, where newly added edges corrupt the learned representations of historical nodes even when model weights remain unchanged. We present CAM-Titans, a continual graph learning framework built around a two-buffer associative memory to address both parametric and structural forgetting. Our architecture operates across three timescales of adaptation: a slow base memory updated via ordinary gradient descent, an intermediate task buffer re-encoded after every task using the delta rule, and a transient in-context state for rapid within-pass adaptation. To ensure historical class prototypes remain retrievable as the network backbone evolves, memory retrieval is anchored in a dynamically maintained prototype coordinate system. Furthermore, a cosine classifier mitigates magnitude imbalance, preventing older classes from dominating predictions. Empirical evaluations across diverse continual learning benchmarks demonstrate that CAM-Titans effectively mitigates catastrophic forgetting, achieving superior stability and accuracy in both Task-Incremental and Class-Incremental settings.
  • Reproducing and Analyzing the “Lost in the Middle” and “The Power of Noise” Phenomenon in Retrieval-Augmented Generation
    (Indian Statistical Institute, 2026-06-16) Samanta, Kousik
    Retrieval-Augmented Generation (RAG) has become a standard way to improve Large Language Models (LLMs), helping with problems such as knowledge gaps and hallucinations. However, recent studies show that these systems still have limitations. One major problem is the “Lost in the Middle” phenomenon: models fail to properly use information located in the middle of long contexts. Another, counterintuitive observation is the “Power of Noise” paradigm, which suggests that adding unrelated documents can actually improve generation. Both effects have been observed in extractive QA tasks, but it is not known whether they occur in tasks that require complex reasoning. This dissertation investigates how position and noise affect Long-Form Question Answering. Using the ELI5 dataset, we test three models with varying amounts of context. We also vary the location of the correct information and add distracting or random information to observe the effects of these perturbations. Because traditional metrics for evaluating model-generated answers are not very effective for long-form responses, we introduce two new evaluation metrics, Prop Score and Sentence Score. Our experiments yield three findings. First, the “Lost in the Middle” issue persists to a certain degree in Long-Form QA. Second, we confirm that noise can actually improve generation. Third, we propose hypotheses for why the “Lost in the Middle” phenomenon and the “Power of Noise” paradigm persist in Long-Form QA.
  • Learning in Infants using Intrinsically Motivated Goal Conditioned Reinforcement Learning
    (2026-06-17) Darsan, T I
    Traditional artificial intelligence models learn by passively digesting large datasets. In contrast, human infants discover skills by actively interacting with their bodies and environments without explicit external rewards. This thesis introduces the Composer Architecture, a machine learning framework designed to mimic this autonomous, open-ended development. The Composer architecture operates in a multi-stage loop. First, a latent model uses Contrastive Learning Through Time (CLTT) to compress high-dimensional raw data from visual, proprioceptive, and touch sensors into a low-dimensional space. To preserve data relationships and prevent topological collapse, a Softmax activation forces these latent representations to lie smoothly on a probability simplex. A goal-conditioned reinforcement learning policy then trains on this space by targeting randomly sampled one-hot goals. We evaluated the architecture on the MIMo platform, a highly realistic simulation of an 18-month-old child embedded in the MuJoCo physics engine. Testing progressed from primitive shapes to complex multi-joint control channels on the robot. On a single-finger setup, the architecture mapped latent extrema directly to full extension and flexion, while five-finger trials isolated whole-hand opening and closing configurations. To scale control, a hierarchical extension called Hi-Composer coordinates complex limb motions by routing high-level commands as sub-goals to localized lower-level finger policies. Finally, full-body exploration benchmarks on a rollover task validated the "thin pancake" hypothesis: compared to white-noise walks, the Composer architecture expands the agent’s spatial reach by 134 percent while restricting local exploration to an organized, lower-dimensional submanifold. This demonstrates that self-generated latent goals effectively guide open-ended exploration into structured movements.
  • Bias Before Generation: Attention-based Preemptive Fairness Signals in Large Language Models
    (Indian Statistical Institute, 2026-06-15) Das, Aniket
    Warning: This paper includes examples of language that may be perceived as inappropriate or offensive. Large language models (LLMs) are known to propagate social biases embedded in their training corpora, producing outputs that disproportionately disadvantage individuals based on sensitive attributes such as gender, religion, race, sexual orientation, and nationality. Existing mitigation strategies are computationally prohibitive, require access to model parameters, or apply corrections only after biased content has already been generated. This work addresses a different question: can the model’s own internal attention dynamics, observed at inference time, serve as a reliable early-warning signal for bias, enabling intervention before generation proceeds? We propose Bias Before Generation (BBG), an attention-based, training-free framework for preemptive fairness intervention in generative language models. BBG analyses three complementary attention-based signals during a single forward pass: Protected Attribute Attention, which quantifies the proportion of generative attention directed at protected demographic tokens; Attention Entropy, which captures the global dispersion of attention across the input; and the Identity-Conditioned Entropy Ratio (ICER), a novel metric that isolates the fraction of total attention entropy attributable to identity-bearing tokens, thereby distinguishing legitimate identity-aware discourse from stereotype-driven uncertainty. These three signals are combined into a weighted bias score, and prompts whose score exceeds a learned threshold receive an automatically prepended alert prefix that steers the model toward neutral reasoning before generation. The framework is evaluated on multiple open-weight LLM families across two standard fairness benchmarks: BBQ and CrowS-Pairs.
Experimental results demonstrate consistent, statistically significant reductions in bias scores across all tested models and social-group categories, with minimal degradation in overall response quality. These findings indicate that attention-level signals offer a principled and computationally efficient basis for preemptive fairness intervention in generative language models. We hope this work opens further inquiry into inference-time approaches for bias detection and mitigation.
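The Hinglish sentiment abstract above describes its switch-point-aware bilingual gate only at a high level. The plain-Python sketch below shows one way such a gate could work. It assumes each token already carries a language tag ("hi" or "en") from some language identifier. The tag values, the linear stand-in adapters, and the gate parameters (`gate_w`, `gate_switch`, `gate_b`) are all hypothetical and are not taken from the dissertation.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def switch_points(lang_labels: Sequence[str]) -> List[int]:
    """1 where a token's language label differs from the previous token's, else 0."""
    if not lang_labels:
        return []
    flags = [0]  # the first token has no predecessor, so it is never a switch point
    for prev, cur in zip(lang_labels, lang_labels[1:]):
        flags.append(1 if cur != prev else 0)
    return flags


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def adapter(weights: Matrix, x: Sequence[float]) -> Vector:
    """Hypothetical adapter: a plain linear map standing in for a bottleneck MLP."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def gated_blend(
    tokens: Sequence[Sequence[float]],
    lang_labels: Sequence[str],
    adapter_hi: Matrix,
    adapter_en: Matrix,
    gate_w: Sequence[float],
    gate_switch: float,
    gate_b: float,
) -> List[Vector]:
    """Per token: g * adapter_hi(x) + (1 - g) * adapter_en(x); the gate sees x and the switch flag."""
    out = []
    for x, s in zip(tokens, switch_points(lang_labels)):
        g = sigmoid(sum(w * xi for w, xi in zip(gate_w, x)) + gate_switch * s + gate_b)
        h_hi, h_en = adapter(adapter_hi, x), adapter(adapter_en, x)
        out.append([g * a + (1.0 - g) * b for a, b in zip(h_hi, h_en)])
    return out


labels = ["hi", "hi", "en", "en", "hi"]
print(switch_points(labels))  # [0, 0, 1, 0, 1]
```

A nonzero `gate_switch` lets the gate behave differently at switch points than inside monolingual spans. In a real model, the gate parameters and the adapters would be learned jointly with the rest of the network.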
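The prompt-tuning abstract above notes that IA-DePT's zero-initialised gate reduces to plain DePT at initialisation. The minimal sketch below illustrates why a zero-initialised gate has this property. The specific form is an assumption for illustration and not the dissertation's parameterization: mean-pooled input embeddings, one gate vector per prompt token, and a `tanh` gate. DePT's low-rank update to the input embeddings is omitted.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def mean_pool(token_embs: Sequence[Sequence[float]]) -> List[float]:
    """Average the input token embeddings into a single instance vector."""
    n, dim = len(token_embs), len(token_embs[0])
    return [sum(t[d] for t in token_embs) / n for d in range(dim)]


class InstanceGatedPrompt:
    """Hypothetical instance-aware soft prompt: row j becomes P_j + tanh(a_j . pool(x)) * pool(x)."""

    def __init__(self, base_prompt: Matrix) -> None:
        self.base_prompt = [list(row) for row in base_prompt]  # shared short soft prompt, as in DePT
        dim = len(base_prompt[0])
        # Zero-initialised gate vectors a_j, so every gate value is tanh(0) = 0 at the start of training.
        self.gate = [[0.0] * dim for _ in base_prompt]

    def __call__(self, token_embs: Sequence[Sequence[float]]) -> Matrix:
        pooled = mean_pool(token_embs)
        prompt = []
        for p_row, a_row in zip(self.base_prompt, self.gate):
            g = math.tanh(sum(a * x for a, x in zip(a_row, pooled)))
            prompt.append([p + g * x for p, x in zip(p_row, pooled)])
        return prompt


base = [[0.1, -0.2], [0.3, 0.4]]
prompt_fn = InstanceGatedPrompt(base)
x = [[1.0, 2.0], [3.0, 4.0]]
print(prompt_fn(x) == base)  # True: at initialisation the conditioned prompt equals the base prompt
```

Every gate vector starts at zero, so the first forward pass reproduces the base prompt exactly. Any difference after training can then be attributed to the learned gate, which is what makes the comparison with the un-gated method a single-variable ablation.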
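The CAM-Titans abstract above re-encodes its task buffer with the delta rule. For a linear associative memory, the standard delta-rule write is M ← M + β (v − M k) kᵀ: only the gap between what the memory currently recalls for key k and the target value v is written. The sketch below implements that update in plain Python. The abstract does not say how CAM-Titans builds its keys and values or chooses the step size β, so those are left as inputs.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(M: Matrix, k: Sequence[float]) -> List[float]:
    """Read from memory: retrieve M k."""
    return [sum(m * x for m, x in zip(row, k)) for row in M]


def normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def delta_rule_write(M: Matrix, key: Sequence[float], value: Sequence[float], beta: float = 1.0) -> Matrix:
    """Return M + beta * (value - M key) key^T, i.e. write only the recall error."""
    err = [v - r for v, r in zip(value, matvec(M, key))]
    return [[m + beta * e * k for m, k in zip(row, key)] for row, e in zip(M, err)]


M = [[0.0, 0.0], [0.0, 0.0]]
k1, v1 = normalize([3.0, 4.0]), [1.0, -2.0]
k2, v2 = normalize([-4.0, 3.0]), [5.0, 5.0]  # k2 is orthogonal to k1
M = delta_rule_write(M, k1, v1)
M = delta_rule_write(M, k2, v2)
print(matvec(M, k1), matvec(M, k2))  # approximately [1.0, -2.0] and [5.0, 5.0]
```

With unit-norm keys and β = 1, each write makes the memory recall its value exactly. A later write under a key orthogonal to an earlier one leaves the earlier association untouched.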
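The Composer abstract above relies on two simple ingredients. First, a Softmax places latent codes on a probability simplex. Second, goals are drawn as random one-hot vectors, which are the vertices of that simplex. A plain-Python sketch of both follows. The reward function (negative L1 distance to the goal vertex) is a hypothetical choice for illustration; the abstract does not say how goal attainment is rewarded.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Map an unconstrained latent vector onto the probability simplex (entries >= 0, sum to 1)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_one_hot_goal(dim: int, rng: random.Random) -> List[float]:
    """A random vertex of the simplex, used as the goal for one episode."""
    goal = [0.0] * dim
    goal[rng.randrange(dim)] = 1.0
    return goal


def goal_reward(latent_logits: Sequence[float], goal: Sequence[float]) -> float:
    """Hypothetical dense reward: negative L1 distance from the current latent point to the goal vertex."""
    z = softmax(latent_logits)
    return -sum(abs(a - b) for a, b in zip(z, goal))


rng = random.Random(0)
goal = sample_one_hot_goal(3, rng)
print(goal, goal_reward([0.0, 0.0, 0.0], goal))  # the uniform latent scores about -1.333 for any goal
```

Softmax outputs never reach a vertex exactly. The reward therefore approaches 0 only as the logit for the goal dimension comes to dominate the others.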
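The "Lost in the Middle" abstract above varies where the correct information appears and mixes in distracting or random documents. A minimal sketch of that position sweep follows. It assumes the gold (answer-bearing) document is inserted into a fixed list of other documents. The prompt template in `format_prompt` is hypothetical. The dissertation's Prop Score and Sentence Score metrics are not reproduced, because the abstract does not define them.

```python
from typing import List, Sequence


def build_context(gold: str, others: Sequence[str], gold_position: int) -> List[str]:
    """Insert the gold (answer-bearing) document at a 0-based position among the other documents."""
    if not 0 <= gold_position <= len(others):
        raise ValueError("gold_position must lie between 0 and len(others)")
    docs = list(others)
    docs.insert(gold_position, gold)
    return docs


def position_sweep(gold: str, others: Sequence[str]) -> List[List[str]]:
    """One context per possible gold position: first, every middle slot, and last."""
    return [build_context(gold, others, i) for i in range(len(others) + 1)]


def format_prompt(question: str, docs: Sequence[str]) -> str:
    """Hypothetical prompt template: numbered documents followed by the question."""
    blocks = [f"Document [{i + 1}]: {d}" for i, d in enumerate(docs)]
    return "\n\n".join(blocks) + f"\n\nQuestion: {question}\nAnswer:"


for ctx in position_sweep("GOLD", ["noise A", "noise B"]):
    print(ctx)
# ['GOLD', 'noise A', 'noise B']
# ['noise A', 'GOLD', 'noise B']
# ['noise A', 'noise B', 'GOLD']
```

Passing topically related passages as `others` would give a "distracting" condition. Passing passages sampled from unrelated questions would give a "random" condition.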
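The Bias Before Generation abstract above combines three attention signals into a weighted score and prepends an alert prefix when that score exceeds a threshold. The sketch below computes one plausible version of each signal from a single attention row. It assumes that row comes from the generating position (for example, averaged over heads) and that a boolean mask marks identity-bearing tokens. Following the abstract's description, ICER is computed as the share of the entropy sum contributed by masked tokens. The weights, the entropy normalization, the threshold, and `ALERT_PREFIX` are placeholders and not values from the dissertation, which learns its threshold.

```python
import math
from typing import Sequence, Tuple

# Hypothetical alert wording; BBG's actual prefix is not given in the abstract.
ALERT_PREFIX = "Note: answer neutrally and do not rely on stereotypes about any social group.\n\n"


def attention_signals(attn: Sequence[float], identity_mask: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (protected-attribute attention, normalized attention entropy, ICER) for one attention row."""
    total = sum(attn)
    p = [a / total for a in attn]  # renormalize so the weights form a distribution
    terms = [-x * math.log(x) if x > 0 else 0.0 for x in p]  # per-token entropy contributions
    entropy = sum(terms)
    paa = sum(x for x, m in zip(p, identity_mask) if m)
    identity_entropy = sum(t for t, m in zip(terms, identity_mask) if m)
    icer = identity_entropy / entropy if entropy > 0 else 0.0
    norm_entropy = entropy / math.log(len(p)) if len(p) > 1 else 0.0  # scale to [0, 1]
    return paa, norm_entropy, icer


def bias_score(signals: Sequence[float], weights: Sequence[float] = (0.4, 0.3, 0.3)) -> float:
    """Weighted sum of the three signals (placeholder weights)."""
    return sum(w * s for w, s in zip(weights, signals))


def maybe_prepend_alert(prompt: str, attn: Sequence[float], identity_mask: Sequence[bool], threshold: float = 0.5) -> str:
    """Prepend the alert prefix when the bias score exceeds the threshold."""
    score = bias_score(attention_signals(attn, identity_mask))
    return ALERT_PREFIX + prompt if score > threshold else prompt


attn = [0.25, 0.25, 0.25, 0.25]
mask = [True, False, False, False]
print(attention_signals(attn, mask))  # approximately (0.25, 1.0, 0.25)
```

Because all three signals are read from a single forward pass over the prompt, the check adds no extra generation step. This matches the training-free, preemptive framing of the abstract.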