Dissertations - M Tech (CS)
Permanent URI for this collection: https://dspace.isical.ac.in/handle/10263/2147
These Dissertations were submitted in partial fulfilment of the requirements for the award of M TECH (Computer Science) Degree of Indian Statistical Institute
2 results
Search Results
Item
Reproducing and Analyzing the “Lost in the Middle” and “The Power of Noise” Phenomenon in Retrieval-Augmented Generation (Indian Statistical Institute, 2026-06-16)
Samanta, Kousik
Retrieval-Augmented Generation has become a widely used way to improve Large Language Models. It helps with problems such as knowledge limitations and hallucinations. Recent studies show, however, that these models still have limitations. One major problem is the “Lost in the Middle” phenomenon: models fail to make proper use of information located in the middle of their contexts. Another counterintuitive observation is the “Power of Noise” paradigm, which suggests that adding unrelated documents can actually improve generation. Both effects are known to occur in extractive QA tasks, but it is not known whether they occur in tasks that require complex reasoning. This dissertation investigates how position and noise affect Long-Form Question Answering. We use the ELI5 dataset and test three models, giving them varying amounts of context and measuring their performance. We also change the location of the correct information and add distracting or random information to observe the effects of these perturbations. Traditional metrics for evaluating model-generated answers are not very effective for long-form responses, so we introduce two new evaluation metrics, Prop Score and Sentence Score. Our experiments yield three findings. First, the “Lost in the Middle” issue still happens to a certain degree in Long-Form QA. Second, we confirm that noise can actually improve generation. Third, we hypothesize reasons for the persistence of the “Lost in the Middle” phenomenon and the “Power of Noise” paradigm in Long-Form QA.

Item
Neural Machine Translation for Indian Sign Language (Indian Statistical Institute, Kolkata, 2020-07)
Moon, Sushant Sharad
Sign languages being the primary language of the deaf community, researchers from many fields have been working in this domain for the past two decades. Until now, the majority of the work has been in Sign Language Recognition. Only recently have a few methods for Sign Language Translation been developed, and even today there is no work on Indian Sign Language Translation. This work aims to translate Indian sign language videos into their corresponding spoken Indian English sentences. We publicly release the first Indian Sign Language Translation dataset of its kind, ISI-ISL-DDNEWS-2020T, which we collected and annotated. The dataset has more than 3 million sign language frames, which translate to more than 93 thousand words drawn from a vocabulary of more than 6 thousand words in spoken Indian English. We also formalize a neural machine translation system for Indian Sign Language that is trainable end-to-end, and benchmark it on this dataset. The model jointly learns the spatial and temporal relationships, the underlying language model, and the alignment between sign and spoken language. This baseline model achieves a BLEU-4 score of 4.02 on translation.
