Dissertations - M Tech (CS)

Permanent URI for this collection: https://dspace.isical.ac.in/handle/10263/2147

These Dissertations were submitted in partial fulfilment of the requirements for the award of M TECH (Computer Science) Degree of Indian Statistical Institute

  • Reproducing and Analyzing the “Lost in the Middle” and “The Power of Noise” Phenomenon in Retrieval-Augmented Generation
    (Indian Statistical Institute, 2026-06-16) Samanta, Kousik
    Retrieval-Augmented Generation (RAG) has become a standard way to improve Large Language Models, helping with problems such as knowledge gaps and hallucinations. Recent studies show, however, that RAG systems still have limitations. One major problem is the “Lost in the Middle” phenomenon: models fail to make proper use of information located in the middle of a long context. Another, counterintuitive observation is the “Power of Noise” paradigm, which suggests that adding unrelated documents can actually improve generation. Both effects have been observed in extractive QA tasks, but it is not known whether they also occur in tasks that require complex reasoning. This dissertation investigates how position and noise affect Long-Form Question Answering. We use the ELI5 dataset and evaluate three models under varying amounts of context. We also vary the position of the correct information and add distracting or random information to observe the effects of these perturbations. Because traditional metrics for evaluating model-generated answers are not very effective for long-form responses, we introduce two new evaluation metrics, Prop Score and Sentence Score. Our experiments yield three findings. First, the “Lost in the Middle” issue still occurs to a certain degree in Long-Form QA. Second, we confirm that noise can actually improve generation. Third, we hypothesize reasons for the persistence of the “Lost in the Middle” phenomenon and the “Power of Noise” paradigm in Long-Form QA.
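
The position and noise perturbations described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's code: the function names (`place_gold`, `add_noise`, `format_prompt`), the prompt layout, and the choice to prepend noise documents are all assumptions made purely for illustration.

```python
import random
from typing import List


def place_gold(gold: str, others: List[str], position: int) -> List[str]:
    """Return a document list with the gold passage inserted at `position`.

    Position 0 puts the gold passage first; len(others) puts it last.
    (Hypothetical helper, not taken from the dissertation.)
    """
    if not 0 <= position <= len(others):
        raise ValueError("position must be between 0 and len(others)")
    docs = list(others)  # copy so the caller's list is not modified
    docs.insert(position, gold)
    return docs


def add_noise(docs: List[str], noise_pool: List[str], k: int, seed: int = 0) -> List[str]:
    """Prepend k randomly sampled unrelated documents to the context.

    Prepending is an assumption; noise could equally be placed elsewhere.
    """
    rng = random.Random(seed)  # seeded for reproducible sampling
    noise = rng.sample(noise_pool, k)
    return noise + docs


def format_prompt(question: str, docs: List[str]) -> str:
    """Render numbered documents followed by the question (assumed layout)."""
    lines = [f"Document [{i}] {d}" for i, d in enumerate(docs, start=1)]
    lines.append(f"Question: {question}")
    lines.append("Answer:")
    return "\n".join(lines)


if __name__ == "__main__":
    retrieved = ["doc A", "doc B", "doc C", "doc D"]
    context = place_gold("GOLD passage", retrieved, position=2)  # gold in the middle
    context = add_noise(context, ["noise 1", "noise 2", "noise 3"], k=2)
    print(format_prompt("Why is the sky blue?", context))
```

Sweeping `position` from 0 to `len(others)` while holding everything else fixed gives one prompt per gold-passage location, which is the kind of comparison a "Lost in the Middle" experiment relies on.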