Exploring Character-level Attacks on Neural Ranking Models
Date
2025-06
Publisher
Indian Statistical Institute, Kolkata
Abstract
Neural ranking models (NRMs) have achieved state-of-the-art performance in
information retrieval, yet they remain highly susceptible to subtle adversarial inputs
such as character-level typos. This project explores the robustness of such
systems by introducing a reinforcement learning (RL)-based query perturbation
framework. RL agents (PPO, DQN, and A2C) were trained to minimally
modify user queries (e.g., through character deletions or swaps) with the goal of
significantly altering the resulting document rankings, as measured by Kendall’s
Tau. Experiments were conducted on the TREC DL 2019 and 2020 benchmarks
using two different neural rankers: MiniLM and a fine-tuned CharacterBERT
model. The perturbation attacks were shown to succeed in over 85% of cases
for MiniLM and approximately 40% for CharacterBERT, indicating varying
degrees of vulnerability. To mitigate these effects, a set of pretrained query recovery
models, such as T5-large-spell, spelling-correction-base, and grammar-correction
modules, was applied to restore the original query form. When
used in combination, these recovery mechanisms reduced the MiniLM attack
success rate to around 52%, demonstrating partial robustness. This study underscores
both the fragility of neural rankers to character-level noise and the
value of lightweight correction pipelines in improving retrieval resilience.
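The two core operations the abstract describes, a single character-level edit (deletion or swap) and measuring the resulting ranking shift with Kendall's Tau, can be sketched as follows. This is a minimal illustration only, not the dissertation's implementation; the function names and the pairwise Tau formulation (without tie correction) are assumptions.

```python
import itertools
import random

def perturb(query, rng, op="swap"):
    """Apply one character-level edit of the kind used in the attack:
    a swap of two adjacent characters or a single deletion."""
    chars = list(query)
    if op == "swap" and len(chars) >= 2:
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    elif op == "delete" and chars:
        del chars[rng.randrange(len(chars))]
    return "".join(chars)

def kendall_tau(rank_a, rank_b):
    """Kendall's Tau between two rankings of the same document ids:
    (concordant - discordant) pairs over all pairs. 1.0 means identical
    order, -1.0 a fully reversed ranking."""
    pos_a = {d: i for i, d in enumerate(rank_a)}
    pos_b = {d: i for i, d in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in itertools.combinations(pos_a, 2):
        s = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(pos_a)
    return (concordant - discordant) / (n * (n - 1) / 2)
```

An attack as described would perturb the query, re-rank with the target model, and treat a large drop in Tau between the original and perturbed rankings (e.g. `kendall_tau(["d1","d2","d3"], ["d3","d2","d1"])` returning -1.0) as success.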
Description
Dissertation under the supervision of Dr. Debapriyo Majumdar
Keywords
Neural Ranking Models, Reinforcement learning (RL), TREC DL 2019, MiniLM, CharacterBERT model
Citation
45p.
