American Sign Language Recognition and Analysis Using Deep Learning

Soni, Saurabh Kumar

Files

Dissertation-Saurabh kumar soni.pdf (1.34 MB)

Plagiarism_CS2426.pdf (1.48 MB)

Date

2026-06-19

Authors

Soni, Saurabh Kumar

Abstract

In this work I build a system that recognizes isolated American Sign Language (ASL) words, and I use it to ask one direct question: when training data is scarce, is it better to look at the video pixels or at the geometry of the signer’s body? To find out, I train two very different models on exactly the same clips. The first is appearance-based. Every frame goes through standard preprocessing and a ResNet50 backbone pre-trained on ImageNet, which turns it into a 2048-dimensional feature vector, and a bidirectional LSTM then reads that sequence of vectors over time. The second model never sees a pixel. It works only on MediaPipe keypoints, the tracked coordinates of the body and the two hands, and feeds them to a Transformer encoder. Both models therefore have to learn the same two things, the shape of the hands in each frame and the way those shapes move across frames, and both are trained, validated and tested under one identical protocol. What I care about throughout is a recognizer that is accurate but still light enough to be useful in practice, so that it could eventually make communication a little easier between people who sign and people who do not.
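
To make the contrast between the two input representations concrete, the following minimal sketch compares what each model receives per clip. It is an illustration under stated assumptions, not the dissertation's code. The 2048-dimensional ResNet50 feature comes from the abstract. The landmark counts (33 for the body, 21 per hand) are the ones MediaPipe publishes for its Pose and Hands solutions. The clip length of 32 frames, the use of all body landmarks, three coordinates per landmark, and the names `ClipInput`, `appearance_input` and `keypoint_input` are hypothetical choices made only for this example.

```python
from dataclasses import dataclass

# Landmark counts published for MediaPipe Pose and MediaPipe Hands.
POSE_LANDMARKS = 33
HAND_LANDMARKS = 21


@dataclass(frozen=True)
class ClipInput:
    """Per-clip input to a sequence model (hypothetical helper)."""
    name: str
    frames: int               # frames sampled per clip (assumed value)
    features_per_frame: int   # length of the per-frame vector

    @property
    def shape(self):
        # (time steps, features), the layout both sequence models consume
        return (self.frames, self.features_per_frame)

    @property
    def values_per_clip(self):
        return self.frames * self.features_per_frame


def appearance_input(frames):
    # ResNet50 yields one 2048-dimensional vector per frame (from the abstract).
    return ClipInput("ResNet50 features -> BiLSTM", frames, 2048)


def keypoint_input(frames, coords_per_landmark=3):
    # Body plus two hands; using every landmark and (x, y, z) is an assumption.
    landmarks = POSE_LANDMARKS + 2 * HAND_LANDMARKS
    return ClipInput("MediaPipe keypoints -> Transformer", frames,
                     landmarks * coords_per_landmark)


if __name__ == "__main__":
    clip_length = 32  # assumed; the abstract does not state it
    for clip in (appearance_input(clip_length), keypoint_input(clip_length)):
        print(clip.name, clip.shape, clip.values_per_clip)
```

Under these assumptions the keypoint model reads 225 numbers per frame against 2048 for the appearance model, which is one way to see why a geometry-only input can suit a lightweight recognizer.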

Description

This dissertation was completed under the supervision of Prof. Ujjwal Bhattacharya.

Keywords

American Sign Language, ResNet, Transformer

Citation

48p.

URI

http://hdl.handle.net/10263/7738

Collections

Dissertations - M Tech (CS)
