Unsupervised Machine Translation For Indian Languages Using Monolingual Corpora

dc.contributor.author: Choudhary, Sudeep
dc.date.accessioned: 2022-02-03T06:44:08Z
dc.date.available: 2022-02-03T06:44:08Z
dc.date.issued: 2019-07
dc.description: Dissertation under the supervision of Dr. Utpal Garain [en_US]
dc.description.abstract: Machine translation has traditionally relied on parallel data, but very little parallel data is available for Indian languages. For Hindi-Marathi translation there are only about 50,000 parallel sentences, far fewer than supervised machine translation requires. Monolingual data, however, is easy to find for these low-resource Indian languages. The aim of this project is to investigate whether translation can be learned without any parallel data. To this end, we have implemented a model that takes sentences from monolingual corpora in two different languages and maps them into the same latent space. Because sentences from both languages are encoded into this shared latent space, they can be decoded into whichever of the languages is required. In this way, the model learns to translate (encode/decode) without any form of supervision. The model relies only on monolingual corpora of two languages, in our case Hindi and Marathi. On the FIRE data set, the model achieves BLEU scores of 18.40 for Hindi to Marathi and 22.84 for Marathi to Hindi without using a single parallel sentence at training time. [en_US]
dc.identifier.citation: 28p. [en_US]
dc.identifier.uri: http://hdl.handle.net/10263/7265
dc.language.iso: en [en_US]
dc.publisher: Indian Statistical Institute, Kolkata [en_US]
dc.relation.ispartofseries: Dissertation;;2019-17
dc.subject: Machine Translation [en_US]
dc.subject: DeNoising Auto-Encoders [en_US]
dc.title: Unsupervised Machine Translation For Indian Languages Using Monolingual Corpora [en_US]
dc.type: Other [en_US]

Files

Original bundle

Name:
CS1727_thesis_Sudeep_Choudhary.pdf
Size:
472.14 KB
Format:
Adobe Portable Document Format
Description:
