A Generic Character Aligned Machine Transliteration System for Indic Languages

Author : Nikhil Londhe
Total Pages : 32
Release : 2013
OCLC : 860861728

Book Synopsis: A Generic Character Aligned Machine Transliteration System for Indic Languages, by Nikhil Londhe

A Generic Character Aligned Machine Transliteration System for Indic Languages was written by Nikhil Londhe and released in 2013. It runs to 32 pages.

Book excerpt: A typical problem in machine translation is the handling of out-of-vocabulary (OOV) terms. These are usually names of places or people, or technical terms, that cannot be easily translated from one language to another or that become obfuscated when translated. Such terms end up being transliterated, i.e., converted syllable by syllable (or syllable group by syllable group) from one language to another while trying to preserve the pronunciation. Although a large number of transliteration systems have been built over the years, they suffer from several problems. First, any machine learning system is only as good as the dataset used to train it; for resource-poor languages, such systems therefore either do not exist or perform extremely poorly. Second, most transliteration systems are overfitted to the source language. With the proliferation of the Internet and social media, however, language mixing is fairly common, and most such systems fail when words derived from other languages are introduced. In this research, we aim to build transliteration systems that better model the language under consideration and that incorporate additional features to offset the overfitting problem described above. We also explore how inherent language similarities can be used to bootstrap transliteration systems for resource-poor languages, and how classical techniques from machine translation and information retrieval can be adapted to the problem at hand to build better and more robust systems.


A Generic Character Aligned Machine Transliteration System for Indic Languages Related Books

A Generic Character Aligned Machine Transliteration System for Indic Languages
Language: en
Pages: 32
Authors: Nikhil Londhe
Categories:
Type: BOOK - Published: 2013 - Publisher:

Machine Translation and Transliteration involving Related, Low-resource Languages
Language: en
Pages: 215
Authors: Anoop Kunchukuttan
Categories: Computers
Type: BOOK - Published: 2021-09-08 - Publisher: CRC Press

Machine Translation and Transliteration involving Related, Low-resource Languages discusses an important aspect of natural language processing that has received
Machine Translation and Transliteration Involving Related and Low-resource Languages
Language: en
Pages: 0
Authors: Anoop Kunchukuttan
Categories: Computers
Type: BOOK - Published: 2021-08-12 - Publisher: Chapman & Hall/CRC

Machine Translation and Transliteration involving Related, Low-resource Languages discusses an important aspect of natural language processing that has received
Information Systems for Indian Languages
Language: en
Pages: 331
Authors: Chandan Singh
Categories: Computers
Type: BOOK - Published: 2011-02-28 - Publisher: Springer Science & Business Media

This book constitutes the refereed proceedings of the International Conference on Information Systems for Indian Languages, ICISIL 2011, held in Patiala, India,
International Journal of Translation
Language: en
Pages: 300
Authors:
Categories: Translating and interpreting
Type: BOOK - Published: 2007 - Publisher:
