Indian Language Technology Proliferation and Deployment Centre

Research Efforts

 

1. Machine Aided Translation (MAT)

In machine translation, text in one natural language is translated into another by software, with little or no real-time human intervention. The software systems developed under the Machine Translation project are as follows:

A) Development of English to Indian Languages Machine Translation System (Anuvadaksh)
Since the majority of the Indian population cannot read or write English, while most of the information available on the web and in electronic media is in English, an automatic language translator is important for reaching the common man across various sections of society. To begin with, two specific domains, Tourism and Health, have been identified for machine translation. The project is being implemented in consortium mode, with ten institutions participating to build the system. An experimental Machine Translation System has been made available as a technology demonstrator for the following language pairs:

i) English to Hindi
ii) English to Marathi
iii) English to Bangla
iv) English to Odia
v) English to Tamil
vi) English to Urdu
vii) English to Gujarati
viii) English to Bodo

B) Development of English to Indian Languages Machine Translation (MT) System with Angla-Bharti Technology:
ANGLABHARTI is a machine-aided translation methodology specifically designed for translating English to Indian languages. Angla-Bharti uses a pattern-directed approach based on context-free-grammar-like structures. It analyses the English input only once and creates an intermediate structure called PLIL (Pseudo Lingua for Indian Languages). The PLIL structure is then converted to each Indian language through a process of text generation. The system provides automatic pre-editing and paraphrasing and recognition of named entities, and it incorporates an error-analysis module and a statistical language model for automated post-editing. The purpose of the automatic pre-editing module is to transform or paraphrase the input sentence into a form that is more easily translatable. The project is being implemented in consortium mode, with four institutions participating to build the system. The language pairs being targeted are English to Hindi/ Marathi/ Bengali/ Odia/ Tamil/ Urdu. An experimental Machine Translation System has been made available as a technology demonstrator for the following language pairs:

i) English to Bangla
ii) English to Punjabi
iii) English to Malayalam
iv) English to Urdu
v) English to Hindi
vi) English to Telugu
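
The "analyse once, generate many times" idea behind Angla-Bharti can be illustrated with a minimal sketch. It does not reproduce the actual PLIL format or the Angla-Bharti rules: the `Intermediate` class, the function names, the three-word sentence restriction and the placeholder lexicon entries such as `HI(ram)` are all assumptions made for illustration. The generator emits subject-object-verb order, which is common in many Indian languages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intermediate:
    """Hypothetical stand-in for a PLIL-like structure (not the real format)."""
    subject: str
    verb: str
    obj: str


def analyse_english(sentence: str) -> Intermediate:
    """Toy analysis: assumes a three-word subject-verb-object sentence."""
    subject, verb, obj = sentence.rstrip(".").split()
    return Intermediate(subject.lower(), verb.lower(), obj.lower())


def generate(structure: Intermediate, lexicon: dict) -> str:
    """Toy text generation: look up each word and emit subject-object-verb order.

    Words missing from the lexicon are passed through unchanged.
    """
    words = (structure.subject, structure.obj, structure.verb)
    return " ".join(lexicon.get(word, word) for word in words)


def translate_to_all(sentence: str, lexicons: dict) -> dict:
    """Analyse the English input once, then generate text for every target."""
    structure = analyse_english(sentence)  # the single analysis step
    return {lang: generate(structure, lex) for lang, lex in lexicons.items()}


# Placeholder lexicon: the values are markers, not real translations.
toy_lexicons = {
    "hi": {"ram": "HI(ram)", "eats": "HI(eats)", "mango": "HI(mango)"},
    "bn": {"ram": "BN(ram)", "eats": "BN(eats)", "mango": "BN(mango)"},
}
result = translate_to_all("Ram eats mango.", toy_lexicons)
```

Adding a further target language in this sketch only requires another lexicon entry; the English analysis step is not repeated, which is the property the intermediate structure is meant to provide.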


C) Development of Indian Language to Indian Language Machine Translation System (Sampark):
As India has 22 constitutionally recognised languages, an Indian Language to Indian Language Machine Translation system (IL-ILMT) is an important application for converting text written in one Indian language into another Indian language. The project is being implemented in consortium mode, with eleven institutions participating to build the system. An experimental Machine Translation System has been made available as a technology demonstrator for the following language pairs:
i) Hindi to Bengali
ii) Bengali to Hindi
iii) Hindi to Kannada
iv) Kannada to Hindi
v) Hindi to Marathi
vi) Marathi to Hindi
vii) Hindi to Punjabi
viii) Punjabi to Hindi
ix) Hindi to Tamil
x) Tamil to Hindi
xi) Hindi to Telugu
xii) Telugu to Hindi
xiii) Hindi to Urdu
xiv) Urdu to Hindi
xv) Malayalam to Tamil
xvi) Tamil to Malayalam
xvii) Tamil to Telugu
xviii) Telugu to Tamil

2. Development of Cross-lingual Information Access (CLIA)

Cross-Language Information Access is an extension of the Cross-Language Information Retrieval paradigm. It enables a user to enter queries in languages they are familiar with, and uses language translation methods to retrieve documents originally created in other languages.
The objective of Cross-Language Information Access is to introduce additional post-retrieval processing that helps users make sense of the retrieved documents. This processing may take the form of machine translation of snippets, summarization followed by translation of the summaries, and/or information extraction. The project is being implemented in consortium mode, with eleven institutions participating to build the system. At present, nine languages are being targeted in the Tourism and Health domains:

i) Assamese
ii) Bengali
iii) Gujarati
iv) Hindi
v) Marathi
vi) Odia
vii) Punjabi
viii) Tamil
ix) Telugu
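
The cross-lingual access flow described above (translate the query, retrieve documents in another language, then post-process the results) can be sketched as follows. This is a toy illustration under stated assumptions, not the CLIA system's design: the function names, the two-word bilingual dictionary, the sample documents and the term-overlap scoring are all hypothetical, and the final step of translating the snippet back into the user's language is left out.

```python
def translate_query(query: str, dictionary: dict) -> list:
    """Word-by-word dictionary translation; unknown words pass through."""
    return [dictionary.get(term, term) for term in query.lower().split()]


def retrieve(terms: list, documents: dict) -> list:
    """Rank documents by how many translated query terms they contain."""
    scored = []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        score = sum(1 for term in terms if term in words)
        if score:
            scored.append((score, doc_id))
    # Highest score first; ties broken by document id for a stable order.
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


def snippet(text: str, terms: list, width: int = 5) -> str:
    """Post-retrieval step: cut a short window around the first matching term."""
    words = text.split()
    for i, word in enumerate(words):
        if word.lower() in terms:
            start = max(0, i - width // 2)
            return " ".join(words[start:start + width])
    return " ".join(words[:width])


# Hypothetical toy data: a romanised Hindi query against English documents.
toy_dictionary = {"mandir": "temple", "itihas": "history"}
toy_documents = {
    "d1": "The temple has a long history",
    "d2": "Hospital timings and health advice",
    "d3": "Visit the old temple at sunrise",
}
query_terms = translate_query("mandir itihas", toy_dictionary)
ranked = retrieve(query_terms, toy_documents)
preview = snippet(toy_documents["d3"], query_terms, width=3)
```

In a fuller pipeline the snippet would then be summarized or machine translated into the query language, which is the post-retrieval processing that distinguishes information access from plain retrieval.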

3. Development of Robust Document Analysis & Recognition System for Indian Languages (OCR)

Optical Character Recognition (OCR) is a tool for digitizing content and is essential for the development of knowledge networks such as digital libraries. OCR technology makes it possible to scan and store printed text. It has three basic elements: scanning, recognition and then reading the text. The project is being implemented in consortium mode. Web-based OCR for ten scripts/languages is hosted on TDIL-DC as a technology demonstrator. The fourteen languages being targeted are:
i) Assamese
ii) Bengali
iii) Bodo
iii) Devanagari
iv) Gujarati
v) Gurumukhi
vi) Kannada
vii) Malayalam
viii) Manipuri
ix) Marathi
x) Odia
xi) Tamil
xii) Telugu
xiii) Tibetan
xiv) Urdu

4. Development of On-line handwriting recognition system (OHWR)

An on-line handwriting recognition system (OHWR) is a useful tool that converts an individual's written strokes into editable text, thus bypassing the need for a keyboard for text entry. Seven institutions are participating to build the On-Line Handwriting Recognition System. The seven scripts being targeted are:
i) Assamese
ii) Bengali
iii) Devanagari
iv) Gurumukhi
v) Kannada
vi) Malayalam
vii) Tamil
viii) Telugu

5. Development of Text to Speech System for Indian Languages (TTS)

A consortium-mode project has been initiated to develop a Text-to-Speech system in 13 Indian languages (Assamese, Bengali, Bodo, Gujarati, Hindi, Kannada, Malayalam, Marathi, Manipuri, Odia, Rajasthani, Tamil and Telugu) and Indian English. The objective of the project is to develop and deploy a Text-to-Speech system for visually challenged persons, with functionality similar to that of the JAWS product (for English), as an application of social benefit.

6. Development of Automatic Speech Recognition in Indian Languages (ASR)

A consortium-mode project has been initiated to develop an Automatic Speech Recognition system for accessing the prices of agricultural commodities over a telephone channel, as an interface to the NIC website, which is multilingual and provides information on agricultural commodities.

7. Development of Sanskrit Machine Translation System

In India, there have been several efforts to develop computational tools for Sanskrit. Under the leadership of the University of Hyderabad, a consortium-mode project has been initiated with the objective of developing Sanskrit computational tools and using them to build machine translation technology from Sanskrit to Hindi.

Website Last Updated on : 22 Apr 2022