
1. Common Locale Data Repository

CLDR provides key building blocks for software to support the world's languages, with the largest and most extensive standard repository of locale data available. A wide spectrum of companies use this data for software internationalization and localization, adapting software to the conventions of different languages for common tasks such as formatting dates, numbers and currency values.

CLDR contains the following types of data:

  • Locale-specific patterns for formatting and parsing: dates, times, time zones, numbers and currency values
  • Translations of names: languages, scripts, countries and regions, currencies, eras, months, weekdays, day periods, time zones, cities, and time units
  • Language & script information: characters used; plural cases; gender of lists; capitalization; rules for sorting & searching; writing direction; transliteration rules; rules for spelling out numbers; rules for segmenting text into graphemes, words, and sentences
  • Country information: language usage, currency information, calendar preference and week conventions, and telephone codes
  • Other: ISO & BCP 47 code support (cross mappings, etc.), keyboard layout
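
As a rough illustration of what a locale-specific number pattern encodes, the sketch below applies the Indian digit-grouping convention (a final group of three digits preceded by groups of two). This corresponds to the CLDR decimal pattern "#,##,##0.###" used for locales such as hi and en-IN. The function names and default parameters are hypothetical and chosen for this example only; a real application would normally read the pattern from CLDR data through a library rather than hard-code it.

```python
# Sketch only: CLDR expresses Indian grouping as the pattern "#,##,##0.###".
# Function names and defaults below are hypothetical, not part of CLDR.

def group_digits(n, primary=3, secondary=2, sep=","):
    """Group the digits of integer n: one final group of `primary` digits,
    preceded by groups of `secondary` digits."""
    digits = str(abs(n))
    head, tail = digits[:-primary], digits[-primary:]
    groups = [tail]
    while head:
        groups.insert(0, head[-secondary:])
        head = head[:-secondary]
    return ("-" if n < 0 else "") + sep.join(groups)

# Devanagari digits occupy the code points U+0966 to U+096F.
_DEVA = str.maketrans("0123456789", "".join(chr(0x0966 + i) for i in range(10)))

def to_devanagari_digits(text):
    """Replace ASCII digits with Devanagari digits; other characters are kept."""
    return text.translate(_DEVA)

print(group_digits(1234567))                        # Indian grouping: 12,34,567
print(group_digits(1234567, secondary=3))           # Western grouping: 1,234,567
print(to_devanagari_digits(group_digits(1234567)))  # same grouping, native digits
```

Keeping the group sizes as parameters mirrors how CLDR stores the convention as data per locale instead of as program logic.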

The Technology Development for Indian Languages (TDIL) programme of the Ministry of Electronics and Information Technology focuses on the development of futuristic technologies for all 22 constitutionally recognized languages, and on standardization issues that will act as enablers in the deployment of these technologies. TDIL has initiated the process of building locale data for Indian languages. The Common Locale Data Repository (CLDR) is maintained by the Unicode Consortium at the international level.

We invite you to fill in the CLDR data for your respective language in this data format, so that this standard can also serve as a reference guide for developers of localized applications, such as e-governance applications. The CLDR data of other languages is available for your reference.

2. Web Payment

Web and Digital Payment

A Web payment scheme is a set of rules for the execution of payment transactions, followed by the adhering entities (payment processors, payers and payees), where transactions take place over networks such as the Web. Some digital payment schemes internally make use of payment instruments from other payment schemes.
The Web Payments ecosystem strives to support fundamental Web principles by:

  • Adhering to Web architecture fundamentals
  • Supporting network and device independence
  • Providing for payers and payees with differing physical and cognitive abilities
  • Being machine-readable where possible to enable automation and engagement of non-human entities

WSI has proposed work on the localization of Web payments in Indian languages. An expert committee and a background document have been developed. The following organizations are taking part in this activity:

  • NPCI
  • SBI
  • Canara Bank
  • HDFC PayZapp
  • BillDesk
  • TCS
  • ReBIT

The background document on Web payment

3. Media and Entertainment

The mission of the W3C Media and Entertainment Interest Group, formerly known as the Web and TV Interest Group, is to provide a forum for media-related technical discussions. The group tracks the progress of media features on the Web within W3C groups and the use of Web technologies by external organizations, and identifies use cases and requirements that existing or new specifications need to meet to achieve tighter support of media services on the Web. The following communities can participate in this group:

  • Vendors of Web browsers that have TV-related capability
  • TV broadcasters and TV service providers
  • Manufacturers of TVs and set-top boxes
  • Companies that own solutions for the development of broadcasting and Web+TV, and for their integration
  • Companies seeking to exploit the integration of broadcasting and Web technologies
  • Software vendors or open source projects that currently offer XML-based languages for describing or handling broadcasting, or customers of those languages
  • Government organizations seeking to standardize the integration of broadcasting and Web technologies
  • Academic researchers with an interest in smarter integration of Web technologies, broadcasting and non-PC devices

Work has been taken up in the Media & Entertainment group to identify the requirements of Indian languages. The communities listed above can participate in this initiative.

4. CSS & Digital Publishing

CSS stands for Cascading Style Sheets. A style sheet holds a collection of rules that control how web pages are presented. CSS can be applied to web pages in several ways; the most powerful is an external style sheet, which controls the design and appearance of an entire site from a single location and makes site-wide updates easy.

Each cultural community has its own language, script and writing system. Bringing each writing system into cyberspace is therefore a highly important task for information and communication technology.

Current work on Indic layout requirements in CSS & Digital Publishing is listed below:

1. First draft of Indic layout requirements

2. Minimum Requirements of E-publishing for Indic (pdf file) 926 KB

5. Mobile Technology

SMS standard

Mobile technology is an important means of communication today. With its accelerating growth in India, the number of subscribers from rural areas will grow manifold. Because English literacy is relatively low in rural areas, a large number of these subscribers will be deprived of the benefits of SMS unless Indian-language messaging support is improved significantly.

In mobile technology, multilingual data handling is vital across different layers. Any encoding scheme chosen for data transmission should meet the following requirements:

1. The data encoding scheme should support all possible characters and character combinations in Indian languages, as per the Unicode Standard.

2. There should be a provision to change languages within a single message.

3. The encoding should be flexible enough to accommodate future versions of the Unicode Standard.

The SMS encoding schemes currently prevalent in India include:

  • ISCII-based 7-bit encoding
  • 7-bit default alphabet as per the GSM standard

The GSM standard supports the 7-bit default alphabet and UCS2. For Indian languages, each of these encodings has its own pros and cons, especially with respect to the number of characters and standard implementation. The 7-bit EA-ISCII encoding can handle all the intricacies of Indian languages, but it lacks flexibility and at present does not support all Unicode characters. Adopting 7-bit standards to cater to the growing demands of Indian languages will therefore not make mobile devices truly localized for Indian languages.
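
The trade-off between the GSM 7-bit default alphabet and UCS2 can be sketched as follows. A single SMS carries 140 octets of user data, which holds 160 seven-bit characters or 70 sixteen-bit UCS2 code units. The function name and return format are hypothetical; the character table is a transcription of the basic GSM 7-bit set without its extension table and should be checked against the specification before any real use. EA-ISCII is not modelled here.

```python
# Sketch only: choose between the GSM 7-bit default alphabet and UCS2.
# Assumption: this table is a hand transcription of the basic GSM 7-bit
# character set (extension table and ESC omitted); verify against the spec.
GSM7_BASIC = frozenset(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

def choose_sms_encoding(text):
    """Return (encoding name, units used, units available in one SMS)."""
    if all(ch in GSM7_BASIC for ch in text):
        return "GSM-7", len(text), 160
    # Any character outside the 7-bit table forces the whole message to UCS2.
    units = len(text.encode("utf-16-be")) // 2
    return "UCS2", units, 70

hindi = "\u0928\u092e\u0938\u094d\u0924\u0947"  # a Hindi greeting, 6 code points
print(choose_sms_encoding("Hello"))
print(choose_sms_encoding(hindi))
print(choose_sms_encoding("Hello " + hindi))  # mixed-language message
```

A single Indian-language character switches the entire message to UCS2, which cuts the capacity of one message from 160 to 70 units. This is one reason the choice of encoding matters for mixed-language messages.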

Meetings based on SMS Standard for Indian languages

1st Meeting (pdf file) 142 KB

2nd Meeting (pdf file) 133 KB

3rd Meeting (pdf file) 130 KB

4th Meeting (pdf file) 345 KB

Consultation papers

1. Consultation paper for Mobile Manufacturers (pdf file) 865 KB

2. Consultation paper for Mobile Service Providers (pdf file) 652 KB

3. Consultation paper for VAS (pdf file) 895 KB

6. Semantic web

E-Governance in India has recently gained momentum through various national and state-level Mission Mode Projects, which aim to deliver better citizen-centric services under the long-term vision of 'Digital Unite for All'. However, the accessibility of data is still a major concern, as most data is coupled with applications and cannot be reused for better planning and coordination. To overcome this barrier, the National Data Accessibility Policy has been framed. It will enable the use of open linked data and other semantic web technologies to create an open ecosystem for data publishing and accessibility. We shall elucidate the present state of E-Governance implementation in India and the future direction towards Open Linked Data for reaching out to the masses.

1. Background paper on Semantic Web (pdf file) 144 KB

2. Presentation on NLP Interchange Format (NIF) (pdf file) 1.15 MB

3. Research Paper on Semantic Web (pdf file) 334 KB

7. Internationalization Tag Set 2.0

ITS 2.0 is a framework to add metadata to Web content, for the benefit of localization, language technologies, and internationalization [1]. The Internationalization Tag Set (ITS) 2.0 addresses some of the challenges and opportunities related to internationalization, translation, and localization. In particular, ITS 2.0 contributes to concepts in the realm of metadata for internationalization, translation, and localization related to core Web technologies such as XML. The ITS 2.0 specification both identifies concepts (such as “Translate”) that are important for internationalization and localization, and defines implementations of these concepts (termed “ITS data categories”) as a set of elements and attributes called the Internationalization Tag Set (ITS).
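
The “Translate” concept mentioned above can be illustrated with a minimal sketch. It reads the local its:translate attribute (in the ITS namespace http://www.w3.org/2005/11/its) and collects only the text that is marked as translatable, with the setting inherited by child elements. The sample document and the function name are hypothetical, and global rules (its:rules) are deliberately not handled.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
TRANSLATE = f"{{{ITS_NS}}}translate"  # attribute name in Clark notation

# Hypothetical sample document: the key name inside <code> must stay untranslated.
DOC = """<doc xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0">
  <title>User guide</title>
  <p>Press <code its:translate="no">Ctrl+S</code> to save.</p>
</doc>"""

def translatable_text(elem, inherited=True):
    """Collect text segments whose effective its:translate value is "yes".

    Element content is translatable by default, and a local its:translate
    value applies to the element and its descendants.
    """
    flag = elem.get(TRANSLATE)
    translate = inherited if flag is None else (flag == "yes")
    segments = []
    if translate and elem.text and elem.text.strip():
        segments.append(elem.text.strip())
    for child in elem:
        segments.extend(translatable_text(child, translate))
        # Tail text follows the child but belongs to this (parent) element.
        if translate and child.tail and child.tail.strip():
            segments.append(child.tail.strip())
    return segments

print(translatable_text(ET.fromstring(DOC)))
```

For the sample document, the key name inside the code element is skipped while the surrounding sentence is still collected for translation.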

ITS 2.0 requirements for Indian languages (pdf file) 267 KB

8. Speech Technology

Speech processing provides powerful capabilities for improving the interaction between humans and machines, and between humans using machines. It can be combined with Natural Language Processing (NLP) technology to model the human capacity to comprehend and process the content of language, to translate a spoken sentence from one language to another, and to support many other intelligent linguistic applications. Speech tools help to a great extent in providing an information-access interface for differently-abled persons, such as people with visual or cerebral disabilities. Since speech resources are the key building blocks of speech-based systems, initiatives are being taken to develop speech resources for Indian languages. Developing speech resources for synthesis, recognition and speaker identification requires different standards and methodologies.

1. Pronunciation Lexicon Specification (PLS) Draft (pdf file) 493 KB

2. Part of Speech (POS) Draft Standard

3. Speech Synthesis Markup Language (SSML) requirements for Indic (pdf file) 74 KB

4. Report of Workshop on PLS for Indic languages (pdf file) 170 KB

9. WOFF (Web Open Font Format)

WOFF requirements for Indian languages (pdf file) 1.40 MB

10. Web Accessibility

1. Guidelines for Indian Government websites

2. WCAG 2.0 Guidelines in Hindi (contributed by Centre for Internet and Society)

11. Web & TV

TV is a mature but rapidly changing market. With the advent of IP-based devices, connected TVs are progressing at a fast pace, and traditional TV broadcasting is quickly evolving into a more immersive experience in which users can interact with rich applications that are at least partly based on Web technologies. There is strong growth in the deployment of devices that integrate regular Web technologies such as HTML, CSS, and SVG, coupled with various device APIs. There is huge potential to create an interoperable platform where Web and TV benefit from each other. For instance, TV comes with quality-of-service requirements for video delivery that would also benefit video delivery on the Web, e.g. through the introduction of HTTP adaptive streaming mechanisms.

Background paper on Web & TV (pdf file) 588 KB

Website Last Updated on : 08 Aug 2019