Dynamic Language Tools is a bookmarklet application I have developed that helps users read and write Urdu easily on any web page without installing any software. The tool uses the Google Transliteration API to transliterate Roman script into Urdu script on demand. It also provides on-the-fly Hindi-to-Urdu transliteration on web pages, making Hindi content (in the Devanagari script) readable to Urdu readers.
The reign of the English language over modern technology and the Internet may soon be at an end. Increasingly, local language technologies are emerging to challenge the role of English as the language of the web. Representing Urdu and other Pakistani languages at the forefront of this battle is the Centre for Research in Urdu Language Processing (CRULP). For Dr. Sarmad Hussain, founding director of CRULP, and his team, developing the capacity for local language processing is not merely an intellectual exercise in machine processing research but their contribution to a global struggle that aims to give every human access to information regardless of the language they speak. Like the translators of Al-Mamun, the seventh Abbasid caliph, who translated many of the classics of Greek, Indian, Persian and Chinese scholarship and saved them from the ash-heap of history, the team at CRULP is working to bridge the disconnect between the wealth of knowledge available on the Internet and the large non-English-speaking segment of Pakistani society. While this team may not have royal patronage like the Abbasid translators, who were paid in gold equal to the weight of the books they translated, the dissemination of knowledge and the legacy of scholarship that the CRULP team leaves behind will be invaluable. I recently visited the CRULP headquarters at the National University of Computer and Emerging Sciences (NUCES), Lahore, where project manager Kiran Khurshid showed me around the CRULP lab and talked about the various projects currently in progress.
The overarching goal of CRULP is to develop local language processing technologies that give people easy access to information regardless of the local language they speak. The traditional approaches to introducing technology into rural areas have involved providing schools and colleges with computers and expecting the locals to learn and adapt to modern technology. Dr. Hussain sees a fundamental flaw in these approaches: they either fail to address or underestimate the two major barriers people face in using modern technology, namely illiteracy and language. With 45% of the population illiterate and most people unable to interact in English, it is impractical to expect them to use computers to access information through current technology. The team at CRULP aims to break the illiteracy barrier by developing Urdu Speech Recognition and Text to Speech systems that allow users to operate technology by voice. The language barrier is being tackled through the development of software in Urdu, examples of which include the SeaMonkey internet suite, which provides users with Urdu-based tools to build websites, surf the internet, send email and more.
The vast majority of Pakistanis using the web are familiar only with English keyboards. Creating content in Urdu script is a slow and frustrating experience, as it requires either learning the Urdu keyboard layout, which is forced onto a keyboard designed for writing English, or using on-screen keyboards, which are useful but limited by the speed at which one can click the mouse. As a result, producing online content in Urdu script has mostly been limited to a small number of bloggers and commercial websites. For most users, writing Urdu in Roman script (transliteration) has become the main way of writing Urdu on computers. Transliteration is the phonetic mapping of words written in one script (e.g. Arabic) to another script (e.g. Roman). For example, شکریہ transliterates into shukriya. While Roman transliteration may be adequate for many purposes (such as chatting), it leaves a lot to be desired for people who prefer to read and write the language in its original script.
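To make the idea of phonetic mapping concrete, here is a minimal Python sketch that turns a Roman-script word back into Urdu script by greedy longest-match lookup. It is only an illustration: the mapping table, the name ROMAN_TO_URDU and the function transliterate are hypothetical, the table covers just enough letters for the example word, and this is not how the Google Transliteration API or Dynamic Language Tools works internally. Real transliteration has to deal with ambiguity (several Urdu letters share one Roman spelling) and with context, which a fixed table cannot do.

```python
# Toy Roman-to-Urdu transliteration by greedy longest-match lookup.
# ASSUMPTION: this table is a hand-picked, hypothetical fragment for
# illustration only; it is not taken from any real transliteration service.
ROMAN_TO_URDU = {
    "sh": "\u0634",         # sheen
    "k": "\u06a9",          # kaf
    "r": "\u0631",          # re
    "iya": "\u06cc\u06c1",  # word-final "-iya" written as ye + he
    "u": "",                # short vowel, normally left unwritten in Urdu
}

MAX_KEY_LEN = max(len(key) for key in ROMAN_TO_URDU)


def transliterate(word: str) -> str:
    """Map a Roman-script word to Urdu script using the toy table above."""
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        # Try the longest candidate first so that multi-letter keys
        # such as "sh" are matched as a single unit.
        for size in range(min(MAX_KEY_LEN, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in ROMAN_TO_URDU:
                out.append(ROMAN_TO_URDU[chunk])
                i += size
                break
        else:
            # No table entry starts here: pass the character through unchanged.
            out.append(word[i])
            i += 1
    return "".join(out)


print(transliterate("shukriya"))
```

With this table, "shukriya" comes out as شکریہ. Any letter the table does not know is copied through as-is, which shows how quickly a purely table-driven approach runs out of road.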