Editor's Note: Zia Syed was involved in the development of the tool at Google, working with Google India's Indic Transliteration team.

Google's New Urdu Transliteration Tool

The vast majority of Pakistanis using the web are familiar only with English keyboards. Creating content in Urdu script is slow and frustrating: it requires either learning an Urdu keyboard layout that has been forced onto a keyboard designed for English, or using an on-screen keyboard, which is useful but limited by how fast one can click the mouse. As a result, online content in Urdu script has mostly come from a small number of bloggers and commercial websites. For most users, writing Urdu in Roman script (transliteration) has become the main way of writing Urdu on computers. Transliteration is the phonetic mapping of words written in one script (e.g., Arabic) to another script (e.g., Roman). For example, شکریہ transliterates into shukriya. While Roman transliteration may be adequate for many purposes (e.g., chatting), it leaves a lot to be desired for people who prefer to read and write the language in its original script.

Google recently launched an exciting solution that turns the transliteration problem on its head, reverse transliterating Roman script into Arabic script: http://www.google.com/transliterate/indic/Urdu.

Even though transliteration is much simpler than translation, a transliteration system must overcome several challenges. First, the source script may not let users represent the desired sounds, e.g., English has no equivalent sound for ت or ڑ. Second, even when an equivalent-sounding letter exists, it may map to several letters in the target script, e.g., ‘s’ can map to س, ص, or ث. Third, vowels pose yet another challenge: an ‘a’ can map to a diacritical mark in Urdu (e.g., a zabar) and not show up in the script at all, or it can map to ‘ا’, ‘آ’, ‘ع’, or ‘ء’ and be visible. Lastly, not everyone writes Urdu in Roman script using the same conventions, e.g., some people use ‘q’ for ‘ق’, while others use ‘k’ for both ‘ق’ and ‘ک’. A good transliteration system has to overcome these problems to be usable.
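To make the ambiguity concrete, here is a minimal Python sketch that lists every Urdu spelling a tiny mapping table allows for a Roman string. The table `ROMAN_TO_URDU` and the function `candidates` are hypothetical names invented for this illustration, built only from the examples in the paragraph above; this is not how the Google tool is implemented.

```python
from itertools import product

# Hypothetical, deliberately tiny mapping table based on the examples above.
# An empty string stands for a short vowel (e.g. a zabar) that is usually
# not written in the Urdu script.
ROMAN_TO_URDU = {
    "s": ["س", "ص", "ث"],
    "k": ["ک", "ق"],
    "q": ["ق"],
    "a": ["", "ا", "آ", "ع", "ء"],
}


def candidates(roman):
    """Return every Urdu spelling the table allows for a Roman string."""
    # Letters missing from the toy table are passed through unchanged.
    options = [ROMAN_TO_URDU.get(ch, [ch]) for ch in roman.lower()]
    return ["".join(choice) for choice in product(*options)]


# Three Roman letters already give 3 * 5 * 2 = 30 possible spellings.
print(len(candidates("sak")))
```

Even with four table entries, the number of candidates multiplies with every letter, which is why a usable system needs some way to rank the possibilities rather than just enumerate them.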

The Google service is not yet perfect, but it uses a combination of techniques to disambiguate between many potential choices during transliteration. These techniques include the use of an Urdu dictionary to give more weight to valid Urdu words, hard coding common words/pronouns, and using machine learning on parallel Roman and Urdu transliterated texts to learn about the common character sequence mappings. It performs on-the-fly conversion of words to the Urdu script. Any mistakes can be fixed by either pressing backspace for the last written word, or by clicking on any word. To correct a mistake, users have the option to choose from a list of alternatives or enter the word manually by using an on-screen Urdu keyboard. Try writing Mustansar Husain Tarar and you will see that it does a fairly good job (the last word will need correction by pressing backspace). More detailed usage instructions can be found here.
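As a rough illustration of how such techniques might be combined, the following Python sketch ranks candidate spellings by adding a bonus for dictionary words to a score from a learned mapping model, and lets hard-coded common words override the ranking. Everything here is an assumption made for the example: the names `HARD_CODED`, `DICTIONARY`, `DICTIONARY_BONUS`, and `rank`, the sample words, and the numeric scores are all invented, and the actual Google system may work quite differently.

```python
# Hypothetical data, for illustration only.
HARD_CODED = {"main": "میں", "hai": "ہے"}  # common words mapped directly
DICTIONARY = {"شکریہ", "کتاب"}             # known valid Urdu words
DICTIONARY_BONUS = 5.0                     # extra weight for dictionary words


def rank(roman, model_scores):
    """Order candidate spellings best-first.

    model_scores maps each Urdu candidate to a score that, in a real
    system, would come from a model trained on parallel Roman and Urdu
    text. The numbers used below are made up.
    """
    def total(word):
        bonus = DICTIONARY_BONUS if word in DICTIONARY else 0.0
        return model_scores[word] + bonus

    ranked = sorted(model_scores, key=total, reverse=True)

    # A hard-coded common word always goes first; the remaining
    # candidates stay available as alternatives.
    fixed = HARD_CODED.get(roman.lower())
    if fixed is not None:
        ranked = [fixed] + [w for w in ranked if w != fixed]
    return ranked


alternatives = rank("shukriya", {"شکریہ": 1.0, "شکریا": 1.5, "شقریہ": 0.5})
print(alternatives[0])  # the dictionary word wins despite a lower model score
```

The full ranked list, not just the top entry, is what a correction menu of alternatives could be populated from.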

The service has launched as a Google Labs project, which means that it is experimental and will keep changing to improve its quality based on user feedback. It has already been well received by the online Urdu community and will hopefully contribute to a significant increase in the amount of online Urdu content.

31 Responses to “Google Launches Urdu Transliteration Tool”

  1. ghazanfar72 says:

    The masterpiece we were waiting for has arrived. I am writing these lines with Google itself. Thank you, Google. If only there were a similar way to do this offline, it would be wonderful.

    • Dr.Fareed Ali Shamsi says:

      Google transliteration into Urdu is indeed a wonder. But unfortunately one cannot save whatever is transliterated on the same page. One has to copy and paste. This is complicated and most irritating.

  2. Faisal Azeem says:

    Wow! Now that's more like it! Now we will at least be able to write Urdu in Urdu.

  3. yaser says:

    Have you guys thought about using an online learner to learn from the corrections people make? That would allow you to automatically improve the system over time.

    • Zia Syed says:

      Yaser, we do analyze corrections made by users, and we already know some common mistakes made by the current system. Some of these mistakes will be corrected in the next iteration, and the process will continue. But yes, it's not online or real-time (assuming that is what you meant), though that doesn't mean we haven't experimented with that approach.

  4. SM Fahad says:

    This is great fun! Google really is something.

  5. Omar Javed says:

    The ICT R&D fund is soliciting proposals for a content creation competition in local languages in Pakistan (http://www.ictrdf.org.pk/). I think Google’s transliteration tool can play a major role here. Zia, do you think it will be possible for Google to help organize such a competition in Pakistan?

    Here is the abstract:
    “Proposals are invited from IT institutions, and program management companies to design, develop and execute a nation-wide student competition for content creation in local and national languages. Through this competition we want to introduce our youth to opportunities provided by connectivity to cyberspace.”

    • Zia Syed says:

      Omar, it is great that the ICT R&D fund is trying to facilitate efforts for local content creation; it is definitely something that will directly benefit Pakistanis and indirectly benefit Google. Google has been trying to make it easy for people to create content in local languages, but it is up to us now to make the best use of these tools. It is hard for me to say whether Google will help in organizing such a competition, but I am personally interested in knowing how this effort moves forward and in seeing if I can be of any help. You can get in touch with me at http://tinyurl.com/n8tyae and we can continue this discussion there.

  6. Nisar Ahmed says:

    I appreciate Google’s effort on Urdu transliteration. However, I have noticed that the Urdu option is not available if you want to use it on other websites (e.g., by placing a tool on your toolbar). Interestingly, other languages, e.g., Marathi, Gujarati, and other lesser-known South Indian languages, have this functionality. I was wondering when Google would be able to add this functionality for Urdu?

    • Zia Syed says:

      Nisar, the feature to use Urdu transliteration on any website will be launched soon. Several other changes are planned too, and they will be launched incrementally.

  7. Syed Muhammad Tanveer says:

    It is good, and much appreciated. It requires practice to overcome the difficulties, otherwise it is a good approach to send and express oneself in Urdu or in any other language.

  8. Syed Muhammad Tanveer says:

    It is good to know how to express ourselves in our own language.

  9. Zia Syed says:

    The transliteration bookmarklet is now available for Urdu, which means that you can now use Urdu transliteration in any text box on any web page. This includes chatting and writing emails (in Plain Text mode only) in Gmail. Please follow these instructions for installing the bookmarklet:

    http://t13n.googlecode.com/svn/trunk/blet/docs/help_ur.html

    Note that rich text support is still not available (for Urdu or any other language). For Gmail users, this means that you have to click on “Plain Text” while composing a message if you are in Rich Formatting mode.

  10. mf_s@hotmail.com says:

    can reach the website now. I don’t know what the problem is.

  11. I have developed a bookmarklet application using the Google Transliteration API, and with it you can read and write Urdu on any web page. Try it at:
    http://syedgakbar.co.cc/products/web/

  12. syed shah says:

    Google is truly unmatched; that is why it is the world's number one website. Google is truly amazing.

  13. Indscribe says:

    Zia Sahab.

    Please accept our greetings. It was a colossal job. It is people like you who work silently, without media attention, and help lots of Urdu lovers like us. You, your team and of course Google deserve a standing ovation.

    Thank you

  14. Raan Jee says:

    Bravo! Google has arrived and taken over.

  15. gulzar ahmad says:

    Thank you, this has been a great help.

  16. ناصر دین says:

    Google is Google.

  17. [...] read and write Urdu easily on any web page without installing any software. This tool uses the Google Transliteration API to do the on-demand transliteration of the roman script to Urdu script.  This tool also provides [...]

  18. MUHAMMAD FAHAD says:

    Google has really done something amazing. It truly is a very good offering, absolutely splendid.

  19. اسلم says:

    Friends, why don't you all just write directly in Urdu?

  20. اسلم says:

    I mean in Unicode Urdu... and with a phonetic keyboard.

  21. khan says:

    Thanks, my dear Google.

  22. غلام نبی پرواز says:

    In the medial form of the Urdu letter “ہ”, a small hook (shosha) is added beneath the “ہ”, which is wrong, as in “کہا”. Please correct this.
    For example, look at the cover of ‘Akhbar-e-Jahan’ and at the “ہ” in جہاں there; it is written exactly right.

  23. Dr.Fareed Ali Shamsi says:

    I am unable to transliterate the Urdu word Baais. Can anyone kindly help?

