Google expands languages push in India to serve non-English speakers


There are over 600 million internet users in India, but only a fraction of this population is fluent in English. Most online services and much of the content on the web, however, are currently available only in English.

This language barrier continues to feed a digital divide in the world's second-largest internet market, one that has limited hundreds of millions of users' experience of the web to a select few websites and services.

So it comes as no surprise that American tech giants, which are counting on emerging markets such as India to continue their growth, are increasingly trying to make the web and their services accessible to more people.

Google, which has so far led this effort, on Thursday announced a range of changes it is rolling out across some of its services to make them speak more local languages and unveiled a whole new approach it’s taking to translate languages.

Product changes

Users will now be able to see search results in Tamil, Telugu, Bangla, and Marathi, in addition to English and Hindi, which are already available. The addition comes four years after Google added a Hindi tab to the search page in India; the company said the volume of Hindi search queries grew more than tenfold after the tab was introduced. Someone who prefers to see results in Tamil, for instance, will be able to set a Tamil tab next to the English one and quickly toggle between the two.

Getting search results in a local language is helpful, but people often want to type their queries in those languages as well. Google says it has found that typing in a non-English language is another challenge users face today. "As a result, many users search in English even if they really would prefer to see results in a local language they understand," the company said.

To address this challenge, Search will start to show relevant content in supported Indian languages where appropriate, even if the local-language query is typed in English. The feature, which the company plans to roll out over the next month, supports five Indian languages: Hindi, Bangla, Marathi, Tamil, and Telugu.

Google is also making it easier for users to quickly change the language in which an app shows results without altering the device's language settings. The feature, already available in Discover and Google Assistant, will now roll out to Maps. Google Lens's Homework feature, which lets users take a picture of a math or science problem and returns its answer, now also supports Hindi.

MuRIL

Google executives also detailed a new language AI model, which they are calling Multilingual Representations for Indian Languages (MuRIL), that handles transliteration, spelling mistakes, and other nuances of Indian languages.

The company said it trained the new model on Wikipedia articles and on text from a dataset called Common Crawl. It also trained the model on transliterated text drawn from Wikipedia, among other sources, which was produced by feeding that text through Google's existing neural machine translation models. As a result, MuRIL handles these languages better than previous, more general language models and can cope with letters or words that have been transliterated, that is, written using the closest corresponding letters of a different alphabet or script.
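To make the idea of transliteration concrete, here is a minimal rule-based sketch that rewrites a handful of Devanagari characters (the script used for Hindi and Marathi) into their closest Latin letters. The character tables are a small, hypothetical subset chosen for illustration, and the function name `transliterate` is made up. This is not how Google produced MuRIL's transliterated training data, which the company says came from its neural machine translation models.

```python
# Toy Devanagari-to-Latin transliterator (hypothetical, illustrative subset only).
CONSONANTS = {
    "क": "k", "ख": "kh", "ग": "g", "च": "ch", "ज": "j",
    "ट": "t", "ड": "d", "त": "t", "द": "d", "न": "n",
    "प": "p", "ब": "b", "भ": "bh", "म": "m", "य": "y",
    "र": "r", "ल": "l", "व": "v", "स": "s", "ह": "h",
}
# Dependent vowel signs (matras) replace a consonant's inherent "a".
VOWEL_SIGNS = {
    "ा": "a", "ि": "i", "ी": "i", "ु": "u", "ू": "u",
    "े": "e", "ै": "ai", "ो": "o", "ौ": "au",
}
# Independent vowels, written as standalone letters.
VOWELS = {
    "अ": "a", "आ": "a", "इ": "i", "ई": "i", "उ": "u",
    "ऊ": "u", "ए": "e", "ऐ": "ai", "ओ": "o", "औ": "au",
}
# Nasalization marks: anusvara and chandrabindu.
MODIFIERS = {"ं": "n", "ँ": "n"}
VIRAMA = "\u094d"  # halant: suppresses the inherent vowel


def transliterate(text: str) -> str:
    """Map each known Devanagari character to its closest Latin letters."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt == VIRAMA:
                i += 2  # consonant cluster: drop the inherent vowel
                continue
            if nxt in VOWEL_SIGNS:
                out.append(VOWEL_SIGNS[nxt])
                i += 2
                continue
            out.append("a")  # no matra or virama: keep the inherent vowel
        elif ch in VOWELS:
            out.append(VOWELS[ch])
        elif ch in MODIFIERS:
            out.append(MODIFIERS[ch])
        else:
            out.append(ch)  # spaces, digits, Latin text pass through unchanged
        i += 1
    return "".join(out)


print(transliterate("नमस्ते"))  # namaste
print(transliterate("हिंदी"))   # hindi
```

The mapping is deliberately lossy: ट and त both come out as "t". The sketch also keeps every inherent "a", so बंद becomes "banda" rather than "band". Ambiguities and inconsistencies like these are part of what makes Romanized Indian-language text hard for models trained only on native script.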

Google executives said the previous model was not scalable. MuRIL significantly outperforms the earlier model: by 10% on native text and 27% on transliterated text. MuRIL, which was developed in India, is open source.

One of the many tasks MuRIL is good at is determining the sentiment of a sentence. For example, "Achha hua account bandh nahi hua" ("Good thing the account wasn't closed") would previously have been interpreted as negative, but MuRIL correctly identifies it as a positive statement. Or take the ability to distinguish a person from a place: "Shirdi ke sai baba" ("Sai Baba of Shirdi") would previously have been interpreted as a place, which is wrong, but MuRIL correctly interprets it as a person.
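Why would a sentence like this trip up a simpler system? A minimal sketch, using a made-up three-word lexicon and a hypothetical `lexicon_sentiment` function, shows how scoring each Romanized Hindi word in isolation misreads it. On their own, "bandh" (closed) and "nahi" (not) both look negative, even though together they mean the account was not closed. This only illustrates context-blind scoring in general. It is not a description of Google's earlier model, whose internals the company did not detail.

```python
# Hypothetical per-word polarity scores for Romanized Hindi (illustration only).
LEXICON = {
    "achha": 1,   # "good"
    "bandh": -1,  # "closed"
    "nahi": -1,   # "not"
}


def lexicon_sentiment(sentence: str) -> str:
    """Sum per-word scores, ignoring word order and how words modify each other."""
    score = sum(LEXICON.get(word, 0) for word in sentence.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_sentiment("Achha hua account bandh nahi hua"))  # negative
```

Because "nahi" negates "bandh" rather than adding to its negativity, the correct reading is positive. That is why a model that reads words in relation to their neighbors can get this sentence right where a word-by-word tally cannot.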
