MUMBAI: Is there more to ghar than four walls and a roof? Does
the word pyaar have a life outside mushy movie posters? Does khoon
mean different things to different people? These questions may sound
like the patter of vacuous veejays rather than the grist of
bespectacled academia.
But they are being posed by a group of computer scientists and
linguists seeking serious answers, not just from dictionary-spouting
scholars but from all Mumbaikars who speak Hindi.
As part of the Universal Networking Language Project, an
ambitious attempt by 18 countries to make computers multilingual, a
team of researchers at the Indian Institute of Technology (Powai)
is busy teaching computers Hindi and Marathi. Before these
tutorials can be delivered, however, they have to strip the language
down to its nuts, bolts and basic concepts.
And it is here that help is needed. “We will soon be putting up
a list of Hindi words and usages on the Net. As the language belongs
to the community at large, we hope people will vet them and make
suggestions,” says Pushpak Bhattacharyya, a computer science
professor who heads the Centre for Indian Language Technology
Solutions at IIT (Powai).
Adds Debasri Chakrabarti, a doctoral student in linguistics,
“Language undergoes daily change. New words and usages are not found
in dictionaries, which is why we want native speakers to
contribute.”
Those who make their way to the centre’s webpage to discuss the
current meaning of mrig may not realise it, but they are part of a
worldwide movement to democratise technologies.
“We have to make computers speak our language,” says Jitendra
Shah, a professor at VJTI. Mr Bhattacharyya, who points out that more
than 80 per cent of Internet content is in English, adds, “People who
don’t know English are at a tremendous disadvantage. In 1996, the
United Nations initiated a project to overcome the language barrier
through machine translation. If it succeeds, it will be possible for
a Marathi-speaker to access English websites in his mother-tongue.
Or for me to send an e-mail in Bengali, which my friend in Tokyo
will read in Japanese.”
This doesn’t sound too arduous a task for a machine that can
run a nuclear power plant and beat Kasparov at chess, except
that natural languages have always eluded the straitjacket of
mathematical formulae.
“When the idea of machine translation emerged in the ’50s, it
was seen as a trivial problem which involved little more than
programming a bilingual dictionary,” says Mr Bhattacharyya. But
this simplistic notion was soon dispelled. According to a famous
story, the English sentence ‘The spirit is willing but the flesh is
weak’ was fed into the computer, translated into Russian, and then
back into English.
What emerged was ‘The alcohol is strong but the meat is
rotten’, he says. Half a century later, computers are still unable
to grasp the subtle difference between ‘I saw the boy with the
telescope’ and ‘I saw the boy with the bat’. Indeed, to explain
‘childish’ and ‘childlike’ to this most literal of machines is a bit
like describing crimson and scarlet to a colourblind cow.
“Translation is an unbelievably complex process, and how the human
mind functions during translation is still unknown,” says Milind
Malshe, an IIT professor and well-known translator.
Adds Mr Bhattacharyya, “Natural languages are rich in
ambiguities and implications, which computers are unable to handle.
So, for example, our system finds technical documents simple to
translate, but not children’s stories.”
At the heart of this evolving system is Universal Networking
Language, a techno-Esperanto that serves as a stepping stone in the
translation process. Take, for example, a document that needs to be
translated from English to Hindi. “The computer converts English
into UNL, and then UNL into Hindi,” explains Mr Bhattacharyya,
adding that Japanese, Indonesian, Hindi, Arabic, Italian, French,
Spanish and Portuguese have all been successfully mated with
UNL.
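
The two-step route described above can be pictured with a toy sketch. It is
only an illustration, not the project's software: the concept IDs, the tiny
word lists and the function names are all invented here, and real UNL
represents far more than a word-for-word map. The sketch also shows why a
shared intermediate language is attractive: eight languages need 16
converters through a pivot, against 56 if every pair were handled directly.

```python
# Toy pivot ("interlingua") translation. The concept IDs and word lists are
# invented for illustration; they are not taken from the UNL project.

# Analysis tables: source-language word -> language-neutral concept ID.
TO_PIVOT = {
    "english": {"house": "C_HOUSE", "love": "C_LOVE", "blood": "C_BLOOD"},
    "hindi": {"ghar": "C_HOUSE", "pyaar": "C_LOVE", "khoon": "C_BLOOD"},
}

# Generation tables, built by inverting each analysis table.
FROM_PIVOT = {
    lang: {concept: word for word, concept in table.items()}
    for lang, table in TO_PIVOT.items()
}


def translate(words, source, target):
    """Translate word by word: source -> pivot concepts -> target."""
    concepts = [TO_PIVOT[source][w] for w in words]
    return [FROM_PIVOT[target][c] for c in concepts]


def converters_needed(n_languages):
    """Return (direct pairwise converters, converters via a pivot)."""
    direct = n_languages * (n_languages - 1)  # one per ordered language pair
    via_pivot = 2 * n_languages               # one into and one out of the pivot
    return direct, via_pivot


print(translate(["love", "house"], "english", "hindi"))
print(converters_needed(8))
```

A one-to-one word table like this is exactly the "bilingual dictionary" view
that the researchers say is too simple. It stands in here only to show the
shape of the pipeline.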
“At IIT, we are focusing on Hindi, Marathi and English.
Incidentally, we are the only research group in the world converting
English into UNL. This is because countries like the US and England
don’t see machine translation as a priority. As far as they are
concerned, the rest of the world should learn English.”
In an attempt to chop languages into byte-sized pieces, the IIT
team has distilled 4,500 rules from Hindi. It is also erecting a
Hindi Wordnet, a complex scaffolding of words and related concepts.
“By pairing words with their synonyms, we avoid ambiguities arising
out of multiple meanings,” explains Mr Bhattacharyya, pointing out
that a computer confronted with ghar, for example, has no idea which
of the nine meanings to adopt. “However, if ghar is paired with gruh,
it is clear that the word is being used in the astrological sense.
If it is paired with parivar, it refers to household.”
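
The synonym-pairing idea can be sketched in a few lines. This is a minimal
illustration under stated assumptions, not the Hindi Wordnet itself: the
data structure and function names are hypothetical, and only the two senses
of ghar mentioned above are included out of the nine the team counts.

```python
# Toy synonym sets ("synsets"). Each entry pairs a set of words that share
# one meaning with a short gloss. The structure is hypothetical and the data
# covers only the two senses of "ghar" quoted in the article.
SYNSETS = [
    ({"ghar", "gruh"}, "house (astrological sense)"),
    ({"ghar", "parivar"}, "household"),
]


def senses(word):
    """List every gloss whose synonym set contains the word."""
    return [gloss for members, gloss in SYNSETS if word in members]


def disambiguate(word, companion):
    """Return the single sense shared by word and companion, else None."""
    matches = [
        gloss for members, gloss in SYNSETS
        if {word, companion} <= members
    ]
    return matches[0] if len(matches) == 1 else None


print(senses("ghar"))                   # the word alone is ambiguous
print(disambiguate("ghar", "parivar"))  # the companion word picks one sense
```

On its own, ghar matches both entries. Once it is paired with a synonym,
only one entry contains both words, which is the effect the pairing is
meant to achieve.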
These maps of words might well help to navigate the unspoken,
metaphoric depths of language. “Few countries will benefit as much
as India if this dream comes true,” points out Mr Bhattacharyya.
“After all, few countries have the number of languages and barriers
that we do.”