| 
            
             The Internet today has to deal with multilinguality. People speak 
            different languages and the number of natural languages along with 
            their dialects is estimated to be close to 4,000. Of the top 100 
            languages in the world, English occupies the top position, with 
            Hindi coming fifth and Marathi fourteenth.  
            This is where UNL (Universal Networking Language) comes in. It is 
            a digital meta language for describing, summarizing, refining, 
            storing and disseminating information in a machine-independent and 
            human-language-neutral form. UNL represents information (i.e., 
            meaning) sentence by sentence. Sentence information is represented 
            as a hyper-graph having concepts as nodes and relations as arcs. 
            This hyper-graph is also represented as a set of directed binary 
            relations, each between two of the concepts present in the sentence. 
            Concepts are represented as character-strings called UWs (Universal 
            Words).  
            The encoded UNL is used not only for machine translation, but 
            also for other document-processing activities. The encoding process 
            can be looked upon as the process of knowledge extraction. The 
            extracted knowledge is used for automatic hyperlinking, summarizing 
            and categorizing of documents. 
            
              
              
                  |  
              
                | 
                   UNL can describe and disseminate information over the net, irrespective of the language used by different people  |   
            The UNL vocabulary consists of the following. 
            UWs (Universal Words): Labels that represent word meaning 
            Relation Labels: Tags that represent the relationship between UWs 
            Attribute Labels: Express additional information about the UWs that appear in a sentence 
            A UNL expression can be seen as a UNL graph. For 
            example,  
            John, who is the chairman of the company, has arranged a meeting 
            at his residence. 
            The UNL for the sentence is 
            [S] 
            mod(chairman(icl>post), company) 
            aoj(chairman, John) 
            agt(arrange.@complete, John) 
            pos(residence, John) 
            obj(arrange, meeting) 
            plc(arrange, residence) 
            [/S] 
            You can see the UNL graph for the sentence in the accompanying 
            picture. 
            In the expression above, agt denotes the agent, obj the object, 
            plc the place, aoj the attributed object, mod the modifier and pos 
            the possessor. The detailed list of such relations can be found in 
            the reference cited in the box on the next page. The icl construct 
            restricts the meaning of a word; only one example of such a 
            restriction is shown above, viz., chairman(icl>post).  
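
            The binary-relation form lends itself to simple processing. The following is a minimal Python sketch, not part of any UNL toolkit, that reads the example expression above into (relation, UW, UW) triples and an adjacency map. The names parse_unl, split_args, headword and build_graph are illustrative assumptions, and the parser only handles the flat notation used in this article.

```python
import re
from collections import defaultdict

# The example expression from the article, one relation per line.
UNL_TEXT = """[S]
mod(chairman(icl>post), company)
aoj(chairman, John)
agt(arrange.@complete, John)
pos(residence, John)
obj(arrange, meeting)
plc(arrange, residence)
[/S]"""

LABEL_OPEN = re.compile(r"([a-z]+)\(")


def split_args(body):
    """Split 'uw1, uw2' on the comma that is not inside parentheses."""
    depth = 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return body[:i].strip(), body[i + 1:].strip()
    raise ValueError(f"expected two arguments in {body!r}")


def parse_unl(text):
    """Return the (label, uw1, uw2) triples found between [S] and [/S]."""
    text = text.replace("[S]", "").replace("[/S]", "")
    relations, pos = [], 0
    while True:
        match = LABEL_OPEN.search(text, pos)
        if match is None:
            return relations
        # Walk forward to the parenthesis that closes this relation.
        depth, end = 1, match.end()
        while end < len(text) and depth > 0:
            if text[end] == "(":
                depth += 1
            elif text[end] == ")":
                depth -= 1
            end += 1
        if depth != 0:
            raise ValueError("unbalanced parentheses")
        uw1, uw2 = split_args(text[match.end():end - 1])
        relations.append((match.group(1), uw1, uw2))
        pos = end


def headword(uw):
    """Drop restrictions such as (icl>post) and attributes such as .@complete."""
    return re.split(r"[(.]", uw, maxsplit=1)[0]


def build_graph(relations):
    """Map each source concept to its outgoing (label, target concept) arcs."""
    graph = defaultdict(list)
    for label, uw1, uw2 in relations:
        graph[headword(uw1)].append((label, headword(uw2)))
    return graph


relations = parse_unl(UNL_TEXT)
graph = build_graph(relations)
for node, arcs in graph.items():
    print(node, arcs)
```

            Collapsing each UW to its headword is a simplification made here so that chairman(icl>post) and chairman, or arrange.@complete and arrange, fall on the same node; a fuller treatment would keep the restrictions and attributes attached to the node.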
            Conversion to and from UNL expressions 
            Encoding into UNL is first of all a parsing problem. The analysis 
            process uses a framework that carries out morphological, syntactic 
            and semantic analysis synchronously. It analyses sentences by 
            accessing a knowledge-rich lexicon and interpreting the Analysis 
            Rules, which essentially capture the language phenomena. 
            Formulating the rules amounts to programming a sophisticated 
            symbol-processing machine. Thus, converting natural-language 
            sentences into UNL involves constructing analysis rules and 
            building a knowledge-rich lexicon that links the words of the 
            language to UWs. Together, the rules and the lexicon must cover 
            extremely varied language phenomena and concepts. 
            
            Some examples of dictionary entries for Hindi are given 
below. 
            The attributes in the lexicon, both semantic and syntactic, are 
            collectively called Lexical Attributes. The syntactic attributes 
            include the word category (noun, verb, adjective, etc.) and 
            attributes such as person and number for nouns and tense for 
            verbs.  
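
            As a rough illustration of what such a lexicon holds, the sketch below models an entry as a word linked to a UW together with its lexical attributes. It is an assumption-laden toy, not the project's dictionary format: the class LexiconEntry, the function lookup, the attribute names and the entries themselves are hypothetical, and English words from the earlier example are used rather than Hindi ones.

```python
from dataclasses import dataclass, field


@dataclass
class LexiconEntry:
    headword: str   # word of the natural language
    uw: str         # Universal Word the headword is linked to
    category: str   # syntactic attribute: noun, verb, adjective, ...
    attributes: dict = field(default_factory=dict)  # other lexical attributes


# Hypothetical entries; attribute names and values are assumptions.
LEXICON = [
    LexiconEntry("chairman", "chairman(icl>post)", "noun",
                 {"person": 3, "number": "singular"}),
    LexiconEntry("meeting", "meeting", "noun",
                 {"person": 3, "number": "singular"}),
    LexiconEntry("arranged", "arrange", "verb", {"tense": "past"}),
]


def lookup(lexicon, word, category=None):
    """Return the entries for a word, optionally filtered by word category."""
    return [
        entry for entry in lexicon
        if entry.headword == word
        and (category is None or entry.category == category)
    ]


for entry in lookup(LEXICON, "chairman"):
    print(entry.headword, "->", entry.uw, entry.category, entry.attributes)
```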
            Decoding UNL expressions into a sentence of any target language 
            is done using the word dictionary and the generation rules of the 
            target language. First, the syntax of the target words is planned; 
            then the morphology is generated to produce a natural 
            sentence. 
            Some statistics 
            We have constructed analysers for Hindi and English and a 
            generator for Hindi. Work on a generator for Marathi has also 
            been started. This required linking English, Hindi and Marathi 
            language strings with the UWs, and writing the Analysis and 
            Generation rules for these languages. Below is some quantitative 
            information for English and Hindi. 
            Number of Entries in the Hindi-UW dictionary: 70,000 
            Number of Analysis Rules for English: ~5,000 
            Number of Analysis Rules for Hindi: ~6,000 
            Number of Generation Rules for Hindi: ~6,500 
            Other applications 
            Since UNL expressions can be looked upon as the knowledge 
            extracted from documents, we have carried out research on how to 
            use them for various document-processing tasks. Notable among 
            these are automatic hyperlinking and text clustering. In the 
            former, the keywords, which are the candidates from which links 
            are set up, are obtained from the UNL graphs. Heavily linked words 
            are possible candidates for keywords. Similarly, the linkage and 
            relation-label information in the UNL graphs is used to construct 
            document vectors in the semantic dimension. These vectors are then 
            processed with clustering algorithms. The experimental results are 
            promising. 
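
            Both ideas can be sketched in a few lines of Python. The code below is an illustrative assumption, not the method used in the research: it treats the number of arcs touching a concept as its degree of linkage, picks concepts above a hypothetical threshold as keyword candidates, and builds a sparse document vector from (relation label, source concept) counts that can be compared by cosine similarity before clustering. The function names and the choice of features are hypothetical.

```python
import math
from collections import Counter

# Relations of the example sentence, with UWs reduced to headwords.
RELATIONS = [
    ("mod", "chairman", "company"),
    ("aoj", "chairman", "John"),
    ("agt", "arrange", "John"),
    ("pos", "residence", "John"),
    ("obj", "arrange", "meeting"),
    ("plc", "arrange", "residence"),
]


def node_degrees(relations):
    """Count how many arcs touch each concept in the UNL graph."""
    degrees = Counter()
    for _, uw1, uw2 in relations:
        degrees[uw1] += 1
        degrees[uw2] += 1
    return degrees


def keyword_candidates(relations, min_degree=3):
    """Concepts with at least min_degree arcs (threshold is an assumption)."""
    degrees = node_degrees(relations)
    return sorted(word for word, d in degrees.items() if d >= min_degree)


def relation_vector(relations):
    """Sparse document vector of (relation label, source concept) counts."""
    return Counter((label, uw1) for label, uw1, _ in relations)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


print(keyword_candidates(RELATIONS))
vector = relation_vector(RELATIONS)
print(cosine(vector, vector))
```

            In practice the vectors would be built over all sentences of a document and then passed to a clustering algorithm of choice; the pairwise cosine shown here is only one possible similarity measure.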
            
              
              
                
                  
                    
                    
                      | UNL in India 
                         In India, UNL work is being 
                        carried out at the Computer Science and Engineering 
                        Department, IIT Bombay. Here, we do sentence-level 
                        encoding of English, Hindi and Marathi into the UNL form 
                        and decode this information into Hindi and Marathi, thus 
                        providing semi-automated translation from 
                        English to Hindi and Marathi, and also between Hindi and 
                        Marathi. For more on UNL, visit 
                        www.unl.ias.unu.edu  |    |   
            Present and future 
            UNL has been found to be very useful 
            for various multilingual information tasks as well as document 
            processing applications. The UNL graph is looked upon as the 
            extracted knowledge from the documents.  
            The countries participating in this project are Japan, China, 
            Indonesia, India, Jordan, Russia, Italy, France, Spain and Brazil. 
            The United Nations headquarters in Geneva is developing 
            multilingual information access systems using the UNL. At 
            IIT Bombay, the following high-impact projects are making use of the 
            UNL representation for various text-processing and language 
            technology tasks. 
            
              
              
                
                  
                    
                    
                      | Multi-lingual Web 
                         UNL can be a very effective 
                        vehicle for developing multilingual Web-based 
                        applications. The UNL expressions provide the meaning 
                        content of the text and search can be carried out on 
                        this meaning base instead of the text. This, of course, 
                        means developing a novel kind of search-engine 
                        technology. The merit of such a system is that the 
                        information in one language need not be stored in 
                        multiple languages. 
                |    |   
            The Center for Indian Language Technology Solutions 
            (www.cse.iitb.ac.in/tukaram), funded by the Ministry of Information 
            Technology, India. 
            The Center for Intelligent Internet Research 
            (www.cse.iitb.ac.in/laiir), funded by Tata Consultancy Services. 
            Media Lab Asia (www.ircc.iitb.ac.in/~MLAsia), funded by the 
            Ministry of Information Technology, India, with participation 
            from the Massachusetts Institute of Technology, USA. 
            Commercial exploitation of UNL technology for Internet-scale 
            multilingual access is expected within a couple of years. 
            Pushpak Bhattacharyya, 
            Department of Computer Science and Engineering, Indian Institute of 
            Technology, Bombay     
  
         |