Software
For a complete list of software, check my GitHub page.
- Indic Language NLP library
The goal of this project is to build Python-based libraries for common text processing and Natural Language Processing tasks in Indian languages. Indian languages share many similarities in script, phonology, syntax, etc., and this library is an attempt to provide a general solution for the most commonly required toolsets for Indian language text.
The library provides the following functionalities:
- Text Normalization
- Indic Script Conversion
- Romanization of Indic Scripts (ITRANS) and vice-versa
- Indian Language Transliteration
- Tokenization
- Word Segmentation
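As an illustration of why the shared script structure helps, here is a minimal sketch of script conversion. It is not the library's API: the names `SCRIPT_BASE`, `BLOCK_SIZE` and `convert_script` are hypothetical, chosen only for this example. It relies on the fact that the Unicode blocks for the major Indic scripts are each 128 code points wide and laid out in parallel, so a character can often be mapped to another script by shifting its offset within the block. The sketch ignores characters that have no counterpart in the target script, so treat it as a rough approximation.

```python
# Hypothetical sketch, not the library's actual interface.
# Start of each script's Unicode block; every block is 128 code points wide.
SCRIPT_BASE = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def convert_script(text, source, target):
    """Shift characters from the source script block to the target block.

    Characters outside the source block (spaces, digits, Latin letters)
    are copied unchanged. No check is made that the shifted code point
    is actually assigned in the target script.
    """
    src = SCRIPT_BASE[source]
    tgt = SCRIPT_BASE[target]
    out = []
    for ch in text:
        offset = ord(ch) - src
        if 0 <= offset < BLOCK_SIZE:
            out.append(chr(tgt + offset))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Devanagari letter KA (U+0915) shifted into the Telugu block.
    print(convert_script("\u0915", "devanagari", "telugu"))
```

A real converter also has to handle script-specific gaps and extra characters, which this offset trick alone does not cover.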
- GeoMM
Geometry-aware multilingual embedding to map different languages to a common space
- McTorch
A manifold optimization library for deep learning
- IIT Bombay Unsupervised Transliterator
An unsupervised transliteration system which uses phonetic features to define transliteration priors. This is an EM-based method which builds on Ravi and Knight's 2009 work. In addition, self-training is used to iteratively build a substring-based transliteration system in order to incorporate contextual information.
- Multilingual Neural Machine Translation System
A multilingual Neural Machine Translation/Transliteration system written in TensorFlow.
- METEOR-Indic
METEOR for Indian languages. It uses IndoWordNet for synonyms in Indian languages and a trie-based stemmer for matching stems.
- CFILT Pre-order: A Source Reordering System for English-Indian Language Translation
There are many structural divergences between Indian languages and English, the principal one being word order: Subject-Object-Verb for Indian languages and Subject-Verb-Object for English. This toolkit reorders a given English sentence to address these structural divergences, so that the word order of the modified English sentence conforms to the canonical word order in Indian languages. This transformation is useful for Machine Translation.
It was originally written by Ananthakrishnan Ramanathan; I currently maintain it.
- Job Scripts for Moses
A simple experiment management system for Moses. It contains scripts for batch training of many MT systems.
- SarcasmBot: A sarcasm-generation module for chatbots
This software is a chat generation module that replies sarcastically to user input.
Online Systems
- Śata-Anuva̅dak: 100+ Automatic Translators for Indian Languages: A broad-coverage Statistical Machine Translation system for Indian languages. It is a Phrase-Based MT system with pre-processing and post-processing extensions. The pre-processing includes source-side reordering for English to Indian language translation.
- Brahmi-Net: Transliteration and Script Conversion for Indian Languages: Brahmi-Net is an online system for transliteration and script conversion among 18 major languages of the Indian subcontinent (306 language pairs).
Languages supported include:
- Indo-Aryan languages: Hindi, Urdu, Bengali, Gujarati, Punjabi, Marathi, Konkani, Assamese, Odia, Sindhi, Sinhala, Nepali, Sanskrit
- Dravidian languages: Tamil, Telugu, Malayalam, Kannada
- English