The growth of a user base that accesses content primarily in native Indic languages has increased demand for wide-coverage content and for services accessible in these languages. Given the diversity of languages spoken in India, automated translation solutions can help content creators and service providers reach a wide demographic with timely information. Translating content between Indic languages is often challenging owing to their morphological richness, high lexical productivity, relatively free word order, code-switching and code-mixing, semantic shifts of lexical entries across languages, and similar phenomena.
Neural machine translation approaches have been shown to achieve state-of-the-art results for several languages. However, such approaches generally require large parallel corpora for the languages involved, which is a major challenge for many resource-scarce Indic languages. In this project, we intend to develop low-resource neural machine translation strategies that deeply integrate recent advances in deep learning with traditional linguistic and grammatical knowledge. Specifically, we intend to develop an interlingua, rooted in the Pāṇinian Sanskrit grammatical framework, that acts as an interpretable intermediate representation for all the languages involved. The project currently focuses on three Indic languages: Hindi, Kannada and Sanskrit. The overall objectives of the project include developing a data-efficient, interpretable, human-in-the-loop neural machine translation framework and associated tools, in addition to achieving state-of-the-art results.
Develop a suitable interlingua schema that encodes the morphosyntactic, semantic and pragmatic information present in source-language sentences in a structured form, facilitating translation into a variety of target languages, with Hindi, Kannada and Sanskrit as use cases.
Develop and deploy API-based solutions, applications and web-based services built on the translation technology developed in this project.
Develop and deploy post-hoc, analysis-based explainers and tools for language learners, as a pedagogical use case in the educational domain.
Develop neural encoder-decoder models, neural module networks and distillation-based models. We intend to develop and deploy end-to-end encoder-decoder models for translation via the interlingua, models for refining the interlingua and translating it into the target language, and models for distilling large models.
#PI
#PhD
#Consultant
Reach out to us if you are interested in contributing to our work. If you see yourself in any of the open positions, please do not hesitate to contact us!
Prof. Ganesh Ramakrishnan
Ayush Maheshwari
Amrith Krishna