Minali

Minali Upreti

M.Tech. (CSE), IIT Bombay

Specialization: Automatic Speech Recognition













PUBLICATIONS


Abhinav J., Minali U. & Preethi Jyothi. "Improved Accented Speech Recognition Using Accent Embeddings and Multi-task Learning". Accepted at the INTERSPEECH (CORE A) conference (pp. 1-6).

Guide: Prof. Preethi Jyothi, May’18

This paper investigates the use of accent embeddings and multi-task learning to improve speech recognition for accented speech. These techniques together give significant relative performance improvements of 15% and 10% over a multi-accent baseline system on test sets containing seen and unseen accents, respectively.


SEMINAR


Topic: Multi-task acoustic modeling for accented speech recognition

Guide: Prof. Preethi Jyothi, Jan’18-April’18

Objective: To explore the scope for improving accented speech recognition using multi-task learning
Work Done: Performed a literature survey on the use of multi-task learning in NLP, computer vision and speech recognition. Implemented a multi-task architecture for acoustic modeling that achieved relative performance improvements of 7% and 6% over the baseline on test sets of seen and unseen accents, respectively. Extracted accent embeddings and fed them as auxiliary input to the multi-task network, which further improved the performance of the model.
Tool(s)/Language(s) used: ‘Kaldi’ - Speech Recognition Toolkit


M.TECH. PROJECT (PHASE-1)


Topic: Improving Accented Speech Recognition

Guide: Prof. Preethi Jyothi, Jan’18-present

Objective: To improve the accuracy of Automatic Speech Recognition systems on accent-diverse speech using multi-task learning techniques such as cross-stitch networks
Work Done: Implemented a cross-stitch network so that the model learns, from training data, an optimal information-sharing strategy between a multi-accent acoustic model and an accent classifier. Currently running experiments on the implemented network to find the most appropriate initialization values and learning rates for cross-stitch units at different layers.
Tool(s)/Language(s) used: ‘Kaldi’ - Speech Recognition Toolkit

Image Synthesis from Text using Generative Adversarial Network


Advanced Machine Learning, Instructor: Prof. Sunita Sarawagi
We extracted text embeddings from an encoder pre-trained on a large text corpus. We then implemented a GAN that generates images from these text embeddings, using random noise to capture variance. We also studied and experimented with the state-of-the-art Stack-GAN, which generates high-resolution images from text.
Tool(s)/Language(s) used: Keras, Python


Who, where, what: A live activity detection and location tracking Android application


Mobile Computing, Guide: Prof. Vinayak Nayak
We developed an Android app to record sensor data for an activity and trained a Random Forest classifier on it. The app displayed the live activity (predicted by the activity classifier) and live location of all of a user's friends in a MapView.
Tool(s)/Language(s) used: Android, Python, sklearn, pandas


Keystrokes recognition from keyboard acoustics


Automatic Speech Recognition, Guide: Prof. Preethi Jyothi
We extracted MFCC feature vectors from individual keystroke audio and trained an SVM classifier on them, achieving an accuracy of 87% on data recorded from a ThinkPad E450 keyboard. We combined the SVM classifier with a dictionary and a language model to obtain a character error rate (CER) of 25% on the same dataset.
Tool(s)/Language(s) used: Python, sox, librosa, sklearn


Implementation of static partitioning of SSD cache


Kernel Programming, Guide: Prof. Puru Kulkarni
Implemented static partitioning of an SSD cache, with the number of partitions configurable from user space.
Tool(s)/Language(s) used: dm-cache module (a component of Linux kernel’s device mapper framework)


Traffic Sign Detection and Recognition


Machine Learning, Guide: Prof. Ganesh Ramakrishnan
Implemented a Convolutional Neural Network architecture to recognize traffic signs in images. We also studied and experimented with the state-of-the-art real-time object detection system YOLO (You Only Look Once) and with Inception, Google's CNN architecture, for this task.
Tool(s)/Language(s) used: Python


The page-size duality implication


Virtualization, Guide: Prof. Puru Kulkarni
Designed, performed and analyzed experiments to evaluate the implications of huge-page settings in the guest OS and host OS for CPU-intensive, memory-intensive and I/O-intensive workloads.
Tool(s)/Language(s) used: perf Linux profiler