Building Large-Scale Databases
Satyanarayanan M (HP)
Abstract: This talk focuses on the ability of the NonStop SQL/MX database to scale linearly, and on how that ability supports building very large, scalable databases. Typical databases have trouble scaling beyond certain data volumes, a problem that is largely addressed by clustering servers. The NonStop server node concept, with its inherent high reliability, availability and scalability, provides an ideal platform for SQL/MX to scale linearly, enabling very large databases that still retain the performance and reliability levels required by key OLTP and mixed-workload environments.
Cloud-Based Big Data Analytics
Anupam Singh (MarketShare)
Abstract: In this talk, we discuss the challenges presented by non-relational modeling workflows. Our goal is to start a conversation about non-relational analytics on top of Hadoop and Amazon Web Services. Based on your feedback, we might add (a) a detailed case study of data flowing through the system and (b) performance numbers for various levels of complexity.
A Unified Approach to Learning Task-Specific Bit Vector Representations for Fast Nearest Neighbor Search
Vinod Nair (Yahoo)
Abstract: Fast nearest neighbor search is necessary for a variety of large-scale web applications such as information retrieval, nearest neighbor classification and nearest neighbor regression. Recently, a number of machine learning algorithms have been proposed that represent the data to be searched as (short) bit vectors and then use hashing to search rapidly. These algorithms are limited in that each is suited to only one type of task; e.g., Spectral Hashing learns bit vector representations for retrieval but not for, say, classification. In this talk I will present a unified approach to learning bit vector representations for the many applications that use nearest neighbor search. The main contribution is a single learning algorithm that can be customized to learn a bit vector representation suited to the task at hand. This broadens the usefulness of bit vector representations to tasks beyond conventional retrieval.
Bio: Vinod Nair is a research scientist at Yahoo! Labs Bangalore. He received his PhD in Computer Science from the University of Toronto in 2010, and his Master's and Bachelor's degrees in Electrical Engineering from McGill University. His research interests are in machine learning and its applications to computer vision.
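As a rough illustration of the basic mechanism the abstract builds on (encode each item as a short bit vector, then find neighbors by comparing codes with Hamming distance), here is a minimal Python sketch. It assumes random-hyperplane hashing, a simple data-independent scheme chosen only for illustration; it is neither Spectral Hashing nor the unified learning algorithm presented in the talk. All function names (make_hyperplanes, to_bits, hamming, nearest) are hypothetical, and the search is a plain linear scan rather than a hash-table lookup.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes through the origin.

    Assumption for illustration: random-hyperplane hashing, not a learned
    or task-specific representation.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def to_bits(vec, planes):
    """Encode a vector as an integer bit vector.

    Bit i is 1 when vec lies on the non-negative side of hyperplane i.
    """
    code = 0
    for i, plane in enumerate(planes):
        if sum(p * v for p, v in zip(plane, vec)) >= 0:
            code |= 1 << i
    return code


def hamming(a, b):
    """Count the bits in which two integer codes differ."""
    return bin(a ^ b).count("1")


def nearest(query_code, codes):
    """Return the index of the stored code closest in Hamming distance (linear scan)."""
    return min(range(len(codes)), key=lambda i: hamming(query_code, codes[i]))


# Usage: encode a tiny hypothetical dataset and look up a query's neighbor.
planes = make_hyperplanes(dim=3, n_bits=16)
data = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
codes = [to_bits(x, planes) for x in data]
query = to_bits([0.9, 0.1, 0.0], planes)
print("nearest item:", nearest(query, codes))
```

Because the hyperplanes here are random, the codes ignore the downstream task; the talk's point is that learning the code for the task at hand (retrieval, classification, regression) can make the same fast Hamming-distance search more useful.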
Practical Learnings from Real-World Data Mining
Shailesh Kumar (Google)
Abstract: Data mining is part science, part engineering, and part art. The "art" of data mining involves, among other things, correctly understanding the data's generative process, discovering novel and useful insights from the data without supervision, and engineering meaningful features to build robust models. In this talk, the speaker will share three key practical learnings on these aspects, using examples from three different domains: retail analytics, text mining, and computer vision.
Bio: Dr. Shailesh Kumar is a Member of Technical Staff at Google, Hyderabad. Before joining Google, he worked as a Principal Research Scientist at Fair Isaac Research, San Diego, USA, and as a Senior Scientist at Yahoo! Labs Bangalore, India, on a variety of machine learning and data mining problems in text mining, retail analytics, fraud analytics, bioinformatics, information retrieval, and computer vision. Dr. Kumar has authored more than 20 refereed papers in international conferences and journals and has filed more than 15 patents in these areas. He received his Master's and PhD degrees in Computer Engineering from the University of Texas at Austin.