Title (T1): Stream Processing: Going Beyond Database Management Systems (3 hours)
Presenter: S. Chakravarthy (U. of Texas at Arlington, USA)
Currently, a large class of data-intensive applications in which data arrives as continuous streams has been widely recognized. Not only is the size of the data for these applications unbounded, but the data also arrives in a highly bursty manner. Furthermore, these applications must respond in a timely manner; in other words, they have specific Quality of Service (QoS) requirements for query processing. Common QoS requirements include response time, tuple latency, and accuracy of the query results. The amount of computation required and the memory used for processing continuous queries are also very important (for capacity planning). These characteristics make it infeasible to simply load the arriving data into a traditional database management system (DBMS) and process it with currently available techniques. Therefore, a data stream management system (DSMS) with its own set of techniques is needed to process continuous stream data effectively and efficiently. In this tutorial, we discuss the main challenges, techniques, and solutions for building a general-purpose DSMS, and present our work in this area as well as the work in the literature. We present work on Aurora, Stream, Fjord/Telegraph, and MavStream (to name a few), covering the major efforts in data stream management systems. We will cover the following topics in detail: differences between traditional query processing in a DBMS and continuous query processing, operator and query modeling for stream processing, scheduling strategies (for conserving memory and reducing tuple latency), capacity estimation (to determine strategies and to determine when and how much load to shed), and load shedding strategies. The emphasis will be on satisfying QoS requirements, as this is extremely important for stream processing applications. The implementation of a stream processing system will also be covered, using MavStream at UTA as the example.
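To illustrate the difference between a one-shot DBMS query and a continuous query, here is a minimal sketch, not taken from MavStream or any of the systems named above: a hypothetical sliding-window average operator that emits an updated result for every arriving tuple instead of scanning a stored table once. The names `SlidingWindowAvg`, `on_tuple`, and `continuous_query`, and the count-based window, are illustrative assumptions.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


class SlidingWindowAvg:
    """Hypothetical continuous-query operator: average of the last `size` tuples."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.window: Deque[float] = deque()
        self.total = 0.0

    def on_tuple(self, value: float) -> float:
        # Add the new tuple and expire the oldest one once the window is full,
        # so memory stays bounded even though the stream itself is unbounded.
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:
            self.total -= self.window.popleft()
        return self.total / len(self.window)


def continuous_query(stream: Iterable[float], size: int) -> Iterator[float]:
    # Unlike a one-shot DBMS query, results are produced incrementally
    # as each tuple arrives rather than once over a stored table.
    op = SlidingWindowAvg(size)
    for value in stream:
        yield op.on_tuple(value)


if __name__ == "__main__":
    readings = [10.0, 20.0, 30.0, 40.0]
    print(list(continuous_query(readings, size=2)))  # [10.0, 15.0, 25.0, 35.0]
```

With a window of two tuples, each arriving reading produces a new answer. The bounded window is what keeps the operator's memory constant, which is one reason memory usage matters for capacity planning in a DSMS.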
Some of the work on MavStream can be found at http://itlab.uta.edu/sharma under publications (by topic). The connection between stream processing and complex event processing (CEP), which has gained a lot of attention from commercial vendors (e.g., Coral8, Esper, Amit, Aleri, …), will also be discussed as part of the tutorial.
Sharma Chakravarthy is Professor in the Computer Science and Engineering Department at The University of Texas at Arlington, Texas. He established the Information Technology Laboratory at UT Arlington in January 2000 and currently heads it. In 2003, he also established the NSF-funded Distributed and Parallel Computing Cluster (DPCC@UTA) at UT Arlington. He received the college-level "Excellence in Research" award in 2006, the university-level "Creative Outstanding Researcher" award in 2003, and the department-level senior outstanding researcher award in 2002. He is well known for his work on semantic query optimization, multiple query optimization, active databases (the HiPAC project at CCA and the Sentinel project at the University of Florida, Gainesville), and, more recently, scalability issues in graph mining and its applications. His group at UTA is currently developing MavStream, a QoS-driven stream processing system. Other systems under development include InfoSift, a classification system for text, email, and web content that uses graph mining techniques. WebVigiL, a web content monitoring system, has also been developed. His current research includes web technologies, stream data processing, mining and knowledge discovery (association, graph, and text mining), active databases, distributed and heterogeneous databases, query optimization, and multimedia databases. He has published over 120 papers in refereed international journals and conference proceedings. He has given tutorials on a number of database topics, such as active, real-time, distributed, object-oriented, and heterogeneous databases, in North America, Europe, and Asia. He is listed in Who's Who Among South Asian Americans and Who's Who Among America's Teachers. Prior to joining UTA, he was with the University of Florida, Gainesville. Before that, he worked as a Computer Scientist at the Computer Corporation of America (CCA) and as a Member of Technical Staff at Xerox Advanced Information Technology, Cambridge, MA.
Sharma Chakravarthy received the B.E. degree in Electrical Engineering from the Indian Institute of Science, Bangalore, and the M.Tech. from IIT Bombay, India. He worked at TIFR (Tata Institute of Fundamental Research), Bombay, India, for a few years. He received M.S. and Ph.D. degrees from the University of Maryland, College Park, in 1981 and 1985, respectively.
Title (T2): The Semantic Web: Semantics for data and services on the web (3 hours)
Presenters: Vipul Kashyap (Partners HealthCare System, USA) & Christoph Bussler (BEA Systems, USA)
There is a widespread misperception that the Semantic Web is primarily a rehash of existing AI and database work focused on encoding KR formalisms in markup languages such as RDF(S), DAML+OIL, or OWL. We seek to dispel this notion by presenting the broad dimensions of the emerging Semantic Web and its multidisciplinary technological underpinnings. In fact, we argue that it is critical to seek, leverage, and synergize contributions from a wide variety of technologies and subfields of computer science. The Semantic Web can be viewed from an information aspect as well as a computational aspect, each with its associated technologies. This tutorial presents these technologies from a database and application perspective. We plan to present the design rationales behind these technologies, to help formulate research questions for database researchers, and to provide guidance for practitioners.
Vipul Kashyap is a Senior Medical Informatician in the Clinical Informatics Research & Development group at Partners HealthCare System and is currently the chief architect of a Knowledge Management Platform that enables browsing, retrieval, aggregation, analysis, and management of clinical knowledge across Partners HealthCare System. Vipul received his PhD from the Department of Computer Science at Rutgers University in New Brunswick in the area of metadata and semantics-based knowledge and information management. He is also interested in characterizing the value proposition of semantic technologies in the enterprise context. Before coming to Partners, Vipul held positions at MCC and Telcordia (Bellcore) and was a fellow at the National Library of Medicine. Vipul has published two books on semantics and 40-50 articles in prestigious conferences and journals, and has participated in panels and presented tutorials on semantic technologies. Vipul sits on the technical advisory board of an early-stage company developing semantics-based products, and represents Partners on the W3C Advisory Committee and the HealthCare Information Technology Standards Panel (HITSP).
Christoph Bussler is a Staff Software Engineer at BEA Systems, Inc., working in the core WebLogic application server product development organization. Before joining BEA, Chris was an architect at Cisco Systems, Inc. in San Jose, CA, USA, responsible for the service-oriented architecture at Cisco Systems' Quote-to-Cash business unit. Before taking this position, he was Science Foundation Ireland Professor at the National University of Ireland, Galway, and Executive Director of the Digital Enterprise Research Institute (DERI). In addition to his role as Executive Director, Chris led the Semantic Web Services research group at DERI. Before DERI, he was a member of Oracle’s Integration Platform Architecture Group in Redwood Shores, CA, USA, where he was responsible for the architecture of Oracle’s next-generation integration product providing EAI, B2B, and ASP integration. Prior to joining Oracle, he was at Jamcracker, Cupertino, CA, USA, responsible for defining Jamcracker’s ASP aggregation architecture; Netfish Technologies (acquired by IONA), Santa Clara, CA, USA, responsible for Netfish’s B2B integration server; The Boeing Company, Seattle, WA, USA, leading Boeing’s workflow research; and Digital Equipment Corporation (acquired by Compaq, which was later acquired by Hewlett-Packard), Mountain View, CA, USA, defining the policy resolution component of Digital’s workflow product. Chris has a Ph.D. in computer science from the University of Erlangen, Germany, and a master's degree in computer science from the Technical University of Munich, Germany. He has published a book titled 'B2B Integration', two books on workflow management, and over 100 research papers in journals and academic conferences. He has given tutorials on several topics, including B2B integration, workflow management, and service-oriented architectures, and has been a keynote speaker at many conferences and workshops on topics such as workflow management, B2B and EAI integration, and the Semantic Web.
Title (T3): Preference query formulation and processing: Ranking and Skyline query approaches (3 hours)
Presenters: Seung-won Hwang (POSTECH, Korea) & Wolf-Tilo Balke (L3S Research Center, Germany)
As a near-infinite amount of data becomes accessible on the Web, it is increasingly important to support intelligent query mechanisms that help each user identify preferred results of manageable size. Ranking and skyline queries, two such mechanisms, have recently gained a lot of attention and have complementary strengths: ranking queries enable strong control over the quality and size of query results, while skyline queries support a more intuitive formulation procedure. In this tutorial, we will explore how user preferences are modeled and how advanced query semantics such as ranking and skyline queries support such preference queries. We then discuss the strengths and weaknesses of these two representative semantics, and recent efforts to combine their strengths. We also survey existing research on processing ranking and skyline queries efficiently.
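To make the complementary strengths concrete, the following is a minimal sketch using made-up hotel data (price and distance, where lower is better for both). A top-k ranking query needs user-chosen weights but returns exactly k results, whereas a skyline query needs no weights but may return an unpredictable number of results. The naive nested-loop skyline here is for illustration only and is not one of the efficient processing algorithms the tutorial surveys; the hotel data, weights, and function names are hypothetical.

```python
import heapq
from typing import List, Tuple

# Hypothetical tuple layout: (name, price, distance); lower is better for both.
Point = Tuple[str, float, float]


def dominates(a: Point, b: Point) -> bool:
    # a dominates b if it is no worse on every attribute and strictly better on one.
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    # Naive O(n^2) skyline: keep every point that no other point dominates.
    return [p for p in points if not any(dominates(q, p) for q in points)]


def top_k(points: List[Point], k: int, w_price: float, w_dist: float) -> List[Point]:
    # Ranking query: a weighted linear score (lower is better); the user must
    # choose the weights, but the result size is fixed at k.
    return heapq.nsmallest(k, points, key=lambda p: w_price * p[1] + w_dist * p[2])


if __name__ == "__main__":
    hotels = [("A", 50, 8.0), ("B", 80, 2.0), ("C", 120, 1.0), ("D", 90, 5.0)]
    print([h[0] for h in skyline(hotels)])  # ['A', 'B', 'C']
    print([h[0] for h in top_k(hotels, k=2, w_price=1.0, w_dist=12.0)])  # ['B', 'C']
```

Here the skyline returns A, B, and C (D is dominated by B), with no weights required, while the weighted ranking returns exactly two hotels. Changing the weights changes which hotels are returned, which illustrates the formulation burden that skyline queries avoid.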
Seung-won Hwang is an assistant professor in the Department of Computer Science and Engineering at Pohang University of Science and Technology (POSTECH). Prior to joining POSTECH in 2005, she received her Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign for her thesis developing intuitive ranking formulation and efficient rank processing techniques. Her research focuses on advanced query semantics, including ranking and skyline queries, and on query categorization, and has been published in major international journals and conferences, including ACM TODS, IEEE TKDE, SIGMOD, and ICDE.
Wolf-Tilo Balke is currently the associate research director of the L3S Research Center at the University of Hannover, Germany. Before that, he was a research fellow at the University of California at Berkeley. His research is in the area of databases, information systems, and query processing, including middleware retrieval algorithms, preference-based retrieval, and ontology-based retrieval techniques. Wolf-Tilo Balke is the recipient of two Emmy Noether grants of the German Research Foundation (DFG) and the Scientific Award of the University Foundation Augsburg. He received his B.A. and M.S. degrees in mathematics and a Ph.D. in computer science from the University of Augsburg, Germany.