2nd International Workshop on New Trends in Information Integration (NTII 2010)

5-6 March 2010, Long Beach, California, USA

In conjunction with the
26th International Conference on Data Engineering (ICDE 2010)



Workshop Program

5th March, 2010

01:30-01:45 Opening Remarks
01:45-03:00 Keynote-1: Business Information Management and Controls: Lessons from the Current Financial Crisis, Dan Wolfson (IBM Distinguished Engineer).
03:00-03:30 Coffee Break
03:30-05:00 New Architectures and Models (30 mins presentation)

6th March, 2010

08:00-08:30 Breakfast
08:30-10:00 Keynote-2: Interactively Building Geospatial Mashups, Craig Knoblock (University of Southern California)
10:00-10:30 Coffee Break
10:30-12:00 Applications (22 mins presentation)
  • Best Effort Data Exchange of Taxonomically Organized Data. David Thau, Shawn Bowers
  • BI Style Relation Discovery between Entities in Text. Wojciech Barczynski, Falk Brauer, Adrian Mocan, Marcus Schramm, Jan Froemberg
  • Profiling Linked Open Data with ProLOD. Christoph Böhm, Felix Naumann, Ziawasch Abedjan, Dandy Fenz, Toni Grütze, Daniel Hefenbrock, Matthias Pohl, David Sonnabend
  • Coordination of Data in Heterogeneous Domains. Michael Lawrence, Rachel Pottinger, Sheryl Staub-French
12:00-01:30 Lunch
01:30-03:00 New Primitives and Techniques (30 mins presentation)
  • Duplicate Detection in Probabilistic Data. Fabian Panse, Norbert Ritter, Maurice Van Keulen, Ander De Keijzer
  • Complement Union for Data Integration. Jens Bleiholder, Sascha Szott, Melanie Herschel, Felix Naumann
  • Midas for Government: Integration of Government Spending Data on Hadoop. Antonio Sala, Calvin Lin, Howard Ho
03:00-03:15 Brief Concluding Remarks


Keynote Details

Keynote-1: Dan Wolfson (IBM Distinguished Engineer).

Title: Business Information Management and Controls: Lessons from the Current Financial Crisis
Abstract:
In the wake of the current financial crisis, the critical importance of Business Information Management and Controls has become increasingly evident. Many institutions around the world have been forced to evaluate their information systems for regulatory conformance, business efficiency, and their ability to assess risk and opportunity. As businesses continue to change their focus, merge, and divest unprofitable divisions, the challenges in providing information systems are many. The sheer complexity and scale are daunting. Political and economic realities must be accommodated. Regulatory requirements continue to evolve. To meet these challenges, a combination of software and engineering practices needs to be applied.
In this talk we will explore some of the common information issues that have arisen and the technical and architectural approaches enlisted to improve the situation. We will focus on how to understand and record the meaning and quality of information, and how information can be aggregated and shared across an organization. In addition to current practices, we will also highlight some of the key challenges and opportunities for the research community.

Keynote-2: Craig Knoblock (University of Southern California)

Title: Interactively Building Geospatial Mashups
Abstract:
There are a number of tools and services available now for building mashups on the Web. However, many of the tools for constructing mashups rely on a widget paradigm, where users must select, customize, and connect widgets to build the desired application. While this approach does not require programming, users must still understand programming concepts to successfully create a mashup. In this talk I describe our programming-by-demonstration approach to building mashups by example. Instead of requiring a user to select and customize a set of widgets, the user simply demonstrates the integration task by example. I will describe how this approach addresses the problems of extracting data from various sources, cleaning and modeling the extracted data, integrating the data across sources, and visualizing the integrated results in a geospatial context. We implemented these ideas in a system called Karma and evaluated it with 20 users. Compared to other mashup construction tools, Karma allowed more of the users to successfully build mashups, and it let them build these mashups significantly faster than with a widget-based approach.
This research is joint work with Shubham Gupta, Pedro Szekely, and Rattapoom Tuchinda.

Workshop Goals

Virtually every enterprise, scientific domain, or health care provider will assert that information integration is their most pressing information technology need. Despite the fact that research in data integration has been going on for over 20 years, we see few success stories from the real world. There are many reasons for this, perhaps chiefly that (1) integration encompasses a wide variety of tasks and domains, and there is a delicate balance between general solutions and domain-specific ones; and (2) general solutions typically require a combination of techniques from a range of communities, including databases, information retrieval, machine learning, and knowledge representation or Semantic Web. For instance, integrating contact center call transcripts with structured (transaction and profile) data in real time requires efficient techniques that can work on noisy transcribed data, integrating Web data may need to deal with adversarial content providers, and integrating genetic data may require similarity matching on gene sequences. In recent years there has been a new emphasis on best-effort systems that combine automated approaches with user refinement or feedback, on integration techniques that combine the traditional stages of integration, and on using machine learning and other techniques with database concepts to address the needs of integration. These new approaches, generally targeting certain subclasses of the information integration problem, are highly promising.

The aim of the workshop is to encourage researchers from the information integration community to present novel issues and techniques related to applying information integration in different areas (especially in the context of integrating structured and unstructured data). The workshop will serve as a confluence of new ideas that will help drive research in the area of information integration from being "generic" to being more focused, interactive, and realistic. We invite researchers and practitioners working in information integration, data warehousing, privacy and trustworthy data systems, and related areas to submit their original papers to this workshop. The main topics include, but are not limited to:

Paper Submission

Authors are invited to submit original, unpublished research papers that are not being considered for publication in any other forum. We also encourage submissions describing work-in-progress or lessons learnt in practice. Submissions must clearly identify the nature of the paper as research, experience, or position. All submitted papers will be peer reviewed and evaluated on originality, significance, technical soundness, and clarity of expression. By submitting a paper, authors explicitly agree that at least one of them will register for the workshop and present the paper. Submissions must be in the standard ICDE format. The papers can be 4-6 pages in length. We also encourage early papers on novel work. Please submit your paper through the CMT site: https://cmt.research.microsoft.com/NTII2010/. The accepted papers will be published in the ICDE proceedings (CD version).

THE CAMERA READY PAPER WILL BE LIMITED TO 4 PAGES, THEREBY NOT RESTRICTING THE PUBLICATION OF THE MAIN IDEA IN OTHER CONFERENCES.

Important Dates

Revised Dates
Submission Deadline : 15 Oct 2009
Notification Deadline : 25 Nov 2009
Camera Ready : 11 Dec 2009

Organization

Workshop Chair:

Laura Haas (IBM Almaden Research Center, USA)

Program Committee Co-Chairs:

Zachary Ives (University of Pennsylvania, USA)

Manish A Bhide (IBM Research, India)

Publicity Chair:

Sumit Negi (IBM Research, India)

Program Committee:

Michael J Cafarella (University of Michigan, USA)

Yi Chen (Arizona State University, USA)

Kevin Chang (UIUC, USA)

Anish Das Sarma (Yahoo Research, USA)

Luna Dong (AT&T Research, USA)

Christoph Koch (Cornell University, USA)

Ullas B Nambiar (IBM Research, India)

Felix Naumann (Hasso Plattner Institute, Germany)

Michalis Petropoulos (University at Buffalo, USA)

Evaggelia Pitoura (University of Ioannina, Greece)

Prasan Roy (Independent Consultant, India)

Michael Schrefl (JKU Linz, Austria)

Kohichi Takeda (IBM Research, Japan)

Millist Vincent (University of South Australia, Australia)

Ji-Rong Wen (Microsoft Research Asia, China)

Xiaofang Zhou (University of Queensland, Brisbane, Australia)