- RAIN: Always On Data Warehousing
Jorge Vieira, Marco Vieira, Marco Costa, Henrique Madeira
The Redundant Arrays of Inexpensive DWS Nodes (RAIN) technique is a node-level data replication approach that introduces failover capabilities to DWS (Data Warehouse Striping) clusters. RAIN is based on the selective replication of fact tables' data across the cluster nodes and endows DWS clusters with the capability of providing query answers even when one or more nodes are unavailable. Two distinct replication modes are supported: simple redundancy (RAIN-0) and striped redundancy (RAIN-S). In this demo we show a DWS cluster using the RAIN technique, focusing on the execution of queries in the presence of node failures and on the process of recovering failed nodes.
- Data Compression for Incremental Data Cube Maintenance
Tatsuo Tsuji, Dong Jin, Ken Higuchi
We have proposed an incremental maintenance scheme for data cubes that employs an extendible multidimensional array model. Such an array enables incremental cube maintenance without relocating any data dumped at an earlier time, while computing the data cube efficiently by utilizing the fast random access capability of arrays. In practice, however, most multidimensional arrays for data cubes are large but sparse. In this paper, we describe a data compression scheme for our proposed cube maintenance method, and demonstrate the physical refreshing algorithm working on the compressed data structure.
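To illustrate why sparsity matters for array-based cubes, the following is a minimal generic sketch, not the authors' compression scheme: it stores only the non-empty cells of a multidimensional array, keyed by their row-major offset. The names `linear_offset` and `SparseCube` are hypothetical and do not come from the paper.

```python
def linear_offset(coords, shape):
    """Map multidimensional coordinates to a single row-major offset."""
    offset = 0
    for c, size in zip(coords, shape):
        if not 0 <= c < size:
            raise IndexError("coordinate out of range")
        offset = offset * size + c
    return offset


class SparseCube:
    """Hypothetical sparse store: keeps only cells that were explicitly set."""

    def __init__(self, shape):
        self.shape = tuple(shape)
        self.cells = {}  # offset -> value, non-empty cells only

    def set(self, coords, value):
        self.cells[linear_offset(coords, self.shape)] = value

    def get(self, coords, default=0):
        # Cells that were never set read back as the default (empty) value.
        return self.cells.get(linear_offset(coords, self.shape), default)


cube = SparseCube((1000, 1000, 1000))  # one billion logical cells
cube.set((1, 2, 3), 42)
print(cube.get((1, 2, 3)), len(cube.cells))
```

Storage here grows with the number of non-empty cells rather than with the logical size of the array, which is the property any compression scheme for sparse cubes aims for.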
- A Bilingual Dictionary Extracted from the Wikipedia Link Structure
Maike Erdmann, Kotaro Nakayama, Takahiro Hara, Shojiro Nishio
Many bilingual dictionaries have been released on the WWW. However, these dictionaries insufficiently cover new and domain-specific terminology. In our demonstration, we present a dictionary constructed by analyzing the link structure of Wikipedia, a large-scale encyclopedia containing a large number of links between articles in different languages. We analyzed not only these interlanguage links but also extracted further translation candidates from redirect page and link text information. In an experiment, we have already demonstrated the advantages of our dictionary compared to manually created dictionaries as well as to extracting bilingual terminology from parallel corpora.
- A Search Engine for Browsing the Wikipedia Thesaurus
Kotaro Nakayama, Takahiro Hara, Shojiro Nishio
Wikipedia has become a huge phenomenon on the WWW. As a corpus for knowledge extraction, it has various impressive characteristics such as a huge number of articles, live updates, a dense link structure, brief link texts and URL identification for concepts. In our previous work, we proposed link structure mining algorithms to extract a large-scale, accurate association thesaurus from Wikipedia. The association thesaurus covers almost 1.3 million concepts, and its accuracy has been demonstrated in detailed experiments. To show its practicality, we implemented three features on top of the association thesaurus: a search engine for browsing the Wikipedia Thesaurus, an XML Web service for the thesaurus and a Semantic Web support feature. We show these features in this demonstration.
- An Interactive Predictive Data Mining System for Informed Decision
Esther Ge, Richi Nayak
In real-world use, predictive data mining models need to be queried so that users can obtain a predicted outcome from the inputs they provide. This demo illustrates a real-world situation in which a trained predictive data mining system has been deployed and users can interact with the model to make informed decisions.
- Analysis of Time Series Using Compact Model-Based Descriptions
Hans-Peter Kriegel, Peer Kröger, Alexey Pryakhin, Matthias Renz
Recently, we have proposed a novel method for the compression of
time series based on mathematical models that explore dependencies
between different time series. This representation models each time
series by a combination of a set of specific reference time series.
The cost of this representation depends only on the number of
reference time series rather than on the length of the time series.
In this demonstration, we present a Java toolkit which is able to
perform several data mining tasks based on this novel time series
representation. In particular, this framework allows the user to
explore the properties of our novel approach in comparison to other
state-of-the-art compression methods. The results are visually
presented in a very concise way so that the user can easily identify
important settings of the model-based time series representation.
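As a rough illustration of the idea of describing one series through a small set of reference series, the sketch below assumes a linear combination fitted by least squares. The abstract only says "combination", so the linear form, the fitting method, and the names `fit_coefficients` and `reconstruct` are assumptions made for this example, not the authors' method. It also assumes the reference series are linearly independent.

```python
def fit_coefficients(target, references):
    """Least-squares weights w so that sum_j w[j] * references[j] ~ target."""
    k = len(references)
    # Normal equations (R R^T) w = R t, built from dot products.
    a = [[sum(x * y for x, y in zip(references[i], references[j]))
          for j in range(k)] for i in range(k)]
    b = [sum(x * y for x, y in zip(references[i], target)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, k):
            factor = a[row][col] / a[col][col]
            for c in range(col, k):
                a[row][c] -= factor * a[col][c]
            b[row] -= factor * b[col]
    # Back substitution.
    w = [0.0] * k
    for row in range(k - 1, -1, -1):
        s = sum(a[row][c] * w[c] for c in range(row + 1, k))
        w[row] = (b[row] - s) / a[row][row]
    return w


def reconstruct(weights, references):
    """Rebuild an approximation of the series from its stored weights."""
    n = len(references[0])
    return [sum(w * ref[t] for w, ref in zip(weights, references))
            for t in range(n)]


refs = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
weights = fit_coefficients([2.0, 3.0, 2.0, 3.0], refs)
print(weights, reconstruct(weights, refs))
```

Under these assumptions each series is stored as one weight per reference series, so the storage cost is independent of the series length, matching the property stated in the abstract.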
- Collecting Data Streams from a Distributed Radio-based Measurement System
Marcin Gorawski, Pawel Marks, Michal Gorawski
It is becoming increasingly common to process rapid data streams representing real-time events, such as large-scale financial transfers, road or network traffic, and sensor data. Analysis of data streams enables new capabilities: intrusions can be detected while they are happening, and road traffic can be predicted based on the analysis of past and current vehicle flow. We addressed the problem of real-time analysis of stream data from a radio-based measurement system. The system consists of a large number of water, gas and electricity meters. Our work focuses on delivering data from the meters to the stream data warehouse as quickly as possible, even if transmission failures occur. The system we designed is intended to significantly increase reliability and availability. During this demonstration we present an example of the system's capabilities.
- A Web Visualization Tool for Historical Analysis of Geo-referenced Multidimensional Data
Sonia Fernandes Silva
This paper describes the recent visual features of an interactive
web-based tool that couples a geographic map with visual diagrams.
The tool is aimed at users who need to remotely explore large
multidimensional datasets in a spatio-temporal context for
decision-making. It brings together useful techniques for data
exploration and optimization, enabling the user to create and
explore several interactive web-based visual reports of summarized
data almost instantaneously. We describe the new contributions in
the context of a particular tourist data warehouse.