Tools and Software Development
I have often found inspiration in building system prototypes of the research models we study. In my opinion, implementation is the key to getting a better grasp of the problem you are studying, and it helps you envisage immediate applications of the research problem, which catalyses a greater zeal to crack it. Though one might argue that it is better to focus on either theory or practice, I believe that each complements the other; hence, whenever you get bored with theory, it is a good idea to start implementing it, and vice versa. I find a similar analogy between front-end and back-end development: if you are a back-end developer, you may have experienced that a nice front-end can motivate your back-end development, and vice versa. Details of the projects I was involved in over recent years, and my contributions to them, are listed below.
- XData: A generic SQL tutoring platform.
Some of the reasons why SQL is so ubiquitous as a database query language are its effectiveness, its expressiveness, and the formal properties that allow efficient implementations (under bag semantics and the closed-world assumption). However, its terse syntax and complex semantics are difficult for a novice (under)graduate student to digest. XData is a tool that comes in handy for automated tutoring and grading of SQL. Though it was initially intended for SQL course instructors in academia, its automated grading features and expressivity make it well suited for automatically verifying/testing programmer-written queries in an industrial setting.
One of its uses is query grading for automated tutoring of SQL, where an instructor publishes a schema and a set of integrity constraints (e.g., foreign keys and primary keys), sets questions by specifying them in English, and provides SQL queries as the answers to these questions. Any student-written query is evaluated against the instructor query by an equivalence-checking algorithm (equivalence checking is, in general, a difficult problem). A key step is to generate critical datasets that are guaranteed to differentiate any non-equivalent query (called the data generation phase). Another key step is to score a student-written query, despite its non-equivalence with the instructor query, according to how syntactically close the student query is to the instructor query (called the partial marking phase). My contributions included extending and implementing the partial marking module. The student and instructor SQL queries are represented as tree-shaped structures (I developed an interface to display SQL query trees; see figure below) and are canonicalized into standard forms. For instance, the WHERE clause is transformed into a disjunctive normal form of non-negated atomic relational conditions; e.g., B >= A AND A < 2 is transformed to A <= B AND A <= 1 (over integer domains), so that a judicious syntactic comparison can be made between the respective components of the instructor and student queries. Other tasks involved implementing query minimization by removing redundant tables and (outer) joins, eliminating non-recursive WITH clauses, and extending the JSQL query parser to support operations it did not handle.
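To give a flavour of this canonicalization step, here is a minimal sketch of how atomic comparisons can be normalized; the class and method names are hypothetical and not the actual XData code:

```java
// Minimal sketch of atomic-condition canonicalization (hypothetical names,
// not the actual XData implementation). Conditions such as "B >= A" become
// "A <= B", and "A < 2" becomes "A <= 1" under integer semantics.
public final class AtomicCondition {
    final String left;   // attribute or integer literal
    final String op;     // one of <, <=, >, >=, =
    final String right;

    AtomicCondition(String left, String op, String right) {
        this.left = left; this.op = op; this.right = right;
    }

    /** Returns an equivalent condition in canonical form. */
    AtomicCondition canonicalize() {
        AtomicCondition c = this;
        // Step 1: flip ">" / ">=" so only "<" / "<=" / "=" remain,
        // e.g. B >= A  ~~>  A <= B.
        if (c.op.equals(">"))  c = new AtomicCondition(c.right, "<",  c.left);
        if (c.op.equals(">=")) c = new AtomicCondition(c.right, "<=", c.left);
        // Step 2: tighten strict inequalities against integer literals,
        // e.g. A < 2  ~~>  A <= 1.
        if (c.op.equals("<") && isIntLiteral(c.right)) {
            long k = Long.parseLong(c.right);
            c = new AtomicCondition(c.left, "<=", Long.toString(k - 1));
        }
        return c;
    }

    private static boolean isIntLiteral(String s) {
        try { Long.parseLong(s); return true; }
        catch (NumberFormatException e) { return false; }
    }

    @Override public String toString() {
        return left + " " + op + " " + right;
    }

    public static void main(String[] args) {
        System.out.println(new AtomicCondition("B", ">=", "A").canonicalize()); // A <= B
        System.out.println(new AtomicCondition("A", "<",  "2").canonicalize()); // A <= 1
    }
}
```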

The performance of partial marking was compared with the manual scores given by teaching assistants (TAs), and we found that the freedom to write SQL queries in a multitude of syntactic ways badly distorts such a purely syntactic comparison. Hence, partial marking was only used for queries that had already been verified to be incorrect. For instance, we discovered many cases of students using subqueries where the instructor query contained only select-project-join features. Though in our experimental analysis (an interface for graphical analysis of student results against the instructor query was developed; see figure below) partial marking performed much worse than manual scoring, we still found a good correlation (correlation coefficient ~0.6) between the scores awarded by the TAs and the partial marking scores on our real dataset of hundreds of student queries.
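For reference, the coefficient reported above is the standard Pearson correlation between the two score vectors; a minimal sketch (a hypothetical helper, not part of XData):

```java
// Minimal sketch: Pearson correlation between TA scores and partial-marking
// scores (hypothetical helper, not part of the XData codebase).
public final class Pearson {
    static double correlation(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n; my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] ta = {10, 8, 6, 9, 3};  // manual TA scores (made-up data)
        double[] pm = {9, 7, 7, 10, 2};  // partial-marking scores (made-up data)
        System.out.printf("r = %.2f%n", correlation(ta, pm));
    }
}
```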

Guides: Prof. S. Sudarshan
Programming Language: Java, Other Tools: JSQL query parser, PostgreSQL open source DB, Apache Struts framework, DHtml Javascript library (for tree and chart display)
- Search Bomb: A keyword querying engine for RDF knowledge. A web search suite that includes an RDF crawler and indexes RDF files onto an Apache Solr server, which is then used for keyword search. An RDF file is scanned for entities (potentially any RDF:Resource), and for each entity, fields such as its RDF:Description, RDF:Comment, and RDF:Label values, the information of whether it is a class, a property, or an individual, its unique id (URI), and its provenance (the name of the source file in which it appears) are used to form a single Solr document that is added to the Solr server. An interface for uploading RDF files to the Solr server was also implemented (see figure below).
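A minimal sketch of this indexing step, using the Jena and SolrJ APIs, is given below; the Solr field names and core URL are illustrative assumptions, not the project's actual configuration:

```java
import org.apache.jena.rdf.model.*;
import org.apache.jena.vocabulary.RDFS;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

// Minimal sketch: index each RDF resource of a file as one Solr document.
// Field names ("id", "label", "comment", "provenance") and the core URL
// are illustrative assumptions, not the project's actual configuration.
public class RdfIndexer {
    public static void main(String[] args) throws Exception {
        String rdfFile = args[0];
        Model model = ModelFactory.createDefaultModel();
        model.read(rdfFile); // parse the RDF file

        try (SolrClient solr =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/rdf").build()) {
            ResIterator it = model.listSubjects();
            while (it.hasNext()) {
                Resource entity = it.next();
                if (entity.isAnon()) continue; // skip blank nodes
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", entity.getURI());      // unique id (URI)
                Statement label = entity.getProperty(RDFS.label);
                if (label != null) doc.addField("label", label.getString());
                Statement comment = entity.getProperty(RDFS.comment);
                if (comment != null) doc.addField("comment", comment.getString());
                doc.addField("provenance", rdfFile);      // source file name
                solr.add(doc);
            }
            solr.commit();
        }
    }
}
```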

The indexed document server can then be keyword searched (see snapshot of the interface below), leveraging the advanced search features of Solr, which include stemming using Porter's algorithm, removal of delimiters, stopwords, and duplicates, case conversion, and whitespace removal. Keyword expansion was enabled by expanding the search keywords with synonyms, hypernyms, and hyponyms from the WordNet dictionary.
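A minimal sketch of this expansion step with the JAWS WordNet API might look as follows (the WordNet database path is an assumption, and this is illustrative rather than the project's exact code):

```java
import edu.smu.tspell.wordnet.*;

import java.util.LinkedHashSet;
import java.util.Set;

// Minimal sketch: expand a search keyword with synonyms, hypernyms and
// hyponyms from WordNet via the JAWS API. The WordNet database path is an
// assumption; this is illustrative, not the project's exact code.
public class KeywordExpander {
    public static void main(String[] args) {
        System.setProperty("wordnet.database.dir", "/usr/share/wordnet/dict");
        WordNetDatabase db = WordNetDatabase.getFileInstance();

        Set<String> expansion = new LinkedHashSet<>();
        for (Synset synset : db.getSynsets("car", SynsetType.NOUN)) {
            // Synonyms: all word forms of the synset itself.
            for (String form : synset.getWordForms()) expansion.add(form);
            NounSynset noun = (NounSynset) synset;
            // Hypernyms (more general terms) and hyponyms (more specific terms).
            for (NounSynset h : noun.getHypernyms())
                for (String form : h.getWordForms()) expansion.add(form);
            for (NounSynset h : noun.getHyponyms())
                for (String form : h.getWordForms()) expansion.add(form);
        }
        System.out.println(expansion);
    }
}
```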

Guides: Prof. S. Sudarshan
Programming Language: Java, Other Tools: Apache Solr indexing engine, JAWS Wordnet API, Apache Struts framework, OWL API RDF library, Apache Jena RDF library
- Contextualized Quad-Systems: Developed chase-based algorithms for membership detection and deductive closure computation for the various contextualized quad-system classes (cAcyclic, safe, csafe, and range restricted), together with large data sets for the evaluation part of my PhD thesis. The average dChase computation time, membership detection time, and query response time of the various quad-system classes were tabulated and graphically visualized. The contexts were implemented as OWLIM repositories.
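Since a quad c : (s, p, o) is just a triple asserted in the named graph of its context c, loading the data sets boils down to context-aware inserts; a minimal sketch with the Sesame API (the in-memory store and example URIs are illustrative assumptions):

```java
import org.openrdf.model.URI;
import org.openrdf.model.ValueFactory;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.memory.MemoryStore;

// Minimal sketch: a quad c : (s, p, o) stored as a triple in the named
// graph of its context c. The in-memory store and example URIs are
// illustrative assumptions, not the thesis evaluation setup.
public class QuadLoader {
    public static void main(String[] args) throws Exception {
        Repository repo = new SailRepository(new MemoryStore());
        repo.initialize();
        ValueFactory vf = repo.getValueFactory();

        URI context = vf.createURI("http://example.org/context/c1");
        URI s = vf.createURI("http://example.org/s");
        URI p = vf.createURI("http://example.org/p");
        URI o = vf.createURI("http://example.org/o");

        RepositoryConnection conn = repo.getConnection();
        try {
            conn.add(s, p, o, context); // assert the quad c1 : (s, p, o)
            System.out.println("quads in c1: " + conn.size(context));
        } finally {
            conn.close();
        }
        repo.shutDown();
    }
}
```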
Guides: Dr. Luciano Serafini, Prof. Gabriel Kuper, Prof. Till Mossakowski
Programming Language: Java, Other Tools: Sesame rdf4J RDF library, OWLIM GraphDB OWL engine
- Contextualized Knowledge Repository (CKR): A knowledge representation and reasoning framework, with an accompanying prototype, that builds on Semantic Web technologies to represent, store, query, and reason with contextualized knowledge, i.e., knowledge that holds under specific circumstances or contexts. The CKR addresses an arising need in the Semantic Web: as large amounts of Linked Data are published on the Web, it is becoming apparent that the validity of published knowledge is not absolute, but often depends on time, location, topic, and other contextual attributes. See the figure below for a snapshot of the multi-context querying interface, which was one of my contributions to the implementation.
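Multi-context queries ultimately translate to SPARQL queries over named graphs; a minimal sketch with the Sesame API (the query and repository handle are illustrative assumptions, not the CKR prototype code):

```java
import org.openrdf.query.BindingSet;
import org.openrdf.query.QueryLanguage;
import org.openrdf.query.TupleQueryResult;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;

// Minimal sketch: query across contexts (named graphs). The query and the
// repository handle are illustrative assumptions, not the CKR prototype code.
public class MultiContextQuery {
    static void run(Repository repo) throws Exception {
        String sparql =
            "SELECT ?ctx ?s WHERE { GRAPH ?ctx { ?s a <http://example.org/Event> } }";
        RepositoryConnection conn = repo.getConnection();
        try {
            TupleQueryResult result =
                conn.prepareTupleQuery(QueryLanguage.SPARQL, sparql).evaluate();
            while (result.hasNext()) {
                BindingSet row = result.next();
                // Each answer carries both the entity and the context it holds in.
                System.out.println(row.getValue("ctx") + " : " + row.getValue("s"));
            }
            result.close();
        } finally {
            conn.close();
        }
    }
}
```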
Guides: Dr. Luciano Serafini, Dr. Andrei Tamilin
Programming Language and Tools used: Java, Sesame rdf4J RDF library, OWLIM GraphDB OWL engine, Apache Solr indexing engine
- RDF approximator for OWL ontologies: Query answering over OWL ontologies is often very inefficient, and the complexity of conjunctive query answering is still an open problem even for OWL DL. Subsumption and instance checking are 2NEXPTIME-complete for OWL 2 DL, and computing the deductive closure is impractical, as it can be infinite in the worst case. We developed a semantics, called the RDF-Reduct semantics, for approximating OWL ontologies in RDF. The semantics can be used for partially axiomatizing an OWL ontology as an RDF graph. The project was conceived at DFKI GmbH, Bremen, Germany, while I was on an internship at the University of Bremen.
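To give a flavour of such an approximation, the toy sketch below keeps only the axioms that have a direct RDF(S) counterpart; it is an illustration under assumed names, not the RDF-Reduct algorithm itself:

```java
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.*;

import java.io.File;

// Toy illustration (not the RDF-Reduct algorithm): keep only OWL axioms
// with a direct RDF(S) counterpart, e.g. atomic subsumptions become
// rdfs:subClassOf triples; everything else is dropped.
public class RdfApproximator {
    public static void main(String[] args) throws Exception {
        OWLOntologyManager mgr = OWLManager.createOWLOntologyManager();
        OWLOntology ont = mgr.loadOntologyFromOntologyDocument(new File(args[0]));

        for (OWLAxiom axiom : ont.getAxioms()) {
            if (axiom instanceof OWLSubClassOfAxiom) {
                OWLSubClassOfAxiom sub = (OWLSubClassOfAxiom) axiom;
                // Only atomic subsumptions map directly to a single triple.
                if (!sub.getSubClass().isAnonymous() && !sub.getSuperClass().isAnonymous()) {
                    System.out.printf("<%s> rdfs:subClassOf <%s> .%n",
                        sub.getSubClass().asOWLClass().getIRI(),
                        sub.getSuperClass().asOWLClass().getIRI());
                }
            }
            // Complex axioms (e.g. with restrictions on the right-hand side)
            // have no exact RDF counterpart; those are the interesting cases
            // that the RDF-Reduct semantics approximates.
        }
    }
}
```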
Guides: Prof. Till Mossakowski, Dr. Oliver Kutz, Dr. Christophe Lange
Programming Language and Tools used: Java, Pellet API, OWLIM GraphDB OWL engine, Apache Jena RDF library
- Owl2Latex: A LaTeX-like, easily readable scripting language called Ontex, for developing OWL/DL ontologies, had been developed in our group at FBK (for details see: https://dkm.fbk.eu/technologies/tex-owl). There was a need for a tool that acts as the inverse mapper. I implemented the OWL2Latex tool, which, given an OWL ontology, parses the ontology and converts it into an ontology serialized in the Ontex format.
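The heart of such an inverse mapper is a walk over the ontology's axioms that pretty-prints each one; a minimal sketch with the OWL API (the emitted syntax is a stand-in for illustration, not the real Ontex grammar):

```java
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.*;

import java.io.File;

// Minimal sketch of the inverse mapping: walk the ontology's axioms and
// pretty-print each one in a latex-like form. The emitted syntax here is a
// stand-in for illustration, not the real Ontex grammar.
public class Owl2Latex {
    public static void main(String[] args) throws Exception {
        OWLOntologyManager mgr = OWLManager.createOWLOntologyManager();
        OWLOntology ont = mgr.loadOntologyFromOntologyDocument(new File(args[0]));

        for (OWLAxiom axiom : ont.getLogicalAxioms()) {
            if (axiom instanceof OWLSubClassOfAxiom) {
                OWLSubClassOfAxiom sub = (OWLSubClassOfAxiom) axiom;
                System.out.printf("\\subclass{%s}{%s}%n",
                    render(sub.getSubClass()), render(sub.getSuperClass()));
            }
            // Further axiom types (equivalence, disjointness, property
            // axioms, ...) would each get their own printing rule.
        }
    }

    // Render a class expression; only named classes are handled in this sketch.
    static String render(OWLClassExpression ce) {
        return ce.isAnonymous() ? "(...)" : ce.asOWLClass().getIRI().getShortForm();
    }
}
```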
Guides: Dr. Luciano Serafini, Dr. Marco Rospocher
Programming Language: Java, Other Tools: OWL API
- Ephemerizer: The Ephemerizer is a system that ensures that public keys expire after their validity time. We implemented a public-key based Ephemerizer system that uses the power of identity-based cryptography (IBC), in which an arbitrary public key can be chosen to encrypt a message for a recipient, and the private key for this public key is generated, at a convenient time, with the help of an Ephemerizer server that holds a master key and also plays the role of the private key generator. This power of IBC is used to implement an Ephemerizer system that makes sure that data, once deleted (timed out), cannot be recovered. The Ephemerizer holds the private keys for the encrypted (temporary) data on clients, keys that will eventually be deleted. We implemented the Ephemerizer as a web service. A snapshot of the command-line interface that lists the various encryption options for the end user is depicted in the figure below.
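Independently of the underlying IBC machinery, the core pattern is that the server makes data unrecoverable simply by deleting the per-expiry-time key; a minimal Java sketch of that pattern, with a symmetric key as a stand-in for the identity-based private key (illustrative only; our actual implementation was in C with the PBC library):

```java
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the ephemeral-deletion pattern only; the actual
// system was written in C with the Pairing-Based Crypto library, and the
// SecretKey below is a stand-in for the identity-based private key.
public class EphemerizerSketch {
    // One key per expiry time; deleting the key "deletes" all data
    // encrypted under it, since decryption becomes impossible.
    private final Map<Long, SecretKey> keysByExpiry = new HashMap<>();

    SecretKey keyFor(long expiryTime) throws Exception {
        SecretKey key = keysByExpiry.get(expiryTime);
        if (key == null) {
            key = KeyGenerator.getInstance("AES").generateKey();
            keysByExpiry.put(expiryTime, key);
        }
        return key;
    }

    /** Called by a scheduler: forget every key whose time has passed. */
    void expire(long now) {
        keysByExpiry.keySet().removeIf(t -> t <= now);
    }
}
```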

Guides: Prof. Bruno Crispo (DISI, University of Trento) and Dr. Srijith K Nair
(Vrije University, Amsterdam)
Programming Language: C, Other Tools: MySQL server, Pairing-based Crypto library