Desktop Search Engine
How data is created is not what matters;
our focus should be on how to handle a large amount of data so that it can be retrieved
when it is needed.
Harsh Vardhan
M Tech I CSE
IIT Bombay
Overview
- Problem Statement
- Solution Design
- Implementation Plan
- Testing Plan
- Context
Problem Statement
- This project develops a Desktop Search Engine. It takes a query from the user,
parses it, and displays the text-based documents stored on the drive, ordered by a specified
ranking criterion.
- The search engine first builds a database of every stored document in the form
of tokens, and searching is performed on this database only.
- Minimum work: A basic search engine that takes input from the user,
removes the stop words (i.e. redundant words), searches the previously prepared indexed database,
ranks the results by a single criterion, and displays the output in the GUI.
- Wish list: There are two possible improvements:
- Add a scheduler to the crawler so that
the user does not have to specify the path and an index is built for all the text files on the drive.
- Give the user the freedom
to choose the ranking criterion.
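The scheduled crawler from the wish list could look roughly like the sketch below. It is an illustration only, not the project's code: the class name `ScheduledCrawler`, the `.txt` filter, the refresh period in minutes, and the `indexer` callback are all assumptions made for this example. It uses only the standard Java library.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: periodically crawls a directory tree for text files.
final class ScheduledCrawler {

    /** Recursively collects every regular file under root whose name ends in ".txt". */
    static List<Path> findTextFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".txt"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Crawls root immediately, then again every periodMinutes, passing the files to indexer. */
    static ScheduledExecutorService schedule(Path root, long periodMinutes,
                                             Consumer<List<Path>> indexer) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(() -> {
            try {
                indexer.accept(findTextFiles(root));
            } catch (IOException | RuntimeException e) {
                // An exception that escapes the task would cancel all later runs, so only log it.
                System.err.println("Crawl failed: " + e);
            }
        }, 0, periodMinutes, TimeUnit.MINUTES);
        return executor;
    }
}
```

A caller would start it with something like `ScheduledCrawler.schedule(Paths.get(System.getProperty("user.home")), 60, files -> rebuildIndex(files))`, where `rebuildIndex` stands for whatever indexing routine the project provides, and would call `shutdown()` on the returned executor when the application exits.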
Solution Design
- This project uses the Lucene API provided by Apache.
- Lucene provides a framework on which a search routine can be built.
- Indexer: This module
creates the indexed database by storing the tokens of every text document
on which the search is to be performed.
- Analyser: This module takes the
user query from the GUI and analyses it, that is, it removes the unnecessary words, called
stop words. It then generates the internal query that is parsed by the searcher.
- Searcher: This module takes the internal query from
the analyser module and interprets it. The searcher then executes the query on the indexed database,
ranks the results by the specified criterion, and returns to the user links to the documents containing
the input words.
- GUI: This module interacts directly with the user;
it takes the input and displays the output.
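To make the division of work between the modules concrete, here is a minimal in-memory sketch of the analyser, indexer, and searcher steps. It deliberately does not use Lucene; it is a plain-Java stand-in for the same idea. The class name `MiniSearchEngine`, the short stop-word list, and ranking by summed term frequency are assumptions made for this example, not the project's actual design.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the Lucene-based modules, using only the standard library.
final class MiniSearchEngine {
    // Small illustrative stop-word list; a real analyser ships a much longer one.
    private static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("a", "an", "and", "in", "is", "of", "on", "the", "to"));

    // token -> (document name -> number of occurrences of the token in that document)
    private final Map<String, Map<String, Integer>> index = new HashMap<>();

    /** Analyser: lower-cases the text, splits on anything but a-z and 0-9, drops stop words. */
    static List<String> analyse(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Indexer: records every remaining token of the document in the inverted index. */
    void addDocument(String name, String content) {
        for (String token : analyse(content)) {
            index.computeIfAbsent(token, t -> new HashMap<>()).merge(name, 1, Integer::sum);
        }
    }

    /** Searcher: scores each document by summed term frequency and returns names, best first. */
    List<String> search(String query) {
        Map<String, Integer> scores = new HashMap<>();
        for (String token : new LinkedHashSet<>(analyse(query))) {
            Map<String, Integer> postings =
                    index.getOrDefault(token, Collections.<String, Integer>emptyMap());
            postings.forEach((doc, count) -> scores.merge(doc, count, Integer::sum));
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b); // ties broken by name
        });
        return ranked;
    }
}
```

For example, after `addDocument("a.txt", "The cat sat on the mat. The cat slept.")` and `addDocument("b.txt", "A dog and a cat.")`, the call `search("the cat")` returns `[a.txt, b.txt]`: the stop word "the" is dropped from the query, and `a.txt` ranks first because it contains "cat" twice.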
Implementation
Language used | Java |
API used for back end | Lucene API |
API used for front end | Java Swing classes |
Data used | A reasonable amount of text-based documents |
Amount of code | Up to 500 LOC |
Current status | 30% of back-end coding done |
Testing
To test this project, the test cases are divided into two phases:
- Testing the modules that create the database: We specify a
path and check whether all the text-based documents under it are properly
indexed and stored in the database.
- Testing the modules that perform the search: The user provides some words
that occur in many documents, and we check whether all those documents are displayed
and whether they are ranked in the proper order.
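The two phases can be turned into simple automated checks. The sketch below is one possible shape for them. The class and method names are hypothetical, and it assumes the engine under test can report the set of indexed document names and a score for each result, which the slides do not specify.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical check helpers for the two test phases.
final class SearchTestChecks {

    /** Phase 1: returns the documents that should have been indexed but were not. */
    static Set<String> missingFromIndex(Set<String> expected, Set<String> indexed) {
        Set<String> missing = new TreeSet<>(expected);
        missing.removeAll(indexed);
        return missing;
    }

    /** Phase 2: true if every expected document appears somewhere in the results. */
    static boolean containsAll(List<String> results, Set<String> expected) {
        return results.containsAll(expected);
    }

    /** Phase 2: true if the scores, listed in result order, never increase. */
    static boolean isRankedDescending(List<Double> scores) {
        for (int i = 1; i < scores.size(); i++) {
            if (scores.get(i) > scores.get(i - 1)) {
                return false;
            }
        }
        return true;
    }
}
```

An empty set from `missingFromIndex` means phase 1 passed. Phase 2 passes when both `containsAll` and `isRankedDescending` return true for each test query.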
Context
A desktop search engine is nowadays one of the most needed
tools on a user's hard disk. Since the volume of locally stored data is growing tremendously, much like that
of the internet, it has become necessary to search for words within the documents stored on our own system.
Since this search module is general-purpose, it can also be incorporated into web pages. For example,
some web sites are very large and contain thousands of pages, so a developer who wants to let users
search effectively for the data they need can prepare an index database for the web site itself.
This desktop search engine can be extended to a web-based search engine. Instead of taking documents stored on the local disk,
it would take documents from the web itself and store them in the form of tokens.
Farewell
