Project topics for CS490 (Spring 2008)
1. Boosting the performance of search engines
Typing a few keywords and “googling” for something that interests you is a common part of life today. But do you ever wonder how a search engine finds a single piece of information (say, the latest album of a megastar, a picture of a beautiful countryside, or some clues about your algorithms homework!) so quickly among billions of documents? What sort of indexes, algorithms and architectural tricks make this possible? And finally, is it possible to make it faster?
If your answer to the above questions is “yes!”, this research project will be a great opportunity to get a background on all of these, to investigate new ideas in this exciting field, and, who knows, to obtain results good enough for publication... If you are really ambitious, you may even find something that can make you rich!
In this study, you will try to boost search engine performance by combining two popular techniques, namely index caching and index pruning. This is a well-defined research project with the following stages:
· Getting familiar with the topic: This requires some guided reading about collection indexing, index caching and index pruning.
· Research: You will first investigate a hybrid approach that combines caching and pruning techniques. New ideas may always arise, and are always welcome!
· Implementation: You can adapt code either developed within our research group or publicly available on the net. You may also write a reasonable amount of code to implement the “new” ideas. In any case, you should be good at programming in at least one of C/C++ or Java.
· Experimentation: You will evaluate how good the proposed ideas are with respect to state-of-the-art solutions.
Expected output: We expect this research to shed light on overall system performance in an environment where index caching and pruning are used together. The result is expected to be, at least, worth writing up as a formal “technical report”. Depending on the quality and originality of the work and the final results, the entire study can also be considered for submission to an international conference and/or journal. If accepted, you will have your first scientific paper, even before applying for an MS or PhD degree!
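To make the two techniques named above more concrete, here is a minimal Python sketch of how index pruning and index caching might be layered. It is only an illustration under stated assumptions, not the method the project prescribes: the toy documents, the "keep the top-k postings per term" pruning rule, the LRU replacement policy and all names (build_index, prune_index, CachedIndex) are hypothetical choices made for this example.

```python
from collections import OrderedDict, defaultdict


def build_index(docs):
    """Map each term to a postings list of (doc_id, term_frequency),
    sorted by descending frequency (ties broken by doc_id)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return {t: sorted(p.items(), key=lambda x: (-x[1], x[0]))
            for t, p in index.items()}


def prune_index(index, keep):
    """Illustrative static pruning: keep only the top `keep` postings per term."""
    return {t: p[:keep] for t, p in index.items()}


class CachedIndex:
    """LRU cache of postings lists placed in front of a (pruned) index."""

    def __init__(self, index, capacity):
        self.index = index
        self.capacity = capacity
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def postings(self, term):
        if term in self.cache:
            self.hits += 1
            self.cache.move_to_end(term)      # mark as most recently used
            return self.cache[term]
        self.misses += 1
        plist = self.index.get(term, [])      # stands in for a slow disk read
        self.cache[term] = plist
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict the least recently used term
        return plist


# Toy collection (assumed data, for illustration only).
docs = {
    1: "fast search engine index",
    2: "index caching for search search",
    3: "index pruning and index caching",
}
full = build_index(docs)
pruned = prune_index(full, keep=2)
engine = CachedIndex(pruned, capacity=2)
for term in ["index", "search", "index", "caching", "search"]:
    engine.postings(term)
print("hits:", engine.hits, "misses:", engine.misses)
```

Pruning shortens each postings list, so more lists fit into a cache of a given size; the experimentation stage would measure how this trade-off affects hit rate and result quality on real query logs.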
2. Image/Video crawler
This project involves crawling the web to collect images/videos for database construction. The crawling system to be developed should have utilities to collect specific types of media along with nearby relevant labels.
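As a rough sketch of the "media plus nearby labels" idea, the Python example below extracts image URLs from a page together with a candidate label, using only the standard library. Several things here are assumptions made for illustration: the alt and title attributes are used as a stand-in for "nearby relevant labels", the extension list defines the "specific type of media", and the page content and URLs are invented. A real crawler would also fetch pages, follow links and look at surrounding text.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed set of target media types for this example.
MEDIA_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


class ImageLabelParser(HTMLParser):
    """Collect (absolute_url, label) pairs for <img> tags of the wanted types."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        if not src:
            return
        url = urljoin(self.base_url, src)     # resolve relative links
        if url.lower().endswith(MEDIA_EXTENSIONS):
            # Use alt or title text as a simple proxy for a nearby label.
            label = attributes.get("alt") or attributes.get("title") or ""
            self.found.append((url, label.strip()))


def extract_images(html, base_url):
    parser = ImageLabelParser(base_url)
    parser.feed(html)
    return parser.found


# Invented page content, for illustration only.
page = """
<html><body>
  <img src="/photos/lake.jpg" alt="Mountain lake at sunrise">
  <img src="icons/spacer.bmp" alt="">
  <img src="http://example.org/cat.PNG" title="Sleeping cat">
</body></html>
"""
results = extract_images(page, "http://example.com/gallery/index.html")
for url, label in results:
    print(url, "|", label)
```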
3. Paper search & bibliography generation
We often need to search the web for publications relevant to our research. The system to be developed in this project will search a list of web sites (or use Google search results) and present the collected information in an easy-to-use format. Some of the facilities provided by this system will be:
· A link to the PDF file of the paper, if it exists,
· Abstract of the paper,
· BibTeX entry to be included as a reference in LaTeX documents,
· etc.
If done properly, this tool can be very useful and may be used by many researchers if it is made available online.
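The BibTeX facility could be sketched along the following lines in Python. This is only an illustration: the function name make_bibtex, the citation key and the paper record are all hypothetical, and a real tool would have to extract these fields from the collected pages and escape special characters.

```python
def make_bibtex(entry_type, key, fields):
    """Format a BibTeX entry from a dict of fields, skipping empty values."""
    lines = ["@%s{%s," % (entry_type, key)]
    for name, value in fields.items():
        if value:
            lines.append("  %s = {%s}," % (name, value))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical record, as the system might assemble it from search results.
paper = {
    "author": "Doe, Jane and Roe, Richard",
    "title": "An Example Paper Title",
    "booktitle": "Proceedings of an Example Conference",
    "year": "2008",
    "url": "",  # no PDF link found, so this field is omitted from the entry
}
entry = make_bibtex("inproceedings", "doe2008example", paper)
print(entry)
```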