Department of Computer Engineering
Latent Semantic Indexing for Information Retrieval
Based on Singular Value Decomposition
Computer Engineering, Bilkent University
Common approaches to retrieving textual materials rely on lexical matching between the words in users' requests and those assigned to documents in a collection. Traditional lexical retrieval techniques are valuable to experts trained to search collections from a specific discipline, but they often return too much information to the user. At other times, because the terms used in the query differ from the terms used in the documents, valuable information in the collection is never found. Because people use such diverse words to describe the same document, traditional lexical retrieval techniques become less useful for searching large amounts of information and deriving useful facts from it.
This presentation introduces Latent Semantic Indexing (LSI), a retrieval method designed to overcome these problems. LSI is based on matrix computation using the singular value decomposition (SVD). The goal in LSI is to find and fit a useful model of the relationships between terms and documents.
We want to use the matrix of observed occurrences of terms in documents to estimate the parameters of that model. With the resulting model, we can then estimate what the observed occurrences really should have been. In this way, for instance, we may be able to predict that a given term should have been associated with a document even though, because of variability in word use, no such association was observed explicitly.
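The idea of estimating "what the observed occurrences really should have been" can be sketched with a truncated SVD. The example below is a minimal illustration, not material from the seminar: the five terms, the four documents, the counts, the function name `rank_k_estimate`, and the choice k = 2 are all assumptions made for the sketch.

```python
import numpy as np

# Hypothetical toy term-document count matrix (rows = terms, columns = documents).
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 0],  # petal
], dtype=float)


def rank_k_estimate(A, k):
    """Return the rank-k approximation of A built from its k largest singular triplets."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is sorted in descending order
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


# k = 2 is an assumed number of latent dimensions for this tiny example.
A_k = rank_k_estimate(A, k=2)
print(np.round(A_k, 2))
```

In this toy matrix "car" never occurs in the second document, yet the rank-2 estimate for that cell is 0.5, because "car" and "automobile" both co-occur with "engine". This is the kind of unobserved association the model is meant to recover.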
In this seminar, the motivation and rationale of the LSI method will be discussed first. Then the basic concepts needed to understand LSI through the SVD will be reviewed. The LSI method will then be illustrated by a constructive example, which shows how LSI represents terms and documents in the same semantic space, how a query is represented, how dynamic collections are managed when additional documents are added, and how SVD-updating represents additional documents.
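As a rough sketch of how a query can be placed in the same space as the documents, the example below uses the common projection q_hat = q^T U_k S_k^{-1} and ranks documents by cosine similarity. The toy matrix, k = 2, and the helper names `project` and `cosine` are assumptions for illustration. The last step adds a document by "folding-in", which reuses the same projection; it is a simpler stand-in and is not the SVD-updating procedure mentioned above, which recomputes the decomposition.

```python
import numpy as np

# Hypothetical toy term-document matrix (rows = terms, columns = documents).
# Term order: car, automobile, engine, flower, petal.
A = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
], dtype=float)

k = 2  # assumed number of latent dimensions
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k = U[:, :k], s[:k]
V_k = Vt[:k, :].T  # row j holds the k-dimensional coordinates of document j


def project(term_vector):
    """Map a term-space vector (query or new document) to x^T U_k S_k^{-1}."""
    return term_vector @ U_k / s_k


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# A query containing only the term "car".
q = np.array([1, 0, 0, 0, 0], dtype=float)
q_hat = project(q)
scores = [cosine(q_hat, doc) for doc in V_k]
print(np.round(scores, 3))

# Folding-in: place a new document (here "car" and "automobile") in the
# existing space without recomputing the SVD.
d_new = np.array([1, 1, 0, 0, 0], dtype=float)
V_k_extended = np.vstack([V_k, project(d_new)])
```

With this toy data the second document, which contains "automobile" and "engine" but not "car", scores as highly as the first, while the two "flower" documents score near zero. Some formulations scale the document coordinates by the singular values before comparing; that choice is left out here for brevity.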
Keywords: Singular Value Decomposition, Latent Semantic Indexing, Information Retrieval.
DATE: January 16, 2002, Wednesday @ 14:30