Retrieval and Recognition of Historical Manuscripts


Esra Ataer
Computer Engineering Department
Bilkent University

Large archives of historical documents pose a challenge to many researchers all over the world. These archives remain inaccessible because manual indexing and transcription of such a huge volume is difficult. In addition, electronic imaging tools and image processing techniques are gaining importance with the rapid digitization of materials in libraries and archives, and with the need for applications such as automatic mail sorting, signature recognition and bank-check processing. In this thesis, a language-independent method is proposed for the representation, retrieval and recognition of such historical documents. While character recognition methods suffer from preprocessing and overtraining, we use a different approach, based on extracting words from documents and representing each word image with the features of invariant regions. The bag-of-words approach, which has been shown to be successful in classifying objects and scenes, is adapted for matching words. The idea is rather simple: we act like a person unfamiliar with the language and treat words as images rather than as collections of characters. Since curvature points, connection points and dots are important visual features for distinguishing two words from each other, we make use of salient points, which have been shown to be successful in representing such distinctive areas and are heavily used for matching. The Difference of Gaussian (DoG) detector, which finds scale-invariant regions, and the Harris Affine detector, which detects affine-invariant regions, are used to detect such areas, and the detected keypoints are described with Scale Invariant Feature Transform (SIFT) features. Then, each word image is represented by a set of visual terms obtained by vector quantization of the SIFT descriptors. Different feature representations are generated from the visual term information: representations that use the location information of the keypoints, the classical histogram of visual terms, and a string form of the visual vocabulary.
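The quantization and histogram steps described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `quantize` and `term_histogram`, the toy 2-D vectors standing in for 128-D SIFT descriptors, and the hand-picked three-entry codebook are all assumptions made for illustration. In practice the codebook would typically be learned by clustering descriptors from many word images.

```python
from math import dist  # Euclidean distance, Python 3.8+


def quantize(descriptors, codebook):
    """Map each descriptor to the index of its nearest codebook vector.

    The returned indices play the role of "visual terms".
    """
    return [
        min(range(len(codebook)), key=lambda k: dist(d, codebook[k]))
        for d in descriptors
    ]


def term_histogram(terms, vocab_size):
    """Normalized histogram of visual terms for one word image."""
    counts = [0] * vocab_size
    for t in terms:
        counts[t] += 1
    total = sum(counts) or 1  # avoid division by zero for empty input
    return [c / total for c in counts]


# Hypothetical data: toy 2-D vectors instead of real SIFT descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (4.8, 5.1), (1.1, 0.9)]

terms = quantize(descriptors, codebook)        # string form: sequence of term ids
hist = term_histogram(terms, len(codebook))    # histogram form
print(terms, hist)
```

Keeping the term sequence in keypoint order gives a string-like representation, while the histogram discards order; which ordering the thesis uses for its string form is not stated here.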
Then similar words are matched based on the similarity of these representations, using different distance measures. The representations are used both for document retrieval and for word recognition. The experiments are carried out on Arabic, Latin and Ottoman datasets, which include different writing styles and different writers. The results show that the proposed method is successful in the retrieval and recognition of documents, even across different scripts and writers, and since it is language independent, it can be easily adapted to other languages as well. In addition, the system is successful in capturing semantic similarities, which is useful for indexing, and it does not include any supervised step. The results are also compared with other popular word matching methods, such as Dynamic Time Warping and string matching.
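As a rough illustration of matching by distance, the sketch below ranks candidate words against a query. The abstract does not name the distance measures used, so the L1 distance for histograms and the edit distance for term sequences are assumptions chosen for simplicity, as are the helper names `l1_distance`, `edit_distance` and `rank` and the toy data.

```python
def l1_distance(h1, h2):
    """Sum of absolute differences between two equal-length histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def edit_distance(s, t):
    """Levenshtein distance between two sequences of visual-term ids."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (a != b)))  # substitution or match
        prev = cur
    return prev[-1]


def rank(query, candidates, distance):
    """Return candidate names ordered from most to least similar."""
    return sorted(candidates, key=lambda name: distance(query, candidates[name]))


# Hypothetical histograms for a query word and two indexed words.
query_hist = [0.25, 0.5, 0.25]
indexed = {"word_a": [0.25, 0.5, 0.25], "word_b": [1.0, 0.0, 0.0]}
print(rank(query_hist, indexed, l1_distance))

# The same idea applied to the string form of the visual vocabulary.
print(edit_distance([0, 1, 2, 1], [0, 1, 1]))
```

For retrieval, the ranked list is the result; for recognition, one simple option is to assign the label of the nearest indexed word.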


DATE: Thursday, 19 July 2007 @ 11:00