Technical Reports 1996

Technical Reports Published in 1996

BU-CEIS-9601: PDF

k Nearest Neighbor Classification on Feature Projections
Aynur Akkuş and H. Altay Güvenir
This paper proposes a new approach to classification based on a majority voting on individual classifications made by the projections of the training set on each feature. We have applied the k-nearest neighbor algorithm to determine the classifications made on individual feature projections. We called the resulting algorithm k-NNFP, for k-Nearest Neighbor on Feature Projections. The most important characteristic of this technique is that the training instances are stored as their projections on each feature dimension. This allows the classification of a new instance to be made much faster than k-NN algorithm. The voting mechanism reduces the negative effect of possible irrelevant features in classification. Also the classification accuracy of k-NNFP increases when the k value is increased, which indicates that the process of classification can incorporate the learned classification knowledge better when k increases. The k-NNFP algorithm is compared with the k-NN algorithm, in terms of classification accuracy and running time on some real-world and artificial datasets.
BU-CEIS-9602: PDF

Tactical Generation in a Free Constituent Order Language
Dilek Zeynep Hakkani, Kemal Oflazer, and İlyas Çiçekli
This paper describes tactical generation in Turkish, a free constituent order language, in which the order of the constituents may change according to the information structure of the sentences to be generated. In the absence of any information regarding the information structure of a sentence (i.e., topic, focus, background, etc.), the constituents of the sentence obey a default order, but the order is almost freely changeable, otherwise. We have used a recursively structured finite state machine (essentially an ATN minus the registers) for handling the changes in constituent order, implemented as a right-linear grammar in a unification-based generation grammar environment, GenKit, developed at Carnegie Mellon University--Center for Machine Translation. Morphological realization has been implemented using an external morphological analysis/generation component which performs morpheme selection and handles morphographemic processes.
BU-CEIS-9603: PDF

Unsupervised Learning in Constraint-based Morphological Disambiguation
Kemal Oflazer and Gökhan Tür
This paper presents a constraint-based morphological disambiguation approach that uses unsupervised learning component to discover some of the constraints it uses. It is specifically applicable to languages with productive inflectional and derivational morphological processes, such as Turkish, where morphological ambiguity has a rather different nature than that found in languages like English. Our approach starts with a set of corpus-independent hand-crafted rules that reduce morphological ambiguity (hence improve precision) without sacrificing recall. It then uses an untagged training corpus in which all lexical items have been annotated with all possible morphological analyses, incrementally proposing and evaluating additional (possibly corpus dependent) constraints for disambiguation of morphological parses using the constraints imposed by unambiguous contexts. These rules choose parses with specified features. It then learns in an unsupervised manner, additional rules for removing parses with certain features. In certain respects, our approach has been motivated by Brill's recent work, but with the observation that his transformational approach is not directly applicable to languages like Turkish. Our results indicate that using hand-crafted rules and rules learned to choose, we can attain a recall of 99.08% and a precision of 88.08% with 1.119 parses per token, on the training text. When rules learned to delete are used in addition to these, we can attain a recall of 96.76% and a precision of 92.05% and 1.051 parses per token on the training text. On previously unseen text, we can attain a recall of 98.04% and a precision of 86.23% with 1.137 parses per token using just the hand-crafted rules and rules learned to choose. When rules learned to delete are used we can attain a recall of 96.99% and a precision of 88.13% and 1.100 parses per token.
BU-CEIS-9604: PDF

Partial Query Evaluation for Vertically Partitioned Signature Files in very Large Unformatted Databases (Ph. D. Thesis)
Seyit Koçberber
Signature file approach is a well-known indexing technique. The Bit Sliced Signature File (BSSF) method provides an efficient retrieval environment by only accessing on-bits of query signatures. However, its response time increases for increasing number of query terms. For BSSF we define a stopping condition that tries to avoid the processing of all on-bits of query signatures through partial evaluation. The aim of the stopping condition is to reduce the expected number of false drops to the level that also provides the lowest response time within the framework of BSSF. We propose the Partially evaluated Bit- Sliced Signature File (P-BSSF) method that employs the partial evaluation strategy and minimizes the response time in a multi-term query environment by considering the submission probabilities of the queries with different number of terms. Experiments show that P-BSSF provides 85% improvement in response time over BSSF depending on space overhead and the number of query terms. To provide better optimization of the signature file parameters in multi-term query environments, we propose Multi-Fragmented Signature File (MFSF) method as an extension of P-BSSF. In MFSF, a signature file is divided into variable sized vertical fragments with different on-bit densities to optimize the response time using a similar query evaluation methodology. In query evaluation the query signature on-bits of the lower on-bit density fragments are used first. As the number of query terms increases, the number of query signature on-bits in the lower on-bit density fragments increases and the query stopping condition is reached in fewer bit slice evaluation steps. Therefore, in MFSF, the response time decreases for an increasing number of query terms. The analysis shows that, with no space overhead, MFSF outperforms the P-BSSF and generalized frame-sliced signature file organizations. Due to hashing and superimposition operations used in obtaining signatures, the signature of an irrelevant record may match the query signature, i.e., it is possible to have false drops. In signature file processing the accurate estimation of the number of false drops is essential to obtain system parameters that will yield a desirable response time. We propose a more accurate false drop estimation method for databases with varying record lengths instead of using an average number of distinct terms per record. In this method, the records of a database are conceptually partitioned according to the number of distinct terms they contain and the number of false drops of each group is estimated separately. Experiments with real data show that depending on the space overhead, the proposed method obtains up to 33%, 25%, and 20% response time improvements for the sequential, generalized frame-sliced, and MFSF methods, respectively. For very large databases even one bit slice of MFSF may occupy several disk blocks. We propose the Compressed Multi-Fragmented Signature File (C-MFSF) method that stores the bit slices of MFSF in a compact form which provides a better response time. To minimize the number of disk accesses, the signature size and the disk block size can be adjusted such that most bit slices fit into a single disk block after compression. In such environments, C-MFSF evaluates the queries with more than two terms with only one disk access per query term rather than two disk accesses of the inverted file method which are respectively for the pointer of the query term posting list and the list itself. Our projection based on real data shows that for a database of one million records C-MFSF provides a response time of 0.85 seconds.
BU-CEIS-9605: PDF

Inductive Logic Program Synthesis with DIALOGS
Pierre Flener
DIALOGS (Dialog-based Inductive and Abductive LOGic program Synthesizer) is a schema-guided synthesizer of recursive logic programs; it takes the initiative and minimally queries a (possibly computationally naive) specifier for evidence in her/his conceptual language. The specifier must know the answers to such simple queries, because otherwise s/he wouldn't even feel the need for the synthesized program. DIALOGS can be used by any learner (including itself) that detects, or merely conjectures, the necessity of invention of a new predicate. Due to its foundation on a powerful codification of a "recursion-theory" (by means of the template and constraints of a divide-and-conquer schema), DIALOGS needs very little evidence and is very fast.
BU-CEIS-9606: PDF

Learning Translation Templates from Bilingual Texts
H. Altay Güvenir and İlyas Çiçekli
This paper proposes a mechanism for learning structural correspondences between two languages from a corpus of translated sentence pairs. The proposed mechanism uses analogical reasoning between two translations. Given a pair of translations, the similar parts of the sentences in the source language must correspond the similar parts of the sentences in the target language. Similarly, the different parts should correspond to the respective parts in the translated sentences. The correspondences between the similarities, and also differences are learned in the form of translation templates. The system is tested on a small training dataset and produced promising results for further investigation.
BU-EE-9606: PDF

Using Planes of Optoelectronic Devices Interconnected by Free-space Optics: Systems Issues and Application Opportunities
Haldun M. Özaktaş
Please note the numbering of the report above.
BU-CEIS-9607: PDF

Corpus-Based Learning of Generalized Parse Tree Rules for Translation
H. Altay Güvenir and Ayşegül Tunç
This paper proposes a learning mechanism to acquire structural correspondences between two languages from a corpus of translated sentence pairs. The proposed mechanism uses analogical reasoning between two translations. Given a pair of translations, the similar parts of the sentences in the source language must correspond the similar parts of the sentences in the target language. Similarly, the different parts should correspond to the respective parts in the translated sentences. The correspondences between the similarities, and also differences are learned in the form of rewrite rules. The system is tested on a small training dataset and produced promising results for further investigation.
BU-CEIS-9608: PDF

Decomposing Irregularly Sparse Matrices for Parallel Matrix-Vector Multiplication
Ümit V. Çatalyürek and Cevdet Aykanat
In this work, we show the deficiencies of the graph model for decomposing sparse matrices for parallel matrix-vector multiplication. Then, we propose two hypergraph models which avoid all deficiencies of the graph model. The proposed models reduce the decomposition problem to the well-known hypergraph partitioning problem widely encountered in circuit partitioning in VLSI. We have implemented fast Kernighan-Lin based graph and hypergraph partitioning heuristics and used the successful multilevel graph partitioning tool (Metis) for the experimental evaluation of the validity of the proposed hypergraph models. We have also developed a multilevel hypergraph partitioning heuristic for experimenting the performance of the multilevel approach on hypergraph partitioning. Experimental results on sparse matrices, selected from Harwell-Boeing collection and NETLIB suite, confirm both the validity of our proposed hypergraph models and appropriateness of the multilevel approach to hypergraph partitioning.
BU-CEIS-9609: PDF

Two Novel Multiway Circuit Partitioning Algorithms Using Relaxed Locking
Ali Daşdan and Cevdet Aykanat
All the previous Kernighan-Lin based circuit partitioning algorithms employ the locking mechanism, which enforces each cell to be moved exactly once per pass. In this paper, we propose for multiway circuit partitioning two novel approaches to overcome this limitation. The proposed approaches are based on our claim that the performance of a partitioning algorithm gets better by allowing each cell to be moved more than once per pass. The first approach introduces the phase concept so that each pass can contain more than one phase, and each cell has a chance to be moved in all the passes. This approach uses the locking mechanism in a relaxed manner in that it does not allow a cell to be moved more than once in a phase. The second approach does not use the locking mechanism at all. It introduces the mobility concept, which allows a cell to be moved more than once in a pass. Each approach leads to a Kernighan-Lin based generic algorithm whose parameters can be set in such a way that better performance is obtained by spending more time. By setting these parameters, we generated three versions of each generic algorithm. The proposed algorithms were experimentally evaluated in comparison with Sanchis' multiway circuit partitioning algorithm (FMS algorithm) and the simulated annealing algorithm (SA algorithm) on benchmark circuits. The proposed algorithms outperform FMS algorithm significantly especially when the number of parts in the partition was large and and the circuit was sparse. Their performance is comparable to that of SA algorithm though the running time of SA algorithm is far larger than those of the proposed algorithms. We also performed some experiments on the parameters of the proposed algorithms, and provided guidelines for good parameter settings. Besides, we gave a new formulation of multiway circuit partitioning that combined those of the previous approaches, and presented detailed algorithms and their time and space complexity analysis. We observed that experimental results supported our claim. The proposed approaches are effective and efficient, and are applicable to almost all of the previous Kernighan-Lin based algorithms.
BU-CEIS-9610: PDF

OBJECTIVE: A Benchmark for Object-Oriented Active Database Systems
Uğur Çetintemel, Jurgen Zimmerman, Özgür Ulusoy, Alejandro Buchmann
Although much work in the area of Active Database Management Systems (ADBMSs) has been done, it is not yet clear how the performance of an active DBMS can be evaluated systematically. In this paper, we describe the OBJECTIVE Benchmark for object-oriented ADBMSs, and present experimental results from its implementation in an active database system prototype. OBJECTIVE can be used to identify performance bottlenecks and active functionalities of an ADBMS, and compare the performance of multiple ADBMSs.
BU-CEIS-9611: PDF

OBJECTIVE: A Benchmark for Object-Oriented Active Database Systems (M. Sc. Thesis)
Uğur Çetintemel
Although much work in the area of Active Database Management Systems (ADBMSs) has been done, there have been only a few attempts to evaluate the performance of these systems, and it is not yet clear how the performance of an active DBMS can be evaluated systematically. In this thesis, we describe the OBJECTIVE Benchmark for object-oriented ADBMSs, and present experimental results from its implementation in an active database system prototype. OBJECTIVE can be used to identify performance bottlenecks and active functionalities of an ADBMS, and compare the performance of multiple ADBMSs. The philosophy of OBJECTIVE is to isolate components providing active functionalities, and concentrate only on the performance of these components while attempting to minimize the effects of other factors.
BU-CEIS-9612: PDF

Genetic Algorithms to Learn Feature Weights for the Nearest Neighbor Algorithm
Gülşen Demiröz and H. Altay Güvenir
In this paper we use genetic algorithms to learn feature weights for the Nearest Neighbor classification algorithm. We represent feature weights as real values in [0..1] and their sum is 1. A new crossover operation, called continuous uniform crossover, is introduced where the legality of chromosomes is preserved after the crossover operation. This new crossover operation is compared with three other crossover operations one-point crossover>, two-point crossover, and uniform crossover all of which require normalization after the crossover operation. Four genetic algorithms using each of these four crossover operations are implemented. Each genetic algorithm tries to learn feature weights to improve the classification accuracy of the Nearest Neighbor algorithm. Then the learned weights are assigned to features to be used in distance calculations of the Weighted Nearest Neighbor classification algorithm and the resulting classification accuracies in our datasets are compared.
BU-CEIS-9613: PDF

Decision Tree Induction Using Genetic Programming
Gökhan Tür and Halil Altay Güvenir
A decision tree induction algorithm using genetic programming (GP) is presented. The best decision tree is defined as the one which achieves maximum accuracy with minimum number of internal nodes. In this approach every individual is a decision tree candidate. The results are satisfactory in the sense that it can find the optimum solution, i.e. the best decision tree, for a small sized dataset. For the dataset related to American Congress, this algorithm can reach an accuracy of 97.3%. Solutions to unknown or noise data are also proposed in this framework.
BU-CEIS-9614: PDF

Design and Implementation of a Tactical Generator for Turkish, a Free Constituent Order Language (M.Sc. Thesis)
Dilek Zeynep Hakkani
This thesis describes a tactical generator for Turkish, a free constituent order language, in which the order of the constituents may change according to the information structure of the sentences to be generated. In the absence of any information regarding the information structure of a sentence (i.e., topic, focus, background, etc.), the constituents of the sentence obey a default order, but the order is almost freely changeable, depending on the constraints of the text flow or discourse. We have used a recursively structured finite state machine for handling the changes in constituent order, implemented as a right-linear grammar backbone. Our implementation environment is the GenKit system, developed at Carnegie Mellon University--Center for Machine Translation. Morphological realization has been implemented using an external morphological analysis/generation component which performs concrete morpheme selection and handles morphographemic processes.
BU-CEIS-9615: PDF

Using Multiple Sources of Information for Constraint-based Morphological Disambiguation (M.Sc. Thesis)
Gökhan Tür
This thesis presents a constraint-based morphological disambiguation approach that is applicable to languages with complex morphology--specifically agglutinative languages with productive inflectional and derivational morphological phenomena. For morphologically complex languages like Turkish, automatic morphological disambiguation involves selecting for each token morphological parse(s), with the right set of inflectional and derivational markers. Our system combines corpus independent hand-crafted constraint rules, constraint rules that are learned via unsupervised learning from a training corpus, and additional statistical information obtained from the corpus to be morphologically disambiguated. The hand-crafted rules are linguistically motivated and tuned to improve precision without sacrificing recall. In certain respects, our approach has been motivated by Brill's recent work, but with the observation that his transformational approach is not directly applicable to languages like Turkish. Our approach also uses a novel approach to unknown word processing by employing a secondary morphological processor which recovers any relevant inflectional and derivational information from a lexical item whose root is unknown. With this approach, well below 1% of the tokens remains as unknown in the texts we have experimented with. Our results indicate that by combining these hand-crafted, statistical and learned information sources, we can attain a recall of 96 to 97% with a corresponding precision of 93 to 94%, and ambiguity of 1.02 to 1.03 parses per token.
BU-CEIS-9616: PDF

Weighting Features in k Nearest Neighbor Classification on Feature Projections
Aynur Akkuş and H. Altay Güvenir
This paper proposes two methods for learning feature weights to improve the classification accuracy of the k-NNFP algorithm. In the k-NNFP algorithm, instances are stored as their projections on each feature dimension. The classification of unseen examples are made on the basis of feature projections by a majority voting among the k (k <= 1) predictions of each feature separately. We have treated all features as equivalent in this algorithm. However, all features may not have equal relevancy, even some features may be completely irrelevant. In order to determine features' relevances, the best method is to assign them weights. The first method is based on the assumption of homogeneous feature projections for which the number of consequent values of feature projections of a same class supports an evidence for increasing the probability of correct classification in the k-NNFP algorithm. The second method is based on the individual accuracies of features. In this approach, the k-NNFP algorithm is run on the basis of a single feature, once for each feature. The resulting accuracy is taken as the weight of that feature since it is a measure of contribution to classification for that feature. Empirical evaluation of these feature weighting methods in the k-NNFP algorithm on real world datasets is given.
BU-CEIS-9617: PDF

Generation of Turkish Verbal Groups with Systemic-Functional Grammar
Turgay Korkmaz and İlyas Çiçekli
This paper mainly presents the verbal group part of a Turkish sentence generator that is currently under development for producing the actual text from its semantic description. To concentrate on the text generation rather than text planning, we assume that the semantic description of the text is produced in some way, currently given by hand. In the generation, we need a linguistic theory to describe the linguistic resources, and also a software tool to perform them in a computational environment. We use a functional linguistic theory called Systemic--Functional Grammar (SFG) to represent the linguistic resources, and FUF text generation system as a software tool to perform them. In this paper, we present the systemic--functional representation and realization of Turkish verbal groups.
BU-CEIS-9618: PDF

Turkish Text Generation With Systemic-Functional Grammar (M.Sc. Thesis)
Turgay Korkmaz
Natural Language Generation (NLG) is roughly decomposed into two stages: text planning, and text generation. In the text planning stage, the semantic description of the text is produced from the conceptual inputs. Then, the text generation system transforms this semantic description into an actual text. This thesis focuses on the design and implementation of a Turkish text generation system rather than text planning. To develop a text generator, we need a linguistic theory that describes the resources of the desired natural language, and also a software tool that represents and performs these linguistic resources in a computational environment. In this thesis, in order to carry out the mentioned requirements, we have used a functional linguistic theory called Systemic--Functional Grammar (SFG), and the FUF text generation system as a software tool. The ultimate text generation system takes the semantic description of the text sentence by sentence, and then produces a morphological description for each lexical constituent of the sentence. The morphological descriptions are worded by a Turkish morphological generator. Because of our concentration on the text generation, we have not considered the details of the text planning. Hence, we assume that the semantic description of the text is produced and lexicalized by an application (currently given by hand).
BU-CEIS-9619: PDF

Learning Translation Rules From A Bilingual Corpus
İlyas Çiçekli and H. Altay Güvenir
This paper proposes a mechanism for learning pattern correspondences between two languages from a corpus of translated sentence pairs. The proposed mechanism uses analogical reasoning between two translations. Given a pair of translations, the similar parts of the sentences in the source language must correspond the similar parts of the sentences in the target language. Similarly, the different parts should correspond to the respective parts in the translated sentences. The correspondences between the similarities, and also differences are learned in the form of translation rules. The system is tested on a small training dataset and produced promising results for further investigation.
BU-CEIS-9620: PDF

Nonsymmetric Skyline versus Compressed Sparse Row Format in Direct Methods for Markov Chains
Tuğrul Dayar
This paper investigates the implementation of Gaussian Elimination (GE) and Grassmann-Taksar-Heyman (GTH) algorithms for Markov chains in nonsymmetric skyline (NSK) and compressed sparse row (CSR) formats. Numerical experiments are carried out using three real life applications. Implementations that allocate space for both lower- and upper-triangular factors result in faster GTH solvers. Specifically, the CSR implementation of the GTH algorithm for the nontransposed linear system of equations gives the smallest run-time among different GTH implementations considered. For GE, the experiments suggest the CSR implementation for the transposed system of equations as the appropriate choice.
BU-CEIS-9621: PDF

Animating Facial Images with Drawings (M.Sc. Thesis)
Gamze Dilek Tunalı
The work presented here describes the power of 2D animation with texture mapping controlled by line drawings. Animation is specifically intended for facial animation and not restricted by the human face. We initially have a sequence of facial images which are taken from a video sequence of the same face and an image of another face to be animated. The aim is to animate the face image with the same expressions as those of the given video sequence. To realize the animation, a set of frames are taken from a video sequence. Key features of the first frame are rotoscoped and the other frames are automatically rotoscoped using the first frame. Similarly, the corresponding features of the image which will be animated are rotoscoped. The key features of the first frame of the sequence and the image to be animated are mapped and using cross-synthesis procedure, other drawings for the given image are produced. Using these animated line drawings and the original image, the corresponding frame sequence is produced by image warping. The resulting sequence has the same expressions as those of the video sequence. This work encourages the reuse of animated motion by gathering facial motion sequences into a database. Furthermore, by using motion sequences of a human face, non-human characters can be animated realistically or complex characters can be animated by the help of motion sequences of simpler characters.
BU-CEIS-9622: PDF

Learning Translation Templates from Bilingual Texts
H. Altay Güvenir and İlyas Çiçekli
This paper proposes a mechanism for learning structural correspondences between two languages from a corpus of translated sentence pairs. The proposed mechanism uses analogical reasoning between two translations. Given a pair of translations, the similar parts of the sentences in the source language must correspond the similar parts of the sentences in the target language. Similarly, the different parts should correspond to the respective parts in the translated sentences. The correspondences between the similarities, and also differences are learned in the form of translation templates. The system is tested on a small training dataset and produced promising results for further investigation.
BU-CEIS-9623: PDF

Batch Learning of Disjoint Feature Intervals (M.Sc. Thesis)
Aynur Akkuş
This thesis presents several learning algorithms for multi-concept descriptions in the form of disjoint feature intervals, called Feature Interval Learning algorithms (FIL). These algorithms are batch supervised inductive learning algorithms, and use feature projections of the training instances for the representation of the classification knowledge induced. These projections can be generalized into disjoint feature intervals. Therefore, the concept description learned is a set of disjoint intervals separately for each feature. The classification of an unseen instance is based on the weighted majority voting among the local predictions of features. In order to handle noisy instances, several extensions are developed by placing weights to intervals rather than features. Empirical evaluation of the FIL algorithms is presented and compared with some other similar classification algorithms. Although the FIL algorithms achieve comparable accuracies with other algorithms, their average running times are much more less than the others.
This thesis also presents a new adaptation of the well-known k-NN classification algorithm to the feature projections approach, called k-NNFP for k-Nearest Neighbor on Feature Projections, based on a majority voting on individual classifications made by the projections of the training set on each feature and compares with the k-NN algorithm on some real-world and artificial datasets.
BU-CEIS-9624: PDF

Morphological Disambiguation by Voting Constraints
Kemal Oflazer and Gökhan Tür
We present a constraint-based morphological disambiguation system in which individual constraints vote on matching morphological parses, and disambiguation of all the tokens in a sentence is performed at the very end by selecting parses that receive the highest votes. This constraint application paradigm makes the outcome of the disambiguation independent of the rule sequence, and hence relieves the rule developer from worrying about potentially conflicting rule sequencing found in other systems. The vote of each rule is determined by its complexity measured with the kind and number of features used in the rule. We have applied our approach in a system for morphological disambiguation of Turkish, a language with complex agglutinative word structures, displaying rather different types of morphological ambiguity not found in languages like English. Our results indicate that using about 500 constraint rules and some additional simple statistics, we can attain a recall of 95-96% and a precision of 94-95% with about 1.01 parses per token. Our system is implemented in Prolog and we are currently investigating an efficient implementation based on discrimination networks used in AI production systems
BU-CEIS-9625: PDF

Fast Information Retrieval Using Compressed. Multi-Fragmented Signature Files
Seyit Koçberber and Fazlı Can
A new file method, called Compressed Multi-Fragmented Signature File (C-MFSF), that uses a partial query evaluation strategy with compressed signature bit slices is presented. The experiments show that for queries with more than one term, C-MFSF obtains the query results with fewer disk accesses than the inverted file approach. For single term queries, which is rare for very large databases, it requires more disk accesses than the inverted file method while its response time is still satisfactory. The number of disk accesses is the same as the number of query terms for queries with more than two terms. The experiments with a real database of 152,850 records validate the simulation results and are used to project the response time for very large databases. For a database of one million records, the projected response times for single term and multi-term queries are less than 2 and 1.14 seconds, respectively.
BU-CEIS-9626: PDF

Learning Translation Templates from Examples
H. Altay Güvenir and İlyas Çiçekli
This paper proposes a mechanism for learning lexical level correspondences between two languages from a set of translated sentence pairs. The proposed mechanism is based on an analogical reasoning between two translation examples. Given two translation examples, the similar parts of the sentences in the source language must correspond the similar parts of the sentences in the target language. Similarly, the different parts should correspond to the respective parts in the translated sentences. The correspondences between the similarities, and also differences are learned in the form of translation templates. The approach has been implemented and tested on a small training dataset and produced promising results for further investigation.