Abstract: Classical classification algorithms are designed to maximize the number of correctly classified instances in a set of unseen test cases. However, in domains such as medical diagnosis, evaluation of loan applications, and performance evaluation of commercial firms for investment purposes, the benefit of a correct classification and the cost of a misclassification differ from class to class. For example, misclassifying a healthy person as ill has a different cost than misclassifying an ill person as healthy. Similarly, the benefit of correctly classifying an ill person is higher than that of correctly classifying a healthy person. To the best of our knowledge, cost-sensitive classification techniques have been applied only to decision tree induction algorithms. In our previous research, we investigated representation techniques based on feature projections and applied them successfully to many classification algorithms. These algorithms have been tested on a large number of real-world domains with very successful results. However, they were designed for domains where the cost of classification is the same for all class pairs. The aim of this project is to develop these feature-projection-based classification algorithms further, so that they work in domains where the cost of misclassification and the benefit of correct classification differ across class pairs.
In addition, with the growth of data mining research and applications, selecting the interesting rules from among the huge number of learned rules has become important. One approach is to prune redundant rules and label the remaining ones as interesting. One of the most important factors affecting the interestingness of a rule is its benefit in classification. In this project, other factors affecting the interestingness of a rule will be investigated, and an algorithm that sorts the learned rules by their interestingness will be developed. Together, these two components are intended to form a complete system for data mining applications. The performance of the algorithms will be analyzed, and the factors and conditions for successful application will be investigated.
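To illustrate the difference between maximizing accuracy and maximizing benefit, the following is a minimal sketch, not the project's algorithm. It assumes a classifier that already outputs class probability estimates; the benefit matrix values, the class labels, and the function names are all hypothetical and chosen only for illustration.

```python
from typing import Dict

# Hypothetical benefit matrix, indexed as BENEFIT[predicted][actual].
# Positive entries are benefits of correct classification,
# negative entries are costs of misclassification. Values are illustrative.
BENEFIT: Dict[str, Dict[str, float]] = {
    "ill": {"ill": 10.0, "healthy": -1.0},
    "healthy": {"ill": -20.0, "healthy": 1.0},
}


def expected_benefit(predicted: str,
                     class_probs: Dict[str, float],
                     benefit: Dict[str, Dict[str, float]]) -> float:
    """Expected benefit of predicting `predicted`, given estimated class probabilities."""
    return sum(p * benefit[predicted][actual] for actual, p in class_probs.items())


def most_probable_class(class_probs: Dict[str, float]) -> str:
    """Accuracy-oriented decision: pick the class with the highest probability."""
    return max(class_probs, key=lambda c: class_probs[c])


def most_beneficial_class(class_probs: Dict[str, float],
                          benefit: Dict[str, Dict[str, float]]) -> str:
    """Benefit-oriented decision: pick the prediction with the highest expected benefit."""
    return max(benefit, key=lambda c: expected_benefit(c, class_probs, benefit))


# Assumed probability estimates for one patient (illustrative numbers).
probs = {"ill": 0.2, "healthy": 0.8}
print(most_probable_class(probs))             # accuracy-oriented choice
print(most_beneficial_class(probs, BENEFIT))  # benefit-oriented choice
```

With these assumed numbers, the two decision rules disagree: the most probable class is "healthy", but because missing an ill patient is given a large cost, predicting "ill" has the higher expected benefit.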
Keywords: Machine Learning, Feature Projections, Benefit Maximization, Rule Interestingness
Principal Investigator: H. Altay Guvenir, Ph.D.
Investigator: Tolga Aydin, MSc.
Investigator: Nazli Ikizler, BSc.
Duration: March 2002 - September 2003.
Sponsor: Scientific and Technical Research Council of Turkey
Grant No: 101E044
Budget: 7.240.500.000 TL (USD 5,400 in March 2002).