Bilkent University
Department of Computer Engineering
M.S. THESIS PRESENTATION

 

AN ENSEMBLE CLASSIFICATION MODEL FOR DETECTING VOICE PHISHING IN TELECOMMUNICATION NETWORKS AND ITS INTEGRATION INTO A VISUAL ANALYSIS TOOL

 

Hüseyin Eren Çalık
Master's Student
(Supervisor: Prof. Dr. Uğur Doğrusöz)
Computer Engineering Department
Bilkent University

Abstract: Voice phishing, a form of social engineering fraud carried out over phone calls, has been a major global problem since phones came into widespread use. Traditional and modern methods for detecting these fraud schemes include visual analysis of customers' behaviour, rule-based systems, and machine learning models such as clustering, decision trees, shallow classifiers, and deep learning models. Visual analysis depends solely on human expertise and requires a large labor force to be effective. Rule-based systems are useful for extreme cases but are vulnerable to concept drift. State-of-the-art approaches generally rely on machine learning. However, they require one or more of the following: feature engineering by experts, high computational power, or privacy infringements. Therefore, in collaboration with Turkcell Technology, we aimed to develop a system that retains the advantages of the traditional methods while exploiting the effectiveness and efficiency of the state-of-the-art ones. To this end, we integrated an ensemble learning model into an existing visualization tool for detecting fraudulent users. This tool visualizes relational data as knowledge graphs, presents informational data as text, and presents statistical data with charts and text. Our ensemble learning model consists of two deep neural networks and one decision tree classifier. Multiple neural networks are used to reduce variance and obtain a more stable model. The first network is composed of an input layer, two hidden layers with 200 nodes using the Rectified Linear Unit (ReLU) activation function, each followed by a dropout layer, and an output layer of one node with a sigmoid activation function. We used dropout layers in this network to prevent overfitting. The second neural network instead has three hidden layers with 64, 64, and 32 nodes, respectively, all using ReLU as their activation function.
To feed these models, a total of 34 features, 20 of which are raw, were engineered together with Turkcell fraud experts. The outputs of the models are aggregated by taking their average. We measured the success of our model with the F1 score because the class imbalance is high. Our model achieves an F1 score of 0.82, with a precision of 0.82 and a recall of 0.83. In addition, integrating our model into this visualization tool yields a framework that allows mobile network operators to examine and detect fraud cases more efficiently and act accordingly.
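
As a rough illustration of the averaging step and the reported metric, the following minimal Python sketch averages per-sample fraud scores from three models and recomputes the F1 score from the stated precision and recall. The function names, the example scores, and the 0.5 decision threshold are assumptions made for illustration only; they are not taken from the thesis.

```python
from typing import List, Sequence


def average_ensemble(model_outputs: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-sample scores of several models (one inner sequence per model)."""
    n_models = len(model_outputs)
    # zip(*...) groups the scores that all models gave to the same sample.
    return [sum(sample_scores) / n_models for sample_scores in zip(*model_outputs)]


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical scores for three subscribers (not real data).
nn_a = [0.9, 0.2, 0.6]   # first neural network
nn_b = [0.8, 0.1, 0.4]   # second neural network
tree = [1.0, 0.0, 0.0]   # decision tree classifier

scores = average_ensemble([nn_a, nn_b, tree])

# Assumed threshold: a score of 0.5 or more is flagged as fraud.
labels = [int(score >= 0.5) for score in scores]

# Consistency check on the figures reported in the abstract.
reported_f1 = round(f1_from_precision_recall(0.82, 0.83), 2)
```

With a precision of 0.82 and a recall of 0.83, the harmonic mean rounds to 0.82, which agrees with the F1 score reported above.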

 

DATE: 2 September 2022, Friday @ 10:00, Zoom