Bilkent University
Department of Computer Engineering


A Multi-Modal Approach for Depression Detection Using Audio Recordings and Transcriptions from Clinical Interviews


Kaan Gönç
Master's Student
(Supervisor: Asst. Prof. Dr. Hamdi Dibeklioğlu)
Computer Engineering Department
Bilkent University

Abstract: Owing to the high prevalence of depression in society, the demand for automated depression detection technologies has increased. However, access to visual recordings of the interviewed individuals is mostly restricted due to privacy concerns. Therefore, most of the approaches in the literature that use visual or audiovisual cues are not practical in real-life scenarios. To overcome such privacy issues, we propose a multi-modal architecture that relies only on audio recordings and their corresponding transcriptions. The proposed approach is a pipeline architecture that consists of Automatic Speech Recognition (ASR), speaker diarization, text and audio embedding, attention-based fusion, and regression modules. We assess the performance of our approach on predicting depression severity using the eight-item Patient Health Questionnaire (PHQ-8) scores. Experiments show that the proposed approach can obtain remarkable results on the depression detection task.
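
As a rough illustration of the last two stages of the pipeline (attention-based fusion followed by regression), the following is a minimal sketch and not the speaker's implementation. It assumes that the text and audio embeddings have already been computed and share the same dimension. The function names, the single query vector, and the linear regressor are hypothetical choices made only for this example. Clamping to the 0-24 range reflects the PHQ-8 scale (eight items, each scored 0-3).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_fuse(text_emb, audio_emb, query):
    """Fuse two modality embeddings with attention weights.

    Assumption: one (learned) query vector scores each modality;
    the fused vector is the attention-weighted sum of the two embeddings.
    """
    if not (len(text_emb) == len(audio_emb) == len(query)):
        raise ValueError("embeddings and query must share one dimension")
    weights = softmax([dot(query, text_emb), dot(query, audio_emb)])
    fused = [weights[0] * t + weights[1] * a
             for t, a in zip(text_emb, audio_emb)]
    return fused, weights


def predict_phq8(fused, reg_weights, bias):
    """Hypothetical linear regression head, clamped to the PHQ-8 range."""
    raw = dot(reg_weights, fused) + bias
    return min(24.0, max(0.0, raw))


# Toy usage with made-up numbers (not real data or trained parameters).
fused, weights = attention_fuse([1.0, 0.0], [0.0, 1.0], [0.0, 0.0])
score = predict_phq8(fused, [2.0, 2.0], 1.0)
```

In a trained system the query vector and the regression parameters would be learned jointly, and the embeddings would come from the upstream ASR, diarization, and embedding modules.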


DATE: 28 March, Monday @ 16:40