Bilkent University
Department of Computer Engineering
M.S. THESIS PRESENTATION

 

OBJECT DETECTION AND SYNTHETIC INFRARED IMAGE GENERATION FOR UAV-BASED AERIAL IMAGES

 

Mehmet Akif Özkanoğlu

M.S. Student
(Supervisor: Prof. Dr. İbrahim Körpeoğlu)
Computer Engineering Department
Bilkent University

Abstract: Object detection in both infrared and visible (RGB) aerial images has been of recent interest. This thesis contains two main works related to aerial image processing, both aimed at detecting objects in aerial images. Training recent deep learning based object detection algorithms requires the availability of annotated images; however, large datasets typically come in the visible spectrum. Therefore, a domain transfer based approach is first presented in this thesis to artificially generate infrared equivalents of visible images. Such image pairs can then be used to train object detection algorithms for either modality. In the second main work, we introduce a novel object detection algorithm based on CenterNet, one of the state-of-the-art algorithms at the time this thesis was written. We show that our proposed approaches help improve certain aspects of the learning process for detecting objects in aerial images.

Utilizing both visible and infrared (IR) images in various deep learning based computer vision tasks has been a recent trend. Consequently, datasets containing visible and IR image pairs are desired in many applications. However, while large image datasets taken in the visible spectrum can be found in many domains, large IR-based datasets are not easily available. The lack of IR counterparts of the available visible image datasets limits the ability of existing deep learning algorithms to perform effectively on IR images. To overcome that challenge, we introduce a generative adversarial network (GAN) based solution that generates the IR equivalent of a given visible image by training our deep network to learn the relation between the visible and IR modalities. In our proposed GAN architecture (InfraGAN), we introduce structural similarity as an additional loss function.
Furthermore, our discriminator does not only classify the entire image as fake or real, but also each individual pixel. We evaluate our comparative results on three different datasets and report state-of-the-art results on five metrics when compared to the Pix2Pix and ThermalGAN architectures from the literature. We report up to +16% better performance in Structural Similarity Index Measure (SSIM) over Pix2Pix and +8% better performance over ThermalGAN for the VEDAI dataset. Further gains on other metrics and datasets are also reported in our experiments section.
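As a rough illustration of the structural-similarity loss term described above, a generator objective can combine an adversarial term with a 1 − SSIM penalty. This is a minimal sketch under stated assumptions: the function names, the global (single-window) SSIM formulation, and the weighting scheme are illustrative, not the actual InfraGAN implementation, which would use a windowed, differentiable SSIM inside a deep learning framework.

```python
import numpy as np

def ssim_global(x, y, c1=0.01**2, c2=0.03**2):
    """Global (single-window) SSIM between two images in [0, 1].

    A real training loop would use a sliding-window, differentiable
    variant; this version only illustrates the formula.
    """
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

def generator_loss(fake_ir, real_ir, adv_term, lambda_ssim=1.0):
    # Hypothetical combination: adversarial term plus (1 - SSIM) as a
    # structural-similarity penalty, as the abstract describes.
    # `lambda_ssim` is an assumed weighting hyperparameter.
    return adv_term + lambda_ssim * (1.0 - ssim_global(fake_ir, real_ir))
```

For identical images SSIM equals 1, so the structural penalty vanishes and only the adversarial term remains; as the generated IR image diverges structurally from the ground truth, the penalty grows toward 1 × `lambda_ssim`.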

 

DATE: Wednesday, September 6 @ 13:30
PLACE: Zoom