Bilkent University
Department of Computer Engineering


Privacy Preserving and Robust Watermarking on Sequential Genome Data using Belief Propagation and Local Differential Privacy


Abdullah Çağlar Öksüz
MS Student
(Supervisor: Prof. Dr. Uğur Güdükbay)(Co-Supervisor: Asst. Prof. Dr. Erman Ayday)
Computer Engineering Department
Bilkent University

Genome data has been a subject of study for both biology and computer science since the start of the Human Genome Project in 1990. Since then, genome sequencing for medical and social purposes has become increasingly available and affordable. For research, genome data can be shared on public websites or with service providers (SPs). However, this sharing compromises the privacy of donors, even under partial sharing conditions. In this work, we focus mainly on the liability issues that arise from unauthorized sharing of genome data. One technique for addressing liability issues in data sharing is watermarking. To detect malicious correspondents and SPs, whose aim is to share genome data without individuals' consent and without being detected, we propose a novel watermarking method for sequential genome data based on the belief propagation algorithm.

Our method must satisfy three criteria: (i) embedding robust watermarks, so that malicious adversaries cannot tamper with the watermark by modifying the data and are identified with high probability; (ii) achieving epsilon-local differential privacy in every instance of data sharing with SPs; and (iii) preserving utility by keeping the watermark length short and the watermarks non-conflicting. To keep the system robust against single-SP and collusion attacks, we take into account publicly available genomic information such as minor allele frequency, linkage disequilibrium, phenotype information, and familial information. In addition, since attackers may know our optimality strategy in watermarking, we incorporate local differential privacy as a plausible deniability factor that limits the strength of malicious inference. In contrast to traditional differential privacy-based data sharing schemes, in which noise is added based on summary statistics of the population data, our scheme adds noise in the local setting based on local probabilities.
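
To make criterion (ii) concrete, the following is a minimal sketch of epsilon-local differential privacy applied to a single data point. It uses standard k-ary randomized response, which is a textbook mechanism and not necessarily the one used in this work: the abstract states that noise is added based on local probabilities, whereas this sketch replaces a value uniformly at random. The encoding of a SNP as 0, 1, or 2 minor alleles and the names STATES, keep_probability, and randomized_response are assumptions made for illustration.

```python
import math
import random

# Assumed encoding: number of minor alleles observed at one SNP position.
STATES = (0, 1, 2)


def keep_probability(epsilon, k=len(STATES)):
    """Probability of reporting the true value under k-ary randomized response."""
    return math.exp(epsilon) / (math.exp(epsilon) + k - 1)


def randomized_response(value, epsilon, rng=random):
    """Report `value` with probability e^eps / (e^eps + k - 1); otherwise
    report one of the other states chosen uniformly at random.

    Any output is at most e^eps times more likely under one input than
    under another, which is the epsilon-local differential privacy bound.
    """
    if value not in STATES:
        raise ValueError("value must be one of %r" % (STATES,))
    if rng.random() < keep_probability(epsilon):
        return value
    others = [s for s in STATES if s != value]
    return rng.choice(others)
```

In this sketch, a smaller epsilon means the reported value is replaced more often, which gives the data owner stronger plausible deniability at the cost of utility; at epsilon = 0 all three states are reported with equal probability.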


DATE: 14 August 2020, Friday @ 14:00