Bilkent University
Department of Computer Engineering


Security and Privacy in the Age of Big Data: The Case of Genomics


Dr. Erman Ayday

In this talk, I will mainly focus on the research I have been actively carrying out for the last three years: the security and privacy of genomic data. However, the techniques presented in this talk can also be applied to other domains such as online social networks, banks, hospitals, the military, mobile devices, cyber-physical systems, and sensor data.

Genomics is becoming the next significant challenge for privacy. The price of a complete genome profile has plummeted below $100 for genome-wide genotyping, which is offered by a number of companies. This low cost of DNA sequencing will break the physician/patient connection, and it can open the door to all kinds of abuse that are not yet fully understood. Access to genomic data raises some important privacy concerns: (i) DNA reflects information about genetic conditions and predispositions to specific diseases such as Alzheimer's, cancer, or schizophrenia; (ii) DNA contains information about ancestors and progeny; (iii) DNA (almost) does not change over time, hence revoking or replacing it is impossible; and (iv) DNA analysis is already being used in both law enforcement and healthcare, thus raising numerous ethical issues. Such issues could lead to abuse, threats, and genetic discrimination. As pointed out by author Rebecca Skloot, the view we have today of genomes is like a world map, but Google Street View is coming very soon. This growing precision can be highly beneficial in terms of personalized medicine, but it can have devastating consequences for individuals' peace of mind.

In this talk, after discussing the threats to genomic privacy, I will first focus on inference attacks and the quantification of kin genomic privacy using information-theoretic tools. I will show how vulnerable the genomic privacy of individuals is due to genomic data shared by their relatives and data available on online social networks. That is, I will show how the genomic data of family members can be efficiently inferred using data publicly shared by other relatives and background knowledge on genomics. For this, we propose an algorithm to model such an attack using (i) the available genomic data of a subset of family members, (ii) statistical relationships (correlations) between the nucleotides on the DNA, and (iii) publicly known genomic background. To make such an algorithm efficient, we represent this attack as an inference problem (inferring the unknown nucleotides of the family members from the available data). We model the familial relationships, the nucleotides on the DNA, and the correlations between the nucleotides on a factor graph, and we use the belief propagation algorithm to efficiently infer the unknown nucleotides on the factor graph via message passing. Then, I will show how this attack threatens real users who share genomic data on the Internet.
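The inference step can be illustrated on the smallest possible family. The following is a minimal sketch, not the algorithm presented in the talk: it assumes a single position on the DNA, a mother-father-child trio, genotypes coded as the number of minor alleles (0, 1, or 2), a Hardy-Weinberg prior for the parents, and plain Mendelian inheritance as the only factor. It ignores correlations between positions. All function names are hypothetical.

```python
import itertools

GENOTYPES = (0, 1, 2)  # number of minor alleles at one position (assumed coding)
MEMBERS = ("mother", "father", "child")


def hardy_weinberg_prior(maf):
    """Prior over genotypes 0/1/2 given a minor allele frequency (assumption)."""
    return [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]


def mendel_factor(mother, father, child):
    """P(child genotype | parent genotypes) under Mendelian inheritance."""
    pm, pf = mother / 2.0, father / 2.0  # chance each parent passes the minor allele
    if child == 0:
        return (1 - pm) * (1 - pf)
    if child == 1:
        return pm * (1 - pf) + (1 - pm) * pf
    return pm * pf


def evidence(observed):
    """Indicator message for an observed genotype, all ones if unobserved."""
    if observed is None:
        return [1.0, 1.0, 1.0]
    return [1.0 if g == observed else 0.0 for g in GENOTYPES]


def infer_member(target, observed, maf):
    """Sum-product marginal of one trio member given the observed genotypes."""
    prior = hardy_weinberg_prior(maf)
    # Variable-to-factor messages: parents carry prior times evidence,
    # the child carries evidence only (its distribution comes from the factor).
    msg = {}
    for name in MEMBERS:
        ev = evidence(observed.get(name))
        msg[name] = ev if name == "child" else [p * e for p, e in zip(prior, ev)]
    # Factor-to-variable message: sum out the two other members.
    out = [0.0, 0.0, 0.0]
    for values in itertools.product(GENOTYPES, repeat=3):
        assign = dict(zip(MEMBERS, values))
        weight = mendel_factor(*values)
        for name in MEMBERS:
            if name != target:
                weight *= msg[name][assign[name]]
        out[assign[target]] += weight
    belief = [o * m for o, m in zip(out, msg[target])]
    total = sum(belief)
    if total == 0:
        raise ValueError("observed genotypes are inconsistent with inheritance")
    return [b / total for b in belief]


if __name__ == "__main__":
    # The father and child have shared their genotypes; the mother has not.
    print(infer_member("mother", {"father": 0, "child": 1}, maf=0.2))
```

With a single factor the graph is a tree, so one round of messages already gives the exact marginal. A larger pedigree with many correlated positions would contain loops, and messages would then be passed iteratively.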

In the remainder of the talk, I will introduce a new protection mechanism, GeneVault, based on a recently proposed cryptographic primitive called honey encryption. Considering the high sensitivity and longevity of genomic data, GeneVault is able to provide security against brute-force attacks (by attackers with unlimited computational power). To encrypt a genome, we propose a tree-based generative model based on public genomic statistics. To retrieve the genomic sequence of a patient, a client (the patient or their doctor) has to provide a password, which is used to reconstruct a sequence from the ciphertext. Providing a wrong password yields a plausible but incorrect sequence, which makes it hard for an adversary to decide whether the password used was correct.
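The idea behind honey encryption can be sketched in a few lines. This is only an illustration under strong simplifying assumptions, not GeneVault itself: it replaces the tree-based generative model with an independent Hardy-Weinberg distribution per position, codes genotypes as 0, 1, or 2, and uses PBKDF2 from the Python standard library to derive a pad from the password. All function names and parameters are hypothetical, and the sketch is not meant to be secure in practice.

```python
import hashlib
import secrets

SEED_BITS = 16
SEED_SPACE = 1 << SEED_BITS  # each genotype is encoded as a 16-bit seed


def cumulative_bounds(maf):
    """Split the seed space into intervals proportional to genotype probability.

    Assumes maf is large enough that every genotype gets a non-empty interval.
    """
    probs = [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]
    bounds, acc = [0], 0.0
    for p in probs[:-1]:
        acc += p
        bounds.append(round(acc * SEED_SPACE))
    bounds.append(SEED_SPACE)
    return bounds


def encode(genotype, maf):
    """Map a genotype to a uniformly random seed inside its interval."""
    b = cumulative_bounds(maf)
    lo, hi = b[genotype], b[genotype + 1]
    return lo + secrets.randbelow(hi - lo)


def decode(seed, maf):
    """Map any seed back to the genotype whose interval contains it."""
    b = cumulative_bounds(maf)
    for g in (0, 1):
        if seed < b[g + 1]:
            return g
    return 2


def keystream(password, salt, n):
    """Derive n 16-bit pads from the password with PBKDF2-HMAC-SHA256."""
    raw = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000, dklen=2 * n)
    return [int.from_bytes(raw[2 * i:2 * i + 2], "big") for i in range(n)]


def encrypt(genotypes, mafs, password):
    salt = secrets.token_bytes(16)
    pads = keystream(password, salt, len(genotypes))
    seeds = [encode(g, m) for g, m in zip(genotypes, mafs)]
    return salt, [s ^ p for s, p in zip(seeds, pads)]


def decrypt(salt, ciphertext, mafs, password):
    pads = keystream(password, salt, len(ciphertext))
    return [decode(c ^ p, m) for c, p, m in zip(ciphertext, pads, mafs)]


if __name__ == "__main__":
    mafs = [0.3, 0.4, 0.5, 0.2, 0.1]
    salt, ct = encrypt([0, 1, 2, 1, 0], mafs, "correct password")
    print(decrypt(salt, ct, mafs, "correct password"))  # the original sequence
    print(decrypt(salt, ct, mafs, "wrong guess"))       # some valid-looking sequence
```

Because every seed decodes to a valid genotype, and common genotypes cover more of the seed space, a wrong password produces a sequence that follows the assumed population statistics rather than an obvious decryption failure.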


Erman Ayday is a Post-Doctoral Researcher at École Polytechnique Fédérale de Lausanne (EPFL), Switzerland, in the Laboratory for Communications and Applications 1 (LCA1) led by Prof. Jean-Pierre Hubaux. He received his M.S. and Ph.D. degrees from the Georgia Tech Information Processing, Communications and Security Research Lab (IPCAS) in the School of Electrical and Computer Engineering (ECE), Georgia Institute of Technology, Atlanta, GA, in 2007 and 2011, respectively, under the supervision of Dr. Faramarz Fekri. He received his B.S. degree in Electrical and Electronics Engineering from the Middle East Technical University, Ankara, Turkey, in 2005.

Erman's research interests include privacy-enhancing technologies, big data security and privacy, big data analytics, trust and reputation management, wireless network security, and recommender systems. He is the recipient of the 2010 Outstanding Research Award from the Center of Signal and Image Processing (CSIP) at Georgia Tech and the 2011 ECE Graduate Research Assistant (GRA) Excellence Award from Georgia Tech. He is a member of the IEEE and the ACM.


DATE: 20 October, 2014, Monday @ 14:40