Department of Computer Engineering
CS 590 SEMINAR
High Throughput UDP-based P2P Data Transfer
F. Tuğba Doğan
Computer Engineering Department
High-throughput DNA sequencing (HTS) technologies now enable researchers to answer a wide range of biological questions, but they also pose considerable computational challenges. Rapid progress in genome sequencing has made very large amounts of genomic data available. The sequence data from the genome of one human individual, sequenced at high depth (30-fold) on the Illumina platform, totals 480 GB in FASTQ format and approximately 110 GB in BAM format. In 2015, more than 150 TB of compressed data for 250 individuals from 26 populations was produced for the 1000 Genomes Project. The Sequence Read Archive (SRA), a public repository for sequencing data, exceeded 1 PB by the end of 2013. Because of these data volumes, one of the most urgent issues to address is data sharing among collaborators in different geographical locations. There is a need for a user-friendly, peer-to-peer (P2P), open-source, secure, and fast file sharing system that enables researchers to share unpublished data with their collaborators. To address this problem, we developed BioPeer, a new cross-platform desktop application that combines several data transfer approaches. Briefly, BioPeer uses the UDP-based, open-source UDT protocol for data transfer and provides a P2P file sharing architecture similar to that of BitTorrent, in which large files are transferred in chunks and synchronized between peers (i.e., collaborators) within the same project. Unlike other P2P platforms, BioPeer also authenticates users through the ORCID database (http://www.orcid.org) to protect data privacy. In addition, files are encrypted with AES (Advanced Encryption Standard) using a 128-bit key, and RSA cryptography is used to exchange this encryption key.
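As a rough illustration of the chunk-based transfer and synchronization idea described above, the following Python sketch splits a payload into fixed-size chunks, fingerprints each chunk, and works out which chunks a peer still needs. It is not BioPeer's implementation: the chunk size, the use of SHA-256 digests, and the names Chunk, split_into_chunks, missing_chunks, and reassemble are all assumptions made for this example, and the UDT transport, ORCID authentication, and AES/RSA steps are left out.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Iterator, List

# Hypothetical chunk size chosen for illustration; not taken from BioPeer.
CHUNK_SIZE = 4 * 1024 * 1024


@dataclass(frozen=True)
class Chunk:
    index: int    # position of the chunk within the file
    digest: str   # SHA-256 hex digest of the chunk (an assumed integrity check)
    data: bytes   # raw chunk contents


def split_into_chunks(payload: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[Chunk]:
    """Yield consecutive fixed-size chunks of the payload; the last may be shorter."""
    for index, start in enumerate(range(0, len(payload), chunk_size)):
        data = payload[start:start + chunk_size]
        yield Chunk(index, hashlib.sha256(data).hexdigest(), data)


def missing_chunks(local_digests: List[str], remote_digests: List[str]) -> List[int]:
    """Return indices of remote chunks that the local peer lacks or holds with different content."""
    return [
        i for i, digest in enumerate(remote_digests)
        if i >= len(local_digests) or local_digests[i] != digest
    ]


def reassemble(chunks: Iterable[Chunk]) -> bytes:
    """Rebuild the payload from chunks that may have arrived in any order."""
    return b"".join(c.data for c in sorted(chunks, key=lambda c: c.index))
```

In a design along these lines, a peer would compare its list of chunk digests with a collaborator's list and request only the indices returned by missing_chunks, so an interrupted or partially synchronized transfer would not need to restart from the beginning.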
DATE: 27 March, 2017, Monday @ 16:05