Bilkent University
Department of Computer Engineering


Stale Data Problem in Parallel Stochastic Gradient Descent Algorithms


Orhun Çağlayan
MS Student
Computer Engineering Department
Bilkent University

Stochastic Gradient Descent (SGD) is a widely used algorithm for matrix problems such as matrix factorization. Running SGD on very large matrices requires high computational power, so parallelizing SGD is a popular topic in parallel computing. The drawback of running iterative methods such as SGD in parallel is the need for synchronization after every iteration. If the data is not synchronized after every iteration, processors run the algorithm on stale (old) data. On the other hand, iterative algorithms like SGD can tolerate staleness: they can converge even when the data is stale. Several methods in the literature exploit data staleness to improve the speedup of parallel execution. Methods like Bulk Synchronous Parallel and Stale Synchronous Parallel rely on the fact that iterative algorithms converge even with stale data and attempt to decrease communication and synchronization costs. In this study, we aim to compare different stale data exploitation methods in order to find out which ones can be used in our research.
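
To make the idea of bounded staleness concrete, the following is a minimal sketch of the kind of check a stale-synchronous scheme might apply before a worker starts its next iteration. It is an illustration only, not the method studied in the talk: the function name `can_advance`, the per-worker iteration counters in `worker_clocks`, and the `staleness` threshold are all assumptions introduced here.

```python
def can_advance(worker_clocks, worker, staleness):
    """Return True if `worker` may start its next iteration.

    Hypothetical helper: `worker_clocks[i]` is the number of iterations
    worker i has completed, and `staleness` is the largest lead (in
    iterations) any worker may have over the slowest worker.
    """
    slowest = min(worker_clocks)
    return worker_clocks[worker] - slowest <= staleness


# Three workers that have completed 5, 3 and 4 iterations respectively.
clocks = [5, 3, 4]

# Worker 0 is 2 iterations ahead of the slowest worker (worker 1).
print(can_advance(clocks, 0, 2))  # allowed with a threshold of 2
print(can_advance(clocks, 0, 1))  # must wait with a threshold of 1
```

Under these assumptions, a larger threshold lets fast workers keep computing on older data instead of waiting, while a threshold of 0 forces every worker to wait until all workers have completed the same number of iterations.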


DATE: 25 November 2019, Monday @ 15:40