Bilkent University
Department of Computer Engineering


A Bipartite Graph Model for Placement, Scheduling and Replication in Data Grids


Burcu Dal
MSc Student
Computer Engineering Department
Bilkent University

Data Grids provide geographically distributed resources for applications that generate and utilize large data sets. However, ensuring fast access to data and low turnaround time for scheduled jobs in data grids is hindered by various reasons. To address these issues, various replication management strategies have been introduced to offer high data availability, low bandwidth consumption and reduced turnaround time for grid systems by maintaining multiple copies of existing data at different locations. Replication strategies are broadly categorized as dynamic and static. In static replication strategies, replication decisions are generally based on a cost model that includes data access costs, bandwidth characteristics and storage constraints of the grid system. However, the problem of static data replication in data grids has proven to be NP-hard, making it difficult to solve. In this study, we propose a bipartite graph for modeling tasks and data in the grid system, and then we partition this graph to obtain a data placement and job scheduling strategy using the access logs. The obtained parts are further refined in order to be assigned to grid sites by using a KL-based heuristic that takes the bandwidth and hop information between sites into account. Replication is achieved by replicating a certain amount of most accessed files prior to the partitioning process.

Keywords: data grids, bipartite graph, data placement, job scheduling, data replication


DATE: 16 April, 2012, Monday @ 16:20