Joint Latency and Cost Optimization for Erasure-coded Data Center Storage

Yu Xiang, Tian Lan, Vaneet Aggarwal, Yih-Farn R Chen

Modern distributed storage systems offer large capacity to satisfy the exponentially increasing need of storage space. They often use erasure codes to protect against disk and node failures to increase reliability, while trying to meet the latency requirements of the applications and clients. This paper provides an insightful upper bound on the average service delay of such erasure-coded storage with arbitrary service time distribution and consisting of multiple heterogeneous files. Not only does the result supersede known delay bounds that only work for a single file or homogeneous files, it also enables a novel problem of joint latency and storage cost minimization over three dimensions: selecting the erasure code, placement of encoded chunks, and optimizing scheduling policy. The problem is efficiently solved via the computation of a sequence of convex approximations with provable convergence. We further prototype our solution in an open-source, cloud storage deployment over three geographically distributed data centers. Experimental results validate our theoretical delay analysis and show significant latency reduction, providing valuable insights into the proposed latency-cost tradeoff in erasure-coded storage.

Knowledge Graph

arrow_drop_up

Comments

Sign up or login to leave a comment