The Cloud's Cloudy Moment: A Systematic Survey of Public Cloud Service Outage

Zheng Li, Mingfei Liang, Liam O'Brien, He Zhang

Inadequate service availability is the top concern when employing Cloud computing. It has been recognized that zero downtime is impossible for large-scale Internet services. Nevertheless, by learning from their own and others' past mistakes, Cloud vendors can minimize the risk of future downtime, or at least keep the downtime short. To facilitate summarizing lessons for Cloud providers, we performed a systematic survey of public Cloud service outage events. This paper reports the results of this survey. In addition to a set of findings, our work generated a lessons framework by classifying the outage root causes. The framework can in turn be used to arrange outage lessons for reference by Cloud providers. In our future work, this lessons framework will be expanded to include potentially new root causes.
