Improving ZooKeeper Atomic Broadcast Performance When a Server Quorum Never Crashes

Authors

EL-Sanosi I, Ezhilchelvan P

DOI:

https://doi.org/10.4108/eai.10-4-2018.154455

Keywords:

Apache ZooKeeper, Atomic Broadcast, Crash-Tolerance, Server Replication, Protocol Latency, Throughput, Performance Evaluation

Abstract

Operating at the core of the highly available ZooKeeper system is the ZooKeeper atomic broadcast (Zab) protocol, which imposes a total order on service requests that seek to modify the replicated system state. Zab is designed with the weakest possible assumptions under the crash-recovery fault model; e.g., any number of servers, even all of them, can crash simultaneously, and the system will continue or resume its service provisioning once a server quorum remains or becomes operative again. Our aim is to explore ways of improving Zab performance without modifying its easy-to-implement structure. To this end, we first assume that server crashes are independent and that a server quorum remains operative at all times. Under these restrictive, yet practical, assumptions, we propose three variations of Zab and compare their performance. The first variation offers excellent performance but can only be used in 3-server systems; the other two do not have this limitation. One of them further reduces the leader overhead by conditioning the sending of acknowledgements on the outcomes of coin tosses. Owing to its superb performance, it is redesigned to operate under the least-restricted Zab fault assumptions. Further performance comparisons confirm the potential of coin tossing to offer better performance than Zab, particularly at high workloads.
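
The abstract does not spell out how the coin toss is applied, so the following is only a rough illustration of the two ideas it names: a majority server quorum and an acknowledgement that is sent with some probability. The function names, the probability parameter `p`, and the seeded random generator are all assumptions made for this sketch; they are not taken from the paper and do not reproduce its protocol.

```python
import random


def quorum_size(n_servers: int) -> int:
    """Smallest majority of n_servers (the usual majority quorum)."""
    return n_servers // 2 + 1


def should_ack(p: float, rng: random.Random) -> bool:
    """Hypothetical coin toss: a follower acknowledges with probability p."""
    return rng.random() < p


def acks_sent(n_followers: int, p: float, rng: random.Random) -> int:
    """Count how many followers decide to send an acknowledgement."""
    return sum(should_ack(p, rng) for _ in range(n_followers))


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only so that repeated runs agree
    n = 5                    # assumed ensemble size: one leader, four followers
    print("quorum size:", quorum_size(n))
    print("acks sent this round:", acks_sent(n - 1, 0.5, rng))
```

With three servers a majority is two, so the leader together with any single follower already forms a quorum. In this sketch, lowering `p` reduces the number of acknowledgements the leader has to process per proposal, which is the kind of leader overhead the abstract refers to.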

Published

10-04-2018

How to Cite

1. EL-Sanosi I, Ezhilchelvan P. Improving ZooKeeper Atomic Broadcast Performance When a Server Quorum Never Crashes. EAI Endorsed Trans Energy Web [Internet]. 2018 Apr. 10 [cited 2025 Jan. 19];5(17):e11. Available from: https://publications.eai.eu/index.php/ew/article/view/1006