Publications & Reports - Document Abstract
Zdenek Kouba, Ondrej Tománek, Lukáš Kencl
Evaluation of Datacenter Network Topology Influence on Hadoop MapReduce Performance
2016 5th IEEE International Conference on Cloud Networking (Cloudnet)
Hadoop MapReduce has become the de facto standard for Big Data processing within Cloud datacenters. However, little is known about the influence of datacenter network topology on Hadoop performance, or about the suitability of various topologies for different workload distributions. By extending CloudSim, a publicly available simulator, we simulate six well-known or recently proposed topologies (Hierarchical, FatTree, DCell, CamCube, BCube, MapReduce) and evaluate Hadoop MapReduce performance across varyingly distributed workloads. We conclude that while no topology is clearly optimal, the experimental CamCube topology exhibits the most promising results. However, the best-performing topology depends on the workload division, with greatly differing results, and performance is generally weak under highly skewed workloads. This finding could lead to significant Hadoop MapReduce performance improvements by adjusting or selecting appropriate datacenter network topologies - potentially even at runtime, using Software Defined Networking.