The aim of this study is to evaluate the effectiveness of existing techniques for reducing straggler delays in iterative-convergent machine learning workloads such as Matrix Factorization (MF), Multinomial Logistic Regression (MLR), and Latent Dirichlet Allocation (LDA). The experiments were conducted on FlexPS, a state-of-the-art system built on the parameter server architecture, using the Bulk Synchronous Parallel (BSP) computational model to investigate the straggler problem in parameter-server-based iterative-convergent distributed machine learning. In addition, the study analyzes the experimental setup of the parameter server approach for parallel learning problems by characterizing common straggler patterns and implementing recent mitigation techniques. The findings are significant because they provide a foundation for further investigation of the problem and enable researchers to compare different mitigation methods across applications. Ultimately, the results are expected to facilitate the development of new techniques and perspectives for addressing the straggler problem effectively.
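To build intuition for why stragglers matter under BSP, the following minimal simulation (not from the paper; all parameters such as `num_workers`, `straggler_slowdown`, and the timing distribution are illustrative assumptions) shows that each BSP iteration is gated by the slowest worker at the synchronization barrier, so even one transient straggler per iteration inflates total runtime well beyond the average per-worker work:

```python
import random

def simulate_bsp(num_workers=8, num_iters=50, straggler_slowdown=5.0, seed=0):
    """Toy BSP simulation: per iteration, every worker draws a compute time,
    one randomly chosen worker is slowed down (a transient straggler), and
    the barrier forces the iteration to last as long as the slowest worker.
    Returns (bsp_time, ideal_time), where ideal_time is the average work
    per worker (the runtime if stragglers imposed no waiting)."""
    rng = random.Random(seed)
    bsp_time = 0.0
    ideal_time = 0.0
    for _ in range(num_iters):
        # Baseline per-worker compute times with mild natural variation.
        times = [rng.uniform(0.9, 1.1) for _ in range(num_workers)]
        # Inject one transient straggler this iteration.
        times[rng.randrange(num_workers)] *= straggler_slowdown
        bsp_time += max(times)                  # barrier waits for the slowest
        ideal_time += sum(times) / num_workers  # average load per worker
    return bsp_time, ideal_time

bsp_time, ideal_time = simulate_bsp()
print(f"BSP: {bsp_time:.1f}, ideal: {ideal_time:.1f}, "
      f"slowdown: {bsp_time / ideal_time:.2f}x")
```

With these illustrative numbers, the barrier-induced slowdown is severalfold, which is the kind of delay the mitigation techniques evaluated in the study aim to reduce.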
Investigating the Straggler Problem in Parameter Server on Iterative Convergent Distributed Machine Learning: An Empirical Study (arXiv:2308.15482v1 [cs.DC])
by instadatahelp | Aug 31, 2023 | AI Blogs