|
›› 2015,Vol. 30 ›› Issue (1): 20-29.doi: 10.1007/s11390-015-1501-x
所属专题: Computer Architecture and Systems; Artificial Intelligence and Pattern Recognition; Data Management and Data Mining
• Special Section on Selected Paper from NPC 2011 • 上一篇 下一篇
Rong Chen(陈榕), Member, CCF, ACM, IEEE, Jia-Xin Shi(施佳鑫), Hai-Bo Chen(陈海波), Senior Member, CCF, Member, ACM, IEEE, Bin-Yu Zang(臧斌宇), Senior Member, CCF, Member, ACM, IEEE
Rong Chen(陈榕), Member, CCF, ACM, IEEE, Jia-Xin Shi(施佳鑫), Hai-Bo Chen(陈海波), Senior Member, CCF, Member, ACM, IEEE, Bin-Yu Zang(臧斌宇), Senior Member, CCF, Member, ACM, IEEE
众多机器学习和数据挖掘问题,如推荐系统、主题建模和医学诊断等,能够被抽象成基于二分图的计算.然而,大部分分布式图并行计算系统均忽视了二分图的独有特征,并且现有的在线图划分算法也普遍会造成大量冗余的顶点复制和严重的网络通信负担.本文针对在分布式机器学习和数据挖掘应用中对二分图进行划分存在的机遇和挑战进行了深入分析,并在此基础上提出一组面向二分图的划分算法(BiGraph).BiGraph利用二分图顶点分布特征和两组顶点间计算和数据负载的不平衡特性构建了最优的图划分算法,能够尽可能减少图顶点的复制数和网络通信开销.在PowerGraph系统上的实现和测试表明,BiGraph通过减少高达80%的顶点复制和96%的网络开销,将四个典型机器学习和数据挖掘算法的性能提升1.16倍至17.75倍.
[1] Malewicz G, Austern M H, Bik A J, Dehnert J C, Horn I, Leiser N, Czajkowski G. Pregel: A system for large-scale graph processing. In Proc. the 2010 ACM SIGMOD International Conference on Management of Data, June 2010, pp.135-146.[2] Dhillon I S. Co-clustering documents and words using bipartite spectral graph partitioning. In Proc. the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 2001, pp.269-274.[3] Zha H, He X, Ding C, Simon H, Gu M. Bipartite graph partitioning and data clustering. In Proc. the 10th International Conference on Information and Knowledge Management, August 2001, pp.25-32.[4] Gao B, Liu T Y, Zheng X, Cheng Q S, Ma W Y. Consistent bipartite graph co-partitioning for star-structured high-order heterogeneous data co-clustering. In Proc. the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, August 2005, pp.41-50.[5] Gao B, Liu T Y, Feng G, Qin T, Cheng Q S, Ma W Y. Hierarchical taxonomy preparation for text categorization using consistent bipartite spectral graph copartitioning. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(9): 1263-1273.[6] Chen R, Shi J, Chen Y, Guan H, Zang B, Chen H. Powerlyra: Differentiated graph computation and partitioning on skewed graphs. Technical Report, IPADSTR-2013-001, Shanghai Jiao Tong University, 2013.[7] Low Y, Bickson D, Gonzalez J, Guestrin C, Kyrola A, Hellerstein J M. Distributed GraphLab: A framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment, 2012, 5(8): 716-727.[8] Gonzalez J E, Low Y, Gu H, Bickson D, Guestrin C. Powergraph: Distributed graph-parallel computation on natural graphs. In Proc. the 10th USENIX Symp. Operating Systems Design and Implementation, October 2012, pp.17-30.[9] Jain N, Liao G, Willke T L. Graphbuilder: Scalable graph ETL framework. In Proc. the 1st International Workshop on Graph Data Management Experiences and Systems, June 2013, Article No.4.[10] Chen R, Shi J, Zang B, Guan H. Bipartite-oriented distributed graph partitioning for big learning. In Proc. the 5th Asia-Paci c Workshop on Systems, June 2014, pp.14:1-14:7.[11] Chen R, Ding X, Wang P, Chen H, Zang B, Guan H. Computation and communication efficient graph processing with distributed immutable view. In Proc. the 23rd International Symposium on High-Performance Parallel and Distributed Computing, June 2014, pp.215-226.[12] Brin S, Page L. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 1998, 30(1): 107-117.[13] Schloegel K, Karypis G, Kumar V. Parallel multilevel algorithms for multi-constraint graph partitioning. In Proc. the 6th Int. Euro-Par Conf. Parallel Processing, August 2000, pp.296-310.[14] Ng A Y, Jordan M I,Weiss Y. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems, Dietterich T G, Becker S, Ghahramani Z (eds), MIT Press, 2002, pp.849-856.[15] LÜcking T, Monien B, Elsässer R. New spectral bounds on k-partitioning of graphs. In Proc. the 13th Annual ACM Symposium on Parallel Algorithms and Architectures, July 2001, pp.255-262.[16] Stanton I, Kliot G. Streaming graph partitioning for large distributed graphs. In Proc. the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2012, pp.1222-1230.[17] Tsourakakis C, Gkantsidis C, Radunovic B, Vojnovic M. FENNEL: Streaming graph partitioning for massive scale graphs. In Proc. the 7th ACM International Conference on Web Search and Data Mining, February 2014, pp.333-342.[18] Abou-Rjeili A, Karypis G. Multilevel algorithms for partitioning power-law graphs. In Proc. the 20th International Parallel and Distributed Processing Symposium, April 2006, p.124.[19] Leskovec J, Lang K J, Dasgupta A, Mahoney M W. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 2009, 6(1): 29-123.[20] Koren Y, Bell R, Volinsky C. Matrix factorization techniques for recommender systems. Computer, 2009, 42(8): 30-37.[21] Kumar A, Beutel A, Ho Q, Xing E P. Fugue: Slow-workeragnostic distributed learning for big models on big data. In Proc. the 17th International Conference on Arti cial Intelligence and Statistics, April 2014, pp.531-539. |
No related articles found! |
|
版权所有 © 《计算机科学技术学报》编辑部 本系统由北京玛格泰克科技发展有限公司设计开发 技术支持:support@magtech.com.cn 总访问量: |