Journal of Computer Science and Technology, 2014, Vol. 29, Issue (1): 38-52. doi: 10.1007/s11390-013-1410-9

• Computer Networks and Distributed Computing •

Improving Scalability of Cloud Monitoring Through PCA-Based Clustering of Virtual Machines

Claudia Canali, Member, IEEE, and Riccardo Lancellotti, Member, ACM, IEEE   

  1. Department of Information Engineering, University of Modena and Reggio Emilia, 41125 Modena, Italy
  • Received: 2013-02-12 Revised: 2013-06-13 Online: 2014-01-05 Published: 2014-01-05

Cloud computing has recently emerged as a leading paradigm that allows customers to run their applications in large-scale virtualized data centers. Existing solutions for monitoring and managing these infrastructures treat virtual machines (VMs) as independent entities, each with its own characteristics. However, these approaches suffer from scalability issues as the number of VMs in modern cloud data centers grows. We claim that these scalability issues can be addressed by leveraging the similarity in VM behavior in terms of resource usage patterns. In this paper, we propose an automated methodology to cluster VMs starting from the usage of multiple resources, assuming no knowledge of the services executed on them. The innovative contribution of the proposed methodology is the use of the statistical technique known as principal component analysis (PCA) to automatically select the most relevant information for clustering similar VMs. We apply the methodology to two case studies: a virtualized testbed and a real enterprise data center. In both case studies, the automatic data selection based on PCA allows us to achieve high performance, with a percentage of correctly clustered VMs between 80% and 100% even for short time series (1 day) of monitored data. Furthermore, we estimate the potential reduction in the amount of collected data to show how our proposal may address the scalability issues related to monitoring and management in cloud computing data centers.
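The general idea of reducing per-VM resource data with PCA and then grouping similar VMs can be illustrated with a minimal sketch. This is not the authors' implementation: the feature choice (per-VM mean usage of four hypothetical metrics), the 90% explained-variance threshold, the synthetic "web" and "db" groups, and the use of k-means with farthest-first initialization are all assumptions made for illustration.

```python
import numpy as np

def pca_reduce(features, var_threshold=0.9):
    """Project rows onto the fewest principal components whose
    cumulative explained variance reaches var_threshold (assumed value)."""
    # Standardize columns so metrics with different units are comparable.
    std = features.std(axis=0)
    std[std == 0] = 1.0
    z = (features - features.mean(axis=0)) / std
    # Rows of vt are the principal directions of the standardized data.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1
    k = min(k, len(s))
    return z @ vt[:k].T

def kmeans(points, k, n_iter=100):
    """Plain Lloyd iterations with deterministic farthest-first seeding."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def purity(labels, truth):
    """Fraction of VMs that fall in a cluster dominated by their own class."""
    correct = 0
    for c in np.unique(labels):
        _, counts = np.unique(truth[labels == c], return_counts=True)
        correct += counts.max()
    return correct / len(truth)

# Synthetic data (hypothetical): 10 "web" and 10 "db" VMs, 4 usage metrics each.
rng = np.random.default_rng(42)
web = rng.normal([80, 30, 20, 70], 2.0, size=(10, 4))
db = rng.normal([20, 75, 80, 25], 2.0, size=(10, 4))
features = np.vstack([web, db])
truth = np.array([0] * 10 + [1] * 10)

reduced = pca_reduce(features)
labels = kmeans(reduced, k=2)
score = purity(labels, truth)
print(reduced.shape, score)
```

In this sketch, the purity score plays the role of the "percentage of correctly clustered VMs" mentioned in the abstract, and the reduced matrix stands in for the smaller amount of data that would need to be retained per VM.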

ISSN 1000-9000(Print)

CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
E-mail: jcst@ict.ac.cn