|Abstract Cloud computing has recently emerged as a leading paradigm to allow customers to run their applications in virtualized large-scale data centers. Existing solutions for monitoring and management of these infrastructures consider virtual machines (VMs) as independent entities with their own characteristics. However, these approaches suffer from scalability issues due to the increasing number of VMs in modern cloud data centers. We claim that scalability issues can be addressed by leveraging the similarity among VMs behavior in terms of resource usage patterns. In this paper we propose an automated methodology to cluster VMs starting from the usage of multiple resources, assuming no knowledge of the services executed on them. The innovative contribution of the proposed methodology is the use of the statistical technique known as principal component analysis (PCA) to automatically select the most relevant information to cluster similar VMs. We apply the methodology to two case studies, a virtualized testbed and a real enterprise data center. In both case studies, the automatic data selection based on PCA allows us to achieve high performance, with a percentage of correctly clustered VMs between 80% and 100% even for short time series (1 day) of monitored data. Furthermore, we estimate the potential reduction in the amount of collected data to demonstrate how our proposal may address the scalability issues related to monitoring and management in cloud computing data centers.