Semi-Supervised Learning Based Tag Recommendation for Docker Repositories
-
Abstract
Docker has recently become a mainstream technology for providing reusable software artifacts, and developers can easily build and deploy their applications with it. A large number of reusable Docker images are now publicly shared in online communities, and semantic tags can help developers reuse these images effectively. However, the communities do not provide tagging services, and manual tagging is exhausting and time-consuming. This paper addresses the problem with a semi-supervised learning-based approach named SemiTagRec. SemiTagRec contains four components: (1) the predictor, which calculates the probability of assigning a specific tag to a given Docker repository; (2) the extender, which introduces new tags as candidates based on tag correlation analysis; (3) the evaluator, which measures the candidate tags with a logistic regression model; and (4) the integrator, which calculates a final score by combining the results of the predictor and the evaluator and then assigns the high-scoring tags to the given Docker repository. SemiTagRec adds the newly tagged repositories to the training data for the next round of training. In this way, it iteratively trains the predictor on the accumulated tagged repositories and the extended tag vocabulary to achieve high tag recommendation accuracy. Experimental results show that SemiTagRec outperforms the other approaches, achieving a Recall@5 of 0.688 and a Recall@10 of 0.781.
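To make the integrator step and the Recall@k metric concrete, the following is a minimal sketch and not the implementation used by SemiTagRec. The abstract does not state how the predictor and evaluator results are combined, so the weighted sum and its weight `alpha` are assumptions, as are the function names and the toy scores. Recall@k is computed here under one common definition (the fraction of a repository's ground-truth tags that appear in the top k recommendations), which may differ in detail from the definition used in the paper.

```python
from typing import Dict, List, Set


def integrate_scores(predictor: Dict[str, float],
                     evaluator: Dict[str, float],
                     alpha: float = 0.5) -> Dict[str, float]:
    """Combine per-tag scores with a weighted sum (an assumed rule).

    A tag missing from one source contributes 0 from that source.
    """
    tags = set(predictor) | set(evaluator)
    return {
        t: alpha * predictor.get(t, 0.0) + (1 - alpha) * evaluator.get(t, 0.0)
        for t in tags
    }


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring tags, breaking ties alphabetically."""
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ordered[:k]]


def recall_at_k(recommended: List[str], truth: Set[str], k: int) -> float:
    """Fraction of ground-truth tags found among the first k recommendations."""
    if not truth:
        return 0.0
    return len(set(recommended[:k]) & truth) / len(truth)


# Toy scores for one hypothetical repository (illustrative values only).
predictor_scores = {"database": 0.9, "sql": 0.6, "web": 0.2}
evaluator_scores = {"sql": 0.8, "mysql": 0.7}  # "mysql" stands in for an extender candidate

combined = integrate_scores(predictor_scores, evaluator_scores, alpha=0.5)
ranked = top_k(combined, k=2)
recall = recall_at_k(ranked, {"sql", "mysql"}, k=2)
print(ranked, recall)
```

In a semi-supervised loop of the kind the abstract describes, the tags in `ranked` would be attached to the repository, which would then join the training data for the next round.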