Active Learning Query Strategies for Classification, Regression, and Clustering: A Survey

Abstract: Data is generally abundant in unlabeled form, and annotating it incurs a cost. Both the labeling cost and the learning cost can be minimized by learning from as few labeled instances as possible. Active learning (AL) learns from a few labeled instances, with the additional ability to query an expert annotator or oracle for the labels of further instances. The active learner uses an instance selection strategy to pick the critical query instances that reduce the generalization error as quickly as possible. This process yields a refined training dataset, which helps minimize the overall cost. The key to the success of AL is the query strategy, which selects the candidate query instances and helps the learner learn a valid hypothesis. This survey reviews AL query strategies for classification, regression, and clustering under the pool-based AL scenario. The query strategies for classification are further divided into informative-based, representative-based, informative- and representative-based, and others. More advanced query strategies based on reinforcement learning and deep learning, along with query strategies for realistic environment settings, are also presented. After a rigorous mathematical analysis of AL strategies, this work presents a comparative analysis of them. Finally, an implementation guide, applications, and challenges of AL are discussed.
