Journal of Computer Science and Technology

   

Shapelet Based Two-step Time Series Positive and Unlabeled Learning

Han-Bo Zhang (张翰博), Peng Wang* (王鹏), Member, CCF, Ming-Ming Zhang (张明明), and Wei Wang (汪卫), Member, CCF

  1. School of Computer Science, Fudan University, Shanghai, China, 200438
  • Received: 2021-01-25; Revised: 2022-12-16; Accepted: 2022-12-22
  • Contact: Peng Wang, E-mail: pengwang5@fudan.edu.cn
  • About the author: Peng Wang received the Ph.D. degree from Fudan University, Shanghai, in 2007. He is now a professor in the School of Computer Science, Fudan University, Shanghai. His research interests include databases, data mining, and series data processing. He has published more than 30 papers in refereed international journals and conference proceedings.

In the last decade, there has been significant progress in time series classification. However, in real-world industrial settings, it is expensive and difficult to obtain high-quality labeled data. As a result, the positive and unlabeled learning (PU-learning) problem has recently attracted growing attention. Current PU-learning approaches for time series data suffer from low accuracy because no negative labeled time series are available. In this paper, we propose a novel shapelet-based two-step (2STEP) PU-learning approach. In the first step, we generate shapelet features from the positive time series and use them to select a set of negative examples. In the second step, based on both the positive and the negative time series, we select the final features and build the classification model. Experimental results show that our 2STEP approach improves the average F1 score on 15 datasets by 9.1% compared with the baseline, and achieves the highest F1 score on 10 of the 15 time series datasets.


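Chinese Abstract

1. Context:
Over the past decade, time series classification has made significant progress. In real-world industrial settings, however, obtaining high-quality labeled data is expensive and difficult. The data we typically face consist of a limited number of positive examples and a large number of unlabeled samples, as in event discovery and anomaly detection. As a result, a more realistic problem, positive and unlabeled learning (PU-learning), has recently become increasingly popular.
2. Objective:
In the initial problem setting, we have only a small set of positive samples P and a large set of unlabeled time series samples U, from which we build a binary classifier to classify test data. Current PU-learning methods for time series data have low accuracy, because the lack of negative time series samples makes it challenging to extract discriminative features. Our objective is to find meaningful time series features using only the limited set P and the large set U, and then to design a classification algorithm on top of these features that improves the accuracy of PU-learning.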
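3. Method:
We use a two-stage method to obtain the classifier.
Stage one
The time series in P are concatenated into one long series, and a motif discovery method is applied to it to find a set of motif subsequences. These subsequences are ranked and filtered by TF-DDF, a TF-IDF-like statistic that we propose, to produce a set of P-shapelets that is representative of the P class.
Because the P-shapelet set captures the characteristics of P, a voting procedure over the P-shapelets can identify a set N of time series that are dissimilar to the P-shapelets; these serve as the negative time series.
Then, in the same way as for the P-shapelets, the series in N are concatenated to generate a set of N-shapelets that represents N. The N-shapelets are merged with the P-shapelets to form the full set of shapelet candidates.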
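The voting step of stage one can be illustrated with a minimal sketch. The abstract does not specify the voting rule, so everything below is an assumption made for illustration: each P-shapelet votes for the unlabeled series farthest from it, distance is the plain (unnormalized) Euclidean distance to the best-matching window, and the names `min_distance`, `select_negatives`, `votes_per_shapelet`, and `n_negatives` are hypothetical. It is not the authors' implementation.

```python
import math

def min_distance(shapelet, series):
    """Smallest Euclidean distance between the shapelet and any
    same-length window of the series (no normalization; an assumption)."""
    m = len(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        d = math.sqrt(sum((series[start + i] - shapelet[i]) ** 2 for i in range(m)))
        best = min(best, d)
    return best

def select_negatives(p_shapelets, unlabeled, votes_per_shapelet, n_negatives):
    """Hypothetical voting rule: every P-shapelet casts one vote for each of
    the `votes_per_shapelet` unlabeled series farthest from it. The indices
    of the `n_negatives` most-voted series are returned as the set N."""
    votes = [0] * len(unlabeled)
    for shapelet in p_shapelets:
        dists = [min_distance(shapelet, u) for u in unlabeled]
        farthest = sorted(range(len(unlabeled)), key=lambda i: dists[i], reverse=True)
        for i in farthest[:votes_per_shapelet]:
            votes[i] += 1
    ranked = sorted(range(len(unlabeled)), key=lambda i: votes[i], reverse=True)
    return ranked[:n_negatives]

# Toy data: series 0 and 2 contain P-like patterns, series 1 and 3 do not.
p_shapelets = [[0, 1, 0], [1, 0, 1]]
unlabeled = [[0, 1, 0, 1, 0], [5, 5, 5, 5, 5], [0, 1, 0, 0, 1], [9, 9, 9, 9, 9]]
negatives = select_negatives(p_shapelets, unlabeled, votes_per_shapelet=2, n_negatives=2)
```

On this toy input the two flat series, which contain no window resembling either P-shapelet, collect all the votes. In practice the two thresholds would have to be tuned, and a z-normalized distance may be preferable.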
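Stage two
For each shapelet in the shapelet candidates and each time series sample in DS, we compute the minimum distance between them (a process called shapelet transformation) to generate time series features, and we filter these features with our CSI score to obtain the training set. P and N serve as the training set, and the remaining examples in U serve as the test set, on which an SVM classifier is built and applied.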
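The shapelet transformation described in stage two can be sketched as follows. This is a minimal illustration under stated assumptions: distance is the unnormalized Euclidean distance, and the function names `min_distance` and `shapelet_transform` are hypothetical. The CSI-based feature filtering and the SVM training are not shown, since the abstract does not define them in enough detail.

```python
import math

def min_distance(shapelet, series):
    """Smallest Euclidean distance between the shapelet and any
    same-length window of the series (no normalization; an assumption)."""
    m = len(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        d = math.sqrt(sum((series[start + i] - shapelet[i]) ** 2 for i in range(m)))
        best = min(best, d)
    return best

def shapelet_transform(shapelets, dataset):
    """Map each time series to a feature vector with one entry per shapelet:
    the minimum distance between that shapelet and the series."""
    return [[min_distance(s, ts) for s in shapelets] for ts in dataset]

# Two candidate shapelets and two toy series.
shapelets = [[0, 1, 0], [5, 5]]
dataset = [[0, 1, 0, 1], [5, 5, 5, 5]]
features = shapelet_transform(shapelets, dataset)
```

Each row of `features` is a fixed-length vector regardless of the length of the original series, which is what allows a standard classifier such as an SVM to be trained on the transformed data.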
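4. Results & Findings:
Experimental results show that, compared with traditional PU-learning algorithms based on label propagation and on cost-sensitive learning, our shapelet-based two-stage time series PU-learning method improves the F1 score by 9.1% on average. Across 15 time series datasets, our method achieves higher accuracy than the other methods on 10.
5. Conclusions:
In this paper, we propose a two-stage method for the time series PU-learning problem. We first find a set of high-quality shapelets and then obtain some negative time series based on them, thereby converting the PU-learning problem into a traditional time series classification problem. The method achieves the highest accuracy on 10 of the 15 time series datasets, which verifies that the two-step approach has advantages over label-propagation-based and ERM-based methods.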

Keywords: positive unlabeled learning; time series; shapelet


ISSN 1000-9000 (Print), 1860-4749 (Online)
CN 11-2296/TP
