Bimonthly    Since 1986
ISSN 1000-9000(Print)
CN 11-2296/TP
Indexed in:
Publication Details
Edited by: Editorial Board of Journal Of Computer Science and Technology
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Distributed by:
China: All Local Post Offices
Other Countries: Springer
  • Table of Content
      30 September 2022, Volume 37 Issue 5 Previous Issue   
    For Selected: View Abstracts Toggle Thumbnails
    Special Section on Scalable Data Science
    Guo-Liang Li, Nan Tang and Cheng-Liang Chai
    Journal of Computer Science and Technology, 2022, 37 (5): 1003-1004.  DOI: 10.1007/s11390-022-0005-8
    Abstract   PDF(195KB) ( 183 )   Chinese Summary

    Data science targets the data life cycle of real applications, studying phenomena at scales, complexities, and granularities never before possible. This data life cycle encompasses databases and data engineering often leveraging statistical, machine learning, and artificial intelligence methods and, in many instances, using massive and heterogeneous collections of potentially noisy datasets. In this special section, we focus on data-intensive components of data science pipelines; and solve problems in areas of interest to our community (e.g., data curation, optimization, performance, storage, and systems).

    To promote the recent work on scalable data science, we organize this special section at Journal of Computer Science and Technology (JCST). We received XX papers from all over the world. First, the guest editors preformed quick reviews and immediately rejected insufficiently highquality submissions. Then, each remaining submission was reviewed by at least three invited international reviewers. All the papers were carried out two rounds of reviews, and the authors were asked to address all the major and minor issues in their submissions during the review process. Eventually we accepted seven high-quality submissions in terms of clarity, novelty, significance, and relevance.

    The first paper "GAM: A GPU-Accelerated Algorithm for MaxRS Queries in Road Networks" by Jian Chen et al. proposes a novel GPU-accelerated algorithm GAM to tackle maximizing range sum queries in road networks efficiently with a two-level framework. The framework first proposes an effective multi-grained pruning technique to prune the cells derived from partitioning the road network, and then GPU-friendly storage structure is designed to compute the final result in the remaining cells.

    The second paper "Experiments and Analyses of Anonymization Mechanisms for Trajectory Data Publishingby Sun et al. systematically evaluates the individual privacy in terms of unicity and the utility in terms of practical applications of the anonymized trajectory data. This paper reveals the true situation of the privacy preservation for trajectories in terms of reidentification and the true situation of the utility of anonymized trajectories.

    The third paper "Efficient Partitioning Method for Optimizing the Compression on Array Databy Han et al. utilizes header compression to address the problem of array partitioning for optimizing the compression performance. The paper designs a greedy strategy which can help to find the partition point with the best compression performance.

    The forth paper "Discovering Cohesive Temporal Subgraphs with Temporal Density Aware Explorationby Zhu et al. proposes a temporal subgraph model to discover cohesive temporal subgraphs by capturing both the structural and the temporal characteristics of temporal cohesive subgraphs. This paper designs strategies to mine temporal densest subgraphs efficiently by decomposing the temporal graph into the sequence of snapshots.

    The fifth paper "Incremental User Identification Across Social Networks Based on User-Guider Similarity Index" by Kou et al. proposes an incremental user identification method across social networks based on User-guider Similarity Index. The paper first constructs a novel User guider Similarity Index to speed up the matching between users, and then applies a two-phase user identification strategy to efficiently identify users.

    The sixth paper "An Exercise Collection Auto-Assembling Framework with Knowledge Tracing and Reinforcement Learning" by Zhao et al. introduces an exercise collection auto-assembling framework, in which the assembled exercise collection can meet the teacher’s requirements on the difficulty index and the discrimination index. The paper designs a two-stage approach where a knowledge tracing model is used to predict the students’ answers and a deep reinforcement learning model to select exercises to satisfy the query parameters.

    Related Articles | Metrics
    GAM: A GPU-Accelerated Algorithm for MaxRS Queries in Road Networks
    Jian Chen, Kai-Qi Zhang, Tian Ren, Zhen-Qing Wu, and Hong Gao
    Journal of Computer Science and Technology, 2022, 37 (5): 1005-1025.  DOI: 10.1007/s11390-022-2330-3
    In smart phones, vehicles and wearable devices, GPS sensors are ubiquitous and collect a lot of valuable spatial data from the real world. Given a set of weighted points and a rectangle r in the space, a maximizing range sum (MaxRS) query is to find the position of r, so as to maximize the total weight of the points covered by r (i.e., the range sum). It has a wide spectrum of applications in spatial crowdsourcing, facility location and traffic monitoring. Most of the existing research focuses on the euclidean space; however, in real life, the user's moving route is constrained by the road network, and the existing MaxRS query algorithms in the road network are inefficient. In this paper, we propose a novel GPU-accelerated algorithm, namely, GAM, to tackle MaxRS queries in road networks in two phases efficiently. In phase 1, we partition the entire road network into many small cells by grid and theoretically prove the correctness of parallel query results by grid shifting, and then we propose an effective multi-grained pruning technique, by which the majority of cells can be pruned without further checking. In phase 2, we design a GPU-friendly storage structure, cell-based road network (CRN), and a two-level parallel framework to compute the final result in the remaining cells. Finally, we conduct extensive experiments on two real-world road networks, and experimental results demonstrate that GAM is on average one order faster than state-of-the-art competitors, and the maximum speedup can achieve about 55 times.
    References | Supplementary Material | Related Articles | Metrics
    Experiments and Analyses of Anonymization Mechanisms for Trajectory Data Publishing
    She Sun, Shuai Ma, Jing-He Song, Wen-Hai Yue, Xue-Lian Lin, and Tiejun Ma
    Journal of Computer Science and Technology, 2022, 37 (5): 1026-1048.  DOI: 10.1007/s11390-022-2409-x
    With the advancing of location-detection technologies and the increasing popularity of mobile phones and other location-aware devices, trajectory data is continuously growing. While large-scale trajectories provide opportunities for various applications, the locations in trajectories pose a threat to individual privacy. Recently, there has been an interesting debate on the reidentifiability of individuals in the Science magazine. The main finding of Sánchez et al. is exactly opposite to that of De Montjoye et al., which raises the first question: "what is the true situation of the privacy preservation for trajectories in terms of reidentification?'' Furthermore, it is known that anonymization typically causes a decline of data utility, and anonymization mechanisms need to consider the trade-off between privacy and utility. This raises the second question: "what is the true situation of the utility of anonymized trajectories?'' To answer these two questions, we conduct a systematic experimental study, using three real-life trajectory datasets, five existing anonymization mechanisms (i.e., identifier anonymization, grid-based anonymization, dummy trajectories, k-anonymity and ε-differential privacy), and two practical applications (i.e., travel time estimation and window range queries). Our findings reveal the true situation of the privacy preservation for trajectories in terms of reidentification and the true situation of the utility of anonymized trajectories, and essentially close the debate between De Montjoye et al. and Sánchez et al. To the best of our knowledge, this study is among the first systematic evaluation and analysis of anonymized trajectories on the individual privacy in terms of unicity and on the utility in terms of practical applications.
    References | Supplementary Material | Related Articles | Metrics
    Efficient Partitioning Method for Optimizing the Compression on Array Data
    Shuai Han, Xian-Min Liu, and Jian-Zhong Li
    Journal of Computer Science and Technology, 2022, 37 (5): 1049-1067.  DOI: 10.1007/s11390-022-2371-7
    Array partitioning is an important research problem in array management area, since the partitioning strategies have important influence on storage, query evaluation, and other components in array management systems. Meanwhile, compression is highly needed for the array data due to its growing volume. Observing that the array partitioning can affect the compression performance significantly, this paper aims to design the efficient partitioning method for array data to optimize the compression performance. As far as we know, there still lacks research efforts on this problem. In this paper, the problem of array partitioning for optimizing the compression performance (PPCP for short) is firstly proposed. We adopt a popular compression technique which allows to process queries on the compressed data without decompression. Secondly, because the above problem is NP-hard, two essential principles for exploring the partitioning solution are introduced, which can explain the core idea of the partitioning algorithms proposed by us. The first principle shows that the compression performance can be improved if an array can be partitioned into two parts with different sparsities. The second principle introduces a greedy strategy which can well support the selection of the partitioning positions heuristically. Supported by the two principles, two greedy strategy based array partitioning algorithms are designed for the independent case and the dependent case respectively. Observing the expensive cost of the algorithm for the dependent case, a further optimization based on random sampling and dimension grouping is proposed to achieve linear time cost. Finally, the experiments are conducted on both synthetic and real-life data, and the results show that the two proposed partitioning algorithms achieve better performance on both compression and query evaluation.
    References | Supplementary Material | Related Articles | Metrics
    Discovering Cohesive Temporal Subgraphs with Temporal Density Aware Exploration
    Chun-Xue Zhu, Long-Long Lin, Ping-Peng Yuan, and Hai Jin
    Journal of Computer Science and Technology, 2022, 37 (5): 1068-1085.  DOI: 10.1007/s11390-022-2431-z
    Real-world networks, such as social networks, cryptocurrency networks, and e-commerce networks, always have occurrence time of interactions between nodes. Such networks are typically modeled as temporal graphs. Mining cohesive subgraphs from temporal graphs is practical and essential in numerous data mining applications, since mining cohesive subgraphs gets insights into the time-varying nature of temporal graphs. However, existing studies on mining cohesive subgraphs, such as Densest-Exact and k-truss, are mainly tailored for static graphs (whose edges have no temporal information). Therefore, those cohesive subgraph models cannot indicate both the temporal and the structural characteristics of subgraphs. To this end, we explore the model of cohesive temporal subgraphs by incorporating both the evolving and the structural characteristics of temporal subgraphs. Unfortunately, the volume of time intervals in a temporal network is quadratic. As a result, the time complexity of mining temporal cohesive subgraphs is high. To efficiently address the problem, we first mine the temporal density distribution of temporal graphs. Guided by the distribution, we can safely prune many unqualified time intervals with the linear time cost. Then, the remaining time intervals where cohesive temporal subgraphs fall in are examined using the greedy search. The results of the experiments on nine real-world temporal graphs indicate that our model outperforms state-of-the-art solutions in efficiency and quality. Specifically, our model only takes less than two minutes on a million-vertex DBLP and has the highest overall average ranking in EDB and TC metrics.
    References | Supplementary Material | Related Articles | Metrics
    Incremental User Identification Across Social Networks Based on User-Guider Similarity Index
    Yue Kou, Dong Li, De-Rong Shen, Tie-Zheng Nie, and Ge Yu
    Journal of Computer Science and Technology, 2022, 37 (5): 1086-1104.  DOI: 10.1007/s11390-022-2430-0
    Identifying accounts across different online social networks that belong to the same user has attracted extensive attentions. However, existing techniques rely on given user seeds and ignore the dynamic changes of online social networks, which fails to generate high quality identification results. In order to solve this problem, we propose an incremental user identification method based on user-guider similarity index (called CURIOUS), which efficiently identifies users and well captures the changes of user features over time. Specifically, we first construct a novel user-guider similarity index (called USI) to speed up the matching between users. Second we propose a two-phase user identification strategy consisting of USI-based bidirectional user matching and seed-based user matching, which is effective even for incomplete networks. Finally, we propose incremental maintenance for both USI and the identification results, which dynamically captures the instant states of social networks. We conduct experimental studies based on three real-world social networks. The experiments demonstrate the effectiveness and the efficiency of our proposed method in comparison with traditional methods. Compared with the traditional methods, our method improves precision, recall and rank score by an average of 0.19, 0.16 and 0.09 respectively, and reduces the time cost by an average of 81%.
    References | Supplementary Material | Related Articles | Metrics
    An Exercise Collection Auto-Assembling Framework with Knowledge Tracing and Reinforcement Learning
    Tian-Yu Zhao, Man Zeng, and Jian-Hua Feng
    Journal of Computer Science and Technology, 2022, 37 (5): 1105-1117.  DOI: 10.1007/s11390-022-2412-2
    In educational practice, teachers often need to manually assemble an exercise collection as a class quiz or a homework assignment. A well-assembled exercise collection needs to have the proper difficulty index and discrimination index so that it can better develop students' abilities. In this paper, we propose an exercise collection auto-assembling framework, in which a teacher provides the target values of difficulty and discrimination indices and a qualified exercise collection is automatically assembled. The framework consists of two stages. At the answer prediction stage, a knowledge tracing model is utilized to predict the students' answers to unseen exercises based on their history interaction records. In addition, to better represent the exercises in the model, we propose exercise embeddings and design a pre-training approach. At the collection assembling stage, we propose a deep reinforcement learning model to assemble the required exercise collection effectively. Since the knowledge tracing model in the first stage has different confidences in the predicted answers, it is also taken into account in the objective. Experimental results show the effectiveness and efficiency of the proposed framework.
    References | Supplementary Material | Related Articles | Metrics
    Artificial Intelligence and Pattern Recognition
    Enhancing N-Gram Based Metrics with Semantics for Better Evaluation of Abstractive Text Summarization
    Jia-Wei He, Wen-Jun Jiang, Guo-Bang Chen, Yu-Quan Le, and Xiao-Fei Ding
    Journal of Computer Science and Technology, 2022, 37 (5): 1118-1133.  DOI: 10.1007/s11390-022-2125-6
    Text summarization is an important task in natural language processing and it has been applied in many applications. Recently, abstractive summarization has attracted many attentions. However, the traditional evaluation metrics that consider little semantic information, are unsuitable for evaluating the quality of deep learning based abstractive summarization models, since these models may generate new words that do not exist in the original text. Moreover, the out-of-vocabulary (OOV) problem that affects the evaluation results, has not been well solved yet. To address these issues, we propose a novel model called ENMS, to enhance existing N-gram based evaluation metrics with semantics. To be specific, we present two types of methods: N-gram based Semantic Matching (NSM for short), and N-gram based Semantic Similarity (NSS for short), to improve several widely-used evaluation metrics including ROUGE (Recall-Oriented Understudy for Gisting Evaluation), BLEU (Bilingual Evaluation Understudy), etc. NSM and NSS work in different ways. The former calculates the matching degree directly, while the latter mainly improves the similarity measurement. Moreover we propose an N-gram representation mechanism to explore the vector representation of N-grams (including skip-grams). It serves as the basis of our ENMS model, in which we exploit some simple but effective integration methods to solve the OOV problem efficiently. Experimental results over the TAC AESOP dataset show that the metrics improved by our methods are well correlated with human judgements and can be used to better evaluate abstractive summarization methods.
    References | Supplementary Material | Related Articles | Metrics
    Universal Image Steganalysis Based on Convolutional Neural Network with Global Covariance Pooling
    Xiao-Qing Deng, Bo-Lin Chen, Wei-Qi Luo, and Da Luo
    Journal of Computer Science and Technology, 2022, 37 (5): 1134-1145.  DOI: 10.1007/s11390-021-0572-0
    Recently, steganalytic methods based on deep learning have achieved much better performance than traditional methods based on handcrafted features. However, most existing methods based on deep learning are specially designed for one image domain (i.e., spatial or JPEG), and they often take long time to train. To make a balance between the detection performance and the training time, in this paper, we propose an effective and relatively fast steganalytic network called US-CovNet (Universal Steganalytic Covariance Network) for both {the} spatial and JPEG domains. To this end, we carefully design several important components of {US-CovNet} that will significantly affect the detection performance, including the high-pass filter set, the shortcut connection and the pooling {layer}. Extensive experimental results show that compared with the current best steganalytic networks (i.e., SRNet and J-YeNet), {US-CovNet} can achieve the state-of-the-art results for detecting spatial steganography and have competitive performance for detecting JPEG steganography. For example, the detection accuracy of US-CovNet is at least 0.56% higher than that of SRNet in the spatial domain. In the JPEG domain, US-CovNet performs slightly worse than J-YeNet in some cases with the degradation of less than 0.78%. However, the training time of US-CovNet is significantly reduced, which is less than 1/4 and 1/2 of SRNet and J-YeNet respectively.
    References | Supplementary Material | Related Articles | Metrics
    Neural Emotion Detection via Personal Attributes
    Xia-Bing Zhou, Zhong-Qing Wang, Xing-Wei Liang, Min Zhang, and Guo-Dong Zhou
    Journal of Computer Science and Technology, 2022, 37 (5): 1146-1160.  DOI: 10.1007/s11390-021-0606-7
    There has been a recent line of work to automatically detect the emotions of posts in social media. In literature, studies treat posts independently and detect their emotions separately. Different from previous studies, we explore the dependence among relevant posts via authors' backgrounds, since the authors with similar backgrounds, e.g., "gender", "location", tend to express similar emotions. However, personal attributes are not easy to obtain in most social media websites. Accordingly, we propose two approaches to determine personal attributes and capture personal attributes between different posts for emotion detection: the Joint Model with Personal Attention Mechanism (JPA) model is used to detect emotion and personal attributes jointly, and capture the attributes-aware words to connect similar people; the Neural Personal Discrimination (NPD) model is employed to determine the personal attributes from posts and connect the relevant posts with similar attributes for emotion detection. Experimental results show the usefulness of personal attributes in emotion detection, and the effectiveness of the proposed JPA and NPD approaches in capturing personal attributes over the state-of-the-art statistic and neural models.
    References | Supplementary Material | Related Articles | Metrics
    Towards Defense Against Adversarial Attacks on Graph Neural Networks via Calibrated Co-Training
    Xu-Gang Wu, Hui-Jun Wu, Xu Zhou, Xiang Zhao, and Kai Lu
    Journal of Computer Science and Technology, 2022, 37 (5): 1161-1175.  DOI: 10.1007/s11390-022-2129-2
    Graph neural networks (GNNs) have achieved significant success in graph representation learning. Nevertheless, the recent work indicates that current GNNs are vulnerable to adversarial perturbations, in particular structural perturbations. This, therefore, narrows the application of GNN models in real-world scenarios. Such vulnerability can be attributed to the model’s excessive reliance on incomplete data views (e.g., graph convolutional networks (GCNs) heavily rely on graph structures to make predictions). By integrating the information from multiple perspectives, this problem can be effectively addressed, and typical views of graphs include the node feature view and the graph structure view. In this paper, we propose C2oG, which combines these two typical views to train sub-models and fuses their knowledge through co-training. Due to the orthogonality of the views, sub-models in the feature view tend to be robust against the perturbations targeted at sub-models in the structure view. C2oG allows sub-models to correct one another mutually and thus enhance the robustness of their ensembles. In our evaluations, C2oG significantly improves the robustness of graph models against adversarial attacks without sacrificing their performance on clean datasets.
    References | Supplementary Material | Related Articles | Metrics
    Regular Paper
    Reliability and Incentive of Performance Assessment for Decentralized Clouds
    Jiu-Chen Shi, Xiao-Qing Cai, Wen-Li Zheng, Quan Chen, De-Ze Zeng, Tatsuhiro Tsuchiya, and Min-Yi Guo
    Journal of Computer Science and Technology, 2022, 37 (5): 1176-1199.  DOI: 10.1007/s11390-022-2120-y
    Decentralized cloud platforms have emerged as a promising paradigm to exploit the idle computing resources across the Internet to catch up with the ever-increasing cloud computing demands. As any user or enterprise can be the cloud provider in the decentralized cloud, the performance assessment of the heterogeneous computing resources is of vital significance. However, with the consideration of the untrustworthiness of the participants and the lack of unified performance assessment metric, the performance monitoring reliability and the incentive for cloud providers to offer real and stable performance together constitute the computational performance assessment problem in the decentralized cloud. In this paper, we present a robust performance assessment solution RODE to solve this problem. RODE mainly consists of a performance monitoring mechanism and an assessment of the claimed performance (AoCP) mechanism. The performance monitoring mechanism first generates reliable and verifiable performance monitoring results for the workloads executed by untrusted cloud providers. Based on the performance monitoring results, the AoCP mechanism forms a unified performance assessment metric to incentivize cloud providers to offer performance as claimed. Via extensive experiments, we show RODE can accurately monitor the performance of cloud providers on the premise of reliability, and incentivize cloud providers to honestly present the performance information and maintain the performance stability.
    References | Supplementary Material | Related Articles | Metrics
    FlexPDA: A Flexible Programming Framework for Deep Learning Accelerators
    Lei Liu, Xiu Ma, Hua-xiao Liu, Guang-li Li, and Lei Liu
    Journal of Computer Science and Technology, 2022, 37 (5): 1200-1220.  DOI: 10.1007/s11390-021-1406-9
    There are a wide variety of intelligence accelerators with promising performance and energy efficiency, deployed in a broad range of applications such as computer vision and speech recognition. However, programming productivity hinders the deployment of deep learning accelerators. The low-level library invoked in the high-level deep learning framework which supports the end-to-end execution with a given model, is designed to reduce the programming burden on the intelligence accelerators. Unfortunately, it is inflexible for developers to build a network model for every deep learning application, which probably brings unnecessary repetitive implementation. In this paper, a flexible and efficient programming framework for deep learning accelerators, FlexPDA, is proposed, which provides more optimization opportunities than the low-level library and realizes quick transplantation of applications to intelligence accelerators for fast upgrades. We evaluate FlexPDA by using 10 representative operators selected from deep learning algorithms and an end-to-end network. The experimental results validate the effectiveness of FlexPDA, which achieves an end-to-end performance improvement of 1.620x over the low-level library.
    References | Supplementary Material | Related Articles | Metrics
    Quasi-Developable B-Spline Surface Design with Control Rulings
    Zi-Xuan Hu, Peng-Bo Bo, and Cai-Ming Zhang
    Journal of Computer Science and Technology, 2022, 37 (5): 1221-1238.  DOI: 10.1007/s11390-022-0680-5
    We propose a method for generating a ruled B-spline surface fitting to a sequence of pre-defined ruling lines and the generated surface is required to be as-developable-as-possible. Specifically, the terminal ruling lines are treated as hard constraints. Different from existing methods that compute a quasi-developable surface from two boundary curves and cannot achieve explicit ruling control, our method controls ruling lines in an intuitive way and serves as an effective tool for computing quasi-developable surfaces from freely-designed rulings. We treat this problem from the point of view of numerical optimization and solve for surfaces meeting the distance error tolerance allowed in applications. The performance and the efficacy of the proposed method are demonstrated by the experiments on a variety of models including an application of the method for path planning in 5-axis computer numerical control (CNC) flank milling.
    References | Supplementary Material | Related Articles | Metrics
    PLQ: An Efficient Approach to Processing Pattern-Based Log Queries
    Jia Chen, Peng Wang, Fan Qiao, Shi-Qing Du, and Wei Wang
    Journal of Computer Science and Technology, 2022, 37 (5): 1239-1254.  DOI: 10.1007/s11390-020-0653-5
    As software systems grow more and more complex, extensive techniques have been proposed to analyze the log data to obtain the insight of the system status. However, during log data analysis, tedious manual efforts are paid to search interesting or informative log patterns from a huge volume of log data, named pattern-based queries. Although existing log management tools and DMBS systems can also support pattern-based queries, they suffer from a low efficiency. To deal with this problem, we propose a novel approach, named PLQ (Pattern-based Log Query). First, PLQ organizes logs into disjoint chunks and builds chunk-wise bitmap indexes for log types and attribute values. Then, based on bitmap indexes, PLQ finds candidate logs with a set of efficient bit-wise operations. Finally, PLQ fetches candidate logs and validates them according to the queried pattern. Extensive experiments are conducted on real-life datasets. According to experimental results, compared with existing log management systems, PLQ is more efficient in querying log patterns and has a higher pruning rate for filtering irrelevant logs. Moreover, in PLQ, since the ratio of the index size to the data size does not exceed 2.5% for log datasets of different sizes, PLQ has a high scalability.
    References | Supplementary Material | Related Articles | Metrics
  Journal Online
Just Accepted
Top Cited Papers
Top 30 Most Read
Paper Lists of Areas
Special Issues
   ScholarOne Manuscripts
   Log In

User ID:


  Forgot your password?

Enter your e-mail address to receive your account information.

ISSN 1000-9000(Print)

CN 11-2296/TP

Editorial Board
Author Guidelines
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
E-mail: jcst@ict.ac.cn
  Copyright ©2015 JCST, All Rights Reserved