Bimonthly    Since 1986
ISSN 1000-9000(Print)
CN 11-2296/TP
Indexed in:
Publication Details
Edited by: Editorial Board of Journal Of Computer Science and Technology
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Distributed by:
China: All Local Post Offices
Other Countries: Springer
Top Read Articles
Published in last 1 year |  In last 2 years |  In last 3 years |  All
Please wait a minute...
For Selected: View Abstracts Toggle Thumbnails
DG-CNN: Introducing Margin Information into Convolutional Neural Networks for Breast Cancer Diagnosis in Ultrasound Images
Xiao-Zheng Xie, Jian-Wei Niu, Xue-Feng Liu, Qing-Feng Li, Yong Wang, Jie Han, and Shaojie Tang
Journal of Computer Science and Technology    2022, 37 (2): 277-294.   DOI: 10.1007/s11390-020-0192-0
Accepted: 02 June 2020

Abstract659)      PDF   

Although using convolutional neural networks (CNN) for computer-aided diagnosis (CAD) has made tremendous progress in the last few years, the small medical datasets remain to be the major bottleneck in this area. To address this problem, researchers start looking for information out of the medical datasets. Previous efforts mainly leverage information from natural images via transfer learning. More recent research work focuses on integrating knowledge from medical practitioners, either letting networks resemble how practitioners are trained, how they view images, or using extra annotations. In this paper, we propose a scheme named Domain Guided-CNN (DG-CNN) to incorporate the margin information, a feature described in the consensus for radiologists to diagnose cancer in breast ultrasound (BUS) images. In DG-CNN, attention maps that highlight margin areas of tumors are first generated, and then incorporated via different approaches into the networks. We have tested the performance of DG-CNN on our own dataset (including 1485 ultrasound images) and on a public dataset. The results show that DG-CNN can be applied to different network structures like VGG and ResNet to improve their performance. For example, experimental results on our dataset show that with a certain integrating mode, the improvement of using DG-CNN over a baseline network structure ResNet18 is 2.17% in accuracy, 1.69% in sensitivity, 2.64% in specificity and 2.57% in AUC (Area Under Curve). To the best of our knowledge, this is the first time that the margin information is utilized to improve the performance of deep neural networks in diagnosing breast cancer in BUS images.

Reference | Supplementary Material | Related Articles | Metrics
DeltaFuzz: Historical Version Information Guided Fuzz Testing
Jia-Ming Zhang, Zhan-Qi Cui, Xiang Chen, Huan-Huan Wu, Li-Wei Zheng, and Jian-Bin Liu
Journal of Computer Science and Technology    2022, 37 (1): 29-49.   DOI: 10.1007/s11390-021-1663-7
Abstract549)      PDF   
With the widespread use of agile software development methods, such as agile and scrum, software is iteratively updated more frequently. To ensure the quality of the software, regression testing is conducted before new versions are released. Moreover, to improve the efficiency of regression testing, testing efforts should be concentrated on the modified and impacted parts of a program. However, the costs of manually constructing new test cases for the modified and impacted parts are relatively expensive. Fuzz testing is an effective method for generating test data automatically, but it is usually devoted to achieving higher code coverage, which makes fuzz testing unsuitable for direct regression testing scenarios. For this reason, we propose a fuzz testing method based on the guidance of historical version information. First, the differences between the program being tested and the last version are analyzed, and the results of the analysis are used to locate change points. Second, change impact analysis is performed to find the corresponding impacted basic blocks. Finally, the fitness values of test cases are calculated according to the execution traces, and new test cases are generated iteratively by the genetic algorithm. Based on the proposed method, we implement a prototype tool DeltaFuzz and conduct experiments on six open-source projects. Compared with the fuzzing tool AFLGo, AFLFast and AFL, DeltaFuzz can reach the target faster, and the time taken by DeltaFuzz was reduced by 20.59%, 30.05% and 32.61%, respectively.
Reference | Supplementary Material | Related Articles | Metrics
MEBS: Uncovering Memory Life-Cycle Bugs in Operating System Kernels
Gen Zhang, Peng-Fei Wang, Tai Yue, Xu Zhou, Kai Lu
Journal of Computer Science and Technology    2021, 36 (6): 1248-1268.   DOI: 10.1007/s11390-021-1593-4
Accepted: 25 August 2021

Abstract547)      PDF   
Allocation, dereferencing, and freeing of memory data in kernels are coherently linked. There widely exist real cases where the correctness of memory is compromised. This incorrectness in kernel memory brings about significant security issues, e.g., information leaking. Though memory allocation, dereferencing, and freeing are closely related, previous work failed to realize they are closely related. In this paper, we study the life-cycle of kernel memory, which consists of allocation, dereferencing, and freeing. Errors in them are called memory life-cycle (MLC) bugs. We propose an in-depth study of MLC bugs and implement a memory life-cycle bug sanitizer (MEBS) for MLC bug detection. Utilizing an interprocedural global call graph and novel identification approaches, MEBS can reveal memory allocation, dereferencing, and freeing sites in kernels. By constructing a modified define-use chain and examining the errors in the life-cycle, MLC bugs can be identified. Moreover, the experimental results on the latest kernels demonstrate that MEBS can effectively detect MLC bugs, and MEBS can be scaled to different kernels. More than 100 new bugs are exposed in Linux and FreeBSD, and 12 common vulnerabilities and exposures (CVE) are assigned.
Reference | Supplementary Material | Related Articles | Metrics
Connecting the Dots in Self-Supervised Learning: A Brief Survey for Beginners
Peng-Fei Fang, Xian Li, Yang Yan, Shuai Zhang, Qi-Yue Kang, Xiao-Fei Li, and Zhen-Zhong Lan
Journal of Computer Science and Technology    2022, 37 (3): 507-526.   DOI: 10.1007/s11390-022-2158-x
Accepted: 18 May 2022

Abstract487)      PDF   

The artificial intelligence (AI) community has recently made tremendous progress in developing self-supervised learning (SSL) algorithms that can learn high-quality data representations from massive amounts of unlabeled data. These methods brought great results even to the fields outside of AI. Due to the joint efforts of researchers in various areas, new SSL methods come out daily. However, such a sheer number of publications make it difficult for beginners to see clearly how the subject progresses. This survey bridges this gap by carefully selecting a small portion of papers that we believe are milestones or essential work. We see these researches as the "dots" of SSL and connect them through how they evolve. Hopefully, by viewing the connections of these dots, readers will have a high-level picture of the development of SSL across multiple disciplines including natural language processing, computer vision, graph learning, audio processing, and protein learning.

Reference | Supplementary Material | Related Articles | Metrics
Improving Ocean Data Services with Semantics and Quick Index
Xiao-Li Ren, Kai-Jun Ren, Zi-Chen Xu, Xiao-Yong Li, Ao-Long Zhou, Jun-Qiang Song, Ke-Feng Deng
Journal of Computer Science and Technology    2021, 36 (5): 963-984.   DOI: 10.1007/s11390-021-1374-0
Accepted: 20 August 2021

Abstract477)      PDF   
Massive ocean data acquired by various observing platforms and sensors poses new challenges to data management and utilization. Typically, it is difficult to find the desired data from the large amount of datasets efficiently and effectively. Most of existing methods for data discovery are based on the keyword retrieval or direct semantic reasoning, and they are either limited in data access rate or do not take the time cost into account. In this paper, we creatively design and implement a novel system to alleviate the problem by introducing semantics with ontologies, which is referred to as Data Ontology and List-Based Publishing (DOLP). Specifically, we mainly improve the ocean data services in the following three aspects. First, we propose a unified semantic model called OEDO (Ocean Environmental Data Ontology) to represent heterogeneous ocean data by metadata and to be published as data services. Second, we propose an optimized quick service query list (QSQL) data structure for storing the pre-inferred semantically related services, and reducing the service querying time. Third, we propose two algorithms for optimizing QSQL hierarchically and horizontally, respectively, which aim to extend the semantics relationships of the data service and improve the data access rate. Experimental results prove that DOLP outperforms the benchmark methods. First, our QSQL-based data discovery methods obtain a higher recall rate than the keyword-based method, and are faster than the traditional semantic method based on direct reasoning. Second, DOLP can handle more complex semantic relationships than the existing methods.
Reference | Supplementary Material | Related Articles | Metrics
Constructing an Educational Knowledge Graph with Concepts Linked to Wikipedia
Fu-Rong Dang, Jin-Tao Tang, Kun-Yuan Pang, Ting Wang, Sha-Sha Li, Xiao Li
Journal of Computer Science and Technology    2021, 36 (5): 1200-1211.   DOI: 10.1007/s11390-020-0328-2
Accepted: 18 August 2020

Abstract470)      PDF   
To use educational resources efficiently and dig out the nature of relations among MOOCs (massive open online courses), a knowledge graph was built for MOOCs on four major platforms:Coursera, EDX, XuetangX, and ICourse. This paper demonstrates the whole process of educational knowledge graph construction for reference. And this knowledge graph, the largest knowledge graph of MOOC resources at present, stores and represents five classes, 11 kinds of relations and 52 779 entities with their corresponding properties, amounting to more than 300 000 triples. Notably, 24 188 concepts are extracted from text attributes of MOOCs and linked them directly with corresponding Wikipedia entries or the closest entries calculated semantically, which provides the normalized representation of knowledge and a more precise description for MOOCs far more than enriching words with explanatory links. Besides, prerequisites discovered by direct extractions are viewed as an essential supplement to augment the connectivity in the knowledge graph. This knowledge graph could be considered as a collection of unified MOOC resources for learners and the abundant data for researchers on MOOC-related applications, such as prerequisites mining.
Reference | Supplementary Material | Related Articles | Metrics
Activity Diagram Synthesis Using Labelled Graphs and the Genetic Algorithm
Chun-Hui Wang, Zhi Jin, Wei Zhang, Didar Zowghi, Hai-Yan Zhao, Wen-Pin Jiao
Journal of Computer Science and Technology    2021, 36 (6): 1388-1406.   DOI: 10.1007/s11390-020-0293-9
Accepted: 09 June 2020

Abstract459)      PDF   
Many applications need to meet diverse requirements of a large-scale distributed user group. That challenges the current requirements engineering techniques. Crowd-based requirements engineering was proposed as an umbrella term for dealing with the requirements development in the context of the large-scale user group. However, there are still many issues. Among others, a key issue is how to merge these requirements to produce the synthesized requirements description when a set of requirements descriptions from different participants are received. Appropriate techniques are needed for supporting the requirements synthesis. Diagrams are widely used in industry to represent requirements. This paper chooses the activity diagrams and proposes a novel approach for the activity diagram synthesis which adopts the genetic algorithm to repeatedly modify a population of individual solutions toward an optimal solution. As a result, it can automatically generate a resulting diagram which combines the commonalities as many as possible while leveraging the variabilities of a set of input diagrams. The approach is featured by: 1) the labelled graph proposed as the representation of the candidate solutions during the iterative evolution; 2) the generalized entropy proposed and defined as the measurement of the solutions; 3) the genetic algorithm designed for sorting out the high-quality solution. Four cases of different scales are used to evaluate the effectiveness of the approach. The experimental results show that not only the approach gets high precision and recall but also the resulting diagram satisfies the properties of minimization and information preservation and can support the requirements traceability.
Reference | Supplementary Material | Related Articles | Metrics
Pre-Train and Learn: Preserving Global Information for Graph Neural Networks
Dan-Hao Zhu, Xin-Yu Dai, Jia-Jun Chen
Journal of Computer Science and Technology    2021, 36 (6): 1420-1430.   DOI: 10.1007/s11390-020-0142-x
Accepted: 10 October 2020

Abstract448)      PDF   
Graph neural networks (GNNs) have shown great power in learning on graphs. However, it is still a challenge for GNNs to model information faraway from the source node. The ability to preserve global information can enhance graph representation and hence improve classification precision. In the paper, we propose a new learning framework named G-GNN (Global information for GNN) to address the challenge. First, the global structure and global attribute features of each node are obtained via unsupervised pre-training, and those global features preserve the global information associated with the node. Then, using the pre-trained global features and the raw attributes of the graph, a set of parallel kernel GNNs is used to learn different aspects from these heterogeneous features. Any general GNN can be used as a kernal and easily obtain the ability of preserving global information, without having to alter their own algorithms. Extensive experiments have shown that state-of-the-art models, e.g., GCN, GAT, Graphsage and APPNP, can achieve improvement with G-GNN on three standard evaluation datasets. Specially, we establish new benchmark precision records on Cora (84.31%) and Pubmed (80.95%) when learning on attributed graphs.
Reference | Related Articles | Metrics
Vulnerable Region-Aware Greybox Fuzzing
Ling-Yun Situ, Zhi-Qiang Zuo, Le Guan, Lin-Zhang Wang, Xuan-Dong Li, Jin Shi, Peng Liu
Journal of Computer Science and Technology    2021, 36 (5): 1212-1228.   DOI: 10.1007/s11390-021-1196-0
Accepted: 18 May 2021

Abstract442)      PDF   
Fuzzing is known to be one of the most effective techniques to uncover security vulnerabilities of large-scale software systems. During fuzzing, it is crucial to distribute the fuzzing resource appropriately so as to achieve the best fuzzing performance under a limited budget. Existing distribution strategies of American Fuzzy Lop (AFL) based greybox fuzzing focus on increasing coverage blindly without considering the metrics of code regions, thus lacking the insight regarding which region is more likely to be vulnerable and deserves more fuzzing resources. We tackle the above drawback by proposing a vulnerable region-aware greybox fuzzing approach. Specifically, we distribute more fuzzing resources towards regions that are more likely to be vulnerable based on four kinds of code metrics. We implemented the approach as an extension to AFL named RegionFuzz. Large-scale experimental evaluations validate the effectiveness and efficiency of RegionFuzz-11 new bugs including three new CVEs are successfully uncovered by RegionFuzz.
Reference | Supplementary Material | Related Articles | Metrics
A Multi-Agent Spatial Logic for Scenario-Based Decision Modeling and Verification in Platoon Systems
Jingwen Xu, Yanhong Huang, Jianqi Shi, Shengchao Qin
Journal of Computer Science and Technology    2021, 36 (6): 1231-1247.   DOI: 10.1007/s11390-021-1565-8
Abstract409)      PDF   
To cater for the scenario of coordinated transportation of multiple trucks on the highway, a platoon system for autonomous driving has been extensively explored in the industry. Before such a platoon is deployed, it is necessary to ensure the safety of its driving behavior, whereby each vehicle’s behavior is commanded by the decision-making function whose decision is based on the observed driving scenario. However, there is currently a lack of verification methods to ensure the reliability of the scenario-based decision-making process in the platoon system. In this paper, we focus on the platoon driving scenario, whereby the platoon is composed of intelligent heavy trucks driving on cross-sea highways. We propose a formal modeling and verification approach to provide safety assurance for platoon vehicles’ cooperative driving behaviors. The existing Multi-Lane Spatial Logic (MLSL) with a dedicated abstract model can express driving scene spatial properties and prove the safety of multi-lane traffic maneuvers under the single-vehicle perspective. To cater for the platoon system’s multi-vehicle perspective, we modify the existing abstract model and propose a Multi-Agent Spatial Logic (MASL) that extends MLSL by relative orientation and multi-agent observation. We then utilize a timed automata type supporting MASL formulas to model vehicles’ decision controllers for platoon driving. Taking the behavior of a human-driven vehicle (HDV) joining the platoon as a case study, we have implemented the model and verified safety properties on the UPPAAL tool to illustrate the viability of our framework.
Reference | Supplementary Material | Related Articles | Metrics
NfvInsight: A Framework for Automatically Deploying and Benchmarking VNF Chains
Tian-Ni Xu, Hai-Feng Sun, Di Zhang, Xiao-Ming Zhou, Xiu-Feng Sui, Sa Wang, Qun Huang, and Yun-Gang Bao
Journal of Computer Science and Technology    2022, 37 (3): 680-698.   DOI: 10.1007/s11390-020-0434-1
Abstract371)      PDF   
With the advent of virtualization techniques and software-defined networking (SDN), network function virtualization (NFV) shifts network functions (NFs) from hardware implementations to software appliances, between which exists a performance gap. How to narrow the gap is an essential issue of current NFV research. However, the cumbersomeness of deployment, the water pipe effect of virtual network function (VNF) chains, and the complexity of the system software stack together make it tough to figure out the cause of low performance in the NFV system. To pinpoint the NFV system performance, we propose NfvInsight, a framework for automatic deployment and benchmarking VNF chains. Our framework tackles the challenges in NFV performance analysis. The framework components include chain graph generation, automatic deployment, and fine granularity measurement. The design and implementation of each component have their advantages. To the best of our knowledge, we make the first attempt to collect rules forming a knowledge base for generating reasonable chain graphs. NfvInsight deploys the generated chain graphs automatically, which frees the network operators from executing at least 391 lines of bash commands for a single test. To diagnose the performance bottleneck, NfvInsight collects metrics from multiple layers of the software stack. Specifically, we collect the network stack latency distribution ingeniously, introducing only less than 2.2% overhead. We showcase the convenience and usability of NfvInsight in finding bottlenecks for both VNF chains and the underlying system. Leveraging our framework, we find several design flaws of the network stack, which are unsuitable for packet forwarding inside one single server under the NFV circumstance. Our optimization for these flaws gains at most 3x performance improvement.
Reference | Supplementary Material | Related Articles | Metrics
A Unified Shared-Private Network with Denoising for Dialogue State Tracking
Qing-Bin Liu, Shi-Zhu He, Kang Liu, Sheng-Ping Liu, Jun Zhao
Journal of Computer Science and Technology    2021, 36 (6): 1407-1419.   DOI: 10.1007/s11390-020-0338-0
Accepted: 21 January 2021

Abstract350)      PDF   
Dialogue state tracking (DST) leverages dialogue information to predict dialogues states which are generally represented as slot-value pairs. However, previous work usually has limitations to efficiently predict values due to the lack of a powerful strategy for generating values from both the dialogue history and the predefined values. By predicting values from the predefined value set, previous discriminative DST methods are difficult to handle unknown values. Previous generative DST methods determine values based on mentions in the dialogue history, which makes it difficult for them to handle uncovered and non-pointable mentions. Besides, existing generative DST methods usually ignore the unlabeled instances and suffer from the label noise problem, which limits the generation of mentions and eventually hurts performance. In this paper, we propose a unified shared-private network (USPN) to generate values from both the dialogue history and the predefined values through a unified strategy. Specifically, USPN uses an encoder to construct a complete generative space for each slot and to discern shared information between slots through a shared-private architecture. Then, our model predicts values from the generative space through a shared-private decoder. We further utilize reinforcement learning to alleviate the label noise problem by learning indirect supervision from semantic relations between conversational words and predefined slot-value pairs. Experimental results on three public datasets show the effectiveness of USPN by outperforming state-of-the-art baselines in both supervised and unsupervised DST tasks.
Reference | Supplementary Material | Related Articles | Metrics
MacroTrend: A Write-Efficient Cache Algorithm for NVM-Based Read Cache
Ning Bao, Yun-Peng Chai, Xiao Qin, and Chuan-Wen Wang
Journal of Computer Science and Technology    2022, 37 (1): 207-230.   DOI: 10.1007/s11390-021-0178-6
Accepted: 21 May 2021

Abstract333)      PDF   
The future storage systems are expected to contain a wide variety of storage media and layers due to the rapid development of NVM (non-volatile memory) techniques. For NVM-based read caches, many kinds of NVM devices cannot stand frequent data updates due to limited write endurance or high energy consumption of writing. However, traditional cache algorithms have to update cached blocks frequently because it is difficult for them to predict long-term popularity according to such limited information about data blocks, such as only a single value or a queue that reflects frequency or recency. In this paper, we propose a new MacroTrend (macroscopic trend) prediction method to discover long-term hot blocks through blocks' macro trends illustrated by their access count histograms. And then a new cache replacement algorithm is designed based on the MacroTrend prediction to greatly reduce the write amount while improving the hit ratio. We conduct extensive experiments driven by a series of real-world traces and find that compared with LRU, MacroTrend can reduce the write amounts of NVM cache devices significantly with similar hit ratios, leading to longer NVM lifetime or less energy consumption.
Reference | Supplementary Material | Related Articles | Metrics
Robustness Assessment of Asynchronous Advantage Actor-Critic Based on Dynamic Skewness and Sparseness Computation: A Parallel Computing View
Tong Chen, Ji-Qiang Liu, He Li, Shuo-Ru Wang, Wen-Jia Niu, En-Dong Tong, Liang Chang, Qi Alfred Chen, Gang Li
Journal of Computer Science and Technology    2021, 36 (5): 1002-1021.   DOI: 10.1007/s11390-021-1217-z
Accepted: 26 July 2021

Abstract329)      PDF   
Reinforcement learning as autonomous learning is greatly driving artificial intelligence (AI) development to practical applications. Having demonstrated the potential to significantly improve synchronously parallel learning, the parallel computing based asynchronous advantage actor-critic (A3C) opens a new door for reinforcement learning. Unfortunately, the acceleration's influence on A3C robustness has been largely overlooked. In this paper, we perform the first robustness assessment of A3C based on parallel computing. By perceiving the policy's action, we construct a global matrix of action probability deviation and define two novel measures of skewness and sparseness to form an integral robustness measure. Based on such static assessment, we then develop a dynamic robustness assessing algorithm through situational whole-space state sampling of changing episodes. Extensive experiments with different combinations of agent number and learning rate are implemented on an A3C-based pathfinding application, demonstrating that our proposed robustness assessment can effectively measure the robustness of A3C, which can achieve an accuracy of 83.3%.
Reference | Supplementary Material | Related Articles | Metrics
Event-Based Semantics of UML 2.X Concurrent Sequence Diagrams for Formal Verification
Inès Mouakher, Fatma Dhaou, and J. Christian Attiogbé
Journal of Computer Science and Technology    2022, 37 (1): 4-28.   DOI: 10.1007/s11390-021-1673-5
Abstract318)      PDF   

UML 2.X sequence diagrams (SD) are among privileged scenarios-based approaches dealing with the complexity of modeling the behaviors of some current systems. However, there are several issues related to the standard semantics of UML 2.X SD proposed by the Object Management Group (OMG). They mainly concern ambiguities of the interpretation of SDs, and the computation of causal relations between events which is not specifically laid out. Moreover, SD is a semi-formal language, and it does not support the verification of the modeled system. This justifies the considerable number of research studies intending to define formal semantics of UML SDs. We proposed in our previous work semantics covering the most popular combined fragments (CF) of control-flow ALT, OPT, LOOP and SEQ, allowing to model alternative, optional, iterative and sequential behaviors respectively. The proposed semantics is based on partial order theory relations that permit the computation of the precedence relations between the events of an SD with nested CFs. We also addressed the issue of the evaluation of the interaction constraint (guard) for guarded CFs, and the related synchronization issue. In this paper, we first extend our semantics, proposed in our previous work; indeed, we propose new rules for the computation of causal relations for SD with PAR and STRICT CFs (dedicated to modeling concurrent and strict behaviors respectively) as well as their nesting. Then, we propose a transformational semantics in Event-B. Our modeling approach emphasizes computation of causal relations, guard handling and transformational semantics into Event-B. The transformation of UML 2.X SD into the formal method Event-B allows us to perform several kinds of verification including simulation, trace acceptance, verification of properties, and verification of refinement relation between SDs.

Reference | Supplementary Material | Related Articles | Metrics
CDM: Content Diffusion Model for Information-Centric Networks
Bo Chen, Liang Liu, Hua-Dong Ma
Journal of Computer Science and Technology    2021, 36 (6): 1431-1451.   DOI: 10.1007/s11390-021-0205-7
Accepted: 07 November 2021

Abstract311)      PDF   
This paper proposes the Content Diffusion Model (CDM) for modeling the content diffusion process in information-centric networking (ICN). CDM is inspired by the epidemic model and it provides a method of theoretical quantitative analysis for the content diffusion process in ICN. Specifically, CDM introduces the key functions to formalize the key factors that influence the content diffusion process, and thus it can construct the model via a simple but efficient way. Further, we derive CDM by using different combinations of those key factors and put them into several typical ICN scenarios, to analyze the characteristics during the diffusion process such as diffusion speed, diffusion scope, average fetching hops, changing and final state, which can greatly help to analyze the network performance and application design. A series of experiments are conducted to evaluate the efficacy and accuracy of CDM. The results show that CDM can accurately illustrate and model the content diffusion process in ICN.
Reference | Supplementary Material | Related Articles | Metrics
Document-Level Neural Machine Translation with Hierarchical Modeling of Global Context
Xin Tan, Long-Yin Zhang, and Guo-Dong Zhou
Journal of Computer Science and Technology    2022, 37 (2): 295-308.   DOI: 10.1007/s11390-021-0286-3
Accepted: 11 January 2021

Abstract308)      PDF   

Document-level machine translation (MT) remains challenging due to its difficulty in efficiently using document-level global context for translation. In this paper, we propose a hierarchical model to learn the global context for document-level neural machine translation (NMT). This is done through a sentence encoder to capture intra-sentence dependencies and a document encoder to model document-level inter-sentence consistency and coherence. With this hierarchical architecture, we feedback the extracted document-level global context to each word in a top-down fashion to distinguish different translations of a word according to its specific surrounding context. Notably, we explore the effect of three popular attention functions during the information backward-distribution phase to take a deep look into the global context information distribution of our model. In addition, since large-scale in-domain document-level parallel corpora are usually unavailable, we use a two-step training strategy to take advantage of a large-scale corpus with out-of-domain parallel sentence pairs and a small-scale corpus with in-domain parallel document pairs to achieve the domain adaptability. Experimental results of our model on Chinese-English and English-German corpora significantly improve the Transformer baseline by 4.5 BLEU points on average which demonstrates the effectiveness of our proposed hierarchical model in document-level NMT.

Reference | Supplementary Material | Related Articles | Metrics
A Blockchain-Based Protocol for Malicious Price Discrimination
Li-De Xue, Ya-Jun Liu, Wei Yang, Wei-Lin Chen, and Liu-Sheng Huang
Journal of Computer Science and Technology    2022, 37 (1): 266-276.   DOI: 10.1007/s11390-021-0583-x
Accepted: 29 December 2021

Abstract298)      PDF   
Serious price discrimination emerges with the development of big data and mobile networks, which harms the interests of consumers. To solve this problem, we propose a blockchain-based price consensus protocol to solve the malicious price discrimination faced by consumers. We give a mathematical definition of price discrimination, which requires the system to satisfy consistency and timeliness. The distributed blockchain can make the different pricing of merchants transparent to consumers, thus satisfying the consistency. The aging window mechanism of our protocol ensures that there is no disagreement between any node on the consensus on price or price discrimination within a fixed period, which meets the timeliness. Moreover, we evaluate its performance through a prototype implementation and experiments with up to 100 user nodes. Experimental results show that our protocol achieves all the expected goals like price transparency, consistency, and timeliness, and it additionally guarantees the consensus of the optimal price with a high probability.
Reference | Supplementary Material | Related Articles | Metrics
Correlated Differential Privacy of Multiparty Data Release in Machine Learning
Jian-Zhe Zhao, Xing-Wei Wang, Ke-Ming Mao, Chen-Xi Huang, Yu-Kai Su, and Yu-Chen Li
Journal of Computer Science and Technology    2022, 37 (1): 231-251.   DOI: 10.1007/s11390-021-1754-5
Accepted: 12 November 2021

Abstract281)      PDF   
Differential privacy (DP) is widely employed for the private data release in the single-party scenario. Data utility could be degraded with noise generated by ubiquitous data correlation, and it is often addressed by sensitivity reduction with correlation analysis. However, increasing multiparty data release applications present new challenges for existing methods. In this paper, we propose a novel correlated differential privacy of the multiparty data release (MP-CRDP). It effectively reduces the merged dataset's dimensionality and correlated sensitivity in two steps to optimize the utility. We also propose a multiparty correlation analysis technique. Based on the prior knowledge of multiparty data, a more reasonable and rigorous standard is designed to measure the correlated degree, reducing correlated sensitivity, and thus improve the data utility. Moreover, by adding noise to the weights of machine learning algorithms and query noise to the release data, MP-CRDP provides the release technology for both low-noise private data and private machine learning algorithms. Comprehensive experiments demonstrate the effectiveness and practicability of the proposed method on the utilized Adult and Breast Cancer datasets.
Reference | Supplementary Material | Related Articles | Metrics
SOOP: Efficient Distributed Graph Computation Supporting Second-Order Random Walks
Songjie Niu, Dongyan Zhou
Journal of Computer Science and Technology    2021, 36 (5): 985-1001.   DOI: 10.1007/s11390-021-1234-y
Abstract274)      PDF   
The second-order random walk has recently been shown to effectively improve the accuracy in graph analysis tasks. Existing work mainly focuses on centralized second-order random walk (SOW) algorithms. SOW algorithms rely on edge-to-edge transition probabilities to generate next random steps. However, it is prohibitively costly to store all the probabilities for large-scale graphs, and restricting the number of probabilities to consider can negatively impact the accuracy of graph analysis tasks. In this paper, we propose and study an alternative approach, SOOP (second-order random walks with on-demand probability computation), that avoids the space overhead by computing the edge-to-edge transition probabilities on demand during the random walk. However, the same probabilities may be computed multiple times when the same edge appears multiple times in SOW, incurring extra cost for redundant computation and communication. We propose two optimization techniques that reduce the complexity of computing edge-to-edge transition probabilities to generate next random steps, and reduce the cost of communicating out-neighbors for the probability computation, respectively. Our experiments on real-world and synthetic graphs show that SOOP achieves orders of magnitude better performance than baseline precompute solutions, and it can efficiently computes SOW algorithms on billion-scale graphs.
Reference | Supplementary Material | Related Articles | Metrics
Characterizing and Detecting Gas-Inefficient Patterns in Smart Contracts
Que-Ping Kong, Zi-Yan Wang, Yuan Huang, Xiang-Ping Chen, Xiao-Cong Zhou, Zi-Bin Zheng, and Gang Huang
Journal of Computer Science and Technology    2022, 37 (1): 67-82.   DOI: 10.1007/s11390-021-1674-4
Accepted: 03 December 2021

Abstract268)      PDF   
Ethereum blockchain is a new internetware with tens of millions of smart contracts running on it. Different from general programs, smart contracts are decentralized, tamper-resistant and permanently running. Moreover, to avoid resource abuse, Ethereum charges users for deploying and invoking smart contracts according to the size of contract and the operations executed by contracts. It is necessary to optimize smart contracts to save money. However, since developers are not familiar with the operating environment of smart contracts (i.e., Ethereum virtual machine) or do not pay attention to resource consumption during development, there are many optimization opportunities for smart contracts. To fill this gap, this paper defines six gas-inefficient patterns from more than 25,000 posts and proposes an optimization approach at the source code level to let users know clearly where the contract is optimized. To evaluate the prevalence and economic benefits of gas-inefficient patterns, this paper conducts an empirical study on more than 160,000 real smart contracts. The promising experimental results demonstrate that 52.75% of contracts contain at least one gas-inefficient pattern proposed in this paper. If these patterns are removed from the contract, at least 0.30 can be saved per contract.
Reference | Supplementary Material | Related Articles | Metrics
Self-Supervised Task Augmentation for Few-Shot Intent Detection
Peng-Fei Sun, Ya-Wen Ouyang, Ding-Jie Song, and Xin-Yu Dai
Journal of Computer Science and Technology    2022, 37 (3): 527-538.   DOI: 10.1007/s11390-022-2029-5
Abstract257)      PDF   
Few-shot intent detection is a practical challenge task, because new intents are frequently emerging and collecting large-scale data for them could be costly. Meta-learning, a promising technique for leveraging data from previous tasks to enable efficient learning of new tasks, has been a popular way to tackle this problem. However, the existing meta-learning models have been evidenced to be overfitting when the meta-training tasks are insufficient. To overcome this challenge, we present a novel self-supervised task augmentation with meta-learning framework, namely STAM. Firstly, we introduce the task augmentation, which explores two different strategies and combines them to extend meta-training tasks. Secondly, we devise two auxiliary losses for integrating self-supervised learning into meta-learning to learn more generalizable and transferable features. Experimental results show that STAM can achieve consistent and considerable performance improvement to existing state-of-the-art methods on four datasets.
Reference | Supplementary Material | Related Articles | Metrics
ovAFLow: Detecting Memory Corruption Bugs with Fuzzing-Based Taint Inference
Gen Zhang, Peng-Fei Wang, Tai Yue, Xiang-Dong Kong, Xu Zhou, and Kai Lu
Journal of Computer Science and Technology    2022, 37 (2): 405-422.   DOI: 10.1007/s11390-021-1600-9
Accepted: 15 November 2021

Abstract255)      PDF   

Grey-box fuzzing is an effective technology to detect software vulnerabilities, such as memory corruption. Previous fuzzers in detecting memory corruption bugs either use heavy-weight analysis, or use techniques which are not customized for memory corruption detection. In this paper, we propose a novel memory bug guided fuzzer, ovAFLow. To begin with, we broaden the memory corruption targets where we frequently identify bugs. Next, ovAFLow utilizes light-weight and effective methods to build connections between the fuzzing inputs and these corruption targets. Based on the connection results, ovAFLow uses customized techniques to direct the fuzzing process closer to memory corruption. We evaluate ovAFLow against state-of-the-art fuzzers, including AFL (american fuzzy lop), AFLFast, FairFuzz, QSYM, Angora, TIFF, and TortoiseFuzz. The evaluation results show better vulnerability detection ability of ovAFLow, and the performance overhead is acceptable. Moreover, we identify 12 new memory corruption bugs and two CVEs (common vulnerability exposures) with the help of ovAFLow.

Reference | Supplementary Material | Related Articles | Metrics
Self-Supervised Music Motion Synchronization Learning for Music-Driven Conducting Motion Generation
Fan Liu, De-Long Chen, Rui-Zhi Zhou, Sai Yang, and Feng Xu
Journal of Computer Science and Technology    2022, 37 (3): 539-558.   DOI: 10.1007/s11390-022-2030-z
Accepted: 10 March 2022

Abstract251)      PDF   
The correlation between music and human motion has attracted widespread research attention. Although recent studies have successfully generated motion for singers, dancers, and musicians, few have explored motion generation for orchestral conductors. The generation of music-driven conducting motion should consider not only the basic music beats, but also mid-level music structures, high-level music semantic expressions, and hints for different parts of orchestras (strings, woodwind, etc.). However, most existing conducting motion generation methods rely heavily on human-designed rules, which significantly limits the quality of generated motion. Therefore, we propose a novel Music Motion Synchronized Generative Adversarial Network (M2S-GAN), which generates motions according to the automatically learned music representations. More specifically, M2S-GAN is a cross-modal generative network comprising four components: 1) a music encoder that encodes the music signal; 2) a generator that generates conducting motion from the music codes; 3) a motion encoder that encodes the motion; 4) a discriminator that differentiates the real and generated motions. These four components respectively imitate four key aspects of human conductors: understanding music, interpreting music, precision and elegance. The music and motion encoders are first jointly trained by a self-supervised contrastive loss, and can thus help to facilitate the music motion synchronization during the following adversarial learning process. To verify the effectiveness of our method, we construct a large-scale dataset, named ConductorMotion100, which consists of unprecedented 100 hours of conducting motion data. Extensive experiments on ConductorMotion100 demonstrate the effectiveness of M2S-GAN. Our proposed approach outperforms various comparison methods both quantitatively and qualitatively. Through visualization, we show that our approach can generate plausible, diverse, and music-synchronized conducting motion.
Reference | Supplementary Material | Related Articles | Metrics
A Novel Probabilistic Saturating Counter Design for Secure Branch Predictor
Lu-Tan Zhao, Rui Hou, Kai Wang, Yu-Lan Su, Pei-Nan Li, Dan Meng
Journal of Computer Science and Technology    2021, 36 (5): 1022-1036.   DOI: 10.1007/s11390-021-1253-8
Abstract249)      PDF   
In a modern processor, branch prediction is crucial in effectively exploiting the instruction-level parallelism for high-performance execution. However, recently exposed vulnerabilities reveal the urgency to improve the security of branch predictors. The vital cause of the branch predictor vulnerabilities is that the update strategy of the saturating counter is deterministic. As a fundamental building block in a modern branch predictor, previous studies have paid too much attention to the performance and hardware cost and ignored the security of saturating counter. This leaves attackers with the opportunities to perform side-channel attacks on the branch predictor. This paper focuses on the saturating counter to explore a secure and lightweight design to mitigate branch predictor side-channel attacks. Instead of applying the isolation mechanism to branch predictor resources, we propose a novel probabilistic saturating counter design to confuse the attacker's perception of the victim's behaviour. It changes the conventional deterministic state transition function to a probabilistic state transition function. When a branch is committed, the conventional saturating counter needs to be updated about whether the prediction results are correct or not. While for the probabilistic saturating counter, the branch predictor determines whether the update is performed based on the update probability. The probabilistic saturating counter dramatically reduces the ability of the attacker to spy the saturating counter's state. Our analyses using a cycle-accurate simulator suggest that the proposed mechanism incurs 2.4% performance overhead and hardware cost while providing strong protection.
Reference | Supplementary Material | Related Articles | Metrics
Exploiting the Community Structure of Fraudulent Keywords for Fraud Detection in Web Search
Dong-Hui Yang, Zhen-Yu Li, Xiao-Hui Wang, Kavé Salamatian, Gao-Gang Xie
Journal of Computer Science and Technology    2021, 36 (5): 1167-1183.   DOI: 10.1007/s11390-021-0218-2
Accepted: 01 July 2021

Abstract236)      PDF   
Internet users heavily rely on web search engines for their intended information. The major revenue of search engines is advertisements (or ads). However, the search advertising suffers from fraud. Fraudsters generate fake traffic which does not reach the intended audience, and increases the cost of the advertisers. Therefore, it is critical to detect fraud in web search. Previous studies solve this problem through fraudster detection (especially bots) by leveraging fraudsters' unique behaviors. However, they may fail to detect new means of fraud, such as crowdsourcing fraud, since crowd workers behave in part like normal users. To this end, this paper proposes an approach to detecting fraud in web search from the perspective of fraudulent keywords. We begin by using a unique dataset of 150 million web search logs to examine the discriminating features of fraudulent keywords. Specifically, we model the temporal correlation of fraudulent keywords as a graph, which reveals a very well-connected community structure. Next, we design DFW (detection of fraudulent keywords) that mines the temporal correlations between candidate fraudulent keywords and a given list of seeds. In particular, DFW leverages several refinements to filter out non-fraudulent keywords that co-occur with seeds occasionally. The evaluation using the search logs shows that DFW achieves high fraud detection precision (99%) and accuracy (93%). A further analysis reveals several typical temporal evolution patterns of fraudulent keywords and the co-existence of both bots and crowd workers as fraudsters for web search fraud.
Reference | Supplementary Material | Related Articles | Metrics
TOAST: Automated Testing of Object Transformers in Dynamic Software Updates
Ze-Lin Zhao, Di Huang, and Xiao-Xing Ma
Journal of Computer Science and Technology    2022, 37 (1): 50-66.   DOI: 10.1007/s11390-021-1693-1
Abstract234)      PDF   
Dynamic software update (DSU) patches programs on the fly. It often involves the critical task of object transformation that converts live objects of the old-version program to their semantically consistent counterparts under the new-version program. This task is accomplished by invoking an object transformer on each stale object. However, a defective transformer failing to maintain consistency would cause errors or even crash the program. We propose TOAST (Test Object trAnSformaTion), an automated approach to detecting potential inconsistency caused by object transformers. TOAST first analyzes an update to identify multiple target methods and then adopts a fuzzer with specially designed inconsistency guidance to randomly generate object states to drive two versions of a target method. This creates two corresponding execution traces and a pair of old and new objects. TOAST finally performs object transformation to create a transformed object and detects inconsistency between it and the corresponding new object produced from scratch by the new program. Moreover, TOAST checks behavior inconsistency by comparing the return variables and exceptions of the two executions. Experimental evaluation on 130 updates with default transformers shows that TOAST is promising: it got 96.0% precision and 85.7% recall in state inconsistency detection, and 81.4% precision and 94.6% recall in behavior inconsistency detection. The inconsistency guidance improved the fuzzing efficiency by 14.1% for state inconsistency detection and 40.5% for behavior inconsistency detection.
Reference | Supplementary Material | Related Articles | Metrics
Write-Optimized B+ Tree Index Technology for Persistent Memory
Rui-Xiang Ma, Fei Wu, Bu-Rong Dong, Meng Zhang, Wei-Jun Li, Chang-Sheng Xie
Journal of Computer Science and Technology    2021, 36 (5): 1037-1050.   DOI: 10.1007/s11390-021-1247-6
Abstract232)      PDF   
Due to its low latency, byte-addressable, non-volatile, and high density, persistent memory (PM) is expected to be used to design a high-performance storage system. However, PM also has disadvantages such as limited endurance, thereby proposing challenges to traditional index technologies such as B+ tree. B+ tree is originally designed for dynamic random access memory (DRAM)-based or disk-based systems and has a large write amplification problem. The high write amplification is detrimental to a PM-based system. This paper proposes WO-tree, a write-optimized B+ tree for PM. WO-tree adopts an unordered write mechanism for the leaf nodes, and the unordered write mechanism can reduce a large number of write operations caused by maintaining the entry order in the leaf nodes. When the leaf node is split, WO-tree performs the cache line flushing operation after all write operations are completed, which can reduce frequent data flushing operations. WO-tree adopts a partial logging mechanism and it only writes the log for the leaf node. The inner node recognizes the data inconsistency by the read operation and the data can be recovered using the leaf node information, thereby significantly reducing the logging overhead. Furthermore, WO-tree adopts a lock-free search for inner nodes, which reduces the locking overhead for concurrency operation. We evaluate WO-tree using the Yahoo! Cloud Serving Benchmark (YCSB) workloads. Compared with traditional B+ tree, wB-tree, and Fast-Fair, the number of cache line flushes caused by WO-tree insertion operations is reduced by 84.7%, 22.2%, and 30.8%, respectively, and the execution time is reduced by 84.3%, 27.3%, and 44.7%, respectively.
Reference | Supplementary Material | Related Articles | Metrics
Apollo: Rapidly Picking the Optimal Cloud Configurations for Big Data Analytics Using a Data-Driven Approach
Yue-Wen Wu, Yuan-Jia Xu, Heng Wu, Lin-Gang Su, Wen-Bo Zhang, Hua Zhong
Journal of Computer Science and Technology    2021, 36 (5): 1184-1199.   DOI: 10.1007/s11390-021-0232-4
Accepted: 05 August 2021

Abstract231)      PDF   
Big data analytics applications are increasingly deployed on cloud computing infrastructures, and it is still a big challenge to pick the optimal cloud configurations in a cost-effective way. In this paper, we address this problem with a high accuracy and a low overhead. We propose Apollo, a data-driven approach that can rapidly pick the optimal cloud configurations by reusing data from similar workloads. We first classify 12 typical workloads in BigDataBench by characterizing pairwise correlations in our offline benchmarks. When a new workload comes, we run it with several small datasets to rank its key characteristics and get its similar workloads. Based on the rank, we then limit the search space of cloud configurations through a classification mechanism. At last, we leverage a hierarchical regression model to measure which cluster is more suitable and use a local search strategy to pick the optimal cloud configurations in a few extra tests. Our evaluation on 12 typical workloads in HiBench shows that compared with state-of-the-art approaches, Apollo can improve up to 30% search accuracy, while reducing as much as 50% overhead for picking the optimal cloud configurations.
Reference | Supplementary Material | Related Articles | Metrics
Community Smell Occurrence Prediction on Multi-Granularity by Developer-Oriented Features and Process Metrics
Zi-Jie Huang, Zhi-Qing Shao, Gui-Sheng Fan, Hui-Qun Yu, Xing-Guang Yang, and Kang Yang
Journal of Computer Science and Technology    2022, 37 (1): 182-206.   DOI: 10.1007/s11390-021-1596-1
Abstract230)      PDF   
Community smells are sub-optimal developer community structures that hinder productivity. Prior studies performed smell prediction and provided refactoring guidelines from a top-down aspect to help community shepherds. Simultaneously, refactoring smells also requires bottom-up effort from every developer. However, supportive measures and guidelines for them are not available at a fine-grained level. Since recent work revealed developers' personalities and working states could influence community smells' emergence and variation, we build prediction models with experience, sentiment, and development process features of developers considering three smells including Organizational Silo, Lone Wolf, and Bottleneck, as well as two related classes including smelly developer and smelly quitter. We predict the five classes in the individual granularity, and we also generate forecasts for the number of smelly developers in the community granularity. The proposed models achieve F-measures ranging from 0.73 to 0.92 in individual-wide within-project, time-wise, and cross-project prediction, and mean R2 performance of 0.68 in community-wide Smelly Developer prediction. We also exploit SHAP (SHapley Additive exPlanations) to assess feature importance to explain our predictors. In conclusion, we suggest developers with heavy workload should foster more frequent communication in a straightforward and polite way to build healthier communities, and we recommend community shepherds to use the forecasting model for refactoring planning.
Reference | Supplementary Material | Related Articles | Metrics
Quality of Service Support in RPL Networks: Standing State and Future Prospects
Ibrahim S. Alsukayti
Journal of Computer Science and Technology    2022, 37 (2): 344-368.   DOI: 10.1007/s11390-022-1027-y
Accepted: 05 March 2022

Abstract229)      PDF   

The development of IP-based Internet of Things (IoT) networks would facilitate more effective end-to-end IP network architectures, but it remains a challenge. Network routing needs to be effectively addressed in the IoT environments of scarce computational and energy resources. Accordingly, the Internet Engineering Task Force (IETF) has specified the IPv6 Routing Protocol for Low Power and Lossy Network (RPL) to provide a bespoke IPv6-based routing framework for IoT networks. However, RPL comes with no Quality of Service (QoS) support which is an essential requirement for many IoT applications. The network research community has introduced a number of research proposals enhancing RPL with different QoS solutions. This paper presents a review of these proposed solutions and aims to establish a firm understanding of recent QoS developments for RPL and possible areas for future IoT routing research. The focus is on comprehending the protocol and networking properties that can affect QoS performance in RPL networks. Consideration is also given to different objective functions developed for addressing varying QoS aspects such as throughput, delay, and packet loss. RPL is also extended in a number of QoS solutions following different approaches at the MAC, network, and application layers. However, there is still a need for further developments to address effective QoS support, particularly for dynamic RPL networks.

Reference | Supplementary Material | Related Articles | Metrics
Trace Semantics and Algebraic Laws for Total Store Order Memory Model
Li-Li Xiao, Hui-Biao Zhu, Qi-Wen Xu
Journal of Computer Science and Technology    2021, 36 (6): 1269-1290.   DOI: 10.1007/s11390-021-1616-1
Accepted: 07 November 2021

Abstract227)      PDF   
Modern multiprocessors deploy a variety of weak memory models (WMMs). Total Store Order (TSO) is a widely-used weak memory model in SPARC implementations and x86 architecture. It omits the store-load constraint by allowing each core to employ a write buffer. In this paper, we apply Unifying Theories of Programming (abbreviated as UTP) in investigating the trace semantics for TSO, acting in the denotational semantics style. A trace is expressed as a sequence of snapshots, which records the changes in registers, write buffers and the shared memory. All the valid execution results containing reorderings can be described after kicking out those that do not satisfy program order and modification order. This paper also presents a set of algebraic laws for TSO. We study the concept of head normal form, and every program can be expressed in the head normal form of the guarded choice which is able to model the execution of a program with reorderings. Then the linearizability of the TSO model is supported. Furthermore, we consider the linking between trace semantics and algebraic semantics. The linking is achieved through deriving trace semantics from algebraic semantics, and the derivation strategy under the TSO model is provided.
Reference | Supplementary Material | Related Articles | Metrics
Intent-Slot Correlation Modeling for Joint Intent Prediction and Slot Filling
Jun-Feng Fan, Mei-Ling Wang, Chang-Liang Li, Zi-Qiang Zhu, and Lu Mao
Journal of Computer Science and Technology    2022, 37 (2): 309-319.   DOI: 10.1007/s11390-020-0326-4
Accepted: 20 September 2020

Abstract225)      PDF   

Slot filling and intent prediction are basic tasks in capturing semantic frame of human utterances. Slots and intent have strong correlation for semantic frame parsing. For each utterance, a specific intent type is generally determined with the indication information of words having slot tags (called as slot words), and in reverse the intent type decides that words of certain categories should be used to fill as slots. However, the Intent-Slot correlation is rarely modeled explicitly in existing studies, and hence may be not fully exploited. In this paper, we model Intent-Slot correlation explicitly and propose a new framework for joint intent prediction and slot filling. Firstly, we explore the effects of slot words on intent by differentiating them from the other words, and we recognize slot words by solving a sequence labeling task with the bi-directional long short-term memory (BiLSTM) model. Then, slot recognition information is introduced into attention-based intent prediction and slot filling to improve semantic results. In addition, we integrate the Slot-Gated mechanism into slot filling to model dependency of slots on intent. Finally, we obtain slot recognition, intent prediction and slot filling by training with joint optimization. Experimental results on the benchmark Air-line Travel Information System (ATIS) and Snips datasets show that our Intent-Slot correlation model achieves state-of-the-art semantic frame performance with a lightweight structure.

Reference | Supplementary Material | Related Articles | Metrics
Tao Xie, Shengchao Qin, Wenhui Zhang
Journal of Computer Science and Technology    2021, 36 (6): 1229-1230.   DOI: 10.1007/s11390-021-0008-x
Abstract224)      PDF(pc) (188KB)(3)   
Related Articles | Metrics
Verification of Real Time Operating System Exception Management Based on SPARCv8
Zhi Ma, Lei Qiao, Meng-Fei Yang, Shao-Feng Li, Jin-Kun Zhang
Journal of Computer Science and Technology    2021, 36 (6): 1367-1387.   DOI: 10.1007/s11390-021-1644-x
Accepted: 16 November 2021

Abstract224)      PDF   
Exception management, as the lowest level function module of the operating system, is responsible for making abrupt changes in the control flow to react to exception events in the system. The correctness of the exception management is crucial to guaranteeing the safety of the whole system. However, existing formal verification projects have not fully considered the issues of exceptions at the assembly level. Especially for real-time operating systems, in addition to basic exception handling, there are nested exceptions and task switching by exceptions service routine. In our previous work, we used high-level abstraction to describe the basic elements of the exception management and verified correctness only at the requirement layer. Building on earlier work, this paper proposes EMS (Exception Management SPARCv8), a practical Hoare-style program framework to verify the exception management based on SPARCv8 (Scalable Processor Architecture Version 8) at the design layer. The framework describes the low-level details of the machine, such as registers and memory stack. It divides the execution logic of the exception management into six phases for comprehensive formal modeling. Taking the executing scenario of the real-time operating system SpaceOS on the Beidou-3 satellite as an example, we use the EMS framework to verify the exception management. All the formalization and proofs are implemented in the interactive theorem prover Coq.
Reference | Supplementary Material | Related Articles | Metrics
Checking Causal Consistency of MongoDB
Hong-Rong Ouyang, Heng-Feng Wei, Hai-Xiang Li, An-Qun Pan, and Yu Huang
Journal of Computer Science and Technology    2022, 37 (1): 128-146.   DOI: 10.1007/s11390-021-1662-8
Abstract221)      PDF   
MongoDB is one of the first commercial distributed databases that support causal consistency. Its implementation of causal consistency combines several research ideas for achieving scalability, fault tolerance, and security. Given its inherent complexity, a natural question arises: "Has MongoDB correctly implemented causal consistency as it claimed?'' To address this concern, the Jepsen team has conducted black-box testing of MongoDB. However, this Jepsen testing has several drawbacks in terms of specification, test case generation, implementation of causal consistency checking algorithms, and testing scenarios, which undermine the credibility of its reports. In this work, we propose a more thorough design of Jepsen testing of causal consistency of MongoDB. Specifically, we fully implement the causal consistency checking algorithms proposed by Bouajjani et al. and test MongoDB against three well-known variants of causal consistency, namely CC, CCv, and CM, under various scenarios including node failures, data movement, and network partitions. In addition, we develop formal specifications of causal consistency and their checking algorithms in TLA+ , and verify them using the TLC model checker. We also explain how TLA+ specification can be related to Jepsen testing.
Reference | Supplementary Material | Related Articles | Metrics
Accelerating Data Transfer in Dataflow Architectures Through a Look-Ahead Acknowledgment Mechanism
Yu-Jing Feng, De-Jian Li, Xu Tan, Xiao-Chun Ye, Dong-Rui Fan, Wen-Ming Li, Da Wang, Hao Zhang, and Zhi-Min Tang
Journal of Computer Science and Technology    2022, 37 (4): 942-959.   DOI: 10.1007/s11390-020-0555-6
Accepted: 17 December 2020

Abstract219)      PDF(pc) (1266KB)(52)   
The dataflow architecture, which is characterized by a lack of a redundant unified control logic, has been shown to have an advantage over the control-flow architecture as it improves the computational performance and power efficiency, especially of applications used in high-performance computing (HPC). Importantly, the high computational efficiency of systems using the dataflow architecture is achieved by allowing program kernels to be activated in a simultaneous manner. Therefore, a proper acknowledgment mechanism is required to distinguish the data that logically belongs to different contexts. Possible solutions include the tagged-token matching mechanism in which the data is sent before acknowledgments are received but retried after rejection, or a handshake mechanism in which the data is only sent after acknowledgments are received. However, these mechanisms are characterized by both inefficient data transfer and increased area cost. Good performance of the dataflow architecture depends on the efficiency of data transfer. In order to optimize the efficiency of data transfer in existing dataflow architectures with a minimal increase in area and power cost, we propose a Look-Ahead Acknowledgment (LAA) mechanism. LAA accelerates the execution flow by speculatively acknowledging ahead without penalties. Our simulation analysis based on a handshake mechanism shows that our LAA increases the average utilization of computational units by 23.9%, with a reduction in the average execution time by 17.4% and an increase in the average power efficiency of dataflow processors by 22.4%. Crucially, our novel approach results in a relatively small increase in the area and power consumption of the on-chip logic of less than 0.9%. In conclusion, the evaluation results suggest that Look-Ahead Acknowledgment is an effective improvement for data transfer in existing dataflow architectures.
Reference | Supplementary Material | Related Articles | Metrics
AMCheX: Accurate Analysis of Missing-Check Bugs for Linux Kernel
Ying-Jie Wang, Liang-Ze Yin, Wei Dong
Journal of Computer Science and Technology    2021, 36 (6): 1325-1341.   DOI: 10.1007/s11390-021-1666-4
Accepted: 30 September 2021

Abstract216)      PDF   
The Linux kernel adopts a large number of security checks to prevent security-sensitive operations from being executed under unsafe conditions. If a security-sensitive operation is unchecked, a missing-check issue arises. Missing check is a class of severe bugs in software programs especially in operating system kernels, which may cause a variety of security issues, such as out-of-bound accesses, permission bypasses, and privilege escalations. Due to the lack of security specifications, how to automatically identify security-sensitive operations and their required security checks in the Linux kernel becomes a challenge for missing-check analysis. In this paper, we present an accurate missing-check analysis method for Linux kernel, which can automatically infer possible security-sensitive operations. Particularly, we first automatically identify all possible security check functions of Linux. Then according to their callsites, a two-direction analysis method is leveraged to identify possible security-sensitive operations. A missing-check bug is reported when the security-sensitive operation is not protected by its corresponding security check. We have implemented our method as a tool, named AMCheX, on top of the LLVM (Low Level Virtual Machine) framework and evaluated it on the Linux kernel. AMCheX reported 12 new missing-check bugs which can cause security issues. Five of them have been confirmed by Linux maintainers.
Reference | Supplementary Material | Related Articles | Metrics
Chao Li, Yun Liang
Journal of Computer Science and Technology    2021, 36 (5): 961-962.   DOI: 10.1007/s11390-021-0006-z
Abstract214)      PDF(pc) (138KB)(3)   
Related Articles | Metrics
Imputing DNA Methylation by Transferred Learning Based Neural Network
Xin-Feng Wang, Xiang Zhou, Jia-Hua Rao, Zhu-Jin Zhang, and Yue-Dong Yang
Journal of Computer Science and Technology    2022, 37 (2): 320-329.   DOI: 10.1007/s11390-021-1174-6
Accepted: 18 February 2022

Abstract212)      PDF   

DNA methylation is one important epigenetic type to play a vital role in many diseases including cancers. With the development of the high-throughput sequencing technology, there is much progress to disclose the relations of DNA methylation with diseases. However, the analyses of DNA methylation data are challenging due to the missing values caused by the limitations of current techniques. While many methods have been developed to impute the missing values, these methods are mostly based on the correlations between individual samples, and thus are limited for the abnormal samples in cancers. In this study, we present a novel transfer learning based neural network to impute missing DNA methylation data, namely the TDimpute-DNAmeth method. The method learns common relations between DNA methylation from pan-cancer samples, and then fine-tunes the learned relations over each specific cancer type for imputing the missing data. Tested on 16 cancer datasets, our method was shown to outperform other commonly-used methods. Further analyses indicated that DNA methylation is related to cancer survival and thus can be used as a biomarker of cancer prognosis.

Reference | Supplementary Material | Related Articles | Metrics

ISSN 1000-9000(Print)

CN 11-2296/TP

Editorial Board
Author Guidelines
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
  Copyright ©2015 JCST, All Rights Reserved