Bimonthly    Since 1986
ISSN 1000-9000(Print)
/1860-4749(Online)
CN 11-2296/TP
Indexed in:
SCIE, Ei, INSPEC, JST, AJ, MR, CA, DBLP, etc.
Publication Details
Edited by: Editorial Board of Journal of Computer Science and Technology
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Published by: SCIENCE PRESS, BEIJING, CHINA
Distributed by:
China: All Local Post Offices
Other Countries: Springer
 
Top Read Articles
Analyzing and Optimizing Packet Corruption in RDMA Network
Yi-Xiao Gao, Chen Tian, Wei Chen, Duo-Xing Li, Jian Yan, Yuan-Yuan Gong, Bing-Quan Wang, Tao Wu, Lei Han, Fa-Zhi Qi, Shan Zeng, Wan-Chun Dou, and Gui-Hai Chen
Journal of Computer Science and Technology    2022, 37 (4): 743-762.   DOI: 10.1007/s11390-022-2123-8
Abstract (949) | PDF
Remote direct memory access (RDMA) has become one of the state-of-the-art high-performance network technologies in datacenters. The reliable transport of RDMA is designed based on a lossless underlying network and cannot endure a high packet loss rate. However, besides switch buffer overflow, there is another kind of packet loss in the RDMA network, i.e., packet corruption, which has not been discussed in depth. Packet corruption incurs long application tail latency by causing timeout retransmissions. The challenges in solving packet corruption in the RDMA network include: 1) packet corruption is inevitable even with remedial mechanisms and 2) RDMA hardware is not programmable. This paper proposes designs that can guarantee the expected tail latency of applications in the presence of packet corruption. The key idea is controlling the probability of timeout events caused by packet corruption by transforming timeout retransmissions into out-of-order retransmissions. We build a probabilistic model to estimate the occurrence probabilities and real effects of the corruption patterns. We implement the proposed mechanisms with the help of programmable switches and the zero-byte message RDMA feature. We build an ns-3 simulation and implement the optimization mechanisms on our testbed. The simulation and testbed experiments show that the optimizations can decrease the flow completion time by several orders of magnitude with less than 3% bandwidth cost at different packet corruption rates.
Reference | Supplementary Material | Related Articles | Metrics
GAM: A GPU-Accelerated Algorithm for MaxRS Queries in Road Networks
Jian Chen, Kai-Qi Zhang, Tian Ren, Zhen-Qing Wu, and Hong Gao
Journal of Computer Science and Technology    2022, 37 (5): 1005-1025.   DOI: 10.1007/s11390-022-2330-3
Abstract (923) | PDF
In smart phones, vehicles and wearable devices, GPS sensors are ubiquitous and collect a large amount of valuable spatial data from the real world. Given a set of weighted points and a rectangle r in the space, a maximizing range sum (MaxRS) query is to find the position of r that maximizes the total weight of the points covered by r (i.e., the range sum). It has a wide spectrum of applications in spatial crowdsourcing, facility location and traffic monitoring. Most of the existing research focuses on the Euclidean space; however, in real life, the user's moving route is constrained by the road network, and the existing MaxRS query algorithms in road networks are inefficient. In this paper, we propose a novel GPU-accelerated algorithm, namely, GAM, to tackle MaxRS queries in road networks efficiently in two phases. In phase 1, we partition the entire road network into many small cells by a grid and theoretically prove the correctness of parallel query results by grid shifting, and then we propose an effective multi-grained pruning technique, by which the majority of cells can be pruned without further checking. In phase 2, we design a GPU-friendly storage structure, the cell-based road network (CRN), and a two-level parallel framework to compute the final result in the remaining cells. Finally, we conduct extensive experiments on two real-world road networks, and the experimental results demonstrate that GAM is on average one order of magnitude faster than state-of-the-art competitors, and the maximum speedup reaches about 55 times.
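The cell-level pruning idea in phase 1 can be illustrated with a small sketch: partition the points into grid cells, compute an optimistic bound on the range sum any placement near a cell could achieve, and discard cells whose bound cannot beat the best result found so far. The sketch below is a simplified Euclidean-grid illustration under assumed bounds, not the paper's road-network algorithm; all names are hypothetical.

```python
# Toy grid-based pruning for MaxRS (Euclidean illustration, not the GAM algorithm):
# a cell is pruned if even an optimistic bound on the weight reachable from it
# cannot beat the best range sum found so far.
from collections import defaultdict

def prune_cells(points, cell_size, best_so_far):
    """points: list of (x, y, w); returns cells that may still contain the optimum."""
    cells = defaultdict(float)
    for x, y, w in points:
        cells[(int(x // cell_size), int(y // cell_size))] += w

    survivors = []
    for (cx, cy) in cells:
        # Optimistic bound: total weight in the 3x3 neighborhood of the cell,
        # which upper-bounds any rectangle of side <= cell_size placed nearby.
        bound = sum(cells.get((cx + dx, cy + dy), 0.0)
                    for dx in (-1, 0, 1) for dy in (-1, 0, 1))
        if bound > best_so_far:
            survivors.append((cx, cy))
    return survivors

pts = [(1.0, 1.2, 3.0), (1.4, 1.1, 2.0), (9.0, 9.0, 1.0)]
print(prune_cells(pts, cell_size=2.0, best_so_far=1.5))  # the sparse cell is pruned
```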
Reference | Supplementary Material | Related Articles | Metrics
PLQ: An Efficient Approach to Processing Pattern-Based Log Queries
Jia Chen, Peng Wang, Fan Qiao, Shi-Qing Du, and Wei Wang
Journal of Computer Science and Technology    2022, 37 (5): 1239-1254.   DOI: 10.1007/s11390-020-0653-5
Accepted: 30 November 2020

Abstract (890) | PDF
As software systems grow more and more complex, extensive techniques have been proposed to analyze log data and obtain insight into the system status. However, during log data analysis, considerable manual effort is spent searching for interesting or informative log patterns in a huge volume of log data, through so-called pattern-based queries. Although existing log management tools and DBMSs can also support pattern-based queries, they suffer from low efficiency. To deal with this problem, we propose a novel approach named PLQ (Pattern-based Log Query). First, PLQ organizes logs into disjoint chunks and builds chunk-wise bitmap indexes for log types and attribute values. Then, based on the bitmap indexes, PLQ finds candidate logs with a set of efficient bit-wise operations. Finally, PLQ fetches candidate logs and validates them according to the queried pattern. Extensive experiments are conducted on real-life datasets. According to the experimental results, compared with existing log management systems, PLQ is more efficient in querying log patterns and has a higher pruning rate for filtering irrelevant logs. Moreover, since the ratio of the index size to the data size does not exceed 2.5% for log datasets of different sizes, PLQ has high scalability.
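As a rough illustration of the chunk-wise bitmap filtering step, the sketch below builds one bitmap per log type within a chunk and uses bit-wise operations to produce candidate positions that still need validation against the raw logs. The data layout and function names are illustrative assumptions, not PLQ's actual implementation.

```python
# Minimal sketch of chunk-wise bitmap filtering in the spirit of PLQ
# (names and structure are illustrative assumptions, not the paper's code).
def build_bitmaps(chunk_log_types):
    """One integer bitmap per log type: bit i is set if log i in the chunk has that type."""
    bitmaps = {}
    for i, t in enumerate(chunk_log_types):
        bitmaps[t] = bitmaps.get(t, 0) | (1 << i)
    return bitmaps

def candidate_positions(bitmaps, pattern):
    """Cheap bit-wise pre-filter: positions where the first event of `pattern` occurs,
    provided every other pattern event also occurs somewhere in the chunk."""
    if any(t not in bitmaps for t in pattern):
        return []
    first = bitmaps[pattern[0]]
    return [i for i in range(first.bit_length()) if first >> i & 1]

chunk = ["login", "query", "error", "login", "query"]
bm = build_bitmaps(chunk)
print(candidate_positions(bm, ("login", "error")))  # [0, 3] -> validated against raw logs next
```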
Reference | Supplementary Material | Related Articles | Metrics
Tetris: A Heuristic Static Memory Management Framework for Uniform Memory Multicore Neural Network Accelerators
Xiao-Bing Chen, Hao Qi, Shao-Hui Peng, Yi-Min Zhuang, Tian Zhi, and Yun-Ji Chen
Journal of Computer Science and Technology    2022, 37 (6): 1255-1270.   DOI: 10.1007/s11390-021-1213-3
Accepted: 31 May 2021

Abstract (782) | PDF
Uniform memory multicore neural network accelerators (UNNAs) furnish huge computing power to emerging neural network applications. Meanwhile, with neural network architectures going deeper and wider, the limited memory capacity has become a constraint on deploying models on UNNA platforms. Therefore, efficiently managing memory space and reducing workload footprints are urgent concerns. In this paper, we propose Tetris: a heuristic static memory management framework for UNNA platforms. Tetris reconstructs execution flows and synchronization relationships among cores to analyze each tensor's liveness interval. Then the memory management problem is converted into a sequence permutation problem. Tetris uses a genetic algorithm to explore the permutation space to optimize the memory management strategy and reduce memory footprints. We evaluate several typical neural networks and the experimental results demonstrate that Tetris outperforms the state-of-the-art memory allocation methods, and achieves an average memory reduction ratio of 91.9% and 87.9% for a quad-core and a 16-core Cambricon-X platform, respectively.
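The reduction to a sequence permutation problem can be sketched as follows: a first-fit allocator assigns each tensor an offset in a given order (respecting liveness overlaps), and a genetic search permutes that order to lower the peak footprint. The toy below uses swap mutation only and a made-up cost model; it is not the Tetris implementation.

```python
# Toy sketch: memory planning as a permutation search (not the Tetris code).
import random

def peak_memory(order, tensors):
    """First-fit offset assignment for tensors placed in the given order.
    tensors[i] = (size, live_start, live_end); returns the peak address used."""
    placed = []  # (offset, size, start, end)
    for i in order:
        size, s, e = tensors[i]
        busy = sorted((o, sz) for o, sz, s2, e2 in placed if not (e <= s2 or e2 <= s))
        offset = 0
        for o, sz in busy:
            if offset + size <= o:
                break                      # fits in the gap before this block
            offset = max(offset, o + sz)
        placed.append((offset, size, s, e))
    return max(o + sz for o, sz, _, _ in placed)

def ga_search(tensors, pop=30, gens=50):
    n = len(tensors)
    population = [random.sample(range(n), n) for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda p: peak_memory(p, tensors))
        parents = population[:pop // 2]
        children = []
        for p in parents:
            c = p[:]
            a, b = random.sample(range(n), 2)
            c[a], c[b] = c[b], c[a]        # swap mutation on the placement order
            children.append(c)
        population = parents + children
    best = min(population, key=lambda p: peak_memory(p, tensors))
    return best, peak_memory(best, tensors)

tensors = [(64, 0, 2), (32, 1, 3), (64, 2, 4), (16, 0, 4)]  # (size, start, end)
print(ga_search(tensors))
```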
Reference | Supplementary Material | Related Articles | Metrics
Gaze-Assisted Viewport Control for 360° Video on Smartphone
Linfeng Shen, Yuchi Chen, and Jiangchuan Liu
Journal of Computer Science and Technology    2022, 37 (4): 906-918.   DOI: 10.1007/s11390-022-2037-5
Abstract (598) | PDF
360° video has become one of the major media formats in recent years, providing an immersive experience for viewers with more interactions compared with traditional videos. Most of today's implementations rely on bulky Head-Mounted Displays (HMDs) or require touch screen operations for interactive display, which are not only expensive but also inconvenient for viewers. In this paper, we demonstrate that interactive 360° video streaming can be done with hints from gaze movement detected by the front camera of today's mobile devices (e.g., a smartphone). We design a lightweight real-time gaze point tracking method for this purpose. We integrate it with the streaming module and apply a dynamic margin adaptation algorithm to minimize the overall energy consumption for battery-constrained mobile devices. Our experiments on state-of-the-art smartphones show the feasibility of our solution and its energy efficiency toward cost-effective real-time 360° video streaming.
Reference | Supplementary Material | Related Articles | Metrics
Universal Image Steganalysis Based on Convolutional Neural Network with Global Covariance Pooling
Xiao-Qing Deng, Bo-Lin Chen, Wei-Qi Luo, and Da Luo
Journal of Computer Science and Technology    2022, 37 (5): 1134-1145.   DOI: 10.1007/s11390-021-0572-0
Accepted: 29 June 2021

Abstract (571) | PDF
Recently, steganalytic methods based on deep learning have achieved much better performance than traditional methods based on handcrafted features. However, most existing methods based on deep learning are specially designed for one image domain (i.e., spatial or JPEG), and they often take a long time to train. To strike a balance between the detection performance and the training time, in this paper, we propose an effective and relatively fast steganalytic network called US-CovNet (Universal Steganalytic Covariance Network) for both the spatial and JPEG domains. To this end, we carefully design several important components of US-CovNet that significantly affect the detection performance, including the high-pass filter set, the shortcut connection and the pooling layer. Extensive experimental results show that compared with the current best steganalytic networks (i.e., SRNet and J-YeNet), US-CovNet achieves state-of-the-art results for detecting spatial steganography and has competitive performance for detecting JPEG steganography. For example, the detection accuracy of US-CovNet is at least 0.56% higher than that of SRNet in the spatial domain. In the JPEG domain, US-CovNet performs slightly worse than J-YeNet in some cases, with a degradation of less than 0.78%. However, the training time of US-CovNet is significantly reduced, to less than 1/4 and 1/2 that of SRNet and J-YeNet, respectively.
Reference | Supplementary Material | Related Articles | Metrics
SOCA-DOM: A Mobile System-on-Chip Array System for Analyzing Big Data on the Move
Le-Le Li, Jiang-Yi Liu, Jian-Ping Fan, Xue-Hai Qian, Kai Hwang, Yeh-Ching Chung, and Zhi-Bin Yu
Journal of Computer Science and Technology    2022, 37 (6): 1271-1289.   DOI: 10.1007/s11390-022-1087-z
Accepted: 25 April 2022

Abstract (523) | PDF
Recently, analyzing big data on the move has been booming. It requires that the hardware resources be low volume, low power, light in weight, high-performance, and highly scalable, whereas the management software should be flexible and consume few hardware resources. To meet these requirements, we present a system named SOCA-DOM that encompasses a mobile system-on-chip array architecture and a two-tier "software-defined" resource manager named Chameleon. First, we design an Ethernet communication board to support an array of mobile system-on-chips. Second, we propose a two-tier software architecture for Chameleon to make it flexible. Third, we devise data, configuration, and control planes for Chameleon to make it "software-defined" and in turn consume hardware resources on demand. Fourth, we design an accurate synthetic metric that represents the computational power of a computing node. We employ 12 Apache Spark benchmarks to evaluate SOCA-DOM. Surprisingly, SOCA-DOM consumes up to 9.4x less CPU resources and 13.5x less memory than Mesos, an existing resource manager. In addition, we show that a 16-node SOCA-DOM consumes up to 4x less energy than two standard Xeon servers. Based on the results, we conclude that an array architecture with fine-grained hardware resources and a software-defined resource manager works well for analyzing big data on the move.
Reference | Supplementary Material | Related Articles | Metrics
FlexPDA: A Flexible Programming Framework for Deep Learning Accelerators
Lei Liu, Xiu Ma, Hua-Xiao Liu, Guang-Li Li, and Lei Liu
Journal of Computer Science and Technology    2022, 37 (5): 1200-1220.   DOI: 10.1007/s11390-021-1406-9
Accepted: 18 September 2021

Abstract (506) | PDF
There are a wide variety of intelligence accelerators with promising performance and energy efficiency, deployed in a broad range of applications such as computer vision and speech recognition. However, programming productivity hinders the deployment of deep learning accelerators. The low-level library invoked by the high-level deep learning framework, which supports end-to-end execution of a given model, is designed to reduce the programming burden on intelligence accelerators. Unfortunately, it is inflexible for developers to build a network model for every deep learning application, which probably brings unnecessary repetitive implementation. In this paper, a flexible and efficient programming framework for deep learning accelerators, FlexPDA, is proposed, which provides more optimization opportunities than the low-level library and realizes quick transplantation of applications to intelligence accelerators for fast upgrades. We evaluate FlexPDA by using 10 representative operators selected from deep learning algorithms and an end-to-end network. The experimental results validate the effectiveness of FlexPDA, which achieves an end-to-end performance improvement of 1.620x over the low-level library.
Reference | Supplementary Material | Related Articles | Metrics
The Vibrant Field of Parallel and Distributed Computing — Scan the Special Issue in Honor of Professor Kai Hwang’s 80th Birthday
Guo-Jie Li
Journal of Computer Science and Technology    2023, 38 (1): 1-2.   DOI: 10.1007/s11390-023-0001-7
Abstract (490) | PDF (156KB) (333)
    It is my great pleasure to write this editorial for the special issue in honor of Professor Kai Hwang's 80th birthday. The articles are written by Professor Hwang's academic descendants and colleagues, to pay tribute to his decades of contributions to the vibrant field of parallel and distributed computing.
    Professor Hwang obtained his Ph.D. degree in electrical engineering and computer science in 1972, from the University of California at Berkeley. He taught for 44 years at Purdue University and the University of Southern California. He currently serves as Presidential Chair Professor of Computer Science and Engineering at the Chinese University of Hong Kong (Shenzhen). In his five decades of academic career, Professor Hwang has supervised 21 Ph.D. students, authored 10 textbooks, and published over 260 scientific papers. He served as the founding Editor-in-Chief of the Journal of Parallel and Distributed Computing (JPDC), an influential international journal in this field.
    A much-told tale in China's computer science community is that in 2005, 2009 and 2018, the China Computer Federation (CCF) conferred its prestigious Outstanding Achievement Award (Award for Overseas Outstanding Contributions) to Kai Hwang, his Ph.D. student Lionel Ni, and Ni's Ph.D. student Xian-He Sun. This special issue shows a deeper academic lineage of five generations: Hwang-Ni-Sun-Cameron-Ge.
    Lionel Ni serves as Chair Professor and Founding President of the Hong Kong University of Science and Technology (Guangzhou). He contributes two papers to this special issue. First, Ni's team presents HXPY, a parallel computing software package for financial time-series data processing that is orders of magnitude faster than its counterparts. Second, Ni and his colleagues present a comprehensive and timely survey on ubiquitous WiFi and acoustic sensing research, a growing subfield of distributed computing.
    Xian-He Sun is a University Distinguished Professor at the Illinois Institute of Technology and the Editor-in-Chief of IEEE Transactions on Parallel and Distributed Systems. His team has contributed a comprehensive review of the memory-bounded speedup model, which is called Sun-Ni's law in Kai Hwang's textbooks.
    Kirk W. Cameron is a professor at Virginia Tech and a pioneer in green HPC. His team presents a retrospective on scalability beyond Amdahl's law, especially how power-performance measurement and modeling at scale influenced server and supercomputer design.
    Rong Ge is a Dean's Distinguished Professor in the School of Computing at Clemson University. Her team has devised a vision called the paradigm of power-bounded HPC, which is different from previous low-power computing and power-aware computing paradigms.
    Interestingly, three other first-generation Ph.D. students of Professor Hwang have also contributed to this issue. Zhi-Wei Xu is a professor at the University of Chinese Academy of Sciences and the Editor-in-Chief of the Journal of Computer Science and Technology. His team presents Information Superbahn, a perspective on future computing utility with low entropy and high goodput characteristics.
    Another academic star spawned by Hwang is Ahmed Louri, a chair professor at George Washington University and the Editor-in-Chief of IEEE Transactions on Computers. His team presents GShuttle, a graph convolutional neural network acceleration scheme that minimizes off-chip DRAM and on-chip SRAM accesses.
    Yet another is Dhabaleswar K. Panda, a professor at Ohio State University and an influential HPC expert. His MVAPICH software has been used by thousands of organizations worldwide. His team is credited with a cutting-edge study of communication performance on the world's first exascale supercomputer with the most recent interconnect (Slingshot).
    Not to be overlooked are two other second-generation Ph.D. students who have contributed review articles. Yun-Hao Liu is a chair professor at Tsinghua University, Beijing, and the Editor-in-Chief of ACM Transactions on Sensor Networks. His team has presented a timely survey of clock synchronization techniques for Industrial Internet scenarios.
    Xiaoyi Lu is an assistant professor at the University of California at Merced. His team has come up with a comprehensive survey on xCCL, a host of industry-led collective communication libraries for deep learning, and has answered why the industry has chosen xCCL instead of classic MPI.
    The parade of Hwang-nurtured talents is hardly complete without Hai Jin as a representative of the many postdoctoral researchers who worked with Professor Hwang. Dr. Jin is a chair professor at Huazhong University of Science and Technology and a co-Editor-in-Chief of Computer Systems Science and Engineering, an open-access journal. His team has invented a new method for Chinese entity linking, a natural language processing problem that is an important workload for parallel and distributed systems.
    Wei-Min Zheng is a professor at Tsinghua University, Beijing, and a past president of the China Computer Federation. It is he who, as a long-time colleague, translated Hwang's textbook into Chinese. His team has presented a perspective on a unified programming model for heterogeneous high-performance computers, a fundamental issue in future parallel and distributed computing.
Related Articles | Metrics
Experiments and Analyses of Anonymization Mechanisms for Trajectory Data Publishing
She Sun, Shuai Ma, Jing-He Song, Wen-Hai Yue, Xue-Lian Lin, and Tiejun Ma
Journal of Computer Science and Technology    2022, 37 (5): 1026-1048.   DOI: 10.1007/s11390-022-2409-x
Abstract (471) | PDF
With the advancing of location-detection technologies and the increasing popularity of mobile phones and other location-aware devices, trajectory data is continuously growing. While large-scale trajectories provide opportunities for various applications, the locations in trajectories pose a threat to individual privacy. Recently, there has been an interesting debate on the reidentifiability of individuals in the Science magazine. The main finding of Sánchez et al. is exactly opposite to that of De Montjoye et al., which raises the first question: "what is the true situation of the privacy preservation for trajectories in terms of reidentification?'' Furthermore, it is known that anonymization typically causes a decline of data utility, and anonymization mechanisms need to consider the trade-off between privacy and utility. This raises the second question: "what is the true situation of the utility of anonymized trajectories?'' To answer these two questions, we conduct a systematic experimental study, using three real-life trajectory datasets, five existing anonymization mechanisms (i.e., identifier anonymization, grid-based anonymization, dummy trajectories, k-anonymity and ε-differential privacy), and two practical applications (i.e., travel time estimation and window range queries). Our findings reveal the true situation of the privacy preservation for trajectories in terms of reidentification and the true situation of the utility of anonymized trajectories, and essentially close the debate between De Montjoye et al. and Sánchez et al. To the best of our knowledge, this study is among the first systematic evaluation and analysis of anonymized trajectories on the individual privacy in terms of unicity and on the utility in terms of practical applications.
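The unicity measure used above can be sketched as the fraction of users who are uniquely re-identified by a handful of points drawn from their own trajectory. The sketch below is a simplified illustration with hypothetical data, not the paper's evaluation code.

```python
# Toy unicity estimate: how often do k random points of a user's own trajectory
# match that user and no one else? (Illustrative only.)
import random

def unicity(trajectories, k, trials=200):
    """trajectories: {user: set of (location, time) points}. Returns a fraction in [0, 1]."""
    users = list(trajectories)
    unique = 0
    for _ in range(trials):
        u = random.choice(users)
        if len(trajectories[u]) < k:
            continue
        sample = set(random.sample(sorted(trajectories[u]), k))
        matches = [v for v in users if sample <= trajectories[v]]
        unique += (matches == [u])
    return unique / trials

data = {
    "alice": {("cell_12", 8), ("cell_40", 9), ("cell_7", 18)},
    "bob":   {("cell_12", 8), ("cell_40", 9), ("cell_3", 18)},
}
print(unicity(data, k=2))
```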
Reference | Supplementary Material | Related Articles | Metrics
RV16: An Ultra-Low-Cost Embedded RISC-V Processor Core
Yuan-Hu Cheng, Li-Bo Huang, Yi-Jun Cui, Sheng Ma, Yong-Wen Wang, and Bing-Cai Sui
Journal of Computer Science and Technology    2022, 37 (6): 1307-1319.   DOI: 10.1007/s11390-022-0910-x
Accepted: 07 May 2022

Abstract (464) | PDF
Embedded and Internet of Things (IoT) devices have extremely strict requirements on the area and power consumption of the processor because of the limitations of their working environments. To reduce the overhead of the embedded processor as much as possible, this paper designs and implements a configurable 32-bit in-order RISC-V processor core based on a 16-bit data path and units, named RV16. The evaluation results show that, compared with a traditional 32-bit RISC-V processor with similar features, RV16 consumes fewer hardware resources and less power. The maximum performance of RV16 running the Dhrystone and CoreMark benchmarks is 0.92 DMIPS/MHz and 1.51 CoreMark/MHz, respectively, reaching 75% and 71% of that of traditional 32-bit processors. Moreover, a properly configured RV16 running programs also consumes less energy than a traditional 32-bit processor.
Reference | Supplementary Material | Related Articles | Metrics
Accelerating DAG-Style Job Execution via Optimizing Resource Pipeline Scheduling
Yubin Duan, Ning Wang, and Jie Wu
Journal of Computer Science and Technology    2022, 37 (4): 852-868.   DOI: 10.1007/s11390-021-1488-4
Accepted: 23 November 2021

Abstract (459) | PDF
The volume of information that needs to be processed in big data clusters increases rapidly nowadays. It is critical to execute the data analysis in a time-efficient manner. However, simply adding more computation resources may not speed up the data analysis significantly. Data analysis jobs usually consist of multiple stages which are organized as a directed acyclic graph (DAG). The precedence relationships between stages cause scheduling challenges. General DAG scheduling is a well-known NP-hard problem. Moreover, we observe that in some parallel computing frameworks such as Spark, the execution of a stage in a DAG contains multiple phases that use different resources. We notice that carefully arranging the execution of those resources in a pipeline can reduce their idle time and improve the average resource utilization. Therefore, we propose a resource pipeline scheme with the objective of minimizing the job makespan. For perfectly parallel stages, we propose a contention-free scheduler with detailed theoretical analysis. Moreover, we extend the contention-free scheduler to three-phase stages, considering that the computation phase of some stages can be partitioned. Additionally, we are aware that job stages in real-world applications are usually not perfectly parallel, and we need to frequently adjust the parallelism levels during the DAG execution. Considering that reinforcement learning (RL) techniques can adjust the scheduling policy on the fly, we investigate an RL-based scheduler for online arrival jobs. The RL-based scheduler can adjust the resource contention adaptively. We evaluate both the contention-free and RL-based schedulers on a Spark cluster. In the evaluation, a real-world cluster trace dataset is used to simulate different DAG styles. Evaluation results show that our pipelined scheme can significantly improve CPU and network utilization.
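The benefit of pipelining two-phase stages can be illustrated with the classical two-machine flow-shop view: if every stage first occupies the network (fetch) and then the CPU (compute), ordering stages by Johnson's rule minimizes the makespan of the pipeline. This is a textbook illustration of the general idea, not the paper's contention-free or RL-based scheduler.

```python
def johnsons_rule(stages):
    """stages[i] = (fetch_time, compute_time). Returns a stage order that minimizes
    the makespan when each stage fetches (resource 1) before it computes (resource 2)."""
    order_front, order_back = [], []
    for i in sorted(range(len(stages)), key=lambda j: min(stages[j])):
        fetch, compute = stages[i]
        if fetch <= compute:
            order_front.append(i)       # fetch-light stages go early
        else:
            order_back.insert(0, i)     # compute-light stages go late
    return order_front + order_back

def makespan(order, stages):
    t_fetch = t_compute = 0
    for i in order:
        fetch, compute = stages[i]
        t_fetch += fetch                               # resource 1 runs back-to-back
        t_compute = max(t_compute, t_fetch) + compute  # resource 2 waits for its fetch
    return t_compute

stages = [(4, 2), (1, 3), (5, 4)]          # (network fetch, CPU compute) per stage
best = johnsons_rule(stages)
print(best, makespan(best, stages))        # [1, 2, 0] with makespan 12
print(makespan([0, 1, 2], stages))         # naive order: 14
```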
Reference | Supplementary Material | Related Articles | Metrics
Accelerating Data Transfer in Dataflow Architectures Through a Look-Ahead Acknowledgment Mechanism
Yu-Jing Feng, De-Jian Li, Xu Tan, Xiao-Chun Ye, Dong-Rui Fan, Wen-Ming Li, Da Wang, Hao Zhang, and Zhi-Min Tang
Journal of Computer Science and Technology    2022, 37 (4): 942-959.   DOI: 10.1007/s11390-020-0555-6
Accepted: 17 December 2020

Abstract (442) | PDF
The dataflow architecture, which is characterized by a lack of a redundant unified control logic, has been shown to have an advantage over the control-flow architecture as it improves the computational performance and power efficiency, especially of applications used in high-performance computing (HPC). Importantly, the high computational efficiency of systems using the dataflow architecture is achieved by allowing program kernels to be activated in a simultaneous manner. Therefore, a proper acknowledgment mechanism is required to distinguish the data that logically belongs to different contexts. Possible solutions include the tagged-token matching mechanism in which the data is sent before acknowledgments are received but retried after rejection, or a handshake mechanism in which the data is only sent after acknowledgments are received. However, these mechanisms are characterized by both inefficient data transfer and increased area cost. Good performance of the dataflow architecture depends on the efficiency of data transfer. In order to optimize the efficiency of data transfer in existing dataflow architectures with a minimal increase in area and power cost, we propose a Look-Ahead Acknowledgment (LAA) mechanism. LAA accelerates the execution flow by speculatively acknowledging ahead without penalties. Our simulation analysis based on a handshake mechanism shows that our LAA increases the average utilization of computational units by 23.9%, with a reduction in the average execution time by 17.4% and an increase in the average power efficiency of dataflow processors by 22.4%. Crucially, our novel approach results in a relatively small increase in the area and power consumption of the on-chip logic of less than 0.9%. In conclusion, the evaluation results suggest that Look-Ahead Acknowledgment is an effective improvement for data transfer in existing dataflow architectures.
Reference | Supplementary Material | Related Articles | Metrics
Quasi-Developable B-Spline Surface Design with Control Rulings
Zi-Xuan Hu, Peng-Bo Bo, and Cai-Ming Zhang
Journal of Computer Science and Technology    2022, 37 (5): 1221-1238.   DOI: 10.1007/s11390-022-0680-5
Accepted: 10 February 2022

Abstract (439) | PDF
We propose a method for generating a ruled B-spline surface fitting to a sequence of pre-defined ruling lines and the generated surface is required to be as-developable-as-possible. Specifically, the terminal ruling lines are treated as hard constraints. Different from existing methods that compute a quasi-developable surface from two boundary curves and cannot achieve explicit ruling control, our method controls ruling lines in an intuitive way and serves as an effective tool for computing quasi-developable surfaces from freely-designed rulings. We treat this problem from the point of view of numerical optimization and solve for surfaces meeting the distance error tolerance allowed in applications. The performance and the efficacy of the proposed method are demonstrated by the experiments on a variety of models including an application of the method for path planning in 5-axis computer numerical control (CNC) flank milling.
Reference | Supplementary Material | Related Articles | Metrics
SMRI: A New Method for siRNA Design for COVID-19 Therapy
Meng-Xin Chen, Xiao-Dong Zhu, Hao Zhang, Zhen Liu, and Yuan-Ning Liu
Journal of Computer Science and Technology    2022, 37 (4): 991-1002.   DOI: 10.1007/s11390-021-0826-x
Accepted: 31 August 2021

Abstract (431) | PDF
First discovered in Wuhan, China, SARS-CoV-2 is a highly pathogenic novel coronavirus, which rapidly spread globally and became a pandemic with no vaccine and limited distinctive clinical drugs available as of March 13th, 2020. Ribonucleic acid interference (RNAi) technology, a gene-silencing technology that targets mRNA, can cause damage to RNA viruses effectively. Here, we report a new efficient small interfering RNA (siRNA) design method named the Simple Multiple Rules Intelligent Method (SMRI) to propose a new solution for the treatment of COVID-19. To be specific, this study proposes a new model named the Base Preference and Thermodynamic Characteristic model (BPTC model), indicating the siRNA silencing efficiency, and a new index named the siRNA Extended Rules index (SER index) based on the BPTC model to screen high-efficiency siRNAs and filter out the siRNAs that are difficult to take effect or synthesize, as a part of the SMRI method, which is more robust and efficient than the traditional statistical indicators under the same circumstances. Besides, to silence the spike protein that SARS-CoV-2 uses to invade cells, this study further applies the SMRI method to search for candidate high-efficiency siRNAs on SARS-CoV-2's S gene. This study is one of the early studies applying RNAi therapy to COVID-19 treatment. According to the analysis, the average predicted interference efficiency of the candidate siRNAs designed by the SMRI method is comparable to that of the mainstream siRNA design algorithms. Moreover, the SMRI method ensures that the designed siRNAs have more than three base mismatches with human genes, thus avoiding silencing normal human genes, which is not considered by other mainstream methods. Thereby, five candidate high-efficiency siRNAs that are easy to take effect or synthesize and much safer for the human body are obtained by our SMRI method, providing a new, safer, small-dosage and long-efficacy solution for the treatment of COVID-19.
Reference | Supplementary Material | Related Articles | Metrics
Neural Emotion Detection via Personal Attributes
Xia-Bing Zhou, Zhong-Qing Wang, Xing-Wei Liang, Min Zhang, and Guo-Dong Zhou
Journal of Computer Science and Technology    2022, 37 (5): 1146-1160.   DOI: 10.1007/s11390-021-0606-7
Accepted: 13 April 2021

Abstract (431) | PDF
There has been a recent line of work to automatically detect the emotions of posts in social media. In the literature, studies treat posts independently and detect their emotions separately. Different from previous studies, we explore the dependence among relevant posts via the authors' backgrounds, since authors with similar backgrounds, e.g., "gender" and "location", tend to express similar emotions. However, personal attributes are not easy to obtain in most social media websites. Accordingly, we propose two approaches to determine personal attributes and capture personal attributes between different posts for emotion detection: the Joint Model with Personal Attention Mechanism (JPA) model is used to detect emotion and personal attributes jointly, and capture the attribute-aware words to connect similar people; the Neural Personal Discrimination (NPD) model is employed to determine the personal attributes from posts and connect the relevant posts with similar attributes for emotion detection. Experimental results show the usefulness of personal attributes in emotion detection, and the effectiveness of the proposed JPA and NPD approaches in capturing personal attributes over the state-of-the-art statistical and neural models.
Reference | Supplementary Material | Related Articles | Metrics
Towards Exploring Large Molecular Space: An Efficient Chemical Genetic Algorithm
Jian-Fu Zhu, Zhong-Kai Hao, Qi Liu, Yu Yin, Cheng-Qiang Lu, Zhen-Ya Huang, and En-Hong Chen
Journal of Computer Science and Technology    2022, 37 (6): 1464-1477.   DOI: 10.1007/s11390-021-0970-3
Accepted: 20 April 2021

Abstract (421) | PDF
Generating molecules with desired properties is an important task in chemistry and pharmacy. An efficient method may have a positive impact on finding drugs to treat diseases like COVID-19. Data mining and artificial intelligence may be good ways to find an efficient method. Recently, both the generative models based on deep learning and the work based on genetic algorithms have made some progress in generating molecules and optimizing the molecule’s properties. However, existing methods have defects in the experimental evaluation standards. These methods also need to be improved in efficiency and performance. To solve these problems, we propose a method named the Chemical Genetic Algorithm for Large Molecular Space (CALM). Specifically, CALM employs a scalable and efficient molecular representation called molecular matrix. And we design corresponding crossover, mutation, and mask operators inspired by domain knowledge and previous studies. We apply our genetic algorithm to several tasks related to molecular property optimization and constraint molecular optimization. The results of these tasks show that our approach outperforms the other state-of-the-art deep learning and genetic algorithm methods, where the z tests performed on the results of several experiments show that our method is more than 99% likely to be significant. At the same time, based on the experimental results, we point out the defects in the experimental evaluation standard which affects the fair evaluation of all previous work. Avoiding these defects helps to objectively evaluate the performance of all work.
Reference | Supplementary Material | Related Articles | Metrics
ML-Parser: An Efficient and Accurate Online Log Parser
Yu-Qian Zhu, Jia-Ying Deng, Jia-Chen Pu, Peng Wang, Shen Liang and Wei Wang
Journal of Computer Science and Technology    2022, 37 (6): 1412-1426.   DOI: 10.1007/s11390-021-0730-4
Accepted: 18 September 2021

Abstract (416) | PDF
A log is a text message that is generated by various services, frameworks, and programs. The majority of log data mining tasks rely on log parsing as the first step, which transforms raw logs into formatted log templates. Existing log parsing approaches often fail to effectively handle the trade-off between parsing quality and performance. In view of this, in this paper, we present Multi-Layer Parser (ML-Parser), an online log parser that runs in a streaming manner. Specifically, we present a multi-layer structure in log parsing to strike a balance between efficiency and effectiveness: coarse-grained tokenization and a fast similarity measure are applied for efficiency, while fine-grained tokenization and an accurate similarity measure are used for effectiveness. In experiments, we compare ML-Parser with two existing online log parsing approaches, Drain and Spell, on ten real-world datasets, five labeled and five unlabeled. On the five labeled datasets, we use the proportion of correctly parsed logs to measure the accuracy, and ML-Parser achieves the highest accuracy on four datasets. On the whole ten datasets, we use the Loss metric to measure the parsing quality. ML-Parser achieves the highest quality on seven out of the ten datasets while maintaining relatively high efficiency.
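The coarse-then-fine layering can be sketched as a cheap signature that narrows candidate templates, followed by a token-level similarity that decides whether to merge. The tokenization, signature, and threshold below are illustrative assumptions, not ML-Parser's actual measures.

```python
# Coarse-then-fine template matching (illustrative; not ML-Parser's actual measures).
def coarse_key(tokens):
    return (len(tokens), tokens[0])            # cheap signature: length + first token

def fine_similarity(tokens, template):
    same = sum(a == b or b == "<*>" for a, b in zip(tokens, template))
    return same / len(template)

def parse(log, groups, threshold=0.7):
    tokens = log.split()
    key = coarse_key(tokens)
    for template in groups.setdefault(key, []):
        if fine_similarity(tokens, template) >= threshold:
            for i, (a, b) in enumerate(zip(tokens, template)):
                if a != b:
                    template[i] = "<*>"        # differing positions become variables
            return template
    groups[key].append(tokens[:])              # no match: start a new template
    return tokens

groups = {}
parse("Connected to 10.0.0.1 port 22", groups)
print(parse("Connected to 10.0.0.9 port 22", groups))
# ['Connected', 'to', '<*>', 'port', '22']
```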
Reference | Supplementary Material | Related Articles | Metrics
Synthetic Data Generation and Shuffled Multi-Round Training Based Offline Handwritten Mathematical Expression Recognition
Lan-Fang Dong, Han-Chao Liu, and Xin-Ming Zhang
Journal of Computer Science and Technology    2022, 37 (6): 1427-1443.   DOI: 10.1007/s11390-021-0722-4
Accepted: 16 September 2021

Abstract (411) | PDF
Offline handwritten mathematical expression recognition is a challenging optical character recognition (OCR) task due to various ambiguities of handwritten symbols and complicated two-dimensional structures. Recent work in this area usually constructs deeper and deeper neural networks trained with end-to-end approaches to improve the performance. However, the higher the complexity of the network, the more the computing resources and time required. To improve the performance without more computing requirements, we concentrate on the training data and the training strategy in this paper. We propose a data augmentation method which can generate synthetic samples with new LaTeX notations by only using the official training data of CROHME. Moreover, we propose a novel training strategy called Shuffled Multi-Round Training (SMRT) to regularize the model. With the generated data and the shuffled multi-round training strategy, we achieve the state-of-the-art result in expression accuracy, i.e., 59.74% and 61.57% on CROHME 2014 and 2016, respectively, by using attention-based encoder-decoder models for offline handwritten mathematical expression recognition.
Reference | Supplementary Material | Related Articles | Metrics
Ubiquitous WiFi and Acoustic Sensing: Principles, Technologies, and Applications
Jia-Ling Huang, Yun-Shu Wang, Yong-Pan Zou, Kai-Shun Wu, and Lionel Ming-shuan Ni
Journal of Computer Science and Technology    2023, 38 (1): 25-63.   DOI: 10.1007/s11390-023-3073-5
Accepted: 23 January 2023

Abstract (405) | PDF
With the increasing pervasiveness of mobile devices such as smartphones, smart TVs, and wearables, smart sensing, transforming the physical world into digital information based on various sensing media, has drawn researchers' great attention. Among different sensing media, WiFi and acoustic signals stand out due to their ubiquity and zero hardware cost. Based on different basic principles, researchers have proposed different technologies for sensing applications with WiFi and acoustic signals, covering human activity recognition, motion tracking, indoor localization, health monitoring, and the like. To enable readers to get a comprehensive understanding of ubiquitous wireless sensing, we conduct a survey of existing work to introduce the underlying principles, proposed technologies, and practical applications. Besides, we also discuss some open issues of this research area. Our survey reveals that, as a promising research direction, WiFi and acoustic sensing technologies can bring about many appealing applications, but still have limitations in hardware restriction, robustness, and applicability.
Reference | Supplementary Material | Related Articles | Metrics
Real-Time Semantic Segmentation via an Efficient Multi-Column Network
Cheng-Li Peng and Jia-Yi Ma
Journal of Computer Science and Technology    2022, 37 (6): 1478-1491.   DOI: 10.1007/s11390-022-0888-4
Accepted: 10 February 2022

Abstract (395) | PDF
Existing semantic segmentation networks based on the multi-column structure can hardly satisfy the efficiency and precision requirements simultaneously due to their shallow spatial branches. In this paper, we propose a new efficient multi-column network termed LadderNet to address this problem. Our LadderNet includes two branches, where the spatial branch generates high-resolution output feature maps and the context branch encodes accurate semantic information. In particular, we first propose a channel attention fusion block and a global context module to enhance the information encoding ability of the context branch. Subsequently, a new branch fusion method, i.e., fusing some middle feature maps of the context branch into the spatial branch, is developed to improve the depth of the spatial branch. Meanwhile, we design a feature fusing module to enhance the fusion quality of these two branches, leading to a more efficient network. We compare our model with other state-of-the-art methods on the PASCAL VOC 2012 and Cityscapes benchmarks. Experimental results demonstrate that, compared with other state-of-the-art methods, our LadderNet achieves an average 1.25% mIoU improvement with comparable or less computation.
Reference | Supplementary Material | Related Articles | Metrics
Efficient Partitioning Method for Optimizing the Compression on Array Data
Shuai Han, Xian-Min Liu, and Jian-Zhong Li
Journal of Computer Science and Technology    2022, 37 (5): 1049-1067.   DOI: 10.1007/s11390-022-2371-7
Abstract (385) | PDF
Array partitioning is an important research problem in the array management area, since partitioning strategies have an important influence on storage, query evaluation, and other components of array management systems. Meanwhile, compression is highly needed for array data due to its growing volume. Observing that array partitioning can affect the compression performance significantly, this paper aims to design an efficient partitioning method for array data to optimize the compression performance. As far as we know, there is still a lack of research efforts on this problem. In this paper, the problem of array partitioning for optimizing the compression performance (PPCP for short) is first proposed. We adopt a popular compression technique which allows queries to be processed on the compressed data without decompression. Because the above problem is NP-hard, two essential principles for exploring the partitioning solution are introduced, which explain the core idea of the proposed partitioning algorithms. The first principle shows that the compression performance can be improved if an array can be partitioned into two parts with different sparsities. The second principle introduces a greedy strategy which can well support the selection of the partitioning positions heuristically. Supported by the two principles, two greedy-strategy-based array partitioning algorithms are designed for the independent case and the dependent case, respectively. Observing the expensive cost of the algorithm for the dependent case, a further optimization based on random sampling and dimension grouping is proposed to achieve linear time cost. Finally, experiments are conducted on both synthetic and real-life data, and the results show that the two proposed partitioning algorithms achieve better performance on both compression and query evaluation.
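The first principle can be made concrete with an entropy-style cost model: a 0/1 array whose dense and sparse regions are compressed separately needs fewer bits than the same array compressed as a whole. The sketch below uses the binary entropy bound as a stand-in for the real compressor, which is an assumption rather than the paper's cost model.

```python
import math

def entropy_bits(zeros, ones):
    """Information-theoretic lower bound (in bits) for encoding a 0/1 array."""
    n = zeros + ones
    if n == 0 or zeros == 0 or ones == 0:
        return 0.0
    p = ones / n
    return n * (-p * math.log2(p) - (1 - p) * math.log2(1 - p))

dense  = [1, 1, 0, 1, 1, 1, 0, 1]   # density 0.75
sparse = [0, 0, 0, 1, 0, 0, 0, 0]   # density 0.125
whole  = dense + sparse

def cost(arr):
    return entropy_bits(arr.count(0), arr.count(1))

print(round(cost(whole), 1))                 # ~15.8 bits for the unpartitioned array
print(round(cost(dense) + cost(sparse), 1))  # ~10.8 bits when split by sparsity
```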
Reference | Supplementary Material | Related Articles | Metrics
SMART: Speedup Job Completion Time by Scheduling Reduce Tasks
Jia-Qing Dong, Ze-Hao He, Yuan-Yuan Gong, Pei-Wen Yu, Chen Tian, Wan-Chun Dou, Gui-Hai Chen, Nai Xia, and Hao-Ran Guan
Journal of Computer Science and Technology    2022, 37 (4): 763-778.   DOI: 10.1007/s11390-022-2118-5
Abstract (378) | PDF
Distributed computing systems have been widely used as the amount of data grows exponentially in the era of information explosion. Job completion time (JCT) is a major metric for assessing their effectiveness. How to reduce the JCT for these systems through reasonable scheduling has become a hot issue in both industry and academia. Data skew is a common phenomenon that can compromise the performance of such distributed computing systems. This paper proposes SMART, which can effectively reduce the JCT through handling the data skew during the reducing phase. SMART predicts the size of reduce tasks based on part of the completed map tasks and then enforces largest-first scheduling in the reducing phase according to the predicted reduce task size. SMART makes minimal modifications to the original Hadoop with only 20 additional lines of code and is readily deployable. The robustness and the effectiveness of SMART have been evaluated with a real-world cluster against a large number of datasets. Experiments show that SMART reduces JCT by up to 6.47%, 9.26%, and 13.66% for Terasort, WordCount and InvertedIndex respectively with the Purdue MapReduce benchmarks suite (PUMA) dataset.
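The core mechanism, predicting reduce task sizes from finished map tasks and scheduling the largest predicted reducer first, can be sketched in a few lines. The linear scaling estimate and the data layout below are simplifying assumptions, not SMART's exact predictor.

```python
def predict_reduce_sizes(completed_map_outputs, total_maps):
    """completed_map_outputs: list of dicts {reduce_partition: bytes} from finished map tasks.
    Scale the observed partition sizes up to the full map count (simple linear estimate)."""
    done = len(completed_map_outputs)
    totals = {}
    for out in completed_map_outputs:
        for part, size in out.items():
            totals[part] = totals.get(part, 0) + size
    return {part: size * total_maps / done for part, size in totals.items()}

def largest_first_order(predicted_sizes):
    return sorted(predicted_sizes, key=predicted_sizes.get, reverse=True)

observed = [{0: 10, 1: 90, 2: 40}, {0: 12, 1: 110, 2: 35}]
sizes = predict_reduce_sizes(observed, total_maps=10)
print(largest_first_order(sizes))   # [1, 2, 0] -> schedule the skewed reducer first
```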
Reference | Supplementary Material | Related Articles | Metrics
Novel Positive Multi-Layer Graph Based Method for Collaborative Filtering Recommender Systems
Bushra Alhijawi and Ghazi AL-Naymat
Journal of Computer Science and Technology    2022, 37 (4): 975-990.   DOI: 10.1007/s11390-021-0420-2
Accepted: 29 April 2021

Abstract (374) | PDF
Recommender systems play an increasingly important role in a wide variety of applications to help users find favorite products. Collaborative filtering has achieved remarkable success in terms of accuracy and has become one of the most popular recommendation methods. However, these methods have shown unsatisfactory performance in terms of novelty, diversity, and coverage. We propose a novel graph-based collaborative filtering method, namely the Positive Multi-Layer Graph-Based Recommender System (PMLG-RS). PMLG-RS involves a positive multi-layer graph and a path search algorithm to generate recommendations. The positive multi-layer graph consists of two connected layers: the user layer and the item layer. PMLG-RS requires developing a new path search method that finds the shortest path with the highest cost from a source node to every other node. A set of experiments is conducted to compare PMLG-RS with well-known recommendation methods based on three benchmark datasets, MovieLens-100K, MovieLens-Last, and FilmTrust. The results demonstrate the superiority of PMLG-RS and its high capability in making relevant, novel, and diverse recommendations for users.
Reference | Supplementary Material | Related Articles | Metrics
TLP-LDPC: Three-Level Parallel FPGA Architecture for Fast Prototyping of LDPC Decoder Using High-Level Synthesis
Yi-Fan Zhang, Lei Sun, and Qiang Cao
Journal of Computer Science and Technology    2022, 37 (6): 1290-1306.   DOI: 10.1007/s11390-022-1499-9
Accepted: 12 April 2022

Abstract (374) | PDF
Low-Density Parity-Check (LDPC) codes with excellent error-correction capabilities have been widely used in both data communication and storage fields to construct reliable cyber-physical systems that are resilient to real-world noise. A fast-prototyped field-programmable gate array (FPGA)-based decoder is essential to achieve high decoding performance while accelerating the development process. This paper proposes a three-level parallel architecture, TLP-LDPC, to achieve high throughput by fully exploiting the characteristics of both LDPC and the underlying hardware while effectively scaling to large-size FPGA platforms. The three-level parallel architecture contains a low-level decoding unit, a mid-level multi-unit decoding core, and a high-level multi-core decoder. The low-level decoding unit is a basic LDPC computation component that effectively combines the features of the LDPC algorithm and hardware with the specific structure (e.g., Look-Up Table, LUT) of the FPGA and eliminates potential data conflicts. The mid-level decoding core integrates the input/output and multiple decoding units in a well-balanced pipelined fashion. The top-level multi-core architecture conveniently makes full use of board-level resources to improve the overall throughput. We develop an LDPC C++ code with dedicated pragmas and leverage HLS tools to implement the TLP-LDPC architecture. Experimental results show that TLP-LDPC achieves 9.63 Gbps end-to-end decoding throughput on a Xilinx Alveo U50 platform, 3.9x higher than existing HLS-based FPGA implementations.
Reference | Supplementary Material | Related Articles | Metrics
Preface
Guo-Liang Li, Nan Tang and Cheng-Liang Chai
Journal of Computer Science and Technology    2022, 37 (5): 1003-1004.   DOI: 10.1007/s11390-022-0005-8
Abstract (370) | PDF (195KB) (223)

Data science targets the data life cycle of real applications, studying phenomena at scales, complexities, and granularities never before possible. This data life cycle encompasses databases and data engineering, often leveraging statistical, machine learning, and artificial intelligence methods and, in many instances, using massive and heterogeneous collections of potentially noisy datasets. In this special section, we focus on data-intensive components of data science pipelines and solve problems in areas of interest to our community (e.g., data curation, optimization, performance, storage, and systems).

To promote recent work on scalable data science, we organized this special section in the Journal of Computer Science and Technology (JCST). We received XX papers from all over the world. First, the guest editors performed quick reviews and immediately rejected insufficiently high-quality submissions. Then, each remaining submission was reviewed by at least three invited international reviewers. All the papers went through two rounds of reviews, and the authors were asked to address all the major and minor issues in their submissions during the review process. Eventually we accepted seven high-quality submissions in terms of clarity, novelty, significance, and relevance.

The first paper "GAM: A GPU-Accelerated Algorithm for MaxRS Queries in Road Networks" by Jian Chen et al. proposes a novel GPU-accelerated algorithm GAM to tackle maximizing range sum queries in road networks efficiently with a two-level framework. The framework first proposes an effective multi-grained pruning technique to prune the cells derived from partitioning the road network, and then GPU-friendly storage structure is designed to compute the final result in the remaining cells.

The second paper "Experiments and Analyses of Anonymization Mechanisms for Trajectory Data Publishing" by Sun et al. systematically evaluates the individual privacy in terms of unicity and the utility in terms of practical applications of the anonymized trajectory data. This paper reveals the true situation of the privacy preservation for trajectories in terms of reidentification and the true situation of the utility of anonymized trajectories.

The third paper "Efficient Partitioning Method for Optimizing the Compression on Array Data" by Han et al. utilizes header compression to address the problem of array partitioning for optimizing the compression performance. The paper designs a greedy strategy which can help to find the partition point with the best compression performance.

The fourth paper "Discovering Cohesive Temporal Subgraphs with Temporal Density Aware Exploration" by Zhu et al. proposes a temporal subgraph model to discover cohesive temporal subgraphs by capturing both the structural and the temporal characteristics of temporal cohesive subgraphs. This paper designs strategies to mine temporal densest subgraphs efficiently by decomposing the temporal graph into a sequence of snapshots.

The fifth paper "Incremental User Identification Across Social Networks Based on User-Guider Similarity Index" by Kou et al. proposes an incremental user identification method across social networks based on a User-Guider Similarity Index. The paper first constructs a novel User-Guider Similarity Index to speed up the matching between users, and then applies a two-phase user identification strategy to efficiently identify users.

The sixth paper "An Exercise Collection Auto-Assembling Framework with Knowledge Tracing and Reinforcement Learning" by Zhao et al. introduces an exercise collection auto-assembling framework, in which the assembled exercise collection can meet the teacher's requirements on the difficulty index and the discrimination index. The paper designs a two-stage approach in which a knowledge tracing model is used to predict the students' answers and a deep reinforcement learning model is used to select exercises that satisfy the query parameters.


Related Articles | Metrics
HXPY: A High-Performance Data Processing Package for Financial Time-Series Data
Jia-dong Guo, Jing-shu Peng, Hang Yuan, and Lionel Ming-shuan Ni
Journal of Computer Science and Technology    2023, 38 (1): 3-24.   DOI: 10.1007/s11390-023-2879-5
Accepted: 10 January 2023

Abstract (367) | PDF
A tremendous amount of data is generated by global financial markets every day, and such time-series data needs to be analyzed in real time to explore its potential value. In recent years, we have witnessed the successful adoption of machine learning models on financial data, where the importance of accuracy and timeliness demands highly effective computing frameworks. However, traditional financial time-series data processing frameworks have shown performance degradation and adaptation issues, such as the outlier handling with stock suspension in Pandas and TA-Lib. In this paper, we propose HXPY, a high-performance data processing package with a C++/Python interface for financial time-series data. HXPY supports miscellaneous acceleration techniques such as the streaming algorithm, the vectorization instruction set, and memory optimization, together with various functions such as time window functions, group operations, down-sampling operations, cross-section operations, row-wise or column-wise operations, shape transformations, and alignment functions. The results of benchmark and incremental analysis demonstrate the superior performance of HXPY compared with its counterparts. From MiB-scale to GiB-scale data, HXPY significantly outperforms other in-memory dataframe computing rivals, even by up to hundreds of times.
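For a flavor of the vectorized time-window functions such a package accelerates: a rolling mean computed from a cumulative sum touches each element a constant number of times instead of re-summing every window. This numpy sketch illustrates the general technique only, not HXPY's C++ implementation.

```python
import numpy as np

def rolling_mean(prices, window):
    """Vectorized rolling mean via a cumulative sum; O(n) instead of O(n * window)."""
    prices = np.asarray(prices, dtype=float)
    csum = np.cumsum(np.insert(prices, 0, 0.0))
    means = (csum[window:] - csum[:-window]) / window
    return np.concatenate([np.full(window - 1, np.nan), means])

prices = [10.0, 10.5, 10.2, 10.8, 11.0, 10.9]
print(rolling_mean(prices, window=3))
# [nan nan 10.233... 10.5 10.666... 10.9]
```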
Reference | Supplementary Material | Related Articles | Metrics
Toward High-Performance Delta-Based Iterative Processing with a Group-Based Approach
Hui Yu, Xin-Yu Jiang, Jin Zhao, Hao Qi, Yu Zhang, Xiao-Fei Liao, Hai-Kun Liu, Fu-Bing Mao, and Hai Jin
Journal of Computer Science and Technology    2022, 37 (4): 797-813.   DOI: 10.1007/s11390-022-2101-1
Abstract (356) | PDF
Many systems have been built to employ the delta-based iterative execution model to support iterative algorithms on distributed platforms by exploiting the sparse computational dependencies between data items of these iterative algorithms in a synchronous or asynchronous approach. However, for large-scale iterative algorithms, existing synchronous solutions suffer from slow convergence speed and load imbalance, because of the strict barrier between iterations; while existing asynchronous approaches induce excessive redundant communication and computation cost as a result of being barrier-free. In view of the performance trade-off between these two approaches, this paper designs an efficient execution manager, called Aiter-R, which can be integrated into existing delta-based iterative processing systems to efficiently support the execution of delta-based iterative algorithms, by using our proposed group-based iterative execution approach. It can efficiently and correctly explore the middle ground of the two extremes. A heuristic scheduling algorithm is further proposed to allow an iterative algorithm to adaptively choose its trade-off point so as to achieve the maximum efficiency. Experimental results show that Aiter-R strikes a good balance between the synchronous and asynchronous policies and outperforms state-of-the-art solutions. It reduces the execution time by up to 54.1% and 84.6% in comparison with existing asynchronous and the synchronous models, respectively.
Reference | Supplementary Material | Related Articles | Metrics
A Probabilistic Framework for Temporal Cognitive Diagnosis in Online Learning Systems
Jia-Yu Liu, Fei Wang , Hai-Ping Ma, Zhen-Ya Huang, Qi Liu, En-Hong Chen, and Yu Su
Journal of Computer Science and Technology    DOI: 10.1007/s11390-022-1332-5
Accepted: 01 August 2022

An Efficient Scheme to Defend Data-to-Control-Plane Saturation Attacks in Software-Defined Networking
Xuan-Bo Huang, Kai-Ping Xue, Yi-Tao Xing, Ding-Wen Hu, Ruidong Li, and Qi-Bin Sun
Journal of Computer Science and Technology    2022, 37 (4): 839-851.   DOI: 10.1007/s11390-022-1495-0
Accepted: 24 May 2022

Abstract (334) | PDF
Software-defined networking (SDN) decouples the data and control planes. However, attackers can cause catastrophic results to the whole network using manipulated flooding packets, in what are called data-to-control-plane saturation attacks. The existing methods, using centralized mitigation policies and ignoring the buffered attack flows, involve extra network entities and make benign traffic suffer from long network recovery delays. To address these issues, we propose LFSDM, a saturation attack detection and mitigation system, which solves these challenges by leveraging three new techniques: 1) using linear discriminant analysis (LDA) and extracting a novel feature called the control channel occupation rate (CCOR) to detect the attacks, 2) adopting distributed mitigation agents to reduce the number of involved network entities, and 3) cleaning up the buffered attack flows to enable fast recovery. Experiments show that our system can detect the attacks timely and accurately. More importantly, compared with the previous work, we save 81% of the network recovery delay under attacks ranging from 1,000 to 4,000 packets per second (PPS) on average, and 87% of the network recovery delay under higher attack rates with PPS ranging from 5,000 to 30,000.
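Detection can be sketched as computing the CCOR feature per time window and feeding it to an off-the-shelf LDA classifier. The CCOR formula below (packet-in bytes over control-channel capacity) and the second feature are assumptions made for illustration; they are not necessarily the paper's exact definitions.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def ccor(packet_in_bytes_per_window, channel_capacity_bytes):
    """Illustrative control-channel occupation rate: fraction of the control channel
    consumed by packet-in messages in a time window (assumed definition)."""
    return [b / channel_capacity_bytes for b in packet_in_bytes_per_window]

# Feature vectors: [CCOR, packet-in rate]; label 1 = saturation attack window.
X = [[0.05, 200], [0.08, 350], [0.72, 9000], [0.85, 12000], [0.10, 400], [0.66, 8000]]
y = [0, 0, 1, 1, 0, 1]

clf = LinearDiscriminantAnalysis().fit(X, y)
print(clf.predict([[0.07, 300], [0.8, 11000]]))   # expect [0, 1]
```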
xCCL: A Survey of Industry-Led Collective Communication Libraries for Deep Learning
Adam Weingram, Yuke Li, Hao Qi, Darren Ng, Liuyao Dai, and Xiaoyi Lu
Journal of Computer Science and Technology    2023, 38 (1): 166-195.   DOI: 10.1007/s11390-023-2894-6
Accepted: 03 January 2023

Abstract334)      PDF   
Machine learning techniques have become ubiquitous in both industrial and academic applications. Increasing model sizes and training data volumes necessitate fast and efficient distributed training approaches. Collective communications greatly simplify inter- and intra-node data transfer and are an essential part of the distributed training process, as information such as gradients must be shared between processing nodes. In this paper, we survey the current state-of-the-art collective communication libraries (namely xCCL, including NCCL, oneCCL, RCCL, MSCCL, ACCL, and Gloo), with a focus on the industry-led ones for deep learning workloads. We investigate the design features of these xCCLs, discuss their use cases in industry deep learning workloads, compare their performance using industry-made benchmarks (i.e., NCCL Tests and PARAM), and discuss key takeaways and interesting observations. We believe our survey sheds light on potential research directions for future xCCL designs.
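The libraries surveyed above all provide collectives such as all-reduce, and a widely used building block is the ring algorithm: a reduce-scatter pass followed by an all-gather pass around a logical ring. The sketch below simulates that data movement for a handful of in-process "ranks" in plain Python; it is a conceptual illustration of the communication pattern, not the NCCL, oneCCL, or Gloo implementation.

    # Simulated ring all-reduce: reduce-scatter, then all-gather (conceptual only).
    import numpy as np

    P = 4                                                 # simulated ranks
    data = [np.full(P, float(r + 1)) for r in range(P)]   # rank r's local vector
    expected = sum(data)
    buf = [d.copy() for d in data]

    # Reduce-scatter: after P-1 steps rank r owns the fully reduced chunk (r+1) % P.
    for step in range(P - 1):
        sends = [(r, (r - step) % P, buf[r][(r - step) % P]) for r in range(P)]
        for r, c, val in sends:                           # sends happen "simultaneously"
            buf[(r + 1) % P][c] += val

    # All-gather: circulate each finished chunk once around the ring.
    for step in range(P - 1):
        sends = [(r, (r + 1 - step) % P, buf[r][(r + 1 - step) % P]) for r in range(P)]
        for r, c, val in sends:
            buf[(r + 1) % P][c] = val

    assert all(np.array_equal(b, expected) for b in buf)
    print(buf[0])                                         # every rank now holds the sum
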
Approximation Designs for Energy Harvesting Relay Deployment in Wireless Sensor Networks
Yi Wang, Yi-Xue Liu, Shun-Jia Zhu, Xiao-Feng Gao, and Chen Tian
Journal of Computer Science and Technology    2022, 37 (4): 779-796.   DOI: 10.1007/s11390-022-1964-5
Abstract331)      PDF   
Energy harvesting technologies allow wireless devices to be recharged by the surrounding environment, providing wireless sensor networks (WSNs) with higher performance and longer lifetime. However, directly building a wireless sensor network with energy harvesting nodes is very costly. A compromise is upgrading existing networks with energy harvesting technologies. In this paper, we focus on prolonging the lifetime of WSNs with the help of energy harvesting relays (EHRs). EHRs are responsible for forwarding data for sensor nodes, allowing them to become terminals and thus extending their lifetime. We aim to deploy a minimum number of relays covering the whole network. As EHRs have several special properties, such as their energy harvesting and depletion rates, seeking an optimal deployment strategy poses great research challenges. To this end, we propose an approximation algorithm named the Effective Relay Deployment Algorithm, which can be divided into two phases: disk covering and connector insertion, using the partitioning technique and the Steinerization technique, respectively. Based on probabilistic analysis, we further optimize the performance ratio of our algorithm to (5 + 6/K), where K is an integer denoting the side length of a cell after partitioning. Our extensive simulation results show that our algorithm can reduce the number of EHRs to be deployed by up to 45% compared with previous work, validating the efficiency and effectiveness of our solution.
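The connector-insertion (Steinerization) phase rests on a simple geometric fact: two already-covered points at distance d can be connected by at most ceil(d/r) - 1 evenly spaced relays when the communication range is r. A minimal sketch of that placement, ignoring the energy-harvesting constraints handled by the full algorithm:

    import math

    def steinerize(p, q, r):
        """Place the minimum number of evenly spaced relays so that consecutive
        points along the segment p-q are within communication range r."""
        d = math.dist(p, q)
        k = max(math.ceil(d / r) - 1, 0)          # number of connectors needed
        return [(p[0] + (q[0] - p[0]) * i / (k + 1),
                 p[1] + (q[1] - p[1]) * i / (k + 1)) for i in range(1, k + 1)]

    print(steinerize((0.0, 0.0), (10.0, 0.0), 3.0))   # relays at x = 2.5, 5.0, 7.5
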
Distributed Game-Theoretical D2D-Enabled Task Offloading in Mobile Edge Computing
En Wang, Han Wang, Peng-Min Dong, Yuan-Bo Xu, and Yong-Jian Yang
Journal of Computer Science and Technology    2022, 37 (4): 919-941.   DOI: 10.1007/s11390-022-2063-3
Abstract321)      PDF   
Mobile Edge Computing (MEC) has been envisioned as a promising distributed computing paradigm in which mobile users offload their tasks to edge nodes to decrease the cost of energy and computation. However, most existing studies consider only the congestion of wireless channels as the crucial factor affecting the strategy-making process, while ignoring the impact of offloading among edge nodes. In addition, centralized task offloading strategies result in enormous computation complexity at the center nodes. Along this line, we take both the congestion of wireless channels and the offloading among multiple edge nodes into consideration to enrich users' offloading strategies, and we propose the Parallel User Selection Algorithm (PUS) and the Single User Selection Algorithm (SUS) to substantially accelerate convergence. More practically, we extend the users' offloading strategies to take into account idle devices and cloud services, thereby exploiting the potential computing resources at the edge. Furthermore, we construct a potential game in which each user selfishly seeks an optimal strategy to minimize its cost of latency and energy subject to an acceptable latency, and we find the potential function to prove the existence of a Nash equilibrium (NE). Additionally, we update PUS to accelerate its convergence and illustrate its performance through experiments on three real datasets; the updated PUS effectively decreases the total cost and reaches a Nash equilibrium.
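The potential-game argument above is usually operationalized through best-response dynamics: users take turns switching to their cheapest strategy, and the potential function guarantees that the process terminates at a Nash equilibrium. The toy congestion game below, with hypothetical costs and a choice between local execution and two edge nodes, shows that loop; it is not the paper's PUS or SUS algorithm.

    # Hedged sketch: sequential best-response dynamics in a toy offloading game.
    USERS, EDGES = 8, 2
    LOCAL_COST = 5.0                        # hypothetical cost of computing locally
    choice = [0] * USERS                    # 0 = local, 1..EDGES = offload to that edge

    def cost(u, a, choice):
        if a == 0:
            return LOCAL_COST
        sharers = sum(1 for v in range(USERS) if v != u and choice[v] == a) + 1
        return 1.0 + 2.0 * sharers          # congestion: cost grows with co-located users

    changed = True
    while changed:                          # in a potential game, sequential strict
        changed = False                     # improvements terminate at an equilibrium
        for u in range(USERS):
            best = min(range(EDGES + 1), key=lambda a: cost(u, a, choice))
            if cost(u, best, choice) + 1e-9 < cost(u, choice[u], choice):
                choice[u] = best
                changed = True

    print(choice)                           # a pure-strategy Nash equilibrium
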
Reliability and Incentive of Performance Assessment for Decentralized Clouds
Jiu-Chen Shi, Xiao-Qing Cai, Wen-Li Zheng, Quan Chen, De-Ze Zeng, Tatsuhiro Tsuchiya, and Min-Yi Guo
Journal of Computer Science and Technology    2022, 37 (5): 1176-1199.   DOI: 10.1007/s11390-022-2120-y
Accepted: 21 July 2022

Abstract317)      PDF   
Decentralized cloud platforms have emerged as a promising paradigm for exploiting idle computing resources across the Internet to catch up with ever-increasing cloud computing demands. As any user or enterprise can be a cloud provider in the decentralized cloud, the performance assessment of the heterogeneous computing resources is of vital significance. However, given the untrustworthiness of the participants and the lack of a unified performance assessment metric, the reliability of performance monitoring and the incentive for cloud providers to offer real and stable performance together constitute the computational performance assessment problem in the decentralized cloud. In this paper, we present a robust performance assessment solution, RODE, to solve this problem. RODE mainly consists of a performance monitoring mechanism and an assessment of the claimed performance (AoCP) mechanism. The performance monitoring mechanism first generates reliable and verifiable performance monitoring results for the workloads executed by untrusted cloud providers. Based on the performance monitoring results, the AoCP mechanism forms a unified performance assessment metric to incentivize cloud providers to offer performance as claimed. Via extensive experiments, we show that RODE can accurately monitor the performance of cloud providers on the premise of reliability, and can incentivize cloud providers to honestly report their performance information and maintain performance stability.
The Memory-Bounded Speedup Model and Its Impacts in Computing
Xian-He Sun and Xiaoyang Lu
Journal of Computer Science and Technology    2023, 38 (1): 64-79.   DOI: 10.1007/s11390-022-2911-1
Accepted: 01 December 2022

Abstract312)      PDF   

With the surge of big data applications and the worsening of the memory-wall problem, the memory system, rather than the computing unit, has become the commonly recognized major concern of computing. However, this "memory-centric" common understanding had a humble beginning. More than three decades ago, the memory-bounded speedup model was the first model to recognize memory as the bound of computing, and it provided a general speedup bound and a computing-memory trade-off formulation. The memory-bounded model was well received even then. It was immediately introduced in several advanced computer architecture and parallel computing textbooks in the 1990s as a must-know for scalable computing. These include Prof. Kai Hwang's book "Scalable Parallel Computing", in which he introduced the memory-bounded speedup model as Sun-Ni's law, alongside Amdahl's law and Gustafson's law. Through the years, the impact of this model has grown far beyond parallel processing and into the fundamentals of computing. In this article, we revisit the memory-bounded speedup model and discuss its progress and impacts in depth to make a unique contribution to this special issue, to stimulate new solutions for big data applications, and to promote data-centric thinking and rethinking.

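As commonly stated, the memory-bounded speedup takes a sequential fraction f and a function g(m) describing how much the parallel workload may grow when memory scales with m processors; g(m) = 1 recovers Amdahl's law and g(m) = m recovers Gustafson's law. A small sketch with example numbers (the particular choices of f, m, and g below are illustrative only):

    # Memory-bounded speedup (Sun-Ni's law) as commonly stated; numbers are examples.
    def memory_bounded_speedup(f, m, g):
        """f: sequential fraction, m: processors (memory assumed to scale with m),
        g(m): factor by which the parallel workload may grow with m-fold memory."""
        scaled_work = f + (1 - f) * g(m)
        return scaled_work / (f + (1 - f) * g(m) / m)

    f, m = 0.05, 64
    print(memory_bounded_speedup(f, m, lambda m: 1))          # g(m)=1: Amdahl's law
    print(memory_bounded_speedup(f, m, lambda m: m))          # g(m)=m: Gustafson's law
    print(memory_bounded_speedup(f, m, lambda m: m ** 0.75))  # an in-between, memory-bounded case
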
Accumulative Time Based Ranking Method to Reputation Evaluation in Information Networks
Hao Liao, Qi-Xin Liu, Ze-Cheng Huang, Ke-Zhong Lu, Chi Ho Yeung, and Yi-Cheng Zhang
Journal of Computer Science and Technology    2022, 37 (4): 960-974.   DOI: 10.1007/s11390-021-0471-4
Accepted: 03 December 2021

Abstract311)      PDF   
Due to the over-abundance of information on the Web, information filtering has become a key task for online users to obtain relevant suggestions, and how to extract the most relevant items is always a key topic for researchers in various fields. In this paper, we adopt tools used to analyze complex networks to evaluate user reputation and item quality. Our proposed Accumulative Time Based Ranking (ATR) algorithm takes into account the growth record of the network to identify the evolution of user reputation and item quality, by incorporating two behavior weighting factors that capture the hidden facts of reputation and quality dynamics for each user and item, respectively. Compared with other reputation evaluation methods, the ATR algorithm combines the iterative ranking of user reputation and item quality with temporal dependence. We show that our algorithm outperforms other benchmark ranking algorithms in terms of precision and robustness on empirical datasets from various online retailers and on citation datasets among research publications. Therefore, our proposed method can effectively evaluate user reputation and item quality.
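Iterative reputation-quality methods of this family typically alternate between two updates: item quality as a reputation-weighted average of the ratings it received, and user reputation as a decreasing function of how far the user's ratings deviate from the current quality estimates. The sketch below shows that generic alternation on fabricated ratings; ATR's accumulative time-based weighting factors are deliberately omitted.

    import numpy as np

    # ratings[u, i]: rating of item i by user u (NaN = not rated); fabricated toy data.
    ratings = np.array([[5.0, 4.0, np.nan],
                        [4.0, np.nan, 2.0],
                        [1.0, 5.0, 2.0]])

    reputation = np.ones(ratings.shape[0])
    rated = ~np.isnan(ratings)
    r = np.nan_to_num(ratings)

    for _ in range(50):                               # alternate to a (near) fixed point
        quality = (reputation @ (r * rated)) / (reputation @ rated)
        err = np.where(rated, np.abs(r - quality), 0.0).sum(1) / rated.sum(1)
        reputation = 1.0 / (err + 0.1)                # small deviation -> high reputation

    print(np.round(quality, 2), np.round(reputation, 2))
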
A QoS Based Reliable Routing Mechanism for Service Customization
Bo Yi, Xing-Wei Wang, Min Huang, and Qiang He
Journal of Computer Science and Technology    2022, 37 (6): 1492-1508.   DOI: 10.1007/s11390-021-0686-4
Accepted: 23 December 2021

Abstract310)      PDF   
Due to the rapid development of Internet technologies such as 5G/6G and artificial intelligence, more and more new network applications are appearing. Customers using these applications may have different individual demands, and this trend poses great challenges to the traditional integrated service and routing model. In order to satisfy the individual demands of customers, service customization should be considered, which naturally increases the cost of the Internet Service Provider (ISP). Hence, how to reach a balance between customer satisfaction and ISP profit becomes vitally important. Targeting this critical problem, this work proposes a service customization oriented reliable routing mechanism, which includes two modules: the service customization module and the routing module. In particular, the former is responsible for classifying services by analyzing and processing the customer's demands. The IPv6 protocol is then used to implement the service customization, since it naturally supports differentiated services via its extended header fields. The latter is responsible for transforming the customized services into specific routing policies. Specifically, a Nash equilibrium based economic model is first introduced to balance user satisfaction and ISP profit, which finally produces a win-win solution. After that, based on the customized service policies, an optimized grey wolf algorithm is designed to establish the routing path, during which the routing reliability is formulated and calculated. Finally, experiments are carried out to evaluate the proposed mechanism. The results indicate that the proposed service customization and routing mechanism improves routing reliability, user satisfaction, and ISP satisfaction by about 8.42%, 15.5%, and 17.75%, respectively, compared with the classical open shortest path first algorithm and a function learning based algorithm.
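The grey wolf optimizer mentioned above searches by pulling every candidate solution toward the three best solutions found so far (the alpha, beta, and delta wolves), with an exploration coefficient that decays over the iterations. The sketch below is the standard continuous-space GWO update applied to a toy objective; the routing-path encoding, the reliability terms, and the authors' optimizations are not included.

    import numpy as np

    def gwo(objective, dim=2, wolves=20, iters=200, lo=-5.0, hi=5.0, seed=0):
        rng = np.random.default_rng(seed)
        X = rng.uniform(lo, hi, (wolves, dim))
        for t in range(iters):
            fitness = np.apply_along_axis(objective, 1, X)
            alpha, beta, delta = X[np.argsort(fitness)[:3]]   # three best wolves
            a = 2.0 * (1 - t / iters)                         # decays from 2 to 0
            guided = np.zeros_like(X)
            for leader in (alpha, beta, delta):
                A = 2 * a * rng.random(X.shape) - a
                C = 2 * rng.random(X.shape)
                guided += leader - A * np.abs(C * leader - X)
            X = np.clip(guided / 3.0, lo, hi)                 # average of the three pulls
        return X[np.argmin(np.apply_along_axis(objective, 1, X))]

    print(gwo(lambda x: float(np.sum(x ** 2))))               # should approach the origin
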
Generous or Selfish? Weighing Transaction Forwarding Against Malicious Attacks in Payment Channel Networks
Yi Qin, Qin Hu, Dong-Xiao Yu, and Xiu-Zhen Cheng
Journal of Computer Science and Technology    2022, 37 (4): 888-905.   DOI: 10.1007/s11390-022-2032-x
Accepted: 08 June 2022

Abstract310)      PDF   
Scalability has long been a major challenge for cryptocurrency systems, mainly because of the delay in reaching consensus when processing transactions on-chain. As an effective mitigation approach, payment channel networks (PCNs) enable private channels among blockchain nodes to process transactions off-chain, relieving the long wait for on-chain transaction confirmation. State-of-the-art studies of PCNs focus on improving efficiency and availability by optimizing routing, scheduling, and initial deposits, as well as on protecting the system from security and privacy attacks. However, the behavioral decision dynamics of blockchain nodes under potential malicious attacks are largely neglected. To fill this gap, we employ game theory to study the characteristics of channel interactions from both the micro and macro perspectives under channel depletion attacks. Our study is progressive, as we conduct the game-theoretic analysis of node behavioral characteristics from individuals to the whole population of the PCN. Our analysis is complementary, since we utilize not only classic game theory with the complete rationality assumption, but also evolutionary game theory, which considers the limited rationality of players, to portray the evolution of the PCN. The results of numerous simulation experiments verify the effectiveness of our analysis.
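The evolutionary-game side of such an analysis is often summarized with replicator dynamics: the population share x of nodes playing the generous (forwarding) strategy grows whenever its expected payoff exceeds that of the selfish strategy. The Euler-integrated sketch below uses fabricated, frequency-independent payoffs purely to show the mechanics; the paper's payoff structure under depletion attacks is richer and population-dependent.

    # Replicator dynamics for the fraction x of generous (forwarding) nodes.
    # Payoffs are hypothetical: forwarding earns fees but risks depletion attacks.
    fee, forwarding_cost, attack_loss, attack_rate = 1.0, 0.3, 2.0, 0.2

    def payoff_generous(x):
        return fee - forwarding_cost - attack_rate * attack_loss

    def payoff_selfish(x):
        return 0.0                                  # no fees, no attack exposure

    x, dt = 0.5, 0.01
    for _ in range(10_000):
        x += dt * x * (1 - x) * (payoff_generous(x) - payoff_selfish(x))

    print(round(x, 3))                              # with these numbers, forwarding dominates
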
Leveraging Document-Level and Query-Level Passage Cumulative Gain for Document Ranking
Zhi-Jing Wu, Yi-Qun Liu, Jia-Xin Mao, Min Zhang, and Shao-Ping Ma
Journal of Computer Science and Technology    2022, 37 (4): 814-838.   DOI: 10.1007/s11390-022-2031-y
Abstract309)      PDF   
Document ranking is one of the most studied yet challenging problems in information retrieval (IR). More and more studies have begun to address this problem through fine-grained document modeling. However, most of them focus on context-independent passage-level relevance signals and ignore context information. In this paper, we investigate how information gain accumulates as passages are read and propose the context-aware Passage Cumulative Gain (PCG). The fine-grained PCG avoids the need to split documents into independent passages. We investigate PCG patterns at the document level (DPCG) and the query level (QPCG). Based on these patterns, we propose a BERT-based sequential model called the Passage-level Cumulative Gain Model (PCGM) and show that PCGM can effectively predict PCG sequences. Finally, we apply PCGM to the document ranking task using two approaches. The first leverages DPCG sequences to estimate the gain of an individual document. Experimental results on two public ad hoc retrieval datasets show that PCGM outperforms most existing ranking models. The second considers cross-document effects and leverages QPCG sequences to estimate marginal relevance. Experimental results show that the predicted results are highly consistent with users' preferences. We believe that this work contributes to improving ranking performance and providing more explainability for document ranking.
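Reading cumulative gain as a running aggregate of per-passage relevance gains, a PCG-style sequence for one document can be computed as below; the per-passage gains are hypothetical, and the paper's exact annotation scheme and gain definition may differ.

    from itertools import accumulate

    # Hypothetical graded relevance gain of each passage, in reading order.
    passage_gains = [0, 1, 2, 0, 1]

    # Document-level cumulative gain after reading each passage.
    dpcg = list(accumulate(passage_gains))
    print(dpcg)          # [0, 1, 3, 3, 4]
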
An Efficient Reinforcement Learning Game Framework for UAV-Enabled Wireless Sensor Network Data Collection
Tong Ding, Ning Liu, Zhong-Min Yan, Lei Liu, and Li-Zhen Cui
Journal of Computer Science and Technology    2022, 37 (6): 1356-1368.   DOI: 10.1007/s11390-022-2419-8
Abstract287)      PDF   
With the growing demands of massive-data services, applications that rely on big geographic data play crucial roles in academic and industrial communities. Unmanned aerial vehicles (UAVs), combined with terrestrial wireless sensor networks (WSNs), can provide sustainable solutions for data harvesting. The literature has posed rising demands for efficient data collection over large open areas, which requires efficient UAV trajectory planning with lower energy consumption. Many solutions have been proposed for UAV planning over large open areas, and one of the most practical techniques in previous studies is deep reinforcement learning (DRL). However, the overestimation problem in limited-experience DRL quickly traps the UAV path planning process in a local optimum. Moreover, using the central nodes of the sub-WSNs as the sink nodes or navigation points for UAVs to visit may lead to extra collection costs. This paper develops a data-driven DRL-based game framework with two partners to fulfill the above demands. A cluster head processor (CHP) is employed to determine the sink nodes, and a navigation order processor (NOP) is established to plan the path. The CHP and the NOP receive information from each other and provide optimized solutions after reaching the Nash equilibrium. The numerical results show that the proposed game framework offers UAVs low-cost data collection trajectories, saving at least 17.58% of energy consumption compared with the baseline methods.
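A stripped-down, non-learning analogue of the two-module pipeline is to pick sink nodes by clustering the sensors (the CHP role) and then order the UAV's visits with a nearest-neighbor tour (the NOP role). The sketch below uses k-means and a greedy tour on hypothetical sensor positions purely to show how the two roles fit together; the paper replaces both heuristics with interacting DRL agents.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(1)
    sensors = rng.uniform(0, 1000, (200, 2))          # hypothetical sensor positions (m)

    # "CHP" stand-in: choose k cluster centers as sink nodes.
    k = 5
    sinks = KMeans(n_clusters=k, n_init=10, random_state=0).fit(sensors).cluster_centers_

    # "NOP" stand-in: nearest-neighbor visiting order for the UAV from a depot.
    depot = np.array([0.0, 0.0])
    remaining, pos, tour = list(range(k)), depot, []
    while remaining:
        nxt = min(remaining, key=lambda i: np.linalg.norm(sinks[i] - pos))
        tour.append(nxt)
        pos = sinks[nxt]
        remaining.remove(nxt)

    print(tour)                                       # order in which sink nodes are visited
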
