Bimonthly    Since 1986
ISSN 1000-9000(Print)
CN 11-2296/TP
Publication Details
Edited by: Editorial Board of Journal of Computer Science and Technology
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Distributed by:
China: All Local Post Offices
Other Countries: Springer
  • Table of Contents
      28 February 2023, Volume 38 Issue 1
    Special Issue in Honor of Professor Kai Hwang’s 80th Birthday
    The Vibrant Field of Parallel and Distributed Computing — Scan the Special Issue in Honor of Professor Kai Hwang’s 80th Birthday
    Guo-Jie Li
    Journal of Computer Science and Technology, 2023, 38 (1): 1-2.  DOI: 10.1007/s11390-023-0001-7
        It is my great pleasure to write this editorial for the special issue in honor of Professor Kai Hwang's 80th birthday. The articles are written by Professor Hwang's academic descendants and colleagues to pay tribute to his decades of contributions to the vibrant field of parallel and distributed computing.
        Professor Hwang obtained his Ph.D. degree in electrical engineering and computer science in 1972 from the University of California at Berkeley. He taught for 44 years at Purdue University and the University of Southern California. He currently serves as Presidential Chair Professor of Computer Science and Engineering at the Chinese University of Hong Kong (Shenzhen). Over his five-decade academic career, Professor Hwang has supervised 21 Ph.D. students, authored 10 textbooks, and published over 260 scientific papers. He served as the founding Editor-in-Chief of the Journal of Parallel and Distributed Computing (JPDC), an influential international journal in this field.
        A much-told tale in China's computer science community is that in 2005, 2009 and 2018, the China Computer Federation (CCF) conferred its prestigious Outstanding Achievement Award (Award for Overseas Outstanding Contributions) on Kai Hwang, his Ph.D. student Lionel Ni, and Ni's Ph.D. student Xian-He Sun. This special issue shows a deeper academic lineage of five generations: Hwang-Ni-Sun-Cameron-Ge.
        Lionel Ni serves as Chair Professor and Founding President of the Hong Kong University of Science and Technology (Guangzhou). He contributes two papers to this special issue. First, Ni's team presents HXPY, a parallel computing software package for financial time-series data processing that is orders of magnitude faster than its counterparts. Second, Ni and his colleagues present a comprehensive and timely survey of research on ubiquitous WiFi and acoustic sensing, a growing subfield of distributed computing.
        Xian-He Sun is a University Distinguished Professor at the Illinois Institute of Technology and the Editor-in-Chief of IEEE Transactions on Parallel and Distributed Systems. His team has contributed a comprehensive review of the memory-bounded speedup model, which is called Sun-Ni's law in Kai Hwang's textbooks.
        Kirk W. Cameron is a professor at Virginia Tech and a pioneer in green HPC. His team presents a retrospective on scalability beyond Amdahl's law, especially how power-performance measurement and modeling at scale influenced server and supercomputer design.
        Rong Ge is a Dean's Distinguished Professor in the School of Computing at Clemson University. Her team has devised a vision called the paradigm of power-bounded HPC, which is different from previous low-power computing and power-aware computing paradigms.
        Interestingly, three other first-generation Ph.D. students of Professor Hwang have also contributed to this issue. Zhi-Wei Xu is a professor at the University of Chinese Academy of Sciences and the Editor-in-Chief of the Journal of Computer Science and Technology. His team presents Information Superbahn, a perspective on future computing utility with low entropy and high goodput characteristics.
        Another academic star spawned by Hwang is Ahmed Louri, a chair professor at George Washington University and the Editor-in-Chief of IEEE Transactions on Computers. His team presents GShuttle, a graph convolutional neural network acceleration scheme that minimizes off-chip DRAM and on-chip SRAM accesses.
        Yet another is Dhabaleswar K. Panda, a professor at Ohio State University and an influential HPC expert. His MVAPICH software has been used by thousands of organizations worldwide. His team is credited with a cutting-edge study of communication performance on the world's first exascale supercomputer with the most recent interconnect (Slingshot).
        Not to be overlooked are two other second-generation Ph.D. students who have contributed review articles. Yun-Hao Liu is a chair professor at Tsinghua University, Beijing, and the Editor-in-Chief of ACM Transactions on Sensor Networks. His team has presented a timely survey of clock synchronization techniques for Industrial Internet scenarios.
        Xiaoyi Lu is an assistant professor at the University of California at Merced. His team has come up with a comprehensive survey on xCCL, a host of industry-led collective communication libraries for deep learning, and has answered why the industry has chosen xCCL instead of classic MPI.
        The parade of Hwang-nurtured talents is hardly complete without Hai Jin as a representative of the many postdoctoral researchers who worked with Professor Hwang. Dr. Jin is a chair professor at Huazhong University of Science and Technology and a co-Editor-in-Chief of Computer Systems Science and Engineering, an open-access journal. His team has invented a new method for Chinese entity linking, a natural language processing problem that is an important workload for parallel and distributed systems.
        Wei-Min Zheng is a professor at Tsinghua University, Beijing, and a past president of the China Computer Federation. A long-time colleague, he translated Hwang's textbook into Chinese. His team has contributed a perspective on a unified programming model for heterogeneous high-performance computers, a fundamental issue for future parallel and distributed computing.
    HXPY: A High-Performance Data Processing Package for Financial Time-Series Data
    Jia-dong Guo, Jing-shu Peng, Hang Yuan, and Lionel Ming-shuan Ni
    Journal of Computer Science and Technology, 2023, 38 (1): 3-24.  DOI: 10.1007/s11390-023-2879-5
    A tremendous amount of data is generated by global financial markets every day, and such time-series data needs to be analyzed in real time to explore its potential value. In recent years, we have witnessed the successful adoption of machine learning models on financial data, where the importance of accuracy and timeliness demands highly effective computing frameworks. However, traditional financial time-series data processing frameworks have shown performance degradation and adaptation issues, such as outlier handling for stock suspensions in Pandas and TA-Lib. In this paper, we propose HXPY, a high-performance data processing package with a C++/Python interface for financial time-series data. HXPY supports miscellaneous acceleration techniques such as streaming algorithms, vectorized instruction sets, and memory optimization, together with various functions such as time window functions, group operations, down-sampling operations, cross-section operations, row-wise or column-wise operations, shape transformations, and alignment functions. The results of benchmark and incremental analysis demonstrate the superior performance of HXPY compared with its counterparts. On data sizes ranging from MiBs to GiBs, HXPY significantly outperforms other in-memory dataframe computing rivals, in some cases by hundreds of times.
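    To make the idea of a streaming time window function concrete, here is a minimal sketch in plain Python. It is not HXPY's API or implementation; the function name `rolling_mean` and the choice to skip NaN entries (standing in for suspended trading days) are assumptions made for illustration. The point is that a running sum lets each new sample be processed in constant time instead of rescanning the whole window.

```python
import math
from collections import deque


def rolling_mean(values, window):
    """Streaming rolling mean over the last `window` samples, ignoring NaNs."""
    buf = deque()   # samples currently inside the window (NaNs included)
    total = 0.0     # running sum of the non-NaN samples in the window
    count = 0       # number of non-NaN samples in the window
    out = []
    for v in values:
        buf.append(v)
        if not math.isnan(v):
            total += v
            count += 1
        if len(buf) > window:
            old = buf.popleft()  # evict the oldest sample
            if not math.isnan(old):
                total -= old
                count -= 1
        # Assumption: a window with no valid samples yields NaN.
        out.append(total / count if count else math.nan)
    return out


# Hypothetical price series; the NaN marks a suspended day.
prices = [10.0, 11.0, math.nan, 13.0, 14.0]
result = rolling_mean(prices, window=3)
```

    Each output depends only on the running sum and count, so the cost per sample stays constant regardless of the window length.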
    Ubiquitous WiFi and Acoustic Sensing: Principles, Technologies, and Applications
    Jia-Ling Huang, Yun-Shu Wang, Yong-Pan Zou, Kai-Shun Wu, and Lionel Ming-shuan Ni
    Journal of Computer Science and Technology, 2023, 38 (1): 25-63.  DOI: 10.1007/s11390-023-3073-5
    With the increasing pervasiveness of mobile devices such as smartphones, smart TVs, and wearables, smart sensing, which transforms the physical world into digital information through various sensing media, has drawn great attention from researchers. Among different sensing media, WiFi and acoustic signals stand out due to their ubiquity and zero hardware cost. Based on different basic principles, researchers have proposed different technologies for sensing applications with WiFi and acoustic signals, covering human activity recognition, motion tracking, indoor localization, health monitoring, and the like. To give readers a comprehensive understanding of ubiquitous wireless sensing, we survey existing work and introduce its underlying principles, proposed technologies, and practical applications. We also discuss some open issues in this research area. Our survey reveals that, as a promising research direction, WiFi and acoustic sensing technologies can enable attractive applications, but still have limitations in hardware restriction, robustness, and applicability.
    The Memory-Bounded Speedup Model and Its Impacts in Computing
    Xian-He Sun and Xiaoyang Lu
    Journal of Computer Science and Technology, 2023, 38 (1): 64-79.  DOI: 10.1007/s11390-022-2911-1
    With the surge of big data applications and the worsening of the memory-wall problem, the memory system, instead of the computing unit, has become the commonly recognized major concern of computing. However, this "memory-centric" common understanding had a humble beginning. More than three decades ago, the memory-bounded speedup model was the first model to recognize memory as the bound of computing, and it provided a general bound of speedup and a computing-memory trade-off formulation. The memory-bounded model was well received even then. It was immediately introduced in several advanced computer architecture and parallel computing textbooks in the 1990s as a must-know for scalable computing. These include Prof. Kai Hwang's book "Scalable Parallel Computing", in which he introduced the memory-bounded speedup model as Sun-Ni's law, in parallel with Amdahl's law and Gustafson's law. Through the years, the impacts of this model have grown far beyond parallel processing and into the fundamentals of computing. In this article, we revisit the memory-bounded speedup model and discuss its progress and impacts in depth to make a unique contribution to this special issue, to stimulate new solutions for big data applications, and to promote data-centric thinking and rethinking.
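    The relationship among the three laws named above can be sketched in a few lines of Python. The formulas are the commonly cited textbook forms; the symbol names are assumptions for this sketch: `f` is the parallelizable fraction of the work, `p` the number of processors, and `g(p)` the factor by which the parallel workload grows when memory scales with `p`.

```python
def amdahl(f, p):
    """Fixed-size speedup: the problem size does not grow with p."""
    return 1.0 / ((1.0 - f) + f / p)


def gustafson(f, p):
    """Fixed-time speedup: the parallel part grows linearly with p."""
    return (1.0 - f) + f * p


def sun_ni(f, p, g):
    """Memory-bounded speedup: the parallel part grows by g(p)."""
    scaled = f * g(p)
    return ((1.0 - f) + scaled) / ((1.0 - f) + scaled / p)


f, p = 0.9, 10
# g(p) = 1 means no workload growth, which recovers Amdahl's law.
fixed_size = sun_ni(f, p, lambda n: 1.0)
# g(p) = p means workload grows with the processor count (Gustafson's law).
fixed_time = sun_ni(f, p, lambda n: float(n))
```

    Under these assumptions the memory-bounded form contains the other two as special cases, which is one way to read the claim that it offers a general bound of speedup.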
    Adventures Beyond Amdahl's Law: How Power-Performance Measurement and Modeling at Scale Drive Server and Supercomputer Design
    Kirk W. Cameron
    Journal of Computer Science and Technology, 2023, 38 (1): 80-86.  DOI: 10.1007/s11390-022-2950-7
    Amdahl’s Law painted a bleak picture for large-scale computing. The implication was that parallelism was limited and therefore so was potential speedup. While Amdahl's contribution was seminal and important, it drove others vested in parallel processing to define more clearly why large-scale systems are critical to our future and how they fundamentally provide opportunities for speedup beyond Amdahl’s predictions. In the early 2000s, much like Amdahl, we predicted dire consequences for large-scale systems due to power limits. While our early work was often dismissed, the implications were clear to some: power would ultimately limit performance. In this retrospective, we discuss how power-performance measurement and modeling at scale led to contributions that have driven server and supercomputer design for more than a decade. While the influence of these techniques is now indisputable, we discuss their connections, limits and additional research directions necessary to continue the performance gains our industry is accustomed to.
    The Paradigm of Power Bounded High-Performance Computing
    Rong Ge, Xizhou Feng, Pengfei Zou, and Tyler Allen
    Journal of Computer Science and Technology, 2023, 38 (1): 87-102.  DOI: 10.1007/s11390-023-2885-7
    Modern computer systems are increasingly bounded by the available or permissible power at multiple layers from individual components to data centers. To cope with this reality, it is necessary to understand how power bounds impact performance, especially for systems built from high-end nodes, each consisting of multiple power-hungry components. Because placing an inappropriate power bound on a node or a component can lead to severe performance loss, coordinating power allocation among nodes and components is mandatory to achieve desired performance given a total power budget. In this article, we describe the paradigm of power-bounded high-performance computing, which considers coordinated power bound assignment to be a key factor in computer system performance analysis and optimization. We apply this paradigm to the problem of power coordination across multiple layers for both CPU and GPU computing. Using several case studies, we demonstrate how the principles of balanced power coordination can be applied and adapted to the interplay of workloads, hardware technology, and the available total power for performance improvement.
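    A toy illustration of why coordinated power allocation matters is sketched below. Everything in it is hypothetical: the two performance models, their wattage thresholds, and the rule that node throughput is the minimum of the CPU and memory rates are assumptions chosen for the example, not models from the paper. The sketch sweeps every split of a fixed budget between two components and keeps the best one.

```python
def cpu_perf(watts):
    # Hypothetical model: no useful work below 40 W, linear up to a 120 W cap.
    return max(0.0, min(watts, 120) - 40) * 1.0


def mem_perf(watts):
    # Hypothetical model: no useful work below 10 W, linear up to a 50 W cap.
    return max(0.0, min(watts, 50) - 10) * 2.0


def best_split(total_watts):
    """Exhaustively search CPU/memory splits of a total power budget."""
    best = (0.0, 0, total_watts)  # (throughput, cpu watts, memory watts)
    for cpu_w in range(total_watts + 1):
        mem_w = total_watts - cpu_w
        # Assumption: the slower component bounds node throughput.
        perf = min(cpu_perf(cpu_w), mem_perf(mem_w))
        if perf > best[0]:
            best = (perf, cpu_w, mem_w)
    return best


coordinated = best_split(120)
naive_even = min(cpu_perf(60), mem_perf(60))
```

    With these made-up models, the even 60 W / 60 W split leaves the memory over-provisioned and the CPU starved, while the searched split more than doubles throughput under the same 120 W budget.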
    Information Superbahn: Towards a Planet-Scale, Low-Entropy and High-Goodput Computing Utility
    Zhi-Wei Xu, Zhen-Ying Li, Zi-Shu Yu, and Feng-Zhi Li
    Journal of Computer Science and Technology, 2023, 38 (1): 103-114.  DOI: 10.1007/s11390-022-2898-7
    In a 1961 lecture to celebrate MIT’s centennial, John McCarthy proposed the vision of utility computing, including three key concepts of pay-per-use service, large computer and private computer. Six decades have passed, but McCarthy’s computing utility vision has not yet been fully realized, despite advances in grid computing, services computing and cloud computing. This paper presents a perspective of computing utility called Information Superbahn, building on recent advances in cloud computing. This Information Superbahn perspective retains McCarthy’s vision as much as possible, while making essential modern requirements more explicit, in the new context of a networked world of billions of users, trillions of devices, and zettabytes of data. Computing utility offers pay-per-use computing services through a 1) planet-scale, 2) low-entropy and 3) high-goodput utility. The three salient characteristics of computing utility are elaborated. Initial evidence is provided to support this viewpoint.
    GShuttle: Optimizing Memory Access Efficiency for Graph Convolutional Neural Network Accelerators
    Jia-Jun Li, Ke Wang, Hao Zheng, and Ahmed Louri
    Journal of Computer Science and Technology, 2023, 38 (1): 115-127.  DOI: 10.1007/s11390-023-2875-9
    Graph convolutional neural networks (GCNs) have emerged as an effective approach to extending deep learning for graph data analytics, but they are computationally challenging given the irregular graphs and the large number of nodes in a graph. GCNs involve chain sparse-dense matrix multiplications with six loops, which results in a large design space for GCN accelerators. Prior work on GCN acceleration either employs limited loop optimization techniques, or determines the design variables based on random sampling, which can hardly exploit data reuse efficiently, thus degrading system efficiency. To overcome this limitation, this paper proposes GShuttle, a GCN acceleration scheme that maximizes memory access efficiency to achieve high performance and energy efficiency. GShuttle systematically explores loop optimization techniques for GCN acceleration, and quantitatively analyzes the design objectives (e.g., required DRAM accesses and SRAM accesses) by analytical calculation based on multiple design variables. GShuttle further employs two approaches, pruned search space sweeping and greedy search, to find the optimal design variables under certain design constraints. We demonstrate the efficacy of GShuttle by evaluation on five widely used graph datasets. The experimental simulations show that GShuttle reduces the number of DRAM accesses by a factor of 1.5 and saves energy by a factor of 1.7 compared with the state-of-the-art approaches.
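    The "chain of matrix multiplications with six loops" can be illustrated with a small pure-Python sketch. It assumes a GCN layer of the form A·X·W (adjacency, features, weights) and counts multiply-accumulate operations for the two possible evaluation orders. The function names and the tiny matrices are made up for illustration, and the operation count is only a rough proxy: it does not model the DRAM and SRAM accesses that GShuttle analyzes.

```python
def matmul_skip_zeros(a, b):
    """Three-loop matrix product that skips zero entries of `a`."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    macs = 0  # multiply-accumulate operations actually performed
    for i in range(rows):
        for k in range(inner):
            if a[i][k] == 0:
                continue  # exploit sparsity of the left operand
            for j in range(cols):
                out[i][j] += a[i][k] * b[k][j]
                macs += 1
    return out, macs


def gcn_layer(adj, x, w, combine_first):
    """Evaluate A @ X @ W in one of two orders; two products, six loops."""
    if combine_first:
        xw, m1 = matmul_skip_zeros(x, w)      # X @ W first
        out, m2 = matmul_skip_zeros(adj, xw)  # then A @ (XW)
    else:
        ax, m1 = matmul_skip_zeros(adj, x)    # A @ X first
        out, m2 = matmul_skip_zeros(ax, w)    # then (AX) @ W
    return out, m1 + m2


# Hypothetical 4-node path graph with self-loops, 3 features, 2 outputs.
adj = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1], [0, 0, 1, 1]]
x = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 1, 1]]
w = [[1, 2], [3, 4], [5, 6]]

out_a, macs_aggregate_first = gcn_layer(adj, x, w, combine_first=False)
out_b, macs_combine_first = gcn_layer(adj, x, w, combine_first=True)
```

    Both orders produce the same output, but their costs differ, which hints at why loop ordering and related design variables open up a large search space.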
    High Performance MPI over the Slingshot Interconnect
    Kawthar Shafie Khorassani, Chen-Chun Chen, Bharath Ramesh, Aamir Shafi, Hari Subramoni, and Dhabaleswar K. Panda
    Journal of Computer Science and Technology, 2023, 38 (1): 128-145.  DOI: 10.1007/s11390-023-2907-5
    The Slingshot interconnect designed by HPE/Cray is becoming more relevant in high-performance computing with its deployment on the upcoming exascale systems. In particular, it is the interconnect empowering the first exascale and highest-ranked supercomputer in the world, Frontier. It offers various features such as adaptive routing, congestion control, and isolated workloads. The deployment of newer interconnects sparks interest related to performance, scalability, and any potential bottlenecks as they are critical elements contributing to the scalability across nodes on these systems. In this paper, we delve into the challenges the Slingshot interconnect poses with current state-of-the-art MPI (message passing interface) libraries. In particular, we look at the scalability performance when using Slingshot across nodes. We present a comprehensive evaluation using various MPI and communication libraries including Cray MPICH, OpenMPI + UCX, RCCL, and MVAPICH2 on CPUs and GPUs on the Spock system, an early access cluster deployed with Slingshot-10, AMD MI100 GPUs and AMD Epyc Rome CPUs to emulate the Frontier system. We also evaluate preliminary CPU-based support of MPI libraries on the Slingshot-11 interconnect.
    A Survey on Clock Synchronization in the Industrial Internet
    Fan Dang, Xi-Kai Sun, Ke-Bin Liu, Yi-Fan Xu, and Yun-Hao Liu
    Journal of Computer Science and Technology, 2023, 38 (1): 146-165.  DOI: 10.1007/s11390-023-2908-4
    Clock synchronization is one of the most fundamental and crucial network communication strategies. With the expansion of the Industrial Internet in numerous industrial applications, a new requirement for the precision, security, complexity, and other features of the clock synchronization mechanism has emerged in various industrial situations. This paper presents a study of standardized clock synchronization protocols and techniques for various types of networks, and a discussion of how these protocols and techniques might be classified. Following that is a description of how certain clock synchronization protocols and technologies, such as PROFINET, Time-Sensitive Networking (TSN), and other well-known industrial networking protocols, can be applied in a number of industrial situations. This study also investigates the possible future development of clock synchronization techniques and technologies.
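    As background for readers new to the topic, many synchronization protocols build on a two-way timestamp exchange, as in IEEE 1588 PTP. The sketch below shows that calculation under the usual assumption that the network delay is the same in both directions; the function name and the sample timestamps are illustrative and are not taken from the survey.

```python
def offset_and_delay(t1, t2, t3, t4):
    """Estimate clock offset and one-way delay from a two-way exchange.

    t1: master sends a sync message      (master clock)
    t2: slave receives it                (slave clock)
    t3: slave sends a delay request      (slave clock)
    t4: master receives the request      (master clock)
    Assumes the path delay is symmetric.
    """
    delay = ((t2 - t1) + (t4 - t3)) / 2
    offset = ((t2 - t1) - (t4 - t3)) / 2  # slave clock minus master clock
    return offset, delay


# Hypothetical exchange: the slave runs 5 units ahead, one-way delay is 2.
estimate = offset_and_delay(t1=100, t2=107, t3=110, t4=107)
```

    If the two directions have different delays, half of the asymmetry shows up as an error in the offset estimate, which is one reason high-precision industrial settings need more than this basic exchange.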
    xCCL: A Survey of Industry-Led Collective Communication Libraries for Deep Learning
    Adam Weingram, Yuke Li, Hao Qi, Darren Ng, Liuyao Dai, and Xiaoyi Lu
    Journal of Computer Science and Technology, 2023, 38 (1): 166-195.  DOI: 10.1007/s11390-023-2894-6
    Machine learning techniques have become ubiquitous in both industrial and academic applications. Increasing model sizes and training data volumes necessitate fast and efficient distributed training approaches. Collective communications greatly simplify inter- and intra-node data transfer and are an essential part of the distributed training process, as information such as gradients must be shared between processing nodes. In this paper, we survey the current state-of-the-art collective communication libraries (namely xCCL, including NCCL, oneCCL, RCCL, MSCCL, ACCL, and Gloo), with a focus on the industry-led ones for deep learning workloads. We investigate the design features of these xCCLs, discuss their use cases in industry deep learning workloads, compare their performance with industry-made benchmarks (i.e., NCCL Tests and PARAM), and discuss key takeaways and interesting observations. We believe our survey sheds light on potential research directions of future designs for xCCLs.
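    To illustrate what a collective such as allreduce does when gradients are shared, here is a single-process simulation of the well-known ring allreduce algorithm. It is a teaching sketch, not code from any of the surveyed libraries; the function name and the assumption that each worker's buffer has exactly one element per worker are simplifications.

```python
def ring_allreduce(buffers):
    """Simulate ring allreduce (sum) over p workers with p chunks each."""
    p = len(buffers)
    data = [list(b) for b in buffers]
    assert all(len(b) == p for b in data), "one chunk per worker assumed"

    # Phase 1, reduce-scatter: after p - 1 steps, worker r holds the
    # fully reduced chunk (r + 1) % p.
    for step in range(p - 1):
        sends = [((r - step) % p, data[r][(r - step) % p]) for r in range(p)]
        for r, (idx, val) in enumerate(sends):
            data[(r + 1) % p][idx] += val  # right neighbor accumulates

    # Phase 2, all-gather: reduced chunks circulate around the ring.
    for step in range(p - 1):
        sends = [((r + 1 - step) % p, data[r][(r + 1 - step) % p])
                 for r in range(p)]
        for r, (idx, val) in enumerate(sends):
            data[(r + 1) % p][idx] = val   # right neighbor overwrites

    return data


# Hypothetical per-worker gradients.
grads = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
reduced = ring_allreduce(grads)
```

    Each worker sends and receives only one chunk per step, so the per-worker traffic stays roughly constant as workers are added, which is a large part of the algorithm's appeal for gradient exchange.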
    Improving Entity Linking in Chinese Domain by Sense Embedding Based on Graph Clustering
    Zhao-Bo Zhang, Zhi-Man Zhong, Ping-Peng Yuan, and Hai Jin
    Journal of Computer Science and Technology, 2023, 38 (1): 196-210.  DOI: 10.1007/s11390-023-2835-4
    Entity linking refers to linking a string in a text to corresponding entities in a knowledge base through candidate entity generation and candidate entity ranking. It is of great significance to some NLP (natural language processing) tasks, such as question answering. Unlike English entity linking, Chinese entity linking requires more consideration due to the lack of spacing and capitalization in text sequences and the ambiguity of characters and words, which is more evident in certain scenarios. In Chinese domains, such as industry, the generated candidate entities are usually composed of long strings and are heavily nested. In addition, the meanings of the words that make up industrial entities are sometimes ambiguous. Their semantic space is a subspace of the general word embedding space, and thus each entity word needs to be assigned its exact meaning. Therefore, we propose two schemes to achieve better Chinese entity linking. First, we implement an n-gram based candidate entity generation method to increase the recall rate and reduce the nesting noise. Then, we enhance the corresponding candidate entity ranking mechanism by introducing sense embedding. Considering the contradiction between the ambiguity of word vectors and the single sense of the industrial domain, we design a sense embedding model based on graph clustering, which adopts an unsupervised approach for word sense induction and learns sense representation in conjunction with context. We test the embedding quality of our approach on classical datasets and demonstrate its disambiguation ability in general scenarios. Experiments confirm that our method can better learn candidate entities’ fundamental laws in the industrial domain and achieve better performance on entity linking.
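    The idea of n-gram based candidate generation can be sketched as follows. This is not the authors' method: the function name, the longest-match-first rule used to suppress nested candidates, and the tiny knowledge base are all assumptions made for illustration. Because Chinese text has no spaces, the sketch slides character n-grams over the string and keeps those that appear in the knowledge base.

```python
def ngram_candidates(text, knowledge_base, max_n=6):
    """Return (start, mention) pairs, preferring longer, non-nested matches."""
    spans = []   # accepted (start, end) character spans
    found = []
    for n in range(min(max_n, len(text)), 0, -1):  # longest n-grams first
        for i in range(len(text) - n + 1):
            gram = text[i:i + n]
            if gram not in knowledge_base:
                continue
            # Assumption: drop a match fully contained in a longer accepted one.
            if any(s <= i and i + n <= e for s, e in spans):
                continue
            spans.append((i, i + n))
            found.append((i, gram))
    return sorted(found)


# Hypothetical industrial knowledge base and sentence fragment.
kb = {"高压变压器", "变压器", "故障"}
candidates = ngram_candidates("高压变压器故障", kb)
```

    Here the shorter entity "变压器" is suppressed because it is nested inside the longer match "高压变压器", which is one simple way to reduce nesting noise.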
    Unified Programming Models for Heterogeneous High-Performance Computers
    Zi-Xuan Ma, Yu-Yang Jin, Shi-Zhi Tang, Hao-Jie Wang, Wei-Cheng Xue, Ji-Dong Zhai, and Wei-Min Zheng
    Journal of Computer Science and Technology, 2023, 38 (1): 211-218.  DOI: 10.1007/s11390-023-2888-4
    Unified programming models can effectively improve program portability on various heterogeneous high-performance computers. Existing unified programming models put a lot of effort into code portability but are still far from achieving good performance portability. In this paper, we present a preliminary design of a performance-portable unified programming model including four aspects: programming language, programming abstraction, compilation optimization, and scheduling system. Specifically, domain-specific languages introduce domain knowledge to decouple the optimizations for different applications and architectures. The unified programming abstraction unifies the common features of different architectures to support common optimizations. Multi-level compilation optimization enables comprehensive performance optimization based on multi-level intermediate representations. A resource-aware lightweight runtime scheduling system improves the resource utilization of heterogeneous computers. This is a perspective paper that presents our viewpoints on programming models for emerging heterogeneous systems.
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
E-mail: jcst@ict.ac.cn
  Copyright ©2015 JCST, All Rights Reserved