Bimonthly    Since 1986
ISSN 1000-9000(Print)
CN 11-2296/TP
Publication Details
Edited by: Editorial Board of Journal of Computer Science and Technology
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Distributed by:
China: All Local Post Offices
Other Countries: Springer
  • Table of Contents
      05 January 2016, Volume 31 Issue 1
    Special Section on Computer Architecture and Systems with Emerging Technologies
    Wen-Guang Chen, Xiao-Fei Liao
    Journal of Computer Science and Technology, 2016, 31 (1): 1-2.  DOI: 10.1007/s11390-015-1590-6
    Abstract   PDF(83KB) ( 930 )   Chinese Summary
    Technological Exploration of RRAM Crossbar Array for Matrix-Vector Multiplication
    Lixue Xia, Peng Gu, Boxun Li, Tianqi Tang, Xiling Yin, Wenqin Huangfu, Shimeng Yu, Yu Cao, Yu Wang, Huazhong Yang
    Journal of Computer Science and Technology, 2016, 31 (1): 3-19.  DOI: 10.1007/s11390-016-1608-8
    Abstract   PDF(1604KB) ( 1594 )   Chinese Summary
    Matrix-vector multiplication is the key operation for many computationally intensive algorithms. The emerging metal oxide resistive switching random access memory (RRAM) device and RRAM crossbar array have demonstrated a promising hardware realization of analog matrix-vector multiplication with ultra-high energy efficiency. In this paper, we analyze the impact of both device-level and circuit-level non-ideal factors, including the nonlinear current-voltage relationship of RRAM devices, the variation of device fabrication and write operation, and the interconnect resistance as well as other crossbar array parameters. On top of that, we propose a technological exploration flow for device parameter configuration to overcome the impact of non-ideal factors and achieve a better trade-off among performance, energy, and reliability for each specific application. Our simulation results of a support vector machine (SVM) and the Mixed National Institute of Standards and Technology (MNIST) pattern recognition dataset show that the RRAM crossbar array based SVM is robust to input signal fluctuation but sensitive to tunneling gap deviation. A further resistance resolution test shows that a 6-bit RRAM device is able to realize a recognition accuracy of around 90%, indicating the physical feasibility of RRAM crossbar array based SVM. In addition, the proposed technological exploration flow is able to achieve a 10.98% improvement of recognition accuracy on the MNIST dataset and 26.4% energy savings compared with previous work. Experimental results also show that more than 84.4% power saving can be achieved at the cost of a little accuracy reduction.
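The analog computation at the heart of this paper can be sketched in a few lines. The model below is illustrative only: the conductance range, the linear weight-to-conductance mapping, and the Gaussian noise model are assumptions for exposition, not the authors' simulation flow. The idea it shows is that a crossbar produces the currents I = G·V in a single analog step, and device variation perturbs G.

```python
import numpy as np

def crossbar_mvm(weights, v_in, g_min=1e-6, g_max=1e-4, sigma=0.0, seed=0):
    """Illustrative RRAM crossbar matrix-vector product I = G @ V.

    sigma models device-level variation as multiplicative Gaussian noise,
    one of the non-ideal factors analyzed in the paper.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(weights, dtype=float)
    # Linearly map normalized weights onto the conductance range [g_min, g_max].
    w_norm = (w - w.min()) / (w.max() - w.min() + 1e-12)
    g = g_min + w_norm * (g_max - g_min)
    # Multiplicative noise stands in for fabrication/write variation.
    g = g * (1.0 + sigma * rng.standard_normal(g.shape))
    # Kirchhoff's current law sums per-column currents: I = G V.
    return g @ np.asarray(v_in, dtype=float)

ideal = crossbar_mvm([[0.2, 0.8], [0.5, 0.1]], [1.0, 0.5])
noisy = crossbar_mvm([[0.2, 0.8], [0.5, 0.1]], [1.0, 0.5], sigma=0.05)
```

With sigma > 0 the outputs drift from the ideal product, which is the kind of device-variation effect the proposed exploration flow is designed to absorb.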
    BACH: A Bandwidth-Aware Hybrid Cache Hierarchy Design with Nonvolatile Memories
    Jishen Zhao, Cong Xu, Tao Zhang, Yuan Xie
    Journal of Computer Science and Technology, 2016, 31 (1): 20-35.  DOI: 10.1007/s11390-016-1609-7
    Abstract   PDF(981KB) ( 1426 )   Chinese Summary
    Limited main memory bandwidth is becoming a fundamental performance bottleneck in chip multiprocessor (CMP) design. Yet directly increasing the peak memory bandwidth can incur high cost and power consumption. In this paper, we address this problem by proposing BACH, a bandwidth-aware reconfigurable cache hierarchy with hybrid memory technologies. Components of our BACH design include a hybrid cache hierarchy, a reconfiguration mechanism, and a statistical prediction engine. Our hybrid cache hierarchy chooses different memory technologies with various bandwidth characteristics, such as spin-transfer torque memory (STT-MRAM), resistive memory (ReRAM), and embedded DRAM (eDRAM), to configure each level so that the peak bandwidth of the overall cache hierarchy is optimized. Our reconfiguration mechanism can dynamically adjust the cache capacity of each level based on the predicted bandwidth demands of running workloads. The bandwidth prediction is performed by our prediction engine. We evaluate the system performance gain obtained by the BACH design with a set of multithreaded and multiprogrammed workloads with and without the limitation of a system power budget. Compared with traditional SRAM-based cache design, BACH improves the system throughput by 58% and 14% with multithreaded and multiprogrammed workloads respectively.
    Performance-Centric Optimization for Racetrack Memory Based Register File on GPUs
    Yun Liang, Shuo Wang
    Journal of Computer Science and Technology, 2016, 31 (1): 36-49.  DOI: 10.1007/s11390-016-1610-1
    Abstract   PDF(2425KB) ( 1005 )   Chinese Summary
    The key to high performance for GPU architectures lies in their massive threading capability to drive a large number of cores and enable execution overlapping among threads. However, in reality, the number of threads that can simultaneously execute is often limited by the size of the register file on GPUs. The traditional SRAM-based register file takes up such a large amount of chip area that it cannot scale to meet the increasing demand of GPU applications. Racetrack memory (RM) is a promising technology for designing large-capacity register files on GPUs due to its high data storage density. However, without careful deployment of the RM-based register file, the lengthy shift operations of RM may hurt the performance. In this paper, we explore RM for designing a high-performance register file for GPU architectures. High storage density RM helps to improve thread-level parallelism (TLP), but if the bits of the registers are not aligned to the ports, shift operations are required to move the bits to the access ports before they are accessed, and thus the read/write operations are delayed. We develop an optimization framework for the RM-based register file on GPUs, which employs three different optimization techniques at the application, compilation, and architecture levels, respectively. Specifically, we optimize the TLP at the application level, design a register mapping algorithm at the compilation level, and design a preshifting mechanism at the architecture level. Collectively, these optimizations help to determine the TLP without causing cache and register file resource contention and to reduce the shift operation overhead. Experimental results using a variety of representative workloads demonstrate that our optimization framework achieves up to 29% (21% on average) performance improvement.
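The shift-overhead problem that motivates the register mapping optimization can be illustrated with a toy cost model. The track positions, the access trace, and both mappings below are hypothetical; the point is only that the number of shifts depends on where registers sit relative to the access port.

```python
def total_shifts(access_trace, mapping, start=0):
    """Count shift operations for a register access trace on one track.

    Each access must first shift the track so the register's bits sit under
    the access port; the cost is the distance from the current port offset.
    """
    port = start                      # current track offset under the port
    shifts = 0
    for reg in access_trace:
        target = mapping[reg]
        shifts += abs(target - port)  # shifts needed to align the bits
        port = target
    return shifts

trace = ["r0", "r2", "r0", "r1", "r2"]
naive = total_shifts(trace, {"r0": 0, "r1": 4, "r2": 8})
better = total_shifts(trace, {"r0": 0, "r1": 2, "r2": 1})  # co-locate hot registers
```

Placing frequently co-accessed registers near each other (as a compile-time mapping algorithm would) cuts the shift count, which is the effect the paper's compilation-level technique targets.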
    Modelling Spiking Neural Network from the Architecture Evaluation Perspective
    Yu Ji, You-Hui Zhang, Wei-Min Zheng
    Journal of Computer Science and Technology, 2016, 31 (1): 50-59.  DOI: 10.1007/s11390-016-1611-0
    Abstract   PDF(815KB) ( 1448 )   Chinese Summary
    The brain-inspired spiking neural network (SNN) computing paradigm offers the potential for low-power and scalable computing, suited to many intelligent tasks that conventional computational systems find difficult. On the other hand, NoC (network-on-chip) based very large scale integration (VLSI) systems have been widely used to mimic neurobiological architectures (including SNNs). This paper proposes an evaluation methodology for SNN applications from the micro-architecture perspective. First, we extract accurate SNN models from existing simulators of neural systems. Second, a cycle-accurate NoC simulator is implemented to execute the aforementioned SNN applications to obtain timing and energy-consumption information. We believe this method not only benefits the exploration of the NoC design space but also bridges the gap between applications (especially those from the neuroscientists' community) and neuromorphic hardware. Based on the method, we have evaluated some typical SNNs in terms of timing and energy. The method is valuable for the development of neuromorphic hardware and applications.
    Optimization Strategies Oriented to Loop Characteristics in Software Thread Level Speculation Systems
    Li Shen, Fan Xu, Zhi-Ying Wang
    Journal of Computer Science and Technology, 2016, 31 (1): 60-76.  DOI: 10.1007/s11390-016-1612-z
    Abstract   PDF(695KB) ( 881 )   Chinese Summary
    Thread level speculation provides not only a simple parallel programming model, but also an effective mechanism for thread-level parallelism exploitation. The performance of software speculative parallel models is limited by high global overheads caused by different types of loops. These loops usually have different characteristics of dependencies and different requirements of optimization strategies. In this paper, we propose three comprehensive optimization techniques to reduce different factors of global overheads, aiming at requirements from different types of loops. Inter-thread fetching can reduce the high mis-speculation rate of the loops with frequent dependencies and out-of-order committing can reduce the control overhead of the loops with infrequent dependencies, while enhanced dynamic task granularity resizing can reduce the control overhead and optimize the global overhead of the loops with changing characteristics of dependencies. All these three optimization techniques have been implemented in HEUSPEC, a software TLS system. Experimental results indicate that they can satisfy the demands from different groups of benchmarks. The combination of these techniques can improve the performance of all benchmarks and reach a higher average speedup.
    Computer Architectures and Systems
    Modular Timing Constraints for Delay-Insensitive Systems
    Hoon Park, Anping He, Marly Roncken, Xiaoyu Song, Ivan Sutherland
    Journal of Computer Science and Technology, 2016, 31 (1): 77-106.  DOI: 10.1007/s11390-016-1613-y
    Abstract   PDF(7014KB) ( 966 )   Chinese Summary
    This paper introduces ARCtimer, a framework for modeling, generating, verifying, and enforcing timing constraints for individual self-timed handshake components. The constraints guarantee that the component's gate-level circuit implementation obeys the component's handshake protocol specification. Because the handshake protocols are delay insensitive, self-timed systems built using ARCtimer-verified components are also delay-insensitive. By carefully considering time locally, we can ignore time globally. ARCtimer comes early in the design process as part of building a library of verified components for later system use. The library also stores static timing analysis (STA) code to validate and enforce the component's constraints in any self-timed system built using the library. The library descriptions of a handshake component's circuit, protocol, timing constraints, and STA code are robust to circuit modifications applied later in the design process by technology mapping or layout tools. In addition to presenting new work and discussing related work, this paper identifies critical choices and explains what modular timing verification entails and how it works.
    Optimizations for High Performance Network Virtualization
    Fan-Fu Zhou, Ru-Hui Ma, Jian Li, Li-Xia Chen, Wei-Dong Qiu, Hai-Bing Guan
    Journal of Computer Science and Technology, 2016, 31 (1): 107-116.  DOI: 10.1007/s11390-016-1614-x
    Abstract   PDF(319KB) ( 1423 )   Chinese Summary
    Owing to the substantial growth of cloud computing and datacenters, the increasing requirements of intensive interoperability among distributed nodes necessitate high-performance network connections. Network I/O virtualization aggregates the network resource and separates it into manageable parts for particular servers or devices, which provides effective consolidation and elastic management with high agility, flexibility and scalability as well as reduced cost and cabling. However, both network I/O virtualization aggregation and the increasing network speed incur higher traffic density, which generates heavy system stress for I/O data moving and I/O event processing. Consequently, many researchers have dedicated themselves to enhancing system performance and alleviating system overhead for high-performance network virtualization. This paper first elaborates on the mainstream I/O virtualization methodologies, including device emulation, the split-driver model and the hardware-assisted model. Then, the paper discusses and compares their specific advantages in addition to performance bottlenecks in practical utilities. This paper mainly focuses on a comprehensive survey of state-of-the-art approaches for performance optimizations and improvements as well as portability management for network I/O virtualization. The approaches include various novel data delivery schemes, overhead mitigations for interrupt processing and adequate resource allocations for dynamic network states. Finally, we highlight the diversity of I/O virtualization besides the performance improvements in network virtualization infrastructure.
    Metcalfe's Law and Network Quality: An Extension of Zhang et al.
    Leo Van Hove
    Journal of Computer Science and Technology, 2016, 31 (1): 117-123.  DOI: 10.1007/s11390-016-1615-9
    Abstract   PDF(155KB) ( 994 )   Chinese Summary
    Zhang et al. exploited data on Facebook and Tencent to validate Metcalfe's law, which states that the aggregate value of a communications network is proportional to the square of the number of users. This note points out that the value of a social network may be driven not only by its size, but also by increases in the variety and quality of the services offered. I therefore extend Zhang et al.'s approach by explicitly controlling for changes in network quality over time. For the case of Tencent, I also filter out revenues and costs that are unrelated to Tencent's core (social network) services. I find that these two extensions only strengthen Zhang et al.'s conclusions: Metcalfe's law now outperforms the other laws even more clearly.
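Metcalfe's law lends itself to a compact worked example. The sketch below fits v = a·n^e through the origin by least squares for a chosen exponent e, on made-up data (not Tencent's or Facebook's figures), contrasting the quadratic fit with a linear (Sarnoff-style) one:

```python
import numpy as np

def fit_power_law(n_users, values, exponent):
    """Least-squares fit of v = a * n**exponent through the origin.

    Returns (a, sum of squared residuals); a lower residual means the law
    explains the value series better.
    """
    x = np.asarray(n_users, dtype=float) ** exponent
    v = np.asarray(values, dtype=float)
    a = (x @ v) / (x @ x)  # closed-form slope for a one-parameter fit
    return a, float(np.sum((v - a * x) ** 2))

users = [10, 20, 40, 80]
values = [3 * n ** 2 for n in users]      # generated exactly from v = 3 n^2
a2, r2 = fit_power_law(users, values, 2)  # Metcalfe: fits perfectly here
a1, r1 = fit_power_law(users, values, 1)  # linear law: large residual
```

The paper's extension amounts to adding quality controls to such a regression before comparing residuals across candidate laws.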
    Techniques for Design and Implementation of an FPGA-Specific Physical Unclonable Function
    Ji-Liang Zhang, Qiang Wu, Yi-Peng Ding, Yong-Qiang Lv, Qiang Zhou, Zhi-Hua Xia, Xing-Ming Sun, Xing-Wei Wang
    Journal of Computer Science and Technology, 2016, 31 (1): 124-136.  DOI: 10.1007/s11390-016-1616-8
    Abstract   PDF(1292KB) ( 1382 )   Chinese Summary
    A physical unclonable function (PUF) makes use of the uncontrollable process variations during IC production to generate a unique signature for each IC. It has wide applications in security, such as FPGA intellectual property (IP) protection, key generation and digital rights management. The ring oscillator (RO) PUF and the Arbiter PUF are the most popular PUFs, but they are not specially designed for FPGAs. The RO PUF incurs high resource overhead while obtaining fewer challenge-response pairs, and requires "hard macros" to implement on FPGAs. The Arbiter PUF brings low resource overhead, but its structure shows significant bias when it is mapped on FPGAs. The Anderson PUF can address these weaknesses of current Arbiter and RO PUFs implemented on FPGAs. However, it cannot be directly implemented on the new-generation 28 nm FPGAs. In order to address these problems, this paper designs and implements a delay-based PUF that uses two LUTs in an SLICEM to implement two 16-bit shift registers of the PUF, 2-to-1 multiplexers in the carry chain to implement the multiplexers of the PUF, and any one of the 8 flip-flops to latch 1-bit PUF signatures. The proposed delay-based PUF is completely realized on 28 nm commercial FPGAs, and the experimental results show its high uniqueness, reliability and reconfigurability. Moreover, we test the impact of aging on it, and the results show that the effect of aging on the proposed PUF is insignificant, with only 6% bit-flips. Finally, the prospects of the proposed PUF in FPGA binding and volatile key generation are discussed.
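The uniqueness property claimed for PUFs of this kind is conventionally measured as the average pairwise Hamming distance between signatures of different chips, ideally 50%. A minimal sketch of that metric, with made-up 8-bit signatures standing in for real FPGA responses:

```python
from itertools import combinations

def hamming(a, b):
    """Number of bit positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def uniqueness(signatures):
    """Average fraction of differing bits over all chip pairs (ideal: 0.5)."""
    pairs = list(combinations(signatures, 2))
    n_bits = len(signatures[0])
    return sum(hamming(a, b) for a, b in pairs) / (len(pairs) * n_bits)

# Hypothetical 8-bit signatures from three FPGA instances.
sigs = ["10110010", "01101100", "11010101"]
u = uniqueness(sigs)  # average inter-chip Hamming distance, as a fraction
```

Reliability is evaluated analogously, but with repeated readouts of the same chip, where the ideal distance is 0% (the 6% aging bit-flips the abstract reports are deviations from that ideal).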
    A Unified Buffering Management with Set Divisible Cache for PCM Main Memory
    Mei-Ying Bian, Su-Kyung Yoon, Jeong-Geun Kim, Sangjae Nam, Shin-Dug Kim
    Journal of Computer Science and Technology, 2016, 31 (1): 137-146.  DOI: 10.1007/s11390-016-1617-7
    Abstract   PDF(2034KB) ( 1091 )   Chinese Summary
    This research proposes a phase-change memory (PCM) based main memory system with an effective combination of a superblock-based adaptive buffering structure and its associated set divisible last-level cache (LLC). To achieve high performance similar to that of dynamic random-access memory (DRAM) based main memory, the superblock-based adaptive buffer (SABU) comprises dual DRAM buffers, i.e., an aggressive superblock-based pre-fetching buffer (SBPB) and an adaptive sub-block reusing buffer (SBRB), and a set divisible LLC based on a cache space optimization scheme. According to our experiments, the longer PCM access latency can typically be hidden using our proposed SABU, which can significantly reduce the number of writes to the PCM main memory by 26.44%. The SABU approach can reduce PCM access latency up to 0.43 times, compared with conventional DRAM main memory. Meanwhile, the average memory energy consumption can be reduced by 19.7%.
    Data Management and Data Mining
    AS-Index: A Structure for String Search Using n-Grams and Algebraic Signatures
    Camelia Constantin, Cédric du Mouza, Witold Litwin, Philippe Rigaux, Thomas Schwarz
    Journal of Computer Science and Technology, 2016, 31 (1): 147-166.  DOI: 10.1007/s11390-016-1618-6
    Abstract   PDF(634KB) ( 1018 )   Chinese Summary
    We present the AS-Index, a new index structure for exact string search in disk-resident databases. AS-Index relies on a classical inverted file structure, whose main innovation is a probabilistic search based on the properties of algebraic signatures, used for both n-gram hashing and pattern search. Specifically, the properties of our signatures make it possible to carry out a search by inspecting only two of the posting lists. The algorithm thus enjoys the unique feature of requiring a constant number of disk accesses, independently of both the pattern size and the database size. We conduct extensive experiments on large datasets to evaluate our index's behavior. They confirm that it steadily provides a search performance proportional to the two disk accesses necessary to obtain the posting lists. This makes our structure a choice of interest for the class of applications that require very fast lookups in large textual databases. We describe the index structure, our use of algebraic signatures, and the search algorithm. We discuss the operational trade-offs based on the parameters that affect the behavior of our structure, and present the theoretical and experimental performance analysis. We then compare the AS-Index with the state-of-the-art alternatives and show that 1) its construction time matches that of its competitors, due to the similarity of structures, and 2) as for search time, it constantly outperforms the standard approach, thanks to the economical access to data complemented by signature calculations, which is at the core of our search method.
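The two-posting-list search idea can be illustrated with a stripped-down n-gram inverted file. This sketch uses plain substrings rather than the paper's algebraic signatures, so it is an analogy rather than the AS-Index algorithm: a pattern is located by probing only the posting lists of its first and last n-grams.

```python
from collections import defaultdict

def build_index(text, n=3):
    """Inverted file mapping each n-gram to the list of positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(text) - n + 1):
        index[text[i:i + n]].append(i)
    return index

def search(index, pattern, n=3):
    """Locate a pattern using exactly two posting lists.

    Candidate positions only: for long patterns a short verification pass
    on the text would confirm matches, since only two n-grams are checked.
    """
    first_list = index.get(pattern[:n], [])
    last_set = set(index.get(pattern[-n:], []))
    offset = len(pattern) - n
    return [p for p in first_list if p + offset in last_set]

idx = build_index("the quick brown fox jumps over the lazy dog")
hits = search(idx, "quick")  # starts at character offset 4
```

Whatever the pattern length or text size, `search` touches two posting lists, which mirrors the constant-number-of-disk-accesses property the abstract highlights.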
    Context-Based Moving Object Trajectory Uncertainty Reduction and Ranking in Road Network
    Jian Dai, Zhi-Ming Ding, Jia-Jie Xu
    Journal of Computer Science and Technology, 2016, 31 (1): 167-184.  DOI: 10.1007/s11390-016-1619-5
    Abstract   PDF(552KB) ( 1078 )   Chinese Summary
    To support the large amount of GPS data generated by various moving objects, back-end servers usually store low-sampling-rate trajectories. Therefore, no precise position information can be obtained directly from the back-end servers, and uncertainty is an inherent characteristic of the spatio-temporal data. How to deal with this uncertainty thus becomes a basic and challenging problem. Much research has been conducted on the uncertainty of a moving object itself, in isolation from the context from which it is derived. However, we discover that the uncertainty of moving objects can be efficiently reduced and effectively ranked using context-aware information. In this paper, we focus on context-aware information and propose an integrated framework, Context-Based Uncertainty Reduction and Ranking (CURR), to reduce and rank the uncertainty of trajectories. Specifically, given two consecutive samplings, we aim to infer and rank the possible trajectories in accordance with the information extracted from the context. Since some context-aware information can be used to reduce the uncertainty while other context-aware information can be used to rank it, to leverage them accordingly, CURR naturally consists of two stages: a reduction stage and a ranking stage, which complement each other. We also implement a prototype system to validate the effectiveness of our solution. Extensive experiments are conducted and the evaluation results demonstrate the efficiency and high accuracy of CURR.
    RiMOM-IM: A Novel Iterative Framework for Instance Matching
    Chao Shao, Lin-Mei Hu, Juan-Zi Li, Zhi-Chun Wang, Tonglee Chung, Jun-Bo Xia
    Journal of Computer Science and Technology, 2016, 31 (1): 185-197.  DOI: 10.1007/s11390-016-1620-z
    Abstract   PDF(809KB) ( 1212 )   Chinese Summary
    Instance matching, which aims at discovering the correspondences of instances between knowledge bases, is a fundamental issue for ontological data sharing and integration in the Semantic Web. Although a considerable number of instance matching approaches have already been proposed, ensuring both high accuracy and efficiency is still a big challenge when dealing with large-scale knowledge bases. This paper proposes an iterative framework, RiMOM-IM (RiMOM-Instance Matching). The key idea behind this framework is to fully utilize the distinctive and available matching information to improve the efficiency and control the error propagation. We participated in the 2013 and 2014 competitions of the Ontology Alignment Evaluation Initiative (OAEI), and our system was ranked first. Furthermore, experiments on previous OAEI datasets also show that our system performs the best.
    Regular Paper
    A Game-Based Approach for PCTL* Stochastic Model Checking with Evidence
    Yang Liu, Xuan-Dong Li, Yan Ma
    Journal of Computer Science and Technology, 2016, 31 (1): 198-216.  DOI: 10.1007/s11390-016-1621-y
    Abstract   PDF(3971KB) ( 1481 )   Chinese Summary
    Stochastic model checking is a recent extension and generalization of classical model checking, which focuses on quantitatively checking the temporal properties of a system model. PCTL* is one of the important quantitative property specification languages; it is strictly more expressive than either PCTL (probabilistic computation tree logic) or LTL (linear temporal logic) with probability bounds. At present, the PCTL* stochastic model checking algorithm is very complicated, and cannot provide any relevant explanation of why a formula does or does not hold in a given model. To address this problem, an intuitive and succinct approach for PCTL* stochastic model checking with evidence is put forward in this paper, which includes: presenting the game semantics for PCTL* in release-PNF (release-positive normal form), defining the PCTL* stochastic model checking game, using strategy solving in the game to achieve PCTL* stochastic model checking, and refining the winning strategy as the evidence to certify the stochastic model checking result. The soundness and the completeness of game-based PCTL* stochastic model checking are proved, and its complexity matches the known lower and upper bounds. The game-based PCTL* stochastic model checking algorithm is implemented in a visual prototype tool, and its feasibility is demonstrated by an illustrative example.
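The quantitative core that any PCTL* checker ultimately relies on is computing reachability probabilities in a Markov chain. A minimal value-iteration sketch of that step (this is not the paper's game-based algorithm, and the three-state chain is hypothetical):

```python
def reach_prob(P, targets, iters=200):
    """Probability of eventually reaching `targets` from each state of a DTMC.

    Iterates p(s) = sum_t P[s][t] * p(t) with p fixed at 1 on target states;
    for chains without end-component subtleties this converges to the
    reachability probabilities.
    """
    n = len(P)
    p = [1.0 if s in targets else 0.0 for s in range(n)]
    for _ in range(iters):
        p = [1.0 if s in targets else sum(P[s][t] * p[t] for t in range(n))
             for s in range(n)]
    return p

# Hypothetical 3-state chain: from state 0, reach target state 1 with
# probability 0.5, otherwise fall into the absorbing failure state 2.
P = [[0.0, 0.5, 0.5],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
probs = reach_prob(P, {1})  # probs[0] = 0.5
```

A game-based checker, as in this paper, additionally records which resolution of the probabilistic and logical choices witnesses the result, yielding the evidence the abstract describes.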
    Adaptive Photon Mapping Based on Gradient
    Chun-Meng Kang, Lu Wang, Yan-Ning Xu, Xiang-Xu Meng, Yuan-Jie Song
    Journal of Computer Science and Technology, 2016, 31 (1): 217-224.  DOI: 10.1007/s11390-016-1622-x
    Abstract   PDF(1994KB) ( 970 )   Chinese Summary
    Photon mapping can efficiently simulate some special effects, such as shadows and caustics. Photon mapping runs in two phases: the photon map generating phase and the radiance estimation phase. In this paper, we focus on the bandwidth selection process in the second phase, as it can affect the final quality significantly. Noisy results arise if few photons are collected, while bias appears if a large number of photons are collected. In order to solve this issue, we propose an adaptive radiance estimation solution that obtains trade-offs between noise and bias by changing the number of neighboring photons and the shape of the collection area according to the radiance gradient. Our approach can be applied in both the direct and the indirect illumination computation. Finally, experimental results show that our approach produces smoother results while preserving high-frequency features, compared with the original photon mapping algorithm.
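The radiance estimation step that this paper makes adaptive starts from the classical fixed-k estimate: gather the k nearest photons around a shading point and divide their total power by the area of the gathering disc. A minimal sketch with made-up photon records (BRDF weighting omitted for brevity):

```python
import math

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def radiance_estimate(photons, x, k=3):
    """Classical density estimate: power of the k nearest photons over disc area.

    A small k gives noisy estimates, a large k blurs (bias) -- the
    trade-off the paper's gradient-driven adaptive scheme targets.
    """
    nearest = sorted(photons, key=lambda p: dist2(p[0], x))[:k]
    r2 = dist2(nearest[-1][0], x)        # squared radius of the gather disc
    total_power = sum(power for _, power in nearest)
    return total_power / (math.pi * r2)  # flux density at x

# Hypothetical (position, power) photon records around the origin.
photons = [((0.0, 0.0), 1.0), ((1.0, 0.0), 1.0),
           ((0.0, 1.0), 1.0), ((2.0, 2.0), 1.0)]
est = radiance_estimate(photons, (0.0, 0.0))  # 3 / pi, since r^2 = 1
```

The adaptive variant proposed in the paper would vary both k and the shape of the gather region with the local radiance gradient instead of keeping them fixed.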
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
E-mail: jcst@ict.ac.cn
  Copyright ©2015 JCST, All Rights Reserved