Bimonthly    Since 1986
ISSN 1000-9000(Print)
CN 11-2296/TP
Publication Details
Edited by: Editorial Board of Journal of Computer Science and Technology
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Distributed by:
China: All Local Post Offices
Other Countries: Springer
  • Table of Contents
      05 March 2014, Volume 29 Issue 2
    Special Section on Cloud-Sea Computing Systems
    Cloud-Sea Computing Systems: Towards Thousand-Fold Improvement in Performance per Watt for the Coming Zettabyte Era
    Zhi-Wei Xu
    Journal of Computer Science and Technology, 2014, 29 (2): 177-181.  DOI: 10.1007/s11390-014-1420-2
    We are entering a new era of computing, characterized by the need to handle over one zettabyte (10^21 bytes, or ZB) of data. The world's capacities to sense, transmit, store, and process information need to grow by three orders of magnitude, while maintaining an energy consumption level similar to that of the year 2010. In other words, we need to produce a thousand-fold improvement in performance per watt. To face this challenge, in 2012 the Chinese Academy of Sciences launched a 10-year strategic priority research initiative called the Next Generation Information and Communication Technology initiative (the NICT initiative). A research thrust of the NICT program is the Cloud-Sea Computing Systems project. The main idea is to augment conventional cloud computing through cooperation and integration of the cloud-side systems and the sea-side systems, where the "sea side" refers to an augmented client side consisting of human-facing and physical-world-facing devices and subsystems. The Cloud-Sea Computing Systems project consists of four research tasks: a new computing model called REST 2.0, which extends the REST (representational state transfer) architectural style of Web computing to cloud-sea computing; a three-tier storage system architecture capable of managing ZBs of data; a billion-thread datacenter server with high energy efficiency; and an elastic processor aiming at an energy efficiency of one trillion operations per second per watt. This special section contains 12 papers produced by the Cloud-Sea Computing Systems project team, presenting research results relating to sensing and REST 2.0, the elastic processor, the hyperparallel server, and the cloud-sea storage.
    A Functional Sensing Model and a Case Study in Household Electricity Usage Sensing
    Jing-Jie Liu, Lei Nie
    Journal of Computer Science and Technology, 2014, 29 (2): 182-193.  DOI: 10.1007/s11390-014-1421-1
    Sensing is a fundamental process to acquire information in the physical world for computation. Existing models treat a sensing process as an indivisible whole, such that the sampling and reconstruction of signals are designed to be highly associated with each other in a unified procedure. These strongly coupled sensing systems are efficient, but usually lack reusability and upgradeability. We propose a functional sensing model called SDR (Sampling-Design-Reconstruction) to decouple a sensing process into two modules: a sampling protocol and a reconstruction algorithm. The core of this decoupling is a design space, which is a common data structure constructed using functions of the sensing target as prior knowledge, to seamlessly bridge the sampling protocol and the reconstruction algorithm. We demonstrate that existing types of household electricity usage sensing systems can be successfully decoupled by introducing corresponding design spaces.
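The decoupling described in the abstract can be sketched in a few lines. The toy below is purely illustrative and not the authors' code: the step-function prior (appliances switching between power levels), the function names, and the threshold are all invented here to show how a sampling protocol and a reconstruction algorithm can share nothing but a common prior.

```python
# Design-space prior: household power draw is a step function.

def sampling_protocol(signal, threshold=5.0):
    """Keep a sample only when the reading jumps, exploiting the step prior."""
    samples = [(0, signal[0])]
    for t, v in enumerate(signal):
        if abs(v - samples[-1][1]) > threshold:
            samples.append((t, v))
    return samples

def reconstruction_algorithm(samples, length):
    """Rebuild the full signal from the sparse samples under the same prior."""
    out = []
    for (t0, v0), (t1, _) in zip(samples, samples[1:] + [(length, None)]):
        out.extend([v0] * (t1 - t0))
    return out

signal = [0.0] * 10 + [60.0] * 20 + [0.0] * 10   # a lamp on for 20 ticks
sparse = sampling_protocol(signal)
rebuilt = reconstruction_algorithm(sparse, len(signal))
print(len(sparse), rebuilt == signal)            # few samples, exact rebuild
```

Either module can be swapped independently (e.g., a sparsity-based reconstruction) as long as both agree on the design space, which is the reusability argument the abstract makes.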
    EasiSMP: A Resource-Oriented Programming Framework Supporting Runtime Propagation of RESTful Resources
    Jie-Fan Qiu, Dong Li, Hai-Long Shi, Chen-Da Hou, Li Cui
    Journal of Computer Science and Technology, 2014, 29 (2): 194-204.  DOI: 10.1007/s11390-014-1422-0
    In order to simplify programming for building sensor networks, macro-programming methods have been proposed in prior work. Most of them are designed for dedicated networks and specific scenarios where devices are mostly homogeneous. These methods rarely consider shared networks, which are composed of heterogeneous devices (e.g., sensors, actuators, and mobile devices) that share resources among themselves. In this paper, we present EasiSMP, a resource-oriented programming framework for such shared networks and generic application scenarios. In this framework, the devices and their functionalities are abstracted into RESTful virtual resources (VRs), each of which is labelled by a uniform resource identifier (URI). A post-deployment VR can be globally accessed and reused to propagate new resources at runtime. To support resource propagation, programming primitives are proposed and a virtual resource engine (VRE) is studied. For evaluation, EasiSMP is deployed in a relic monitoring network. Experimental results show that programming with EasiSMP is concise, and that the average deployment overhead is decreased by up to 27% compared with node-level programming.
    SeaHttp: A Resource-Oriented Protocol to Extend REST Style for Web of Things
    Chen-Da Hou, Dong Li, Jie-Fan Qiu, Hai-Long Shi, Li Cui
    Journal of Computer Science and Technology, 2014, 29 (2): 205-215.  DOI: 10.1007/s11390-014-1423-z
    Web of Things (WoT) makes it possible to connect a tremendous number of embedded devices to the web in the Representational State Transfer (REST) style. Some lightweight RESTful protocols have been proposed for the WoT to replace the HTTP protocol running on embedded devices. However, they keep the principal characteristic of the REST style. In particular, they support one-to-one requests in the client-server mode through four standard RESTful methods (GET, PUT, POST, and DELETE). This characteristic is inconsistent with practical networks of embedded devices, which typically perform group operations. To meet the requirement of group communication in the WoT, we propose a resource-oriented protocol called SeaHttp, which extends the REST style with two new methods, BRANCH and COMBINE. SeaHttp supports parallel processing of group requests by splitting and merging them. In addition, SeaHttp adds spatiotemporal attributes to the standard URI for naming a dynamic request group of physical resources. Experimental results show that SeaHttp reduces the average energy consumption of group communication in the WoT by 18.5% compared with the Constrained Application Protocol (CoAP).
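As a rough illustration of the group-communication idea: BRANCH and COMBINE are the method names from the abstract, but the URIs, the device table, and the merging rule below are invented for this sketch; SeaHttp's real wire format and semantics are in the paper.

```python
# Invented device registry standing in for a room of sensors.
DEVICES = {
    "coap://room1/temp1": 21.5,
    "coap://room1/temp2": 22.0,
    "coap://room2/temp1": 19.0,
}

def branch(group_uri):
    """Split one group request into per-device requests (the BRANCH idea)."""
    prefix = group_uri.rstrip("*")
    return [uri for uri in DEVICES if uri.startswith(prefix)]

def combine(uris):
    """Merge the individual responses into a single reply (the COMBINE idea)."""
    readings = [DEVICES[u] for u in uris]
    return {"count": len(readings), "mean": sum(readings) / len(readings)}

members = branch("coap://room1/*")
print(combine(members))   # one merged response for the whole group
```

The point of the split/merge pair is that the client issues and receives exactly one message for the group, which is where the energy saving over per-device CoAP requests comes from.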
    A Task Execution Framework for Cloud-Assisted Sensor Networks
    Hai-Long Shi, Dong Li, Jie-Fan Qiu, Chen-Da Hou, Li Cui
    Journal of Computer Science and Technology, 2014, 29 (2): 216-226.  DOI: 10.1007/s11390-014-1424-y
    As sensor networks are increasingly deployed, more sensors become available in the same region, making it strategic to select the suitable ones to execute users' applications. We propose a task execution framework, named sTaskAlloc, to execute applications energy-efficiently through two main parts. First, considering that the energy consumption of an application is inversely proportional to the utilization rate of sensors, we present a hot sensor selection algorithm, HotTasking, to minimize the energy consumption of newly added applications by selecting the most suitable sensor. Second, when a sensor is shared by multiple applications, the proposed MergeOPT, a concurrent-task optimization algorithm, is used to further optimize energy consumption by eliminating redundant sampling tasks. Experimental results show that sTaskAlloc saves more than 76% of energy for newly added applications compared with existing methods and reduces up to 72% of sampling tasks when a sensor is shared by more than 10 applications.
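The abstract's premise is that an application's energy cost falls as the chosen sensor's utilization rises, so the selection rule favors the "hottest" suitable sensor. A toy version of that rule (the field names and sensor data are invented; the actual HotTasking algorithm in the paper also accounts for task schedules):

```python
def hot_sensor_select(sensors, required_type):
    """Among sensors of the required type, pick the one with highest utilization."""
    candidates = [s for s in sensors if s["type"] == required_type]
    return max(candidates, key=lambda s: s["utilization"])

sensors = [
    {"id": "s1", "type": "temp",  "utilization": 0.2},
    {"id": "s2", "type": "temp",  "utilization": 0.7},
    {"id": "s3", "type": "humid", "utilization": 0.9},
]
print(hot_sensor_select(sensors, "temp")["id"])   # hottest temperature sensor
```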
    An Elastic Architecture Adaptable to Various Application Scenarios
    Yue Wu, Yun-Ji Chen, Tian-Shi Chen, Qi Guo, Lei Zhang
    Journal of Computer Science and Technology, 2014, 29 (2): 227-238.  DOI: 10.1007/s11390-014-1425-x
    The quantity of computer applications is increasing dramatically as the computer industry prospers. Meanwhile, even a single application has different performance and power requirements in different scenarios. Although various processors with different architectures have emerged to fit the various applications in different scenarios, it is impossible to design a dedicated processor to meet all the requirements. Furthermore, dealing with diverse processors significantly aggravates the burden on programmers and system integrators to achieve a specific performance/power target. In this paper, we propose the elastic architecture (EA) to provide a uniform computing platform with high elasticity, i.e., the ratio of worst-case to best-case performance, power, or performance-power trade-off, which can meet different requirements for different applications. This is achieved by dynamically adjusting architecture parameters (instruction set, branch predictor, data path, memory hierarchy, concurrency, status & control, and so on) on demand. The elasticity of our prototype implementation of EA, named Sim-EA, ranges from 3.31 to 14.34, with an arithmetic average of 5.41, for the SPEC CPU2000 benchmark suite, which provides great flexibility to fulfill the different performance and power requirements of different scenarios. Moreover, Sim-EA reduces the EDP (energy-delay product) by 31.14% in arithmetic average compared with a baseline fixed architecture. Besides, subsequent experiments indicate a negative correlation between the lengths of application intervals and their elasticities.
    A General-Purpose Many-Accelerator Architecture Based on Dataflow Graph Clustering of Applications
    Peng Chen, Lei Zhang, Yin-He Han, Yun-Ji Chen
    Journal of Computer Science and Technology, 2014, 29 (2): 239-246.  DOI: 10.1007/s11390-014-1426-9
    The combination of growing transistor counts and a limited power budget within a silicon die leads to the utilization wall problem (a.k.a. "dark silicon"): only a small fraction of the chip can run at full speed at any given time. Designing accelerators for specific applications or algorithms is considered one of the most promising approaches to improving energy efficiency. However, most current design methods for accelerators are dedicated to certain applications or algorithms, which greatly constrains their applicability. In this paper, we propose a novel general-purpose many-accelerator architecture. Our contributions are two-fold. Firstly, we propose to cluster the dataflow graphs (DFGs) of hotspot basic blocks (BBs) in applications. The DFG clusters are then used for accelerator design, because a DFG is the largest program unit that is not specific to a certain application. We analyze 17 benchmarks in SPEC CPU2006, acquire over 300 hotspot DFGs using the LLVM compiler tool, and divide them into 15 clusters based on graph similarity. Secondly, we introduce a function instruction set architecture (FISC) and illustrate how DFG accelerators can be integrated with a processor core and used by applications. Our results show that the proposed DFG clustering and FISC design can speed up the SPEC benchmarks by 6.2X on average.
    Prevention from Soft Errors via Architecture Elasticity
    Yi-Xiao Yin, Yun-Ji Chen, Qi Guo, Tian-Shi Chen
    Journal of Computer Science and Technology, 2014, 29 (2): 247-254.  DOI: 10.1007/s11390-014-1427-8
    Due to decreasing threshold voltages, shrinking feature sizes, and the exponential growth of on-chip transistors, modern processors are increasingly vulnerable to soft errors. However, traditional soft error mitigation mechanisms take action only after soft errors have been detected. Instead of such passive responses, this paper proposes a novel mechanism that proactively prevents the occurrence of soft errors via architecture elasticity. Guided by a predictive model, we adapt the processor architecture holistically and dynamically. The predictive model, which leverages an artificial neural network, can quickly and accurately predict the simulation target across different program execution phases on any architecture configuration. Experimental results on SPEC CPU2000 benchmarks show that our method reduces the soft error rate by 33.2% and improves energy efficiency by 18.3% compared with a statically configured processor.
    MIMS: Towards a Message Interface Based Memory System
    Li-Cheng Chen, Ming-Yu Chen, Yuan Ruan, Yong-Bing Huang, Ze-Han Cui, Tian-Yue Lu, Yun-Gang Bao
    Journal of Computer Science and Technology, 2014, 29 (2): 255-272.  DOI: 10.1007/s11390-014-1428-7
    The decades-old synchronous memory bus interface has restricted many innovations in the memory system, which faces various challenges (or walls) in the era of multi-core and big data. In this paper, we argue that a message-based interface should be adopted to replace the traditional bus-based interface in the memory system. A novel message interface based memory system called MIMS is proposed. The key innovation of MIMS is that processors communicate with the memory system through a universal and flexible message packet interface. Each message packet is allowed to encapsulate multiple memory requests (or commands) and additional semantic information. The memory system becomes more intelligent and active by being equipped with a local buffer scheduler, which is responsible for processing packets, scheduling memory requests, preparing responses, and executing specific commands with the help of the semantic information. Under the MIMS framework, many previous innovations in memory architecture, as well as new optimization opportunities such as address compression and continuous request combination, can be naturally incorporated. Experimental results on a 16-core cycle-detailed simulation system show that, with accurate-granularity messages, MIMS can improve system performance by 53.21% and reduce the energy-delay product (EDP) by 55.90%. Furthermore, it can improve effective bandwidth utilization by 62.42% and reduce memory access latency by 51% on average.
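The core mechanism is a packet that bundles several memory requests plus semantic hints into one message. The byte layout below is invented for illustration (the paper defines its own packet format); it only shows the encapsulate-many-requests idea.

```python
import struct

# One request record: op (0=read, 1=write), 64-bit address, granularity in bytes.
REQ = struct.Struct("<BQH")

def pack_message(requests, hint=0):
    """Bundle many (op, addr, size) requests into one packet with a hint byte."""
    header = struct.pack("<BB", hint, len(requests))
    return header + b"".join(REQ.pack(*r) for r in requests)

def unpack_message(packet):
    """What the memory-side buffer scheduler would do on receipt."""
    hint, n = struct.unpack_from("<BB", packet)
    reqs = [REQ.unpack_from(packet, 2 + i * REQ.size) for i in range(n)]
    return hint, reqs

# Three requests travel as a single message instead of three bus transactions.
msg = pack_message([(0, 0x1000, 64), (0, 0x1040, 64), (1, 0x2000, 8)], hint=1)
hint, reqs = unpack_message(msg)
print(hint, reqs)
```

Because the scheduler sees whole packets rather than individual bus cycles, optimizations like address compression (the two reads above share a page) become natural, which is the flexibility argument the abstract makes.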
    Reinventing Memory System Design for Many-Accelerator Architecture
    Ying Wang, Lei Zhang, Yin-He Han, Hua-Wei Li
    Journal of Computer Science and Technology, 2014, 29 (2): 273-280.  DOI: 10.1007/s11390-014-1429-6
    The many-accelerator architecture, mostly composed of general-purpose cores and accelerator-like function units (FUs), has become a strong alternative to homogeneous chip multiprocessors (CMPs) for its superior power efficiency. However, the emerging many-accelerator processor shows a much more complicated memory access pattern than general-purpose processors (GPPs), because the abundant on-chip FUs tend to generate highly concurrent memory streams with distinct locality and bandwidth demands. The disordered memory streams issued by diverse accelerators exhibit mutual interference and cannot be efficiently handled by the orthodox main memory interface, which provides only an inflexible data fetching mode. Unlike traditional DRAM memory, our proposed Aggregation Memory System (AMS) adapts to the characterized memory streams from different FUs: it provides the FUs with different data fetching sizes and protects their locality in memory access by intelligently interleaving their data across memory devices through sub-rank binding. Moreover, AMS can batch requests without sub-rank conflicts into a read burst with our optimized memory scheduling policy. Experimental results from trace-based simulation show both a conspicuous performance boost and energy savings brought by AMS.
    A High-Performance and Cost-Efficient Interconnection Network for High-Density Servers
    Wen-Tao Bao, Bin-Zhang Fu, Ming-Yu Chen, Li-Xin Zhang
    Journal of Computer Science and Technology, 2014, 29 (2): 281-292.  DOI: 10.1007/s11390-014-1430-0
    The high-density server features low power, low volume, and high computational density. With the rising use of high-density servers in data-intensive and large-scale web applications, a high-performance and cost-efficient intra-server interconnection network is required. Most state-of-the-art high-density servers adopt a fully-connected intra-server network to attain high network performance. Unfortunately, this solution costs too much due to the high node degree. In this paper, we exploit the theoretically optimal Moore graph to interconnect the chips within a server. Considering the suitable size for applications, a 50-node Moore graph, called the Hoffman-Singleton graph, is adopted. In practice, multiple chips should be integrated onto one processor board, which means that the original graph should be partitioned into homogeneous connected subgraphs. However, the existing partition scheme does not consider this requirement and thus generates heterogeneous subgraphs. To address this problem, we propose two equivalent-partition schemes for the Hoffman-Singleton graph. In addition, a logic-based and minimal routing mechanism, which is both time and area efficient, is proposed. Finally, we compare the proposed network architecture with its counterparts, namely the fully-connected, Kautz, and Torus networks. The results show that our proposed network achieves performance competitive with the fully-connected network at a cost close to that of the Torus network.
    SAC: Exploiting Stable Set Model to Enhance CacheFiles
    Jian-Liang Liu, Yong-Le Zhang, Lin Yang, Ming-Yang Guo, Zhen-Jun Liu, Lu Xu
    Journal of Computer Science and Technology, 2014, 29 (2): 293-302.  DOI: 10.1007/s11390-014-1431-z
    Client cache is an important technology for the optimization of distributed and centralized storage systems. As a representative client cache system, CacheFiles has its performance limited by transition faults. Furthermore, CacheFiles supports only a simple LRU policy with a tightly-coupled design. To overcome these limitations, we propose to employ the Stable Set Model (SSM) to improve CacheFiles and design an enhanced CacheFiles, SAC. SSM assumes that data access can be decomposed into accesses on stable sets, in which elements are always repeatedly accessed, or not accessed, together. Using SSM can improve cache management and reduce the effect of transition faults. We also adopt loosely-coupled methods to design the prefetch and replacement policies. We implement our scheme on Linux 2.6.32 and measure its execution time with various file I/O benchmarks. Experiments show that SAC can significantly improve I/O performance and reduce execution time by up to 84% compared with the existing CacheFiles.
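The stable-set intuition from the abstract — blocks that are always requested together (or not at all) can be prefetched and evicted as a unit — can be sketched with a deliberately simplified grouping rule (invented here; the paper's SSM is more involved): group blocks whose sets of appearances across requests are identical.

```python
from collections import defaultdict

def stable_sets(accesses):
    """Group blocks whose access pattern (the requests they appear in) is identical."""
    pattern = defaultdict(set)
    for i, request in enumerate(accesses):
        for block in request:
            pattern[block].add(i)
    groups = defaultdict(list)
    for block, occurrences in pattern.items():
        groups[frozenset(occurrences)].append(block)
    return [sorted(g) for g in groups.values()]

# 'a' and 'b' always co-occur, so they form one stable set; 'c' stands alone.
trace = [{"a", "b"}, {"c"}, {"a", "b", "c"}, {"a", "b"}]
print(stable_sets(trace))
```

A cache that prefetches and evicts at stable-set granularity touches the backing store once per set rather than once per block, which is where the loosely-coupled prefetch/replacement policies get their leverage.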
    A Non-Forced-Write Atomic Commit Protocol for Cluster File Systems
    Bing-Qing Shao, Jun-Wei Zhang, Cai-Ping Zheng, Hao Zhang, Zhen-Jun Liu, Lu Xu
    Journal of Computer Science and Technology, 2014, 29 (2): 303-315.  DOI: 10.1007/s11390-014-1432-y
    Distributed metadata consistency is one of the critical issues for metadata clusters in distributed file systems. Existing methods to maintain metadata consistency generally need several forced log-write operations. Since synchronous disk IO is very inefficient, the average response time of metadata operations is greatly increased. In this paper, an asynchronous atomic commit protocol (ACP) named Dual-Log (DL) is presented. It does not need any forced log-write operations. Optimized for distributed metadata operations involving only two metadata servers, DL mutually records the redo log in the counterpart metadata server by transferring it through a low-latency network. A crashed metadata server can redo the metadata operation with the redundant redo log. Since the latency of the network is much lower than that of disk IO, DL can improve the performance of the distributed metadata service significantly. A prototype of DL is implemented based on a local journal. Its performance is tested against two widely used protocols, EP and S2PC-MP, and the results show that the average response time of distributed metadata operations is reduced by about 40%-60%, and the recovery time is only 1 second under 10 thousand uncompleted distributed metadata operations.
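The mutual-logging idea can be sketched as follows. This is a toy model, not the DL protocol itself: the classes, field names, and two-server operation are invented to show why keeping each server's redo record in its counterpart's memory removes the forced disk write from the commit path.

```python
class MetadataServer:
    def __init__(self, name):
        self.name = name
        self.state = {}            # volatile metadata state
        self.peer_redo_log = []    # redo records held on behalf of the peer

    def receive_redo(self, record):
        # In-memory append over a low-latency network: no fsync on the commit path.
        self.peer_redo_log.append(record)

    def apply(self, key, value):
        self.state[key] = value

def distributed_op(a, b, key, value):
    """A metadata operation spanning two servers: exchange redo records, then apply."""
    record = (key, value)
    a.receive_redo(record)   # b's redo record is kept on a
    b.receive_redo(record)   # a's redo record is kept on b
    a.apply(key, value)
    b.apply(key, value)

def recover(crashed, survivor):
    """Replay operations the crashed server may have lost, from the survivor's copy."""
    for key, value in survivor.peer_redo_log:
        crashed.apply(key, value)

a, b = MetadataServer("mds1"), MetadataServer("mds2")
distributed_op(a, b, "/dir/file", "inode#7")
a.state.clear()            # simulate mds1 crashing and losing volatile state
recover(a, b)
print(a.state)             # the lost update is reconstructed from mds2's log
```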
    Regular Paper
    OpenMDSP: Extending OpenMP to Program Multi-Core DSPs
    Jiang-Zhou He, Wen-Guang Chen, Guang-Ri Chen, Wei-Min Zheng, Zhi-Zhong Tang, Han-Dong Ye
    Journal of Computer Science and Technology, 2014, 29 (2): 316-331.  DOI: 10.1007/s11390-014-1433-x
    Multi-core digital signal processors (DSPs) are widely used in wireless telecommunication, core network transcoding, industrial control, and audio/video processing, among other fields. In comparison with general-purpose multi-processors, multi-core DSPs normally have a more complex memory hierarchy, such as on-chip core-local memory and non-cache-coherent shared memory. As a result, efficient multi-core DSP applications are very difficult to write. The current approach to programming multi-core DSPs is based on proprietary vendor software development kits (SDKs), which provide only low-level, non-portable primitives. While it is acceptable to write coarse-grained task-level parallel code with these SDKs, writing fine-grained data-parallel code with them is a tedious and error-prone endeavor. We believe that a high-level and portable parallel programming model for multi-core DSPs is desirable. In this paper, we propose OpenMDSP, an extension of OpenMP designed for multi-core DSPs. The goal of OpenMDSP is to fill the gap between the OpenMP memory model and the memory hierarchy of multi-core DSPs. We propose three classes of directives in OpenMDSP: 1) data placement directives that allow programmers to control the placement of global variables conveniently; 2) distributed array directives that divide a whole array into sections and promote the sections into core-local memory to improve performance; and 3) stream access directives that promote big arrays into core-local memory section by section during parallel loop processing, while hiding the latency of data movement behind the direct memory access (DMA) of the DSP. We implement the compiler and runtime system for OpenMDSP on the Freescale MSC8156. Benchmarking results show that seven of nine benchmarks achieve a speedup of more than a factor of 5 when using six threads.
    Continuous Probabilistic Subspace Skyline Query Processing Using Grid Projections
    Lei Zhao, Yan-Yan Yang, Xiaofang Zhou
    Journal of Computer Science and Technology, 2014, 29 (2): 332-344.  DOI: 10.1007/s11390-014-1434-9
    As an important type of multidimensional preference query, the skyline query can find a superset of optimal results when there is no given linear function to combine the values of all attributes of interest. Its processing has been extensively investigated in the past. While most skyline query processing algorithms are designed under the assumption that query processing is done over all attributes of a static dataset with deterministic attribute values, recent work has removed parts of this strong assumption in order to process skyline queries for real-life applications: dealing with data with multi-valued attributes (known as data uncertainty), supporting skyline queries in a subspace, i.e., a subset of attributes selected by the user, and supporting continuous queries on streaming data. Naturally, there are many application scenarios where these three complex issues must be considered together. In this paper, we tackle the problem of probabilistic subspace skyline query processing over sliding windows on uncertain data streams, that is, retrieving all objects from the most recent window of streaming data in a user-selected subspace with a skyline probability no smaller than a given threshold. Based on the subtle relationship between the full space and an arbitrary subspace, a novel approach using a regular grid indexing structure is developed for this problem. An extensive empirical study under various settings shows the effectiveness and efficiency of our PSS algorithm.
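For readers new to the terminology, the "subspace skyline" notion can be made concrete with the textbook O(n^2) baseline: project each point onto the user-selected subset of attributes and keep the points no other point dominates there. This naive check (smaller-is-better convention, names invented here) is only the definition, not the paper's grid-indexed PSS algorithm, which also handles uncertainty and sliding windows.

```python
def dominates(p, q, dims):
    """p dominates q in subspace `dims`: <= on every dimension, < on at least one."""
    return all(p[d] <= q[d] for d in dims) and any(p[d] < q[d] for d in dims)

def subspace_skyline(points, dims):
    """Keep every point not dominated by any other point in the chosen subspace."""
    return [q for q in points if not any(dominates(p, q, dims) for p in points)]

points = [(1, 9, 5), (2, 3, 7), (4, 4, 1), (3, 2, 8)]
print(subspace_skyline(points, dims=(0, 1)))   # skyline on attributes 0 and 1 only
```

Note that (4, 4, 1) would survive a full-space skyline (its third attribute is best) but is dominated by (2, 3, 7) once the user restricts attention to the first two attributes, which is why subspace selection changes the answer.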
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
E-mail: jcst@ict.ac.cn
  Copyright ©2015 JCST, All Rights Reserved