Journal of Computer Science and Technology
Quick Search in JCST
 Advanced Search 
      Home | PrePrint | SiteMap | Contact Us | FAQ
Indexed by   SCIE, EI ...
Bimonthly    Since 1986
   ABOUT JCST            

Journal of Computer Science and Technology
2018 Vol.33 No.1
Published 2018-01-05

Computer Architecture and Systems
Regular Paper
Computer Architecture and Systems
1 Jin-Hua Tao, Zi-Dong Du, Qi Guo, Hui-Ying Lan, Lei Zhang, Sheng-Yuan Zhou, Ling-Jie Xu, Cong Liu, Hai-Feng Liu, Shan Tang, Allen Rush, Willian Chen, Shao-Li Liu, Yun-Ji Chen, Tian-Shi Chen
BENCHIP: Benchmarking Intelligence Processors
The increasing attention on deep learning has tremendously spurred the design of intelligence processing hardware. The variety of emerging intelligence processors requires standard benchmarks for fair comparison and system optimization (in both software and hardware). However, existing benchmarks are unsuitable for benchmarking intelligence processors due to their non-diversity and nonrepresentativeness. Also, the lack of a standard benchmarking methodology further exacerbates this problem. In this paper, we propose BenchIP, a benchmark suite and benchmarking methodology for intelligence processors. The benchmark suite in BenchIP consists of two sets of benchmarks:microbenchmarks and macrobenchmarks. The microbenchmarks consist of single-layer networks. They are mainly designed for bottleneck analysis and system optimization. The macrobenchmarks contain state-of-the-art industrial networks, so as to offer a realistic comparison of different platforms. We also propose a standard benchmarking methodology built upon an industrial software stack and evaluation metrics that comprehensively reflect various characteristics of the evaluated intelligence processors. BenchIP is utilized for evaluating various hardware platforms, including CPUs, GPUs, and accelerators. BenchIP will be open-sourced soon.
2018 Vol. 33 (1): 1-23 [Abstract] ( 155 ) [HTML 1KB] [PDF 1433KB] ( 265 )
24 Rui-Tao Liu, Zuo-Ning Chen
A Large-Scale Study of Failures on Petascale Supercomputers
With the rapid development of supercomputers, the scale and complexity are ever increasing, and the reliability and resilience are faced with larger challenges. There are many important technologies in fault tolerance, such as proactive failure avoidance technologies based on fault prediction, reactive fault tolerance based on checkpoint, and scheduling technologies to improve reliability. Both qualitative and quantitative descriptions on characteristics of system faults are very critical for these technologies. This study analyzes the source of failures on two typical petascale supercomputers called Sunway BlueLight (based on multi-core CPUs) and Sunway TaihuLight (based on heterogeneous manycore CPUs). It uncovers some interesting fault characteristics and finds unknown correlation relationship among main components' faults. Finally the paper analyzes the failure time of the two supercomputers in various grains of resource and different time spans, and builds a uniform multi-dimensional failure time model for petascale supercomputers.
2018 Vol. 33 (1): 24-41 [Abstract] ( 80 ) [HTML 1KB] [PDF 2058KB] ( 93 )
42 Xue-Yan Wang, Qiang Zhou, Yi-Ci Cai, Gang Qu
Spear and Shield: Evolution of Integrated Circuit Camouflaging
Intellectual property (IP) protection is one of the hardcore problems in hardware security. Semiconductor industry still lacks effective and proactive defense to shield IPs from reverse engineering (RE) based attacks. Integrated circuit (IC) camouflaging technique fills this gap by replacing some conventional logic gates in the IPs with specially designed logic cells (called camouflaged gates) without changing the functions of the IPs. The camouflaged gates can perform different logic functions while maintaining an identical look to RE attackers, thus preventing them from obtaining the layout information of the IP directly from RE tools. Since it was first proposed in 2012, circuit camouflaging has become one of the hottest research topics in hardware security focusing on two fundamental problems. How to choose the types of camouflaged gates and decide where to insert them in order to simultaneously minimize the performance overhead and optimize the RE complexity? How can an attacker de-camouflage a camouflaged circuit and complete the RE attack? In this article, we review the evolution of circuit camouflaging through this spear and shield race. First, we introduce the design methods of four different kinds of camouflaged cells based on true/dummy contacts, static random access memory (SRAM), doping, and emerging devices, respectively. Then we elaborate four representative de-camouflaging attacks:brute force attack, IC testing based attack, satisfiability-based (SAT-based) attack, and the circuit partition based attack, and the corresponding countermeasures:clique-based camouflaging, CamoPerturb, AND-tree camouflaging, and equivalent class based camouflaging, respectively. We argue that the current research efforts should be on reducing overhead introduced by circuit camouflaging and defeating de-camouflaging attacks. We point out that exploring features of emerging devices could be a promising direction. Finally, as a complement to circuit camouflaging, we conclude with a brief review of other state-of-the-art IP protection techniques.
2018 Vol. 33 (1): 42-57 [Abstract] ( 91 ) [HTML 1KB] [PDF 864KB] ( 155 )
58 Jian Liu, Yun-Peng Chai, Xiao Qin, Yao-Hong Liu
Endurable SSD-Based Read Cache for Improving the Performance of Selective Restore from Deduplication Systems
Deduplication has been commonly used in both enterprise storage systems and cloud storage. To overcome the performance challenge for the selective restore operations of deduplication systems, solid-state-drive-based (i.e., SSD-based) read cache can be deployed for speeding up by caching popular restore contents dynamically. Unfortunately, frequent data updates induced by classical cache schemes (e.g., LRU and LFU) significantly shorten SSDs' lifetime while slowing down I/O processes in SSDs. To address this problem, we propose a new solution-LOP-Cache-to greatly improve the write durability of SSDs as well as I/O performance by enlarging the proportion of long-term popular (LOP) data among data written into SSD-based cache. LOP-Cache keeps LOP data in the SSD cache for a long time period to decrease the number of cache replacements. Furthermore, it prevents unpopular or unnecessary data in deduplication containers from being written into the SSD cache. We implemented LOP-Cache in a prototype deduplication system to evaluate its performance. Our experimental results indicate that LOP-Cache shortens the latency of selective restore by an average of 37.3% at the cost of a small SSD-based cache with only 5.56% capacity of the deduplicated data. Importantly, LOP-Cache improves SSDs' lifetime by a factor of 9.77. The evidence shows that LOP-Cache offers a cost-efficient SSD-based read cache solution to boost performance of selective restore for deduplication systems.
2018 Vol. 33 (1): 58-78 [Abstract] ( 89 ) [HTML 1KB] [PDF 1427KB] ( 105 )
79 Dongchul Park, Weiping He, David H. C. Du
Hot Data Identification with Multiple Bloom Filters: Block-Level Decision vs I/O Request-Level Decision
Hot data identification is crucial for many applications though few investigations have examined the subject. All existing studies focus almost exclusively on frequency. However, effectively identifying hot data requires equally considering recency and frequency. Moreover, previous studies make hot data decisions at the data block level. Such a fine-grained decision fits particularly well for flash-based storage because its random access achieves performance comparable with its sequential access. However, hard disk drives (HDDs) have a significant performance disparity between sequential and random access. Therefore, unlike flash-based storage, exploiting asymmetric HDD access performance requires making a coarse-grained decision. This paper proposes a novel hot data identification scheme adopting multiple bloom filters to efficiently characterize recency as well as frequency. Consequently, it not only consumes 50% less memory and up to 58% less computational overhead, but also lowers false identification rates up to 65% compared with a state-of-the-art scheme. Moreover, we apply the scheme to a next generation HDD technology, i.e., Shingled Magnetic Recording (SMR), to verify its effectiveness. For this, we design a new hot data identification based SMR drive with a coarse-grained decision. The experiments demonstrate the importance and benefits of accurate hot data identification, thereby improving the proposed SMR drive performance by up to 42%.
2018 Vol. 33 (1): 79-97 [Abstract] ( 87 ) [HTML 1KB] [PDF 1207KB] ( 105 )
98 Yu-Tao Liu, Dong Du, Yu-Bin Xia, Hai-Bo Chen, Bin-Yu Zang, Zhenkai Liang
SplitPass: A Mutually Distrusting Two-Party Password Manager
Using a password manager is known to be more convenient and secure than not using one, on the assumption that the password manager itself is safe. However recent studies show that most popular password managers have security vulnerabilities that may be fooled to leak passwords without users' awareness. In this paper, we propose a new password manager, SplitPass, which vertically separates both the storage and access of passwords into two mutually distrusting parties. During login, all the parties will collaborate to send their password shares to the web server, but none of these parties will ever have the complete password, which significantly raises the bar of a successful attack to compromise all of the parties. To retain transparency to existing applications and web servers, SplitPass seamlessly splits the secure sockets layer (SSL) and transport layer security (TCP) sessions to process on all parties, and makes the joining of two password shares transparent to the web servers. We have implemented SplitPass using an Android phone and a cloud assistant and evaluated it using 100 apps from top free apps in the Android official market. The evaluation shows that SplitPass securely protects users' passwords, while incurring little performance overhead and power consumption.
2018 Vol. 33 (1): 98-115 [Abstract] ( 83 ) [HTML 1KB] [PDF 925KB] ( 96 )
116 Xu Tan, Xiao-Chun Ye, Xiao-Wei Shen, Yuan-Chao Xu, Da Wang, Lunkai Zhang, Wen-Ming Li, Dong-Rui Fan, Zhi-Min Tang
A Pipelining Loop Optimization Method for Dataflow Architecture
With the coming of exascale supercomputing era, power efficiency has become the most important obstacle to build an exascale system. Dataflow architecture has native advantage in achieving high power efficiency for scientific applications. However, the state-of-the-art dataflow architectures fail to exploit high parallelism for loop processing. To address this issue, we propose a pipelining loop optimization method (PLO), which makes iterations in loops flow in the processing element (PE) array of dataflow accelerator. This method consists of two techniques, architecture-assisted hardware iteration and instruction-assisted software iteration. In hardware iteration execution model, an on-chip loop controller is designed to generate loop indexes, reducing the complexity of computing kernel and laying a good foundation for pipelining execution. In software iteration execution model, additional loop instructions are presented to solve the iteration dependency problem. Via these two techniques, the average number of instructions ready to execute per cycle is increased to keep floating-point unit busy. Simulation results show that our proposed method outperforms static and dynamic loop execution model in floating-point efficiency by 2.45x and 1.1x on average, respectively, while the hardware cost of these two techniques is acceptable.
2018 Vol. 33 (1): 116-130 [Abstract] ( 112 ) [HTML 1KB] [PDF 754KB] ( 90 )
131 Fa-Qiang Sun, Gui-Hai Yan, Xin He, Hua-Wei Li, Yin-He Han
CPicker: Leveraging Performance-Equivalent Configurations to Improve Data Center Energy Efficiency
The poor energy proportionality of server is seen as the principal source for low energy efficiency of modern data centers. We find that different resource configurations of an application lead to similar performance, but have distinct energy consumption. We call this phenomenon as "performance-equivalent resource configurations (PERC)", and its performance range is called equivalent region (ER). Based on PERC, one basic idea for improving energy efficiency is to select the most efficient configuration from PERC for each application. However, it cannot support every application to obtain optimal solution when thousands of applications are run simultaneously on resource-bounded servers. Here we propose a heuristic scheme, CPicker, based on genetic programming to improve energy efficiency of servers. To speed up convergence, CPicker initializes a high quality population by first choosing configurations from regions that have high energy variation. Experiments show that CPicker obtains above 17% energy efficiency improvement compared with the greedy approach, and less than 4% efficiency loss compared with the oracle case.
2018 Vol. 33 (1): 131-144 [Abstract] ( 90 ) [HTML 1KB] [PDF 1915KB] ( 86 )
145 Xu Tan, Xiao-Wei Shen, Xiao-Chun Ye, Da Wang, Dong-Rui Fan, Lunkai Zhang, Wen-Ming Li, Zhi-Min Zhang, Zhi-Min Tang
A Non-Stop Double Buffering Mechanism for Dataflow Architecture
Double buffering is an effective mechanism to hide the latency of data transfers between on-chip and off-chip memory. However, in dataflow architecture, the swapping of two buffers during the execution of many tiles decreases the performance because of repetitive filling and draining of the dataflow accelerator. In this work, we propose a non-stop double buffering mechanism for dataflow architecture. The proposed non-stop mechanism assigns tiles to the processing element array without stopping the execution of processing elements through optimizing control logic in dataflow architecture. Moreover, we propose a work-flow program to cooperate with the non-stop double buffering mechanism. After optimizations both on control logic and on work-flow program, the filling and draining of the array needs to be done only once across the execution of all tiles belonging to the same dataflow graph. Experimental results show that the proposed double buffering mechanism for dataflow architecture achieves a 16.2% average efficiency improvement over that without the optimization.
2018 Vol. 33 (1): 145-157 [Abstract] ( 81 ) [HTML 1KB] [PDF 1419KB] ( 104 )
158 Li-Jing Wang, Yong-Qiang Lv, Ilya Moiseenko, Dong-Sheng Wang
A Dataflow-Oriented Programming Interface for Named Data Networking
Inheriting from a data-driven communication pattern other than a location-driven pattern, named data networking (NDN) offers better support to network-layer dataflow. However, the application developers have to handle complex tasks, such as data segmentation, packet verification, and flow control, due to the lack of proper transport-layer protocols over the network layer. In this study, we design a dataflow-oriented programming interface to provide transport strategies for NDN, which greatly improves the efficiency in developing applications. This interface presents two application data unit (ADU) retrieval strategies according to different data publishing patterns, in which it adopts an adaptive ADU pipelining algorithm to control the dataflow based on the current network status and data generation rate. The interface also offers network measurement strategies to monitor an abundance of critical metrics influencing the application performance. We verify the functionality and performance of our interface by implementing a video streaming application spanning 11 time zones over the worldwide NDN testbed. Our experiments show that the interface can efficiently support developing high-performance and dataflow-driven NDN applications.
2018 Vol. 33 (1): 158-168 [Abstract] ( 73 ) [HTML 1KB] [PDF 1047KB] ( 93 )
169 Chen Feng, Chun-Dian Li, Rui Li
Indexing Techniques of Distributed Ordered Tables: A Survey and Analysis
Many NoSQL (Not Only SQL) databases were proposed to store and query on a huge amount of data. Some of them like BigTable, PNUTS, and HBase, can be modeled as distributed ordered tables (DOTs). Many additional indexing techniques have been presented to support queries on non-key columns for DOTs. However, there was no comprehensive analysis or comparison of these techniques, which brings troubles to users in selecting or proposing a proper indexing technique for a certain workload. This paper proposes a taxonomy based on six indexing issues to classify indexing techniques on DOTs and provides a comprehensive review of the state-of-the-art techniques. Based on the taxonomy, we propose a performance model named QSModel to estimate the query time and storage cost of these techniques and run experiments on a practical workload from Tencent to evaluate this model. The results show that the maximum error rates of the query time and storage cost are 24.2% and 9.8%, respectively. Furthermore, we propose IndexComparator, an open source project that implements representative indexing techniques. Therefore, users can select the best-fit indexing technique based on both theoretical analysis and practical experiments.
2018 Vol. 33 (1): 169-189 [Abstract] ( 109 ) [HTML 1KB] [PDF 2555KB] ( 73 )
Regular Paper
190 Rong Wang, Tian-Lei Hu, Gang Chen
Where Do Local Experts Go? Evaluating User Geo-Topical Similarity for Top-N Place Recommendation
Recent years have witnessed the ever-growing popularity of location-based social network (LBSN) services. Top-N place recommendation, which aims at retrieving N most appealing places for a target user, has thus gained increasing importance. Yet existing solutions to this problem either provide non-personalized recommendations by selecting nearby popular places, or resort to collaborative filtering (CF) by treating each place as an independent item, overlooking the geographical and semantic correlations among places. In this paper, we propose GoTo, a collaborative recommender that provides top-N personalized place recommendation in LBSNs. Compared with existing methods, GoTo achieves its effectiveness by exploiting the wisdom of the so-called local experts, namely those who are geographically close and have similar preferences with regard to a certain user. At the core of GoTo lies a novel user similarity measure called geo-topical similarity, which combines geographical and semantic correlations among places for discovering local experts. In specific, the geo-topical similarity uses Gaussian mixtures to model users' real-life geographical patterns, and extracts users' topical preferences from the attached tags of historically visited places. Extensive experiments on real LBSN datasets show that compared with baseline methods, GoTo can improve the performance of top-N place recommendation by up to 50% in terms of accuracy.
2018 Vol. 33 (1): 190-206 [Abstract] ( 86 ) [HTML 1KB] [PDF 1041KB] ( 70 )
207 Jun-Li Zhao, Zhong-Ke Wu, Zhen-Kuan Pan, Fu-Qing Duan, Jin-Hua Li, Zhi-Han Lv, Kang Wang, Yu-Cong Chen
3D Face Similarity Measure by Fréchet Distances of Geodesics
3D face similarity is a critical issue in computer vision, computer graphics and face recognition and so on. Since Fréchet distance is an effective metric for measuring curve similarity, a novel 3D face similarity measure method based on Fréchet distances of geodesics is proposed in this paper. In our method, the surface similarity between two 3D faces is measured by the similarity between two sets of 3D curves on them. Due to the intrinsic property of geodesics, we select geodesics as the comparison curves. Firstly, the geodesics on each 3D facial model emanating from the nose tip point are extracted in the same initial direction with equal angular increment. Secondly, the Fréchet distances between the two sets of geodesics on the two compared facial models are computed. At last, the similarity between the two facial models is computed based on the Fréchet distances of the geodesics obtained in the second step. We verify our method both theoretically and practically. In theory, we prove that the similarity of our method satisfies three properties:reflexivity, symmetry, and triangle inequality. And in practice, experiments are conducted on the open 3D face database GavaDB, Texas 3D Face Recognition database, and our 3D face database. After the comparison with iso-geodesic and Hausdorff distance method, the results illustrate that our method has good discrimination ability and can not only identify the facial models of the same person, but also distinguish the facial models of any two different persons.
2018 Vol. 33 (1): 207-222 [Abstract] ( 92 ) [HTML 1KB] [PDF 1963KB] ( 122 )
223 Kang Li, Fa-Zhi He, Hai-Ping Yu
Robust Visual Tracking Based on Convolutional Features with Illumination and Occlusion Handing
Visual tracking is an important area in computer vision. How to deal with illumination and occlusion problems is a challenging issue. This paper presents a novel and efficient tracking algorithm to handle such problems. On one hand, a target's initial appearance always has clear contour, which is light-invariant and robust to illumination change. On the other hand, features play an important role in tracking, among which convolutional features have shown favorable performance. Therefore, we adopt convolved contour features to represent the target appearance. Generally speaking, first-order derivative edge gradient operators are efficient in detecting contours by convolving them with images. Especially, the Prewitt operator is more sensitive to horizontal and vertical edges, while the Sobel operator is more sensitive to diagonal edges. Inherently, Prewitt and Sobel are complementary with each other. Technically speaking, this paper designs two groups of Prewitt and Sobel edge detectors to extract a set of complete convolutional features, which include horizontal, vertical and diagonal edges features. In the first frame, contour features are extracted from the target to construct the initial appearance model. After the analysis of experimental image with these contour features, it can be found that the bright parts often provide more useful information to describe target characteristics. Therefore, we propose a method to compare the similarity between candidate sample and our trained model only using bright pixels, which makes our tracker able to deal with partial occlusion problem. After getting the new target, in order to adapt appearance change, we propose a corresponding online strategy to incrementally update our model. Experiments show that convolutional features extracted by well-integrated Prewitt and Sobel edge detectors can be efficient enough to learn robust appearance model. Numerous experimental results on nine challenging sequences show that our proposed approach is very effective and robust in comparison with the state-of-the-art trackers.
2018 Vol. 33 (1): 223-236 [Abstract] ( 85 ) [HTML 1KB] [PDF 7139KB] ( 94 )
Journal of Computer Science and Technology
   ISSN 1000-9000
   CN 11-2296/TP
Journal Online
     Current Issue
     Online Before Printed
     Top Cited Papers
(Updated on Apr. 15, 2015)
Updated on
     Top 30 Most Read
     Top 30 Most Download

ScholarOne Manuscripts Log In

User ID:


Forgot your password?

Enter your e-mail address to receive an e-mail with your account information.

Copyright © 2006 JCST! All rights reserved Editorial Office
Journal of Computer Science and Technology Institute of Computing Technology Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Support by Beijing Magtech