SCIE, Ei, INSPEC, JST, AJ, MR, CA, DBLP, etc.
Edited by: Editorial Board of Journal Of Computer Science and Technology
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Published by: SCIENCE PRESS, BEIJING, CHINA
Distributed by: China: All Local Post Offices; Other Countries: Springer
Xia Peisu Young Scholars Forum 2021 is intended to promote
communication among young scholars worldwide and the exchange of novel
research ideas and methods at the frontier of computer science. The
forum is named after Prof. Xia Peisu, an academician of the Chinese
Academy of Sciences, who made great contributions to the development of
computing technologies and participated in building the first
computer in China. Prof. Xia Peisu also founded the first English-language
computer journal of China, the Journal of Computer Science and
Technology (JCST), in 1986.
The main topic of Xia Peisu Young Scholars Forum 2021 is “Network,
Data and AI”, which aims to discuss the opportunities and challenges in
developing big data, artificial intelligence, and next-generation
network technologies. With the development of information technologies,
we are entering an age of intelligence in which network, data, and AI
have become indispensable, closely intertwined elements with a huge
impact on human society. The intensive discussions on these three
topics in the forum help promote cross-disciplinary research
between the fields. After the forum, we invited the participants to
submit their work to JCST. After two rounds of peer review, eight papers
were selected for the Special Section of Xia Peisu Young Scholars Forum.
In this special section, there are three papers on the joint
optimization of applications and networking. Traditionally, applications
treat the network as a black box that mainly provides information
transport services. Meanwhile, networking optimizations have focused on
“lower latency and higher throughput” for decades. However, networking's
real and only mission should be to make applications better, and
networking schemes should be jointly optimized with applications. Aware
of the detrimental effects of occasional packet corruption on
RDMA-enabled applications, Gao et al.'s work strives to shrink
the flow completion time by orders of magnitude. By understanding
application semantics from passing-through packets, Dong et
al.'s work improves data-parallel jobs' performance by
over 10%. Wang et al.'s work
intelligently places energy harvesting nodes, which can significantly
prolong the lifetime of a sensor network.
Another two papers are related to distributed networking, which has
attracted much attention in recent years. Shi et
al.'s work focuses on the performance assessment of
decentralized clouds and proposes a robust assessment solution, RODE.
Experiments show that RODE can accurately monitor the performance of
cloud providers. Meanwhile, on distributed platforms,
iterative algorithms are promising for analyzing large-scale data. Yu et al.'s work designs an efficient execution
manager, Aiter-R, which can be integrated into existing delta-based
iterative processing systems to achieve maximum efficiency with the
proposed group-based iterative execution approach. Experimental results
show that Aiter-R outperforms state-of-the-art solutions.
In addition, there are three papers related to the topics of big data and
artificial intelligence, including information retrieval, recommendation,
and text summarization. Wu et al.'s work focuses
on document ranking in information retrieval and investigates how
users' information gain accumulates both within a document and across a
query session. The proposed model, PCGM, which incorporates the
document-level and query-level passage cumulative gain, outperforms
multiple advanced ranking baselines, and its predicted results are highly
consistent with users' preferences. Jiang et al.'s
work aims to improve friend recommendation with fine-grained
evolving interests and proposes the LPRF-F framework, which explores
learning interest tags and time features to predict user interests.
Extensive experiments validate the effectiveness of this work as well as
the effects of social influence and cross-domain interest. Another paper
by Jiang et al. strives to better evaluate
the performance of abstractive summarization models by considering
n-gram-based semantic information. Empirical results demonstrate that the
proposed evaluation metrics are well correlated with human judgments.
Five papers of this special section are published in Vol.37, No.4,
2022, and the other three, i.e., the paper of Shi et
al. and the two papers of Jiang et al., will be included
in Vol.37, No.5, 2022.
We hope that readers will enjoy this special section. We are grateful
to all the paper authors and reviewers for their valuable contributions.
Remote direct memory access (RDMA) has become one of the state-of-the-art high-performance network technologies in datacenters. The reliable transport of RDMA is designed for a lossless underlying network and cannot endure a high packet loss rate. However, besides switch buffer overflow, there is another kind of packet loss in the RDMA network, i.e., packet corruption, which has not been discussed in depth. Packet corruption incurs long application tail latency by causing timeout retransmissions. The challenges of addressing packet corruption in the RDMA network include: 1) packet corruption is inevitable under any remedial mechanism and 2) RDMA hardware is not programmable. This paper proposes designs that can guarantee the expected tail latency of applications in the presence of packet corruption. The key idea is controlling the occurrence probabilities of timeout events caused by packet corruption by transforming timeout retransmissions into out-of-order retransmissions. We build a probabilistic model to estimate the occurrence probabilities and real effects of the corruption patterns. We implement our mechanisms with the help of programmable switches and the zero-byte message RDMA feature. We build an ns-3 simulation and implement the optimization mechanisms on our testbed. The simulation and testbed experiments show that the optimizations can decrease the flow completion time by several orders of magnitude with less than 3% bandwidth cost at different packet corruption rates.
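As a toy illustration of why transforming timeout retransmissions into out-of-order retransmissions shrinks tail latency, the back-of-the-envelope model below compares the expected corruption penalty under an RTO-scale timeout versus an RTT-scale out-of-order recovery. All names and numbers are illustrative assumptions, not the paper's model.

```python
# Hypothetical illustration: with i.i.d. corruption rate p, an n-packet
# flow sees at least one corrupted packet with probability 1 - (1-p)^n.
# Each timeout costs an RTO (milliseconds); an out-of-order recovery
# costs roughly an RTT (microseconds in a datacenter).

def p_any_corruption(p, n):
    """Probability that an n-packet flow suffers >= 1 corrupted packet."""
    return 1 - (1 - p) ** n

def expected_extra_latency(p, n, rto, rtt, out_of_order=False):
    """Expected extra completion time from corruption-induced retransmissions."""
    expected_losses = p * n                  # expected corrupted packets
    penalty = rtt if out_of_order else rto   # per-loss recovery cost
    return expected_losses * penalty

# Even at a low corruption rate, RTO-scale recovery dwarfs RTT-scale
# recovery by orders of magnitude.
print(expected_extra_latency(1e-4, 10_000, rto=4e-3, rtt=10e-6))
print(expected_extra_latency(1e-4, 10_000, rto=4e-3, rtt=10e-6, out_of_order=True))
```

The ratio rto/rtt (here 400x) is exactly the "orders of magnitude" gap the abstract refers to.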
Distributed computing systems have been widely used as the amount of data grows exponentially in the era of information explosion. Job completion time (JCT) is a major metric for assessing their effectiveness. How to reduce the JCT for these systems through reasonable scheduling has become a hot issue in both industry and academia. Data skew is a common phenomenon that can compromise the performance of such distributed computing systems. This paper proposes SMART, which can effectively reduce the JCT by handling data skew during the reduce phase. SMART predicts the size of reduce tasks based on part of the completed map tasks and then enforces largest-first scheduling in the reduce phase according to the predicted reduce task sizes. SMART makes minimal modifications to the original Hadoop with only 20 additional lines of code and is readily deployable. The robustness and the effectiveness of SMART have been evaluated with a real-world cluster on a large number of datasets. Experiments show that SMART reduces JCT by up to 6.47%, 9.26%, and 13.66% for Terasort, WordCount, and InvertedIndex respectively with the Purdue MapReduce Benchmarks Suite (PUMA) dataset.
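Largest-first scheduling driven by extrapolated partition sizes is essentially greedy longest-processing-time (LPT) assignment. The minimal sketch below uses a naive linear predictor as a stand-in for SMART's; all function names and numbers are made up for illustration, not the paper's code.

```python
import heapq

def predict_reduce_sizes(partial_map_outputs, total_maps):
    """Linearly extrapolate each reduce partition's size from the map
    tasks finished so far (a simple stand-in for SMART's predictor)."""
    done = len(partial_map_outputs)
    sums = {}
    for per_partition in partial_map_outputs:
        for partition, size in per_partition.items():
            sums[partition] = sums.get(partition, 0) + size
    return {p: s * total_maps / done for p, s in sums.items()}

def largest_first_schedule(predicted_sizes, num_workers):
    """Assign reduce tasks largest-first (greedy LPT), always giving
    the next task to the currently least-loaded worker."""
    heap = [(0.0, w) for w in range(num_workers)]   # (load, worker)
    heapq.heapify(heap)
    assignment = {}
    for task, size in sorted(predicted_sizes.items(), key=lambda kv: -kv[1]):
        load, worker = heapq.heappop(heap)
        assignment[task] = worker
        heapq.heappush(heap, (load + size, worker))
    return assignment

# Two of four map tasks done; partition 1 is heavily skewed.
sizes = predict_reduce_sizes([{0: 10, 1: 90}, {0: 12, 1: 88}], total_maps=4)
print(largest_first_schedule(sizes, num_workers=2))
```

Scheduling the skewed partition first prevents it from becoming the straggler that dominates JCT.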
Energy harvesting technologies allow wireless devices to be recharged by the surrounding environment, providing wireless sensor networks (WSNs) with higher performance and longer lifetime. However, directly building a wireless sensor network with energy harvesting nodes is very costly. A compromise is to upgrade existing networks with energy harvesting technologies. In this paper, we focus on prolonging the lifetime of WSNs with the help of energy harvesting relays (EHRs). EHRs are responsible for forwarding data for sensor nodes, allowing them to become terminals and thus extending their lifetime. We aim to deploy a minimum number of relays covering the whole network. As EHRs have several special properties, such as their energy harvesting and depletion rates, seeking an optimal deployment strategy poses great research challenges. To this end, we propose an approximation algorithm named the Effective Relay Deployment Algorithm, which can be divided into two phases: disk covering and connector insertion, using the partitioning technique and the Steinerization technique, respectively. Based on probabilistic analysis, we further optimize the performance ratio of our algorithm to (5 + 6/K), where K is an integer denoting the side length of a cell after partitioning. Our extensive simulation results show that our algorithm can reduce the number of EHRs to be deployed by up to 45% compared with previous work, validating the efficiency and effectiveness of our solution.
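The connector-insertion ("Steinerization") step boils down to bridging a segment of length d with ceil(d/r) - 1 evenly spaced relays of communication range r. The sketch below shows that arithmetic; it is a simplification for illustration, not the paper's algorithm.

```python
import math

def connectors_needed(d, r):
    """Minimum relays to bridge distance d when each hop covers range r."""
    return max(0, math.ceil(d / r) - 1)

def connector_positions(p, q, r):
    """Evenly spaced relay coordinates between 2-D endpoints p and q."""
    d = math.dist(p, q)
    k = connectors_needed(d, r)
    return [(p[0] + (q[0] - p[0]) * i / (k + 1),
             p[1] + (q[1] - p[1]) * i / (k + 1))
            for i in range(1, k + 1)]

print(connectors_needed(10, 3))            # 4 hops -> 3 relays
print(connector_positions((0, 0), (10, 0), 3))
```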
Many systems employ the delta-based iterative execution model to support iterative algorithms on distributed platforms by exploiting the sparse computational dependencies between data items of these iterative algorithms, in either a synchronous or an asynchronous approach. However, for large-scale iterative algorithms, existing synchronous solutions suffer from slow convergence and load imbalance because of the strict barrier between iterations, while existing asynchronous approaches induce excessive redundant communication and computation cost as a result of being barrier-free. In view of the performance trade-off between these two approaches, this paper designs an efficient execution manager, called Aiter-R, which can be integrated into existing delta-based iterative processing systems to efficiently support the execution of delta-based iterative algorithms using our proposed group-based iterative execution approach. It can efficiently and correctly explore the middle ground between the two extremes. A heuristic scheduling algorithm is further proposed to allow an iterative algorithm to adaptively choose its trade-off point so as to achieve maximum efficiency. Experimental results show that Aiter-R strikes a good balance between the synchronous and asynchronous policies and outperforms state-of-the-art solutions, reducing the execution time by up to 54.1% and 84.6% in comparison with existing asynchronous and synchronous models, respectively.
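To make the delta-based model concrete, here is a minimal delta-based PageRank-style loop of the kind such systems execute: each vertex keeps a value and a pending delta, and processing a vertex folds its delta into the value and propagates scaled deltas onward. The sweep order here is synchronous, but the same updates remain correct under asynchronous or group-based orders, which is the middle ground Aiter-R explores. This is illustrative code, not Aiter-R itself.

```python
def delta_pagerank(graph, alpha=0.85, tol=1e-10):
    """Delta-based iteration: value accumulates deltas; each processed
    delta is scattered to out-neighbors scaled by alpha/outdegree."""
    value = {v: 0.0 for v in graph}
    delta = {v: 1 - alpha for v in graph}
    while max(delta.values()) > tol:
        for v in graph:
            d, delta[v] = delta[v], 0.0
            if d == 0.0:
                continue
            value[v] += d
            for u in graph[v]:
                delta[u] += alpha * d / len(graph[v])
    return value

g = {"a": ["b"], "b": ["a"]}
print(delta_pagerank(g))  # symmetric two-cycle -> equal ranks of 1.0
```

Because deltas shrink geometrically by a factor of alpha per propagation, the loop converges regardless of the order in which vertices are processed, which is exactly what makes exploring synchronous/asynchronous middle grounds safe.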
Document ranking is one of the most studied but challenging problems in information retrieval (IR). More and more studies have begun to address this problem from fine-grained document modeling. However, most of them focus on context-independent passage-level relevance signals and ignore the context information. In this paper, we investigate how information gain accumulates with passages and propose the context-aware Passage Cumulative Gain (PCG). The fine-grained PCG avoids the need to split documents into independent passages. We investigate PCG patterns at the document level (DPCG) and the query level (QPCG). Based on the patterns, we propose a BERT-based sequential model called Passage-level Cumulative Gain Model (PCGM) and show that PCGM can effectively predict PCG sequences. Finally, we apply PCGM to the document ranking task using two approaches. The first one is leveraging DPCG sequences to estimate the gain of an individual document. Experimental results on two public ad hoc retrieval datasets show that PCGM outperforms most existing ranking models. The second one considers the cross-document effects and leverages QPCG sequences to estimate the marginal relevance. Experimental results show that predicted results are highly consistent with users' preferences. We believe that this work contributes to improving ranking performance and providing more explainability for document ranking.
Software-defined networking (SDN) decouples the data and control planes. However, attackers can inflict catastrophic damage on the whole network using manipulated flooding packets, in what are called data-to-control-plane saturation attacks. Existing methods, which use centralized mitigation policies and ignore the buffered attack flows, involve extra network entities and make benign traffic suffer from long network recovery delays. To address these problems, we propose LFSDM, a saturation attack detection and mitigation system that leverages three new techniques: 1) using linear discriminant analysis (LDA) and extracting a novel feature called the control channel occupation rate (CCOR) to detect the attacks, 2) adopting distributed mitigation agents to reduce the number of involved network entities, and 3) cleaning up the buffered attack flows to enable fast recovery. Experiments show that our system can detect the attacks in a timely and accurate manner. More importantly, compared with previous work, we save 81% of the network recovery delay on average under attacks ranging from 1,000 to 4,000 packets per second (PPS), and 87% of the network recovery delay under higher attack rates with PPS ranging from 5,000 to 30,000.
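For a single feature such as CCOR, Fisher LDA with equal priors and shared variance reduces to a midpoint threshold between the class means. The toy sketch below uses fabricated feature values and helper names; it illustrates the general technique, not the paper's detector or data.

```python
def fit_1d_lda(benign, attack):
    """For one feature with equal priors and shared variance, Fisher LDA
    reduces to thresholding at the midpoint of the two class means."""
    m0 = sum(benign) / len(benign)
    m1 = sum(attack) / len(attack)
    return (m0 + m1) / 2

def is_attack(ccor, threshold):
    """Flag a window as an attack if its CCOR exceeds the threshold."""
    return ccor > threshold

benign_ccor = [0.05, 0.08, 0.10, 0.07]   # illustrative training samples
attack_ccor = [0.70, 0.85, 0.90, 0.95]
t = fit_1d_lda(benign_ccor, attack_ccor)
print(t, is_attack(0.6, t), is_attack(0.1, t))
```

The intuition matches the feature's design: saturation attacks flood packet-in messages over the control channel, so its occupation rate separates the two classes cleanly.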
The volume of information that needs to be processed in big data clusters increases rapidly nowadays, and it is critical to execute data analysis in a time-efficient manner. However, simply adding more computation resources may not speed up data analysis significantly. Data analysis jobs usually consist of multiple stages organized as a directed acyclic graph (DAG), and the precedence relationships between stages cause scheduling challenges. General DAG scheduling is a well-known NP-hard problem. Moreover, we observe that in some parallel computing frameworks such as Spark, the execution of a stage in the DAG contains multiple phases that use different resources. We notice that carefully arranging the execution of those resources in a pipeline can reduce their idle time and improve the average resource utilization. Therefore, we propose a resource pipeline scheme with the objective of minimizing the job makespan. For perfectly parallel stages, we propose a contention-free scheduler with detailed theoretical analysis. Moreover, we extend the contention-free scheduler to three-phase stages, considering that the computation phase of some stages can be partitioned. Additionally, we are aware that job stages in real-world applications are usually not perfectly parallel, and the parallelism levels need to be adjusted frequently during DAG execution. Considering that reinforcement learning (RL) techniques can adjust the scheduling policy on the fly, we investigate an RL-based scheduler for jobs arriving online, which can adapt to resource contention. We evaluate both the contention-free and the RL-based schedulers on a Spark cluster. In the evaluation, a real-world cluster trace dataset is used to simulate different DAG styles. Evaluation results show that our pipelined scheme can significantly improve CPU and network utilization.
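For the special case of independent stages with exactly two pipelined phases (say, a CPU phase followed by a network phase), ordering the stages is the classical two-machine flow shop problem, for which Johnson's rule gives the minimum makespan. The sketch below illustrates that classical building block under this simplifying assumption; the paper's schedulers handle richer settings.

```python
def johnson_order(jobs):
    """Johnson's rule for two-phase jobs (cpu_time, net_time): jobs with
    cpu <= net go first in ascending cpu order; the rest go last in
    descending net order. Optimal for the two-machine flow shop."""
    first = sorted((j for j in jobs if j[0] <= j[1]), key=lambda j: j[0])
    last = sorted((j for j in jobs if j[0] > j[1]), key=lambda j: -j[1])
    return first + last

def makespan(jobs):
    """Completion time when each job's network phase pipelines behind
    its CPU phase and behind the previous job's network phase."""
    t_cpu = t_net = 0
    for cpu, net in jobs:
        t_cpu += cpu
        t_net = max(t_net, t_cpu) + net
    return t_net

jobs = [(3, 2), (1, 4), (2, 1)]
order = johnson_order(jobs)
print(order, makespan(order))
```

Putting CPU-light, network-heavy jobs first fills the network pipe early, which is the same idle-time-reduction intuition behind the resource pipeline scheme.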
Sustaining an operational wireless sensor network (WSN) is challenging due to the persistent need of the battery-powered sensors to be charged from time to time. Exploiting mobile chargers (MCs) that traverse to the fixed sensors of the network and wirelessly transfer energy in an efficient manner has been widely considered a promising way to tackle this challenge. An optimization problem, called the mobile charger coverage problem, arises naturally to keep all of the sensors alive, with the objective of determining both the minimum number of MCs required to meet the sensors' recharge frequencies and the schedule of these MCs. It is shown that this optimization problem becomes NP-hard in high-dimensional spaces. Moreover, the special case of homogeneous sensor recharge frequencies has already been proven to admit a tractable algorithm in 1-dimensional space, whether a line or a ring. In this work, we seek to find a delicate border between the tractable and the intractable problem space. Specifically, we study the special case of heterogeneous sensors that take frequencies of 1's and 2's (lifetimes of 1 and 0.5 time units) on a line, conjecture the special case's NP-hardness, propose a novel brute-force optimal algorithm, and present a linear-time greedy algorithm that gives a 1.5-approximation solution for the problem. Afterwards, we introduce the energy optimization problem of the MCs, given their minimized number, and solve it optimally. Comprehensive simulation is conducted to verify the efficiency of our proposed algorithms, which minimize the number of MCs.
Scalability has long been a major challenge of cryptocurrency systems, which is mainly caused by the delay in reaching consensus when processing transactions on-chain. As an effective mitigation approach, payment channel networks (PCNs) enable private channels among blockchain nodes to process transactions off-chain, avoiding the long wait for online transaction confirmation. The state-of-the-art studies of PCNs focus on improving efficiency and availability by optimizing routing, scheduling, and initial deposits, as well as protecting the system from security and privacy attacks. However, the behavioral decision dynamics of blockchain nodes under potential malicious attacks are largely neglected. To fill this gap, we employ game theory to study the characteristics of channel interactions from both the micro and macro perspectives under channel depletion attacks. Our study is progressive, as we conduct the game-theoretic analysis of node behavioral characteristics from individuals to the whole population of the PCN. Our analysis is complementary, since we utilize not only classic game theory with the complete rationality assumption, but also evolutionary game theory, which considers the limited rationality of players, to portray the evolution of the PCN. The results of numerous simulation experiments verify the effectiveness of our analysis.
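The evolutionary side of such an analysis typically rests on replicator dynamics, where a strategy's population share grows when its payoff exceeds the population average. The toy two-strategy simulation below uses a made-up payoff matrix, not the paper's model.

```python
def replicator_step(x, payoff, dt=0.01):
    """One Euler step of the replicator equation x_i' = x_i (f_i - f_bar)
    for a 2-strategy game; f_i is strategy i's expected payoff."""
    f = [sum(payoff[i][j] * x[j] for j in range(2)) for i in range(2)]
    f_bar = sum(x[i] * f[i] for i in range(2))
    return [max(0.0, x[i] + dt * x[i] * (f[i] - f_bar)) for i in range(2)]

# In this toy matrix "cooperate" strictly dominates "attack",
# so the cooperating share should grow toward 1.
payoff = [[3, 1],   # row 0: cooperate
          [2, 0]]   # row 1: attack
x = [0.5, 0.5]
for _ in range(5000):
    x = replicator_step(x, payoff)
print(x)  # cooperators take over the population
```

With limited rationality, convergence is gradual rather than an instantaneous jump to a best response, which is why evolutionary analysis complements the classic equilibrium view.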
360° video has become one of the major media formats in recent years, providing an immersive experience for viewers with more interaction than traditional videos. Most of today's implementations rely on bulky head-mounted displays (HMDs) or require touch-screen operations for interactive display, which are not only expensive but also inconvenient for viewers. In this paper, we demonstrate that interactive 360° video streaming can be done with hints from gaze movement detected by the front camera of today's mobile devices (e.g., a smartphone). We design a lightweight real-time gaze point tracking method for this purpose. We integrate it with the streaming module and apply a dynamic margin adaptation algorithm to minimize the overall energy consumption for battery-constrained mobile devices. Our experiments on state-of-the-art smartphones show the feasibility of our solution and its energy efficiency toward cost-effective real-time 360° video streaming.
Mobile Edge Computing (MEC) has been envisioned as a promising distributed computing paradigm in which mobile users offload their tasks to edge nodes to decrease the cost of energy and computation. However, most of the existing studies only consider the congestion of wireless channels as a crucial factor affecting the strategy-making process, while ignoring the impact of offloading among edge nodes. In addition, centralized task offloading strategies result in enormous computational complexity at the center nodes. Along this line, we take both the congestion of wireless channels and the offloading among multiple edge nodes into consideration to enrich users' offloading strategies, and propose the Parallel User Selection Algorithm (PUS) and the Single User Selection Algorithm (SUS) to substantially accelerate convergence. More practically, we extend users' offloading strategies to take into account idle devices and cloud services, which exploits the potential computing resources at the edge. Furthermore, we construct a potential game in which each user selfishly seeks an optimal strategy to minimize its cost of latency and energy based on acceptable latency, and we find the potential function to prove the existence of a Nash equilibrium (NE). Additionally, we update PUS to accelerate its convergence and illustrate its performance with experimental results on three real datasets; the updated PUS effectively decreases the total cost and reaches a Nash equilibrium.
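A key property of potential games is that best-response dynamics always converge to a Nash equilibrium. The toy congestion-game sketch below demonstrates that convergence; the cost model (per-node cost proportional to load) is illustrative and not the paper's.

```python
def best_response_dynamics(num_users, num_nodes, base_cost):
    """Users repeatedly switch to their cheapest node; in a congestion
    (potential) game this process always reaches a Nash equilibrium."""
    choice = [0] * num_users            # everyone starts on node 0
    load = [0] * num_nodes
    load[0] = num_users
    changed = True
    while changed:
        changed = False
        for u in range(num_users):
            cur = choice[u]
            def cost(n):                # user u's cost if it were on node n
                l = load[n] + (0 if n == cur else 1)
                return base_cost[n] * l
            best = min(range(num_nodes), key=cost)
            if cost(best) < cost(cur):  # strict improvement -> deviate
                load[cur] -= 1
                load[best] += 1
                choice[u] = best
                changed = True
    return choice, load

choice, load = best_response_dynamics(6, 2, base_cost=[1.0, 1.0])
print(load)  # equilibrium balances the two identical nodes
```

Each strict improvement decreases the game's potential function, so the loop cannot cycle; this is the same convergence argument that underpins PUS and SUS.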
The dataflow architecture, which is characterized by a lack of a redundant unified control logic, has been shown to have an advantage over the control-flow architecture as it improves the computational performance and power efficiency, especially of applications used in high-performance computing (HPC). Importantly, the high computational efficiency of systems using the dataflow architecture is achieved by allowing program kernels to be activated in a simultaneous manner. Therefore, a proper acknowledgment mechanism is required to distinguish the data that logically belongs to different contexts. Possible solutions include the tagged-token matching mechanism in which the data is sent before acknowledgments are received but retried after rejection, or a handshake mechanism in which the data is only sent after acknowledgments are received. However, these mechanisms are characterized by both inefficient data transfer and increased area cost. Good performance of the dataflow architecture depends on the efficiency of data transfer. In order to optimize the efficiency of data transfer in existing dataflow architectures with a minimal increase in area and power cost, we propose a Look-Ahead Acknowledgment (LAA) mechanism. LAA accelerates the execution flow by speculatively acknowledging ahead without penalties. Our simulation analysis based on a handshake mechanism shows that our LAA increases the average utilization of computational units by 23.9%, with a reduction in the average execution time by 17.4% and an increase in the average power efficiency of dataflow processors by 22.4%. Crucially, our novel approach results in a relatively small increase in the area and power consumption of the on-chip logic of less than 0.9%. In conclusion, the evaluation results suggest that Look-Ahead Acknowledgment is an effective improvement for data transfer in existing dataflow architectures.
Due to the over-abundance of information on the Web, information filtering has become a key task for providing online users with relevant suggestions, and how to extract the most relevant items has long been a key topic for researchers in various fields. In this paper, we adopt tools used to analyze complex networks to evaluate user reputation and item quality. In our proposed Accumulative Time Based Ranking (ATR) algorithm, we take into account the growth record of the network to identify the evolution of the reputation of users and the quality of items, by incorporating two behavior weighting factors that capture the hidden facts of reputation and quality dynamics for each user and item, respectively. Compared with other reputation evaluation methods, our ATR algorithm combines the iterative approach for ranking user reputation and item quality with temporal dependence. We show that our algorithm outperforms other benchmark ranking algorithms in terms of precision and robustness on empirical datasets from various online retailers and on citation datasets of research publications. Therefore, our proposed method can effectively evaluate user reputation and item quality.
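The iterative backbone that ATR extends with temporal weighting can be sketched as the standard alternating update between item quality and user reputation. The update rules and data below are generic illustrations, not the paper's exact formulas.

```python
def iterative_ranking(ratings, n_iter=50):
    """ratings: {(user, item): score}. Alternate: item quality is the
    reputation-weighted mean rating; user reputation is the inverse of
    that user's mean squared disagreement with the current qualities."""
    users = {u for u, _ in ratings}
    items = {i for _, i in ratings}
    rep = {u: 1.0 for u in users}
    quality = {}
    for _ in range(n_iter):
        for i in items:
            rs = [(rep[u], s) for (u, it), s in ratings.items() if it == i]
            quality[i] = sum(r * s for r, s in rs) / sum(r for r, _ in rs)
        for u in users:
            errs = [(quality[i] - s) ** 2
                    for (us, i), s in ratings.items() if us == u]
            rep[u] = 1.0 / (sum(errs) / len(errs) + 1e-6)
    return rep, quality

ratings = {("honest1", "x"): 5, ("honest2", "x"): 5, ("spammer", "x"): 1,
           ("honest1", "y"): 2, ("honest2", "y"): 2, ("spammer", "y"): 5}
rep, quality = iterative_ranking(ratings)
print(rep["honest1"] > rep["spammer"], quality)
```

The loop progressively discounts users who disagree with consensus, which is what makes such rankings robust to spammers; ATR additionally weights these updates by when each rating occurred.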
Recommender systems play an increasingly important role in a wide variety of applications to help users find favorite products. Collaborative filtering has achieved remarkable success in terms of accuracy and has become one of the most popular recommendation methods. However, these methods have shown modest performance in terms of novelty, diversity, and coverage. We propose a novel graph-based collaborative filtering method, namely the Positive Multi-Layer Graph-Based Recommender System (PMLG-RS). PMLG-RS combines a positive multi-layer graph with a path search algorithm to generate recommendations. The positive multi-layer graph consists of two connected layers: the user layer and the item layer. PMLG-RS relies on a new path search method that finds the shortest path with the highest cost from a source node to every other node. A set of experiments is conducted to compare PMLG-RS with well-known recommendation methods on three benchmark datasets: MovieLens-100K, MovieLens-Last, and FilmTrust. The results demonstrate the superiority of PMLG-RS and its high capability in making relevant, novel, and diverse recommendations for users.
First discovered in Wuhan, China, SARS-CoV-2 is a highly pathogenic novel coronavirus that rapidly spread globally and became a pandemic, with no vaccine and few distinctive clinical drugs available as of March 13th, 2020. Ribonucleic acid interference (RNAi) technology, a gene-silencing technology that targets mRNA, can effectively damage RNA viruses. Here, we report a new efficient small interfering RNA (siRNA) design method named the Simple Multiple Rules Intelligent Method (SMRI) to propose a new solution for the treatment of COVID-19. Specifically, this study proposes a new model named the Base Preference and Thermodynamic Characteristic model (BPTC model), which indicates siRNA silencing efficiency, and a new index named the siRNA Extended Rules index (SER index) based on the BPTC model to screen high-efficiency siRNAs and filter out siRNAs that are difficult to take effect or synthesize, as part of the SMRI method, which is more robust and efficient than traditional statistical indicators under the same circumstances. Moreover, to silence the spike protein that SARS-CoV-2 uses to invade cells, this study further applies the SMRI method to search for candidate high-efficiency siRNAs on the S gene of SARS-CoV-2. This study is one of the early studies applying RNAi therapy to COVID-19 treatment. According to our analysis, the average predicted interference efficiency of the candidate siRNAs designed by the SMRI method is comparable to that of the mainstream siRNA design algorithms. Moreover, the SMRI method ensures that the designed siRNAs have more than three base mismatches with human genes, thus avoiding the silencing of normal human genes, which is not considered by other mainstream methods. Thereby, five candidate high-efficiency siRNAs that are easy to take effect or synthesize and are much safer for the human body are obtained by our SMRI method, providing a new safer, low-dosage, and long-efficacy solution for the treatment of COVID-19.
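The off-target safety check described above (requiring several base mismatches against human genes) can be sketched as a sliding-window Hamming-distance filter. The sequences, helper names, and the mismatch threshold parameter below are made up for illustration and are not the paper's data or code.

```python
def mismatches(a, b):
    """Hamming distance between two equal-length RNA sequences."""
    return sum(x != y for x, y in zip(a, b))

def min_mismatches_vs_transcript(sirna, transcript):
    """Lowest mismatch count over all length-k windows of the transcript."""
    k = len(sirna)
    return min(mismatches(sirna, transcript[i:i + k])
               for i in range(len(transcript) - k + 1))

def is_safe(sirna, human_transcripts, min_mm=3):
    """Keep a candidate only if every human transcript window differs
    from it by at least min_mm bases (off-target safety filter)."""
    return all(min_mismatches_vs_transcript(sirna, t) >= min_mm
               for t in human_transcripts)

human = ["AUGGCUACGUAGCUAGC", "GGGCCCAAAUUUGGGCC"]   # toy "human" sequences
print(is_safe("AUGGCUACG", human))   # exact match in the first -> unsafe
print(is_safe("UUUUUUUUU", human))
```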