SCIE, Ei, INSPEC, JST, AJ, MR, CA, DBLP, etc.
Edited by: Editorial Board of Journal Of Computer Science and Technology
P.O. Box 2704, Beijing 100190, P.R. China Sponsored by: Institute of Computing Technology, CAS & China Computer Federation Undertaken by: Institute of Computing Technology, CAS Published by: SCIENCE PRESS, BEIJING, CHINA Distributed by: China: All Local Post Offices Other Countries: Springer
Uniform memory multicore neural network accelerators (UNNAs) furnish huge computing power to emerging neural network applications. Meanwhile, with neural network architectures going deeper and wider, the limited memory capacity has become a constraint to deploy models on UNNA platforms. Therefore how to efficiently manage memory space and how to reduce workload footprints are urgently significant. In this paper, we propose Tetris: a heuristic static memory management framework for UNNA platforms. Tetris reconstructs execution flows and synchronization relationships among cores to analyze each tensor’s liveness interval. Then the memory management problem is converted to a sequence permutation problem. Tetris uses a genetic algorithm to explore the permutation space to optimize the memory management strategy and reduce memory footprints. We evaluate several typical neural networks and the experimental results demonstrate that Tetris outperforms the state-of-the-art memory allocation methods, and achieves an average memory reduction ratio of 91.9% and 87.9% for a quad-core and a 16-core Cambricon-X platform, respectively.
Recently, analyzing big data on the move is booming. It requires that the hardware resource should be low volume, low power, light in weight, high-performance, and highly scalable whereas the management software should be flexible and consume little hardware resource. To meet these requirements, we present a system named SOCA-DOM that encompasses a mobile system-on-chip array architecture and a two-tier “software-defined” resource manager named Chameleon. First, we design an Ethernet communication board to support an array of mobile system-on-chips. Second, we propose a two-tier software architecture for Chameleon to make it flexible. Third, we devise data, configuration, and control planes for Chameleon to make it “software-defined” and in turn consume hardware resources on demand. Fourth, we design an accurate synthetic metric that represents the computational power of a computing node. We employ 12 Apache Spark benchmarks to evaluate SOCA-DOM. Surprisingly, SOCA-DOM consumes up to 9.4x less CPU resources and 13.5x less memory than Mesos which is an existing resource manager. In addition, we show that a 16-node SOCA-DOM consumes up to 4x less energy than two standard Xeon servers. Based on the results, we conclude that an array architecture with fine-grained hardware resources and a software-defined resource manager works well for analyzing big data on the move.
Low-Density Parity-heck Codes (LDPC) with excellent error-correction capabilities have been widely used in both data communication and storage fields, to construct reliable cyber-physical systems that are resilient to real-world noises. Fast prototyping field-programmable gate array (FPGA)-based decoder is essential to achieve high decoding performance while accelerating the development process. This paper proposes a three-level parallel architecture, TLP-LDPC, to achieve high throughput by fully exploiting the characteristics of both LDPC and underlying hardware while effectively scaling to large-size FPGA platforms. The three-level parallel architecture contains a low-level decoding unit, a mid-level multi-unit decoding core, and a high-level multi-core decoder. The low-level decoding unit is a basic LDPC computation component that effectively combines the features of the LDPC algorithm and hardware with the specific structure (e.g., Look-Up-Table, LUT) of the FPGA and eliminates potential data conflicts. The mid-level decoding core integrates the input/output and multiple decoding units in a well-balancing pipelined fashion. The top-level multi-core architecture conveniently makes full use of board-level resources to improve the overall throughput. We develop an LDPC C++ code with dedicated pragmas and leverage HLS tools to implement the TLP-LDPC architecture. Experimental results show that TLP-LDPC achieves 9.63 Gbps end-to-end decoding throughput on a Xilinx Alveo U50 platform, 3.9x higher than existing HLS-based FPGA implementations.
Embedded and Internet of Things (IoT) devices have extremely strict requirements on the area and power consumption of the processor because of the limitation on its working environment. To reduce the overhead of the embedded processor as much as possible, this paper designs and implements a configurable 32-bit in-order RISC-V processor core based on the 16-bit data path and units, named RV16. The evaluation results show that, compared with the traditional 32-bit RISC-V processor with similar features, RV16 consumes fewer hardware resources and less power consumption. The maximum performance of RV16 running Dhrystone and CoreMark benchmarks is 0.92 DMIPS/MHz and 1.51 CoreMark/MHz, respectively, reaching 75% and 71% of traditional 32-bit processors, respectively. Moreover, a properly configured RV16 running program also consumes less energy than a traditional 32-bit processor.
Due to the wide-spread use of geo-positioning technologies and geo-social networks, the reverse
keyword query has attracted considerable attention from both industry and research communities. A reverse top-k geo- social keyword
(RkGSK) query finds
the users who are spatially near, textually similar,
and socially relevant
to a specified point of interest. RkGSK queries
are useful in many real-life applications. For example,
they can help
the query issuer identify potential customers in marketing decisions. However, the
query constraints could
be too strict sometimes, making it hard to find any result for the RkGSK query. The query issuers
may wonder how to modify their original
queries to get a certain
number of query results. In this paper,
we study non-answer questions
on reverse top-k geo-social keyword
queries (NARGSK). Given an RkGSK query
and the required
number M of query
results, NARGSK aim to find the refined RkGSK query
having M users in its result
set. To efficiently answer
NARGSK, we propose two algorithms (ERQ and NRG) based on query relaxation. As this is the first
work to address
NARGSK to the best of our knowledge, ERQ is the baseline
extended from the state-of-the-art method,
while NRG further
improves the efficiency of ERQ. Extensive experiments using
real-life datasets demonstrate the efficiency of our proposed
algorithms, and the
performance of NRG
is improved by a factor
of 1–2 on average compared with ERQ.
Temporal networks are an effective way to encode temporal information into graph data losslessly. Finding the bursting cohesive subgraph (BCS), which accumulates its cohesiveness at the fastest rate, is an important problem in temporal networks. The BCS has a large number of applications, such as representing emergency events in social media, traffic congestion in road networks and epidemic outbreak in communities. Nevertheless, existing methods demand the BCS lasting for a time interval, which neglects the timeliness of the BCS. In this paper, we design an early bursting cohesive subgraph (EBCS) model based on the k-core to enable identifying the burstiness as soon as possible. To find the EBCS, we first construct a time weight graph (TWG) to measure the bursting level by integrating the topological and temporal information. Then, we propose a global search algorithm, called GS-EBCS, which can find the exact EBCS by iteratively removing nodes from the TWG. Further, we propose a local search algorithm, named LS-EBCS, to find the EBCS by first expanding from a seed node until obtaining a candidate k-core and then refining the k-core to the result subgraph in an optimal time complexity. Subsequently, considering the situation that the massive temporal networks cannot be completely put into the memory, we first design an I/O method to build the TWG and then develop I/O efficient global search and local search algorithms, namely I/O-GS and I/O-LS respectively, to find the EBCS under the semi-external model. Extensive experiments, conducted on four real temporal networks, demonstrate the efficiency and effectiveness of our proposed algorithms. For example, on the DBLP dataset, I/O-LS and LS-EBCS have comparable running time, while the maximum memory usage of I/O-LS is only 6.5 MB, which is much smaller than that of LS-EBCS taking 308.7 MB.
With the developing demands of massive-data services, the applications that rely on big geographic data play crucial roles in academic and industrial communities. Unmanned aerial vehicles (UAVs), combining with terrestrial wireless sensor networks (WSN), can provide sustainable solutions for data harvesting. The rising demands for efficient data collection in a larger open area have been posed in the literature, which requires efficient UAV trajectory planning with lower energy consumption methods. Currently, there are amounts of inextricable solutions of UAV planning for a larger open area, and one of the most practical techniques in previous studies is deep reinforcement learning (DRL). However, the overestimated problem in limited-experience DRL quickly throws the UAV path planning process into a locally optimized condition. Moreover, using the central nodes of the sub-WSNs as the sink nodes or navigation points for UAVs to visit may lead to extra collection costs. This paper develops a data-driven DRL-based game framework with two partners to fulfill the above demands. A cluster head processor (CHP) is employed to determine the sink nodes, and a navigation order processor (NOP) is established to plan the path. CHP and NOP receive information from each other and provide optimized solutions after the Nash equilibrium. The numerical results show that the proposed game framework could offer UAVs low-cost data collection trajectories, which can save at least 17.58% of energy consumption compared with the baseline methods.
Relation extraction has been widely used to find semantic relations between entities from plain text. Dependency trees provide deeper semantic information for relation extraction. However, existing dependency tree based models adopt pruning strategies that are too aggressive or conservative, leading to insufficient semantic information or excessive noise in relation extraction models. To overcome this issue, we propose the Neural Attentional Relation Extraction Model with Dual Dependency Trees (called DDT-REM), which takes advantage of both the syntactic dependency tree and the semantic dependency tree to well capture syntactic features and semantic features, respectively. Specifically, we first propose novel representation learning to capture the dependency relations from both syntax and semantics. Second, for the syntactic dependency tree, we propose a local-global attention mechanism to solve semantic deficits. We design an extension of graph convolutional networks (GCNs) to perform relation extraction, which effectively improves the extraction accuracy. We conduct experimental studies based on three real-world datasets. Compared with the traditional methods, our method improves the F 1 scores by 0.3, 0.1 and 1.6 on three real-world datasets, respectively.
Latent Dirichlet allocation (LDA) is a topic model widely used for discovering hidden semantics in massive text corpora. Collapsed Gibbs sampling (CGS), as a widely-used algorithm for learning the parameters of LDA, has the risk of privacy leakage. Specifically, word count statistics and updates of latent topics in CGS, which are essential for parameter estimation, could be employed by adversaries to conduct effective membership inference attacks (MIAs). Till now, there are two kinds of methods exploited in CGS to defend against MIAs: adding noise to word count statistics and utilizing inherent privacy. These two kinds of methods have their respective limitations. Noise sampled from the Laplacian distribution sometimes produces negative word count statistics, which render terrible parameter estimation in CGS. Utilizing inherent privacy could only provide weak guaranteed privacy when defending against MIAs. It is promising to propose an effective framework to obtain accurate parameter estimations with guaranteed differential privacy. The key issue of obtaining accurate parameter estimations when introducing differential privacy in CGS is making good use of the privacy budget such that a precise noise scale is derived. It is the first time that R′enyi differential privacy (RDP) has been introduced into CGS and we propose RDP-LDA, an effective framework for analyzing the privacy loss of any differentially private CGS. RDP-LDA could be used to derive a tighter upper bound of privacy loss than the overestimated results of existing differentially private CGS obtained by ε-DP. In RDP-LDA, we propose a novel truncated-Gaussian mechanism that keeps word count statistics non-negative. And we propose distribution perturbation which could provide more rigorous guaranteed privacy than utilizing inherent privacy. Experiments validate that our proposed methods produce more accurate parameter estimation under the JS-divergence metric and obtain lower precision and recall when defending against MIAs.
Kernel is a kind
of data summary
which is elaborately extracted from a large
dataset. Given a problem, the solution obtained
from the kernel
is an approximate version of the solution
obtained from the
whole dataset with
a provable approximate ratio.
It is widely used in geometric optimization, clustering, and approximate query processing, etc.,
for scaling them up to massive
data. In this paper, we focus
on the minimum ε-kernel (MK) computation that asks for a kernel
of the smallest size
for large-scale data
processing. For the open
problem presented by Wang et al.
that whether the
minimum ε-coreset (MC) problem
and the MK problem can be reduced to each other,
we first formalize the MK problem
and analyze its complexity. Due to the
NP-hardness of the
MK problem in three or higher dimensions, an approximate algorithm, namely Set Cover-Based Minimum ε-Kernel algorithm (SCMK),
is developed to solve it. We prove that the MC problem and the MK problem
can be Turing-reduced to each other.
Then, we discuss the update of MK under
insertion and deletion operations, respectively. Finally, a randomized algorithm, called the Randomized Algorithm of Set Cover-Based Minimum ε-Kernel algorithm (RA-SCMK), is utilized to further reduce
the complexity of SCMK. The efficiency and
effectiveness of SCMK and RA-SCMK
are verified by experimental results on real-world and synthetic datasets. Experiments show that the
kernel sizes of SCMK are 2x and 17.6x smaller
than those of an ANN-based
method on real-world and synthetic datasets, respectively. The speedup ratio
of SCMK over the ANN-based method
is 5.67 on synthetic datasets.
RA-SCMK runs up to
three times faster than SCMK on synthetic datasets.
A log is a text message that is generated in various services, frameworks, and programs. The majority of log data mining tasks rely on log parsing as the first step, which transforms raw logs into formatted log templates. Existing log parsing approaches often fail to effectively handle the trade-off between parsing quality and performance. In view of this, in this paper, we present Multi-Layer Parser (ML-Parser), an online log parser that runs in a streaming manner. Specifically, we present a multi-layer structure in log parsing to strike a balance between efficiency and effectiveness. Coarse-grained tokenization and a fast similarity measure are applied for efficiency while fine-grained tokenization and an accurate similarity measure are used for effectiveness. In experiments, we compare ML-Parser with two existing online log parsing approaches, Drain and Spell, on ten real-world datasets, five labeled and five unlabeled. On the five labeled datasets, we use the proportion of correctly parsed logs to measure the accuracy, and ML-Parser achieves the highest accuracy on four datasets. On the whole ten datasets, we use Loss metric to measure the parsing quality. ML-Parse achieves the highest quality on seven out of the ten datasets while maintaining relatively high efficiency.
Offline handwritten mathematical expression recognition is a challenging optical character recognition (OCR) task due to various ambiguities of handwritten symbols and complicated two-dimensional structures. Recent work in this area usually constructs deeper and deeper neural networks trained with end-to-end approaches to improve the performance. However, the higher the complexity of the network, the more the computing resources and time required. To improve the performance without more computing requirements, we concentrate on the training data and the training strategy in this paper. We propose a data augmentation method which can generate synthetic samples with new LaTeX notations by only using the official training data of CROHME. Moreover, we propose a novel training strategy called Shuffled Multi-Round Training (SMRT) to regularize the model. With the generated data and the shuffled multi-round training strategy, we achieve the state-of-the-art result in expression accuracy, i.e., 59.74% and 61.57% on CROHME 2014 and 2016, respectively, by using attention-based encoder-decoder models for offline handwritten mathematical expression recognition.
Friend recommendation plays a key role in promoting user experience in online social networks (OSNs). However, existing studies usually neglect users’ fine-grained interest as well as the evolving feature of interest, which may cause unsuitable recommendation. In particular, some OSNs, such as the online learning community, even have little work on friend recommendation. To this end, we strive to improve friend recommendation with fine-grained evolving interest in this paper. We take the online learning community as an application scenario, which is a special type of OSNs for people to learn courses online. Learning partners can help improve learners’ learning effect and improve the attractiveness of platforms. We propose a learning partner recommendation framework based on the evolution of fine-grained learning interest (LPRF-E for short). We extract a sequence of learning interest tags that changes over time. Then, we explore the time feature to predict evolving learning interest. Next, we recommend learning partners by fine-grained interest similarity. We also refine the learning partner recommendation framework with users’ social influence (denoted as LPRF-F for differentiation). Extensive experiments on two real datasets crawled from Chinese University MOOC and Douban Book validate that the proposed LPRF-E and LPRF-F models achieve a high accuracy (i.e., approximate 50% improvements on the precision and the recall) and can recommend learning partners with high quality (e.g., more experienced and helpful).
Generating molecules with desired properties is an important task in chemistry and pharmacy. An efficient method may have a positive impact on finding drugs to treat diseases like COVID-19. Data mining and artificial intelligence may be good ways to find an efficient method. Recently, both the generative models based on deep learning and the work based on genetic algorithms have made some progress in generating molecules and optimizing the molecule’s properties. However, existing methods have defects in the experimental evaluation standards. These methods also need to be improved in efficiency and performance. To solve these problems, we propose a method named the Chemical Genetic Algorithm for Large Molecular Space (CALM). Specifically, CALM employs a scalable and efficient molecular representation called molecular matrix. And we design corresponding crossover, mutation, and mask operators inspired by domain knowledge and previous studies. We apply our genetic algorithm to several tasks related to molecular property optimization and constraint molecular optimization. The results of these tasks show that our approach outperforms the other state-of-the-art deep learning and genetic algorithm methods, where the z tests performed on the results of several experiments show that our method is more than 99% likely to be significant. At the same time, based on the experimental results, we point out the defects in the experimental evaluation standard which affects the fair evaluation of all previous work. Avoiding these defects helps to objectively evaluate the performance of all work.
Existing semantic segmentation networks based on the multi-column structure can hardly satisfy the efficiency and precision requirements simultaneously due to their shallow spatial branches. In this paper, we propose a new efficient multi-column network termed as LadderNet to address this problem. Our LadderNet includes two branches where the spatial branch generates high-resolution output feature map and the context branch encodes accurate semantic information. In particular, we first propose a channel attention fusion block and a global context module to enhance the information encoding ability of the context branch. Subsequently, a new branch fusion method, i.e., fusing some middle feature maps of the context branch into the spatial branch, is developed to improve the depth of the spatial branch. Meanwhile, we design a feature fusing module to enhance the fusion quality of these two branches, leading to a more efficient network. We compare our model with other state-of-the-arts on PASCAL VOC 2012 and Cityscapes benchmarks. Experimental results demonstrate that, compared with other state-of-the-art methods, our LadderNet can achieve average 1.25% mIoU improvement with comparable or less computation.
Due to the rapid development of the Internet technology such as 5G/6G and artificial intelligence, more and more new network applications appear. Customers using these applications may have different individual demands and such a trend causes great challenges to the traditional integrated service and routing model. In order to satisfy the individual demands of customers, the service customization should be considered, during which the cost of Internet Service Provider (ISP) naturally increases. Hence, how to reach a balance between the customer satisfaction and the ISP profit becomes vitally important. Targeting at addressing this critical problem, this work proposes a service customization oriented reliable routing mechanism, which includes two modules, that is, the service customization module and the routing module. In particular, the former (i.e., the service customization module) is responsible for classifying services by analyzing and processing the customer’s demands. After that, the IPv6 protocol is used to implement the service customization, since it naturally supports differentiated services via the extended header fields. The latter is responsible for transforming the customized services into specific routing policies. Specifically, the Nash equilibrium based economic model is firstly introduced to make a perfect balance between the user satisfaction and the ISP profits, which could finally produce a win-win solution. After that, based on the customized service policies, an optimized grey wolf algorithm is designed to establish the routing path, during which the routing reliability is formulated and calculated. Finally, the experiments are carried out and the proposed mechanism is evaluated. The results indicate that the proposed service customization and routing mechanism improves the routing reliability, user satisfaction and ISP satisfaction by about 8.42%, 15.5% and 17.75% respectively compared with the classical open shortest path first algorithm and the function learning based algorithm.