SCIE, Ei, INSPEC, JST, AJ, MR, CA, DBLP, etc.
Edited by: Editorial Board of Journal Of Computer Science and Technology
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Published by: SCIENCE PRESS, BEIJING, CHINA
Distributed by: China: All Local Post Offices; Other Countries: Springer
In multi-label learning, labeling instances is rather expensive since each instance is simultaneously associated with multiple labels. Therefore, active learning, which reduces the labeling cost by actively querying the labels of the most valuable data, becomes particularly important for multi-label learning. A good multi-label active learning algorithm usually consists of two crucial elements: a reasonable criterion to evaluate the gain of querying the label for an instance, and an effective classification model, based on whose predictions the criterion can be accurately computed. In this paper, we first introduce an effective multi-label classification model by combining label ranking with threshold learning, which is incrementally trained to avoid retraining from scratch after every query. Based on this model, we then propose to exploit both uncertainty and diversity in the instance space as well as the label space, and to actively query the instance-label pairs which can improve the classification model most. Extensive experiments on 20 datasets demonstrate the superiority of the proposed approach over state-of-the-art methods.
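As an illustrative sketch (not the paper's exact criterion, which also folds in diversity and label-space information), uncertainty-based querying of an instance-label pair can be expressed as picking the pair whose predicted score lies closest to the decision threshold; the `scores` matrix and the fixed 0.5 threshold below are assumptions for illustration:

```python
def select_pair(scores):
    """Pick the (instance, label) pair whose predicted score is most
    uncertain, i.e., closest to an assumed 0.5 decision threshold."""
    best, best_pair = None, None
    for i, row in enumerate(scores):
        for j, s in enumerate(row):
            u = abs(s - 0.5)          # smaller distance -> more uncertain
            if best is None or u < best:
                best, best_pair = u, (i, j)
    return best_pair

# Scores for 2 instances x 2 labels; the (0, 1) entry is the least certain.
pair = select_pair([[0.9, 0.52], [0.1, 0.8]])
```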
Multi-label learning deals with the problem where each instance is associated with a set of class labels. In multi-label learning, different labels may have their own inherent characteristics for distinguishing themselves from each other, and label correlation information has shown promising strength in improving multi-label learning. In this study, we propose a novel multi-label learning method that simultaneously takes into account both the learning of label-specific features and the correlation information during the learning process. Firstly, we learn a sparse weight parameter vector for each label based on the linear regression model, and the label-specific features can be extracted according to the corresponding weight parameters. Secondly, we constrain label correlations directly on the outputs of labels, not on the corresponding parameter vectors, which would conflict with label-specific feature learning. Specifically, for any two related labels, their corresponding models should have similar outputs rather than similar parameter vectors. Thirdly, we also exploit sample correlations through sparse reconstruction. The experimental results on 12 benchmark datasets show that the proposed method performs better than the existing methods, ranking first in 66.7% of cases and achieving the optimal average rank in terms of all evaluation measures.
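The sparsification that yields label-specific features can be sketched with the L1 proximal (soft-thresholding) operator: coordinates of a label's weight vector that shrink to zero are dropped, and the surviving features are that label's specific features. The function names and toy weight vector below are illustrative assumptions, not the paper's exact optimization procedure:

```python
def soft_threshold(w, lam):
    """Proximal operator of the L1 norm: shrinks each weight toward
    zero and zeroes out coordinates with magnitude below lam."""
    return [max(abs(x) - lam, 0.0) * (1 if x > 0 else -1) for x in w]

def label_specific_features(w, lam):
    """Indices of features retained for one label after sparsification."""
    return [i for i, x in enumerate(soft_threshold(w, lam)) if x != 0.0]

# Feature 1's tiny weight is zeroed out; features 0 and 2 survive.
kept = label_specific_features([0.5, -0.05, 1.2], 0.1)
```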
Domain adversarial neural network (DANN) methods have been proposed recently and have attracted much attention. In DANNs, a discriminator is trained to discriminate the domain labels of features generated by a generator, whereas the generator attempts to confuse the discriminator so that the distributions between domains are aligned. As a result, DANN actually encourages wholesale alignment or transfer between domains, while the inter-class discriminative information across domains is not considered. In this paper, we present a Discrimination-Aware Domain Adversarial Neural Network (DA2NN) method that introduces the discriminative information, i.e., the discrepancy of inter-class instances across domains, into deep domain adaptation. DA2NN considers both the alignment within the same class and the separation among different classes across domains in knowledge transfer via multiple discriminators. Empirical results show that DA2NN achieves better classification performance than the DANN methods.
Multiagent deep reinforcement learning (MA-DRL) has received increasingly wide attention. Most of the existing MA-DRL algorithms, however, are still inefficient when faced with the non-stationarity caused by agents continuously changing their behaviors in stochastic environments. This paper extends the weighted double estimator to multiagent domains and proposes an MA-DRL framework named Weighted Double Deep Q-Network (WDDQN). By leveraging the weighted double estimator and the deep neural network, WDDQN can not only reduce estimation bias effectively but also handle scenarios with raw visual inputs. To achieve efficient cooperation in multiagent domains, we introduce a lenient reward network and a scheduled replay strategy. Empirical results show that WDDQN outperforms an existing DRL algorithm (double DQN) and an MA-DRL algorithm (lenient Q-learning) in terms of average reward and convergence speed, and is more likely to converge to the Pareto-optimal Nash equilibrium in stochastic cooperative environments.
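The weighted double estimator at the heart of WDDQN can be sketched as a bootstrap target that blends the single estimator (which tends to overestimate) with the double estimator (which tends to underestimate). The exact form of the blending weight beta in the paper may differ, so this is an assumed simplification:

```python
def weighted_double_target(q_a, q_b, reward, gamma, beta):
    """Weighted double Q-learning target for one transition.

    q_a, q_b : next-state action values of the two estimators.
    beta     : blending weight between the single estimate (Q_A at its
               own greedy action) and the double estimate (Q_B at that
               same action).
    """
    a_star = max(range(len(q_a)), key=q_a.__getitem__)  # greedy w.r.t. Q_A
    value = beta * q_a[a_star] + (1.0 - beta) * q_b[a_star]
    return reward + gamma * value

# beta = 1 recovers plain Q-learning; beta = 0 recovers double Q-learning.
target = weighted_double_target([1.0, 2.0], [0.0, 1.0], 1.0, 0.5, 0.5)
```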
Recent years have witnessed the rapid development of online social platforms, which effectively support business intelligence and provide services for massive numbers of users. Along this line, great efforts have been made on the social-aware recommendation task, i.e., leveraging social contextual information to improve recommendation performance. Most existing methods treat social relations in a static way, and the dynamic influence of social contextual information on users' consumption choices has been largely unexplored. To that end, in this paper, we conduct a comprehensive study to reveal the dynamic social influence on users' preferences, and then propose a deep model called Dynamic Social-Aware Recommender System (DSRS) to integrate users' structural and temporal social contexts to address the dynamic social-aware recommendation task. DSRS consists of two main components, i.e., social influence learning (SIL) and dynamic preference learning (DPL). Specifically, in the SIL module, we arrange social graphs in a sequential order and borrow the power of graph convolutional networks (GCNs) to learn social context. Moreover, we design a structural-temporal attention mechanism to discriminatively model the structural social influence and the temporal social influence. Then, in the DPL part, users' individual preferences are learned dynamically by recurrent neural networks (RNNs). Finally, with a prediction layer, we combine users' social context and dynamic preferences to generate recommendations. We conduct extensive experiments on two real-world datasets, and the experimental results demonstrate the superiority and effectiveness of our proposed model compared with state-of-the-art methods.
Many researchers have applied clustering to handle semi-supervised classification of data streams with concept drifts. However, the generalization ability for each specific concept cannot be steadily improved, and concept drift detection methods that do not consider the local structural information of data cannot accurately detect concept drifts. This paper proposes to solve these problems with a BIRCH (Balanced Iterative Reducing and Clustering Using Hierarchies) ensemble and local structure mapping. The local structure mapping strategy is utilized to compute the local similarity around each sample and is combined with a semi-supervised Bayesian method to perform concept drift detection. If a recurrent concept is detected, a historical BIRCH ensemble classifier is selected to be incrementally updated; otherwise a new BIRCH ensemble classifier is constructed and added into the classifier pool. Extensive experiments on several synthetic and real datasets demonstrate the advantages of the proposed algorithm.
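A minimal sketch of what "local similarity around each sample" might compute, assuming a k-nearest-neighbor notion of locality (the paper's actual local structure mapping strategy may differ): the fraction of a sample's nearest labeled neighbors that agree on a label serves as a proxy for local structural agreement.

```python
import math

def local_similarity(x, neighbors, labels, k=3):
    """Fraction of x's k nearest labeled neighbors that share the
    majority label among those k -- a simple local-structure score."""
    nearest = sorted(range(len(neighbors)),
                     key=lambda i: math.dist(x, neighbors[i]))[:k]
    votes = [labels[i] for i in nearest]
    top = max(set(votes), key=votes.count)
    return votes.count(top) / k

# All three nearest neighbors of the origin carry label 'a'.
score = local_similarity((0, 0),
                         [(0, 1), (1, 0), (5, 5), (0.5, 0.5)],
                         ['a', 'a', 'b', 'a'], k=3)
```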
Transfer learning has attracted a large amount of interest and research in recent decades, and some effort has been made to build more precise recommendation systems. Most previous transfer recommendation systems assume that the target domain shares the same or similar rating patterns with the auxiliary source domain, which is then used to improve recommendation performance. However, almost all existing transfer learning work does not consider the characteristics of sequential data. In this paper, we study a new cross-domain recommendation scenario by mining the novelty-seeking trait. Recent studies in psychology suggest that the novelty-seeking trait is highly related to consumer behavior, which has a profound business impact on online recommendation. Previous work performed on only a single target domain may not characterize users' novelty-seeking trait well due to data scarcity and sparsity, leading to poor recommendation performance. Along this line, we propose a new cross-domain novelty-seeking trait mining model (CDNST for short) to improve sequential recommendation performance by transferring knowledge from the auxiliary source domain. We conduct systematic experiments on three domain datasets crawled from Douban to demonstrate the effectiveness of our proposed model. Moreover, we analyze the direct influence of the temporal property of the source and target domains in detail.
Community discovery is an important task in social network analysis. However, most existing methods for community discovery rely on the topological structure alone. These methods ignore the rich information available in the content data. In order to solve this issue, in this paper, we present a community discovery method based on heterogeneous information network decomposition and embedding. Unlike traditional methods, our method takes into account topology, node content and edge content, which can supply abundant evidence for community discovery. First, an embedding-based similarity evaluation method is proposed, which decomposes the heterogeneous information network into several subnetworks, and extracts their potential deep representation to evaluate the similarities between nodes. Second, a bottom-up community discovery algorithm is proposed. Via leader nodes selection, initial community generation, and community expansion, communities can be found more efficiently. Third, some incremental maintenance strategies for the changes of networks are proposed. We conduct experimental studies based on three real-world social networks. Experiments demonstrate the effectiveness and the efficiency of our proposed method. Compared with the traditional methods, our method improves normalized mutual information (NMI) and the modularity by an average of 12% and 37% respectively.
Crowd flow prediction has become a strategically important task in urban computing, as it is a prerequisite for traffic management, urban planning, and public safety. However, due to the diversity of crowd flows, multiple hidden correlations among urban regions affect the flows. Besides, crowd flows are also influenced by the distribution of Points-of-Interest (POIs), transitional functional zones, environmental climate, and the different time slots of the dynamic urban environment. Thus, we exploit multiple correlations between urban regions by considering the mentioned factors comprehensively rather than geographical distance alone, and propose multi-graph convolution gated recurrent units (MGCGRU) to capture these multiple spatial correlations. To adapt to dynamic mobile data, we leverage the multiple spatial correlations and the temporal dependency to build an urban flow prediction framework that uses only a small amount of recent data as input but can mine rich internal modes. Hence, the framework can mitigate the influence of unstable data distributions in highly dynamic environments. The experimental results on two real-world datasets in Shanghai show that our model is superior to state-of-the-art methods for crowd flow prediction.
Scholarships are a reflection of academic achievement for college students. Traditional scholarship assignment is strictly based on final grades and cannot recognize students whose performance trend improves or declines during the semester. This paper develops the Trajectory Mining on Clustering for Scholarship Assignment and Academic Warning (TMS) approach to identify the factors that affect the academic achievement of college students and to provide decision support to help low-performing students attain better performance. Specifically, we first conduct feature engineering to generate a set of features that characterize the lifestyle patterns, learning patterns, and Internet usage patterns of students. We then apply the objective and subjective combined weighted k-means (Wosk-means) algorithm to perform clustering analysis to identify the characteristics of different student groups. Considering the difficulty of obtaining real global positioning system (GPS) records of students, we apply manually generated spatiotemporal trajectory data to quantify the direction of trajectory deviation with the assistance of the PrefixSpan algorithm to identify low-performing students. The experimental results show that the silhouette coefficient and the Calinski-Harabasz index of the Wosk-means algorithm are both approximately 1.5 times those of the best baseline algorithm, and the sum of squared errors of the Wosk-means algorithm is only half that of the best baseline algorithm.
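The core of a feature-weighted k-means such as Wosk-means is an assignment step in which each feature dimension contributes to the distance in proportion to its weight. The sketch below assumes weighted squared Euclidean distance and fixed weights, and omits the objective/subjective weight derivation described in the paper:

```python
def weighted_assign(points, centers, weights):
    """Assignment step of a feature-weighted k-means: returns, for each
    point, the index of its nearest center under a per-dimension
    weighted squared Euclidean distance."""
    def wdist(p, c):
        return sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, c))
    return [min(range(len(centers)), key=lambda j: wdist(p, centers[j]))
            for p in points]

# With weight 0 on the second dimension, only the first dimension decides.
labels = weighted_assign([(0, 10), (10, 0)], [(0, 0), (10, 10)], (1.0, 0.0))
```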
There is a large amount of heterogeneous data distributed across various sources in the upstream of PetroChina. These data can be valuable assets if we can fully use them. Meanwhile, the knowledge graph, as a newly emerging technique, provides a way to integrate multi-source heterogeneous data. In this paper, we present an application of the knowledge graph in the upstream of PetroChina. Specifically, we first construct a knowledge graph from both structured and unstructured data with multiple NLP (natural language processing) methods. Then, we introduce two typical knowledge-graph-powered applications and show the benefit that the knowledge graph brings to them: compared with the traditional machine learning approach, the well log interpretation method powered by the knowledge graph shows a more than 7.69% improvement in accuracy.
Deterministic databases can improve the performance of distributed workloads by eliminating the distributed commit protocol and reducing the contention cost. Unfortunately, current deterministic schemes do not consider performance scalability within a single machine. In this paper, we describe a scalable deterministic concurrency control, Deterministic and Optimistic Concurrency Control (DOCC), which is able to scale performance both within a single node and across multiple nodes. The performance improvement comes from enforcing determinism lazily and avoiding read-only transactions blocking the execution. The evaluation shows that DOCC achieves an 8x performance improvement over the popular deterministic database system Calvin.
The message passing interface (MPI) has become a de facto standard for programming models of high-performance computing, but its rich and flexible interface semantics makes programs prone to communication deadlocks, which seriously affect the usability of the system. However, the existing detection tools for MPI communication deadlocks are not scalable enough to adapt to the continuous expansion of system scale. In this context, we propose a framework for MPI runtime communication deadlock detection, namely MPI-RCDD, which contains three main mechanisms. Firstly, MPI-RCDD has a message logging protocol that is associated with deadlock detection to ensure that the communication messages required for deadlock analysis are not lost. Secondly, it uses the asynchronous processing thread provided by the MPI to implement the transfer of dependencies between processes, so that multiple processes can participate in deadlock detection simultaneously, thus alleviating the performance bottleneck of centralized analysis. In addition, it uses an AND⊕OR model based algorithm named AODA to perform the deadlock analysis work. The AODA algorithm combines the advantages of both timeout-based and dependency-based deadlock analysis approaches, and allows the processes in the timeout state to search for a deadlock circle or knot during dependency transfer. Furthermore, the AODA algorithm produces no false positives and can pinpoint the source of a deadlock accurately. The experimental results on typical MPI communication deadlock benchmarks such as the Umpire Test Suite demonstrate the capability of MPI-RCDD. Additionally, the experiments on the NPB benchmarks show a satisfactory performance cost, indicating that MPI-RCDD has strong scalability.
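Setting aside the AND⊕OR semantics of AODA, the dependency-based part of deadlock analysis reduces to searching for a cycle in a wait-for graph among processes. The sketch below is a plain OR-model cycle check via depth-first search, a deliberate simplification and not the AODA algorithm itself:

```python
def has_deadlock(wait_for):
    """Detect a cycle in a wait-for graph {process: [processes it waits
    on]} -- the classic signature of a communication deadlock."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {p: WHITE for p in wait_for}

    def dfs(p):
        color[p] = GREY
        for q in wait_for.get(p, []):
            if color.get(q, WHITE) == GREY:
                return True            # back edge -> cycle found
            if color.get(q, WHITE) == WHITE and dfs(q):
                return True
        color[p] = BLACK
        return False

    return any(color[p] == WHITE and dfs(p) for p in wait_for)

# Ranks 0 and 1 each wait on the other: a two-process deadlock.
deadlocked = has_deadlock({0: [1], 1: [0]})
```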
Workload characterization is critical for resource management and scheduling. Recently, with the fast development of container techniques, more and more cloud service providers, such as Google and Alibaba, adopt containers to provide cloud services, due to their low overheads. However, the characteristics of co-located diverse services (e.g., interactive on-line services, off-line computing services) running in containers are still not clear. In this paper, we present a comprehensive analysis of the characteristics of co-located workloads running in containers on the same server from the perspective of hardware events. Our study quantifies and reveals the system behavior at the micro-architecture level when workloads run in different co-location patterns. Through the analysis of typical hardware events, we identify recommended and unrecommended co-location workload patterns, which provide valuable deployment suggestions for datacenter administrators.
In recent years, many security attacks have occurred when malicious code abuses in-process memory resources. Due to increasing complexity, an application program may call third-party code that cannot be controlled by programmers but may contain security vulnerabilities. As a result, users risk suffering information leakage and control-flow hijacking. However, current solutions like Intel memory protection extensions (MPX) severely degrade performance, while other approaches like Intel memory protection keys (MPK) lack flexibility in dividing security domains. In this paper, we propose IMPULP, an effective and efficient hardware approach for in-process memory protection. The rationale of IMPULP is user-level partitioning: user code segments are divided into different security domains according to their instruction addresses, and accessible memory spaces are specified dynamically for each domain via a set of boundary registers. Each instruction related to memory access is checked according to its security domain and the corresponding boundaries, and illegal in-process memory accesses by untrusted code segments are prevented. IMPULP can be leveraged to prevent a wide range of in-process memory abuse attacks, such as buffer overflows and memory leakages. For verification, an FPGA prototype based on the RISC-V instruction set architecture has been developed. We present eight tests to verify the effectiveness of IMPULP, including five memory protection function tests, a test defending against a typical buffer overflow, a test defending against the well-known Heartbleed memory leakage attack, and a test on a security benchmark. We execute the SPEC CPU2006 benchmark programs to evaluate the efficiency of IMPULP. The performance overhead of IMPULP is less than 0.2% of runtime on average, which is negligible. Moreover, the resource overhead is less than 5.5% for the hardware modification of IMPULP.
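A software analogue of the boundary-register check can be sketched as follows; the `domains` mapping from code address ranges to permitted data ranges is an illustrative stand-in for IMPULP's hardware registers, not its actual interface:

```python
def check_access(pc, addr, domains):
    """Allow a memory access only if the instruction's security domain
    (looked up by the instruction address `pc`) covers the target
    address `addr`.

    domains: {(code_lo, code_hi): (data_lo, data_hi)} -- a software
    analogue of per-domain boundary registers.
    """
    for (code_lo, code_hi), (data_lo, data_hi) in domains.items():
        if code_lo <= pc < code_hi:
            return data_lo <= addr < data_hi
    return False   # code outside any known domain: deny by default

# Untrusted code at 0x1500 may only touch [0x8000, 0x9000).
domains = {(0x1000, 0x2000): (0x8000, 0x9000)}
ok = check_access(0x1500, 0x8100, domains)
```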
With the rapid increase of memory consumption by applications running on cloud data centers, we need more efficient memory management in a virtualized environment. Exploiting huge pages becomes more critical for a virtual machine's performance when it runs programs with large working set sizes. Programs with large working set sizes are more sensitive to memory allocation, which requires us to quickly adjust the virtual machine's memory to accommodate memory phase changes. It would be much more efficient if we could adjust virtual machines' memory at the granularity of huge pages. However, existing virtual machine memory reallocation techniques, such as ballooning, do not support huge pages. In addition, in order to drive effective memory reallocation, we need to predict the actual memory demand of a virtual machine. We find that traditional memory demand estimation methods designed for regular pages cannot simply be ported to a system adopting huge pages. Adjusting virtual machines' memory in a timely and effective manner according to periodic changes in memory demand is another challenge we face. This paper proposes a dynamic huge page based memory balancing system (HPMBS) for efficient memory management in a virtualized environment. We first rebuild the ballooning mechanism in order to dispatch memory at the granularity of huge pages. We then design and implement a huge page working set size estimation mechanism which can accurately estimate a virtual machine's memory demand in huge pages environments. Combining these two mechanisms, we finally use an algorithm based on dynamic programming to achieve dynamic memory balancing. Experiments show that our system saves memory and improves overall system performance with low overhead.
As data volumes grow rapidly, distributed computations are widely employed in data-centers to provide cheap and efficient methods to process large-scale parallel datasets. Various computation models have been proposed to improve the abstraction of distributed datasets and hide the details of parallelism. However, most of them follow the single-layer partitioning method, which limits developers' ability to express a multi-level partitioning operation succinctly. To overcome this problem, we present the NDD (Nested Distributed Dataset) data model. It is a more compact and expressive extension of the Spark RDD (Resilient Distributed Dataset), designed to remove the burden on developers of manually writing the logic for multi-level partitioning cases. Based on the NDD model, we develop an open-source framework called Bigflow, which serves as an optimization layer over the computation engines of the most widely used processing frameworks. With the help of Bigflow, some advanced optimization techniques, which might otherwise only be applied manually by experienced programmers, are enabled automatically in a distributed data processing job. Currently, Bigflow is processing about 3 PB of data daily in the data-centers of Baidu. According to customer experience, it can significantly reduce code length and improve performance over the intuitive programming style.
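The kind of multi-level partitioning that NDD makes succinct can be sketched in plain Python as a nested group-by: an outer partition whose elements are themselves partitions. The helper below is hypothetical and only illustrates the data shape, not Bigflow's API:

```python
def nested_partition(records, key1, key2):
    """Two-level partitioning: the outer level groups records by key1,
    and each outer group holds inner groups keyed by key2."""
    out = {}
    for r in records:
        out.setdefault(key1(r), {}).setdefault(key2(r), []).append(r)
    return out

# Partition (word, count) pairs first by word, then by count.
parts = nested_partition([("a", 1), ("a", 2), ("b", 1)],
                         key1=lambda r: r[0],
                         key2=lambda r: r[1])
```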
Floorplanning is an important process whose quality determines timing closure in integrated circuit (IC) physical design. Generating a floorplan with a satisfactory timing result is time-consuming because much time is spent on the generation-evaluation iteration. Applying machine learning to the floorplan stage is a potential way to accelerate this iteration. However, two challenges exist: selecting proper features and achieving satisfactory model accuracy. In this paper, we propose a machine learning framework for floorplan acceleration with feature selection and model stacking to cope with these challenges, aiming to reduce time and effort in integrated circuit physical design. Specifically, the proposed framework supports predicting the post-route slack of static random-access memory (SRAM) in the early floorplan stage. Firstly, we introduce a feature selection method to rank and select important features. Considering both feature importance and model accuracy, we reduce the number of features from 27 to 15 (a 44% reduction), which simplifies the dataset and helps educate novice designers. Then, we build a stacking model by combining different kinds of models to improve accuracy. In 28 nm technology, we achieve a mean absolute error of slacks of less than 23.03 ps and effectively accelerate the floorplan process by reducing the evaluation time from 8 hours to less than 60 seconds. Based on our proposed framework, we can perform design space exploration for thousands of SRAM instance locations in a few seconds, much more quickly than with the traditional approach. In practical application, we improve the slacks of SRAMs by more than 75.5 ps (a 177% improvement) on average over the initial design.
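At prediction time, the second level of a stacking model reduces to a meta-model combining the base models' outputs. The linear meta-model below is an assumed simplification of the framework's actual stacker, shown only to illustrate the structure:

```python
def stack_predict(base_preds, meta_weights, meta_bias=0.0):
    """Second-level (stacked) prediction: a linear meta-model over the
    base models' slack predictions (e.g., in picoseconds)."""
    return sum(w * p for w, p in zip(meta_weights, base_preds)) + meta_bias

# Two base models predict 10 ps and 12 ps of slack; the (hypothetical)
# meta-model weighs them equally.
slack = stack_predict([10.0, 12.0], [0.5, 0.5])
```

In a full pipeline, the meta-weights would themselves be fit on held-out base-model predictions, which is what lets the stacker correct systematic errors of any single base model.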
Neuromorphic computing is considered to be the future of machine learning, and it provides a new way of cognitive computing. Inspired by the excellent performance of spiking neural networks (SNNs) in the fields of low-power consumption and parallel computing, many groups have tried to implement SNNs on hardware platforms. However, the efficiency of training SNNs with neuromorphic algorithms is still not ideal. Facing this, Michael et al. proposed a method that solves the problem with the help of a DNN (deep neural network): with it, we can easily convert a well-trained DNN into an SCNN (spiking convolutional neural network). So far, there has been little work focusing on hardware acceleration for SCNNs. The motivation of this paper is to design an SNN processor that accelerates SNN inference for SNNs obtained by this DNN-to-SNN method. We propose SIES (Spiking Neural Network Inference Engine for SCNN Accelerating). It uses a systolic array to compute the membrane potential increments. It integrates an optional hardware max-pooling module to reduce additional data movement between the host and the SIES. We also design a hardware data setup mechanism for the convolutional layer on the SIES, with which we can minimize the time spent preparing input spikes. We implement the SIES on an FPGA XCVU440. It supports up to 4,000 neurons and 256,000 synapses. The SIES runs at a working frequency of 200 MHz, and its peak performance is 1.5625 TOPS.