Indexed in: SCIE, Ei, INSPEC, JST, AJ, MR, CA, DBLP, etc.
Edited by: Editorial Board of Journal of Computer Science and Technology, P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Published by: SCIENCE PRESS, BEIJING, CHINA
Distributed by: China: All Local Post Offices; Other Countries: Springer
Entity linking (EL) is the task of determining the identity of textual entity mentions given a predefined knowledge base (KB). Many existing efforts address this task using either "local" information (contextual information of the mention in the text) or "global" information (relations among candidate entities). However, either kind of information can be insufficient, especially when the given text is short. To obtain richer local and global information for entity linking, we propose to enrich the context information for mentions by retrieving extra contexts from the web through web search engines (WSE). Based on this intuition, we make two novel attempts. The first adds web-searched results into an embedding-based method to expand the mention's local information, where we try two different methods to help generate high-quality web contexts: one applies the attention mechanism and the other uses abstract extraction. The second uses the web contexts to extend the global information, i.e., finding and utilizing more relevant mentions from the web contexts with a graph-based model. Finally, we combine the two proposed models to use both the extended local and global information from the extra web contexts. Our empirical study on six real-world datasets shows that using extra web contexts to extend the local and global information can effectively improve the F1 score of entity linking.
Heterogeneous information networks, which consist of multi-typed vertices representing objects and multi-typed edges representing relations between objects, are ubiquitous in the real world. In this paper, we study the problem of entity matching for heterogeneous information networks based on distributed network embedding and a multi-layer perceptron with a highway network, and we propose a new method named DEM, short for Deep Entity Matching. In contrast to traditional entity matching methods, DEM utilizes the multi-layer perceptron with a highway network to explore hidden relations and improve matching performance. Importantly, we incorporate DEM with the network embedding methodology, enabling highly efficient computing in a vectorized manner. DEM's generic modeling of both the network structure and the entity attributes enables it to model various heterogeneous information networks flexibly. To illustrate its functionality, we apply the DEM algorithm to two real-world entity matching applications: user linkage under the social network analysis scenario, which predicts the same or matched users in different social platforms, and record linkage, which predicts the same or matched records in different citation networks. Extensive experiments on real-world datasets demonstrate DEM's effectiveness and rationality.
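The highway-network component mentioned above gates how much of a layer's nonlinear transform passes through versus how much of the input is carried unchanged. A minimal scalar sketch (the weights and the scalar setting are illustrative, not DEM's actual architecture):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def highway_layer(x, w_h, b_h, w_t, b_t):
    """One scalar highway unit: y = t * H(x) + (1 - t) * x,
    where t = sigmoid(w_t * x + b_t) is the transform gate and
    H(x) = tanh(w_h * x + b_h) is the nonlinear transform."""
    t = sigmoid(w_t * x + b_t)
    h = math.tanh(w_h * x + b_h)
    return t * h + (1.0 - t) * x
```

With the gate driven toward 0 (a large negative gate bias), the layer reduces to the identity; this carry path is what lets highway networks train deep stacks without losing the input signal.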
Linking user accounts belonging to the same user across different platforms with location data has received significant attention, due to the popularization of GPS-enabled devices and the wide range of applications benefiting from user account linkage (e.g., cross-platform user profiling and recommendation). Different from most existing studies, which only focus on user account linkage across two platforms, we propose a novel model, ULMP (user account linkage across multiple platforms), with the goal of effectively and efficiently linking user accounts across multiple platforms using location data. Despite the practical significance of successful user linkage across multiple platforms, this task is much more challenging than linkage across two platforms. The major challenge lies in the fact that the number of user combinations grows explosively with the number of platforms. To tackle this problem, a novel method, GTkNN, is first proposed to prune the search space by efficiently retrieving the top-k candidate user accounts, indexed with well-designed spatial and temporal index structures. Then, in the pruned space, a match score based on kernel density estimation, combining both spatial and temporal information, is used to retrieve the linked user accounts. Extensive experiments conducted on four real-world datasets demonstrate the superiority of ULMP in terms of both effectiveness and efficiency compared with state-of-the-art methods.
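A spatio-temporal match score of the kind described above can be sketched with a Gaussian kernel over the cross-platform record pairs. The bandwidths and the kernel form here are made up for illustration, not ULMP's actual estimator:

```python
import math

def kde_match_score(points_a, points_b, h_s=0.01, h_t=3600.0):
    """Kernel-density match score between two accounts' location
    records. Each point is (lat, lon, timestamp); every cross-
    platform pair contributes a Gaussian kernel weight combining
    spatial distance (bandwidth h_s, degrees) and temporal
    distance (bandwidth h_t, seconds)."""
    score = 0.0
    for (lat_a, lon_a, t_a) in points_a:
        for (lat_b, lon_b, t_b) in points_b:
            d_s = ((lat_a - lat_b) ** 2 + (lon_a - lon_b) ** 2) / (2 * h_s ** 2)
            d_t = ((t_a - t_b) ** 2) / (2 * h_t ** 2)
            score += math.exp(-(d_s + d_t))
    return score / (len(points_a) * len(points_b))
```

Accounts whose records cluster at the same places and times score close to 1, while unrelated accounts score near 0; a threshold or top-1 choice over candidates then yields the predicted linkage.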
Entity resolution (ER) is a significant task in data integration that aims to detect all entity profiles corresponding to the same real-world entity. Due to ER's inherently quadratic complexity, blocking was proposed to ameliorate it: blocking offers an approximate solution that clusters similar entity profiles into blocks, so that it suffices to perform pairwise comparisons inside each block, thereby reducing the computational cost of ER. This paper presents a comprehensive survey of existing blocking technologies. We summarize and analyze all classic blocking methods, with emphasis on different block construction and optimization techniques. We find that traditional blocking ER methods, which depend on a fixed schema, may not work in the context of highly heterogeneous information spaces. Using schema information flexibly is of great significance for efficiently processing data with the new features of this era. Machine learning is an important tool for ER, but end-to-end and efficient machine learning methods still need to be explored. We also summarize the most promising trends for future work, in the directions of real-time blocking ER, incremental blocking ER, deep learning for ER, etc.
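The core idea of blocking, grouping profiles by a key and comparing only within groups, can be sketched in a few lines. The blocking key used here (a lowercased name prefix) is just one illustrative choice; real systems derive keys in many ways:

```python
from collections import defaultdict

def build_blocks(records, key_fn):
    """Group entity profiles by a blocking key; pairwise
    comparisons are then restricted to pairs inside one block."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[key_fn(rec)].append(rec)
    return blocks

def candidate_pairs(blocks):
    """Yield only the within-block pairs, instead of all
    quadratically many record pairs."""
    for recs in blocks.values():
        for i in range(len(recs)):
            for j in range(i + 1, len(recs)):
                yield recs[i], recs[j]
```

For three records, exhaustive ER would compare three pairs; blocking on a three-letter name prefix can cut this to one, at the cost of possibly missing matches that land in different blocks, which is why block optimization techniques matter.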
Entity linking is a new technique in recommender systems that links users' interaction behaviors in different domains in order to improve the performance of the recommendation task. Linking-based cross-domain recommendation aims to alleviate the data sparsity problem by utilizing domain-sharable knowledge from auxiliary domains. However, existing methods fail to prevent domain-specific features from being transferred, resulting in suboptimal results. In this paper, we address this issue by proposing ATLRec, an adversarial transfer learning based model that effectively captures domain-sharable features for cross-domain recommendation. In ATLRec, we leverage adversarial learning to generate representations of user-item interactions in both the source and the target domains such that the discriminator cannot identify which domain they belong to, thereby obtaining domain-sharable features. Meanwhile, each domain learns its domain-specific features through a private feature extractor. The recommendation in each domain considers both domain-specific and domain-sharable features. We further adopt an attention mechanism to learn item latent factors of both domains by utilizing the shared users with interaction history, so that the representations of all items can be learned sufficiently in a shared space, even when few or no items are shared by the domains. In this way, we can represent all items from the source and the target domains in a shared space, better linking items in different domains and capturing cross-domain item-item relatedness to facilitate the learning of domain-sharable knowledge. The proposed model is evaluated on various real-world datasets and demonstrated to outperform several state-of-the-art single-domain and cross-domain recommendation methods in terms of recommendation accuracy.
Simulation is a common technique for evaluating new approaches and protocols in networked systems and provides many benefits. However, it is also well known that the relevance of simulation results for real-world applications depends on the various models used within the simulation, e.g., for the characteristics of the radio communication. In this paper, we introduce the Extended Multipath Raytracing Model, an extension to the ray-tracing radio medium available in Cooja, to improve the modelling of wireless links in simulated Wireless Sensor Networks. Our extension allows the simulation of environmental influences on links on a per-node basis, enabling the analysis in a virtual environment of various effects observed in experiments. Furthermore, the packet-based modelling of transmission errors is extended to support the simulation of bit errors, enabling new usage scenarios such as the simulation of error detection and Forward Error Correction codes in Cooja.
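The step from packet-based to bit-based error modelling can be illustrated with a minimal channel model: instead of dropping a whole frame, each bit is flipped independently with a given bit error rate. This is a simplified memoryless sketch, not the extension's actual link model:

```python
import random

def inject_bit_errors(frame: bytes, ber: float, rng=None) -> bytes:
    """Flip each bit of `frame` independently with probability
    `ber`. A receiver can then exercise error-detection or FEC
    logic on realistically corrupted frames instead of only
    seeing all-or-nothing packet loss."""
    rng = rng or random.Random()
    out = bytearray(frame)
    for i in range(len(out)):
        for bit in range(8):
            if rng.random() < ber:
                out[i] ^= 1 << bit  # flip this bit
    return bytes(out)
```

Passing a seeded `random.Random` makes the corruption reproducible across simulation runs, which is usually what one wants when comparing FEC schemes.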
The Automatic Dependent Surveillance-Broadcast (ADS-B) protocol is being adopted for use in unmanned aerial vehicles (UAVs) as the primary source of information for emerging multi-UAV collision avoidance algorithms. The lack of security features in ADS-B leaves any process dependent upon the information vulnerable to a variety of threats from compromised and dishonest UAVs, which could result in substantial losses or damage to property. This research proposes a new distance-bounding scheme for verifying the distance and flight trajectory in the ADS-B broadcast data from surrounding UAVs. The proposed scheme enables UAVs or ground stations to identify fraudulent UAVs and avoid collisions. The scheme was implemented and tested in the ArduPilot SITL (Software In The Loop) simulator to verify its ability to detect fraudulent UAVs. The experiments showed that the scheme achieved the desired accuracy in both flight trajectory measurement and attack detection.
To date, bitcoin has been the most successful application of blockchain technology and has received considerable attention from both industry and academia. Bitcoin is an electronic payment system based on cryptography rather than on credit. Regardless of whether people are in the same city or country, bitcoin can be sent by any one person to any other person when they reach an agreement. The market value of bitcoin has been rising since its advent in 2009, and its current market value is US$160 billion. Since its development, bitcoin itself has exposed many problems and faces challenges from all sectors of society; adversaries may therefore exploit bitcoin's weaknesses to make considerable profits. This survey presents an overview and detailed investigation of data security and privacy in the bitcoin system. We examine the studies in the literature and on the Web in two categories: 1) analyses of attacks on the privacy, availability, and consistency of bitcoin data, and 2) summaries of countermeasures for bitcoin data security. Based on these sources, we list and describe the research methods and results in each category, compare the performance of these methods, and illustrate the relationship between performance and method. Moreover, we present several important open research directions to identify follow-up studies in this area.
In software-defined networking (SDN), controllers are sinks of information, such as network topology, collected from switches. Organizations often want to protect their internal network topology and keep their network policies private. We borrow techniques from secure multi-party computation (SMC) to preserve the privacy of SDN controllers' policies about the status of routers. On the other hand, the number of controllers is one of the most important concerns for the scalability of SMC applications in SDNs. To address this issue, we formulate an optimization problem that minimizes the number of SDN controllers while considering their reliability in SMC operations. We use the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) to determine the optimal number of controllers, and simulate SMC for typical SDNs with this number of controllers. Simulation results show that applying the SMC technique to preserve the privacy of organization policies causes only a small delay in SDNs, which is completely justified by the privacy obtained.
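NSGA-II's non-dominated sorting rests on the Pareto-dominance test between candidate solutions. For two objective vectors to be minimized, e.g., (number of controllers, expected SMC unreliability), a minimal sketch of that test (the objective pairing is an illustrative reading of the problem above, not the paper's exact formulation):

```python
def dominates(a, b):
    """Pareto dominance for objective vectors to be minimized:
    `a` dominates `b` if it is no worse in every objective and
    strictly better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))
```

NSGA-II repeatedly partitions its population into fronts of mutually non-dominated solutions using exactly this test, then fills the next generation front by front, which is how it trades off controller count against reliability without collapsing them into one weighted score.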
Dynamic changes of traffic features in unstructured road networks challenge drivers' scene-cognition abilities, producing various heterogeneous traffic behaviors. Modeling traffic with these heterogeneous behaviors has a significant impact on realistic traffic simulation. Most existing traffic methods generate traffic behaviors by adjusting parameters and cannot describe heterogeneous traffic flows in detail. In this paper, a cognition-driven traffic-simulation method inspired by the theory of cognitive psychology is introduced. We first present a visual-filtering model and a perceptual-information fusion model to describe drivers' heterogeneous cognitive processes. Then, logistic regression is used to model drivers' heuristic decision-making processes based on the above cognitive results. Lastly, we apply the high-level cognitive decision-making results to low-level traffic simulation. The experimental results show that our method provides realistic simulations of traffic with heterogeneous behaviors in unstructured road networks, with nearly the same efficiency as existing methods.
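The heuristic decision step can be illustrated with a plain logistic-regression unit: perceived traffic features are weighted, squashed through a sigmoid, and thresholded into a binary action. The feature names, weights, and threshold below are invented for illustration, not the paper's fitted model:

```python
import math

def logistic_decision(features, weights, bias, threshold=0.5):
    """Binary driving decision (e.g., overtake or not) from
    perceived features such as gap distance and relative speed.
    Returns (probability, decision)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    p = 1.0 / (1.0 + math.exp(-z))
    return p, p >= threshold
```

Fitting the weights on observed driver decisions is what lets the simulation reproduce heterogeneous behavior: different weight vectors per driver type yield different decisions from the same perceived scene.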
Image super-resolution is essential for a variety of applications such as medical imaging, surveillance imaging, and satellite imaging, among others. Traditionally, the most popular color image super-resolution is performed in each color channel independently. In this paper, we show that the super-resolution quality can be further enhanced by exploiting the cross-channel correlation. Inspired by the High-Quality Linear Interpolation (HQLI) demosaicking algorithm by Malvar et al., we design an image super-resolution scheme that integrates intra-channel interpolation with cross-channel details by isotropic linear combinations. Despite its simplicity, our super-resolution method achieves accuracy comparable to that of the fastest existing state-of-the-art super-resolution algorithm while running 20 times faster. It is well suited to applications that adopt traditional interpolation, improving visual quality at trivial computational cost. Our comparative study verifies the effectiveness and efficiency of the proposed super-resolution algorithm.
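The cross-channel idea, correcting an intra-channel interpolation with the high-frequency detail of another channel, can be sketched in 1-D. Here `guide_hr` is assumed to be already available at the target resolution, and `alpha` is an illustrative mixing weight, not the paper's optimized coefficient:

```python
def upsample_2x(chan, guide_hr, alpha=0.5):
    """2x 1-D upsampling of `chan`: linear interpolation within
    the channel, plus alpha times the guide channel's local
    detail (its high-res sample minus its own linear
    interpolation at the same position)."""
    out = []
    for i in range(len(chan) - 1):
        out.append(chan[i])
        linear = 0.5 * (chan[i] + chan[i + 1])
        g_lin = 0.5 * (guide_hr[2 * i] + guide_hr[2 * i + 2])
        detail = guide_hr[2 * i + 1] - g_lin  # guide's high-frequency residue
        out.append(linear + alpha * detail)
    out.append(chan[-1])
    return out
```

Because natural-image color channels share edges, the guide's residue restores sharpness that pure intra-channel interpolation smooths away, which is the same intuition HQLI exploits for demosaicking.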
Window detection is a key component in many graphics and vision applications related to 3D city modeling and scene visualization. We present a novel approach for learning to recognize windows in a colored facade image. Rather than predicting bounding boxes or performing facade segmentation, our system locates keypoints of windows, and learns keypoint relationships to group them together into windows. A further module provides extra recognizable information at the window center. Locations and relationships of keypoints are encoded in different types of heatmaps, which are learned in an end-to-end network. We have also constructed a facade dataset with 3 418 annotated images to facilitate research in this field. It has richly varying facade structures, occlusions, lighting conditions, and angles of view. On our dataset, our method achieves precision of 91.4% and recall of 91.0% under 50% IoU (intersection over union). We also make a quantitative comparison with state-of-the-art methods to verify the utility of our proposed method. Applications based on our window detector are also demonstrated, such as window blending.
Generally, data is abundantly available in unlabeled form, and its annotation incurs some cost. The labeling cost, as well as the learning cost, can be minimized by learning from the minimum number of labeled data instances. Active learning (AL) learns from a few labeled data instances, with the additional facility of querying the labels of instances from an expert annotator or oracle. The active learner uses an instance selection strategy to select those critical query instances that reduce the generalization error as fast as possible. This process results in a refined training dataset, which helps minimize the overall cost. The key to the success of AL is query strategies that select the candidate query instances and help the learner learn a valid hypothesis. This survey reviews AL query strategies for classification, regression, and clustering under the pool-based AL scenario. The query strategies for classification are further divided into: informative-based, representative-based, informative- and representative-based, and others. More advanced query strategies based on reinforcement learning and deep learning, along with query strategies under realistic environment settings, are also presented. After a rigorous mathematical analysis of AL strategies, this work presents a comparative analysis of these strategies. Finally, an implementation guide, applications, and challenges of AL are discussed.
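The simplest informative-based query strategy in the pool-based scenario is least-confidence uncertainty sampling, sketched below. The `predict_proba` callback stands in for any probabilistic classifier; the instances and probabilities in the usage are invented for illustration:

```python
def uncertainty_sample(pool, predict_proba, k=1):
    """Pool-based least-confidence strategy: pick the k unlabeled
    instances whose top predicted class probability is lowest,
    i.e., the ones the current model is least sure about."""
    scored = [(max(predict_proba(x)), x) for x in pool]
    scored.sort(key=lambda t: t[0])  # least confident first
    return [x for _, x in scored[:k]]
```

Each AL round queries the oracle for the labels of the returned instances, retrains the model, and repeats; informative-based strategies like this one drive the generalization error down fastest when the model's uncertainty is well calibrated.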
Android is the mobile operating system most frequently targeted by malware in the smartphone ecosystem, with a market share significantly higher than its competitors and a much larger total number of applications. Detecting malware before it is published on official or unofficial application markets is critically important, given typical end users' widespread lack of security awareness. In this paper, a novel feature selection method is proposed along with an Android malware detection approach. The proposed feature selection method uses permissions, API calls, and strings as features, which are statically extractable from Android executables (APK files), and it can be used in a machine learning process with different algorithms to detect malware on the Android platform. A novel document frequency-based approach, named Delta IDF, was designed and implemented for feature selection. Delta IDF was tested on three universal benchmark datasets that contain Android malware samples, and highly promising results were obtained using several binary classification algorithms.
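The abstract does not give the Delta IDF formula; one plausible form of a document-frequency-delta score, the gap between a feature's smoothed relative document frequency in the malware set and in the benign set, is sketched here purely for illustration (the paper's actual definition may differ):

```python
import math

def delta_idf(feature, malware_docs, benign_docs):
    """Hedged sketch of a Delta-IDF-style score. Each document is
    a set of static features (permissions, API calls, strings).
    Features much more frequent in malware than in benign apps
    score high and are kept for classification."""
    def df(docs):
        # number of documents containing the feature
        return sum(1 for d in docs if feature in d)
    n_m, n_b = len(malware_docs), len(benign_docs)
    return (math.log((df(malware_docs) + 1) / n_m)
            - math.log((df(benign_docs) + 1) / n_b))
```

Ranking all extracted features by this score and keeping the top-scoring ones yields the reduced feature vectors that the binary classifiers are then trained on.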