SCIE, Ei, INSPEC, JST, AJ, MR, CA, DBLP, etc.
Edited by: Editorial Board of Journal of Computer Science and Technology
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Published by: SCIENCE PRESS, BEIJING, CHINA
Distributed by: China: All Local Post Offices; Other Countries: Springer
Lane detection is essential for many aspects of autonomous driving, such as lane-based navigation and high-definition (HD) map modeling. Although lane detection is challenging, especially under complex road conditions, considerable progress has been witnessed in this area in the past several years. In this survey, we review recent vision-based lane detection datasets and methods. For datasets, we categorize them by annotation type, provide detailed descriptions for each category, and compare them. For methods, we focus on those based on deep learning and organize them in terms of their detection targets. Moreover, we introduce a new dataset with more detailed annotations for HD map modeling, a new direction for lane detection that is applicable to autonomous driving under complex road conditions, and a deep neural network, LineNet, for lane detection, and we show its application to HD map modeling.
Stochastic progressive photon mapping (SPPM) is one of the important global illumination methods in computer graphics. It can efficiently simulate caustics and specular-diffuse-specular lighting effects. However, as a biased method, it suffers from both bias and variance when the number of iterations is limited, and the bias and the variance introduce multi-scale noise into SPPM renderings. Recent learning-based methods have shown great advantages in denoising unbiased Monte Carlo (MC) methods, but have not been leveraged for biased ones. In this paper, we present the first learning-based method specifically designed for denoising biased SPPM renderings. First, to avoid conflicting denoising constraints, the radiance of the final image is decomposed into two components: caustic and global. These two components are then denoised separately via a two-network framework. In each network, we employ a novel multi-residual block with two filter sizes, which significantly improves the model's capacity and makes it better suited to multi-scale noise in both low-frequency and high-frequency areas. We also present a series of photon-related auxiliary features to better handle noise while preserving illumination details, especially caustics. Compared with other state-of-the-art learning-based denoising methods that we apply to this problem, our method achieves higher denoising quality, efficiently removing multi-scale noise while keeping illumination sharp.
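The caustic/global decomposition described above can be sketched in a few lines. The box filter below is a hypothetical stand-in for the paper's two denoising networks, used only to show how the two radiance components are processed separately and then recombined; the function names are illustrative, not the paper's.

```python
def box_denoise(values, radius=1):
    # Stand-in denoiser: a 1D box filter over a list of pixel radiances.
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - radius), min(len(values), i + radius + 1)
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out

def denoise_sppm(caustic, global_comp):
    # Denoise each radiance component with its own (stand-in) denoiser,
    # then sum the components back into the final radiance.
    assert len(caustic) == len(global_comp)
    return [c + g for c, g in
            zip(box_denoise(caustic), box_denoise(global_comp))]
```

Denoising each component with its own network avoids forcing one model to satisfy the conflicting smoothness constraints of caustic and global illumination at once.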
Synthesizing a complex scene image with multiple objects and a background according to a text description is a challenging problem. It requires solving several difficult tasks across the fields of natural language processing and computer vision. We model it as a combination of semantic entity recognition, object retrieval and recombination, and object state optimization. To reach a satisfactory result, we propose a comprehensive pipeline that converts the input text to its visual counterpart. The pipeline includes text processing, foreground object and background scene retrieval, image synthesis using constrained MCMC, and post-processing. Firstly, we roughly divide the objects parsed from the input text into foreground objects and background scenes. Secondly, we retrieve the required foreground objects from a foreground object dataset segmented from the Microsoft COCO dataset, and retrieve an appropriate background scene image from a background image dataset extracted from the Internet. Thirdly, to ensure that the positions and sizes of foreground objects in the synthesized image are reasonable, we design a cost function and use the Markov chain Monte Carlo (MCMC) method as the optimizer to solve this constrained layout problem. Finally, to make the image look natural and harmonious, we use Poisson-based and relighting-based methods to blend the foreground objects and the background scene image in the post-processing step. Synthesis and comparison results on the Microsoft COCO dataset demonstrate that our method outperforms some state-of-the-art methods based on generative adversarial networks (GANs) in the visual quality of generated scene images.
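A toy version of the constrained-layout step can illustrate the idea: a cost function penalizes out-of-canvas and overlapping objects, and a Metropolis-style MCMC sampler perturbs one object at a time, keeping the best layout seen. The cost terms and parameters here are illustrative stand-ins, not the paper's actual constraints.

```python
import math
import random

def layout_cost(positions, canvas=(100.0, 100.0), min_gap=10.0):
    # Toy cost: penalize objects outside the canvas or closer than min_gap.
    cost, (w, h) = 0.0, canvas
    for i, (x, y) in enumerate(positions):
        if not (0.0 <= x <= w and 0.0 <= y <= h):
            cost += 100.0
        for (x2, y2) in positions[i + 1:]:
            d = math.hypot(x - x2, y - y2)
            if d < min_gap:
                cost += min_gap - d
    return cost

def mcmc_layout(positions, steps=2000, sigma=5.0, temp=1.0, seed=0):
    # Metropolis sampler: always accept downhill moves, accept uphill
    # moves with probability exp(-delta/temp); track the best layout seen.
    rng = random.Random(seed)
    cur = [list(p) for p in positions]
    cur_cost = layout_cost([tuple(p) for p in cur])
    best, best_cost = [tuple(p) for p in cur], cur_cost
    for _ in range(steps):
        i = rng.randrange(len(cur))
        old = list(cur[i])
        cur[i][0] += rng.gauss(0.0, sigma)
        cur[i][1] += rng.gauss(0.0, sigma)
        new_cost = layout_cost([tuple(p) for p in cur])
        delta = new_cost - cur_cost
        if delta <= 0.0 or rng.random() < math.exp(-delta / temp):
            cur_cost = new_cost
            if cur_cost < best_cost:
                best, best_cost = [tuple(p) for p in cur], cur_cost
        else:
            cur[i] = old  # reject the move
    return best, best_cost
```

Starting from an overlapping layout, the sampler spreads the objects apart until the penalty terms vanish.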
With the growing popularity of somatosensory interaction devices, human action recognition is becoming attractive in many application scenarios. Skeleton-based action recognition is effective because the skeleton can represent the positions and the structure of key points of the human body. In this paper, we leverage spatiotemporal vectors within skeleton sequences as the input feature representation of the network, which is more sensitive to changes of the human skeleton than representations based on distance and angle features. In addition, we redesign residual blocks with different strides in the depth of the network to improve the ability of temporal convolutional networks (TCNs) to process actions with long-term temporal dependencies. In this work, we propose two-stream temporal convolutional networks (TSTCNs) that take full advantage of the inter-frame vector features and the intra-frame vector features of skeleton sequences in the spatiotemporal representations. The framework can integrate different feature representations of skeleton sequences so that the two representations compensate for each other's shortcomings. A fusion loss function is used to supervise the training parameters of the two branch networks. Experiments on public datasets show that our network achieves superior performance and attains an improvement of 1.2% over the recent GCN-based (BGC-LSTM) method on the NTU RGB+D dataset.
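The inter-frame and intra-frame vector features can be sketched directly from skeleton coordinates. The helper names are hypothetical; joints are modeled here as (x, y, z) tuples:

```python
def intra_frame_vectors(frame, root=0):
    # Vectors from a reference joint (e.g., the hip) to every joint
    # in the same frame: encodes body structure within one frame.
    rx, ry, rz = frame[root]
    return [(x - rx, y - ry, z - rz) for (x, y, z) in frame]

def inter_frame_vectors(prev_frame, frame):
    # Per-joint displacement between consecutive frames: encodes motion.
    return [(x2 - x1, y2 - y1, z2 - z1)
            for (x1, y1, z1), (x2, y2, z2) in zip(prev_frame, frame)]
```

Feeding each vector type to its own TCN branch and fusing the outputs is the two-stream idea described above.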
In recent years, convolutional neural networks (CNNs) for single image super-resolution (SISR) have become more and more complex, and it is increasingly challenging to improve SISR performance further. In contrast, reference image guided super-resolution (RefSR) is an effective strategy to boost SR (super-resolution) performance. In RefSR, the introduced high-resolution (HR) references can facilitate the high-frequency residual prediction process. To the best of our knowledge, the existing CNN-based RefSR methods treat the features from the references and the low-resolution (LR) input equally by simply concatenating them together. However, the HR references and the LR inputs contribute differently to the final SR results. Therefore, we propose a progressive channel attention network (PCANet) for RefSR. This paper makes two technical contributions. First, we propose a novel channel attention module (CAM), which estimates the channel weighting parameter by computing a weighted average of the spatial features instead of using a global average. Second, considering that the residual prediction process is improved when the LR input is enriched with more details, we perform super-resolution progressively, which can take advantage of the reference images at multiple scales. Extensive quantitative and qualitative evaluations on three benchmark datasets, which represent three typical scenarios for RefSR, demonstrate that our method is superior to the state-of-the-art SISR and RefSR methods in terms of PSNR (peak signal-to-noise ratio) and SSIM (structural similarity).
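The weighted-average channel descriptor that distinguishes the CAM from plain global average pooling can be sketched as follows. This is a simplified stand-in: the learned spatial weighting and gating layers of the actual module are replaced by fixed spatial weights and a sigmoid.

```python
import math

def channel_attention(features, spatial_weights):
    # features: one list of spatial values per channel.
    # spatial_weights: per-position weights (learned in the real module);
    # a uniform weighting recovers ordinary global average pooling.
    rescaled = []
    for channel in features:
        descriptor = sum(w * v for w, v in zip(spatial_weights, channel))
        gate = 1.0 / (1.0 + math.exp(-descriptor))  # sigmoid channel weight
        rescaled.append([gate * v for v in channel])
    return rescaled
```

The point of the weighted average is that informative spatial positions can contribute more to the channel descriptor than flat background regions.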
We propose an automatic video segmentation method based on an optimized SaliencyCut equipped with information centroid (IC) detection according to the level-balance principle from physics. Unlike existing methods, ours uses the IC to provide image information of another dimension to enhance video segmentation accuracy. Specifically, our IC is computed based on the information-level balance principle in the image and serves as an information pivot that aggregates all the image information into a single point. To effectively enhance the saliency value of the target object and suppress the background area, we combine the color and the coordinate information of the image when calculating the local IC and the global IC. Saliency maps for all frames in the video are then calculated based on the detected ICs. By applying IC smoothing to enhance the optimized saliency detection, we can further correct unsatisfactory saliency maps where sharp variations of colors or motions exist in complex videos. Finally, we obtain the segmentation results based on the IC-based saliency maps and the optimized SaliencyCut. Our method is evaluated on the DAVIS dataset, which consists of various kinds of challenging videos. Comparisons with the state-of-the-art methods are also conducted. Convincing visual results and statistical comparisons demonstrate the advantages and robustness of our method for automatic video segmentation.
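The balance-principle view of the information centroid reduces to a weighted mean: treating each pixel's information value as a mass, the IC is the point where those masses balance. A minimal sketch follows; the real method combines color and coordinate information, whereas here a single scalar weight per pixel stands in.

```python
def information_centroid(pixels):
    # pixels: list of (x, y, weight) triples, where weight is the pixel's
    # information value (e.g., derived from its color saliency).
    total = sum(w for _, _, w in pixels)
    cx = sum(x * w for x, _, w in pixels) / total
    cy = sum(y * w for _, y, w in pixels) / total
    return cx, cy
```

Computing this centroid locally and globally, as the method does, lets the saliency estimate favor regions that pull the IC toward them.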
Emotion plays a crucial role in gratifying users' needs when they experience movies and TV series, and may be underutilized as a framework for exploring and analyzing video content. In this paper, we present EmotionMap, a novel way of presenting emotion to everyday users as 2D geography, fusing spatio-temporal information with emotional data. The interface is composed of novel, interconnected visualization elements that facilitate video content exploration, understanding, and searching. EmotionMap conveys the overall emotion at a glance while also giving rapid access to the details. Firstly, we develop EmotionDisc, an effective tool for collecting audiences' emotions based on emotion representation models. We collect audience and character emotional data, and then adopt the metaphor of a map to visualize video content and emotion in a hierarchical structure. EmotionMap incorporates sketch interaction, providing a natural approach for users' active exploration. The novelty and the effectiveness of EmotionMap have been demonstrated by a user study and experts' feedback.
With the recent tremendous advances of computer graphics rendering and image editing technologies, computer-generated fake images, which in general do not reflect what happens in reality, can now easily deceive the human visual system. In this work, we propose a convolutional neural network (CNN)-based model to distinguish computer-generated (CG) images from natural images (NIs) using channel and pixel correlation. The key component of the proposed CNN architecture is a self-coding module that takes color images as input and explicitly extracts the correlation between color channels. Unlike previous approaches that directly apply CNNs to this problem, we consider the generality of the network (or subnetwork): the newly introduced hybrid correlation module can be directly combined with existing CNN models to enhance their discrimination capacity. Experimental results demonstrate that the proposed network outperforms state-of-the-art methods in terms of classification performance. We also show that the hybrid correlation module can improve the classification accuracy of different CNN architectures.
In mobile social networks, next point-of-interest (POI) recommendation is a very important function that can provide personalized location-based services for mobile users. In this paper, we propose a recurrent neural network (RNN)-based next POI recommendation approach that considers both the location interests of similar users and contextual information (such as time, current location, and friends' preferences). We develop a spatial-temporal topic model to describe users' location interests, based on which we form comprehensive feature representations of user interests and contextual information. We then propose a supervised RNN learning prediction model for next POI recommendation. Experiments on real-world datasets verify the accuracy and efficiency of the proposed approach, which achieves best F1-scores of 0.196754 on the Gowalla dataset and 0.354592 on the Brightkite dataset.
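A single step of the recurrent prediction over concatenated features might look like the sketch below. The feature layout (topic-interest vector plus a cyclic hour-of-day encoding) and all function names are illustrative assumptions, not the paper's actual parameterization.

```python
import math

def make_features(topic_vec, hour):
    # Concatenate spatial-temporal topic interests with a simple
    # cyclic encoding of the check-in hour (sin/cos of the hour angle).
    angle = 2.0 * math.pi * hour / 24.0
    return list(topic_vec) + [math.sin(angle), math.cos(angle)]

def rnn_step(x, h, W_xh, W_hh, b):
    # One step of a vanilla RNN: h' = tanh(W_xh @ x + W_hh @ h + b).
    return [math.tanh(sum(w * xi for w, xi in zip(W_xh[j], x)) +
                      sum(w * hi for w, hi in zip(W_hh[j], h)) + b[j])
            for j in range(len(h))]
```

The hidden state produced at each check-in step would then feed a softmax over candidate POIs in a full model.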
Sentence alignment provides multi-lingual or cross-lingual natural language processing (NLP) applications with high-quality parallel sentence pairs. Normally, an aligned sentence pair contains multiple aligned words, which intuitively play different roles during sentence alignment. Inspired by this intuition, we propose to address sentence alignment by exploring the semantic interactions among fine-grained word pairs within a neural network framework. In particular, we first employ various relevance measures to capture different kinds of semantic interactions among word pairs using a word-pair relevance network, and then model their importance using a multi-view attention network. Experimental results on both monotonic and non-monotonic bitexts show that our proposed approach significantly improves the performance of sentence alignment.
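The two stages — scoring every word pair and then weighting pair importance — can be sketched with dot-product relevance and a softmax attention. These are simplified stand-ins for the paper's multiple relevance measures and multi-view attention.

```python
import math

def relevance_matrix(src_vecs, tgt_vecs):
    # Dot-product relevance between every source/target word pair,
    # given one embedding vector per word.
    return [[sum(a * b for a, b in zip(s, t)) for t in tgt_vecs]
            for s in src_vecs]

def attend(scores):
    # Row-wise softmax: turns raw pair scores into importance weights.
    weights = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        weights.append([e / z for e in exps])
    return weights
```

Aggregating the relevance matrix under these attention weights yields a sentence-pair alignment score in which strongly interacting word pairs dominate.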
With the rapid development of the sharing economy, ride-hailing services have become increasingly popular worldwide. Although these services provide convenience for users, one public concern is whether the location privacy of passengers is protected. Service providers (SPs) such as Didi and Uber need to acquire passenger and driver locations before they can successfully dispatch passenger orders. To protect passengers' privacy according to their requirements, we propose a cloaking-region-based order dispatch scheme. In our scheme, a passenger sends the SP a cloaking region within which his/her actual location is not distinguishable. The trade-off for the enhanced privacy is a loss of social welfare, i.e., an increase in the overall pick-up distance. To optimize our scheme, we maximize the social welfare under passengers' privacy requirements. We investigate a bipartite matching based approach and show a theoretical bound on the matching performance under specific privacy requirements. Besides passengers' privacy, our extended scheme allows drivers to set their maximum pick-up distance, and can be applied when the number of drivers exceeds the number of passengers. Nevertheless, the global matching based scheme does not consider the interest of each individual passenger: passengers with low privacy requirements may be matched with drivers far from them. To this end, a pricing scheme with three strategies is proposed to compensate for the individual loss by allocating discounts on riding fares. Extensive experiments on both real-world and synthetic datasets show the efficiency of our scheme.
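The social-welfare objective — minimizing total pick-up distance over a passenger-driver matching — can be sketched with brute force for toy sizes. Passenger coordinates here would be the centers of their cloaking regions; a real deployment would use a proper bipartite matching algorithm (e.g., Kuhn-Munkres) rather than enumerating permutations.

```python
from itertools import permutations

def dispatch(passengers, drivers):
    # Assign each passenger a distinct driver, minimizing total Manhattan
    # pick-up distance. Brute force: only viable for toy instances.
    def dist(p, d):
        return abs(p[0] - d[0]) + abs(p[1] - d[1])
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(drivers)), len(passengers)):
        cost = sum(dist(p, drivers[j]) for p, j in zip(passengers, perm))
        if cost < best_cost:
            best, best_cost = list(perm), cost
    return best, best_cost
```

Because the SP sees only cloaking regions, the distances it optimizes are approximations, which is exactly the welfare loss the scheme trades for privacy.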
With the increasing demand for security, building strong barrier coverage in directional sensor networks is important for effectively detecting unauthorized intrusions. In this paper, we propose an efficient scheme that forms strong barrier coverage by adding mobile nodes into the barrier one by one. We first present the concept of the target circle, which determines the appropriate residence region and working direction of any candidate node to be added. We then select the optimal relay sensor to add to the current barrier based on its input-output ratio (barrier weight), which reflects how much it extends the barrier coverage. This strategy relaxes the requirement of using the minimal number of sensor nodes (maximal gain of each sensor) or maximizing the lifetime of a single barrier, allowing more sensors to be used. Numerical simulation results show that, compared with the available schemes, the proposed method significantly reduces the minimal deployment density required to establish k-barrier coverage, and increases the total service lifetime with high deployment efficiency.
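The relay-selection rule can be sketched as picking the candidate with the largest barrier weight, i.e., the best input-output ratio of coverage extension to deployment cost. The field names are hypothetical.

```python
def select_relay(candidates):
    # candidates: dicts holding the barrier-coverage extension a node
    # would provide and the cost (e.g., movement energy) of adding it.
    def barrier_weight(c):
        return c["extension"] / c["cost"] if c["cost"] > 0 else float("inf")
    return max(candidates, key=barrier_weight)
```

Repeating this selection until the barrier spans the deployment region is the one-by-one construction described above.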
With the advancement of telecommunications, sensor networks, crowdsourcing, and remote sensing technology, there has been tremendous growth in the volume of data with both spatial and temporal references. This huge volume of available spatio-temporal (ST) data, along with recent developments in machine learning and computational intelligence techniques, has motivated current research into developing various data-driven models for extracting useful and interesting patterns, relationships, and knowledge embedded in such large ST datasets. In this survey, we provide a structured and systematic overview of the research on data-driven approaches for spatio-temporal data analysis. The focus is on outlining various state-of-the-art spatio-temporal data mining techniques and their applications in various domains. We start with a brief overview of spatio-temporal data and the challenges in analyzing such data, and conclude by listing the current trends and future research directions in this multi-disciplinary area. Compared with other relevant surveys, this paper provides comprehensive coverage of the techniques from both computational/methodological and application perspectives. We anticipate that the present survey will help readers better understand the various directions in which research has been conducted to explore data-driven modeling for analyzing spatio-temporal data.
Field-programmable gate arrays (FPGAs) have recently evolved into a valuable component of heterogeneous computing. Register transfer level (RTL) design flows require designers to be experienced in hardware, which risks failures to meet time-to-market. High-level synthesis (HLS) permits designers to work at a higher level of abstraction by synthesizing high-level language programs into RTL descriptions, providing a promising approach to these problems. However, the performance of HLS tools still has limitations: designers remain exposed to various aspects of hardware design, development cycles are still time consuming, and the quality of results (QoR) of HLS tools is far behind that of RTL flows. In this paper, we survey the literature published since 2014 that focuses on the performance optimization of HLS tools. Compared with previous work, we extend the scope of HLS tool performance and present a set of three-level evaluation criteria, ranging from the ease of use of HLS tools to improvements in specific QoR metrics. We also propose performance evaluation equations describing the relation between performance optimization and QoR. We find that more effort is needed on the ease of use of HLS tools. We suggest drawing an analogy between the HLS development process and the embedded system design process, and providing a more elastic HLS methodology that integrates FPGA virtual machines.