1 State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
2 School of Electronic Information, Wuhan University, Wuhan 430072, China
Abstract This paper investigates the problem of retrieving aerial scene images with semantic sketches, since state-of-the-art retrieval systems cannot be used when no exemplar aerial image is available as a query. Because of the complex surface structures and large variations in resolution of aerial images, retrieving them with sketches is very challenging, and few studies have been devoted to this task. In this article, for the first time to our knowledge, we propose a framework to bridge the gap between sketches and aerial images. First, an aerial sketch-image database is collected, and the images and sketches it contains are augmented to various levels of detail. We then train a multi-scale deep model on the new dataset. Finally, the fully-connected layers of the network at each scale are connected and used as cross-domain features, and the Euclidean distance is used to measure the cross-domain similarity between aerial images and sketches. Experiments on several commonly used aerial image datasets demonstrate the superiority of the proposed method over traditional approaches.
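The retrieval step described in the abstract (per-scale features joined into one descriptor, then ranked by Euclidean distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `concat_scales`, `euclidean`, and `rank_images` are hypothetical, joining the per-scale features by concatenation is an assumption, and the toy vectors stand in for the fully-connected-layer activations that a trained multi-scale network would produce.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def concat_scales(per_scale_features: Sequence[Sequence[float]]) -> Vector:
    """Join the feature vectors from every scale into one descriptor.

    Assumption: the per-scale features are simply concatenated.
    """
    joined: Vector = []
    for feats in per_scale_features:
        joined.extend(float(x) for x in feats)
    return joined


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two descriptors of equal length."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_images(sketch_desc: Sequence[float],
                image_descs: Dict[str, Vector]) -> List[str]:
    """Return image names ordered from most to least similar to the sketch."""
    return sorted(image_descs,
                  key=lambda name: euclidean(sketch_desc, image_descs[name]))


# Toy data: two scales, two feature dimensions per scale (made-up values).
sketch = concat_scales([[0.9, 0.1], [0.8, 0.2]])
gallery = {
    "airport": concat_scales([[1.0, 0.0], [0.7, 0.3]]),
    "forest": concat_scales([[0.0, 1.0], [0.1, 0.9]]),
    "harbor": concat_scales([[0.5, 0.5], [0.5, 0.5]]),
}
ranking = rank_images(sketch, gallery)
print(ranking)
```

In this toy gallery the sketch descriptor lies closest to "airport", so that image is ranked first. With a real model, the same ranking logic would apply to descriptors extracted from sketches and aerial images by the shared network.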