Journal of Computer Science and Technology, 2022, Vol. 37, Issue (3): 652-665. doi: 10.1007/s11390-022-2189-3

Special Issue: Artificial Intelligence and Pattern Recognition; Computer Graphics and Multimedia

• Special Section of CVM 2022 •

Element-Arrangement Context Network for Facade Parsing

Yan Tao (陶琰), Yi-Teng Zhang (张翼腾), and Xue-Jin Chen* (陈雪锦), Senior Member, CCF, Member, ACM, IEEE        

  1. National Engineering Laboratory for Brain-Inspired Intelligence Technology and Application, University of Science and Technology of China, Hefei 230026, China
  • Received: 2022-01-28; Revised: 2022-04-15; Accepted: 2022-04-24; Online: 2022-05-30; Published: 2022-05-30
  • Contact: Xue-Jin Chen, E-mail: xjchen99@ustc.edu.cn
  • About author: Xue-Jin Chen is currently a professor with the National Engineering Laboratory for Brain-Inspired Intelligence Technology and Application, University of Science and Technology of China, Hefei. She received her B.S. and Ph.D. degrees in electronic circuits and systems from the University of Science and Technology of China, Hefei, in 2003 and 2008, respectively. From 2008 to 2010, she was a postdoctoral scholar in the Department of Computer Science at Yale University, New Haven. Her research interests include 3D modeling, geometry processing, and content creation. She has authored or co-authored over 60 papers in these areas. She received an Honorable Mention Award of Computational Visual Media in 2019.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China under Grant No. 61632006, and Tencent Corporation.

Facade parsing aims to decompose a building facade image into semantic regions of facade objects. Treating each architectural element on a facade as a parameterized rectangle, we formulate facade parsing as an object detection task that allows overlapping and nested elements, which supports structural 3D modeling and editing in downstream applications. In contrast to general object detection, the regular spatial arrangement and similar appearance of facade elements of the same category provide valuable context for accurate element localization. In this paper, we exploit both cues in a detection framework. Our element-arrangement context network (EACNet) consists of two unidirectional attention branches, one capturing column context and the other capturing row context, which aggregate element-specific features from multiple instances on the facade. We conduct extensive experiments on four public datasets (ECP, CMP, Graz50, and eTRIMS). The proposed EACNet achieves the highest mIoU among state-of-the-art methods (82.1% on ECP, 77.35% on Graz50, and 82.3% on eTRIMS). Both quantitative and qualitative evaluation results demonstrate the effectiveness of the dual unidirectional attention branches for parsing facade elements.
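
The idea of unidirectional attention can be illustrated with a minimal sketch in which each position of a feature grid attends only to positions in its own row, or only to positions in its own column. This is not the authors' implementation: the abstract does not specify the attention form, so the scaled dot-product scoring, the absence of learned projections, the additive fusion of the two branches, and the names `axis_attention` and `fuse` are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def axis_attention(feat, axis):
    """Self-attention restricted to one axis of an H x W grid of C-dim vectors.

    axis="row": each position attends only to positions in its own row.
    axis="col": each position attends only to positions in its own column.
    Queries, keys, and values are the raw features (no learned projections),
    a simplification assumed for this sketch.
    """
    if axis not in ("row", "col"):
        raise ValueError("axis must be 'row' or 'col'")
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    scale = 1.0 / math.sqrt(c)
    out = [[None] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if axis == "row":
                context = [feat[i][k] for k in range(w)]
            else:
                context = [feat[k][j] for k in range(h)]
            # Similarity of this position to every position on the same line.
            weights = softmax([scale * dot(feat[i][j], v) for v in context])
            # Weighted average of the features along that line.
            out[i][j] = [
                sum(wt * v[d] for wt, v in zip(weights, context))
                for d in range(c)
            ]
    return out


def fuse(feat):
    """Add both branch outputs to the input (fusion rule is an assumption)."""
    row = axis_attention(feat, "row")
    col = axis_attention(feat, "col")
    return [
        [
            [f + r + q for f, r, q in zip(feat[i][j], row[i][j], col[i][j])]
            for j in range(len(feat[0]))
        ]
        for i in range(len(feat))
    ]


# Toy 2 x 3 grid of 2-dimensional features.
grid = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]],
]
fused = fuse(grid)
```

Restricting attention to a single row or column means that positions with similar features on the same line reinforce each other, which matches the intuition that windows or balconies of the same category tend to align horizontally and vertically on a facade.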

Key words: facade parsing; element detection; layout regularity; spatial context

ISSN 1000-9000 (Print), 1860-4749 (Online); CN 11-2296/TP
