Hua Q, Hu HW, Qian SY et al. Bi-GAE: A bidirectional generative auto-encoder. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 38(3): 626−643 May 2023. DOI: 10.1007/s11390-023-1902-1.

Bi-GAE: A Bidirectional Generative Auto-Encoder

Funds: This work was supported by the Program of Technology Innovation of the Science and Technology Commission of Shanghai Municipality under Grant No. 21511104700, the Artificial Intelligence Technology Support Project of the Science and Technology Commission of Shanghai Municipality under Grant No. 22DZ1100103, and the Shanghai Informatization Development Special Project under Grant No. 202001030.
  • Author Bio:

    Qin Hua is currently a Ph.D. candidate in computer science and technology at Shanghai Jiao Tong University (SJTU), Shanghai. His research interests include intelligent resource scheduling and scaling of microservices, deep learning, and time series forecasting.

    Han-Wen Hu is currently a Ph.D. candidate in computer science and technology at Shanghai Jiao Tong University (SJTU), Shanghai. His research interests include intelligent transportation, map matching algorithms, and time series forecasting.

    Shi-You Qian received his Ph.D. degree in computer science and technology from Shanghai Jiao Tong University (SJTU), Shanghai, in 2015. He is currently an associate researcher with the Department of Computer Science and Engineering, SJTU, Shanghai. His research interests include event matching for content-based publish/subscribe systems, resource scheduling for the hybrid cloud, and driving recommendations with vehicular networks.

    Ding-Yu Yang received his Ph.D. degree in computer science and technology from Shanghai Jiao Tong University, Shanghai, in 2015. He is currently a senior engineer at Alibaba Group, Shanghai. His research interests include resource prediction, anomaly detection in cloud computing, and distributed stream processing. He has published over 20 papers in journals and conferences such as SIGMOD, VLDB, and VLDBJ.

    Jian Cao received his Ph.D. degree in computer science and technology from Nanjing University of Science and Technology, Nanjing, in 2000. He is currently a professor with the Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai. His main research interests include service computing, network computing, and intelligent data analytics.

  • Corresponding author:

    qshiyou@sjtu.edu.cn

    This paper is the result of a collaboration between Shanghai Jiao Tong University and Alibaba Group. Shi-You Qian is an associate researcher at Shanghai Jiao Tong University and Ding-Yu Yang is a senior engineer at Alibaba Group. Both assume the responsibilities and obligations of corresponding authors.

  • Received Date: September 06, 2021
  • Accepted Date: April 27, 2023
  • Improving the generative and representational capabilities of auto-encoders is an active research topic. However, it is challenging to jointly and simultaneously optimize the bidirectional mapping between the encoder and the decoder/generator while ensuring convergence. Most existing auto-encoders cannot automatically trade off between the two directions of this mapping. In this work, we propose Bi-GAE, an unsupervised bidirectional generative auto-encoder based on the bidirectional generative adversarial network (BiGAN). First, we introduce two terms that enhance information expansion in decoding, following human visual models, and improve semantics-relevant feature representation in encoding. Furthermore, we embed a generative adversarial network (GAN) to improve representation while ensuring convergence. The experimental results show that Bi-GAE achieves competitive results in both generation and representation with stable convergence. Compared with its counterparts, the representational power of Bi-GAE improves the classification accuracy of high-resolution images by about 8.09%. In addition, Bi-GAE increases the structural similarity index measure (SSIM) by 0.045 and decreases the Fréchet inception distance (FID) by 2.48 in the reconstruction of 512×512 images.
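    The BiGAN objective underlying Bi-GAE couples the encoder and generator by letting a discriminator judge joint pairs: real data with its encoding, (x, E(x)), versus a generated sample with its latent code, (G(z), z). A minimal numpy sketch of this joint-pair objective follows; the linear toy "networks" and dimensions here are illustrative assumptions (the actual Bi-GAE uses deep convolutional networks and additional loss terms).

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    d_x, d_z = 8, 2                      # toy data and latent dimensionalities
    W_enc = rng.normal(size=(d_z, d_x))  # encoder E: x -> z (linear stand-in)
    W_gen = rng.normal(size=(d_x, d_z))  # generator/decoder G: z -> x
    w_dis = rng.normal(size=d_x + d_z)   # discriminator D over joint pairs (x, z)

    def sigmoid(t):
        return 1.0 / (1.0 + np.exp(-t))

    def D(x, z):
        """Discriminator score in (0, 1) for a joint pair (x, z)."""
        return sigmoid(np.concatenate([x, z]) @ w_dis)

    x = rng.normal(size=d_x)   # a "real" data sample
    z = rng.normal(size=d_z)   # a latent draw

    # BiGAN value function: D distinguishes (x, E(x)) from (G(z), z).
    # Driving both pair types toward the same joint distribution is what
    # pushes E and G toward being inverse mappings of each other.
    loss_D = -np.log(D(x, W_enc @ x)) - np.log(1.0 - D(W_gen @ z, z))
    print(float(loss_D))
    ```

    In training, the discriminator minimizes this loss while the encoder and generator jointly maximize it; Bi-GAE augments this adversarial game with the two added terms described above to stabilize the trade-off between generation and representation.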

  • [1]
    Liu W B, Wang Z D, Liu X H, Zeng N Y, Liu Y R, Alsaadi F E. A survey of deep neural network architectures and their applications. Neurocomputing, 2017, 234: 11–26. DOI: 10.1016/j.neucom.2016.12.038.
    [2]
    Zhu J Y, Park T, Isola P, Efros A A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proc. the 2017 IEEE International Conference on Computer Vision (ICCV), Oct. 2017, pp.2242–2251. DOI: 10.1109/ICCV.2017.244.
    [3]
    Tewari A, Zollhöfer M, Kim H, Garrido P, Bernard F, Pérez P, Theobalt C. MoFA: Model-based deep convolutional face autoencoder for unsupervised monocular reconstruction. In Proc. the 2017 IEEE International Conference on Computer Vision (ICCV), Oct. 2017, pp.3735–3744. DOI: 10.1109/ICCV.2017.401.
    [4]
    Li X P, She J. Collaborative variational autoencoder for recommender systems. In Proc. the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 2017, pp.305–314. DOI: 10.1145/3097983.3098077.
    [5]
    Zhou C, Paffenroth R C. Anomaly detection with robust deep autoencoders. In Proc. the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 2017, pp.665–674. DOI: 10.1145/3097983.3098052.
    [6]
    Doersch C. Tutorial on variational autoencoders. arXiv: 1606.05908, 2016. https://doi.org/10.48550/arXiv.1606.05908, May 2023.
    [7]
    Creswell A, White T, Dumoulin V, Arulkumaran K, Sengupta B, Bharath A A. Generative adversarial networks: An overview. IEEE Signal Processing Magazine, 2018, 35(1): 53–65. DOI: 10.1109/MSP.2017.2765202.
    [8]
    Karras T, Laine S, Aila T. A style-based generator architecture for generative adversarial networks. IEEE Trans. Pattern Analysis and Machine Intelligence, 2021, 43(12): 4217–4228. DOI: 10.1109/TPAMI.2020.2970919.
    [9]
    Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, Courville A. Improved training of Wasserstein GANs. In Proc. the 31st International Conference on Neural Information Processing Systems, Dec. 2017, pp.5769–5779. DOI: 10.5555/3295222.3295327.
    [10]
    Mao X D, Li Q, Xie H R, Lau R Y K, Wang Z, Smolley S P. Least squares generative adversarial networks. In Proc. the 2017 IEEE International Conference on Computer Vision (ICCV), Oct. 2017, pp.2813–2821. DOI: 10.1109/ICCV.2017.304.
    [11]
    Donahue J, Krähenbühl P, Darrell T. Adversarial feature learning. arXiv: 1605.09782, 2016. https://doi.org/10.48550/arXiv.1605.09782, May 2023.
    [12]
    Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv: 1511.06434, 2015. https://doi.org/10.48550/arXiv.1511.06434, May 2023.
    [13]
    Chen X, Duan Y, Houthooft R, Schulman J, Sutskever I, Abbeel P. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In Proc. the 30th International Conference on Neural Information Processing Systems, Dec. 2016, pp.2180–2188. DOI: 10.5555/3157096.3157340.
    [14]
    Makhzani A, Shlens J, Jaitly N, Goodfellow I, Frey B. Adversarial autoencoders. arXiv: 1511.05644, 2015. https://doi.org/10.48550/arXiv.1511.05644, May 2023.
    [15]
    Pidhorskyi S, Adjeroh D A, Doretto G. Adversarial latent autoencoders. In Proc. the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2020, pp.14092–14101. DOI: 10.1109/CVPR42600.2020.01411.
    [16]
    Karras T, Aila T, Laine S, Lehtinen J. Progressive growing of GANs for improved quality, stability, and variation. arXiv: 1710.10196, 2017. https://doi.org/10.48550/arXiv.1710.10196, May 2023.
    [17]
    Li C L, Chang W C, Cheng Y, Yang Y M, Póczos B. MMD GAN: Towards deeper understanding of moment matching network. In Proc. the 31st International Conference on Neural Information Processing Systems, Dec. 2017, pp.2200-2210.
    [18]
    Arjovsky M, Chintala S, Bottou L. Wasserstein generative adversarial networks. In Proc. the 34th International Conference on Machine Learning, Aug. 2017, pp.214-223. DOI: 10.5555/3305381.3305404.
    [19]
    Wang Z W, She Q, Ward T E. Generative adversarial networks in computer vision: A survey and taxonomy. ACM Computing Surveys, 2021, 54(2): Article No. 37. DOI: 10.1145/3439723.
    [20]
    Pan Z Q, Yu W J, Yi X K, Khan A, Yuan F, Zheng Y H. Recent progress on generative adversarial networks (GANs): A survey. IEEE Access, 2019, 7: 36322–36333. DOI: 10.1109/ACCESS.2019.2905015.
    [21]
    Johnson J, Alahi A, Li F F. Perceptual losses for real-time style transfer and super-resolution. In Proc. the 14th European Conference on Computer Vision, Oct. 2016, pp.694–711. DOI: 10.1007/978-3-319-46475-6_43.
    [22]
    Berthelot D, Schumm T, Metz L. BEGAN: Boundary equilibrium generative adversarial networks. arXiv: 1703.10717, 2017. https://doi.org/10.48550/arXiv.1703.10717, May 2023.
    [23]
    Wang T C, Liu M Y, Zhu J Y, Tao A, Kautz J, Catanzaro B. High-resolution image synthesis and semantic manipulation with conditional GANs. In Proc. the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2018, pp.8798–8807. DOI: 10.1109/CVPR.2018.00917.
    [24]
    Zhang H, Xu T, Li H S, Zhang S T, Wang X G, Huang X L, Metaxas D. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Proc. the 2017 IEEE International Conference on Computer Vision, Oct. 2017, pp.5908–5916. DOI: 10.1109/ICCV.2017.629.
    [25]
    Karras T, Laine S, Aittala M, Hellsten J, Lehtinen J, Aila T. Analyzing and improving the image quality of StyleGAN. In Proc. the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2020, pp.8107–8116. DOI: 10.1109/CVPR42600.2020.00813.
    [26]
    Rezende D J, Mohamed S, Wierstra D. Stochastic backpropagation and approximate inference in deep generative models. In Proc. the 31st International Conference on International Conference on Machine Learning, Jun. 2014, pp.1278–1286. DOI: 10.5555/3044805.3045035.
    [27]
    Chen R T Q, Li X C, Grosse R, Duvenaud D. Isolating sources of disentanglement in VAEs. In Proc. the 32nd International Conference on Neural Information Processing Systems, Dec. 2018, pp.2615–2625. DOI:10.5555/3327144.3327186.
    [28]
    Roy A, Grangier D. Unsupervised paraphrasing without translation. arXiv: 1905.12752, 2019. https://doi.org/10.48550/arXiv.1905.12752, May 2023.
    [29]
    Kingma D P, Salimans T, Jozefowicz R, Chen X, Sutskever I, Welling M. Improved variational inference with inverse autoregressive flow. In Proc. the 30th International Conference on Neural Information Processing Systems (NIPS), Dec. 2016, pp.4743–4751. DOI: 10.5555/3157382.3157627.
    [30]
    Huang H B, Li Z H, He R, Sun Z N, Tan T N. IntroVAE: Introspective variational autoencoders for photographic image synthesis. In Proc. the 32nd International Conference on Neural Information Processing Systems, Dec. 2018, pp.52–63. DOI: 10.5555/3326943.3326949.
    [31]
    Su J L. GAN-QP: A novel GAN framework without gradient vanishing and lipschitz constraint. arXiv: 1811.07296, 2018. https://doi.org/10.48550/arXiv.1811.07296, May 2023.
    [32]
    Arora S, Ge R, Liang Y Y, Ma T Y, Zhang Y. Generalization and equilibrium in generative adversarial nets (GANs). In Proc. the 34th International Conference on Machine Learning, Aug. 2017, pp.224–232. DOI: 10.1145/3188745.3232194.
    [33]
    Wang W, Sun Y, Halgamuge S. Improving MMD-GAN training with repulsive loss function. arXiv: 1812.09916, 2018. https://doi.org/10.48550/arXiv.1812.09916, May 2023.
    [34]
    Gatys L A, Ecker A S, Bethge M. Image style transfer using convolutional neural networks. In Proc. the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2016, pp.2414–2423. DOI: 10.1109/CVPR.2016.265.
    [35]
    Liu Y F, Chen H, Chen Y, Yin W, Shen C H. Generic perceptual loss for modeling structured output dependencies. In Proc. the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2021, pp.5420–5428. DOI: 10.1109/CVPR46437.2021.00538.
    [36]
    He K M, Zhang X Y, Ren S Q, Sun J. Delving deep into rectifiers: Surpassing human-level performance on imageNet classification. In Proc. the 2015 IEEE International Conference on Computer Vision (ICCV), Dec. 2015, pp.1026–1034. DOI: 10.1109/ICCV.2015.123.
    [37]
    Zhao H, Gallo O, Frosio I, Kautz J. Loss functions for image restoration with neural networks. IEEE Trans. Computational Imaging, 2017, 3(1): 47–57. DOI: 10.1109/TCI.2016.2644865.
    [38]
    Deng L. The MNIST database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Processing Magazine, 2012, 29(6): 141–142. DOI: 10.1109/MSP.2012.2211477.
    [39]
    Hearst M A, Dumais S T, Osuna E, Platt J, Scholkopf B. Support vector machines. IEEE Intelligent Systems and their Applications, 1998, 13(4): 18–28. DOI: 10.1109/5254.708428.
    [40]
    Ye J P. Least squares linear discriminant analysis. In Proc. the 24th International Conference on Machine Learning, Jun. 2007, pp.1087–1093. DOI: 10.1145/1273496.1273633.
    [41]
    Rigatti S J. Random forest. Journal of Insurance Medicine, 2017, 47(1): 31–39. DOI: 10.17849/insm-47-01-31-39.1.
    [42]
    Hastie T, Rosset S, Zhu J, Zou H. Multi-class AdaBoost. Statistics and Its Interface, 2009, 2(3): 349–360. DOI: 10.4310/SII.2009.v2.n3.a8.
    [43]
    Zhang H, Goodfellow I, Metaxas D, Odena A. Self-attention generative adversarial networks. arXiv: 1805.08318, 2018. https://doi.org/10.48550/arXiv.1805.08318, May 2023.
    [44]
    Liu Z W, Luo P, Wang X G, Tang X O. Deep learning face attributes in the wild. In Proc. the 2015 IEEE International Conference on Computer Vision (ICCV), Dec. 2015, pp.3730–3738. DOI: 10.1109/ICCV.2015.425.