|
Journal of Computer Science and Technology ›› 2020, Vol. 35 ›› Issue (3): 522-537.doi: 10.1007/s11390-020-0305-9
Special Issue: Artificial Intelligence and Pattern Recognition; Computer Graphics and Multimedia
• Special Section of CVM 2020 • Previous Articles Next Articles
Fei Fang1, Fei Luo1, Hong-Pan Zhang1, Hua-Jian Zhou1, Alix L. H. Chow2, Chun-Xia Xiao1,*, Member, CCF, IEEE
[1] Lin T Y, Maire M, Belongie S et al. Microsoft COCO:Common objects in context. In Proc. the 13th European Conference on Computer Vision, September 2014, pp.740-755. [2] Krishna R, Zhu Y, Groth O et al. Visual genome:Connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision, 2017, 123(1):32-73. [3] Mansimov E, Parisotto E, Ba J L et al. Generating images from captions with attention. arXiv:1511.02793, 2015. https://arxiv.org/abs/1511.02793, October 2019. [4] Reed S, Akata Z, Yan X et al. Generative adversarial text to image synthesis. arXiv:1605.05396, 2016. https://arxiv.org/abs/1605.05396, October 2019. [5] Zhang H, Xu T, Li H et al. StackGAN:Text to photorealistic image synthesis with stacked generative adversarial networks. In Proc. the 2017 IEEE International Conference on Computer Vision, October 2017, pp.5907-5915. [6] Lalonde J F, Hoiem D, Efros A A et al. Photo clip art. ACM Transactions on Graphics, 2007, 26(3):Article No. 3. [7] Chen T, Cheng M M, Tan P et al. Sketch2Photo:Internet image montage. ACM Transactions on Graphics, 2009, 28(5):Article No. 124. [8] Chen T, Tan P, Ma L Q et al. PoseShop:Human image database construction and personalized content synthesis. IEEE Transactions on Visualization and Computer Graphics, 2013, 19(5):824-837. [9] Fang F, Yi M, Feng H et al. Narrative collage of image collections by scene graph recombination. IEEE Transactions on Visualization and Computer Graphics, 2018, 24(9):2559-2572. [10] Zitnick C L, Parikh D. Bringing semantics into focus using visual abstraction. In Proc. the IEEE Conference on Computer Vision and Pattern Recognition, June 2013, pp.3009-3016. [11] Zitnick C L, Parikh D, Vanderwende L. Learning the visual interpretation of sentences. In Proc. the IEEE International Conference on Computer Vision, December 2013, pp.1681-1688. [12] Coyne B, Sproat R. WordsEye:An automatic text-to-scene conversion system. In Proc. the 28th Annual Conference on Computer Graphics and Interactive Techniques, August 2001, pp.487-496. [13] Chang A, Savva M, Manning C D. Learning spatial knowledge for text to 3D scene generation. In Proc. the 2014 Conference on Empirical Methods in Natural Language Processing, October 2014, pp.2028-2038. [14] Reed S, van den Oord A, Kalchbrenner N et al. Generating interpretable images with controllable structure. In Proc. the International Conference on Learning Representations, April 2017. [15] Goodfellow I, Pouget-Abadie J, Mirza M et al. Generative adversarial nets. In Proc. the Annual Conference on Neural Information Processing Systems, December 2014, pp.2672-2680. [16] Reed S E, Akata Z, Mohan S et al. Learning what and where to draw. In Proc. the Annual Conference on Neural Information Processing Systems, December 2016, pp.217-225. [17] Zhang H, Xu T, Li H et al. StackGAN++:Realistic image synthesis with stacked generative adversarial networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8):1947-1962. [18] Xu T, Zhang P, Huang Q et al. AttnGAN:Fine-grained text to image generation with attentional generative adversarial networks. In Proc. the IEEE Conference on Computer Vision and Pattern Recognition, June 2018, pp.1316-1324. [19] Yin G, Liu B, Sheng L et al. Semantics disentangling for text-to-image generation. In Proc. the IEEE Conference on Computer Vision and Pattern Recognition, June 2019, pp.2327-2336. [20] Zhou X, Huang S, Li B et al. Text guided person image synthesis. In Proc. the IEEE Conference on Computer Vision and Pattern Recognition, June 2019, pp.3663-3672. [21] Tan H, Liu X, Li X et al. Semantics-enhanced adversarial nets for text-to-image synthesis. In Proc. the IEEE International Conference on Computer Vision, October 2019, pp.10500-10509. [22] Qiao T, Zhang J, Xu D et al. MirrorGAN:Learning textto-image generation by redescription. In Proc. the IEEE Conference on Computer Vision and Pattern Recognition, June 2019, pp.1505-1514. [23] Johnson J, Gupta A, Li F F. Image generation from scene graphs. In Proc. the IEEE Conference on Computer Vision and Pattern Recognition, June 2018, pp.1219-1228. [24] Li W, Zhang P, Zhang L et al. Object-driven text-to-image synthesis via adversarial training. In Proc. the IEEE Conference on Computer Vision and Pattern Recognition, June 2019, pp.12174-12182. [25] Hinz T, Heinrich S, Wermter S. Generating multiple objects at spatially distinct locations. arXiv:1901.00686, 2019. https://arxiv.org/abs/1901.00686, October 2019. [26] Xu K, Ba J, Kiros R et al. Show, attend and tell:Neural image caption generation with visual attention. In Proc. the 32nd International Conference on Machine Learning, July 2015, pp.2048-2057. [27] Karpathy A, Li F F. Deep visual-semantic alignments for generating image descriptions. In Proc. the IEEE Conference on Computer Vision and Pattern Recognition, June 2015, pp.3128-3137. [28] Johnson J, Karpathy A, Li F F. DenseCap:Fully convolutional localization networks for dense captioning. In Proc. the 2016 IEEE Conference on Computer Vision and Pattern Recognition, June 2016, pp.4565-4574. [29] Krause J, Johnson J, Krishna R et al. A hierarchical approach for generating descriptive image paragraphs. In Proc. the 2017 IEEE Conference on Computer Vision and Pattern Recognition, July 2017, pp.3337-3345. [30] Yao L, Torabi A, Cho K et al. Describing videos by exploiting temporal structure. In Proc. the IEEE International Conference on Computer Vision, December 2015, pp.4507-4515. [31] Yu H, Wang J, Huang Z et al. Video paragraph captioning using hierarchical recurrent neural networks. In Proc. the 2015 IEEE Conference on Computer Vision and Pattern Recognition, June 2016, pp.4584-4593. [32] Li A, Sun J, Ng J Y H et al. Generating holistic 3D scene abstractions for text-based image retrieval. In Proc. the IEEE Conference on Computer Vision and Pattern Recognition, July 2017, pp.1942-1950. [33] Fellbaum C. WordNet. In Theory and Applications of Ontology:Computer Applications, Poli P, Healy M, Kameas A (eds.), Springer Netherlands, 2010, pp.231-243. [34] He K, Gkioxari G, Dollár P et al. Mask R-CNN. In Proc. the IEEE International Conference on Computer Vision, October 2017, pp.2980-2988. [35] Laina I, Rupprecht C, Belagiannis V et al. Deeper depth prediction with fully convolutional residual networks. In Proc. the 4th International Conference on 3D Vision, October 2016, pp.239-248. [36] Yeh Y T, Yang L, Watson M et al. Synthesizing open worlds with constraints using locally annealed reversible jump MCMC. ACM Transactions on Graphics, 2012, 31(4):Article No. 56. [37] Pérez P, Gangnet M, Blake A. Poisson image editing. ACM Transactions on Graphics, 2003, 22(3):313-318. [38] Liao Z, Karsch K, Forsyth D. An approximate shading model for object relighting. In Proc. the IEEE Conference on Computer Vission and Pattern Recognition, June 2015, pp.5307-5314. [39] Elder J H. Shape from contour:Computation and representation. Annual Review of Vision Science, 2018, 4(1):423-450. [40] Johnston S F. Lumo:Illumination for cel animation. In Proc. the 2nd International Symposium on NonPhotorealistic Animation and Rendering, June 2002, pp.45-52. [41] Wu T P, Sun J, Tang C K et al. Interactive normal reconstruction from a single image. ACM Transactions on Graphics, 2008, 27(5):Article No. 119. [42] Grosse R, Johnson M K, Adelson E H et al. Ground truth dataset and baseline evaluations for intrinsic image algorithms. In Proc. the 12th IEEE International Conference on Computer Vision, September 2009, pp.2335-2342. [43] Karsch K, Sunkavalli K, Hadap S et al. Automatic scene inference for 3D object compositing. ACM Transactions on Graphics, 2014, 33(3):Article No. 32. [44] Godard C, Aodha M O, Brostow G J. Unsupervised monocular depth estimation with left-right consistency. In Proc. the IEEE Conference on Computer Vision and Pattern Recognition, July 2017, pp.6602-6611. |
[1] | Yifan Wu, Fan Yang, Yong Xu, Haibin Ling. Privacy-Protective-GAN for Privacy Preserving Face De-Identification [J]. Journal of Computer Science and Technology, 2019, 34(1): 47-60. |
[2] | ZHANG Xiaopeng(张晓鹏),CHEN Yanyun(陈彦云)and WU Enhua(吴恩华). Hair Image Generation Using Connected Texels [J]. , 2001, 16(4): 0-0. |
|