Citation: Wei KK, Luo WQ, Tan SQ et al. CTNet: A convolutional Transformer network for color image steganalysis. Journal of Computer Science and Technology, 40(2): 413–427, Mar. 2025. DOI: 10.1007/s11390-023-3006-3
Compared with convolutional neural networks (CNNs), Transformers capture features with a global receptive field more effectively and have recently achieved great success in natural language processing and computer vision. Owing to the particular nature of steganography, however, almost all existing steganalytic networks rely solely on CNNs with local receptive fields to detect embedding artifacts. In this paper, we propose a novel convolutional Transformer network for color image steganalysis. Specifically, a pre-processing module first computes various image residuals for each color channel of the input image. To capture more comprehensive steganalytic features, the truncated residuals are concatenated along the channel dimension and passed through a feature extraction module composed of a CNN group and a Transformer group. The CNN group extracts features with local receptive fields, while the Transformer group, built around multi-head self-attention, extracts global steganalytic features. Finally, global covariance pooling (GCP) and two fully connected (FC) layers with dropout perform the classification. Extensive comparative experiments demonstrate that the proposed method significantly improves detection performance in color image steganalysis and achieves state-of-the-art results. Although the method was originally designed for color images, it also obtains competitive results on grayscale images compared with the current best detector. In addition, we provide numerous ablation studies to verify the rationality of the proposed network architecture.
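The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the CTNet implementation: the high-pass kernel, the truncation threshold, the embedding size, the head count, and all function names are assumptions chosen for illustration. The CNN group and the FC classifier are omitted, the attention has no output projection, and the covariance pooling skips any normalization step.

```python
import numpy as np

def channel_residuals(image, kernel):
    """Apply the same high-pass kernel to every color channel (valid correlation)."""
    h, w, _ = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1, image.shape[2]))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * image[i:i + h - kh + 1, j:j + w - kw + 1, :]
    return out

def truncate(residual, threshold=3.0):
    """Clip residuals to [-threshold, threshold]; the threshold is an assumed value."""
    return np.clip(residual, -threshold, threshold)

def multi_head_self_attention(tokens, w_q, w_k, w_v, num_heads):
    """Scaled dot-product self-attention over all tokens, split into heads."""
    n, d = tokens.shape
    dh = d // num_heads

    def split(x):
        # (n, d) -> (heads, n, dh)
        return x.reshape(n, num_heads, dh).transpose(1, 0, 2)

    q, k, v = split(tokens @ w_q), split(tokens @ w_k), split(tokens @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)   # (heads, n, n)
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return (weights @ v).transpose(1, 0, 2).reshape(n, d)

def global_covariance_pooling(features):
    """Pool (n, d) token features into a (d, d) covariance matrix."""
    centered = features - features.mean(axis=0, keepdims=True)
    return centered.T @ centered / features.shape[0]

# Illustrative run on random data (all sizes and weights are hypothetical).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(16, 16, 3)).astype(np.float64)
kernel = np.array([[-1, 2, -1], [2, -4, 2], [-1, 2, -1]]) / 4.0  # zero-sum high-pass

residual = truncate(channel_residuals(image, kernel))   # (14, 14, 3)
embed = rng.normal(size=(3, 8))                          # assumed token embedding
tokens = residual.reshape(-1, 3) @ embed                 # (196, 8)
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
attended = multi_head_self_attention(tokens, w_q, w_k, w_v, num_heads=2)
pooled = global_covariance_pooling(attended)             # (8, 8)
```

Because every token attends to every other token, each row of `attended` depends on the whole residual map, which is the global receptive field the abstract contrasts with local convolutions. The pooled matrix summarizes second-order statistics of the features and would feed the classifier in a full model.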