Citation: Li RS, Peng P, Shao ZY et al. Evaluating RISC-V vector instruction set architecture extension with computer vision workloads. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 38(4): 807–820, July 2023. DOI: 10.1007/s11390-023-1266-6.
Computer vision (CV) algorithms are now used extensively in a wide range of applications. Because multimedia data are generally well-formatted and regular, it is beneficial to leverage the massively parallel processing power of the underlying platform to improve the performance of CV algorithms. Single Instruction Multiple Data (SIMD) instructions, which perform the same operation on multiple data items in a single instruction, are widely employed to improve the efficiency of CV algorithms. In this paper, we evaluate the power and effectiveness of the RISC-V vector extension (RV-V) on typical CV algorithms, such as Gray Scale, Mean Filter, and Edge Detection. Our evaluation shows that, compared with the baseline OpenCV implementation using scalar instructions, the equivalent implementations using RV-V (version 0.8) can reduce the instruction count of the same CV algorithm by up to 24x when processing the same input images. However, the actual performance improvement, measured in cycle counts, depends heavily on the specific implementation of the underlying RV-V co-processor. In our evaluation, using the vector co-processor (with eight execution lanes) of Xuantie C906, the vector versions of the CV algorithms achieve average speedups of up to 2.98x over their scalar counterparts.
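To make the scalar-versus-vector contrast concrete, the following is a minimal sketch in plain Python, not the paper's implementation and not RV-V code. It converts RGB pixels to gray scale once with a one-pixel-per-step loop and once with a strip-mined loop that handles up to eight pixels per step. The function names, the `LANES` constant (chosen to mirror the eight execution lanes mentioned above), and the fixed-point weights (77, 150, 29) are assumptions made for illustration. The step counts are only a rough stand-in for dynamic instruction counts and say nothing about cycle counts on real hardware.

```python
# Illustrative sketch only: plain Python standing in for scalar vs. vector loops.
# LANES and the luma weights (77, 150, 29) are assumptions, not taken from the paper.

LANES = 8  # hypothetical number of pixels handled per vector-style step


def gray_scalar(pixels):
    """Convert one pixel per loop step, like a scalar instruction stream."""
    out = []
    steps = 0
    for r, g, b in pixels:
        out.append((77 * r + 150 * g + 29 * b) >> 8)
        steps += 1
    return out, steps


def gray_strip_mined(pixels, lanes=LANES):
    """Convert up to `lanes` pixels per loop step, like a strip-mined vector loop."""
    out = []
    steps = 0
    for start in range(0, len(pixels), lanes):
        chunk = pixels[start:start + lanes]  # the final chunk may be shorter (tail)
        out.extend((77 * r + 150 * g + 29 * b) >> 8 for r, g, b in chunk)
        steps += 1
    return out, steps


if __name__ == "__main__":
    # Synthetic 100-pixel "image"; values are arbitrary test data.
    image = [(i % 256, (2 * i) % 256, (3 * i) % 256) for i in range(100)]
    scalar_out, scalar_steps = gray_scalar(image)
    vector_out, vector_steps = gray_strip_mined(image)
    print("same result:", scalar_out == vector_out)
    print("scalar steps:", scalar_steps, "strip-mined steps:", vector_steps)
```

Both loops produce the same output, and the strip-mined version needs roughly one eighth as many loop steps. Whether such a reduction turns into a comparable cycle-count speedup depends on the vector hardware, which is the distinction the abstract draws between instruction count and measured performance.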