Citation: Yu-Jing Feng, De-Jian Li, Xu Tan, Xiao-Chun Ye, Dong-Rui Fan, Wen-Ming Li, Da Wang, Hao Zhang, Zhi-Min Tang. Accelerating Data Transfer in Dataflow Architectures Through a Look-Ahead Acknowledgment Mechanism[J]. Journal of Computer Science and Technology, 2022, 37(4): 942-959. DOI: 10.1007/s11390-020-0555-6
[1] Dennis J B. Retrospective: A preliminary architecture for a basic data-flow processor. In Proc. the 25 Years of the International Symposia on Computer Architecture, August 1998, pp.2-4. DOI: 10.1145/285930.285932.
[2] Arvind, Nikhil R S. Executing a program on the MIT tagged-token dataflow architecture. IEEE Transactions on Computers, 1990, 39(3): 300-318. DOI: 10.1109/12.48862.
[3] Sankaralingam K, Nagarajan R, Liu H, Kim C, Huh J, Burger D, Keckler S W, Moore C R. Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture. In Proc. the 30th Annual International Symposium on Computer Architecture, June 2003, pp.422-433. DOI: 10.1109/ISCA.2003.1207019.
[4] Swanson S, Michelson K, Schwerin A, Oskin M. WaveScalar. In Proc. the 36th Annual IEEE/ACM International Symposium on Microarchitecture, December 2003, pp.291-302. DOI: 10.1109/MICRO.2003.1253203.
[5] Pratas F, Oriato D, Pell O, Mata R A, Sousa L. Accelerating the computation of induced dipoles for molecular mechanics with dataflow engines. In Proc. the 21st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, April 2013, pp.177-180. DOI: 10.1109/FCCM.2013.34.
[6] Fu H, Gan L, Clapp R G, Ruan H, Pell O, Mencer O, Flynn M, Huang X, Yang G. Scaling reverse time migration performance through reconfigurable dataflow engines. IEEE Micro, 2014, 34(1): 30-40. DOI: 10.1109/MM.2013.111.
[7] Coons K E, Chen X, Burger D, McKinley K S, Kushwaha S K. A spatial path scheduling algorithm for EDGE architectures. In Proc. the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, October 2006, pp.129-140. DOI: 10.1145/1168857.1168875.
[8] Liu D, Yin S, Liu L, Wei S. Polyhedral model based mapping optimization of loop nests for CGRAs. In Proc. the 50th ACM/EDAC/IEEE Design Automation Conference, May 29-June 7, 2013, Article No.19. DOI: 10.1145/2463209.2488757.
[9] Nowatzki T, Sartin-Tarm M, De Carli L, Sankaralingam K, Estan C, Robatmili B. A general constraint-centric scheduling framework for spatial architectures. ACM SIGPLAN Notices, 2013, 48(6): 495-506. DOI: 10.1145/2499370.2462163.
[10] Nowatzki T, Gangadhar V, Sankaralingam K. Exploring the potential of heterogeneous von Neumann/dataflow execution models. In Proc. the 42nd Annual International Symposium on Computer Architecture, June 2015, pp.298-310. DOI: 10.1145/2749469.2750380.
[11] Sankaralingam K, Nagarajan R, McDonald R et al. Distributed microarchitectural protocols in the TRIPS prototype processor. In Proc. the 39th Annual IEEE/ACM International Symposium on Microarchitecture, December 2006, pp.480-491. DOI: 10.1109/MICRO.2006.19.
[12] Putnam A, Swanson S, Mercaldi M, Michelson K, Petersen A, Schwerin A, Oskin M, Eggers S. The microarchitecture of a pipelined WaveScalar processor: An RTL-based study. Technical Report, University of Washington, 2004. http://cseweb.ucsd.edu/~swanson/papers/TR-2004-11-02.pdf, Sept. 2020.
[13] Shimada T, Hiraki K, Nishida K, Sekiguchi S. Evaluation of a prototype data flow processor of the SIGMA-1 for scientific computations. In Proc. the 13th Annual International Symposium on Computer Architecture, June 1986, pp.226-234.
[14] Papadopoulos G M, Culler D E. Monsoon: An explicit token-store architecture. In Proc. the 25 Years of the International Symposia on Computer Architecture, August 1998, pp.398-407. DOI: 10.1145/285930.285999.
[15] Govindaraju V, Ho C H, Nowatzki T, Chhugani J, Satish N, Sankaralingam K, Kim C. DySER: Unifying functionality and parallelism specialization for energy-efficient computing. IEEE Micro, 2012, 32(5): 38-51. DOI: 10.1109/MM.2012.51.
[16] Shen X, Ye X, Tan X, Wang D, Zhang L, Li W, Zhang Z, Fan D. An efficient network-on-chip router for dataflow architecture. Journal of Computer Science and Technology, 2017, 32(1): 11-25. DOI: 10.1007/s11390-017-1703-5.
[17] Mercaldi M, Swanson S, Petersen A, Putnam A, Schwerin A, Oskin M, Eggers S J. Instruction scheduling for a tiled dataflow architecture. ACM SIGPLAN Notices, 2006, 41(11): 141-150. DOI: 10.1145/1168918.1168876.
[18] Voitsechov D, Etsion Y. Single-graph multiple flows: Energy efficient design alternative for GPGPUs. In Proc. the 41st ACM/IEEE Annual International Symposium on Computer Architecture, June 2014, pp.205-216. DOI: 10.1109/ISCA.2014.6853234.
[19] Lee J K F, Smith A J. Branch prediction strategies and branch target buffer design. Computer, 1984, 17(1): 6-22. DOI: 10.1109/MC.1984.1658927.
[20] Ye X, Fan D, Sun N, Tang S, Zhang M, Zhang H. SimICT: A fast and flexible framework for performance and power evaluation of large-scale architecture. In Proc. the 2013 International Symposium on Low Power Electronics and Design, September 2013, pp.273-278. DOI: 10.1109/ISLPED.2013.6629308.
[21] Han R, Lu X Y, Xu J T. On big data benchmarking. In Big Data Benchmarks, Performance Optimization, and Emerging Hardware, Zhan J, Han R, Weng C (eds.), Springer, 2014, pp.3-18. DOI: 10.1007/978-3-319-13021-7_1.
[21] Burger D, Austin T M. The SimpleScalar tool set, version 2.0. ACM SIGARCH Computer Architecture News, 1997, 25(3): 13-25. DOI: 10.1145/268806.268810.
[22] Kurzak J, Tomov S, Dongarra J. Autotuning GEMM kernels for the Fermi GPU. IEEE Transactions on Parallel and Distributed Systems, 2012, 23(11): 2045-2057. DOI: 10.1109/TPDS.2011.311.
[23] Del Mundo C, Feng W. Towards a performance-portable FFT library for heterogeneous computing. In Proc. the 11th ACM Conference on Computing Frontiers, May 2014, Article No.11. DOI: 10.1145/2597917.2597943.
[24] Holewinski J, Pouchet L N, Sadayappan P. High-performance code generation for stencil computations on GPU architectures. In Proc. the 26th ACM International Conference on Supercomputing, June 2012, pp.311-320. DOI: 10.1145/2304576.2304619.
[25] Stratton J A, Rodrigues C, Sung I, Obeid N, Chang L, Anssari N, Liu G D, Hwu W W. Parboil: A revised benchmark suite for scientific and commercial throughput computing. Technical Report, University of Illinois at Urbana-Champaign, 2012. http://impact.crhc.illinois.edu/Shared/Docs/impact-12-01.parboil.pdf, Sept. 2020.
[26] Siehl K, Zhao X. Supporting energy-efficient computing on heterogeneous CPU-GPU architectures. In Proc. the 5th IEEE International Conference on Future Internet of Things and Cloud, August 2017, pp.134-141. DOI: 10.1109/FiCloud.2017.46.
[27] Burtscher M, Zecena I, Zong Z. Measuring GPU power with the K20 built-in sensor. In Proc. the 7th Workshop on General Purpose Processing Using GPUs, March 2014, pp.28-36. DOI: 10.1145/2588768.2576783.