Peng Qu1, Jin Yan2, You-Hui Zhang1, Member, CCF, ACM, IEEE, Guang R. Gao3, Fellow, ACM, IEEE
1 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China;
2 Department of Computer Science, Brown University, Providence, RI 02912, U.S.A.;
3 Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19716, U.S.A.
Abstract We have witnessed the tremendous momentum of the second spring of parallel computing in recent years. But, we should remember the low points of the field more than 20 years ago and review the lesson that has led to the question at that point whether "parallel computing will soon be relegated to the trash heap reserved for promising technologies that never quite make it" in an article entitled "the death of parallel computing" written by the late Ken Kennedy-a prominent leader of parallel computing in the world. Facing the new era of parallel computing, we should learn from the robust history of sequential computation in the past 60 years. We should study the foundation established by the model of Turing machine (1936) and its profound impact in this history. To this end, this paper examines the disappointing state of the work in parallel Turing machine models in the past 50 years of parallel computing research. Lacking a solid yet intuitive parallel Turing machine model will continue to be a serious challenge in the future parallel computing. Our paper presents an attempt to address this challenge by presenting a proposal of a parallel Turing machine model. We also discuss why we start our work in this paper from a parallel Turing machine model instead of other choices.
This work is supported by the National Key Research and Development Program of China under Grant No. 2016YFB0200505.
About author: Peng Qu received his B.S. degree in computer science and technology from Tsinghua University, Beijing, in 2013. He is now a Ph.D. candidate in the Department of Computer Science and Technology, Tsinghua University, Beijing. His research interests include computer architecture and neuromorphic computing.
Cite this article:
Peng Qu, Jin Yan, You-Hui Zhang, Guang R. Gao.Parallel Turing Machine, a Proposal[J] Journal of Computer Science and Technology, 2017,V32(2): 269-285
 Bouknight W J, Denenberg S A, McIntyre D E, Randall J M, Sameh A H, Slotnick D L. The Illiac IV system. Proceedings of the IEEE, 1972, 60(4):369-388. Russell R M. The CRAY-1 computer system. Communications of the ACM, 1978, 21(1):63-72. Duncan R. A survey of parallel computer architectures. Computer, 1990, 23(2):5-16. Eckert Jr J P, Mauchly J W. Automatic high-speed computing:A progress report on the EDVAC. Report of Work under Contract No.W-670-ORD-4926, More School of Elec. Eng., U. of Pennsylvania, Sept. 1945. von Neumann J. First draft of a report on the EDVAC. IEEE Annals of the History of Computing, 1993, 15(4):27-75. Valiant L G. A bridging model for parallel computation. Communications of the ACM, 1990, 33(8):103-111. Culler D E, Karp R M, Patterson D, Sahay A, Santos E E, Schauser K E, Subramonian R, von Eicken T. LogP:A practical model of parallel computation. Communications of the ACM, 1996, 39(11):78-85. Lee E A. The problem with threads. Computer, 2006, 39(5):33-42. Hemmerling A. Systeme von Turing-Automaten und Zellularr äume auf rahmbaren pseudomustermengen. Elektronische Informationsverarbeitung und Kybernetik, 1979, 15(1/2):47-72. (in German) Wiedermann J. Parallel Turing machines. Technical Report RUU-CS-84-11, Department of Computer Science, University of Utrecht, The Netherlands, 1984. Cooper S B. Turing's Titanic machine? Communications of the ACM, 2012, 55(3):74-83. Goldin D, Wegner P. The church-Turing thesis:Breaking the myth. In Lecture Notes in Computer Science 3526, Cooper S B, Löwe B, Torenvliet L (eds.), Springer, 2005, pp.152-168. Turing A M. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 1936, S2-42(1):230-265. Lamport L. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 1978, 21(7):558-565. Dennis J B. A parallel program execution model supporting modular software construction. In Proc. the 3rd Working Conference on Massively Parallel Programming Models, November 1997, pp.50-60. Gao G R, Sarkar V. Location consistency-A new memory model and cache consistency protocol. IEEE Transactions on Computers, 2000, 49(8):798-813. Garcia E, Orozco D, Gao G R. Energy efficient tiling on a Many-Core Architecture. In Proc. the 4th Workshop on Programmability Issues for Heterogeneous Multicores; the 6th International Conference on High-Performance and Embedded Architectures and Compilers, January 2011, pp.53-66. Dennis J B, Fosseen J B, Linderman J P. Data flow schemas. In Proc. the International Symposium on Theoretical Programming, August 1972, pp.187-216. Gao G R. An efficient hybrid dataflow architecture model. Journal of Parallel and Distributed Computing, 1993, 19(4):293-307. Dennis J B, Gao G R. An efficient pipelined dataflow processor architecture. In Proc. the ACM/IEEE Conference on Supercomputing, November 1988, pp.368-373. Gao G R, Tio R, Hum H H J. Design of an efficient dataflow architecture without data flow. In Proc. International Conference on the 5th Generation Computer Systems, December 1988, pp.861-868. Zuckerman S, Suetterlein J, Knauerhase R, Gao G R. Using a "codelet" program execution model for exascale machines:Position paper. In Proc. the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era, June 2011, pp.64-69. Dennis J B. Fresh Breeze:A multiprocessor chip architecture guided by modular programming principles. ACM SIGARCH Computer Architecture News, 2003, 31(1):7-15. Dennis J B, van Horn E C. Programming semantics for multiprogrammed computations. Communications of the ACM, 1966, 9(3):143-155. Gao G R, Suetterlein J, Zuckerman S. Toward an execution model for extreme-scale systems-runnemede and beyond. CAPSL Technical Memo 104, April 2011. Denning P J, Dennis J B. The resurgence of parallelism. Communications of the ACM, 2010, 53(6):30-32. van Emde Boas P. Machine models and simulations. In Handbook of Theoretical Computer Science (vol. A), van Leeuwen (ed.), MIT Press, 1990, pp.1-66. Ito T. Synchronized alternation and parallelism for three-dimensional automata[Ph.D. Thesis]. University of Miyazaki, 2008. Okinaka K, Inoue K, Ito A. A note on hardware-bounded parallel Turing machines. In Proc. the 2nd International Conference on Information, December 2002, pp.90-100. Ito T, Sakamoto M, Taniue A, Matsukawa T, Uchida Y, Furutani H, Kono M. Parallel Turing machines on fourdimensional input tapes. Artificial Life and Robotics, 2010, 15(2):212-215. Dubois M, Scheurich C, Briggs F. Memory access buffering in multiprocessors. ACM SIGARCH Computer Architecture News, 1986, 14(2):434-442. Gharachorloo K, Lenoski D, Laudon J, Gibbons P, Gupta A, Hennessy J. Memory consistency and event ordering in scalable shared-memory multiprocessors. In Proc. the 17th Annual International Symposium on Computer Architecture, May 1990, pp.15-26. Chen C, Manzano J B, Gan G, Gao G R, Sarkar V. A study of a software cache implementation of the OpenMP memory model for multicore and manycore architectures. In Lecture Notes in Computer Science 6272, D'Ambra P, Guarracino M, Talia D (eds.), Springer, 2010, pp.341-352. Gao G R, Hum H H J, Monti J M. Towards an efficient hybrid dataflow architecture model. In Proc. Parallel Architectures and Languages Europe, June 1991, pp.355-371. Iannucci R A. Toward a dataflow/von Neumann hybrid architecture. ACM SIGARCH Computer Architecture News, 1988, 16(2):131-140. Dennis J B. First version of a data flow procedure language. In Proc. Programming Symposium, April 1974, pp.362-376. Arvind K, Nikhil R S. Executing a program on the MIT tagged-token dataflow architecture. IEEE Transactions on Computers, 1990, 39(3):300-318. Blumofe R D, Joerg C F, Kuszmaul B C, Leiserson C E, Randall K H, Zhou Y L. Cilk:An efficient multithreaded runtime system. Journal of Parallel and Distributed Computing, 1996, 37(1):55-69. Hum H H J, Maquelin O, Theobald K B, Tian X M, Tang X A, Gao G R, Cupryk P, Elmasri N, Hendren L J, Jimenez A, Krishnan S, Marquez A, Merali S, Nemawarkar S S, Panangaden P, Xue X, Zhu Y C. A design study of the EARTH multiprocessor. In Proc. the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques, June 1995, pp.59-68. Dennis J B, Misunas D P. A preliminary architecture for a basic data-flow processor. In Proc. the 2nd Annual Symposium on Computer Architecture, January 1975, pp.126-132. Hennessy J L, Patterson D A. Computer Architecture:A Quantitative Approach (5th edition). Elsevier, 2012. Lauderdale C, Khan R. Position paper:Towards a codeletbased runtime for exascale computing. In Proc. the 2nd International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era, March 2012, pp.21-26. Cassidy A S Merolla P, Arthur J V, Esser S K, Jackson B, Alvarez-Icaza R, Datta P, Sawada J, Wong T M, Feldman V, Amir A, Rubin D B D, Akopyan F, McQuinn E, Risk W P, Modha D S. Cognitive computing building block:A versatile and efficient digital neuron model for neurosynaptic cores. In Proc. the International Joint Conference on Neural Networks, Aug. 2013. Khan M M, Lester D R, Plana L A, Rast A, Jin X, Painkras E, Furber S B. SpiNNaker:Mapping neural networks onto a massively-parallel chip multiprocessor. In Proc. IEEE International Joint Conference on Neural Networks, June 2008, pp.2849-2856. Ji Y, Zhang Y H, Li S C, Chi P, Jiang C H, Qu P, Xie Y, Chen W G. NEUTRAMS:Neural network transformation and co-design under neuromorphic hardware constraints. In Proc. the 49th Annual IEEE/ACM International Symposium on Microarchitecture, October 2016.