2013, Vol. 28, Issue (3): 508-524. doi: 10.1007/s11390-013-1352-2
Special Topic: Computer Architecture and Systems; Computer Networks and Distributed Computing
• Special Section on Selected Paper from NPC 2011 •
Andrés Ortiz1, Julio Ortega2, Senior Member, IEEE, Antonio F. Díaz2, and Alberto Prieto2, Senior Member, IEEE
Communication-intensive applications such as multimedia, real-time, and high-performance computing demand network bandwidths of several gigabits per second, and servicing these communication tasks can consume many processor cycles. The gap between what a broad range of applications require and what the network links of multiprocessors deliver makes improving network interface performance a pressing need. As further gains from higher clock frequencies and more efficient microarchitectures become difficult to obtain, multicore architectures have become the current trend in microprocessor development; by parallelizing work across the available cores, they open new avenues for designing efficient communication architectures. Although current operating system network stacks include multiple threads that allow network tasks to execute concurrently in the kernel, implementing this concurrency on a per-packet or per-connection basis is far from trivial, given the synchronization cost of accessing shared resources and the need to use caches efficiently. Consequently, a common trend in much recent work on this topic is to assign the network interrupts and the corresponding protocol and network application processing to the same core, using this affinity-based scheduling to reduce contention for shared resources and cache misses. This paper proposes and analyzes several schemes for distributing the network interface among the cores of a server. These schemes are designed according to the affinities between the corresponding communication tasks, the memory locations of the different data structures, and the processing cores. Since the approach uses several cores to accelerate the communication path of a given connection, it can be seen as complementary to approaches in which several cores simultaneously process packets belonging to the same or different connections. The communication performance of the schemes is evaluated and compared using MPI workloads and a dynamic web server. Full-system simulation results show up to a 35% improvement in throughput and a 23% reduction in latency for the MPI workloads, and, for the dynamic web server, up to a 100% improvement in throughput along with improvements of 500% and 82% in response time and requests serviced per second, respectively.
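To make the affinity-based scheduling described above concrete, the following is a minimal user-space sketch, not taken from the paper: on Linux, the application process can be pinned to a core with sched_setaffinity, and the NIC interrupt can be steered to that same core through the standard /proc/irq/<n>/smp_affinity knob. The IRQ number and core id used here are illustrative assumptions.

```c
/* Hedged sketch of affinity-based scheduling: co-locate the network
 * application and the NIC interrupt on one core. The IRQ number and
 * core id are illustrative, not taken from the paper. Requires root
 * for the procfs write. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int core = 2;      /* illustrative core id */
    int nic_irq = 30;  /* illustrative NIC interrupt number */

    /* Pin the calling process (the network application) to `core`. */
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(core, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return EXIT_FAILURE;
    }

    /* Steer the NIC interrupt to the same core by writing a hex CPU
     * mask to the Linux procfs interrupt-affinity file. */
    char path[64];
    snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", nic_irq);
    FILE *f = fopen(path, "w");
    if (f == NULL) {
        perror("fopen");
        return EXIT_FAILURE;
    }
    fprintf(f, "%x\n", 1u << core);
    fclose(f);

    /* ... the application now runs on the core that also handles the
     * NIC interrupt and the associated protocol processing ... */
    return EXIT_SUCCESS;
}
```

In the abstract's terms, this co-locates interrupt handling, protocol processing, and the application on one core, trading some parallelism for fewer cache misses and less contention on shared resources.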