[1] Balaji P, Feng W, Panda D K. Bridging the EthernetEthernot performance gap. IEEE Micro, 2006, 26(3): 24-40.[2] Bhoedjang R, Rúhl T, Bal H E. User-level network interface protocols. IEEE Computer, 1998, 31(11): 53-60.[3] Gilfeather P, Maccabe A. Modeling protocol offload for message-oriented communication. In Proc. the 2005 IEEE Int. Conf. Cluster Computing, Sept. 2005, pp.1-10.[4] Regnier G, Makineni S, Illikkal R et al. TCP onloading for data center servers. IEEE Computer, 2004, 37(11): 48-58.[5] Shivam P, Chase J. On the elusive benefits of protocol offload.In Proc. 2003 SIGCOMM NICELI, August 2003, pp.179-184.[6] Westrelin R, Fugier N, Nordmark E et al. Studying network protocol offload with emulation: Approach and preliminary results. In Proc. the 12th IEEE Symp. HOTI, Aug. 2004, pp.84-90.[7] Nahum E, Yates D, Kurose J, Towsley, D. Performance issues in parallelized network protocols. In Proc. the 1st USENIX OSDI, November 1994, Article No.10.[8] Willmann P, Rixner S, Cox A. An evaluation of network stack parallelization strategies in modern operating systems. In Proc. the USENIX Technical Conf., May 2006, pp.91-96.[9] Mogul J C. TCP offload is a dumb idea whose time has come. In Proc. the 9th HotOS, May 2003, pp.25-30.[10] Apte V, Hansen T, Reeser P. Performance comparison of dynamic web platforms. Computer Communications, 2003, 26(8): 888-898.[11] Lauritzen K, Sawicki T, Stachura T, Wilson C. Intelr I/O acceleration technology improves network performance, reliability and efficiency. Technology@ Intel Magazine, May 2005, pp.3-11.[12] Foong A, Fung J, Newell D et al. Architectural characterization of processor affinity in network processing. In Proc. the IEEE ISPASS, March 2005, pp.207-218.[13] Jang H, Jin H W. MiAMI: Multi-core aware processor affinity for TCP/IP over multiple network interfaces. In Proc. the 17th Symp. HOTI, Aug. 2009, pp.73-82.[14] Wu W, DeMar P, Crawford M. A transport-friendly NIC for multicore/multiprocessor systems. IEEE Transactions on Parallel and Distributed Systems, 2012, 23(4): 607-615.[15] Kim H, Pai V, Rixner S. Exploiting task-level concurrency in a programmable network interface. In Proc. the 9th ACM SIGPLAN PPoPP, June 2003, pp.61-72.[16] Kumar A, Huggahalli R. Impact of cache coherence protocols on the processing of network traffic. In Proc. the 40th IEEE/ACM MICRO, Dec. 2007, pp.161-171.[17] Magnusson P, Christensson M, Eskilson J et al. Simics: A full system simulation platform. IEEE Computer, 2002, 35(2): 50-58.[18] Willmann P, Shafer J, Carr D et al. Concurrent direct network access for virtual machine monitors. In Proc. the 13th HPCA, Feb. 2007, pp.306-317.[19] Benvenuti C. Understanding Linux Network Internals (1st edition). OReilly Media Inc., 2005.[20] Love R. Linux Kernel Development (2nd edition). Sams Publishing, 2005.[21] Ortiz A, Ortega J, D′?az A, Prieto, A. Network interfaces for programmable NICs and multicore platforms. Computer Networks , 2010, 54(3): 357-376.[22] Clark D, Jacobson V, Romkey J et al. An analysis of TCP processing overhead. IEEE Communications Magazine, 1989, 27(6): 23-29.[23] GadelRab S. 10-Gigabit Ethernet connectivity for computer servers. IEEE Micro, 2007, 27(3): 94-105.[24] Nahum E, Yates D, Kurose J et al. Cache behaviour of network protocols. In Proc. the ACM SIGMETRICS, June 1997, pp.169-180.[25] Tu T, Hsueh C. Unified UDispatch: A user dispatching tool for multicore systems. Journal of Computer Science and Technology, 2011, 26(3): 375-391.[26] Liao G, Zhu X, Bhuyan L. A new server I/O architecture for high speed networks. In Proc. the 17th HPCA, Feb. 2011, pp.255-265.[27] Ortiz A, Ortega J, D′?az A et al. Protocol offload analysis by simulation. J. Systems Architecture, 2009, 55(1): 25-42.[28] Ortiz A, Ortega, J, Diaz, A, Prieto, A. Protocol offload evaluation using Simics. In Proc. the 2006 IEEE International Conference on Cluster Computing, September 2006, pp.1-9.[29] Pacifici G, Segmuller W, Spreitzer M, Tantawi, A. CPU demand for web serving: Measurement analysis and dynamic estimation. Performance Evaluation, 2008, 65(6/7): 531-553.[30] Yeager N, McGrath R. Web Server Technology: The Advanced Guide for World Wide Web Information Providers. San Francisco CA: Morgan-Kaufmann, Inc., 1996.[31] MPICH2: A high performance and widely portable implementation of the message passing interface (MPI) standard. http://www.mpich.org/, October 2012.[32] Kim H, Rixner S. TCP offload through connection handoff. In Proc. the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems, October 2006, pp.279-290.[33] Kim H, Rixner S, Pai V. Network interface data caching. IEEE Transactions on Computers, 2005, 54(11): 1394-1408.[34] Vaidyanathan K, Panda D K. Benefits of I/O acceleration technology (I/OAT) in clusters. In Proc. the 2007 IEEE ISPASS, April 2007, pp.220-229.[35] Shalev L, Marhervaks V, Machulsky Z et al. Loosely coupled TCP acceleration architecture. In Proc. the 14th HOTI, Aug. 2006, pp.3-8.[36] Narayanaswamy G, Balaji P, Feng W. An analysis of 10Gigabit Ethernet protocol staks in multicore environments. In Proc. the 15th HOTI, August 2007, pp.109-116.[37] de Bruijn W, Bos H. Model-T: Rethinking the OS for terabits speeds. In Proc. the INFOCOM Workshop on High-Speed Networks, April 2008, pp.1-6.[38] Wun B, Crowley P. Network I/O acceleration in heterogeneous multicore processors. In Proc. the 14th HOTI, August 2006, pp.9-14.[39] Brecht T, Janakiraman G, Lynn B et al. Evaluating network processing efficiency with processor partitioning and asynchronous I/O. In Proc. the 1st ACM SIGOPS/EuroSys European Conf. Computer Systems, Apr. 2006, pp.265-278.[40] Foong A, Fung J, Newell D. An in-depth analysis of the impact of processor affinity on network performance. In Proc. the 12th IEEE Int. Conf. Networks, Mar. 2004, pp.244-250.[41] Goglin B. NIC-assisted cache-efficient receive stack for message passing over Ethernet. Concurrency and Computation: Practice and Experience, 2011, 23(2): 199-210.[42] Jin H, Yun Y, Jang H C. TCP/IP performance near I/O bus bandwidth on multi-core systems: 10-Gigabit Ethernet vs. multi-port Gigabit Ethernet. In Proc. the International Conference on Parallel Processing, September 2008, pp.87-94.[43] Narayanaswamy G, Balaji P, FengW. Impact of network sharing in multi-core architectures. In Proc. the 17th Int. Conf. Comp. Commun. Networks, Aug. 2008, pp.1-6. |