Journal of Computer Science and Technology ›› 2019, Vol. 34 ›› Issue (2): 388-402.doi: 10.1007/s11390-019-1915-y

Special Issue: Computer Networks and Distributed Computing

• Computer Networks and Distributed Computing • Previous Articles     Next Articles

A New Approach to Multivariate Network Traffic Analysis

Jinoh Kim1,2, Member, ACM, IEEE, Alex Sim2, Senior Member, IEEE, Member, ACM   

  1. 1 Department of Computer Science, Texas A&M University, Commerce 75428, U.S.A.;
    2 Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, U.S.A.
  • Received:2017-10-13 Revised:2018-07-21 Online:2019-03-05 Published:2019-03-16
  • About author:Jinoh Kim received his Ph.D. degree in computer science from University of Minnesota, Twin Cities. He is currently an assistant professor of the Department of Computer Science at Texas A&M University, Commerce. His research interests span from systems to networks, including large-scale distributed systems, big-data computing, network security and network traffic analysis. Prior to that, he was a researcher at the Lawrence Berkeley National Laboratory in 2010-2011 and an assistant professor of computer science at Lock Haven University of Pennsylvania in 2011-2012. From 1991 to 2005, he was a researcher and a senior researcher at ETRI (a national lab in Korea) participating in various research projects in system/network management and security.
  • Supported by:
    This work was supported in part by the Office of Advanced Scientific Computing Research, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231, and by the Office of Workforce Development for Teachers and Scientists (WDTS), Office of Science, of the U. S. Department of Energy, under the Visiting Faculty Program (VFP).

Network traffic analysis is one of the core functions in network monitoring for effective network operations and management. While online traffic analysis has been widely studied, it is still intensively challenging due to several reasons. One of the primary challenges is the heavy volume of traffic to analyze within a finite amount of time due to the increasing network bandwidth. Another important challenge for effective traffic analysis is to support multivariate functions of traffic variables to help administrators identify unexpected network events intuitively. To this end, we propose a new approach with the multivariate analysis that offers a high-level summary of the online network traffic. With this approach, the current state of the network will display patterns compiled from a set of traffic variables, and the detection problems in network monitoring (e.g., change detection and anomaly detection) can be reduced to a pattern identification and classification problem. In this paper, we introduce our preliminary work with clustered patterns for online, multivariate network traffic analysis with the challenges and limitations we observed. We then present a grid-based model that is designed to overcome the limitations of the clustered pattern-based technique. We will discuss the potential of the new model with respect to the technical challenges including streaming-based computation and robustness to outliers.

Key words: network traffic analysis; multivariate analysis; time-series similarity; network monitoring;

[1] Liu D P, Zhao Y J, Xu H W Sun Y Q, Pei D, Luo J, Jing X W, Feng M. Opprentice: Towards practical and automatic anomaly detection through machine learning. In Proc. the 2015 ACM Internet Measurement Conference, October 2015, pp.211-224.
[2] Krishnamurthy B, Sen S Zhang Y, Chen Y. Sketch-based change detection: Methods, evaluation, and applications. In Proc. the 3rd ACM SIGCOMM Conference on Internet Measurement, October 2003, pp.234-247.
[3] Choi J, Hu K J, Sim A. Relational dynamic Bayesian networks with locally exchangeable measures. Technical Report LBNL-6341E, Lawrence Berkeley National Laboratory, 2013. https://www.osti.gov/servlets/purl/1165582,November2018.
[4] Yu M L, Jose L, Miao R. Software defined traffic measurement with OpenSketch. In Proc. the 10th USENIX Conference on Networked Systems Design and Implementation, April 2013, pp.29-42.
[5] Cho K, Fukuda K, Esaki H, Kato A. Observing slow crustal movement in residential user traffic. In Proc. the 2008 ACM Conference on Emerging Network Experiment and Technology, December 2008, Article No. 12.
[6] Schweller R, Gupta A, Parsons E, Chen Y. Reversible sketches for efficient and accurate change detection over network data streams. In Proc. the 4th ACM SIGCOMM Conference on Internet Measurement, Oct. 2004, pp.207- 212.
[7] Liu Z X, Manousis A, Vorsanger G, Sekar V, Braverman V. One sketch to rule them all: Rethinking network flow monitoring with UnivMon. In Proc. the 2016 ACM SIGCOMM Conference, August 2016, pp.101-114.
[8] Kim J, Sim A. A new approach to online, multivariate network traffic analysis. In Proc. the 26th International Conference on Computer Communications and Networks, July 2017.
[9] Manku G S, Motwani R. Approximate frequency counts over data streams. In Proc. the 28th International Conference on Very Large Data Bases, August 2002, pp.346-357.
[10] Das S, Antony S, Agrawal D, Abbadi A E. CoTS: A scalable framework for parallelizing frequency counting over data streams. In Proc. the 25th IEEE International Conference on Data Engineering, March 2009, pp.1323-1326.
[11] Das S, Antony S, Agrawal D, Abbadi A E. Thread cooperation in multicore architectures for frequency counting over multiple data streams. Proceedings of the VLDB Endowment, 2009, 2(1): 217-228.
[12] Guha S, Koudas N, Shim K. Data-streams and histograms. In Proc. the 33rd Annual ACM Symposium on Theory of Computing, July 2001, pp.471-475.
[13] Aggarwal C, Han J, Wang J, Yu P. A framework for clustering evolving data streams. In Proc. the 29th International Conference on Very Large Data Bases, September 2003, pp.81-92.
[14] Domingos P, Hulten G. A general method for scaling up machine learning algorithms and its application to clustering. In Proc. the 8th International Conference on Machine Learning, June 2001, pp.106-113.
[15] Guha S, Mishra N, Motwani R, O'Callaghan L. Clustering data streams. In Proc. the 41st Annual Symposium on Foundations of Computer Science, November 2000, pp.356- 366.
[16] Guha S, Meyerson A, Mishra N, Motwani R, O'Callaghan L. Clustering data streams: Theory and practice. IEEE Transactions on Knowledge and Data Engineering, 2003, 15(3): 515-528.
[17] Datar M, Gionis A, Indyk P, Motwani R. Maintaining stream statistics over sliding windows. In Proc. the 13th Annual ACM-SIAM Symposium on Discrete Algorithms, January 2002, pp.635-644.
[18] Matias Y, Vitter J S, Wang M. Wavelet-based histograms for selectivity estimation. In Proc. the 1998 ACM SIGMOD International Conference on Management of Data, June 1998, pp.448-459.
[19] Vitter J S, Wang M. Approximate computation of multidimensional aggregates of sparse data using wavelets. In Proc. the 1999 ACM SIGMOD International Conference on Management of Data, June 1999, pp.193-204.
[20] Keogh E, Chakrabarti K, Pazzani M, Mehrotra S. Locally adaptive dimensionality reduction for indexing large time series databases. In Proc. the 2001 ACM SIGMOD International Conference on Management of Data, May 2001, pp.151-162.
[21] Papadimitriou S, Sun J, Faloutsos C. Dimensionality reduction and forecasting on streams. In Data Streams, Models and Algorithms, Aggarwal C C (ed.), Springer, 2007, pp.261-288.
[22] Lee S, Kim H, Barman D, Lee S, Kim C K, Kwon T, Choi Y. NeTraMark: A network traffic classification benchmark. SIGCOMM Comput. Commun. Rev., 2011, 41(1): 22-30.
[23] Karagiannis T, Papagiannaki K, Faloutsos M. BLINC: Multilevel traffic classification in the dark. SIGCOMM Comput. Commun. Rev., 2005, 35(4): 229-240.
[24] Iliofotou M, Pappu P, Faloutsos M, Mitzenmacher M, Singh S, Varghese G. Network monitoring using traffic dispersion graphs. In Proc. the 7th ACM SIGCOMM Conference on Internet Measurement, October 2007, pp.315-320.
[25] Kim J, Sim A, Suh S, Kim I. An approach to online network monitoring using clustered patterns. In Proc. the 2007 International Conference on Computing, Networking and Communication, January 2017, pp.656-661.
[26] Bahmani B, Moseley B, Vattani A, Kumar R, Vassilvitskii S. Scalable k-means++. Proceedings of the VLDB Endowment, 2012, 5(7): 622-633.
[27] Mills-Tettey A, Stentz A, Dias S B. The dynamic Hungarian algorithm for the assignment problem with changing costs. Technical Report, Carnegie Mellon University, 2007. https://www.ri.cmu.edu/pubfiles/pub4/millstetteygayorkor20073/millstetteygayorkor20073.pdf, November 2018.
[28] Dusi M, Este A, Gringoli F, Salgarelli L. Using GMM and SVM-based techniques for the classification of SSHencrypted traffic. In Proc. IEEE International Conference on Communications, June 2009.
[29] Rgringoli F, Salgarelli L, Dusa M, Cascarano N, Risso F, Claffy K. GT: Picking up the truth from the ground for internet traffic. ACM SIGCOMM Computer Communication Review, 2009 39(5): 13-18.
[30] Fontugne R, Borgnat P, Abry P, Fukuda K. MAWILab: Combining diverse anomaly detectors for automated anomaly labeling and performance benchmarking. In Proc. the 2010 ACM Conference on Emerging Networking Experiments and Technology, November 2010, Article No. 8.
[31] Estan C, Keys K, Moore D, Varghese G. Building a better NetFlow. In Proc. the 2004 ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, August 2004, pp.245-256.
[32] Wang M, Li B C, Li Z P. sFlow: Towards resource-efficient and agile service federation in service overlay networks. In Proc. the 24th International Conference on Distributed Computing Systems, March 2004, pp.628-635.
[33] Schikuta E. Grid-clustering: A fast hierarchical clustering method for very large data sets. Technical Report, Rice University, 1993. https://www.researchgate.net/publication/210242098_Grid_Clustering_An_efficient_hierarchical_C_lustering_method_for_very_large_data_sets,November2018.
[34] Kim J, Yoo W, Sim A, Suh S, Kim I. A lightweight network anomaly detection technique. In Proc. the International Workshop on Computing, Networking and Communications, January 2017, pp.896-900.
[35] Tavallaee M, Bagheri E, Lu W, Ghorbani A A. A detailed analysis of the KDD CUP 99 data set. In Proc. the 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, July 2009, Article No. 38.
[36] Glazer A, Lindenbaum M, Markovitch S. q-OCSVM: A q-quantile estimator for high-dimensional distributions. In Proc. the 27th Annual Conference on Neural Information Processing Systems, December 2013, pp.503-511.
[37] Solomon J, de Goes F, Peyré G, Cuturi M, Butscher A, Nguyen A, Du T, Guibas L. Convolutional wasserstein distances: Efficient optimal transportation on geometric domains. ACM Trans. Graph. 2015, 34(4): Article No. 66.
[38] Seguy V, Cuturi M. Principal geodesic analysis for probability measures under the optimal transport metric. In Proc. the 2015 Annual Conference on Neural Information Processing Systems, December 2015, pp.3312-3320.
[39] Mellia M, Cigno R L, Neri F. Measuring IP and TCP behavior on edge nodes with Tstat. Comput. Netw., 2005, 47(1): 1-21.
[1] Sheng-Li Pan, Zhi-Yong Zhang, Ying-Jie Zhou, Feng Qian, Guang-Min Hu. Identify Congested Links Based on Enlarged State Space [J]. , 2016, 31(2): 350-358.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] Lu Qi; Zhang Fubo; Qian Jiahua;. Program Slicing:Its Improved Algorithm and Application in Verification[J]. , 1988, 3(1): 29 -39 .
[2] Jin Zhiquan; Liu Chengfei; Sun Zhongxiu; Zhou Xiaofang; Chen Peipei; Gu Jianming;. Design and Implementation of a Heterogeneous Distributed Database System[J]. , 1990, 5(4): 363 -373 .
[3] Zhang Xing er; Zhu Xiaojun; Li Jianxin; Dong Jianning;. Source-to-Source Conversion Based on Formal Definition[J]. , 1991, 6(2): 178 -184 .
[4] Zhang Bo; Zhang Ling;. A Relation Matrix Approach to Labelling Temporal Relations in Scheduling[J]. , 1991, 6(4): 339 -346 .
[5] Shen Yidong;. Form alizing Incomplete Knowledge in Incomplete Databases[J]. , 1992, 7(4): 295 -304 .
[6] Wei Guoqing; Ma Songde;. 3D Motion Estimation and Motion Fusion by Affine Region Matching[J]. , 1993, 8(1): 17 -25 .
[7] Li Renwei; He Pei; Zhang Wenhui;. An Introduction to IN CAPS System[J]. , 1993, 8(1): 26 -37 .
[8] Han Qilong; Lu Ruzhan; Sun Yongqiang;. An Improved Bottom-up Method for Implementing Equational Programming Language[J]. , 1994, 9(1): 63 -69 .
[9] Matthew Hennessy;. Process Calculifor Describing Distributed Systems[J]. , 1998, 13(6): 490 .
[10] Hu zhanyi; YANG Changjiang; YANG Yi; MA Songde;. An Inherent Probabilistic Aspect of the Hough Transform[J]. , 1999, 14(1): 44 -48 .

ISSN 1000-9000(Print)

         1860-4749(Online)
CN 11-2296/TP

Home
Editorial Board
Author Guidelines
Subscription
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 
  Copyright ©2015 JCST, All Rights Reserved