Journal of Computer Science and Technology, 2019, Vol. 34, Issue (3): 657-669. doi: 10.1007/s11390-019-1934-8

Special Issue: Artificial Intelligence and Pattern Recognition


BHONEM: Binary High-Order Network Embedding Methods for Networked-Guarantee Loans

Da-Wei Cheng (1), Member, CCF; Yi Tu (1); Zhen-Wei Ma (2); Zhi-Bin Niu (3), Member, CCF; Li-Qing Zhang (1,*), Member, ACM, IEEE

  • 1 Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China;
    2 School of Mathematical Science, Shanghai Jiao Tong University, Shanghai 200240, China;
    3 School of Computer Software, Tianjin University, Tianjin 300354, China
  • Received: 2018-05-28 Revised: 2019-03-17 Online: 2019-05-05 Published: 2019-05-06
  • Contact: Li-Qing Zhang E-mail: zhang-lq@cs.sjtu.edu.cn
  • About author: Da-Wei Cheng is a Ph.D. candidate in the Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai. He received his Bachelor's degree in industrial design from Nanjing University of Aeronautics and Astronautics, Nanjing, in 2008. His research interests include machine learning, financial data mining, knowledge discovery, and time series pattern recognition.
  • Supported by:
    The work was supported by the National Basic Research 973 Program of China under Grant No. 2015CB856004 and the Key Basic Research Program of Shanghai Science and Technology Commission of China under Grant Nos. 15JC1400103 and 16JC1402800.

Networked-guarantee loans raise systemic-risk concerns for the government and banks in China. Predicting the default of enterprise loans is a typical machine-learning classification problem, and the guarantee network makes it considerably harder to solve. A complex network is usually stored and represented as an adjacency matrix, which is high-dimensional and sparse, whereas machine-learning methods usually need low-dimensional, dense feature representations. Therefore, in this paper, we propose a binary high-order network embedding method to learn low-dimensional representations of a guarantee network. We first assign each vertex of this heterogeneous economic network a binary role (guarantor or guarantee), and then define high-order adjacency measures based on these roles and economic domain knowledge. Afterwards, we introduce a penalty parameter in the objective function to balance the importance of network structure and adjacency. We optimize the objective with negative-sampling-based gradient descent algorithms, which overcome the limitation of stochastic gradient descent on weighted edges without compromising efficiency. Finally, we test the proposed method on three real-world network datasets. The results show that it outperforms other state-of-the-art algorithms in both classification accuracy and robustness, especially on a guarantee network.
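To make the optimization step in the abstract more concrete, the following is a minimal sketch of negative-sampling gradient descent on a weighted, directed guarantee network. It is not the BHONEM objective: the high-order adjacency measures and the penalty parameter are omitted, edges are drawn with probability proportional to their weight (one common way to avoid scaling gradients by edge weight), and negatives are drawn uniformly. The names `train_embedding` and `edge_score`, the separate guarantor (`src`) and guarantee (`ctx`) vectors, and all hyperparameter values are assumptions made for illustration only.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_embedding(edges, num_nodes, dim=8, negatives=3,
                    steps=20000, lr=0.05, seed=0):
    """Illustrative trainer (hypothetical, not the paper's algorithm).

    edges: list of (guarantor, guarantee, weight) tuples.
    Returns (src, ctx): one vector per node for each of the two roles.
    """
    rng = random.Random(seed)
    # Small random guarantor-role vectors, zero guarantee-role vectors.
    src = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]
           for _ in range(num_nodes)]
    ctx = [[0.0] * dim for _ in range(num_nodes)]
    weights = [w for _, _, w in edges]

    for _ in range(steps):
        # Sample an edge proportionally to its weight, so the update itself
        # is unweighted and the step size stays stable (assumption).
        u, v, _ = rng.choices(edges, weights=weights, k=1)[0]

        # One positive target plus uniformly drawn negative targets.
        targets = [(v, 1.0)]
        for _ in range(negatives):
            n = rng.randrange(num_nodes)
            if n != v:
                targets.append((n, 0.0))

        grad_u = [0.0] * dim
        for t, label in targets:
            # Gradient of the log-sigmoid loss for this (u, t) pair.
            g = lr * (label - sigmoid(dot(src[u], ctx[t])))
            for k in range(dim):
                grad_u[k] += g * ctx[t][k]
                ctx[t][k] += g * src[u][k]
        for k in range(dim):
            src[u][k] += grad_u[k]
    return src, ctx


def edge_score(src, ctx, u, v):
    # Estimated plausibility of a guarantee edge u -> v.
    return sigmoid(dot(src[u], ctx[v]))
```

As a usage note, calling `train_embedding` on a small edge list such as `[(0, 2, 1.0), (1, 2, 2.0), (3, 5, 1.0), (4, 5, 2.0)]` with `num_nodes=6` returns two lists of dense 8-dimensional vectors, which could then be fed to a downstream default classifier in place of the sparse adjacency rows.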

Key words: networked-guarantee loan; high-order network embedding; representative learning; gradient descent


ISSN 1000-9000 (Print), 1860-4749 (Online)
CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 