›› 2010, Vol. 25 ›› Issue (3): 458-468.

• Special Section on Trends Changing Data Management • Previous Articles     Next Articles

The Inverse Classification Problem

Charu C. Aggarwal1, Member, ACM, Fellow, IEEE, Chen Chen2 (陈辰), and Jiawei Han3 (韩家炜), Fellow, ACM, IEEE   

  1. 1IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, U.S.A.
    2University of Illinois at Urbana-Champaign, Urbana, Illinois, U.S.A.
  • Received:2009-11-03 Revised:2009-12-09 Online:2010-05-05 Published:2010-05-05
  • About author:
    Charu C. Aggarwal is a research staff member at the IBM T. J. Watson Research Center in Yorktown Heights, New York. He completed his B.S. degree from IIT Kanpur in 1993 and his Ph.D. degree from MIT in 1996. He has since worked in the field of performance analysis, databases, and data mining. He has published over 125 papers in refe-reed conferences and journals, and has been granted over 45 patents. Because of the commercial value of the above-mentioned patents, he has thrice been designated a Master Inventor at IBM. He is a recipient of the IBM Outstanding Innovation Award (2008) for his scientific contributions to privacy technology, and a recipient of an IBM Research Division Award (2008) for his scientific contributions to data stream research. He has served on the program and executive committees of most major database/data mining conferences. He served as an associate editor of the IEEE Transactions on Knowledge and Data Engineering Journal from 2004 to 2008. He is an action editor of the Data Mining and Knowledge Discovery Journal, an associate editor of the ACM SIGKDD Explorations, and an associate editor of the Knowledge and Information Systems Journal. He is a fellow of the IEEE, and a life-member of the ACM.
    Chen Chen is a Ph.D. candidate in the Department of Computer Science, University of Illinois at Urbana-Champaign. He received the B.E. degree in computer science and technology from University of Science and Technology of China in 2003, and the M.S. degree in computer science from University of Illinois at Urbana-Champaign in 2006, respectively. He has been working in the area of data mining in general, and his current research is focused on modeling, managing and analyzing large-scale graph and information network data, with applications from chemical and bio- informatics, social networks, the Web and computer systems.
    Jiawei Han received his Ph.D. degree from the University of Wisconsin in computer science in 1985. He is currently a professor, at the Department of Computer Science in the University of Illinois at Urbana-Champaign. He has been working on research into data mining, data warehousing, database systems, data mining from spatiotemporal data, multimedia data, stream and RFID data, Web data, social network data, and biological data, with over 300 journal and conference publications. He has chaired or served on over 100 program committees of international conferences and workshops, including PC co-chair of 2005 (IEEE) International Conference on Data Mining (ICDM), Americas Coordinator of 2006 International Conference on Very Large Data Bases (VLDB). He is also serving as the founding Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data. He is an ACM and IEEE fellow and has received 2004 ACM SIGKDD Innovations Award and 2005 IEEE Computer Society Technical Achievement Award. His book ``Data Mining: Concepts and Techniques'' (2nd ed., Morgan Kaufmann, 2006) has been popularly used as a textbook worldwide.

In this paper, we examine an emerging variation of the classification problem, which is known as the inverse classification problem. In this problem, we determine the features to be used to create a record which will result in a desired class label. Such an approach is useful in applications in which it is an objective to determine a set of actions to be taken in order to guide the data mining application towards a desired solution. This system can be used for a variety of decision support applications which have pre-determined task criteria. We will show that the inverse classification problem is a powerful and general model which encompasses a number of different criteria. We propose a number of algorithms for the inverse classification problem, which use an inverted list representation for intermediate data structure representation and classification. We validate our approach over a number of real datasets.



[1] Aggarwal C, Han J, Wang J, Yu P. A framework for ondemand classification of evolving data streams. May 2006, 18(5): 577-589.



[2] Alsabti K, Ranka S, Singh V. CLOUDS: A decision tree classifier for large datasets. In Proc. KDD, New York, USA, Aug. 27-31, 1998, pp.2-8.



[3] Breiman L, Friedman J, Olshen R A, Stone C J. Classification and Regression Trees. Chapman & Hall, 1984.



[4] Brodley C E, Utgoff P E. Multivariate decision trees. Machine Learning, 1995, 19(1): 45-77.



[5] Breslow L, Aha D. Simplifying decision trees. Knowledge Engineering Review, 1997, 12(1): 1-40.



[6] Duda R, Hart P, Stork D. Pattern Classification. 2nd Edition, New York: John Wiley and Sons Inc., 2001.



[7] Friedman J H. A recursive partitioning decision rule for nonparametric classifiers. IEEE Transactions on Computers, 1977, 26(4): 404-408.



[8] Gehrke J, Ganti V, Ramakrishnan R, Loh W Y. BOAT: Optimistic decision tree construction. In Proc. ACM SIGMOD Int. Conf. Management of Data, Philadelphia, USA, May 31-June 3, 1999, pp.169-180.



[9] James M. Classification Algorithms. Wiley, 1985.



[10] Quinlan J R. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.



[11] Smyth P, Gray A, Fayyad U M. Retrofitting decision tree classifiers using kernel density estimation. In Proc. the International Conference on Machine Learning, Taheo City, USA, July 9-12, 1995, pp.506-514.



[12] Achtert E, Kriegel H P, KrÄoger P, Renz M, ZÄufle A. Reverse k-nearest neighbor search in dynamic and general metric databases. In Proc. EDBT, Saint Petersburg, Russia, Mar. 24-26, 2009, pp.886-897.



[13] Tao Y, Yiu M L, Mamoulis N. Reverse nearest neighbor search in metric spaces. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(9): 1239-1252.



[14] Kaelbling L, Littman M, Moore A. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 1996, 4: 237-285.



[15] Sutton R, Barto A. Re-Inforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1988.



[16] Hettich S, Blake C, Merz C. UCI repository of machine learning databases. Department of Information and Computer Science, University of California, Irvine, 1998, http://archive.ics.uci.edu/ml.



[17] Witten I, Frank E. Data Mining: Practical Machine Learning Tools with Java Implementations. San Francisco: Morgan Kaufmann Publishers, CA, 2000, http://www.cs.waikato.ac.nz/»ml/weka/book.html.



[18] Kohavi R. The power of decision tables. In Proc. European Conference on Machine Learning, Crete, Greece, Apr. 25-27, 1995, pp.174-189.
No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] Wang Nengbin; Liu Haiqing;. An Intelligent Tool to Support Requirements Analysis and Conceptual Design of Database Design[J]. , 1991, 6(2): 153 -160 .
[2] Lin Shan;. Using a Student Model to Improve Explanation in an ITS[J]. , 1992, 7(1): 92 -96 .
[3] Wang Hui; Liu Dayou; Wang Yafei;. Sequential Back-Propagation[J]. , 1994, 9(3): 252 -260 .
[4] Yu Huiqun; Song Guoxin; Sun Yongqiang;. Completeness of the Accumulation Calculus[J]. , 1998, 13(1): 25 -31 .
[5] HE Taosong;. Volumetric Virtual Environments[J]. , 2000, 15(1): 37 -46 .
[6] LIAO Husheng;. An Action Analysis for Combining Partial Evaluation[J]. , 2000, 15(2): 196 -201 .
[7] DU Lin; SUN Yufang;. A New Indexing Method Based on Word Proximity for Chinese Text Retrieval[J]. , 2000, 15(3): 280 -286 .
[8] NIE Xumin; GUO Qing;. Renaming a Set of Non-Horn Clauses[J]. , 2000, 15(5): 409 -415 .
[9] ZHAO YiXin (赵邑新), YIN Xia (尹 霞) and WU JianPing (吴建平). Problems in the Information Dissemination of the Internet Routing[J]. , 2003, 18(2): 0 .
[10] Heng-Chang Liu and Bao-Hua Zhao. A Near-Optimal Optimization Algorithm for Link Assignment in Wireless Ad-Hoc Networks[J]. , 2006, 21(1): 89 -94 .

ISSN 1000-9000(Print)

         1860-4749(Online)
CN 11-2296/TP

Home
Editorial Board
Author Guidelines
Subscription
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 
  Copyright ©2015 JCST, All Rights Reserved