Special Issue: Software Systems


Improving Software Quality Prediction by Noise Filtering Techniques

Taghi M. Khoshgoftaar and Pierre Rebours   

  1. Empirical Software Engineering Laboratory, Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL, U.S.A.
  • Received: 2006-03-15 Revised: 2007-03-06 Online: 2007-05-10 Published: 2007-05-10

The accuracy of machine learners is affected by the quality of the data on which they are induced. In this paper, the quality of the training dataset is improved by removing instances that the Partitioning Filter detects as noisy. The fit (training) dataset is first split into subsets, and different base learners are induced on each of these splits. Their predictions are then combined so that an instance is identified as noisy if it is misclassified by a specified number of base learners. Two versions of the Partitioning Filter are used: the Multiple-Partitioning Filter and the Iterative-Partitioning Filter. The number of instances removed is tuned through the filter's voting scheme and the number of iterations. The primary aim of this study is to compare the predictive performance of the final models built on the filtered and unfiltered training datasets. A case study uses software measurement data from a high-assurance software project. The results show that models built on the filtered fit datasets and evaluated on a noisy test dataset generally perform better than those built on the noisy (unfiltered) fit dataset. However, the predictive performance obtained with certain aggressive filters is affected by the presence of noise in the evaluation dataset.
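The filtering idea in the abstract can be illustrated with a minimal single-pass sketch. It is not the authors' implementation: the two base learners (`nearest_centroid`, `knn3`), the interleaved partitioning, the "at least `min_votes` misclassifications" rule, and the toy data are all assumptions chosen for illustration, and the iterative variant is not shown.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest_centroid(X, y):
    """Hypothetical base learner: predict the class with the closest mean."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
    return lambda row: min(centroids, key=lambda c: sq_dist(row, centroids[c]))

def knn3(X, y):
    """Hypothetical base learner: majority label of the 3 nearest instances."""
    def predict(row):
        nearest = sorted(range(len(X)), key=lambda i: sq_dist(row, X[i]))[:3]
        return Counter(y[i] for i in nearest).most_common(1)[0][0]
    return predict

def partitioning_filter(X, y, learners, n_partitions, min_votes):
    """Return indices of instances misclassified by at least min_votes classifiers.

    One classifier is induced per (partition, base learner) pair, and every
    classifier votes on every instance of the full fit dataset (an assumption).
    """
    # Interleaved split: instance i goes to partition i % n_partitions.
    parts = [list(range(p, len(X), n_partitions)) for p in range(n_partitions)]
    errors = [0] * len(X)
    for part in parts:
        Xp, yp = [X[i] for i in part], [y[i] for i in part]
        for train in learners:
            predict = train(Xp, yp)
            for i, (row, label) in enumerate(zip(X, y)):
                if predict(row) != label:
                    errors[i] += 1
    return [i for i, e in enumerate(errors) if e >= min_votes]

# Toy fit dataset: two clusters; instance 9 sits in the class-0 cluster
# but carries label 1, so it plays the role of a noisy instance.
X = [(1.0, 1.0), (1.5, 0.5), (9.0, 9.0), (8.5, 9.5), (0.5, 1.5), (1.2, 0.8),
     (9.5, 8.5), (8.8, 9.2), (0.8, 1.2), (1.1, 1.3), (9.2, 8.8), (8.9, 9.1)]
y = [0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1]

# 2 partitions x 2 learners = 4 classifiers; flag on a majority (3 of 4).
noisy = partitioning_filter(X, y, [nearest_centroid, knn3], 2, 3)
print(noisy)  # prints [9]
filtered = [(x, lab) for i, (x, lab) in enumerate(zip(X, y)) if i not in noisy]
```

Raising `min_votes` toward the total number of classifiers makes the filter more conservative (consensus voting), while lowering it makes the filter more aggressive, which is one way to read the abstract's remark that the voting scheme tunes how many instances are removed.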

Key words: mean value calculus; real-time systems; decidability




ISSN 1000-9000 (Print)
ISSN 1860-4749 (Online)
CN 11-2296/TP

Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 
  Copyright ©2015 JCST, All Rights Reserved