Optimized binning technique in decision tree model for predicting the Helicoverpa armigera (Hübner) incidence on cotton


  • ICAR-National Bureau of Agricultural Insect Resources, Bengaluru, Karnataka, 560024, India
  • Jain University, Department of Computer Science, Bengaluru, Karnataka, 560011, India
  • University of Agricultural Sciences, Agricultural Research Station, Raichur, Karnataka, 584102, India


Decision tree induction is a popular data mining technique for prediction and classification problems. Decision tree analysis is the most suitable model for pest forewarning systems because pest surveillance data contain biotic, abiotic and environmental variables, from which IF-THEN rules can be framed easily. Abiotic factors such as maximum and minimum temperature, rainfall and relative humidity are continuous numerical variables and are important in climate-change studies. Before the decision tree model is built, the data must be pre-processed into a form suitable for analysis. Data discretization is a pre-processing technique that transforms continuous numerical data into categorical data, with each interval serving as a nominal value. The most commonly used binning methods are equal-width partitioning and equal-depth partitioning. The number of bins created for a variable is important because too many or too few bins reduce the accuracy of the resulting IF-THEN rules. Hence, an optimized binning technique based on the Mean Integrated Squared Error (MISE) method is proposed for forming accurate IF-THEN rules to predict the incidence of the pest Helicoverpa armigera on cotton using decision tree analysis.
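The binning steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the MISE criterion is the histogram cost function C(w) = (2k - v) / w^2, where w is the bin width, k is the mean count per bin and v is the biased variance of the counts; the abstract does not spell out the exact form used. The function names, the candidate range of 2 to 30 bins and the synthetic temperature values are all hypothetical.

```python
import numpy as np


def equal_width_edges(values, n_bins):
    """Bin edges that split the data range into intervals of equal width."""
    lo, hi = float(np.min(values)), float(np.max(values))
    return np.linspace(lo, hi, n_bins + 1)


def equal_depth_edges(values, n_bins):
    """Bin edges at quantiles, so each bin holds roughly the same number of points."""
    return np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1))


def mise_cost(values, n_bins):
    """Assumed MISE-based cost of an equal-width histogram with n_bins bins."""
    counts, edges = np.histogram(values, bins=n_bins)
    width = edges[1] - edges[0]
    mean_count = counts.mean()
    var_count = counts.var()  # biased variance: divides by the number of bins
    return (2.0 * mean_count - var_count) / width ** 2


def optimal_bin_count(values, candidates=range(2, 31)):
    """Candidate bin count with the lowest cost (candidate range is an assumption)."""
    costs = {k: mise_cost(values, k) for k in candidates}
    return min(costs, key=costs.get)


def discretize(values, edges):
    """Map each value to a bin index in 0 .. len(edges) - 2."""
    # Only interior edges are passed, so the minimum and maximum fall in valid bins.
    return np.digitize(values, edges[1:-1])


if __name__ == "__main__":
    # Synthetic maximum-temperature readings, for illustration only.
    rng = np.random.default_rng(0)
    max_temp = rng.normal(loc=32.0, scale=3.0, size=500)

    k = optimal_bin_count(max_temp)
    edges = equal_width_edges(max_temp, k)
    labels = discretize(max_temp, edges)
    print("selected number of bins:", k)
    print("first five bin labels:", labels[:5])
```

The resulting bin labels can then be treated as nominal values of the variable when the decision tree is induced, so that each split corresponds to an interval that reads directly as the condition of an IF-THEN rule.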


Bin optimization, decision tree, discretization, Helicoverpa armigera, IF-THEN rules, pest prediction

Subject Discipline

Data mining techniques in agriculture




