Predictive ability of covariate-dependent Markov models and classification tree for analyzing rainfall data in Bangladesh

  • Original Paper
Theoretical and Applied Climatology

Abstract

This study compares several parametric regressive models for bivariate binary data with a machine learning technique, the classification tree. The data are the sequential occurrence of rainfall on consecutive days, with outcomes classified as rainfall on both days, rainfall on one of the two days, or no rainfall on either day. Rainfall occurrence on consecutive days is analyzed for the period from 1980 to 2014 using covariate-dependent statistical models and a classification tree. The covariates are relative humidity, minimum temperature, maximum temperature, sea level pressure, sunshine hours, and cloud cover. The binary outcome variable is the occurrence or non-occurrence of rainfall. Five regions of Bangladesh are considered, and one station from each region is selected on two criteria: (i) it has fewer missing values and (ii) it is geographically representative of the region. Several measures are used to compare the Markov chain models with the classification tree. For yearly data, both the Markov model and the classification tree perform satisfactorily. The seasonal data, however, show variation in rainfall. In the monsoon, pre-monsoon, and post-monsoon seasons, both models perform equally well, but in the winter season the Markov model works poorly and the classification tree fails to work. The Markov model also performs consistently across seasons and performs better than the classification tree. These results demonstrate that covariate-dependent Markov models can be used as classifiers, as an alternative to the classification tree: the predictive ability of the covariate-dependent Markov model, which rests on the Markovian assumption, is better than or equal to that of the classification tree. The joint models also consistently show better predictive performance than the regressive model for whole-year data as well as for several seasons.
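
To make the outcome definition above concrete, the following is a minimal sketch, not the authors' implementation. It classifies pairs of consecutive days into the three outcomes named in the abstract and estimates plain first-order transition probabilities from a binary rainfall sequence. It deliberately omits the covariates: in the covariate-dependent models studied here, these probabilities would additionally depend on humidity, temperature, and the other covariates. The function names and the `rain` sequence are hypothetical and chosen only for illustration.

```python
from collections import Counter


def pair_outcomes(rain):
    """Label each pair of consecutive days (1 = rain, 0 = dry)."""
    labels = {(1, 1): "both", (0, 0): "neither"}
    # Any mixed pair, (0, 1) or (1, 0), means rain on exactly one day.
    return [labels.get((a, b), "one") for a, b in zip(rain, rain[1:])]


def transition_probabilities(rain):
    """Estimate P(today = k | yesterday = j) from observed transitions."""
    counts = Counter(zip(rain, rain[1:]))
    probs = {}
    for j in (0, 1):
        total = counts[(j, 0)] + counts[(j, 1)]
        for k in (0, 1):
            # NaN marks a state that never occurs as "yesterday".
            probs[(j, k)] = counts[(j, k)] / total if total else float("nan")
    return probs


# Hypothetical ten-day record, not data from the study.
rain = [0, 0, 1, 1, 1, 0, 1, 0, 0, 1]
outcomes = pair_outcomes(rain)
probs = transition_probabilities(rain)
print(Counter(outcomes))
print(probs)
```

Without covariates, the estimated transition probabilities are simply relative frequencies of each day-to-day transition. A covariate-dependent version would replace them with a fitted regression for each previous-day state.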

Author information

Corresponding author

Correspondence to Sultan Mahmud.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Estimates of first-order Markov model for whole year

Table 4 Estimates for the first-order Markov model for whole year

Appendix 2: Estimates of first-order Markov model for different seasons

Table 5 Estimates for the first-order Markov model for different seasons

About this article

Cite this article

Mahmud, S., Islam, M.A. Predictive ability of covariate-dependent Markov models and classification tree for analyzing rainfall data in Bangladesh. Theor Appl Climatol 138, 335–346 (2019). https://doi.org/10.1007/s00704-019-02812-0

  • DOI: https://doi.org/10.1007/s00704-019-02812-0
