Finite Mixture of Regression Modeling for High-Dimensional Count and Biomass Data in Ecology

Journal of Agricultural, Biological, and Environmental Statistics

Abstract

Understanding how species distributions respond to environmental gradients is a key question in ecology, and one that benefits from a multi-species approach. Multi-species data are often high dimensional, in that the number of species sampled is often large relative to the number of sites, and are commonly quantified as either presence–absence, counts of individuals, or biomass of each species. In this paper, we propose a novel approach to the analysis of multi-species data when the goal is to understand how each species responds to its environment. We use a finite mixture of regression models, grouping species into “Archetypes” according to their environmental response, thereby significantly reducing the dimension of the regression model. Previous research introduced such Species Archetype Models (SAMs), but only for binary assemblage data. Here, we extend this basic framework with three key innovations: (1) the method is expanded to handle count and biomass data, (2) we propose grouping on the slope coefficients only, whilst the intercept terms and nuisance parameters remain species-specific, and (3) we develop model diagnostic tools for SAMs. By grouping on environmental responses only, the model allows for inter-species variation in terms of overall prevalence and abundance. The application of our expanded SAM framework is illustrated on marine survey data and through simulation.
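The grouping idea in the abstract can be illustrated with a minimal sketch. It assumes a Poisson response with a log link, which is only one possible choice for count data and is not confirmed by the abstract as the authors' specification. Each species j keeps its own intercept `alpha_j`, while the slope vector `beta_k` is shared by all species in archetype k, and `pis` holds the archetype mixing proportions. All function names, variable names, and the toy data are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import math


def poisson_logpmf(y, mu):
    # Log of the Poisson probability mass function at count y with mean mu.
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def species_loglik_by_archetype(y_j, X, alpha_j, betas):
    # For one species, the log-likelihood of its counts across sites
    # under each archetype's slopes, using the species' own intercept.
    out = []
    for beta_k in betas:
        ll = 0.0
        for y, x in zip(y_j, X):
            eta = alpha_j + sum(b * v for b, v in zip(beta_k, x))
            ll += poisson_logpmf(y, math.exp(eta))  # log link (assumed)
        out.append(ll)
    return out


def logsumexp(vals):
    # Numerically stable log(sum(exp(vals))).
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def sam_loglik_and_memberships(Y, X, alphas, betas, pis):
    # Mixture log-likelihood summed over species, plus each species'
    # posterior probability of belonging to each archetype.
    total = 0.0
    taus = []
    for y_j, alpha_j in zip(Y, alphas):
        lls = species_loglik_by_archetype(y_j, X, alpha_j, betas)
        comp = [math.log(p) + ll for p, ll in zip(pis, lls)]
        norm = logsumexp(comp)
        total += norm
        taus.append([math.exp(c - norm) for c in comp])
    return total, taus


# Toy data (made up): 3 sites with one covariate, 2 species, 2 archetypes.
X = [[-1.0], [0.0], [1.0]]
Y = [[0, 1, 4],   # species 0: abundance increases along the gradient
     [5, 2, 0]]   # species 1: abundance decreases along the gradient
alphas = [0.0, 0.5]       # species-specific intercepts
betas = [[1.0], [-1.0]]   # archetype-level slopes
pis = [0.5, 0.5]          # mixing proportions

loglik, taus = sam_loglik_and_memberships(Y, X, alphas, betas, pis)
```

In this toy setting the two species have different intercepts but are assigned to archetypes by their slopes alone, which is the sense in which grouping on environmental response leaves overall abundance free to vary between species. A full fit would also estimate `alphas`, `betas`, and `pis`, for example with an EM-type algorithm; that step is omitted here.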

Supplementary materials accompanying this paper appear on-line.



Author information

Corresponding author

Correspondence to David I. Warton.


About this article

Cite this article

Dunstan, P.K., Foster, S.D., Hui, F.K.C. et al. Finite Mixture of Regression Modeling for High-Dimensional Count and Biomass Data in Ecology. JABES 18, 357–375 (2013). https://doi.org/10.1007/s13253-013-0146-x


  • DOI: https://doi.org/10.1007/s13253-013-0146-x
