Abstract
This paper presents experimental results, on both real and artificial data, for combining unsupervised learning algorithms via stacking. Specifically, stacking is used to form a linear combination of finite mixture model and kernel density estimators for non-parametric multivariate density estimation. The method outperforms other strategies such as choosing the single best model based on cross-validation, combining with uniform weights, and even using the single best model chosen by "cheating" and examining the test set. We also investigate (1) how the utility of stacking changes when one of the models being combined is the model that generated the data, (2) how the stacking coefficients of the models compare to the relative frequencies with which cross-validation chooses among the models, (3) visualization of combined "effective" kernels, and (4) the sensitivity of stacking to overfitting as model complexity increases.
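The core procedure the abstract describes can be sketched in a few lines: fit each component density estimator on cross-validation training folds, score the held-out points under each, and then choose non-negative combination weights that maximize the held-out log-likelihood. The sketch below is a minimal 1-D illustration under assumed simplifications (a single parametric Gaussian standing in for the mixture model, a fixed-bandwidth kernel estimator, and a coarse grid search over the weight in place of the paper's EM-style weight fitting); the bandwidth `h=0.5` and fold count are arbitrary choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=200)  # toy 1-D data set

def gaussian_pdf(x_eval, mu, sigma):
    # parametric Gaussian density (stand-in for a fitted mixture model)
    return np.exp(-0.5 * ((x_eval - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def kde_pdf(x_eval, x_train, h):
    # fixed-bandwidth Gaussian kernel density estimate
    d = (x_eval[:, None] - x_train[None, :]) / h
    return np.exp(-0.5 * d ** 2).sum(axis=1) / (len(x_train) * h * np.sqrt(2 * np.pi))

# v-fold cross-validation: out-of-fold density of each point under each model
K = 5
folds = np.array_split(rng.permutation(len(x)), K)
p = np.zeros((len(x), 2))  # columns: parametric Gaussian, kernel estimate
for idx in folds:
    train = np.setdiff1d(np.arange(len(x)), idx)
    xt = x[train]
    p[idx, 0] = gaussian_pdf(x[idx], xt.mean(), xt.std())
    p[idx, 1] = kde_pdf(x[idx], xt, h=0.5)

# stacking step: pick simplex weights maximizing held-out log-likelihood
# (a grid search here; the paper fits the weights more carefully)
grid = np.linspace(0.0, 1.0, 101)
scores = [np.log(w * p[:, 0] + (1 - w) * p[:, 1]).sum() for w in grid]
w_star = grid[int(np.argmax(scores))]

def stacked(x_eval):
    # final combined estimator: refit both models on all data, mix with w_star
    return (w_star * gaussian_pdf(x_eval, x.mean(), x.std())
            + (1 - w_star) * kde_pdf(x_eval, x, 0.5))
```

Because each component is itself a density and the weights lie on the simplex, the stacked combination integrates to one and is a valid density estimator by construction.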
Smyth, P., Wolpert, D. Linearly Combining Density Estimators via Stacking. Machine Learning 36, 59–83 (1999). https://doi.org/10.1023/A:1007511322260