A Stable and Interpretable NNLS Ensemble for House Price Prediction on the Ames Dataset
DOI: https://doi.org/10.54097/xbqqjf85

Keywords: NNLS; House Price Prediction; Ames Dataset

Abstract
Predicting house prices has broad real-world impact across valuation, lending, and taxation. This paper proposes a principled stacking method for the Ames Housing dataset that blends eight strong regressors (linear, bagging, and boosting) via a simplex-constrained Non-Negative Least Squares (NNLS) optimizer. By enforcing non-negativity and a sum-to-one constraint, the meta-learner produces interpretable convex weights and mitigates collinearity among base models. A pragmatically tuned clipping policy is introduced to stabilize the conversion from log-space predictions back to prices. In a 10-fold out-of-fold evaluation, the ensemble achieves competitive accuracy (R² = 0.886 in price space; R² = 0.906 in log space), closely tracking the best single model while remaining fully transparent. Beyond accuracy, the method offers operational simplicity: standard, stable hyperparameter ranges; minimal preprocessing; and negligible inference overhead for blending. Taken together, these properties yield a reproducible and reliable blueprint for tabular regression ensembles that balances performance, robustness, and interpretability, and can be adopted with minimal engineering effort in commercial settings.
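To make the blending step concrete, the sketch below shows one way a simplex-constrained NNLS blend over out-of-fold predictions could be assembled. It is an illustration under assumptions, not the authors' implementation: the file path ames.csv and the SalePrice column are placeholders, three stand-in base learners replace the paper's eight, renormalizing the NNLS weights to sum to one is used to approximate the simplex constraint, and clipping to the observed log-price range stands in for the paper's tuned clipping policy.

import numpy as np
import pandas as pd
from scipy.optimize import nnls
from sklearn.compose import make_column_transformer
from sklearn.ensemble import HistGradientBoostingRegressor, RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Assumed local copy of the Ames data with a SalePrice target column.
df = pd.read_csv("ames.csv")
y = df.pop("SalePrice").astype(float)
y_log = np.log1p(y)  # train and blend in log-price space

# Shared, minimal preprocessing: impute missing values, one-hot encode categoricals.
cat = df.columns[df.dtypes == object]
num = df.columns[df.dtypes != object]
pre = make_column_transformer(
    (make_pipeline(SimpleImputer(strategy="most_frequent"),
                   OneHotEncoder(handle_unknown="ignore", sparse_output=False)), cat),
    (SimpleImputer(strategy="median"), num),
)

# Three stand-in base learners (the paper blends eight linear/bagging/boosting models).
bases = [
    make_pipeline(pre, Ridge(alpha=10.0)),
    make_pipeline(pre, RandomForestRegressor(n_estimators=300, random_state=0)),
    make_pipeline(pre, HistGradientBoostingRegressor(random_state=0)),
]

# 10-fold out-of-fold predictions form the design matrix for the meta-learner.
P = np.column_stack([cross_val_predict(m, df, y_log, cv=10) for m in bases])

# NNLS yields non-negative weights; renormalizing them to sum to one approximates
# the simplex (convex-combination) constraint described in the abstract.
w, _ = nnls(P, y_log.to_numpy())
w = w / w.sum()

# Blend, clip in log space (here: to the observed target range), map back to prices.
blend_log = np.clip(P @ w, y_log.min(), y_log.max())
blend_price = np.expm1(blend_log)

print("convex weights:", np.round(w, 3))
print("R^2, log space:  ", round(r2_score(y_log, P @ w), 3))
print("R^2, price space:", round(r2_score(y, blend_price), 3))

Because the final prediction is a convex combination of the base-model outputs, the learned weights can be inspected directly and the blending step adds negligible inference overhead, which is the operational appeal highlighted above.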
License
Copyright (c) 2025 Highlights in Business, Economics and Management

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.