A Stable and Interpretable NNLS Ensemble for House Price Prediction on the Ames Dataset
DOI: https://doi.org/10.54097/xbqqjf85

Keywords: NNLS; House Price Prediction; Ames Dataset

Abstract
Predicting house prices has broad real-world impact across valuation, lending, and taxation. This paper proposes a principled stacking method for the Ames Housing dataset that blends eight strong regressors (linear, bagging, and boosting) via a simplex-constrained Non-Negative Least Squares (NNLS) optimizer. By enforcing non-negativity and a sum-to-one constraint, the meta-learner produces interpretable convex weights and mitigates collinearity among base models. A pragmatically tuned clipping policy is introduced to stabilize the conversion from log-space predictions back to prices. In a 10-fold out-of-fold evaluation, the ensemble achieves competitive accuracy (R² = 0.886 in price space; R² = 0.906 in log space), closely tracking the best single model while remaining fully transparent. Beyond accuracy, the method offers operational simplicity: standard, stable hyperparameter ranges; minimal preprocessing; and negligible inference overhead for blending. Taken together, these properties yield a reproducible and reliable blueprint for tabular regression ensembles that balances performance, robustness, and interpretability, and can be adopted with minimal engineering effort in commercial settings.
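To make the blending step concrete, the sketch below shows one way a simplex-constrained NNLS blend over out-of-fold predictions could be assembled. It is an illustration under assumptions, not the authors' implementation: the file path ames.csv and the SalePrice column are placeholders, three stand-in base learners replace the paper's eight, renormalizing the NNLS weights to sum to one is used to approximate the simplex constraint, and clipping to the observed log-price range stands in for the paper's tuned clipping policy.

import numpy as np
import pandas as pd
from scipy.optimize import nnls
from sklearn.compose import make_column_transformer
from sklearn.ensemble import HistGradientBoostingRegressor, RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Assumed local copy of the Ames data with a SalePrice target column.
df = pd.read_csv("ames.csv")
y = df.pop("SalePrice").astype(float)
y_log = np.log1p(y)  # train and blend in log-price space

# Shared, minimal preprocessing: impute missing values, one-hot encode categoricals.
cat = df.columns[df.dtypes == object]
num = df.columns[df.dtypes != object]
pre = make_column_transformer(
    (make_pipeline(SimpleImputer(strategy="most_frequent"),
                   OneHotEncoder(handle_unknown="ignore", sparse_output=False)), cat),
    (SimpleImputer(strategy="median"), num),
)

# Three stand-in base learners (the paper blends eight linear/bagging/boosting models).
bases = [
    make_pipeline(pre, Ridge(alpha=10.0)),
    make_pipeline(pre, RandomForestRegressor(n_estimators=300, random_state=0)),
    make_pipeline(pre, HistGradientBoostingRegressor(random_state=0)),
]

# 10-fold out-of-fold predictions form the design matrix for the meta-learner.
P = np.column_stack([cross_val_predict(m, df, y_log, cv=10) for m in bases])

# NNLS yields non-negative weights; renormalizing them to sum to one approximates
# the simplex (convex-combination) constraint described in the abstract.
w, _ = nnls(P, y_log.to_numpy())
w = w / w.sum()

# Blend, clip in log space (here: to the observed target range), map back to prices.
blend_log = np.clip(P @ w, y_log.min(), y_log.max())
blend_price = np.expm1(blend_log)

print("convex weights:", np.round(w, 3))
print("R^2, log space:  ", round(r2_score(y_log, P @ w), 3))
print("R^2, price space:", round(r2_score(y, blend_price), 3))

Because the final prediction is a convex combination of the base-model outputs, the learned weights can be inspected directly and the blending step adds negligible inference overhead, which is the operational appeal highlighted above.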
License
Copyright (c) 2025 Highlights in Business, Economics and Management

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.