
GeoRF: a geospatial random forest

Author
Margot Geerts, Seppe vanden Broucke, Jochen De Weerdt
Abstract
The geospatial domain increasingly relies on data-driven methodologies to extract actionable insights from the growing volume of available data. Although tree-based models are effective at capturing complex relationships between features and targets, they handle spatial factors poorly. This limitation stems from their reliance on univariate, axis-parallel splits, which partition a map into rectangular areas. To address this issue and improve both performance and interpretability, we introduce two novel bivariate splits designed specifically for geographic coordinates: an oblique split and a Gaussian split. Our method, the Geospatial Random Forest (geoRF), builds on Geospatial Regression Trees (GeoTrees) to incorporate geographic features effectively and extract as much spatial information as possible. Through an extensive benchmark, we show that geoRF outperforms traditional spatial statistical models, other spatial RF variants, and other machine learning and deep learning methods across a range of geospatial tasks. We also compare our method's computational time complexity with that of the baseline approaches. Our prediction maps illustrate that geoRF produces more robust and intuitive decision boundaries than conventional tree-based models. Using impurity-based feature importance measures, we show that geoRF highlights the significance of geographic coordinates, especially in data sets with pronounced spatial patterns.
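The abstract contrasts univariate, axis-parallel splits, which carve a map into rectangles, with bivariate splits on the coordinate pair. The Python sketch below is not the geoRF or GeoTree implementation; it only illustrates the geometry under stated assumptions. It assumes an oblique split thresholds a linear combination of the coordinates (a·x + b·y ≤ c) and that a Gaussian split thresholds an isotropic Gaussian kernel around a center point (exp(−d²/(2σ²)) ≥ t), which selects a disk of radius σ·√(−2 ln t). The helper names (`axis_parallel`, `oblique`, `gaussian`, `impurity_decrease`) and the toy grid are hypothetical. Each split is scored by variance reduction, the regression impurity decrease that impurity-based feature importance sums (weighted by node size) over a tree's splits.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical types for this sketch: a point is an (x, y) coordinate pair, and a
# split is a predicate that sends a point to the left child (True) or right child (False).
Point = Tuple[float, float]
Split = Callable[[Point], bool]


def variance(values: List[float]) -> float:
    """Population variance, the usual impurity measure for regression trees."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def impurity_decrease(points: List[Point], targets: List[float], goes_left: Split) -> float:
    """Parent variance minus the size-weighted variance of the two children."""
    left = [t for p, t in zip(points, targets) if goes_left(p)]
    right = [t for p, t in zip(points, targets) if not goes_left(p)]
    n = len(targets)
    children = len(left) / n * variance(left) + len(right) / n * variance(right)
    return variance(targets) - children


def axis_parallel(feature: int, threshold: float) -> Split:
    """Univariate split on one coordinate (x <= t or y <= t): a vertical or horizontal cut."""
    return lambda p: p[feature] <= threshold


def oblique(a: float, b: float, c: float) -> Split:
    """Assumed oblique split a*x + b*y <= c: a straight cut at any angle."""
    return lambda p: a * p[0] + b * p[1] <= c


def gaussian(center: Point, sigma: float, threshold: float) -> Split:
    """Assumed Gaussian split: an isotropic Gaussian kernel around `center`, thresholded.
    The left child is the disk of radius sigma * sqrt(-2 * ln(threshold))."""
    cx, cy = center

    def goes_left(p: Point) -> bool:
        d2 = (p[0] - cx) ** 2 + (p[1] - cy) ** 2
        return math.exp(-d2 / (2 * sigma ** 2)) >= threshold

    return goes_left


def best_axis_parallel(points: List[Point], targets: List[float]) -> float:
    """Exhaustive search over midpoints between consecutive coordinate values."""
    best = 0.0
    for feature in (0, 1):
        values = sorted({p[feature] for p in points})
        for lo, hi in zip(values, values[1:]):
            gain = impurity_decrease(points, targets, axis_parallel(feature, (lo + hi) / 2))
            best = max(best, gain)
    return best


# Hypothetical toy data: a 20 x 20 grid on the unit square whose target is 1 inside a
# disk of radius 0.25 around (0.5, 0.5) and 0 elsewhere, i.e. a strongly spatial pattern.
CENTER = (0.5, 0.5)
RADIUS = 0.25
points = [((i + 0.5) / 20, (j + 0.5) / 20) for i in range(20) for j in range(20)]
targets = [1.0 if (x - CENTER[0]) ** 2 + (y - CENTER[1]) ** 2 <= RADIUS ** 2 else 0.0
           for x, y in points]

axis_gain = best_axis_parallel(points, targets)
# A tilted line 2x + y = 1.5 that passes through the disk center.
oblique_gain = impurity_decrease(points, targets, oblique(2.0, 1.0, 1.5))
# sigma = 0.25 and threshold = exp(-0.5) give a disk of radius 0.25 * sqrt(1) = 0.25.
gaussian_gain = impurity_decrease(points, targets, gaussian(CENTER, 0.25, math.exp(-0.5)))

print(f"best axis-parallel: {axis_gain:.3f}")
print(f"oblique:            {oblique_gain:.3f}")
print(f"gaussian:           {gaussian_gain:.3f}")  # 0.160: both children are pure
```

On this synthetic disk-shaped target, the Gaussian split isolates the disk in one step (both children are pure, so the decrease equals the parent variance 0.2 × 0.8 = 0.16), whereas no single straight cut, axis-parallel or oblique, can make both children pure. Axis-parallel trees can only approximate such a region with a staircase of rectangles.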
Keywords
Random forest, Spatial data, Real estate, Explainability, PRICES

Downloads

  • (...).pdf: full text (Published version) | UGent only | PDF | 4.36 MB

Citation

Please use this url to cite or link to this publication:

MLA
Geerts, Margot, et al. “GeoRF: A Geospatial Random Forest.” Data Mining and Knowledge Discovery, vol. 38, no. 6, 2024, pp. 3414–48, doi:10.1007/s10618-024-01046-7.
APA
Geerts, M., vanden Broucke, S., & De Weerdt, J. (2024). GeoRF: a geospatial random forest. Data Mining and Knowledge Discovery, 38(6), 3414–3448. https://doi.org/10.1007/s10618-024-01046-7
Chicago author-date
Geerts, Margot, Seppe vanden Broucke, and Jochen De Weerdt. 2024. “GeoRF: A Geospatial Random Forest.” Data Mining and Knowledge Discovery 38 (6): 3414–48. https://doi.org/10.1007/s10618-024-01046-7.
Chicago author-date (all authors)
Geerts, Margot, Seppe vanden Broucke, and Jochen De Weerdt. 2024. “GeoRF: A Geospatial Random Forest.” Data Mining and Knowledge Discovery 38 (6): 3414–3448. doi:10.1007/s10618-024-01046-7.
Vancouver
1. Geerts M, vanden Broucke S, De Weerdt J. GeoRF: a geospatial random forest. Data Mining and Knowledge Discovery. 2024;38(6):3414–48.
IEEE
[1] M. Geerts, S. vanden Broucke, and J. De Weerdt, “GeoRF: a geospatial random forest,” Data Mining and Knowledge Discovery, vol. 38, no. 6, pp. 3414–3448, 2024.
@article{01J0T2EYVFYMZY6RPD3M57QRC9,
  author       = {{Geerts, Margot and vanden Broucke, Seppe and De Weerdt, Jochen}},
  issn         = {{1384-5810}},
  journal      = {{Data Mining and Knowledge Discovery}},
  keywords     = {{Random forest,Spatial data,Real estate,Explainability,PRICES}},
  language     = {{eng}},
  number       = {{6}},
  pages        = {{3414--3448}},
  title        = {{GeoRF: a geospatial random forest}},
  url          = {{http://doi.org/10.1007/s10618-024-01046-7}},
  volume       = {{38}},
  year         = {{2024}},
}
