TY - GEN
T1 - Comparative Evaluation of Machine Learning Models in Forecasting Crop Yields Amid Climate Change
AU - Aboulhosn, Sally
AU - Akkawi, Mariam
AU - Kadry, Seifedine
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
PY - 2026
Y1 - 2026
AB - Climate change increasingly threatens global agriculture through rising carbon dioxide (CO₂) emissions, temperature anomalies, and irregular rainfall. Accurate crop yield prediction is therefore essential for ensuring food security and effective adaptation planning. This study systematically compares three machine learning models, namely Multiple Linear Regression (MLR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), for predicting crop yields using an extensive, multi-country dataset with climate and soil variables. We introduce a robust preprocessing pipeline that includes Gaussian noise-based augmentation, anomaly-based feature engineering, and dual normalization strategies to improve model generalizability under climate stress. Performance is assessed across different training sizes (70/30 and 80/20 train-test splits) and hyperparameter configurations. XGBoost consistently outperforms the other models, achieving the lowest MSE (0.3841) and the highest R² (0.6186) thanks to its ability to model nonlinear climate-yield interactions effectively. Key insights include (1) aridity index and temperature anomalies as dominant predictors, (2) water management and crop rotation as effective adaptation strategies, and (3) preprocessing as crucial for model robustness. This work presents a scalable and interpretable framework for applying machine learning to climate-resilient agriculture.
KW - Climate Change
KW - Crop Yield Prediction
KW - Machine Learning
KW - Multiple Linear Regression
KW - Random Forest
KW - XGBoost
UR - https://www.scopus.com/pages/publications/105023312537
U2 - 10.1007/978-3-032-07735-6_18
DO - 10.1007/978-3-032-07735-6_18
M3 - Conference contribution
AN - SCOPUS:105023312537
SN - 9783032077349
T3 - Lecture Notes in Networks and Systems
SP - 207
EP - 217
BT - Data Science and Network Engineering - Proceedings of ICDSNE 2025
A2 - Namasudra, Suyel
A2 - Kar, Nirmalya
A2 - Kumar Patra, Sarat
A2 - Kim, Byung-Gyu
PB - Springer Science and Business Media Deutschland GmbH
T2 - 3rd International Conference on Data Science and Network Engineering, ICDSNE 2025
Y2 - 18 July 2025 through 19 July 2025
ER -