In the realm of data science and statistical computing, the R language stands out as a powerful tool, particularly for machine learning applications. This article, which is the first installment of a blog post series dedicated to spatial machine learning using R, dives deep into the intricacies of three prominent machine learning frameworks: caret, tidymodels, and mlr3. These frameworks are designed to facilitate the development of predictive models, yet their approaches and functionalities offer unique advantages depending on the specific requirements of a project.

Spatial machine learning differs significantly from traditional machine learning due to the nature of the data involved: observations located near one another tend to be more similar than those farther apart, a property known as spatial autocorrelation. This characteristic underscores the importance of incorporating spatial structure into the modeling process, ensuring that models not only fit the data effectively but also account for geographical relationships.

The primary objective of this blog post is to elucidate the workflows associated with these three frameworks through a practical example: predicting temperature in Spain based on a multitude of covariates. The datasets utilized in this analysis include temperature_train, which encompasses temperature readings from 195 different locations across Spain, and predictor_stack, which consists of various covariates intended for temperature prediction, including population density, distance to the coast, and elevation.

To get started, the necessary R libraries must be loaded, including terra and sf for spatial data handling. The datasets are then accessed online and processed to extract relevant covariate values at the locations specified in the training dataset:

```r
library(terra)
library(sf)
train_points    <- sf::read_sf("https:...")    # URL truncated in the source
predictor_stack <- terra::rast("https:...")    # URL truncated in the source
```

Following this, we prepare our data for modeling by extracting the covariate values corresponding to the training points. The dataset temperature_train is now primed for modeling, containing both the temperature measurements and the associated covariate values.
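A minimal sketch of this extraction step, assuming the training points are an sf object and the temperature measurements travel along as an attribute column (the exact object and column names are assumptions for illustration):

```r
library(terra)
library(sf)

# Extract covariate values from the raster stack at each training location.
# bind = TRUE keeps the original attributes (e.g. the temperature column)
# alongside the extracted covariate values.
temperature_train <- terra::extract(
  predictor_stack,
  terra::vect(train_points),  # convert sf points to a SpatVector
  bind = TRUE
) |>
  sf::st_as_sf()
```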

The next step involves specifying the model, which varies according to the framework being used. Each framework (caret, tidymodels, and mlr3) has its own approach to model specification, including defining the model type, resampling methods, and hyperparameters. In this case, we utilize random forest models implemented in the ranger package with the following hyperparameters:

  • mtry: the number of variables randomly sampled as candidates at each split (set to 8)
  • splitrule: the rule for splitting, specified as extratrees
  • min.node.size: the minimum size of terminal nodes (set to 5)

Moreover, we implement a spatial cross-validation method comprising 5 folds to maintain the integrity of spatial data during model training and evaluation. The data is segmented into spatial blocks, wherein the model is trained on a specific set of blocks and evaluated on others. Each framework offers a different method for defining these resampling techniques.

For caret, hyperparameters are defined using the expand.grid() function, while the resampling method is set up through trainControl() along with the blockCV package to create the folds. Conversely, tidymodels requires the establishment of a modeling formula through the recipe() function, coupled with a model defined via the parsnip package. Creating a workflow in tidymodels combines the recipe and the model seamlessly. Lastly, mlr3 necessitates definitions for the task, learner, and resampling, all of which are woven together to facilitate the model training process.

Once the models are trained, performance evaluation becomes crucial. The evaluation metrics commonly employed in regression tasks include the root mean square error (RMSE) and the coefficient of determination (R²). Each framework computes these metrics, providing insights into model accuracy and reliability.

Following model evaluation, predictions are made to generate temperature maps for Spain, utilizing the respective predict functions from each framework. The final aspect of this analysis is determining the area of applicability (AoA), which helps assess where a model's predictions can be trusted, based on how similar new locations are to the training data. The AoA serves as a crucial diagnostic tool, indicating where predictions may be valid and where caution should be exercised.

In conclusion, this blog post provides a robust comparison of three powerful machine learning frameworks in R (caret, tidymodels, and mlr3), demonstrating their application in spatial machine learning. While all three frameworks possess overlapping functionalities, their distinct design philosophies and implementations cater to varying user needs. In future blog posts, we will delve deeper into each framework, exploring additional functionalities, including feature engineering, variable selection, and hyperparameter tuning.

For further reading and insights on spatial machine learning, be sure to check back for the next installments in this series.

Reuse CC BY 4.0

Citation: Nowosad, Jakub. "Spatial Machine Learning with R: Caret, Tidymodels, and Mlr3." April 30, 2025.
