Bootstrap resampling is a technique used in statistics and data analysis to estimate the uncertainty of a statistic by repeatedly sampling, with replacement, from the original data. In R, we can implement a bootstrap function with the lapply and sample functions. In this blog post, we will write a bootstrap function in R and walk through an example using the “mpg” column from the built-in “mtcars” dataset.
To create a bootstrap function in R, we can follow these steps:
Let’s begin by loading the “mtcars” dataset, which ships with R in the datasets package:
data(mtcars)
We’ll define a function called bootstrap() that takes two arguments: data (the input data vector) and n (the number of bootstrap iterations).
bootstrap <- function(data, n) {
  resampled_data <- lapply(1:n, function(i) {
    resample <- sample(data, replace = TRUE)
    # Perform desired operations on the resampled data, e.g., compute a statistic
    # and return the result
    resample
  })
  return(resampled_data)
}

bootstrapped_samples <- bootstrap(mtcars$mpg, 5)
bootstrapped_samples
[[1]]
 [1] 21.0 18.1 33.9 21.4 17.3 19.2 19.2 15.8 16.4 30.4 18.1 14.3 32.4 10.4 15.0
[16] 16.4 30.4 17.8 21.4 19.2 17.3 22.8 14.3 22.8 30.4 18.7 13.3 13.3 15.2 10.4
[31] 15.0 13.3

[[2]]
 [1] 18.7 32.4 21.0 10.4 15.0 14.7 24.4 10.4 32.4 10.4 21.0 19.7 21.4 10.4 30.4
[16] 17.3 10.4 22.8 15.2 15.2 21.4 15.8 21.4 33.9 24.4 15.2 18.1 19.2 21.0 24.4
[31] 15.5 21.0

[[3]]
 [1] 15.5 30.4 21.0 22.8 27.3 18.1 21.0 13.3 15.2 17.3 15.8 21.0 18.1 14.3 17.8
[16] 15.8 21.0 18.1 19.2 24.4 19.2 22.8 18.7 14.3 26.0 21.4 22.8 32.4 14.7 15.2
[31] 15.2 14.3

[[4]]
 [1] 13.3 21.0 13.3 15.0 19.2 18.1 18.1 19.2 22.8 18.7 26.0 21.4 14.7 14.3 17.8
[16] 22.8 19.7 21.4 30.4 30.4 18.7 17.3 16.4 21.5 18.1 21.0 17.8 21.4 14.3 19.7
[31] 32.4 18.7

[[5]]
 [1] 15.0 21.4 21.5 26.0 17.3 30.4 18.1 17.8 17.3 30.4 24.4 32.4 21.0 17.8 33.9
[16] 32.4 19.2 22.8 19.7 16.4 17.8 22.8 14.3 33.9 21.5 10.4 21.4 26.0 33.9 14.7
[31] 21.5 18.1
In the above code, we use lapply to generate a list of n resampled datasets. Inside the lapply call, we use the sample function to draw randomly from the original data with replacement (replace = TRUE). Because no size argument is given, sample draws as many values as the input contains, so each resampled dataset has the same length as the original.
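To make the length behaviour concrete, here is a small check using only base R and the mtcars data. The variable names x and resample, and the seed value, are illustrative choices rather than anything the post prescribes.

```r
# Illustrative check: `x` and `resample` are placeholder names
x <- mtcars$mpg

set.seed(42)  # assumed seed, used only so the draw is repeatable
resample <- sample(x, replace = TRUE)

# With no `size` argument, sample() draws length(x) values
length(resample) == length(x)

# Every resampled value comes from the original data
all(resample %in% x)
```

One caveat worth keeping in mind: when the input is a single number greater than or equal to 1, sample() treats it as the range 1:x, so for very short inputs it is safer to resample indices with sample.int(length(x), replace = TRUE) and then subset.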
Within the lapply function, you can perform any desired operations on the resampled data. This could involve calculating statistics, fitting models, or conducting hypothesis tests. Customize the code within the lapply function to suit your specific needs.
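One way to make that customization reusable is to pass the statistic in as an argument. The sketch below is a hypothetical variant, not the function defined in this post: the name bootstrap_stat, its statistic argument, and the seed are all assumptions made for illustration. It returns a numeric vector rather than a list, which assumes the statistic yields a single number.

```r
# Hypothetical helper: `bootstrap_stat` and its `statistic` argument are
# illustrative names, not part of the original post.
bootstrap_stat <- function(data, n, statistic = mean) {
  vapply(seq_len(n), function(i) {
    # Resample indices with replacement, then apply the chosen statistic
    idx <- sample.int(length(data), replace = TRUE)
    statistic(data[idx])
  }, numeric(1))
}

set.seed(123)  # assumed seed for reproducibility
boot_medians <- bootstrap_stat(mtcars$mpg, n = 1000, statistic = median)
head(boot_medians)
```

Using vapply with numeric(1) makes the function fail early if the supplied statistic returns something other than one number, which can be easier to debug than a ragged list.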
Example: Bootstrapping the “mpg” column in mtcars

Let’s illustrate the usage of our bootstrap function by resampling the “mpg” column from the “mtcars” dataset. We will calculate the mean of each resampled dataset.
# Step 1: Load the dataset
data(mtcars)

# Step 2: Define the bootstrap function
bootstrap <- function(data, n) {
  resampled_data <- lapply(1:n, function(i) {
    resample <- sample(data, replace = TRUE)
    mean(resample) # Calculate the mean of each resampled dataset
  })
  return(resampled_data)
}

# Step 3: Perform the bootstrap resampling
bootstrapped_means <- bootstrap(mtcars$mpg, n = 1000)

# Display the first few resampled means
head(bootstrapped_means)
[[1]]
[1] 20.21562

[[2]]
[1] 20.09375

[[3]]
[1] 19.59375

[[4]]
[1] 20.13437

[[5]]
[1] 21.17813

[[6]]
[1] 21.5375
In the above example, we resample the “mpg” column of the “mtcars” dataset 1000 times. The bootstrap() function calculates the mean of each resampled dataset and returns a list of resampled means. The head() function is then used to display the first few resampled means.
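Since the point of the bootstrap is to estimate uncertainty, a natural next step is to summarize the resampled means. The following is a minimal sketch, assuming a percentile interval is an acceptable summary. It regenerates the means so that it runs on its own; the seed and the names boot_means, boot_se, and boot_ci are illustrative assumptions.

```r
# Recreate the resampled means so this block runs on its own
set.seed(2023)  # assumed seed; the original post does not set one
bootstrapped_means <- lapply(1:1000, function(i) {
  mean(sample(mtcars$mpg, replace = TRUE))
})

# lapply returns a list, so flatten it to a numeric vector first
boot_means <- unlist(bootstrapped_means)

# Bootstrap standard error: the standard deviation of the resampled means
boot_se <- sd(boot_means)

# 95% percentile interval for the mean of mpg
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))

boot_se
boot_ci
```

The exact numbers depend on the random draws, so they will differ from run to run unless a seed is set.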
Of course, we do not have to compute a statistic inside the bootstrap function. We can instead return the raw bootstrap samples and compute statistics on them afterwards. The following example uses the bootstrapped_samples object created above.
quantile(unlist(bootstrapped_samples), probs = c(0.025, 0.25, 0.5, 0.75, 0.975))
  2.5%    25%    50%    75%  97.5%
10.400 15.725 19.200 22.800 33.900
mean(unlist(bootstrapped_samples))
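Pooling with unlist() summarizes all resampled values together. To see how a statistic varies from one resample to the next, it can instead be applied to each list element. The sketch below regenerates five resamples so that it runs on its own; the seed and the names sample_means and sample_medians are assumptions for illustration.

```r
# Recreate five resamples of mpg so this block is self-contained
set.seed(1)  # assumed seed
bootstrapped_samples <- lapply(1:5, function(i) {
  sample(mtcars$mpg, replace = TRUE)
})

# One statistic per resample, returned as a numeric vector
sample_means <- sapply(bootstrapped_samples, mean)
sample_medians <- sapply(bootstrapped_samples, median)

sample_means
range(sample_means)  # spread of the mean across resamples
```

With only five resamples the spread is a rough indication at best; a larger n gives a more stable picture of the variability.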