The {rabbit} package provides functions to standardise raw tri-axial
accelerometer data and extract movement dynamics metrics from it.
Accelerometer data is often collected in a raw format that may not be
immediately suitable for analysis.
- `standardise_data()` reads in raw accelerometer data, standardises column names, and converts the timestamp to a consistent format.
- `extract_movement_dynamics()` takes the standardised data and calculates various movement dynamics metrics using rolling window calculations, including mean acceleration, variance, covariance, Overall Dynamic Body Acceleration (ODBA), and Vectorial Dynamic Body Acceleration (VDBA).
- `crop_ends()`, `crop_ends_diurnal()`, and `crop_ends_nocturnal()` are useful for removing records outside the desired time window, such as records collected before the device was fitted or after it was removed.
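To give an intuition for the dynamic body acceleration metrics, the sketch below implements the standard ODBA/VDBA definitions in base R: a rolling mean estimates the static (gravitational) component on each axis, and the remainder is the dynamic component. This is a conceptual illustration only, not the package's implementation, and the `odba_vdba()` function name and `window` argument are hypothetical:

```r
# Conceptual sketch of ODBA/VDBA (standard definitions), not the
# package's exact implementation; odba_vdba() is a hypothetical name.
odba_vdba <- function(x, y, z, window = 25) {
  # Centred rolling mean approximates the static (gravitational) component
  roll_mean <- function(v, k) stats::filter(v, rep(1 / k, k), sides = 2)
  dx <- x - roll_mean(x, window)  # dynamic component per axis
  dy <- y - roll_mean(y, window)
  dz <- z - roll_mean(z, window)
  list(
    odba = abs(dx) + abs(dy) + abs(dz),  # Overall Dynamic Body Acceleration
    vdba = sqrt(dx^2 + dy^2 + dz^2)      # Vectorial Dynamic Body Acceleration
  )
}
```

Note that VDBA is never larger than ODBA for the same window, since the Euclidean norm is bounded by the sum of absolute components.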
Accelerometer data is often collected at high frequencies, usually 10 to
50Hz, resulting in large datasets. Compared to the original code
inherited by the team, {rabbit} is 1000–2000 times faster. A file
that previously took over a day to process can now be done in under a
minute. Speed gains come from efficient design combined with the
{RcppRoll} package, which provides optimised C++ implementations of
rolling window functions.
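The gap between a C++ rolling function and a naive pure-R loop is easy to demonstrate. The sketch below (assuming {RcppRoll} is installed) compares `RcppRoll::roll_mean()` against a `sapply()` loop that calls `mean()` once per window; the two agree numerically, but the C++ version is dramatically faster on long vectors:

```r
library(RcppRoll)

# A pure-R rolling mean: one mean() call per window position
base_roll <- function(v, k) {
  sapply(seq_len(length(v) - k + 1), function(i) mean(v[i:(i + k - 1)]))
}

x <- rnorm(1e5)
k <- 25

# RcppRoll computes the same quantity in optimised C++
fast <- RcppRoll::roll_mean(x, n = k)
slow <- base_roll(x, k)

all.equal(fast, slow)  # identical results, very different run times
```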
You can install the development version of {rabbit} from GitHub using
the {remotes} package:

```r
# install.packages("remotes")
remotes::install_github("traitecoevo/rabbit")
```

This basic example uses less than one hour of data from a bilby called Piccolo:
```r
library(rabbit)
library(dplyr)

file_in <- system.file("extdata", "raw_Pic2Jan_50000.parquet", package = "rabbit")

df <- standardise_data(file_in = file_in, vars = c("Timestamp", "accX", "accY", "accZ")) |>
  extract_movement_dynamics()

# First rows are NA due to rolling window calculations
df |> filter(!is.na(time))
```

Alternatively, you can read in the data first and then standardise it:
```r
library(arrow)

df <- arrow::read_parquet(file_in)

df_mvt <- standardise_data(df, vars = c("Timestamp", "accX", "accY", "accZ")) |>
  extract_movement_dynamics()

df_mvt |> filter(!is.na(time))
```

The following code loops through all files within a folder and extracts
the animal ID from the file name using the {stringr} package. For ID
extraction, files should be labelled with the animal ID followed by an
underscore (`_`).
```r
library(stringr)
library(arrow)

csv_files <- list.files(path = "/Raw", pattern = "\\.csv$", full.names = TRUE)

# Create output folder if it doesn't exist
dir.create("/Processed", showWarnings = FALSE)

# Loop through each .csv file, calculate, and save the results
for (csv_file in csv_files) {
  cat("Processing:", csv_file, "\n")

  # Extract ID from filename
  file_name <- tools::file_path_sans_ext(basename(csv_file))
  animal_id <- stringr::str_split(file_name, "_")[[1]][1]

  result <- standardise_data(file_in = csv_file,
                             vars = c("Timestamp", "X", "Y", "Z"),
                             time_function = lubridate::dmy_hms) |>
    extract_movement_dynamics(window_size = 25)

  result$ID <- animal_id
  result <- dplyr::relocate(result, ID)

  output_file <- sub("Raw", "Processed", csv_file) |>
    sub(pattern = "\\.csv$", replacement = ".parquet")

  arrow::write_parquet(result, output_file)
  cat("Saved:", output_file, "\n\n")
  gc()
}
```

Files can then be cropped to remove records at the start and end of each file. This is useful for removing records collected before the device was attached or after it was removed.
- `crop_ends()` removes all records before the first instance of a specified time and after the last instance of a specified time
- `crop_ends_diurnal()` removes all records before the first instance of 01:00:00 and after the last instance of 01:00:00
- `crop_ends_nocturnal()` removes all records before the first instance of 12:00:00 and after the last instance of 12:00:00
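The cropping idea can be sketched in a few lines of base R: find every row whose clock time matches a target, then keep only the rows between the first and last match. This is an illustration of the logic, not the package's implementation, and the `crop_ends_sketch()` name and `crop_time` argument are hypothetical:

```r
# Conceptual sketch of the crop_ends() logic, not the package's
# implementation; crop_ends_sketch() and crop_time are hypothetical.
crop_ends_sketch <- function(df, crop_time = "12:00:00") {
  # Rows whose clock time matches the target time
  hits <- which(format(df$time, "%H:%M:%S") == crop_time)
  if (length(hits) < 2) return(df[0, ])  # no complete window to keep
  # Keep everything between the first and last match
  df[hits[1]:hits[length(hits)], ]
}
```

At 25 Hz there are many records per second, so matching on a whole second (as above) still selects a well-defined first and last occurrence.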
```r
library(arrow)
library(dplyr)

# List all .parquet files in folder
parquet_files <- list.files(path = "/Processed", pattern = "\\.parquet$", full.names = TRUE)

# Create output folder if it doesn't exist
dir.create("/Cropped", showWarnings = FALSE)

# Create an empty list to collect summary rows
summary_list <- vector("list", length(parquet_files))

for (i in seq_along(parquet_files)) {
  parquet_file <- parquet_files[i]
  cat("Processing:", parquet_file, "\n")

  # Load and remove incomplete time rows
  df <- arrow::read_parquet(parquet_file) |>
    filter(!is.na(time))

  # Crop each file with the desired crop_ends() function
  result2 <- crop_ends_nocturnal(df)
  rm(df)
  gc()

  # Store first and last rows in list
  summary_list[[i]] <- bind_rows(result2[1, ], result2[nrow(result2), ])

  # Save cropped file
  new_filename <- sub("\\.parquet$", "_crop.parquet", basename(parquet_file))
  arrow::write_parquet(result2, file.path("/Cropped", new_filename))
  rm(result2)
  gc()
  cat("Saved:", new_filename, "\n\n")
}

# Combine all summary rows at the end
summary_tibble <- bind_rows(summary_list)
print(summary_tibble)
```

The output is now ready for analysis or classification of behaviour using your own analytical methods or behavioural classifier.
`sumVDBA` is the best measure of heat-generating movement or activity:
```r
library(ggplot2)

df_mvt |>
  filter(!is.na(time)) |>
  ggplot(aes(x = time, y = sumVDBA)) +
  geom_point(size = 0.2) +
  theme_classic()
```

We can compare performance of the `fast` (RcppRoll), `base` (pure R),
and `orig` (original code) implementations on a real accelerometer file
(50,000 rows):
```r
bench <-
  arrow::read_parquet(system.file("extdata", "raw_Pic2Jan_50000.parquet", package = "rabbit")) |>
  standardise_data(vars = c("Timestamp", "accX", "accY", "accZ"),
                   time_function = lubridate::dmy_hms) |>
  dplyr::select(time, x, y, z) |>
  benchmark_movement_dynamics()

bench |>
  dplyr::group_by(task, method) |>
  dplyr::summarise(median_sec = median(elapsed_sec))
```

| method | time (seconds) | relative speed |
|---|---|---|
| fast | 0.146 | 1,835x faster than orig |
| base | 0.983 | 272x faster than orig |
| orig | 267.848 (~4.5 minutes) | baseline |
The `fast` method is ~7x faster than the pure R `base` implementation,
and ~1,835x faster than the original code. On a full day's recording at
25Hz, this difference becomes even more pronounced: a file that
previously took over a day to process can now be done in seconds.
