
rabbit

The {rabbit} package provides functions to standardise raw tri-axial accelerometer data and extract movement dynamics metrics from it. Accelerometer data is often collected in a raw format that may not be immediately suitable for analysis.

  • standardise_data() reads in raw accelerometer data, standardises column names, and converts the timestamp to a consistent format.
  • extract_movement_dynamics() takes the standardised data and calculates various movement dynamics metrics using rolling window calculations, including mean acceleration, variance, covariance, Overall Dynamic Body Acceleration (ODBA), and Vectorial Dynamic Body Acceleration (VDBA).
  • crop_ends(), crop_ends_diurnal(), and crop_ends_nocturnal() are useful for removing records outside the desired time window, such as records collected before the device was fitted or after it was removed.
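
ODBA and VDBA are both computed from the dynamic component of acceleration (raw acceleration minus a rolling mean, which estimates the static, gravitational component). As a minimal sketch in base R of the standard definitions (the actual {rabbit} implementation uses {RcppRoll} and may differ in detail):

```r
# Toy tri-axial dynamic acceleration values (already static-corrected)
dx <- c(0.1, -0.2, 0.05)
dy <- c(0.0,  0.1, -0.1)
dz <- c(-0.3, 0.2, 0.15)

# ODBA: sum of absolute dynamic accelerations across the three axes
odba <- abs(dx) + abs(dy) + abs(dz)

# VDBA: Euclidean norm (vector magnitude) of the dynamic accelerations
vdba <- sqrt(dx^2 + dy^2 + dz^2)
```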

Speed

Accelerometer data is often collected at high frequencies, usually 10 to 50 Hz, resulting in large datasets. Compared to the original code inherited by the team, {rabbit} is 1000–2000 times faster: a file that previously took over a day to process can now be done in under a minute. The speed gains come from efficient design combined with the {RcppRoll} package, which provides optimised C++ implementations of rolling window functions.
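
The core operation here is a centred rolling window applied to each axis. A minimal base R sketch of a centred rolling mean, shown only to illustrate the idea ({rabbit} itself relies on {RcppRoll} for this):

```r
# Centred rolling mean using base R's stats::filter;
# the ends are NA where the window is incomplete
x <- c(1, 2, 3, 4, 5)
k <- 3  # window size
roll_mean_base <- as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
# roll_mean_base: NA, 2, 3, 4, NA
```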

Installation

You can install the development version of rabbit from GitHub using the remotes package:

# install.packages("remotes")
remotes::install_github("traitecoevo/rabbit")

Example

This basic example uses less than one hour of data from a bilby called Piccolo:

library(rabbit)
library(dplyr)

file_in <- system.file("extdata", "raw_Pic2Jan_50000.parquet", package = "rabbit")

df <- standardise_data(file_in = file_in, vars = c("Timestamp", "accX", "accY", "accZ")) |>
  extract_movement_dynamics()

# First rows are NA due to rolling window calculations
df |> filter(!is.na(time))

Alternatively, you can read in the data first and then standardise it:

library(arrow)

df <- arrow::read_parquet(file_in)

df_mvt <- standardise_data(df, vars = c("Timestamp", "accX", "accY", "accZ")) |>
  extract_movement_dynamics()

df_mvt |> filter(!is.na(time))

The following code loops through all files within a folder and extracts the animal ID from the file name using the {stringr} package. For ID extraction to work, files should be labelled with the animal ID followed by an underscore (_).

library(stringr)
library(arrow)

csv_files <- list.files(path = "/Raw", pattern = "\\.csv$", full.names = TRUE)

# Create output folder if it doesn't exist
dir.create("/Processed", showWarnings = FALSE)

# Loop through each .csv file, calculate, and save the results
for (csv_file in csv_files) {
  cat("Processing:", csv_file, "\n")

  # Extract ID from filename
  file_name <- tools::file_path_sans_ext(basename(csv_file))
  animal_id <- stringr::str_split(file_name, "_")[[1]][1]

  result <- standardise_data(file_in = csv_file,
                             vars = c("Timestamp", "X", "Y", "Z"),
                             time_function = lubridate::dmy_hms) |>
    extract_movement_dynamics(window_size = 25)

  result$ID <- animal_id
  result <- dplyr::relocate(result, ID)

  output_file <- sub("Raw", "Processed", csv_file) |>
    sub(pattern = "\\.csv$", replacement = ".parquet")

  arrow::write_parquet(result, output_file)

  cat("Saved:", output_file, "\n\n")
  gc()
}

Files can then be cropped to remove records at the start and end of each file. This is useful for removing records collected before the device was attached or after it was removed.

  • crop_ends() removes all records before the first instance of a specified time and after the last instance of a specified time
  • crop_ends_diurnal() removes all records before the first instance of 01:00:00 and after the last instance of 01:00:00
  • crop_ends_nocturnal() removes all records before the first instance of 12:00:00 and after the last instance of 12:00:00
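
The underlying idea can be sketched in base R (a simplified illustration, not the package's implementation): keep only the rows between the first and last occurrence of a target time of day.

```r
# Toy timestamps at hourly resolution spanning two days
time <- seq(as.POSIXct("2024-01-01 08:00:00", tz = "UTC"),
            by = "1 hour", length.out = 48)
df <- data.frame(time = time, value = seq_along(time))

# Indices where the clock time matches the crop boundary (e.g. 12:00:00)
hits <- which(format(df$time, "%H:%M:%S") == "12:00:00")

# Keep rows from the first to the last matching time
cropped <- df[seq(min(hits), max(hits)), ]
```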

library(dplyr)

# List all .parquet files in folder
parquet_files <- list.files(path = "/Processed", pattern = "\\.parquet$", full.names = TRUE)

# Create output folder if it doesn't exist
dir.create("/Cropped", showWarnings = FALSE)

# Create an empty list to collect summary rows
summary_list <- vector("list", length(parquet_files))

for (i in seq_along(parquet_files)) {

  parquet_file <- parquet_files[i]
  cat("Processing:", parquet_file, "\n")

  # Load and remove incomplete time rows
  df <- arrow::read_parquet(parquet_file) |>
    filter(!is.na(time))

  # Crop each file with desired crop_ends() function
  result2 <- crop_ends_nocturnal(df)
  rm(df)
  gc()

  # Store first and last rows in list
  summary_list[[i]] <- bind_rows(result2[1, ], result2[nrow(result2), ])

  # Save cropped file
  new_filename <- sub("\\.parquet$", "_crop.parquet", basename(parquet_file))
  arrow::write_parquet(result2, file.path("/Cropped", new_filename))
  rm(result2)
  gc()

  cat("Saved:", new_filename, "\n\n")
}

# Combine all summary rows at the end
summary_tibble <- bind_rows(summary_list)
print(summary_tibble)

Analysis and classifying behaviour

The output is now ready for analysis or classification of behaviour using your own analytical methods or behavioural classifier.

Identifying high activity periods

sumVDBA (the rolling sum of VDBA) is the best measure of heat-generating movement or activity:

library(ggplot2)

df_mvt |>
  filter(!is.na(time)) |>
  ggplot(aes(x = time, y = sumVDBA)) +
  geom_point(size = 0.2) +
  theme_classic()

Benchmarking

We can compare performance of the fast (RcppRoll), base (pure R), and orig (original code) implementations on a real accelerometer file (50,000 rows):

bench <-
  arrow::read_parquet(system.file("extdata", "raw_Pic2Jan_50000.parquet", package = "rabbit")) |>
  standardise_data(vars = c("Timestamp", "accX", "accY", "accZ"),
                   time_function = lubridate::dmy_hms) |>
  dplyr::select(time, x, y, z) |>
  benchmark_movement_dynamics()

bench |>
  dplyr::group_by(task, method) |>
  dplyr::summarise(median_sec = median(elapsed_sec))
  method   time (seconds)          relative speed
  fast     0.146                   1,835x faster than orig
  base     0.983                   272x faster than orig
  orig     267.848 (~4.5 minutes)  baseline
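
The relative speeds follow directly from the median timings:

```r
# Ratio of the original implementation's time to each alternative
fast_speedup <- round(267.848 / 0.146)  # ~1835x
base_speedup <- round(267.848 / 0.983)  # ~272x
```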

The fast method is ~7x faster than the pure R base implementation and ~1,835x faster than the original code. On a full day's recording at 25 Hz the difference becomes even more pronounced: a file that previously took over a day to process can now be done in seconds.

About

An R package to "rabbitly" process data outputs from triaxial accelerometers

License

MIT (LICENSE.md); a second LICENSE file contains an unrecognised license.
